Losing PRBS patterns at cold temperatures.

Part Number: DS125DF111

Hello TI support,

I am debugging a new system that uses the DS125DF111.

The retimer connects to the CPU SerDes in XFI mode on one side (channel A) and to a 10G fiber transceiver on the other side (channel B).

I am testing it with an EXFO (traffic generator/tester) in EtherBERT, sending PRBS31 patterns through the retimer to the CPU and back.

It worked very well until I placed the system in a thermal chamber and ran it cold.

When the temperature dropped below -10C, the EXFO started reporting pattern losses.

My question: is something inside the retimer sensitive to temperature changes (especially cold), and if so, how can I tune it?

I can also reproduce it on my bench with cold spray. The retimer appears to be sensitive to cold on the side with pins 17, 18, 19-24, 1, and 2.

Please check the register dumps from both channels, taken below -10C (cold) and above -10C (hot).

 

Cold register dump from retimer 1:

Channel A:

     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f    0123456789abcdef
00: 00 00 dc 00 00 00 00 00 00 00 10 0f 08 00 93 69    ..?.......???.?i
10: 3a 20 a0 90 00 10 7a 25 40 37 00 03 24 00 e1 55    : ??.?z%@7.?$.?U
20: 00 00 00 40 03 00 12 2d 72 40 30 00 72 80 00 26    ...@?.?-r@0.r?.&
30: 00 40 11 88 bf 1f 31 00 10 00 00 33 8d 00 80 00    .@????1.?..3?.?.
40: 00 40 80 50 c0 90 54 a0 b0 95 69 d5 99 a5 e6 f9    .@?P??T???i?????
50: 00 00 00 00 80 00 00 00 b0 95 69 d5 99 a5 e6 f9    ....?...??i?????
60: 00 b2 90 b3 cd 00 00 00 00 0a 44 40 00 00 00 00    .????....?D@....
70: 03 04 01 10 10 10 00 00 b0 95 69 d5 99 a5 e6 f9    ??????..??i?????
80: 00 40 80 50 c0 90 54 a0 b0 95 69 d5 99 a5 e6 f9    .@?P??T???i?????
90: 00 00 00 00 80 00 00 00 b0 95 69 d5 99 a5 e6 f9    ....?...??i?????
a0: 00 40 80 50 c0 90 54 a0 b0 95 69 d5 99 a5 e6 f9    .@?P??T???i?????
b0: 00 00 00 00 80 00 00 00 b0 95 69 d5 99 a5 e6 f9    ....?...??i?????
c0: 00 40 80 50 c0 90 54 a0 b0 95 69 d5 99 a5 e6 f9    .@?P??T???i?????
d0: 00 00 00 00 80 00 00 00 b0 95 69 d5 99 a5 e6 f9    ....?...??i?????
e0: 00 40 80 50 c0 90 54 a0 b0 95 69 d5 99 a5 e6 f9    .@?P??T???i?????
f0: 00 00 00 00 80 00 00 00 b0 95 69 d5 99 a5 e6 04    ....?...??i?????

Channel B:

     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f    0123456789abcdef
00: 00 00 dc 00 00 00 00 00 00 00 10 0f 08 00 93 69    ..?.......???.?i
10: 3a 20 a0 90 00 10 7a 25 40 37 00 03 24 00 e1 55    : ??.?z%@7.?$.?U
20: 00 00 00 40 00 00 00 2d 75 40 30 00 72 80 00 26    ...@...-u@0.r?.&
30: 00 40 11 88 bf 1f 31 00 10 00 00 33 8d 00 80 00    .@????1.?..3?.?.
40: 00 40 80 50 c0 90 54 a0 b0 95 69 d5 99 a5 e6 f9    .@?P??T???i?????
50: 00 00 00 00 80 00 00 00 b0 95 69 d5 99 a5 e6 f9    ....?...??i?????
60: 00 b2 90 b3 cd 00 00 00 00 0a 44 40 00 00 00 00    .????....?D@....
70: 03 05 10 10 10 10 00 00 b0 95 69 d5 99 a5 e6 f9    ??????..??i?????
80: 00 40 80 50 c0 90 54 a0 b0 95 69 d5 99 a5 e6 f9    .@?P??T???i?????
90: 00 00 00 00 80 00 00 00 b0 95 69 d5 99 a5 e6 f9    ....?...??i?????
a0: 00 40 80 50 c0 90 54 a0 b0 95 69 d5 99 a5 e6 f9    .@?P??T???i?????
b0: 00 00 00 00 80 00 00 00 b0 95 69 d5 99 a5 e6 f9    ....?...??i?????
c0: 00 40 80 50 c0 90 54 a0 b0 95 69 d5 99 a5 e6 f9    .@?P??T???i?????
d0: 00 00 00 00 80 00 00 00 b0 95 69 d5 99 a5 e6 f9    ....?...??i?????
e0: 00 40 80 50 c0 90 54 a0 b0 95 69 d5 99 a5 e6 f9    .@?P??T???i?????
f0: 00 00 00 00 80 00 00 00 b0 95 69 d5 99 a5 e6 05    ....?...??i?????

Hot register dump from retimer 1:

Channel A:

     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f    0123456789abcdef
00: 00 00 dc 00 00 00 00 00 00 00 10 0f 08 00 93 69    ..?.......???.?i
10: 3a 20 a0 90 00 10 7a 25 40 37 00 03 24 00 e1 55    : ??.?z%@7.?$.?U
20: 00 00 00 40 00 00 00 32 6a 20 30 00 72 80 00 26    ...@...2j 0.r?.&
30: 00 40 11 88 bf 1f 31 00 10 00 00 33 8e 00 80 00    .@????1.?..3?.?.
40: 00 40 80 50 c0 90 54 a0 b0 95 69 d5 99 a5 e6 f9    .@?P??T???i?????
50: 00 00 00 00 80 00 00 00 b0 95 69 d5 99 a5 e6 f9    ....?...??i?????
60: 00 b2 90 b3 cd 00 00 00 00 0a 44 40 00 00 00 00    .????....?D@....
70: 03 03 02 10 10 10 00 00 b0 95 69 d5 99 a5 e6 f9    ??????..??i?????
80: 00 40 80 50 c0 90 54 a0 b0 95 69 d5 99 a5 e6 f9    .@?P??T???i?????
90: 00 00 00 00 80 00 00 00 b0 95 69 d5 99 a5 e6 f9    ....?...??i?????
a0: 00 40 80 50 c0 90 54 a0 b0 95 69 d5 99 a5 e6 f9    .@?P??T???i?????
b0: 00 00 00 00 80 00 00 00 b0 95 69 d5 99 a5 e6 f9    ....?...??i?????
c0: 00 40 80 50 c0 90 54 a0 b0 95 69 d5 99 a5 e6 f9    .@?P??T???i?????
d0: 00 00 00 00 80 00 00 00 b0 95 69 d5 99 a5 e6 f9    ....?...??i?????
e0: 00 40 80 50 c0 90 54 a0 b0 95 69 d5 99 a5 e6 f9    .@?P??T???i?????
f0: 00 00 00 00 80 00 00 00 b0 95 69 d5 99 a5 e6 04    ....?...??i?????

Channel B:

     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f    0123456789abcdef
00: 00 00 dc 00 00 00 00 00 00 00 10 0f 08 00 93 69    ..?.......???.?i
10: 3a 20 a0 90 00 10 7a 25 40 37 00 03 24 00 e1 55    : ??.?z%@7.?$.?U
20: 00 00 00 40 00 00 00 2e 7b 40 30 00 72 80 00 26    ...@....{@0.r?.&
30: 00 40 11 88 bf 1f 31 00 10 00 00 33 8e 00 80 00    .@????1.?..3?.?.
40: 00 40 80 50 c0 90 54 a0 b0 95 69 d5 99 a5 e6 f9    .@?P??T???i?????
50: 00 00 00 00 80 00 00 00 b0 95 69 d5 99 a5 e6 f9    ....?...??i?????
60: 00 b2 90 b3 cd 00 00 00 00 0a 44 40 00 00 00 00    .????....?D@....
70: 03 02 10 10 10 10 00 00 b0 95 69 d5 99 a5 e6 f9    ??????..??i?????
80: 00 40 80 50 c0 90 54 a0 b0 95 69 d5 99 a5 e6 f9    .@?P??T???i?????
90: 00 00 00 00 80 00 00 00 b0 95 69 d5 99 a5 e6 f9    ....?...??i?????
a0: 00 40 80 50 c0 90 54 a0 b0 95 69 d5 99 a5 e6 f9    .@?P??T???i?????
b0: 00 00 00 00 80 00 00 00 b0 95 69 d5 99 a5 e6 f9    ....?...??i?????
c0: 00 40 80 50 c0 90 54 a0 b0 95 69 d5 99 a5 e6 f9    .@?P??T???i?????
d0: 00 00 00 00 80 00 00 00 b0 95 69 d5 99 a5 e6 f9    ....?...??i?????
e0: 00 40 80 50 c0 90 54 a0 b0 95 69 d5 99 a5 e6 f9    .@?P??T???i?????
f0: 00 00 00 00 80 00 00 00 b0 95 69 d5 99 a5 e6 05    ....?...??i?????
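
One way to compare the cold and hot dumps is to diff them register by register. The following is a minimal Python sketch, assuming the i2cdump-style layout shown above. For brevity it embeds only the three channel A rows that differ between the cold and hot dumps, and the helper names `parse_dump` and `diff_dumps` are illustrative.

```python
def parse_dump(text):
    """Parse rows such as '20: 00 00 ...' into {register_address: value}."""
    regs = {}
    for line in text.strip().splitlines():
        head, _, rest = line.partition(":")
        base = int(head, 16)
        # Only the first 16 tokens are data bytes; a trailing ASCII column,
        # if present, is ignored.
        for offset, token in enumerate(rest.split()[:16]):
            regs[base + offset] = int(token, 16)
    return regs


def diff_dumps(a, b):
    """Return (address, value_a, value_b) for every register that differs."""
    return [(addr, a[addr], b[addr])
            for addr in sorted(a) if addr in b and a[addr] != b[addr]]


# Channel A rows 0x20, 0x30 and 0x70, copied from the dumps above.
COLD_A = """
20: 00 00 00 40 03 00 12 2d 72 40 30 00 72 80 00 26
30: 00 40 11 88 bf 1f 31 00 10 00 00 33 8d 00 80 00
70: 03 04 01 10 10 10 00 00 b0 95 69 d5 99 a5 e6 f9
"""
HOT_A = """
20: 00 00 00 40 00 00 00 32 6a 20 30 00 72 80 00 26
30: 00 40 11 88 bf 1f 31 00 10 00 00 33 8e 00 80 00
70: 03 03 02 10 10 10 00 00 b0 95 69 d5 99 a5 e6 f9
"""

diffs = diff_dumps(parse_dump(COLD_A), parse_dump(HOT_A))
for addr, cold, hot in diffs:
    print(f"0x{addr:02x}: cold=0x{cold:02x} hot=0x{hot:02x}")
```

For channel A this lists registers 0x24, 0x26, 0x27, 0x28, 0x29, 0x3c, 0x71 and 0x72; every other byte is identical in the two dumps.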

  • Hi Lev,

    I have been working with FAE Klaus Wawrzyniak to support this case. I reviewed the register dumps and did not see anything unusual. In both cold temp and hot temp cases, all retimer channels have CDR lock (reg 0x02), do not have CDR lock loss or signal detect loss interrupts (reg 0x01), and show good HEO and VEO values (regs 0x27 and 0x28).
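
    The registers named above can be pulled out of the four dumps and put side by side. This is a small Python sketch with the raw bytes copied from the dumps in the original post. The values are left as raw counts because the thread does not give the scale factors for HEO and VEO, and the name `summarize` is only illustrative.

```python
# Raw register bytes copied from the four dumps in the original post.
DUMPS = {
    "cold A": {0x01: 0x00, 0x02: 0xDC, 0x27: 0x2D, 0x28: 0x72},
    "cold B": {0x01: 0x00, 0x02: 0xDC, 0x27: 0x2D, 0x28: 0x75},
    "hot A":  {0x01: 0x00, 0x02: 0xDC, 0x27: 0x32, 0x28: 0x6A},
    "hot B":  {0x01: 0x00, 0x02: 0xDC, 0x27: 0x2E, 0x28: 0x7B},
}


def summarize(dumps):
    """Return (name, reg 0x01, reg 0x02, HEO raw, VEO raw) for each dump."""
    return [(name, regs[0x01], regs[0x02], regs[0x27], regs[0x28])
            for name, regs in dumps.items()]


for name, int_status, cdr_status, heo, veo in summarize(DUMPS):
    print(f"{name}: 0x01=0x{int_status:02x} 0x02=0x{cdr_status:02x} "
          f"HEO={heo} VEO={veo} (raw counts)")
```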

    I can also reproduce it on my bench with cold spray. The retimer appears to be sensitive to cold on the side with pins 17, 18, 19-24, 1, and 2.

    Can you explain in greater detail specifically what testing or investigation you performed to reach this conclusion?

    Best,

    Lucas

  • Hi Lucas,

    I can't be 100% sure of my conclusion. As I said above, I used cold spray to locally cool the retimer circuitry.

    When I sprayed closer to the pins I mentioned, the pattern losses appeared much faster than when I sprayed the other sides of the IC.

    I am now testing my rev01 board, which has the same circuitry, in the chamber at -45C, and it behaves better than my rev02.

    I got only 2 pattern losses during a 1:40 hour run, whereas rev02 gives 2 or 3 pattern losses per minute.

    The differences between rev01 and rev02 are as follows:

    I added a 1G SFP cage and changed the PCB fabricator, keeping the same stack-up requirements.

    Nothing was changed in the retimer circuitry between rev01 and rev02.

    The marking on rev01 is 7BA2NYU / 2D111B2

    The marking on rev02 is 7CAHH4U / 2D111B2

    Can you clarify the meaning of the first part of the marking (7BA2NYU)?

    I can also confirm that after I loop back Channel IN B to Channel OUT B, the path from the SFP+ to the retimer and back works well over the whole temperature range.

  • Hi Lev,

    I can also confirm that after I loop back Channel IN B to Channel OUT B, the path from the SFP+ to the retimer and back works well over the whole temperature range.

    Can you clarify, do you mean you looped OUTA to INB or OUTB to INA? Is the signal path in this case EXFO --> 10G transceiver --> retimer channel A --> retimer channel B --> 10G transceiver --> EXFO? If this is the case and the issue disappears, then this suggests a different component in your signal chain is responsible for pattern losses.

    Best,

    Lucas

  • Hi Lucas,

    I looped IN B to OUT A (sorry about the typo above) on retimer 1, shown on the left side of the diagram, and traffic flows without any problems from EXFO1 to the transceiver to retimer 1 and back to EXFO1 (see the diagram).

    Let me explain the full configuration.

    As you can see in the diagram, two identical modules are connected to the system and send packets to each other over a QSGMII backplane interface, using just one SGMII (1G). Because of this bottleneck, I run less than 10% load on the EXFOs to prevent oversubscription.

    Please also note that each module is assembled from 3 different PCBs.

    The CPU is on the first board -> board-to-board connector -> second board -> board-to-board connector -> retimer and transceiver on the third board.

    EXFO1 generates PRBS31 traffic and sends it to CPU1 (blue) through the 10G XFI interface. CPU1 sends the packets to CPU2 (red) via the QSGMII interface.

    CPU2 sends the packets through its 10G XFI interface via retimer 2 -> second 10G transceiver to EXFO2. EXFO2 runs in smart loopback mode, which simply returns all packets to EXFO1. EXFO1 receives the returned packets, analyzes them, and shows whether all patterns were received or some were lost.

    As I mentioned above, with IN_B looped back to OUT_A on retimer 1 I have no issues over the whole temperature range from -45C to +70C.

    I also see far fewer pattern losses if I loop back only IN_A to OUT_B of retimer 2, meaning that packets flow only through retimer 1:
    EXFO1 -> transceiver 1 -> retimer 1 -> CPU1 -> CPU2 -> CPU1 -> retimer 1 -> transceiver 1 -> EXFO1.

    When I cool only the retimer 1 or retimer 2 circuitry with cooling spray at room temperature, the pattern losses start. When I stop cooling the retimer, the pattern losses disappear and traffic runs normally.

    I saw all of these issues with rev02 of the third board.

    Today I swapped only the third board from rev02 to rev01, as I mentioned before, and the rate of pattern losses is much lower.

    In 4:40 hours of running, there were only 8 pattern losses at -45C.
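
    To put the two boards on the same scale, the loss counts quoted in this thread can be converted to losses per minute. The Python sketch below is only arithmetic on those figures; it assumes the rev02 figure of 2 to 3 losses per minute was taken under comparable cold conditions, and the function name is illustrative.

```python
def losses_per_minute(losses, hours, minutes):
    """Convert a loss count over an h:mm run into losses per minute."""
    return losses / (hours * 60 + minutes)


rev01_run1 = losses_per_minute(2, 1, 40)   # 2 losses in 1:40 at -45C
rev01_run2 = losses_per_minute(8, 4, 40)   # 8 losses in 4:40 at -45C
rev02_low, rev02_high = 2.0, 3.0           # "2 or 3 pattern losses in a minute"

for label, rate in (("rev01 run 1", rev01_run1), ("rev01 run 2", rev01_run2)):
    print(f"{label}: {rate:.4f} losses/min, "
          f"rev02 is {rev02_low / rate:.0f}x to {rev02_high / rate:.0f}x higher")
```

    On these numbers rev02 loses patterns roughly 70 to 150 times more often than rev01.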

    I hope this helps explain the configuration and the failure.

  • Hi Lev,

    Thank you for the detailed explanation. Allow me to brainstorm next debug steps with my team and I'll get back to you within a few days.

    Best,

    Lucas

  • Sure,

    Thank you, Lucas!

  • Hi Lev,

    After brainstorming with my team I have several ideas about what could be going wrong here.

    Since loopback on retimer 1 resulted in no issue and cold spraying INA/OUTB side of retimer 1 resulted in the issue occurring, it seems clear to me that the root cause of the issue exists somewhere between retimer 1 and CPU 1. Is it possible that CPU 1 receiver has some marginality issue that appears when the retimer transmitter is at cold temp?

    We have validation data for this retimer which shows good performance down to -40C. I noticed you ran tests down to -45C. Note that DS125DF111 is only rated for ambient temperature down to -40C.

    I have a few debug ideas to check if CPU 1 RX has some marginality issue.

    1. Can you cold spray CPU 1 RX while keeping the rest of the system at room temp and check if the issue occurs?
    2. Does your CPU show diagnostics? Can you check BER and eye opening on the CPU 1 RX when the full system is at low temp versus room temp?
    3. Does your CPU have a PRBS generator and checker? Can you try transmitting PRBS31 from CPU1 --> retimer 1 with loopback INA to OUTB --> check for PRBS errors on CPU1 at low temp? Alternatively, if your CPU has a PRBS checker but no generator, you can generate PRBS31 data from the retimer and check for errors on CPU1.

    I also have a few general debug suggestions, if the above suggestions disprove a CPU marginality issue.

    1. Can you enable CDR loss of lock and signal loss interrupts by asserting 0x56[1:0] on both retimers, both channels? Check register 0x01 after the issue occurs to see if the interrupts get asserted.
    2. Can you monitor CDR lock (reg 0x02) and HEO/VEO measurements (regs 0x27, 0x28) when cold spraying INA/OUTB side of retimer 1?
    3. Are you able to capture eye diagrams at the SFP transceiver outputs?
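
    General suggestions 1 and 2 could be scripted roughly as follows. This is a sketch under several assumptions: `read_reg` and `write_reg` are hypothetical callables wrapping whatever I2C/SMBus access the host provides, register 0xFF is treated as the channel select with the values 0x04 (channel A) and 0x05 (channel B) inferred from the last byte of the dumps above, and `FakeRetimer` is an in-memory stand-in so the example runs without hardware.

```python
import time

CHANNEL_SELECT_REG = 0xFF                 # assumption, inferred from the dumps
CHANNEL_SELECT = {"A": 0x04, "B": 0x05}   # assumption, inferred from the dumps
INT_ENABLE_REG = 0x56                     # bits [1:0] enable the interrupts
STATUS_REGS = {"int_status": 0x01, "cdr_status": 0x02, "heo": 0x27, "veo": 0x28}


def enable_interrupts(read_reg, write_reg, channel):
    """Set 0x56[1:0] on one channel, leaving the other bits unchanged."""
    write_reg(CHANNEL_SELECT_REG, CHANNEL_SELECT[channel])
    write_reg(INT_ENABLE_REG, read_reg(INT_ENABLE_REG) | 0x03)


def snapshot(read_reg, write_reg, channel):
    """Read the raw status bytes of one channel."""
    write_reg(CHANNEL_SELECT_REG, CHANNEL_SELECT[channel])
    return {name: read_reg(reg) for name, reg in STATUS_REGS.items()}


def monitor(read_reg, write_reg, samples, interval_s=0.5):
    """Poll both channels and return a list of (time, channel, values)."""
    log = []
    for _ in range(samples):
        for channel in ("A", "B"):
            values = snapshot(read_reg, write_reg, channel)
            log.append((time.monotonic(), channel, values))
        time.sleep(interval_s)
    return log


class FakeRetimer:
    """In-memory stand-in for the real device, for dry runs only."""

    def __init__(self):
        self.selected = CHANNEL_SELECT["A"]
        self.banks = {0x04: {}, 0x05: {}}

    def read(self, reg):
        if reg == CHANNEL_SELECT_REG:
            return self.selected
        return self.banks[self.selected].get(reg, 0x00)

    def write(self, reg, value):
        if reg == CHANNEL_SELECT_REG:
            self.selected = value
        else:
            self.banks[self.selected][reg] = value


dev = FakeRetimer()
for channel in ("A", "B"):
    enable_interrupts(dev.read, dev.write, channel)
log = monitor(dev.read, dev.write, samples=2, interval_s=0.0)
for timestamp, channel, values in log:
    print(f"{timestamp:.3f} ch{channel} "
          + " ".join(f"{k}=0x{v:02x}" for k, v in values.items()))
```

    On real hardware the two `FakeRetimer` methods would be replaced by the host's own register read and write routines, and the log would be captured while the INA/OUTB side of retimer 1 is being cold sprayed.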

    Finally, I have a few questions.

    1. Where are AC coupling caps located on the receiver and transmitter sides of both retimers? What is the temperature rating of these caps?
    2. You mentioned you are running less than 10% load on EXFOs to prevent oversubscription. Can you share more details about how this is being implemented with PRBS31 pattern?

    Best,

    Lucas

  • Hi Lev,

    I'd like to kindly follow up on this. Can you share your current debug status?

    Best,

    Lucas

  • Hi Lev,

    Thanks for taking the time to meet with me today and cover your recent testing and setup. I have a few debug suggestions you can try.

    1. Is your CPU capable of measuring eye opening? We see from HEO and VEO values reported in register 0x27 and 0x28 that the eye opening looks wide open from the retimer's perspective. It would be good to understand if it looks the same from the CPU's perspective.

    2. Try using adapt mode 3 by writing 0x31=0x60. This changes the algorithm that adapts CTLE and DFE equalization.

    3. Try adjusting CDR bandwidth. I suggest you try each of the values shown in this table.
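
    For suggestion 2, the value 0x60 can be related to the dumped value of register 0x31 (0x40 in all four dumps). The Python sketch below assumes the adapt mode field sits in bits [6:5] of 0x31, which is consistent with 0x60 decoding to mode 3; that bit position is an inference and should be checked against the datasheet. It preserves the other bits instead of overwriting the whole register.

```python
ADAPT_MODE_REG = 0x31
ADAPT_MODE_SHIFT = 5                        # assumption: field is bits [6:5]
ADAPT_MODE_MASK = 0b11 << ADAPT_MODE_SHIFT


def get_adapt_mode(reg_value):
    """Extract the assumed 2-bit adapt mode field from a 0x31 value."""
    return (reg_value & ADAPT_MODE_MASK) >> ADAPT_MODE_SHIFT


def with_adapt_mode(reg_value, mode):
    """Return reg_value with the adapt mode field replaced by mode (0..3)."""
    if not 0 <= mode <= 3:
        raise ValueError("adapt mode must be in the range 0..3")
    return (reg_value & ~ADAPT_MODE_MASK & 0xFF) | (mode << ADAPT_MODE_SHIFT)


dumped = 0x40  # value of register 0x31 in the dumps from the original post
print(f"dumped 0x31 = 0x{dumped:02x}, adapt mode field = {get_adapt_mode(dumped)}")
print(f"value for adapt mode 3 = 0x{with_adapt_mode(dumped, 3):02x}")
```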

    Best,

    Lucas

  • Hi Lucas,

    Thanks for your help and support.

    I will check with NXP support whether the CPU is capable of measuring eye opening.

    I tried changing the adapt mode. It didn't change anything.

    I also tried adjusting the CDR bandwidth. With the "00" value it stopped sending packets.

    With all other values the system behaves the same :(

    I will try experimenting more with the registers.

    Regards,

    Lev

  • Hi Lev,

    Thank you for trying my debug suggestions. I'll continue to think about additional ideas you can try. Hope you have a good vacation!

    Best,

    Lucas