TLK10034: One of 4 ports is accumulating data errors during BER testing.

Part Number: TLK10034

I have a PCB design with four TLK10034s, which effectively gives 16 channels on the PCB. The LS side of each device uses the XAUI interface. The HS side feeds a 10G optical transceiver. For 14 of the 16 channels, BER testing looks perfect (no errors for 1 terabyte or more of data transferred).

For purposes of discussion I will refer to the different TLKs by letter (A, B, C and D). TLKs B and C perform with no issues during BER testing. On TLK A, 3 of the 4 ports perform with no issues during BER testing. One channel (Channel 2) passes an Ethernet ping test with no problems (no packet loss when run for 1 minute or more). However, during BER testing we get traffic errors that are erratic (it will run well for a while and then randomly produce errors). We have a similar situation on TLK D: on the same channel (Channel 2) we see the same kinds of BER errors. The other 14 channels seem to run perfectly.

We have run loopback tests where the loopback was done internally on the LS and HS side. The problem appears to be with the HS side, since running the loopback internally or externally on the LS side produces the same results.

We have inspected the physical aspects of the nets involved. The connection between the TLK A channel 2 HS port to the optical transceiver has been verified in both schematic and gerber data. The length of the differential net is approx. 1600mils which is comparable or even shorter than some of the channels that have no issues. For our configuration, we have the following register overrides from default...

Device Address   Register Address   Default Value   Required Value   Notes
0x07             0x0000             0x3000          0x2000           Bit 12 changed from '1' to '0'
0x01             0x0096             0x0002          0x0000           Bit 1 changed from '1' to '0'
0x1E             0x000E             0x0000          0x000E           Bits 1,2,3 changed from '0' to '1' to initiate a datapath reset
0x1E             0x0003             -               0x5848
0x1E             0x0004             -               0x5550

The 1st turns off auto-negotiation, which we need since we are connecting to an optical transceiver.
The 2nd turns off KR training, which we also need due to the devices we are connecting to.
The 3rd does a datapath reset, which is required by the second item.
The 4th and 5th are specific values that we determined by trial and error to tune the channels for best performance.

The same register updates are performed for each of the 4 channels on the TLK10034. The values for each channel are identical.
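As a quick cross-check of the "Notes" column, the bits that differ between each default and required value can be listed with an XOR. This is only a minimal sketch; the helper name `changed_bits` is made up for illustration, and only the three rows that list both a default and a required value are included.

```python
# Register overrides from the post: (device address, register address, default, required).
OVERRIDES = [
    (0x07, 0x0000, 0x3000, 0x2000),
    (0x01, 0x0096, 0x0002, 0x0000),
    (0x1E, 0x000E, 0x0000, 0x000E),
]


def changed_bits(default, required):
    """Return (bit, old, new) for every bit that differs in a 16-bit register."""
    diff = (default ^ required) & 0xFFFF
    return [
        (bit, (default >> bit) & 1, (required >> bit) & 1)
        for bit in range(16)
        if (diff >> bit) & 1
    ]


for dev, reg, default, required in OVERRIDES:
    print(f"dev 0x{dev:02X} reg 0x{reg:04X}: {changed_bits(default, required)}")
```

Each tuple reads as (bit number, value in default, value in required), so the first row reports only bit 12 going from 1 to 0.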

We are currently changing the registers via an attached USB dongle that interfaces with the TLK10034 GUI. Based on the above changes, we are able to get 14 of the 16 channels working with no issues. One thing that we did notice is that adding some delay after the datapath reset appeared to determine how solid and repeatable the results were.
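If the updates are scripted rather than entered through the GUI, the write order and the delay after the datapath reset can be made explicit and repeatable. The sketch below is an illustration only: `apply_overrides` and the `write` callback are hypothetical (no real MDIO or GUI API is implied), and placing the reset last is an assumption of this example, not a sequence confirmed by TI. The delay value is a placeholder to be tuned.

```python
import time

# Values taken from the post. Splitting them into "configuration" writes and a
# final "reset" write is an assumption of this sketch, not a confirmed sequence.
CONFIG_WRITES = [
    (0x07, 0x0000, 0x2000),  # auto-negotiation off
    (0x01, 0x0096, 0x0000),  # KR training off
    (0x1E, 0x0003, 0x5848),  # tuning value found by trial and error
    (0x1E, 0x0004, 0x5550),  # tuning value found by trial and error
]
DATAPATH_RESET = (0x1E, 0x000E, 0x000E)


def apply_overrides(write, config_writes, reset_write, settle_s, sleep=time.sleep):
    """Issue the config writes in order, then the reset write, then wait settle_s seconds."""
    for dev, reg, value in config_writes:
        write(dev, reg, value)
    write(*reset_write)
    sleep(settle_s)


# Dry run: record the writes instead of touching hardware.
recorded = []
apply_overrides(
    lambda dev, reg, value: recorded.append((dev, reg, value)),
    CONFIG_WRITES,
    DATAPATH_RESET,
    settle_s=0.0,  # placeholder; the useful delay has to be found experimentally
)
for dev, reg, value in recorded:
    print(f"dev 0x{dev:02X} reg 0x{reg:04X} <= 0x{value:04X}")
```

Passing the write and sleep functions in as arguments keeps the sequence testable without hardware and makes it easy to try a different order or delay per run.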

All this being said, we need assistance with debugging the final two channels on this PCB. As I mentioned, we have gone through the physical connections (there appear to be no issues). We have done tuning to improve the performance of the channels (this worked for 14 of the 16 channels, but tuning will not make the remaining 2 channels better).

Some things that would help...

Is there a proper sequence for implementing the above register updates that we should follow? (i.e. does the sequence of events matter, should we be doing other resets while implementing the above register changes, should delays be inserted into the above sequence, etc...). Please advise.

As far as debugging the port with an issue, is there a procedure for stepping through settings to better understand the issue and pinpoint where we should be looking?

We need to resolve this issue. We have been trying various things over the past several weeks and we have not been able to pinpoint the issue.

Please take a look and get back to me ASAP. If you need further information please let me know what you need.

Thank you,
Mike Nycz

Back in July, the following response was posted....

Hi Mike,

Thanks for the detailed report. 

1). Does this issue move with the part, or is it location dependent? Or maybe a better question is: do you see the exact same behavior on multiple boards?

2). Also, do you see the same bit error issue on A3 if there is no traffic on BCD?

3). If you use a high-speed scope, do you see any difference between B & C versus A & D?

4). Also, if you use a high-speed, high-impedance differential scope probe on A3 versus A2 and meanwhile trigger on A2, do you see jitter, drift, or waveform jumping on A3?

You may have done these tests already, but I am hoping that knowing these results can shed further light on this issue.

Regards, Nasser

The answer to #1 is that we see similar results on another board (we currently have 2).

For #2 we are only looking at 2 ports at a time. Port 1 is looped on the XAUI side to Port 2. We then connect the fiber side of ports 1 and 2 to an Anritsu MT1000A. 

I will take a look at #3 and #4. I have been away from this project for a while and amazingly the problem did not go away over time. :)

  • Hi,

    Request noted. TI will provide feedback by early next week.

    Cordially,

    Rodrigo Natal

    HSSC Applications Engineer

  • Hi,

    Related to: "The connection between the TLK A channel 2 HS port to the optical transceiver has been verified in both schematic and gerber data. The length of the differential net is approx. 1600mils which is comparable or even shorter than some of the channels that have no issues. For our configuration, we have the following register overrides from default..."

    The above feedback makes me think about the possibility of the TLK device's automatic EQ adaptation resulting in an over-equalization scenario for some iterations. The settings below may be worth trying if we suspect the issue is TLK Rx over-equalization.

    • ENTRACK – Register HS_SERDES_CONTROL_3, Device Address 0x1E, Register Address 0x04, Bit 15
    • Activating this parameter adds intersymbol interference (ISI) to the received signal, enabling the receiver to better compensate for a short channel with little to no loss. An example application that would implement the ENTRACK feature is SFI/XFI, where the channel between SerDes and the optical channel is typically very short, to limit the amount of ISI.
    • HS_EQPRE[2:0] - Device Address 0x1E, Register Address 0x0004, bits 14:12
      • These bits configure the Serdes Rx precursor equalizer selection. For a low insertion loss input channel, the pre-cursor may be set to b000 or b111 (i.e. disabled).
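The two settings above amount to read-modify-write operations on register 0x0004: bit 15 for ENTRACK and bits 14:12 for HS_EQPRE. A minimal sketch of the bit arithmetic follows; the function names are made up for illustration, and the starting value 0x5500 is the one reported later in this thread.

```python
ENTRACK_BIT = 15
EQPRE_SHIFT = 12
EQPRE_MASK = 0b111 << EQPRE_SHIFT  # bits 14:12


def set_entrack(value, enable):
    """Return the 16-bit register value with ENTRACK (bit 15) set or cleared."""
    if enable:
        return (value | (1 << ENTRACK_BIT)) & 0xFFFF
    return value & ~(1 << ENTRACK_BIT) & 0xFFFF


def set_eqpre(value, code):
    """Return the 16-bit register value with HS_EQPRE[2:0] replaced by code (0..7)."""
    if not 0 <= code <= 0b111:
        raise ValueError("HS_EQPRE is a 3-bit field")
    return (value & ~EQPRE_MASK & 0xFFFF) | (code << EQPRE_SHIFT)


base = 0x5500
print(f"ENTRACK on:        0x{set_entrack(base, True):04X}")
print(f"EQPRE = b000:      0x{set_eqpre(base, 0b000):04X}")
print(f"EQPRE = b111:      0x{set_eqpre(base, 0b111):04X}")
```

Only the named field changes in each case; all other bits of the starting value are preserved.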

    Cordially,

    Rodrigo Natal

    HSSC Applications Engineer

  • Rodrigo,

    Thank you for the response. We have done the two things you mention above. These did not fix our problem. What other things can we try to diagnose the problem? One other thing that we noticed was that when we read all of the registers, there are a number of error count registers. Some of them have a value of 0, some have values of FFFF. Does this have any significance? We are really struggling to get this cleared up. We have played with the settings in registers 0x0003 and 0x0004 with no success. Is there something else we should be looking at?

  • Hi,

    One more question to eliminate signal integrity and/or EQ issue as potential root cause. Does the Tx device connected to TLK Rx have either pre-cursor or post-cursor de-emphasis enabled? If so, could you run system test with those EQ parameters disabled?

    Cordially,

    Rodrigo Natal

    HSSC Applications Engineer

  • The transceiver does not have those functions turned on.

    Thank you,

    Mike Nycz

  • Hi Mike,

    Question: Can you double confirm that the issue is isolated to two specific TLK channels on your board? For these two channels that show problem, does the issue happen all the time or only some iterations of link test?

    I would think that, if the issue were a logic and/or protocol handling case at SerDes level, it would not be this deterministic. I would speculate you would see it across all channels and it would be more random.

    Question: Would you be able to provide s-parameters for the TLK input trace for both a "good" channel and a "fail" channel?

    One hypothesis would be that PLL lock is being lost on the two flagged channels. Below are a couple of suggested settings.

    • Try setting PLL to its highest bandwidth setting. See bits 9:8 in table below

    Device Address: 0x1E   Register Address: 0x0002   Default: 0x811D

    Bit     Name                     Access   Description
    15:10   RESERVED                 RW       For TI use only (Default 9’b100000)
    9:8     HS_LOOP_BANDWIDTH[1:0]   RW       HS Serdes PLL Loop Bandwidth settings
                                              00 = Reserved
                                              01 = Narrow bandwidth (Default 2'b01)
                                              11 = Highest bandwidth. Recommended for 10GBASE-KR.
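For reference, changing only HS_LOOP_BANDWIDTH[1:0] (bits 9:8) from the listed default of 0x811D can be worked out as below. This is a sketch of the arithmetic only; `set_field` is a made-up helper, not part of any TI tool.

```python
def set_field(value, msb, lsb, field):
    """Return a 16-bit register value with bits msb:lsb replaced by field."""
    width = msb - lsb + 1
    if not 0 <= field < (1 << width):
        raise ValueError(f"field does not fit in {width} bits")
    mask = ((1 << width) - 1) << lsb
    return (value & ~mask & 0xFFFF) | (field << lsb)


DEFAULT_0002 = 0x811D  # default listed for device 0x1E, register 0x0002

# Bits 9:8 = 11 selects the highest PLL loop bandwidth per the table above.
highest_bw = set_field(DEFAULT_0002, 9, 8, 0b11)
print(f"0x{highest_bw:04X}")
```

With the listed default this yields 0x831D, leaving the reserved bits 15:10 and the lower byte untouched.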

    • From the datasheet table below try the following:
      • Enable CDR mode suitable for short channel operation by setting bit 5 to 1
      • Try different settings of HS_CDRTHR[1:0] (i.e. bits 9:8) to see if performance improves

    Device Address: 0x1E   Register Address: 0x0004   Default: 0x1400

    Bit     Name               Access   Description
    15      HS_ENTRACK         RW       HSRX ADC Track mode. This setting is automatically controlled through link training and value set through this register bit is ignored unless related OVERRIDE bit is set.
                                        0 = Normal operation (Default 1’b0)
                                        1 = Forces ADC into track mode
    14:12   HS_EQPRE[2:0]      RW       Serdes Rx precursor equalizer selection
                                        000 = 1/9 cursor amplitude
                                        001 = 3/9 cursor amplitude (Default 3’b001)
                                        010 = 5/9 cursor amplitude
                                        011 = 7/9 cursor amplitude
                                        100 = 9/9 cursor amplitude
                                        101 = 11/9 cursor amplitude
                                        110 = 13/9 cursor amplitude
                                        111 = Disable
    11:10   HS_CDRFMULT[1:0]   RW       Clock data recovery algorithm frequency multiplication selection
                                        00 = First order. Frequency offset tracking disabled
                                        01 = Second order. 1x mode
                                        10 = Second order. 2x mode (Default 2’b10)
                                        11 = Reserved
    9:8     HS_CDRTHR[1:0]     RW       Clock data recovery algorithm threshold selection
                                        00 = Four vote threshold (Default 2’b00)
                                        01 = Eight vote threshold
                                        10 = Sixteen vote threshold
                                        11 = Thirty two vote threshold
    7       RESERVED           RW       For TI use only (Default 1’b0)
    6       HS_PEAK_DISABLE    RW       HS Serdes PEAK_DISABLE control
                                        0 = Normal operation (Default 1’b0)
                                        1 = Disables high frequency peaking. Suitable for <6 Gbps operation
    5       HS_H1CDRMODE       RW       0 = Normal operation (Default 1’b0)
                                        1 = Enables CDR mode suitable for short channel operation.
    4:0     HS_TWCRF[4:0]      RW       Cursor Reduction Factor (Default 5’b00000). This setting is automatically controlled through link training and value set through this register bits is ignored unless related
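When trying several values for register 0x0004, it can help to decode each candidate back into the fields listed above to see exactly which settings it touches. The sketch below uses only the bit positions from the table; the `decode_0004` helper is hypothetical, and the sample values are ones quoted in this thread.

```python
# (name, msb, lsb) for device 0x1E, register 0x0004, as listed in the table above.
FIELDS_0004 = [
    ("HS_ENTRACK", 15, 15),
    ("HS_EQPRE", 14, 12),
    ("HS_CDRFMULT", 11, 10),
    ("HS_CDRTHR", 9, 8),
    ("RESERVED", 7, 7),
    ("HS_PEAK_DISABLE", 6, 6),
    ("HS_H1CDRMODE", 5, 5),
    ("HS_TWCRF", 4, 0),
]


def decode_0004(value):
    """Split a 16-bit register value into its named fields."""
    return {
        name: (value >> lsb) & ((1 << (msb - lsb + 1)) - 1)
        for name, msb, lsb in FIELDS_0004
    }


for candidate in (0x1400, 0x5550, 0x5500, 0xD500):
    fields = decode_0004(candidate)
    summary = ", ".join(f"{name}={val:b}" for name, val in fields.items())
    print(f"0x{candidate:04X}: {summary}")
```

Printing the fields in binary lets each one be compared directly against the codes in the description column.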

    Cordially,

    Rodrigo Natal

  • Rodrigo,

    In answer to your first question: it is definitely confined to two channels. Channel A and B are the problem channels and channels C and D work. In fact there are four TLKs on the board. Some of them work well on all 4 channels. Only two of the TLKs have this problem. The other frustrating thing is that there are times when Channel A and B will run without errors for a terabyte of data, but then we repower the board and this time we run with 50 to 200 errors in a terabyte.

    As far as your second question, I do not have S-parameters for the nets involved. I will say that the connections from the TLK to the optical transceiver are on 100 Ohm differential lines that are about 1.5" long. They via down to the routing plane (which has solid GND planes above and below the traces), stay on one layer until they reach the optical transceiver, then via up to the transceiver and make the connection. So it does not appear to be an SI type problem. I've looked at some of the channels that work fine and they look similar or even worse from an SI perspective.

    We then tried the values you had suggested above and the results seem to be the same. Sometimes (infrequently) we can power cycle to a situation where we are clean, but on the next power cycle the problem returns.
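To put "50 to 200 errors in a terabyte" on the usual BER scale, a rough conversion is sketched below. It assumes a decimal terabyte (10^12 bytes), that each counted error is a single bit error, and a nominal 10 Gb/s payload rate for the run-time estimate; none of these is stated in the thread, so treat the numbers as order-of-magnitude only.

```python
BITS_PER_TERABYTE = 8 * 10**12  # assumption: decimal terabyte, 10**12 bytes


def bit_error_ratio(errors, terabytes=1.0):
    """Errors divided by bits transferred, assuming one bit per counted error."""
    return errors / (terabytes * BITS_PER_TERABYTE)


def seconds_per_terabyte(rate_bps):
    """Time to move one terabyte at the given payload bit rate."""
    return BITS_PER_TERABYTE / rate_bps


for errors in (50, 200):
    print(f"{errors} errors/TB -> BER ~ {bit_error_ratio(errors):.2e}")
print(f"1 TB at 10 Gb/s takes ~ {seconds_per_terabyte(10e9):.0f} s")
```

Under these assumptions the failing runs sit in the 1e-12 to 1e-11 range, so confirming or ruling out a setting change needs runs of many minutes per trial.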

    These are the register values that we are using for our configuration....

    PRTAD[4:0]   Device Address   Register Address   Default Value   Required Value
    00000        0x1E             0x0000             0x0020          0x8020
    00000        0x07             0x0000             0x3000          0x2000
    00000        0x01             0x0096             0x0002          0x0000
    00000        0x1E             0x8020             0x0000          0x03FF
    00000        0x1E             0x0003             0x8848          0x5848
    00000        0x1E             0x0004             0x1400          0x5500
    00000        0x1E             0x0005             0x2000          0x2000

    We also tried register 0x0004 set to 0xD500, which turns on ENTRACK mode. This seems to be better, but the problem does not go away completely. The above is shown for PRTAD 00000; however, the same values are loaded into all channels for a given TLK.

    We tried CDR mode as well. This showed no effect.

    We also loaded our configuration on the TLK10034 EVM board. This works great on all channels. This is not a direct comparison in that the optical transceivers are different. On the EVM we are using an SFP module and on our board we are using a Reflex Photonics transceiver. From the above, we are turning off auto-negotiation and KR-training since we are driving the optical transceiver and not a true KR port. After these values are loaded, we perform a datapath reset to ensure that the changes are picked up correctly. 

    One thought was whether we have a sequencing issue? Specifically, does it matter what order the above changes are applied?

    If you have any other ideas of things to try please let me know.

    I know that the proper channel for dealing with issues is via this E2E forum. Is it possible to get a phone call set up to discuss this? Or at least, is there a proper channel to make that happen?

    Thank you,

    Mike Nycz

  • Related to: "We also loaded our configuration on the TLK10034 EVM board. This works great on all channels. This is not a direct comparison in that the optical transceivers are different. On the EVM we are using an SFP module and on our board we are using a Reflex Photonics transceiver. From the above, we are turning off auto-negotiation and KR-training since we are driving the optical transceiver and not a true KR port. After these values are loaded, we perform a datapath reset to ensure that the changes are picked up correctly."

    • The above result is very interesting. The fact that your register configuration works in EVM-level testing suggests to me that it is probably correct.
    • I would speculate that the system issue you are observing may not be specific to the TLK SerDes per se (either its PHY layer function or configuration), but rather an issue with the Reflex Photonics transceiver's high-speed performance (perhaps its total jitter level), or at least an interoperability issue between this Reflex module and the TLK chip.
    • Question: Can you comment on differences in the high-speed electrical output of the SFP module versus the Reflex Photonics transceiver? Are you able to provide an example 10Gbps electrical output eye diagram for both the SFP module and the Reflex transceiver?

    Rodrigo