TLK10034: I have one of 4 ports that are accumulating data errors during BER testing.

Michael Nycz

Part Number: TLK10034

I have a PCB design with four TLK10034s, which gives 16 channels on the board. The LS side of each device uses the XAUI interface, and the HS side feeds a 10G optical transceiver. For 14 of the 16 channels, BER testing looks perfect (no errors over 1 terabyte or more of data transfer).

For purposes of discussion, I will refer to the TLKs by letter (A, B, C and D). TLKs B and C perform with no issues during BER testing. On TLK A, 3 of the 4 ports perform with no issues. The remaining channel (Channel 2) passes an Ethernet ping test with no problems (no packet loss when run for 1 minute or more), but during BER testing we get traffic errors that are somewhat erratic: sometimes it runs well for a while and then randomly produces errors. We see the same situation on TLK D, on the same channel (Channel 2), with the same kind of BER errors. The other 14 channels run perfectly.

We have run loopback tests with the loopback done internally on both the LS and HS sides. The problem appears to be on the HS side, since looping back internally or externally on the LS side produces the same results. We have also inspected the physical aspects of the nets involved. The connection from the TLK A Channel 2 HS port to the optical transceiver has been verified in both the schematic and the Gerber data. The differential net is approximately 1600 mils long, which is comparable to, or even shorter than, some of the channels that have no issues. For our configuration, we have the following register overrides from default:

Device Address	Register Address	Default Value	Required Value	Notes
0x07	0x0000	0x3000	0x2000	Bit 12 changed from '1' to '0'
0x01	0x0096	0x0002	0x0000	Bit 1 changed from '1' to '0'
0x1E	0x000E	0x0000	0x000E	Bits 1, 2, 3 changed from '0' to '1' to initiate a datapath reset
0x1E	0x0003	0x8848	0x5848	Tuned value
0x1E	0x0004	0x1400	0x5550	Tuned value

The first write turns off auto-negotiation, which we need to do since we are connecting to an optical transceiver.
The second turns off KR training, which we also need to do because of the devices we are connecting to.
The third performs a datapath reset, which the second change requires.
The fourth and fifth are specific values that we determined by trial and error to tune the channels for best performance.
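The Notes column in the register table can be checked mechanically: XOR-ing each default value with its required value leaves exactly the bits that were flipped. The sketch below does that for the three rows that have a bit-level note; `changed_bits` is a hypothetical helper, not part of any TI tool.

```python
def changed_bits(default: int, required: int) -> list[int]:
    """Return the bit positions (LSB = 0) that differ between two 16-bit register values."""
    diff = (default ^ required) & 0xFFFF
    return [bit for bit in range(16) if (diff >> bit) & 1]


# (device address, register address, default, required) from the table above
overrides = [
    (0x07, 0x0000, 0x3000, 0x2000),
    (0x01, 0x0096, 0x0002, 0x0000),
    (0x1E, 0x000E, 0x0000, 0x000E),
]

for devad, reg, default, required in overrides:
    print(f"{devad:#04x}.{reg:#06x}: bits {changed_bits(default, required)}")
# prints bits [12], [1] and [1, 2, 3], matching the Notes column
```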

The same register updates are performed for each of the 4 channels on every TLK10034; the values are identical for each channel.

We are currently changing the registers through an attached USB dongle that interfaces with the TLK10034 GUI. With the above changes, 14 of the 16 channels work with no issues. One thing we did notice is that adding some delay after the datapath reset appeared to affect how solid and repeatable the results were.
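If the writes are ever scripted instead of clicked through the GUI, the order and the post-reset delay become explicit and repeatable. The sketch below is only an illustration of one possible sequence, not TI guidance: `write` is a hypothetical callback standing in for whatever register-write path the dongle provides, the datapath reset is moved after the tuned writes (the table lists it third), the four channels are assumed to answer at consecutive port addresses starting at `base_prtad`, and the `settle_s` delay is an arbitrary placeholder.

```python
import time
from typing import Callable, List, Tuple

# Hypothetical callback: write(prtad, devad, reg_addr, value).
# Stands in for whatever register-write path the USB dongle / GUI actually provides.
RegWrite = Callable[[int, int, int, int], None]

# Writes from the table above, minus the datapath reset, in table order.
CONFIG_WRITES: List[Tuple[int, int, int]] = [
    (0x07, 0x0000, 0x2000),  # auto-negotiation off (bit 12 cleared)
    (0x01, 0x0096, 0x0000),  # KR training off (bit 1 cleared)
    (0x1E, 0x0003, 0x5848),  # tuned value
    (0x1E, 0x0004, 0x5550),  # tuned value
]
DATAPATH_RESET = (0x1E, 0x000E, 0x000E)  # bits 1, 2, 3 set


def configure_device(write: RegWrite, base_prtad: int = 0, settle_s: float = 0.1) -> None:
    """Apply the same settings to all four channels, resetting each datapath last.

    base_prtad and settle_s are assumptions, not values taken from TI documentation.
    """
    for channel in range(4):
        prtad = base_prtad + channel
        for devad, reg, value in CONFIG_WRITES:
            write(prtad, devad, reg, value)
        write(prtad, *DATAPATH_RESET)
        time.sleep(settle_s)  # give the datapath time to come out of reset


# Usage with a stub that just records the writes:
log: List[Tuple[int, int, int, int]] = []
configure_device(lambda *args: log.append(args), settle_s=0)
print(len(log))  # 20 writes: 5 per channel x 4 channels
```

Recording the writes this way also makes it easy to compare the GUI sequence against a scripted one when chasing the sequencing question.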

All this being said, we need assistance debugging the final two channels on this PCB. As I mentioned, we have gone through the physical connections and there appear to be no issues. We have also tuned the channels to improve their performance; this worked for 14 of the 16 channels, but tuning does not make the remaining 2 channels any better.

Some things that would help...

Is there a proper sequence for applying the above register updates that we should follow? For example, does the order of the writes matter, should we be doing other resets while applying these changes, and should delays be inserted into the sequence? Please advise.

As far as debugging the failing port, is there a procedure for stepping through settings to better understand the issue and pinpoint where we should be looking?

We need to resolve this issue. We have been trying various things over the past several weeks and have not been able to pinpoint it.

Please take a look and get back to me ASAP. If you need further information please let me know what you need.

Thank you,
Mike Nycz

Back in July, the following response was posted:

Hi Mike,

Thanks for the detailed report.

1). Does this issue move with the part, or is it location dependent? Or maybe a better question: do you see exactly the same behavior on multiple boards?

2). Also, do you see the same bit error issue on A3 if there is no traffic on B, C, and D?

3). If you use a high-speed scope, do you see any difference between B & C versus A & D?

4). Also, if you use a high-speed, high-impedance differential scope probe on A3 versus A2 while triggering on A2, do you see jitter, drift, or waveform jumping on A3?

You may have done these tests already, but I am hoping that knowing these results can shed further light on this issue.

Regards, Nasser

The answer to #1 is that we see similar results on another board (we currently have 2).

For #2 we are only looking at 2 ports at a time. Port 1 is looped on the XAUI side to Port 2. We then connect the fiber side of ports 1 and 2 to an Anritsu MT1000A.

I will take a look at #3 and #4. I have been away from this project for a while and, amazingly, the problem did not go away over time. :)

over 3 years ago

Rodrigo Natal, over 3 years ago

TI__Mastermind 19155 points

Hi,

Request noted. TI will provide feedback by early next week.

Cordially,

Rodrigo Natal

HSSC Applications Engineer

Rodrigo Natal, over 3 years ago

Hi,

Related to: "The connection between the TLK A channel 2 HS port to the optical transceiver has been verified in both schematic and gerber data. The length of the differential net is approx. 1600mils which is comparable or even shorter than some of the channels that have no issues. For our configuration, we have the following register overrides from default..."

The above makes me consider the possibility that the TLK's automatic EQ adaptation results in over-equalization on some iterations. If we suspect TLK Rx over-equalization, the settings below may be worth trying.

ENTRACK – Register HS_SERDES_CONTROL_3, Device Address 0x1E, Register Address 0x04, Bit 15

Activating this parameter adds intersymbol interference (ISI) to the received signal, enabling the receiver to better compensate for a short channel with little to no loss. An example application for the ENTRACK feature is SFI/XFI, where the channel between the SerDes and the optics is typically kept very short to limit the amount of ISI.

HS_EQPRE[2:0] - Device Address 0x1E, Register Address 0x0004, bits 14:12
- These bits select the SerDes Rx precursor equalizer setting. For a low-insertion-loss input channel, the precursor may be set to b000 (minimum, 1/9 cursor amplitude) or b111 (disabled).
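Both suggestions are read-modify-write changes to register 0x1E.0x0004, so only the named field should change. A minimal sketch, assuming the 0x5550 value from the first post as the starting point; `set_field` is a hypothetical helper and the bit positions come from the descriptions above.

```python
def set_field(reg_value: int, msb: int, lsb: int, field: int) -> int:
    """Return reg_value with bits msb..lsb (inclusive) replaced by field."""
    width = msb - lsb + 1
    if not 0 <= field < (1 << width):
        raise ValueError(f"field {field:#x} does not fit in {width} bits")
    mask = ((1 << width) - 1) << lsb
    return (reg_value & ~mask & 0xFFFF) | (field << lsb)


current = 0x5550  # value written to 0x1E.0x0004 in the first post
with_entrack = set_field(current, 15, 15, 1)   # HS_ENTRACK, bit 15
eqpre_off = set_field(current, 14, 12, 0b111)  # HS_EQPRE[2:0] = 111 (disabled)
eqpre_min = set_field(current, 14, 12, 0b000)  # HS_EQPRE[2:0] = 000 (1/9 cursor)
print(hex(with_entrack), hex(eqpre_off), hex(eqpre_min))  # 0xd550 0x7550 0x550
```

Note that the datasheet excerpt later in this thread says HS_ENTRACK is normally controlled by link training and the register bit is ignored unless the related OVERRIDE bit is set; this sketch does not touch that override bit.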

Cordially,

Rodrigo Natal

HSSC Applications Engineer

Michael Nycz, over 3 years ago, in reply to Rodrigo Natal

Intellectual 835 points

Rodrigo,

Thank you for the response. We have tried the two things you mention above; they did not fix our problem. What other things can we try to diagnose it? One other thing we noticed is that when we read all of the registers, there are a number of error count registers. Some of them read 0 and some read 0xFFFF. Does this have any significance? We are really struggling to get this cleared up. We have experimented with the settings in registers 0x0003 and 0x0004 with no success. Is there something else we should be looking at?

Rodrigo Natal, over 3 years ago, in reply to Michael Nycz

Hi,

One more question to eliminate signal integrity and/or EQ as a potential root cause: does the Tx device connected to the TLK Rx have either pre-cursor or post-cursor de-emphasis enabled? If so, could you run the system test with those EQ parameters disabled?

Cordially,

Rodrigo Natal

HSSC Applications Engineer

Michael Nycz, over 3 years ago, in reply to Rodrigo Natal

The transceiver does not have those functions turned on.

Thank you,

Mike Nycz

Rodrigo Natal, over 3 years ago, in reply to Michael Nycz

Hi Mike,

Question: Can you confirm again that the issue is isolated to two specific TLK channels on your board? For the two channels that show the problem, does the issue happen all the time or only on some iterations of the link test?

I would think that if the issue were a logic and/or protocol handling problem at the SerDes level, it would not be this deterministic; I would expect to see it across all channels, and it would be more random.

Question: Would you be able to provide S-parameters for the TLK input trace for both a "good" channel and a "failing" channel?

One hypothesis is that PLL lock is being lost on the two flagged channels. Below are a couple of suggested settings.

Try setting the PLL to its highest bandwidth setting (bits 9:8 in the table below).

Device Address: 0x1E Register Address:0x0002 Default: 0x811D
Bit	Name	Description	Access
15:10	RESERVED	For TI use only (Default 6’b100000)	RW
9:8	HS_LOOP_BANDWIDTH[1:0]	HS Serdes PLL Loop Bandwidth settings 00 = Reserved 01 = Narrow bandwidth (Default 2'b01) 11 = Highest bandwidth. Recommended for 10GBASE-KR.	RW
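A quick way to confirm what the PLL bandwidth field currently holds, and what value to write for the highest setting, is to decode bits 9:8 of register 0x1E.0x0002. A minimal sketch; the helper names are hypothetical and the encodings come from the table above (code 10 is not listed there).

```python
HS_LOOP_BANDWIDTH = {0b00: "reserved", 0b01: "narrow (default)", 0b11: "highest"}


def loop_bandwidth(reg_0002: int) -> str:
    """Decode HS_LOOP_BANDWIDTH[1:0] (bits 9:8) of register 0x1E.0x0002."""
    code = (reg_0002 >> 8) & 0b11
    return HS_LOOP_BANDWIDTH.get(code, "not listed in the table")


def with_highest_bandwidth(reg_0002: int) -> int:
    """Set bits 9:8 to 11 (highest bandwidth), leaving all other bits unchanged."""
    return (reg_0002 & ~0x0300 & 0xFFFF) | 0x0300


reg_default = 0x811D  # default value of 0x1E.0x0002 per the table
new_value = with_highest_bandwidth(reg_default)
print(loop_bandwidth(reg_default), "->", loop_bandwidth(new_value), hex(new_value))
# narrow (default) -> highest 0x831d
```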

From the datasheet table below, try the following:
- Enable the CDR mode suitable for short-channel operation by setting bit 5 to 1.
- Try different settings of HS_CDRTHR[1:0] (i.e., bits 9:8) to see if performance improves.

Device Address: 0x1E Register Address:0x0004 Default:0x1400
Bit	Name	Description	Access
15	HS_ENTRACK	HSRX ADC Track mode. This setting is automatically controlled through link training and value set through this register bit is ignored unless related OVERRIDE bit is set. 0 = Normal operation (Default 1’b0) 1 = Forces ADC into track mode	RW
14:12	HS_EQPRE[2:0]	Serdes Rx precursor equalizer selection 000 = 1/9 cursor amplitude 001 = 3/9 cursor amplitude (Default 3’b001) 010 = 5/9 cursor amplitude 011 = 7/9 cursor amplitude 100 = 9/9 cursor amplitude 101 = 11/9 cursor amplitude 110 = 13/9 cursor amplitude 111 = Disable	RW
11:10	HS_CDRFMULT[1:0]	Clock data recovery algorithm frequency multiplication selection 00 = First order. Frequency offset tracking disabled 01 = Second order. 1x mode 10 = Second order. 2x mode (Default 2’b10) 11 = Reserved	RW
9:8	HS_CDRTHR[1:0]	Clock data recovery algorithm threshold selection 00 = Four vote threshold (Default 2’b00) 01 = Eight vote threshold 10 = Sixteen vote threshold 11 = Thirty two vote threshold	RW
7	RESERVED	For TI use only (Default 1’b0)	RW
6	HS_PEAK_DISABLE	HS Serdes PEAK_DISABLE control 0 = Normal operation (Default 1’b0) 1 = Disables high frequency peaking. Suitable for <6 Gbps operation	RW
5	HS_H1CDRMODE	0 = Normal operation (Default 1’b0) 1 = Enables CDR mode suitable for short channel operation.	RW
4:0	HS_TWCRF[4:0]	Cursor Reduction Factor (Default 5’b00000). This setting is automatically controlled through link training and the value set through these register bits is ignored unless the related OVERRIDE bit is set.	RW
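Since several of the values tried in this thread (0x1400, 0x5500, 0x5550, 0xD500) are written to this one register, it helps to split them into the named fields above before comparing runs. A minimal decoder sketch; the field list is transcribed from the table and `decode_0004` is a hypothetical name.

```python
# (name, msb, lsb) for register 0x1E.0x0004, transcribed from the table above
FIELDS_0004 = [
    ("HS_ENTRACK", 15, 15),
    ("HS_EQPRE", 14, 12),
    ("HS_CDRFMULT", 11, 10),
    ("HS_CDRTHR", 9, 8),
    ("HS_PEAK_DISABLE", 6, 6),
    ("HS_H1CDRMODE", 5, 5),
    ("HS_TWCRF", 4, 0),
]


def decode_0004(value: int) -> dict[str, int]:
    """Split a 16-bit value of register 0x1E.0x0004 into its named fields (bit 7 is reserved)."""
    fields = {}
    for name, msb, lsb in FIELDS_0004:
        width = msb - lsb + 1
        fields[name] = (value >> lsb) & ((1 << width) - 1)
    return fields


for value in (0x1400, 0x5500, 0x5550, 0xD500):
    print(hex(value), decode_0004(value))
```

For 0x5500 this gives HS_EQPRE = 101 (11/9 cursor amplitude) and HS_CDRTHR = 01 (eight-vote threshold); 0x5550 additionally sets HS_PEAK_DISABLE (bit 6) and HS_TWCRF = 0b10000. Decoding the listed register default 0x1400 gives HS_CDRFMULT = 01, while the bit table lists 2’b10 as that field's default, so the two transcribed defaults do not agree and are worth checking against the datasheet.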

Cordially,

Rodrigo Natal

Michael Nycz, over 3 years ago, in reply to Rodrigo Natal

Rodrigo,

In answer to your first question: it is definitely confined to two channels. Channels A and B are the problem channels, and channels C and D work. In fact, there are four TLKs on the board; some of them work well on all 4 channels, and only two of the TLKs have this problem. The other frustrating thing is that there are times when Channels A and B will run without errors for a terabyte of data, but then we repower the board and this time we get 50 to 200 errors in a terabyte.

As for your second question, I do not have S-parameters for the nets involved. I will say that the connections from the TLK to the optical transceiver are 100 Ohm differential lines that are about 1.5" long. They via down to the routing layer (which has solid GND planes above and below the traces) and stay on that layer until they reach the optical transceiver, where they via up to the transceiver and make the connection. So it does not appear to be an SI-type problem; I've looked at some of the channels that work fine and they look similar, or even worse, from an SI perspective.

We then tried the values you suggested above and the results seem to be the same. Sometimes (infrequently) we can power cycle into a clean state, but on the next power cycle the problem returns.
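To put the good and bad power-ups on the same scale, both can be expressed as a bit error ratio. A rough sketch, assuming a decimal terabyte (8 × 10^12 bits), counting only the bits the tester compares, and treating errors as independent (Poisson); the function names are illustrative.

```python
import math

TERABYTE_BITS = 8e12  # assumption: decimal terabyte (10**12 bytes x 8 bits)


def observed_ber(errors: int, bits: float = TERABYTE_BITS) -> float:
    """Point estimate of the bit error ratio: errors / bits compared."""
    return errors / bits


def zero_error_ber_bound(bits: float = TERABYTE_BITS, confidence: float = 0.95) -> float:
    """Upper BER bound after an error-free run, assuming independent (Poisson) errors.

    With zero errors over N bits, BER <= -ln(1 - confidence) / N.
    """
    return -math.log(1.0 - confidence) / bits


print(f"clean run: BER < {zero_error_ber_bound():.2e} (95% confidence)")
print(f"bad runs:  {observed_ber(50):.2e} to {observed_ber(200):.2e}")
# clean run: BER < 3.74e-13 (95% confidence)
# bad runs:  6.25e-12 to 2.50e-11
```

Under these assumptions, a clean terabyte bounds the BER below roughly 4e-13, while 50 to 200 errors per terabyte is roughly 6e-12 to 3e-11, worse than the 1e-12 BER objective of 10GBASE-R, so a good and a bad power-up differ by more than an order of magnitude.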

These are the register values that we are using for our configuration:

PRTAD[4:0]	Device Address	Register Address	Default Value	Required Value
00000	0x1E	0x0000	0x0020	0x8020
00000	0x07	0x0000	0x3000	0x2000
00000	0x01	0x0096	0x0002	0x0000
00000	0x1E	0x8020	0x0000	0x03FF
00000	0x1E	0x0003	0x8848	0x5848
00000	0x1E	0x0004	0x1400	0x5500
00000	0x1E	0x0005	0x2000	0x2000

We also tried setting register 0x0004 to 0xD500, which turns on ENTRACK mode. This seems to be better, but the problem does not go away completely. The above is shown for PRTAD 00000; however, the same values are loaded into all channels of a given TLK.

We tried the short-channel CDR mode as well; it had no effect.

We also loaded our configuration on the TLK10034 EVM board, where it works great on all channels. This is not a direct comparison, since the optical transceivers are different: on the EVM we are using an SFP module, and on our board we are using a Reflex Photonics transceiver. As the table above shows, we turn off auto-negotiation and KR training since we are driving the optical transceiver and not a true KR port. After these values are loaded, we perform a datapath reset to ensure that the changes are picked up correctly.

One thought was whether we have a sequencing issue. Specifically, does the order in which the above changes are applied matter?

If you have any other ideas of things to try please let me know.

I know that the proper channel for dealing with issues is this E2E forum. Is it possible to set up a phone call to discuss this? Or, at least, is there a proper channel to make that happen?

Thank you,

Mike Nycz

Rodrigo Natal, over 3 years ago, in reply to Michael Nycz

Related to: "We also loaded our configuration on the TLK10034 EVM board. This works great on all channels. This is not a direct comparison in that the optical transceivers are different. On the EVM we are using an SFP module and on our board we are using a Reflex Photonics transceiver. From the above, we are turning off auto-negotiation and KR-training since we are driving the optical transceiver and not a true KR port. After these values are loaded, we perform a datapath reset to ensure that the changes are picked up correctly."

This result is very interesting. The fact that your register configuration works in EVM-level testing suggests to me that it is probably correct.
I would speculate that the system issue you are observing may not be specific to the TLK SerDes per se (either its PHY-layer function or its configuration), but rather an issue with the high-speed performance of the Reflex Photonics transceiver (perhaps its total jitter level), or at least an interoperability issue between the Reflex module and the TLK chip.
Question: Can you comment on the differences between the high-speed electrical outputs of the SFP module and the Reflex Photonics transceiver? Are you able to provide an example 10 Gbps electrical output eye diagram for both the SFP module and the Reflex transceiver?

Rodrigo
