C6678 PCIe failing in asynchronous mode

Sachan Rao

Hello,

I have asynchronous PCIe implemented with communication between C6678 (ROOT) and Xilinx FPGA (ENDPOINT). After running for a while, PCIe link goes down.

It appears that the link down issues are being caused by the asynchronous clocks. The link is stable if I use the same clock source for the FPGA and DSP. This, however, is not a solution since I would need to respin my hardware to implement it. So I would very much like to get the asynchronous mode working properly.

I have a few questions:

1) The frequency difference between the two crystals is 0-100ppm. Up to what frequency offset was asynchronous mode validated?
2) Could you please list all settings that need to be changed for asynchronous mode to function properly?
3) I've noticed that, on the failing boards, the link tends to go down when temperature is decreased. This coincides with increased clock drift on both crystals. Is it possible that the C6678 cannot keep up with a quickly varying frequency offset between crystals? Has this been verified by TI, and at what rate of change of offset?

Thanks,

Sachan

over 10 years ago

0 Raja over 10 years ago

TI__Guru* 81335 points

Hi Sachan Rao,

I think we have validated PCIe asynchronous mode only between 2 EVMs. Please refer to the thread below,

What boot mode is used? Are you running IBL on the C6678?

Thank you.

0 Sachan Rao over 10 years ago

Prodigy 230 points

Hi Raja,

The boot mode is not really relevant since the link goes down while the application is running.

Are you able to answer any of the other questions above?

Thanks,
Sachan

0 David (ASIC) Liu over 10 years ago in reply to Sachan Rao

TI__Guru**** 185921 points

Sachan

Is SSC enabled or disabled?

Have you tried sweeping the C6678 CDR bit settings to see if that improves the performance?

Thanks

David

0 Raja over 10 years ago in reply to Sachan Rao

TI__Guru* 81335 points

Hi Sachan,

I have asked the PCIe experts to look into this thread. Thank you for your patience.

0 Sachan Rao over 10 years ago in reply to David (ASIC) Liu

Prodigy 230 points

Please note that SSC is disabled. I have tried every CDR setting, but none have improved the performance. Also, the design guide says to leave CDR settings at default for PCIe.

0 David (ASIC) Liu over 10 years ago in reply to Sachan Rao

TI__Guru**** 185921 points

Sachan

Would you please dump out PCIe SerDes Status Register (PCIE_SERDES_STS) when the link goes down?

The CDR is capable of tracking +/-488ppm in asynchronous mode.

When you switch from asynchronous to synchronous mode, do you use the same crystal or a different clocking option?

Is it possible to switch to a crystal with tighter ppm tolerance across temperature?

Thanks
David

0 Sachan Rao over 10 years ago in reply to David (ASIC) Liu

Prodigy 230 points

David,

In both working and failure state, register PCIE_SERDES_STS has a value of 0x00000201.

We are running the same clock structure for FPGA and DSP. The crystal is FXO-LC725-100.00 and the buffer is SY89854UMGTR from Micrel. When we use synchronous mode, we connect the clock that is currently on the FPGA to the DSP.

I don't think it should be necessary to switch to a tighter PPM tolerance since the max PPM offset for the clock is +/- 50 PPM.
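As a rough check of the numbers quoted so far, the worst-case relative offset between two independent references is the sum of their individual tolerances. Below is a minimal Python sketch of that arithmetic, assuming both oscillators are specified at +/-50 PPM, that the reference is nominally 100 MHz, and using the +/-488ppm CDR tracking figure quoted earlier in this thread. The function and constant names are illustrative only.

```python
# Assumptions (not measured values): two independent +/-50 ppm oscillators,
# a nominal 100 MHz reference, and the +/-488 ppm CDR figure from this thread.
CDR_TRACKING_PPM = 488.0
REFCLK_HZ = 100e6


def worst_case_offset_ppm(tol_a_ppm: float, tol_b_ppm: float) -> float:
    """Largest frequency difference between two independent clocks, in ppm."""
    return abs(tol_a_ppm) + abs(tol_b_ppm)


def offset_hz(nominal_hz: float, offset_ppm: float) -> float:
    """Convert a ppm offset into Hz at the given nominal frequency."""
    return nominal_hz * offset_ppm / 1e6


offset = worst_case_offset_ppm(50.0, 50.0)
margin = CDR_TRACKING_PPM - offset

print(f"worst-case offset: {offset:.0f} ppm "
      f"({offset_hz(REFCLK_HZ, offset):.0f} Hz at the reference)")
print(f"margin to quoted CDR range: {margin:.0f} ppm")
```

On these assumptions the static offset sits well inside the quoted tracking range, which is why the rate of change of the offset, rather than its size, is the open question.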

Could you please list any register settings that need to be changed for asynchronous mode? Also, has asynchronous mode been validated in a similar use case (i.e. with the FPGA using a separate clock)?

Thanks,
Sachan

0 David (ASIC) Liu over 10 years ago in reply to Sachan Rao

TI__Guru**** 185921 points

Sachan

What value do you set NFTS to?

Thanks
David

0 Sachan Rao over 10 years ago in reply to David (ASIC) Liu

Prodigy 230 points

It is set to 0x64. What is the default value?

0 Sachan Rao over 10 years ago in reply to Sachan Rao

Prodigy 230 points

The ACK_FREQ register is set to 0x1B0F6400.
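For reference, the value above can be split into its fields with a few shifts and masks. This is a minimal Python sketch that assumes the following ACK_FREQ bit layout: ACK_FREQ in bits 7:0, NFTS in bits 15:8, COMM_NFTS in bits 23:16, L0S_ENTRY_LATENCY in bits 26:24, L1_ENTRY_LATENCY in bits 29:27 and ASPM_L1 in bit 30. The layout is an assumption and should be checked against the PCIe user guide before relying on it.

```python
from typing import Dict, Tuple

# Assumed field layout: name -> (shift, width). Verify against the user guide.
ACK_FREQ_FIELDS: Dict[str, Tuple[int, int]] = {
    "ACK_FREQ": (0, 8),
    "NFTS": (8, 8),
    "COMM_NFTS": (16, 8),
    "L0S_ENTRY_LATENCY": (24, 3),
    "L1_ENTRY_LATENCY": (27, 3),
    "ASPM_L1": (30, 1),
}


def decode_ack_freq(value: int) -> Dict[str, int]:
    """Split a 32-bit register value into the assumed fields."""
    return {
        name: (value >> shift) & ((1 << width) - 1)
        for name, (shift, width) in ACK_FREQ_FIELDS.items()
    }


fields = decode_ack_freq(0x1B0F6400)
for name, field_value in fields.items():
    print(f"{name:18s} = {field_value:#x}")
```

With this layout the NFTS field decodes to 0x64, which agrees with the value reported in the previous post.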

0 David (ASIC) Liu over 10 years ago in reply to Sachan Rao

TI__Guru**** 185921 points

Sachan

When exiting the electrical idle state, PCIe uses a fast training sequence to ensure the CDR can re-acquire lock quickly. The number of fast training sequences required is defined as N FTS. For a synchronous system, with CDR = 000, this should be set to 15. For an asynchronous system, where a worst case realignment of 1/2UI in the presence of an underlying frequency offset is required, this should be set to 100.

The value is currently 100. Can we try setting it to 15 to see if it helps?
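The change being asked for amounts to a read-modify-write of one field. Here is a minimal Python sketch that computes the new register value, assuming NFTS occupies bits 15:8 of the ACK_FREQ register (an assumption to verify against the user guide). The helper name is hypothetical and the actual register access on the DSP is not shown.

```python
NFTS_SHIFT = 8      # assumed position of the NFTS field
NFTS_MASK = 0xFF    # assumed 8-bit field width

NFTS_SYNCHRONOUS = 15    # value suggested in this thread for a common clock
NFTS_ASYNCHRONOUS = 100  # value suggested in this thread for separate clocks


def set_nfts(reg_value: int, nfts: int) -> int:
    """Return reg_value with only the NFTS field replaced."""
    if not 0 <= nfts <= NFTS_MASK:
        raise ValueError("NFTS must fit in 8 bits")
    cleared = reg_value & ~(NFTS_MASK << NFTS_SHIFT) & 0xFFFFFFFF
    return cleared | (nfts << NFTS_SHIFT)


current = 0x1B0F6400  # value reported earlier in this thread
print(hex(set_nfts(current, NFTS_SYNCHRONOUS)))   # 0x1b0f0f00
print(hex(set_nfts(current, NFTS_ASYNCHRONOUS)))  # 0x1b0f6400
```

Only the NFTS byte changes; the remaining fields of the register are preserved.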

Thanks
David

0 Sachan Rao over 10 years ago in reply to David (ASIC) Liu

Prodigy 230 points

David,

Why do you think changing it from the default would help?

Also, since the problem happens during operation, why should this value matter? Shouldn't this only matter during link training?

And what should L1 and L0s entry latencies be set to? They are set differently on the FPGA side.

0 David (ASIC) Liu over 10 years ago in reply to Sachan Rao

TI__Guru**** 185921 points

Sachan

L1 entry latency tends to be higher than L0s entry latency. L0s latency typically is <2us while L1 latency typically is <2~4us.

On the asynchronous mode operation, we have used a BERT, not FPGA to characterize the RX CDR frequency offset tracking range and verified RX CDR is able to track up to the specified offset range without any bit error.

Does this problem show up on every board? How low does the temperature go when the problem shows up, and does the problem get worse with lower temperature? Can you put a frequency counter or scope in persistence mode on the clock to see how much the clock is drifting over time?

Thanks
David

0 Sachan Rao over 10 years ago in reply to David (ASIC) Liu

Prodigy 230 points

David,

The problem doesn't show up on all boards. The temperature drop to 42 degrees Celsius corresponds to a 3-5PPM drift of the clock. Problem only happens at the lower temperature.

Isn't the number of FTS determined at link training? Is it okay to arbitrarily change this value? And why would we not increase the value instead of decrease it?

Thanks,
Sachan

0 Sachan Rao over 10 years ago in reply to Sachan Rao

Prodigy 230 points

Hi David,

Any other ideas? Decreasing NFTS to 15 did not help. I decreased this value during operation. Is that okay? Or does it need to be decreased prior to link training?

Thanks,

Sachan

0 David (ASIC) Liu over 10 years ago in reply to Sachan Rao

TI__Guru**** 185921 points

Sachan

So 42 degrees C is when the problem shows up? Once the link goes down, does the link come back up after link training?

If you start the link training at 42 degrees C, does the link come up? And if you move the temperature up and down, does the link go down again?

Is it possible to swap the oscillator between a good and a bad board? I want to see if the problem follows the oscillator or the board.

Thanks
David

0 Sachan Rao over 10 years ago in reply to David (ASIC) Liu

Prodigy 230 points

Hi David,

One thing I noticed is that when I disable ASPM on the Xilinx side (Enable ASPM optionality, disable ASPM support), the DSP crashes as soon as I attempt a memory write from the FPGA to the DSP. Any ideas why this may happen?

Thanks,

Sachan

0 David (ASIC) Liu over 10 years ago in reply to Sachan Rao

TI__Guru**** 185921 points

Sachan

We support ASPM in the L0s and L1 states, not L2. You can program the DSP side to enable or disable ASPM.

Thanks

David

0 Sachan Rao over 10 years ago in reply to David (ASIC) Liu

Prodigy 230 points

David,

I've noticed that ASPM was not properly disabled on the FPGA.

My question to you is this:

1) If ASPM optionality is on and ASPM support is disabled on the FPGA, does this effectively disable ASPM on the C6678?
2) I was told that, prior to version 2.1 of the PCI Express Base Specification, the ability to turn off ASPM was not available. Does C6678 adhere to version 2.1 of the spec? If not, how can we turn off ASPM?

Thanks,
Sachan

0 lding over 10 years ago in reply to Sachan Rao

TI__Guru* 95265 points

Sachan,

A1. No, this will not disable ASPM on the C6678.

A2. The C6678 adheres to PCIe Spec 2.0. If your goal is to disable ASPM on the C6678:

- the ASPM Control field (ACTIVE_LINK_PM) in the Link Control Register (LINK_STAT_CTRL, 0x2180_1080) should be set to ‘0’ to disable ASPM. This value should be set on both the Keystone device and the link partner device.

- It is recommended to change the advertised ASPM capabilities via the AS_LINK_PM field in the Link Capabilities (LINK_CAP, 0x2180_107C) configuration register. By default, this is set to advertise L0s support. Changing this field to a value of ‘0’ will indicate no support for ASPM. Note, this register is marked as read-only, but I think you can change it.
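The two register changes above are simple bit-field clears. Below is a minimal Python sketch that computes the values to write, assuming ACTIVE_LINK_PM occupies bits 1:0 of LINK_STAT_CTRL and AS_LINK_PM occupies bits 11:10 of LINK_CAP (the standard PCIe Link Control and Link Capabilities positions; please confirm against the device user guide). The helper names and the sample register values are made up for illustration, and the actual register access on the DSP is not shown.

```python
LINK_STAT_CTRL_ADDR = 0x21801080  # address as given above
LINK_CAP_ADDR = 0x2180107C        # address as given above

ACTIVE_LINK_PM_MASK = 0x3 << 0    # assumed: ASPM Control, bits 1:0
AS_LINK_PM_MASK = 0x3 << 10       # assumed: ASPM Support, bits 11:10


def disable_aspm_control(link_stat_ctrl: int) -> int:
    """Clear ACTIVE_LINK_PM so neither L0s nor L1 ASPM is enabled."""
    return link_stat_ctrl & ~ACTIVE_LINK_PM_MASK & 0xFFFFFFFF


def clear_aspm_support(link_cap: int) -> int:
    """Clear AS_LINK_PM so the port advertises no ASPM support."""
    return link_cap & ~AS_LINK_PM_MASK & 0xFFFFFFFF


# Hypothetical read-back values, for illustration only.
sample_ctrl = 0x10110003
sample_cap = 0x00000412

print(f"write {disable_aspm_control(sample_ctrl):#010x} to {LINK_STAT_CTRL_ADDR:#010x}")
print(f"write {clear_aspm_support(sample_cap):#010x} to {LINK_CAP_ADDR:#010x}")
```

As noted above, the control field would need to be cleared on the link partner as well for ASPM to be off on both ends.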

Regards, Eric

0 Sachan Rao over 10 years ago in reply to lding

Prodigy 230 points

Hi All,

I've noticed the following while debugging in the failed state:

1)   The FPGA is entering "Recovery.rcvrcfg". After a while, a timeout occurs and the FPGA goes to "Timeout to Detect" and then to "Detect Quiet"
2)   It is believed that the recovery state is being entered because transmit of the C6678 is idle.
3)   On the C6678 side, I've noticed that LOSDTCT0 = 1.

I'm not sure what is happening, but in the failed state all C6678 PCIe registers are reading back as zero. Any ideas what may be happening?

Thanks,
Sachan

0 Sachan Rao over 10 years ago in reply to lding

Prodigy 230 points

Ok,

So we now want to do a bit error rate test on the FPGA. In order to do this, it is necessary to put the PCIe on the C6678 in loop-through mode, that is, to route the physical input directly to the output. Is this possible?

Thanks,

Sachan

0 David (ASIC) Liu over 10 years ago in reply to Sachan Rao

TI__Guru**** 185921 points

Sachan

Loopback is supported. Please refer to section 2.12 of http://www.ti.com/lit/ug/sprugs6d/sprugs6d.pdf for the different loopback options.
