C6678 PCIe failing in asynchronous mode

Hello,

I have implemented asynchronous-clock PCIe communication between a C6678 (ROOT) and a Xilinx FPGA (ENDPOINT). After running for a while, the PCIe link goes down.

It appears that the link-down issues are caused by the asynchronous clocks: the link is stable if I use the same clock source for the FPGA and DSP. However, this is not a solution, since implementing it would require a hardware respin. So I would very much like to get asynchronous mode working properly.


I have a few questions:

1)    The frequency difference between the two crystals is 0-100 ppm. Up to what frequency offset was asynchronous mode validated?
2)    Could you please list all settings that need to be changed for asynchronous mode to function properly?
3)    I've noticed that, on the failing boards, the link tends to go down when the temperature decreases. This coincides with increased clock drift on both crystals. Is it possible that the C6678 cannot keep up with a quickly varying frequency offset between the crystals? Has this been verified by TI, and at what rate of change of offset?

Thanks,

Sachan

  • Hi Sachan Rao,

    I think we have only validated PCIe asynchronous mode between two EVMs. Please refer to the thread below.

    Which boot mode are you using? Are you running the IBL on the C6678?

    Thank you.

  • Hi Raja,

    The boot mode is not really relevant since the link goes down while the application is running.

    Are you able to answer any of the other questions above?

    Thanks,
    Sachan
  • Sachan

    Is SSC enabled or disabled?

    Have you tried sweeping the C6678 CDR bit settings to see whether that improves performance?

    Thanks

    David

  • Hi Sachan,

    I have asked our PCIe experts to look into this thread. Thank you for your patience.
  • Please note that SSC is disabled. I have tried every CDR setting, but none of them improved the performance. Also, the design guide says to leave the CDR settings at their defaults for PCIe.
  • Sachan

    Could you please dump the PCIe SerDes Status Register (PCIE_SERDES_STS) when the link goes down?

    The CDR is capable of tracking +/-488ppm in asynchronous mode.

    When you switch from asynchronous to synchronous mode, do you use the same crystal or a different clocking option?

    Is it possible to switch to a crystal with a tighter ppm tolerance across temperature?

    Thanks
    David
  • David,

    In both working and failure state, register PCIE_SERDES_STS has a value of 0x00000201.

    We are running the same clock structure for the FPGA and the DSP. The crystal is an FXO-LC725-100.00 and the buffer is an SY89854UMGTR from Micrel. When we use synchronous mode, we connect the clock that currently drives the FPGA to the DSP.

    I don't think it should be necessary to switch to a tighter ppm tolerance, since the maximum offset for each clock is +/-50 ppm.

    Could you please list any register settings that need to be changed for asynchronous mode? Also, has asynchronous mode been validated in a similar use case (i.e., with an FPGA using a separate clock)?

    Thanks,
    Sachan
  • Sachan

    What value do you set NFTS to?

    Thanks
    David
  • It is set to 0x64. What is the default value?

  • The ACK_FREQ register is set to 0x1B0F6400.
  • Sachan

    When exiting the electrical idle state, PCIe uses fast training sequences (FTS) so that the CDR can re-acquire lock quickly. The number of fast training sequences required is defined as N_FTS. For a synchronous system with CDR = 000, this should be set to 15. For an asynchronous system, where a worst-case realignment of 1/2 UI in the presence of an underlying frequency offset is required, this should be set to 100.

    The value is currently 100; can we try changing it to 15 to see if that helps?

    Thanks
    David
  • David,

    Why do you think changing it from the default would help?

    Also, since problem happens during operation, why should this value matter? Shouldn't this only matter during link training?

    And what should L1 and L0s entry latencies be set to? They are set differently on the FPGA side.
  • Sachan

    L1 entry latency tends to be higher than L0s entry latency. L0s latency is typically under 2us, while L1 latency is typically under 2-4us.

    For asynchronous-mode operation, we used a BERT, not an FPGA, to characterize the RX CDR frequency-offset tracking range, and verified that the RX CDR can track up to the specified offset range without any bit errors.

    Does this problem show up on every board? How low does the temperature go when the problem shows up, and does the problem get worse at lower temperatures? Can you put a frequency counter, or a scope in persistence mode, on the clock to see how much it drifts over time?

    Thanks
    David
  • David,

    The problem doesn't show up on all boards. The temperature drop to 42 degrees Celsius corresponds to a 3-5 ppm drift of the clock. The problem only happens at the lower temperature.

    Isn't the number of FTS determined at link training? Is it okay to arbitrarily change this value? And why would we not increase the value instead of decrease it?

    Thanks,
    Sachan
  • Hi David,

    Any other ideas? Decreasing NFTS to 15 did not help. I decreased this value during operation. Is that okay, or does it need to be decreased prior to link training?

    Thanks,

    Sachan

  • Sachan

    So 42 degrees C is when the problem shows up? Once the link goes down, does it come back up after link training?

    If you start the link training at 42 degree C, does link come up? And if you move the temperature up and down, does the link go down again?

    Is it possible to swap the oscillator between a good and a bad board? I want to see whether the problem follows the oscillator or the board.

    Thanks
    David
  • Hi David,

    One thing I noticed is that when I disable ASPM on the Xilinx side (Enable ASPM optionality, disable ASPM support), the DSP crashes as soon as I attempt a memory write from the FPGA to the DSP. Any ideas why this may happen?

    Thanks,

    Sachan

  • Sachan

    We support ASPM in L0s and L1 state, not L2. You can program to enable or disable ASPM on DSP side.

    Thanks

    David

  • David,

    I've noticed that ASPM was not properly disabled on the FPGA.

    My question to you is this:

    1) If ASPM optionality is on and ASPM support is disabled on the FPGA, does this effectively disable ASPM on the C6678?
    2) I was told that, prior to version 2.1 of the PCI Express Base Specification, the ability to turn off ASPM was not available. Does the C6678 adhere to version 2.1 of the spec? If not, how can we turn off ASPM?

    Thanks,
    Sachan
  • Sachan,

    A1. No, this will not disable ASPM on C6678

    A2. The C6678 adheres to PCIe Spec 2.0. If your goal is to disable ASPM on the C6678:

    - the ASPM Control field (ACTIVE_LINK_PM) in the Link Control Register (LINK_STAT_CTRL, 0x2180_1080) should be set to ‘0’ to disable ASPM. This value should be set on both the Keystone device and the link partner device.

    - It is recommended to change the advertised ASPM capabilities via the AS_LINK_PM field in the Link Capabilities (LINK_CAP, 0x2180_107C) configuration register. By default, this is set to advertise L0s support. Changing this field to a value of ‘0’ will indicate no support for ASPM. Note, this register is marked as read-only, but I think you can change it.

    Regards, Eric

  • Hi All,

    I've noticed the following while debugging in the failed state:


    1)    The FPGA is entering "Recovery.rcvrcfg". After a while, a timeout occurs and the FPGA goes to "Timeout to Detect" and then to "Detect Quiet".
    2)    It is believed that the recovery state is being entered because transmit of the C6678 is idle.
    3)    On the C6678 side, I've noticed that LOSDTCT0 = 1.

    I'm not sure what is happening, but in the failed state all C6678 PCIe registers are reading back as zero. Any ideas what may be happening?


    Thanks,
    Sachan

  • Ok,

    So we now want to do a bit error rate test on the FPGA. To do this, the PCIe on the C6678 needs to be put in loop-through mode, that is, with the physical input routed directly to the output. Is this possible?

    Thanks,

    Sachan

  • Sachan

    Loopback is supported. Please refer to section 2.12 for the different loopback options: http://www.ti.com/lit/ug/sprugs6d/sprugs6d.pdf