Intermittent SRIO Training Failure

We have a board with a C6657 DSP connected to an FPGA via SRIO. We need the SRIO link to run in x4 mode to meet our performance requirements. When we power on or reset the board and then run the SRIO init software, the link usually trains correctly. Intermittently, however, it trains to x1 mode. When that happens, we cannot reliably recover unless we reset the DSP, which is an unacceptable solution.

We've tried everything we can think of on the FPGA side (reset SRIO, reset FPGA, reconfigure FPGA, etc.), but nothing worked.

We have software (residing in boot flash memory) that initializes the SRIO, checks and displays lane status, then resets the DSP. As I mentioned above, the link mostly trains correctly. Sometimes it trains to x1 and we can recover by re-running the init sequence (without resetting the DSP). If it still has not trained to x4 after 16 of those re-initialization attempts, we display a failure message. In either case, we reset the DSP to see what happens next.
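For anyone trying to reproduce this, the retry logic of the test looks roughly like the sketch below. This is not our actual code: the `srio_port_ops_t` hooks, `srio_train_x4`, and the `fake_port_t` stand-in are hypothetical names used only for illustration, and the sketch assumes the limit of 16 counts total init attempts.

```c
#include <stddef.h>

/* Assumption: 16 is treated as the total number of init attempts. */
#define SRIO_MAX_INIT_ATTEMPTS 16

typedef enum {
    SRIO_WIDTH_NONE = 0,
    SRIO_WIDTH_X1   = 1,
    SRIO_WIDTH_X4   = 4
} srio_width_t;

/* Hypothetical hooks; on real hardware these would wrap the SRIO init
 * sequence and a read of the port's trained lane width. */
typedef struct {
    void (*init)(void *ctx);
    srio_width_t (*trained_width)(void *ctx);
    void *ctx;
} srio_port_ops_t;

/* Re-run the init sequence until the port reports x4.
 * Returns the attempt number that succeeded (1..16), or 0 if the port
 * never trained to x4 within the attempt limit. */
int srio_train_x4(const srio_port_ops_t *ops)
{
    for (int attempt = 1; attempt <= SRIO_MAX_INIT_ATTEMPTS; ++attempt) {
        ops->init(ops->ctx);
        if (ops->trained_width(ops->ctx) == SRIO_WIDTH_X4) {
            return attempt;
        }
    }
    return 0;
}

/* Simulated port for host-side experiments: reports x4 once init has been
 * called x4_on_call times; x4_on_call == 0 means it never reaches x4. */
typedef struct {
    int init_calls;
    int x4_on_call;
} fake_port_t;

static void fake_init(void *ctx)
{
    ((fake_port_t *)ctx)->init_calls++;
}

static srio_width_t fake_width(void *ctx)
{
    fake_port_t *p = (fake_port_t *)ctx;
    if (p->x4_on_call != 0 && p->init_calls >= p->x4_on_call) {
        return SRIO_WIDTH_X4;
    }
    return SRIO_WIDTH_X1;
}
```

A return value of 0 corresponds to the case where we display the failure message; any other value is the attempt on which the link came up in x4.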

When we see a failing case, the DSP reset always clears the error, and we are able to train the SRIO to x4.

I can go into more detail, if that would be helpful.

Thanks,
jw 

  • It appears that we have solved the problem.

    For our SRIO driver, we are using the code in device_srio_loopback.c in C6657 PDK 1.1.2.5.  In the function SrioDevice_init, lines 180-207 initialize the SRIO SERDES. That code does this:

    1. Configure the SERDES PLL
    2. Configure the SERDES receiver
    3. Configure the SERDES transmitter
    4. Wait for the SERDES PLL to lock

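    We moved step 4 to immediately after step 1, so the code now waits for the PLL to lock before it configures the receiver and transmitter. That seems to have fixed the problem: we ran our test over a weekend with no failures, whereas with the original ordering the test would fail within 15 minutes.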
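The reordered sequence can be sketched as follows. This is an illustration only, not the PDK code: the `serdes_ops_t` hooks, `serdes_init_reordered`, and the `fake_serdes_t` stand-in are hypothetical names, and the bounded poll count is an assumption added for the sketch rather than something taken from device_srio_loopback.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hooks standing in for the register accesses that the
 * driver performs in SrioDevice_init. */
typedef struct {
    void (*config_pll)(void *ctx);
    void (*config_rx)(void *ctx);
    void (*config_tx)(void *ctx);
    bool (*pll_locked)(void *ctx);
    void *ctx;
} serdes_ops_t;

/* Reordered init: PLL config, wait for lock, then RX and TX config.
 * Returns false if the PLL did not lock within max_polls polls, in which
 * case the receiver and transmitter are left unconfigured. */
bool serdes_init_reordered(const serdes_ops_t *ops, uint32_t max_polls)
{
    bool locked = false;

    ops->config_pll(ops->ctx);                  /* step 1 */

    for (uint32_t i = 0; i < max_polls; ++i) {  /* former step 4 */
        if (ops->pll_locked(ops->ctx)) {
            locked = true;
            break;
        }
    }
    if (!locked) {
        return false;
    }

    ops->config_rx(ops->ctx);                   /* step 2 */
    ops->config_tx(ops->ctx);                   /* step 3 */
    return true;
}

/* Simulated SERDES that records the call order as letters:
 * P = PLL config, L = lock observed, R = RX config, T = TX config. */
typedef struct {
    char trace[8];
    int  len;
    int  polls_until_lock;
} fake_serdes_t;

static void fake_record(fake_serdes_t *s, char c)
{
    if (s->len < 7) {
        s->trace[s->len++] = c;
    }
}

static void fake_config_pll(void *ctx) { fake_record((fake_serdes_t *)ctx, 'P'); }
static void fake_config_rx(void *ctx)  { fake_record((fake_serdes_t *)ctx, 'R'); }
static void fake_config_tx(void *ctx)  { fake_record((fake_serdes_t *)ctx, 'T'); }

static bool fake_pll_locked(void *ctx)
{
    fake_serdes_t *s = (fake_serdes_t *)ctx;
    if (s->polls_until_lock > 0) {
        s->polls_until_lock--;
        return false;
    }
    fake_record(s, 'L');
    return true;
}
```

The point of the ordering is simply that no RX or TX configuration write happens until the lock has been observed; with the simulated SERDES, a successful run records P, L, R, T in that order.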

    We suspect that the SERDES RX and TX blocks depend on the PLL: until the PLL locks, their registers are not valid, and writes to them do bad things. That's all speculation, though.

    Perhaps this has been fixed in a later version of the PDK. We can't update the PDK, since that would require regression testing here and for our customer. Our schedules do not allow for that, so we're stuck with PDK 1.1.2.5.