
C6474 SRIO Error-stopped state

I am developing code for the C6474 EVM board, using CCS 3.3.82.13, DSP/BIOS 5.33.04 and XDS510USB emulator.

 

I am finding at times that the SRIO_SP1_ERR_STAT register reports that the port is in one of the following states:

Bit 16: Output port is in the OUTPUT ERROR-STOPPED state.

Bit 8: Input port is in the INPUT ERROR-STOPPED state.
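A minimal sketch for checking these two bits in software. It assumes the raw SPn_ERR_STAT value has already been read from the register (the register address is not shown here and should be taken from SPRUG23). The helper names are hypothetical; only the bit positions come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Bit positions as reported in SPn_ERR_STAT (from the description above). */
#define ERR_STAT_OUTPUT_ERR_STOPPED (1u << 16)
#define ERR_STAT_INPUT_ERR_STOPPED  (1u << 8)

/* True if the output side of the port is in the OUTPUT ERROR-STOPPED state. */
static bool output_error_stopped(uint32_t err_stat)
{
    return (err_stat & ERR_STAT_OUTPUT_ERR_STOPPED) != 0u;
}

/* True if the input side of the port is in the INPUT ERROR-STOPPED state. */
static bool input_error_stopped(uint32_t err_stat)
{
    return (err_stat & ERR_STAT_INPUT_ERR_STOPPED) != 0u;
}

/* Hypothetical helper: print which stopped state(s) a raw value shows. */
static void report_err_stat(unsigned port, uint32_t err_stat)
{
    assert(port <= 1u); /* the C6474 ports discussed here are 0 and 1 */
    printf("SP%u_ERR_STAT = 0x%08lX:%s%s%s\n", port, (unsigned long)err_stat,
           output_error_stopped(err_stat) ? " OUTPUT ERROR-STOPPED" : "",
           input_error_stopped(err_stat) ? " INPUT ERROR-STOPPED" : "",
           (output_error_stopped(err_stat) || input_error_stopped(err_stat))
               ? "" : " no stopped state");
}
```

For example, report_err_stat(1, 0x00010100) reports both stopped states, since bits 16 and 8 are both set.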

 

These states can occur at various times: during power-up when using an external endpoint (separate power supplies), or during code startup and SRIO port configuration, whenever the two ends of the link start up at different times.

The problem can occur on either port 0 or port 1, whichever port I am trying to use.

It can occur on port 0 when a RapidFET probe is connected to the external connector and the connector is plugged in late, the probe is powered up at the wrong point in time, or the probe is reset for any reason.

The problem is intermittent.

 

I have tried to find out how to clear these conditions, but SPRUG23.PDF offers no advice on what the error actually indicates, how it occurred, or how to clear it so that the SRIO link can continue to be used.

 

Does anyone have any information on how to clear these stalled states? The only way I have found so far is to power cycle, which is obviously not an acceptable solution.

 

  • James,

    The conditions for entering the input and output error stopped states are described in section 5.11 of the RapidIO Serial Specification:

    http://www.rapidio.org/specs/disclaimer?specfile=/zdata/specs/serial_book.pdf

    Basically, you enter these states because of bit errors that show up as protocol, packet, control symbol, or idle errors.  Generally hardware recovers from such errors automatically, but this would depend on the magnitude of the error burst.  The cases you mentioned above could easily cause bit errors.

    Executing the following step should cause the correct handshaking between link partners which will exit the input and output stopped states for both devices...

    Write a value of 0x40FC8000 into the Port n Control Symbol Transmit register for the DSP port you are working with. You only need to do this on one link partner, not both.

    This causes a PNA (packet-not-accepted) and a link request to be sent in the stype0 and stype1 fields of a control symbol to the link partner. The PNA causes the far end to issue a link request to the near end, and the link request causes the far end to issue a link response. The near end, on receiving the far end's link request, likewise issues a link response. This is discussed in the thread: http://community.ti.com/forums/t/4545.aspx
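    To see where 0x40FC8000 comes from, here is a minimal sketch that packs the control-symbol fields into a 32-bit word. It assumes the SPn_CS_TX register holds a RapidIO 1.x 24-bit control symbol packed from bit 31 downward (stype0 in 31:29, parameter0 in 28:24, parameter1 in 23:19, stype1 in 18:16, cmd in 15:13); this layout is an assumption and should be checked against SPRUG23. The field encodings (stype0 0b010 = packet-not-accepted, PNA cause 0b11111 = general error, stype1 0b100 = link-request, cmd 0b100 = input-status) are my reading of the RapidIO Serial specification. Function names are hypothetical.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Assumed SPn_CS_TX field positions; verify against SPRUG23. */
    #define CS_STYPE0_SHIFT 29u /* 3 bits */
    #define CS_PAR0_SHIFT   24u /* 5 bits */
    #define CS_PAR1_SHIFT   19u /* 5 bits */
    #define CS_STYPE1_SHIFT 16u /* 3 bits */
    #define CS_CMD_SHIFT    13u /* 3 bits */

    /* Encodings as read from the RapidIO Serial specification. */
    #define STYPE0_PACKET_NOT_ACCEPTED 0x2u
    #define PNA_CAUSE_GENERAL_ERROR    0x1Fu
    #define STYPE1_LINK_REQUEST        0x4u
    #define CMD_INPUT_STATUS           0x4u

    /* Pack the five control-symbol fields into a CS_TX register value. */
    static uint32_t cs_tx_encode(uint32_t stype0, uint32_t par0, uint32_t par1,
                                 uint32_t stype1, uint32_t cmd)
    {
        assert(stype0 <= 0x7u && par0 <= 0x1Fu && par1 <= 0x1Fu);
        assert(stype1 <= 0x7u && cmd <= 0x7u);
        return (stype0 << CS_STYPE0_SHIFT) | (par0 << CS_PAR0_SHIFT) |
               (par1 << CS_PAR1_SHIFT) | (stype1 << CS_STYPE1_SHIFT) |
               (cmd << CS_CMD_SHIFT);
    }

    /* PNA (ackID 0, a don't-care) plus link-request/input-status. */
    static uint32_t cs_tx_recovery_value(void)
    {
        return cs_tx_encode(STYPE0_PACKET_NOT_ACCEPTED, 0u /* ackID */,
                            PNA_CAUSE_GENERAL_ERROR, STYPE1_LINK_REQUEST,
                            CMD_INPUT_STATUS);
    }
    ```

    Under these assumptions the fields reproduce 0x40FC8000 exactly, which is a useful cross-check that the layout is plausible.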

     

    Regards,

    Travis

  • Thanks Travis,

    Hopefully this will cure the problem. I will report back if the problem continues.

    It is a shame that SPRUG23.PDF does not contain more information on error handling. The details about the Port Control Symbol Transmit register are minimal, to say the least.

    However, with some careful reading of various sections of SERIAL_BOOK.PDF at your link, it is possible to begin to understand how your solution works.

     

  • An update.

     

    I have had limited success using the value 0x40FC8000. Sometimes it appears to clear an INPUT ERROR-STOPPED state.

    Other times, the port transitions to an OUTPUT ERROR-STOPPED state instead.

    The value 0x40FC8000 implies an ackID of 0. Should I be using the ackID of the last received message in place of ackID 0? I have tried this, but it does not appear to make much difference to the problem.

    Additional information: I am connecting the C6474 EVM board to a RapidFET probe, so the probe is at the other end of the link. I think it includes a C6455 and some sort of switch device, so the device immediately connected to the C6474 port is probably the switch.

    I am unclear on exactly what the difference between the INPUT and OUTPUT stopped states is, and why writing 0x40FC8000 to the SPn_CS_TX register can clear an INPUT ERROR-STOPPED state only for it to be replaced by an OUTPUT ERROR-STOPPED state.

    I am not sure whether my problem is a side effect of the hardware in the RapidFET probe reacting badly to the 0x40FC8000 value, or whether I need to take a different action in response to the OUTPUT ERROR-STOPPED state.

    Sections 5.11.2.6 and 5.11.2.7 of the serial specification give some information on recovery sequences, but it is not entirely clear which steps are automatic and which must be performed by software.

     

     

  • Hi James,

    A couple of comments that will hopefully clear up a few things:

    -  See the attached PDF for information on software error recovery using the 0x40FC8000 value:

    5241.SRIO_Error_Recovery.pdf

    -  The fact that you are clearing the input error-stopped state but transitioning to the output error-stopped state should not be caused by this software error recovery mechanism. As you can see in the PDF, it clears both stopped states. However, if the normal hardware recovery didn't work and software recovery is needed, you may need to stop packet transmission in both directions until the states are cleared. If one end of the link is transmitting packets while the other is performing the recovery procedure, this out-of-sync behavior could cause what you are seeing.

    -  The ACKID=0 in the PNA is a don't-care. The ACKID of any PNA is considered invalid, because this field may itself have contained the bit error in the original errored packet request. Instead of relying on the ACKID value in the PNA, the PNA triggers the link-request handshaking, which discovers the true ACKID expected by the attached device.

    -  Again, a normal bit error is recoverable by hardware. But depending on the magnitude and location of the error(s), if the link partners get far enough out of sync that the fatal port error bit in RIO_SP(n)_ERR_STAT is set, software will need to step in.
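    The points above could be combined into a software recovery routine along these lines. This is a minimal sketch under several assumptions: the register handles are hypothetical pointers whose addresses must come from SPRUG23; the fatal "port error" bit is assumed to be bit 2 of SPn_ERR_STAT (RapidIO bit 29 in the spec's big-endian numbering); and the caller is assumed to have already halted packet transmission in both directions, as suggested above. All function and type names are hypothetical, and the poll count is only a crude stand-in for a real timeout.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define ERR_STAT_OUTPUT_ERR_STOPPED (1u << 16)
    #define ERR_STAT_INPUT_ERR_STOPPED  (1u << 8)
    #define ERR_STAT_STOPPED_MASK \
        (ERR_STAT_OUTPUT_ERR_STOPPED | ERR_STAT_INPUT_ERR_STOPPED)
    /* Assumption: fatal port error at bit 2 (spec bit 29, big-endian). */
    #define ERR_STAT_PORT_ERROR         (1u << 2)

    /* PNA + link-request/input-status, as given in this thread. */
    #define CS_TX_PNA_LINK_REQUEST 0x40FC8000u

    /* Hypothetical register handles for one SRIO port. */
    typedef struct {
        volatile uint32_t *err_stat; /* SPn_ERR_STAT */
        volatile uint32_t *cs_tx;    /* SPn_CS_TX */
    } srio_port_regs;

    /* Fatal port error set: hardware recovery has failed, software must act. */
    static bool srio_needs_sw_recovery(uint32_t err_stat)
    {
        return (err_stat & ERR_STAT_PORT_ERROR) != 0u;
    }

    /* Issue the PNA/link-request symbol on ONE link partner, then poll until
     * both stopped states clear. Returns false if they are still set after
     * max_polls reads. Caller must already have stopped packet traffic. */
    static bool srio_sw_recover(const srio_port_regs *p, unsigned max_polls)
    {
        assert(max_polls > 0u);
        if ((*p->err_stat & ERR_STAT_STOPPED_MASK) == 0u)
            return true; /* not stopped: nothing to do */
        *p->cs_tx = CS_TX_PNA_LINK_REQUEST;
        for (unsigned i = 0; i < max_polls; ++i) {
            if ((*p->err_stat & ERR_STAT_STOPPED_MASK) == 0u)
                return true;
        }
        return false;
    }
    ```

    Traffic is deliberately left to the caller to stop and restart, because per the comment above, packets sent by one end during the handshake can push the link straight back into a stopped state. The ackID realignment mentioned in the follow-up post is not covered by this sketch.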


    Hope that helps,

    Travis

  • James,

    I just wanted to add some additional information that may be handy. See the attached PDF. It details the difference between using the SP_CS_TX register to recover from the Input and Output Error-stopped states and the additional requirements for realigning ackIDs. It was unclear from your description whether you were running into the latter case.

    Regards,

    Travis

  • Travis, thanks for that, I think that is probably the missing explanation that I was needing.

     

    Regards Gareth.