Intermittent data transmission failure from FPGA to TMS320C6657 over SRIO in DirectIO mode

Other Parts Discussed in Thread: TMS320C6657

I am transferring 320*240*8-bit images over SRIO DirectIO from an FPGA to a TMS320C6657 DSP on my board. The FPGA is the master and the DSP is the slave. The SRIO lanes are connected directly between the FPGA and the DSP, following the TMS320C6657 EVM design. The DSP SRIO configuration is Keystone_init(), which I first tested successfully between two DSPs on 6678 EVMs. The line rate is 1.25G, mode 0, configuration 4, but I only use port 0 for the image data.

The problem is this: most of the time the transmission succeeds and Port0_ERR_STAT is 0x00000002, which means the port_ok bit is 1. However, at unpredictable times (sometimes after only 2 minutes, sometimes after 20 minutes) the transmission fails and Port0_ERR_STAT reads 0x00030306 or 0x03030306, which means that both the port_ok and port_error bits are set. My project cannot tolerate this failure. How can I find out what causes Port0_ERR_STAT to become 0x00030306 or similar?
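The status words quoted above carry more than the port_ok and port_error bits. The following is a minimal decoding sketch, not TI code. The bit positions (LSB = bit 0) are an assumption taken from the standard RapidIO "Port n Error and Status CSR" layout and should be checked against the SRIO user guide; the table and the function name decode_err_stat are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed bit positions (LSB = bit 0), following the RapidIO
 * "Port n Error and Status CSR" layout. Verify against the SRIO user guide. */
typedef struct {
    uint32_t    mask;
    const char *name;
} err_stat_bit_t;

static const err_stat_bit_t ERR_STAT_BITS[] = {
    { 1u << 0,  "PORT_UNINIT" },
    { 1u << 1,  "PORT_OK" },
    { 1u << 2,  "PORT_ERR" },
    { 1u << 8,  "INPUT_ERR_STOPPED" },
    { 1u << 9,  "INPUT_ERR_ENCOUNTERED" },
    { 1u << 16, "OUTPUT_ERR_STOPPED" },
    { 1u << 17, "OUTPUT_ERR_ENCOUNTERED" },
    { 1u << 24, "OUTPUT_DEGRADED" },
    { 1u << 25, "OUTPUT_FAILED" },
};

/* Writes a '|'-separated list of the known flags set in value into buf.
 * Returns the number of known flags that were written. */
static int decode_err_stat(uint32_t value, char *buf, size_t len)
{
    int    count = 0;
    size_t used  = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof ERR_STAT_BITS / sizeof ERR_STAT_BITS[0]; i++) {
        if (value & ERR_STAT_BITS[i].mask) {
            int n = snprintf(buf + used, len - used, "%s%s",
                             count ? "|" : "", ERR_STAT_BITS[i].name);
            if (n < 0 || (size_t)n >= len - used)
                break;                      /* output buffer is full */
            used += (size_t)n;
            count++;
        }
    }
    return count;
}
```

Under these assumptions, 0x00030306 decodes to port OK, port error, and both the error-stopped and error-encountered flags for the input and output directions; 0x03030306 additionally has the output degraded and output failed flags set.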

Please help. Thanks in advance.

Best Regards

  • These link-level errors will cause the HW state machines to try to recover through low-level handshaking.  They can be caused by multiple things, like bit errors, error states, or ackID misalignment.  Please take a look at the following threads and make sure you have implemented/followed them, particularly the software error recovery steps, before trying to send any data packets.

    http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/196080/850001.aspx#850001 - VMIN setting

    http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/170264/752157.aspx#752157 - Software error recovery

    http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/267043/937560.aspx#937560 - disable C66x port-writes

    and the following may help decode any error conditions:

    http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/264325/927003.aspx#927003 – Error status, and debug gel

    Regards,

    Travis

  • Thanks for the link threads.

    First, I would like to add something about my SRIO hardware design: the RIORXN0/P0 pins of the DSP are directly connected to MGTTXn0/p0 of the FPGA, and the RX pins of the FPGA are likewise directly connected to the TX pins of the DSP. Each RX pin of both devices has a 0.1uF/16V series capacitor. The design follows the TMS320C6678EVM.

    VMIN setting: the VMIN_EXP field is set to 0xF (I used PLM_SP0_VMIN_EXP |= 0x0F000000), and the register reads 0x0F030300 whether the transmission succeeds or fails.

    Software error recovery: following the advice, I also write 0x40FC8000 to PLM_SP0_LONG_CS_TX1 before waiting for port 0 to report OK. However, I am confused, because I have also seen 0x2003F044 recommended. I tried both values and neither has any effect on my failure. When I write 0x40FC8000 to PLM_SP0_LONG_CS_TX1, the register actually reads back 0x40F08000, whether the transmission succeeds or not.

    Disable C66x port-writes: this has also been done. I set EM_DEV_PW_EN &= 0x00000000, and in the memory browser RIO_ERR_EN and RIO_SP0_RATE_EN both read 0x00000000.

    Additionally, here is a table comparing the register values for successful and failed transmissions after the above operations.

    Addr        Register                 Successful value  Failure value
    0x0290b15c  SP0_CTL                  0x00600001        0x00600001
    0x0290b158  SP0_ERR_STAT             0x00000002        0x00030306/0x00020302
    0x0290c000  ERR_RPT_BH               0x30000007        0x30000007
    0x0290c008  ERR_DET                  0x00000000        0x00000000
    0x0290c00c  ERR_EN                   0x00000000        0x00000000
    0x0290c040  SP0_ERR_DET              0x00000000        0x00000024
    0x0290c044  SP0_RATE_EN              0x00000000        0x00000000
    0x0290c048  SP0_ERR_ATTR_CAPT_DBG0   0x00000000        0x00000000
    0x0290c04c  SP0_ERR_CAPT_0_DBG1      0x00000000        0x00000000
    0x0290c068  SP0_ERR_RATE             0x80000000        0x80000000
    0x0290c06c  SP0_ERR_THRESH           0xFFFF0000        0xFFFF0000
    0x0291B0E0  PLM_SP0_LONG_CS_TX1      0x40F08000        0x40F08000
    0x0291BD48  LOCAL_ERR_DET            0x00000000        0x00000000
    0x0291BD50  LOCAL_H_ADDR_CAPT        0x00000000        0x00000000
    0x0291BD54  LOCAL_ADDR_CAPT          0x00000000        0x00000000
    0x0291BD58  LOCAL_ID_CAPT            0x00000000        0x00000000
    0x0291BD5c  LOCAL_CTRL_CAPT          0x00000000        0x00000000

    Please note that SP0_ERR_DET = 0x00000024 on failure, i.e. the non-outstanding ackID bit and the delineation error bit are both set. What do these errors mean, and how can I track down the cause?
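The SP0_ERR_DET values in this thread can be checked with a few masks. This is a small sketch under stated assumptions: the bit positions (LSB = bit 0) follow the standard RapidIO "Port n Error Detect CSR" layout, the macro and function names are hypothetical, and only bits whose meaning is reasonably certain are named. Anything else is reported as unknown rather than guessed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions (LSB = bit 0) from the RapidIO
 * "Port n Error Detect CSR" layout. Verify against the SRIO user guide. */
#define ERR_DET_LINK_TIMEOUT           (1u << 0)
#define ERR_DET_UNSOLICITED_ACK        (1u << 1)
#define ERR_DET_DELINEATION            (1u << 2)
#define ERR_DET_PROTOCOL               (1u << 4)
#define ERR_DET_NON_OUTSTANDING_ACKID  (1u << 5)

#define ERR_DET_KNOWN (ERR_DET_LINK_TIMEOUT | ERR_DET_UNSOLICITED_ACK | \
                       ERR_DET_DELINEATION | ERR_DET_PROTOCOL |         \
                       ERR_DET_NON_OUTSTANDING_ACKID)

/* True if any bit of mask is set in the captured register value. */
static bool err_det_has(uint32_t value, uint32_t mask)
{
    return (value & mask) != 0;
}

/* Bits that this sketch does not name; look them up in the user guide. */
static uint32_t err_det_unknown_bits(uint32_t value)
{
    return value & ~ERR_DET_KNOWN;
}
```

With these masks, 0x00000024 contains exactly the delineation error and non-outstanding ackID bits, which agrees with the reading above. A value such as 0x01000024 would leave bit 24 in err_det_unknown_bits(), to be looked up separately.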

    Thanks a lot

    Best Regards!

  • lai yi said:
    First, I would like to add something about my SRIO hardware design: the RIORXN0/P0 pins of the DSP are directly connected to MGTTXn0/p0 of the FPGA, and the RX pins of the FPGA are likewise directly connected to the TX pins of the DSP. Each RX pin of both devices has a 0.1uF/16V series capacitor. The design follows the TMS320C6678EVM.

    AC coupled correct?  That should be fine.

    lai yi said:
    Software error recovery: following the advice, I also write 0x40FC8000 to PLM_SP0_LONG_CS_TX1 before waiting for port 0 to report OK. However, I am confused, because I have also seen 0x2003F044 recommended. I tried both values and neither has any effect on my failure. When I write 0x40FC8000 to PLM_SP0_LONG_CS_TX1, the register actually reads back 0x40F08000, whether the transmission succeeds or not.

    You are mixing solutions for different devices.  C66x devices use 0x2003F044 into the PLM Port n Control Symbol Transmit 1 register (RIO_PLM_SP(n)_LONG_CS_TX1).  C64x devices use 0x40FC8000 into the local Port n Control Symbol Transmit register (SP(n)_CS_TX).

    lai yi said:
    Please note that SP0_ERR_DET = 0x00000024 on failure, i.e. the non-outstanding ackID bit and the delineation error bit are both set. What do these errors mean, and how can I track down the cause?

    Delineation errors are not common.  They only occur when there are multiple adjacent bit errors, or you have some sort of clock drift adjustment.  Basically, the 8b/10b characters at the receiver suddenly appear to straddle the boundary of two consecutive characters, i.e. the alignment seems to be incorrect all of a sudden.  The non-outstanding ackID could also be from a bit error.  Again, there are HW state machines to recover from that.  You will only get the Port_ERR you showed above if the HW can't recover after multiple (4) attempts.  HW should be able to recover unless you see the Port_Err or Input/Output Error Stopped states.

    Regards,

    Travis

  • Your explanation makes the delineation error somewhat clearer. However, I am still not sure how to fix the bit errors, because I have seen the delineation error bit set again even when SP0_ERR_STAT is OK with a value of 0x00000002.

    So I tried adding the software-assisted error recovery procedure, which first writes 0x2003F044 to PLM_SP0_LONG_CS_TX1, but it had no effect on recovery. Since additional steps must be taken immediately to align the ackIDs, I tried the following steps:

       SRIO_SP0_LM_REQ=4;                  /* send input-status command */
       while(0==(SRIO_SP0_LM_RESP>>31));   /* wait for the response-valid bit */
       asm("nop 5");
       inboundACKID=(SRIO_SP0_LM_RESP&0x000003E0)>>5;
       SRIO_SP0_ACKID_STAT=(inboundACKID|(inboundACKID<<8)|((inboundACKID+1)<<24));

    When I run these steps after detecting the Port Error or Input/Output Error Stopped state in SP0_ERR_STAT, the program hangs at while(0==(SRIO_SP0_LM_RESP>>31)) and cannot continue. If I remove the while(0==(SRIO_SP0_LM_RESP>>31)) step, the ackID alignment does not solve my problem: I still see SP0_ERR_STAT = 0x03030306 or similar, and SP0_ERR_DET = 0x00000024.
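One way to keep the program from hanging when the link partner never answers is to bound the polling loop. The sketch below is an illustration only, not code from the TI examples. The function name, the max_polls parameter, and passing the two registers as pointers are all hypothetical choices; the field positions (response valid in bit 31, ackID status in bits 9:5) are taken from the masks used in the steps above. It also reads the response register once per iteration and reuses that value, since the RapidIO specification describes the response-valid bit as clearing on read.

```c
#include <stdint.h>

#define LM_REQ_INPUT_STATUS  4u          /* link-request command: input-status */
#define LM_RESP_VALID        (1u << 31)  /* response-valid flag */
#define LM_RESP_ACKID_SHIFT  5
#define LM_RESP_ACKID_MASK   0x1Fu

/* Sends a link-request/input-status and polls for the response.
 * Returns the ackID status field (0..31), or -1 if no valid response
 * was seen within max_polls reads. */
static int link_request_input_status(volatile uint32_t *lm_req,
                                     volatile uint32_t *lm_resp,
                                     uint32_t max_polls)
{
    *lm_req = LM_REQ_INPUT_STATUS;

    for (uint32_t i = 0; i < max_polls; i++) {
        uint32_t resp = *lm_resp;        /* single read per iteration */
        if (resp & LM_RESP_VALID)
            return (int)((resp >> LM_RESP_ACKID_SHIFT) & LM_RESP_ACKID_MASK);
    }
    return -1;                           /* timed out: link partner silent */
}
```

A return value of -1 then tells the caller that the remote side did not respond to the link request, which is a different problem from misaligned ackIDs and points back at the link itself.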

    So does the software-assisted error recovery work or not? How can I really solve my problem now?

     

    Note:

       I would also like to describe another approach that I found by accident while testing. I initially configure the SerDes PLL register with 0x000015A1. When the transmission fails with SP0_ERR_STAT = 0x03030306 or similar, I write 0x000015A0 to the SerDes PLL register (0x02620360), and all the related register values then read as cleared. I then write 0x000015A1 to the SerDes PLL register again, and the failed transmission becomes successful, but SP0_ERR_STAT becomes 0x03020206. Why? I also doubt that this really solves the transmission failure.

       I can also write SP0_ERR_STAT |= 0x02020204 to clear SP0_ERR_STAT to 0x00000002, but then I find SP0_ERR_DET = 0x01000024, and the transmission failure still occurs. Why?

    Please help me with this; I would really appreciate it.

    thanks.

    Best regards!

  • lai yi said:
    Your explanation makes the delineation error somewhat clearer. However, I am still not sure how to fix the bit errors, because I have seen the delineation error bit set again even when SP0_ERR_STAT is OK with a value of 0x00000002.

    Are you checking for this error and clearing the status of this error after initialization but before sending any packets?  It is possible this is getting set during the initialization process.  If you adjust the VMIN setting and clear this bit, and then still see the issue, you will need to look at signal integrity and your refclks.

    lai yi said:
    So I tried adding the software-assisted error recovery procedure, which first writes 0x2003F044 to PLM_SP0_LONG_CS_TX1, but it had no effect on recovery.

    This process will only clear the input/output errored stopped states.  The FPGA should respond with the same handshaking as the DSP in response to the PNA and Link request input status control symbol.

  • lai yi said:
    So does the software-assisted error recovery work or not? How can I really solve my problem now?

    If both devices are coming out of reset, ACKIDs should really already be aligned.  Concentrate on getting rid of the error states first.  There is code posted in that other thread on doing the AckID alignment, I'm not sure if that is what you are using, but it is confirmed to work between DSPs.

    lai yi said:
    the failed transmission becomes successful, but SP0_ERR_STAT becomes 0x03020206. Why?

    When you disabled the serdes PLL, you can't read many of the registers in the peripheral because they are clocked off of a clock derived from the serdes clock.  That is why it shows 0's.  SRIO requires a very specific initialization sequence that is shown in our examples and platform lib.  You can not simply disable the PLL and reset everything.  Like I said, work on getting port_ok without any errors before any packet transmissions first.

    lai yi said:
    I can also write SP0_ERR_STAT |= 0x02020204 to clear SP0_ERR_STAT to 0x00000002, but then I find SP0_ERR_DET = 0x01000024, and the transmission failure still occurs. Why?

    SP0_ERR_STAT is a write-1-to-clear register, except for the INPUT/OUTPUT ERROR STOPPED bits.  So it will clear the bits you mention.  However, it won't necessarily clear the condition that caused the bit to be set. For example, the PORT_ERROR bit will be cleared, but unless the ACKIDs are aligned, you will still have problems the next time you try to send a packet.
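The write-1-to-clear behaviour can be illustrated with a small software model. This is a sketch for reasoning only, not a description of the actual hardware. Which bits are treated as clearable (the W1C_MASK value) is an assumption chosen to match the description above, with the port status bits and the error-stopped bits modelled as read-only, and the type and function names are hypothetical. The model also shows why a read-modify-write such as `|=` writes back every bit that is currently set, not just the bits in the constant.

```c
#include <stdint.h>

/* Assumed for illustration: bits 0/1 (port uninitialized / port OK) and
 * bits 8/16 (input/output error-stopped) are read-only status; every
 * other bit clears when a 1 is written to it. */
#define W1C_MASK (~((1u << 0) | (1u << 1) | (1u << 8) | (1u << 16)))

typedef struct {
    uint32_t value;   /* current contents of the simulated register */
} sim_w1c_reg_t;

static uint32_t sim_read(const sim_w1c_reg_t *r)
{
    return r->value;
}

/* Writing 1 to a clearable bit clears it; writing 0 leaves it unchanged. */
static void sim_write(sim_w1c_reg_t *r, uint32_t data)
{
    r->value &= ~(data & W1C_MASK);
}
```

In this model, starting from 0x03020206, the read-modify-write `sim_write(&r, sim_read(&r) | 0x02020204)` leaves 0x00000002, because the value read back already contains every set bit. A plain write of 0x02020204 leaves 0x01000002. Starting from 0x00030306, even writing 0xFFFFFFFF leaves 0x00010102, since the error-stopped bits are not cleared by a register write.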

    Regards,

    Travis

  • Are you checking for this error and clearing the status of this error after initialization but before sending any packets?  It is possible this is getting set during the initialization process.  If you adjust the VMIN setting and clear this bit, and then still see the issue, you will need to look at signal integrity and your refclks.

    Yes, I clear SP0_ERR_DET (0x0290c040) after initialization. At the beginning, when the transmission is OK, SP0_ERR_DET = 0x00000000, but after a while, still during successful transmission, I find SP0_ERR_DET = 0x00000024. Does that mean the only remaining causes are signal integrity and refclk stability?

    This process will only clear the input/output errored stopped states.  The FPGA should respond with the same handshaking as the DSP in response to the PNA and Link request input status control symbol.

    I added PLM_SP0_LONG_CS_TX1 = 0x2003F044 after initialization, but I do not see the input/output error-stopped states cleared when the failure occurs again. I should check how the master FPGA responds, although that is not my part of the work. What else should I do for software error recovery?

     

  • If both devices are coming out of reset, ACKIDs should really already be aligned.  Concentrate on getting rid of the error states first.  There is code posted in that other thread on doing the AckID alignment, I'm not sure if that is what you are using, but it is confirmed to work between DSPs.

    I am not clear about the AckID alignment code. I only see the Keystone_SRIO_match_ACKID routine used in the SRIO_2DSP_TEST() function. However, I once consulted a TI FAE, who said that Keystone_SRIO_match_ACKID is not necessary for a slave DSP, which is how I use the DSP. Anyway, I tried adding Keystone_SRIO_match_ACKID to my function after initialization finishes. The Keystone_SRIO_match_ACKID routine is as follows.

    SP0_LM_REQ=4;                       /* send link-request/input-status */
    while(0==(SP0_LM_RESP>>31))         /* repeat until the response-valid bit is set */
    {
        SP0_LM_REQ=4;
    }
    asm("nop 5");
    uiRemote_In_AckID=(SP0_LM_RESP&0x000003E0)>>5;
    SP0_AckID_STAT=uiRemote_In_AckID;   /* local outbound AckID = remote inbound AckID */
    do
    {
        uiLocal_In_AckID=(SP0_AckID_STAT&0x3F000000)>>24;
        uiMaintenanceValue=(((uiRemote_In_AckID+1)<<24) | uiLocal_In_AckID);
        /* maintenance write (0x81) to the remote AckID status register at offset 0x148 */
        uiResult=Keystone_SRIO_Maintenane(0,0,0xFF,0x148,GLOBAL_ADDR(&uiMaintenanceValue),0x81);
        if(uiResult)
            continue;
        /* maintenance read (0x80) of the same register to verify the result */
        uiResult=Keystone_SRIO_Maintenane(0,0,0xFF,0x148,GLOBAL_ADDR(&uiMaintenanceValue),0x80);
        uiRemote_Out_AckID=uiMaintenanceValue & 0x0000003F;
    }while(uiResult | (uiLocal_In_AckID+1 != uiRemote_Out_AckID));

    As shown above, Keystone_SRIO_match_ACKID calls Keystone_SRIO_LSU_transfer, and I had earlier confirmed that LSU configuration is not necessary for a slave DSP. So I removed the do-while loop and kept only the following steps.

    SP0_LM_REQ=4;
    while(0==(SP0_LM_RESP>>31))
    {
        SP0_LM_REQ=4;
    }
    asm("nop 5");
    uiRemote_In_AckID=(SP0_LM_RESP&0x000003E0)>>5;
    SP0_AckID_STAT=uiRemote_In_AckID;

     

    But the transmission failure still occurs on the same board.

    If I keep the maintenance function, my program hangs at Keystone_SRIO_wait_LSU_Completion, or LSU_Reg6 shows busy, which means the LSU cannot complete the transfer. I think that because Keystone_SRIO_Maintenane(0,0,0xFF,0x148,GLOBAL_ADDR(& uiMaintenanceValue),0x80) is meant for use between two DSPs, it will not work between an FPGA and a DSP, because the offset address 0x148 does not exist in the FPGA.

    So could you give me a copy of the correct AckID alignment code, although I am not sure whether AckID alignment is really the problem? And what can I do if, with the code above minus the do-while loop, the transmission failure still occurs again and again?

    Lastly, could you please tell me where to get the SRIO debug GEL and how to use it to debug port 0?

    Many thanks.

    Best regards!

     

     

     

  • The SRIO debug script to dump the registers is at: http://processors.wiki.ti.com/images/6/60/Keystone_SRIO_bug_report_gel.zip

    I'll ask someone to post the ackid alignment code.

    Regards,

    Travis

  • Here is the code snippet to re-align ACK IDs. You only need to run this code from one side of the SRIO link as it will send a maintenance write in order to overwrite the remote ACK IDs to match the local ACK ID values.

    CSL_SrioHandle hSrio;
    Uint8 ackIdStatus, linkStatus, i, localInboundAckId, localOutboundAckId;
    uint32_t newAckIdStat, remoteAckIdStat;
    SRIO_LSU_TRANSFER lsuTransferData;

    hSrio = CSL_SRIO_Open (0);

    //Uncomment the next two lines to clear all of the errors before sending the control symbols
    //hSrio->RIO_SP_ERR[0].RIO_SP_ERR_DET = 0x00000000;
    //hSrio->RIO_SP[0].RIO_SP_ERR_STAT = 0xFFFFFFFF;

    //Sending this control symbol will cause the Link Response Status below
    hSrio->RIO_PLM[0].RIO_PLM_SP_LONG_CS_TX1 = 0x2003F044;

    //ackIdStatus and linkStatus are described in the SPn_LM_RESP register in the SRIO user guide
    CSL_SRIO_GetLinkResponseStatusInfo (hSrio, 0, &ackIdStatus, &linkStatus);

    // ackIdStatus will contain the next inbound ACK ID that the remote device is expecting
    // We will place this value as the next outbound ACK ID from this local device
    newAckIdStat = ((ackIdStatus << 8) | ackIdStatus);
    hSrio->RIO_SP[0].RIO_SP_ACKID_STAT = newAckIdStat;

    localInboundAckId = ((hSrio->RIO_SP[0].RIO_SP_ACKID_STAT >> 24) & 0x1F);
    //Increment the ACKID below and place it in localOutboundAckId because once we send the maintenance packet this value will increment automatically
    localOutboundAckId= ((hSrio->RIO_SP[0].RIO_SP_ACKID_STAT + 1) & 0x1F);

    #ifndef _BIG_ENDIAN
    //Maintenance packets are sent in big endian so the value must be byte-swapped here
    remoteAckIdStat = localOutboundAckId |
    (localInboundAckId << 16) |
    localInboundAckId << 24;
    #else
    remoteAckIdStat = (localOutboundAckId << 24) |
    (localInboundAckId << 8) |
    localInboundAckId;
    #endif

    memset(&lsuTransferData, 0, sizeof(SRIO_LSU_TRANSFER));
    // Maintenance write to re-sync the mis-aligned ACK IDs
    lsuTransferData.rapidIOLSB = (uint32_t)0x148; //SRIO offset to SP(0)_ACKID_STAT
    // Local source address to transfer, if the address is located in an L2 memory then you must use a global address like this: l2_global_address((uint32_t)&remoteAckIdStat);
    lsuTransferData.dspAddress = (uint32_t)&remoteAckIdStat;
    lsuTransferData.bytecount = 4;
    lsuTransferData.idSize = 1;
    lsuTransferData.dstID = 0x1111;
    lsuTransferData.ttype = 1;
    lsuTransferData.ftype = 8;

    while (CSL_SRIO_IsLSUBusy(hSrio, 0) == TRUE){}
    //Send maintenance write packet
    CSL_SRIO_SetLSUTransfer(hSrio, 0, &lsuTransferData);
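The value written to the remote side in the snippet above can be cross-checked on a host machine. The sketch below is a hedged illustration with hypothetical helper names. It assumes the field layout implied by the snippet (inbound ackID in bits 28:24, outstanding ackID in bits 12:8, outbound ackID in bits 4:0, each 5 bits wide) and shows that the little-endian branch is simply the byte-swapped form of the big-endian value.

```c
#include <stdint.h>

/* Packs the three 5-bit ackID fields. The layout is assumed from the
 * snippet above: inbound [28:24], outstanding [12:8], outbound [4:0]. */
static uint32_t pack_ackid_stat(uint8_t inbound, uint8_t outstanding,
                                uint8_t outbound)
{
    return ((uint32_t)(inbound     & 0x1Fu) << 24) |
           ((uint32_t)(outstanding & 0x1Fu) << 8)  |
            (uint32_t)(outbound    & 0x1Fu);
}

/* Reverses the byte order of a 32-bit word. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* 5-bit ackIDs wrap from 31 back to 0. */
static uint8_t next_ackid(uint8_t ackid)
{
    return (uint8_t)((ackid + 1u) & 0x1Fu);
}

/* Value for the remote ackID status register: the remote inbound ackID
 * mirrors the local outbound ackID, and the remote outstanding and
 * outbound ackIDs mirror the local inbound ackID. */
static uint32_t remote_ackid_stat(uint8_t local_inbound, uint8_t local_outbound)
{
    return pack_ackid_stat(local_outbound, local_inbound, local_inbound);
}
```

For example, with a local inbound ackID of 3 and a local outbound ackID of 7, remote_ackid_stat() gives 0x07000303, and bswap32() of that is 0x03030007, the same value the little-endian branch computes directly.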