C6472 SRIO error/status registers - bits don't indicate error condition

Hi

We are interfacing to an Altera FPGA, and sometimes after reset we hit a condition where the C6472 and Altera SRIO status bits say the link is OK but data is not transferred. For writes to the DSP, the DSP's SP_ACKID_STAT field shows normal incrementing of the inbound and outstanding ACKIDs, yet the DSP's memory is not changing. For reads issued by the FPGA, the read times out even though the ACKIDs are also incrementing.
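For watching the ACKIDs mentioned above, a minimal sketch of decoding SP_ACKID_STAT is shown here. The field positions (INBOUND in bits 28:24, OUTSTANDING in bits 12:8, OUTBOUND in bits 4:0, each a 5-bit ackID) are an assumption based on the standard RapidIO Port n Local ackID Status CSR layout and should be checked against the C6472 SRIO user guide. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 5-bit ackID fields; verify positions against the device documentation. */
#define ACKID_FIELD_MASK 0x1Fu

typedef struct {
    uint8_t inbound;      /* next ackID the port expects to receive */
    uint8_t outstanding;  /* oldest transmitted ackID not yet acknowledged */
    uint8_t outbound;     /* ackID the port will stamp on its next packet */
} AckIdStatus;

/* Split a raw SP_ACKID_STAT value into its three ackID fields. */
static AckIdStatus decode_ackid_stat(uint32_t reg)
{
    AckIdStatus s;
    s.inbound     = (uint8_t)((reg >> 24) & ACKID_FIELD_MASK);
    s.outstanding = (uint8_t)((reg >> 8) & ACKID_FIELD_MASK);
    s.outbound    = (uint8_t)(reg & ACKID_FIELD_MASK);
    return s;
}

/* Number of ackIDs consumed between two samples, allowing for 5-bit wraparound. */
static uint8_t ackid_delta(uint8_t before, uint8_t after)
{
    assert(before <= ACKID_FIELD_MASK && after <= ACKID_FIELD_MASK);
    return (uint8_t)((after - before) & ACKID_FIELD_MASK);
}
```

Sampling the register before and after a test transfer and comparing `ackid_delta` on the inbound field gives the number of packets the physical layer accepted, which can then be compared with what actually reached memory.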

On the DSP we are monitoring the following registers, and their values are the same as when the reads and writes work.

SP_LM_RESP

SP_ERR_STAT

LSU_REG6 (there shouldn't be anything here but watching it nonetheless)

ERR_DET

Are there any other registers we can monitor to find out why the reads and writes issued by the Altera SRIO are not correctly reading and writing?

Cheers

 

  • Sure sounds like the packet is being dropped at the logical layer for some reason.  If the configuration between working and non-working is the same, it would point to packet differences.  Are the packet DESTIDs the same for both cases?  Packets can be dropped if the DESTID of the packet doesn't match the local Base_ID, if you are not in promiscuous mode. 
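The DESTID check described above can be sketched as a simple predicate. This only models the rule as stated in the post (a packet is dropped when its DESTID differs from the local Base_ID, unless promiscuous mode is on); the function names and the 16-bit ID width are assumptions for illustration, not the actual hardware logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the logical layer would keep the packet under the rule above. */
static bool logical_layer_accepts(uint16_t packet_dest_id,
                                  uint16_t local_base_id,
                                  bool promiscuous)
{
    return promiscuous || (packet_dest_id == local_base_id);
}

/* Small usage example covering the three interesting cases. */
static void destid_filter_examples(void)
{
    assert(logical_layer_accepts(0x00AB, 0x00AB, false));   /* IDs match */
    assert(!logical_layer_accepts(0x00AB, 0x00FF, false));  /* mismatch, dropped */
    assert(logical_layer_accepts(0x00AB, 0x00FF, true));    /* promiscuous keeps it */
}
```

Note that a drop of this kind happens after the physical layer has already acknowledged the packet, which would be consistent with ACKIDs incrementing while no data arrives.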

    Regards,

    Travis

     

  • Hi Travis

    This is a point to point connection, so the device IDs are constant. I believe we have the promiscuous bit set.

    We have noticed that the Altera "packet_accepted" bit is not being set when the link is broken (i.e. for DSP reads issued from the Altera side). I'm going to write some code that will have the DSP send a dbell or small data packet when we observe this problem. That should give us some error bits on the DSP side.

    Since the DSP is ACKing the reads, the lack of "packet_accepted" on the Altera side suggests the DSP is not sending a data packet reply, or the Altera gets it but doesn't know what to do with it (which would suggest a DESTID problem). But sometimes an Altera SRIO block disable/enable fixes the broken link, and sometimes it does not.

    Do you know of any SRIO analyzers that can be used for point-to-point systems (i.e. no switch is present)?

    Cheers

    Eddie

  • Hi Travis

    I know at one time you mentioned the issue of the ACKIDs not being synchronized. Assuming this is the problem when we reload and run the DSP, the Altera documentation states:

    If the link partner is reset when its expected ackID is not zero, a fatal error occurs
    when the link partner receives the next transmitted packet because the link partner’s
    expected ackID is reset to zero, which causes a mismatch between the transmitted
    ackID and the expected ackID. The fatal error causes a soft reset of the MegaCore
    function. After the soft reset completes, transmitted and expected ackIDs are
    synchronized and normal operation resumes. Only the packets that were queued at
    the time of the fatal error are lost.
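The behaviour in the quoted passage can be illustrated with a small model. This is a sketch under stated assumptions, not the MegaCore implementation: ackIDs are taken to be 5 bits wide, a mismatch is reported as a rejected packet, and the `resync` helper stands in for whatever the soft reset does to bring both sides back into step. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ACKID_MASK 0x1Fu  /* assumption: 5-bit ackIDs */

typedef struct { uint8_t next_tx_ackid; } Transmitter;  /* ackID for the next packet sent */
typedef struct { uint8_t expected_ackid; } Receiver;    /* ackID the link partner expects */

/* Send one packet. Returns false on an ackID mismatch (the "fatal error" case). */
static bool send_packet(Transmitter *tx, Receiver *rx)
{
    assert(tx != NULL && rx != NULL);
    if (tx->next_tx_ackid != rx->expected_ackid) {
        return false;
    }
    tx->next_tx_ackid  = (uint8_t)((tx->next_tx_ackid + 1u) & ACKID_MASK);
    rx->expected_ackid = (uint8_t)((rx->expected_ackid + 1u) & ACKID_MASK);
    return true;
}

/* Resetting the link partner puts its expected ackID back to zero. */
static void reset_receiver(Receiver *rx)
{
    assert(rx != NULL);
    rx->expected_ackid = 0;
}

/* Stand-in for the soft reset: afterwards both sides agree on the ackID again. */
static void resync(Transmitter *tx, const Receiver *rx)
{
    assert(tx != NULL && rx != NULL);
    tx->next_tx_ackid = rx->expected_ackid;
}
```

In this model, three successful packets followed by `reset_receiver` leave the transmitter at ackID 3 and the receiver expecting 0, so the next `send_packet` fails until `resync` is called.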

    I've got a sequence that always works for resetting the DSP

    1. Disable FPGA SRIO port
    2. Reset DSP in SRIO boot mode
    3. Enable FPGA SRIO port

    Reads and writes from the FPGA into DSP memory always work. This process never fails.

    But if I do this

    1. Disable FPGA SRIO port
    2. Reload DSP and run (it's running the DIO lib srio init code)
    3. Enable FPGA SRIO port

    The first time I do this, reads and writes work (yeah!!!). The second time, writes work and reads fail. When the FPGA issues a read, the inbound ACKID in the DSP's PORT[0].SP_ACKID_STAT increments, yet the FPGA indicates neither packet-accepted nor packet-not-accepted. Nor are there any errors in the registers below. I've set both the DSP and FPGA to promiscuous mode. Any ideas on what I can check to find out why the DSP is not sending the read data even though it increments its ACKID?

    SP_LM_RESP

    SP_ERR_STAT

    LSU_REG6 (there shouldn't be anything here but watching it nonetheless)

    ERR_DET

    SP_ERR_DET


    Since I can consistently get it to fail on the second DSP reload, yet it works all the time with resets, it would suggest the SRIO init code in the DIO lib is missing a crucial SRIO setup step.

    Cheers

     

  • Eddie3909 said:
    Do you know of any SRIO analyzers that can be used for point-to-point systems (i.e. no switch is present).

    There are some midbus connectors and analyzers out there.  I don't have any experience with them, but here is one example:

    http://www.nexustechnology.com/products/bus/srio/ If I remember correctly, they are Agilent-based, and there is a similar one for Tektronix.

     

    Regards,

    Travis

  • Eddie,

    I definitely think it is something with ACKIDs. The first procedure is definitely resetting the ACKIDs to 0 on the DSP. The second procedure, I'm not sure about. Maybe the read response is dropped by the FPGA as a queued packet during the fatal error? Instead of the read, send a doorbell and see if the interrupt bit in ICSR is being set on the DSP. If it is, but the doorbell response is not received by the FPGA, then you know the problem is on the DSP-->FPGA side. If the interrupt is not set, you know it is on the FPGA-->DSP side.
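The doorbell test above boils down to a two-input decision. A minimal sketch follows, with hypothetical names; how the two inputs are actually obtained (reading ICSR on the DSP, checking the response status on the FPGA) is left out.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    FAULT_NONE,         /* doorbell arrived and its response came back */
    FAULT_DSP_TO_FPGA,  /* doorbell arrived, response never reached the FPGA */
    FAULT_FPGA_TO_DSP   /* doorbell never raised the interrupt bit on the DSP */
} FaultPath;

/* Classify the failing direction from the two observations of the doorbell test. */
static FaultPath diagnose_doorbell(bool icsr_bit_set, bool fpga_got_response)
{
    if (!icsr_bit_set) {
        return FAULT_FPGA_TO_DSP;
    }
    if (!fpga_got_response) {
        return FAULT_DSP_TO_FPGA;
    }
    return FAULT_NONE;
}

/* Usage example: one call per possible outcome. */
static void diagnose_doorbell_examples(void)
{
    assert(diagnose_doorbell(true, true) == FAULT_NONE);
    assert(diagnose_doorbell(true, false) == FAULT_DSP_TO_FPGA);
    assert(diagnose_doorbell(false, false) == FAULT_FPGA_TO_DSP);
}
```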

    Is it not possible to stick with the procedure that works all the time for you?

     

    Regards,

    Travis

  • tscheck said:
    Instead of the read, send a doorbell and see if the interrupt bit ICSR is being set on the DSP.

    The bit is not being set, yet SP_ACKID_STAT indicates the INBOUND count has incremented.

    When the system is running OK, the dbell bit in ICSR is being set and the INBOUND ACK count increments.

    tscheck said:
    If it is, but the doorbell response is not received by the FPGA,

    I verified that when the system is working OK, the FPGA rx packet-accepted and tx packet-transmitted bits are set when it sends a dbell to the DSP. When the system is not working, rx packet-accepted is not being set; however, tx packet-transmitted is being set.

    When I try to send a dbell from the DSP to the FPGA I get:

       LSU0_REG6 (0x2d00418) = 0xe
        b4-b1 = 7
            : completed, packet not sent due to unavailable outbound credit at given priority.
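The arithmetic behind that reading is a shift and mask: 0xe is binary 01110, so bits 4..1 hold 0111 = 7. A small sketch of the extraction is below. The completion-code position (b4-b1) and the meaning of code 7 come from the post above; treating bit 0 as a busy flag and code 0 as "no error" are assumptions to check against the SRIO user guide, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Completion code lives in bits 4..1 of LSUn_REG6 (per the reading above). */
static unsigned lsu_completion_code(uint32_t lsu_reg6)
{
    return (unsigned)((lsu_reg6 >> 1) & 0xFu);
}

/* Assumption: bit 0 is the busy flag. */
static bool lsu_busy(uint32_t lsu_reg6)
{
    return (lsu_reg6 & 0x1u) != 0u;
}

/* Only the codes discussed in this thread are spelled out here. */
static const char *lsu_completion_text(unsigned code)
{
    switch (code) {
    case 0:  return "completed, no error (assumed)";
    case 7:  return "completed, packet not sent due to unavailable outbound credit at given priority";
    default: return "other code, see the SRIO user guide";
    }
}

/* Usage example with the value reported in the thread. */
static void lsu_reg6_example(void)
{
    assert(lsu_completion_code(0xEu) == 7u);
    assert(!lsu_busy(0xEu));
}
```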

    I added the code below before the rio_init()

        {
            CSL_Status              status;
            CSL_InstNum             srioNum = 0;  /* Instance number of the SRIO */
            CSL_SrioParam           srioParam;

            status = CSL_srioInit (&srioContext);
            srioObj.hCslObj = CSL_srioOpen (&srioCslObj, srioNum, &srioParam, &status);

            /* Overwrite SP_CTL with the PORT_DISABLE bit set.
               The whole register is written; the previous value is not preserved. */
            hSrioDirectIO->hCslObj->regs->PORT[0].SP_CTL = 0x00800001;

            /* Overwrite GBL_EN with 0 to clear b0 (global enable). */
            hSrioDirectIO->hCslObj->regs->GBL_EN = 0x00000000;
        }

    and the SRIO port is 100% OK when following this sequence:

    1. Disable FPGA SRIO using Port 0 CSR reg PORT_DIS bit
    2. Reload and run DSP (DSP configures SRIO but does not issue any dbells, writes or reads)
    3. Enable FPGA SRIO using PORT_DIS bit

    It's been working really great. No need to cold boot the DSP and FPGA.

    Somehow, disabling the DSP's SRIO is allowing the ACKs to get properly sync'd.....I think.

    But what's interesting is that when I send a dbell to the DSP, its ACK counter increments but the dbell bits are not set. Nor is an error condition recorded. The error condition only occurs when the DSP tries to send a dbell.

    One other oddity is the "unavailable outbound credit". Based on a comment you made in another post

    The outbound credit issue depends on a number of factors.  For example, maybe the endpoint or switch you are connected to can not keep up with the packet rate, buffers back up and eventually fill the TX buffers on the DSP. 

    http://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/112/p/71298/346980.aspx#346980

    the error can't be due to me flooding the DSP's SRIO buffers because I'm only sending a single dbell.

    Cheers

     

  • I also figured out how to get the link to work without doing anything to the FPGA. I added the code above to disable the port using PORT_DISABLE and GBL_EN, then sent four reset-device link-request control symbols (see RapidIO Part 6: LP-Serial Physical Layer Specification). I'm guessing this resets the FPGA's ACKIDs.

    void sendResetCommandToFpga(Uint32 cnt)
    {
        Uint32 link_status = 0;
        Uint32 i = 0;

        printf ("ACTION : Sending SRIO reset to FPGA.\n");

        // Send reset-device link-requests. (Standard says must send 4 in succession to reset port.)
        for (i = 0; i < cnt; i++)
        {
            CSL_FINS(srioObj.hCslObj->regs->PORT[0].SP_LM_REQ, SRIO_SP_LM_REQ_COMMAND, 0x3);
        }

        HwMgr_timerWait(1000);  // input is in msec.

        // Wait for PORT_OK (0x2) in SP_ERR_STAT. The register is re-read on every pass;
        // without the re-read, a link that is down on the first sample would loop forever.
        link_status = srioObj.hCslObj->regs->PORT[0].SP_ERR_STAT;
        while ((link_status & 0x00000002) != 0x00000002)
        {
            printf ("INFO : Link IS DOWN!\n");
            decodeSrioError();
            clearSrioError();
            HwMgr_timerWait(1000);
            link_status = srioObj.hCslObj->regs->PORT[0].SP_ERR_STAT;
        }

        printf ("INFO : Link IS UP! Init DONE.\n");
        decodeSrioError();
        clearSrioError();
    }
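If waiting indefinitely for the link is not acceptable, the PORT_OK check in a function like the one above could be given a retry limit. The following is only a sketch: the register read and the delay are passed in as callbacks so the loop can be exercised off-target, and `FakeRegister`, `fake_read` and `wait_for_port_ok` are hypothetical names, not part of the CSL or DIO library. The 0x2 mask is the same PORT_OK test used in the function above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PORT_OK_MASK 0x00000002u

typedef uint32_t (*ReadRegFn)(void *ctx);  /* returns the current SP_ERR_STAT value */
typedef void (*DelayMsFn)(uint32_t ms);    /* blocks for the given number of msec */

/* Poll until PORT_OK is set or max_polls samples have been taken. */
static bool wait_for_port_ok(ReadRegFn read_err_stat, void *ctx,
                             DelayMsFn delay_ms, uint32_t poll_ms,
                             uint32_t max_polls)
{
    uint32_t i;
    assert(read_err_stat != NULL);
    for (i = 0; i < max_polls; i++) {
        if ((read_err_stat(ctx) & PORT_OK_MASK) == PORT_OK_MASK) {
            return true;
        }
        if (delay_ms != NULL) {
            delay_ms(poll_ms);
        }
    }
    return false;
}

/* Off-target stand-in for the register: replays a list of sampled values
   and keeps returning the last one once the list is exhausted. */
typedef struct {
    const uint32_t *samples;
    size_t count;
    size_t index;
} FakeRegister;

static uint32_t fake_read(void *ctx)
{
    FakeRegister *f = (FakeRegister *)ctx;
    uint32_t value = f->samples[f->index];
    if (f->index + 1 < f->count) {
        f->index++;
    }
    return value;
}
```

On the target, `read_err_stat` would wrap the SP_ERR_STAT read and `delay_ms` would wrap something like HwMgr_timerWait; the caller can then decide what to do when the function returns false instead of hanging.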

    We also just discovered that the Altera core has an “Automatically synchronize transmitted ackID” option and a “Send link-request reset-device on fatal errors” option. We only had the former configured, not the latter.

    We're about to try this on the FPGA and I hope we can ditch my function.

  • Eddie3909 said:
    the error can't be due to me flooding the DSP's SRIO buffers because I'm only sending a single dbell.

    The only thing I can think of here is that the port-writes are not disabled from the DSP and are filling the output buffers.  Or, the FPGA is sending some packet type that requires a response and those responses are filling the TX buffers.

     

    Regards,

    Travis

     

  • BTW Eddie, sounds like you have it working, so let me know if there is a question remaining or if you run into something else.

     

    Regards,

    Travis