
SRIO descriptor placement

Hi,

Is there any limitation on placing SRIO descriptors in external linking RAM? Is there an example application that demonstrates SRIO descriptors with external linking RAM?

Here are our observations:

Working Setup: 

Memory Region 0 with 16K descriptors for both SRIO Rx and Tx using internal Link RAM. 

Non-Working Setup: 

Memory Region 0 with 16K descriptors for SRIO Tx using internal Link RAM, and

Memory Region 1 with 16K descriptors for SRIO Rx using external Link RAM. 

When we send packets over SRIO, our Transmission Error garbage queue gets populated. Not a single descriptor is sent successfully.

We are using pdk_C6678_1_1_2_5, bios_6_33_06_50, CGT 7.4.7 and edma3_lld_02_11_07_04. 

We use Type 11 message passing with 4096-byte Tx/Rx buffers. The lane is configured for 1x mode at 3.125 Gbaud.

We also make sure that the memory region addresses are in ascending order when inserting the regions into QMSS, and there are no errors while setting up QMSS.
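The ordering check described above can be sketched in C. This is a minimal illustration, not the QMSS LLD's own validation: the `region_cfg` struct and `regions_valid()` function are hypothetical, and the sketch additionally assumes that each region's linking RAM index range must follow the previous one without overlapping it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one descriptor memory region
 * (illustration only, not the QMSS LLD structure). */
typedef struct {
    uint32_t base;        /* start address of the descriptor memory     */
    uint32_t desc_size;   /* bytes per descriptor                        */
    uint32_t num_desc;    /* number of descriptors in the region         */
    uint32_t start_index; /* first linking RAM index used by the region  */
} region_cfg;

/* Returns true when the regions, in the order given, have ascending and
 * non-overlapping memory, ascending and non-overlapping linking RAM index
 * ranges, and every index fits within total_link_entries. */
static bool regions_valid(const region_cfg *r, size_t n, uint32_t total_link_entries)
{
    assert(n == 0 || r != NULL);
    for (size_t i = 0; i < n; i++) {
        uint64_t end_index = (uint64_t)r[i].start_index + r[i].num_desc;
        if (end_index > total_link_entries)
            return false; /* region needs more linking RAM entries than exist */
        if (i == 0)
            continue;
        const region_cfg *prev = &r[i - 1];
        uint64_t prev_end_addr = (uint64_t)prev->base + (uint64_t)prev->desc_size * prev->num_desc;
        uint64_t prev_end_index = (uint64_t)prev->start_index + prev->num_desc;
        if (r[i].base < prev_end_addr)
            return false; /* not ascending, or overlaps the previous region */
        if (r[i].start_index < prev_end_index)
            return false; /* linking RAM index ranges overlap */
    }
    return true;
}
```

For the non-working setup, the array would describe Region 0 (16K descriptors), Region 1 (16K descriptors) and Region 2 (64K CPSW descriptors) with their real base addresses and start indices, checked against the total number of linking RAM entries configured.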

Regards

Judah

  • Hi,

    The TI MCSDK SRIO example uses the internal linking RAM.

    In general, the queue managers need to be programmed with identical descriptor memory regions, and likewise, the linking RAM registers in both queue managers need to be programmed identically.

    Please note that Linking RAM 0 needs to be configured to use the internal QMSS memory, whereas Linking RAM 1 would use L2 or DDR. For efficiency reasons, it is best to use the internal QMSS linking RAM.

    The Linking RAM Region 0 Base Address Register sets the base address for the first portion of the linking RAM. The queue manager uses this address to calculate the 32-bit linking address for a given descriptor index. For more information, see Table 4-1 (queue configuration region registers) in the user guide referenced above.
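    As a rough mental model of how a descriptor index maps onto the two linking RAM regions, here is a hypothetical C sketch. It assumes that indices below the region 0 entry count live in the internal linking RAM, that the remaining indices live in region 1 counted from its own base address, and that every entry has a fixed size passed in as a parameter. These are assumptions to check against Table 4-1; the struct and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two linking RAM regions (names are illustrative). */
typedef struct {
    uint32_t region0_base;    /* base address of linking RAM region 0 (internal) */
    uint32_t region0_entries; /* number of entries held by region 0              */
    uint32_t region1_base;    /* base address of linking RAM region 1 (external) */
    uint32_t entry_bytes;     /* assumed size of one linking RAM entry           */
} link_ram_cfg;

/* Address of the linking RAM entry for a descriptor index, assuming region 1
 * holds the indices that do not fit in region 0, counted from its own base. */
static uint32_t link_entry_addr(const link_ram_cfg *c, uint32_t index)
{
    assert(c != NULL && c->entry_bytes > 0);
    if (index < c->region0_entries)
        return c->region0_base + index * c->entry_bytes;
    return c->region1_base + (index - c->region0_entries) * c->entry_bytes;
}
```

    With region 0 holding 16384 entries and an assumed 8-byte entry, index 16383 would map to region0_base + 16383 * 8, and index 16384 (for example, the first descriptor of a region whose start index is 16384) would map to region1_base, i.e. the external linking RAM.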

    I recommend reviewing the attached QMSS PPT slides (QMSS training) for a better understanding of QMSS descriptors and descriptor memory regions.

    I will check with the Multicore Navigator team and get back to you. In the meantime, take a look at the links below:

     Thanks,

  • Hi Ganapathy,
    We followed all your suggestions.
    However, we are still seeing the non-working setup described above.

    We will wait for input from the Multicore Navigator team.

    Best Regards
    Judah
  • Hi Ganapathy,
    Thanks for your suggestions yesterday. We cross-checked on our side and confirmed that we have implemented all of them.

    Is there any input from the Multicore Navigator team?

    Regards
    Kishor
  • Hi Ganapathy,
    A few more points of information:

    1. We are using host descriptors.
    2. In both of the setups mentioned above, we also have 64K descriptors in another region (Region 2) for CPSW.

    Thank you.

    Best Regards
    Kishor
  • Do you see the issue when all caches are disabled? Which TX garbage queue is being populated with the failed TX descriptors? When you read the failed TX descriptors, do they look the way they did when they were programmed?

    If the SRIO peripheral is attempting to send the packets, as evidenced by the garbage queues being filled, then it is reading a descriptor from somewhere, but that descriptor seems to be incorrect in some way.
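    One way to act on the last question is to keep a copy of what was programmed and compare it, field by field, with the descriptor popped from the garbage queue. The sketch below uses a simplified, hypothetical `host_desc_view` struct rather than the real CPPI host descriptor layout; if the descriptor memory is cacheable, it would need to be invalidated before it is read back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical view of a few host descriptor fields; the real
 * layout is defined by the Multicore Navigator user guide and the CPPI LLD. */
typedef struct {
    uint32_t pkt_len;   /* packet length in bytes             */
    uint32_t buff_len;  /* length of the attached data buffer */
    uint32_t buff_ptr;  /* address of the data buffer         */
    uint32_t next_desc; /* next descriptor in the chain, or 0 */
} host_desc_view;

/* One bit per field that can differ. */
enum {
    DESC_DIFF_PKT_LEN  = 1 << 0,
    DESC_DIFF_BUFF_LEN = 1 << 1,
    DESC_DIFF_BUFF_PTR = 1 << 2,
    DESC_DIFF_NEXT     = 1 << 3
};

/* Returns 0 if the descriptor read back from the garbage queue matches what
 * was programmed; otherwise a mask of DESC_DIFF_* bits naming changed fields. */
static unsigned desc_diff(const host_desc_view *programmed, const host_desc_view *readback)
{
    assert(programmed != NULL && readback != NULL);
    unsigned mask = 0;
    if (programmed->pkt_len != readback->pkt_len)     mask |= DESC_DIFF_PKT_LEN;
    if (programmed->buff_len != readback->buff_len)   mask |= DESC_DIFF_BUFF_LEN;
    if (programmed->buff_ptr != readback->buff_ptr)   mask |= DESC_DIFF_BUFF_PTR;
    if (programmed->next_desc != readback->next_desc) mask |= DESC_DIFF_NEXT;
    return mask;
}
```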


    Regards,

    Travis

  • Hi Travis,
    Here are the answers to the above questions:

    1. The issue is seen even after disabling the cache.
    2. The Transmission Error garbage queue is being populated.
    3. The failed descriptors are exactly what we programmed.
    4. We re-checked and confirmed that the descriptors are in fact read from the correct queues.

    Do you have any other input?

    Do you see a problem with SRIO Rx and Tx descriptors being allocated from different pools (allocated by CPPI) and using different linking RAMs?
    Please note that the only change we made was to move the SRIO Rx descriptors to external linking RAM and to draw them from a different CPPI pool, and this change makes the system stop working. Without it, all 8 cores receive and transmit on SRIO successfully.

    Regards
    Judah
    P.S. Apologies for the late reply.
  • Hi Judah,

    There shouldn't be any issue with using an external linking RAM. Are the descriptors associated with the external linking RAM in DDR memory? Are you successful when using the internal linking RAM with descriptors/buffers located in DDR (or whatever memory you are using)?

    The Trans_ERR garbage queue is populated for only two reasons. First, if the link partner sends an ERROR response, our device stops trying to send the message and puts the descriptor in the Trans_ERR garbage queue. This is the more obvious reason, and it should be easy to verify whether that is happening on your system.

    The second reason is a RETRY from the link partner: when our device tries to resend the retried packet, it gets an internal DMA error while accessing the data buffer. A couple of things to think about and check here. When the peripheral resends a given packet/segment of the message, it uses the buffer pointer in the descriptor plus an address offset that depends on the particular segment it must resend. Because of this, you must use host descriptors only, which you mention that you do above, so that is good.

    When the peripheral fetches that particular segment's worth of the data buffer, it uses the CAU (the peripheral's master DMA port) to access VBUS, not the CPDMA port. If the data buffer memory you are trying to reach is not accessible for some reason, this would cause a DMA error. The PRIVID of the DMA port is probably different from that of the CPDMA port, so if the memory is access-protected based on PRIVID, this could be the issue. Additionally, if the memory is DDR with ECC enabled, an ECC error could cause a DMA error. You might also look at alignment and how the memory is used.
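    The retry path described above can be modeled in a few lines: each segment is read from the descriptor's buffer pointer plus a per-segment offset, so every segment address has to be reachable by the DMA master. This C sketch assumes 256-byte message segments (the RapidIO maximum, so a 4096-byte Type 11 message has 16 segments) and a hypothetical list of readable address windows; it does not model PRIVID-based protection or ECC, and the names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEG_BYTES 256u /* maximum payload of one RapidIO message segment */

/* Hypothetical address window the peripheral's DMA master is assumed to be
 * allowed to read. */
typedef struct {
    uint32_t start;
    uint32_t size;
} addr_window;

/* Address read for segment 'seg': buffer pointer plus a segment-dependent offset. */
static uint64_t segment_addr(uint32_t buff_ptr, uint32_t seg)
{
    return (uint64_t)buff_ptr + (uint64_t)seg * SEG_BYTES;
}

/* True if every byte of every segment of a msg_len-byte message lies inside
 * one of the readable windows. */
static bool message_reachable(uint32_t buff_ptr, uint32_t msg_len,
                              const addr_window *w, size_t n)
{
    assert(n == 0 || w != NULL);
    uint32_t segs = (msg_len + SEG_BYTES - 1u) / SEG_BYTES;
    for (uint32_t s = 0; s < segs; s++) {
        uint64_t lo = segment_addr(buff_ptr, s);
        uint32_t len = (s + 1u == segs) ? msg_len - s * SEG_BYTES : SEG_BYTES;
        uint64_t hi = lo + len; /* one past the last byte of the segment */
        bool inside = false;
        for (size_t i = 0; i < n && !inside; i++)
            inside = lo >= w[i].start && hi <= (uint64_t)w[i].start + w[i].size;
        if (!inside)
            return false;
    }
    return true;
}
```

    A buffer that starts close to the end of a window can pass for its first segments and fail only at a later one, which is the kind of failure that would show up on a retried segment rather than on the first attempt.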

    Hope that helps,
    Travis
  • Hi Travis,

    Apologies for the late reply.

    I am working on this issue along with Judah.

    Please find the below answers along with your earlier questions:

    [Travis]: Are the descriptors involved with the external linking RAM in DDR memory?

    Ans: Yes. The descriptors and their data buffers are in DDR3 memory, whereas the external linking RAM is placed in MSMC memory.

    [Travis]: Are you successful when using the internal linking RAM with descriptors/buffers located in DDR or whatever memory you are using?

    Ans: Yes.

    In your last reply, you mentioned that one of the reasons for descriptors landing in the Trans_ERR garbage queue is the link partner sending an ERROR response. By "link partner", do you mean the consumer core?

    On the non-working setup, we read the following SRIO register values:

    SP0_ERR_STAT (Port 0 Error and Status CSR): 0x00020002
    SP1_ERR_STAT (Port 1 Error and Status CSR): 0x00020202
    SP2_ERR_STAT (Port 2 Error and Status CSR): 0x00020202
    SP0_ERR_DET (Port 0 Error Detect CSR): 0x00000012
    SP1_ERR_DET (Port 1 Error Detect CSR): 0x00100015
    SP2_ERR_DET (Port 2 Error Detect CSR): 0x00100015
    Garbage_Coll_QID 0: 0x94039503
    Garbage_Coll_QID 1: 0x96039703
    Garbage_Coll_QID 2: 0x98039900
    SP0_ERR_RATE (Port 0 Error Rate CSR): 0x80000002

    Do you see anything suspicious in these SRIO register values?

    On the non-working setup, we have observed that the very first Tx descriptor, associated with the first Tx packet, already ends up in the Trans_ERR garbage queue.

    Could you please provide further input?

    Thanks & Best Regards,

    Rajanikanth.

  • Hi Travis,

    Could you please provide your input?

    Thanks & Best Regards,

    Rajanikanth.

  • Hi Travis,

    We are still waiting for your input.

    It will help us resolve this problem.


    Thanks & Best Regards,

    Rajanikanth.
  • <In your last reply, you mentioned that one of the reasons for Trans_ERR in Garbage queue as link partner could send an ERROR response. Here you meant for ‘link partner’ is consumer core ?>

    The device receiving the message request sent an error message response. I should have been more careful when saying "link partner", because that is not technically correct. For example, let's say devices A and B are both connected to a switch. The link partners in this case are device A and the switch, and, separately, device B and the switch. If device A sends a message to device B, and device B responds to A with an error response, then the TX descriptor in device A will be put on the Trans_Err garbage queue.

    <SP0_ERR_STAT Port 0 Error and Status CSR: 0x00020002
    SP1_ERR_STAT Port 1 Error and Status CSR: 0x00020202
    SP2_ERR_STAT Port 2 Error and Status CSR: 0x00020202>

    Port OK on all ports, so that is good. The input and output error-encountered bits are fine, since you are not in any stopped state.
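    The reading above can be reproduced with a small decoder. The bit positions (bit 0 = LSB) are assumed to follow the RapidIO Port n Error and Status CSR and should be checked against the KeyStone SRIO user guide; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions (bit 0 = LSB) in the Port n Error and Status CSR. */
#define ERR_STAT_PORT_UNINIT      (1u << 0)
#define ERR_STAT_PORT_OK          (1u << 1)
#define ERR_STAT_PORT_ERR         (1u << 2)
#define ERR_STAT_INPUT_ERR_STOP   (1u << 8)
#define ERR_STAT_INPUT_ERR_ENCTR  (1u << 9)
#define ERR_STAT_OUTPUT_ERR_STOP  (1u << 16)
#define ERR_STAT_OUTPUT_ERR_ENCTR (1u << 17)

/* True when the port reports PORT_OK, no port error, and neither the input
 * nor the output side is in an error-stopped state. The "error encountered"
 * bits alone do not block traffic. */
static bool port_ok_and_not_stopped(uint32_t err_stat)
{
    const uint32_t bad = ERR_STAT_PORT_ERR | ERR_STAT_INPUT_ERR_STOP | ERR_STAT_OUTPUT_ERR_STOP;
    return (err_stat & ERR_STAT_PORT_OK) != 0 && (err_stat & bad) == 0;
}
```

    Under these assumed positions, 0x00020002 and 0x00020202 both pass: PORT_OK is set, and apart from it only error-encountered bits are set, with no error-stopped bits.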

    <SP0_ERR_DET Port 0 Error Detect CSR: 0x00000012
    SP1_ERR_DET Port 1 Error Detect CSR: 0x00100015
    SP2_ERR_DET Port 2 Error Detect CSR: 0x00100015>

    There are a lot of physical layer errors here. I'd suggest you make sure these are all cleared before starting any data transfers. Did you do this? Read the software error recovery document posted on this thread: e2e.ti.com/.../752157

    <Garbage_Coll_QID 0: 0x94039503
    Garbage_Coll_QID 1: 0x96039703
    Garbage_Coll_QID 2: 0x98039900>

    These can be set to whichever queues you want to use for collecting the descriptors for the various error conditions. However, from these values, it looks like reserved bits are being set in these registers, so please make sure you are looking at the correct queues.
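    To check which queues the garbage collection registers actually point to, the expected register value can be computed from the queue numbers and compared with a direct 32-bit read of the register. The field layout in this sketch is an assumption (two 16-bit halves, each holding one queue number in its low 13 bits, which covers queue numbers 0 to 8191), as are the function names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: two 16-bit halves, each holding one queue number in its
 * low 13 bits; the remaining bits of each half are treated as reserved. */
#define QID_FIELD_MASK 0x1FFFu

/* Expected register value for a pair of garbage queues. */
static uint32_t pack_qid_pair(uint16_t hi_queue, uint16_t lo_queue)
{
    assert(hi_queue <= QID_FIELD_MASK && lo_queue <= QID_FIELD_MASK);
    return ((uint32_t)hi_queue << 16) | lo_queue;
}

/* True if any bit outside the two assumed queue fields is set. */
static bool qid_reserved_bits_set(uint32_t reg)
{
    const uint32_t valid = (QID_FIELD_MASK << 16) | QID_FIELD_MASK;
    return (reg & ~valid) != 0;
}
```

    Under this assumed layout, queues 917 and 916 pack to 0x03950394, which has no reserved bits set; if a direct 32-bit read of the register returns something different, the queue assignment (or the way the value was read) is worth rechecking.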

    Travis
  • Hi Travis,

    Thanks for the reply.

    1. We are clearing the errors for all ports as follows:

     srioRegs->RIO_PLM[0].RIO_PLM_SP_LONG_CS_TX1 = 0x2003F044;
     srioRegs->RIO_PLM[1].RIO_PLM_SP_LONG_CS_TX1 = 0x2003F044;
     srioRegs->RIO_PLM[2].RIO_PLM_SP_LONG_CS_TX1 = 0x2003F044;
     srioRegs->RIO_PLM[3].RIO_PLM_SP_LONG_CS_TX1 = 0x2003F044;

    2. We are using 6 general-purpose queues (916 to 921) as garbage collection queues.

    Thanks & Regards,

    Rajanikanth. 

  • < srioRegs->RIO_PLM[0].RIO_PLM_SP_LONG_CS_TX1 = 0x2003F044;
     srioRegs->RIO_PLM[1].RIO_PLM_SP_LONG_CS_TX1 = 0x2003F044;
     srioRegs->RIO_PLM[2].RIO_PLM_SP_LONG_CS_TX1 = 0x2003F044;
     srioRegs->RIO_PLM[3].RIO_PLM_SP_LONG_CS_TX1 = 0x2003F044;>

    This will clear the input/output error-stopped states, but you should also clear the other error registers by writing 0 or 1 to them, as appropriate for each register.
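    Clearing the other error registers could look like the sketch below. It assumes that the error-encountered and retry-encountered bits of SPn_ERR_STAT are write-1-to-clear and that SPn_ERR_DET is cleared by writing 0. The mask, the register pointers and the function names are hypothetical, so confirm each register's access type in the KeyStone SRIO user guide before using these values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed write-1-to-clear bits of SPn_ERR_STAT (bit 0 = LSB): input
 * error-encountered (9), output error-encountered (17) and output
 * retry-encountered (20). */
#define ERR_STAT_W1C_MASK ((1u << 9) | (1u << 17) | (1u << 20))

/* Value to write back to SPn_ERR_STAT: 1s only for the sticky bits that are
 * currently set, 0s everywhere else. */
static uint32_t err_stat_clear_value(uint32_t err_stat_read)
{
    return err_stat_read & ERR_STAT_W1C_MASK;
}

/* Hypothetical helper: the pointers stand in for one port's memory-mapped
 * SPn_ERR_STAT and SPn_ERR_DET registers. */
static void clear_port_errors(volatile uint32_t *sp_err_stat, volatile uint32_t *sp_err_det)
{
    assert(sp_err_stat != NULL && sp_err_det != NULL);
    *sp_err_stat = err_stat_clear_value(*sp_err_stat); /* write 1 to clear */
    *sp_err_det = 0u; /* assumed: error detect bits are cleared by writing 0 */
}
```

    For SP1_ERR_STAT = 0x00020202, the value written back would be 0x00020200, leaving only PORT_OK set once the hardware clears those bits.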

    Regards,

    Travis

  • Hi Travis,

    Thanks for the reply.

    I will check and clear the other error registers as well.


    Thanks & Regards,

    Rajanikanth.