TMS320C6678: SRIO: submitting multiple outstanding posted WRITE requests from multiple cores

Part Number: TMS320C6678

Our customer wants to know how multiple cores can submit multiple outstanding posted WRITE requests. It seems that the SRIO hardware does not support multiple cores submitting multiple outstanding requests through the eight LSU register sets.

Serial RapidIO (SRIO) for KeyStone Devices User's Guide (Rev. B)
http://www.ti.com/lit/ug/sprugw1b/sprugw1b.pdf

(Page 57)

"There are eight LSU register sets. This allows eight outstanding requests for all
 transaction types that require a response (i.e. non-posted). For multicore devices,
 software manages the usage of the registers. A shared configuration bus (VBUSP
 interface) is used to access all register sets. A single core device can utilize all eight LSU
 blocks."

2.3.2.2.1 WRITE Transactions (Page 61)

"For posted WRITE operations that do not require a RapidIO response packet, a core
 may submit multiple outstanding requests. For instance, a single core may have many
 streaming write packets buffered at any given time, given out-going resources. In this
 application, the LSU can be released to the shadow registers as soon as the packet is
 written into the shared TX buffer pool. If the request has been flow controlled, the
 peripheral will set the completion code status register and appropriate interrupt bit of
 the ICSR. The control/command registers can be released after the interrupt service
 routine completes."

Our customer often sees the output port receive a packet-retry control symbol when multiple cores submit multiple outstanding posted WRITE requests, with each core using its own LSU register set.

Should the customer change its current method, in which multiple cores submit multiple outstanding posted WRITE requests?

If so, how can it be implemented?

Best regards,

Daisuke

  • The team has been notified. They will post their feedback directly here.

    BR
    Tsvetolin Shulev
  • Hi Daisuke,

    I am looking into this.

    Best Regards,
    Yordan
  • Hi Yordan-san and Tsvetolin-san,

    Thank you for your reply.

    Our customer measured the internal interrupt timing on the C6678 side and the interrupt timing on the destination device side for posted WRITE operations in which multiple cores submit multiple outstanding requests, each core using its own LSU register set.

    http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/664134/2452847#2452847

    More than 1,000 systems of the customer's product, each using multiple C6678 devices, are already in use by end customers. In addition, a new product system using multiple K2H and C6678 devices has already been tested. The customer is therefore very concerned about whether its current use of SRIO is correct.

    Best regards,

    Daisuke

  • Hi Yordan-san,

    Do you have any update on this issue?

    Our customer's current method is as follows:

    - One LSU register set is used for each core.

    - Four cores submit multiple outstanding NWRITE requests (FType=5, TType=4) to one destination device.

    The results are as follows:

    - The output port receives a packet-retry control symbol.

    - The internal interrupt on the C6678 side occurs less than 20 usec after the operation for the requests is complete.

    - The interrupt on the destination device side occurs approximately 400 usec after the operation for the requests is complete.

    - All packets reach the destination device and the transfer operation completes.

    All packets reach the destination device, but the transfer takes a very long time.

    Is it allowed for multiple cores to submit multiple outstanding posted WRITE requests using more than one LSU register set?

    Best regards,

    Daisuke

  • Hi Yordan-san,

    Daisuke Maeda said:

    Is it allowed for multiple cores to submit multiple outstanding posted WRITE requests using more than one LSU register set?

    Should multiple cores submit multiple outstanding posted WRITE requests by sharing only one LSU register set?

    I would appreciate an answer as soon as possible. Sorry to take up your time.

    Best regards,

    Daisuke

  • Daisuke,

    >Is it allowed for multiple cores to submit multiple outstanding posted WRITE requests using more than one LSU register set?
    As you posted, page 57 says "A single core device can utilize all eight LSU blocks.", so using more than one LSU register set should not be a problem.

    >Should multiple cores submit multiple outstanding posted WRITE requests by sharing only one LSU register set?
    You probably meant one LSU register set for each core.
    From page 67, "However if multiple cores are using the LSU, the LSU will be interrupting all the cores even though the completion has been for a single core only. Therefore it is important to distinguish between the interrupts per core versus LSU. Because the EDMA is dedicated to an LSU, information-per-LSU is also needed."

    Your customer's current method, using one LSU register set for each core, is as recommended.
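A minimal sketch of the per-core filtering that the page 67 note implies, assuming core n owns LSUn and that the interrupt handler can obtain a per-LSU completion bitmask. The `lsu_done_mask_t` type and both helper functions are hypothetical; they do not reflect the actual ICSR bit layout or any TI CSL/LLD API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_LSU 8u

/* Hypothetical per-LSU completion bitmask: bit n set when LSUn completed.
 * Real code would derive this from the SRIO interrupt/completion-code
 * registers; that register layout is NOT modeled here. */
typedef uint32_t lsu_done_mask_t;

/* One LSU register set per core (the customer's scheme): core n -> LSUn. */
static unsigned lsu_for_core(unsigned core_id)
{
    assert(core_id < NUM_LSU);
    return core_id;
}

/* Run on every core when the shared LSU interrupt fires. Returns true only
 * if the completion belongs to this core's LSU, so a core can ignore an
 * interrupt that was raised on behalf of another core. */
static bool completion_is_mine(unsigned core_id, lsu_done_mask_t done)
{
    return (done >> lsu_for_core(core_id)) & 1u;
}
```

Each core would run the same check at the top of its handler and return early when the completion belongs to another core's LSU.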

    How are the 16 shadow registers allocated to LSU0-3? Is it 4-4-4-4? And what is the destination device: is it also a C6678, or an FPGA?

    Regards,
    Garrett
  • Hi Garrett-san,

    Thank you for your reply.

    Garrett Ding said:

    >Is it allowed for multiple cores to submit multiple outstanding posted WRITE requests using more than one LSU register set?
    As you posted, page 57 says "A single core device can utilize all eight LSU blocks.", so using more than one LSU register set should not be a problem.

    Could you please take a look at my first post again?

    The SRIO User's Guide describes:

     - There are eight LSU register sets. This allows eight outstanding requests for all transaction types that require a response (i.e. non-posted).

     - For posted WRITE operations that do not require a RapidIO response packet, a core may submit multiple outstanding requests.
       For instance, a single core may have many streaming write packets buffered at any given time, given out-going resources.

    However, the SRIO User's Guide does not mention the following:

     - The eight LSU register sets allow eight outstanding requests for posted WRITE operations that do not require a response.

     - For posted WRITE operations that do not require a response, multiple cores may submit multiple outstanding requests.
       For instance, each of multiple cores may have many streaming write packets buffered at any given time, given out-going resources.

    Is it really allowed for multiple cores to submit multiple outstanding posted WRITE requests using more than one LSU register set?

    Garrett Ding said:

    >Should multiple cores submit multiple outstanding posted WRITE requests by sharing only one LSU register set?
    You probably meant one LSU register set for each core.
    From page 67, "However if multiple cores are using the LSU, the LSU will be interrupting all the cores even though the completion has been for a single core only. Therefore it is important to distinguish between the interrupts per core versus LSU. Because the EDMA is dedicated to an LSU, information-per-LSU is also needed."

    No, I meant one LSU register set for multiple cores.

    If our customer's current method is not allowed, it may be changed so that multiple cores share a single LSU register set for posted WRITE operations.

    Garrett Ding said:

    How are the 16 shadow registers allocated to LSU0-3? Is it 4-4-4-4? And what is the destination device: is it also a C6678, or an FPGA?

    The shadow register allocation is:

     LSU0/LSU4: 9
     LSU1/LSU5: 3
     LSU2/LSU6: 2
     LSU3/LSU7: 2
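As a quick check, the four allocations account for the entire pool of 16 shadow registers, with LSU0/LSU4 holding more than half of it:

```latex
\underbrace{9}_{\text{LSU0/4}} + \underbrace{3}_{\text{LSU1/5}} + \underbrace{2}_{\text{LSU2/6}} + \underbrace{2}_{\text{LSU3/7}} = 16,
\qquad \frac{9}{16} = 56.25\,\%
```

Assuming the four cores use LSU0-LSU3, this split is uneven: presumably the cores on LSU2 and LSU3 can queue far fewer requests than the core on LSU0.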

    The destination device is an FPGA, connected via switches.

    Best regards,

    Daisuke

  • Hi Daisuke,

    I discussed this thread with Travis. Here is some feedback:

    ----------------

    How exactly do you measure the latency (400 us)? Is it really per packet, or for the whole multi-packet transfer? When do you start and stop the timer?

    Depending on how you are measuring it, we don't think using 1 versus 4 LSUs will matter. The peripheral sends the packets out as soon as possible. With 1 LSU everything is sequential; with 4 LSUs the packets are interleaved by a round-robin scheduler. The fact that you are getting packet-retry control symbols means that the FPGA inbound handler is not able to keep up. You may be able to get rid of the retries by reducing the transaction size per LSU programming; for example, instead of sending 1 KB per LSU programming, send 256 B with 4 LSU programmings.
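A minimal sketch of the suggested split, assuming a hypothetical `lsu_request` descriptor per LSU programming (the struct and `split_nwrite` are illustrative, not part of any TI CSL or LLD API). It turns one 1 KB transfer into four 256 B requests with matching source and destination offsets; each request would then be programmed into the LSU separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one LSU programming (one NWRITE request).
 * Field names are illustrative only; a real driver would write these
 * values into the LSU registers. */
typedef struct {
    uint32_t src_addr;   /* local (DSP) source address          */
    uint32_t dst_addr;   /* RapidIO destination address         */
    uint32_t byte_count; /* bytes moved by this LSU programming */
} lsu_request;

/* Split one transfer of `total` bytes into requests of at most `chunk`
 * bytes. Fills `out` and returns the number of requests, or 0 if more
 * than `max_reqs` requests would be needed. */
static size_t split_nwrite(uint32_t src, uint32_t dst, uint32_t total,
                           uint32_t chunk, lsu_request *out, size_t max_reqs)
{
    assert(chunk > 0);
    size_t needed = (total + chunk - 1) / chunk;   /* ceiling division */
    if (needed > max_reqs)
        return 0;
    for (size_t i = 0; i < needed; i++) {
        uint32_t offset = (uint32_t)i * chunk;
        uint32_t left = total - offset;            /* bytes still to send */
        out[i].src_addr   = src + offset;
        out[i].dst_addr   = dst + offset;
        out[i].byte_count = left < chunk ? left : chunk;
    }
    return needed;
}
```

Since a RapidIO packet carries at most 256 bytes of data, a 1 KB LSU programming is already sent as four packets; the benefit of separate 256 B programmings is presumably the gaps between them, which give the FPGA inbound handler time to drain.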

    There are no such limitations for NWRITEs (since there are no responses): cores can initiate as many multi-packet transactions as the shadow registers allow, and there shouldn't be any restriction on the number of LSUs used. Using multiple LSUs from one core becomes trickier if the priorities are not different per LSU, because in-order delivery cannot be guaranteed. The core also has to be able to handle and distinguish completion codes, errors, etc. for multiple LSUs.
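To illustrate the ordering point, here is a simplified model, assuming the TX scheduler takes exactly one packet from each non-empty LSU per round and ignoring priorities, flow control, and retries. `round_robin_order` is hypothetical and does not describe documented hardware arbitration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LSU 4

/* Simplified model (assumption): pending_in[l] packets are queued on LSU l,
 * and each round the scheduler sends one packet from every non-empty LSU.
 * Writes the LSU index of each packet in transmit order to `order` and
 * returns the total number of packets sent. */
static size_t round_robin_order(const size_t *pending_in, size_t num_lsu,
                                size_t *order, size_t max_out)
{
    size_t pending[MAX_LSU];
    size_t sent = 0;

    assert(num_lsu <= MAX_LSU);
    for (size_t l = 0; l < num_lsu; l++)
        pending[l] = pending_in[l];

    for (;;) {
        int progressed = 0;
        for (size_t l = 0; l < num_lsu; l++) {
            if (pending[l] == 0)
                continue;                 /* this LSU has nothing left */
            assert(sent < max_out);
            order[sent++] = l;
            pending[l]--;
            progressed = 1;
        }
        if (!progressed)
            return sent;                  /* all queues drained */
    }
}
```

With 3 packets on LSU0 and 2 on LSU1, the model sends packets in LSU order 0, 1, 0, 1, 0: the first packet of the LSU1 transfer leaves before the second packet of the LSU0 transfer, so a later write can overtake an earlier one. The total number of packets on the link is the same as with a single LSU; only the interleaving differs.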

    -----------------

    Regards,

    Garrett