C66x Throughput performance guide [pdf file] + SRIO in focus [section 14 - SRIO Throughput]

Hi All,

In the C66x Throughput guide, SRIO is covered in the following ways:

A. Type 11 Messaging.

B. DirectIO.

My questions on these:

1. Taking the case of a 4096-byte data transfer [Table 25 - DirectIO Write Throughput with 4 Lanes of 5 Gbps PHY1], the value is given as 40067 [Total time taken]. Can someone please tell me what unit this is in [microseconds or nanoseconds]? Also, are these cycles of the SRIO peripheral, the PKTDMA peripheral, or the DSP?

2. Does Table 25 also take into account the cycles consumed by the QMSS/CPPI peripheral? As I understand the SRIO hardware peripheral, data is fetched into L2SRAM using PKTDMA.

3. Taking the same case of 4096 bytes [Table 27] [Type 11 Message Passing Throughput with 4 Lanes of 5 Gbps PHY1], the unit is given as Average Cycles [88997 cycles]. Whose cycles are these: SRIO, QMSS, or DSP? [The clock speeds of the DSP and SRIO are different, so how are the consumed cycles calculated?]

4. Does Table 27 also take into account the cycles consumed by the QMSS/CPPI peripheral? As I understand the SRIO hardware peripheral, data is fetched into L2SRAM using PKTDMA.

5. Looking at the performance figures, DirectIO takes 40067 (time units) and Type 11 messaging takes 88997 cycles. Assuming one cycle = one time unit, what causes Type 11 messaging to consume MORE THAN DOUBLE the cycles of DirectIO?

6. Adding to point 5, the underlying transport is the same [explained below].

6.A: 4096 bytes = 16 packets * 256 bytes [each shared buffer carries a maximum of 256 bytes and SRIO has 16 such buffers]. The underlying transport remains the same [i.e. 16 buffers of 256 bytes each]. In other words, only one LSU programming is required to transfer 4096 bytes, and likewise only one QMSS/CPPI programming is required to transfer 4096 bytes of data [I assume the header is included as part of the data; please correct me here]. Then why are the cycles more than doubled? [Please shed more light on the inner details of how the cycles are computed.] Is my understanding that the cycles are doubled correct?

7. Figure 2.1 in the SRIO user guide [.pdf file] shows no connection between QMSS/CPPI and the MAU and LSU, so how does QMSS/CPPI come into DirectIO? Earlier devices [like the 6488] didn't have any QMSS/CPPI concept; the MAU and LSU were connected to the bus. In this example it looks like the MAU and LSU are also connected to QMSS/CPPI through the bus. I think the diagram needs to be updated.

Thanks

RC Reddy

  • Hi All,

    In the C66x Throughput guide, SRIO is covered in the following ways:

    A. Type 11 Messaging.

    B. DirectIO.

    My questions on these:

    1. Taking the case of a 4096-byte data transfer [Table 25 - DirectIO Write Throughput with 4 Lanes of 5 Gbps PHY1], the value is given as 40067 [Total time taken]. Can someone please tell me what unit this is in [microseconds or nanoseconds]? Also, are these cycles of the SRIO peripheral, the PKTDMA peripheral, or the DSP?

    It's the time in cycles, which at 1 GHz is also the time in ns, since it's 1 ns per cycle.
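The conversion in the answer above can be sketched as follows. This is only an illustration: the function names are hypothetical, the 1 GHz clock is the one stated in the answer, and the throughput line assumes the 40067-cycle figure covers exactly one 4096-byte transfer, which the thread does not confirm.

```python
def cycles_to_ns(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count to nanoseconds for a given clock frequency."""
    return cycles * 1e9 / clock_hz


def throughput_mbps(payload_bytes: int, cycles: int, clock_hz: float) -> float:
    """Effective throughput in Mbit/s for a payload moved in `cycles` cycles."""
    seconds = cycles / clock_hz
    return payload_bytes * 8 / seconds / 1e6


if __name__ == "__main__":
    # At 1 GHz one cycle lasts 1 ns, so the cycle count is also the time in ns.
    print(cycles_to_ns(40067, 1e9))
    # Assumption: 40067 cycles is the total for a single 4096-byte transfer.
    print(throughput_mbps(4096, 40067, 1e9))
```

At any other CorePac clock the cycle count no longer equals the time in ns, so the clock frequency has to be passed explicitly.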

    2. Does Table 25 also take into account the cycles consumed by the QMSS/CPPI peripheral? As I understand the SRIO hardware peripheral, data is fetched into L2SRAM using PKTDMA.

    Yes, this is included. Keep in mind there are 8 LSUs and 16 transactions per LSU, so the figure includes the time to fetch.

    3. Taking the same case of 4096 bytes [Table 27] [Type 11 Message Passing Throughput with 4 Lanes of 5 Gbps PHY1], the unit is given as Average Cycles [88997 cycles]. Whose cycles are these: SRIO, QMSS, or DSP? [The clock speeds of the DSP and SRIO are different, so how are the consumed cycles calculated?]

    CorePac (DSP) cycles. The cycles consumed are measured from start to finish of all transfers using the TSCL/TSCH timer, which ticks once per CorePac cycle.
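The start-to-finish measurement described above can be modelled as plain arithmetic. This is a sketch, not the benchmark's actual code: on the device the two 32-bit halves would be read from the TSCL and TSCH registers in C, whereas here they are ordinary integers and the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def combine_tsc(tsch: int, tscl: int) -> int:
    """Build the 64-bit time-stamp value from its high and low 32-bit halves."""
    return ((tsch & MASK32) << 32) | (tscl & MASK32)


def elapsed_cycles(start: int, end: int) -> int:
    """CorePac cycles between two 64-bit time stamps, tolerating wraparound."""
    return (end - start) & MASK64


if __name__ == "__main__":
    # Hypothetical readings taken before and after a batch of transfers.
    start = combine_tsc(0x0, 0xFFFFFF00)
    end = combine_tsc(0x1, 0x00000100)
    # The low half overflowed in between; the 64-bit difference is still right.
    print(elapsed_cycles(start, end))
```

Because the counter ticks once per CorePac cycle, the result is in DSP cycles regardless of the clock the SRIO peripheral itself runs at.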

    4. Does Table 27 also take into account the cycles consumed by the QMSS/CPPI peripheral? As I understand the SRIO hardware peripheral, data is fetched into L2SRAM using PKTDMA.

    Yes, per response in #2 above.

    5. Looking at the performance figures, DirectIO takes 40067 (time units) and Type 11 messaging takes 88997 cycles. Assuming one cycle = one time unit, what causes Type 11 messaging to consume MORE THAN DOUBLE the cycles of DirectIO?

    Overhead associated with the different SRIO protocols. Type 11 is generally used for message passing and control-type activities, while DirectIO is generally used for data transfer.
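To put a number on "more than double", the two figures quoted in the question can be compared directly. This is a small sketch using only those two values; it assumes both are CorePac cycle counts for the same 4096-byte case, as the answers above indicate, and the variable names are illustrative.

```python
DIRECTIO_CYCLES = 40067  # Table 25 figure quoted in the question
TYPE11_CYCLES = 88997    # Table 27 figure quoted in the question


def overhead_ratio(slow: int, fast: int) -> float:
    """How many times more cycles the slower path takes than the faster one."""
    return slow / fast


if __name__ == "__main__":
    ratio = overhead_ratio(TYPE11_CYCLES, DIRECTIO_CYCLES)
    extra = TYPE11_CYCLES - DIRECTIO_CYCLES
    print(f"Type 11 / DirectIO = {ratio:.2f}x, {extra} extra cycles")
```

The ratio only quantifies the gap; it says nothing about which part of the Type 11 path the extra cycles are spent in.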

    6. Adding to point 5, the underlying transport is the same [explained below].

    6.A: 4096 bytes = 16 packets * 256 bytes [each shared buffer carries a maximum of 256 bytes and SRIO has 16 such buffers]. The underlying transport remains the same [i.e. 16 buffers of 256 bytes each]. In other words, only one LSU programming is required to transfer 4096 bytes, and likewise only one QMSS/CPPI programming is required to transfer 4096 bytes of data [I assume the header is included as part of the data; please correct me here]. Then why are the cycles more than doubled? [Please shed more light on the inner details of how the cycles are computed.] Is my understanding that the cycles are doubled correct?
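The segmentation arithmetic in point 6.A can be checked with a short sketch. It takes the 256-byte maximum payload from the question as given, ignores packet headers entirely, and uses a hypothetical helper name; it does not model how the LSU or the Type 11 path actually splits a transfer.

```python
def packet_sizes(total_bytes: int, max_payload: int = 256) -> list[int]:
    """Split a transfer into payload sizes of at most `max_payload` bytes."""
    if total_bytes < 0 or max_payload <= 0:
        raise ValueError("total_bytes must be >= 0 and max_payload > 0")
    full, remainder = divmod(total_bytes, max_payload)
    sizes = [max_payload] * full
    if remainder:
        sizes.append(remainder)  # final, partially filled packet
    return sizes


if __name__ == "__main__":
    sizes = packet_sizes(4096)
    # 4096 bytes fit in 16 full 256-byte payloads with nothing left over.
    print(len(sizes), sum(sizes))
```

If headers did count against the 256 bytes, as the question wonders, the packet count for 4096 bytes of data would be higher than 16.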

    7. Figure 2.1 in the SRIO user guide [.pdf file] shows no connection between QMSS/CPPI and the MAU and LSU, so how does QMSS/CPPI come into DirectIO? Earlier devices [like the 6488] didn't have any QMSS/CPPI concept; the MAU and LSU were connected to the bus. In this example it looks like the MAU and LSU are also connected to QMSS/CPPI through the bus. I think the diagram needs to be updated.

    Thanks

    RC Reddy

  • Hi Chad,

    Thanks for the reply. I am still awaiting replies for points 6 and 7.

    Thanks

    RC Reddy