
MRd vs. MWr in PCIe

Hi,

In the “Throughput Performance Guide for C66x KeyStone Devices” [section 12, PCIe] it is mentioned that Memory Read is more performant than Memory Write using EDMA!

So, using EDMA (MSMC to MSMC), it really does seem that MRd is more performant than MWr! Why is there this difference? Normally I would think that MWr should be more performant, since writes are posted.

  • Delared,

    The Performance Guide has two columns for PCIe throughput: 64-byte payload size and 128-byte payload size.

    In the 64B case, the memory Write and Read performance are very similar, and Write is actually a little better than Read.

    In the 128B case, because there is some overhead introduced by the bridge inside the device for PCIe memory Write, the Write performance shows some degradation compared to memory Read. The Performance Guide shows the actual throughput we could achieve on the device, which may match your observation as well.

    So what you have measured should be correct: with a 128B payload size, memory Read performance is better than memory Write for PCIe.
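
    For a rough sense of why the payload size matters, here is a back-of-the-envelope efficiency estimate. The per-TLP overhead and 8b/10b encoding figures below are generic PCIe assumptions used for illustration, not values taken from the Performance Guide, and the result is an upper bound that ignores the bridge overhead discussed above:

        #include <stdio.h>

        int main(void)
        {
            /* Gen2 x2 link: 2 lanes x 5 GT/s x 8b/10b encoding = 8 Gbps raw. */
            const double raw_gbps = 2 * 5.0 * 0.8;
            /* Assumed per-TLP overhead (header + framing + CRC): ~24 bytes.  */
            const double overhead_bytes = 24.0;
            const int payloads[] = { 64, 128 };

            for (int i = 0; i < 2; i++) {
                double eff = payloads[i] / (payloads[i] + overhead_bytes);
                printf("payload %3d B: efficiency %.1f%%, upper bound ~%.2f Gbps\n",
                       payloads[i], 100.0 * eff, raw_gbps * eff);
            }
            return 0;
        }

    With these assumptions, a 128B payload gives roughly 84% link efficiency versus roughly 73% for 64B, which is why the larger payload column in the guide shows higher numbers.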

  • Steven,

    Could this bridge have an effect on the EDMA (especially EDMA0) transfers?

    I ask because I got different results using EDMA0 (default DBS=128) and EDMA1/EDMA2 (DBS=128):

    - Data size < 1KB: EDMA0 [up to 4.16 Gbps] is more performant than EDMA1/EDMA2 [up to 4.16 Gbps]

    - Data size > 1KB: EDMA1/EDMA2 [5.53 Gbps] are more performant than EDMA0 [5.17 Gbps]

    So why is there this difference?
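
    For reference, here is a minimal sketch of how throughput figures like these can be measured on a C66x core. It assumes the TI C6000 compiler (for the TSCL time-stamp counter register) and a 1.0 GHz core clock, and the memcpy is only a stand-in for the EDMA transfer being timed:

        #include <stdio.h>
        #include <stdint.h>
        #include <string.h>

        /* C66x time-stamp counter, visible as a control register with the
         * TI C6000 compiler; the first write to TSCL starts the counter. */
        extern cregister volatile unsigned int TSCL;

        #define XFER_BYTES  (2 * 1024)

        static uint8_t src_buf[XFER_BYTES];
        static uint8_t dst_buf[XFER_BYTES];

        int main(void)
        {
            const double cpu_hz = 1.0e9;   /* assumed 1.0 GHz core clock */

            TSCL = 0;                      /* start the counter */
            uint32_t t0 = TSCL;

            /* Stand-in for the transfer under test: replace this memcpy
             * with the EDMA submit and completion wait when measuring.  */
            memcpy(dst_buf, src_buf, XFER_BYTES);

            uint32_t t1 = TSCL;

            double seconds = (double)(t1 - t0) / cpu_hz;
            printf("throughput: %.2f Gbps\n",
                   XFER_BYTES * 8.0 / seconds / 1e9);
            return 0;
        }

    Gbps figures like the ones above then come out as bytes transferred × 8, divided by the measured transfer time.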

  • Delared,

    I think the EDMA difference is a separate topic.

    The PCIe slave port and EDMA1/2 are connected to TeraNet 3_A (Figure 4-2 in the C6678 data manual), while EDMA0 is connected to TeraNet 2_A (Figure 4-1 in the C6678 data manual).

    So EDMA0 access to the PCIe slave port has a longer path than EDMA1/2. That might explain the throughput difference you observed (although I am not sure about the difference you listed for data size < 1KB).

    But the bridge overhead affecting PCIe Write/Read performance is not specific to any particular EDMA; that is a separate matter.

  • Steven,

    Sorry, I should have said: for data size < 1KB, EDMA0 [up to 4.28 Gbps] is more performant than EDMA1/EDMA2 [up to 4.16 Gbps].

    Perhaps "EDMA0 access to the PCIe slave port has a longer path than EDMA1/2"!

    But I don't observe this limitation for memory Read transactions [I obtained the same result with both EDMA0 and EDMA1/EDMA2].

  • Delared,

    I am not sure whether the delta is related to the throughput measurement itself or to something else.

    But the delta is not very significant (~3% variation when data size < 1KB and ~7% when data size > 1KB).

    Basically, the PCIe and EDMA setup is pretty straightforward: x2 lanes, GEN2 speed PCIe, and EDMA with DBS=128 will give you the best performance.

    You can select any of EDMA 0/1/2 for your application. I think the throughput you achieved here is pretty close to the test results we have in the throughput performance user guide. The key point is whether the throughput you achieve in your testing satisfies your application.
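
    To illustrate that the EDMA side really is straightforward, here is a minimal sketch of a PaRAM set for one AB-synchronized block copy. The struct and bit position below only mirror the EDMA3 PaRAM field names from the EDMA3 user's guide for illustration; in real code the CSL/EDMA3 LLD definitions should be used, and the MSMC addresses in main() are just example values:

        #include <stdio.h>
        #include <stdint.h>
        #include <string.h>

        /* Illustrative PaRAM-set layout; use the CSL/EDMA3 LLD struct in
         * real code.                                                     */
        typedef struct {
            uint32_t opt;        /* transfer options (SYNCDIM, TCC, ...) */
            uint32_t src;        /* source address                       */
            uint16_t acnt;       /* bytes per array                      */
            uint16_t bcnt;       /* arrays per frame                     */
            uint32_t dst;        /* destination address                  */
            int16_t  src_bidx;   /* source B index                       */
            int16_t  dst_bidx;   /* destination B index                  */
            uint16_t link;       /* link address (0xFFFF = no link)      */
            uint16_t bcnt_rld;   /* BCNT reload                          */
            int16_t  src_cidx;   /* source C index                       */
            int16_t  dst_cidx;   /* destination C index                  */
            uint16_t ccnt;       /* frames                               */
            uint16_t rsvd;
        } edma3_param_t;

        /* Fill one PaRAM set for a contiguous block copy split into 1 KB
         * arrays (assumes bytes is a multiple of 1024 and bytes/1024 fits
         * in 16 bits). The transfer controller then breaks each array
         * into DBS-sized bursts (128 bytes here) on its own.             */
        static void setup_block_copy(edma3_param_t *p, uint32_t src_addr,
                                     uint32_t dst_addr, uint32_t bytes)
        {
            memset(p, 0, sizeof(*p));
            p->opt      = (1u << 2);   /* SYNCDIM = 1: AB-synchronized */
            p->src      = src_addr;
            p->dst      = dst_addr;
            p->acnt     = 1024;
            p->bcnt     = (uint16_t)(bytes / 1024);
            p->src_bidx = 1024;
            p->dst_bidx = 1024;
            p->link     = 0xFFFF;      /* no linked PaRAM set */
            p->ccnt     = 1;
        }

        int main(void)
        {
            edma3_param_t prm;
            /* Example: 64 KB copy within MSMC SRAM (0x0C000000 region). */
            setup_block_copy(&prm, 0x0C000000u, 0x0C010000u, 64u * 1024u);
            printf("ACNT=%u BCNT=%u CCNT=%u\n", (unsigned)prm.acnt,
                   (unsigned)prm.bcnt, (unsigned)prm.ccnt);
            return 0;
        }

    After copying such a set into the channel's PaRAM entry and triggering the channel, the whole block moves in one AB-synchronized transfer, so the setup effort stays small.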

     

  • OK,

    As you said, the delta is not very significant, and I hope that the throughput performance we achieve in our testing will satisfy our applications.

    Thank you very much!