IDMA throughput

Other Parts Discussed in Thread: OMAPL138

Hello,

I'm developing an application on the OMAP-L138 processor. I need to transfer data between L2 SRAM and L1 SRAM using IDMA.

I would like to know the theoretical data throughput for IDMA block transfers:

1) from L2 SRAM to L1 SRAM.

2) from L1 SRAM to L2 SRAM

3) from L2 SRAM to L2 SRAM

4) from L1 SRAM to L1 SRAM

 

Thank you in advance,

Simone

 

  • Simone

    I don't think we have actual throughput data for the OMAPL138 IDMA. We should be able to provide a "theoretical throughput" in terms of words/cycle (I need some time to find this).

    I hope you are also aware of the general guidelines on when to use IDMA vs. EDMA vs. CPU, documented here:

    http://processors.wiki.ti.com/index.php/OMAP-L1x/C674x/AM1x_SoC_Level_Optimizations#EDMA3_vs_IDMA_vs_c674x_CPU

    Regards

    Mukul

  • Mukul,

    thanks for the fast reply.

    I'm aware of the general guidelines about using IDMA, EDMA or CPU for data transfers.

    I would appreciate it if you could give me some "theoretical throughput" figures for IDMA-based data transfers between L1D SRAM and L2 SRAM, in both directions.

     

    I just ran a benchmark of IDMA transfers on the real target (OMAP-L138). I made sure that the CPU, the EDMA and the other masters remained idle during the transfers, to avoid memory access conflicts.

    The resulting throughputs are summarized here:

    1) block transfers of 1000 words (32-bit) from L2 to L1D --> 550 CPU cycles (i.e. about 2 words/CPU cycle, or 4 words/EMC cycle)

    2) block transfers of 1000 words from L2 to L2 --> 1130 CPU cycles (i.e. about 1 word/CPU cycle, or 2 words/EMC cycle)

    3) block transfers of 1000 words from L1D to L1D --> 1110 CPU cycles (i.e. about 1 word/CPU cycle, or 2 words/EMC cycle)
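
The conversion from measured cycle counts to words per cycle used in the list above can be sketched in C. This is only an illustration, not the benchmark code itself: the function names are hypothetical, and the 2:1 ratio between CPU cycles and EMC cycles is an assumption taken from the figures quoted above (2 words/CPU cycle corresponding to 4 words/EMC cycle).

```c
#include <stdio.h>

/* Assumption taken from the figures in this thread:
 * one EMC cycle lasts two CPU cycles. */
#define CPU_CYCLES_PER_EMC_CYCLE 2.0

/* Words moved per CPU cycle for a block of `words` 32-bit words
 * that took `cpu_cycles` CPU cycles. Returns 0 for a zero cycle count. */
double words_per_cpu_cycle(unsigned words, unsigned cpu_cycles)
{
    if (cpu_cycles == 0u) {
        return 0.0;
    }
    return (double)words / (double)cpu_cycles;
}

/* Same throughput expressed per EMC cycle. */
double words_per_emc_cycle(unsigned words, unsigned cpu_cycles)
{
    return words_per_cpu_cycle(words, cpu_cycles) * CPU_CYCLES_PER_EMC_CYCLE;
}

/* Print one measurement in both units. */
void print_throughput(const char *label, unsigned words, unsigned cpu_cycles)
{
    printf("%-12s %.2f words/CPU cycle, %.2f words/EMC cycle\n",
           label,
           words_per_cpu_cycle(words, cpu_cycles),
           words_per_emc_cycle(words, cpu_cycles));
}
```

For instance, `words_per_emc_cycle(1000, 550)` evaluates to roughly 3.6, which is the value rounded to 4 words/EMC cycle in item 1.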

     

    Are these results consistent with the theoretical ones?

     

    Regards,

    Simone

     

  • Hi Simone

    Your numbers are very close to what is expected.

    The "theoretical max throughput" based on bus width is 8 words/EMC cycle (the internal bus is 256 bits wide, and 1 word = 32 bits). Based on the architecture and analysis of the C674x core on the OMAPL138 (as well as C64x+ cores), the expected best-case utilization (100% being 8 words/EMC cycle) is roughly:

    L1D -> L2: ~62%

    L2 -> L1D: ~55%

    L2 -> L2 / L1D -> L1D: ~25%

    So your figures of 4 words/EMC cycle and 2 words/EMC cycle are in line with what is expected. When the source and destination are in the same memory, the utilization is lower because the memory ports are shared between the read and the write, so the two accesses are no longer truly concurrent.
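
The relation between the utilization percentages and words/EMC cycle can be checked with a short C sketch. It assumes only what is stated in this reply: a 256-bit internal bus, 32-bit words, and utilization expressed relative to the resulting peak of 8 words/EMC cycle. The function names are hypothetical and chosen for illustration.

```c
#include <stdio.h>

#define BUS_WIDTH_BITS 256u  /* internal bus width quoted in this thread */
#define WORD_BITS       32u  /* 1 word = 32 bits */

/* Peak transfer rate in words per EMC cycle (256 / 32 = 8). */
unsigned peak_words_per_emc_cycle(void)
{
    return BUS_WIDTH_BITS / WORD_BITS;
}

/* Expected throughput, in words per EMC cycle, for a given
 * best-case utilization percentage of the peak rate. */
double expected_words_per_emc_cycle(double utilization_percent)
{
    return (double)peak_words_per_emc_cycle() * utilization_percent / 100.0;
}

/* Utilization percentage corresponding to a measured throughput
 * given in words per EMC cycle. */
double measured_utilization_percent(double words_per_emc_cycle)
{
    return 100.0 * words_per_emc_cycle / (double)peak_words_per_emc_cycle();
}

/* Print the expected throughput for one transfer direction. */
void print_expected(const char *direction, double utilization_percent)
{
    printf("%-10s ~%.0f%% -> %.2f words/EMC cycle\n",
           direction, utilization_percent,
           expected_words_per_emc_cycle(utilization_percent));
}
```

With these definitions, `expected_words_per_emc_cycle(25.0)` gives 2.0 words/EMC cycle and `expected_words_per_emc_cycle(55.0)` gives about 4.4, while a measured 4 words/EMC cycle corresponds to `measured_utilization_percent(4.0)`, i.e. 50%.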

    Hope this helps.

    Regards

    Mukul

  • Hi Mukul,

    thank you very much for your help.

     

    Regards,

    Simone