IDMA throughput

Simone Castrucci 82

Other Parts Discussed in Thread: OMAPL138

Hello,

I'm developing an application on the OMAP L138 processor. I need to transfer data between L2 SRAM and L1 SRAM using IDMA.

I would like to know the theoretical data throughput for block transfers (using IDMA):

1) from L2 SRAM to L1 SRAM

2) from L1 SRAM to L2 SRAM

3) from L2 SRAM to L2 SRAM

4) from L1 SRAM to L1 SRAM

Thank you in advance,

Simone

over 15 years ago

Mukul Bhatnagar over 15 years ago

TI__Guru* 83965 points

Simone

I don't think we have measured throughput data for IDMA on the OMAPL138. We should be able to provide a "theoretical throughput" in terms of words/cycle (I need some time to find this).

I hope you are also aware of the general guidelines on when to use IDMA vs EDMA vs CPU, documented here:

http://processors.wiki.ti.com/index.php/OMAP-L1x/C674x/AM1x_SoC_Level_Optimizations#EDMA3_vs_IDMA_vs_c674x_CPU

Regards

Mukul

Simone Castrucci 82 over 15 years ago in reply to Mukul Bhatnagar

Prodigy 30 points

Mukul,

thanks for the fast reply.

I'm aware of the general guidelines about using IDMA, EDMA, or the CPU for data transfers.

I would appreciate it if you could give me a "theoretical throughput" for IDMA-based data transfers between L1D SRAM and L2 SRAM, in both directions.

I just benchmarked IDMA transfers on the real target (OMAP L138). I made sure that the CPU, the EDMA, and the other masters remained idle during the transfers to avoid memory access conflicts.

The resulting throughputs are summarized here:

1) block transfers of 1000 words (32-bit) from L2 to L1D --> 550 CPU cycles (i.e. about 2 words/CPU cycle or 4 words/EMC cycle)

2) block transfers of 1000 words from L2 to L2 --> 1130 CPU cycles (i.e. about 1 word/CPU cycle or 2 words/EMC cycle)

3) block transfers of 1000 words from L1D to L1D --> 1110 CPU cycles (i.e. about 1 word/CPU cycle or 2 words/EMC cycle)
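The conversion from measured cycle counts to words per cycle can be sketched as follows. This is a minimal illustration, not part of the original benchmark: the function names are hypothetical, and the ratio of two CPU cycles per EMC cycle is an assumption inferred from the figures quoted above (2 words/CPU cycle corresponding to 4 words/EMC cycle).

```python
# Assumption inferred from the post: one EMC cycle spans two CPU cycles.
CPU_CYCLES_PER_EMC_CYCLE = 2

# Block size used in the benchmark, in 32-bit words.
BLOCK_WORDS = 1000

# Measured CPU cycles per 1000-word transfer, as reported in the post.
MEASURED_CPU_CYCLES = {
    "L2 -> L1D": 550,
    "L2 -> L2": 1130,
    "L1D -> L1D": 1110,
}


def words_per_cpu_cycle(words, cpu_cycles):
    """Average number of words moved per CPU cycle."""
    return words / cpu_cycles


def words_per_emc_cycle(words, cpu_cycles):
    """Average number of words moved per EMC cycle."""
    return words_per_cpu_cycle(words, cpu_cycles) * CPU_CYCLES_PER_EMC_CYCLE


if __name__ == "__main__":
    for path, cycles in MEASURED_CPU_CYCLES.items():
        per_cpu = words_per_cpu_cycle(BLOCK_WORDS, cycles)
        per_emc = words_per_emc_cycle(BLOCK_WORDS, cycles)
        print(f"{path}: {per_cpu:.2f} words/CPU cycle, "
              f"{per_emc:.2f} words/EMC cycle")
```

Unrounded, the three measurements work out to roughly 1.8, 0.9, and 0.9 words/CPU cycle, which the list above rounds to 2, 1, and 1.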

Are these results consistent with the theoretical ones?

Regards,

Simone

Mukul Bhatnagar over 15 years ago in reply to Simone Castrucci 82

TI__Guru* 83965 points

Hi Simone

Your numbers are very close to what is expected.

The "theoretical max throughput" based on bus width etc. is 8 words/EMC cycle (the internal bus is 256 bits wide, and 1 word = 32 bits). Based on the architecture and analysis of the C674x core (as well as C64x+ cores) on the OMAPL138, the expected best-case utilization (100% being 8 words/EMC cycle) is roughly:

L1D -> L2: ~62%

L2 -> L1D: ~55%

L2 -> L2 / L1D -> L1D: ~25%

So your figures of 4 words/EMC cycle and 2 words/EMC cycle are correct. When the source and destination are the same memory, the utilization is lower because the memory ports are shared between both sides of the transfer and the accesses are no longer truly concurrent.
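As a rough cross-check, the measured cycle counts from the earlier post can be expressed as a percentage of the 8 words/EMC cycle peak and set against the best-case figures above. This is only a sketch: the function name is hypothetical, and the two CPU cycles per EMC cycle ratio is an assumption inferred from the numbers quoted in this thread.

```python
# Peak from the thread: 256-bit internal bus / 32-bit words = 8 words/EMC cycle.
PEAK_WORDS_PER_EMC_CYCLE = 256 // 32

# Assumption inferred from the thread: one EMC cycle spans two CPU cycles.
CPU_CYCLES_PER_EMC_CYCLE = 2

# Expected best-case utilization (percent of peak), as quoted in this reply.
EXPECTED_BEST_CASE = {
    "L1D -> L2": 62,
    "L2 -> L1D": 55,
    "L2 -> L2": 25,
    "L1D -> L1D": 25,
}

# Measured CPU cycles for 1000-word transfers, as reported earlier in the thread.
MEASURED_CPU_CYCLES = {
    "L2 -> L1D": 550,
    "L2 -> L2": 1130,
    "L1D -> L1D": 1110,
}


def utilization_percent(words, cpu_cycles):
    """Measured throughput as a percentage of the theoretical peak."""
    emc_cycles = cpu_cycles / CPU_CYCLES_PER_EMC_CYCLE
    return 100.0 * (words / emc_cycles) / PEAK_WORDS_PER_EMC_CYCLE


if __name__ == "__main__":
    for path, cycles in MEASURED_CPU_CYCLES.items():
        measured = utilization_percent(1000, cycles)
        expected = EXPECTED_BEST_CASE[path]
        print(f"{path}: measured {measured:.1f}% of peak, "
              f"best case ~{expected}%")
```

With these inputs the measured utilization is about 45% for L2 -> L1D and about 22% for the same-memory transfers, somewhat below the best-case figures of 55% and 25%.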

Hope this helps.

Regards

Mukul

Simone Castrucci 82 over 15 years ago in reply to Mukul Bhatnagar

Prodigy 30 points

Hi Mukul,

thank you very much for your help.

Regards,

Simone
