DMA Throughput

Hello,

Does anybody know the maximum bandwidth the DMA can achieve when moving data between RAM and a peripheral?  Assume nothing else is active in the system and only one DMA channel is in operation.

DMA transactions are 8, 16, 32 or 64 bits long.  Does it take the same time to move one 8-bit chunk as one 64-bit chunk?  My guess is that 8- to 32-bit transactions are done in a single access and a 64-bit transaction is done in two accesses because the CPU is 32-bit based.  So I would not expect a performance increase from 32 to 64 bits (except for the fact that one 64-bit element cannot be interrupted, while two 32-bit elements can be).

Thanks,

Frederic P.

 

  • Hello Frederic,

    The bus system of the TMS570, except the peripheral bus, is 64 bits wide (including the CPU). For example, if you have eight 8-bit values, moving them with 8-bit transfers takes eight times as long as moving them with a single 64-bit transfer.

    For example, if you do a CPU RAM to CPU RAM transfer, you greatly increase performance if your data can be transferred in 64-bit chunks. If you transfer to a peripheral, the peripheral bridge splits the 64-bit access into two 32-bit accesses, so the peripheral access takes longer, but you still benefit because the 64-bit access occupies the other bus system for fewer cycles.

    The DMA also has the option to group data in order to reduce the bandwidth needed on the buses. For example, you could do a single 64-bit read that the DMA splits into four 16-bit writes.

    I hope this clarifies it a bit.

    Regards

    Frank
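
The access counts in the reply above can be checked with a small model. This is only a sketch of the arithmetic, not a description of the actual DMA hardware: the function name `bus_accesses` and the two width constants are illustrative names introduced here, and the model counts bus accesses only. It says nothing about how many clock cycles each access takes.

```python
MAIN_BUS_BYTES = 8        # 64-bit main bus, as stated in the reply above
PERIPHERAL_BUS_BYTES = 4  # 32-bit peripheral bus, as stated in the reply above


def bus_accesses(total_bytes, element_bytes, bus_bytes):
    """Count the bus accesses needed to move total_bytes.

    Assumption of this sketch: an element wider than the bus is split
    into bus-width accesses (the role the peripheral bridge plays for a
    64-bit element), and an element narrower than the bus still costs
    one access.
    """
    bytes_per_access = min(element_bytes, bus_bytes)
    return -(-total_bytes // bytes_per_access)  # ceiling division


if __name__ == "__main__":
    total = 8  # eight 8-bit values, as in the example above
    for element_bytes in (1, 2, 4, 8):
        main = bus_accesses(total, element_bytes, MAIN_BUS_BYTES)
        periph = bus_accesses(total, element_bytes, PERIPHERAL_BUS_BYTES)
        print(f"{element_bytes * 8:2d}-bit elements: "
              f"{main} main-bus accesses, {periph} peripheral-bus accesses")
```

In this model the 64-bit element needs one access on the main bus and two on the peripheral bus, and a 64-bit read paired with 16-bit writes gives one read access and four write accesses, which matches the cases described in the reply.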

  • Hi Frank,

    It answers the second part of my question.

    As for the maximum achievable bandwidth, I am still in the dark.  Does the DMA transfer one element per clock cycle (assuming the source and destination buses can each handle a whole element at once)?  What is the frame and block overhead in terms of clock cycles?

    I did not find any documentation (timing diagram) about that.

    Regards,

    Frederic P.

  • Hello Frederic,

    This is not easy to answer because it depends on various factors: for example, the ratio between the different clocks (e.g. HCLK to VCLK) and the kind of transfer (main memory to main memory, flash to main memory, main memory to peripheral, etc.).

    In addition, in a real system multiple bus masters compete for the bus resources, although the bus design we use tries to eliminate some of the bottlenecks of traditional designs.

    At the moment we do not have a good way to say how many cycles a transfer takes.

    Regards,

    Frank
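
Since no cycle figures are available, one option is to time a transfer on the target and convert the result into a bandwidth. The sketch below covers only that conversion. It assumes a cycle count has already been obtained on the hardware by some means (for instance a cycle counter read before and after the transfer); the function names and parameters are illustrative and do not come from any TI library.

```python
def bytes_per_cycle(bytes_moved, cycles):
    """Average bytes moved per clock cycle over a measured transfer."""
    if cycles <= 0:
        raise ValueError("cycles must be a positive measured count")
    return bytes_moved / cycles


def throughput_bytes_per_second(bytes_moved, cycles, clock_hz):
    """Convert a measured cycle count into bytes per second.

    clock_hz must be the frequency of the clock that the cycle counter
    runs on; mixing clock domains (e.g. HCLK and VCLK) gives a wrong
    figure.
    """
    return bytes_per_cycle(bytes_moved, cycles) * clock_hz
```

A figure obtained this way is valid only for the measured configuration (clock ratio, source and destination, element size, other active bus masters), which are the factors listed in the reply above.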