How to use four EDMA3TCs to transfer data concurrently



Hi,
  I am using EDMA3 to transfer data from DDR2 to DDR2 on the DM6467. I can use any one of the EDMA3TCs to transfer the data for a single event successfully, and the speed is about 750MB/s.
  However, when I use four EDMA3TCs (selected by setting the DMAQNUM values) to transfer the data for 4 events concurrently, the speed is only about 300MB/s. I thought it should be about 3GB/s (750MB/s*4).
  My EDMA3 test project is attached. I really want to know the reason. How can I use four EDMA3TCs to transfer data concurrently?

emda3.rar
  • The four EDMA3TCs are separate hardware blocks, so they can function in parallel. However, the rate also depends on the bandwidth of the end points (source memory and destination memory). In addition, the 4 TCs use the same internal bus.
    So if you use the 4 TCs simultaneously with common end points, or through the same internal switch fabric, lower data rates are possible.

    Hope this helps.
    Regards
    Varada


  • Varada:
        Thank you for your reply. I will test the speed with different end points when using four EDMA3TCs. But I do not know about the internal switch fabric you mentioned.
        Is it possible that the way I test for transfer completion is wrong? I poll for completion, waiting for the expected bits to be set in IPR. I set the TCC (transfer complete code) of event0 to 0, the TCC of event1 to 1, the TCC of event2 to 2, and the TCC of event3 to 3. Then I trigger the event0-event3 transfers manually and wait until IPR is 0xf.
        You can see the specific code in my "EDMA3 test project" attachment. I look forward to your reply.

    Best wishes

  • You are doing it correctly. Waiting on IPR is fine. I hope you see completions for all 4 TCCs. Can you tell me the speed of the DDR clock on your system?

     In the meantime, the following application note is a good reference.

    http://focus.ti.com/lit/an/spraaw4b/spraaw4b.pdf

    Hope this helps.

    Regards

    Varada

  • How are you determining the transfer rate values of 750MB/s and 300MB/s? How is this being physically measured? Does it mean that 750MBytes could be read and 750MBytes could be written, all in 1 second? Or that a total of 750MBytes could be transferred?

    One way to measure the time to run all of the transfers would be to use the C64x+ internal timer. You can insert the following in your C file to measure the time from DMA start to DMA finish:

    At the top of the file:
        #include <c6x.h>

    At the top of main():
        unsigned long long ullTimeStart, ullTimeDiff, ullTimeEnd;

        TSCL = 0;   /* any write to TSCL starts the free-running counter */

    Before the EventStartReg write:
        ullTimeStart = TSCL;   /* read TSCL first: the read latches TSCH */
        ullTimeStart += (unsigned long long)TSCH<<32;

    After the while !complete tests:
        ullTimeEnd = TSCL;     /* same order: TSCL, then TSCH */
        ullTimeEnd += (unsigned long long)TSCH<<32;
        ullTimeDiff = ullTimeEnd - ullTimeStart;

    This will give you ullTimeDiff which will be the number of DSP clock cycles for the DMA(s) to execute to completion.
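    To turn that cycle count into a transfer rate, divide the bytes moved by the elapsed time. The helper below is only a sketch: the function name is hypothetical, 1 MB is taken as 10^6 bytes, and you must supply your actual DSP clock frequency (the 600 MHz in the usage note is a placeholder, not a DM6467 figure).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a measured cycle count into MB/s
 * (1 MB = 1e6 bytes). cpu_hz is the DSP clock in Hz and must be
 * replaced with the real value for your board. */
static double throughput_mb_per_s(uint64_t bytes, uint64_t cycles, double cpu_hz)
{
    double seconds;

    assert(cycles > 0 && cpu_hz > 0.0);

    seconds = (double)cycles / cpu_hz;        /* elapsed time in seconds */
    return ((double)bytes / seconds) / 1.0e6; /* bytes per second -> MB/s */
}
```

    For example, throughput_mb_per_s(total_bytes, ullTimeDiff, 600.0e6) would report the rate for total_bytes moved in ullTimeDiff cycles of an assumed 600 MHz clock. Count total_bytes as the data copied across all channels, so that the 1-channel and 4-channel runs are compared on the same basis.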

    To simplify the code changes when switching between 1 channel and 2-4 channels for this test, you can set the CCNT value to 0 for the channels that you do not want to actually run. They will still get submitted and will still generate interrupts (set IPR), but no actual data transfers will occur for those channels.

    Some points to consider:

    1. If you are running the DDR2 clock at 300 MHz for a data rate of 600 MHz, the absolute peak data transfer rate for bursts will be 600 MWords per second or 2.4 GBytes/sec.
    2. Concurrent operation does not mean that you can read two words from DDR at the same time. This is physically impossible. Concurrent operation means that you can read from DDR and write to Config space and read from L2 and write to EMIFA - all four of these can happen at the same time. In fact, the TCs can be reading and writing at the same time if there is enough activity requested and the memories/peripheral/endpoints are different, so 8 total bus transactions can occur at a time - 4 reads and 4 writes. But they have to take turns if they are all accessing the same endpoint, like DDR.
    3. How is your DDR configured? In particular, how many banks do you have? If it is 4, then your test is going to thrash the bank selections by trying to access 8 banks at the same time (4 channels, each with a source and a destination). Two (2) DMA channels may get a bit better performance than just one when everything is going to/from DDR, as in your case. But if the source and destination addresses are all in different banks, then moving beyond the number of available banks means that some accesses will have to close one bank and open another; this will slow down the process.
    4. Try measuring the throughput for 1 TC, then for 2, then 3, then 4, and compare the results.
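
    The arithmetic in point 1 can be sketched as below, as a rough sanity check for measured numbers. The function names are hypothetical, and the 4-byte bus width is inferred from the 600 MWords/s = 2.4 GBytes/sec figure above. The second function is an assumption of this sketch rather than something stated in the thread: for a DDR-to-DDR copy every byte is read once and written once through the same interface, so the copy rate cannot exceed half the peak. Refresh, bank switching, and read/write turnaround are ignored, so real results will be lower.

```c
#include <assert.h>

/* Peak DDR2 burst rate in MB/s: two data transfers per clock cycle
 * (double data rate) times the bus width in bytes. */
static double ddr2_peak_mb_per_s(double clock_mhz, unsigned bus_bytes)
{
    assert(clock_mhz > 0.0 && bus_bytes > 0);
    return clock_mhz * 2.0 * (double)bus_bytes;
}

/* Idealized upper bound for a copy whose source and destination are
 * both in the same DDR2: each copied byte crosses the interface twice
 * (one read, one write), so the bound is half the peak rate. */
static double ddr2_copy_bound_mb_per_s(double clock_mhz, unsigned bus_bytes)
{
    return ddr2_peak_mb_per_s(clock_mhz, bus_bytes) / 2.0;
}
```

    With a 300 MHz clock and a 4-byte bus this gives 2400 MB/s peak and a 1200 MB/s copy bound, so under these assumptions the expected 3GB/s is out of reach no matter how many TCs are used.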