EDMA3 Performance in different memories



Hi everybody,

Since EDMA is a good tool for designing fast algorithms, I did some basic testing on the C6657. I configured the DMA to copy a 64 kB array and got some interesting timing results.

Can anybody verify these measurements, not exactly but in general?

  • DDR3 to L2 ~ 100us
  • DDR3 to MSMC ~ 27us
  • L2 to L2 ~ 218us
  • DDR3 to DDR3 ~70us

What I'm wondering is why transfers to the faster memories such as L2 are slower than the ones to MSMC or DDR3.
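To compare these timings with published throughput figures, it helps to convert them to MB/s. The following is a minimal sketch in C, assuming the array is exactly 65536 bytes and that 1 MB means 10^6 bytes; the names `Transfer`, `measured`, and `throughput_mb_per_s` are made up for illustration.

```c
#include <stdint.h>

/* One measured transfer: a label and the time in microseconds. */
typedef struct {
    const char *path;
    double time_us;
} Transfer;

/* The four timings listed above for a 64 kB (assumed 65536-byte) copy. */
const Transfer measured[] = {
    { "DDR3 to L2",   100.0 },
    { "DDR3 to MSMC",  27.0 },
    { "L2 to L2",     218.0 },
    { "DDR3 to DDR3",  70.0 },
};

/* Bytes per microsecond is numerically equal to MB/s (1 MB = 10^6 bytes). */
double throughput_mb_per_s(uint32_t bytes, double time_us)
{
    return (double)bytes / time_us;
}
```

For example, `throughput_mb_per_s(65536, measured[1].time_us)` evaluates to roughly 2427 MB/s for the DDR3 to MSMC case.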

 

best regards

Pay Gießelmann

 

  • All those numbers look bad, even the DDR3 to MSMC number.  Please see the Throughput Performance Guide for C66x Devices App Note for performance numbers.

    Best Regards,
    Chad

  • Thank you Chad,

    so what might I be doing wrong? In the case of MSMC to DDR3 I get only about 1/4 (2500 MB/s) of what I could expect. I configured one DMA channel of EDMA instance 2 on the C6657 as AB-synchronized.

    As a next step, I have just tried configuring 2 DMA channels to copy one array (upper and lower half), which gives me double the throughput. Is this the right way to maximize data throughput?

    -edit-

    I tried out the effect of the RDRATE register in the transfer controller to accelerate the transfer: setting it to 0 and 1 results in the same timing, i.e. I have at least four cycles of latency in every read request. I confirmed that I was changing the right RDRATE register by setting it to 0x2, which increased the time from 25 us to 28 us. Setting it to 0x3 results in about 53 us.

    best regards,

    Pay Gießelmann

  • Even the slower EDMA1 and EDMA2, which are 128-bit and run at CPU/3, should be able to handle it with one channel, but you could try EDMA0, which is 256 bits wide and runs at CPU/2, to make sure.

    I'm not sure whether there is something wrong with your EDMA setup or with something else. Can you dump the PaRAM values you're using? Also, how are you measuring this?

    Best Regards,

    Chad

  • Hi,

    this is the parameter set I'm using:

    • a_edmaParameter.m_edmaITCCHEN = 0; // Intermediate transfer completion chaining enable
    • a_edmaParameter.m_edmaTCCHEN = 0; // Transfer complete chaining enable
    • a_edmaParameter.m_edmaITCINTEN = 0; // Intermediate transfer completion interrupt enable
    • a_edmaParameter.m_edmaTCINTEN = 1; // Transfer complete interrupt enable
    • a_edmaParameter.m_edmaTCC = 0; // Transfer complete code
    • a_edmaParameter.m_edmaTCCMODE = 0; // Transfer complete code mode 0: normal completion 1:early completion
    • a_edmaParameter.m_edmaFWID = 0x5; // FIFO width 0 - 5h: 8, 16, 32, 64, 128, 256 bit
    • a_edmaParameter.m_edmaSTATIC = 1; // Static set
    • a_edmaParameter.m_edmaSYNCDIM = 1; // Transfer synchronization dimension 0:A 1:AB
    • a_edmaParameter.m_edmaDAM = 0; // Destination address mode
    • a_edmaParameter.m_edmaSAM = 0; // Source address mode
    • a_edmaParameter.m_edmaSRC = (Uint32)t_source; // Source address
    • a_edmaParameter.m_edmaACNT = 64; // Count 1st dimension
    • a_edmaParameter.m_edmaBCNT = 1024; // Count 2nd dimension
    • a_edmaParameter.m_edmaDST = (Uint32)t_destination; // Destination address
    • a_edmaParameter.m_edmaDSTBIDX = 64; // Destination BCNT index
    • a_edmaParameter.m_edmaSRCBIDX = 64; // Source BCNT index
    • a_edmaParameter.m_edmaBCNTRLD = 0; // BCNT reload
    • a_edmaParameter.m_edmaLINK = 0xFFFFF; // Link address
    • a_edmaParameter.m_edmaDSTCIDX = 0; // Destination CCNT index
    • a_edmaParameter.m_edmaSRCCIDX = 0; // Source CCNT index
    • a_edmaParameter.m_CCNT = 1; // Count 3rd dimension

    EDMA0 is not supported on the C6657. I measure execution time with Timestamp_get32() in SYS/BIOS. There might be some overhead caused by SYS/BIOS, but when I compare the timings using 1, 2, or 4 channels (27 us, 13 us, 7 us), it looks as if something is wrong in the configuration.

    What I'm wondering about while writing this is that I map all channels to one queue, i.e. they should all be executed by the same TC.

    thank you for your help,

    best regards

    Pay Gießelmann

  • What are you using for the 2 and 4 channel setups?

    Can you dump what values you're getting back from the timestamps?

    What's the DDR speed you're running at?  What's the SYSCLK speed you're running at?

    Can you dump the PaRAM values from the memory window when it's set up?

    Best Regards,
    Chad

    In the 2- and 4-channel setups I use almost the same parameter set: the A dimension stays the same, and the B dimension is 1/2 (1/4) of the whole data block. The start address for each channel is base (base + 1/4, base + 1/2, base + 3/4).

    DDR speed is 1333 MHz. I haven't set up SYSCLK yet and I'm not sure about its reset value, but even if it were 1.25 GHz, the timings would be too slow.
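The split described above can be sketched in C as follows. This is only an illustration under the assumption that the rows are contiguous (SRCBIDX = DSTBIDX = ACNT, as in the parameter set posted earlier); `ChannelSlice` and `split_transfer` are hypothetical names, not part of any TI driver.

```c
#include <stdint.h>

/* Per-channel portion of one large AB-synchronized copy. */
typedef struct {
    uint32_t src;   /* source start address for this channel      */
    uint32_t dst;   /* destination start address for this channel */
    uint16_t acnt;  /* bytes per row (unchanged)                   */
    uint16_t bcnt;  /* rows handled by this channel                */
} ChannelSlice;

/* Splits a contiguous acnt x bcnt block evenly over `channels` slices.
 * Returns 0 on success, -1 if bcnt is not divisible by the channel count. */
int split_transfer(uint32_t src, uint32_t dst, uint16_t acnt, uint16_t bcnt,
                   unsigned channels, ChannelSlice *out)
{
    if (channels == 0 || bcnt % channels != 0) {
        return -1;
    }
    uint16_t rows = (uint16_t)(bcnt / channels);
    uint32_t slice_bytes = (uint32_t)acnt * rows;

    for (unsigned i = 0; i < channels; ++i) {
        out[i].src  = src + i * slice_bytes;  /* base + i/channels of the block */
        out[i].dst  = dst + i * slice_bytes;
        out[i].acnt = acnt;
        out[i].bcnt = rows;
    }
    return 0;
}
```

With ACNT = 64, BCNT = 1024 and four channels, each slice covers 256 rows (16384 bytes), so the start addresses are base, base + 0x4000, base + 0x8000 and base + 0xC000.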

    memory content for single direction setup: Base address: 0x02744000

    0x8010050C 0x80000000 0x04000040 0x0C000000 0x00400040 0x00004000 0x00000000 0x00000001
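For anyone who wants to check the dump against the parameter list, here is a small decoding sketch in C. It assumes the usual EDMA3 PaRAM layout of eight 32-bit words (OPT, SRC, BCNT:ACNT, DST, DSTBIDX:SRCBIDX, BCNTRLD:LINK, DSTCIDX:SRCCIDX, CCNT) and the OPT bit positions noted in the comments; the type and function names are made up for illustration and should be checked against the EDMA3 user guide.

```c
#include <stdint.h>

/* Decoded view of one 8-word PaRAM set (assumed standard EDMA3 layout). */
typedef struct {
    uint32_t opt, src, dst;
    uint16_t acnt, bcnt;
    int16_t  srcbidx, dstbidx;
    uint16_t link, bcntrld;
    int16_t  srccidx, dstcidx;
    uint16_t ccnt;
} ParamSet;

/* Selected OPT fields (assumed bit positions: SAM 0, DAM 1, SYNCDIM 2,
 * STATIC 3, FWID 10:8, TCCMODE 11, TCC 17:12, TCINTEN 20). */
typedef struct {
    unsigned sam, dam, syncdim, is_static, fwid, tccmode, tcc, tcinten;
} OptFields;

ParamSet decode_param(const uint32_t w[8])
{
    ParamSet p;
    p.opt     = w[0];
    p.src     = w[1];
    p.acnt    = (uint16_t)(w[2] & 0xFFFFu);   /* low half: ACNT    */
    p.bcnt    = (uint16_t)(w[2] >> 16);       /* high half: BCNT   */
    p.dst     = w[3];
    p.srcbidx = (int16_t)(w[4] & 0xFFFFu);    /* low half: SRCBIDX */
    p.dstbidx = (int16_t)(w[4] >> 16);
    p.link    = (uint16_t)(w[5] & 0xFFFFu);   /* low half: LINK    */
    p.bcntrld = (uint16_t)(w[5] >> 16);
    p.srccidx = (int16_t)(w[6] & 0xFFFFu);
    p.dstcidx = (int16_t)(w[6] >> 16);
    p.ccnt    = (uint16_t)(w[7] & 0xFFFFu);
    return p;
}

OptFields decode_opt(uint32_t opt)
{
    OptFields f;
    f.sam       = (opt >> 0)  & 0x1u;
    f.dam       = (opt >> 1)  & 0x1u;
    f.syncdim   = (opt >> 2)  & 0x1u;
    f.is_static = (opt >> 3)  & 0x1u;
    f.fwid      = (opt >> 8)  & 0x7u;
    f.tccmode   = (opt >> 11) & 0x1u;
    f.tcc       = (opt >> 12) & 0x3Fu;
    f.tcinten   = (opt >> 20) & 0x1u;
    return f;
}
```

Under these assumptions the dump decodes to ACNT = 64, BCNT = 1024, SRCBIDX = DSTBIDX = 64, CCNT = 1, SYNCDIM = 1, STATIC = 1, FWID = 5 and TCINTEN = 1, with source 0x80000000 and destination 0x0C000000. The LINK field in the dump reads 0x4000.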

    An example timestamp is below. The cycle count after "Time:" is taken from the software trigger to the ISR; the overhead is from the ISR (which posts an event) to the process.

    Time: 25667 Overhead: 1876

    Then I made a mistake in my program; the output for the 4-channel setup is:

    Time: 7279 Overhead: 18030

    To explain: I wait for the interrupt of the first channel, then go to my handler task and poll the interrupt pending register until it shows 0xF (all channels complete). So what confused me in my earlier post is correct: I use the same queue for all transfers, and I do NOT get better timings with this setup.

    As a next step I corrected the queue setup; now each of the four channels has its own queue, i.e. its own transfer controller.

    Time: 21556 Overhead: 3772

    That's again almost the same, which means that my first result was wrong: I don't get better timings using more channels. The board is the EVM from D.SignT; I will ask the distributor for some benchmarks on their board and stop my study here.
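The comparison can be made explicit by adding "Time" and "Overhead" for each of the three runs. The sketch below does this in C and also converts cycles to microseconds; it assumes the timestamp counts CPU cycles and takes the clock frequency as a parameter, since the actual SYSCLK value is not known in this thread. All names are hypothetical.

```c
#include <stdint.h>

/* One printed result line: "Time: x Overhead: y" (both in cycles). */
typedef struct {
    const char *setup;
    uint32_t time_cycles;
    uint32_t overhead_cycles;
} Measurement;

/* The three outputs quoted above. */
const Measurement runs[] = {
    { "1 channel",                    25667u,  1876u },
    { "4 channels, shared queue",      7279u, 18030u },
    { "4 channels, one queue each",   21556u,  3772u },
};

/* Cycles from software trigger until the handler saw completion. */
uint32_t total_cycles(Measurement m)
{
    return m.time_cycles + m.overhead_cycles;
}

/* Converts a cycle count to microseconds for an assumed CPU clock in Hz. */
double cycles_to_us(uint32_t cycles, double cpu_hz)
{
    return (double)cycles * 1e6 / cpu_hz;
}
```

The sums are 27543, 25309 and 25328 cycles, so the end-to-end time is nearly identical in all three setups. At an assumed 1 GHz clock, that would be about 25 to 28 us.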

    Thank you very much for your help, I will post it here when I have new results.

    best regards

    Pay Gießelmann