This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

  • Resolved

RTOS/TMS320C6678: EDMA3 does not copy and runs slowly

Part Number: TMS320C6678

Tool/software: TI-RTOS

I made a small project using the EDMA3 LLD. It copies data from DDR3 memory to L2SRAM. However, the copy is not performed: the array in L2SRAM stays empty. In addition, the test shows that copying an array of 128 elements takes about 2000 cycles! I use CCS 7.3 and EDMA LLD v.2.12.5.

What could be the problem?

Why is the copy speed so low?

  • In reply to Michael Yurkov:

    Hi,

    Please post the Processor SDK RTOS version that you are using.

    Best Regards,
    Yordan

     



  • In reply to Yordan Kovachev:

    Hi,

    There are existing examples in the EDMA LLD; see processors.wiki.ti.com/.../II_devices

    Q. What are the software building blocks: EDMA LLD, EDMA CSL, and StarterWare?
    First run the example to make sure it works by moving data from A to B, then add your TSCL/TSCH profiling code.

    In addition, several RTOS driver examples, such as PCIE and Hyperlink, use EDMA through the EDMA LLD. You can check how they move data.

    Regards, Eric
  • In reply to Yordan Kovachev:

    Hi, I use CCS v7.3 with Processor SDK RTOS v04.02.00.

  • In reply to Michael Yurkov:

    Have you tried any of the EDMA examples mentioned in the 01/02 post?

    Regards, Eric
  • In reply to lding:

    Hi, Eric.
    Thank you for the reply!
    Yes, I tried the EDMA examples. However, they use A-synchronization, whereas I use AB-synchronization. My project is based on these examples, but for some reason it does not work. My first question is: why does the copy not happen? My second question is: why is copying so slow (about 15 clock cycles per element)? And yes, I took the measurements by adding TSCL/TSCH profiling to my code.
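
The difference between the two synchronization types can be sketched in plain C. This is a minimal illustration, assuming the usual EDMA3 definitions (A-sync moves one array of ACNT bytes per trigger event, AB-sync moves BCNT arrays of ACNT bytes per trigger event) and a PaRAM set without linking or chaining. The type and function names are hypothetical and are not part of the EDMA3 LLD.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical enum mirroring the meaning of the SYNCDIM bit in the PaRAM OPT word. */
typedef enum { SYNC_A = 0, SYNC_AB = 1 } sync_type_t;

/* Bytes moved by one trigger event. */
uint32_t bytes_per_trigger(sync_type_t sync, uint32_t acnt, uint32_t bcnt)
{
    return (sync == SYNC_AB) ? acnt * bcnt : acnt;
}

/* Trigger events needed to exhaust one PaRAM set (no linking or chaining assumed). */
uint32_t triggers_needed(sync_type_t sync, uint32_t bcnt, uint32_t ccnt)
{
    assert(bcnt > 0u && ccnt > 0u);
    return (sync == SYNC_AB) ? ccnt : bcnt * ccnt;
}
```

For an illustrative setup of ACNT = 4, BCNT = 128, CCNT = 1, a single manual trigger moves 4 bytes in A-sync mode but all 512 bytes in AB-sync mode, which is why the counts and the trigger logic of an A-sync example cannot be reused unchanged.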

  • In reply to Michael Yurkov:

    Hi,

    If the example is A-sync, you can change the OPT field to AB-sync. First, did the existing example work as expected and move the data? Then, how do you move the data: inside the chip, or between two devices via an interface link such as PCIE or Hyperlink? If the address is in L2, did you make it a global address? As for the TSCL/TSCH slowness, did you program the DSP main PLL to get the right time per CPU cycle?
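
The global-address point matters because the EDMA cannot use a core's local L2 alias (0x00800000 and up). A minimal sketch of the conversion follows, assuming the C6678 memory map, where the L2 of CorePac n is visible globally at 0x1n800000 and each core has 512 KB of L2. The function and macro names are hypothetical; on the target the core number would normally be read from the DNUM register.

```c
#include <assert.h>
#include <stdint.h>

#define L2_LOCAL_BASE  0x00800000u  /* core-local alias of L2 SRAM */
#define L2_LOCAL_SIZE  0x00080000u  /* 512 KB of L2 per core (C6678 assumption) */
#define GLOBAL_OFFSET  0x10000000u  /* core 0 L2 appears globally at 0x10800000 */

/* Convert a core-local L2 address to its global alias.
 * Addresses outside the local L2 window (DDR3, MSMSRAM, ...) are returned unchanged. */
uint32_t l2_to_global(uint32_t addr, uint32_t core)
{
    assert(core < 8u);  /* the C6678 has eight CorePacs */
    if (addr >= L2_LOCAL_BASE && addr < L2_LOCAL_BASE + L2_LOCAL_SIZE) {
        return addr + GLOBAL_OFFSET + (core << 24);
    }
    return addr;
}
```

The converted address is what would go into the source or destination field of the PaRAM set, for example `l2_to_global((uint32_t)buffer, DNUM)` on the device.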

    Regards, Eric
  • In reply to lding:

    Hello!

    1. Yes, the existing example worked as expected by moving the data.
    2. I move the data inside the chip.
    3. Yes, I forgot to make the address in L2 global. Now the data is copied correctly. Thank you, lding!
    4. I do the measurement of time in CPU cycles in the following way:
        // Initialize the time-stamp counter (a write to TSCL starts it)
        TSCL = 0; TSCH = 0;
    
        // Compute the overhead of calling clock twice to get timing info
        t_start = _itoll(TSCH, TSCL);
        t_stop  = _itoll(TSCH, TSCL);
        t_overhead = t_stop - t_start;
    
        t_start = _itoll(TSCH, TSCL);
        EDMA_Result = EDMA3_DRV_enableTransfer (hEDMA, EDMA_chID, EDMA3_DRV_TRIG_MODE_MANUAL);
        EDMA_Result = EDMA3_DRV_waitAndClearTcc(hEDMA, EDMA_TCC);
        t_stop = _itoll(TSCH, TSCL);
        t_opt  = (t_stop - t_start) - t_overhead;

    Now 256 floating-point samples are copied in 1300 cycles, which is approximately 5 cycles per sample.

    The total copy speed is 4 (bytes) / 5 (cycles) * 1 GHz = 800 MB/s. Table 15 of the document "Throughput Performance Guide for C66x KeyStone Devices" states a speed of 10664 MB/s.

    Why do I get such a low EDMA speed?

  • In reply to Michael Yurkov:

    I made a project that copies 2x4096 samples in a floating-point format from one memory area to another and measures the copy speed. I run the code with the Blackhawk XDS560v2 Emulator on Evaluation Board TMDSEVM6678LE.
    With different combinations I got the following results:

    Source     Destination    Measured Speed, MB/s    Speed in Document, MB/s
    DDR3       MSMSRAM        8000                    10664
    MSMSRAM    DDR3           9000                    10664
    DDR3       L2SRAM         4985                    10664
    L2SRAM     DDR3           5033                    10664

    I estimated the copy speed as follows: speed[MB/s] = 2*4096*sizeof(float)*1000[MHz]/t_opt.
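
The estimate above can be written as a small helper. This is only a sketch of the same formula, assuming 1 MB = 10^6 bytes and a CPU clock given in MHz; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Throughput in MB/s (1 MB = 10^6 bytes) for `bytes` copied in `cycles`
 * CPU cycles at a clock of `cpu_mhz` MHz:
 *   bytes / (cycles / (cpu_mhz * 1e6)) / 1e6 = bytes * cpu_mhz / cycles */
double throughput_mb_per_s(uint64_t bytes, uint64_t cycles, double cpu_mhz)
{
    assert(cycles > 0u);
    return (double)bytes * cpu_mhz / (double)cycles;
}
```

With 2*4096 floats (32768 bytes) at 1000 MHz, a hypothetical t_opt of 4096 cycles corresponds to 8000 MB/s. The earlier 256-sample case (1024 bytes in 1300 cycles) gives roughly 788 MB/s.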

    My project is EDMA3.zip

    Why are the results very different from the results in the document "Throughput Performance Guide for C66x KeyStone Devices"?

    Why does the speed depend on the direction of the copy?

  • In reply to Michael Yurkov:

    Hi,

    How many parallel EDMA transfers are in flight in your test? One EDMA channel is not enough; you need at least 3 EDMA transfers in parallel to achieve the speed in the document.
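
One way to prepare several transfers in parallel is to split the buffer into chunks and submit each chunk on its own channel. The sketch below only shows the partitioning step. The `chunk_t` type, the function name, and the alignment parameter are hypothetical, and how channels are mapped to transfer controllers is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* One piece of a larger transfer: byte offset into the buffer and length in bytes. */
typedef struct {
    uint32_t offset;
    uint32_t length;
} chunk_t;

/* Split `total` bytes into `parts` chunks whose sizes are multiples of `align`
 * (except possibly the last one). `out` must have room for `parts` entries. */
void split_transfer(uint32_t total, uint32_t parts, uint32_t align, chunk_t *out)
{
    assert(parts > 0u && align > 0u && out != 0);
    uint32_t per = (total + parts - 1u) / parts;   /* ceil(total / parts) */
    per = (per + align - 1u) / align * align;      /* round up to the alignment */
    uint32_t offset = 0u;
    for (uint32_t i = 0u; i < parts; ++i) {
        uint32_t remaining = total - offset;
        uint32_t len = (remaining < per) ? remaining : per;
        out[i].offset = offset;
        out[i].length = len;
        offset += len;
    }
}
```

For the 32768-byte buffer used in the test above, three chunks with 64-byte alignment come out as 10944, 10944, and 10880 bytes.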

    What is your goal here: to duplicate the benchmark in the document, or just to get the best throughput for your 1-EDMA-channel application? You can also build the code with "-O3" optimization.

    Regards, Eric
