EDMA transfer from MSMC to DDR throughput

Hello,

I use EDMA 0 to transfer data from MSMC to DDR.

Data transferred = 4 MB

Number of cycles = 263953

Measured throughput = 14.80 GB/s!

I used the code provided by TI in "KeyStone DDR3 Initialization" (SPRABL2A) to initialize DDR3.

I'm really confused, because the bandwidth of DDR3 is only about 10 GB/s.
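
As a rough sanity check on the 10 GB/s figure, the theoretical peak of a DDR3 interface is the data rate multiplied by the bus width in bytes. The sketch below assumes DDR3-1333 (1333 MT/s) and a 64-bit data bus; the bus width and the function names `ddr3_peak_bytes_per_s` and `to_gib_per_s` are illustrative assumptions, not part of the original post.

```c
#include <assert.h>

/* Theoretical peak DDR bandwidth in bytes per second.
 * mega_transfers_per_s: data rate, e.g. 1333 for DDR3-1333 (assumed).
 * bus_width_bits:       data bus width, e.g. 64 (assumed). */
static double ddr3_peak_bytes_per_s(double mega_transfers_per_s,
                                    unsigned bus_width_bits)
{
    assert(bus_width_bits % 8u == 0u);
    return mega_transfers_per_s * 1e6 * (double)(bus_width_bits / 8u);
}

/* Convert bytes per second to GiB/s (1 GiB = 1024^3 bytes). */
static double to_gib_per_s(double bytes_per_s)
{
    return bytes_per_s / (1024.0 * 1024.0 * 1024.0);
}
```

Under these assumptions, `ddr3_peak_bytes_per_s(1333.0, 64)` is about 10.66e9 bytes/s, or roughly 9.93 GiB/s, so a sustained figure well above that suggests the cycle count or the assumed clock frequency is off.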

  • What was the DDR configured to operate at?

    What frequency is the core running at?

    What clock source did you use? Note that the 64-bit timers tick once for every 6 core clock cycles (they run off SYSCLK/6).

    Best Regards,
    Chad
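
To illustrate the timer note above: if a measurement is taken with a Timer64 rather than with TSCL/TSCH, the tick count has to be scaled by 6 before it is treated as core cycles. This is a minimal sketch; the macro and function names are hypothetical and are not CSL APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Per the reply above: the 64-bit timers run off SYSCLK/6,
 * i.e. one timer tick per 6 core clock cycles. */
#define TIMER64_CORE_CLOCKS_PER_TICK 6u

/* Convert a Timer64 tick count into core clock cycles. */
static uint64_t timer64_ticks_to_core_cycles(uint64_t ticks)
{
    /* Guard against overflow of the 64-bit product. */
    assert(ticks <= UINT64_MAX / TIMER64_CORE_CLOCKS_PER_TICK);
    return ticks * TIMER64_CORE_CLOCKS_PER_TICK;
}
```

Treating raw timer ticks as core cycles would overstate throughput by a factor of 6, which is why the clock source of the measurement matters.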

  • Hi Chad,

    Maybe there was a problem with the evaluation module, because I changed it and now get a result in the expected range, about 9 GB/s.

    I have an additional question, please:

    Could the DIP switches on the EVM (especially BOOT MODE) affect the throughput measurement?

    Sincerely,

    Youssef

  • Yes. The DIP switches that set the boot mode could select a boot mode that uses 3 of the bootmode pins to indicate what the input clock is. The PLLs are then programmed based on this value, so you could be underclocking the DSP relative to the DDR. That said, all my questions above still stand.

    Best Regards,

    Chad
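
The effect described above can be sketched numerically. If the PLL multiplier is chosen for an assumed input clock that differs from the real one, the core runs at a scaled frequency while the cycle counter still counts core cycles; converting those cycles with the nominal frequency then inflates the computed throughput. The function names and the idea of a single scale factor are simplifying assumptions for illustration, not the actual KeyStone PLL programming sequence.

```c
#include <assert.h>

/* Core frequency actually produced when the PLL settings were chosen
 * for assumed_ref_hz but the board supplies actual_ref_hz. */
static double actual_core_hz(double nominal_core_hz,
                             double assumed_ref_hz,
                             double actual_ref_hz)
{
    assert(assumed_ref_hz > 0.0);
    return nominal_core_hz * (actual_ref_hz / assumed_ref_hz);
}

/* Factor by which throughput is overstated when cycle counts are
 * converted using the nominal instead of the real core frequency. */
static double throughput_inflation(double nominal_core_hz,
                                   double real_core_hz)
{
    assert(real_core_hz > 0.0);
    return nominal_core_hz / real_core_hz;
}
```

As a hypothetical example, a core meant to run at 1 GHz from a 100 MHz reference but actually fed 50 MHz would run at 500 MHz, and a throughput computed with 1 GHz would be overstated by a factor of 2.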

  • - The DDR3 PLL is programmed to generate a 666.67 MHz clock (for DDR3-1333 operation).

    - The core is running at the default frequency of 1.0 GHz.

    - I use the internal clock of the EVM.

    I obtained different results between Core0 and the other cores (1 to 7)!

    Thanks Chad

    Best regards

    Youssef

  • Youssef,

    The EVM doesn't have an internal clock. 

    Are you using TSCL/TSCH or Timer64 timers? 

    Do you have a code snippet of what you're doing to make these measurements?

    What are the different results between the different cores?  And are they being kicked off together?

    Best Regards,
    Chad

  • Hi Chad,

    - I use TSCL/TSCH.

    - Here is the part of the code where I measure performance:

    /* Trigger channel */

    CSL_edma3HwChannelControl(hChannel,CSL_EDMA3_CMD_CHANNEL_SET,NULL);

    start_V = _CSL_tscRead ();

    /* Poll on IPR bit 0 */
    do {
        CSL_edma3GetHwStatus(hModule,CSL_EDMA3_QUERY_INTRPEND,&regionIntr);
    } while (!(regionIntr.intr & 0x1));

    final_V = _CSL_tscRead ();

    Before starting the EDMA transfer, I enable the TSC in the main function:

    void main (void)
    {
        cacheinit();
        init_DDR();
        _CSL_tscEnable();

        edma_MSMC_to_DDR(instNum, chanNum);

        Cycles_N = final_V - start_V;

        printf("%lld\n",Cycles_N);

        return;
    }


    - Results:

    I transferred 32 KB from MSMC to DDR.

    Using core0 I got 3329 cycles => 9.17 GB/s

    Using core1 I got 4307 cycles => 7.38 GB/s

    The method I used to calculate the transfer rate is:

    T.R = (Data_size * core_frequency) / (1024^3 * Cycles_number)

    - I noticed that the code executes faster on core0 than on the other cores.

    Thanks Chad and sorry for the long post.

    Best regards

    Youssef
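
The transfer-rate formula in the post above can be written as a small helper. This is a sketch only: the function name `transfer_rate_gib_s` is hypothetical, and the 1.0 GHz core frequency is the value stated in the thread. Because the formula divides by 1024^3, the result is in GiB/s.

```c
#include <assert.h>
#include <stdint.h>

/* T.R = (Data_size * core_frequency) / (1024^3 * Cycles_number)
 * data_size_bytes:   number of bytes moved by the transfer
 * core_frequency_hz: frequency of the clock that the cycle counter counts
 * cycles:            elapsed cycle count (final_V - start_V in the post) */
static double transfer_rate_gib_s(uint64_t data_size_bytes,
                                  double core_frequency_hz,
                                  uint64_t cycles)
{
    assert(cycles > 0);
    return ((double)data_size_bytes * core_frequency_hz)
           / (1024.0 * 1024.0 * 1024.0 * (double)cycles);
}
```

With 32 KB, 1.0 GHz, and 3329 cycles this gives about 9.17, matching the core0 figure; with 4 MB and 263953 cycles it gives about 14.80, matching the first post. For 4307 cycles the same formula gives about 7.09 rather than the 7.38 quoted, so one of those two numbers may have been mistyped.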


  • Thanks, do you know what EDMACC and TC you're using for each of these cores? 

    Please note that EDMA transfers are independent of the cores that spawn them.

    Best Regards,
    Chad

  • - I use EDMACC0 and TC0 to get a burst size of 128 B; the result is mentioned above.

    - I also used EDMACC1 and EDMACC2 with TC0 to get a 128 B burst size for both instances. They perform worse than EDMACC0, but the difference between the cores still persists.

    Thanks Chad.

    Best regards

    Youssef

  • Youssef,

    There must be some difference between your Core0 setup of the transfer and your Core1 setup of the transfer. As I mentioned, the EDMA is completely outside of the core, and no information about which core triggers an EDMA transfer is used; your data is also outside of the core. The only things the cores do here are program the PaRAMs, record the TSCL/TSCH values, and wait for a return.

    Best Regards,

    Chad