Number of cycles consumed in accessing DDR3 is too high on TMDXEVM6618LXE EVM

Hi,

I am trying to estimate the number of cycles consumed by DDR3 memory accesses, using the TMDXEVM6618LXE EVM. The following code snippet consumes approximately 52000 cycles on the EVM when both buffers are in DDR3, whereas on the simulator it consumes only 1023 cycles.

 

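PHYbuffers is a data section mapped to DDR3:

#include <string.h>
#include <c6x.h>                 /* TSCL: C66x time-stamp counter (TI compiler) */
#include <xdc/std.h>
#include <xdc/runtime/System.h>
/* cplx16_t (a 16-bit complex pair) is assumed to come from the project's own headers. */

#pragma DATA_SECTION (abc_test, "PHYbuffers");
static cplx16_t abc_test[1000];

#pragma DATA_SECTION (xyz_test, "PHYbuffers");
static cplx16_t xyz_test[1000];

int main(void)
{
    UInt32 t_start, t_stop, t_overhead, cold_cycle;

    /* Both buffers are the DDR3 globals declared above. */
    memset(xyz_test, 0, sizeof(cplx16_t) * 1000);

    TSCL = 0;                    /* writing TSCL starts the counter */

    t_start = TSCL;              /* cost of two back-to-back reads */
    t_stop  = TSCL;
    t_overhead = t_stop - t_start;

    t_start = TSCL;
    memcpy(abc_test, xyz_test, sizeof(cplx16_t) * 1000);
    t_stop = TSCL;

    cold_cycle = t_stop - t_start - t_overhead;

    System_printf("Total Cycles are : %d\n", cold_cycle);
    return 0;
}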
The output on the simulator is:
Total Cycles are : 1023
The output on the EVM is:
Total Cycles are : 52092
Are these results correct, or am I making a mistake in measuring the cycles?
Thanks,
Deepain Nayyar 

  • Hi Nayyar,

    A likely reason is that you selected "Functional Simulator" in your Target Configuration File. The functional simulator does not account for memory access latency.

    Try selecting "Cycle Approximate Simulator" instead; its result should be closer to the emulator's result.
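    To put the two measurements side by side, they can be reduced to cycles per byte. This is a minimal sketch with hypothetical helper names (elapsed_cycles, milli_cycles_per_byte), assuming cplx16_t is a pair of 16-bit integers, so the 1000-element copy moves 4000 bytes. Doing the subtraction in unsigned 32-bit arithmetic also keeps the result correct if TSCL wraps once between the two reads.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed cycles between two 32-bit timestamp reads, minus the measured
 * cost of the read pair itself. Unsigned subtraction is modulo 2^32, so a
 * single counter wrap between the reads does not corrupt the result. */
static uint32_t elapsed_cycles(uint32_t t_start, uint32_t t_stop,
                               uint32_t t_overhead)
{
    return (t_stop - t_start) - t_overhead;
}

/* Cycles per byte, scaled by 1000 to stay in integer arithmetic.
 * Assumption: 'bytes' is the total size copied (1000 * 4 bytes here). */
static uint32_t milli_cycles_per_byte(uint32_t cycles, uint32_t bytes)
{
    assert(bytes > 0);
    return (uint32_t)(((uint64_t)cycles * 1000u) / bytes);
}
```

    With the figures from this thread, the EVM run works out to about 13.0 cycles per byte and the functional-simulator run to about 0.26 cycles per byte, a gap of roughly 50 times that a simulator without memory-latency modeling cannot show.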

    Allen

  • I concur with Allen's comments.

    That said, I'm not sure why one would want to transfer a block from DDR to DDR using the CPU. This would normally be done with QDMA or EDMA, or, if it's a common task, possibly with the QMSS and descriptors: preset the transfer by defining the descriptor once, then drop it on the queue while the CPU keeps processing what it needs to.
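    The preset-and-queue flow described above can be modeled on a host to make the descriptor lifecycle concrete. This is a hypothetical sketch: desc_t, queue_t, q_push, q_pop and dma_service_one are illustrative names, not QMSS LLD APIs, and the "DMA" is just a function call. It shows a descriptor being popped from a free descriptor queue (FDQ), pushed to a TX queue, and returned to the FDQ by the engine once it is done, so the CPU only pays for the push.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QDEPTH 8

/* Illustrative descriptor: real QMSS descriptors have a defined layout. */
typedef struct {
    uint32_t src;    /* source address */
    uint32_t dst;    /* destination address */
    uint32_t bytes;  /* transfer length */
} desc_t;

/* Hypothetical hardware queue modeled as a fixed-size FIFO of pointers. */
typedef struct {
    desc_t *slot[QDEPTH];
    size_t head, count;
} queue_t;

/* Returns 0 on success, -1 if the queue is full. */
static int q_push(queue_t *q, desc_t *d)
{
    assert(q != NULL && d != NULL);
    if (q->count == QDEPTH) return -1;
    q->slot[(q->head + q->count) % QDEPTH] = d;
    q->count++;
    return 0;
}

/* Returns the oldest descriptor, or NULL if the queue is empty. */
static desc_t *q_pop(queue_t *q)
{
    desc_t *d;
    if (q->count == 0) return NULL;
    d = q->slot[q->head];
    q->head = (q->head + 1) % QDEPTH;
    q->count--;
    return d;
}

/* Stand-in for the DMA engine: take one descriptor from the TX queue,
 * "perform" the transfer, and recycle the descriptor to the FDQ.
 * Returns 1 if a descriptor was processed and recycled, 0 otherwise. */
static int dma_service_one(queue_t *tx, queue_t *fdq, uint32_t *moved)
{
    desc_t *d = q_pop(tx);
    if (d == NULL) return 0;
    *moved += d->bytes;
    return q_push(fdq, d) == 0;
}
```

    In a healthy system every descriptor taken from the FDQ eventually comes back to it; an FDQ that drains and stays empty means the engine is not completing or recycling descriptors.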

    Best Regards,
    Chad

  • Thanks Chad for the suggestion !!


    I have a query regarding the QMSS and descriptors.

    We initially allocated all the descriptors required for a coprocessor (say, FFTC) in DDR3, and we were able to push/pop the descriptors from the FDQ into the input queue multiple times.

    But when we move the descriptors to the MSMC region, it works the first time; on the next run the FDQ has no descriptors, i.e., the PKTDMA doesn't push the descriptors back to the FDQ after processing the data. The descriptor headers look correct, but the output from FFTC is incorrect even in the first run.

    Can you suggest what might be wrong?

    Thanks,

    Deepain Nayyar

  • Deepain,

    This issue could be related to caching. Do you have the cache turned on? If so, have you verified that all cached data has been flushed from the cache before pushing the descriptors onto the TX queue? A quick way to check this is to disable the cache and try your test again.

    To disable the cache, include these CSL header files:

    #include <ti/csl/csl_cache.h>

    #include <ti/csl/csl_cacheAux.h>

    Then, during the initialization of your program, call these CSL APIs to disable L1 and L2 cache:

    CACHE_setL1DSize(CACHE_L1_0KCACHE); 

    CACHE_setL2Size(CACHE_0KCACHE);

    Please let me know if disabling the cache resolves your issue.

    Regards,

    Derek

  • Derek,

    It worked after disabling the cache!

    We were using "Fftc_osalEndDescMemAccess()", but it's empty, so it was not flushing as expected :(.

    I have a couple of questions:

    1. Is there an API for flushing the cache, such as "FLUSH_CACHE"? Can you tell me about it?

    2. Is there any way to see whether a buffer/descriptor is cached, similar to an address translation? For example, if a buffer at address 0x0c0022a0 is cacheable, could its cached copy be viewed at some other location, say 0x008022a0, so that we would see 0x008022a0 being updated while 0x0c0022a0 is not?

    Thanks,
    Krishna

  • Krishna,

    We were using "Fftc_osalEndDescMemAccess()", but it's empty, so it was not flushing as expected :(.

    You can update the Fftc_osalEndDescMemAccess() function to add code to flush the cache as described below. In some cases, code to automatically flush the L1D is provided in the _osalEndDescMemAccess() function, and sometimes it is not. In either case, it is expected that the programmer will customize this function as required for their application.
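    A customized end-of-access hook has a simple shape: after the CPU finishes writing a descriptor, push that block out of the cache before the descriptor is queued. The sketch below is hypothetical: my_end_desc_mem_access and the function-pointer indirection are illustrative, and the exact OSAL hook name and signature in your FFTC LLD version may differ. On the DSP the cache operation would wrap a CSL call such as CACHE_wbInvL1d(ptr, size, CACHE_WAIT); here a recording stand-in takes its place so the logic can run on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Signature of a block cache operation (pointer, byte count). */
typedef void (*cache_block_op)(void *ptr, uint32_t size);

/* Host stand-in that records the request instead of touching a cache. */
static void    *g_last_ptr  = NULL;
static uint32_t g_last_size = 0;
static unsigned g_calls     = 0;

static void record_cache_op(void *ptr, uint32_t size)
{
    g_last_ptr = ptr;
    g_last_size = size;
    g_calls++;
}

/* On the target, this would point at a wrapper around CACHE_wbInvL1d. */
static cache_block_op g_end_access_op = record_cache_op;

/* Hypothetical "end descriptor memory access" hook: called after the CPU
 * has updated a descriptor and before the descriptor is pushed to a queue. */
static void my_end_desc_mem_access(void *desc, uint32_t size)
{
    assert(desc != NULL && size > 0);
    g_end_access_op(desc, size);
}
```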

    In response to your questions:

    1. Is there an API for flushing the cache, such as "FLUSH_CACHE"? Can you tell me about it?

    There are a couple of options for clearing data from the cache. The csl_cacheAux.h file provides APIs for flushing the L1D and L2 caches. Do you know whether you are using the L1D and L2 caches? By default, the L1D cache is turned on and the L2 cache is turned off; you have to turn on the L2 cache manually.

    There are many APIs you can use to write back data from the cache. I have attached the csl_cacheAux.h file for your reference; I suggest that you look through it and determine which API best suits your situation. The options range from writing back and invalidating the entire L2 and L1D caches (e.g., CACHE_wbInvAllL2()) to simply writing back, but not invalidating, specific blocks of memory (e.g., CACHE_wbL1d()). Please take a look at the APIs in the attached file, and let me know if you have any questions.
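    When writing back only specific blocks, keep in mind that the cache works on whole lines, so a block operation effectively covers every line the block touches. The helper below (a hypothetical name, not part of csl_cacheAux.h) computes that line-aligned span. The 64-byte L1D and 128-byte L2 line sizes used in the example are the C66x values; treat them as assumptions to confirm against your device documentation.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uintptr_t start;  /* first byte of the first touched line */
    uint32_t  bytes;  /* length covering all touched lines */
} line_range_t;

/* Round [addr, addr + size) out to whole cache lines.
 * line_size must be a nonzero power of two. */
static line_range_t cache_line_range(uintptr_t addr, uint32_t size,
                                     uint32_t line_size)
{
    line_range_t r;
    uintptr_t mask;

    assert(size > 0 && line_size != 0 && (line_size & (line_size - 1u)) == 0);
    mask = (uintptr_t)line_size - 1u;
    r.start = addr & ~mask;
    r.bytes = (uint32_t)(((addr + size + mask) & ~mask) - r.start);
    return r;
}
```

    For example, a 32-byte descriptor at 0x0c0022a0 falls in the 64-byte line starting at 0x0c002280, so anything else stored in that line is written back or invalidated along with it. Aligning descriptors and buffers to the line size avoids that side effect.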

     

    2. Is there any way to see whether a buffer/descriptor is cached, similar to an address translation? For example, if a buffer at address 0x0c0022a0 is cacheable, could its cached copy be viewed at some other location, say 0x008022a0, so that we would see 0x008022a0 being updated while 0x0c0022a0 is not?

    I am not sure whether there is a mechanism like the one you describe, but you should not really need to know whether an address is cached or not. Whenever the DSP accesses a cacheable block of memory, it is automatically cached (provided that the cache is enabled). So if the PKTDMA (or another master) and the DSP both access the same memory, then once the DSP has accessed that memory, you will typically want to write back and invalidate whatever data has been cached.

    Taking a descriptor as an example, you would:

    a. The DSP makes all updates required for the descriptor (e.g., set the packet size, buffer size, link buffer, etc.). These updates are stored in the cache.

    b. Write back and invalidate those lines in the cache, for example by calling void CACHE_wbInvL1d (void* blockPtr, Uint32 byteCnt, CACHE_Wait wait). Note that CACHE_invL1d() only invalidates: it discards the cached updates without writing them to memory.

    c. Now that the data has been written back to memory, the descriptor can be pushed onto the desired queue, and the PKTDMA will be able to access the correct data.
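    Why step b matters can be shown with a toy write-back cache. This is a host-side sketch with hypothetical names (model_t, dsp_read, dsp_write, cache_wb, cache_inv, cache_wbinv, dma_read, dma_write) that reduces the real L1D/L2 behavior to a single cached region; as a simplification it allocates on read misses only. Only the DSP goes through the cache, while the DMA master reads and writes memory directly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define REGION 16

typedef struct {
    uint8_t mem[REGION];   /* shared memory: what the PKTDMA sees */
    uint8_t line[REGION];  /* the DSP's cached copy */
    int valid;             /* cache holds a copy of the region */
    int dirty;             /* cached copy is newer than memory */
} model_t;

static uint8_t dsp_read(model_t *m, unsigned off)
{
    assert(off < REGION);
    if (!m->valid) {                       /* read miss: allocate */
        memcpy(m->line, m->mem, REGION);
        m->valid = 1;
        m->dirty = 0;
    }
    return m->line[off];
}

static void dsp_write(model_t *m, unsigned off, uint8_t v)
{
    assert(off < REGION);
    if (m->valid) {                        /* write hit: cache only */
        m->line[off] = v;
        m->dirty = 1;
    } else {                               /* write miss: to memory */
        m->mem[off] = v;
    }
}

static void cache_wb(model_t *m)           /* copy dirty data to memory */
{
    if (m->valid && m->dirty) {
        memcpy(m->mem, m->line, REGION);
        m->dirty = 0;
    }
}

static void cache_inv(model_t *m)          /* discard the cached copy */
{
    m->valid = 0;
    m->dirty = 0;
}

static void cache_wbinv(model_t *m)
{
    cache_wb(m);
    cache_inv(m);
}

static uint8_t dma_read(const model_t *m, unsigned off)
{
    assert(off < REGION);
    return m->mem[off];
}

static void dma_write(model_t *m, unsigned off, uint8_t v)
{
    assert(off < REGION);
    m->mem[off] = v;
}
```

    In this model, a DSP update to a cached descriptor is invisible to the DMA until it is written back, and invalidating without writing back throws the update away. The reverse direction works the same way: after a DMA master writes a buffer, the DSP keeps reading its stale copy until it invalidates, which is why writeback (before handing data to the PKTDMA) and invalidate (before reading its output) are typically both needed.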

     

    I hope this helps. Let me know if you have any more questions, or if this resolves your issue.

    Regards,

    Derek



  • Derek,

    Now, with a combination of cache flush (writeback) and cache invalidate, it's working.

    One more thing I want to clarify regarding viewing memory as shared memory or cache (correct me if I am wrong):
    1. In CCS, the memory browser has an option to select cache (L1D, L1P and L2). I am assuming that when we select cache, we see the data in cache memory, and when we don't, we see the data in shared memory.

    2. In short, any coprocessor will always read and write data from/to shared memory and not from/to the cache.

    Thanks,

    Krishna

  • Krishna,

    I am glad to hear that your code is working now. 

    In response to your questions:

    1. In CCS, the memory browser has an option to select cache (L1D, L1P and L2). I am assuming that when we select cache, we see the data in cache memory, and when we don't, we see the data in shared memory.

    Your understanding is correct.

    2. In short, any coprocessor will always read and write data from/to shared memory and not from/to the cache.

    Yes, you are correct; any coprocessor always writes/reads data directly to/from shared memory. This is not limited to the coprocessors; it also applies to the EDMA and other masters in the system. The DSPs are the only masters whose accesses are cached; accesses from all other masters do not cause caching.

    Regards,

    Derek