Hi.
RandyP: If you are out there, I desperately need your help with this one. Anyone else is also welcome to assist.
I am trying to copy a frame of (32 x 16384) unsigned 16-bit values from one location in external DDR to another using the C64x+ core of the C6474 DSP. The copy is slow, which I assume is due to the many cache misses incurred when accessing external (L3) memory. I have compared the results with the EDMA3, and currently the EDMA3 is performing 20 TIMES faster!! However, I run into source and destination BIDX overflow issues with the EDMA3 when the size of the matrix is increased any further.
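To put numbers on the sizes involved, here is a minimal sketch in plain C. The helper names (frame_bytes, fits_bidx) are made up for illustration, and it assumes that SRCBIDX/DSTBIDX are signed 16-bit byte offsets in the EDMA3 PaRAM set. How the frame actually maps onto ACNT/BCNT/BIDX in my transfer is not shown here.

```c
#include <stdint.h>

/* Total size in bytes of a frame of unsigned 16-bit values. */
static uint32_t frame_bytes(uint32_t dim_a, uint32_t dim_b)
{
    return dim_a * dim_b * (uint32_t)sizeof(uint16_t);
}

/* Returns 1 if a byte stride fits a signed 16-bit index field, else 0.
 * Assumption: SRCBIDX/DSTBIDX are signed 16-bit, i.e. -32768..32767. */
static int fits_bidx(int32_t stride_bytes)
{
    return stride_bytes >= INT16_MIN && stride_bytes <= INT16_MAX;
}
```

For the frame above, frame_bytes(32, 16384) gives 1048576 bytes (1 MB). A stride of 32766 bytes passes fits_bidx, while a stride of 32768 bytes (16384 values of 2 bytes each) does not.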
The default L1D cache size for the C6474 is 32 KB. I have tried to disable the cache before the copy by setting the L1D cache size to 0 with the CSL call CACHE_setL1dSize(CACHE_L1_0KCACHE), and restoring it to 32 KB afterwards. I was hoping this would force the core to read and write directly to the DDR, bypassing the cache, avoiding the cache-miss penalties and so running faster.
However, I always get the same copy speed regardless of the cache size setting, and it is pretty poor compared to the EDMA3.
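The sequence I am describing looks roughly like the sketch below. It is not my actual project code: the enum and the CACHE_setL1dSize body are host-side stand-ins so that it compiles off-target, the enum values are placeholders, and it assumes the CSL call returns the previous size. On the DSP the real declarations come from the CSL cache header and may differ. copy_frame and copy_frame_l1d_off are names invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-side stand-ins (assumptions) for the CSL cache API. */
typedef enum { CACHE_L1_0KCACHE = 0, CACHE_L1_32KCACHE = 4 } CACHE_L1Size;

static CACHE_L1Size g_l1d_size = CACHE_L1_32KCACHE; /* simulated L1D setting */

static CACHE_L1Size CACHE_setL1dSize(CACHE_L1Size new_size)
{
    CACHE_L1Size old_size = g_l1d_size;
    g_l1d_size = new_size;
    return old_size; /* assumption: the call returns the previous size */
}

/* Plain element-by-element copy of unsigned 16-bit values by the core. */
static void copy_frame(uint16_t *dst, const uint16_t *src, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) {
        dst[i] = src[i];
    }
}

/* Set L1D to 0 KB, copy, then restore the previous L1D size. */
static void copy_frame_l1d_off(uint16_t *dst, const uint16_t *src, size_t count)
{
    CACHE_L1Size saved = CACHE_setL1dSize(CACHE_L1_0KCACHE);
    copy_frame(dst, src, count);
    (void)CACHE_setL1dSize(saved);
}
```

Note that this only touches the L1D setting. It does not change the L2 cache size or whether the DDR region is marked cacheable.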
I have two questions that I would like answered:
- Am I following the right approach in trying to avoid cache misses, or is there some other way to achieve this?
- Will the EDMA3 ALWAYS outperform the core for this kind of copy, and why?
Thank you for the help!
Estian.