C66x L2 cache access performance in K2HEVM



Hello,
In my understanding, enabling the L2 cache should make DDR3 write access faster than running with the cache disabled.
But in my test case on the K2H EVM, the C66x is slower with the L2 cache enabled.

Question)

 - Is L2 cache access really slower on the C66x? Is that expected?

 - Do you have any workaround?

Our test case)

#include <string.h>               /* memset, memcpy */
#include <ti/csl/csl_cacheAux.h>  /* CACHE_* cache-control calls (KeyStone CSL) */

unsigned char* pDst  = (unsigned char*)(0x10880000); /* core 0 L2SRAM, global address */
unsigned char* pSrc  = (unsigned char*)(0x80000000); /* DDR3 */
unsigned char  value = 0xA5;                         /* arbitrary fill pattern */

memset(pSrc, value, 512*1024);    /* fill 512KB of DDR3 */
CACHE_wbInvL2Wait();

memcpy(pDst, pSrc, 512*1024);     /* copy 512KB from DDR3 into L2SRAM */
CACHE_wbInvL2Wait();

Other information)

I am also referring to the application note describing C6678 memory access performance.
In Figure 6 (pages 16-17), STDW access with the L2 cache enabled is slower than non-cacheable access.


Best regards, RY

  • RY,

    First, a comment on your test: it really should not overwrite the L2 cache space, but it does. The 32KB configured as L2 cache sits at the end of the L2SRAM address range, and your code writes the last half of the L2 space, which includes that last 32KB set up as cache (see the address breakdown below).
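
    For reference, the address arithmetic behind that comment (assuming core 0 of the K2H, whose 1MB of L2SRAM starts at global address 0x10800000, with the 32KB cache occupying the top of that range):

        0x10800000 - 0x108FFFFF : full 1MB L2SRAM of core 0 (global addresses)
        0x108F8000 - 0x108FFFFF : top 32KB, configured as L2 cache
        0x10880000 - 0x108FFFFF : pDst .. pDst + 512KB - 1 (the memcpy destination)

    So the memcpy destination overlaps the 32KB that the hardware is using as cache.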

    A more realistic test would be (a sketch of this flow follows below):

    1.) Set the L2 cache to 0 or 256KB (for the non-cached/cached scenarios).
    2.) Use only 256KB of data.
    3.) Write back and invalidate L2 before performing a memset; the memset is only there to initialize the DDR3 memory space, not part of the performance measurement.
    4.) Read via the CPU, modify, and write back to DDR3. This is more representative of a real memory access pattern, where cached data would eventually get written back - otherwise you would DMA the data into a local buffer instead of doing direct reads.
    5.) In the cache-enabled case, perform a writeback-invalidate at the end.
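
    A minimal sketch of that flow, assuming core 0 of a K2H EVM, the KeyStone CSL cache API from csl_cacheAux.h, and the C66x TSCL cycle counter (the buffer address, fill value, and function name are illustrative only; L1D and MAR settings are left as in the original test):

    #include <string.h>
    #include <c6x.h>                  /* TSCL cycle counter (TI compiler) */
    #include <ti/csl/csl_cacheAux.h>  /* CACHE_setL2Size, CACHE_wbInvAllL2 */

    #define BUF_BYTES  (256 * 1024)   /* step 2: only 256KB of data */

    static unsigned char *const ddrBuf = (unsigned char *)0x80000000; /* DDR3 */

    unsigned int run_test(int useCache)
    {
        unsigned int start, i;

        /* 1.) L2 cache = 0KB (non-cached case) or 256KB (cached case) */
        CACHE_setL2Size(useCache ? CACHE_256KCACHE : CACHE_0KCACHE);

        /* 3.) Initialize the DDR3 contents; this is setup, not the timed test */
        memset(ddrBuf, 0xA5, BUF_BYTES);
        CACHE_wbInvAllL2(CACHE_WAIT);

        /* 4.) Timed portion: CPU read, modify, write back to DDR3 */
        TSCL = 0;                     /* any write starts the free-running counter */
        start = TSCL;
        for (i = 0; i < BUF_BYTES; i++)
            ddrBuf[i] += 1;

        /* 5.) Cache-enabled case: write back/invalidate at the end so the cost
               of getting the final data out to DDR3 is included */
        if (useCache)
            CACHE_wbInvAllL2(CACHE_WAIT);

        return TSCL - start;
    }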

    Now, if you look at the numbers from that test, I'm sure they will tell a very different story, but one that is more realistic for a system.

    Here is where the benefit of the cache comes in: if you're accessing the data over and over, which is very common in DSP applications, you don't have to go back out to external memory to get it. Another point: often the data is never written back out at all. Only data that has been modified gets written back out during normal caching operations. Note that if another core uses pieces of that data, then those pieces (and only those) need to be written back and invalidated so that coherence between the cores is maintained (see the block-level example below).
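
    As an illustration of that last point, a minimal sketch of a block-level writeback-invalidate, assuming the csl_cacheAux.h API; sharedBuf, sharedLen, and the function name are hypothetical placeholders for the sub-buffer another core will read:

    #include <stdint.h>
    #include <ti/csl/csl_cacheAux.h>  /* CACHE_wbInvL2 (KeyStone CSL) */

    /* Flush only the cache lines covering the shared region, not all of L2,
       before signaling the other core that the data is ready. */
    static void publish_shared_region(void *sharedBuf, uint32_t sharedLen)
    {
        CACHE_wbInvL2(sharedBuf, sharedLen, CACHE_WAIT);
    }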

    The STDW numbers for cacheable data in that app note assume that the data is (1) resident in cache, (2) has been modified (or a writeback has been ordered), and (3) is being written back out.

    Best Regards,
    Chad