Hi, my name is Peterson. I have a basic question about how the processor accesses memory in cache-enabled versus cache-disabled scenarios.
In SPRU656A (the TMS320C6000 DSP Cache User's Guide), Chapter 1, page 1-4, the cache-hit/miss scenario is described.
It says that when the cache is enabled, the DSP brings data from slower off-chip memory into the faster on-chip L1/L2 caches, based on the principle of locality. This gives us faster access to the data we need.
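For context, this is roughly how we turn the cache on in our code. It is a minimal sketch from memory of the Chip Support Library (CSL) cache API; the exact function and constant names (CACHE_setL2Mode, CACHE_64KCACHE, CACHE_EMIFA_CE00) vary by device and CSL version, so please treat them as placeholders:

    #include <csl.h>
    #include <csl_cache.h>

    void my_cache_init(void)   /* hypothetical helper, not from the user guide */
    {
        CSL_init();                             /* initialize the CSL */
        CACHE_setL2Mode(CACHE_64KCACHE);        /* configure part of L2 SRAM as cache */
        CACHE_enableCaching(CACHE_EMIFA_CE00);  /* set the MAR bit so the external CE0 space is cacheable */
    }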
However, when we review the assembly (.asm) file that Code Composer Studio generates for our C/C++ code (using the -k option), we see that memory reads/writes through LDW, LDDW, STW, LDBU, etc. are still scheduled with the same fixed latency: a load always takes 4 cycles (its 4 delay slots) before the data can be used.
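In other words, the generated .asm always looks something like the following, whether the cache is on or off (a hand-written illustration; the registers A4/A5/A6 and the unit assignments are arbitrary):

    LDW   .D1   *A4, A5   ; issue the load from the address held in A4
    NOP   4               ; 4 delay slots: A5 does not hold the data yet
    MV    .S1   A5, A6    ; A5 is usable only here, 5 cycles after issue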
Shouldn't they take fewer cycles to read the required data, since it is now in on-chip memory?
What's more, even if the cache is not enabled, these memory accesses still take 4 cycles to complete. Similarly, when we write our own assembly code, we always have to wait the same 4 cycles (as in the snippet above) before the required data is available in the destination CPU register.
So what actually makes the difference between cached and non-cached memory accesses? Cache-enabled applications are clearly much faster and deliver more performance than non-cached ones.
What's the magic behind it?