Hi All
Table 1-5 of the TMS320C66x DSP Cache User Guide gives performance data for L1D cache misses.
I wrote a program to test the miss pipeline.
In the first test, I put a 4 KB data buffer in DDR3 and ran this loop over it:
unsigned char buffer[0x1000];
for (i = 1; i < 0x1000; i++)
    buffer[i] = (buffer[i-1] + i) & 0xfe;
In the second test, I called the touch function before the same for loop, to get parallel (pipelined) read misses:
touch(buffer, 0x1000);
for (i = 1; i < 0x1000; i++)
    buffer[i] = (buffer[i-1] + i) & 0xfe;
The second test takes more cycles to execute than the first.
How should I use the cache miss pipeline to reduce the miss stall cycles?
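To make the two tests easier to reproduce, here is a rough portable C sketch of them. It is only an illustration under assumptions: touch_sketch is a hypothetical stand-in for touch (not TI's actual routine), it assumes a 64-byte L1D line, and fill_dependent is just my loop wrapped in a function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte L1D cache line, as on the C66x; adjust for other targets. */
#define L1D_LINE_SIZE 64u

/* Hypothetical stand-in for touch(): reads one byte from every cache line
 * covered by [array, array + length), so each line is requested once.
 * The reads go through a volatile pointer and are summed so the compiler
 * cannot discard them. */
unsigned touch_sketch(const void *array, size_t length)
{
    const volatile uint8_t *p = (const volatile uint8_t *)array;
    unsigned sum = 0;
    size_t i;

    assert(array != NULL || length == 0);

    for (i = 0; i < length; i += L1D_LINE_SIZE)
        sum += p[i];

    /* Also read the last byte, in case the buffer is not line-aligned
     * and its final line was not reached by the strided loop. */
    if (length > 0)
        sum += p[length - 1];

    return sum;
}

/* The loop from the post: every element depends on the previous one. */
void fill_dependent(unsigned char *buffer, size_t n)
{
    size_t i;

    for (i = 1; i < n; i++)
        buffer[i] = (unsigned char)((buffer[i - 1] + i) & 0xfe);
}
```

The first test is fill_dependent(buffer, 0x1000) alone; the second is touch_sketch(buffer, 0x1000) followed by fill_dependent(buffer, 0x1000). On the C66x, one way to count cycles would be to read the TSCL time stamp counter (declared in c6x.h) before and after each variant, starting from a cold cache both times.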
Thank you!