Hi,
I've written a C implementation of an algorithm on the C6678. The inputs and outputs are allocated in L1D (the L1D cache size is set to 0K, so all of L1D is used as SRAM). However, I'm seeing 9000 cycles of L1D stalls, which is 25% of the total cycle count.
Are stalls expected when the data is in L1D, or am I missing something?
Compiler options: optimization level 3; optimize for speed 5.
What could be causing this? Any help would be appreciated.
Thanks !