The question is about the cache on the DM6446.
The same code takes 2 ms on the DM642 platform but 6-7 ms on the DM6446.
// example code
#include <stdlib.h>

int len = 352*288;                   // one 352x288 frame, 1 byte per pixel
char * restrict img1 = malloc(len);
char * restrict img2 = malloc(len);
char * restrict img3 = malloc(len);
char * restrict img4 = malloc(len);
char * restrict img5 = malloc(len);
char * restrict img6 = malloc(len);
char * restrict img7 = malloc(len);
char * restrict img8 = malloc(len);
// note: the pointers are advanced in place below, so keep copies of the
// original values if the buffers have to be freed later
for (int i = 0; i < len; i += 4)
{
    /*****
    some code
    ******/
    img1 += 4;
    img2 += 4;
    img3 += 4;
    img4 += 4;
    img5 += 4;
    img6 += 4;
    img7 += 4;
    img8 += 4;
}
I checked the generated assembly, and the loop is software-pipelined correctly.
Based on spru862b.pdf, I think many of the cache misses are "capacity misses".
When an L1D miss occurs, the line is read from the L2 cache, and if it is not in L2 either, L2 first reads its 128-byte line from DDR.
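A quick back-of-the-envelope check of how much data one pass moves, assuming the eight 352x288 byte buffers from the example and the 128-byte L2 line size mentioned above. The function names are hypothetical helpers for this sketch only.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes touched by one pass over num_bufs frames of width x height bytes. */
size_t pass_footprint(size_t width, size_t height, size_t num_bufs)
{
    return width * height * num_bufs;
}

/* Number of distinct cache lines (of line_bytes each) one pass pulls in,
 * assuming every buffer starts on a line boundary (an assumption). */
size_t line_fills(size_t width, size_t height, size_t num_bufs,
                  size_t line_bytes)
{
    assert(line_bytes > 0);
    size_t per_buf = (width * height + line_bytes - 1) / line_bytes;
    return per_buf * num_bufs;
}
```

For the example, `pass_footprint(352, 288, 8)` is 811008 bytes and `line_fills(352, 288, 8, 128)` is 6336 L2 line fills per pass. Roughly 792 KB per pass is more than the on-chip L2 of either DSP, so every one of those lines has to come from DDR at least once.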
I tried configuring L1D and L2 to smaller sizes, but the execution time did not change.
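One possible explanation for the unchanged timing: if the loop reads each byte only once per call, every line is a first-touch (compulsory) miss, and cache size does not change how many of those there are. The toy model below illustrates this under strong simplifying assumptions: a direct-mapped cache (the real C64x+ L1D and L2 are set-associative), buffers placed back to back starting at address 0, and a hypothetical function name. It is not a model of the actual DM6446 memory system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy direct-mapped cache: counts misses for one pass of the example loop,
 * where each iteration reads 4 bytes from each of num_bufs buffers.
 * Simplification: buffers sit back to back in memory from address 0. */
long simulate_stream_misses(size_t len, size_t num_bufs,
                            size_t cache_bytes, size_t line_bytes)
{
    size_t num_sets = cache_bytes / line_bytes;
    assert(num_sets > 0);
    long *tags = malloc(num_sets * sizeof *tags);
    assert(tags != NULL);
    for (size_t s = 0; s < num_sets; s++)
        tags[s] = -1;                          /* -1 marks an empty set */

    long misses = 0;
    for (size_t i = 0; i < len; i += 4) {
        for (size_t k = 0; k < num_bufs; k++) {
            size_t addr = k * len + i;         /* byte address in buffer k */
            long line = (long)(addr / line_bytes);
            size_t set = (size_t)line % num_sets;
            if (tags[set] != line) {           /* miss: fetch and replace */
                tags[set] = line;
                misses++;
            }
        }
    }
    free(tags);
    return misses;
}
```

With `len = 352*288`, 8 buffers and 128-byte lines, this toy model reports 6336 misses for a 16 KB, 64 KB, or even 1 MB cache: each line is fetched once and never reused. In a pattern like this, only overlapping the DDR fetches with computation (prefetching or DMA) can hide the miss cost, not a bigger cache.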
I don't know how to take advantage of the larger L2 cache. If the data were already in L2 whenever an L1D read miss occurred, the loop should run much faster,
but I don't know how to achieve that.
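One generic way to try to get lines into cache before the compute loop needs them is to split the loop into strips and "touch" one byte per cache line of the next strip of every buffer first. This is a sketch under assumptions: the helper names, the 128-byte line, and the cache budget are hypothetical, and a touch loop does not reduce DDR traffic. It can only help if the hardware overlaps the back-to-back misses of the touch loads better than it overlaps the misses in the compute loop, which should be checked against the cache user's guide and measured on the board.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest strip (a multiple of line_bytes) such that one
 * strip from each of num_bufs buffers fits in cache_budget bytes. */
size_t strip_bytes(size_t cache_budget, size_t num_bufs, size_t line_bytes)
{
    assert(num_bufs > 0 && line_bytes > 0);
    return (cache_budget / num_bufs / line_bytes) * line_bytes;
}

/* Hypothetical "touch" helper: load one byte per cache line of
 * [p, p + bytes). The volatile access keeps the compiler from dropping the
 * loads; the returned sum exists only so the loaded values are used. */
unsigned touch_lines(const unsigned char *p, size_t bytes, size_t line_bytes)
{
    const volatile unsigned char *v = p;
    unsigned sum = 0;
    for (size_t off = 0; off < bytes; off += line_bytes)
        sum += v[off];
    return sum;
}

/* Sketch of how the helpers could wrap the original loop (128-byte lines
 * assumed). The compute step is left as a placeholder. */
void process_in_strips(unsigned char *const bufs[], size_t num_bufs,
                       size_t len, size_t strip)
{
    assert(strip > 0);
    for (size_t start = 0; start < len; start += strip) {
        size_t n = (len - start < strip) ? len - start : strip;
        for (size_t k = 0; k < num_bufs; k++)
            (void)touch_lines(bufs[k] + start, n, 128);
        /* ... original inner loop over bytes [start, start + n) ... */
    }
}
```

For example, with a hypothetical 32 KB budget and 8 buffers, `strip_bytes(32768, 8, 128)` gives 4096-byte strips.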
Because there are many buffers to process, I don't think ping-pong buffering is a good option.
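For reference, ping-pong buffering with many inputs does not need a separate pair of buffers to manage per input: each stage can hold one strip of every buffer, so there are still only two stages to alternate between. This is a sketch under assumptions: the strip size, the staging arrays, and all function names are hypothetical, and `memcpy` stands in for an EDMA transfer. Because `memcpy` is synchronous, this version shows the data flow but not the copy/compute overlap that makes ping-pong worthwhile on the DSP. The two stages take 2 x 8 x 2048 = 32 KB, which on the device would have to fit in the part of L2 configured as SRAM.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_BUFS 8
#define STRIP    2048   /* bytes per buffer per strip (assumption) */

/* Two staging areas ("ping" and "pong"), each holding one strip of every
 * buffer. On the DSP these would be placed in L2 SRAM; here they are plain
 * static arrays. */
static unsigned char stage[2][NUM_BUFS][STRIP];

/* Stand-in for a DMA transfer: real code would start it asynchronously so
 * the copy overlaps with processing of the other stage. */
static void copy_in(unsigned char dst[NUM_BUFS][STRIP],
                    const unsigned char *const src[NUM_BUFS],
                    size_t offset, size_t n)
{
    for (size_t k = 0; k < NUM_BUFS; k++)
        memcpy(dst[k], src[k] + offset, n);
}

/* Hypothetical kernel standing in for "some code": sums every byte of the
 * strip across all buffers. */
static unsigned long kernel(unsigned char s[NUM_BUFS][STRIP], size_t n)
{
    unsigned long acc = 0;
    for (size_t k = 0; k < NUM_BUFS; k++)
        for (size_t i = 0; i < n; i++)
            acc += s[k][i];
    return acc;
}

unsigned long process_ping_pong(const unsigned char *const src[NUM_BUFS],
                                size_t len)
{
    unsigned long acc = 0;
    int cur = 0;
    size_t n = len < STRIP ? len : STRIP;
    copy_in(stage[cur], src, 0, n);              /* prime the first stage */
    for (size_t off = 0; off < len; off += STRIP) {
        size_t next_off = off + STRIP;
        size_t next_n = 0;
        if (next_off < len) {                    /* load the next strip */
            next_n = (len - next_off < STRIP) ? len - next_off : STRIP;
            copy_in(stage[cur ^ 1], src, next_off, next_n);
        }
        acc += kernel(stage[cur], n);            /* process the current strip */
        cur ^= 1;                                /* swap ping and pong */
        n = next_n;
    }
    return acc;
}
```

Adding more input buffers only grows the arrays inside each stage; the swap logic stays the same two-stage loop.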
Can somebody help me, or is there sample code for this kind of problem?
I just want it to run as fast on the DM6446 as it does on the DM642.