Cache optimization on DM6446?

balanceren

My question is about cache behavior on the DM6446.
The same code takes 2 ms on the DM642 platform but 6-7 ms on the DM6446.
//example code
int len = 352*288;
char * restrict img1 = malloc(len);
char * restrict img2 = malloc(len);
char * restrict img3 = malloc(len);
char * restrict img4 = malloc(len);
char * restrict img5 = malloc(len);
char * restrict img6 = malloc(len);
char * restrict img7 = malloc(len);
char * restrict img8 = malloc(len);

for(int i=0;i<len;i+=4)
{
    /*****
    some code

    ******/
    img1 +=4;
    img2 +=4;
    img3 +=4;
    img4 +=4;
    img5 +=4;
    img6 +=4;
    img7 +=4;
    img8 +=4;
}

I checked the asm code, and the software pipelining is OK.
Referring to spru862b.pdf, I think there are many cache misses caused by "capacity misses".
When an L1 miss occurs, a 128-byte line is read from the L2 cache, and the L2 cache in turn reads the line from DDR.

I changed the L1D and L2 caches to smaller sizes, but the run time didn't change.

I do not know how to make use of the bigger L2 cache. If the data were already in the L2 cache every time an L1 read miss occurred, it should be more efficient.

But I don't know how to achieve that.
Because there are many buffers to process, I don't think a ping-pong buffer is a good solution.
Can somebody help me, or is there sample code for this kind of problem?
I only want it to run as fast on the DM6446 as it does on the DM642.

over 13 years ago

0 Cesar over 13 years ago

TI__Guru*** 138329 points

The DM642 has 256K L2 Memory/Cache whereas the DM6446 has 64K L2 Memory/Cache.

Memory intensive algorithms will be slower on DM6446.

Please check whether you can reorganize your data structures to reduce the capacity misses.
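To put rough numbers on the capacity argument, here is a minimal sketch that compares the bytes touched in one pass over eight 352x288 images with the two L2 sizes quoted above. The helper names `working_set_bytes` and `fits_in_cache` are hypothetical, chosen only for illustration.

```c
#include <assert.h>

/* L2 sizes quoted in the reply above, in bytes. */
#define L2_BYTES_DM642  (256UL * 1024UL)
#define L2_BYTES_DM6446 (64UL * 1024UL)

/* Total bytes touched by one pass over `count` images of
 * `width` x `height` pixels, one byte per pixel. */
static unsigned long working_set_bytes(unsigned long width,
                                       unsigned long height,
                                       unsigned long count)
{
    return width * height * count;
}

/* Returns 1 if the whole working set fits in a cache of the given
 * size, 0 otherwise. Associativity effects are ignored here. */
static int fits_in_cache(unsigned long working_set, unsigned long cache_bytes)
{
    assert(cache_bytes > 0);
    return working_set <= cache_bytes;
}
```

For the eight images in the question, `working_set_bytes(352, 288, 8)` is 811008 bytes, which is larger than either L2 size, so by this arithmetic a single streaming pass cannot be held entirely in L2 on either device.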

0 balanceren over 13 years ago in reply to Cesar

Prodigy 240 points

Thanks, Cesar.

I changed the L2 cache from 64K to 32K and the run time didn't change, so I think the cache size is not the key factor.

I think I am not using the L2 cache effectively.

When an L1 read miss occurs, the data is not in the L2 cache either, so an L2 read miss occurs as well.

If, whenever a 128-byte line is loaded into the L1 cache, more than 128 bytes of data were loaded into the L2 cache, the number of L2 read misses could be reduced.

Is this a basic L2 cache property that can be configured?

0 Cesar over 13 years ago in reply to balanceren

TI__Guru*** 138329 points

The L1D Cache line size is 64 bytes and the L2 line size is 128 bytes. These values are fixed and can not be set. L2 cache is 4 way set associative.
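As a rough illustration of what the fixed geometry implies, the sketch below computes which L2 set a byte address falls into, using the 128-byte line size and 4-way associativity stated above. The cache size passed in is an assumption (for example, all 64K of DM6446 L2 configured as cache), and the function names `l2_num_sets` and `l2_set_index` are hypothetical.

```c
#include <assert.h>

#define L2_LINE_BYTES 128u  /* L2 line size from the reply above */
#define L2_WAYS       4u    /* L2 associativity from the reply above */

/* Number of sets = cache size / (line size * number of ways).
 * cache_bytes is an assumption about how much L2 is set up as cache. */
static unsigned l2_num_sets(unsigned cache_bytes)
{
    assert(cache_bytes >= L2_LINE_BYTES * L2_WAYS);
    return cache_bytes / (L2_LINE_BYTES * L2_WAYS);
}

/* Set index that a byte address maps to: line number modulo set count. */
static unsigned l2_set_index(unsigned long addr, unsigned cache_bytes)
{
    return (unsigned)((addr / L2_LINE_BYTES) % l2_num_sets(cache_bytes));
}
```

With 64K of cache this gives 128 sets. If the eight buffers were placed 352*288 bytes apart starting at a 128-byte-aligned base (a hypothetical placement), their starting addresses would land in eight different sets, which suggests conflict misses are not the main cost of a streaming pass.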

If possible, please modify your application to optimize the cache usage.

0 balanceren over 13 years ago in reply to Cesar

Prodigy 240 points

Thanks, Cesar.

I optimized my code as shown below and ran a simple test.

The eight images are stored interleaved in 64-byte chunks, like this:

|---64bytes img0------||---64bytes img1------||---64bytes img2------||---64bytes img3------|
ptotal = (unsigned char *)memalign(128, 352*288*8); /* memalign(alignment, size) */

void testspeed(unsigned char * _pbgTotal)
{
    /* uchar is assumed to be a typedef for unsigned char. */
    /* Interleaved layout: each image owns one 64-byte chunk of every 512-byte block. */
    uchar * restrict pimg0 = _pbgTotal;
    uchar * restrict pimg1 = pimg0+64;
    uchar * restrict pimg2 = pimg0+64*2;
    uchar * restrict pimg3 = pimg0+64*3;

    uchar * restrict pimg4 = pimg0+64*4;
    uchar * restrict pimg5 = pimg0+64*5;
    uchar * restrict pimg6 = pimg0+64*6;
    uchar * restrict pimg7 = pimg0+64*7;

    int i,j;
    int len = 352*288;      /* bytes per image */
    int steplen = 64;       /* bytes of one image in each block */
    int deltastep = 64*7;   /* skip over the other seven images' chunks */
    int addone = 1<<24 | 1<<16 | 1<<8 | 1;  /* 0x01010101: adds 1 to each packed byte */

    for(j=0;j<(len/steplen);j++)
    {
        /* Process one 64-byte chunk of every image, 4 bytes at a time. */
        for(i = 0;i<steplen;i+=4)
        {
            _amem4(pimg0) = _add4(_amem4_const(pimg0),addone);
            _amem4(pimg1) = _add4(_amem4_const(pimg1),addone);
            _amem4(pimg2) = _add4(_amem4_const(pimg2),addone);
            _amem4(pimg3) = _add4(_amem4_const(pimg3),addone);

            _amem4(pimg4) = _add4(_amem4_const(pimg4),addone);
            _amem4(pimg5) = _add4(_amem4_const(pimg5),addone);
            _amem4(pimg6) = _add4(_amem4_const(pimg6),addone);
            _amem4(pimg7) = _add4(_amem4_const(pimg7),addone);

            pimg0+=4;pimg1+=4;pimg2+=4;pimg3+=4;
            pimg4+=4;pimg5+=4;pimg6+=4;pimg7+=4;
        }
        /* Jump to the same image's chunk in the next 512-byte block. */
        pimg0+=deltastep;pimg1+=deltastep;pimg2+=deltastep;pimg3+=deltastep;
        pimg4+=deltastep;pimg5+=deltastep;pimg6+=deltastep;pimg7+=deltastep;
    }
}
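The pointer arithmetic in testspeed can be read as the following address mapping, shown here as a minimal host-side sketch. The function name `interleaved_offset` and the two macros are hypothetical; the layout (eight images, 64-byte chunks, 512-byte blocks) is taken from the code above.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_BYTES 64u  /* bytes of one image per block, as in testspeed */
#define NUM_IMAGES  8u   /* images interleaved in each block */

/* Byte offset, inside the interleaved buffer, of byte `pixel`
 * of image number `img` (0..7). */
static size_t interleaved_offset(unsigned img, size_t pixel)
{
    size_t block;   /* which 512-byte block the pixel lives in */
    size_t within;  /* position inside the image's 64-byte chunk */

    assert(img < NUM_IMAGES);
    block  = pixel / CHUNK_BYTES;
    within = pixel % CHUNK_BYTES;
    return block * (CHUNK_BYTES * NUM_IMAGES) + img * CHUNK_BYTES + within;
}
```

For example, byte 0 of image 1 sits at offset 64, and byte 64 of image 0 sits at offset 512, which matches the `deltastep = 64*7` jump after each inner loop.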

I think conflict misses and capacity misses no longer occur with this layout, but compulsory misses can't be avoided.

The total size of the eight images is (352*288*8) bytes, so there will be (352*288*8)/128 compulsory misses.
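The estimate above can be checked with a small sketch. The helper name `line_fills` is hypothetical; the line sizes are the ones quoted earlier in the thread (128 bytes for L2, 64 bytes for L1D), and the count assumes every line is fetched exactly once.

```c
#include <assert.h>

/* Number of cache lines needed to cover total_bytes, rounding up.
 * If each line is fetched exactly once, this is also the number of
 * compulsory misses at that cache level. */
static unsigned long line_fills(unsigned long total_bytes,
                                unsigned long line_bytes)
{
    assert(line_bytes != 0);
    return (total_bytes + line_bytes - 1) / line_bytes;
}
```

With 352*288*8 = 811008 bytes, `line_fills(811008, 128)` gives 6336 L2 line fills, and `line_fills(811008, 64)` gives 12672 L1D line fills.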

The original code of my algorithm looks like testspeed2 below. I think the two functions, testspeed and testspeed2, are similar.

I tested both functions on the DM642 and the DM6446. The result is interesting.

             DM642     DM6446
testspeed    4-5 ms    4-5 ms
testspeed2   1 ms      4-5 ms

So why does testspeed2 run so fast on the DM642?

void testspeed2(unsigned char * _pbgTotal)
{
    /* Planar layout: the eight images are stored one after another. */
    int len = 352*288;      /* bytes per image */
    uchar * restrict pimg0 = _pbgTotal;
    uchar * restrict pimg1 = pimg0+len;
    uchar * restrict pimg2 = pimg0+len*2;
    uchar * restrict pimg3 = pimg0+len*3;

    uchar * restrict pimg4 = pimg0+len*4;
    uchar * restrict pimg5 = pimg0+len*5;
    uchar * restrict pimg6 = pimg0+len*6;
    uchar * restrict pimg7 = pimg0+len*7;

    int i,j;

    int steplen = 64;
    int deltastep = 64*7;   /* not used in this version */

    int addone = 1<<24 | 1<<16 | 1<<8 | 1;  /* 0x01010101: adds 1 to each packed byte */

    for(j=0;j<(len/steplen);j++)
    {
        for(i = 0;i<steplen;i+=4)
        {
            _amem4(pimg0) = _add4(_amem4_const(pimg0),addone);
            _amem4(pimg1) = _add4(_amem4_const(pimg1),addone);
            _amem4(pimg2) = _add4(_amem4_const(pimg2),addone);
            _amem4(pimg3) = _add4(_amem4_const(pimg3),addone);

            _amem4(pimg4) = _add4(_amem4_const(pimg4),addone);
            _amem4(pimg5) = _add4(_amem4_const(pimg5),addone);
            _amem4(pimg6) = _add4(_amem4_const(pimg6),addone);
            _amem4(pimg7) = _add4(_amem4_const(pimg7),addone);

            pimg0+=4;
            pimg1+=4;pimg2+=4;pimg3+=4;
            pimg4+=4;pimg5+=4;pimg6+=4;pimg7+=4;
        }
    }
}

0 Mostafa El-Hashash over 9 years ago in reply to balanceren

Intellectual 395 points

Hello,
Did you find out how to enable the cache and fix this? I am facing the same problem.
Best regards

0 balanceren over 9 years ago in reply to Mostafa El-Hashash

Prodigy 240 points

The cache works fine.
The DM642 has a timer bug, so it could not report the correct elapsed time.

0 Mostafa El-Hashash over 9 years ago in reply to balanceren

Intellectual 395 points

Thanks for replying.

I am asking about enabling the cache on the DM6446.

The L2 is set to 64K in the tcf file.

Is that enough to enable the cache, or should I do something else?

Regards
