TMS320C6678: C6678 image processing code runs very slowly in shared memory

Shine


Part Number: TMS320C6678

Hi Champs,

The code below runs very slowly in shared memory:

unsigned short *IN1,*IN2,*IN3,*IN4,*IN5,*IN6,*IN7,*IN8,*IN9,*IN10,*IN11;

/* One pointer per input row; each row is 640 pixels wide. */
IN1=imgin_ptr;
IN2=IN1+640;
IN3=IN2+640;
IN4=IN3+640;
...
IN11=IN10+640;

OUT=imgout_ptr;

for (i=0;i<256;i++)
{
    for(j=0;j<640;j++)
    {
        /* Mean of the 11 vertically adjacent pixels in this column. */
        sum=(IN1[0]+IN2[0]+...+IN11[0])/11;
        *OUT++ = sum;

        IN1++;
        IN2++;
        ...
        IN11++;
    }
}
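For readers who want to experiment with the loop above, here is a minimal self-contained sketch, not code from the original post. It assumes the output is also unsigned short, that the sum is held in 32 bits, and that the input holds rows + 10 rows of 640 pixels; the function names vmean_direct and vmean_running are hypothetical. The first function mirrors the posted loop (11 loads per output pixel). The second is a running column-sum variant, a commonly used restructuring that reads only the row entering and the row leaving the window for each output row. Whether it actually helps on the C6678 would have to be measured.

```c
#include <stddef.h>
#include <stdint.h>

#define WIDTH 640 /* pixels per row, as in the post */
#define TAPS  11  /* vertical window height, as in the post */

/* Direct form: output row r is the mean of input rows r .. r+TAPS-1.
 * The input must hold rows + TAPS - 1 rows. */
void vmean_direct(const uint16_t *in, uint16_t *out, int rows)
{
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < WIDTH; c++) {
            uint32_t sum = 0;
            for (int k = 0; k < TAPS; k++)
                sum += in[(size_t)(r + k) * WIDTH + c];
            out[(size_t)r * WIDTH + c] = (uint16_t)(sum / TAPS);
        }
    }
}

/* Running-sum form (assumption: not from the thread). One 32-bit sum is
 * kept per column; moving down one row adds the entering row and
 * subtracts the leaving row. */
void vmean_running(const uint16_t *in, uint16_t *out, int rows)
{
    uint32_t colsum[WIDTH] = {0};

    /* Prime the column sums with the first TAPS rows. */
    for (int k = 0; k < TAPS; k++)
        for (int c = 0; c < WIDTH; c++)
            colsum[c] += in[(size_t)k * WIDTH + c];

    for (int r = 0; r < rows; r++) {
        uint16_t *dst = out + (size_t)r * WIDTH;
        for (int c = 0; c < WIDTH; c++)
            dst[c] = (uint16_t)(colsum[c] / TAPS);

        if (r + 1 == rows)
            break; /* no further row to slide onto */

        const uint16_t *leave = in + (size_t)r * WIDTH;
        const uint16_t *enter = in + (size_t)(r + TAPS) * WIDTH;
        for (int c = 0; c < WIDTH; c++)
            colsum[c] = colsum[c] + enter[c] - leave[c];
    }
}
```

Both functions walk memory row by row, so every inner loop is a unit-stride pass over one or two rows. The two forms should produce identical output, which makes the direct version a convenient reference when checking the running-sum version.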

1. Only Core0 runs the code.

2. Caches are enabled: L1P and L1D at 32 KB, L2 cache at 128 KB.

3. The code is in LL2.

4. It is compiled with -O3.

Processing a 640x512 image with the 11x1 window takes about 3 ms; the 1x11 case takes less time.

How can the code be optimized for better performance?

Thanks.

Rgds
Shine

Cvetolin Shulev-XID over 8 years ago

Hi Shine,

I've forwarded this to the RTOS experts. Their feedback should be posted here.

BR
Tsvetolin Shulev

lding over 8 years ago in reply to Cvetolin Shulev-XID
