
TMS320C6678: C6678 image processing code run very slowly in shared memory

Part Number: TMS320C6678


Hi Champs,

The code below runs very slowly when working on image data in shared memory:

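unsigned short *IN1, *IN2, *IN3, *IN4, *IN5, *IN6, *IN7, *IN8, *IN9, *IN10, *IN11;
unsigned short *OUT;
unsigned int sum;   /* a sum of 11 x 16-bit pixels needs more than 16 bits */
int i, j;

/* INk points to image row k-1; rows are 640 pixels wide */
IN1 = imgin_ptr;
IN2 = IN1 + 640;
IN3 = IN2 + 640;
IN4 = IN3 + 640;
/* ... */
IN11 = IN10 + 640;
OUT = imgout_ptr;

for (i = 0; i < 256; i++)
{
    for (j = 0; j < 640; j++)
    {
        /* average of the same column across 11 consecutive rows */
        sum = (IN1[0] + IN2[0] + /* ... */ IN11[0]) / 11;
        *OUT++ = sum;
        IN1++;
        IN2++;
        /* ... */
        IN11++;
    }
}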
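One common way to cut the work per pixel, offered as a suggestion rather than a confirmed fix for this case, is a running column sum: each output row reuses the previous row's 11-row sums, adding the row that enters the window and subtracting the one that leaves, so each output pixel needs 2 loads instead of 11. The function name `box11_vertical`, the scratch buffer `colsum`, and the `restrict` qualifiers are assumptions for this sketch; the layout (640-pixel rows, 11 rows averaged per output) follows the loop above.

```c
#include <stdint.h>

#define IMG_W 640  /* row length used in the loop above */
#define TAPS  11   /* rows averaged per output pixel */

/* Hypothetical helper: vertical 11-row box average via running column sums.
 * in:     at least out_rows + TAPS - 1 rows of IMG_W pixels
 * out:    out_rows rows of IMG_W pixels
 * colsum: caller-provided scratch of IMG_W entries (ideally in local L2) */
void box11_vertical(const uint16_t *restrict in,
                    uint16_t *restrict out,
                    uint32_t *restrict colsum,
                    int out_rows)
{
    int i, j, k;

    /* Prime each column sum with the first TAPS rows. */
    for (j = 0; j < IMG_W; j++)
        colsum[j] = 0;
    for (k = 0; k < TAPS; k++)
        for (j = 0; j < IMG_W; j++)
            colsum[j] += in[k * IMG_W + j];

    for (i = 0; i < out_rows; i++) {
        uint16_t *out_row = out + i * IMG_W;

        /* Emit one output row from the current window sums. */
        for (j = 0; j < IMG_W; j++)
            out_row[j] = (uint16_t)(colsum[j] / TAPS);

        if (i + 1 == out_rows)
            break;

        /* Slide the window down one row: add the entering row,
         * subtract the leaving row (wraps correctly in unsigned math). */
        {
            const uint16_t *old_row = in + i * IMG_W;
            const uint16_t *new_row = in + (i + TAPS) * IMG_W;
            for (j = 0; j < IMG_W; j++)
                colsum[j] += new_row[j] - old_row[j];
        }
    }
}
```

For the loop above this would be called as `box11_vertical(imgin_ptr, imgout_ptr, colsum, 256)`. The inner loops have no conditionals, which generally makes them easier for the compiler to software-pipeline.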

1. Only Core0 runs the code.
2. Cache is enabled: L1P and L1D are 32 KB each, and L2 cache is 128 KB.
3. The code is in LL2 (local L2).
4. The code is compiled with -O3.
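Because the input lives in shared memory, a common pattern is to stage rows into a small buffer in local L2 and compute from there, so each input row is read from shared memory only once. The sketch below keeps the 11 most recent rows in a ring buffer. `fetch_row` uses `memcpy` purely as a stand-in for an EDMA3 or IDMA transfer, and the names `local_rows`, `fetch_row`, and `box11_staged` are hypothetical. On the device the buffer would have to be placed in LL2 (for example through a linker section), which this portable C does not show.

```c
#include <stdint.h>
#include <string.h>

#define IMG_W 640  /* row length used in the post */
#define TAPS  11   /* rows averaged per output pixel */

/* Hypothetical staging buffer; on the DSP this would be placed in LL2. */
static uint16_t local_rows[TAPS][IMG_W];

/* Stand-in for a DMA transfer (e.g. EDMA3/IDMA); plain memcpy here. */
static void fetch_row(uint16_t *dst, const uint16_t *src)
{
    memcpy(dst, src, IMG_W * sizeof *dst);
}

/* shared_in must hold out_rows + TAPS - 1 rows. */
void box11_staged(const uint16_t *shared_in, uint16_t *shared_out, int out_rows)
{
    int i, j, k;

    /* Preload rows 0 .. TAPS-2; each output row fetches one more. */
    for (k = 0; k < TAPS - 1; k++)
        fetch_row(local_rows[k], shared_in + k * IMG_W);

    for (i = 0; i < out_rows; i++) {
        /* Row i + TAPS - 1 overwrites the slot of row i - 1, no longer needed. */
        int slot = (i + TAPS - 1) % TAPS;
        fetch_row(local_rows[slot], shared_in + (i + TAPS - 1) * IMG_W);

        /* All TAPS slots now hold rows i .. i + TAPS - 1 (in ring order). */
        for (j = 0; j < IMG_W; j++) {
            uint32_t sum = 0;
            for (k = 0; k < TAPS; k++)
                sum += local_rows[k][j];
            shared_out[i * IMG_W + j] = (uint16_t)(sum / TAPS);
        }
    }
}
```

With a real DMA engine, the fetch of the next row would normally be started before computing the current one so that transfer and compute overlap; the synchronous `memcpy` here does not show that.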

Processing a 640x512 image with the 11x1 filter above takes about 3 ms; the 1x11 version takes less time.
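For scale: taking the loop bounds above (256 x 640 output pixels) and assuming a 1 GHz core clock (the C6678 is offered at more than one clock rate, so adjust as needed), 3 ms works out to roughly 18 cycles per output pixel, each of which needs 11 loads, 10 adds, a divide and a store. The hypothetical helper below only does that arithmetic.

```c
/* Cycles spent per output pixel, given a measured time and an assumed clock. */
double cycles_per_pixel(double time_ms, double clock_hz, int rows, int cols)
{
    double total_cycles = time_ms * 1e-3 * clock_hz;
    return total_cycles / ((double)rows * (double)cols);
}
```

For example, `cycles_per_pixel(3.0, 1.0e9, 256, 640)` gives about 18.3 (3,000,000 cycles over 163,840 pixels).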

How can this code be optimized for better performance?
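The `sum / 11` in the inner loop is another candidate. As far as I know the C66x core has no integer divide instruction, so if the compiler does not already turn the constant division into a multiply (worth checking in the generated assembly), it can be done by hand. In the sketch below, 0xBA2E8BA3 is ceil(2^35 / 11); with a right shift of 35 the result equals `x / 11` for every 32-bit unsigned `x`, which covers the largest possible sum of 11 x 65535. The name `div11` is hypothetical.

```c
#include <stdint.h>

/* Unsigned division by 11 using multiply and shift.
 * 0xBA2E8BA3 = ceil(2^35 / 11); the rounding error is small enough
 * that the result is exact for all 32-bit unsigned inputs. */
static inline uint32_t div11(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xBA2E8BA3u) >> 35);
}
```

In the original loop this would replace `/ 11` with `div11(...)` applied to the 11-pixel sum.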

Thanks.

Rgds
Shine