How to make "restrict" really work?

 

I wrote a function that copies data from in_data to out_data, and I used TSCL to measure the cycle count.

The cycle count is around 16313967 for a data size of 720*480*2 bytes.

I then tried adding the restrict keyword, but saw no improvement.

How can I make the loops faster?

 

void AssignY(unsigned char* restrict in_data, int height, int width, unsigned char* restrict out_data)
{
    int i, j;
    int line = 0;
    int dWidth = width<<1;
    int offset;

    for (i = 0; i < height; i++)
    {
        for (j = 0; j < width; j++)
        {
            offset = line+j*2+1;
            out_data[offset] = in_data[offset];
        }// end for

        line += dWidth;
    }
}
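For readers who want to reproduce the measurement, here is a minimal sketch of how the cycle count might be taken around the function above. It assumes a C64x+-class device where c6x.h declares the TSCL time-stamp register and where the first write to TSCL starts the counter. The helper name time_assign_y and the two macros are hypothetical, and the clock() fallback exists only so the sketch also builds on a host compiler (it does not return cycles there).

```c
#include <assert.h>
#include <time.h>

#ifdef _TMS320C6X
#include <c6x.h>                       /* declares the TSCL control register */
#define START_COUNTER() (TSCL = 0)     /* any write starts the free-running counter */
#define READ_COUNTER()  ((unsigned long)TSCL)
#else
/* Host fallback (assumption, for illustration only): clock ticks, not cycles. */
#define START_COUNTER() ((void)0)
#define READ_COUNTER()  ((unsigned long)clock())
#endif

/* The function from the question: copies the byte at every odd offset. */
void AssignY(unsigned char *restrict in_data, int height, int width,
             unsigned char *restrict out_data)
{
    int i, j;
    int line = 0;
    int dWidth = width << 1;

    for (i = 0; i < height; i++) {
        for (j = 0; j < width; j++) {
            int offset = line + j * 2 + 1;
            out_data[offset] = in_data[offset];
        }
        line += dWidth;
    }
}

/* Hypothetical helper: returns the counter difference across one call. */
unsigned long time_assign_y(unsigned char *in_data, int height, int width,
                            unsigned char *out_data)
{
    unsigned long t0, t1;

    assert(in_data != out_data);       /* restrict promises the buffers do not alias */
    START_COUNTER();
    t0 = READ_COUNTER();
    AssignY(in_data, height, width, out_data);
    t1 = READ_COUNTER();
    return t1 - t0;                    /* unsigned arithmetic tolerates one wrap */
}
```

On the target, the value returned for height = 480 and width = 720 would correspond to the figures quoted in this thread.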

 

  • Please try some of the techniques described in this Wiki article.

    Thanks and regards,

    -George

  • Thanks, Georgem.

    Here is what I've tried so far, still using the function I posted above.

    1. Updated the Code Generation Tools from v6.0.8 to v6.1.18.

    Result: 16313967 to 14279408 cycles. A decent improvement.

    2. Added these build options:

    -o2 : Optimization level 2

    -s : Place optimizer-generated comments in the assembly output

    -mw : Place extra information about software-pipelined loops in the assembly output

    -mo : Place each function in its own subsection

    Result: no improvement.

    3. Removed this build option:

    -g : Generate debug information

    Result: 14279408 to 10499341 cycles. A decent improvement.

    4. Added restrict to the pointer parameters.

    Result: no improvement.

    I still wonder why restrict has no effect.

     

     

  • I improved my code from 10499341 to 6130357 cycles!

    That is still without restrict, though.

    Is there still room for improvement with restrict?

     

    // ===============================================================

    void AssignY(unsigned char* in_data,  int height, int width, unsigned char* out_data)
    {
        int i;
        int size = height*width*2;

        for (i = 0; i < size; i+=8)
        {
            *(out_data+=2)= *(in_data+=2);
            *(out_data+=2)= *(in_data+=2);
            *(out_data+=2)= *(in_data+=2);
            *(out_data+=2)= *(in_data+=2);
        }
    }

     

    Even better: 5272834 cycles.

    // ===============================================================

    void AssignY(unsigned char* in_data,  int height, int width, unsigned char* out_data)
    {
        int i;
        int size = height*width*2;

        for (i = 0; i < size; i+=16)
        {
            *(out_data+2)= *(in_data+2);
            *(out_data+4)= *(in_data+4);
            *(out_data+6)= *(in_data+6);
            *(out_data+8)= *(in_data+8);
            *(out_data+10)= *(in_data+10);
            *(out_data+12)= *(in_data+12);
            *(out_data+14)= *(in_data+14);
            *(out_data+16)= *(in_data+16);
            out_data += 16;
        }
    }
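Note that the two unrolled versions above do not do the same work as the original function. The original copies the odd offsets 1, 3, 5, ..., while the unrolled loops touch the even offsets 2, 4, 6, ... and reach one byte past the end of the buffer. The second version also never advances in_data, so it keeps reading the same input bytes. The following is a minimal sketch of a 4x-unrolled loop that keeps the original behavior and carries the restrict qualifiers. The name AssignY_unrolled is hypothetical, and the sketch assumes the total size is a multiple of 8; whether it actually runs faster on the target is untested here.

```c
#include <assert.h>
#include <stddef.h>

/* Copies the byte at every odd offset, four bytes per loop iteration.
   Assumes (hypothetically) that height*width*2 is a multiple of 8 and
   that the two buffers do not overlap, as restrict requires. */
void AssignY_unrolled(const unsigned char *restrict in_data, int height, int width,
                      unsigned char *restrict out_data)
{
    size_t size = (size_t)height * (size_t)width * 2;
    size_t i;

    assert(size % 8 == 0);

    for (i = 0; i < size; i += 8) {
        out_data[i + 1] = in_data[i + 1];
        out_data[i + 3] = in_data[i + 3];
        out_data[i + 5] = in_data[i + 5];
        out_data[i + 7] = in_data[i + 7];
    }
}
```

Indexing both buffers with the same induction variable, instead of bumping the pointers inside the loop body, keeps the two accesses visibly in step, so a missed pointer increment cannot occur.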

     

  • Have a look at the assembly code generated by the compiler.  Is a software pipelined loop generated?  What is the Iteration Interval of the loop?

    Are both the input and output buffers in internal memory?  If not, memory delays will dominate the execution time.

    This loop looks like a simple block copy.  If so, you should just call the library memcpy() instead, which is highly optimized for speed.
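Whether memcpy() is a drop-in replacement depends on whether the even-offset bytes of out_data may be overwritten as well, because the function in the original question writes only the odd offsets. A minimal sketch of the difference, with hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Block copy: every byte of the buffer is overwritten. */
void copy_all_bytes(const unsigned char *in_data, int height, int width,
                    unsigned char *out_data)
{
    assert(height >= 0 && width >= 0);
    memcpy(out_data, in_data, (size_t)height * (size_t)width * 2);
}

/* Strided copy, as in the original question: only the odd offsets are
   written, so the even-offset bytes of out_data keep their old values. */
void copy_odd_bytes(const unsigned char *in_data, int height, int width,
                    unsigned char *out_data)
{
    size_t size, i;

    assert(height >= 0 && width >= 0);
    size = (size_t)height * (size_t)width * 2;

    for (i = 1; i < size; i += 2) {
        out_data[i] = in_data[i];
    }
}
```

If the destination's even bytes must be preserved, the two functions are not interchangeable and the strided loop (or a word-wide variant of it) is still needed.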