Problem with double memory access

I need to write a function whose input arguments are an image and a mask. The function must copy only the unmasked pixels to the output. Something like this:

for (i = 0; i < 1024; i++) if (mask[i]) *output++ = input[i];
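For reference, the loop above can be written as a small portable C function. This is only a sketch: the name copy_unmasked_ref, the one-mask-entry-per-pixel layout and the returned count are assumptions made for illustration, not part of the original code. As in the loop above, a pixel is copied when its mask entry is nonzero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference version (not the optimized routine).
 * Assumption: one mask entry per pixel; a nonzero entry means "copy".
 * Returns the number of pixels written to output. */
size_t copy_unmasked_ref(const unsigned short *input,
                         const unsigned char *mask,
                         unsigned short *output,
                         size_t count)
{
    size_t written = 0;
    size_t i;

    /* Precondition: buffers must be valid unless there is nothing to do. */
    assert(count == 0 || (input && mask && output));

    for (i = 0; i < count; i++) {
        if (mask[i]) {
            output[written++] = input[i];
        }
    }
    return written;
}
```

The returned count tells the caller how much of the output buffer is valid, which the pointer-increment form of the loop leaves implicit.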

The bottleneck of this function is memory access.

I have optimized the function as follows:

 

void copy_image(
    unsigned short *restrict input_image,
    unsigned short *restrict ouput_image1,
    unsigned short *restrict ouput_image2,
    unsigned short *restrict _input_map)
{
    int i, j;
    double pix1234, pix5678;
    unsigned short pixel1, pixel2, pixel3, pixel4, pixel5, pixel6, pixel7, pixel8;

    const double *restrict part1 = (const double *)&input_image[0];
    const double *restrict part2 = (const double *)&input_image[1024];
    unsigned short map;

    /* Tell the compiler that all buffers are 8-byte aligned. */
    _nassert(((unsigned)input_image) % 8 == 0);
    _nassert(((unsigned)part1) % 8 == 0);
    _nassert(((unsigned)part2) % 8 == 0);
    _nassert(((unsigned)ouput_image1) % 8 == 0);
    _nassert(((unsigned)ouput_image2) % 8 == 0);

    for (i = 0; i < 1024; i += 1)
    {
        /* One map word per iteration; bits 0..7 cover eight pixels. */
        map = *_input_map++;

        /* Load four pixels from the first part with one 64-bit access. */
        pix1234 = _amemd8_const((void *)&(part1[i]));
        pixel1 = _extu(_hi(pix1234), 0, 16);
        pixel2 = _extu(_hi(pix1234), 16, 16);
        pixel3 = _extu(_lo(pix1234), 0, 16);
        pixel4 = _extu(_lo(pix1234), 16, 16);

        /* A pixel is copied only when its map bit is 0. */
        if ((map & 0x1) == 0) *ouput_image1++ = pixel1;
        if ((map & 0x2) == 0) *ouput_image2++ = pixel2;
        if ((map & 0x4) == 0) *ouput_image1++ = pixel3;
        if ((map & 0x8) == 0) *ouput_image2++ = pixel4;

        /* Load four pixels from the second part with one 64-bit access. */
        pix5678 = _amemd8_const((void *)&(part2[i]));
        pixel5 = _extu(_hi(pix5678), 0, 16);
        pixel6 = _extu(_hi(pix5678), 16, 16);
        pixel7 = _extu(_lo(pix5678), 0, 16);
        pixel8 = _extu(_lo(pix5678), 16, 16);

        if ((map & 16) == 0)  *ouput_image1++ = pixel5;
        if ((map & 32) == 0)  *ouput_image2++ = pixel6;
        if ((map & 64) == 0)  *ouput_image1++ = pixel7;
        if ((map & 128) == 0) *ouput_image2++ = pixel8;
    }
}
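To check the results of copy_image on a PC, the intrinsics can be replaced by plain C. The following is a rough host-side model, not the target code: the names extu_model, unpack4_model and copy_image_model, the part2_offset parameter and the returned counts are all hypothetical, and it assumes that _hi/_lo return the upper/lower 32 bits of the loaded 64-bit value and that _extu(x, csta, cstb) shifts left by csta and then logically right by cstb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed behaviour of _extu(x, csta, cstb): shift left by csta,
 * then logical shift right by cstb. */
static uint32_t extu_model(uint32_t x, unsigned csta, unsigned cstb)
{
    return (uint32_t)(x << csta) >> cstb;
}

/* Load four 16-bit pixels as one 64-bit value (stand-in for
 * _amemd8_const) and unpack them in the order pixel1..pixel4. */
static void unpack4_model(const uint16_t *src, uint16_t px[4])
{
    uint64_t v;
    uint32_t hi, lo;

    memcpy(&v, src, sizeof v);   /* portable 8-byte load */
    hi = (uint32_t)(v >> 32);    /* assumed equivalent of _hi() */
    lo = (uint32_t)v;            /* assumed equivalent of _lo() */

    px[0] = (uint16_t)extu_model(hi, 0, 16);
    px[1] = (uint16_t)extu_model(hi, 16, 16);
    px[2] = (uint16_t)extu_model(lo, 0, 16);
    px[3] = (uint16_t)extu_model(lo, 16, 16);
}

/* Hypothetical host model of copy_image with two output streams.
 * part2_offset is the pixel offset of the second input part (the
 * original hard-codes 1024). The number of pixels written to each
 * output is returned through n1 and n2. */
void copy_image_model(const uint16_t *input, size_t part2_offset,
                      const uint16_t *map, size_t iterations,
                      uint16_t *out1, uint16_t *out2,
                      size_t *n1, size_t *n2)
{
    size_t i, k, w1 = 0, w2 = 0;

    assert(input && map && out1 && out2 && n1 && n2);

    for (i = 0; i < iterations; i++) {
        uint16_t px[8];
        unsigned m = map[i];

        unpack4_model(&input[4 * i], &px[0]);
        unpack4_model(&input[part2_offset + 4 * i], &px[4]);

        for (k = 0; k < 8; k++) {
            if ((m & (1u << k)) == 0) {     /* bit clear: copy pixel */
                if (k % 2 == 0) {
                    out1[w1++] = px[k];     /* pixel1, 3, 5, 7 */
                } else {
                    out2[w2++] = px[k];     /* pixel2, 4, 6, 8 */
                }
            }
        }
    }
    *n1 = w1;
    *n2 = w2;
}
```

The pixel order inside each group of four follows the byte order of the machine that runs the model, so its output should only be compared with target results when both sides use the same endianness.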

 

The compiler's analysis in the generated assembly file (*.asm) says that the processor will spend five cycles per pixel.

But when I profile it, the function takes 21700 cycles: 6*1024 are expected, and 15000 are reported as "L1D.Stall.write_buf_full".

 

Then I modified the code and removed the second output buffer:

                               

        if ((map & 0x1) == 0) *ouput_image1++ = pixel1;
        if ((map & 0x2) == 0) *ouput_image1++ = pixel2;
        if ((map & 0x4) == 0) *ouput_image1++ = pixel3;
        if ((map & 0x8) == 0) *ouput_image1++ = pixel4;

        if ((map & 16) == 0)  *ouput_image1++ = pixel5;
        if ((map & 32) == 0)  *ouput_image1++ = pixel6;
        if ((map & 64) == 0)  *ouput_image1++ = pixel7;
        if ((map & 128) == 0) *ouput_image1++ = pixel8;

 

Now there are 8 cycles per pixel and no double writes to memory, but the number of bytes written is the same. So I expected 8*1024 + 15000 cycles (L1D.Stall.write_buf_full).

But when I profile it, the function takes only 8*1024 cycles, and I don't understand why. Can you help me? How can I use double access without a huge write buffer stall?

I could do a prefetch, but that would waste memory because I don't know how many pixels will be masked.