Data Binning...

All:

I need to optimize a for() loop for the C5505. I have a raw buffer that needs to be dealt into 3 bins:

for (i=0;i<1000;i++)
{
      j = 3*i;
      buff1[i] = rawbuff[j];
      buff2[i] = rawbuff[j+1];
      buff3[i] = rawbuff[j+2];
}

I tried to make i & j global instead of local - no better results.

If I went from 1000 to 0, instead of 0 to 1000, would that be more optimal?

Would a do-while work better than the for()?

If I created a structure that had buff1, buff2, buff3, could that be made to work better than 3 independent buffers?

I can examine the resulting assembly, and create an optimized assembly routine, but I would rather not.

 

  • All:

    I just realized that the C55x has a repeat block assembly instruction.

    It could be that the main part of an assembly solution could be something like

    Repeat Block 3000 times.

        Move RawBuffer Value to ACC... with autoincrement of buffer pointer.

        Move ACC to BinX or Y or Z...with autoincrement.

    End Repeat.

     

    Does anyone have a snippet of code that does this?

     

  • Are you using the optimizer?  What is the type of the elements in the buffers?

    The compiler gets pretty good (but not perfect) results for int-sized elements.  It can do the auto-increment optimization without assistance.  Try this:

    /* cl55 -O2 deal.c */
    void deal(int buff1[], int buff2[], int buff3[], int rawbuff[restrict])
    {
        int i, j;
    
        for (i=0;i<1000;i++)
        {
            j = 3*i;
            buff1[i] = rawbuff[j];
            buff2[i] = rawbuff[j+1];
            buff3[i] = rawbuff[j+2];
        }
    }
    
  • The following loop would be faster than yours:

    for (i = 1000; i != 0; i--)
    {
        *buff1++ = *rawbuff++;
        *buff2++ = *rawbuff++;
        *buff3++ = *rawbuff++;
    }

    You can also double the speed by rewriting this loop in assembly code.
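As a self-contained illustration of the pointer-walking loop in this reply, here is a minimal sketch wrapped in a function. The function name `deal_ptr`, the `count` parameter, and the `restrict` qualifiers are assumptions added for the example and are not part of the reply; `restrict` only tells the compiler that the four buffers do not overlap.

```c
#include <stddef.h>

/* Deal an interleaved buffer into three bins by walking pointers.
 * Hypothetical wrapper: name, count parameter, and restrict qualifiers
 * are assumptions. rawbuff must hold 3 * count elements. */
void deal_ptr(int *restrict buff1, int *restrict buff2, int *restrict buff3,
              const int *restrict rawbuff, size_t count)
{
    size_t i;

    /* Count down to zero, as in the reply's loop. */
    for (i = count; i != 0; i--) {
        *buff1++ = *rawbuff++;  /* element 3k     goes to bin 1 */
        *buff2++ = *rawbuff++;  /* element 3k + 1 goes to bin 2 */
        *buff3++ = *rawbuff++;  /* element 3k + 2 goes to bin 3 */
    }
}
```

Calling `deal_ptr(buff1, buff2, buff3, rawbuff, 1000)` would correspond to the 1000-iteration loop in the question. Whether this form actually beats the indexed loop depends on the compiler and optimization level, so it is worth checking the generated assembly.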

  • All:

    Thanks for the input - looks like I can do it with a specific assembly routine - inside of my loop, it is 3 instructions, very similar to your solution Cong!

    I found that RPTBLOCAL can be used with the following 3 instructions - simulated to some degree...

    One other caveat - it turns out that my raw buffer is 32 bits wide and I need to copy the upper 16 bits of each value to one of the 3 bins.

    My repeat code looks like this:

    RPTBLOCAL  END_RPT
       mov    *(AR4+T0), *AR1+   ; Move value from raw buffer to bin 1.
       mov    *(AR4+T0), *AR2+   ; Move value from raw buffer to bin 2.
       mov    *(AR4+T0), *AR3+   ; Move value from raw buffer to bin 3.
    END_RPT:

    Thanks again for the input...
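For comparison, the caveat described above (a 32-bit raw buffer whose upper 16 bits go to each of the 3 bins) can be sketched in portable C. This is only an illustration under stated assumptions: the function name `deal_upper16`, the `count` parameter, the unsigned fixed-width types from `<stdint.h>`, and the `restrict` qualifiers are all hypothetical and do not come from the thread, which does not say whether the samples are signed.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy the upper 16 bits of each 32-bit raw value into one of three bins.
 * Hypothetical sketch: names and unsigned types are assumptions.
 * raw must hold 3 * count elements; buffers must not overlap. */
void deal_upper16(uint16_t *restrict bin1, uint16_t *restrict bin2,
                  uint16_t *restrict bin3, const uint32_t *restrict raw,
                  size_t count)
{
    size_t i;

    for (i = count; i != 0; i--) {
        *bin1++ = (uint16_t)(*raw++ >> 16);  /* upper half of value 3k     */
        *bin2++ = (uint16_t)(*raw++ >> 16);  /* upper half of value 3k + 1 */
        *bin3++ = (uint16_t)(*raw++ >> 16);  /* upper half of value 3k + 2 */
    }
}
```

Using unsigned types keeps the right shift well defined; if the samples are signed, the bins could be reinterpreted as `int16_t` afterwards.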