MUST_ITERATE AND UNROLL pragmas do not work

DSP dsp

Prodigy 10 points

I am trying to optimize a simple sum-of-products loop using #pragma directives.

A snippet of the code is shown below:

#include <stdio.h>
unsigned char a[64];
unsigned char b[64];
int sum = 0;

main()
{
    int i;

    for(i = 0; i< 64; i++)
    {
        a[i] = b[i] = i;
    }

    //#pragma UNROLL(2)
    //#pragma MUST_ITERATE( , , 2)
    for(i = 0; i< 64; i= i++)
        sum = sum + (a[i] * b[i]);

}

In the code above, I tried both the UNROLL and MUST_ITERATE pragmas to unroll the loop by a factor of two. In both cases, however, the loop does not appear to be unrolled, and the execution time is not reduced.

The timing remained the same with and without the pragma directives.

I would appreciate your guidance on using #pragma directives.

over 12 years ago

RandyP over 12 years ago

TI__Guru* 84110 points

Welcome to the TI E2E forum. I hope you will find many good answers here and in the TI.com documents and in the TI Wiki Pages. Be sure to search those for helpful information and to browse for the questions others may have asked on similar topics.

The E2E forum is a very large one, and it can be difficult to figure out where the right experts are. In your case with a compiler question, this should be posted in the TI C/C++ Compiler Forum instead of this device-based C64x Single Core DSP Forum. This thread will be moved there this time for your convenience.

It will help us to help you if you will tell us which device you are using, which version of Code Generation Tools you are using, which version of CCS, and what the compiler switch settings are.

For our DSPLIB, there are C and Assembly versions of many of the library functions. The FIR filter is the classic DSP algorithm and you can find examples there, plus a library function that will do the function you are trying to write.

In the TI Wiki Pages, you will find some articles and workshop material on optimization methods. You can search for "c6000 optimization" (no quotes) to find a list of several of these. Other keywords may help you find additional material.

The fact that you are accumulating 8-bit multiplications into a 32-bit result may impact the efficiency of the architecture. If you are just trying to play with the tools for gaining experience, then you may want to try with native 32-bit operations or use 16-bit data and 16-bit accumulation.

For 8-bit data, the optimal solution would probably be using 4 parallel operations, but it may depend on the specific device you are using.
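One way to picture "4 parallel operations" is a loop that handles four elements per iteration with independent partial sums, as in the sketch below. The helper name `sum_of_products_x4` and the requirement that `n` be a multiple of 4 are assumptions made for illustration. This is plain portable C; whether the compiler maps it onto packed 8-bit instructions depends on the device and the tools version.

```c
/* Hypothetical helper (not from the original post): sum of products
 * over n elements, four elements per iteration.
 * Assumes n is a non-negative multiple of 4. */
int sum_of_products_x4(const unsigned char *a, const unsigned char *b, int n)
{
    /* Four independent partial sums, so the four multiply-accumulates
     * in one iteration do not depend on each other. */
    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i;

    for (i = 0; i < n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }

    /* Combine the partial sums once, after the loop. */
    return s0 + s1 + s2 + s3;
}
```

With the 64-element arrays from the original post, this would be called as `sum_of_products_x4(a, b, 64)`.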

Please let us know what you find from the TI Wiki Pages and what questions you may have from those new insights.

Regards,
RandyP

Archaeologist over 12 years ago

TI__Guru* 84225 points

The construct "i=i++" is invalid C: it modifies i twice without an intervening sequence point, so its behavior is undefined. Just use "i++".

When I uncomment those pragmas and compile this test case with -o2 (optimization level 2), the compiler does unroll and software pipeline the loop. As Randy suggests, we need to see the complete command line options as well as the version of the compiler (which is not the same as the version of CCS).
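For reference, the loop from the original post with the increment corrected and the pragmas enabled might look like the sketch below. The function name `sum_of_products` and the `__TI_COMPILER_VERSION__` guard are additions for illustration; the guard only keeps non-TI compilers from warning about pragmas they do not recognize. The explicit minimum and maximum of 64 in MUST_ITERATE are also an assumption based on the fixed trip count in the post, which left those two arguments blank.

```c
#define N 64  /* trip count from the original post */

/* Hypothetical wrapper around the loop from the original post. */
int sum_of_products(const unsigned char *a, const unsigned char *b)
{
    int sum = 0;
    int i;

#ifdef __TI_COMPILER_VERSION__
    /* TI-specific hints, placed immediately before the loop.
     * They only matter when the optimizer is enabled (e.g. -o2). */
#pragma UNROLL(2)
#pragma MUST_ITERATE(64, 64, 2)
#endif
    for (i = 0; i < N; i++)      /* plain i++, not i = i++ */
        sum += a[i] * b[i];

    return sum;
}
```

Whether the loop was actually unrolled is best checked in the compiler's generated assembly rather than inferred from timing alone.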

Archaeologist over 12 years ago in reply to RandyP

TI__Guru* 84225 points

RandyP said:
you are accumulating 8-bit multiplications

Actually, according to the rules of the C language, the 8-bit inputs are promoted to 32-bit "int" before the multiplication, so at the C level each multiplication is a 32x32->32-bit operation. The compiler will use the 16x16->32 multiplication instruction, but as far as C is concerned that is still a 32-bit multiplication.
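A small sketch of the promotion rule, assuming 8-bit char and an int of at least 32 bits (as on C6000). The helper names `product_as_int` and `product_truncated` are made up for this illustration.

```c
/* Hypothetical helper: multiply two 8-bit values the way C does it.
 * Both operands are promoted to int before the multiply, so the
 * result is not limited to 8 bits. */
int product_as_int(unsigned char x, unsigned char y)
{
    return x * y;
}

/* For contrast: the same product explicitly cut back to 8 bits.
 * The multiply still happens in int; only the cast truncates. */
unsigned char product_truncated(unsigned char x, unsigned char y)
{
    return (unsigned char)(x * y);
}
```

For example, `product_as_int(200, 200)` gives 40000, while `product_truncated(200, 200)` keeps only the low 8 bits of that value.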
