c64x compiler not efficient with static inline function and structures

The compiler generates surprisingly slow code for the example below.


Given this struct and function definition:

typedef struct _S32_FP
{
   S32 value;          
   S32 fractionBits;   
} S32_FP;

static inline S32_FP S32_FP_SCALE_UP(S32_FP a)
{
   S32_FP result;
   S32 shiftLeft = _norm(a.value);
   result.fractionBits = a.fractionBits + shiftLeft;
   result.value = a.value << shiftLeft;
   return result;
}

This code is much slower:

response.denominator = S32_FP_SCALE_UP(response.denominator);
response.numerator   = S32_FP_SCALE_UP(response.numerator);

...than this equivalent, manually inlined, version:

const S32 shiftLeftDenominator = _norm(response.denominator.value);
response.denominator.fractionBits += shiftLeftDenominator;
response.denominator.value <<= shiftLeftDenominator;
const S32 shiftLeftNumerator = _norm(response.numerator.value);
response.numerator.fractionBits += shiftLeftNumerator;
response.numerator.value <<= shiftLeftNumerator;

The compiler appears to have trouble optimizing the struct assignments.

The compiler options used are:

"C:\ti\ccsv5\tools\compiler\c6000/bin/cl6x" -fr <???> -c -mv64+ --diag_warning=225 --interrupt_threshold=70000 --abi=coffabi -O3 --auto_inline=10000 

  • I see essentially the same asm in either case; the only difference I see is that the inlined version does one LDNDW instead of two LDW. We'll need some context, and the compiler version, before we can see what you see. For instance, is "response" a global or local? Is it written in a loop? Maybe show us the whole function.
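
A self-contained test case of the kind requested above might look like the sketch below. It is not the original poster's code: the Response struct, the two wrapper functions, variants_agree, the S32 typedef, and the NORM macro with its portable fallback are all assumptions added so the file builds on its own. The value shift goes through an unsigned cast to keep the host build well defined for negative inputs. On a C6000 target the macro maps to the real _norm intrinsic, so the two wrappers can be compiled with the options above and their assembly compared.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t S32;   /* assumption: S32 is a 32-bit signed integer */

#if defined(_TMS320C6X)
#include <c6x.h>       /* TI intrinsics, including _norm */
#define NORM(x) _norm(x)
#else
/* Portable stand-in for _norm (assumption, for host builds only):
   returns the number of redundant sign bits in x. */
static int norm_fallback(S32 x)
{
    uint32_t u = (uint32_t)(x < 0 ? ~x : x);
    uint32_t mask;
    int n = 0;
    for (mask = 0x40000000u; mask != 0u && (u & mask) == 0u; mask >>= 1)
        n++;
    return n;
}
#define NORM(x) norm_fallback(x)
#endif

typedef struct _S32_FP
{
   S32 value;
   S32 fractionBits;
} S32_FP;

/* Hypothetical container; the real type of "response" was not posted. */
typedef struct
{
   S32_FP numerator;
   S32_FP denominator;
} Response;

static inline S32_FP S32_FP_SCALE_UP(S32_FP a)
{
   S32_FP result;
   S32 shiftLeft = NORM(a.value);
   result.fractionBits = a.fractionBits + shiftLeft;
   /* Shift as unsigned so negative values are well defined on the host. */
   result.value = (S32)((uint32_t)a.value << shiftLeft);
   return result;
}

/* Variant 1: struct passed to and returned from the inline function. */
void scale_by_value(Response *r)
{
   r->denominator = S32_FP_SCALE_UP(r->denominator);
   r->numerator   = S32_FP_SCALE_UP(r->numerator);
}

/* Variant 2: the same arithmetic, inlined by hand. */
void scale_in_place(Response *r)
{
   const S32 shiftDen = NORM(r->denominator.value);
   const S32 shiftNum = NORM(r->numerator.value);
   r->denominator.fractionBits += shiftDen;
   r->denominator.value = (S32)((uint32_t)r->denominator.value << shiftDen);
   r->numerator.fractionBits += shiftNum;
   r->numerator.value = (S32)((uint32_t)r->numerator.value << shiftNum);
}

/* Returns 1 if both variants produce identical results for the inputs. */
int variants_agree(S32 num, S32 den)
{
   Response a = { { num, 0 }, { den, 0 } };
   Response b = a;
   scale_by_value(&a);
   scale_in_place(&b);
   return a.numerator.value == b.numerator.value
       && a.numerator.fractionBits == b.numerator.fractionBits
       && a.denominator.value == b.denominator.value
       && a.denominator.fractionBits == b.denominator.fractionBits;
}
```

Passing the struct by pointer here is only one possible context. Whether "response" is a global, a local, or updated inside a loop may change what the optimizer does, so the wrappers would need adjusting to match the real call site.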