Tool/software: TI C/C++ Compiler
For portability reasons I have wrapped some intrinsics in inline functions, e.g. static inline int16_t min_i16(int16_t a, int16_t b) { return __min(a, b); } When I write a tight loop using this inline function, the compiler does not produce optimal code.
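A minimal sketch of such a portability wrapper is given here for illustration. It assumes that the predefined macro `__TMS320C2000__` identifies the C2000 compiler and that a plain C comparison is an acceptable fallback elsewhere. The fallback branch and the helper `array_min` are hypothetical and are not part of main.c.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumption: cl2000 predefines __TMS320C2000__; any other compiler
   takes the plain C fallback below. */
#if defined(__TMS320C2000__)
static inline int16_t min_i16(int16_t a, int16_t b) { return __min(a, b); }
#else
static inline int16_t min_i16(int16_t a, int16_t b) { return (a < b) ? a : b; }
#endif

/* Hypothetical helper for illustration: minimum of n elements, n >= 1. */
static int16_t array_min(const int16_t *a, size_t n)
{
    int16_t m = a[0];
    for (size_t i = 1; i < n; ++i) {
        m = min_i16(m, a[i]);
    }
    return m;
}
```

With a wrapper of this shape the calling code stays identical on every target, and only the body of min_i16 changes.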
The code (main.c):
#include <stdint.h>

int16_t array_min_explicit( int16_t *a, int num )
{
    int16_t m = *a++;
    #pragma UNROLL(1)
    do
    {
        m = __min( m, *a++ );
    } while ( num-- );
    return m;
}

inline int16_t min_i16( int16_t a, int16_t b ) { return __min( a, b ); }

int16_t array_min_inline( int16_t *a, int num )
{
    int16_t m = *a++;
    #pragma UNROLL(1)
    do
    {
        m = min_i16( m, *a++ );
    } while ( num-- );
    return m;
}
The compiler command line:
"C:/ti/ccsv6/tools/compiler/ti-cgt-c2000_17.9.0.STS/bin/cl2000" -v28 -ml -mt --cla_support=cla1 --float_support=fpu32 --tmu_support=tmu0 --vcu_support=vcu2 -O4 --opt_for_speed=5 --fp_mode=relaxed --include_path="C:/ti/ccsv6/tools/compiler/ti-cgt-c2000_17.9.0.STS/include" --symdebug:none --diag_warning=225 --diag_wrap=off --display_error_number -k --preproc_with_compile --preproc_dependency="main.d" "../main.c"
The assembly output of interest:
_array_min_inline:
        MOVZ      AR6,AL
        MOV       AL,*XAR4++
$C$L1:
        MOV       AH,*XAR4++
        MIN       AL,AH
        BANZ      $C$L1,AR6--
        ; branchcc occurs
        LRETR

_array_min_explicit:
        MOVZ      AR5,AL
        MOV       AL,*XAR4++
        RPT       AR5
||      MIN       AL,*XAR4++
        LRETR
The only difference between the two listings is the loop: array_min_explicit uses a fast RPT || MIN repeat, while array_min_inline falls back to a slower BANZ loop. Am I doing something wrong? The compiler documentation (SPRU514O, section 2.11, first line) suggests that inline functions are fully expanded before any optimisation takes place.
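One variant that may be worth comparing is a macro wrapper. It is expanded by the preprocessor, so on the TI target the compiler sees the same `__min(...)` expression as in array_min_explicit. This is only a sketch: the names `MIN_I16`, `min_i16_fallback` and `array_min_macro` are hypothetical, the `__TMS320C2000__` guard is an assumption, and whether cl2000 then emits the RPT form has not been verified here.

```c
#include <stdint.h>

#if defined(__TMS320C2000__)
/* Expanded before compilation, so the compiler sees __min(...) directly. */
#define MIN_I16(a, b) __min((a), (b))
#else
/* Host fallback: a real function, so an argument with side effects
   such as *a++ is still evaluated exactly once. */
static inline int16_t min_i16_fallback(int16_t a, int16_t b)
{
    return (a < b) ? a : b;
}
#define MIN_I16(a, b) min_i16_fallback((a), (b))
#endif

/* Same loop shape as array_min_explicit: reads a[0] and then
   num + 1 further elements. */
int16_t array_min_macro(int16_t *a, int num)
{
    int16_t m = *a++;
#if defined(__TMS320C2000__)
    #pragma UNROLL(1)
#endif
    do
    {
        m = MIN_I16(m, *a++);
    } while (num--);
    return m;
}
```

The cost of this approach is the usual one for macros: there is no type checking on the arguments, so the inline function remains the cleaner interface if the compiler can be persuaded to treat it the same way.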