Tool/software: TI C/C++ Compiler
Why doesn't this function work properly?
inline void VecMultVec(float const *x, float const *y, float *z, int len)
{
    int n;
    float16 *v1, *v2, *v3;

    v1 = (float16*)x;
    v2 = (float16*)y;
    v3 = (float16*)z;
    len >>= 4;
    #pragma MUST_ITERATE(1,,)
    for (n = 0; n < len; n++)
        *v3++ = *v1++ * *v2++;
}
It runs faster than the "natural" C code, but the results are unexpected. The arguments are all guaranteed to be aligned to 16-byte boundaries, and len is always a multiple of 16.
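For comparison, here is a minimal sketch of what I mean by the "natural" C version, plus a small helper for locating the first element where two output buffers disagree. The names vec_mult_vec_ref and first_mismatch and the tol parameter are hypothetical and only for illustration; they are not part of any TI library.

```c
/* Plain scalar reference: z[n] = x[n] * y[n] for n in [0, len).
   Hypothetical helper name, used only to check the vector version. */
static void vec_mult_vec_ref(const float *x, const float *y, float *z, int len)
{
    int n;
    for (n = 0; n < len; n++)
        z[n] = x[n] * y[n];
}

/* Returns the index of the first element where a and b differ by more
   than tol (or where the difference is NaN), or -1 if they all match. */
static int first_mismatch(const float *a, const float *b, int len, float tol)
{
    int n;
    for (n = 0; n < len; n++) {
        float d = a[n] - b[n];
        if (d < 0.0f)
            d = -d;
        if (!(d <= tol))   /* written this way so a NaN difference is flagged */
            return n;
    }
    return -1;
}
```

Running both versions on the same inputs and calling first_mismatch on the two outputs shows where the vector version first goes wrong, which should narrow down whether the problem is at the start, at the end, or throughout the buffer.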