Tool/software: TI C/C++ Compiler
Why doesn't this work properly?
inline void VecMultVec(float const *x, float const *y, float *z, int len)
{
    int n;
    float16 *v1, *v2, *v3;

    v1 = (float16*)x;
    v2 = (float16*)y;
    v3 = (float16*)z;
    len >>= 4;

    #pragma MUST_ITERATE(1,,)
    for(n = 0; n < len; n++)
        *v3++ = *v1++ * *v2++;
}
It runs faster than "natural" C code but the results are unexpected. The arguments are all guaranteed to be aligned to 16-byte boundaries and len is always a multiple of 16.
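For reference, the "natural" C version I am comparing against is roughly the element-wise loop below. This is only a sketch in standard C: the names VecMultVecRef and is_aligned are made up for illustration and are not part of any TI header. The alignment helper is there because, if float16 is a 16-element vector of 4-byte floats (which is what len >>= 4 assumes), each vector access covers 64 bytes, so it may be worth checking the pointers against 64 as well as 16.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar baseline: z[n] = x[n] * y[n] for n in [0, len). */
static void VecMultVecRef(float const *x, float const *y, float *z, int len)
{
    int n;
    for (n = 0; n < len; n++)
        z[n] = x[n] * y[n];
}

/* Hypothetical helper: returns 1 if p is aligned to `bytes`,
 * where `bytes` is assumed to be a power of two. */
static int is_aligned(void const *p, size_t bytes)
{
    return ((uintptr_t)p & (bytes - 1)) == 0;
}
```

Running both versions on the same inputs and comparing z element by element, together with is_aligned(x, 64), is_aligned(y, 64) and is_aligned(z, 64), should show whether the mismatch is tied to the alignment of the buffers.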