Tool/software: TI C/C++ Compiler
Why doesn't this work properly?
inline void VecMultVec(float const *x, float const *y, float *z, int len)
{
    int n;
    float16 *v1, *v2, *v3;

    v1 = (float16*)x;
    v2 = (float16*)y;
    v3 = (float16*)z;
    len >>= 4;
    #pragma MUST_ITERATE(1,,)
    for(n = 0; n < len; n++)
        *v3++ = *v1++ * *v2++;
}
It runs faster than "natural" C code but the results are unexpected. The arguments are all guaranteed to be aligned to 16-byte boundaries and len is always a multiple of 16.
Is it possible that len is the length of your arrays in terms of elements, not the number of bytes? In that case "len >>= 4" would be wrong and would result in far too few iterations. (As far as I can see, float16 is a vector type comprising 4 floats, i.e. 16 bytes.)
I cannot determine what causes the problem based on what I see here. Please submit a test case. For this situation, I would like a test case that can execute, so please provide the code which supplies the input data and checks the result for correctness. We also need to know the compiler version and all the build options exactly as the compiler sees them. If it is easier for you, feel free to put everything together in a CCS project, then package that up as described in the article Project Sharing.
Thanks and regards,
-George
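For readers following along, a self-checking test case of the kind requested above might look roughly like the sketch below. It is not code from this thread: the names vec_mult_fn, vec_mult_ref, vec_mult_postinc and check_vec_mult are hypothetical, and it uses plain float rather than float16 so that it builds with any C compiler. The idea is to compute a scalar reference result and count the elements where the function under test disagrees.

```c
#include <assert.h>
#include <stdio.h>

#define TEST_LEN 16  /* same element count as the 16-float case in this thread */

/* Hypothetical signature shared by every implementation under test. */
typedef void (*vec_mult_fn)(const float *x, const float *y, float *z, int len);

/* Scalar reference: the "natural" C element-wise multiply. */
static void vec_mult_ref(const float *x, const float *y, float *z, int len)
{
    int n;
    assert(x && y && z);
    for (n = 0; n < len; n++)
        z[n] = x[n] * y[n];
}

/* Scalar stand-in for the post-increment form used in the original post. */
static void vec_mult_postinc(const float *x, const float *y, float *z, int len)
{
    int n;
    assert(x && y && z);
    for (n = 0; n < len; n++)
        *z++ = *x++ * *y++;
}

/* Runs fn on the inputs 1..TEST_LEN and returns the number of mismatches
 * against the scalar reference. Exact comparison is used because both sides
 * perform the same single-precision multiply on the same operands. */
static int check_vec_mult(vec_mult_fn fn)
{
    float x[TEST_LEN], got[TEST_LEN], want[TEST_LEN];
    int n, bad = 0;

    for (n = 0; n < TEST_LEN; n++) {
        x[n] = (float)(n + 1);
        got[n] = 0.0f;
        want[n] = 0.0f;
    }

    vec_mult_ref(x, x, want, TEST_LEN);
    fn(x, x, got, TEST_LEN);

    for (n = 0; n < TEST_LEN; n++) {
        if (got[n] != want[n]) {
            printf("mismatch at %d: got %f, want %f\n", n, got[n], want[n]);
            bad++;
        }
    }
    return bad;
}
```

On the target, the float16 version would presumably be passed to check_vec_mult in place of vec_mult_postinc, with the local arrays aligned as the original post assumes; a non-zero return value then identifies the elements that were computed incorrectly.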
I might be messing up my C or OpenCL, but I see a problem as well. Running the OP's original function only calculates the result for the first two floats, which seems incorrect. A non-inlined version gives values for all 16 floats but they all seem incorrect. A version that breaks up the pointer manipulation seems to work correctly. A version that uses indexing instead of pointer manipulation seems to work correctly.
#include <c6x_vec.h>

inline void VecMultVec(float const *x, float const *y, float *z, int len)
{
    int n;
    float16 *v1, *v2, *v3;

    v1 = (float16*)x;
    v2 = (float16*)y;
    v3 = (float16*)z;
    len >>= 4;
    #pragma MUST_ITERATE(1,,)
    for(n = 0; n < len; n++)
        *v3++ = *v1++ * *v2++;
}

void VecMultVec2(float const *x, float const *y, float *z, int len)
{
    int n;
    float16 *v1, *v2, *v3;

    v1 = (float16*)x;
    v2 = (float16*)y;
    v3 = (float16*)z;
    len >>= 4;
    #pragma MUST_ITERATE(1,,)
    for(n = 0; n < len; n++)
        *v3++ = *v1++ * *v2++;
}

inline void VecMultVec3(float const *x, float const *y, float *z, int len)
{
    int n;
    float16 *v1, *v2, *v3;

    v1 = (float16*)x;
    v2 = (float16*)y;
    v3 = (float16*)z;
    len >>= 4;
    #pragma MUST_ITERATE(1,,)
    for(n = 0; n < len; n++)
        v3[n] = v1[n] * v2[n];
}

inline void VecMultVec4(float const *x, float const *y, float *z, int len)
{
    int n;
    float16 *v1, *v2, *v3;

    v1 = (float16*)x;
    v2 = (float16*)y;
    v3 = (float16*)z;
    len >>= 4;
    #pragma MUST_ITERATE(1,,)
    for(n = 0; n < len; n++)
    {
        //*(v3++) = (*(v1++)) * (*(v2++));
        *v3 = *v1 * *v2;
        v1++;
        v2++;
        v3++;
    }
}

int main()
{
    float x[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
    float z[16] = {};

    VecMultVec(x, x, z, 16);
    VecMultVec2(x, x, z, 16);
    VecMultVec3(x, x, z, 16);
    VecMultVec4(x, x, z, 16);
    return 0;
}
C6655, CGT 8.2.1
Test_proj.zip
Thank you for submitting a test case. I can reproduce the same wrong result. I filed CODEGEN-3613 in the SDOWP system to have this investigated. You are welcome to follow it with the SDOWP link below in my signature.
Thanks and regards,
-George
Yes, there is a problem with the "*v3++" part that causes it to write to the wrong locations.
I believe a workaround is to specify --opt_level=1 or greater. The problem seems to be specific to --opt_level=0, which is what you get when you specify --vectypes=on without any --opt_level option.
Sorry I didn't see Paul's response. Looks like the issue has already been taken care of. Thanks!
- Yuan