Compiler/TMS320C6657: vector types weirdness

lemmiwinks

Part Number: TMS320C6657

Tool/software: TI C/C++ Compiler

Why doesn't this work properly?

inline void VecMultVec(float const *x, float const *y, float *z, int len )
{
   int n;
   float16 *v1, *v2, *v3;

   v1 = (float16*)x;
   v2 = (float16*)y;
   v3 = (float16*)z;

   len >>= 4;

   #pragma MUST_ITERATE(1,,)
   for(n = 0; n < len; n++)
        *v3++ = *v1++ * *v2++;
}

It runs faster than "natural" C code but the results are unexpected. The arguments are all guaranteed to be aligned to 16-byte boundaries and len is always a multiple of 16.

over 8 years ago

0 Yordan Kovachev over 8 years ago

TI__Guru**** 161600 points

Hi,

Which SDK is this? Or are you using bare metal code?

Best Regards,
Yordan

0 lemmiwinks over 8 years ago in reply to Yordan Kovachev

Expert 1050 points

I don't understand the question. It's C code. It's using the C6x vector types (#include c6x_vec.h). Code Composer Studio using SYS/BIOS (or whatever it's been renamed to this month) and latest compiler.

0 lemmiwinks over 8 years ago in reply to lemmiwinks

Expert 1050 points

Anyone? Bueller?

0 lemmiwinks over 8 years ago in reply to lemmiwinks

Expert 1050 points

What do we have to do to get someone to look into this?

0 Markus Moll over 8 years ago in reply to lemmiwinks

Expert 1820 points

Is it possible that len is the length of your arrays in terms of elements, not the number of bytes? In that case "len >>= 4" would be wrong and would result in far too few iterations. (As far as I can see, float16 is a vector type comprising 4 floats, i.e. 16 bytes.)

0 lemmiwinks over 8 years ago in reply to Markus Moll

Expert 1050 points

According to the compiler manual:
floatn: A vector of n 32-bit single-precision floating-point values.

Why does TI not respond? It's been a week. I hope this isn't indicative of the kind of support to expect.
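
For reference, the arithmetic implied by the manual's definition can be written out in portable C. This is only a sketch: it does not use c6x_vec.h, the names LANES, vec_bytes and vec_count are made up for illustration, and it assumes len counts floats (not bytes) and that float is 4 bytes wide.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: float16 holds 16 floats, per the manual text quoted above. */
enum { LANES = 16 };

/* Size in bytes of one 16-lane float vector (64 when float is 4 bytes). */
size_t vec_bytes(void)
{
    return LANES * sizeof(float);
}

/* Number of whole 16-lane vectors in an array of len floats.
 * len is an element count and must be a multiple of 16. */
int vec_count(int len)
{
    assert(len >= 0 && len % LANES == 0);
    return len >> 4;   /* same as len / 16 */
}
```

With len = 16 floats this gives one loop iteration, so under that reading "len >>= 4" produces the right trip count for a 16-lane type.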

0 Yordan Kovachev over 8 years ago in reply to lemmiwinks

TI__Guru**** 161600 points

Hi,

We're looking into this. Sorry for the delay.

Best Regards,
Yordan

0 Rahul Prabhu over 8 years ago in reply to Yordan Kovachev

TI__Guru** 116020 points

Sorry for the delayed response on this issue. The question relates to the TI C6000 compiler but was posted on device forums.

I have moved the thread to the TI compiler forums and pinged the compiler experts to respond.

0 George Mock over 8 years ago

TI__Guru**** 244930 points

I cannot determine what causes the problem based on what I see here. Please submit a test case. For this situation, I would like a test case that can execute. So please provide the code which supplies the input data and checks the result for correctness. We also need to know the compiler version, and all the build options exactly as the compiler sees them. If it is easier for you, feel free to put everything together in a CCS project, then package that up as described in the article Project Sharing.

Thanks and regards,

-George

0 user347219 over 8 years ago in reply to George Mock

Expert 1240 points

I might be messing up my C or OpenCL, but I see a problem as well. Running the OP's original function only calculates the result for the first two floats, which seems incorrect. A non-inlined version gives values for all 16 floats but they all seem incorrect. A version that breaks up the pointer manipulation seems to work correctly. A version that uses indexing instead of pointer manipulation seems to work correctly.

#include <c6x_vec.h>

inline void VecMultVec(float const *x, float const *y, float *z, int len )
{
   int n;
   float16 *v1, *v2, *v3;

   v1 = (float16*)x;
   v2 = (float16*)y;
   v3 = (float16*)z;

   len >>= 4;

   #pragma MUST_ITERATE(1,,)
   for(n = 0; n < len; n++)
        *v3++ = *v1++ * *v2++;
}

void VecMultVec2(float const *x, float const *y, float *z, int len )
{
   int n;
   float16 *v1, *v2, *v3;

   v1 = (float16*)x;
   v2 = (float16*)y;
   v3 = (float16*)z;

   len >>= 4;

   #pragma MUST_ITERATE(1,,)
   for(n = 0; n < len; n++)
        *v3++ = *v1++ * *v2++;
}

inline void VecMultVec3(float const *x, float const *y, float *z, int len )
{
   int n;
   float16 *v1, *v2, *v3;

   v1 = (float16*)x;
   v2 = (float16*)y;
   v3 = (float16*)z;

   len >>= 4;

   #pragma MUST_ITERATE(1,,)
   for(n = 0; n < len; n++)
        v3[n] = v1[n] * v2[n];
}

inline void VecMultVec4(float const *x, float const *y, float *z, int len )
{
   int n;
   float16 *v1, *v2, *v3;

   v1 = (float16*)x;
   v2 = (float16*)y;
   v3 = (float16*)z;

   len >>= 4;

   #pragma MUST_ITERATE(1,,)
   for(n = 0; n < len; n++)
   {
       //*(v3++) = (*(v1++)) * (*(v2++));
       *v3 = *v1 * *v2;
       v1++;
       v2++;
       v3++;
   }
}


int main()
{
    float x[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
    float z[16] = {0};
    
    VecMultVec(x, x, z, 16);
    VecMultVec2(x, x, z, 16);
    VecMultVec3(x, x, z, 16);
    VecMultVec4(x, x, z, 16);
    
    return 0;
}

C6655, CGT 8.2.1

Attachment: Test_proj.zip
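
The main() in this test case calls each variant but does not check the output, so results have to be inspected by hand. A possible way to automate the check is a scalar reference plus a comparison helper, sketched here in portable C. The names ref_mult and first_mismatch are hypothetical and not part of the posted project, and the tolerance argument is an assumption (for these small integer inputs an exact comparison, tol = 0, should be enough).

```c
#include <assert.h>

/* Scalar reference: z[i] = x[i] * y[i] for i in [0, len). */
void ref_mult(const float *x, const float *y, float *z, int len)
{
    int i;
    assert(len >= 0);
    for (i = 0; i < len; i++)
        z[i] = x[i] * y[i];
}

/* Returns the index of the first element where got and want differ by
 * more than tol, or -1 if every element matches. A NaN in either array
 * is reported as a mismatch. */
int first_mismatch(const float *got, const float *want, int len, float tol)
{
    int i;
    for (i = 0; i < len; i++) {
        float d = got[i] - want[i];
        if (d < 0.0f)
            d = -d;
        if (!(d <= tol))
            return i;
    }
    return -1;
}
```

Calling ref_mult(x, x, want, 16) once and then first_mismatch(z, want, 16, 0.0f) after each VecMultVec variant would report the first wrong lane. For an output where only the first two floats were computed and the rest stayed zero, it returns 2.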

0 lemmiwinks over 8 years ago in reply to user347219

Expert 1050 points

Yes, that's the exact problem we're seeing. Only the first two values get computed. It's something to do with using pointers.

0 George Mock over 8 years ago in reply to lemmiwinks

TI__Guru**** 244930 points

Thank you for submitting a test case. I can reproduce the same wrong result. I filed CODEGEN-3613 in the SDOWP system to have this investigated. You are welcome to follow it with the SDOWP link below in my signature.

Thanks and regards,

-George

0 pf over 8 years ago in reply to user347219

TI__Expert 4930 points

Yes, there is a problem with the "*v3++" part that causes it to write to the wrong locations.

I believe a workaround is to specify --opt_level=1 or greater. The problem seems to be specific to --opt_level=0, and that's what you get when you specify --vectypes=on without any --opt_level option.

0 lemmiwinks over 8 years ago in reply to pf

Expert 1050 points

Sorry but that is incorrect. We are seeing this issue with --opt_level=1 or greater. In fact CCS forces you to turn on optimization when using the --vectypes option.

0 pf over 8 years ago in reply to lemmiwinks

TI__Expert 4930 points

I'm sorry, but I don't have a test case that shows that. The case from user347219 does behave correctly with -o1, near as I can tell.
If you can send us one that illustrates your situation, I can ensure that my patch fixes that, too.

0 lemmiwinks over 8 years ago in reply to pf

Expert 1050 points

So now you're saying that you have a patch? Setting --opt_level=1 is not a "patch"; it's a workaround. It's a moot point anyways because it doesn't work with our software. Have you tried other optimization levels?

I don't have a test project other than our actual project, and that is several hundred files; it would take weeks' worth of lawyers drafting up NDAs for me to share it.

0 pf over 8 years ago in reply to lemmiwinks

TI__Expert 4930 points

Yes, it completed code review about an hour ago. It will take some time for the release to happen, though, and a workaround could permit your code to work in the meantime. Come to think of it, you already have other workarounds, suggested in the posted example.

It's unlikely we'd need the whole project to reproduce your issue, but that sounds moot too. If you compile with "--src_interlist --keep_asm" (-sk for short), and see stores that look like "*((float *)v3<float<[16]>*>++{64}+4)" -- that have both ++ and +integer -- then it's likely to be the same problem.
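
For comparison, here is what the statement "*v3++ = *v1++ * *v2++" is expected to do, written out lane by lane in portable C. This is a sketch of the intended semantics only, not of the compiler's generated code: vec16f is a hypothetical struct standing in for float16, and mult_postinc is a made-up name. All 16 lanes are stored relative to the pointer value before the increment, and each pointer then advances by exactly one whole vector.

```c
#include <assert.h>

enum { LANES = 16 };

/* Hypothetical stand-in for float16: 16 consecutive floats. */
typedef struct { float lane[LANES]; } vec16f;

/* Expected behaviour of: for (n = 0; n < nvec; n++) *v3++ = *v1++ * *v2++; */
void mult_postinc(const vec16f *v1, const vec16f *v2, vec16f *v3, int nvec)
{
    int n, k;
    assert(nvec >= 0);
    for (n = 0; n < nvec; n++) {
        /* Every lane is read and written at the OLD pointer values. */
        for (k = 0; k < LANES; k++)
            v3->lane[k] = v1->lane[k] * v2->lane[k];
        /* Only after all 16 lanes are done does each pointer step
         * forward by one vector (16 floats). */
        v1++;
        v2++;
        v3++;
    }
}
```

A store that applies the increment first and then adds a lane offset would land in the next vector's storage, which is consistent with the "writes to the wrong locations" description above.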

We will hope that the patch release solves it for you.

0 Yuan Zhao over 8 years ago in reply to user347219

TI__Expert 3995 points

Sorry I didn't see Paul's response. Looks like the issue has already been taken care of. Thanks!

- Yuan
