Compiler/TMS320C6657: vector types weirdness

Part Number: TMS320C6657

Tool/software: TI C/C++ Compiler

Why doesn't this work properly?

inline void VecMultVec(float const *x, float const *y, float *z, int len )
{
   int n;
   float16 *v1, *v2, *v3;

   v1 = (float16*)x;
   v2 = (float16*)y;
   v3 = (float16*)z;

   len >>= 4;

   #pragma MUST_ITERATE(1,,)
   for(n = 0; n < len; n++)
        *v3++ = *v1++ * *v2++;
}


It runs faster than "natural" C code but the results are unexpected.  The arguments are all guaranteed to be aligned to 16-byte boundaries and len is always a multiple of 16.

  • Hi,

    Which SDK is this? Or are you using bare metal code?

    Best Regards,
    Yordan
  • I don't understand the question. It's C code. It's using the C6x vector types (#include c6x_vec.h). Code Composer Studio using SYS/BIOS (or whatever it's been renamed to this month) and latest compiler.
  • Anyone? Bueller?
  • What do we have to do to get someone to look into this?
  • Is it possible that len is the length of your arrays in terms of elements, not the number of bytes? In that case "len >>= 4" would be wrong and would result in far too few iterations. (As far as I can see, float16 is a vector type comprising 4 floats, i.e. 16 bytes.)

  • According to the compiler manual:
    floatn: A vector of n 32-bit single-precision floating-point values.

    Why does TI not respond? It's been a week. I hope this isn't indicative of the kind of support to expect.
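
As a sanity check on the arithmetic in the two replies above: going by the manual quote, a float16 holds 16 floats, which is 64 bytes at 4 bytes per float, so `len >>= 4` turns an element count into a vector count. The following is only a sketch in portable C. `float16_standin` and `vec_iterations` are hypothetical names introduced for illustration; the real float16 type comes from c6x_vec.h and is not used here.

```c
#include <stddef.h>

/* Hypothetical stand-in for TI's float16 (a vector of 16 floats).
   This is NOT the real type from c6x_vec.h; it only mirrors its size. */
typedef struct { float s[16]; } float16_standin;

/* Size in bytes of one stand-in vector: 16 elements times sizeof(float). */
static size_t vec_bytes(void)
{
    return sizeof(float16_standin);
}

/* Number of whole 16-float vectors in an array of len floats.
   Matches the "len >>= 4" step in the original function. */
static int vec_iterations(int len)
{
    return len >> 4;
}
```

Under these assumptions, an element count of 16 (as in a 16-float array) gives one loop iteration, which is what the original function expects when len is a multiple of 16.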
  • Hi,

    We're looking into this. Sorry for the delay.

    Best Regards,
    Yordan
  • Sorry for the delayed response on this issue. The question relates to the TI C6000 compiler but was posted on device forums.

    I have moved the thread to the TI compiler forums and pinged the compiler experts to respond.
  • I cannot determine what causes the problem based on what I see here.  Please submit a test case.  For this situation, I would like a test case that can execute.  So please provide the code which supplies the input data and checks the result for correctness.  We also need to know the compiler version, and all the build options exactly as the compiler sees them.  If it is easier for you, feel free to put everything together in a CCS project, then package that up as described in the article Project Sharing.

    Thanks and regards,

    -George

  • I might be messing up my C or OpenCL, but I see a problem as well.  Running the OP's original function only calculates the result for the first two floats, which seems incorrect.  A non-inlined version gives values for all 16 floats but they all seem incorrect.  A version that breaks up the pointer manipulation seems to work correctly.  A version that uses indexing instead of pointer manipulation seems to work correctly.

    #include <c6x_vec.h>
    
    inline void VecMultVec(float const *x, float const *y, float *z, int len )
    {
       int n;
       float16 *v1, *v2, *v3;
    
       v1 = (float16*)x;
       v2 = (float16*)y;
       v3 = (float16*)z;
    
       len >>= 4;
    
       #pragma MUST_ITERATE(1,,)
       for(n = 0; n < len; n++)
            *v3++ = *v1++ * *v2++;
    }
    
    void VecMultVec2(float const *x, float const *y, float *z, int len )
    {
       int n;
       float16 *v1, *v2, *v3;
    
       v1 = (float16*)x;
       v2 = (float16*)y;
       v3 = (float16*)z;
    
       len >>= 4;
    
       #pragma MUST_ITERATE(1,,)
       for(n = 0; n < len; n++)
            *v3++ = *v1++ * *v2++;
    }
    
    inline void VecMultVec3(float const *x, float const *y, float *z, int len )
    {
       int n;
       float16 *v1, *v2, *v3;
    
       v1 = (float16*)x;
       v2 = (float16*)y;
       v3 = (float16*)z;
    
       len >>= 4;
    
       #pragma MUST_ITERATE(1,,)
       for(n = 0; n < len; n++)
            v3[n] = v1[n] * v2[n];
    }
    
    inline void VecMultVec4(float const *x, float const *y, float *z, int len )
    {
       int n;
       float16 *v1, *v2, *v3;
    
       v1 = (float16*)x;
       v2 = (float16*)y;
       v3 = (float16*)z;
    
       len >>= 4;
    
       #pragma MUST_ITERATE(1,,)
       for(n = 0; n < len; n++)
       {
           //*(v3++) = (*(v1++)) * (*(v2++));
           *v3 = *v1 * *v2;
           v1++;
           v2++;
           v3++;
       }
    }
    
    
    int main()
    {
        float x[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
        float z[16] = {};	
        
        VecMultVec(x, x, z, 16);
        VecMultVec2(x, x, z, 16);
        VecMultVec3(x, x, z, 16);
        VecMultVec4(x, x, z, 16);
        
        return 0;
    }

    C6655, CGT 8.2.1 (attachment: Test_proj.zip)
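
The test case above calls each variant but does not itself compare the output against known-good values. One possible way to add that check is a plain scalar reference plus a comparison helper, sketched below in portable C. The names `vec_mult_ref` and `first_mismatch` are hypothetical and not part of the posted project or of any TI header.

```c
/* Scalar reference: z[i] = x[i] * y[i] for i in [0, len). */
static void vec_mult_ref(const float *x, const float *y, float *z, int len)
{
    for (int i = 0; i < len; i++)
        z[i] = x[i] * y[i];
}

/* Returns the index of the first element where got and want differ by
   more than tol (or where the difference is NaN), or -1 if all match. */
static int first_mismatch(const float *got, const float *want, int len, float tol)
{
    for (int i = 0; i < len; i++) {
        float d = got[i] - want[i];
        if (d < 0.0f)
            d = -d;
        if (!(d <= tol))   /* written this way so a NaN difference also counts */
            return i;
    }
    return -1;
}
```

In the posted main(), this could be used by filling a reference array once with vec_mult_ref(x, x, ref, 16) and calling first_mismatch(z, ref, 16, 0.0f) after each variant. Since the same z buffer is reused across the four calls, clearing it between calls would likely be needed so that results left by one variant do not hide a failure in the next.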

  • Yes, that's the exact problem we're seeing. Only the first two values get computed. It's something to do with using pointers.
  • Thank you for submitting a test case.  I can reproduce the same wrong result.  I filed CODEGEN-3613 in the SDOWP system to have this investigated.  You are welcome to follow it with the SDOWP link below in my signature.

    Thanks and regards,

    -George

  • Yes, there is a problem with the "*v3++" part that causes it to write to the wrong locations.

    I believe a workaround is to specify --opt_level=1 or greater.  The problem seems to be specific to --opt_level=0, and that's what you get when you specify --vectypes=on without any --opt_level option.

  • Sorry but that is incorrect. We are seeing this issue with --opt_level=1 or greater. In fact CCS forces you to turn on optimization when using the --vectypes option.
  • I'm sorry, but I don't have a test case that shows that. The case from user347219 does behave correctly with -o1, near as I can tell.
    If you can send us one that illustrates your situation, I can ensure that my patch fixes that, too.
  • So now you're saying that you have a patch? Setting --opt_level=1 is not a "patch"; it's a workaround. It's a moot point anyway because it doesn't work with our software. Have you tried other optimization levels?

    I don't have a test project other than our actual project, and that is several hundred files. It would take weeks' worth of lawyers drafting up NDAs for me to share it.
  • Yes, it completed code review about an hour ago. It will take some time for the release to happen, though, and a workaround could permit your code to work in the meantime. Come to think of it, you already have other workarounds, suggested in the posted example.

    It's unlikely we'd need the whole project to reproduce your issue, but that sounds moot too. If you compile with "--src_interlist --keep_asm" (-sk for short), and see stores that look like "*((float *)v3<float<[16]>*>++{64}+4)" -- that have both ++ and +integer -- then it's likely to be the same problem.

    We will hope that the patch release solves it for you.
  • Sorry I didn't see Paul's response.  Looks like the issue has already been taken care of.  Thanks!


    - Yuan