
Compiler/TDA3: Suboptimal VCOP instruction reordering

Part Number: TDA3

Tool/software: TI C/C++ Compiler

I'm trying to compile the following kernel-C code (ARP32 compiler v.1.0.8):

__vector d00, d01, d02, d03, d10, d11, d12, d13;
...
d00 = max(d00, d10);
d01 = max(d01, d11);
d02 = max(d02, d12);
d03 = max(d03, d13);
d00 = max(d00, d02);
d01 = max(d01, d03);
...
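For reference, here is a plain-C scalar model of the fragment that makes its dependency structure explicit. It treats each `__vector` as a single `int` lane, which is an assumption for illustration only (on VCOP, `max()` applies per lane). The helper names `vmax` and `reduce` are hypothetical. The first four maxes are independent of each other, and each of the last two depends on exactly two of them.

```c
/* Scalar model: each __vector is treated as one int lane (assumption;
   on VCOP a __vector holds several lanes and max() works lane-wise). */
static int vmax(int a, int b) { return a > b ? a : b; }

/* Computes the values the kernel fragment leaves in d00 and d01.
   in[0..3] are d00..d03, in[4..7] are d10..d13. */
static void reduce(const int in[8], int *out0, int *out1)
{
    int d00 = in[0], d01 = in[1], d02 = in[2], d03 = in[3];
    int d10 = in[4], d11 = in[5], d12 = in[6], d13 = in[7];

    /* Level 1: four mutually independent maxes (source lines |400|..|403|). */
    d00 = vmax(d00, d10);
    d01 = vmax(d01, d11);
    d02 = vmax(d02, d12);
    d03 = vmax(d03, d13);

    /* Level 2: each needs two level-1 results (|404| and |405|). */
    d00 = vmax(d00, d02);
    d01 = vmax(d01, d03);

    *out0 = d00;
    *out1 = d01;
}
```

For example, with `in = {3, 7, 1, 9, 4, 2, 8, 5}`, `reduce` yields 8 for d00 and 9 for d01.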

The compiler generates the following assembly (the |4xx| comments are the source line numbers):

        VMAX      V2,V14,V2             ; [DP_32_VCOP1] |400| 
||      VMAX      V6,V10,V6             ; [DP_32_VCOP2] |402| 

        VMAX      V2,V6,V2              ; [DP_32_VCOP1] |404| 
||      VMAX      V0,V12,V0             ; [DP_32_VCOP2] |401| 

        VMAX      V4,V8,V4              ; [DP_32_VCOP1] |403| 
        VMAX      V0,V4,V0              ; [DP_32_VCOP1] |405| 

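The instructions are ordered in such a way that the last pair of VMAX instructions cannot be issued in parallel: |405| needs the result of |403|, which is scheduled alone in the third packet, so the sequence occupies four instruction packets. Three would suffice, e.g. |400| || |401|, |402| || |403|, |404| || |405|.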
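One way to check that three packets are achievable is a small greedy list scheduler over the six operations. This is a sketch under simplifying assumptions: two issue slots per packet (matching DP_32_VCOP1/DP_32_VCOP2), every VMAX result usable in the next packet, and no other resource constraints. Real ARP32/VCOP latencies and register constraints may change the picture. The names `schedule`, `print_schedule`, `dep` and `tag` are hypothetical.

```c
#include <stdio.h>

#define NOPS  6
#define WIDTH 2   /* two VCOP issue slots per packet (assumption) */

/* Source-line tags of the six VMAX operations, in program order. */
static const int tag[NOPS] = {400, 401, 402, 403, 404, 405};

/* dep[i][j] != 0 means op i needs the result of op j:
   |404| reads d00 (|400|) and d02 (|402|); |405| reads d01 (|401|) and d03 (|403|). */
static const int dep[NOPS][NOPS] = {
    [4] = {[0] = 1, [2] = 1},
    [5] = {[1] = 1, [3] = 1},
};

/* Greedy list scheduling in program order, assuming a result can be
   consumed one packet after it is produced. Stores each op's packet
   index in packet[] and returns the number of packets used. */
static int schedule(int packet[NOPS])
{
    int done[NOPS] = {0};
    int scheduled = 0, npackets = 0;

    while (scheduled < NOPS) {
        int used = 0;
        for (int i = 0; i < NOPS && used < WIDTH; i++) {
            if (done[i])
                continue;
            int ready = 1;
            for (int j = 0; j < NOPS; j++)
                /* a producer must sit in an EARLIER packet than its consumer */
                if (dep[i][j] && (!done[j] || packet[j] == npackets))
                    ready = 0;
            if (ready) {
                packet[i] = npackets;
                done[i] = 1;
                used++;
                scheduled++;
            }
        }
        npackets++;
    }
    return npackets;
}

/* Prints which source lines share each packet. */
static void print_schedule(const int packet[NOPS], int npackets)
{
    for (int p = 0; p < npackets; p++) {
        printf("packet %d:", p);
        for (int i = 0; i < NOPS; i++)
            if (packet[i] == p)
                printf(" |%d|", tag[i]);
        printf("\n");
    }
}
```

Under these assumptions `schedule` returns 3, and `print_schedule` lists |400| |401|, then |402| |403|, then |404| |405|, so both level-2 maxes share the final packet instead of being serialized.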