Part Number: TDA3
Tool/software: TI C/C++ Compiler
I'm trying to compile the following kernel-C code (ARP32 compiler v.1.0.8):
    __vector d00, d01, d02, d03, d10, d11, d12, d13;
    ...
    d00 = max(d00, d10);
    d01 = max(d01, d11);
    d02 = max(d02, d12);
    d03 = max(d03, d13);
    d00 = max(d00, d02);
    d01 = max(d01, d03);
    ...
And the compiler creates this assembly:
       VMAX V2,V14,V2 ; [DP_32_VCOP1] |400|
    || VMAX V6,V10,V6 ; [DP_32_VCOP2] |402|
       VMAX V2,V6,V2  ; [DP_32_VCOP1] |404|
    || VMAX V0,V12,V0 ; [DP_32_VCOP2] |401|
       VMAX V4,V8,V4  ; [DP_32_VCOP1] |403|
       VMAX V0,V4,V0  ; [DP_32_VCOP1] |405|
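The instructions are ordered in such a way that the last pair of VMAX instructions cannot be parallelized: the final VMAX (line 405) reads V4, which the VMAX just before it (line 403) writes.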