TMS320F280049: Strange optimized code output - much slower than no optimization

Part Number: TMS320F280049

Hi

I'm getting strange "optimized" compiler output. I've made a minimal test case below to demonstrate it (the fact that the variables are in a struct doesn't matter):

struct
{
   double a, b, c, d;
} teststruct;

int main(void)
{
   teststruct.a = 0;
   teststruct.b = 0;
   teststruct.c = 0;
   teststruct.d = 0;

   return 0;
}

With optimization disabled, or at -o0, the assembly is:

ZERO R0H ; [CPU_FPU] |8|
MOVW DP,#_teststruct ; [CPU_ARAU]
MOV32 @_teststruct,R0H ; [CPU_FPU] |8|
MOV32 @_teststruct+2,R0H ; [CPU_FPU] |9|
MOV32 @_teststruct+4,R0H ; [CPU_FPU] |10|
MOV32 @_teststruct+6,R0H ; [CPU_FPU] |11|
MOVB AL,#0 ; [CPU_ALU] |13|
LRETR ; [CPU_ALU]

With optimization at -o1 or above, the assembly is:

ZERO R3H ; [CPU_FPU] |8|
MOVW DP,#_teststruct ; [CPU_ARAU]
ZERO R2H ; [CPU_FPU] |9|
ZERO R1H ; [CPU_FPU] |10|
ZERO R0H ; [CPU_FPU] |11|
MOV32 @_teststruct,R3H ; [CPU_FPU] |8|
MOVB AL,#0 ; [CPU_ALU] |13|
MOV32 @_teststruct+2,R2H ; [CPU_FPU] |9|
MOV32 @_teststruct+4,R1H ; [CPU_FPU] |10|
MOV32 @_teststruct+6,R0H ; [CPU_FPU] |11|
LRETR ; [CPU_ALU]

In the unoptimized output, zero is loaded into a single register, which is then stored to all four variables. In the optimized version, zero is first loaded into four separate registers, and each register is then stored to its own variable. This slows things down a lot when you want to zero a bunch of variables in a row in a time-critical interrupt!

Is there an explanation for this, or any advice on writing the C so that the compiler produces better output?
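A workaround that may be worth trying, offered only as a sketch: keep the variables that are cleared together in one struct and reset them with a single struct assignment from a const, zero-initialized instance. The names `struct vars`, `kZeroVars` and `clear_vars` are hypothetical, and it is an untested assumption that cl2000 turns this into faster code than the per-field stores above on the TMS320F280049, so check the generated assembly at the optimization level you actually ship with.

```c
/* Hypothetical struct grouping the variables that the ISR clears together.
   (On C28x with the FPU32 COFF ABI, double is 32 bits, as in the test case.) */
struct vars
{
    double a, b, c, d;
};

struct vars teststruct;

/* Const all-zero instance; {0} zero-initializes every member. */
static const struct vars kZeroVars = { 0 };

/* Clear every member with one struct assignment instead of one store per
   field. The compiler may emit this as a block copy; whether that beats
   the per-field stores on C28x is an assumption to verify in the listing. */
void clear_vars(struct vars *v)
{
    *v = kZeroVars;
}
```

In the interrupt this would be called as `clear_vars(&teststruct);`. Compare its listing against the two above before adopting it.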

  • I should also add that I have tried compiler versions v18.12.7.LTS and v20.8.0.STS.

  • A second example shows a different effect: repeatedly adding pairs of float32 variables that are on different data pages, e.g.

       teststruct.pageA_a += teststruct.pageB_a;
       teststruct.pageA_b += teststruct.pageB_b;
       teststruct.pageA_c += teststruct.pageB_c;
       teststruct.pageA_d += teststruct.pageB_d;
       teststruct.pageA_e += teststruct.pageB_e;
       teststruct.pageA_f += teststruct.pageB_f;
       teststruct.pageA_g += teststruct.pageB_g;
       teststruct.pageA_h += teststruct.pageB_h;
       teststruct.pageA_i += teststruct.pageB_i;
       teststruct.pageA_j += teststruct.pageB_j;
       teststruct.pageA_k += teststruct.pageB_k;

    With optimization off or at -o0, this takes 7 instructions per line:

    MOVW DP,#_teststruct ; [CPU_ARAU]
    MOV32 R0H,@_teststruct ; [CPU_FPU] |44|
    MOVW DP,#_teststruct+598 ; [CPU_ARAU]
    MOV32 R1H,@_teststruct+598 ; [CPU_FPU] |44|
    ADDF32 R0H,R0H,R1H ; [CPU_FPU] |44|
    MOVW DP,#_teststruct ; [CPU_ARAU]
    MOV32 @_teststruct,R0H ; [CPU_FPU] |44|

    With optimization at -o2, this takes 12 instructions per line, split into three separate parts, because for some reason the compiler starts to use the stack.

    MOV32 R0H,@_teststruct+2 ; [CPU_FPU] |45|
    MOVW DP,#_teststruct+600 ; [CPU_ARAU]
    MOV32 *-SP[6],R0H ; [CPU_FPU] |45|
    MOV32 R0H,@_teststruct+600 ; [CPU_FPU] |45|
    MOVW DP,#_teststruct+4 ; [CPU_ARAU]
    MOV32 *-SP[4],R0H ; [CPU_FPU] |45|

    ADDF32 R2H,R2H,R0H ; [CPU_FPU] |45|
    MOVW DP,#_teststruct ; [CPU_ARAU]
    MOV32 *-SP[4],R2H ; [CPU_FPU] |45|
    MOV32 R2H,*-SP[8] ; [CPU_FPU] |45|

    MOV32 @_teststruct+2,R0H ; [CPU_FPU] |45|
    MOV32 R0H,*-SP[6] ; [CPU_FPU] |45|

  • Thank you for notifying us of this problem and for submitting a test case. I can reproduce the same result. I filed the entry EXT_EP-10264 to have this investigated. You are welcome to use that link to follow it.

    Thanks and regards,

    -George
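The DP reloads in the float32 listings follow from how C28x direct addressing works: the `@` operand supplies only a 6-bit offset, and DP selects a 64-word page, so two operands more than 63 words apart can never share a page. The sketch below is a host-side illustration under that assumption, with hypothetical names (`dp_page`, `same_dp_page`, `struct acc_pair`, `accumulate_pairs`). It checks the page arithmetic for the reported +0/+598 offsets and shows one layout change worth trying: storing each accumulator next to its increment so both operands of every `+=` are adjacent. Whether this also avoids the stack spills seen at -o2 has not been verified.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* C28x direct addressing: DP selects a 64-word page and the
   instruction supplies a 6-bit offset within that page. */
#define DP_PAGE_WORDS 64u

/* Page number DP must hold to reach a given 16-bit-word address. */
uint32_t dp_page(uint32_t word_addr)
{
    return word_addr / DP_PAGE_WORDS;
}

/* True when two word addresses are reachable without reloading DP. */
bool same_dp_page(uint32_t addr_a, uint32_t addr_b)
{
    return dp_page(addr_a) == dp_page(addr_b);
}

/* Hypothetical layout: each accumulator stored next to its increment,
   so both operands of one += are adjacent in memory. */
struct acc_pair
{
    float acc; /* takes the role of teststruct.pageA_x */
    float inc; /* takes the role of teststruct.pageB_x */
};

#define NUM_PAIRS 11u /* pageA_a .. pageA_k in the example above */

struct acc_pair pairs[NUM_PAIRS];

/* Same work as the eleven += lines, written as one loop over the
   interleaved pairs; pointer access lets the compiler use indirect
   addressing instead of DP-relative addressing. */
void accumulate_pairs(struct acc_pair *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        p[i].acc += p[i].inc;
    }
}
```

Because the original fields are 598 words apart, well beyond the 63-word span of one page, they can never share a DP page wherever `teststruct` is placed, so every `+=` needs at least one DP switch. With the interleaved layout, both operands sit side by side; whether the compiler actually takes advantage of that is something to confirm in the listing.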