Hi
I'm getting strange "optimized" compiler output. I've made a minimal test case below to demonstrate (the fact that it is a struct doesn't matter):
struct
{
    double a, b, c, d;
} teststruct;

int main(void)
{
    teststruct.a = 0;
    teststruct.b = 0;
    teststruct.c = 0;
    teststruct.d = 0;
    return 0;
}
With optimization disabled or at -o0 the assembly is:
ZERO R0H ; [CPU_FPU] |8|
MOVW DP,#_teststruct ; [CPU_ARAU]
MOV32 @_teststruct,R0H ; [CPU_FPU] |8|
MOV32 @_teststruct+2,R0H ; [CPU_FPU] |9|
MOV32 @_teststruct+4,R0H ; [CPU_FPU] |10|
MOV32 @_teststruct+6,R0H ; [CPU_FPU] |11|
MOVB AL,#0 ; [CPU_ALU] |13|
LRETR ; [CPU_ALU]
With optimization at -o1 or above the assembly is:
ZERO R3H ; [CPU_FPU] |8|
MOVW DP,#_teststruct ; [CPU_ARAU]
ZERO R2H ; [CPU_FPU] |9|
ZERO R1H ; [CPU_FPU] |10|
ZERO R0H ; [CPU_FPU] |11|
MOV32 @_teststruct,R3H ; [CPU_FPU] |8|
MOVB AL,#0 ; [CPU_ALU] |13|
MOV32 @_teststruct+2,R2H ; [CPU_FPU] |9|
MOV32 @_teststruct+4,R1H ; [CPU_FPU] |10|
MOV32 @_teststruct+6,R0H ; [CPU_FPU] |11|
LRETR ; [CPU_ALU]
In the unoptimized output, zero is loaded into a single register, which is then used for all four variables. In the optimized version, zero is first loaded into four separate registers, and each register is then stored to its own variable (11 instructions instead of 8). This slows things down a lot when you want to zero a bunch of variables in a row in a time-critical interrupt!
Any explanation or help in writing the C to get a better compiler output?
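One variation that might be worth comparing against the field-by-field version is to clear the whole struct in a single operation. The sketch below is only an assumption about what could be tried: it has not been checked against the C2000 compiler, so the generated assembly may or may not improve. The struct tag `test_s` and the helpers `zero_memset` and `zero_assign` are hypothetical names, not from the test case above.

```c
#include <string.h>

/* Same fields as the test case; the tag name test_s is hypothetical. */
struct test_s
{
    double a, b, c, d;
};

struct test_s teststruct;

/* Variant 1: clear every byte of the struct.
 * All-bits-zero represents 0.0 for IEEE-754 floating point. */
void zero_memset(void)
{
    memset(&teststruct, 0, sizeof teststruct);
}

/* Variant 2: copy from a constant all-zero object in one struct assignment. */
void zero_assign(void)
{
    static const struct test_s zero = { 0.0, 0.0, 0.0, 0.0 };
    teststruct = zero;
}
```

Both variants leave all four members equal to 0.0; whether either one produces shorter code than four separate assignments would have to be confirmed by inspecting the assembly at each optimization level.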
A second example shows another effect: repeatedly adding two float32 variables that sit on different data pages, e.g.
teststruct.pageA_a += teststruct.pageB_a;
teststruct.pageA_b += teststruct.pageB_b;
teststruct.pageA_c += teststruct.pageB_c;
teststruct.pageA_d += teststruct.pageB_d;
teststruct.pageA_e += teststruct.pageB_e;
teststruct.pageA_f += teststruct.pageB_f;
teststruct.pageA_g += teststruct.pageB_g;
teststruct.pageA_h += teststruct.pageB_h;
teststruct.pageA_i += teststruct.pageB_i;
teststruct.pageA_j += teststruct.pageB_j;
teststruct.pageA_k += teststruct.pageB_k;
With optimization disabled or at -o0 this takes 7 instructions per line:
MOVW DP,#_teststruct ; [CPU_ARAU]
MOV32 R0H,@_teststruct ; [CPU_FPU] |44|
MOVW DP,#_teststruct+598 ; [CPU_ARAU]
MOV32 R1H,@_teststruct+598 ; [CPU_FPU] |44|
ADDF32 R0H,R0H,R1H ; [CPU_FPU] |44|
MOVW DP,#_teststruct ; [CPU_ARAU]
MOV32 @_teststruct,R0H ; [CPU_FPU] |44|
With optimization at -o2 this takes 12 instructions per line, split into three separate parts, because the compiler starts using the stack for some reason:
MOV32 R0H,@_teststruct+2 ; [CPU_FPU] |45|
MOVW DP,#_teststruct+600 ; [CPU_ARAU]
MOV32 *-SP[6],R0H ; [CPU_FPU] |45|
MOV32 R0H,@_teststruct+600 ; [CPU_FPU] |45|
MOVW DP,#_teststruct+4 ; [CPU_ARAU]
MOV32 *-SP[4],R0H ; [CPU_FPU] |45|
ADDF32 R2H,R2H,R0H ; [CPU_FPU] |45|
MOVW DP,#_teststruct ; [CPU_ARAU]
MOV32 *-SP[4],R2H ; [CPU_FPU] |45|
MOV32 R2H,*-SP[8] ; [CPU_FPU] |45|
MOV32 @_teststruct+2,R0H ; [CPU_FPU] |45|
MOV32 R0H,*-SP[6] ; [CPU_FPU] |45|
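A possible way to experiment with this is to read all operands from one data page into locals before touching the other page, so that accesses to each page are grouped in the source. The sketch below is an assumption, not a verified fix: the optimizer is free to reorder these accesses, and the result has not been checked on the C2000 compiler. The struct layout, the `pad` filler (its size is arbitrary here) and the function name `accumulate_grouped` are hypothetical, and only three of the fields are shown.

```c
/* Hypothetical layout: pad only serves to push the pageB_* members
 * far enough away that they land on a different data page. */
struct
{
    float pageA_a, pageA_b, pageA_c;
    float pad[296];
    float pageB_a, pageB_b, pageB_c;
} teststruct;

void accumulate_grouped(void)
{
    /* Step 1: read every page B operand while that page is selected. */
    const float b_a = teststruct.pageB_a;
    const float b_b = teststruct.pageB_b;
    const float b_c = teststruct.pageB_c;

    /* Step 2: do all read-modify-write accesses on page A together. */
    teststruct.pageA_a += b_a;
    teststruct.pageA_b += b_b;
    teststruct.pageA_c += b_c;
}
```

The arithmetic is identical to the line-by-line version; only the order of the memory accesses in the source changes, so any benefit would show up as fewer MOVW DP instructions in the listing.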
Thank you for notifying us of this problem, and submitting a test case. I can reproduce the same result. I filed the entry EXT_EP-10264 to have this investigated. You are welcome to use that link to follow it.
Thanks and regards,
-George