cl55 generates useless moves for stack-based variables

Hi all,

Compiling the following example (with cl55 -O[123]):

extern long f(long);
long t(long i)
{
    f(i);
    return i;
}

generates the following assembly:

AADD #-3, SP
MOV AC0, dbl(*SP(#0)) ; |3|
MOV dbl(*SP(#0)), AC0 ; |3|    ///Why ?
CALL #_f ; |4| 
MOV dbl(*SP(#0)), AC0 ; |4|  
AADD #3, SP
RET

This contains a useless instruction: the value is loaded from the stack back into AC0, although it was just stored there from AC0. How can AC0 differ from the stack copy? Or am I missing something?
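The redundancy in this listing is local enough that a simple peephole pass could catch it. The sketch below is a hypothetical illustration, not how cl55 or its assembler works internally: it models only stack stores and loads (the `Insn`, `OpKind` and `drop_reload_after_store` names are invented here) and assumes both instructions move the same 32-bit (dbl) width. It drops a load that directly follows a store of the same register to the same stack slot.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified model of the two instruction forms involved:
   "MOV reg, dbl(*SP(#slot))" (store) and "MOV dbl(*SP(#slot)), reg" (load).
   Both are assumed to move the same 32-bit (dbl) width. */
typedef enum { OP_STORE, OP_LOAD, OP_OTHER } OpKind;

typedef struct {
    OpKind kind;
    const char *reg;  /* register name, e.g. "AC0"; unused for OP_OTHER */
    int slot;         /* stack offset in words, e.g. 0 for *SP(#0) */
} Insn;

/* Compacts code[0..n) in place, dropping every load that directly follows
   a store of the same register to the same stack slot: right after the
   store, the register still holds exactly the value in that slot.
   Returns the new instruction count. */
static int drop_reload_after_store(Insn *code, int n)
{
    assert(code != NULL || n == 0);
    int out = 0;
    for (int i = 0; i < n; i++) {
        if (out > 0 && code[i].kind == OP_LOAD) {
            const Insn *prev = &code[out - 1];
            if (prev->kind == OP_STORE && prev->slot == code[i].slot &&
                strcmp(prev->reg, code[i].reg) == 0) {
                continue; /* redundant reload: skip it */
            }
        }
        code[out++] = code[i];
    }
    return out;
}
```

Applied to the seven instructions of `t()`, only the `MOV dbl(*SP(#0)), AC0` right after the store would be removed. The reload after `CALL #_f` has to stay, because the call returns its result in AC0 and so overwrites `i`.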

I noticed that the compiler does this for many stack-based variables; the following example shows it for pointers as well.

Compile using cl55 -O3 -k -ml -v5502 main.c (-ml is essential here, because otherwise pointers are 16 bits wide and very different code is generated).

extern int* f0();
extern void f1(int*);
void t(void)
{
    int* i0 = f0();
    int* i1 = f0();
    int* i2 = f0();
    int* i3 = f0(); // first pointer that is saved on the stack instead of in a register

    i0 = i0 + 10; f1(i0 + 10);
    i1 = i1 + 10; f1(i1 + 10);
    i2 = i2 + 10; f1(i2 + 10);
    i3 = i3 + 10; f1(i3 + 10);
}

The relevant part of the generated assembly:

MOV dbl(*SP(#0)), XAR3

MOV XAR3, dbl(*SP(#0))
|| AADD #10, AR3 ; |13|

MOV dbl(*SP(#0)), XAR0 // why is this? We already have it in XAR3.
AADD #10, AR0 ; |13|

CALL #_f1 ; |13|

Why does it first move the value from the stack to XAR3, then back to the stack, and then reread it from the stack into XAR0? It is still in XAR3!

The basic problem seems to be this: for a variable on the stack, the compiler moves it to a register, performs the requested operation, and stores it back to the stack. So far, so good.

But if the same variable is needed for the next operation, the compiler reads it from the stack again, even though it is already in a register.
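The pattern described here (load into a register, operate, store back, then reload) is what a store-to-load forwarding pass over a basic block removes. A minimal sketch, under several assumptions: the instruction model, the `Reg` ids and the `forward_stack_loads` name are hypothetical; a write to a sub-register such as AR3 is reported as a write to the full XAR3; and any call, or any instruction that might write to the stack through a pointer, is treated as a barrier that forgets everything. For each stack slot it remembers one register known to hold the same value; a reload into that register is dropped, and a reload into a different register becomes a register-to-register move.

```c
#include <assert.h>

#define MAX_SLOTS 16 /* hypothetical: number of tracked stack slots */

/* Hypothetical register ids; a write to AR3 must be reported as XAR3. */
typedef enum { NO_REG = -1, AC0, AC1, AC2, AC3, XAR0, XAR3 } Reg;

typedef enum {
    K_LOAD,      /* MOV dbl(*SP(#slot)), dst                     */
    K_STORE,     /* MOV src, dbl(*SP(#slot))                     */
    K_REG_WRITE, /* any instruction that modifies dst (ADD, AND) */
    K_REG_MOVE,  /* MOV src, dst (register to register)          */
    K_BARRIER    /* CALL, or anything not modelled               */
} Kind;

typedef struct {
    Kind kind;
    Reg src;   /* used by K_STORE, K_REG_MOVE */
    Reg dst;   /* used by K_LOAD, K_REG_WRITE, K_REG_MOVE */
    int slot;  /* used by K_LOAD, K_STORE */
} Insn;

/* Forget every slot whose known copy lives in reg. */
static void forget_reg(Reg copy_of[], Reg reg)
{
    for (int s = 0; s < MAX_SLOTS; s++)
        if (copy_of[s] == reg)
            copy_of[s] = NO_REG;
}

/* Rewrites one basic block in place and returns the new length. */
static int forward_stack_loads(Insn *code, int n)
{
    Reg copy_of[MAX_SLOTS]; /* register known to equal each slot */
    int out = 0;

    for (int s = 0; s < MAX_SLOTS; s++)
        copy_of[s] = NO_REG;

    for (int i = 0; i < n; i++) {
        Insn in = code[i];
        switch (in.kind) {
        case K_LOAD: {
            assert(in.slot >= 0 && in.slot < MAX_SLOTS);
            Reg have = copy_of[in.slot];
            if (have == in.dst)
                continue;                /* already in dst: drop the load */
            forget_reg(copy_of, in.dst); /* dst is about to be overwritten */
            if (have != NO_REG) {
                in.kind = K_REG_MOVE;    /* copy from the register instead */
                in.src = have;
            } else {
                copy_of[in.slot] = in.dst;
            }
            break;
        }
        case K_STORE:
            assert(in.slot >= 0 && in.slot < MAX_SLOTS);
            copy_of[in.slot] = in.src;   /* slot and src now agree */
            break;
        case K_REG_WRITE:
        case K_REG_MOVE:
            forget_reg(copy_of, in.dst); /* old value of dst is gone */
            break;
        case K_BARRIER:
            for (int s = 0; s < MAX_SLOTS; s++)
                copy_of[s] = NO_REG;     /* a call may clobber any register */
            break;
        }
        code[out++] = in;
    }
    return out;
}
```

For a sequence that loads `i` into AC0, adds to AC0, stores AC0 back, loads the slot into AC0 again and then into AC1, the pass drops the second load, turns the AC1 load into a `MOV AC0, AC1`, and keeps any load that follows a `CALL`. Whether a register move is actually cheaper than a stack load is assumed here, as in the thread; invalidating everything at a call is deliberately conservative.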

  • This code gives another clear example of the useless code generation.

    It gets even worse for the multiply: it costs 1 byte extra (compared to a MOV AC0, AC1) and N cycles extra (assuming stack accesses take more cycles than register moves and memory reads are not cached).

    extern long f(long);
    long m(long i)
    {
        i = f(0);
        i += 1;
        f(i);
        i &= ~1;
        f(i);
        i |= 1;
        f(i);
        i ^= 2;
        f(i);
        i *= 5;
        return i;
    }

  • Thank you for submitting a test case.  I can reproduce what you see.  I filed SDSCM00042522 in the SDOWP system to have this addressed.  Feel free to track it with the SDOWP link in my sig below.

    Thanks and regards,

    -George

  • You are correct; this is a missed opportunity.  The compiler does a much better job optimizing 16-bit values, such as small data model pointers.  The issue here is simply that in the C5500 compiler, more attention was paid to optimizing 16-bit values than to 32-bit values.  SDSCM00042522 will be considered a performance enhancement request.  I cannot estimate when it might be implemented.

  • Archaeologist said:

    The compiler does a much better job optimizing 16-bit values, such as small data model pointers.

    Yes, 16-bit operations are the most efficient, mainly because the instruction set can even operate on 16-bit memory directly, so no register is needed.

    Note that this missed optimization also affects pointers in the large and huge memory models. Basically, anything that does not fit in 16 bits is very likely to be affected by this 'bug'. For 32-bit variables it is worst, because they always live on the stack across calls (no AC register is preserved across a call). Pointers may be kept in registers, so you will only notice this 'bug' once you have 4 or more pointers and they no longer all fit in registers.

    To me (I have no insight into the internal workings of cl55, so I could easily be wrong) it does not seem to be a very difficult optimization. Even the "optimizing" assembler could do it (although IMHO that would be the wrong place).