Hello all:
With optimization on, the C compiler can decide to allocate a register to what was a variable. After it does that, however, if that variable was being used as the index of an indexed store/load, an unnecessary register copy seems to remain. When the variable was 'real', it had to be copied into a register for the indexed load/store, and that copy does not go away once the variable itself lives in a register.
This occurs at all optimization levels.
In practice, this means that code like this:
#include <msp430.h>
#include <stdint.h>
uint8_t buf[16];
struct my_pointers {
    uint8_t rd_ptr;
    uint8_t wr_ptr;
} pointers;
struct my_pointers *p = &pointers;
int main(void) {
    uint8_t tmp;
    tmp = p->wr_ptr;
    buf[tmp++] = P1IN;
    tmp = tmp % 16;
    p->wr_ptr = tmp;
}
is less efficient than this:
#include <msp430.h>
#include <stdint.h>
uint8_t buf[16];
struct my_pointers {
    uint8_t rd_ptr;
    uint8_t wr_ptr;
} pointers;
struct my_pointers *p = &pointers;
uint8_t *const rd_ptr = &(p->rd_ptr);
uint8_t *const wr_ptr = &(p->wr_ptr);
int main(void) {
    uint8_t tmp;
    tmp = *wr_ptr;
    buf[tmp++] = P1IN;
    tmp = tmp % 16;
    *wr_ptr = tmp;
}
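To make clear that the two versions differ only in generated code and not in behavior, here is a minimal host-side sketch of both. It assumes P1IN is replaced by a plain `sample` parameter so it can run on a PC, and the function names `store_via_struct` and `store_via_member_ptr` are hypothetical. Note that the attached test case is a .cpp file: in C++, `&(p->wr_ptr)` is a valid (dynamic) initializer for a global, but in C a file-scope initializer must be a constant expression, so this C sketch takes the member addresses from `pointers` directly. It says nothing about what the TI compiler emits; it only checks that both variants update the buffer and index identically.

```c
#include <stdint.h>

/* Host-side sketch: P1IN is replaced by a 'sample' parameter (assumption). */
uint8_t buf[16];

struct my_pointers {
    uint8_t rd_ptr;
    uint8_t wr_ptr;
} pointers;

struct my_pointers *p = &pointers;

/* In C (unlike C++), file-scope initializers must be constant expressions,
   so the member addresses come from 'pointers' rather than through 'p'. */
uint8_t *const rd_ptr = &pointers.rd_ptr;
uint8_t *const wr_ptr = &pointers.wr_ptr;

/* Variant 1 (hypothetical name): index read through the structure pointer. */
void store_via_struct(uint8_t sample)
{
    uint8_t tmp = p->wr_ptr;
    buf[tmp++] = sample;
    p->wr_ptr = (uint8_t)(tmp % 16); /* wrap the write index at 16 */
}

/* Variant 2 (hypothetical name): index read through a const member pointer. */
void store_via_member_ptr(uint8_t sample)
{
    uint8_t tmp = *wr_ptr;
    buf[tmp++] = sample;
    *wr_ptr = (uint8_t)(tmp % 16); /* wrap the write index at 16 */
}
```

After 20 stores with either function, the write index has wrapped around to 4 and the oldest four slots have been overwritten.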
(The 'tmp' variable is not essential: removing it from this sample code and working directly with p->wr_ptr or *wr_ptr doesn't change anything.)
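For reference, a tmp-free version of the store might look like the following sketch. As above, P1IN is replaced by a `sample` parameter and the function names are hypothetical; the const pointer is initialized from `pointers` so the file is valid C. Both functions index and advance the same `wr_ptr` member, so calls can be interleaved freely. This only demonstrates the behavior on a host; it cannot show whether the extra register copy appears on the MSP430.

```c
#include <stdint.h>

uint8_t buf[16];

struct my_pointers {
    uint8_t rd_ptr;
    uint8_t wr_ptr;
} pointers;

struct my_pointers *p = &pointers;
uint8_t *const wr_ptr = &pointers.wr_ptr; /* constant initializer, valid in C */

/* No 'tmp': index with p->wr_ptr directly, then advance it modulo 16. */
void store_no_tmp_struct(uint8_t sample)
{
    buf[p->wr_ptr] = sample;
    p->wr_ptr = (uint8_t)((p->wr_ptr + 1) % 16);
}

/* Same operation through the const pointer to the member. */
void store_no_tmp_member_ptr(uint8_t sample)
{
    buf[*wr_ptr] = sample;
    *wr_ptr = (uint8_t)((*wr_ptr + 1) % 16);
}
```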
In the first case, the read of p->wr_ptr into the temporary variable is optimized into a register, but the indexed store that follows is still preceded by a register-to-register copy:
MOV.B &pointers+1,r15 ; [] |23|
MOV.B r15,r14 ; [] |23|
MOV.B &PAIN_L+0,buf+0(r14) ; [] |23|
whereas in the second case no access through p is needed, since wr_ptr is already a constant pointer to the member, and the loaded value is used directly as the index. So it generates:
MOV.B &pointers+1,r15 ; [] |37|
MOV.B &PAIN_L+0,buf+0(r15) ; [] |37|
In the assembly file for the optimized build, you can see that an intermediate variable has been optimized away: it says "r15 assigned to $O$C1".
With many structure pointers like this, the extra register copy before every indexed load/store adds up quickly in code that does a lot of them.
I'm attaching a test case (8877.main.cpp) that includes a 'bad style' indexed store and a 'good style' indexed load (and the reverse, commented out) so you can see the difference.