This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

Compiler/MSP430-GCC-OPENSOURCE: Encountered interrupt handlers with excessive register use

Part Number: MSP430-GCC-OPENSOURCE

Tool/software: TI C/C++ Compiler

I have an interrupt routine that compares 2 function pointers and seeing unnecessary push/pop operations.

First example, the first pointer, RX_RECEIVE_IRQ, changes as a receive operation progresses.

The value u.rx.last_rx_cb is a function pointer of the same type that records the last observed interrupt handler.

If the interrupt handler differs from what was observed previously, then comms are active.

As can been seen, although u.rx.last_rx_cb is at a known compile time address, GCC copies the u pointer into a register and dereferences with an offset.

Second and more scary, RX_RECEIVE_IRQ is copied into a register despite being tagged volatile!

typedef void (*irq_cb)();

volatile irq_cb RX_RECEIVE_IRQ;

union {
	struct {
               int16_t data0;
int16_t data1; irq_cb last_rx_cb; } rx; } u; __attribute((interrupt)) void timeout(){ /* If the rx timeout has changed since last * time, then no timeout */ if(RX_RECEIVE_IRQ != u.rx.last_rx_cb){ u.rx.last_rx_cb = RX_RECEIVE_IRQ; return; } u.rx.last_rx_cb = RX_RECEIVE_IRQ; ... } 0000cc3e <rx_timeout>: cc3e: 0d 12 push r13 ; cc40: 0c 12 push r12 ; cc42: 1d 42 18 02 mov &0x0218,r13 ;0x0218 cc46: 3c 40 52 03 mov #850, r12 ;#0x0352 cc4a: 82 9d 56 03 cmp r13, &0x0356 ; cc4e: 06 20 jnz $+14 ;abs 0xcc5c cc50: 9c 42 18 02 mov &0x0218,4(r12) ;0x0218 cc54: 04 00 cc56: 3c 41 pop r12 ; cc58: 3d 41 pop r13 ; cc5a: 00 13 reti

The second code example is modified to move last_rx_cb out of the union/struct combo and into its own variable.

In this case, last_rx_cb is not copied into a register.

However, the former example should work without the extra push operation.

__attribute((interrupt))
void timeout(){
	/* If the rx timeout has changed since last
	 * time, then no timeout */
	if(RX_RECEIVE_IRQ != last_rx_cb){
		last_rx_cb = RX_RECEIVE_IRQ;
		return;
	}

	last_rx_cb = RX_RECEIVE_IRQ;
...
}


0000cc3e <rx_timeout>:
    cc3e:	0c 12       	push	r12		;
    cc40:	1c 42 18 02 	mov	&0x0218,r12	;0x0218
    cc44:	1c 92 9e 02 	cmp	&0x029e,r12	;0x029e
    cc48:	05 20       	jnz	$+12     	;abs 0xcc54
    cc4a:	92 42 18 02 	mov	&0x0218,&0x029e	;0x0218
    cc4e:	9e 02 
    cc50:	3c 41       	pop	r12		;
    cc52:	00 13       	reti	

  • I can't replicate that generated code. Perhaps because of  missing code or I am using a different version of the compiler and options. The target makes a difference too.

    The use of an offset when accessing last_rx_cb tells me that there is more to that union/struct than you are showing.

  • I thought I had put in the version number, sorry

    MSP430-GCC 8.3.0.16 64 bit, from the TI website

    CFLAGS = -mmcu=$(MCU) -std=gnu11 -g -Os \
    -Wall -Wunused -ffunction-sections -fdata-sections -minrt -fomit-frame-pointer -fwrapv -MMD \
    $(INCLUDES) $(USER_DEFINES)

    If you don't mind, what compiler/settings did you use?

    And what was your compiler output?

    I will try to put together another example.

  • I don't try to keep up with the latest version so I am behind a bit. gcc reports: "msp430-elf-gcc (Mitto Systems Limited - msp430-gcc 7.3.2.154) 7.3.2"

    I definitely didn't use that set of options. I did try both -Os and -O2 and a CPU and CPUX target. (F149 and fr5969)

    __attribute((interrupt))
    void timeout(){
        4402:	0c 12       	push	r12		;
    
    00004404 <.LCFI0>:
    	/* If the rx timeout has changed since last
    	 * time, then no timeout */
    	if(RX_RECEIVE_IRQ != u.rx.last_rx_cb){
        4404:	1c 42 04 1c 	mov	&0x1c04,r12	;0x1c04
        4408:	82 9c 02 1c 	cmp	r12,	&0x1c02	;
        440c:	05 24       	jz	$+12     	;abs 0x4418
    
    0000440e <.L4>:
    		u.rx.last_rx_cb = RX_RECEIVE_IRQ;
    		return;
    	}
    
    	timer = 1;
    	u.rx.last_rx_cb = RX_RECEIVE_IRQ;
        440e:	92 42 04 1c 	mov	&0x1c04,&0x1c02	;0x1c04
        4412:	02 1c 
    
    00004414 <.Loc.27.1>:
    
    }
        4414:	3c 41       	pop	r12		;
        4416:	00 13       	reti

    The only difference I noticed with the CPUX target was the use of pushm/popm if there were multiple registers to move.

  • David,

      I was able to replicate your result when last_rx_cb was the only member in the union.

      I have edited the above code to include 2 16 bit integer fields that I had omitted accidently.

      Please try adding those, thank you

    -Dave

  • OK, I am now seeing the same thing.

    I see no problem with copying the value into a register just before the comparison. There would be a problem if it used that value when it updated last_rx_cb later. It doesn't.

    Your funky union (only one member?) is likely the source of the trouble. If you absolutely must have faster code, then do it in assembly. I doubt if the code generator will improve. I am still waiting for the bug where an ISR pushes ALL registers if it calls another function to be fixed. Even the ones it doesn't use and the called function is required to preserve.

  • Dave,

      That's not the entirety of the code, only enough to replicate the issue.

      The other part of the union is the transmit portion, which I have omitted.

      While the generated assembler is correct is not efficient and I have a very time demanding FSK decoding process.

      "Traditional" advice is to let the compiler do all the optimizing.

      The compiler should NOT be emitting offset calculations when the end result is already known.

      I would call this a bug when the expected behaviour is not questionable.

      I believe the bug to which you are referring, MSPGCC-39, should be resolved in 8_3_0_0

  • Ok, first off let me say that I agree with you about the union offset issue. This should be optimized into a single access. The problem comes from the fact that the address of "u" is a link time constant, and MSP430-GCC does not currently have a way to emit a single instruction which corresponds to "&u+4".

    The assembly code emitted by GCC is:

    MOV.W #u, R12                  
    MOV.W &RX_RECEIVE_IRQ, 4(R12)  

    The TI compiler emits

    MOV.W    &RX_RECEIVE_IRQ+0,&u+4

    So we will look to extend GCC so it will handle these offsets of constant addresses. I'll post again in this thread once I've filed a ticket to track it and made it public.

    Now onto the volatile issue, I'm just going to refer to your second example without the union.

    I get different generated assembly code to you. Mine doesn't have the comparison (I guess it is implied there is extra code before the end of the function):

    void timeout2(){                                                  
        if(RX_RECEIVE_IRQ != last_rx_cb){            
            last_rx_cb = RX_RECEIVE_IRQ;  
    return; } last_rx_cb = RX_RECEIVE_IRQ; }
    msp430-elf-gcc -mmcu=msp430f149 -std=gnu11 -Os -Wall -Wunused -ffunction-sections -fdata-sections -minrt -fomit-frame-pointer -fwrapv -I/home/jozef/msp430/msp430-gcc-support-files/include -L/home/jozef/msp430/msp430-gcc-support-files/include  tester.c -S
    timeout2:
            ; start of prologue
            PUSH    R12
            ; end of prologue
            MOV.W   &RX_RECEIVE_IRQ, R12
            MOV.W   &RX_RECEIVE_IRQ, &last_rx_cb
            ; start of epilogue
            POP     R12
            RETI

    I don't see any problem with the generated code w.r.t to volatile accesses. We know that no matter what the result of "RX_RECEIVE_IRQ != last_rx_cb", "last_rx_cb = RX_RECEIVE_IRQ; " will be executed then the function will exit. So the first read of the volatile variable must be preserved, and then we must also of course preserve the second read.

    So GCC can see that the result of the if() comparison is inconsequential for the execution of your function, so it optimizes it out. I suspect that the reason R12 is still considered used and we have the unnecessary PUSH and POP is because GCC won't remove the volatile access "MOV.W   &RX_RECEIVE_IRQ, R12" that is left over from the comparison. 

    Note that for example if "return;" statement is removed, the CMP instruction is put back in. That is because now we could have either 1 or 2 volatile reads, and both must be preserved.

    timeout2:
            ; start of prologue
            PUSH    R12
            ; end of prologue
            MOV.W   &RX_RECEIVE_IRQ, R12
            CMP.W   &last_rx_cb, R12 { JEQ  .L2
            MOV.W   &RX_RECEIVE_IRQ, R12
    .L2:
            MOV.W   &RX_RECEIVE_IRQ, &last_rx_cb
            ; start of epilogue
            POP     R12
            RETI
    

    I think your initial question was about the unnecessary use of R12 in the above code? This could be a missed optimization caused by GCC having to to treat RX_RECEIVE_IRQ as volatile. If it's not volatile, it is more efficient to load into a register then perform the rest of the operations on it. But since we always have to read it from memory, the "CMP" should be directly between "RX_RECEIVE_IRQ" and "last_rx_cb". I'll look into this as well.

    Final note, as David mentioned, it is fine for "last_rx_cb" to be loaded into a register immediately before the comparison. We have assurances within GCC that no instruction will be inserted between the MOV and CMP instructions that could affect the volatile variable.

  • David Schultz36 said:
    I doubt if the code generator will improve. I am still waiting for the bug where an ISR pushes ALL registers if it calls another function to be fixed. Even the ones it doesn't use and the called function is required to preserve.

    This is fixed in the recently released 8.3.0.16 http://software-dl.ti.com/msp430/msp430_public_sw/mcu/msp430/MSPGCC/latest/index_FDS.html

    There were code generation improvements in 8.2.0.52, particularly in the large memory model, code size was greatly reduced.

    In general we focused on code size improvements in the libraries for the initial 8.2.0.52 release, since there were a lot of issues there and the code size impact of fixing those is larger than individual tweaks to code gen.

    Also bear in mind there is a period of a couple of months (in the best case) between reporting an issue and a release containing a fix for that issue being available.

    We have further code generation improvements in the next release which will be out in a couple of months, and will continue to improve it after that. Threads like this are very helpful for prioritizing individual cases that users are spotting.

  • Josef, as always thank you for the detailed answer.

    The CMP instruction referencing the register rather than the memory address is what was most scary to me.

    Since these issues have your attention I'll consider this resolved.

    I would explore the use of the TI compiler, but the lack the "labels as values" feature is a no go for me.

**Attention** This is a public forum