This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

MSP430FR2422: MSP430-GCC 8.3.1.0 produces poor performance code in interrupt handler

Part Number: MSP430FR2422

I have an I2C receive handler in a timing sensitive environment.

The vast majority of my I2C commands are only 2 bytes, so I have optimized performance for this case.

My interrupt handler reads from UCB0IV and dispatches to my RX handler using the typical IV offset trick.

The branch instruction reads the actual function address from RAM and dispatches the RX task to the target function.

In this example, my target functions are i2c_rx_0, i2c_rx_1 and i2c_rx_data.

i2c_rx_0 handles the first byte

i2c_rx_1 handles the second

i2c_rx_data handles the rest with bounds checking, etc but it is slow

I am finding that the code generated by GCC is subpar.

The flags used are 

CFLAGS = -mmcu=$(MCU) -std=gnu11 -g -Os \
-Wall -pedantic -Wunused \
-ffunction-sections -fdata-sections -minrt -fomit-frame-pointer -fwrapv -MMD \
$(INCLUDES) $(USER_DEFINES)

I am expecting i2c_rx_0 and i2c_rx_1 to be only 21 clock cycles but I am seeing 29 clock cycles.

In what I post following, I should the culprit line that inflates the cycle count commented in i2c_rx_0 but left in i2c_rx_1

__attribute__ ((interrupt(EUSCI_B0_VECTOR)))
static void  USCIB0_ISR (void){
                __asm__ (
                                "add &0x056e,r0 \n"
                                /* SKIP This */
                                "reti \n"
                                /* ALIFG */
                                "reti \n"
                                /* NACKIFG */
                                "reti \n"

                                ....

                                /* RXIFG0 */
                                "jmp I2C_RX0\n"

                                ...

                                "I2C_RX0: br &APP_IRQ_I2C_RX0\n"
                                ....
        }


__attribute((interrupt))
void i2c_rx_0()
{
        /* Store buffer in 0th position,
         * setup next interrupt call for 1st position. */
        i2c_cmd_buffer[0] = UCB0RXBUF_L;
        APP_IRQ_I2C_RX0 = i2c_rx_1;
        //i2c_data_ptr = i2c_cmd_buffer + 1;
}




__attribute((interrupt))
void i2c_rx_1()
{
        /* Store buffer in 1st position,
         * setup next interrupt call for general data. */
        i2c_cmd_buffer[1] = UCB0RXBUF_L;
        APP_IRQ_I2C_RX0 = i2c_rx_data;
        i2c_data_ptr = i2c_cmd_buffer + 2;
}


Compiler output for i2_rx_0
0xf8d8: 0x42d2 mov.b &0x054c, &0x27b0                   6
0xf8da: 0x054c
0xf8dc: 0x27b0
0xf8de: 0x40b2 mov.w #0xf8e6, &0x27de                   5
0xf8e0: 0xf8e6
0xf8e2: 0x27de
0xf8e4: 0x1300 reti                                     5


Compiler output for i2c_rx_1
0xf8e6: 0x150c pushm.a #1, r12                          3
0xf8e8: 0x403c mov.w #0x27b0, r12                       2
0xf8ea: 0x27b0
0xf8ec: 0x42dc mov.b &0x054c, 0x1(r12)                  6
0xf8ee: 0x054c
0xf8f0: 0x0001
0xf8f2: 0x40b2 mov.w #0xf902, &0x27de                   5
0xf8f4: 0xf902
0xf8f6: 0x27de
0xf8f8: 0x532c add.w #2, r12                            1
0xf8fa: 0x4c82 mov.w r12, &0x27c0                       4
0xf8fc: 0x27c0
0xf8fe: 0x170c popm.a #1, r12                           3
0xf900: 0x1300 reti                                     5

  • Hi Dave, compare with machine code of i2_rx_0 and i2c_rx_1, more machine code is needed for interrupt stack handling. Do you think it is normal?

  • No I do not. To get the output I want, I have to use inline assembler

    __attribute((interrupt))
    void i2c_rx_0()
    {
            i2c_cmd_buffer[0] = UCB0RXBUF_L;
            APP_IRQ_I2C_RX0 = i2c_rx_1;
            __asm__ (
                    "mov    #0x27b1,        &0x27c0\n"
            );
    }
    
    0xf8d8: 0x42d2 mov.b &0x054c, &0x27b0                   6
    0xf8da: 0x054c
    0xf8dc: 0x27b0
    0xf8de: 0x40b2 mov.w #0xf8ec, &0x27de                   5
    0xf8e0: 0xf8ec
    0xf8e2: 0x27de
    0xf8e4: 0x40b2 mov.w #0x27b1, &0x27c0                   5
    0xf8e6: 0x27b1
    0xf8e8: 0x27c0
    0xf8ea: 0x1300 reti                                     5
    

  • The compiler makes no claims about generating perfect code, or the best possible. it will generally be pretty good. As for the inline assembly, you should let the compiler help you out:

    asm("mov #i2c_cmd_buffer+2,&i2c_data_ptr");

    You should probably also include the extra stuff to tell the compiler about side effects just to be safe. Although with such a short and simple function it probably doesn't matter too much.

  • Confirmed with a small example such as below:

    char arr[2];
    char a;
    char *ptr;
    
    void
    foo (void)
    {
      arr[1] = a;
      ptr = arr + 2;
    }

    which produces the following assembly code:

    foo:
            MOV.W   #arr, R12
            MOV.B   &a, 1(R12)
            ADD.W   #2, R12
            MOV.W   R12, &ptr
            RET

    The problem is that as soon as GCC sees the two uses of "arr", it wants to store it in a register to try reduce the cost of the subsequent instructions using "arr". But it hasn't taken into account the fact that  "ptr = arr + 2;" requires two instructions if "arr" is in a register, which negates any possible cost improvement from storing arr in a register to begin with.

    I'm working on improving the GCC cost model for MSP430 instructions in the next release, I will keep this as a test case and try and fix it along with that.

**Attention** This is a public forum