MSP430FR2422: MSP430-GCC 8.3.1.0 produces poor performance code in interrupt handler

Dave Bender

Intellectual 620 points

Part Number: MSP430FR2422

I have an I2C receive handler in a timing sensitive environment.

The vast majority of my I2C commands are only 2 bytes, so I have optimized performance for this case.

My interrupt handler reads from UCB0IV and dispatches to my RX handler using the typical IV offset trick.

The branch instruction reads the actual function address from RAM and dispatches the RX task to the target function.

In this example, my target functions are i2c_rx_0, i2c_rx_1 and i2c_rx_data.

i2c_rx_0 handles the first byte

i2c_rx_1 handles the second

i2c_rx_data handles the rest with bounds checking, etc but it is slow

I am finding that the code generated by GCC is subpar.

The flags used are

CFLAGS = -mmcu=$(MCU) -std=gnu11 -g -Os \
-Wall -pedantic -Wunused \
-ffunction-sections -fdata-sections -minrt -fomit-frame-pointer -fwrapv -MMD \
$(INCLUDES) $(USER_DEFINES)

I am expecting i2c_rx_0 and i2c_rx_1 to be only 21 clock cycles but I am seeing 29 clock cycles.

In what I post following, I should the culprit line that inflates the cycle count commented in i2c_rx_0 but left in i2c_rx_1

__attribute__ ((interrupt(EUSCI_B0_VECTOR)))
static void  USCIB0_ISR (void){
                __asm__ (
                                "add &0x056e,r0 \n"
                                /* SKIP This */
                                "reti \n"
                                /* ALIFG */
                                "reti \n"
                                /* NACKIFG */
                                "reti \n"

                                ....

                                /* RXIFG0 */
                                "jmp I2C_RX0\n"

                                ...

                                "I2C_RX0: br &APP_IRQ_I2C_RX0\n"
                                ....
        }


__attribute((interrupt))
void i2c_rx_0()
{
        /* Store buffer in 0th position,
         * setup next interrupt call for 1st position. */
        i2c_cmd_buffer[0] = UCB0RXBUF_L;
        APP_IRQ_I2C_RX0 = i2c_rx_1;
        //i2c_data_ptr = i2c_cmd_buffer + 1;
}




__attribute((interrupt))
void i2c_rx_1()
{
        /* Store buffer in 1st position,
         * setup next interrupt call for general data. */
        i2c_cmd_buffer[1] = UCB0RXBUF_L;
        APP_IRQ_I2C_RX0 = i2c_rx_data;
        i2c_data_ptr = i2c_cmd_buffer + 2;
}


Compiler output for i2_rx_0
0xf8d8: 0x42d2 mov.b &0x054c, &0x27b0                   6
0xf8da: 0x054c
0xf8dc: 0x27b0
0xf8de: 0x40b2 mov.w #0xf8e6, &0x27de                   5
0xf8e0: 0xf8e6
0xf8e2: 0x27de
0xf8e4: 0x1300 reti                                     5


Compiler output for i2c_rx_1
0xf8e6: 0x150c pushm.a #1, r12                          3
0xf8e8: 0x403c mov.w #0x27b0, r12                       2
0xf8ea: 0x27b0
0xf8ec: 0x42dc mov.b &0x054c, 0x1(r12)                  6
0xf8ee: 0x054c
0xf8f0: 0x0001
0xf8f2: 0x40b2 mov.w #0xf902, &0x27de                   5
0xf8f4: 0xf902
0xf8f6: 0x27de
0xf8f8: 0x532c add.w #2, r12                            1
0xf8fa: 0x4c82 mov.w r12, &0x27c0                       4
0xf8fc: 0x27c0
0xf8fe: 0x170c popm.a #1, r12                           3
0xf900: 0x1300 reti                                     5

over 5 years ago

0 Xiaodong LI over 5 years ago

TI__Mastermind 22935 points

Hi Dave, compare with machine code of i2_rx_0 and i2c_rx_1, more machine code is needed for interrupt stack handling. Do you think it is normal?

0 Dave Bender over 5 years ago in reply to Xiaodong LI

Intellectual 620 points

No I do not. To get the output I want, I have to use inline assembler

__attribute((interrupt))
void i2c_rx_0()
{
        i2c_cmd_buffer[0] = UCB0RXBUF_L;
        APP_IRQ_I2C_RX0 = i2c_rx_1;
        __asm__ (
                "mov    #0x27b1,        &0x27c0\n"
        );
}

0xf8d8: 0x42d2 mov.b &0x054c, &0x27b0                   6
0xf8da: 0x054c
0xf8dc: 0x27b0
0xf8de: 0x40b2 mov.w #0xf8ec, &0x27de                   5
0xf8e0: 0xf8ec
0xf8e2: 0x27de
0xf8e4: 0x40b2 mov.w #0x27b1, &0x27c0                   5
0xf8e6: 0x27b1
0xf8e8: 0x27c0
0xf8ea: 0x1300 reti                                     5

0 David Schultz over 5 years ago in reply to Dave Bender

Guru 23005 points

The compiler makes no claims about generating perfect code, or the best possible. it will generally be pretty good. As for the inline assembly, you should let the compiler help you out:

asm("mov #i2c_cmd_buffer+2,&i2c_data_ptr");

You should probably also include the extra stuff to tell the compiler about side effects just to be safe. Although with such a short and simple function it probably doesn't matter too much.

0 Jozef_Lawrynowicz over 5 years ago

Expert 2410 points

Confirmed with a small example such as below:

char arr[2];
char a;
char *ptr;

void
foo (void)
{
  arr[1] = a;
  ptr = arr + 2;
}

which produces the following assembly code:

foo:
        MOV.W   #arr, R12
        MOV.B   &a, 1(R12)
        ADD.W   #2, R12
        MOV.W   R12, &ptr
        RET

The problem is that as soon as GCC sees the two uses of "arr", it wants to store it in a register to try reduce the cost of the subsequent instructions using "arr". But it hasn't taken into account the fact that "ptr = arr + 2;" requires two instructions if "arr" is in a register, which negates any possible cost improvement from storing arr in a register to begin with.

I'm working on improving the GCC cost model for MSP430 instructions in the next release, I will keep this as a test case and try and fix it along with that.

**Attention** This is a public forum

MSP low-power microcontrollers

MSP low-power microcontroller forum

MSP430FR2422: MSP430-GCC 8.3.1.0 produces poor performance code in interrupt handler