MSPM0G3507: "Hard Fault" when running images built with the GNU ARM toolchain

Part Number: MSPM0G3507

Hi TI Experts!

I'm seeing odd behavior when running images built with the GNU ARM toolchain on the MSPM0G3507. So far I've identified two spots where "hard faults" occur, but they occur only when both of the following hold:

1. Running "release builds", i.e. compiled with -Os.
2. No debugger is connected.

These conditions make it a little difficult to pinpoint what exactly triggers such a hard fault, but I think I was able to do that for one such occurrence:

In a release build with GNU ARM toolchain arm-gnu-toolchain-14.2.rel1-mingw-w64-i686-arm-none-eabi, the hard fault seems to occur when the routine DL_SYSCTL_configSYSPLL() of TI SDK version 2_03_00_07 writes to SYSCTL->SOCLOCK.SYSPLLPARAM0:

void DL_SYSCTL_configSYSPLL(DL_SYSCTL_SYSPLLConfig *config)
{
    /* PLL configurations are retained in lower reset levels. Set default
     * behavior of disabling the PLL to keep a consistent behavior regardless
     * of reset level. */
    DL_SYSCTL_disableSYSPLL();
    /* Check that SYSPLL is disabled before configuration */
    while ((DL_SYSCTL_getClockStatus() & (DL_SYSCTL_CLK_STATUS_SYSPLL_OFF)) !=
           (DL_SYSCTL_CLK_STATUS_SYSPLL_OFF)) {
        ;
    }
    // set SYSPLL reference clock
    DL_Common_updateReg(&SYSCTL->SOCLOCK.SYSPLLCFG0,
        ((uint32_t) config->sysPLLRef), SYSCTL_SYSPLLCFG0_SYSPLLREF_MASK);
    // set predivider PDIV (divides reference clock)
    DL_Common_updateReg(&SYSCTL->SOCLOCK.SYSPLLCFG1, ((uint32_t) config->pDiv),
        SYSCTL_SYSPLLCFG1_PDIV_MASK);
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

To repeat:
The hard fault does not occur when:

1. The code was compiled with -O0. This is also true when I compile only this specific routine with -O0 (using __attribute__((optimize("O0")))).
2. I'm running the code with a debugger connected.
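The per-function override mentioned in item 1 can be sketched as follows. This is only an illustration: the macro `NO_OPT` and the function `copy_tuning_words` are hypothetical names, not part of the TI SDK. Note that the GCC documentation describes the `optimize` attribute as a debugging aid rather than something for production code, and that Clang-based compilers do not implement it (they offer `optnone` instead), which is why the sketch selects the attribute per compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper macro: pick the per-function "do not optimize"
 * attribute that the compiler in use understands. */
#if defined(__clang__)
#define NO_OPT __attribute__((optnone))
#elif defined(__GNUC__)
#define NO_OPT __attribute__((optimize("O0")))
#else
#define NO_OPT
#endif

/* Hypothetical stand-in for a routine that copies two tuning words into
 * two consecutive registers. Only this function is built without
 * optimization; the rest of the file keeps the global -Os setting. */
NO_OPT void copy_tuning_words(volatile uint32_t *dst,
                              const volatile uint32_t *src)
{
    assert(dst != NULL && src != NULL);
    dst[0] = src[0];
    dst[1] = src[1];
}
```

Because the attribute changes instruction selection and code layout for the whole function, a fault that disappears with it does not by itself show which of the two changed.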

As such, any programming issues (i.e. the argument config holding some faulty data) can be ruled out from my perspective.

The other occurrence seems to appear with GNU ARM toolchain v9.2.1 only and happens in the FreeRTOS routine vPortSuppressTicksAndSleep() (this requires #define configUSE_TICKLESS_IDLE 1). I'm not sure exactly where in this routine the hard fault occurs in this case.

What makes this occurrence even weirder is the fact that it goes away when I add an arbitrary interrupt handler to my code instead of letting the "Default_Handler" manage it ... even though this interrupt is neither used nor enabled in the firmware!

Sounds crazy, right? However:

I can reproduce this behavior consistently. Likewise, I can consistently get rid of these issues when I apply any of the "work arounds" mentioned above.

Do you have any explanation for this behavior?

Thanks,
Chris.

  • Hi Chris,

    Looks like this is a compiler issue.

    1. The code was compiled with -O0. This is also true when I compile only this specific routine with -O0 (using __attribute__((optimize("O0")))).
    2. I'm running the code with a debugger connected.

    Is this reproduced in the SDK example project?

    I can then report it to the TI team for comments, although the fix may well depend on the ARM toolchain team.

    I haven't installed this GNU compiler, so what I suggest is below:

    Can you add some code alignment inside this API function and see whether the hard fault still occurs? Something like below:

    __NOP();
    // save CPUSS CTL state and disable the cache
    uint32_t ctlTemp = DL_CORE_getInstructionConfig();
    DL_CORE_configInstruction(DL_CORE_PREFETCH_ENABLED, DL_CORE_CACHE_DISABLED,
        DL_CORE_LITERAL_CACHE_ENABLED);
    __NOP();
    // populate SYSPLLPARAM0/1 tuning registers from flash, based on input freq
    SYSCTL->SOCLOCK.SYSPLLPARAM0 =
        *(volatile uint32_t *) ((uint32_t) config->inputFreq);
    SYSCTL->SOCLOCK.SYSPLLPARAM1 =
        *(volatile uint32_t *) ((uint32_t) config->inputFreq + (uint32_t) 0x4);
    __NOP();
    // restore CPUSS CTL state
    CPUSS->CTL = ctlTemp;
    __NOP();
    // set feedback divider QDIV (multiplies to give output frequency)
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

    Maybe you can use TI ARM CLANG compiler for the development, and I believe it will have no issues:

    https://www.ti.com/tool/ARM-CGT?keyMatch=TI%20CLANG&tisearch=universal_search 

    B.R.

    Sal

  • Hi Sal,

    Is this reproduced in the SDK example project?

    No. I'm seeing this issue in the firmware we have written for our custom HW design. I did go to lengths to confirm what I wrote above already and don't have the time to try to replicate it using one of the SDK examples (which probably wouldn't run on our custom HW w/o considerable changes anyway).

    Can you add some code alignment inside this API function and see whether the hard fault still occurs?

    I tried that and the issue goes away. However:
    I have listed ways to work around this issue above already. As such, I know how to work around this specific instance or manifestation of this issue.

    The more interesting questions are:

    1. Why does it occur in the first place?
    2. Are there any other places in your SDK or elsewhere (like in the FreeRTOS code mentioned above) where it might appear as well?

    From my perspective, this indicates some issue in the MSPM0* MCU itself!

    What the two manifestations of this issue seem to have in common is that the core clock or something closely related to it changes.

    The first manifestation changes the SYSPLL configuration (which should have no direct and/or immediate impact on the core clock, but still) whereas the second manifestation in FreeRTOS puts the MCU to sleep by executing the assembler instruction wfi. As I said above, I was not (yet) able to pinpoint the specific instruction that triggers the hard fault, but it is somewhere around the execution of the assembler instruction wfi in the FreeRTOS routine vPortSuppressTicksAndSleep().

    The way in which one can work around this issue indicates that a specific sequence of (assembler) operations in combination with altering the core clock (or something closely related to it) triggers such a hard fault.

    This is what concerns me the most. 

    Maybe you can use TI ARM CLANG compiler for the development, and I believe it will have no issues

    I did see the "FreeRTOS issue" (see above) with the TI ARM CLANG compiler once as well, but am no longer able to reproduce it. Regardless:
    We are using the GNU ARM toolchain for lots of other products and don't necessarily want to deviate from that.

    Also:
    I have analyzed the assembler code that triggers this hard fault and don't see anything wrong with it. I've attached the file for your reference. It contains:

    1. The values of the lr and pc registers reported in the hard fault.
    2. A snapshot of the C routine DL_SYSCTL_configSYSPLL().
    3. The corresponding assembler code with my personal comments and a somewhat accurate mapping of this code to the C code.

    The hard fault occurs at address 0x00001342.

    DL_SYSCTL_configSYSPLL_gcc_release.txt
    lr = 0x0000BEDF
    pc = 0x00001342
    void DL_SYSCTL_configSYSPLL(DL_SYSCTL_SYSPLLConfig *config)
    {
        /* PLL configurations are retained in lower reset levels. Set default
         * behavior of disabling the PLL to keep a consistent behavior regardless
         * of reset level. */
        DL_SYSCTL_disableSYSPLL();
        /* Check that SYSPLL is disabled before configuration */
        while ((DL_SYSCTL_getClockStatus() & (DL_SYSCTL_CLK_STATUS_SYSPLL_OFF)) !=
               (DL_SYSCTL_CLK_STATUS_SYSPLL_OFF)) {
            ;
        }
        // set SYSPLL reference clock
        DL_Common_updateReg(&SYSCTL->SOCLOCK.SYSPLLCFG0,
            ((uint32_t) config->sysPLLRef), SYSCTL_SYSPLLCFG0_SYSPLLREF_MASK);
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
     

    Thanks,
    Chris.

  • Hi Chris,

    As implemented in the SDK, the routine disables the cache and then populates the SYSPLLPARAM0/1 tuning registers from flash. So, if the compiler messes up this sequence during optimization, it will result in such issues. [That is why I added code alignment here for the test.]

    Although it shows the instruction finished, can you take a look at CPUSS->CTL and see whether it has been updated after this assembly code has executed?

    B.R.

    Sal

  • Hi Sal,

    So, if the compiler messes up this sequence during optimization

    I think it's obvious that the compiler didn't "mess up" anything. The (optimized) assembly code generated by TI ARM CLANG for this sequence of operations isn't all that much different:

    ldr  r4, .LCPI0_2      @ Address of CPUSS->CTL into r4
    ldr  r5, [r4]
    movs r6, #7
    ands r6, r5            @ uint32_t ctlTemp = DL_CORE_getInstructionConfig(); -> r6
    movs r5, #5
    str  r5, [r4]          @ DL_CORE_configInstruction(DL_CORE_PREFETCH_ENABLED, DL_CORE_CACHE_DISABLED, DL_CORE_LITERAL_CACHE_ENABLED);
    ldr  r5, [r0, #36]
    ldr  r7, [r5]
    str  r7, [r2, #32]     @ SYSCTL->SOCLOCK.SYSPLLPARAM0 = *(volatile uint32_t *) ((uint32_t) config->inputFreq);
    ldr  r5, [r5, #4]
    str  r5, [r2, #36]     @ SYSCTL->SOCLOCK.SYSPLLPARAM1 = *(volatile uint32_t *) ((uint32_t) config->inputFreq + (uint32_t) 0x4);
    str  r6, [r4]          @ CPUSS->CTL = ctlTemp;
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

    Anyway:

    can you take a look at CPUSS->CTL, and see whether it is updated after this assembly code executed.

    That's a little hard to do given the difficulties I have debugging this issue, i.e. it does not occur when a debugger is connected, and I'm not sure how else I could determine the actual value of this register at such an early stage in the init process. Any suggestions are appreciated.

    However:
    If I understood you correctly, you are saying that disabling the instruction cache might get delayed because it is still enabled at this point in time?
    If so, what exactly would be the consequence that leads to the hard fault eventually?

    Furthermore, it looks to me as if it would be the responsibility of the code (i.e. the TI SDK code) to make sure that something like that is avoided and I'm not sure if the __NOP() instructions suggested above are a reliable way to do that.

    In the FreeRTOS code I pointed out above, I see dsb and isb instructions before and after the CPU is put to sleep, but a similar issue still occurs even though only with GNU ARM toolchain v9.2.1.

    In other words:
    What is the recommended approach proposed by TI?

  • Hi Chris,

    Oh, I forgot that this does not trigger in debug mode. That makes it hard to observe.

    By the way, how did you find out that the instruction below triggers the hard fault without being in debug mode?

    "PC reported in hard fault, write r5 to r3+r7   SYSCTL->SOCLOCK.SYSPLLPARAM0 = "

    If I understood you correctly, you are saying that disabling the instruction cache might get delayed because it is still enabled at this point in time?
    If so, what exactly would be the consequence that leads to the hard fault eventually?

    That's my suspicion. One piece of information that would help here: what is the MCLK frequency, and what are the flash wait states?

    Anyway, let me forward your findings to our tool team to see whether they have any suggestions.

    B.R.

    Sal

  • By the way, how did you find out that the instruction below triggers the hard fault without being in debug mode?

    I realized at some point that I can connect a debugger after the hard fault occurred and collect the corresponding exception stack frame. That gave me the lr and pc values. That then allowed for locating the corresponding code using the assembly output of the compiler as well as the disassembly output of the debugger.
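The frame inspection described above can be sketched in C. On exception entry a Cortex-M core pushes eight words (r0-r3, r12, lr, pc, xPSR) onto the active stack; the sketch below decodes them from a stack pointer value. The type and function names are hypothetical, and how the stack pointer is obtained (for example, read through a debugger attached after the fault) is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The eight words a Cortex-M core stacks on exception entry, in order of
 * increasing address. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12;
    uint32_t lr;   /* LR value at the time the exception was taken */
    uint32_t pc;   /* return address stacked by the core */
    uint32_t xpsr;
} ExceptionFrame;

/* Hypothetical helper: copy the stacked frame out of memory. `sp` is the
 * stack pointer that was active when the fault was taken. */
ExceptionFrame read_exception_frame(const uint32_t *sp)
{
    assert(sp != NULL);
    ExceptionFrame f = { sp[0], sp[1], sp[2], sp[3],
                         sp[4], sp[5], sp[6], sp[7] };
    return f;
}

/* Code addresses held in LR carry the Thumb state in bit 0. Clearing it
 * gives the address to look up in a disassembly listing. */
uint32_t thumb_code_address(uint32_t value)
{
    return value & ~UINT32_C(1);
}
```

With the values reported earlier in this thread, an lr of 0x0000BEDF corresponds to the code address 0x0000BEDE, while the pc of 0x00001342 can be looked up as-is.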

    What is the MCLK frequency, and what are the flash wait states?

    At this point in time, the MCU core is still clocked from the internal SYSOSC. Switching to HFCLK and the SYSPLL happens later.

    The flash wait states have been reconfigured to DL_SYSCTL_FLASH_WAIT_STATE_2 already, though, because we are running with MCLK and CPUCLK at 80 MHz, eventually. The complete system init routine follows:

    /*!
     * @brief Initializes low-level MCU parameters and peripherals.
     *
     * This function configures system PLL and derived clocks, the flash wait states and the BOR threshold.
     */
    static void sysctlInit(void)
    {
        // Set the brownout threshold to the highest available level.
        // This is done to avoid that the application starts running too early.
        // TODO: Why does this happen here while DL_SYSCTL_activateBORThreshold() is invoked at the end?
        DL_SYSCTL_setBORThreshold(DL_SYSCTL_BOR_THRESHOLD_LEVEL_3);
        // 2 flash wait states are required when running at MCLK and CPUCLK of 80 MHz.
        // 1 flash wait state could be used up to 48 MHz.
        // 0 flash wait states could be used up to 24 MHz.
        DL_SYSCTL_setFlashWaitState(DL_SYSCTL_FLASH_WAIT_STATE_2);
        // Configure the HFCLK source as HFXT oscillator (i.e. the 16 MHz oscillator on the THB).
        // The second argument specifies a "startup time" in steps of ~64us.
        // The third argument enables a "monitor" that makes sure that the HFXT oscillator is actually up and running.
        // NOTE: This routine would hang in an endless loop if the latter is not true!
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
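The wait-state thresholds quoted in the comments of sysctlInit() can be expressed as a small lookup. This is a sketch under the assumption that those comment values (24, 48 and 80 MHz) are the applicable limits; the function name is hypothetical and not an SDK API.

```c
#include <assert.h>
#include <stdint.h>

/* Thresholds taken from the comments in sysctlInit() in this thread. */
#define MCLK_MAX_0_WAIT_HZ UINT32_C(24000000)
#define MCLK_MAX_1_WAIT_HZ UINT32_C(48000000)
#define MCLK_MAX_2_WAIT_HZ UINT32_C(80000000)

/* Hypothetical helper: minimum number of flash wait states for a given
 * MCLK frequency, according to the thresholds above. */
unsigned flash_wait_states_for(uint32_t mclk_hz)
{
    assert(mclk_hz <= MCLK_MAX_2_WAIT_HZ);
    if (mclk_hz <= MCLK_MAX_0_WAIT_HZ) {
        return 0u;
    }
    if (mclk_hz <= MCLK_MAX_1_WAIT_HZ) {
        return 1u;
    }
    return 2u;
}
```

By these thresholds, a core still running from a 32 MHz SYSOSC would need only one wait state, so configuring two before the switch to 80 MHz errs on the conservative side.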

    Prior to that, only the following happens:

    void hal_systemInit(void)
    {
        DL_GPIO_reset(GPIOA);
        DL_GPIO_reset(GPIOB);
        DL_UART_Main_reset(UART_1_INST);
        DL_GPIO_enablePower(GPIOA);
        DL_GPIO_enablePower(GPIOB);
        DL_UART_Main_enablePower(UART_1_INST);
        // NOTE & TODO:
        // This is before reconfiguring the clocks. As such, we're presumably running at 32 (instead of 80) MHz here.
        // The #defines POWER_STARTUP_DELAY and VREF_READY_DELAY have been produced by TI's IDE.
        // I'm not sure which CPU clock speed it uses to calculate these numbers.
        // To be clarified and verified.
        delay_cycles(POWER_STARTUP_DELAY);
        sysctlInit();
        [...]
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
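The open question in the comment above (which CPU clock a cycle-count delay was computed for) comes down to simple arithmetic, sketched below. The helper name is hypothetical; the actual values of POWER_STARTUP_DELAY and VREF_READY_DELAY are not given in this thread and are not assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of CPU cycles needed to wait `delay_us`
 * microseconds at a core clock of `cpu_hz`. */
uint32_t cycles_for_delay_us(uint32_t delay_us, uint32_t cpu_hz)
{
    /* Use 64-bit arithmetic so the intermediate product cannot overflow. */
    uint64_t cycles = ((uint64_t)delay_us * cpu_hz) / UINT64_C(1000000);
    assert(cycles <= UINT32_MAX);
    return (uint32_t)cycles;
}
```

A 10 us delay is 320 cycles at 32 MHz but 800 cycles at 80 MHz, so a constant computed for 80 MHz simply waits 2.5 times longer when executed at 32 MHz, whereas one computed for 32 MHz would be too short at 80 MHz.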

    Anyway, let me forward your findings to our tool team to see any suggestion here.

    Thx!

  • Hi Chris,

    Thanks for the information. Let me check with team.

    B.R

    Sal

  • Hi Chris,

    Would it be possible for you to perform a quick test?

    It would involve disabling the cache on the device at the very start of sysctlInit() and polling until the cache has been successfully disabled. Only then proceed with the sysctl configuration. At the very end you can then re-enable the cache.

    I'm wondering whether this could resolve your issue, and possibly point to some APIs either missing the cache-disable steps mentioned by Sal or not being robust enough against optimization.

  • Hi Henry,

    Would it be possible for you to perform a quick test?

    Sure. However:

    poll until the cache has been successfully disabled.

    How do I do that?

  • Thanks Henry for the comments.

    Hi Chris,

    By the way, please see disclosure of the cache issue in ERRATA: https://www.ti.com/lit/er/slaz742b/slaz742b.pdf 

    I think this is the root cause that makes it enter the hard fault.

    B.R.

    Sal

  • @Sal:

    I think this is the root cause that makes it enter the hard fault.

    Sounds like it, but the workaround is present in the SDK code. It "just" doesn't work as expected w/ the Arm GNU toolchain.

    @Henry:

    I'm still waiting for feedback wrt "poll until the cache has been successfully disabled.".
    I don't know how I can verify that.

    Regards,
    Chris.

  • Hi Chris,

    I am also waiting for Henry's update.

    In the meantime, maybe you can try the following:

    // save CPUSS CTL state and disable the cache
    uint32_t ctlTemp = DL_CORE_getInstructionConfig();
    DL_CORE_configInstruction(DL_CORE_PREFETCH_ENABLED, DL_CORE_CACHE_DISABLED,
        DL_CORE_LITERAL_CACHE_ENABLED);
    while ((CPUSS->CTL & CPUSS_CTL_ICACHE_MASK) !=
           DL_CORE_CACHE_DISABLED) {
        ;
    }
    ...
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

    I think this should work.
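A variant of the polling loop with an upper bound on the number of polls, so that early init code cannot hang if the bit never reads back as cleared, might look like the sketch below. It operates on a pointer so it can be exercised against an ordinary variable; on the target one would presumably pass `&CPUSS->CTL`. The function name is hypothetical, and the bit position is an assumption inferred from the disassembly earlier in this thread (mask 7, value 5 written for "cache disabled"). Whether the read-back reflects the actual cache state is not established here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: ICACHE enable is bit 1 of CPUSS->CTL, inferred from the
 * disassembly in this thread, not taken from the TRM. */
#define CTL_ICACHE_BIT UINT32_C(0x2)

/* Hypothetical helper: clear the ICACHE bit in `*ctl`, then poll until it
 * reads back as cleared or `max_polls` reads have been made.
 * Returns true if the bit was observed cleared. */
bool disable_icache_and_wait(volatile uint32_t *ctl, uint32_t max_polls)
{
    assert(ctl != NULL);
    *ctl = *ctl & ~CTL_ICACHE_BIT;
    while (max_polls-- > 0u) {
        if ((*ctl & CTL_ICACHE_BIT) == 0u) {
            return true;
        }
    }
    return false;
}
```

The caller decides what to do on a `false` return, for instance falling back to a safe clock configuration instead of spinning forever.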

    By the way, can you share a minimal example project built with the ARM GNU compiler that reproduces your issue? I think that would make it more efficient to dig further into this case.

    B.R.

    Sal

  • Hi All,

    Apologies for the delayed response from my side.

    As for what I was suggesting: you can take what Sal recommended at the start and put it at the start of your function.

    Then, at the end of the function, re-apply the saved state and re-enable the CPUSS.

    Thanks

  • It would involve disabling the cache on the device at the very start of sysctlInit() and polling until the cache has been successfully disabled. Only then proceed with the sysctl configuration. At the very end you can then re-enable the cache.

    Yeah ... that fixes the issue. Which doesn't come as much of a surprise since inserting the __NOP() instructions (as suggested by Sal) fixed the issue as well. The issue also goes away when I disable/re-enable the cache only immediately before/after invoking DL_SYSCTL_configSYSPLL().
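The narrower workaround (cache disabled only around one call) can be sketched as a wrapper that saves the control register, clears the cache bit, runs a callback, and restores the saved value. All names are hypothetical and the bit position is an assumption inferred from the disassembly in this thread. On the target, the callback would presumably invoke DL_SYSCTL_configSYSPLL() and `ctl` would be `&CPUSS->CTL`; the `probe_ctl` callback exists only to illustrate what the wrapped code sees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: ICACHE enable is bit 1 of CPUSS->CTL (inferred, not from the TRM). */
#define CTL_ICACHE_BIT UINT32_C(0x2)

typedef void (*ClockConfigFn)(void *arg);

/* Hypothetical wrapper: run `fn(arg)` while the ICACHE bit of `*ctl` is
 * cleared, then restore the previous register value. */
void run_with_icache_disabled(volatile uint32_t *ctl, ClockConfigFn fn, void *arg)
{
    assert(ctl != NULL && fn != NULL);
    uint32_t saved = *ctl;
    *ctl = saved & ~CTL_ICACHE_BIT;
    while ((*ctl & CTL_ICACHE_BIT) != 0u) {
        ; /* wait until the read-back shows the bit cleared */
    }
    fn(arg);
    *ctl = saved;
}

/* Illustration-only callback: records the register value it observes. */
typedef struct {
    volatile uint32_t *ctl;
    uint32_t seen;
} Probe;

void probe_ctl(void *arg)
{
    Probe *p = (Probe *)arg;
    p->seen = *p->ctl;
}
```

Keeping the save and restore in one place makes it harder to forget the restore on one code path, which matters if more SDK routines turn out to need the same treatment.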

    I think that confirms the relation to errata issue CPU_ERR_01 pointed out by Sal further above (even though the workaround proposed there apparently doesn't work in all cases).

    I can live with that "more encompassing" workaround assuming there are no other issues related to this errata hidden in the TI SDK.

    Just to satisfy my curiosity wrt "poll until the cache has been successfully disabled":

    The TRM doesn't indicate that reading CPUSS->CTL would return the actual state of the corresponding caches. As such, I would not expect to read back anything different from what was written to it previously. Apparently that's not true. The TRM also doesn't say what happens in detail when a cache or prefetch gets disabled.

    Can you shed some light on this or point me to other documentation that does?