TM4C123GH6PZ: Diagnosing Invalid State Usage Fault - Stack Overflow?

CamK

Part Number: TM4C123GH6PZ

Hello,

I'm running into an issue where, after about an hour of running my program on my TM4C123GH6PZ, the microcontroller ends up in the Fault ISR. I want to walk through the steps I've taken so far and ask for advice or tips on where to go from here.

I attempted to use this document here to debug the fault. From the NVIC_FAULT_STAT register, I've determined that I'm getting an invalid state usage fault. NVIC_FAULT_ADDR reads 0xE000EDF8. According to the TM4C123GH6PZ datasheet, this address falls within the region reserved for the Cortex-M4F peripherals (SysTick, NVIC, MPU, FPU, and SCB). Looking at the Cortex-M4F datasheet, 0xE000EDF8 appears to be the Debug Core Register Data Register. I've hit a dead end here, as I'm not sure whether this is a valid fault address.
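
For reference, NVIC_FAULT_STAT is the ARMv7-M Configurable Fault Status Register (CFSR, 0xE000ED28), and NVIC_MM_ADDR / NVIC_FAULT_ADDR are the MemManage and BusFault address registers (0xE000ED34 / 0xE000ED38). Per the ARMv7-M architecture, those address registers hold a meaningful value only when MMARVALID (bit 7) or BFARVALID (bit 15) is set in the CFSR. A UsageFault such as INVSTATE does not set either bit, so a reading like 0xE000EDF8 is most likely just a stale value. The sketch below decodes a CFSR value captured in the fault handler; the macro and function names are illustrative and are not TivaWare's.

```c
#include <stdint.h>
#include <stdbool.h>

/* Bit positions in the ARMv7-M CFSR (TivaWare: NVIC_FAULT_STAT). */
#define CFSR_IACCVIOL    (1u << 0)   /* MemManage: instruction fetch from XN/no-access region */
#define CFSR_DACCVIOL    (1u << 1)   /* MemManage: data access violation */
#define CFSR_MMARVALID   (1u << 7)   /* MMFAR (NVIC_MM_ADDR) holds a valid address */
#define CFSR_IBUSERR     (1u << 8)   /* BusFault on instruction fetch */
#define CFSR_PRECISERR   (1u << 9)   /* precise data BusFault */
#define CFSR_IMPRECISERR (1u << 10)  /* imprecise data BusFault */
#define CFSR_BFARVALID   (1u << 15)  /* BFAR (NVIC_FAULT_ADDR) holds a valid address */
#define CFSR_UNDEFINSTR  (1u << 16)  /* UsageFault: undefined instruction */
#define CFSR_INVSTATE    (1u << 17)  /* UsageFault: executed with the Thumb bit clear */
#define CFSR_INVPC       (1u << 18)  /* UsageFault: invalid EXC_RETURN */
#define CFSR_NOCP        (1u << 19)  /* UsageFault: coprocessor (e.g. FPU) not enabled */
#define CFSR_UNALIGNED   (1u << 24)
#define CFSR_DIVBYZERO   (1u << 25)

/* Lowest-numbered fault bit set in cfsr (VALID flags excluded), or 0 if none. */
uint32_t cfsr_first_fault(uint32_t cfsr)
{
    uint32_t f = cfsr & ~(CFSR_MMARVALID | CFSR_BFARVALID);
    return f & (0u - f);  /* isolate the lowest set bit */
}

/* Human-readable name for one fault bit. */
const char *cfsr_fault_name(uint32_t bit)
{
    switch (bit) {
    case CFSR_IACCVIOL:    return "IACCVIOL";
    case CFSR_DACCVIOL:    return "DACCVIOL";
    case CFSR_IBUSERR:     return "IBUSERR";
    case CFSR_PRECISERR:   return "PRECISERR";
    case CFSR_IMPRECISERR: return "IMPRECISERR";
    case CFSR_UNDEFINSTR:  return "UNDEFINSTR";
    case CFSR_INVSTATE:    return "INVSTATE";
    case CFSR_INVPC:       return "INVPC";
    case CFSR_NOCP:        return "NOCP";
    case CFSR_UNALIGNED:   return "UNALIGNED";
    case CFSR_DIVBYZERO:   return "DIVBYZERO";
    default:               return "other/none";
    }
}

/* Store the fault address only if the hardware marked it valid.
   Note: IACCVIOL never sets MMARVALID; use the stacked PC instead. */
bool cfsr_fault_address(uint32_t cfsr, uint32_t mmfar, uint32_t bfar, uint32_t *addr)
{
    if (cfsr & CFSR_MMARVALID) { *addr = mmfar; return true; }
    if (cfsr & CFSR_BFARVALID) { *addr = bfar;  return true; }
    return false;
}
```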

It sounds like the most common cause of an invalid state usage fault is stack overflow or stack corruption. I read this document and added a hardware watchpoint to monitor __stack. So far the watchpoint has not triggered, but I still end up in FaultISR. I had previously read that one way to debug a stack overflow is to fill the entire stack with a known value and check whether and when those values get overwritten. I read that this can be done in the linker file, but I'm not sure how.
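
The same "paint the stack" check can also be done from C rather than the linker file. A minimal sketch, assuming the stack region is [__stack, __stack + size) as in the .cmd file, and that it grows downward so untouched words accumulate at the low end. Paint early in startup, covering only the part below the current SP so the live frames are not overwritten; later, count how many words at the bottom still hold the pattern. On the target, lo would be (uint32_t *)&__stack; the pattern value and function names are arbitrary choices for illustration.

```c
#include <stdint.h>
#include <stddef.h>

#define STACK_PAINT 0xDEADBEEFu  /* arbitrary marker; any unlikely value works */

/* Fill words [lo, hi) with the marker. On target, hi must be below the
   current stack pointer so that active frames are left alone. */
void stack_paint(uint32_t *lo, uint32_t *hi)
{
    while (lo < hi) {
        *lo++ = STACK_PAINT;
    }
}

/* The stack grows down from the top, so the untouched region is at the
   bottom. Count intact marker words from lo upward: that is the minimum
   headroom seen so far. Zero means the stack reached (or passed) its limit. */
size_t stack_unused_bytes(const uint32_t *lo, const uint32_t *hi)
{
    const uint32_t *p = lo;
    while (p < hi && *p == STACK_PAINT) {
        p++;
    }
    return (size_t)(p - lo) * sizeof(uint32_t);
}
```

Checking stack_unused_bytes() periodically (or in the debugger after a long run) shows how close the deepest call chain came to __stack, which separates "ran out of stack" from "something overwrote the stack".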

I also noticed that inside my linker file (.cmd), there is the statement

__STACK_TOP = __stack + 1900;

This is carried over from an old project. My stack size is actually set to 2600. What is the effect of the above statement? From a previous thread on the forums, I understand that __STACK_TOP and __STACK_END mark the starting (highest) address of the stack, and as the stack fills, the stack pointer decreases from there toward __stack. By setting __STACK_TOP = __stack + 1900, am I inadvertently limiting my stack to 1900 bytes? My map file shows a stack size of 0xA28 (2600), but it seems like I'm offsetting the stack top at the start of the program.
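
The arithmetic behind this question can be checked directly. With the stack section at [__stack, __stack + 2600) and the initial SP at __stack + 1900, the SP starts 700 bytes below the end of the section, so only 1900 bytes are usable and the top 700 bytes are never touched. Assuming __stack itself is 8-byte aligned, __stack + 1900 is also not 8-byte aligned (1900 mod 8 = 4), whereas the AAPCS expects an 8-byte-aligned SP at function boundaries; 2600 is a multiple of 8. The helper below is hypothetical and only illustrates these checks.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t usable_bytes;  /* bytes the SP can descend before reaching __stack */
    uint32_t unused_bytes;  /* bytes of the section above the initial SP */
    bool     sp_aligned8;   /* AAPCS expects an 8-byte-aligned SP */
    bool     sp_in_section; /* initial SP must not lie past the section end */
} stack_layout_t;

/* base = address of __stack, size = stack section size,
   top_offset = value added to __stack to form __STACK_TOP. */
stack_layout_t stack_layout_check(uint32_t base, uint32_t size, uint32_t top_offset)
{
    stack_layout_t r;
    uint32_t sp = base + top_offset;  /* initial SP loaded from vector 0 */

    r.usable_bytes  = top_offset;
    r.unused_bytes  = (top_offset < size) ? size - top_offset : 0u;
    r.sp_aligned8   = (sp & 7u) == 0u;
    r.sp_in_section = top_offset <= size;
    return r;
}
```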

Any help, tips, or thoughts would be greatly appreciated! This is my first time really digging into stack usage, so please go easy on me if I'm missing something obvious!

Regards,

CamK

over 8 years ago

Charles Tsai over 8 years ago

TI__Guru**** 184346 points

Hi,

For an INVSTATE fault, the most likely cause is stack overflow or stack corruption. __STACK_TOP is the initial stack pointer value, loaded from vector table entry 0 (address 0x0), as in (void (*)(void))((uint32_t)&__STACK_TOP). Please change 1900 to 2600 to match the stack size you defined in CCS. What happens when you change it to __STACK_TOP = __stack + 2600?

Other possible causes of an INVSTATE fault are loading a branch target address with LSB = 0 into the PC, or a vector table entry containing a handler address with LSB = 0.
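
The vector-table cause can be checked mechanically. On Cortex-M every handler address in the vector table must have bit 0 set (Thumb state); entry 0 is the initial SP value, not a branch target. A minimal sketch, assuming you have the table as 32-bit words (for example read from flash starting at address 0x0) and that reserved slots are 0; the function name is illustrative.

```c
#include <stdint.h>
#include <stddef.h>

/* Returns the index of the first handler entry whose LSB is 0, or -1 if
   every non-zero entry is a valid Thumb address. Entry 0 (initial SP) is
   skipped. Zero entries are treated as reserved slots here; an enabled
   exception or interrupt whose vector is 0 would itself fault. */
int vector_table_first_bad_entry(const uint32_t *table, size_t count)
{
    for (size_t i = 1; i < count; i++) {
        if (table[i] != 0u && (table[i] & 1u) == 0u) {
            return (int)i;
        }
    }
    return -1;
}
```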

CamK over 8 years ago in reply to Charles Tsai

Intellectual 410 points

Hi Charles,

Thanks for the reply.

I made the suggested change and set
__STACK_TOP = __stack + 2600;
in the linker file to match the stack size defined in CCS.

I still get an invalid state usage fault after running for some time (roughly half an hour to an hour). I set the hardware watchpoint on __stack, but it never triggers. I also tried increasing my stack size to 3200, to no avail.

Are there any other suggested methods to determine whether it is a stack overflow or stack corruption? It sounds like I may be able to use the LR register to determine the last running function.

Regards,
CamK

Genatco over 8 years ago in reply to CamK

Guru 55673 points

Have you tried compiling with code optimization set lower? Also, add some heap RAM in the linker options if you make SysPrintf() calls.

Charles Tsai over 8 years ago in reply to Genatco

TI__Guru**** 184346 points

What is the LR value when you are inside the HardFault exception routine? Is it a normal address?

Here is something you can try. Find the SP value when you enter the HardFault exception routine, and in the memory window go to the address the SP points to. Check the LR value that was saved onto the stack. On exception entry, R0, R1, R2, R3, R12, LR, PC, and xPSR are pushed onto the stack, with R0 at the lowest address (the address in SP) and xPSR at the highest. Go to the address pointed to by the stacked LR. Do you see any instruction at this address that might have led to the usage fault with INVSTATE, such as a BLX to an even address?
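
The frame layout described above can be captured in a struct so the stacked values are easy to inspect. This is a sketch, assuming the basic 8-word frame; with the FPU active, an extended frame may be pushed, but its first 8 words have the same layout. Bit 2 of the EXC_RETURN value in LR on handler entry tells you whether the frame is on the main stack (MSP, bit 2 = 0) or the process stack (PSP, bit 2 = 1); reading MSP/PSP itself needs a few lines of compiler-specific assembly, which are left out. For a bad branch such as a BLX to an even address, the stacked PC holds the branch target and the stacked T bit in xPSR is 0, which is why the stacked LR is the useful clue. The type and function names are illustrative.

```c
#include <stdint.h>
#include <stdbool.h>

/* Basic exception stack frame pushed by the Cortex-M4 on exception entry,
   lowest address (the stacked SP) first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12;
    uint32_t lr;    /* return address of the interrupted code (bit 0 set) */
    uint32_t pc;    /* faulting instruction, or the bad branch target */
    uint32_t xpsr;
} exc_frame_t;

/* Copy the frame starting at the stacked SP (offsets 0x00 .. 0x1C). */
exc_frame_t exc_frame_read(const uint32_t *sp)
{
    exc_frame_t f;
    f.r0  = sp[0];  /* SP + 0x00 */
    f.r1  = sp[1];  /* SP + 0x04 */
    f.r2  = sp[2];  /* SP + 0x08 */
    f.r3  = sp[3];  /* SP + 0x0C */
    f.r12 = sp[4];  /* SP + 0x10 */
    f.lr  = sp[5];  /* SP + 0x14 */
    f.pc  = sp[6];  /* SP + 0x18 */
    f.xpsr = sp[7]; /* SP + 0x1C */
    return f;
}

/* EXC_RETURN bit 2: 1 = frame is on the PSP, 0 = frame is on the MSP. */
bool exc_return_uses_psp(uint32_t exc_return)
{
    return (exc_return & (1u << 2)) != 0u;
}

/* xPSR bit 24 is the Thumb (T) bit; 0 in a stacked xPSR means the core
   was asked to run in ARM state, which Cortex-M cannot do. */
bool xpsr_thumb_bit(uint32_t xpsr)
{
    return (xpsr & (1u << 24)) != 0u;
}
```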

CamK over 8 years ago in reply to Genatco

Intellectual 410 points

Hi all,

I'm not doing any dynamic allocation (heap is set to 0) and I have optimization turned off.

I ran the program and waited for it to crash. I went to the address stored in the SP, which appears to be a valid address within the bounds of the stack, and found the LR stored on the stack. I went to the address in the LR (0x3A35) and viewed the disassembly at that address. My last instruction appears to be a str instruction.

I see the blx instruction at 0x3A32. The value of R2 stored in the exception stack frame (SP + 0x08) is 0x00000000. Could this be the culprit?

Regards,

CamK

Bruno Saraiva over 8 years ago

Guru 13040 points

CamK,
I had a similar issue a long time ago: execution would freeze in very rare and random situations. The kind of moment when you actually consider configuring a watchdog just to protect the product already out on the market.
After blaming all the possible stacks and the Gods of Silicon, eventually the problem was on an array index - it was an almost impossible combination of data received, which caused a bad code to try to write way far from the array declared size.
It is something that I never let happen again, but back then it was sort of hidden and with no apparent relation to the problem I was seeing. The point being: keep a very open mind when looking for problems like yours.
As far as stacks are concerned, before I even blink an LED, my TM4C129 projects are all set to 4096 and the TM4C123's to 2048. Certainly nothing too scientific about this, but those numbers keep other mysterious (as in PAINFUL to debug) problems from happening.
Regards
Bruno

Charles Tsai over 8 years ago in reply to CamK

TI__Guru**** 184346 points

Hi Cam,
As I indicated earlier, a BLX to an even address (R2 bit 0 = 0) attempts to switch to ARM state, which is not supported on the Cortex-M4. Please investigate the corresponding C code for clues about why R2 = 0. What was R2 before it was pushed to the stack?
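
The address arithmetic in this exchange can be written down. The stacked LR (0x3A35) has bit 0 set to mark Thumb state, so the return address is 0x3A34; because BLX Rm is a 16-bit instruction, the call itself sits 2 bytes earlier at 0x3A32, matching the disassembly. A BLX target with bit 0 clear, such as R2 = 0, requests ARM state and raises INVSTATE; a target of exactly 0 often indicates a call through a NULL function pointer. The helpers below are illustrative only.

```c
#include <stdint.h>
#include <stdbool.h>

/* Return address encoded in a stacked LR (clear the Thumb bit). */
uint32_t lr_return_address(uint32_t lr)
{
    return lr & ~1u;
}

/* Address of the call instruction, given its encoding size:
   2 bytes for BLX Rm (16-bit), 4 bytes for BL <label> (32-bit). */
uint32_t lr_call_site(uint32_t lr, uint32_t insn_size)
{
    return lr_return_address(lr) - insn_size;
}

/* A BX/BLX target with bit 0 clear asks for ARM state, which the
   Cortex-M4 does not have, so the next instruction raises INVSTATE. */
bool branch_target_faults(uint32_t target)
{
    return (target & 1u) == 0u;
}
```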

CamK over 8 years ago in reply to Charles Tsai

Intellectual 410 points

So, after making some modifications, I consistently get an instruction access violation fault on every run (NVIC_FAULT_STAT = 0x00000001), with NVIC_MM_ADDR = 0xE000EDF8 (that same debug data register). The system crashes much sooner, but consistently, and always with the same stack pointer (0x20005238). If I follow that address, I see the same LR value on the stack (0x10DD3) every run. The stacked PC is 0xFFFFFF30; this can't be valid, can it? It looks like R0 was 0x00000000.

Within my UART4 interrupt handler, I can see the following code in the disassembly at the LR address. I'm using the RingBufUsed function provided by TivaWare (2.1.3.156, I believe) to check whether there is data in the ring buffer.

Does an offending instruction or anything else stand out? ulMode is just a standard unsigned 32-bit integer. circTxBuffer is a ring buffer of size 256 (using the struct provided in TivaWare). I'll also attach my entire UART4 interrupt handler in case I'm overlooking something. This issue really has me scratching my head. Interestingly, my debug screen shows a small window with the message 0xFFFFFFF0 (no symbols are defined for 0xFFFFFFF0). Are there any other registers I could check to point me in the right direction?

void UART4IntHandler(void)
{
    uint32 intStatus;
    uint32 ulMode = 0;
    schEvent event;
    uint8 recChar[16];
    uint8 index = 0;

    intStatus = MAP_UARTIntStatus(UART4_BASE, true);
    MAP_UARTIntClear(UART4_BASE, intStatus);

    if(intStatus & (UART_IM_RTIM | UART_IM_RXIM) )
    {
        while(MAP_UARTCharsAvail(UART4_BASE)) 
        {
            recChar[index++] = MAP_UARTCharGetNonBlocking(UART4_BASE);
        }

        if(index)
        {
            RingBufWrite(&circRxBuffer, recChar, index);
            event.sig = UART_SIG;
            sch_post_isr(UART4_TASK_PRIO, event);
            MAP_IntMasterEnable();
        }
        RxInt++;
    }

    ulMode = RingBufUsed(&circTxBuffer);
    if(ulMode)
    {
        for(index=0; index < ulMode; index++)
        {
            if(MAP_UARTSpaceAvail(UART4_BASE))
            {
                MAP_UARTCharPut(UART4_BASE, RingBufReadOne(&circTxBuffer));
            }
            else
            {
                index = ulMode;
            }

        }
    }
}

Charles Tsai over 8 years ago in reply to CamK

TI__Guru**** 184346 points

Hi CamK,
The access violation means that the CPU is executing from an address in an XN (execute never) region. The system control space is always XN.
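
Under the ARMv7-M default memory map (MPU disabled), only the Code (0x00000000-0x1FFFFFFF), SRAM (0x20000000-0x3FFFFFFF), and external RAM (0x60000000-0x9FFFFFFF) regions are executable; the Peripheral, Device, and System regions are XN. A stacked PC such as 0xFFFFFF30 or 0xFFFFFFF0 falls in the System region, which is consistent with the IACCVIOL fault reported above. The check below encodes that default map and does not account for any MPU regions you may have configured; the function name is illustrative.

```c
#include <stdint.h>
#include <stdbool.h>

/* Execute-never check for the ARMv7-M default memory map (MPU off). */
bool default_map_is_xn(uint32_t addr)
{
    if (addr < 0x40000000u) return false;  /* Code + SRAM: executable */
    if (addr < 0x60000000u) return true;   /* Peripheral: XN */
    if (addr < 0xA0000000u) return false;  /* External RAM: executable */
    return true;                           /* Device + System (0xE0000000 up): XN */
}
```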

Are you showing that the fault/crash only happens when UART4IntHandler() is entered?

If you single step the assembly code at which instruction does it generate the fault?

Does index ever reach 16 or more (past the end of recChar[16])?

What register values are saved to the stack before and after the call to RingBufUsed()?

CamK over 8 years ago in reply to Charles Tsai

Intellectual 410 points

Hi Charles,

I found the issue. It turns out that the array buffer was overflowing. The communication rate was so high that before the interrupt handler could clear out the FIFO, more bytes would arrive (FIFO trigger level was set to 6/8) and the array would overflow.
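
A bounded version of the receive loop could look like the sketch below. It never writes past the local buffer: whenever the chunk fills, it is flushed downstream and the loop keeps draining, so a burst larger than the buffer costs at most dropped bytes (counted once the downstream buffer is full) rather than stack corruption. To keep the sketch self-contained and testable off-target, the UART FIFO and the downstream buffer are modeled with hypothetical types (rx_fifo_t, rx_sink_t); on the TM4C these would correspond to MAP_UARTCharsAvail()/MAP_UARTCharGetNonBlocking() and RingBufFree()/RingBufWrite().

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define RX_CHUNK 16u  /* same size as recChar[16] in the original handler */

/* Hypothetical model of the UART RX FIFO. */
typedef struct { const uint8_t *data; size_t len; size_t pos; } rx_fifo_t;
static bool fifo_avail(rx_fifo_t *f) { return f->pos < f->len; }
static uint8_t fifo_get(rx_fifo_t *f) { return f->data[f->pos++]; }

/* Hypothetical fixed-capacity downstream buffer. */
typedef struct { uint8_t buf[256]; size_t used; size_t dropped; } rx_sink_t;

static void sink_write(rx_sink_t *s, const uint8_t *src, size_t n)
{
    size_t room = sizeof s->buf - s->used;
    size_t take = (n < room) ? n : room;  /* never write past the end */
    for (size_t i = 0; i < take; i++) {
        s->buf[s->used++] = src[i];
    }
    s->dropped += n - take;  /* overflow becomes lost data, not corruption */
}

/* Drain everything the FIFO holds, RX_CHUNK bytes at a time.
   Returns the number of bytes read from the FIFO. */
size_t uart_rx_drain(rx_fifo_t *fifo, rx_sink_t *sink)
{
    uint8_t chunk[RX_CHUNK];
    size_t n = 0, total = 0;

    while (fifo_avail(fifo)) {
        chunk[n++] = fifo_get(fifo);
        if (n == RX_CHUNK) {  /* chunk full: flush before reading more */
            sink_write(sink, chunk, n);
            total += n;
            n = 0;
        }
    }
    if (n > 0u) {  /* flush the partial tail */
        sink_write(sink, chunk, n);
        total += n;
    }
    return total;
}
```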

Thank you everyone for your time. It took a lot of patience to resolve this.

CamK

Bruno Saraiva over 8 years ago in reply to CamK

Guru 13040 points

CamK

Gotta say it... "Told you!"

"...eventually the problem was on an array index - it was an almost impossible combination of data received, which caused a bad code to try to write way far from the array declared size..."

Take another look at your code: even if the communication is fast, you should not let the interrupt dance allow that to happen. The "worst" that should happen is unprocessed or lost data, but never array corruption; certainly, it isn't the hardware FIFO that ran out of space.

Still, glad you found it, thanks for sharing the outcome.

Bruno
