SW-TM4C: Simplified method for determining why a program ends up in FaultISR

Steve Strobel

Part Number: SW-TM4C

One of the things that most frustrated me when getting started with TivaWare (StellarisWare at the time) was ending up in FaultISR() because I hadn't enabled a peripheral before trying to use it, or because of some similar issue, and then having to single-step through my source to find where it went wrong. The standard advice (1) seems to be to decipher the values in the NVIC_FAULTSTAT and NVIC_FAULTADDR registers, then to manually decode the interrupt stack to determine the address of the instruction after the one that caused the fault (2). I have done that, but 99% of the time I can find the problem with much less work using a modified FaultISR() like this:

static void
FaultISR(void)
{
    volatile int i = 1;
    while(i)
    {
    }
}

With this implementation, when you pause the debugger and find that you are in FaultISR(), you can usually find the cause as follows:

Use the debugger's Variables view to change the value of i to 0 so it will exit the loop.
Click the C step-into button (usually twice) until the call stack display changes to show main() second from the bottom.
Click on the second-from-the-top item in the call stack. It should show your source code.
Look at the instruction before the one indicated. It is likely the one that caused the fault.

I suggest that TI change the default implementation of FaultISR() to something like that. The version I use also has a lot of comments in it; I'll include it below (3).

Even better, perhaps the debugger could be changed so it is able to decode the stack trace from within an interrupt service routine. I realize that the stack frame pushed by entering an ISR is different than that for a function, so there would need to be some method for deciphering which decoding method to use. If it isn't practical to do that automatically in the general case, I could imagine a way for the user to provide a hint (maybe using a checkbox) that a particular stack frame is for an ISR. Perhaps the debugger could maintain a list of symbol names which had been marked that way so it would know to treat them as an ISR the next time as well. That list could default to containing known ISRs like FaultISR(), NmiISR() and ResetISR() or could get initialized with the entries from g_pfnVectors[]. That way new users would immediately be able to see where they went wrong; my guess is that doing so would eliminate many of the questions about FaultISR() on this forum. It should be possible to decode the stack even if there are nested ISRs (higher-priority interrupts happening while processing a lower-priority one) and ISRs which call functions.
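
On the question of how a tool could tell an ISR frame from a function frame: inside a Cortex-M exception handler, LR holds a special EXC_RETURN value rather than an ordinary return address, and a few of its bits describe the frame that was pushed. The following is only a minimal sketch of that check. The function names are made up for illustration, and it says nothing about how CCS is actually implemented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inside an exception handler, LR holds an EXC_RETURN value whose upper
   bits are all ones (for example 0xFFFFFFF9), not a code address. */
static bool is_exc_return(uint32_t lr)
{
    return (lr & 0xFFFFFF00u) == 0xFFFFFF00u;
}

/* Bit 2 of EXC_RETURN: 0 = the frame is on the main stack (MSP),
   1 = the frame is on the process stack (PSP). */
static bool frame_on_psp(uint32_t exc_return)
{
    return (exc_return & (1u << 2)) != 0;
}

/* Bit 4 of EXC_RETURN: 1 = basic frame of 8 words,
   0 = extended frame of 26 words that also holds floating-point state.
   One extra alignment word may be present as well; bit 9 of the stacked
   xPSR indicates that case, which this sketch does not handle. */
static unsigned frame_words(uint32_t exc_return)
{
    return (exc_return & (1u << 4)) ? 8u : 26u;
}
```

For example, an LR value of 0xFFFFFFF9 would be classified as an exception return with a basic 8-word frame on the main stack, while 0x00001239 would be treated as a normal function return address.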

I should note that if something overwrites the stack (stack is too small, buffer overflow, etc.), the call stack will be corrupt and none of these methods will help. There are some source code comments about detecting stack overflows below (3, again).
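
One common way to estimate how close the stack has come to overflowing is "stack painting": fill the stack region with a known pattern at startup, then later count how much of the pattern is still intact. This is a generic technique, not necessarily the process referred to in my source comments, and the names and fill value below are assumptions chosen for illustration. On a real target the region bounds would come from linker symbols, which are not shown here.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_WORD 0xDEADBEEFu   /* arbitrary pattern (an assumption) */

/* Fill a stack region with the pattern. This must happen before the
   region is in use, or live stack contents would be destroyed. */
static void stack_paint(uint32_t *lowest, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        lowest[i] = STACK_FILL_WORD;
    }
}

/* Count the words at the low end of the region that still hold the
   pattern. The stack grows downward on Cortex-M, so these are the words
   that have not been overwritten. A result of 0 suggests the stack has
   reached, and possibly passed, its limit. */
static size_t stack_unused_words(const uint32_t *lowest, size_t words)
{
    size_t n = 0;
    while (n < words && lowest[n] == STACK_FILL_WORD) {
        n++;
    }
    return n;
}
```

The result is only an estimate: a stack value that happens to equal the fill pattern would be counted as unused.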

Steve

(1) - https://e2e.ti.com/support/microcontrollers/arm-based-microcontrollers-group/arm-based-microcontrollers/f/arm-based-microcontrollers-forum/1020822/faq-sw-tm4c-how-to-debug-a-program-going-into-faultisr?tisearch=e2e-sitesearch&keymatch=faultisr#

(2) - SPMA043 describes how to manually decode the stack trace. I'll try to include an example in a follow-up post. /cfs-file/__key/communityserver-discussions-components-files/908/0842.spma043_2D00_Diagnosing-Software-Faults-in-Stellaris_AE00_-Microcontrollers.pdf

(3) - My full version of FaultISR(). Most of the differences from what I put above are comments, and some won't apply to others. A comment could be added about uninitialized peripherals being a common cause of ending up in FaultISR().

//*****************************************************************************
//
// This is the code that gets called when the processor receives a fault
// interrupt.  It prints any queued debug messages (including tracepoints)
// then enters an infinite loop, preserving the system state for examination
// by a debugger.
//
// If have trouble figuring out why we get here, check to see if there is a
// way to see which vector was used.  Could also make multiple ISRs and make
// each suspect vector point to a different one.  Update: it looks like FaultISR(),
// unlike IntDefaultHandler(), is pointed to by only one vector.  Might still
// be able to look at register values and determine what triggered the "hard fault".
// - Also consider setting a global to different values at various places in the
//   code so can check its value here and see which of those places it was last set.
//   Perhaps use a macro that sets a "checkpoint" (pointer to the filename or
//   maybe function name and an int to the line number).  Search for
//   "ktowyawesctlcic".  Todo.
//
//*****************************************************************************
static void
FaultISR(void)
{
	// Print messages before going into infinite loop.  If the COP watchdog is
	// enabled, this printing probably would have happened in a bit anyway when
	// watchdogTimeoutISR() got called.
	extern volatile uint32_t sysTickMillisecondCount;												// defined in sysTick2.c
	blockWhilePrintAllDebugMessages( sysTickMillisecondCount, "FaultISR" );

    //
    // Enter an infinite loop.
    //
	// There are a couple of ways to (sometimes) determine which code was running
	// when the fault occurred:
	// - Exit this loop and single-step out of this ISR to the calling code.
	//   To trace back out of this ISR, use debugger to change the value
	//   of i to 0, then click the "Assembly Step Into" button several times
	//   (usually 4 to 6) or try the C step-into button.
	// - See "Debugging - tracing how got into ISR.docx" for how to inspect
	//   the stack by hand and modify the PC to make the debugger show the calling
	//   code.
	//
	// If unable to trace back out, it may be because the stack got hammered.
	// - It could be that the stack is too small and is overflowing.
	//   - There is a process for checking the available stack space documented
	//     in the firmware release procedure.
	//   - You can set a watchpoint on __stack as described in
	//     http://processors.wiki.ti.com/index.php/Watchpoints_for_Stellaris_in_CCS
	//     to get the debugger to stop at the point the stack overflows.
	// - It could be that the stack frame is being overwritten even if the stack
	//   itself is not overflowing.
	//   - Can hammer the stack frame without overflowing the stack.  For
	//     example, could have a local array on the stack and overflow it.
	// - The stack pointer itself could get changed to an invalid location.
	//
	// Other possible ways to get clues about the cause:
	// - Look at contents of stack for clues (like strings).
	// - Enable IF_DEBUG_LOCKUPS_USING_TRACEPOINTS_MAIN_LOOP and similar code
	//   to help track down where the buffer overflow is occurring.  Search for
	//   "ktowyawesctlcic".
	// - Acquire a "reverse debugger" (perhaps using debug hardware with trace)
	//   so can look back to the point it all went wrong.
    //
	volatile int i = 1;
    while(i)
    {
    }
}
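
The "checkpoint" idea mentioned in the header comment above is still a todo in my code, so the following is only a rough sketch of what such a macro might look like. The names last_checkpoint and CHECKPOINT() are hypothetical and do not exist in TivaWare or in my project.

```c
/* Hypothetical record of the most recent checkpoint that was passed. */
struct checkpoint {
    const char *file;   /* source file name, from __FILE__ */
    int line;           /* source line number, from __LINE__ */
};

/* volatile so the stores are not optimized away and the debugger always
   sees the most recent values. */
static volatile struct checkpoint last_checkpoint;

/* Record the current source location. Place calls at points of interest;
   after a fault, inspect last_checkpoint in the debugger. */
#define CHECKPOINT()                        \
    do {                                    \
        last_checkpoint.file = __FILE__;    \
        last_checkpoint.line = __LINE__;    \
    } while (0)
```

When execution lands in FaultISR(), the Variables or Expressions view should show the file and line of the last checkpoint reached, which narrows down where to look even if the call stack is unusable.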

over 3 years ago

0 Steve Strobel over 3 years ago

Intellectual 370 points

If you are trying to decode the stack trace manually as described in SPMA043, the link below might be helpful to you. It shows an example of how I did it, with screen shots. I leveraged the debugger by plugging the function addresses into the disassembly window. I also found it helpful to set a breakpoint on a line in the disassembly window so I could find the corresponding C source file and line number. But I haven't done any of that in a long time, as using the modified FaultISR() is much easier and generally gives me the same info.

https://docs.google.com/document/d/1XxnSUmKLSfFTGPlnjlHSKORdhp54uZrN/edit?usp=sharing&ouid=107025470996951424497&rtpof=true&sd=true

I'll try leaving that document editable by all, in case someone wants to improve it. If it gets filled with spam, I'll revert it and lock it down.
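
For reference, here is a minimal sketch of the layout being decoded by hand. On exception entry the Cortex-M hardware pushes eight words in a fixed order, so the stacked PC is the seventh word of the frame. The enum and function names are made up for illustration, and the sketch assumes you already have the address of the frame (for example from the debugger's memory view) and that it is a basic frame without floating-point state.

```c
#include <stdint.h>

/* Word offsets within the basic exception frame, lowest address first. */
enum {
    FRAME_R0,
    FRAME_R1,
    FRAME_R2,
    FRAME_R3,
    FRAME_R12,
    FRAME_LR,     /* LR of the interrupted code */
    FRAME_PC,     /* return address, at or near the faulting instruction */
    FRAME_XPSR,
    FRAME_WORDS   /* number of words in the basic frame (8) */
};

/* Return the stacked PC from a frame that starts at 'frame'. */
static uint32_t stacked_pc(const uint32_t *frame)
{
    return frame[FRAME_PC];
}

/* Return the stacked LR, which usually identifies the caller of the
   function that was running when the fault occurred. */
static uint32_t stacked_lr(const uint32_t *frame)
{
    return frame[FRAME_LR];
}
```

Plugging the value returned by stacked_pc() into the disassembly window is the same step I describe in the linked document.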

Steve

+1 Charles Tsai over 3 years ago in reply to Steve Strobel

TI Guru 191886 points

Hi Steve,

Thank you for providing tips for debugging TM4C MCUs. Most of the time a fault is caused by either an access to an un-enabled peripheral or a lack of stack space. As you have indicated, when the stack has overflowed, your method may not work because the memory is already corrupted. Nonetheless, I will bookmark this post so I can refer others to it when they stumble on FaultISR. Thanks again.
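
For readers who do want to take the register-decoding route, the sketch below classifies a few of the more common bits in the fault status value. The bit positions follow the Armv7-M Configurable Fault Status Register layout and the macro names use the Arm bit names; the enum and function names are invented for this example, and only a handful of the defined bits are covered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Selected fault status bits (Arm names). */
#define CFSR_PRECISERR    (1u << 9)    /* precise data bus error */
#define CFSR_IMPRECISERR  (1u << 10)   /* imprecise data bus error */
#define CFSR_STKERR       (1u << 12)   /* bus fault while stacking on exception entry */
#define CFSR_BFARVALID    (1u << 15)   /* bus fault address register holds a valid address */
#define CFSR_UNDEFINSTR   (1u << 16)   /* undefined instruction */
#define CFSR_DIVBYZERO    (1u << 25)   /* divide by zero (when trapping is enabled) */

enum fault_kind {
    FAULT_NONE_DECODED,        /* none of the bits handled here are set */
    FAULT_BUS_PRECISE,
    FAULT_BUS_IMPRECISE,
    FAULT_BUS_STACKING,
    FAULT_USAGE_UNDEFINED_INSTRUCTION,
    FAULT_USAGE_DIVIDE_BY_ZERO
};

/* Return the first matching fault kind, checking bus faults first. */
static enum fault_kind classify_fault(uint32_t faultstat)
{
    if (faultstat & CFSR_PRECISERR)   return FAULT_BUS_PRECISE;
    if (faultstat & CFSR_IMPRECISERR) return FAULT_BUS_IMPRECISE;
    if (faultstat & CFSR_STKERR)      return FAULT_BUS_STACKING;
    if (faultstat & CFSR_UNDEFINSTR)  return FAULT_USAGE_UNDEFINED_INSTRUCTION;
    if (faultstat & CFSR_DIVBYZERO)   return FAULT_USAGE_DIVIDE_BY_ZERO;
    return FAULT_NONE_DECODED;
}

/* True when the fault address register can be trusted. */
static bool fault_address_valid(uint32_t faultstat)
{
    return (faultstat & CFSR_BFARVALID) != 0;
}
```

An access to a peripheral that has not been enabled typically shows up as a bus fault, and when the address-valid bit is set, the fault address register should point at the peripheral register that was touched.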

0 Chester Gillon over 3 years ago

Guru 92251 points

Steve Strobel said:
Even better, perhaps the debugger could be changed so it is able to decode the stack trace from within an interrupt service routine.

The thread "CCS/TM4C1294KCPDT: How do I get the stack unwound in exception handlers?" contains an example using GEL scripts to get the debugger to unwind the stack and show the call stack at the point a hard fault occurred.

It was developed using CCS 9.1; I can't remember whether I have used it with later CCS versions.

0 Steve Strobel over 3 years ago in reply to Chester Gillon

Intellectual 370 points

Thanks for that link. The fact that a GEL script can enable unwinding the stack from inside an exception handler is useful in itself, and shows that it could be built into the debugger and work out of the box.

I noticed that Peter Jaquiery's customized FaultISR(), like mine, has a number of notes about the likely causes of ending up in FaultISR().

Peter uses the statement __asm(" BKPT #2"); before the infinite loop in his FaultISR(). I tried using BKPT in place of the loop I had, like this:

volatile int i = 1;

++i;        // If execution stops here, increment the value in the PC register by 2 (remember
            // it is in hexadecimal, so 0x1238 becomes 0x123A).  Then click the C single step
            // button and the call stack that led to getting here should show up (if the stack
            // isn't hammered).

// When execution gets here, the debugger shows it being on the line above.
// If the debugger is not connected, it hangs.  If the watchdog is enabled, it
// causes an automatic reset.
__asm("    BKPT #2");

I'm not sure if that is better or not. It is convenient that it makes the debugger stop, so you don't have to notice that the program is hung and click the debugger's pause button. On the other hand, adding two to the PC is perhaps harder than changing i to 0 (at least if you aren't comfortable with hexadecimal). Either way, it is a lot easier to get to a usable call stack than with the default implementation of FaultISR().

Issues like these have been making things hard for newcomers for many years now. Some of us have gotten past the steep learning curve (though some issues still cost us time), but I recommend newbies start with Arduino or Raspberry Pi because of stuff like this. TI, please make it easier to recommend your products! Having a bookmarked post in a forum somewhere is no substitute for making things just work (stack unwinding without needing to find and learn to use a GEL script), building in tools for finding your problems (modifying the code in FaultISR as Peter and I did), or having relevant comments that show up when the debugger stops (common reasons for ending up in FaultISR).

Steve
