Guide to debugging aborts on Hercules (TMS570 and RM4x/RM5x) devices
What are exceptions?
An “Exception” is an event that makes the processor temporarily halt the normal flow of program execution, for example, to service an interrupt from a peripheral. Before attempting to handle an exception, the processor preserves the critical parts of the current processor state so that the original program can resume when the handler routine has finished.
In practical situations, exceptions can be mainly categorized into the following:
What is the exception priority order?
When several exceptions occur simultaneously, they are serviced in a fixed order of priority. Each exception is handled in turn before execution of the user program continues. It is not possible for all exceptions to occur concurrently. For example, the Undefined Instruction and SVC exceptions are mutually exclusive because they are both triggered by executing an instruction.
Because the Data Abort exception has a higher priority than the FIQ exception, the Data Abort is actually registered before the FIQ is handled. The Data Abort handler is entered, but control is then passed immediately to the FIQ handler. When the FIQ has been handled, control returns to the Data Abort handler. This means that the data transfer error does not escape detection as it would if the FIQ were handled first.
|Undefined Abort||6 (lowest)|
IRQs are disabled on entry to all exceptions. FIQs are disabled on entry to FIQs and Reset.
What is the processor response to an exception?
When an exception occurs, the ARM CPU:
Copies CPSR into SPSR_<mode>
Sets appropriate CPSR bits
- If core currently in Thumb state then ARM state is entered
- Mode field bits
- Interrupt disable bits (if appropriate)
Stores the return address in LR_<mode>
Sets PC to vector address
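The entry sequence above can be modelled in a few lines of C. This is only an illustrative sketch, not CPU or TI library code: the `cpu_model_t` structure and the `take_exception` function are hypothetical names, and the model follows the steps as listed here (it always enters ARM state and ignores any configuration that would change that).

```c
#include <stdint.h>

#define CPSR_MODE_MASK 0x1Fu
#define CPSR_T_BIT     (1u << 5)   /* Thumb state bit  */
#define CPSR_F_BIT     (1u << 6)   /* FIQ disable bit  */
#define CPSR_I_BIT     (1u << 7)   /* IRQ disable bit  */

typedef enum { EXC_UNDEF, EXC_PABT, EXC_DABT, EXC_FIQ } exc_t;

/* Hypothetical software model of the registers touched on exception entry. */
typedef struct {
    uint32_t cpsr;
    uint32_t spsr;  /* SPSR_<mode> of the mode being entered */
    uint32_t lr;    /* LR_<mode> of the mode being entered   */
    uint32_t pc;
} cpu_model_t;

static void take_exception(cpu_model_t *cpu, exc_t exc, uint32_t return_addr)
{
    /* Mode bits and vector addresses, indexed by exc_t. */
    static const uint32_t mode[]   = { 0x1Bu, 0x17u, 0x17u, 0x11u };
    static const uint32_t vector[] = { 0x04u, 0x0Cu, 0x10u, 0x1Cu };

    cpu->spsr = cpu->cpsr;                        /* 1. CPSR -> SPSR_<mode>     */
    cpu->cpsr &= ~(CPSR_MODE_MASK | CPSR_T_BIT);  /* 2. ARM state, clear mode   */
    cpu->cpsr |= mode[exc] | CPSR_I_BIT;          /*    new mode, IRQs disabled */
    if (exc == EXC_FIQ) {
        cpu->cpsr |= CPSR_F_BIT;                  /*    FIQs disabled on FIQ    */
    }
    cpu->lr = return_addr;                        /* 3. return address -> LR    */
    cpu->pc = vector[exc];                        /* 4. PC -> vector address    */
}
```

Calling `take_exception` with `EXC_DABT` on a model that starts in System mode leaves the old CPSR in `spsr`, the Abort mode bits (10111) in `cpsr`, and `pc` at 0x10.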
Differences among the aborts (DABT, PABT, and UNDEF)
The processor takes the data abort if data is read from or written to a protected or faulty memory location. The data abort can be either synchronous or asynchronous.
The instruction that caused the data abort is at R14_ABT – 8 which means that the pointer points two instructions beyond the instruction that caused the abort.
The processor takes the prefetch abort if it tries to execute an instruction from a protected or faulty memory location. All prefetch aborts are synchronous.
The instruction that caused the prefetch abort is at R14_ABT – 4: lr_ABT points to the instruction following the one that caused the exception, so the handler must return to lr_ABT – 4.
The processor takes the undefined instruction exception when it encounters an instruction that is undefined in the appropriate version of the ARM instruction set, or which is for the VFP when the VFP is disabled. The undefined instruction exception can be used to emulate undefined instructions, or simply to handle fault situations.
The instruction that caused the UNDEF abort is at R14_UND – 4.
Why are the return addresses of prefetch abort and data abort different?
For prefetch, the return address is: R14_abt = address of the aborted instruction + 4, and for data abort, the return address is: R14_abt = address of the aborted instruction + 8.
The CPU program counter (PC) is updated at specific points during execution. Exceptions can occur during different phases of fetching/decoding/execution.
In the case of the prefetch abort, the exception occurs only when the processor actually attempts to execute the instruction. Because the program counter has not been updated at the time the prefetch abort is taken, lr_ABT points to the instruction following the one that caused the exception.
In the case of the data abort, the instruction is being executed, and the instruction’s execution causes the exception. When a load or store instruction tries to access memory, the program counter has been updated. A stored value of (pc – 4) in lr_ABT points to the second instruction beyond the address where the exception was generated.
Please refer to Table 3-4 from ARM TRM. This table summarizes the PC value preserved in the relevant R14 on exception entry, and the instruction ARM recommends for exiting the exception handler.
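As a quick aid when reading R14 in the debugger, the offsets described above can be wrapped in a small helper. This is a minimal sketch: `abort_kind_t` and `faulting_instruction` are hypothetical names, and the offsets are the ones stated in this FAQ (8 for data abort, 4 for prefetch abort, and 4 or 2 for an undefined instruction in ARM or Thumb state).

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { ABORT_UNDEF, ABORT_PREFETCH, ABORT_DATA } abort_kind_t;

/* Address of the instruction that raised the exception, computed from the
 * banked link register (R14_abt or R14_und) read in the debugger. */
static uint32_t faulting_instruction(abort_kind_t kind, uint32_t lr, bool thumb)
{
    switch (kind) {
    case ABORT_DATA:     return lr - 8u;                 /* R14_abt - 8      */
    case ABORT_PREFETCH: return lr - 4u;                 /* R14_abt - 4      */
    case ABORT_UNDEF:    return lr - (thumb ? 2u : 4u);  /* R14_und - 4 or 2 */
    }
    return lr;  /* not reached for valid abort_kind_t values */
}
```

For example, an undefined instruction exception taken in ARM state with R14_und = 0x00007434 gives 0x00007430 as the address of the offending instruction.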
How do I know there is an abort?
When an abort happens, execution stops in the exception vector table. If a breakpoint is set at the exception vector address, the program counter halts at 0x0C (PABT), 0x10 (DABT), or 0x04 (UNDEF).
There are three important ARM Cortex-R4/R5 registers that can be used to confirm the current state of the processor:
CPSR: The CPSR can be used to verify the current mode of the processor. The mode bits of the CPSR register can be used to check if the current mode is Abort.
M[4:0]  Mode
10000   User
10001   FIQ
10010   IRQ
10011   Supervisor
10111   Abort
11011   Undefined
11111   System
SPSR: The SPSR can be used to check the previous mode just before entering the exception. For example, if the processor moves from System to Abort Mode, SPSR shows the mode as “System” while CPSR shows the mode as “Abort”. The bit definitions of SPSR register are the same as that of the CPSR register.
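The mode check on CPSR and SPSR can be automated with a small decoder. The sketch below uses the M[4:0] encodings listed above; the names `cpu_mode_t`, `psr_mode`, and `mode_name` are hypothetical helpers, not part of any TI or ARM library.

```c
#include <stdint.h>

/* M[4:0] encodings as listed in the mode table. */
typedef enum {
    MODE_USER       = 0x10,
    MODE_FIQ        = 0x11,
    MODE_IRQ        = 0x12,
    MODE_SUPERVISOR = 0x13,
    MODE_ABORT      = 0x17,
    MODE_UNDEFINED  = 0x1B,
    MODE_SYSTEM     = 0x1F
} cpu_mode_t;

/* Extract the mode field from a CPSR or SPSR value. */
static cpu_mode_t psr_mode(uint32_t psr)
{
    return (cpu_mode_t)(psr & 0x1Fu);
}

/* Human-readable name for a mode value; unknown encodings map to "Reserved". */
static const char *mode_name(cpu_mode_t mode)
{
    switch (mode) {
    case MODE_USER:       return "User";
    case MODE_FIQ:        return "FIQ";
    case MODE_IRQ:        return "IRQ";
    case MODE_SUPERVISOR: return "Supervisor";
    case MODE_ABORT:      return "Abort";
    case MODE_UNDEFINED:  return "Undefined";
    case MODE_SYSTEM:     return "System";
    }
    return "Reserved";
}
```

Applied to an SPSR value of 0x8000011F, `psr_mode` returns `MODE_SYSTEM`, meaning the processor was in System mode when the exception was taken.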
R14 Register: The R14 register is used to find the actual instruction or function call that caused the synchronous abort. The actual address of the instruction that triggered the Exception will be R14 - x, where “x” depends on the type of exception.
For details, see Table 3.4, “Exception Entry and Exit” in the Cortex-R4/R5 TRM: https://developer.arm.com/documentation/ddi0363/e/
Undefined Instruction Exceptions (UNDEF)
Undefined instruction exception can occur if the CPU does not understand the fetched instruction.
There are no Fault Status and Fault address registers associated with this exception; only Link register (R14_UND) provides relevant information. The instruction that caused the UNDEF abort is at R14_UND – 4.
7.1 Possible reasons for the execution of a faulty instruction
- Branch to RAM code that has been corrupted or not yet initialized with required functions
- Return address on the stack has been corrupted (for example, stack overflow or pop/push count mismatch).
- Function pointer is not initialized or corrupted.
7.2 Handling Undefined Instruction Exception
- Confirm whether the CPU control is stuck in an Undefined Instruction exception by checking the halt address. If the address is 0x04, then the control has ended in an Undefined Instruction Exception.
- Check the value of the R14_UND register. R14_UND – X provides the address of the instruction which caused the undefined instruction exception. “X” depends on the instruction set state (X=4 for ARM state, and X=2 for Thumb state).
- Check the instruction at the address read from R14_UND - X.
- If it is a valid instruction, check whether the instruction set state used for execution (ARM or Thumb) is correct. A state mismatch for a valid instruction can cause an undefined instruction exception.
- If the instruction is invalid, check for address corruption or RAM corruption.
Example: when the VFP is not enabled, the processor takes the Undefined Instruction exception when it executes a floating-point instruction. After the exception, CPSR[4:0]=b11011 (Undefined mode).
The instruction that causes the UNDEF exception is vldr s0, [r13, 0xc] at 0x00007430.
r14_UND = 0x00007434, which is the address of that instruction + 4. The mode before entering the UNDEF exception is SPSR_UND[4:0]=b11111 (System mode).
Data Abort Exception (DABT)
A Data Abort Exception is the response to an invalid data access. If the exception is confirmed to be a Data Abort, as the first step, check the value of the Data Fault Status Register (DFSR) of the Cortex-R CPU.
DFSR Register: The figure below shows the DFSR register bit assignments:
Use the “S” bit and the “Status” bits [3:0] to understand the nature of the Data Abort. See the table below for the status descriptions:
The SD Bit distinguishes between an AXI Decode or Slave error on an external abort. This bit is valid only for external aborts. For all other types of abort, this bit is set to zero:
0 = AXI Decode error (DECERR) or AHB error caused the abort
1 = AXI Slave error (SLVERR) or unsupported exclusive access caused the abort. Example: exclusive access using the AHB peripheral port
The RW bit indicates whether a read or write access caused the abort.
0 = read access caused the abort
1 = write access caused the abort
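Decoding the DFSR by hand is error-prone, so a small decoder can help. The sketch below is hedged: the bit positions (S = bit 10, RW = bit 11, SD = bit 12, Status = bits [3:0]) and the status encodings are taken from my reading of the Cortex-R4 TRM fault status tables and should be checked against the TRM for your device. The names `fault_type_t`, `dfsr_info_t`, and `decode_dfsr` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    FAULT_BACKGROUND, FAULT_ALIGNMENT, FAULT_DEBUG, FAULT_PERMISSION,
    FAULT_SYNC_EXTERNAL, FAULT_ASYNC_EXTERNAL, FAULT_SYNC_ECC, FAULT_ASYNC_ECC,
    FAULT_UNKNOWN
} fault_type_t;

typedef struct {
    fault_type_t type;
    bool write;        /* RW bit: true = write access caused the abort   */
    bool slave_error;  /* SD bit: true = SLVERR, false = DECERR/AHB error */
} dfsr_info_t;

static dfsr_info_t decode_dfsr(uint32_t dfsr)
{
    /* Combine S (bit 10) with Status[3:0] into one 5-bit code. */
    uint32_t code = (((dfsr >> 10) & 1u) << 4) | (dfsr & 0xFu);
    dfsr_info_t info = {
        FAULT_UNKNOWN,
        ((dfsr >> 11) & 1u) != 0u,
        ((dfsr >> 12) & 1u) != 0u
    };

    switch (code) {  /* encodings assumed from the Cortex-R4 TRM */
    case 0x00u: info.type = FAULT_BACKGROUND;     break;
    case 0x01u: info.type = FAULT_ALIGNMENT;      break;
    case 0x02u: info.type = FAULT_DEBUG;          break;
    case 0x0Du: info.type = FAULT_PERMISSION;     break;
    case 0x08u: info.type = FAULT_SYNC_EXTERNAL;  break;
    case 0x16u: info.type = FAULT_ASYNC_EXTERNAL; break;
    case 0x19u: info.type = FAULT_SYNC_ECC;       break;
    case 0x18u: info.type = FAULT_ASYNC_ECC;      break;
    default:    break;
    }
    return info;
}
```

For instance, a hypothetical DFSR value of 0x0000080D decodes as a permission fault caused by a write access. Remember that the SD bit is only meaningful for external aborts.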
Common Types of Data Abort
9.1 Background: Memory Protection Unit (MPU) settings must be correct for any region that the CPU is going to access. If the address that the CPU issues falls outside any of the defined regions and the MPU is enabled, the MPU is hard-wired to abort the access. That is, all accesses for an address that is not mapped to a region in the MPU generate a background fault.
A background fault does not occur if the background region is enabled and the access is Privileged. An MPU background fault might indicate a stack overflow, and be rectified by allocating more stack.
9.2 Permission: This can happen when MPU settings prevent the access of a region. For example, if a User mode application attempts to access a region that allows privileged access only, a permission fault occurs.
A write operation shown below triggered the abort. The MPU setting for memory location at 0x08028008 is read only.
As shown in Figure below, the DFAR register shows the address that triggered the Data Abort because it is a Permission Error (verified using DFSR) at BTCM (verified using ADFSR). R14_abt – 8 (0x000070E0) points to the instruction that caused that access. It shows a STR operation.
Instructions cannot be executed from regions with Device or Strongly-Ordered memory type attributes.
9.3 Synchronous/Asynchronous External: This happens when the access has been transferred from the CPU to the AXI/AHB bus and encountered an error. This is the most common type of Data Abort. If the abort is synchronous, the Data Fault Address Register (DFAR) holds the memory address whose access caused the Data Abort.
9.4 Synchronous/Asynchronous ECC: This happens if an ECC error is detected at TCM interfaces or in the cache.
10.1 Synchronous abort exceptions
In general, “load” instructions from areas or “store” instructions to memory causing an error are synchronous. DFAR shows the target address of the access. Also, as described in the previous section, R14_abt – 8 points to the instruction that caused that access.
Example 1: Load data from memory location with 2-bit ECC error
A read operation shown below triggered the abort. The data at 0x08000010 has 2-bit ECC error.
As shown in Figure above, the DFAR register shows the address that triggered the Data Abort because it is a Synchronous ECC Error (verified using DFSR) at BTCM (verified using ADFSR). R14_abt – 8 (0x00001F7C) points to the instruction that caused that access. It shows a LDM operation.
Example 2: Write data to unimplemented memory location
A write operation shown below triggered the abort. The address 0x08100018 is outside the valid memory range.
As shown in Figure above, the DFAR register shows the address that triggered the Data Abort because it is a Synchronous Abort (verified using DFSR). R14_abt – 8 (0x000070CC) points to the instruction that caused that access. It shows an STR operation.
Figure: Before Data Abort Happens
Figure: After Data Abort Happened
10.2 Asynchronous faults
An asynchronous fault is difficult to analyze because the exact location that caused the abort cannot be traced. The DFAR register, which is used for synchronous faults, cannot be used here. In general, “store” instructions to areas with “Normal” or “Device” memory attributes causing an error are asynchronous.
From the DFSR register, we can check the status bits, the SD bit, and the RW bit:
- SD: Internal AXI decode error, or external AXI slave error
- RW: Indicates whether a read or write access caused the abort
Please read section 8 above for details.
10.3 How to track instruction that caused the asynchronous data abort
- R14_abt – 8 is a location near the instruction that caused the exception.
- Find a “store” instruction near R14_abt – 8, which can likely cause the exception.
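The search for a nearby store can be partly automated if the code around R14_abt – 8 is available as 32-bit words, for example exported from the debugger memory view. The following is a rough heuristic only, under several assumptions: the code is in ARM (A32) state, only word/byte STR and STM/PUSH encodings are recognized (halfword, doubleword, and VFP stores are not), and the names `is_a32_store` and `nearest_store_before` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Heuristic check for an A32 store: STR/STRB (immediate or register offset)
 * or STM/PUSH. Other store encodings are deliberately not covered. */
static bool is_a32_store(uint32_t insn)
{
    if ((insn >> 28) == 0xFu) {
        return false;                        /* unconditional instruction space */
    }
    bool is_load = ((insn >> 20) & 1u) != 0u;  /* L bit: 1 = load, 0 = store */
    uint32_t op = (insn >> 25) & 7u;           /* bits [27:25]               */

    if (op == 2u) return !is_load;                              /* STR imm   */
    if (op == 3u) return !is_load && ((insn >> 4) & 1u) == 0u;  /* STR reg   */
    if (op == 4u) return !is_load;                              /* STM/PUSH  */
    return false;
}

/* Walk backwards from index `from` (the word at R14_abt - 8) and return the
 * index of the nearest store candidate, or -1 if none is found. */
static ptrdiff_t nearest_store_before(const uint32_t *code, size_t from)
{
    for (size_t i = from + 1u; i-- > 0u; ) {
        if (is_a32_store(code[i])) {
            return (ptrdiff_t)i;
        }
    }
    return -1;
}
```

The result is only a candidate. The store still has to be checked by hand against its target address and the memory attributes of that region.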
You should define valid MPU settings for every region the application accesses so that the CPU can access it as intended. If no MPU region is defined for an address range that the application uses, an access to it can cause a Background Fault Data Abort Exception, depending on whether the access is privileged or non-privileged:
11.1 For privileged accesses:
If the BR bit (Bit 17 of the SCTLR Arm register) is set, the default memory map serves as the background region for any access that does not hit a specified region; if the BR bit is 0, a Background Fault exception occurs for any access outside specified regions.
11.2 For non-privileged accesses:
A Background Fault exception occurs for any access outside specified MPU regions. To prevent a Background Fault exception for such accesses, define Region 0 as a background region covering the entire memory map. Region 0 then applies to any address that is not covered by another defined MPU region.
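The two cases can be summarized as a small decision function. This is a sketch of the rule as described here, not of the MPU hardware: `is_background_fault` is a hypothetical helper, `hits_mpu_region` stands for "the address falls inside at least one enabled MPU region", and `SCTLR_BR` is the BR bit (bit 17) of SCTLR.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCTLR_BR (1u << 17)  /* background region enable bit in SCTLR */

/* Returns true if the access would raise a Background Fault. */
static bool is_background_fault(bool hits_mpu_region, bool privileged,
                                uint32_t sctlr)
{
    if (hits_mpu_region) {
        return false;  /* covered by a defined region: no background fault */
    }
    if (privileged && (sctlr & SCTLR_BR) != 0u) {
        return false;  /* default memory map acts as the background region */
    }
    return true;       /* unmapped address: background fault */
}
```

Note that a non-privileged access outside all regions faults regardless of the BR bit, which is why a Region 0 covering the whole memory map is needed for User mode code.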
Prefetch Abort (PABT) Exception occurs when an instruction fetch causes an error. When a Prefetch Abort occurs, the processor marks the prefetched instruction as invalid, but does not take the exception until the instruction is to be executed. If the instruction is not executed, for example because a branch occurs while it is in the pipeline, an abort does not occur. All prefetch aborts are synchronous.
The difference between an Undefined Instruction exception and a Prefetch Abort exception is that in a Prefetch Abort the CPU is unable to fetch the instruction from the address, whereas in an Undefined Instruction exception the CPU fetches the instruction but does not know what it does.
The reason for Prefetch Abort can be analyzed by reading the Instruction Fault Status Register (IFSR), the Instruction Fault Address Register (IFAR), and the Auxiliary Instruction Fault Status Register (AIFSR).
IFAR contains the address where the CPU was trying to fetch an instruction from. The contents of IFAR are always valid for a Prefetch Abort, because all Prefetch Aborts are synchronous.
AIFSR records additional information about the nature and location of the fault, for example ATCM (Flash) or BTCM (SRAM).
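The location information in the auxiliary fault status registers can be extracted with a one-line decoder. This sketch assumes the "Side" field sits in bits [23:22] of AIFSR (and ADFSR), with the encodings shown in the comments; that layout is from my reading of the Cortex-R4 TRM and should be verified there. The names `fault_side_t` and `aux_fsr_side` are hypothetical.

```c
#include <stdint.h>

/* Assumed encodings of the Side field, bits [23:22] of AIFSR/ADFSR. */
typedef enum {
    SIDE_CACHE_OR_AXI = 0,  /* cache or AXI master interface */
    SIDE_ATCM         = 1,  /* ATCM (Flash on Hercules)      */
    SIDE_BTCM         = 2,  /* BTCM (SRAM on Hercules)       */
    SIDE_RESERVED     = 3
} fault_side_t;

/* Extract the source of the error from an auxiliary fault status value. */
static fault_side_t aux_fsr_side(uint32_t afsr)
{
    return (fault_side_t)((afsr >> 22) & 3u);
}
```

For an AIFSR value of 0x00400000, `aux_fsr_side` returns `SIDE_ATCM`, which points to the Flash interface as the source of the error.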
12.1 Possible reason for prefetch abort
Improper MPU setting: If a permission fault has occurred based on the IFSR status, it is possible that one of the following conditions has occurred:
- An instruction is being fetched from a location for which “Execute Never” attribute is set.
- The target address read from IFAR has “Device” or “Strongly-Ordered” memory attribute. This implicitly means that these areas do not have executable code.
ECC Error on the instruction read:
An ECC error is detected on an instruction read. The IFAR register provides the address that caused the error to be detected. The AIFSR indicates the source of the ECC error.
Wrong return address or branch address:
- The return address is corrupted.
- The branch address is corrupted.
12.2 Handling Prefetch Abort Exception
Confirm whether the CPU control is stuck in Prefetch Abort Exception by checking the halt address. If the Offset is 0x0C, it indicates that the control has ended in a Prefetch Abort.
Check the status from IFSR and IFAR to determine the type of fault and the address leading to the abort.
In the case of a “permission” fault, find the MPU region that contains the address read from the IFAR register. Check that region for settings that forbid code execution (Execute Never setting, Device, or Strongly-Ordered memory).
12.3 Examples of Prefetch Abort Exception
The following example demonstrates the steps to debug a Prefetch Abort. Here, CPU execution was stuck in the Prefetch Abort handler. Relevant register values are as follows:
SPSR_Abt: 0x8000011F: Mode – (11111) System mode. This implies that the CPU was in System mode when the abort was triggered.
IFSR: 0x0000000D: The status indicates a Permission abort. The address captured in IFAR is valid and is the actual address that led to the abort.
IFAR: 0x0013F800: This address falls under an MPU region with the Strongly-Ordered attribute.
Relevant register values for example 2 are as follows:
SPSR_Abt: 0x600001D1: Mode – (10001) FIQ mode. This implies that the CPU was in FIQ mode when the abort was triggered.
IFSR: 0x00000409: The status indicates a synchronous external abort or ECC abort. The address captured in IFAR is valid and is the actual address that led to the abort.
IFAR: 0x00009000: This address doesn’t contain a valid instruction and it has a wrong ECC value.
AIFSR: 0x00400000: This status indicates that the source of the error is from ATCM (flash).