TMS570LC4357: Data Abort Exception - find instruction

Ju_We

Part Number: TMS570LC4357
Other Parts Discussed in Thread: HALCOGEN, SEGGER, TMS570LS0432

Hi all,

In one of our projects we have a sporadic "Data Abort Exception". The Data Fault Status Register (DFSR) reads 0x0000080D. The TRM tells me that's a write permission fault.

I was hoping to be able to trace back to the instruction which caused the exception by evaluating the Data Fault Address Register (DFAR), which reads 0x00000000. That's where my confusion starts: what does that actually mean?

Best Regards,
Juergen

over 5 years ago

QJ Wang over 5 years ago

Hello Juergen,

What are your MPU settings? A Data Abort can be either precise or imprecise. The DFAR holds the address of the fault when a precise abort occurs.

Ju_We over 5 years ago in reply to QJ Wang

Hi QJ,

Thanks for your reply. I am not very familiar with the MPU, so I am not sure how to answer your question. Below you can see the HALCoGen settings for the MPU.

The project uses external RAM on the EMIF extensively; I'm not sure if that could be a hint. The EMIF settings are in region 10.

Thanks and Regards,

Juergen

QJ Wang over 5 years ago in reply to Ju_We

Hello Juergen,

On Cortex-R based devices, instructions cannot be executed from regions with the Device or Strongly-Ordered memory type attributes; the processor treats such regions as if they had XN (Execute Never) permission. Make sure you don't try to execute code from EMIF memory.

Have you stepped through your code to find which function causes the data abort?

Ju_We over 5 years ago in reply to QJ Wang

Hi QJ,

It's not possible to avoid EMIF access: the project uses very large data buffers which can only be allocated in external RAM. I have reason to believe that the EMIF is not the problem anyway.

I wish I could simply step through the code which causes the problem. Unfortunately the abort occurs only sporadically. My goal is to find the code which causes the problem.

In the concrete case of a data abort exception, the Data Fault Status Register reads 0x00001008. Hence the 5-bit status equals 0b01000, which indicates a "Synchronous External Abort", for which the DFAR is supposed to be valid. Furthermore, it shows a read access. Since it's an external abort, SD bit = 1 means it's an "AXI Slave error".

DFAR reads 0x02096B1F, which is in reserved memory space.
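The DFSR fields discussed in this thread can be decoded mechanically. The following is a minimal C sketch, assuming the Cortex-R5 DFSR layout (5-bit status split across bit 10 and bits [3:0], WnR in bit 11, SD in bit 12). The function names are hypothetical, and only the status codes seen in this thread plus a few common ones are listed. The DFAR helper reflects that the DFAR only holds the faulting address for precise (synchronous) faults.

```c
#include <stdint.h>

/* Decoded view of a Cortex-R5 Data Fault Status Register (DFSR) value. */
typedef struct {
    uint32_t status;   /* 5-bit fault status: DFSR[10] as bit 4, DFSR[3:0] as bits 3..0 */
    int      is_write; /* DFSR[11] (WnR): 1 = write access, 0 = read access */
    int      sd;       /* DFSR[12] (SD): for external aborts, 1 = AXI slave error, 0 = decode error */
} dfsr_info_t;

dfsr_info_t dfsr_decode(uint32_t dfsr)
{
    dfsr_info_t info;
    info.status   = ((dfsr >> 6) & 0x10u) | (dfsr & 0x0Fu); /* move bit 10 down to bit 4 */
    info.is_write = (int)((dfsr >> 11) & 1u);
    info.sd       = (int)((dfsr >> 12) & 1u);
    return info;
}

/* Names for a subset of the fault status codes; see the TRM for the full table. */
const char *dfsr_status_name(uint32_t status)
{
    switch (status) {
    case 0x00u: return "background fault";
    case 0x01u: return "alignment fault";
    case 0x02u: return "debug event";
    case 0x08u: return "synchronous (precise) external abort";
    case 0x0Du: return "permission fault";
    case 0x16u: return "asynchronous (imprecise) external abort";
    default:    return "other (see TRM)";
    }
}

/* The DFAR holds the faulting address only for precise faults. */
int dfar_is_valid(uint32_t status)
{
    return status == 0x00u || status == 0x01u || status == 0x08u || status == 0x0Du;
}
```

Applied to the values reported in this thread: 0x00001008 decodes to status 0x08 (synchronous external abort, read, SD = 1, DFAR valid), 0x0000080D to status 0x0D (permission fault, write, DFAR valid), and 0x00001C06 to status 0x16 (asynchronous external abort, write, DFAR not valid).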

I am wondering if the information above allows me to trace back to the instruction which caused the exception.

Thanks,
Juergen

QJ Wang over 5 years ago in reply to Ju_We

Hi Juergen,

It is caused by the code accessing a nonexistent memory location.

Ju_We over 5 years ago in reply to QJ Wang

Hi QJ,

That's what the DFAR tells me. My question was whether there is a way to trace back to the piece of code which caused it.

Regards,
Juergen

QJ Wang over 5 years ago in reply to Ju_We

Hello Juergen,

What is the value of R14_abt? Is the nERROR pin cleared? Is any ESM status flag set? It is not easy to find the root cause.

Chester Gillon over 5 years ago in reply to Ju_We

Ju_We said:

I am wondering if the information above allows me to trace back to the instruction which caused the exception.

I looked into how to inspect such a failure in CCS, using a program which deliberately attempts to read from the invalid address 0x02096B1F.
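The test program itself is not shown in the thread. A hypothetical equivalent, also covering the write variant Juergen tries later, might look like the sketch below. The byte-wide pointer is an assumption chosen so the test cannot turn into an alignment fault instead, and the two helpers are only meaningful when run on the target.

```c
#include <stdint.h>

/* Reserved address used in this thread; any unmapped address would do. */
#define BAD_ADDRESS 0x02096B1FU

/* Hypothetical test pointer; byte access avoids an alignment fault. */
static volatile uint8_t *const bad_ptr = (volatile uint8_t *)BAD_ADDRESS;

/* Read variant: in the tests reported in this thread this produced a
   synchronous external abort with DFAR == BAD_ADDRESS. */
uint8_t provoke_sync_abort(void)
{
    return *bad_ptr;
}

/* Write variant: the store can be buffered, so the bus error may arrive after
   the core has moved on; in this thread it produced an asynchronous abort. */
void provoke_async_abort(void)
{
    *bad_ptr = 0x55U;
}
```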

In CCS, open Tools -> ARM Advanced Features and select "Break on Data Abort" under Vector Catch -> Non-Secure.

Let the program run until the Data Abort happens. With the default HALCoGen interrupt configuration there is no specific Data Abort handler, so when the Data Abort is caught by the debugger the PC will be at "b dataEntry" in the HALCoGen-generated HL_sys_intvecs.asm.

In the CCS Registers view find the values of the Abort_Registers:

And the User Registers:

In the Registers view go to the Core Registers and manually set the registers to:

PC: The value of R14_ABT - 8, which is the address of the instruction which generated the synchronous Data Abort (for an asynchronous Data Abort the instruction may be inaccurate). The reason for subtracting 8 is explained in "Exception entry and exit summary".
SP: The value of R13_USER.

So, for the above example set:

PC = 0x00004120-8 = 0x00004118
SP = 0x08000FE8
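The PC correction is plain arithmetic on the banked link register. A small sketch, assuming the usual ARMv7-R return-address offsets (Data Abort: LR - 8, Prefetch Abort: LR - 4, the same in ARM and Thumb state); the function names are hypothetical:

```c
#include <stdint.h>

/* Data Abort: R14_abt = address of the aborting instruction + 8. */
uint32_t data_abort_fault_pc(uint32_t r14_abt)
{
    return r14_abt - 8u;
}

/* Prefetch Abort: R14_abt = address of the aborting instruction + 4. */
uint32_t prefetch_abort_fault_pc(uint32_t r14_abt)
{
    return r14_abt - 4u;
}
```

For the example above, data_abort_fault_pc(0x00004120) gives 0x00004118. For an asynchronous abort the result only identifies the instruction that was executing when the abort was taken, not necessarily the access that caused it.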

Manually setting the PC and SP changes the CCS debug context to the point at which the Data Abort occurred, and allows you to examine local variables on the stack. For the example, this shows the line at which the invalid address of bad_ptr (0x02096B1F) was read.

If the program has crashed, the stack may have been overwritten or corrupted, which can make it harder to determine the cause.

Ju_We over 5 years ago in reply to Chester Gillon

Hi Chester,

Thank you so much for your detailed instructions on how to trace back a data abort. That looks very promising. However, I have trouble enabling "Break on Data Abort"; the console shows:

--------------------------------------------------------------------------------------------------------------------------------------------------

CortexR5: GEL Output: Memory Map Setup for Flash @ Address 0x0
CortexR5: Trouble Reading Register REG_AHB_DOWNLOAD: No mapped register was found for 1627402249

--------------------------------------------------------------------------------------------------------------------------------------------------

Not sure what that means and how to fix it.

I tried it in our project, which is based on an operating system. Its exception vectors branch to a central exception handler. I don't think that should matter at this point, though.

Best Regards,

Juergen

Chester Gillon over 5 years ago in reply to Ju_We

Ju_We said:

However, I have trouble enabling "Break on Data Abort"; the console shows:

--------------------------------------------------------------------------------------------------------------------------------------------------

CortexR5: GEL Output: Memory Map Setup for Flash @ Address 0x0
CortexR5: Trouble Reading Register REG_AHB_DOWNLOAD: No mapped register was found for 1627402249

Are you using a Segger J-Link?

The reason I ask is that I recently saw that error every time execution stopped when using a Segger J-Link with a TMS570LS0432 (https://e2e.ti.com/support/tools/ccs/f/81/p/913701/3421743#3421743).

I haven't tried enabling 'Break On Data Abort' with the combination of a Segger J-Link and a TMS570LC4357; an XDS110 was used for the above test.

Rather than enabling 'Break On Data Abort', setting a breakpoint on the branch instruction to the central exception handler should have the same effect of stopping before the exception handler has modified any registers.

In theory the central exception handler could take steps to dump the call stack which led to the data abort, but I don't currently have an example of that, and it was easier to make use of the CCS debugger.
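One way the central exception handler could preserve the evidence is to copy the fault registers, the corrected PC and a few raw words from the interrupted stack into a record that can be inspected later or compared with the linker map file. This is a minimal sketch, not a real stack unwinder: the record layout, the names, and the assumption that an assembly entry stub passes in R14_abt and the interrupted SP are all hypothetical, and the CP15 reads use GCC-style inline assembly.

```c
#include <stdint.h>

#define ABORT_STACK_WORDS 8u

typedef struct {
    uint32_t  dfsr;                     /* Data Fault Status Register */
    uint32_t  dfar;                     /* Data Fault Address Register (valid for precise faults only) */
    uint32_t  fault_pc;                 /* R14_abt - 8 */
    uintptr_t sp;                       /* interrupted stack pointer */
    uint32_t  stack[ABORT_STACK_WORDS]; /* raw stack words; look return addresses up in the map file */
} abort_record_t;

/* Hypothetical: link into a RAM section that startup code does not clear. */
abort_record_t g_last_abort;

#if defined(__GNUC__) && defined(__arm__)
/* Target-only CP15 accessors (DFSR: c5, c0, 0; DFAR: c6, c0, 0). */
static inline uint32_t read_dfsr(void) { uint32_t v; __asm__ volatile ("mrc p15, 0, %0, c5, c0, 0" : "=r"(v)); return v; }
static inline uint32_t read_dfar(void) { uint32_t v; __asm__ volatile ("mrc p15, 0, %0, c6, c0, 0" : "=r"(v)); return v; }
#endif

void abort_record_fill(abort_record_t *rec, uint32_t dfsr, uint32_t dfar,
                       uint32_t r14_abt, const uint32_t *sp)
{
    rec->dfsr     = dfsr;
    rec->dfar     = dfar;
    rec->fault_pc = r14_abt - 8u;
    rec->sp       = (uintptr_t)sp;
    for (uint32_t i = 0u; i < ABORT_STACK_WORDS; ++i) {
        /* A corrupted SP could fault again here; a real handler should range-check it first. */
        rec->stack[i] = (sp != 0) ? sp[i] : 0u;
    }
}
```

On the target the handler would then call something like abort_record_fill(&g_last_abort, read_dfsr(), read_dfar(), lr_abt, sp_usr), with lr_abt and sp_usr supplied by the (hypothetical) entry stub before any registers are modified.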

QJ Wang over 5 years ago in reply to Chester Gillon

Hi Juergen,

Have you captured R14_abt? As Chester said, if the data abort is a precise abort, R14_abt - 0x8 is the address of the instruction which generated it.

Ju_We over 5 years ago in reply to QJ Wang

Hi,

Before going back to my real-world problem, which is hard to replicate, I also deliberately forced the CPU into a data abort, just to see if I get a meaningful result. It works for synchronous aborts and, as expected, not for asynchronous aborts.

In CCS in combination with Segger's J-Link I wasn't able to read the USER_Registers. This seems similar to the problem where I couldn't set the "Break on Data Abort" option.

That doesn't matter much to me, though: IAR works, and I can trace back to the instruction which caused the synchronous abort.

For the synchronous abort (reading from the bad address) the DFSR value is 0x00001008 -> Synchronous External Abort/Read. I am just wondering what the DFAR value (0x02096B1F) means, since it is supposed to be valid.

For the asynchronous abort (writing to the bad address) the DFSR value is 0x00001C06 -> Asynchronous External Abort/Write.

I am also wondering if there is a way of debugging asynchronous aborts. The Technical Reference Manual describes the A bit in the SPSR in chapter 2.3.2.4. It sounds like setting the A bit masks imprecise aborts. Is there a feasible way to make use of this to debug asynchronous aborts as well?
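On the masking question: while CPSR.A (bit 8) is set, an asynchronous abort is held pending rather than taken; once A is cleared (for example with "cpsie a"), a pending abort is taken. A commonly used debugging approach, sketched here under the assumption that suspect writes can be routed through a wrapper, is to follow each suspect store with a DSB so the store completes, and any bus error is signalled, before the code moves on. R14_abt should then land close to the offending write. The wrapper and macro names are hypothetical, and the extra barriers cost performance, so this is for debug builds only.

```c
#include <stdint.h>

#if defined(__GNUC__) && defined(__arm__)
/* Target: wait until the store has completed on the bus. */
#define WRITE_BARRIER()       __asm__ volatile ("dsb" ::: "memory")
/* Target: clear CPSR.A so asynchronous aborts are taken instead of held pending. */
#define ENABLE_ASYNC_ABORTS() __asm__ volatile ("cpsie a" ::: "memory")
#else
#include <stdatomic.h>
/* Host build: stand-ins so the sketch compiles and can be unit-tested. */
#define WRITE_BARRIER()       atomic_thread_fence(memory_order_seq_cst)
#define ENABLE_ASYNC_ABORTS() ((void)0)
#endif

#define CPSR_A_BIT (1u << 8) /* 1 = asynchronous aborts masked */

int async_aborts_masked(uint32_t cpsr)
{
    return (cpsr & CPSR_A_BIT) != 0u;
}

/* Debug-only replacement for a suspect 32-bit store (e.g. into an EMIF buffer). */
void debug_write32(volatile uint32_t *addr, uint32_t value)
{
    *addr = value;
    WRITE_BARRIER(); /* force completion so a bus error is reported near this store */
}
```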

Best Regards,
Juergen

QJ Wang over 5 years ago in reply to Ju_We

As in Chester's example, the R14_abt - 0x8 will show you the line at which the invalid address of bad_ptr (0x02096B1F) was read.

Ju_We over 5 years ago in reply to QJ Wang

Hi,

As I mentioned in the previous post, I was able to replicate Chester's example; finding the instruction for synchronous aborts is no problem when I follow Chester's instructions. I was just wondering if there is a way to trace back to the instruction in case of an asynchronous abort. I provoked that by writing to the bad address instead of reading from it.

Anyway, I just wanted to learn from these examples, but they deviate from my actual problem, which I have now replicated again:

When hitting the breakpoint at address 0x10 (the Data Abort vector), the DFSR is 0x0000080D -> write permission fault according to the ARM TRM.

Which brings me back to my very first post:

A) I don't even know what I am chasing: what causes a permission fault? The MPU's MPUERRSTAT register is 0 when the error occurs.

B) For a permission fault the DFAR is supposed to be valid, and it reads 0. I am not sure what that is telling me.
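A permission fault is precise, so a DFAR of 0 should mean the blocked write targeted address 0x00000000, where the flash begins on this device. If the MPU region covering flash is read-only, a store through a NULL pointer (at offset 0) would produce exactly this DFSR/DFAR pair. The C model below is a sketch under simplifying assumptions (region table as plain data, highest-numbered matching region wins, only the Cortex-R5 AP encodings listed) for checking a region setup against a faulting address; it does not read the real MPU registers, and the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Cortex-R5 MPU access permission (AP[2:0]) encodings; 0b100 and 0b111 are reserved. */
enum {
    AP_NO_ACCESS       = 0x0, /* privileged: none, user: none */
    AP_PRIV_RW         = 0x1, /* privileged: RW,   user: none */
    AP_PRIV_RW_USER_RO = 0x2, /* privileged: RW,   user: RO   */
    AP_FULL_ACCESS     = 0x3, /* privileged: RW,   user: RW   */
    AP_PRIV_RO         = 0x5, /* privileged: RO,   user: none */
    AP_PRIV_RO_USER_RO = 0x6  /* privileged: RO,   user: RO   */
};

typedef struct {
    uint32_t base;    /* region start address */
    uint64_t size;    /* region size in bytes; 64-bit so a 4 GB region fits */
    uint8_t  ap;      /* one of the AP_* values */
    uint8_t  enabled;
} mpu_region_t;

/* Returns 1 if the model permits a write to addr, 0 if it would fault.
   The highest-numbered enabled region containing addr decides. */
int mpu_write_allowed(const mpu_region_t *regions, size_t count,
                      uint32_t addr, int privileged)
{
    for (size_t i = count; i-- > 0;) {
        const mpu_region_t *r = &regions[i];
        if (!r->enabled || addr < r->base || (uint64_t)addr >= (uint64_t)r->base + r->size) {
            continue;
        }
        switch (r->ap) {
        case AP_PRIV_RW:
        case AP_PRIV_RW_USER_RO: return privileged ? 1 : 0;
        case AP_FULL_ACCESS:     return 1;
        default:                 return 0; /* no access, read-only or reserved */
        }
    }
    /* No region matched: real hardware raises a background fault (or uses the
       default map if the background region is enabled); the model just says no. */
    return 0;
}
```

With a 4 GB full-access region 0 and a read-only 4 MB flash region 1 at 0x00000000, a privileged write to 0x00000000 is rejected while one to 0x08000000 (RAM) is allowed.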

Best Regards,

Juergen

QJ Wang over 5 years ago in reply to Ju_We

Hello,

I remember your original question. The DFAR is not updated on an imprecise (async) abort, so an async abort normally cannot be located. The permission fault might be related to the MPU settings; the following picture is an example of MPU settings using HALCoGen:
