TMS570LC4357: Unexpected and confusing Data Abort

Part Number: TMS570LC4357

Hi,

I am getting a data abort that does not make sense to me.

I am using a TMDX570LC4357HDK board, CCS v11.1.0.0.00012, and the TI 20.2.5.LTS compiler.

I recently created a forum post about an undefined instruction (undef) abort in the ESM FIQ. I was helped with that issue by making sure the Flash ECC data was correct, using my linker script, and I no longer get any ESM interrupts. The problem I am describing here happens at approximately the same point in the same application. My application has two identical HDK boards communicating with each other over SPI. If I do not connect their SPI pins, so that there is no communication and there are no SPI interrupts, I do not get this issue. Either the absence of these interrupts only makes the problem rarer, so I have not seen it yet, or the interrupts are somehow the cause. I have two applications, Peer0 and Peer1; the problem only occurs in one of them, regardless of which HDK each peer runs on.

I am getting a data abort inside an IRQ handler. The ISR executes successfully many times before it fails. According to both the debugger call stack and the R14_ABT register, this is where it fails:

00000846: 4913 ldr r1, [pc, #0x4c]      ;reads 0x08001C8C into R1
00000848: 9803 ldr r0, [r13, #0xc]
0000084a: 6008 str r0, [r1]

0000084c: 4914 ldr r1, [pc, #0x50]      ;reads 0x08001C90 into R1
0000084e: 9803 ldr r0, [r13, #0xc]
00000850: 43C0 mvns r0, r0
00000852: 6008 str r0, [r1]

Clearly this does not make sense, as MVNS is not a memory read/write operation. The values of R0 and R1 suggest that the instruction at 0x84C has not yet executed, as R1 still holds 0x08001C8C.

The value of the DFSR is 0x00001008, which indicates a synchronous external abort. Because synchronous aborts are precise, that would make the MVNS instruction the culprit, so either the DFSR is wrong or R14_ABT is wrong?

The value of the DFAR is 0x07FFFFEC, which does not match the value of any register at the point the abort occurred.

According to the ARMv7-R Architecture Reference Manual, table B5-8, DFSR[10,3:0] = 01000b means that the DFAR is valid, which does not appear to be true here.
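To make the DFSR reasoning concrete, the fields can be pulled apart in C. This is a minimal host-side sketch assuming the ARMv7-R short-format DFSR layout (FS[4] in bit 10, FS[3:0] in bits 3:0, WnR in bit 11, ExT in bit 12); the type and function names are hypothetical, and only a subset of the PMSA fault-status encodings is named, so check them against the Architecture Reference Manual.

```c
#include <stdint.h>
#include <stdbool.h>

/* Fields of the ARMv7-R (PMSA) Data Fault Status Register. */
typedef struct {
    uint32_t fs;  /* fault status: DFSR[10] concatenated with DFSR[3:0] */
    bool     wnr; /* DFSR[11]: false = read access, true = write access */
    bool     ext; /* DFSR[12]: implementation-defined external abort type */
} dfsr_fields_t;

dfsr_fields_t dfsr_decode(uint32_t dfsr)
{
    dfsr_fields_t f;
    f.fs  = ((dfsr >> 6) & 0x10u) | (dfsr & 0x0Fu); /* move bit 10 to FS[4] */
    f.wnr = ((dfsr >> 11) & 1u) != 0u;
    f.ext = ((dfsr >> 12) & 1u) != 0u;
    return f;
}

/* Names for some PMSA fault-status encodings (assumed subset). */
const char *dfsr_fs_name(uint32_t fs)
{
    switch (fs) {
    case 0x00u: return "background fault";
    case 0x01u: return "alignment fault";
    case 0x02u: return "debug event";
    case 0x08u: return "synchronous external abort";
    case 0x0Du: return "permission fault";
    case 0x16u: return "asynchronous external abort";
    case 0x18u: return "asynchronous parity/ECC error";
    case 0x19u: return "synchronous parity/ECC error";
    default:    return "other or reserved encoding";
    }
}
```

On the target, the raw values would come from CP15 (`MRC p15, 0, <Rt>, c5, c0, 0` for the DFSR and `MRC p15, 0, <Rt>, c6, c0, 0` for the DFAR). Fed 0x00001008, the sketch gives FS = 0x08 (synchronous external abort) with WnR = 0, so the faulting access was a read, which MVNS does not perform.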

According to section B5.4.3, an external abort on an instruction fetch is reported using a Data Abort. This leads me to believe that something unrelated to the currently executing code has gone wrong, and that most of the information above is a red herring.

How can I debug this data abort, or am I missing something?

Many thanks,

Adam

  • Hi Adam,

    What is the value of R14_ABT?

  • R14_ABT is 0x850.

  • If R14_ABT = 0x850, the instruction that generated the data abort is at 0x850 - 8 = 0x848: ldr r0, [r13, #0xc]
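    The offset rule used here can be written down as a small helper. This is a minimal sketch assuming the ARMv7-R exception link-register offsets: a Data Abort leaves R14_abt at the aborting instruction + 8, and a Prefetch Abort at + 4, in both ARM and Thumb state. The function names are hypothetical.

```c
#include <stdint.h>

/* Data Abort: R14_abt = address of the aborting instruction + 8
 * (same offset in ARM and Thumb state). */
uint32_t data_abort_faulting_pc(uint32_t r14_abt)
{
    return r14_abt - 8u;
}

/* Prefetch Abort: R14_abt = address of the aborting instruction + 4. */
uint32_t prefetch_abort_faulting_pc(uint32_t r14_abt)
{
    return r14_abt - 4u;
}
```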

  • ldr r0, [r13, #0xc] does indeed look like it would cause a data abort, because R13_IRQ is 0x07FFFFE8, which is below the start of RAM at 0x08000000; this value is only 4 bytes below the DFAR (0x07FFFFEC). However, I do not see why the SP has that value. The instructions before those posted above are:

    00000840: 9803 ldr r0, [r13, #0xc]
    00000842: 1C40 adds r0, r0, #1
    00000844: 9003 str r0, [r13, #0xc]

    Clearly these instructions do not affect the SP, but if the SP had held that bad value before they ran, they would have triggered the data abort earlier than the point at which it actually occurred. I have checked the disassembly and the C code, and there are no branches that would skip these instructions.
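    Whether an SP-relative load can reach valid memory is a short address calculation. The sketch below assumes that the TMS570LC4357 on-chip RAM is the 512 KB window starting at 0x08000000 (check this against the device datasheet memory map); the helper names are hypothetical. With R13_IRQ = 0x07FFFFE8, ldr r0, [r13, #0xc] targets 0x07FFFFF4, and both that address and the reported DFAR (0x07FFFFEC) lie below the RAM base.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed on-chip RAM window: 512 KB starting at 0x08000000. */
#define RAM_START 0x08000000u
#define RAM_SIZE  0x00080000u

/* True if the len-byte access starting at addr lies entirely inside RAM.
 * Written to avoid unsigned overflow near the end of the address space. */
bool addr_in_ram(uint32_t addr, uint32_t len)
{
    return addr >= RAM_START
        && len <= RAM_SIZE
        && (addr - RAM_START) <= (RAM_SIZE - len);
}

/* Effective address of "ldr rX, [rN, #imm]" (immediate offset, no writeback). */
uint32_t ldr_imm_ea(uint32_t rn, uint32_t imm)
{
    return rn + imm;
}
```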

    The only other possibility I can think of is badly nested interrupts. The SPSR_ABT shows that we were in IRQ mode, with IRQs disabled and FIQs enabled, when the abort occurred. This is what I would expect. However, the SPSR_IRQ shows that we were in IRQ mode with both IRQs and FIQs enabled before the IRQ was taken. This indicates that IRQs were nesting before the abort, which should not happen: the core sets the I bit in the CPSR when it takes an IRQ, and there is no code in my application that clears the I bit while in IRQ mode.

    I have checked all of my assembly and inline assembly code for cpsie instructions and for writes to the CPSR that clear the I bit, but I have found none. I can see no reason for my interrupts to nest.
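    The mode and mask bits mentioned above can be read out of a saved PSR as follows. This is a minimal sketch assuming the standard ARMv7-R PSR layout (M[4:0] in bits 4:0, T in bit 5, F in bit 6, I in bit 7, IRQ mode = 0x12); the helper names are hypothetical, and the example values below are illustrative rather than the registers captured in this thread.

```c
#include <stdint.h>
#include <stdbool.h>

/* ARMv7-R CPSR/SPSR fields. */
#define PSR_MODE_MASK 0x1Fu
#define PSR_MODE_IRQ  0x12u
#define PSR_T_BIT     (1u << 5)
#define PSR_F_BIT     (1u << 6)
#define PSR_I_BIT     (1u << 7)

typedef struct {
    uint32_t mode;       /* mode of the saved (interrupted) state */
    bool     thumb;      /* T bit: interrupted code was in Thumb state */
    bool     irq_masked; /* I bit set: IRQs were disabled */
    bool     fiq_masked; /* F bit set: FIQs were disabled */
} psr_fields_t;

psr_fields_t psr_decode(uint32_t psr)
{
    psr_fields_t p;
    p.mode       = psr & PSR_MODE_MASK;
    p.thumb      = (psr & PSR_T_BIT) != 0u;
    p.irq_masked = (psr & PSR_I_BIT) != 0u;
    p.fiq_masked = (psr & PSR_F_BIT) != 0u;
    return p;
}

/* SPSR_irq holds the state interrupted by the current IRQ. If that state
 * was itself IRQ mode, an IRQ was taken while an IRQ handler was running
 * (which requires the I bit to have been clear), i.e. the IRQs nested. */
bool spsr_irq_shows_nesting(uint32_t spsr_irq)
{
    return psr_decode(spsr_irq).mode == PSR_MODE_IRQ;
}
```

    For example, a saved PSR of 0xB2 decodes as IRQ mode, Thumb state, IRQs masked and FIQs enabled, while an SPSR_irq of 0x32 (IRQ mode with both I and F clear) indicates that an IRQ was taken from inside an IRQ handler.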

    Have I missed something?

  • Hi Adam,

    For TMS570LC43x, Flash error detection and correction is enabled at reset. The ECC values for the whole Flash memory space (Flash banks 0 through 6) must be programmed into the Flash before the program or data can be read. This can be done by having the linker command file generate the correct ECC values.

    The Cortex-R5F CPU may generate speculative fetches to any location within the Flash memory space, so the ECC must be valid even for regions your code never reads explicitly.

    Generate ECC using Linker CMD file:

    https://software-dl.ti.com/hercules/hercules_docs/latest/hercules/How_to_Guides/index.html

    My example linker command file for generating ECC:

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/908/4338.HL_5F00_sys_5F00_link.cmd
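    The address arithmetic behind the ECC regions in such a linker command file is simple enough to check by hand. This sketch assumes, as in TI's published linker command file examples for this device family, that the Flash ECC space starts at 0xF0400000 and holds one ECC byte per 8-byte Flash word; confirm both against the datasheet. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed base of the Flash ECC address space (verify for your device). */
#define FLASH_ECC_BASE 0xF0400000u

/* Start of the ECC bytes covering a Flash region beginning at flash_origin:
 * one ECC byte per 64-bit (8-byte) Flash word. */
uint32_t flash_ecc_origin(uint32_t flash_origin)
{
    return FLASH_ECC_BASE + (flash_origin >> 3);
}

/* Number of ECC bytes needed for a Flash region of flash_length bytes. */
uint32_t flash_ecc_length(uint32_t flash_length)
{
    return flash_length >> 3;
}
```

    For example, a Flash region starting at 0x00200000 would have its ECC at 0xF0440000, occupying one eighth of the region's length.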

  • Hi,

    I have already added the ECC information in my linker script and as far as I can see it is working as intended; my red error LED never turns on and I never get an ESM interrupt.