TMS570LC4357: Data Abort on Branch

Part Number: TMS570LC4357
Other Parts Discussed in Thread: SEGGER, HALCOGEN

Issue:

I am having an odd issue where a data abort occurs on a branch instruction.

Setup:

I have the following:

  • A Hercules LaunchPad TMS570LC43x connected to my laptop via the Micro USB cable.
  • A Segger J-Link connected to the JTAG port and to my laptop.
  • Driver code generated with HALCoGen, targeting the GCC toolchain.

Process:

My simple bootloader starts in FLASH with the HALCoGen startup code, which sets up the registers, stacks, MPU, and cache, and then jumps to my bootloader's main function. It prints to the UART fine. The drivers seem to compile and work properly.

I have a breakpoint set so that right before I jump to the RTOS (in RAM) I can load the RTOS over JTAG and then I continue. This seems to work fine. I can jump to the proper address in RAM and step through instructions and watch the registers get loaded with what they are supposed to be loaded with.

Symptom:

I get to the following spot in the RTOS startup code:

─────────────────────────────────────────────────────────── registers ────
$r0 : 0x8000268 → 0xe0411000 → 0xe0411000
$r1 : 0x000000 → 0xea0018e6 → 0xea0018e6
$r2 : 0x000000 → 0xea0018e6 → 0xea0018e6
$r3 : 0x8011c60 → 0x000000 → 0xea0018e6 → 0xea0018e6
$r4 : 0x200003df → 0x200003df
$r5 : 0x000000 → 0xea0018e6 → 0xea0018e6
$r6 : 0x000000 → 0xea0018e6 → 0xea0018e6
$r7 : 0x80002d5 → 0x70b508f6 → 0x70b508f6
$r8 : 0x000000 → 0xea0018e6 → 0xea0018e6
$r9 : 0x000000 → 0xea0018e6 → 0xea0018e6
$r10 : 0x000000 → 0xea0018e6 → 0xea0018e6
$r11 : 0x000000 → 0xea0018e6 → 0xea0018e6
$r12 : 0x000000 → 0xea0018e6 → 0xea0018e6
$sp : 0x8011c60 → 0x000000 → 0xea0018e6 → 0xea0018e6
$lr : 0x8000278 → 0xe59f004c → 0xe59f004c
$pc : 0x8000274 → 0xe12fff17 → 0xe12fff17
$cpsr: [negative zero carry overflow INTERRUPT FAST thumb]
─────────────────────────────────────────────────────────────── stack ────
0x8011c60│+0x0000: 0x000000 → 0xea0018e6 → 0xea0018e6 ← $r3, $sp
0x8011c64│+0x0004: 0x000000 → 0xea0018e6 → 0xea0018e6

This is a GDB view with the GEF visualization showing the registers, source code, and disassembled code in a single view.

If you look at the "code" section the next instruction is going to branch to the address contained in "r7". The "r7" register contains the address 0x80002d5.

However, using the GDB command "ni" (next instruction) takes me to the abort handler:

─────────────────────────────────────────────────────────── registers ────
$r0 : 0xfff7e400 → 0x000000 → 0xea0018e6 → 0xea0018e6
$r1 : 0x01c200 → 0xffffd9ff → 0x000000 → 0xea0018e6 → 0xea0018e6
$r2 : 0x00e898 → ; <UNDEFINED> instruction: 0xffffffff
$r3 : 0x3000032 → 0x3000032
$r4 : 0xfff7e400 → 0x000000 → 0xea0018e6 → 0xea0018e6
$r5 : 0x000000 → 0xea0018e6 → 0xea0018e6
$r6 : 0x000000 → 0xea0018e6 → 0xea0018e6
$r7 : 0x000000 → 0xea0018e6 → 0xea0018e6
$r8 : 0x000000 → 0xea0018e6 → 0xea0018e6
$r9 : 0x000000 → 0xea0018e6 → 0xea0018e6
$r10 : 0x000000 → 0xea0018e6 → 0xea0018e6
$r11 : 0x000000 → 0xea0018e6 → 0xea0018e6
$r12 : 0x000000 → 0xea0018e6 → 0xea0018e6
$sp : 0x8001400 → 0x33018333 → 0x33018333
$lr : 0x003608 → 0xa00000f → 0xa00000f
$pc : 0x000010 → 0xeafffffe → 0xeafffffe
$cpsr: [NEGATIVE zero carry overflow INTERRUPT FAST thumb]
─────────────────────────────────────────────────────────────── stack ────
0x8001400│+0x0000: 0x33018333 → 0x33018333 ← $sp
0x8001404│+0x0004: 0x62266c33 → 0x62266c33

As you can see I am now at the data abort handler.

Another interesting tidbit is that the Segger GDB server reports:

Reading 64 bytes @ address 0x08001400
WARNING: Failed to read memory @ address 0x62266C32
WARNING: Failed to read memory @ address 0x62266C32
Reading 64 bytes @ address 0x62266C00
WARNING: Failed to read memory @ address 0x62266C00
WARNING: Failed to read memory @ address 0x62266C32
Received monitor command: cp15 6 0 0 0
Reading CP15 register (6,0,0,0 = 0x62266C32)
Received monitor command: cp15 5 0 0 0
Reading CP15 register (5,0,0,0 = 0x00001008)

As you can see, it reports an address that isn't even in the same region: it is in the async RAM section, while I am executing in the RAM section.

I think that it is odd that this is occurring on a branch and not a load or store. It occurs at the same place every time.
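To make the branch in the first register dump concrete: the word at $pc, 0xe12fff17, appears to match the ARM "BX Rm" encoding with Rm = r7, and BX treats bit 0 of the register as the Thumb state selector rather than as part of the address. The following is a minimal host-side sketch of that decoding in plain C. The helper names are hypothetical and are not part of HALCoGen, GDB, or the RTOS.

#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: true if 'insn' matches the ARM-state BX Rm encoding
 * (cond 0001 0010 1111 1111 1111 0001 Rm). The condition field is ignored. */
static inline bool is_arm_bx(uint32_t insn)
{
    return (insn & 0x0FFFFFF0u) == 0x012FFF10u;
}

/* Register number (0..15) used as the branch target by a BX instruction. */
static inline unsigned bx_rm(uint32_t insn)
{
    return (unsigned)(insn & 0xFu);
}

/* BX switches to Thumb state when bit 0 of the register value is set. */
static inline bool bx_target_is_thumb(uint32_t reg_value)
{
    return (reg_value & 1u) != 0;
}

/* Address actually fetched from: bit 0 is the state bit, so it is masked.
 * This sketch only handles bit 0 and does not check ARM-state alignment. */
static inline uint32_t bx_target_address(uint32_t reg_value)
{
    return reg_value & ~UINT32_C(1);
}

With the values from the dump, is_arm_bx(0xe12fff17) holds with bx_rm() returning 7, and r7 = 0x80002d5 gives a Thumb target at 0x80002d4. If that reading is right, the code being entered is Thumb code, which is worth keeping in mind when disassembling at that address.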

Investigation:

I have found some data abort debugging articles and forum posts and have tried to follow as many points as I can.

  •   The data fault status register (DFSR)
    • Value: 0x00001008
    • Status [10,3:0]: 0b1000
      • Source: Synchronous External Abort
      • FAR Validity: Valid
    • SD [12]: 0x01
      • Only valid for external aborts, which this is
      • 1 = AXI Slave error (SLVERR), or unsupported exclusive access, for example exclusive access using the AHB peripheral port, caused the abort
    • RW [11]:
      • 0: read access caused the abort
  • Data Fault Address Register (DFAR)
    • Value: 0x62266C32
    • This is really bizarre because this is what is reported in the Segger GDB server but I have no code there! I'm not branching there and I never see when that address is in my stack or registers. Where is it coming from?
  • Auxiliary Data Fault Status Register (ADFSR)
    • Value: 0x00000000
    • CacheWay [27:24]
      • Value:
      • Description: The value returned in this field indicates the cache way or ways in which the error occurred.
    • Side [23:22]
      • Value:
      • Description: The value returned in this field indicates the source of the error.
    • Recoverable error [21]
      • Value:
      • Description: The value returned in this field indicates if the error is recoverable.
      • Decoded: 0 = Unrecoverable error.
    • SideExt [20]:
      • Value: 0b0
      • Description: The value returned in this field indicates the source of the error. See Table 4-32 for the encodings.
      • Decoded:
        • Along with Side, this indicates that the source of the error is "Cache/AXIM"
  • CPSR
    • Value: 0x800003d7
    • T [5]: 0
      • Not in thumb mode
    • M [4:0]: 0b10111:
      • Mode: Abort Mode
  • SPSR_abt
    • Value: 0x800003df
    • T[5]: 0
      • Not in thumb mode
    • M[4:0]: 0b11111
      • Mode: System Mode
  • The CPSR and SPSR_abt tell me that I am going from regular privileged code execution straight to the abort handler.
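The manual decoding above can be cross-checked with a few lines of C. This is a minimal host-side sketch, assuming the DFSR bit positions listed above (Status in bits [10,3:0], RW in bit 11, SD in bit 12) and the usual CPSR layout (T in bit 5, M in bits [4:0]). The type and function names are hypothetical.

#include <stdint.h>
#include <stdbool.h>

/* Hypothetical container for the DFSR fields discussed in the list above. */
typedef struct {
    uint8_t status; /* bit 10 and bits 3:0 packed into a 5-bit value */
    bool    sd;     /* bit 12: 1 = AXI slave error (external aborts only) */
    bool    write;  /* bit 11: 1 = write access, 0 = read access */
} dfsr_fields_t;

static inline dfsr_fields_t dfsr_decode(uint32_t dfsr)
{
    dfsr_fields_t f;
    f.status = (uint8_t)((((dfsr >> 10) & 1u) << 4) | (dfsr & 0xFu));
    f.sd     = ((dfsr >> 12) & 1u) != 0;
    f.write  = ((dfsr >> 11) & 1u) != 0;
    return f;
}

/* Processor mode field M[4:0] of a CPSR or SPSR value. */
static inline uint8_t psr_mode(uint32_t psr)
{
    return (uint8_t)(psr & 0x1Fu);
}

/* Thumb state bit T[5] of a CPSR or SPSR value. */
static inline bool psr_thumb(uint32_t psr)
{
    return ((psr >> 5) & 1u) != 0;
}

Feeding in the values from this post, dfsr_decode(0x00001008) yields status 0b01000 with sd set and write clear, psr_mode(0x800003d7) is 0b10111 (Abort), and psr_mode(0x800003df) is 0b11111 (System), which agrees with the decoding by hand.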

The relevant MPU regions are mostly taken from the HALCoGen-generated code; however, I changed the RAM region to be executable:

static const mpu_region_t tms570lc43x_mpu_regions[NUM_MPU_REGIONS] = {
    {
        .enabled = true,
        .region_number = 0,
        .start_address = 0x0,
        .size = MPU_4_GB,
        .type = MPU_NORMAL_OINC_NONSHARED,
        .permissions = MPU_PRIV_NA_USER_NA_NOEXEC,
        .disabled_sub_regions = 0xFF,
    },
    {
        // FLASH
        .enabled = true,
        .region_number = 1,
        .start_address = FLASH_START,
        .size = MPU_4_MB,
        .type = MPU_NORMAL_OIWTNOWA_NONSHARED,
        .permissions = MPU_PRIV_RO_USER_RO_EXEC,
    },
    {
        // RAM

I override the weak MPU initialization function in the HALCoGen code, iterate through this array, and set up the regions in the same way the generated code does. I have used JTAG and the cp15 instructions to verify that the regions are set up as specified here.
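When comparing values read back over JTAG against the table, it can help to compute the expected register contents on the host. The sketch below assumes the Cortex-R5 MPU Region Size and Enable Register layout (enable in bit 0, region size in bits [5:1] encoded as log2(bytes) - 1, sub-region disable mask in bits [15:8]). The function name is hypothetical, and how MPU_4_MB and MPU_4_GB map onto this encoding in my driver is not shown here, so treat the result as a cross-check rather than the driver's actual code.

#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: expected Region Size and Enable Register value for a
 * region of 2^log2_bytes bytes. Valid sizes are 32 bytes (log2 = 5) through
 * 4 GB (log2 = 32); anything else returns 0, i.e. a disabled region. */
static inline uint32_t mpu_size_enable_value(unsigned log2_bytes,
                                             uint8_t disabled_sub_regions,
                                             bool enabled)
{
    if (log2_bytes < 5u || log2_bytes > 32u) {
        return 0u;
    }
    /* The size field holds N where the region spans 2^(N+1) bytes. */
    uint32_t size_field = (uint32_t)(log2_bytes - 1u) & 0x1Fu;
    return ((uint32_t)disabled_sub_regions << 8)
         | (size_field << 1)
         | (enabled ? 1u : 0u);
}

Under these assumptions, an enabled 4 MB region with no sub-regions disabled would read back as 0x0000002B, and an enabled 4 GB region with disabled_sub_regions = 0xFF as 0x0000FF3F.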

I can also use the GDB "examine" command to view and disassemble the code at the address that is originally in "r7", so I don't think this is an issue with accessing the memory or with the code not being there. I'm very stumped. I have a feeling that I am missing some processor configuration, but I can't find anything that relates data aborts to branches.

Any tips are appreciated.

  • Is the RTOS a separate application image which includes device/clock/peripheral initialization, and its own interrupt vector, linker cmd file? Can the RTOS application boot-up correctly if it is programmed to 0x00000000?

  • Data Fault Address Register (DFAR)
    • Value: 0x62266C32
    • This is really bizarre because this is what is reported in the Segger GDB server but I have no code there! I'm not branching there and I never see when that address is in my stack or registers.

    The stack view *after* the data abort is shown does show 0x62266c33 which only differs in that the least significant bit is set:

    ─────────────────────────────────────────────────────────────── stack ────
    0x8001400│+0x0000: 0x33018333 → 0x33018333 ← $sp
    0x8001404│+0x0004: 0x62266c33 → 0x62266c33

    I'm not sure if the GDB view with the GEF visualization is somehow de-referencing pointers which is triggering reads leading to aborts.

  • Chester and the other responder, I must admit that this entire issue was caused by my own lapse in intelligence.

    I was using the next-instruction command ("ni") in GDB instead of step on that branch. However, when I let the program run it still didn't print, because the serial driver wasn't setting the baud rate properly.

    The suggestion to boot from 0x00000000 got me looking into that, and I was getting the same issue at the same place. This made me evaluate my method more carefully, and right away I realized I was using the wrong command.

    The root of my issue was that my serial driver was setting the baud rate incorrectly. The application that I am using is a hello world application, and I believe it is causing the abort once it is done.

    Thanks all for taking a look!

  • Chester,

    Good catch! Thanks, but I did end up resolving the issue. I couldn't figure out how to @ your username in my explanation.