Print r0-r14 and callstack like backtrace in Abort exception handler

Jhon zhu

Other Parts Discussed in Thread: AM3354

Hi All:

I am using StarterWare on the AM3354. My program uses the LCD with DMA and the CAN bus, but I get an Abort exception when I run it.

Can anyone help me print r0-r14 and the call stack, like a backtrace?

(P.S. I'm using IAR.)

Thanks

over 7 years ago

0 Lalindra Jayatilleke over 7 years ago

TI__Mastermind 30365 points

Jhon,
I will need to look into this and get back to you. Thanks for your patience.

Lali

0 James Willis over 7 years ago

Prodigy 175 points

Hi Jhon,

What you're asking for is a pretty big project. There are quite a few subtle details to work out to get what you want from inside the abort handler. However, you can get the address of the faulting instruction (or at least its neighborhood) without much fuss.

First, you'll want to modify the C abort handler in cpu.c and cpu.h to accept a pointer to the address causing the abort like so:

void CPUAbortHandler(void *ptr){
    printf("abort - %p\n", ptr);
}

Remember that your print function will be running while interrupts are disabled, so it has to poll the hardware to get the data out.
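If printf turns out to be unreliable inside the handler, the value can be formatted by hand and pushed out one character at a time. The following is a minimal sketch, not StarterWare code: `put_char_polled`, `u32_to_hex`, and `print_reg` are hypothetical names, and the output hook uses `putchar` only so the sketch runs on a host. On the target it would spin on the UART transmit-ready flag instead.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical polled output hook. On the target this would wait for the
 * UART transmitter to be ready and then write one byte; putchar is used
 * here only so the sketch can run on a host. */
static void put_char_polled(char c)
{
    putchar(c);
}

/* Convert a 32-bit value to 8 upper-case hex digits plus a terminator. */
static void u32_to_hex(uint32_t value, char out[9])
{
    static const char digits[] = "0123456789ABCDEF";
    for (int i = 0; i < 8; i++) {
        out[i] = digits[(value >> (28 - 4 * i)) & 0xFu];
    }
    out[8] = '\0';
}

/* Print "label: 0xXXXXXXXX" followed by a newline, one character at a time. */
static void print_reg(const char *label, uint32_t value)
{
    char hex[9];
    u32_to_hex(value, hex);
    while (*label) {
        put_char_polled(*label++);
    }
    put_char_polled(':');
    put_char_polled(' ');
    put_char_polled('0');
    put_char_polled('x');
    for (const char *p = hex; *p; p++) {
        put_char_polled(*p);
    }
    put_char_polled('\n');
}
```

Inside the handler this could be called as `print_reg("abort", (uint32_t)ptr);`. Keeping the formatting free of heap and locale dependencies is the point of the design, since the system state is unknown at that moment.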

In the exceptionhandler.S file, add an instruction to put the contents of the link register (the address in the neighborhood of the abort) into r0, which is where the C handler will look for the passed pointer.

AbortHandler:
UndefInstHandler:
@
@ Disable all the interrupts
@
    MRS r0, cpsr              @ Read from CPSR
    ORR r0, r0, #0xC0         @ Set the I and F bits to mask IRQ and FIQ
    MSR cpsr, r0              @ Write to CPSR
    ADD r0, r14, #0           @ Pass the abort address (LR) as the first argument
    ADD r14, pc, #0           @ Store the return address
    LDR pc, =CPUAbortHandler  @ Go to C handler
@
@ Go to infinite loop if returned from C handler
@
loop0:
    B loop0

This will get you in the ballpark of the instruction that caused the abort. Use the linker map output to find the corresponding function and start looking there for the problem.
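The map lookup can also be done in code once a few function start addresses have been copied out of the linker map. The sketch below makes two assumptions. First, the symbol names and addresses in the table are invented for illustration and are not from any real map file. Second, the link register offset follows the ARM-state rule that LR in abort mode is 8 bytes past the faulting instruction for a data abort and 4 bytes past it for a prefetch abort.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per function, copied by hand from the linker map.
 * The names and addresses below are made up for illustration. */
struct map_symbol {
    uint32_t    start;
    const char *name;
};

/* Must be sorted by ascending start address. */
static const struct map_symbol symbols[] = {
    { 0x80000000u, "Entry" },
    { 0x8000E700u, "DrawChar" },
    { 0x80014000u, "CanIsr" },
    { 0x80032C00u, "FlushUiBuffer" },
};

/* Recover the address of the faulting instruction from LR in abort mode
 * (ARM state): LR - 8 for a data abort, LR - 4 for a prefetch abort. */
static uint32_t fault_address(uint32_t lr_abt, int is_data_abort)
{
    return lr_abt - (is_data_abort ? 8u : 4u);
}

/* Return the last symbol whose start address is <= addr, or NULL if addr
 * lies below the first symbol. */
static const struct map_symbol *find_symbol(uint32_t addr)
{
    const struct map_symbol *best = NULL;
    for (size_t i = 0; i < sizeof symbols / sizeof symbols[0]; i++) {
        if (symbols[i].start <= addr) {
            best = &symbols[i];
        } else {
            break;
        }
    }
    return best;
}
```

A linear scan is enough for a hand-built table of a few dozen entries; a binary search would only matter for a full symbol dump.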

Happy hunting,

James

0 Jhon zhu over 7 years ago in reply to James Willis

Intellectual 370 points

Hi James,

Thanks for the reply!
1. Your suggestion helped a lot. I got addresses like these:
400001d7
a00001d7
600001d7
200001d7
600001db
All of these seem to be invalid addresses, both in region and in alignment.
Does this mean I really have a memory error?
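One possibility worth ruling out before blaming memory: the low bits of these values match the layout of the ARM CPSR rather than an address. The handler reads the CPSR into r0 with MRS and sets bits 0xC0 before the link register is copied in, so if that copy is missing or not assembled, r0 would still hold the CPSR when the C handler prints it. This is only a guess from the numbers. The sketch below checks a value against the ARMv7-A status register fields; `looks_like_handler_cpsr` is a hypothetical helper name.

```c
#include <stdint.h>

/* ARMv7-A program status register fields. */
#define PSR_MODE_MASK 0x1Fu
#define PSR_MODE_ABT  0x17u        /* Abort mode */
#define PSR_MODE_UND  0x1Bu        /* Undefined mode */
#define PSR_F_BIT     (1u << 6)    /* FIQ masked */
#define PSR_I_BIT     (1u << 7)    /* IRQ masked */

/* Return 1 if value looks like a CPSR captured in abort or undefined mode
 * with both IRQ and FIQ masked, 0 otherwise. */
static int looks_like_handler_cpsr(uint32_t value)
{
    uint32_t mode = value & PSR_MODE_MASK;
    uint32_t mask = PSR_I_BIT | PSR_F_BIT;
    int interrupts_masked = (value & mask) == mask;

    return interrupts_masked &&
           (mode == PSR_MODE_ABT || mode == PSR_MODE_UND);
}
```

By this check, the four values ending in 1d7 decode as abort mode and 600001db as undefined mode, with the differing top nibble being the condition flags. A real code address such as 8000e770 does not match.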
By the way, I enabled the cache with

CacheEnable(CACHE_ALL);

and the MMU setting is the default setting in StarterWare, as below:

REGION regionDdr = {
    MMU_PGTYPE_SECTION, START_ADDR_DDR, NUM_SECTIONS_DDR,
    MMU_MEMTYPE_NORMAL_NON_SHAREABLE(MMU_CACHE_WT_NOWA,
                                     MMU_CACHE_WB_WA),
    MMU_REGION_NON_SECURE, MMU_AP_PRV_RW_USR_RW,
    (unsigned int*)pageTable
};
/*
** Define OCMC RAM region of AM335x. Same Attributes of DDR region given.
*/
REGION regionOcmc = {
    MMU_PGTYPE_SECTION, START_ADDR_OCMC, NUM_SECTIONS_OCMC,
    MMU_MEMTYPE_NORMAL_NON_SHAREABLE(MMU_CACHE_WT_NOWA,
                                     MMU_CACHE_WB_WA),
    MMU_REGION_NON_SECURE, MMU_AP_PRV_RW_USR_RW,
    (unsigned int*)pageTable
};

/*
** Define Device Memory Region. The region between OCMC and DDR is
** configured as device memory, with R/W access in user/privileged modes.
** Also, the region is marked 'Execute Never'.
*/
REGION regionDev = {
    MMU_PGTYPE_SECTION, START_ADDR_DEV, NUM_SECTIONS_DEV,
    MMU_MEMTYPE_DEVICE_SHAREABLE,
    MMU_REGION_NON_SECURE,
    MMU_AP_PRV_RW_USR_RW | MMU_SECTION_EXEC_NEVER,
    (unsigned int*)pageTable
};

/* Initialize the page table and MMU */
MMUInit((unsigned int*)pageTable);

/* Map the defined regions */
MMUMemRegionMap(&regionDdr);
MMUMemRegionMap(&regionOcmc);
MMUMemRegionMap(&regionDev);

/* Now Safe to enable MMU */
MMUEnable((unsigned int*)pageTable);

Does this setting matter for this issue?

Or should I lower the DDR3 frequency? (I have lowered my MPU frequency, but nothing changed.)

2. "There are a quite a few subtle details to work out to get what you want from inside the abort handler"
I'm also thinking about using the Linux panic function as a reference; it dumps the registers and the call stack.
Does that mean I can't do the "panic" kind of dump in my case?

0 James Willis over 7 years ago in reply to Jhon zhu

Prodigy 175 points

I don't think your problem lies with caching or the MMU. My best guess is that you are corrupting the stack somewhere inside a function call. As you traverse in and out of functions, the return address is stored on the stack at each level. When you return from a function, the program counter is popped from the stack and execution continues from there. If you've overwritten the stored pc value you'll see pretty much the symptoms you have.

Most often the stack is corrupted by overrunning a local variable in a function. For instance, you declare a local array like int a[4] but then assign a value to a[10] (or worse, a[-4].) The program will happily stuff something there and overwrite the stack. So, look for array indices that exceed your declaration or for negative indices. Anywhere you have a pointer to something in the stack (local variables) should be treated with suspicion. Also, make sure you're not passing the address of a simple variable to a function that expects an array.
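Two cheap defenses against the overrun described above are an index check on every store and a guard word placed after the buffer that is verified later. The sketch below shows both under stated assumptions: the type and function names are hypothetical, the marker value is arbitrary, and the guard only catches overruns that run forward past the end of the array.

```c
#include <stdint.h>
#include <string.h>

#define CANARY 0xDEADBEEFu   /* arbitrary marker value */

/* A small buffer followed by a guard word. */
struct guarded_buf {
    int      data[4];
    uint32_t canary;
};

/* Zero the buffer and arm the guard word. */
static void guarded_init(struct guarded_buf *g)
{
    memset(g->data, 0, sizeof g->data);
    g->canary = CANARY;
}

/* Return 1 if the guard word is intact, 0 if something overwrote it. */
static int guarded_ok(const struct guarded_buf *g)
{
    return g->canary == CANARY;
}

/* Index-checked store: returns 1 on success, or 0 without writing
 * anything if index is negative or past the end of the array. */
static int checked_store(struct guarded_buf *g, int index, int value)
{
    int count = (int)(sizeof g->data / sizeof g->data[0]);
    if (index < 0 || index >= count) {
        return 0;
    }
    g->data[index] = value;
    return 1;
}
```

Calling `guarded_ok` at function exit, or periodically from the main loop, narrows down which buffer was overrun long before the corrupted return address is used.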

There are some gcc options to help protect against this kind of problem but I have never played with them. I have no idea what level of effort would be required to use them. Have a look at -fstack-protector-strong and its siblings if you're interested.

To do the backtrace as you wish, the compiler, assembler, and linker all have to agree on the stack conventions and the code must push additional information on the stack at run time to help unwind it. The Linux panic function will give you an idea of how it's done but it's only one piece of the puzzle.
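To make the idea of "additional information on the stack" concrete, here is a minimal sketch of walking a chain of frame records. It assumes a hypothetical layout in which every function prologue saves the caller's frame pointer and its own return address as a pair. Real layouts depend on the compiler and its options, so this is not what IAR or gcc necessarily emit. The stack bounds are passed in so a corrupted chain stops the walk instead of causing a second abort.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame record: caller's record plus this frame's return
 * address. prev is NULL in the outermost frame. */
struct frame_record {
    const struct frame_record *prev;
    uint32_t                   return_addr;
};

/* Walk the chain starting at fp. Stores up to max return addresses in out
 * and returns how many were stored. Stops early if a record lies outside
 * [stack_lo, stack_hi) or if the chain fails to move toward higher
 * addresses (older frames on a descending stack). */
static size_t backtrace_walk(const struct frame_record *fp,
                             const void *stack_lo, const void *stack_hi,
                             uint32_t *out, size_t max)
{
    uintptr_t lo = (uintptr_t)stack_lo;
    uintptr_t hi = (uintptr_t)stack_hi;
    size_t n = 0;

    while (fp != NULL && n < max) {
        uintptr_t p = (uintptr_t)fp;
        if (p < lo || p + sizeof *fp > hi) {
            break;                      /* record is outside the stack */
        }
        out[n++] = fp->return_addr;
        if (fp->prev != NULL && (uintptr_t)fp->prev <= p) {
            break;                      /* chain loops or goes backward */
        }
        fp = fp->prev;
    }
    return n;
}
```

Each returned address can then be matched against the linker map by hand to name the calling function.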

0 Jhon zhu over 7 years ago in reply to James Willis

Intellectual 370 points

Hi James:

Thanks for the reply!
I really appreciate your help! Yes, I did have a stack overwrite error in my program.
Unfortunately, I still get the abort exception.
1. With the cache disabled, I got the addresses below:

abort - 8000e770
abort - 800140c8
abort - 80032ce0

These are all valid addresses in the program, and the instructions there do not look like they could cause an abort exception.
The addresses do not seem to follow any regular pattern.
The instructions there look like this:

MOV R0 , #1

LDRB R0, [R9, #0x2]

The C code looks like this:

if(pChar)
{
    szRet.cx = pChar->nBountWidth;
    szRet.cy = pChar->nBountHeight; //(abort address)
}
2. With the cache enabled, I got the addresses below:
abort - 11c
abort - 8000cec4
abort - 58
abort - 8001c084

3. The abort exception seems to be caused by the LCD driver.

I'm using raster mode DMA with two DMA buffers: buffer0 for FRAME0 and buffer1 for FRAME1. UIBuffer is for the application.

When I stop flushing UIBuffer to buffer0 and buffer1, or keep flushing but leave the contents of UIBuffer unchanged, I get no abort exception.

But once I start flushing UIBuffer to buffer0 and buffer1, I get the abort exception later (after about 10 to 20 minutes).

Does it matter if I change the contents of buffer1 while the DMA is using buffer1?

Again, thanks for your reply!

0 Jhon zhu over 7 years ago in reply to James Willis

Intellectual 370 points

Hi James:

I think this is turning into a separate topic, so I created another thread here:

e2e.ti.com/.../509140

Thanks
