Call stack backtrace when an exception happens?

Hi all,

 When DSP/BIOS hangs, it ends up in UTL_halt, which is called from SYS_abort.

So, without CCS, is there a way to get the call stack backtrace from UTL_halt?

 I mean we could hook the UTL_halt function and add something like backtrace(...), so that we know where the exception happened.

Is that possible?

Currently we find the code that triggered the exception by setting a breakpoint at _EXC_dispatch, which is the default entry point for all exceptions.

Then we check the value of the B3 register, which holds the function return address. Is there any way to do this without CCS?

Our hardware is OMAP-L138, and the DSP/BIOS version is 5_41_02_14.

 thanks

  JiangPeng

 

  • Unfortunately there is no run-time backtrace()-type capability for C6x, and even the CCS capability is limited w/o debug information compiled in.

    But for the exception situation a backtrace() wouldn't help much, since what it would tell you is already known: EXC_dispatch was called by the NMI/exception handler vector.  EXC_dispatch() will call subroutines to print all known exception information.  When you reach UTL_halt() due to an exception, you can open a BIOS message LOG and select the Execution Graph Details LOG, where you will find information related to the exception.

    If you want to "catch it in its tracks", you can place a breakpoint on the vector itself, at the label "hwi1".  When you hit that BP, you can see the NRP register in the CPU Core registers to determine where the PC was executing at the time of the exception (NRP is also printed in the Execution Graph Details LOG), and you can see all the register values at the time of the exception (which you won't have if you reach UTL_halt(), since the exception processing has trashed all registers).  You would then need to inspect the code pointed to by NRP to gather further information about what caused it.  If it is an external exception, and you don't already have it enabled, you can set bios.MEM.USEMPC=1 in your .tcf file to have the MPC module decode exceptions generated by any of L1D/L1P/L2.

    Regards,

    - Rob

  • In reply to Robert Tivy:

    Hi Rob,

     Thanks for the reply. I am still confused about the exception handling in DSP/BIOS; we are using the C6747. I ran the cases below to provoke some exceptions and found that cases 1 and 2 do not abort, while case 3 does.

    So, under what conditions can exceptions such as case 1 or 2 be caught? Do we need to set up the MMU?

    case 1: char *p = 0; *p = 30; // no exception

    case 2: unsigned long i = 9; *(unsigned long *)i = 10; // no exception

    case 3: void (*p)() = myfunc; p = 0; (*p)(); // exception raised

     

    By the way, we found a way to backtrace the call stack for case 3, which we share here:

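        /* The exception context is read from the HWI (system) stack, at word offsets below its bottom. */
        sys_stk = (Uint32)HWI_data.stkBottom;
        usr_stk = *((Uint32*)sys_stk - 0x2d);   /* stack pointer of the interrupted code */
        ret_addr = *((Uint32*)sys_stk - 0x21);  /* saved return address */
        nrp_val = *((Uint32*)sys_stk - 0x43);   /* saved NRP */

    // record function call stack: scan up to 256 bytes of the user stack and
    // keep at most 8 words whose value lies inside [text_start, text_end]
        stack_ptr = (Uint32*)usr_stk;
        func_num = 0;
        while ((func_num < 8)
            && ((Uint32)stack_ptr < (usr_stk + 256)))
        {
            /* a stack word that looks like a code address is treated as a return address */
            if ((*stack_ptr >= (Uint32)text_start) && (*stack_ptr <= (Uint32)text_end))
            {
                g_CallStack[func_num] = *stack_ptr;
                func_num++;
            }
            stack_ptr++;
        }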
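
The same scan can be written as a standalone function so that the heuristic can be exercised off target against a simulated stack. This is only a sketch: the function name `scan_stack_for_code` and its parameters are hypothetical, and the code-section bounds are passed in rather than taken from linker symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Scan `words` 32-bit stack slots starting at `stack` and copy every value
 * that lies inside [text_start, text_end] into `out` (at most `max` entries).
 * Returns the number of candidate return addresses recorded.
 * Hypothetical helper: names and signature are illustrative only.
 */
size_t scan_stack_for_code(const uint32_t *stack, size_t words,
                           uint32_t text_start, uint32_t text_end,
                           uint32_t *out, size_t max)
{
    size_t found = 0;
    size_t i;

    assert(stack != NULL && out != NULL);

    for (i = 0; i < words && found < max; i++) {
        uint32_t v = stack[i];
        /* Any word that looks like a code address is kept as a candidate. */
        if (v >= text_start && v <= text_end) {
            out[found++] = v;
        }
    }
    return found;
}
```

Because the test is purely a range check, any stack word that happens to hold a code address is recorded, whether or not it is a real return address.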

     

  • In reply to peng jiang38224:

    I believe that if you enable the MPC module then you will see exceptions generated for cases 1 & 2.

    To enable the MPC module, do this in your .tcf file:
        bios.MEM.USEMPC = true;
    This causes the EXC module to be notified when an exception comes from one of the MPCs (in this case, L1D & L2).

    Your backtrace function looks interesting, thanks for sharing that.  It might record a false function in the backtrace if there is a local function pointer on the stack.

    Regards,

    - Rob

  • In reply to Robert Tivy:

    Hi Rob,

     Thank you for the information about enabling the MPC; with it we can capture these kinds of exceptions. But do you know of any way to backtrace the call stack?

    Or, from the MPC module, can we locate the user stack, so that we can use our method to walk through the call stack?

     thanks

     JiangPeng

  • In reply to peng jiang38224:

    Rob

    By the way, can you share more information about the MPC? The API reference guide says very little about it.

    From my observation, accessing address zero or some non-existent DSP registers causes an exception from the MPC. Can you tell us what other kinds of errors generate an MPC exception? We are using the OMAP-L138.

     thanks

     JiangPeng

     

  • In reply to peng jiang38224:

    The MPC exception handling functions are called from within the EXC exception handlers, so any backtrace that works from the context of EXC will also work from MPC context.  Your method, which tracks the HWI stack from UTL_halt(), will function just as well (or as badly) when performed from MPC exception handling context.

    Regards,

    - Rob

     

  • In reply to peng jiang38224:

    BIOS's MPC module will report exceptions originating from the EVT{DMC,PMC,UMC}CMPA events on a C64x+ processor.  Typically, those events result from a memory protection violation: a legitimate address in the memory space is accessed, but the access does not have the characteristics allowed by the MPPA (Memory Protection Page Attribute) for that address.  For example, a memory read from a real L2 memory address raises one of these events if the "read" attribute is not set in the MPPA for that address's memory page (permissions are set on a "page" basis, and the page size differs depending on the memory size).

    BIOS's MPC module does not program any MPPA registers; that is the responsibility of the application writer.  As a result, unless the application programs them, all MPPAs are "wide open" (i.e., they allow any type of access from any source).  However, the L1D/L1P/L2 memory controllers will ultimately report an access to a bad address such as 0 (even though they don't control that "address").  The reporting comes in different forms depending on which memory controller originated the request, and I can't really relate the details of the mechanism, but when you enable BIOS's MPC module, any of the 3 types of access to address 0 (code fetch, memory read, memory write) will result in an exception that gets handled by the EXC module (by way of the MPC module hooking into the EXC module "hooks").
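
To make the per-page permission idea concrete, here is a minimal sketch of how a permission check against one MPPA value could be expressed. The bit positions used (user execute/write/read in bits 0 to 2, supervisor execute/write/read in bits 3 to 5) are an assumption that should be checked against the C64x+ megamodule documentation for your device, and `mppa_allows` and `access_t` are hypothetical names, not BIOS APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Kind of access being checked; the values double as bit offsets (assumed layout). */
typedef enum {
    ACCESS_EXECUTE = 0,
    ACCESS_WRITE   = 1,
    ACCESS_READ    = 2
} access_t;

/*
 * Return 1 if the given MPPA value permits the access, 0 otherwise.
 * Assumed layout: bits 0..2 = user X/W/R, bits 3..5 = supervisor X/W/R.
 */
int mppa_allows(uint32_t mppa, int supervisor, access_t type)
{
    unsigned bit;

    assert(type == ACCESS_EXECUTE || type == ACCESS_WRITE || type == ACCESS_READ);

    bit = (unsigned)type + (supervisor ? 3u : 0u);
    return (int)((mppa >> bit) & 1u);
}
```

Under this assumed layout, a page whose low six permission bits are all set allows every kind of access, and clearing a single bit, for example the user "read" bit, is enough to turn a user-mode read of that page into a protection violation.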

    Regards,

    - Rob

  • In reply to Robert Tivy:

    Hi  Rob,

    I have a similar problem. I placed a breakpoint on the vector itself, at the label "hwi1". As a test I set B3 = 0, because B3 is 0 when my real abort happens. My project then stops at hwi1, but NRP is 0 and there is no call stack, so I cannot tell where the PC was executing at the time of the exception. Can you give me some advice?

     

    Regards,

    Si

  • In reply to si cheng:

    si cheng
    As a test I set B3 = 0, because B3 is 0 when my real abort happens. My project then stops at hwi1, but NRP is 0 and there is no call stack

    Setting B3=0 as a test is causing an exception.  The PC "returns" to B3's address and a code fetch exception is caused when the CPU tries to fetch code from 0.  The exception processing then puts the PC into NRP, which results in 0 in NRP, so NRP=0 is valid even though there's no code at 0.

    There will not be a valid call stack when inside exception processing, since the stack is moved from your code's stack to the BIOS interrupt stack.  However, you're stopping on hwi1, which is before BIOS exception processing code has had a chance to change the SP.  Even though you don't see a high-level call stack, there may be value in looking at the stack at the SP when you hit hwi1.

    You say that when you get your real exception (as opposed to this "test" exception), B3=0.  The way to proceed in your investigation is to determine how B3 came to be 0.  This is often caused by a stack overflow: the stack grows into some data area, and if the slot where B3 was saved on the overflowed stack is then overwritten by a global data write (as opposed to a write to a local, stack-based variable), B3 is restored with 0 instead of the return address that was originally saved.  You can check your Tasks' stacks with the ROV tool in CCS to see if they have overflowed.
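
Where CCS is not available, a rough stack check can also be done in code. The sketch below assumes the stack buffer was pre-filled with a known pattern before use and that the stack grows toward lower addresses, so untouched words sit at the low end of the buffer. The function name `stack_unused_words` and the fill value used in the usage example are hypothetical and are not taken from DSP/BIOS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count how many words at the low-address end of a stack buffer still hold
 * the fill pattern. `stack_low` points to the lowest address of the buffer,
 * `words` is its size in 32-bit words, `fill` is the pattern it was
 * initialized with (an assumption of this sketch).
 */
size_t stack_unused_words(const uint32_t *stack_low, size_t words, uint32_t fill)
{
    size_t n = 0;

    assert(stack_low != NULL);

    /* Stop at the first word that no longer matches the fill pattern. */
    while (n < words && stack_low[n] == fill) {
        n++;
    }
    return n;
}
```

For example, with a hypothetical fill value of `0xA5A5A5A5`, a result of 0 means that even the deepest word of the buffer has been written, so the stack was used completely and may have overflowed into whatever lies below it.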

    Since NRP is 0 you don't have a helpful pointer to the actual location of the problem.  Other loose "pointers" to the problem area include:
        - IRP register: it points to the last "return from interrupt" location, and your last interrupt might have happened near the code (above the code) that is causing the problem.
        - Any GP register (A0-A31/B0-B31) that contains a code address: functions can be called "through" a GP register, and a register might have been used to call into the function that caused the exception.

    Regards,

    - Rob