mcBsp utl_halt?

Other Parts Discussed in Thread: OMAP-L138

I'm getting a strange error on an OMAP-L138 with DSP/BIOS 5.  My application launches, and 5 seconds later the audio task (McBSP) begins.  I see a few frame clocks go by, and then the application bombs into UTL_halt.  Here is the system log message:

 

00000002    00000002    C3E8471D    00000009

EXC_exceptionHandler: EFR=0x2

00000004    D43B5600    C3E8473C    00000009

NRP=0xD43B5600

00000006    C3E84756    C3E84747    00000009

mode=supervisor

00000008    00000012    C3E8460E    00000009

Internal exception: IERR=0x00000012

0000000A    00000000    C3E8464A    00000009

Fetch packet exception

0000000C    00000000    C3E84692    00000009

Resource conflict exception

0000000E    C3E84761    C3E8543C    00000009

SYS abort called with message 'Run-time exception detected, aborting'

 

If I put a breakpoint at hwi1, I see that B3 = 0xC3E343E4, which corresponds to line 1851, "chanHandle->currentError = IOM_COMPLETED;", in Mcbsp.c.

 

On some boards, the exact same code appears to work.  If I make a minor change to the code, recompile, and relaunch using DSP/Link, the problem does not appear.  If I then rebuild the filesystem with that new binary, the one that appeared to be working, it breaks again.

 

I'm unsure how to continue debugging this.  Is there a way to get a stack backtrace of whatever was executing before the NMI?

  • That was a good move to put a BP at hwi1, allowing you to see the machine state at the time of the exception before that state gets trashed by the exception handler.

    When stopped at the hwi1 label, register B3 doesn't tell you that much.  It is either the last place to which your application returned before the exception, or the place your application *would have* returned had the exception not occurred.

    The more revealing information is the NRP (in your case, NRP=0xD43B5600).  This is the value of the PC at the time of exception detection.  Typically the problem code is either at that instruction or a few cycles before it.  Does the address 0xD43B5600 look like a valid code address?  Can you disassemble that area?

    It's possible that the NRP value is not a valid code area at all, and if that's the case then that means the program ran off into the weeds.  If so, B3 is probably your best indicator of where the program was executing before it went off "into the weeds".

    As for a stack backtrace, it's difficult on the C6x architecture, but I have had some success in the past by "jamming" the PC with some code address and "jamming" the SP with the value it would have had at that code location, then issuing the "backtrace" command from CCS (I don't recall the exact name of the command).  The tough part is determining what the SP was at "that" location (wherever "that" is).  For your situation, where you're stopped at the hwi1 label, the SP is already set up for the frame that was executing at the time of the exception, so you might have good success by simply putting the NRP value into the PC register and invoking the backtrace (you will need application symbols to be loaded for this).

    Regards,

    - Rob
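
The NRP-versus-B3 decision described above can be sketched in C. This is only an illustration that could run on a host while reading a crash log: the `CodeRange` type, the function names, and the code range used in the usage note are hypothetical. The real section addresses would have to come from the application's linker map file.

```c
#include <stddef.h>
#include <stdint.h>

/* One contiguous code region, e.g. a .text section taken from the linker map.
 * Hypothetical type, not part of DSP/BIOS. */
typedef struct {
    uint32_t start;  /* first address of the section */
    uint32_t length; /* section size in bytes */
} CodeRange;

/* Returns 1 if addr falls inside any of the given code ranges, 0 otherwise. */
int is_code_address(uint32_t addr, const CodeRange *ranges, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) {
        /* Subtract first so the comparison cannot overflow. */
        if (addr >= ranges[i].start &&
            (addr - ranges[i].start) < ranges[i].length) {
            return 1;
        }
    }
    return 0;
}

/* Picks the address to start investigating from:
 * NRP if it points at known code, otherwise B3. */
uint32_t best_starting_address(uint32_t nrp, uint32_t b3,
                               const CodeRange *ranges, size_t count)
{
    return is_code_address(nrp, ranges, count) ? nrp : b3;
}
```

With an assumed code range of `{ 0xC3E00000, 0x00080000 }`, calling `best_starting_address(0xD43B5600, 0xC3E343E4, ...)` returns the B3 value, because the NRP from this thread falls outside that range.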

     

     

  • In this case, when I hit the HWI, my SP is 0x11800770, which lies in the HWI stack.  Does that mean the error happened during a HWI, or was the SP changed on me?

  • In this case, the answer was found by using the ROV to look at the TSK status when hwi1 was hit.  It told me which task was running, so all I had to do was step through that task enough times that I knew which area was causing the interrupt.  Then it was pretty obvious which variables weren't initialized.

  • Hi Mike,

    I have a similar issue with a "Resource Conflict Exception".  Can you explain in more detail what the issue was with the uninitialized variables and how that caused your problem?

  • Eric,

    I didn't look at it in too much detail once I realized what was happening.  But mainly, the issue was that there were a few pointers in a struct that were pointing off into bad locations in memory.  I didn't step through the code to verify that it was dereferencing these pointers that caused the exception, but that should be the reason.
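
A minimal sketch of the kind of defensive initialization that avoids this class of bug, assuming a struct that holds a data pointer and a callback. The type `AudioCtx`, its fields, and every function name here are hypothetical and are not taken from the original application or from the McBSP driver.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical processing callback type. */
typedef void (*ProcessFxn)(short *samples, size_t count);

/* Hypothetical per-task context; field names are illustrative only. */
typedef struct {
    short     *buffer;   /* audio frame to process */
    size_t     count;    /* number of samples in buffer */
    ProcessFxn process;  /* processing callback */
} AudioCtx;

/* Example callback: negates every sample in place. */
void negate_samples(short *samples, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) {
        samples[i] = (short)-samples[i];
    }
}

/* Puts the context into a known state so no pointer holds leftover garbage. */
void audio_ctx_init(AudioCtx *ctx)
{
    memset(ctx, 0, sizeof(*ctx));
    /* Set pointers explicitly; all-bits-zero is not guaranteed to be NULL. */
    ctx->buffer = NULL;
    ctx->process = NULL;
}

/* Runs the callback only when every pointer has been set.
 * Returns 0 on success, -1 if the context is not ready. */
int audio_ctx_run(const AudioCtx *ctx)
{
    if (ctx == NULL || ctx->buffer == NULL || ctx->process == NULL) {
        return -1;
    }
    ctx->process(ctx->buffer, ctx->count);
    return 0;
}
```

The NULL checks only catch pointers that were never assigned after `audio_ctx_init`. A pointer that was assigned a wrong but non-NULL value would still need to be found with the debugger, as described in this thread.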

  • Mike,

    Was it your code or was it the driver code in mcbsp.c or some other driver?

    thanks!

  • It was in some of my code.  The McBSP drove that task only in the sense that it fed the task data to process.

  • I see that you have already found your problem, so FYI...

    On the C6x architecture, when any interrupt (including NMI or exception, which vector to hwi1) happens, there is *not* an automatic change of the SP.  DSP/BIOS does use a "HWI/SWI stack", and the switch to this stack is controlled by the DSP/BIOS interrupt handling code.  So, if the SP lies on the HWI stack when you hit the hwi1 breakpoint, that means the exception happened during either HWI or SWI processing, when the SP is already on the HWI/SWI stack.

    Regards,

    - Rob
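
The SP check described above amounts to a range test. The helper below is a minimal sketch with a hypothetical name, and the stack base and size in the usage note are assumptions for illustration. The actual HWI/SWI stack location and size would need to be read from the DSP/BIOS configuration or the linker map file.

```c
#include <stdint.h>

/* Returns 1 if sp lies within [stack_base, stack_base + stack_size], else 0.
 * The upper bound is inclusive so that an SP sitting at the very top of an
 * empty stack still counts as being on that stack. */
int sp_in_stack(uint32_t sp, uint32_t stack_base, uint32_t stack_size)
{
    /* Subtract first so the comparison cannot overflow. */
    return sp >= stack_base && (sp - stack_base) <= stack_size;
}
```

For example, with an assumed stack at base `0x11800000` and size `0x1000`, `sp_in_stack(0x11800770, 0x11800000, 0x1000)` returns 1, which would indicate that the exception was taken while HWI or SWI code was running.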

     

  • Eric,

    Unless there is a bug in the compiler, you should *never* get a "Resource Conflict Exception" from valid C code.  I suspect Mike's issue was an uninitialized code pointer that got called and the PC started executing garbage (hence the Fetch Packet Exception), and the resource conflict arose from that garbage.

    It's even difficult for assembly code to generate a resource conflict, as the assembler typically catches such code and warns about it.

    Regards,

    - Rob
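
For readers decoding a similar log by hand, here is a small host-side sketch that maps IERR bits to exception names. Only bits 1 (fetch packet) and 4 (resource conflict) are confirmed by the log in this thread, where IERR=0x00000012. The remaining bit assignments are written from memory of the C64x+/C674x CPU reference guide and should be verified there before relying on them. The type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Pairs one register bit with a human-readable name. Hypothetical type. */
typedef struct {
    uint32_t    mask;
    const char *name;
} FlagName;

/* IERR bit names. Bits 1 and 4 match the log in this thread; the others
 * are an assumption to be checked against the CPU reference guide. */
static const FlagName IERR_FLAGS[] = {
    { 1u << 0, "Instruction fetch exception" },
    { 1u << 1, "Fetch packet exception" },
    { 1u << 2, "Execute packet exception" },
    { 1u << 3, "Opcode exception" },
    { 1u << 4, "Resource conflict exception" },
    { 1u << 5, "Resource access exception" },
    { 1u << 6, "Privilege exception" },
    { 1u << 7, "SPLOOP buffer exception" },
    { 1u << 8, "Missed stall exception" },
};

/* Writes the names of all bits set in ierr into out (at most max entries),
 * lowest bit first, and returns how many names were written. */
size_t decode_ierr(uint32_t ierr, const char **out, size_t max)
{
    size_t i;
    size_t n = 0;
    for (i = 0; i < sizeof(IERR_FLAGS) / sizeof(IERR_FLAGS[0]); i++) {
        if ((ierr & IERR_FLAGS[i].mask) != 0 && n < max) {
            out[n++] = IERR_FLAGS[i].name;
        }
    }
    return n;
}
```

Calling `decode_ierr(0x12, names, 9)` yields two entries, the fetch packet and resource conflict exceptions, matching the two messages that DSP/BIOS printed in the original log.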