runtime exception

Hello, my target is a LogicPD OMAPL138 on an experimenter board. I'm running SYS/BIOS 6.33.05.46, XDCtools 3.23.02.47 and TI compiler v7.3.5. When my program runs I get an exception (see console output below). If I change to SYS/BIOS 6.33.1.25 and XDC 3.23.00.32 the exception is not present.

This may point towards BIOS, but I don't want to jump to that conclusion. I've been looking at this problem for a few days now and have noticed that the exception comes and goes when I change seemingly unrelated application code, or change the size of ti_sysbios_rta_Agent.sysbiosLoggerSize, for example.

I've looked at the tasks' stack sizes using ROV and don't see any evidence of a stack overflow. If I look at the "Exec Graph" in CCS, the post-exception view always looks similar: the active task is ..idle_loop(). The concerning point is a large number of Semaphore posts/pends during ..idle_loop() (this doesn't occur when the program is running ok). Using the CCS "Raw Logs" info and searching the ROV tree, it looks like these two semaphores are part of ti->sysbios->gates->GateMutex.

About the console output:

Using the CCS disassembly window, pc = 0x1182c4b4 looks to be inside a call to ti_sysbios_hal_hwi_checkstack. This is the value most of the time, although I have seen a couple of other addresses.

Using the CCS disassembly window, sp = 0x11813ed0 looks to be safely inside ti_sysbios_knl_Task_Instance_State_6_stack__A. This is very consistent: every failure I've seen has pointed inside this stack.

[C674X_0] A0=0x1 A1=0x0

[C674X_0] A2=0x0 A3=0x11830770

[C674X_0] A4=0x0 A5=0x10df0800

[C674X_0] A6=0x0 A7=0x16e36000

[C674X_0] A8=0x0 A9=0x1180fc98

[C674X_0] A10=0x14000100 A11=0x118306b4

[C674X_0] A12=0x0 A13=0x0

[C674X_0] A14=0x0 A15=0x0

[C674X_0] A16=0x302e3030 A17=0x0

[C674X_0] A18=0x11819220 A19=0x40

[C674X_0] A20=0x1180ed20 A21=0x0

[C674X_0] A22=0xc1f178c9 A23=0x1c0a39ca

[C674X_0] A24=0xa14b5d17 A25=0xba3677e5

[C674X_0] A26=0x1c14164 A27=0x1c14164

[C674X_0] A28=0x1c14160 A29=0x1c14160

[C674X_0] A30=0x20 A31=0x0

[C674X_0] B0=0x1 B1=0x0

[C674X_0] B2=0x0 B3=0xc

[C674X_0] B4=0x118309d0 B5=0xbe

[C674X_0] B6=0x79 B7=0x78803b7

[C674X_0] B8=0x0 B9=0x1183087c

[C674X_0] B10=0x1 B11=0x118306b0

[C674X_0] B12=0x0 B13=0x0

[C674X_0] B14=0x11831e00 B15=0x11813ed0

[C674X_0] B16=0xbebebebe B17=0xbebebebe

[C674X_0] B18=0xa B19=0x78

[C674X_0] B20=0x69 B21=0x3588fad4

[C674X_0] B22=0x20f B23=0x0

[C674X_0] B24=0xc2093a75 B25=0xe7d21cf9

[C674X_0] B26=0x2 B27=0xd78eafb7

[C674X_0] B28=0xbd98d524 B29=0x11f03dc0

[C674X_0] B30=0x1182b410 B31=0xc

[C674X_0] NTSR=0x1000f

[C674X_0] ITSR=0xf

[C674X_0] IRP=0x1182c4b4

[C674X_0] SSR=0x0

[C674X_0] AMR=0x0

[C674X_0] RILC=0x0

[C674X_0] ILC=0x0

[C674X_0] Exception at 0xc

[C674X_0] EFR=0x2 NRP=0xc

[C674X_0] Internal exception: IERR=0x19

[C674X_0] Instruction fetch exception

[C674X_0] Opcode exception

[C674X_0] Resource conflict exception

[C674X_0] ti.sysbios.family.c64p.Exception: line 248: E_exceptionMin: pc = 0x1182c4b4, sp = 0x11813ed0.

[C674X_0] To see more exception detail, use ROV or set 'ti.sysbios.family.c64p.Exception.enablePrint = true;'

[C674X_0] xdc.runtime.Error.raise: terminating execution

Thanks in advance for your help.  It is much appreciated.

Dave M.

  • Dave,

    I think the line with the pc and sp is misleading and incorrect and needs to be corrected in BIOS. Your actual problem is that your program is branching to address 0xC and executing from there. How do I know this? NRP=0xC, and NRP shows you where the exception occurred. B3 is also 0xC, and B3 typically holds the return address. Since B3 is usually saved onto the stack, I think either your Hwi stack or one of your Task stacks is being overflowed or corrupted. I would increase your stack sizes.
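
    As a side note on reading the dump: the three exception messages printed after IERR=0x19 correspond to the bits set in that register. A minimal C sketch of the decoding follows. The helper names are hypothetical, and only bits 0, 3 and 4 are confirmed by the log in this thread; the remaining bit names are from my recollection of the C64x+/C674x CPU documentation and should be verified there.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* IERR bit names. Bits 0, 3 and 4 match the messages in the console log;
     * the others are assumptions to be checked against the CPU reference guide. */
    const char *const ierr_names[] = {
        "Instruction fetch exception",  /* bit 0 */
        "Fetch packet exception",       /* bit 1 (assumed) */
        "Execute packet exception",     /* bit 2 (assumed) */
        "Opcode exception",             /* bit 3 */
        "Resource conflict exception",  /* bit 4 */
        "Resource access exception",    /* bit 5 (assumed) */
        "Privilege exception",          /* bit 6 (assumed) */
        "Loop buffer exception",        /* bit 7 (assumed) */
        "Missed stall exception"        /* bit 8 (assumed) */
    };

    #define IERR_NUM_BITS ((unsigned)(sizeof ierr_names / sizeof ierr_names[0]))

    /* Hypothetical helper: nonzero if the given IERR bit is set. */
    int ierr_bit_set(uint32_t ierr, unsigned bit)
    {
        assert(bit < 32u);
        return (int)((ierr >> bit) & 1u);
    }

    /* Hypothetical helper: print one line per set bit, return how many were set. */
    int ierr_decode(uint32_t ierr)
    {
        int count = 0;
        unsigned bit;

        for (bit = 0; bit < IERR_NUM_BITS; bit++) {
            if (ierr_bit_set(ierr, bit)) {
                printf("bit %u: %s\n", bit, ierr_names[bit]);
                count++;
            }
        }
        return count;
    }
    ```

    Calling ierr_decode(0x19) reports bits 0, 3 and 4, which is what you would expect when the CPU tries to execute whatever happens to be at address 0xC.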

    Judah

  • Thanks for the quick response.  I'll try your suggestions.

    By Hwi stack, do you mean the system stack? I set its size in the .cfg file with the following line:

    Program.stack = 0x1000;

    Is there a good way to view the status of the system stack?  I've used the memory browser in CCS and looked at _stack and _STACK_END, making sure the bottom of the stack has plenty of 0xBEBE values.  Is there a better way?  

    Using ROV, my task stacks all look ok. Is it possible that what ROV reports is incorrect?

    Dave M.

  • Hwi stack is the same as system stack.

    Using ROV is the right thing to do.  I would trust what ROV is telling you.
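
    For anyone who wants to do the manual check described above (looking for untouched 0xBE fill at the bottom of the stack) in code, here is a minimal sketch in portable C. The function name is hypothetical, and it assumes the stack grows downward, so unused fill bytes remain at the low-address end of the region. It is not the code ROV runs.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define STACK_FILL 0xBEu  /* fill byte seen in the register dump (0xBEBEBEBE) */

    /* Hypothetical helper: estimate the peak number of bytes ever used.
     * 'base' is the lowest address of the stack region, 'size' its length.
     * Assumes a downward-growing stack, so the scan starts at the low end
     * and stops at the first byte that no longer holds the fill value. */
    size_t stack_peak_used(const uint8_t *base, size_t size)
    {
        size_t untouched = 0;

        assert(base != NULL);
        while (untouched < size && base[untouched] == STACK_FILL) {
            untouched++;
        }
        return size - untouched;
    }
    ```

    The estimate can be slightly low if the deepest byte written happens to equal 0xBE. Note also that a check like this only detects overflow from use; it says nothing about another part of the program corrupting the middle of a stack.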

    Judah

  • ROV indicates my task stacks are ok.  I increased my system stack 4x and that still didn't eliminate the problem.  For the system stack I see a peak at 1036 which is much lower than my original size of 4096 and of course the new system stack size of 16k.

    Is it a concern that switching BIOS versions makes this problem come and go without any change to my application code? Or do you think all I'm doing is moving code around and exposing an existing issue (some kind of memory overrun bug in my program)?

  • David,

    It is possible that changing the BIOS version makes this problem come and go only because things get linked at different addresses. In other words, there could be a real problem here that gets masked when you use a different BIOS version.

    I think you should stick with the version that triggers the exception and try to figure out where the exception is occurring. Typically NRP and B3 together give you a clue as to where the exception is happening, but in your case both hold the same bad value. The B3 register is not normally modified directly by the user unless you have some hand-written assembly code. If you do, that is another place to check. It could also be that your stack is not overflowing but that something else is corrupting it, which would explain why B3 is bad.

    Things to try:

    1.  Put a breakpoint on the NMI vector. I think the symbol is 'ti_sysbios_family_c64p_Hwi1'.
    2.  When you get there, look at B15, which is the stack pointer. You should see the bad B3 value [0xC] somewhere near B15.
    3.  This will tell you which stack is being corrupted.
    4.  If hardware probe points are available, you could try probing the address you suspect is getting corrupted. Or you can run your program with breakpoints and monitor the suspected bad address to see if you can catch the problem before it happens.
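
    Step 2 above amounts to scanning the words just above the stack pointer for the bad return address. A minimal sketch of that search in portable C is below. The function name is hypothetical, and on the target you would normally do this by eye in the CCS memory browser rather than in code.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper: search 'nwords' 32-bit words starting at 'sp'
     * for 'suspect' (for example the bad B3 value 0xC). Returns the word
     * offset from sp of the first match, or -1 if the value is not found. */
    long find_word_near_sp(const uint32_t *sp, size_t nwords, uint32_t suspect)
    {
        size_t i;

        assert(sp != NULL);
        for (i = 0; i < nwords; i++) {
            if (sp[i] == suspect) {
                return (long)i;
            }
        }
        return -1;
    }
    ```

    The address of the matching word is the one worth watching with a hardware watchpoint in step 4.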

    Judah

  • Judah,

    I identified that the idle task's stack was being corrupted. Looking at the memory map, I noticed it was close to a buffer that I have to call Cache_inv() on. A little more digging and I realized I hadn't aligned (and sized) my buffer to the cache line size. So when I called Cache_inv(), I invalidated the buffer as intended, but I also lost other valid cached data that shared the same cache lines (e.g. the idle task's stack). This explains why my problem would appear to come and go with different builds.
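
    To illustrate the arithmetic behind this, here is a minimal sketch in portable C that computes how many bytes outside a buffer share cache lines with it, and are therefore at risk when the buffer is invalidated. The function names and the example addresses are hypothetical. The 128-byte line size is my assumption for the C674x L2 cache (L1D lines are 64 bytes); check the value for your configuration.

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define CACHE_LINE 128u  /* assumed L2 line size in bytes; must be a power of two */

    /* Round an address down / up to a cache line boundary. */
    uint32_t line_floor(uint32_t addr) { return addr & ~(CACHE_LINE - 1u); }
    uint32_t line_ceil(uint32_t addr)  { return (addr + CACHE_LINE - 1u) & ~(CACHE_LINE - 1u); }

    /* Hypothetical helper: bytes outside [addr, addr + size) that live in the
     * same cache lines as the buffer. Invalidating the buffer discards any
     * cached, not-yet-written-back data in those neighbouring bytes too. */
    uint32_t collateral_bytes(uint32_t addr, uint32_t size)
    {
        assert(size > 0u);
        return (line_ceil(addr + size) - line_floor(addr)) - size;
    }

    /* A buffer is safe to invalidate only if it owns its cache lines entirely,
     * i.e. both its start address and its size are line-size multiples. */
    int is_invalidate_safe(uint32_t addr, uint32_t size)
    {
        return collateral_bytes(addr, size) == 0u;
    }
    ```

    For example, a 100-byte buffer at the hypothetical address 0x11813F40 spans two 128-byte lines, so 156 neighbouring bytes are exposed. With the TI compiler, placing the buffer with #pragma DATA_ALIGN(buffer, 128) and padding its size to a multiple of 128 is one way to avoid this.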

    Thanks for your help.

    Dave M.