RTOS/TM4C129CNCZAD: ti.sysbios.knl.Task E_stackOverflow references a task that doesn't exist

Part Number: TM4C129CNCZAD
Other Parts Discussed in Thread: SYSBIOS

Tool/software: TI-RTOS

I'm running into an issue with one particular build configuration that has 5 tasks: I get the error mentioned in the subject. One of the tasks pends on a blocking `UART_read` (so it's waiting on a semaphore inside the driver). While it is pending, I send one character to that UART, and the processor/RTOS crashes with this stack overflow error. What's more, the error it reports is

Task 0x2001e7d8 stack overflow.

This address doesn't correspond to any of my tasks. After the crash, with the debugger connected, none of the tasks in ROV look even close to overflowing their stacks. The weird thing is that if I set `BIOS.heapSize = 2048` (instead of leaving it unspecified so it defaults to 4096), the crash doesn't happen. Any idea how to debug this?


In the map file, that memory location looks like the spot where SYSBIOS stores some of its task info:

                  2001e7c0    00000080     UnitTest_pem4f.oem4f (.data:ti_sysbios_knl_Task_Module_State_0_readyQ__A)

  • Hi Kaveh,

    In your version, ROV is not the best at showing Tasks that are constructed (instead of created...refer to this for more details on the difference: ).

    Did you call Task_construct? If you did, I expect it is the culprit.

    If not, place a breakpoint in xdc_runtime_Error_raiseX__E. When it is hit, look at the backtrace to see who caused it (the new or the old task). For example, I deliberately corrupted the top of the running task's stack. On the next context switch, I hit the breakpoint. I selected the checkStacks function, and it showed that the return location was after the old-task Error_raise (which made sense, because that was the task I corrupted). Since it was the old task, its entry function (task1Fxn) appears in the backtrace.

  • I'm not using Task_construct as I'm using the cfg file to create the tasks I use (though perhaps the generated C code does?).

    When I set a breakpoint at the place you indicated with the red arrow, it is never hit.

    Furthermore, if I set a breakpoint right before the `Semaphore_pend` in the driver and then continue, the crash never happens. It does still crash if I add a `Task_sleep(1000)` before this pend.

  • The generated code calls neither Task_create nor Task_construct. It just defines the structures and updates the arrays as needed. Statically created tasks always show up in ROV.

    When adding the breakpoint, I add "xdc_runtime_Error_raiseX__E" into the disassembly window and set a breakpoint at the first instruction. Can you try that and see what happens? Make sure you have it so the stack overflow case is occurring (since it sounds like the code has been changed a bit).
  • If I set a breakpoint on the first instruction of `xdc_runtime_Error_raiseX__E`, the stack trace looks like this (the breakpoint is hit immediately after a System Reset followed by Run):

  • Click on the ti_sysbios_knl_Task_checkStacks__E line. If the Task.c file does not get opened, navigate to the directory to have CCS find it.

    Which Error_raise is being called (e.g. on my pic above, I see it is the old task one)?
  • It looks like it's probably the second one.

  • Can you see what the oldTask was? Where is its stack? Does it have 0xbebebebe at the top of the stack? You should be able to debug it from here.

    FYI, I'd also check the System stack for overflow (ROV->Hwi->Module).
  • Oops...I meant to say newTask.
  • newTask matches the address given in the console for the stack overflow (it's now 0x2001D968, since I changed something in the cfg file to use SysStd instead of SysMin). oldTask appears to be the last task shown in ROV (the idle task). The memory at that location is shown in the image below.

    I checked the system stack earlier as well.  It wasn't anywhere near its limit (it's at about 20%).

  • The readyQ is an array (indexed by priority) of the tasks that are ready to run. Most entries look empty (both the prev and next pointers point back to the entry itself). I expect your application is corrupting something in the kernel, and the scheduler thinks this is a Task_Object (which it is not). That would explain why the problem disappears when you change the heap size: the corruption might then land somewhere less important.

    Things to look at/try (based on experience):
    - You need to confirm you are not using the driverlib interrupt module to plug ISRs.
    - Look at HeapTrack to find allocated buffers that are being overwritten. More details here: training.ti.com/debugging-common-application-issues-ti-rtos

    I'm not sure I can give you any more pointers.

    Todd
  • We only have the heap enabled for some BIOS-related tools; we don't use dynamic allocation anywhere in our own code. HeapTrack isn't showing any overrun buffers. It does show some heap being requested, though, and the task handle matches the task the stack overflow is complaining about, but HeapTrack says that allocation is orphaned. Breaking before `BIOS_start()` in main seems to indicate that this is being allocated by BIOS somewhere.

    If I set a breakpoint on the heap location that would be written to by this allocation, it breaks in memset() at memset_t2.asm with no other frame information.

    I'm also pretty certain we don't allow driverlib to plug ISRs, we define them all in the cfg file instead.

  • An orphaned block means that the task that allocated it no longer exists. Did you call Task_delete on a task? If so, was it statically created? Did you let a task fall out of its entry function?
  • Nope, I leave all the creation/management to the cfg file. There are other applications where I have let a task fall out of its function when it was no longer needed (with no bad effects), but in this application all tasks remain running and show as such in ROV when the error happens.

    As I said, I'm quite perplexed by this issue. I'd be happy to hop onto a webex or something and show you live if you think that would be helpful.

    Thanks for all the help, I really appreciate how responsive you've been.
  • Still working this off-line...
  • It turns out the application was corrupting memory. Unfortunately, some of what was being corrupted were tables in the kernel. Once the application code was corrected, everything works.

    One thing to note: we wasted time pursuing some odd information in the Hwi view in ROV. The Hwi_Structs in the TI-RTOS driver structures are not shown properly in ROV. With this version that is expected; it has been corrected in later versions of XDCtools and SYS/BIOS. Note: these newer versions cannot be used with TI-RTOS for TivaC.

    Todd