TMS320F28075: Illegal Instruction / Corrupted Stack Trace

Part Number: TMS320F28075

My application uses TI-RTOS with 5 tasks.  I'm intermittently getting an Illegal Instruction trap at a certain place in my code.  What is puzzling is that the stack trace appears corrupted.

illegalOpIsr is my ISR (as a note for anyone trying to figure out similar issues: you have to register it with Hwi_plug(), not Hwi_create()).  The assembly in RunBluetoothTask() where the ISR appears to be called from is completely valid.  If I return from the ISR, I can step through subsequent lines of code.  However, the next task switch (when Event_pend() is called) doesn't work properly.

I have looked through ROV at my stack usage using the initStackFlag and checkStackFlag parameters in the cfg file.  I'm OK there.  I think I'm trashing the RTOS internal data by writing past the end of a buffer, but need a better sense of how to find what particular data is getting overwritten.  What data contains the base of the call stack?  It's obviously incorrect.  Any other tips for debugging this kind of problem?
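As background on what the stack check above can and cannot show, here is a minimal host-side sketch of how a fill-pattern stack check works in general. It is not the TI-RTOS implementation: the names `stack_fill` and `stack_unused_words` are hypothetical, and the fill value 0xBEBE is an assumption based on the documented 0xBE fill byte and the 16-bit word size of the C28x.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed fill pattern: 0xBE repeated in a 16-bit word (hypothetical here). */
#define STACK_FILL 0xBEBEu

/* Paint the whole stack region with the pattern, as an init-stack option
 * would do once when the task is created. */
void stack_fill(uint16_t *stack, size_t words)
{
    for (size_t i = 0; i < words; ++i) {
        stack[i] = STACK_FILL;
    }
}

/* Count the words at the far end of the stack that still hold the pattern.
 * grows_up != 0: the stack grows toward higher addresses (the C28x case),
 * so the untouched words sit at the top of the region.
 * grows_up == 0: the stack grows downward, so they sit at the bottom. */
size_t stack_unused_words(const uint16_t *stack, size_t words, int grows_up)
{
    size_t unused = 0;

    if (grows_up) {
        for (size_t i = words; i > 0 && stack[i - 1] == STACK_FILL; --i) {
            ++unused;
        }
    } else {
        for (size_t i = 0; i < words && stack[i] == STACK_FILL; ++i) {
            ++unused;
        }
    }
    return unused;
}
```

A check of this kind only measures how deep the stack has ever grown. A stray write that lands inside the already-used part of the stack (a saved return address, for example) leaves the untouched region intact, so the check can pass even though the stack contents are corrupted.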

Looking at the registers, the RPC is 0x000015 as shown in the call-stack.  Again, I'm not sure how that's getting set or what might be corrupting it, but it's clearly not correct.  As I step through Event_pend(), the Task_self() function returns a value that is an invalid address.

  • Hi Rob,

    Can you do a ROV->BIOS->Scan for Errors? This checks all the modules to see if anything is in an unexpected state. Are you using any interrupts not managed by the kernel? For example, do you use the interrupt keyword without telling the kernel about it? The kernel supports this, but it must be told not to touch the interrupt, and the interrupt cannot use any kernel APIs.

    Todd
  • Todd,

    ROV->BIOS->Scan for Errors? Yes - I did that and there were no errors.

    The only interrupt not managed by the kernel is the illegal operation (int 19). I added that after getting the HWI interrupt unplugged function call on interrupt 19. So, the problem was already there. The only thing that handler does is increment a counter which gives me a place to set a breakpoint.

    All other interrupts are created using Hwi_create().

    Rob
  • Hi Rob,
    Can you try to narrow down the problem some more? Maybe get rid of some of the tasks, or limit the functions they call. Are you calling any Flash initialization functions? If so, they must be called from RAM.
    Best regards,
    Janet
  • Janet,

    A number of the tasks need to work together and would require creating large stub functions to generate fake data so the rest of the system would work. However, I was able to do that to some extent and may have isolated the particular area of a particular task that is causing the problem. This is a little difficult because sometimes the problem shows up in minutes, other times it takes hours to see the crash.

    I was hoping for something a little more surgical - if I knew what specific data items within the BIOS code controlled the task being called, I could set a watchpoint for that to be written, then find out who was writing it and from where. It sounds like that is not doable. Correct?

    Rob
  • Hi Rob,

    Can you put a watchpoint on the end of the buffer you think you might be overwriting?  You could also try adding a Task switch hook function where you can check if your task is switched to the running state.  Then you would know which task was running previously.
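The two suggestions above can be combined in software. The sketch below is plain C with hypothetical names (`on_task_switch`, `switch_log`, `suspect`, `GUARD_WORD`): a guard word sits directly after the suspect buffer, and a function meant to be called from a task switch hook records the previous and next task together with whether the guard is still intact. In TI-RTOS the real hook would take `Task_Handle prev, Task_Handle next` and be registered through the Task hook set in the cfg file; opaque pointers stand in for the handles here so the sketch builds on a host.

```c
#include <stddef.h>
#include <stdint.h>

#define SWITCH_LOG_LEN 16u
#define GUARD_WORD     0xC0DEu   /* arbitrary, hypothetical guard value */

/* One entry per task switch: who was running, who runs next, and whether
 * the guard word was still intact at that moment. */
typedef struct {
    const void *prev;
    const void *next;
    int         guard_ok;
} SwitchRecord;

/* Suspect buffer with a guard word placed immediately after it. */
typedef struct {
    uint16_t data[32];
    uint16_t guard;
} GuardedBuffer;

SwitchRecord  switch_log[SWITCH_LOG_LEN];
unsigned      switch_log_count = 0;
GuardedBuffer suspect = { {0}, GUARD_WORD };

/* Intended to be called from the task switch hook. It stores the record in
 * a ring buffer so the last SWITCH_LOG_LEN switches can be inspected. */
void on_task_switch(const void *prev, const void *next)
{
    SwitchRecord *r = &switch_log[switch_log_count % SWITCH_LOG_LEN];

    r->prev     = prev;
    r->next     = next;
    r->guard_ok = (suspect.guard == GUARD_WORD);
    ++switch_log_count;
}
```

The first record with `guard_ok == 0` names, in its `prev` field, the task that was running when the guard was overwritten. Setting a breakpoint on that condition narrows the search to a single task and time slice.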

    Best regards,

    Janet

  • Rob,

    Did this get resolved?

    Todd

    [Updated 7/6: Marking this as TI Thinks Resolved due to no responses from original poster]

  • I ended up doing a number of the items listed above:

    1. The ROV did end up at times showing that I needed to increase the stack size, but that did not seem to be the main problem.
    2. There was a limit to what tasks I could comment out because of the nature of the program. I was able to generate fake / canned data from the thread which seemed to be the problem. When I did that, the crash seemed to go away. Since the crash was intermittent, it was hard to say conclusively, but it seemed that way.
    3. I re-enabled the "problem" thread, but put bounds detection around the array accesses - that alone made the crash much less frequent and the bounds check never showed a problem. That was a huge puzzle as to why the problem would go away from just trying to detect a problem. Code got moved?
    4. After adding bounds detection, I did see a crash in a different part of the same thread - there was a function that populated an array which was a local variable in the calling function. I hadn't checked that particular section of code because it's not what gets called most of the time. It made sense of the symptoms - the stack was getting corrupted. My main arrays are file scope variables (are not on the stack) and I had already moved them through the linker cmd file. However, a local variable is a different matter. I was not bounds checking that access. I added bounds limiting and have not seen the problem since.
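The kind of bounds limiting described in item 4 can be sketched as follows. This is an illustration, not the code from the application: `fill_samples` and its parameters are hypothetical. The point is that the function filling the caller's local array is told the array's capacity and never writes past it, so the caller's stack frame stays intact.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy at most dst_len words from src into dst and return the number of
 * words actually written. Any excess source data is dropped instead of
 * being written past the end of the caller's array. */
size_t fill_samples(uint16_t *dst, size_t dst_len,
                    const uint16_t *src, size_t src_len)
{
    size_t n = (src_len < dst_len) ? src_len : dst_len;

    for (size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
    return n;
}
```

A caller with a local `uint16_t samples[4]` would pass `sizeof samples / sizeof samples[0]` as `dst_len` and compare the return value against the source length to detect truncation. Without the limit, the extra words would land on whatever follows the array on the stack, which matches the corrupted return address seen in this thread.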

    Because of the intermittent nature, it's hard to say definitively that it's fixed, but I think it is. The big takeaway for someone looking at this kind of problem is that if you've moved your main file-scope arrays away from the region of your stack and you know your stacks are big enough, the problem is very likely that you are corrupting a local array since those live on the stack anyway.

    Thanks for the help!
  • Thanks for the update!