I've been at this for some time now, and I think it's time to pull in the big guns here at e2e.
I'm trying to debug a seemingly random crash in my C5510 DSP/BIOS application. The result of the crash is usually that the DSP ends up in a random location in memory, though sometimes, the DSP ends up at its reset vector and starts running from the top of the application. I've verified that the cause isn't a stack overflow (through identification of the watermark in the various stacks/sysstacks and by setting watchpoints at the top of each stack), though the possibility of stack corruption still remains. The problem is I don't know how to track this down using the tools at my disposal (DSP/BIOS and CCSv5).
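To be concrete about the watermark check, here is a minimal host-side sketch of the idea, not the DSP/BIOS implementation. The fill value `STACK_FILL` and the function names are placeholders I made up for illustration, and the stack is assumed to grow toward index 0 of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fill pattern; not necessarily the value DSP/BIOS uses. */
#define STACK_FILL 0xBEEFu

/* Paint the whole stack buffer with the pattern before the task runs. */
static void stack_paint(uint16_t *stack, size_t words)
{
    assert(stack != NULL);
    for (size_t i = 0; i < words; i++) {
        stack[i] = STACK_FILL;
    }
}

/*
 * Count how many words, starting from the end the stack grows toward
 * (index 0 here), still hold the pattern. Zero means the stack has been
 * used all the way to its limit, i.e. a likely overflow.
 */
static size_t stack_unused_words(const uint16_t *stack, size_t words)
{
    assert(stack != NULL);
    size_t n = 0;
    while (n < words && stack[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

Note what this does and does not catch: a stray write into the used part of the stack leaves the unused words intact, so the watermark still looks healthy. That is why this check rules out overflow but not corruption.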
I've examined the document at http://processors.wiki.ti.com/index.php?title=DSP_BIOS_Debugging_Tips . It suggests using the kernel object viewer (or what I believe to be the ROV in CCSv4/v5), but I've been unable to get this to start. It occurs to me that maybe this is an RTSC tool and thus unavailable to me in a DSP/BIOS application. Is this true? The error message I get is below.

In my application, I have 1 TSK (in addition to the idle TSK), and all other processing is HWI/SWI based (3 HWIs, 2 SWIs). In other words, the application lives and dies by a HWI generated by a TIMER object which sets off processing of ADC samples read in through a McBSP by the single TSK. The processed data is sent up through HPI to a small interface processor responsible for gating data between 2 DSPs and USB. The interface processor is sending periodic pings to the DSPs to which the DSPs respond. The interface processor is also sending tasking messages to the DSP to tell it how to process the data samples. Each message starts with an interrupt generated by the interface processor, which triggers a HWI in the DSP. The HWI handler then posts a SWI that does the actual handling of the message.
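The HWI-to-SWI handoff for tasking messages can be modeled roughly as below. This is a simplified host-side sketch, not the application code: `msg_queue_t`, `hwi_on_message`, `swi_drain`, and the queue depth are all hypothetical names and values, and the `swi_pending` flag stands in for the real SWI post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSG_QUEUE_LEN 8u  /* hypothetical depth; must be a power of two */

typedef struct {
    uint16_t slots[MSG_QUEUE_LEN];
    volatile uint16_t head;     /* written only by the HWI (producer) */
    volatile uint16_t tail;     /* written only by the SWI (consumer) */
    volatile bool swi_pending;  /* stand-in for "SWI has been posted" */
} msg_queue_t;

/* HWI side: queue the message and flag the SWI. Returns false if full. */
static bool hwi_on_message(msg_queue_t *q, uint16_t msg)
{
    assert(q != NULL);
    if ((uint16_t)(q->head - q->tail) >= MSG_QUEUE_LEN) {
        return false;  /* queue full: the message is dropped, not overwritten */
    }
    q->slots[q->head % MSG_QUEUE_LEN] = msg;
    q->head++;
    q->swi_pending = true;
    return true;
}

/* SWI side: copy up to max queued messages into out; returns the count. */
static unsigned swi_drain(msg_queue_t *q, uint16_t *out, unsigned max)
{
    assert(q != NULL && out != NULL);
    unsigned handled = 0;
    q->swi_pending = false;
    while (q->tail != q->head && handled < max) {
        out[handled++] = q->slots[q->tail % MSG_QUEUE_LEN];
        q->tail++;
    }
    if (q->tail != q->head) {
        q->swi_pending = true;  /* messages left over: needs another pass */
    }
    return handled;
}
```

In this model each index has exactly one writer, so the HWI can preempt the SWI at any point without the two sides overwriting each other's state. Any shared state that does not follow that rule is a candidate for the kind of timing-dependent failure described in this post.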
As mentioned previously, there's no way (that I know of) to detect within the DSP that the application has "crashed". The way I've detected it to this point is to breakpoint in the interface processor when a DSP stops responding to its pings. The problem is that at this point, it's too late: the DSP has already crashed. There's no stack trace information, and the only information I do have is the Raw RTA logs and various registers.
I call the crash "random" because it always takes a different amount of time to happen (even with the same requested tasking of the DSP). Sometimes the application will run for 10-15 minutes, and other times it can take over an hour. This leads me to believe that the problem is timing based; in other words, that it depends on the timing of the HWIs from the interface processor relative to the TIMER-generated HWIs.
If I can provide more information which will enable someone to help me track this down, please let me know. Thanks in advance for your help. In summary, my main questions are:
- Am I able to use ROV with DSP/BIOS?
- Is there some way the RTA tools might be beneficial? Is there a wiki or a good documentation resource?
- Is there a way to detect the DSP "crashing" other than waiting for it to stop responding?
- How would I go about getting more information regarding the stack trace?
- How would I detect stack corruption (instead of stack overflow)?