What DSP/BIOS calls result in UTL_halt?

ReinierC

Good day experts,

Can you please advise me which calls in DSP/BIOS cause UTL_halt() to be called? From the DSP/BIOS user guide and API reference, I found the following:

- The default event handlers in the ECM module

- SYS_exit

- SYS_abort

My DSP/BIOS (v5.42.1.9 in CCS 5.4) application crashes somewhere, and the only information available from the call stack is that UTL_halt was called. SYS_abort is called explicitly in my application, but the breakpoint I set on that call is never reached. I do not know whether SYS_exit is somehow called indirectly. I am using the ECM module through the BIOSPSP drivers, but none of the other ECM events are enabled, so I'm not expecting UTL_halt to be called from there.

Do you perhaps have any other suggestions?

Regards

Reinier

over 12 years ago

0 Steven Connell over 12 years ago

TI__Mastermind 45025 points

Hi Reinier,

Which h/w platform are you using? Which processor?

It may well be called from within BIOS because some critical error has happened. You should put a break point at UTL_halt and then change the PC to the value of the return address to see where it's being called from. The return address should be stored in a register: on C6x cores it's register B3, on ARM it's the LR. But I don't know your h/w, so I can't tell you for sure.

Once you get to the return address, set a break point there and re-run. Then, once that hits, repeat this trick of setting break points at the return address and this will allow you to trace backwards and hopefully trace where the error's coming from.

You could also try checking the args passed to UTL_halt to see if it was an event number.

From the DSP/BIOS 5.x user's guide ECM section:

"run UTL_halt (which runs an infinite loop with all processor interrupts disabled) and pass their event number as an argument."

Steve

0 ReinierC over 12 years ago in reply to Steven Connell

Expert 2425 points

Hi Steve,

I am using the C6748 DSP on our own custom hardware platform.

Thank you for the tip, but can you please just provide me with some more information to achieve this? UTL_halt is a BIOS function, so I looked in the .map file and the address I found is as follows:

80007040 _UTL_halt

Is this the address I should set a breakpoint at? Once the breakpoint is reached, should I then copy the value of register B3 to register PC (I believe these registers are located in the debug perspective of CCS5 under "Core Registers")?

I'm assuming if I step through the code from this point onwards, it should expose the calling function of UTL_halt?

Regards

Reinier

0 Steven Connell over 12 years ago in reply to ReinierC

TI__Mastermind 45025 points

Reinier,

ReinierC said:

80007040 _UTL_halt

Is this the address I should set a breakpoint at?

Yes. The easiest way is to use the CCS break point manager. You can add a new break point there by entering the symbol name, UTL_halt.

ReinierC said:
Once the breakpoint is reached, should I then copy the value of register B3 to register PC (I believe these registers are located in the debug perspective of CCS5 under "Core Registers")?

B3 and other registers are shown in the Core Registers view. But you don't want to copy B3 to the PC. You just want to use B3 to get back to where you came from (i.e. to see where the call to UTL_halt was made) so that you can set a break point there. You can enter "B3" into the Disassembly window (view -> disassembly) to take you to that location.

Once there, you can scroll up in the Disassembly window to see which function you are in (it may be SYS_abort or whichever SYS_* function called UTL_halt). Then you can set a break point at that location from within the Disassembly window itself, or use the same method described above to set it.

Next, reload/rerun and see if you hit that break point. If you hit the "B3 break point" you set before, you can then repeat the same trick, using B3 to get back to where *that* function was called from (as B3 is always the return address of the current function you are in), set a break point, reload/rerun.

Steve

0 ReinierC over 12 years ago in reply to Steven Connell

Expert 2425 points

Hi Steve,

Thank you for the additional information. I am currently in the process of tracing the offending call. Unfortunately it takes several hours to reproduce the bug, so I have not made a lot of progress yet. However, I did manage to trace it back to SYS_abort, so now I have to figure out where SYS_abort is called from, using the method you described above. I will give you feedback when I have more information.

Regards

Reinier

0 ReinierC over 12 years ago in reply to Steven Connell

Expert 2425 points

Hi Steve,

I have managed to trace the bug a little further:

1) UTL_halt is called from UTL_doAbort

2) UTL_doAbort is called from SYS_abort

3) SYS_abort is called from the EXC_exceptionHandler

4) EXC_exceptionHandler is called from EXC_dispatch

Finally it appears as if the exception dispatcher is called from MEM_alloc. I have attached a screenshot of where the value of the B3 register took me in the disassembly window, which appears to be somewhere in the MEM_alloc function:

MEM_alloc is obviously called multiple times throughout the program execution, so it won't be of much help to put a breakpoint there. Do you perhaps know which kinds of inputs to MEM_alloc would cause an exception, so that I can check for dodgy input parameters?

Regards

Reinier

0 Steven Connell over 12 years ago in reply to ReinierC

TI__Mastermind 45025 points

Hi Reinier,

This is good, you're definitely making progress here!

There should be some information from the EXC module that was written out. I think you can see this in ROV, under "LOG". You should check the "LOG_system" log. Does it show any info on the exception?

If there's not enough information in the LOG, we may need to define some hook functions that the EXC module will call when the exception happens. Hopefully those will allow you extract more info about the problem.

You may want to have a look at Appendix C of the DSP/BIOS API Guide (spru403s.pdf, it's in your BIOS installation).

Steve

0 ReinierC over 12 years ago in reply to Steven Connell

Expert 2425 points

Steve,

I did not see any logs in LOG_system in ROV. However, I think I probably caught the exception before the logs were made. I can definitely verify that the exception results from MEM_alloc(). I suspect this should be something rather serious, because if MEM_alloc() could not allocate the memory, it would typically only return a NULL pointer, and we're checking for that in any case.

From the DSP/BIOS API:

"If the memory request cannot be satisfied, MEM_alloc calls SYS_error with SYS_EALLOC and returns MEM_ILLEGAL."

In my disassembly window I also ended up in the lock2() function and from searching the forums I found that the implementation of the function is:

/*
 *  ======== lock2 ========
 */
static Void lock2(Void)
{
    if (TSK_isTSK()) {
        LCK_pend(&_MEM_mutex, SYS_FOREVER);
    }
    else {
        SYS_abort("*** MEM lock NOT CALLED IN TSK CONTEXT");
    }
}

So I am also assuming the UTL_halt() results from this SYS_abort() call. However, I do not understand how MEM_alloc() could be called outside a TSK context, since we are definitely not calling it from any HWIs or SWIs, which is the restriction described in the DSP/BIOS user manual. Do you perhaps have any insights here?

Furthermore, I read through Appendix C of the DSP/BIOS API and although it was useful information, it does not really give an example or some clear instructions on how to configure hook functions. I had a look at the source code (exc.c and exc_asm.s64P), but how should it be used? Should I add modified versions of these source files to my project? What exactly should be modified? Do you perhaps have some documentation with examples?

In any case, would it be useful to spend the time to figure out exceptions, if I already know that MEM_alloc() is the culprit?

Regards

Reinier

0 ReinierC over 12 years ago in reply to Steven Connell

Expert 2425 points

Steve,

I called TSK_isTSK() just before calling MEM_alloc() and inserted a breakpoint for the case where TSK_isTSK() returns zero. It actually hit the breakpoint after a whole day of running the system. As expected, the calling function was not a HWI or a SWI, so from the macro in tsk.h I guess the current task must then be equal to the KNL_dummy task?

What does this mean? Is the task stack corrupted somehow?

0 Steven Connell over 12 years ago in reply to ReinierC

TI__Mastermind 45025 points

Hi Reinier,

I still believe that memory is being corrupted (stomped on) by some part of the application. One possibility is that your application's heap is being stomped on.

The BIOS MEM module (which handles the heap) contains a linked list of free memory blocks. When a call to MEM_alloc is made, it traverses this linked list in order to find the first free block that's large enough to satisfy the allocation.

But, if say one of the 'next' pointers in that free list has been overwritten, that pointer would then be invalid and trying to access it would cause an exception. So what you have to do here is figure out who/where/what is overwriting memory that it shouldn't.
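To make the failure mode concrete, here is a minimal host-side sketch in C of walking a free list and sanity-checking each node before following it. The FreeBlock layout, the result codes and check_free_list are all hypothetical names invented for illustration; the real DSP/BIOS MEM header layout is not shown in this thread and may well differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free-block header. This is an assumption for
 * illustration only, not the actual DSP/BIOS MEM layout. */
typedef struct FreeBlock {
    struct FreeBlock *next;  /* next free block, or NULL at the end */
    size_t size;             /* size of this free block in bytes    */
} FreeBlock;

enum { HEAP_OK = 0, HEAP_BAD_POINTER = 1, HEAP_BAD_SIZE = 2, HEAP_LOOP = 3 };

/* Walk the free list and validate every node BEFORE dereferencing it,
 * so a stomped 'next' pointer is reported instead of being followed. */
int check_free_list(const FreeBlock *head, const void *heap_base,
                    size_t heap_size)
{
    assert(heap_base != NULL && heap_size >= sizeof(FreeBlock));

    const uintptr_t lo = (uintptr_t)heap_base;
    const uintptr_t hi = lo + heap_size;
    /* A sane list cannot hold more nodes than fit in the heap. */
    const size_t max_blocks = heap_size / sizeof(FreeBlock);
    size_t visited = 0;

    for (const FreeBlock *blk = head; blk != NULL; blk = blk->next) {
        const uintptr_t addr = (uintptr_t)blk;

        /* The node must lie inside the heap and be pointer-aligned. */
        if (addr < lo || addr > hi - sizeof(FreeBlock) ||
            addr % sizeof(void *) != 0) {
            return HEAP_BAD_POINTER;
        }
        /* The block must hold its own header and end inside the heap. */
        if (blk->size < sizeof(FreeBlock) || blk->size > hi - addr) {
            return HEAP_BAD_SIZE;
        }
        /* More nodes than could possibly fit means the list loops. */
        if (++visited > max_blocks) {
            return HEAP_LOOP;
        }
    }
    return HEAP_OK;
}
```

An allocator that skips these checks simply loads through the overwritten pointer, which is where an exception like the one seen inside MEM_alloc would come from.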

To start, let's see what ROV tells us for the MEM module when the failure occurs. The MEM module's ROV view shows you how much heap you have used and how much is free. It also shows a graphical representation of the linked list of free blocks I described above.

I think you should see a "red box" in the ROV MEM module when the failure happens. If you hover the mouse on that box, it should display an error message.

You can also try using the MEM_stat API to help you solve this. I expect that the MEM_stat function should fail in the same way, but without the memory allocation. MEM_stat just gives you a status of the heap. I think calling MEM_stat will allow you to test if the problem has happened yet.

So what you can do is sprinkle some MEM_stat calls around your application. I'm hoping that you'll hit the error on one of the MEM_stat calls. Hopefully this would help you to get a better idea of the point at which the problem begins in your application.

Assuming you are able to hit the problem via a MEM_stat at "such and such point" in your app, you can then use break points to stop the app at "such and such point" and inspect the state of your app at that point. What's going on at that point in the app? Maybe some operation is about to occur (or just occurred) that is going to stomp on the memory. Hopefully this strategy could help point you to the culprit.
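One way to organise the checkpoints described above is sketched below in plain C. Everything here is an assumption made for illustration: heap_checkpoint, HeapProbe, the two globals and the two placeholder probes are hypothetical names, and on the target the probe would be a thin wrapper that calls MEM_stat on the heap segment of interest (check the API reference for its exact signature). The label is recorded before the probe runs, so if the probe itself ends in UTL_halt, the global still names the checkpoint that failed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe type: returns nonzero if the heap looks healthy.
 * On the target this would wrap a MEM_stat call (assumption). */
typedef int (*HeapProbe)(void);

/* Inspect these from the debugger after a halt. */
const char * volatile g_last_started = NULL;  /* checkpoint being probed */
const char * volatile g_last_passed  = NULL;  /* last one that returned OK */

/* Record the label first, then run the probe. If the probe never
 * returns, g_last_started identifies where the corruption was seen. */
int heap_checkpoint(const char *label, HeapProbe probe)
{
    assert(label != NULL && probe != NULL);

    g_last_started = label;
    if (!probe()) {
        return 0;              /* probe reported a problem */
    }
    g_last_passed = label;
    return 1;
}

/* Placeholder probes for trying the sketch on a host machine. */
int probe_always_ok(void)   { return 1; }
int probe_always_fail(void) { return 0; }
```

The corruption then happened somewhere between the checkpoint named by g_last_passed and the one named by g_last_started, which narrows down where to set break points on the next run.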

Steve

0 ReinierC over 12 years ago in reply to Steven Connell

Expert 2425 points

Hi Steve,

I guess this response is probably more for future C6748 users coming across this problem.

I found the cause for this memory corruption to be somehow related to the use of the DSPF_sp_mat_mul_cplx() hand-written assembler function in the C6748 DSPLIB 3.1.0.0. I am aware that there are a number of restrictions applicable to using this function, but I ensured that these requirements were not violated, as specified at this link:

http://processors.wiki.ti.com/index.php/C674x_DSPLIB#DSPF_sp_mat_mul_cplx_.28Complex_Matrix_Multiply.29

The only possibility is that the arrays were not double word aligned.

In any case, I decided that the natural C version is optimized well enough by the compiler, so now I'm just using it instead of the assembler version and that ultimately solved my problem.
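For anyone who wants to rule out the alignment theory before giving up on the assembler version, a run-time check is cheap. The sketch below is only an illustration: is_dword_aligned is a hypothetical helper, not part of DSPLIB or DSP/BIOS, and it assumes "double word" means 8 bytes, as it does on C6000 devices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if ptr sits on an 8-byte
 * (double-word) boundary, 0 otherwise. */
int is_dword_aligned(const void *ptr)
{
    return ((uintptr_t)ptr & (uintptr_t)7u) == 0u;
}
```

Calling it on each matrix pointer just before the multiply (for example inside an assert) would show whether a buffer handed out by the heap or placed by the linker ever violates the alignment requirement.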
