
Codec Engine shutdown

Hello,

My Codec Engine iUniversal ARM application does not appear to shut down properly.

Here is the code for ending the application:

/* teardown the codec */
cerr << "Tearing down codec" << endl;
UNIVERSAL_delete(hUniversal);

/* close the engine */
cerr << "Closing engine" << endl;
Engine_close(ce);

/* free buffers */
cerr << "Freeing memory for input/output/inOut buffers" << endl;
Memory_free(inBuf, inBufsSize, &allocParams);
Memory_free(outBuf, outBufsSize, &allocParams);
Memory_free(inOutBuf, inOutBufsSize, &allocParams);

/* exit Codec Engine Runtime */
cerr << "Exiting Codec Engine Runtime" << endl;
CERuntime_exit();

And this is the output for CE_DEBUG=2 (last few lines):

[...]
[DSP] @0,266,951tk: [+5 T:0x87d7459c] CN - NODE> returned from call(algHandle=0x87d74108, msg=0x87305880); messageId=0x000266b7
@1,036,346us: [+0 T:0x41cdf460] CE - Engine_fwriteTrace> returning count [2751]
@1,036,407us: [+0 T:0x41cdf460] CV - VISA_call Completed: messageId=0x000266b7, command=0x0, return(status=0)
@1,036,499us: [+5 T:0x41cdf460] CV - VISA_freeMsg(0x12d30, 0x41440880): Freeing message with messageId=0x000266b7
@1,036,590us: [+0 T:0x41cdf460] ti.sdo.ce.universal.UNIVERSAL - UNIVERSAL_process> Exit (handle=0x12d30, retVal=0x0)
Tearing down codec
@1,055,145us: [+0 T:0x41cdf460] ti.sdo.ce.universal.UNIVERSAL - UNIVERSAL_delete> Enter (handle=0x12d30)
@1,055,236us: [+0 T:0x41cdf460] CV - VISA_delete(0x12d30)
@1,055,267us: [+5 T:0x41cdf460] CV - VISA_delete> deleting codec (localQueue=0x10001, remoteQueue=0x2)
@1,055,328us: [+0 T:0x41cdf460] CE - Engine_ctrlNode(0x12d78, 0x12d68, 0x0)
@1,056,396us: [+0 T:0x4001e900] OP - doCmd> Enter (cmdId=3, proc=0x0)
@1,056,518us: [+0 T:0x4001e900] ti.sdo.ce.osal.Sem - Entered Sem_post> sem[0x124f0]
@1,056,610us: [+0 T:0x40bef460] ti.sdo.ce.osal.Sem - Leaving Sem_pend> sem[0x124f0] status[0]
@1,056,671us: [+0 T:0x40bef460] OP - getCmd_d> Exit (result=3)
@1,056,701us: [+0 T:0x40bef460] ti.sdo.ce.osal.Sem - Entered Sem_post> sem[0x12508]
@1,056,762us: [+0 T:0x40bef460] ti.sdo.ce.osal.Sem - Leaving Sem_post> sem[0x12508]
@1,057,464us: [+0 T:0x4001e900] ti.sdo.ce.osal.Sem - Leaving Sem_post> sem[0x124f0]
@1,057,556us: [+0 T:0x4001e900] ti.sdo.ce.osal.Sem - Entered Sem_pend> sem[0x12508] timeout[0xffffffff]
@1,057,617us: [+0 T:0x4001e900] ti.sdo.ce.osal.Sem - Leaving Sem_pend> sem[0x12508] status[0]
@1,057,647us: [+0 T:0x4001e900] OP - doCmd> Exit (result=1)
@1,057,708us: [+0 T:0x4001e900] OT - Thread_delete> Enter (task=0x12540)
@1,057,800us: [+4 T:0x4001e900] OT - Thread_delete> pthread_cancel (0x3)
@1,057,891us: [+4 T:0x4001e900] OT - Thread_delete> pthread_join (0x0)
@1,057,952us: [+0 T:0x4001e900] OT - Thread_delete> Exit (task=0x12540)
@1,058,013us: [+0 T:0x4001e900] ti.sdo.ce.osal.Sem - Entered Sem_delete> sem[0x124f0]
@1,058,074us: [+0 T:0x4001e900] ti.sdo.ce.osal.Sem - Leaving Sem_delete>
@1,058,135us: [+0 T:0x4001e900] ti.sdo.ce.osal.Sem - Entered Sem_delete> sem[0x12508]
@1,058,166us: [+0 T:0x4001e900] ti.sdo.ce.osal.Sem - Leaving Sem_delete>
[root@xxx ~]#

So the program seems to exit before Engine_close(ce) is even called. What could be my mistake?

Actually, I'm running the ARM application from a batch script (written in GNU Octave) over SSH. It needs to run hundreds of times with various input signals on stdin for a test. However, after a number of runs it hangs at the very same place (after the last line of the CE_DEBUG output above). Is there anything I should be careful about when running a CE application many times from a batch script?

Software versions:
DSP Link 1.65.00.03, CE 2.26.02.11, DSP/BIOS 5.41.10.36

Thank you for your support.

  • Someone is calling exit(), which calls the atexit()-registered functions (or, in this case, just one of them?).

    We can see the call to Engine_ctrlNode() from VISA_delete(), and Engine_ctrlNode() sends a message to the NODE on the DSP and then blocks waiting for a reply from the DSP.  The very next thing we see is the start of CE's Processor module cleanup() function, which has been registered with atexit().  The remaining trace is just the Processor cleanup(), then the shell prompt.  What we don't see here is the rest of CE's cleanup via atexit()-registered functions.  The Engine module also has one, but we don't see that, which I'm at a loss to explain.

    The thread ID of the Processor module cleanup() trace is 0x4001e900 (the thread that calls doCmd() above).  This is probably also the thread ID of the thread that called exit().  Can you identify that thread ID, perhaps using earlier CE_DEBUG trace?

    DSPLink also has atexit() processing, as well as signal handling.  I suspect that is where the exit()ing is coming from.  Can you enable DSPLink trace and see what comes out?  Please see this wiki page for help with DSPLink trace: http://processors.wiki.ti.com/index.php/Enabling_trace_in_DSPLink

    Regards,

    - Rob

     

  • Thank you for your reply, Rob.

    Please find attached the output of CE_DEBUG=2 and also the DSPLink trace:

    5415.outdebug.txt

    By the way, do I have to worry about the ti.sdo.ce.osal.Sem - Entered Sem_pend> sem[0x12508] timeout[0xffffffff]  messages?

    Thanks.

  • Thank you for providing the DSPLink traces.  They show that a signal was received which initiated the system shutdown.  The function DSPLINK_sigHandler() is called with the value 0xb, which is SIGSEGV.  DSPLINK_sigHandler() calls DSPLINK_atExitHandler() and then calls exit().  Since DSPLink intercepts this signal, the default SIGSEGV handling, which includes producing a core file, is not run.

    I can't easily identify from the traces which piece of code might be producing this.  The last thing we see before the sig handler is Engine_ctrlNode(), but the parameters to it seem valid.

    In order to determine the cause of the SIGSEGV, a core file would help greatly.  With that core file in place, gdb would tell you the location of the faulting access.

    In order to have a core file produced, the following need to happen:
        - ensure that your shell allows core files to be produced:
            % ulimit -c 100000
        this will allow core files of up to 100000 bytes to be produced.  Most shells default to "ulimit -c 0" which prevents a core file from being produced at all.  Do "% man ulimit" for more details
        - do one of the following:
            - disable DSPLINK's signal handler for SIGSEGV (by following the instructions here: http://processors.wiki.ti.com/index.php/DSPLink_Signal_Handling_in_Codec_Engine), which allows the default signal handler to run and produce a core file.
            - provide your own handler in its place that calls abort().  Calling abort() raises SIGABRT, whose default action produces a core file.  To provide your own handler you will need to disable DSPLINK's signal handler for SIGSEGV and then install your handler via the usual "signal" mechanism.
            - modify DSPLINK's signal handler, rebuild the DSPLINK libraries, and relink your app against them.  The modification would be to replace the exit() call inside DSPLINK_sigHandler() with abort(), or perhaps to add an abort() call before the handler calls DSPLINK_atExitHandler(), so that you skip DSPLINK_atExitHandler()'s processing entirely.

    DSPLINK_sigHandler() and DSPLINK_atExitHandler() are defined in the file <dsplink_install_dir>/packages/dsplink/gpp/src/api/Linux/drv_api.c

    Once you have a core file you can run 'gdb' on it and it should show you the location of the segfault.  Or, you can bypass the core file altogether by disabling the DSPLINK signal handler and running your application under the control of gdb; when the segfault happens, gdb will show you the location of the violation (the same info as the core file method).

    If you don't have gdb then you may have gdbserver instead, which is run on the target and talks to a front-end gdb running on a Linux host connected to the same network as your board.

    Regards,

    - Rob

     

  • Alex59370 said:

    By the way, do I have to worry about the ti.sdo.ce.osal.Sem - Entered Sem_pend> sem[0x12508] timeout[0xffffffff]  messages?

    No, don't worry about those.  The CE Processor module creates a daemon thread that handles communication with DSPLink.  This thread has input and output pipes for communicating with it, and data on those pipes is synchronized by way of these semaphores.  I can see from the traces that the message above relates to these.

    Regards,

    - Rob

     

  • Thank you very much for your detailed answer, Rob.

    While I was able to create a core dump file using my own signal handler, I didn't succeed in loading the debug version of CMEMK. This is what is shown when the startup script tries to load the module:
    7282.cmem_err.txt

    Can you please tell me how to proceed?

    Thanks.

     

  • It looks like you're using Linux Utils 2.26.01.02, which contains a bug that would cause your crash.  Please upgrade to Linux Utils 2.26.02.05 for a fix for this.  You can find the release here: http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/linuxutils/2_26_02_05/index_FDS.html

    Regards,

    - Rob

  • I apologize for the long delay.

    Thank you for your suggestion, Rob. The segmentation fault did not occur again. The original problem, however, persists: after around 350 runs of the software on the target from a batch script on the host, the batch script suddenly hangs. It does seem that this problem is unrelated to the Codec Engine shutdown procedure, though.

    Again thanks.

  • Thanks for letting us (or, the thread really) know that you're having some success.

    Do you have CE_DEBUG enabled during your batch run?  Since it is foundational software, CE traces might give a clue as to where the system is hanging.

    Regards,

    - Rob

     

  • Yes, CE_DEBUG=2 and DSPLink tracing are both enabled: 1351.errfile.txt

    Given the output, my assumption is that the target program now finishes properly. I can run the software on the target from a Bash script (via SSH) thousands of times, but when I run it from a GNU Octave script (also via SSH), the problem mentioned above occurs.