OpenMAX application debugging with gdb/gdbserver

Has anyone else experienced problems debugging Linux OMX applications with gdb/gdbserver? I find that once OMX starts its multitude of threads, I can no longer break with gdb. With gdbserver's own debug output enabled, and looking at the relevant PIDs in /proc, I can see that two of the threads cannot be stopped. Is it possible they are doing some uninterruptible operation in kernel mode?

Thanks,


Joel

  • Can anyone from the community or TI confirm that they have successfully used gdb remote-debugging with an OpenMAX application?  I am using EZSDK 5.03 and have tried tools (gdb/gdbserver) from two versions of CodeSourcery (including the latest 'CodeBench' version).   

  • I have experienced it, but cannot offer you any advice. I built a simple multithreaded application and was able to do multithreaded gdb debugging on it, but I have always been completely unsuccessful with OMX multithreaded debugging. However, I did find that single-threaded debugging works fine (using an old version of gdbserver). This is only really helpful if you are debugging initialization in the main thread, but it is something.

    TI, please look into this!

    -Ben

  • If I may add to your conversation: debugging OMX applications with gdb is important to me and, I believe, to many other developers who work with TI SoCs.
    Can TI please provide a decent solution to this issue?

  • Hello all,

    I have also encountered this problem, and I would be glad if a solution were provided.

    Thanks.

  • Same for me. Debugging with "printf" is very annoying and makes development very slow. A good solution would be much appreciated.

  • Hi,

    I agree this is an issue. We haven't got to the root of it yet. Simple multi-threaded applications can be debugged using gdb for sure.

  • A solution would be very much appreciated.   Here is some further information I have collected:

    My setup is a DM8168 running the OMX decode example, /usr/share/ti/ti-omx/decode_a8host_debug.xv5T. I run the example using gdbserver, and additionally I have a debug kernel with kgdb set up. I see that, on a breakpoint, gdb sends SIGSTOP to all the threads (processes) within the example, and I believe that it then waits for notification from the kernel that the processes have been stopped. I can see via procfs that 3 out of 5 of the threads are successfully stopped. The two that are still listed as 'Running' are doing an ioctl. I then break in kgdb, and I can see that those two threads are within syslink, repeatedly scheduling a timer, but I have been unable to get a full backtrace for them.

    My guess is that whatever those two threads are doing in the kernel is preventing the STOP signal from being delivered, and hence keeping gdb from proceeding.

    I have verified that the same thing happens when doing a "kill -s STOP <pid>" on the OMX example. This is further evidence to suggest that signals are not being delivered to those threads, and that the problem is not with gdb or the setup, but with the TI kernel module syslink, or specifically, OpenMAX's usage of syslink.
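
    For anyone who wants to repeat the procfs check, here is a minimal C sketch, not taken from any TI tool. It assumes the standard Linux layout of /proc/<pid>/task/<tid>/stat, where the field after the parenthesised command name is the one-letter thread state ('R' running, 'T' stopped, 't' tracing stop, 'D' uninterruptible sleep). The function names parse_stat_state and count_unstopped_threads are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Extract the one-letter state from the contents of a
 * /proc/<pid>/task/<tid>/stat file. The command name (field 2) is wrapped
 * in parentheses and may itself contain spaces or ')', so the state is
 * taken from just after the LAST ')'. Returns '?' on malformed input. */
char parse_stat_state(const char *stat_line)
{
    assert(stat_line != NULL);
    const char *close = strrchr(stat_line, ')');
    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '?';
    return close[2];
}

/* Print the state of every thread of `pid` and return how many of them
 * are NOT stopped ('T') or trace-stopped ('t').
 * Returns -1 if /proc/<pid>/task cannot be opened. */
int count_unstopped_threads(int pid)
{
    char task_dir[64];
    snprintf(task_dir, sizeof task_dir, "/proc/%d/task", pid);

    DIR *dir = opendir(task_dir);
    if (dir == NULL)
        return -1;

    int unstopped = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue;                       /* skip "." and ".." */

        char stat_path[512], line[512];
        snprintf(stat_path, sizeof stat_path, "%s/%s/stat",
                 task_dir, ent->d_name);

        FILE *f = fopen(stat_path, "r");
        if (f == NULL)
            continue;                       /* thread exited meanwhile */
        if (fgets(line, sizeof line, f) != NULL) {
            char state = parse_stat_state(line);
            printf("tid %s: %c\n", ent->d_name, state);
            if (state != 'T' && state != 't')
                unstopped++;
        }
        fclose(f);
    }
    closedir(dir);
    return unstopped;
}
```

    Calling count_unstopped_threads(pid) after gdb has tried to stop the process should return 0 for a healthy application; in the situation described above it would be expected to report the two threads that stay in 'R'.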

    Hopefully this information helps.  It is critical to our project that we get this issue resolved!

    Thanks,

    Joel

  • In reply to Joel Keller:

    Hi all.

    I had a similar issue using Mcfw (from RDK) applications...

    See: http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/717/t/160644.aspx#590087

    Ivan

  • Any update on this issue from TI?  Please note that not only does this mean that we cannot debug our OpenMAX code, we also cannot debug code which executes in the same process as the OMX code.  Enabling GDB debugging with OMX is very important for our project.

    Thanks,

    Joel

  • In reply to Ivan Nardi:

    Hi all,

    Same issue with RDK version 01.09.00.19, too.

    Any news?

    Ivan

  • Any update on this issue?  It has been over a month since reporting. 

  • Hi all,

    same issue with RDK GA 02.00.00.23, too.

    I tried to summarize the information about that problem in http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/717/t/177042.aspx

    Thanks

    Ivan

  • Any news this week?

  • Since this is a defect in TI's drivers, rather than a support/clarification question, is there another place to report/track this issue besides e2e?

    Thanks,

    Joel

  • Hi Joel,


    Have you had any luck with this issue (or managed to get a formal support request in)?

    I see that this persists in the latest EZSDK (5.04) release. Any attempt to step over/run past OMX_Init() immediately results in a number of Syslink assertions and a segfault. For reference, I'm using the gdbserver binary and libthread_db.so provided in the linux-devkit directory.

    Any help on this would be greatly appreciated...not being able to use a debugger certainly slows development down...

    - Jon

  • Hi Jon, 

    No, I haven't had any progress on this issue.  Unfortunately (fortunately?) I've been working on other aspects of our project for the past month or so, and I have set OMX aside for now.  I will have to return to it soon though.

    -Joel

  • Joel,

    Yesterday I had a little luck getting past the OMX_Init() call by attaching gdbserver to an already running OMX application. This at least allowed me to get some backtraces. However, any attempt to set breakpoints caused the program to be terminated with SIGTRAP. After that, I have to power cycle to get OMX applications running again. (Syslink seems to go into an inconsistent state, based upon the torrent of MessageQ_put assertion failures.)

    Please do share if you come up with anything.

    Regards,

    Jon

  • Where can I find a copy of gdb or gdbserver? Did you build from source?

  • Charles,

    I used what was provided in EZSDK 5.04 and in the CodeSourcery (arm2009-q1) toolchain.

    I copied these files over to my target:

    • $EZSDK_DIR/linux-devkit/arm-none-linux-gnueabi/usr/lib/libthread_db.so
    • $EZSDK_DIR/linux-devkit/arm-none-linux-gnueabi/usr/bin/gdbserver

    The host GDB binary I use is in $CODESOURCERY_DIR/bin/arm-none-linux-gnueabi-gdb.

    Not sure if this is the best advice...given that I have issues debugging OpenMax applications.  It seems to work fine for other applications, however.  If anyone has any suggestions, I'm all ears.

    - Jon

  • Jon/Charles,


    I strongly believe that this problem is not related to gdb/gdbserver binaries.  This issue occurs because one or more threads of the process being debugged are stuck in the kernel within a syslink call, and are preventing SIGSTOP being delivered to the process.

    Try this, for example: run your OMX application and obtain its PID, then do:  kill -9 ${PID}

    I think you will notice that the process does not die.  This is another manifestation of the bug, but with gdb/gdbserver taken out of the picture.
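
    The same check can be scripted. The sketch below is only an illustration in C (the names read_proc_state and kill_and_confirm are hypothetical): it sends SIGKILL and then polls /proc/<pid>/stat, treating a missing entry or a 'Z'/'X' state as "the process died". Note that it really does kill the target.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Return the state letter from /proc/<pid>/stat, or 0 if the entry
 * cannot be opened or parsed (for example because the process is gone). */
static char read_proc_state(pid_t pid)
{
    char path[64], line[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;

    char state = 0;
    if (fgets(line, sizeof line, f) != NULL) {
        /* The state follows the last ')' of the command-name field. */
        const char *close = strrchr(line, ')');
        if (close != NULL && close[1] == ' ' && close[2] != '\0')
            state = close[2];
    }
    fclose(f);
    return state;
}

/* Send SIGKILL to `pid`, then poll every 50 ms for up to `timeout_ms`.
 * Returns  1 if the process disappeared or became a zombie,
 *          0 if it is still present when the timeout expires,
 *         -1 if the signal could not be sent (or `pid` is the caller). */
int kill_and_confirm(pid_t pid, int timeout_ms)
{
    assert(timeout_ms >= 0);
    if (pid == getpid())
        return -1;                      /* refuse to kill ourselves */
    if (kill(pid, SIGKILL) != 0)
        return -1;

    const struct timespec step = { 0, 50 * 1000 * 1000 };   /* 50 ms */
    for (int waited = 0; waited <= timeout_ms; waited += 50) {
        char state = read_proc_state(pid);
        if (state == 0 || state == 'Z' || state == 'X')
            return 1;
        nanosleep(&step, NULL);
    }
    return 0;
}
```

    For a normal process, kill_and_confirm(pid, 2000) should return 1 almost immediately. A return value of 0 would correspond to the behaviour described here, where the process survives kill -9.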

    -Joel

     

  • Hi all,

    same problem with RDK 02.00.00.24, too...

    Any news from TI?

    Ivan

  • Hi All,

    I also face the same problem. The process does not receive the segfault signal and no core dump is generated. The process ends up in the D (uninterruptible sleep) state. I get no information on whether a segmentation fault has happened or where the program crashed, and every time this happens I have to restart the EVM. It is very difficult and frustrating to restart the EVM for a crash in your program without getting a core dump.

    It would be of great help if someone from TI could advise on this issue.

  • Hi all.

    It has been six months since the issue was first reported.

    Any news?

    Ivan

  • Any news on this issue? No, it is not a rhetorical question: we need gdb support for OMX-based applications.

    Without gdb support, I question whether TI can compete with the upcoming platforms from Intel or with Altera/Xilinx-based solutions.

  • It has been 8 months since I reported this bug, with many others joining me in expressing frustration.  It has been demonstrated that this is a serious bug in TI's driver code, which prevents GDB debugging as well as abnormal process termination (think low-memory killing, etc...).  This is a core bug in the video code for a processor which is supposed to be a "video platform".  Is there a reason why this won't be fixed?

  • Hi, I don't know if it will resolve the issue you are facing, but our team had an issue where syslink would go crazy if a signal was sent to the application. A patch was created which resolved the signal handling issue for us. If it is not too much trouble, you can try it out and check whether it resolves your issue.

    diff --git a/packages/ti/syslink/ipc/hlos/knl/Linux/MessageQDrv.c b/packages/ti/syslink/ipc/hlos/knl/Linux/MessageQDrv.c
    index b1301da..19931c2 100755
    --- a/packages/ti/syslink/ipc/hlos/knl/Linux/MessageQDrv.c
    +++ b/packages/ti/syslink/ipc/hlos/knl/Linux/MessageQDrv.c
    @@ -334,6 +334,17 @@ long MessageQDrv_ioctl (struct file *  filp,
                     index    = SharedRegion_getId (msg);
                     msgSrPtr = SharedRegion_getSRPtr (msg, index);
                 }
    +            else if(status == MessageQ_E_UNBLOCKED)
    +            {
    +                /* If status is MessageQ_E_UNBLOCKED, ioctl has succeeded
    +                 * keep status as MessageQ_E_UNBLOCKED and return SUCCESS
    +                 * to the ioctl
    +                 */
    +                osStatus = 0;
    +            }
    +            else {
    +                osStatus = status;
    +            }
     
                 cargs.args.get.msgSrPtr = msgSrPtr;
             }
    diff --git a/packages/ti/syslink/ipc/hlos/knl/MessageQ.c b/packages/ti/syslink/ipc/hlos/knl/MessageQ.c
    index 2ea96e7..34048d7 100755
    --- a/packages/ti/syslink/ipc/hlos/knl/MessageQ.c
    +++ b/packages/ti/syslink/ipc/hlos/knl/MessageQ.c
    @@ -1104,6 +1104,10 @@ MessageQ_get (MessageQ_Handle handle, MessageQ_Msg * msg, UInt timeout)
                             status = MessageQ_S_SUCCESS;
                         }
                     }
    +                else if (status == -ERESTARTSYS) {
    +                    /* leave status as -ERESTARTSYS */
    +                    break;
    +                }
                     else {
                         status = MessageQ_E_FAIL;
                         break;
    

     

  • Hi Badri,

    Thanks very much for posting your patch.  I have been swamped with other issues, so I have been unable to get back to this.  I will test it out soon.  I will report back if this fixes my issue.

    Thanks,

    Joel

  • We're experiencing all the same issues Joel has been describing in this thread. We're currently using EZSDK v5.03. Joel or TI, any update on this issue?

    Thanks.

  • Hi Kevin,

    I am also still using EZSDK 5.03, but it has been quite a while since I have been looking at the OMX-related stuff.  We currently have that area of our application working well enough for our purposes.  What I ended up doing is the following small modification to syslink:

    file:  syslink/ipc/hlos/knl/MessageQ.c


    In the function:

    MessageQ_get (MessageQ_Handle handle, MessageQ_Msg * msg, UInt timeout)

    around line 1109, there is an else{} clause:

    else {
       status = MessageQ_E_FAIL;
    +   break;
    }

    The addition of the "break;" statement there will cause the threads that get 'stuck' in the kernel to exit the kernel when a signal is delivered.   I can't remember the details right now, but that could be a starting point for your investigation.
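
    To illustrate why a missing "break;" can keep a thread in the kernel, here is a small user-space model in C. It is not the syslink source: model_queue, model_wait and model_get are hypothetical names, and the model assumes that an interruptible wait keeps failing immediately for as long as a signal is pending, which is the usual Linux behaviour because a pending signal is only handled on the way back to user space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SUCCESS    0
#define MODEL_E_FAIL    (-1)
#define MODEL_MAX_SPINS  1000   /* safety cap so the model always terminates */

/* Hypothetical stand-in for the state seen by a blocking "get" call. */
struct model_queue {
    bool signal_pending;  /* stays set until the thread returns to user space */
    bool message_ready;   /* set once a message has arrived */
    int  wait_calls;      /* how often the wait primitive was entered */
};

/* Stand-in for an interruptible kernel wait: with a signal pending it
 * fails at once, every time it is called. */
static int model_wait(struct model_queue *q)
{
    q->wait_calls++;
    if (q->signal_pending)
        return MODEL_E_FAIL;        /* interrupted by the signal */
    q->message_ready = true;        /* pretend a message arrived */
    return MODEL_SUCCESS;
}

/* Simplified model of the wait loop. `break_on_failure` selects between
 * the behaviour with and without the added "break;". */
int model_get(struct model_queue *q, bool break_on_failure)
{
    assert(q != NULL);
    int status = MODEL_SUCCESS;
    int spins = 0;

    while (!q->message_ready && spins++ < MODEL_MAX_SPINS) {
        status = model_wait(q);
        if (status != MODEL_SUCCESS) {
            status = MODEL_E_FAIL;
            if (break_on_failure)
                break;  /* leave the loop so the signal can be handled */
            /* Without the break the loop re-enters the wait, which fails
             * again because the signal is still pending. */
        }
    }
    return status;
}
```

    With signal_pending set, model_get(&q, false) only stops at the safety cap, which mirrors a thread spinning inside the ioctl, while model_get(&q, true) returns after a single wait.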

    You could also consider upgrading to a more recent EZSDK, or comparing that particular file in EZSDK 5.03 vs 5.05 to see if perhaps there has been a fix to it.


    Hope that helps.  If I switch back to working on OMX stuff and find any other relevant information, I'll try to remember to post it here.

    -Joel