OpenMAX application debugging with gdb/gdbserver

Joel Keller

Has anyone else experienced problems debugging Linux OMX applications with gdb/gdbserver? I find that once OMX starts its multitude of threads, I can no longer break with gdb. After enabling the debug output from gdbserver itself and looking at the relevant PIDs in /proc, I can see that two of the threads cannot be stopped. Is it possible they are doing some unstoppable operation in kernel mode? Has anyone else experienced this?

Thanks,

Joel

over 13 years ago

0 Joel Keller over 13 years ago

Expert 1305 points

Can anyone from the community or TI confirm that they have successfully used gdb remote-debugging with an OpenMAX application? I am using EZSDK 5.03 and have tried tools (gdb/gdbserver) from two versions of CodeSourcery (including the latest 'CodeBench' version).

0 BenM over 13 years ago in reply to Joel Keller

Expert 2220 points

I have experienced it, but cannot offer you any advice. I built a simple multithreaded application and was able to do multithreaded gdb debugging on it, but I have always been completely unsuccessful with OMX multithreaded debugging. However, I did find that single-threaded debugging works fine (using an old version of gdbserver). This is only really helpful if you are debugging initialization in the main thread, but it is something.

TI, please look into this!

-Ben

0 Gabi Gvili over 13 years ago in reply to BenM

Genius 4120 points

If I may add to your conversation: debugging OMX applications with gdb is important to me and, I believe, to many other developers who work with TI SoCs.
Can TI please provide a decent solution to this issue?

0 Alla over 13 years ago in reply to Gabi Gvili

Expert 1100 points

Hello all,

I have also encountered this problem, and I would be glad if a solution were provided.

Thanks.

0 sroussea over 13 years ago in reply to Alla

Prodigy 240 points

Same for me. Debugging with "printf" is very annoying and makes development very slow. A good solution would be much appreciated.

0 Siddharth Heroor over 13 years ago in reply to sroussea

TI__Expert 4245 points

Hi,

I agree this is an issue. We haven't got to the root of it yet. Simple multi-threaded applications can be debugged using gdb for sure.

0 Joel Keller over 13 years ago in reply to Siddharth Heroor

Expert 1305 points

A solution would be very much appreciated. Here is some further information I have collected:

My setup is a DM8168 running the OMX decode example, /usr/share/ti/ti-omx/decode_a8host_debug.xv5T. I run the example under gdbserver, and I also have a debug kernel with kgdb set up. I see that, on a breakpoint, gdb sends SIGSTOP to all the threads (processes) within the example, and I believe that it then waits for notification from the kernel that the processes have been stopped. I can see via procfs that 3 out of 5 of the threads are successfully stopped. The two that are still listed as 'Running' are doing an ioctl. When I then break in kgdb, I can see that those two threads are within syslink, repeatedly scheduling a timer, but I have been unable to get a full backtrace for them.

My guess is that whatever those two threads are doing in the kernel is preventing the STOP signal from being delivered, and hence keeping gdb/gdbserver from proceeding.

I have verified that the same thing happens when doing a "kill -s STOP <pid>" on the OMX example. This is further evidence that signals are not being delivered to those threads, and that the problem is not with gdb or the setup, but with the TI kernel module, syslink, or specifically OpenMAX's usage of syslink.
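
For anyone who wants to repeat the procfs check described above, here is a minimal sketch rather than a definitive tool. It assumes a POSIX shell with awk on the target; the function name thread_states is made up for illustration, and the wchan file is only populated when the kernel was built with symbol support.

```sh
#!/bin/sh
# List every thread of a process with its scheduler state and, when the
# kernel exposes it, the kernel function the thread is waiting in (wchan).
# Usage: thread_states PID [PROC_ROOT]   (PROC_ROOT defaults to /proc)
# NOTE: thread_states is a hypothetical helper name, not part of any SDK.
thread_states() {
    pid=$1
    proc=${2:-/proc}
    for task in "$proc/$pid"/task/*; do
        [ -r "$task/status" ] || continue
        tid=${task##*/}
        # The line looks like "State:<TAB>T (stopped)"; keep only the letter.
        state=$(awk '/^State:/ { print $2; exit }' "$task/status")
        # wchan may be missing or unreadable; fall back to "-" in that case.
        wchan=$(cat "$task/wchan" 2>/dev/null) || wchan=""
        printf '%s %s %s\n' "$tid" "$state" "${wchan:--}"
    done
}
```

Run it as thread_states <pid> right after sending "kill -s STOP <pid>". Threads that still report R (or D) while the others report T are the ones that never saw the signal.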

Hopefully this information helps. It is critical to our project that we get this issue resolved!

Thanks,

Joel

0 Ivan Nardi over 13 years ago in reply to Joel Keller

Intellectual 445 points

Hi all.

I had a similar issue using Mcfw (from RDK) applications...

See: http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/717/t/160644.aspx#590087

Ivan

0 Joel Keller over 13 years ago in reply to Siddharth Heroor

Expert 1305 points

Any update on this issue from TI? Please note that not only does this mean that we cannot debug our OpenMAX code, we also cannot debug code which executes in the same process as the OMX code. Enabling GDB debugging with OMX is very important for our project.

Thanks,

Joel

0 Ivan Nardi over 13 years ago in reply to Ivan Nardi

Intellectual 445 points

Hi all

Same issue with RDK version 01.09.00.19, too.

Any news?

Ivan

0 Joel Keller over 13 years ago

Expert 1305 points

Any update on this issue? It has been over a month since reporting.

0 Joel Keller over 13 years ago in reply to Joel Keller

Expert 1305 points

Weekly bump.

0 Ivan Nardi over 13 years ago in reply to Joel Keller

Intellectual 445 points

Hi all,

Same issue with RDK GA 02.00.00.23, too.

I tried to summarize the information about that problem in http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/717/t/177042.aspx

Thanks

Ivan

0 Joel Keller over 13 years ago in reply to Ivan Nardi

Expert 1305 points

Any news this week?

0 Joel Keller over 13 years ago in reply to Joel Keller

Expert 1305 points

Since this is a defect in TI's drivers, rather than a request for support or clarification, is there another place to report and track this issue besides e2e?

Thanks,

Joel

0 Jon S. over 13 years ago in reply to Joel Keller

Expert 1240 points

Hi Joel,

Have you had any luck with this issue (or managed to get a formal support request in)?

I see that this persists in the latest EZSDK (5.04) release. Any attempt to step over or run past OMX_Init() immediately results in a number of Syslink assertions and a segfault. For reference, I'm using the gdbserver binary and libthread_db.so provided in the linux-devkit directory.

Any help on this would be greatly appreciated...not being able to use a debugger certainly slows development down...

- Jon

0 Joel Keller over 13 years ago in reply to Jon S.

Expert 1305 points

Hi Jon,

No, I haven't had any progress on this issue. Unfortunately (fortunately?) I've been working on other aspects of our project for the past month or so, and I have set OMX aside for now. I will have to return to it soon though.

-Joel

0 Jon S. over 13 years ago in reply to Joel Keller

Expert 1240 points

Joel,

Yesterday I seemed to have a little luck getting past the OMX_Init() call by attaching gdbserver to an already running OMX application. This at least allowed me to get some backtraces. However, any attempt to set breakpoints just got the program terminated with SIGTRAP. After that, I have to power cycle to get OMX applications running again. (Syslink seems to go into an inconsistent state, judging by the torrent of MessageQ_put assertion failures.)
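
In case it helps others try the same thing, the attach step can be sketched roughly like this. It is only an illustration: the helper name attach_gdbserver and the default port 2345 are arbitrary choices, and gdbserver is assumed to be on the target's PATH.

```sh
#!/bin/sh
# Attach gdbserver to an already running process and listen on a TCP port.
# Usage: attach_gdbserver PID [PORT]
# NOTE: attach_gdbserver and the default port are illustrative assumptions.
attach_gdbserver() {
    pid=$1
    port=${2:-2345}
    # kill -0 sends no signal; it only checks that the process exists.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no such process: $pid" >&2
        return 1
    fi
    gdbserver --attach ":$port" "$pid"
}
```

On the host, connect with "target remote <target-ip>:<port>" from the cross gdb. As described above, this got as far as backtraces but breakpoints still failed, so treat it as a partial workaround at best.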

Please do share if you come up with anything.

Regards,

Jon

0 Charles Luciano over 13 years ago in reply to Jon S.

Prodigy 65 points

Where can I find a copy of gdb or gdbserver? Did you build from source?

0 Jon S. over 13 years ago in reply to Charles Luciano

Expert 1240 points

Charles,

I used what was provided in EZSDK 5.04 and in the CodeSourcery (arm2009-q1) toolchain.

I copied these files over to my target:

$EZSDK_DIR/linux-devkit/arm-none-linux-gnueabi/usr/lib/libthread_db.so
$EZSDK_DIR/linux-devkit/arm-none-linux-gnueabi/usr/bin/gdbserver

The host GDB binary I use is in $CODESOURCERY_DIR/bin/arm-none-linux-gnueabi-gdb.

Not sure if this is the best advice...given that I have issues debugging OpenMax applications. It seems to work fine for other applications, however. If anyone has any suggestions, I'm all ears.

- Jon

0 Joel Keller over 13 years ago in reply to Jon S.

Expert 1305 points

Jon/Charles,

I strongly believe that this problem is not related to the gdb/gdbserver binaries. This issue occurs because one or more threads of the process being debugged are stuck in the kernel within a syslink call, and are preventing SIGSTOP from being delivered to the process.

Try this, for example: run your OMX application and obtain its PID, then run: kill -9 ${PID}

I think you will notice that the process does not die. This is another manifestation of the bug, but with gdb/gdbserver taken out of the picture.
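
A small sketch of that test, for anyone who wants to script it. The helper names process_alive and check_killable are invented for this example, and it assumes a Linux /proc; it is not a definitive diagnostic.

```sh
#!/bin/sh
# Succeeds while PID is alive. kill -0 also succeeds for zombies, so a
# process in state Z (exited but not yet reaped) is counted as not alive.
# NOTE: both helper names below are hypothetical.
process_alive() {
    kill -0 "$1" 2>/dev/null || return 1
    state=$(awk '/^State:/ { print $2; exit }' "/proc/$1/status" 2>/dev/null) || state=""
    [ "$state" != "Z" ]
}

# Send SIGKILL to PID and wait up to TIMEOUT seconds (default 5) for it to die.
# Prints "killed" and returns 0 on success, prints "stuck" and returns 1 otherwise.
check_killable() {
    pid=$1
    remaining=${2:-5}
    kill -9 "$pid" 2>/dev/null || true
    while process_alive "$pid"; do
        if [ "$remaining" -le 0 ]; then
            echo "stuck"
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    echo "killed"
}
```

A normal process should report "killed" almost immediately. If the OMX application reports "stuck", that matches the behaviour described here and points away from gdb/gdbserver.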

-Joel

0 Ivan Nardi over 13 years ago in reply to Joel Keller

Intellectual 445 points

Hi all,

Same problem with RDK 02.00.00.24, too...

Any news from TI?

Ivan

0 Pradeep Acharya over 13 years ago in reply to Ivan Nardi

Intellectual 930 points

Hi All,

I am also facing the same problem. The process does not receive the segfault signal and no core dump is generated. The process ends up in the D state (uninterruptible sleep). I get no information on whether a segmentation fault has happened or where the program crashed, and every time this happens I have to restart the EVM. It is very difficult and frustrating to have to restart the EVM for a crash in your program without getting a core dump.

It would be a great help if someone from TI could advise on this issue.

0 Ivan Nardi over 13 years ago in reply to Pradeep Acharya

Intellectual 445 points

Hi all.

It has been six months since the issue was first reported.

Any news?

Ivan

0 Jorge Ramirez over 13 years ago in reply to Ivan Nardi

Prodigy 95 points

Any news on this issue? No, it is not a rhetorical question: we need gdb support for OMX-based applications.

Without gdb support, I do question whether TI can compete with the coming platforms from Intel or with Altera/Xilinx-based solutions.

0 Joel Keller over 13 years ago in reply to Jorge Ramirez

Expert 1305 points

It has been 8 months since I reported this bug, with many others joining me in expressing frustration. It has been demonstrated that this is a serious bug in TI's driver code, which prevents GDB debugging as well as abnormal process termination (think low-memory killing, etc...). This is a core bug in the video code for a processor which is supposed to be a "video platform". Is there a reason why this won't be fixed?

0 Badri Narayanan over 12 years ago in reply to Joel Keller

TI__Guru 59700 points

Hi, I don't know whether it will resolve the issue you are facing, but our team had an issue where syslink would go crazy if a signal was sent to the application. A patch was created that resolved the signal-handling issue for us. If it is not too much trouble, you could try it out and check whether it resolves your issue.

Attachment: 1680.DVRRDK_syslink_messageQ_signal_issue.patch.txt

diff --git a/packages/ti/syslink/ipc/hlos/knl/Linux/MessageQDrv.c b/packages/ti/syslink/ipc/hlos/knl/Linux/MessageQDrv.c
index b1301da..19931c2 100755
--- a/packages/ti/syslink/ipc/hlos/knl/Linux/MessageQDrv.c
+++ b/packages/ti/syslink/ipc/hlos/knl/Linux/MessageQDrv.c
@@ -334,6 +334,17 @@ long MessageQDrv_ioctl (struct file *  filp,
                 index    = SharedRegion_getId (msg);
                 msgSrPtr = SharedRegion_getSRPtr (msg, index);
             }
+            else if(status == MessageQ_E_UNBLOCKED)
+            {
+                /* If status is MessageQ_E_UNBLOCKED, ioctl has succeeded
+                 * keep status as MessageQ_E_UNBLOCKED and return SUCCESS
+                 * to the ioctl
+                 */
+                osStatus = 0;
+            }
+            else {
+                osStatus = status;
+            }
 
             cargs.args.get.msgSrPtr = msgSrPtr;
         }
diff --git a/packages/ti/syslink/ipc/hlos/knl/MessageQ.c b/packages/ti/syslink/ipc/hlos/knl/MessageQ.c
index 2ea96e7..34048d7 100755
--- a/packages/ti/syslink/ipc/hlos/knl/MessageQ.c
+++ b/packages/ti/syslink/ipc/hlos/knl/MessageQ.c
@@ -1104,6 +1104,10 @@ MessageQ_get (MessageQ_Handle handle, MessageQ_Msg * msg, UInt timeout)
                         status = MessageQ_S_SUCCESS;
                     }
                 }
+                else if (status == -ERESTARTSYS) {
+                    /* leave status as -ERESTARTSYS */
+                    break;
+                }
                 else {
                     status = MessageQ_E_FAIL;
                     break;

0 Joel Keller over 12 years ago in reply to Badri Narayanan

Expert 1305 points

Hi Badri,

Thanks very much for posting your patch. I have been swamped with other issues, so I have been unable to get back to this. I will test it out soon. I will report back if this fixes my issue.

Thanks,

Joel

0 Kevin G over 12 years ago in reply to Joel Keller

Prodigy 40 points

We're experiencing all the same issues Joel has been describing in this thread. We're currently using EZSDK v5.03. Joel or TI, any update on this issue?

Thanks.

0 Joel Keller over 12 years ago in reply to Kevin G

Expert 1305 points

Hi Kevin,

I am also still using EZSDK 5.03, but it has been quite a while since I have been looking at the OMX-related stuff. We currently have that area of our application working well enough for our purposes. What I ended up doing is the following small modification to syslink:

file: syslink/ipc/hlos/knl/MessageQ.c

In the function:

MessageQ_get (MessageQ_Handle handle, MessageQ_Msg * msg, UInt timeout)

around line 1109, there is an else {} clause:

 else {
     status = MessageQ_E_FAIL;
+    break;
 }

The addition of the "break;" statement there will cause the threads that get 'stuck' in the kernel to exit the kernel when a signal is delivered. I can't remember the details right now, but that could be a starting point for your investigation.

You could also consider upgrading to a more recent EZSDK, or comparing that particular file in EZSDK 5.03 vs 5.05 to see if perhaps there has been a fix to it.

Hope that helps. If I switch back to working on OMX stuff and find any other relevant information, I'll try to remember to post it here.

-Joel
