TDA4VM: application blocked at vxGraphParameterDequeueDoneRef

Part Number: TDA4VM
Other Parts Discussed in Thread: SYSBIOS

Dear experts,

We have a use case with two graphs that both need camera images as input. One graph is used for display and the other for detection, and there is only one camera node. The display graph includes a capture node (YUV422), a display M2M node (YUV422-->YUV420), an OpenGL node, a second display M2M node (RGBX-->YUV422), and a CSITX node. The detection graph runs two different detections and includes an MSC scale node, an MSC mosaic node, two of our preproc nodes, two TIDL nodes, and two of our postproc nodes. Both graphs use pipeline mode. Following the TI FAE's advice, the detection graph copies the camera data from the display graph.

When we run our application, the display freezes after some time. It could be a few hours, or it could be a day or two. We added some logs and found that the display graph blocks at vxGraphParameterDequeueDoneRef. For more information, we traced rtos/tiovx/source/framework/vx_target.c and found that it blocks at tivxTargetDequeueObjDesc in the tivxTargetTaskMain function. We ran many tests and found three different situations:

  1. When it blocks, we cannot get data from capture, although the cameras are working properly.
  2. When it blocks, we cannot get data from display M2M.
  3. When it blocks, we cannot get data from CSITX.

If we stop the detection graph, the display freezes less often. Sometimes, if we add some logs in tiovx, the display also freezes less often. Could you give us some suggestions to solve this issue?
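
One way to narrow down which graph parameter stalls is to poll for completed references with a timeout before calling the blocking dequeue, and to log when the timeout expires. The sketch below only illustrates that idea under stated assumptions: `wait_done_ref`, `check_done_fn`, `sleep_ms_fn` and the `fake_*` helpers are hypothetical names, not part of TIOVX. In a real application the check callback would presumably wrap `vxGraphParameterCheckDoneRef` from the OpenVX pipelining extension, and the sleep callback would wrap the platform's task sleep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: reports how many references are ready on one graph
 * parameter. Returns 0 on success, non-zero on error. In a TIOVX application
 * this would presumably wrap vxGraphParameterCheckDoneRef(). */
typedef int (*check_done_fn)(void *ctx, uint32_t param_index, uint32_t *num_refs);

/* Hypothetical callback: sleeps for the given number of milliseconds. */
typedef void (*sleep_ms_fn)(uint32_t ms);

/* Polls until a reference is ready or the timeout expires.
 * Returns 0 if a reference is ready, 1 on timeout, -1 if the check fails. */
int wait_done_ref(check_done_fn check, sleep_ms_fn sleep_ms, void *ctx,
                  uint32_t param_index, uint32_t timeout_ms, uint32_t poll_ms)
{
    uint32_t waited = 0;

    assert(check != NULL && sleep_ms != NULL);
    if (poll_ms == 0) {
        poll_ms = 1; /* avoid a loop that never makes progress */
    }
    for (;;) {
        uint32_t num_refs = 0;
        if (check(ctx, param_index, &num_refs) != 0) {
            return -1;
        }
        if (num_refs > 0) {
            return 0; /* safe to call the blocking dequeue now */
        }
        if (waited >= timeout_ms) {
            return 1; /* caller can log param_index and dump state here */
        }
        sleep_ms(poll_ms);
        waited += poll_ms;
    }
}

/* Stand-ins for demonstration only: a source that becomes ready after a
 * fixed number of polls, and a sleep that returns immediately. */
struct fake_source {
    uint32_t polls_until_ready;
    uint32_t polls;
};

int fake_check(void *ctx, uint32_t param_index, uint32_t *num_refs)
{
    struct fake_source *src = (struct fake_source *)ctx;
    (void)param_index;
    src->polls++;
    *num_refs = (src->polls >= src->polls_until_ready) ? 1u : 0u;
    return 0;
}

void fake_sleep(uint32_t ms)
{
    (void)ms;
}
```

When the helper returns 1, the application can log which graph parameter index timed out before it falls back to the blocking call, which may help separate the capture, display M2M and CSITX cases listed above.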

  • Hello Sam,

    There is a similar issue we have seen in the past where the application freezes after a few hours to a day or two. Could you try the tests documented in this FAQ and share the results in our current thread: https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1090755/faq-tda4vm-why-does-my-board-enter-an-abort-state-after-running-my-application-for-a-long-time

    Regards,

    Takuma

  • Hi Sam,

    Just realized we have a separate thread already for this issue: https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1089649/tda4vm-run-app_tidl_od-demo-on-evm-about-48-hours-c7x-aborted

    Let's continue the discussion on that thread.

    Regards,

    Takuma

  • Hello Takuma,

    Thanks for your reply.

    We found that when the application freezes, the C7x does not abort, so we think these are two different issues. We can continue the C7x abort discussion on that thread. At the same time, we would like to keep getting information about this issue here. Looking forward to your reply.

    Regards,

    Sam

  • Hello Sam,

    I agree with you. The behavior we are observing seems different from the issue we have seen in the past. Let's continue the discussion from a software standpoint on this thread, and continue the discussion from a hardware standpoint on the other thread.

    Another issue we have seen in the past is the idle stack size overflowing after a long period of time due to preemptive interrupts. Can you try the following:

    In file vision_apps/apps/basic_demos/app_rtos/rtos_linux/c7x_1/c7x_1.cfg
    Add below in bold,
    var HwiC7x = xdc.useModule('ti.sysbios.family.c7x.Hwi');
    HwiC7x.bootToNonSecure = true;
    HwiC7x.dispatcherAutoNestingSupport = false;
    var Task = xdc.useModule('ti.sysbios.knl.Task');
    Task.idleTaskStackSize = 64 * 1024;

    Scrub vision_apps and rebuild

    vision_apps# make vision_apps_scrub

    vision_apps# make vision_apps -j8

    Regards,

    Takuma

  • Hello Takuma,

    Thanks for your reply.

    We modified c7x_1.cfg as you suggested, and we are running the app_tidl_od demo on two of our boards. I will report the result on the C7x abort thread as soon as possible. Regarding the issue of blocking at vxGraphParameterDequeueDoneRef: based on our test results and debug information, we suspect the problem is on MCU2_0. Do you have any suggestions in this area for when vxGraphParameterDequeueDoneRef blocks, or any other related experience? Looking forward to your reply.

    Regards,

    Sam

  • Hi Sam,

    From personal experience, I have seen that when an application randomly aborts after a long time it is usually due to an issue with the stack for some task or an issue with power.

    As an example for issues with stack that I have seen: 

    • The idle task's stack size for C7x was overflowing unexpectedly after a long period of time, because preemptive interrupts that we did not expect to happen so often were filling up this stack. This could be applied to other cores such as the MCU2_0 so we could try changing the stack size for MCU2_0 in the mcu2_0.cfg file.

    As a weird example for issues with power I have heard from a colleague of mine:

    • There were sudden drops in power that would restart the system after a while. This was root-caused to a vacuum cleaner being used by a janitor near my colleague's office, which induced enough noise into the power supply to cause a system reboot.

    Regards,

    Takuma

  • Hello Takuma,

    Thank you for sharing your experience. We have three questions about this issue:

    1. In a recent test, we added some logs to our application and found an fvid2 error before the application blocked. We traced it to rtos\pdk_jacinto_08_00_00_37\packages\ti\drv\fvid2\src\trace.c, line 449, where an assertion failed. It looks like it is stuck in an infinite loop there. An assertion also failed in rtos\pdk_jacinto_08_00_00_37\packages\ti\drv\fvid2\src\fvid2_utils.c, line 1318. What situation would cause this error?
    2. We ran a test today with the idle stack size set to 64 * 1024, but the application still blocked. Could you give us an example of mcu2_0.cfg? We would also like to know how to view the stack usage of each task on MCU2_0 or another core. Do you have any suggestions?
    3. We want to know whether it is reasonable for the detection graph to copy the camera data from the display graph. What is the most reasonable way?

    Looking forward to your reply.

    Regards,

    Sam
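
The stack-usage question above is commonly answered with a stack watermark: fill the stack with a known byte pattern before the task starts, then count how much of the pattern has been overwritten. The sketch below is a generic illustration of that technique, not TI code. `stack_paint`, `stack_peak_used` and the `STACK_FILL` value are hypothetical names chosen for this example, and it assumes a stack that grows downward from the highest address. SYS/BIOS documents a `Task_stat()` call and an ROV Task view for this purpose, so on the target those are likely the better tools.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Arbitrary fill pattern chosen for this sketch (assumption). */
#define STACK_FILL 0xBEu

/* Fill the whole stack buffer with the pattern before the task runs. */
void stack_paint(uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    memset(stack, STACK_FILL, size);
}

/* Assumes the stack grows downward, from stack + size toward stack.
 * Bytes at the low end that still hold the pattern were never touched,
 * so peak usage is the total size minus that untouched run. */
size_t stack_peak_used(const uint8_t *stack, size_t size)
{
    size_t untouched = 0;

    assert(stack != NULL);
    while (untouched < size && stack[untouched] == STACK_FILL) {
        untouched++;
    }
    return size - untouched;
}
```

A peak value close to the full stack size after a long run would support the stack overflow theory. The measurement can under-report if the task happens to write the fill value itself at its deepest point.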

  • Hi Takuma,

    We want to debug further what happens when the application blocks. We have two questions that we would like your help with.

    1. We added some logs to our application and found an fvid2 error before the application blocked; this time the assertion failed at a different place. Could you give us any suggestions about this error from your experience? The logs are as follows:
         [MCU2_0] 393957.757823 s: VX_ZONE_OPTIMIZATION:[tivxObjDescSend:243] self_id:396 obj_desc_id:68
         [MCU2_0] 393957.757968 s: src/trace.c @ Line 449:
         [MCU2_0] 393957.758004 s: Assertion @ Line: 762 in src/drv/m2m/dss_m2mApi.c: (NULL != qObj) : failed !!!
         [MCU2_0] 393957.759032 s: VX_ZONE_OPTIMIZATION:[tivxTargetTaskMain:1215] target:396 obj_desc_id:68
    2. There are two M2M nodes in our display graph. We set the target of one M2M node to TIVX_TARGET_DISPLAY_M2M1 and the target of the other to TIVX_TARGET_DISPLAY_M2M2. Since we found an error in M2M, we would like to know whether M2M is thread-safe.

    Looking forward to your reply.

    Regards,

    Sam

  • Hi Sam,

    This looks like an issue with display, so I am looping in our expert with the display driver.

    Regards,

    Takuma

  • Hi Brijesh,

    Thanks for your reply.

    We are using ti-processor-sdk-rtos-j721e-evm-08_00_00_12.

    Regards,

    Sam

  • Hi Sam,

    Can you please apply attached patch on ti-processor-sdk-rtos-j721e-evm-08_00_00_12\pdk_jacinto_08_00_00_37 folder and check it again?

    /cfs-file/__key/communityserver-discussions-components-files/791/0001_2D00_Fixed_2D00_PDK_2D00_10871_2D00_Assertion_2D00_is_2D00_observed_2D00_in_2D00_the_2D00_DSS_2D00_M2M.patch

    Regards,

    Brijesh

  • Hi Brijesh,

    Thanks for your reply.

    We will try it and share the result with you. We noticed that this file differs between SDK 08.00 and SDK 08.02. We also found differences between the two SDK versions in rtos\pdk_jacinto_08_00_00_37\packages\ti\drv\dss\soc\V1\dss_soc_fw.c and rtos\tiovx\source\platform\psdk_j7\rtos\tivx_queue.c. Do these files also need to be modified for this issue?

    Regards,

    Sam

  • Hi Sam Ma,

    Yes, there is a change in the eDP firmware, so if you are using eDP output, please use the new version of rtos\pdk_jacinto_08_00_00_37\packages\ti\drv\dss\soc\V1\dss_soc_fw.c.

    There is also a bug fix in rtos\tiovx\source\platform\psdk_j7\rtos\tivx_queue.c, so yes, please take the changes from this file as well.

    Regards,

    Brijesh 

  • Hi Brijesh,

    Thanks for your reply.

    We will continue the test and share the result with you. By the way, there are two graphs in our application: one is used for display and the other for detection, and there is only one camera node. We want to know whether it is reasonable for the detection graph to copy the camera data from the display graph. What is the most reasonable way? Could you give us any suggestions?

    Regards,

    Sam

  • Hi Sam,

    Can you please explain the detection and display graphs in a bit more detail?

    It should be OK to handle them in different graphs, but I first want to understand whether there is really a need.

    Regards,

    Brijesh

  • Hi Brijesh,

    Thanks for your reply.

    Here is our use case. There are two graphs in our application: one is used for display and the other for detection, and there is only one camera node. The display graph includes a capture node (YUV422), a display M2M node (YUV422-->YUV420), an OpenGL node (YUV420-->RGBX), a second display M2M node (RGBX-->YUV422), and a CSITX node (YUV422). The detection graph has two different detection functions and includes an MSC scale node, an MSC mosaic node, two of our preproc nodes, two TIDL nodes, and two of our postproc nodes. The first detection function is: MSC scale node-->our preproc node-->TIDL node-->our postproc node. The second detection function is: MSC scale node-->MSC mosaic node-->our preproc node-->TIDL node-->our postproc node. Both graphs use pipeline mode. The two graphs run at different frame rates: the display graph at 25 fps and the detection graph at 15 fps. If copying is reasonable, what is the most efficient method?

    Regards,

    Sam
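
Because the display graph runs at 25 fps and the detection graph at 15 fps, whatever mechanism shares the camera data also has to decide which captured frames go to detection. The sketch below shows one simple way to make that choice, an integer accumulator that forwards 15 out of every 25 frames at an even spacing. It is only an illustration and is not taken from the thread or from any TI FAQ; `decimator_t`, `decimator_init` and `decimator_accept` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: selects dst_fps out of every src_fps frames. */
typedef struct {
    uint32_t src_fps; /* rate of the producing graph, e.g. 25 */
    uint32_t dst_fps; /* rate of the consuming graph, e.g. 15 */
    uint32_t acc;     /* running accumulator, always below src_fps */
} decimator_t;

void decimator_init(decimator_t *d, uint32_t src_fps, uint32_t dst_fps)
{
    assert(d != NULL && src_fps > 0 && dst_fps <= src_fps);
    d->src_fps = src_fps;
    d->dst_fps = dst_fps;
    d->acc = 0;
}

/* Call once per captured frame. Returns 1 if this frame should be handed
 * to the slower graph, 0 if it should be skipped. */
int decimator_accept(decimator_t *d)
{
    d->acc += d->dst_fps;
    if (d->acc >= d->src_fps) {
        d->acc -= d->src_fps;
        return 1;
    }
    return 0;
}
```

In an application, the display side would call `decimator_accept` once per dequeued capture frame and copy or enqueue the frame for the detection graph only when it returns 1, so no floating-point timing is needed.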

  • Hi Sam,

    Can you please refer to the FAQ below? It explains how to share data between two or more graphs.

    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1084620/faq-tda4vm-run-the-multi-cam-demo-with-dual-graph

    Regards,

    Brijesh

  • Hi Brijesh,

    Thank you so much. We will give it a try.

    Regards,

    Sam