SDK ver 10_00_00_08
When I used the stress-ng command to increase the CPU load to 70%, a new problem emerged: the video frames became disordered. Please refer to the attached video for details.
Hi Jason,
Are you testing with the full pipeline as mentioned here: https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1576997/am62a7-tidss-error
Is it possible for you to test with the different elements disabled, as you did here: RE: AM62A7: TIDSS Error?
This should help us find if a particular part of the pipeline is causing this issue.
Another potential issue that our development team suggested could be that the pipeline completely stalls after running the algorithm for long enough. Similar issues were noted in parallel decoders documented in this external Jira: https://sir.ext.ti.com/jira/browse/EXT_EP-12787
In the meantime, I will try to replicate the issue on my end here.
Regards,
Jay
Hi Jay
I have changed the pipeline to the following. Then, after running stress-ng --cpu 4 --cpu-load 70 --timeout 200 in the background, the issue doesn't seem to appear. Let me take some time to gradually add elements back to narrow down where the issue occurs.
gst-launch-1.0 v4l2src device=/dev/video3 io-mode=dmabuf-import ! \
video/x-raw, format=UYVY, width=1920, height=1536, framerate=60/1 ! \
tiovxldc dcc-file="/root/isp_config/dcc_ldc.bin" sensor-name="X3F" ! \
video/x-raw, format=NV12, width=1920, height=1536, framerate=60/1 ! \
tiovxmultiscaler name=multi target=0 \
multi.src_0 ! video/x-raw, width=1920,height=720,format=NV12 ! queue ! \
tiovxmosaic ! \
kmssink driver-name=tidss sync=false skip-vsync=true
BR
Jason
Hi Jay
This issue may also be related to the algorithm. When I only enable the preview and algorithm functions, and use stress-ng to increase CPU usage, the problem occurs. Adding other functions does not trigger the issue. Below is the problematic pipeline diagram.

I tried to simulate the pipeline with commands but could not reproduce the problem.
gst-launch-1.0 v4l2src device=/dev/video3 io-mode=dmabuf-import ! \
video/x-raw, format=UYVY, width=1920, height=1080, framerate=60/1 ! \
tiovxldc dcc-file="/root/isp_config/dcc_ldc.bin" sensor-name="X3F" ! \
video/x-raw, format=NV12, width=1920, height=1080, framerate=60/1 ! \
tiovxmultiscaler name=multi target=0 \
multi.src_0 ! video/x-raw, width=1280,height=720,format=NV12 ! queue ! mosaic.sink_0 \
multi.src_1 ! video/x-raw, width=640,height=720,format=NV12 ! queue ! mosaic.sink_1 \
multi.src_2 ! video/x-raw, width=960,height=540,format=NV12 ! queue ! multi2.sink \
tiovxmultiscaler name=multi2 target=1 \
multi2.src_0 ! video/x-raw, width=608,height=352,format=NV12 ! queue ! \
fakesink sync=false \
tiovxmosaic name=mosaic \
sink_0::startx="<0>" sink_0::starty="<0>" \
sink_1::startx="<1280>" sink_1::starty="<0>" ! \
kmssink driver-name=tidss sync=false skip-vsync=true
BR
Jason
Hi Jason,
A potential cause could be that the time the algorithm takes for inference is stalling the pipeline. To isolate whether the issue comes from the GStreamer pipeline itself or from the algorithm, pull the data (using appsink) into an application that consumes the frames and sleeps for one second. If that causes the pipeline to stall, the issue might be fixed by setting the leaky-type flag on appsink (https://gstreamer.freedesktop.org/documentation/app/appsink.html?gi-language=c#appsink:leaky-type) or by adding a queue (we can work that part out).
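As a rough sketch of what I mean (drop, max-buffers, and leaky-type are properties from the appsink documentation; leaky-type requires GStreamer 1.20 or newer, and the function and callback names here are only illustrative):
#include <gst/gst.h>
#include <gst/app/gstappsink.h>
#include <unistd.h>
/* Illustrative configuration: let appsink discard stale buffers instead of
 * blocking the pipeline when the consumer is slow. */
static void configure_test_appsink(GstElement *appsink)
{
    g_object_set(G_OBJECT(appsink), "max-buffers", 2, "drop", TRUE, NULL);
    /* leaky-type is available from GStreamer 1.20 onwards */
    gst_util_set_object_arg(G_OBJECT(appsink), "leaky-type", "downstream");
}
/* Test consumer: pull a frame, sleep one second to mimic a slow algorithm,
 * then release the sample. */
static GstFlowReturn on_new_sample(GstAppSink *appsink, gpointer user_data)
{
    GstSample *sample = gst_app_sink_pull_sample(appsink);
    if (!sample)
        return GST_FLOW_ERROR;
    sleep(1);
    gst_sample_unref(sample);
    return GST_FLOW_OK;
}
The idea is that with drop/leaky set, a slow consumer only loses frames on its own branch instead of backing up the rest of the pipeline.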
If that doesn't work, please share details of what the algorithm does. Is it only running inference, or does it also perform some other task?
Regards,
Jay
Hi Jay
We are not running the algorithm directly in the appsink, because we found that doing so causes the preview to also be limited by the algorithm's frame rate, even after adding queues in multiple places. The new-sample callback of our appsink is shown below; we started a separate thread to retrieve data from the frame_queue for algorithm inference.
static GstFlowReturn AlgProcessSample(GstAppSink *appsink, gpointer user_data)
{
    // Pull the next sample from the appsink
    GstSample *sample = gst_app_sink_pull_sample(appsink);
    if (!sample)
    {
        g_print("algo no sample!\n");
        return GST_FLOW_ERROR;
    }
    GstBuffer *buffer = gst_sample_get_buffer(sample);
    if (!buffer)
    {
        g_print("algo no buffer!\n");
        gst_sample_unref(sample);   // release the sample on the error path
        return GST_FLOW_ERROR;
    }
    // Deep-copy the frame and hand it to the inference thread via frame_queue
    GstBuffer *new_buf = gst_buffer_copy_deep(buffer);
    g_async_queue_push(frame_queue, new_buf);
    gst_sample_unref(sample);
    return GST_FLOW_OK;
}
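The inference thread is roughly as follows (simplified sketch only; it reuses the frame_queue global from the callback above, and alg_running / RunInference are placeholders for our actual stop flag and inference entry point):
/* Sketch only: worker thread popping the deep-copied frames pushed by
 * AlgProcessSample(). */
static volatile gint alg_running = 1;   /* placeholder stop flag */

static gpointer AlgWorkerThread(gpointer user_data)
{
    while (g_atomic_int_get(&alg_running))
    {
        /* Wait up to 100 ms so the stop flag is checked periodically */
        GstBuffer *buf = g_async_queue_timeout_pop(frame_queue,
                                                   100 * G_TIME_SPAN_MILLISECOND);
        if (!buf)
            continue;
        RunInference(buf);       /* placeholder for the real inference call */
        gst_buffer_unref(buf);   /* release the deep copy */
    }
    return NULL;
}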
I attempted to modify the callback function and also disabled the thread that actually runs the algorithm, yet the issue can still be reproduced. So I believe this issue might be unrelated to algorithm processing. Could the problem be in my pipeline instead? Could there be an issue with the settings of my queue or other properties?
static GstFlowReturn AlgProcessSample(GstAppSink *appsink, gpointer user_data)
{
    GstSample *sample = gst_app_sink_pull_sample(appsink);
    if (!sample)
    {
        g_print("algo no sample!\n");
        return GST_FLOW_ERROR;
    }
    GstBuffer *buffer = gst_sample_get_buffer(sample);
    if (!buffer)
    {
        g_print("algo no buffer!\n");
        gst_sample_unref(sample);   // release the sample on the error path
        return GST_FLOW_ERROR;
    }
    // No queue push and no inference thread: just simulate a short delay
    usleep(10 * 1000);   // 10 ms (usleep() from <unistd.h>)
    gst_sample_unref(sample);
    return GST_FLOW_OK;
}
BR
Jason
Hi Jason,
This does rule out the algorithm as the source of the error. I am trying to replicate this error on my end with a similar pipeline.
As a test, can you try adding a queue element with the leaky=2 property after v4l2src? This would drop older frames if some element stalls the pipeline.
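For example, something along these lines at the front of your pipeline (fragment only, reusing the caps from your earlier command; tune max-size-buffers as needed):
gst-launch-1.0 v4l2src device=/dev/video3 io-mode=dmabuf-import ! \
video/x-raw, format=UYVY, width=1920, height=1080, framerate=60/1 ! \
queue max-size-buffers=2 leaky=2 ! \
tiovxldc dcc-file="/root/isp_config/dcc_ldc.bin" sensor-name="X3F" ! ...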
The issue I was suggesting in the previous reply is that the pipeline might be stalling intermittently because the algorithm is not consuming frames fast enough. That would need to be fixed in the pipeline itself.
In the meantime, I will try to reproduce this issue and get back to you with a better analysis of what might be the issue.
Regards,
Jay
Hi Jay
I have added a queue element after v4l2src, and the issue still exists.

BR
Jason
Hi Jason,
I think I have reproduced this issue on my end and am trying to determine the cause. However, I am in the middle of some critical debugging, so please expect a delay in response. I'll try to get back to you by Friday. Apologies for the delay.
Regards,
Jay
Hey Jason,
Sorry for the delayed reply. I have reproduced the error with the appsink-based pipeline. However, in this latest test it looks like you have removed the algorithm-processing part of the pipeline entirely. Can you confirm whether you also see the issue with this pipeline, because with 90% CPU loading I am not seeing it here.
Also, in the test with the delay, does the test run correctly without CPU loading?
Regards,
Jay
Hi Jay
Regarding the issue reproduction based on appsink, have you found a fix?
As for the aforementioned pipeline, I will get back to you after verification.
BR,
Jason
Hi Jason,
Can you try with the following pipeline:
v4l2src device=/dev/video3 io-mode=dmabuf-import ! \
video/x-raw, format=UYVY, width=1920, height=1080, framerate=60/1 ! \
tiovxldc dcc-file="/root/isp_config/dcc_ldc.bin" sensor-name="X3F" ! \
video/x-raw, format=NV12, width=1920, height=1080, framerate=60/1 ! \
tee name=t \
t. ! tiovxmultiscaler name=multi target=0 \
multi.src_0 ! video/x-raw, width=1280,height=720,format=NV12 ! queue ! mosaic.sink_0 \
multi.src_1 ! video/x-raw, width=640,height=720,format=NV12 ! queue ! mosaic.sink_1 \
t. ! queue max-size-buffers=1 leaky=downstream ! tiovxmultiscaler name=multi2 target=1 \
multi2.src_0 ! video/x-raw, width=608,height=352,format=NV12 ! \
appsink drop=true name=sink_0 \
tiovxmosaic name=mosaic background=/tmp/background_0 \
sink_0::startx="<0>" sink_0::starty="<0>" \
sink_1::startx="<1280>" sink_1::starty="<0>" ! \
video/x-raw, width=1920, height=1080 ! \
kmssink driver-name=tidss force-modesetting=true sync=false
Key change: since the output resolutions for the display path can be produced by a single multiscaler node, I split the pipeline before the multiscaler using a tee. The queue after the tee (second branch) also has the leaky property set, so older frames may be discarded if the incoming frame rate is too high compared to the algorithm's processing rate.
If this doesn't work, can you try running as mentioned here: https://github.com/TexasInstruments/edgeai-gst-apps/tree/main/scripts/gst_tracers
This should generate helpful debug logs to see which element might be causing latency issues.
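As a reference point, GStreamer's built-in latency tracer can also be enabled directly through environment variables, along these lines (the linked scripts may wrap this differently, so treat it as a sketch):
GST_TRACERS="latency" GST_DEBUG="GST_TRACER:7" gst-launch-1.0 <your pipeline> 2> /tmp/latency_trace.log
The tracer writes latency records to the debug output, which can then be parsed to see where time is being spent in the pipeline.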
Regards,
Jay