
TDA4VH-Q1: v4l2h264enc CPU load optimization

Part Number: TDA4VH-Q1
Other Parts Discussed in Thread: TDA4VH

Tool/software: TDA4VH, SDK 11.0 Linux + FreeRTOS

Referenced thread: TDA4VH-Q1: v4l2h264enc CPU load optimization

Has there been any update to the v4l2h264enc use case discussed in the thread above?

  • Hello,

    We haven't been able to verify the multi-cam-codec demo on 11.0 just yet. We have seen past issues with using appsink for DMA-BUF handling. Are you using appsink simply to write to a display, or to pass the data to a separate application altogether?

    Thanks,
    Sarabesh S.

  • Hi,

    We created a separate thread to pull the data from appsink and save it as a file.

    The file we finally save is compressed H.264 data.

  • If the end goal is writing to a file, then some CPU copies are unavoidable, because there are no display buffers or other application buffers that could take advantage of DMA-BUF.

  • Hi,

    1. The GStreamer pipeline we use is as follows:

    [2025-06-13 12:08:11]  gst_wrapper: GstCmdString:
    [2025-06-13 12:08:11]  appsrc format=GST_FORMAT_TIME is-live=true do-timestamp=true block=false name=myAppSrc0 ! queue 
    [2025-06-13 12:08:11]  ! video/x-raw, width=(int)1536, height=(int)1728, framerate=(fraction)30/1, format=(string)NV12, interlace-mode=(string)progressive, colorimetry=(string)smpte240m
    [2025-06-13 12:08:11]  ! v4l2h264enc
    [2025-06-13 12:08:11]  ! video/x-h264 
    [2025-06-13 12:08:11]  ! h264parse config-interval=-1
    [2025-06-13 12:08:11]  ! queue ! appsink name=myAppSink0 max-buffers=50 drop=true 
    [2025-06-13 12:08:11]  
    [2025-06-13 12:08:11]  GstPipe init status 0!

    2. Using the "top -H -p <pid>" command, we found that the CPU load of the t_h264_save thread (which saves the H.264 stream to the file system) was very low, only 1.7%, but the queue0:src thread (presumably the first "queue" element in the GStreamer pipeline) was quite high, at 14%.

    Why did the queue0:src thread consume so much CPU? Is this reasonable?

    3. The flame graph generated by perf also shows that queue0:src has the highest load. You can download the out.zip file to view it.

    1374.out.zip
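    To separate the application's overhead from the encoder path itself, the pipeline from the log above can be approximated standalone with gst-launch-1.0, substituting videotestsrc for the real appsrc feed. This is only a sketch for profiling; the caps are copied from the application log, and the test source will not reproduce the mosaic content.

```shell
# Standalone repro sketch: videotestsrc stands in for the application's appsrc.
# Caps match the application pipeline (1536x1728 NV12 @ 30 fps); fakesink
# discards the encoded output so only the encode path is measured.
gst-launch-1.0 videotestsrc is-live=true ! \
  'video/x-raw,width=1536,height=1728,framerate=30/1,format=NV12' ! \
  queue ! v4l2h264enc ! video/x-h264 ! h264parse config-interval=-1 ! \
  queue ! fakesink
```

    Running top -H against this process gives a baseline for the queue and encoder thread loads without any application code involved.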

  • Hello, 

    How many streams are being passed by appSrc?

    The queue element decouples the upstream and downstream elements. The first queue is configured to constantly poll and push frames downstream to keep timing in sync. I suggest trying to set is-live=false and block=true on appsrc to see whether the CPU load drops without any desync. If frames get out of sync, then is-live=true is needed.

    Let me know if that helps, if not we can explore some other attributes to constrain the queue element's CPU utilization.

    Thanks,
    Sarabesh S.

  • Hi,

    How many streams are being passed by appSrc?

    Our app uses mosaic nodes to combine six images into one 1536x1728 image and then pushes it to appsrc. The actual frame rate is approximately 23 frames per second.

    I suggest trying to set is-live=false and blocking=true to see if CPU drops and no desync occurs.

    We tried setting is-live=false and block=true, but the CPU load did not change. The actual frame rate remains the same as before, 23 frames per second.

  • Hello,

    Could you try to use the queue with these properties: 

    queue leaky=downstream max-size-time=87000000

    I notice that your pipeline is not taking advantage of DMA-BUF to avoid memcpys from the appsrc camera buffers to the downstream elements. That is likely the source of the CPU overhead, and it cannot be brought down by placing limits on the queue element. Could you also run the application with GST_DEBUG=queue:6 GST_DEBUG_NO_COLOR=1 GST_DEBUG_FILE=queue.log and share the log, so we can see how full each queue gets?
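    For example, the debug run could look like the following (./camera_app is a placeholder name for the actual application binary):

```shell
# Placeholder binary name; substitute the real application executable.
GST_DEBUG=queue:6 GST_DEBUG_NO_COLOR=1 GST_DEBUG_FILE=queue.log ./camera_app

# Queue fill state transitions are logged by the queue element; a rough way
# to skim them (exact message text can vary between GStreamer versions):
grep "queue is" queue.log | tail
```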

    Thanks,
    Sarabesh S.

  • Hi,

    Could you try to use the queue with these properties: 

    queue leaky=downstream max-size-time=87000000

    We tried this configuration; the CPU load does not seem to have changed significantly.

    Could you also run the application with GST_DEBUG=queue:6 GST_DEBUG_NO_COLOR=1 GST_DEBUG_FILE=queue.log and share the log so we can see how full each queue gets.

    You can download the "queue.zip" file to view it.

    queue.zip

  • Hello,

    Has there been any further progress on this issue? As stated above, you will need to use DMA-BUF to reduce the CPU spent on memory copies between elements in the pipeline. Additionally, since you are writing to a file with appsink, you will inevitably see some CPU usage from the file writes; you can try writing to fakesink to see whether there is any improvement.
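    Concretely, only the tail of the pipeline string needs to change for this experiment (a sketch; the leading elements are unchanged and elided here):

```shell
# Original tail of the pipeline string:
#   ... ! h264parse config-interval=-1 ! queue ! appsink name=myAppSink0 max-buffers=50 drop=true
#
# Fakesink variant for profiling: encoded buffers are discarded, so neither
# the appsink thread nor the file-writing thread runs. If the queue0:src load
# is unchanged, file I/O is not the bottleneck.
#   ... ! h264parse config-interval=-1 ! queue ! fakesink sync=false
```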

    Based on the top output you shared, the CPU utilization for the encoder hardware is actually quite low, ~1.7%; I don't see this going much lower. You can try adding the following to your v4l2h264enc element:

    i += snprintf(&params->m_cmdString[i], CODEC_MAX_LEN_CMD_STR-i,"! v4l2h264enc output-io-mode=4 capture-io-mode=2 bitrate=1500000 \n");

    And have your camera configured to import buffers from the v4l2 element. 
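    Since the io-mode enum values can differ between GStreamer versions, it is worth confirming what 4 and 2 map to on the target's build of the plugin before relying on them (a sketch; assumes gst-inspect-1.0 is available on the target):

```shell
# List the io-mode properties and their enum values for this build of
# v4l2h264enc; verify that output-io-mode=4 and capture-io-mode=2 select
# the intended modes on your SDK before hard-coding them.
gst-inspect-1.0 v4l2h264enc | grep -A10 "io-mode"
```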

    Thanks,
    Sarabesh S.