SK-AM62A-LP: High encoder latency (~180 ms) for 4K 30 fps H.264/H.265 encoding

Part Number: SK-AM62A-LP

Dear TI experts,


During my investigation of encoder evaluation for a 4K 30 fps video feed, I have gotten bizarre encoder-latency results using the tools described at https://software-dl.ti.com/processor-sdk-linux/esd/AM62AX/09_00_01/exports/edgeai-docs/common/measure_perf.html. The test file is 4.633 seconds of 4K 30 fps raw (I420) video placed in RAM with tmpfs ($ mount -t tmpfs none /test_case/),
because when I put the video on flash storage (e.g. /test), the output rate drops to 5-6 fps due to memory-bandwidth constraints on fetching the data.
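The bandwidth explanation can be sanity-checked with a quick calculation; note that the per-frame size also matches the blocksize used in the filesrc pipeline. A minimal sketch:

```python
# Back-of-the-envelope check of the raw-video bandwidth needed for 4K 30 fps I420.
width, height, fps = 3840, 2160, 30

# I420 (YUV 4:2:0) stores 1.5 bytes per pixel: a full-resolution luma plane
# plus two quarter-resolution chroma planes.
frame_bytes = width * height * 3 // 2
print(frame_bytes)               # 12441600 -- matches the filesrc blocksize

read_rate_mb_s = frame_bytes * fps / 1e6
print(round(read_rate_mb_s, 1))  # 373.2 MB/s sustained read needed for 30 fps
```

A sustained read rate of roughly 373 MB/s is well beyond typical SD/eMMC throughput, which is consistent with the 5-6 fps seen when the file lives on flash.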
However, when I play the raw input from RAM, I do achieve 30 fps throughput from the pipeline, but the encoder latency reported by GST_TRACERS is almost 180 ms, as you can see below.

GST_DEBUG_FILE=./h264.log GST_DEBUG_NO_COLOR=1 GST_DEBUG="GST_TRACER:7" GST_TRACERS="latency(flags=element)" gst-launch-1.0 -e filesrc location=/test_cases/bbb_sunflower_2160p_30fps_normal_short.yuv blocksize=12441600 ! rawvideoparse width=3840 height=2160 format=i420 framerate=30/1 colorimetry=bt709 ! v4l2h264enc extra-controls="controls,frame_level_rate_control_enable=1,video_bitrate_mode=1,video_bitrate=15000000,h264_profile=4,h264_level=15" ! filesink location=./output.h265 sync=true
Note that I have run the same command with no extra-controls, with different bitrates, with different profiles and levels, and with the HEVC encoder as well. The results are all essentially the same as below, with tiny differences, but the encoder latency is much higher than what I expect (33 ms, i.e. one frame period at 30 fps): it varies between 175-190 ms according to parse_gst_tracers.py.

+-----------------------------------------------------------------------------------+
|element          latency   out-latency   out-fps   frames |
+-----------------------------------------------------------------------------------+
|rawvideoparse0     19.64         32.14        31      139 |
|v4l2h265enc0      176.80         32.88        30      139 |
+-----------------------------------------------------------------------------------+
I don't understand how the out-latency is measured, but to me it looks like out-latency is just 1/framerate, i.e. the throughput of the pipeline, which is fine. But since we need this encoder for a real-time, low-latency solution, 180 ms of encoder latency is not acceptable.
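For reference, the per-element "latency" column can be reproduced by averaging the tracer records in the debug log. The sketch below is a hypothetical, simplified version of the kind of aggregation parse_gst_tracers.py performs; the record layout (element-latency records with a time=(guint64) field in nanoseconds) is an assumption based on GStreamer's latency-tracer output:

```python
import re
from collections import defaultdict

# Assumed shape of an "element-latency" tracer record in the GST_DEBUG log;
# the time field is the per-buffer element latency in nanoseconds.
RECORD = re.compile(
    r"element-latency,.*?element=\(string\)(?P<name>[\w.]+),"
    r".*?time=\(guint64\)(?P<ns>\d+)"
)

def mean_element_latency_ms(log_lines):
    """Average the per-buffer latency of each element, in milliseconds."""
    sums = defaultdict(lambda: [0, 0])  # element name -> [total_ns, count]
    for line in log_lines:
        m = RECORD.search(line)
        if m:
            entry = sums[m.group("name")]
            entry[0] += int(m.group("ns"))
            entry[1] += 1
    return {name: total / count / 1e6 for name, (total, count) in sums.items()}

# Example with fabricated values in the assumed record format:
log = [
    "element-latency, element-id=(string)0xdead, element=(string)v4l2h264enc0, src=(string)src, time=(guint64)176800000, ts=(guint64)1;",
    "element-latency, element-id=(string)0xdead, element=(string)v4l2h264enc0, src=(string)src, time=(guint64)176800000, ts=(guint64)2;",
]
print(mean_element_latency_ms(log))  # {'v4l2h264enc0': 176.8}
```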

Something I noticed is that when I put the file on flash storage in the /test directory, I get the following results:
+-----------------------------------------------------------------------------------+
|element          latency   out-latency   out-fps   frames |
+-----------------------------------------------------------------------------------+
|rawvideoparse0    159.11        169.64         5      195 |
|v4l2h264enc0       34.71        169.11         5      195 |
+-----------------------------------------------------------------------------------+

In other words, when the parsing stage of the pipeline is congested and slow, the encoder shows low latency and works well, but the overall throughput is then bounded by rawvideoparse, which is why I only get 5 fps.
Also, GStreamer reports the following at the end (my video is 4.633333 s), but I have timed the pipeline with my watch and it takes around 5.5 seconds to execute. So I don't know how the pipeline execution time below is measured and what exactly it represents; I assume it reports the running time of the output stream (frames/fps) rather than the actual wall-clock execution time.

Got EOS from element "pipeline0".
EOS received - stopping pipeline...
Execution ended after 0:00:04.633856320
Setting pipeline to NULL ...
Freeing pipeline ...

Could you please investigate this further and suggest how I can minimize the encoder latency? I am using the EdgeAI SDK 09_00_01 for AM62Ax.

Thank you so much 

  • Hi Rouzbeh,

    Have you tried to experiment with a 1920x1080@30fps file or a camera capture, and are the observations similar? Also, can you add v4l2h264dec capture-io-mode=4 and give it a try?

    Also, I am travelling to a customer site, so expect a response after I return next week, when I will have my setup available to verify your observations.

    Apologies for the delay.

    Best Regards,

    Suren

  • Hi Suren,

    Yes, I have successfully run CSI 1080p 30 fps capture from an IMX219, for which I get 9.83 ms encoder latency, as follows:

    Target$ GST_DEBUG_FILE=/test/quality/csi_h265.log GST_DEBUG_NO_COLOR=1 GST_DEBUG="GST_TRACER:7" GST_TRACERS="latency(flags=element+pipeline)" gst-launch-1.0 -e -v v4l2src device=/dev/video3 io-mode=dmabuf-import ! video/x-bayer, width=1920, height=1080, framerate=30/1, format=rggb10 ! tiovxisp sink_0::device=/dev/v4l-subdev2 sensor-name="SENSOR_SONY_IMX219_RPI" dcc-isp-file=/opt/imaging/imx219/linear/dcc_viss_10b_1920x1080.bin sink_0::dcc-2a-file=/opt/imaging/imx219/linear/dcc_2a_10b_1920x1080.bin format-msb=9 ! video/x-raw, format=NV12, width=1920, height=1080, framerate=30/1 ! v4l2h264enc output-io-mode=dmabuf-import ! fakesink

    +-----------------------------------------------------------------------------------+
    |element          latency   out-latency   out-fps   frames |
    +-----------------------------------------------------------------------------------+
    |capsfilter0         0.53         33.27        30      571 |
    |tiovxisp0           9.84         33.23        30      571 |
    |capsfilter1         0.88         33.23        30      571 |
    |v4l2src0            1.08         33.21        30      571 |
    |v4l2h264enc0        9.83         33.21        30      571 |
    +-----------------------------------------------------------------------------------+


    I have also tried decode (with capture-io-mode) followed by encode, as you suggested, using the following command, again with my h264 file located in the RAM tmpfs:

    target$ GST_DEBUG_FILE=/test/quality/decenc.log GST_DEBUG_NO_COLOR=1 GST_DEBUG="GST_TRACER:7" GST_TRACERS="latency(flags=element+pipeline)" gst-launch-1.0 -e filesrc location=/test/input.h264 ! h264parse ! v4l2h264dec capture-io-mode=dmabuf ! v4l2h264enc output-io-mode=5 ! filesink location=./output.h264

    +-----------------------------------------------------------------------------------+
    |element          latency   out-latency   out-fps   frames |
    +-----------------------------------------------------------------------------------+
    |h264parse0          2.66        113.85         8      218 |
    |v4l2h264dec0      195.20         32.96        30      755 |
    |filesrc0          277.82        113.88         8      218 |
    |v4l2h264enc0       84.03         32.90        30      755 |
    +-----------------------------------------------------------------------------------+

    That is better, at 84 ms, but it is still roughly 2.5 times the expected 33 ms latency.
    Could you please give an update on this matter?

    Thanks again!

  • Dear Suren,

    Have you had a chance to look into the latency problem?
    I still haven't managed to get anything below 85 ms of encoder latency while the throughput is 30 fps for 4K. Could it be that the underlying codec IP buffers multiple frames before encoding them, like batch encoding? For real-time encoding, a new frame is available to the encoder every 33.3 ms, and ideally the encoder itself should encode each frame in less than that time. The TRM reports a theoretical 33.3 ms encoder latency for 4K, so could you please advise how I can reach that?
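    The batch-buffering suspicion can be quantified: if the encoder keeps N frames in flight, the steady-state per-frame latency at full throughput is roughly N frame periods. A quick check under that (unconfirmed) model:

```python
# Frame period for 30 fps video, in milliseconds.
frame_period_ms = 1000 / 30  # ~33.33 ms

# Observed encoder latencies from the tracer tables: 176.8 ms (filesrc raw)
# and 84.0 ms (decode -> encode).  Divide by the frame period to estimate
# how many frames the encoder would be holding in flight.
for observed_ms in (176.8, 84.0):
    frames_in_flight = observed_ms / frame_period_ms
    print(f"{observed_ms} ms -> ~{frames_in_flight:.1f} frames buffered")
```

    On this model the 176.8 ms figure corresponds to roughly 5 frames in flight and the 84 ms figure to roughly 2.5, which would point at queue depth rather than per-frame encode time.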

    Thank you again!

  • Hi Rouzbeh,

    The GST latency tracers only work reliably with live sources; filesrc is a non-live source.

    Can you try to see if the latency is better when you have a v4l2src (Camera source) -> Encode -> Stream like the one below:

    Target$ GST_DEBUG_FILE=/test/quality/csi_h265.log GST_DEBUG_NO_COLOR=1 GST_DEBUG="GST_TRACER:7" GST_TRACERS="latency(flags=element+pipeline)" gst-launch-1.0 -e -v v4l2src device=/dev/video3 io-mode=dmabuf-import ! video/x-bayer, width=1920, height=1080, framerate=30/1, format=rggb10 ! tiovxisp sink_0::device=/dev/v4l-subdev2 sensor-name="SENSOR_SONY_IMX219_RPI" dcc-isp-file=/opt/imaging/imx219/linear/dcc_viss_10b_1920x1080.bin sink_0::dcc-2a-file=/opt/imaging/imx219/linear/dcc_2a_10b_1920x1080.bin format-msb=9 ! video/x-raw, format=NV12, width=1920, height=1080, framerate=30/1 ! v4l2h264enc output-io-mode=dmabuf-import ! rtph264pay ! udpsink host=<ip_addr> port=5000 

    Best Regards,

    Suren

  • Hi Suren, 

    I have already done 1080p 30 fps from the CSI camera and provided the latency results in my previous replies in this thread. It is about 9.8 ms for 1080p 30 fps, which is acceptable. But how can I check the latency for 4K 30 fps without a 4K CSI camera?
    Also, using the Wave5 native driver API might give a more accurate latency measurement in this case. Is there an example or sample C++ source showing how to encode through the Wave5 API and the V4L2 library? That would be a good starting point for writing our own encoding program, because apart from the latency, there is a memory-leak issue that crashes the GStreamer pipeline when encoding longer video streams.

    Thank you so much.

  • Hi Rouzbeh,

    I will have to check with my software team whether we have any sample code or anything else that can be shared. Please allow me a day or two to respond.

    Note that the IMX219 also supports 4K@15fps.

    Best Regards,

    Suren

  • Hi Rouzbeh,

    Since our driver is V4L2-compliant, you should be able to write a platform-independent V4L2 M2M based application to verify.
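    As a rough sketch of what such an application involves: the device path and control setup below are assumptions, and the ioctl sequence is only a comment-level outline. The runnable part is the standard FourCC packing V4L2 uses to select the H.264 bitstream format:

```python
def v4l2_fourcc(a, b, c, d):
    """Pack four characters into the little-endian 32-bit code V4L2 uses to
    identify pixel/stream formats (same result as the v4l2_fourcc() C macro)."""
    return ord(a) | (ord(b) << 8) | (ord(c) << 16) | (ord(d) << 24)

V4L2_PIX_FMT_H264 = v4l2_fourcc('H', '2', '6', '4')
V4L2_PIX_FMT_NV12 = v4l2_fourcc('N', 'V', '1', '2')
print(hex(V4L2_PIX_FMT_H264))  # 0x34363248

# Outline of a minimal V4L2 M2M encode loop (pseudocode, not executed here;
# the actual node, e.g. /dev/videoX, depends on the board):
#   fd = open("/dev/videoX")                       # stateful encoder M2M node
#   VIDIOC_S_FMT on the OUTPUT queue (raw NV12)    # side fed with raw frames
#   VIDIOC_S_FMT on the CAPTURE queue (H264)       # side yielding bitstream
#   VIDIOC_REQBUFS / VIDIOC_QUERYBUF / mmap on both queues
#   VIDIOC_STREAMON on both queues
#   loop: QBUF a raw frame, DQBUF an encoded buffer,
#         timestamping both to measure true per-frame encode latency
#   VIDIOC_STREAMOFF on both queues, close(fd)
```

    Timestamping each QBUF/DQBUF pair directly, outside of GStreamer, should give a per-frame latency figure that is independent of any tracer limitations.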

    Best Regards,

    Suren