Dear TI experts,
Throughout my evaluation of the encoder for a 4K 30 fps video feed, I have gotten bizarre encoder-latency results using the tools provided in https://software-dl.ti.com/processor-sdk-linux/esd/AM62AX/09_00_01/exports/edgeai-docs/common/measure_perf.html. The test file is 4.633 seconds of 4K 30 fps raw (I420) video placed in RAM with tmpfs ($ mount -t tmpfs none /test_cases/).
I do this because when I put the video on storage, e.g. in /test, the output is only 5~6 fps due to memory-bandwidth constraints on fetching the data.
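In case it helps to reproduce, the staging steps are roughly the following (a sketch; I copy from the flash location in /test mentioned below):

$ mount -t tmpfs none /test_cases/
$ cp /test/bbb_sunflower_2160p_30fps_normal_short.yuv /test_cases/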
However, when I play the raw input from RAM, I do achieve 30 fps throughput from the pipeline, but the encoder latency reported via GST_TRACERS is almost 180 ms, as you can see below:

GST_DEBUG_FILE=./h264.log GST_DEBUG_NO_COLOR=1 GST_DEBUG="GST_TRACER:7" GST_TRACERS="latency(flags=element)" \
gst-launch-1.0 -e filesrc location=/test_cases/bbb_sunflower_2160p_30fps_normal_short.yuv blocksize=12441600 \
  ! rawvideoparse width=3840 height=2160 format=i420 framerate=30/1 colorimetry=bt709 \
  ! v4l2h264enc extra-controls="controls,frame_level_rate_control_enable=1,video_bitrate_mode=1,video_bitrate=15000000,h264_profile=4,h264_level=15" \
  ! filesink location=./output.h265 sync=true
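As a sanity check on the setup: blocksize here is exactly one I420 frame (width x height x 1.5 bytes), so filesrc pushes one frame per buffer:

$ echo $((3840 * 2160 * 3 / 2))
12441600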
Note that I have run the same command with no extra-controls, with different bitrates, with different profiles and levels, and with the HEVC encoder as well. The results are essentially the same in every case, with tiny differences, but the encoder latency is always much higher than I expect (33 ms, i.e. 1 frame of latency at 30 fps) and varies between 175-190 ms according to parse_gst_tracers.py:
+---------------------------------------------------------------------+
| element          latency (ms)   out-latency (ms)   out-fps   frames |
+---------------------------------------------------------------------+
| rawvideoparse0          19.64              32.14        31      139 |
| v4l2h265enc0           176.80              32.88        30      139 |
+---------------------------------------------------------------------+
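For what it's worth, these averages can be cross-checked against the raw element-latency records in the tracer log. A sketch for the H.264 run, assuming the records carry a time field in nanoseconds; adjust the field names to whatever your GStreamer version actually emits:

$ grep 'element-latency' h264.log \
    | grep 'element=(string)v4l2h264enc0' \
    | sed -e 's/.*time=(guint64)//' -e 's/[,;].*//' \
    | awk '{ sum += $1; n++ } END { if (n) printf "%.2f ms avg over %d frames\n", sum/n/1e6, n }'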
I don't understand how you measure the out-latency, but to me it looks like the out-latency is just 1/framerate, i.e. simply the throughput of the pipeline, which is fine. But since we need this encoder for a real-time, low-latency solution, 180 ms is not acceptable.
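To put the number in perspective (simple arithmetic, if I am reading the columns correctly): 176.80 ms at a 33.3 ms frame period is more than five frame times, which suggests several frames are queued inside the encoder before the first one is emitted, while the out-latency is indeed just the inverse of the frame rate:

$ echo 'scale=1; 176.80 * 30 / 1000' | bc    # encoder latency in frame periods
5.3
$ echo 'scale=2; 1000 / 32.88' | bc          # 1 / out-latency, in fps
30.41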
Something I noticed was that when I put the file on flash storage in the /test directory, I get the following results:
+---------------------------------------------------------------------+
| element          latency (ms)   out-latency (ms)   out-fps   frames |
+---------------------------------------------------------------------+
| rawvideoparse0         159.11             169.64         5      195 |
| v4l2h264enc0            34.71             169.11         5      195 |
+---------------------------------------------------------------------+
This means that when the video-parsing part of the pipeline is congested and slow, the encoder has lower latency and works well; overall, however, the throughput is bounded by rawvideoparse, which is why I get 5 fps.
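To take the filesystem out of the equation completely, one more experiment I can run is to generate the frames in memory (videotestsrc here is only a stand-in source for this sketch; none of the numbers above come from it):

$ GST_DEBUG_FILE=./synthetic.log GST_DEBUG_NO_COLOR=1 GST_DEBUG="GST_TRACER:7" GST_TRACERS="latency(flags=element)" \
  gst-launch-1.0 -e videotestsrc is-live=true num-buffers=150 \
  ! video/x-raw,width=3840,height=2160,format=I420,framerate=30/1 \
  ! v4l2h264enc ! fakesink sync=false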
Besides, GStreamer reports the following at the end (my video is 4.633333 s), but I have timed the pipeline with my watch and it takes around 5.5 seconds to execute. So I don't know how your IP core reports the pipeline execution time shown below, or what exactly it is measuring (I assume it just outputs frames/fps rather than the actual execution time):
Got EOS from element "pipeline0".
EOS received - stopping pipeline...
Execution ended after 0:00:04.633856320
Setting pipeline to NULL ...
Freeing pipeline ...
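A more precise way than my watch to capture the wall-clock time, for comparison with the "Execution ended after" line, is simply to wrap the launch in time; here with a stripped-down variant of the same pipeline and an arbitrary output name:

$ time gst-launch-1.0 -e filesrc location=/test_cases/bbb_sunflower_2160p_30fps_normal_short.yuv blocksize=12441600 \
    ! rawvideoparse width=3840 height=2160 format=i420 framerate=30/1 colorimetry=bt709 \
    ! v4l2h264enc ! filesink location=./output.h264 sync=true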
Could you please investigate these issues further and suggest how I can minimize the encoder latency? I am using edgeai SDK 09_00_01 for the AM62Ax.
Thank you so much