Hi there,
We’re attempting to bring up 8 cameras on our new platform based on TDA4VM.
The issue we currently face is that we are seeing fairly high C6x DSP and A72 utilization when using the tiovxcolorconvert plugin.
The cameras produce UYVY (packed YUV 4:2:2) image streams, so we must first convert these to NV12 (semi-planar YUV 4:2:0) in order to use additional plugins like tiovxmultiscaler and tiovxmosaic. Our thought was to use tiovxcolorconvert for this in order to take advantage of the on-board processing of the TDA4VM, but what we find is that each camera instance needs about 25% of a single C6x DSP and adds some CPU utilization (A72) as well.
Our streams are 1280x800 at 30 fps (UYVY). We are attempting to show 8 of them on screen at the same time, scaled down to fit. That is an incoming pixel rate of around 250 MP/s for all 8 cameras. We are able to display 6 of them, but doing so drives both C6x DSP cores to nearly 100% utilization (88% and 89% in the summary below) and takes a significant chunk of the A72 (about 39%). This seems ‘off’ to me; I wouldn’t expect the DSP load to be that high for a simple colorspace conversion.
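For reference, here is a quick sanity check of the pixel-rate figure as a small shell sketch. The resolution, frame rate and camera count are the ones stated above; the bytes-per-pixel values (2 for UYVY, 1.5 for NV12) come from the format definitions. The variable names are just for illustration.

```shell
#!/bin/sh
# Stream parameters taken from the description above.
width=1280
height=800
fps=30
cameras=8

# Pixels per second for one camera and for all cameras combined.
per_cam=$((width * height * fps))
total=$((per_cam * cameras))

# Bytes per second: UYVY is 2 bytes/pixel, NV12 is 1.5 bytes/pixel (3/2).
uyvy_bytes=$((total * 2))
nv12_bytes=$((total * 3 / 2))

echo "per camera : $per_cam pixels/s"
echo "all cameras: $total pixels/s"
echo "UYVY in    : $uyvy_bytes bytes/s"
echo "NV12 out   : $nv12_bytes bytes/s"
```

This works out to roughly 30.7 MP/s per camera and 245.8 MP/s for all 8, which is where the "around 250 MP/s" figure comes from.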
We are using PSDK Linux 8.5.
Here is our pipeline:
gst-launch-1.0 \
v4l2src device=/dev/video3 io-mode=2 ! video/x-raw,width=1280,height=800,format=UYVY ! tiovxcolorconvert target=0 ! video/x-raw,format=NV12 ! tiovxmultiscaler target=0 ! video/x-raw, width=320, height=384 ! waylandsink \
v4l2src device=/dev/video4 io-mode=2 ! video/x-raw,width=1280,height=800,format=UYVY ! tiovxcolorconvert target=0 ! video/x-raw,format=NV12 ! tiovxmultiscaler target=0 ! video/x-raw, width=320, height=384 ! waylandsink \
v4l2src device=/dev/video5 io-mode=2 ! video/x-raw,width=1280,height=800,format=UYVY ! tiovxcolorconvert target=1 ! video/x-raw,format=NV12 ! tiovxmultiscaler target=1 ! video/x-raw, width=320, height=384 ! waylandsink \
v4l2src device=/dev/video6 io-mode=2 ! video/x-raw,width=1280,height=800,format=UYVY ! tiovxcolorconvert target=1 ! video/x-raw,format=NV12 ! tiovxmultiscaler target=1 ! video/x-raw, width=320, height=384 ! waylandsink \
v4l2src device=/dev/video19 io-mode=2 ! video/x-raw,width=1280,height=800,format=UYVY ! tiovxcolorconvert target=0 ! video/x-raw,format=NV12 ! tiovxmultiscaler target=0 ! video/x-raw, width=320, height=384 ! waylandsink \
v4l2src device=/dev/video20 io-mode=2 ! video/x-raw,width=1280,height=800,format=UYVY ! tiovxcolorconvert target=1 ! video/x-raw,format=NV12 ! tiovxmultiscaler target=1 ! video/x-raw, width=320, height=384 ! waylandsink &
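Since the six branches above differ only in the device node and the target index, the same command can be generated from a list, which makes it easier to try other device/target assignments. This is only a sketch: the `branch` helper and the `streams` list format are made up for illustration, while the elements, properties and caps are copied unchanged from the pipeline above.

```shell
#!/bin/sh
# Hypothetical helper: print one pipeline branch for a device/target pair.
# Element names, properties and caps are copied from the pipeline above.
branch() {
    printf '%s' "v4l2src device=$1 io-mode=2 ! video/x-raw,width=1280,height=800,format=UYVY ! tiovxcolorconvert target=$2 ! video/x-raw,format=NV12 ! tiovxmultiscaler target=$2 ! video/x-raw, width=320, height=384 ! waylandsink"
}

# device:target pairs, matching the assignment used above.
streams="/dev/video3:0 /dev/video4:0 /dev/video5:1 /dev/video6:1 /dev/video19:0 /dev/video20:1"

cmd="gst-launch-1.0"
count=0
for s in $streams; do
    dev=${s%%:*}      # part before the colon
    target=${s##*:}   # part after the colon
    cmd="$cmd $(branch "$dev" "$target")"
    count=$((count + 1))
done

# Only print the command here so it can be inspected first.
echo "$cmd"
```

To actually launch it on the target, the printed command can be pasted into a shell or run with `eval "$cmd" &`.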
Here is the summary of the load:
Summary of CPU load,
====================
CPU: mpu1_0: TOTAL LOAD = 39.48 % ( HWI = 2.30 %, SWI = 1.28 % )
CPU: mcu2_0: TOTAL LOAD = 9. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: mcu2_1: TOTAL LOAD = 1. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: c6x_1: TOTAL LOAD = 88. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: c6x_2: TOTAL LOAD = 89. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: c7x_1: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
HWA performance statistics,
===========================
HWA: MSC0: LOAD = 23.40 % ( 135 MP/s )
HWA: MSC1: LOAD = 23.42 % ( 134 MP/s )
DDR performance statistics,
===========================
DDR: READ BW: AVG = 2310 MB/s, PEAK = 9918 MB/s
DDR: WRITE BW: AVG = 1361 MB/s, PEAK = 5297 MB/s
DDR: TOTAL BW: AVG = 3671 MB/s, PEAK = 15215 MB/s
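To put the C6x numbers in perspective, here is a rough extrapolation from the 6-stream measurement to 8 streams. It assumes (and this is only an assumption) that the C6x load is entirely due to these streams and scales linearly with the number of cameras.

```shell
#!/bin/sh
# C6x loads from the summary above, in percent.
c6x_1=88
c6x_2=89
streams=6   # cameras running when the summary was taken
wanted=8    # cameras we would like to run

# Work in tenths of a percent so plain integer arithmetic is enough.
total_tenths=$(( (c6x_1 + c6x_2) * 10 ))
per_stream_tenths=$(( total_tenths / streams ))
needed_tenths=$(( per_stream_tenths * wanted ))
available_tenths=$(( 2 * 100 * 10 ))   # two C6x cores at 100 % each

printf 'per stream : %d.%d %% of one C6x\n' \
    $((per_stream_tenths / 10)) $((per_stream_tenths % 10))
printf 'for %d cams: %d.%d %% (available: %d %%)\n' \
    "$wanted" $((needed_tenths / 10)) $((needed_tenths % 10)) $((available_tenths / 10))
```

Under that assumption each stream costs about 29.5% of one C6x, so 8 streams would need about 236% against the 200% the two cores can provide.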
We have also set up a pipeline using videotestsrc:
GST_DEBUG_FILE=pipeline-kmssink-sync-false-480.log GST_DEBUG_NO_COLOR=1 GST_DEBUG="GST_TRACER:7" GST_TRACERS="latency(flags=element)" \
gst-launch-1.0 videotestsrc ! video/x-raw,width=1280,height=720,format=RGB,framerate=30/1 ! \
tiovxcolorconvert ! video/x-raw,format=NV12 ! kmssink sync=false driver-name=tidss -v &
GStreamer tracers:
Thanks for your assistance,
John