TDA4VM: DSP/CPU utilization by tiovxcolorconvert

John Weber

Part Number: TDA4VM

Hi there,

We’re attempting to bring up 8 cameras on our new platform based on TDA4VM.

The issue we currently face is that we are seeing some pretty high C6x DSP and A72 utilization when using the tiovxcolorconvert plugin.

The cameras produce UYVY (YUV422) formatted image streams, so we must first convert these to planar YUV420 (NV12) in order to utilize additional plugins like tiovxmultiscaler and tiovxmosaic. Our thought was to use tiovxcolorconvert for this in order to take advantage of the on-board processing of the TDA4VM, but what we find is that each camera instance needs about 25% of a single C6x DSP and adds some CPU utilization (A72) as well.

Our streams are 1280x800 images at 30 fps (UYVY). We are attempting to have 8 of them on a screen at the same time (scaled down to fit the screen). This is an incoming pixel rate of around 250MP/s for all 8 cameras. We are able to display 6 of them, but doing so results in nearly 100% utilization of both C6x DSP cores and a significant chunk of A72. This seems ‘off’ to me – I wouldn’t think that the DSP load would be that high for a simple colorspace conversion operation.
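As a quick check of the figure quoted above, the aggregate pixel rate follows directly from the resolution, frame rate, and camera count. The sketch below also derives the raw input byte rate, assuming 2 bytes per pixel for packed UYVY (4:2:2); the variable names are illustrative only.

```python
# Stream parameters taken from the post.
WIDTH, HEIGHT, FPS, CAMERAS = 1280, 800, 30, 8

# Assumption: packed YUV 4:2:2 (UYVY) uses 16 bits, i.e. 2 bytes, per pixel.
UYVY_BYTES_PER_PIXEL = 2

pixels_per_second_per_camera = WIDTH * HEIGHT * FPS
total_pixels_per_second = pixels_per_second_per_camera * CAMERAS
total_bytes_per_second = total_pixels_per_second * UYVY_BYTES_PER_PIXEL

print(f"per camera:  {pixels_per_second_per_camera / 1e6:.2f} MP/s")
print(f"all cameras: {total_pixels_per_second / 1e6:.2f} MP/s")
print(f"UYVY input:  {total_bytes_per_second / 1e6:.2f} MB/s")
```

This gives roughly 246 MP/s for 8 cameras, which matches the "around 250 MP/s" estimate.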

We are using PSDK Linux 8.5.

Here is our pipeline:

gst-launch-1.0 \
v4l2src device=/dev/video3 io-mode=2 ! video/x-raw,width=1280,height=800,format=UYVY ! tiovxcolorconvert target=0 ! video/x-raw,format=NV12 ! tiovxmultiscaler target=0 ! video/x-raw, width=320, height=384 ! waylandsink \
v4l2src device=/dev/video4 io-mode=2 ! video/x-raw,width=1280,height=800,format=UYVY ! tiovxcolorconvert target=0 ! video/x-raw,format=NV12 ! tiovxmultiscaler target=0 ! video/x-raw, width=320, height=384 ! waylandsink \
v4l2src device=/dev/video5 io-mode=2 ! video/x-raw,width=1280,height=800,format=UYVY ! tiovxcolorconvert target=1 ! video/x-raw,format=NV12 ! tiovxmultiscaler target=1 ! video/x-raw, width=320, height=384 ! waylandsink \
v4l2src device=/dev/video6 io-mode=2 ! video/x-raw,width=1280,height=800,format=UYVY ! tiovxcolorconvert target=1 ! video/x-raw,format=NV12 ! tiovxmultiscaler target=1 ! video/x-raw, width=320, height=384 ! waylandsink \
v4l2src device=/dev/video19 io-mode=2 ! video/x-raw,width=1280,height=800,format=UYVY ! tiovxcolorconvert target=0 ! video/x-raw,format=NV12 ! tiovxmultiscaler target=0 ! video/x-raw, width=320, height=384 ! waylandsink \
v4l2src device=/dev/video20 io-mode=2 ! video/x-raw,width=1280,height=800,format=UYVY ! tiovxcolorconvert target=1 ! video/x-raw,format=NV12 ! tiovxmultiscaler target=1 ! video/x-raw, width=320, height=384 ! waylandsink &

Here is the summary of the load:

Summary of CPU load,

====================

CPU: mpu1_0: TOTAL LOAD = 39.48 % ( HWI = 2.30 %, SWI = 1.28 % )

CPU: mcu2_0: TOTAL LOAD = 9. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )

CPU: mcu2_1: TOTAL LOAD = 1. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )

CPU: c6x_1: TOTAL LOAD = 88. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )

CPU: c6x_2: TOTAL LOAD = 89. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )

CPU: c7x_1: TOTAL LOAD = 0. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )

HWA performance statistics,

===========================

HWA: MSC0: LOAD = 23.40 % ( 135 MP/s )

HWA: MSC1: LOAD = 23.42 % ( 134 MP/s )

DDR performance statistics,

===========================

DDR: READ BW: AVG = 2310 MB/s, PEAK = 9918 MB/s

DDR: WRITE BW: AVG = 1361 MB/s, PEAK = 5297 MB/s

DDR: TOTAL BW: AVG = 3671 MB/s, PEAK = 15215 MB/s
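The C6x figures in this summary can be turned into a rough per-stream cost. The sketch below divides each DSP's reported load by the three pipelines assigned to it (three use target=0 and three use target=1 in the pipeline above) and then projects four streams per DSP. It assumes the whole reported load comes from color conversion and that load scales linearly with stream count; neither assumption is confirmed in this thread.

```python
# Loads reported for the 6-camera pipeline (from the summary above).
c6x_load_percent = {"c6x_1": 88.0, "c6x_2": 89.0}

# Three pipelines use target=0 and three use target=1.
STREAMS_PER_DSP = 3

# Assumption: all of the reported load is due to tiovxcolorconvert.
per_stream = {dsp: load / STREAMS_PER_DSP for dsp, load in c6x_load_percent.items()}

# Assumption: load scales linearly, so 8 cameras means 4 streams per DSP.
projected_for_8 = {dsp: share * 4 for dsp, share in per_stream.items()}

for dsp in c6x_load_percent:
    print(f"{dsp}: {per_stream[dsp]:.1f} % per stream, "
          f"{projected_for_8[dsp]:.1f} % projected for 4 streams")
```

Under these assumptions each stream costs a little under 30% of a C6x core, so four streams per core would exceed 100%, which is consistent with only 6 cameras being displayable.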

We have also set up a pipeline using videotestsrc:

GST_DEBUG_FILE=pipeline-kmssink-sync-false-480.log GST_DEBUG_NO_COLOR=1 GST_DEBUG="GST_TRACER:7" GST_TRACERS="latency(flags=element)" \
gst-launch-1.0 videotestsrc ! video/x-raw,width=1280,height=720,format=RGB,framerate=30/1 ! \
tiovxcolorconvert ! video/x-raw,format=NV12 ! kmssink sync=false driver-name=tidss -v &

Gstreamer Tracers:

Thanks for your assistance,

John

over 1 year ago

0 John Weber over 1 year ago

Intellectual 421 points

We also ran the pipeline with a very similar frame format, as shown below. Instead of providing an RGB image from videotestsrc, we wanted to see what the consumption of tiovxcolorconvert would be for YUYV images. However, the capsfilter for videotestsrc would not work with that format (it gave an error), so we switched to UYVY instead.

GST_DEBUG_FILE=pipeline-kmssink-sync-false-480.log GST_DEBUG_NO_COLOR=1 GST_DEBUG="GST_TRACER:7" GST_TRACERS="latency(flags=element)" \
gst-launch-1.0 videotestsrc ! capsfilter caps=video/x-raw,width=1280,height=720,format=UYVY,framerate=30/1 ! \
tiovxcolorconvert ! video/x-raw,format=NV12 ! kmssink sync=false driver-name=tidss -v &

Should we expect 10% A72 loading and 17% C6x_1 loading for a single color conversion task?

Thanks,

John

0 Takuma Fujiwara over 1 year ago in reply to John Weber

TI__Mastermind 25380 points

Hi John,

Apologies, our expert on this plugin is currently out of office, and thank you for posting on E2E. I have gathered the following information which explains some of the numbers.

The 10% on A72 is expected. This comes not from the color convert plugin but mostly from the other plugins, such as videotestsrc and kmssink, which run on the A72. For example, removing tiovxcolorconvert from the pipeline will result in the following:
- Example pipeline without tiovxcolorconvert: gst-launch-1.0 videotestsrc ! video/x-raw,width=1280,height=720,format=UYVY ! kmssink
As for the high numbers for C66x such as 17%, there are two different color conversion plugins within our project: tiovxcolorconvert and tiovxdlcolorconvert. The difference between the two is that the "dl" version of colorconvert includes changes for DMA and C66x optimization. The issue is that the "dl" version supports only a subset of the formats of the original tiovxcolorconvert plugin, and UYVY is not a supported input color format. More details about the plugins can be found here: https://github.com/TexasInstruments/edgeai-gst-plugins/wiki
- Example pipeline with "dl" version: gst-launch-1.0 videotestsrc is-live=true pattern=0 ! "video/x-raw, format=RGB, width=1280, height=720" ! tiovxdlcolorconvert in-pool-size=6 out-pool-size=6 ! "video/x-raw, format=NV12" ! kmssink
- Example pipeline with non-"dl" version: gst-launch-1.0 videotestsrc is-live=true pattern=0 ! "video/x-raw, format=RGB, width=1280, height=720" ! tiovxcolorconvert in-pool-size=6 out-pool-size=6 ! "video/x-raw, format=NV12" ! kmssink

Regards,

Takuma

0 John Weber over 1 year ago in reply to Takuma Fujiwara

Thanks Takuma. This does help confirm our own findings. Your comment regarding A72 usage does make sense given the other elements in the pipeline.

The issue with the color formats is that NV12, NV21, and I420 are all planar formats, which are not going to be supported by the cameras we are using. The only packed color format supported by tiovxdlcolorconvert is RGB, which is not the most efficient color format (24 bits/pixel).
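To put the 24 bits/pixel remark in numbers, the sketch below compares per-frame sizes for the formats discussed in this thread at 1280x800. The bits-per-pixel values are the standard ones for these formats (RGB 8:8:8, packed 4:2:2, and 4:2:0); stride padding is ignored, and the helper name `frame_bytes` is purely illustrative.

```python
# Average bits per pixel for each format (standard values, no stride padding).
BITS_PER_PIXEL = {
    "RGB": 24,   # packed, 8 bits per channel
    "UYVY": 16,  # packed YUV 4:2:2
    "YUY2": 16,  # packed YUV 4:2:2 (YUYV byte order)
    "NV12": 12,  # YUV 4:2:0, Y plane plus interleaved UV plane
}

WIDTH, HEIGHT = 1280, 800


def frame_bytes(fmt: str) -> int:
    """Return the size in bytes of one WIDTH x HEIGHT frame in the given format."""
    return WIDTH * HEIGHT * BITS_PER_PIXEL[fmt] // 8


for fmt in BITS_PER_PIXEL:
    print(f"{fmt}: {frame_bytes(fmt)} bytes per frame")
```

An RGB frame is 1.5 times the size of a UYVY frame and twice the size of an NV12 frame, so an RGB-only input path costs correspondingly more memory bandwidth per stream.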

A couple of questions:

1) What is driving the processing requirement for tiovxcolorconvert? Can this be improved in any way?

2) Is there a roadmap to add other packed color formats to tiovxdlcolorconvert?

Thanks!

John

0 John Weber over 1 year ago in reply to John Weber

Hi Takuma,

I was wondering if there might be additional responses to my questions above?

Thanks,

John

0 Takuma Fujiwara over 1 year ago in reply to John Weber

Hi John,

Apologies for the long delay.

1) The main optimization done to make the processing time better for tiovxdlcolorconvert is the switch to DMA to make memory access more efficient.

2) Currently we do not have plans, but I will bring this up with the development team to see if we can add support.

Regards,

Takuma

0 John Weber over 1 year ago in reply to Takuma Fujiwara

Thanks Takuma.

0 John Weber over 1 year ago in reply to Takuma Fujiwara

Takuma,

I just noticed the following commit to the edgeai-gst-plugins:

https://github.com/TexasInstruments/edgeai-gst-plugins/commit/bd479bacba471183459aac0cbc34e74aa9305ea4

"Add edgeai-tiovx-kernels as dependency and remove DSP target

This commit addresses following:
1) Adds link to libedgeai-tiovx-kernels.so for dl_pre_proc,
   dl_color_convert, dl_color_blend modules
2) Remove DSP target and add Armv8 target instead

Signed-off-by: Shubham Jain <a0492788@ti.com>"

It seems that the DSP target is being removed from the plugins themselves. I can see this being OK for J784S4, but I would think DSP C6x usage would need to be retained for TDA4VM and J721S2.

Can you confirm this change? If so, what is the reasoning behind it?

Thanks,

John

0 Takuma Fujiwara over 1 year ago in reply to John Weber

Hi John,

Yes, I got feedback that the changes were intentional. Instead of using the C6x, there are plans to use ARM Neon to do the color conversion.

Regards,

Takuma

0 Takuma Fujiwara over 1 year ago in reply to Takuma Fujiwara

Hi John,

We have a new proposal for color conversion. Instead of C66x, we can utilize the LDC to convert between UYVY and NV12 since this is unused if the camera already provides formatted streams. Attached are some patches that need to be applied on top of the repositories from EdgeAI SDK 8.5:

8_5_edgeai_ldc_colorconvert.zip

The zip file contains a README with installation instructions and an example GStreamer pipeline that uses videotestsrc at 720p resolution to convert between UYVY and NV12. LDC load is around 8% for that pipeline.
We also tried 8 channels by splitting the v4l2src output from an OV5640 camera, and LDC load was around 70%.

Regards,

Takuma

0 John Weber over 1 year ago in reply to Takuma Fujiwara

Thanks Takuma! This seems like a reasonable proposal. We'll test it and let you know.

Can the LDC also do YUYV format (gstreamer format: YUY2)? Or just UYVY?

0 Takuma Fujiwara over 1 year ago in reply to John Weber

Hi John,

It can support the following:

+    case GST_VIDEO_FORMAT_GRAY8:
+      append_format_to_list (src_formats, "GRAY8");
+      break;
+    case GST_VIDEO_FORMAT_GRAY16_LE:
+      append_format_to_list (src_formats, "GRAY16_LE");
+      break;
+    case GST_VIDEO_FORMAT_NV12:
+      append_format_to_list (src_formats, "NV12");
+      break;
+    case GST_VIDEO_FORMAT_UYVY:
+      append_format_to_list (src_formats, "UYVY");
+      append_format_to_list (src_formats, "NV12");

So currently, it does not support YUYV.
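The cases visible in the diff fragment above can be read as a lookup table. The sketch below is only an illustration and rests on two assumptions: that the switch is on the input (sink) format, and that `src_formats` lists the output formats offered for it. The diff is truncated, so further cases may exist that are not shown here. The names `LDC_SRC_FORMATS` and `ldc_can_convert` are hypothetical and not part of the plugin.

```python
# Output (src) formats offered per input (sink) format, limited to the cases
# visible in the truncated diff. Assumption: the switch is on the sink format.
LDC_SRC_FORMATS = {
    "GRAY8": ["GRAY8"],
    "GRAY16_LE": ["GRAY16_LE"],
    "NV12": ["NV12"],
    "UYVY": ["UYVY", "NV12"],
}


def ldc_can_convert(sink_format: str, src_format: str) -> bool:
    """Return True if the visible cases list src_format for sink_format."""
    return src_format in LDC_SRC_FORMATS.get(sink_format, [])


print(ldc_can_convert("UYVY", "NV12"))  # UYVY input, NV12 output
print(ldc_can_convert("YUY2", "NV12"))  # YUY2 (YUYV) input is not listed
```

Under this reading, a UYVY input can be passed through or converted to NV12, while YUY2 (GStreamer's name for YUYV) does not appear as an input at all.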

Regards,

Takuma

0 John Weber over 1 year ago in reply to Takuma Fujiwara

Hi Takuma,

Thanks. We will check. The reason I asked is because the cameras we use support YUYV at the moment.

Regards,

John

0 Brijesh Jadav over 1 year ago in reply to John Weber

TI__Guru**** 398160 points

Btw LDC does not support YUYV format. It supports only UYVY format.
