TDA4VM: Is there any video encoding example on TDA4 SDK_08_02?

Damon@INVO

Intellectual 690 points

Part Number: TDA4VM

Hi TI,

I saw the SDK_08_02 has been released. Is there any video encoding example on TDA4 SDK_08_02?

Regards,

Damon

over 2 years ago

0 Lucas Weaver over 2 years ago

TI__Genius 13765 points

Hello Damon,

We don't have any examples of video encoding in the PSDK RTOS 8.2 release below:

https://www.ti.com/tool/download/PROCESSOR-SDK-RTOS-J721E

However, we do have some video encoding examples in the edge AI SDK below:

https://software-dl.ti.com/jacinto7/esd/edgeai-sdk-j721e/latest/exports/docs/sdk_overview.html

Regards,

Lucas

0 Damon@INVO over 2 years ago in reply to Lucas Weaver

Intellectual 690 points

Hi Lucas,

Thank you for your help. I took a look at the example you provided. Is the entire stream based on the GStreamer framework?

Lucas Weaver said:
However, we do have some video encoding examples in the edge AI SDK below:

https://software-dl.ti.com/jacinto7/esd/edgeai-sdk-j721e/latest/exports/docs/sdk_overview.html

I'm trying to implement it as an OpenVX stream. I created a user kernel running on the A72 that encapsulates the V4L2 driver (I referred to the vxe_vxd/encoder example in the Linux SDK). I created a graph (capture->displayM2M->scaler->encoder) for testing and successfully encoded a 4-channel camera image at a frame rate of 25 fps.

Then I added an SRV node to the graph:

capture->displayM2M->scaler->encoder

|------>srv->display

And the frame rate dropped to 18 fps. I don't understand why adding the SRV node causes the frame rate to drop, and I'm not sure how to get it back up to 25 fps.

The performance information for these two graphs is as follows:

Graph: Capture->displayM2M->scaler->encoder

|------>display

Summary of CPU load,
====================
CPU: mpu1_0: TOTAL LOAD = 14.86 % ( HWI = 0.31 %, SWI = 0.10 % )
CPU: mcu2_0: TOTAL LOAD = 10. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: mcu2_1: TOTAL LOAD = 1. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
HWA: MSC1: LOAD = 24.81 % ( 138 MP/s )
DDR: READ BW: AVG = 873 MB/s, PEAK = 9196 MB/s
DDR: WRITE BW: AVG = 1196 MB/s, PEAK = 8401 MB/s
DDR: TOTAL BW: AVG = 2069 MB/s, PEAK = 17597 MB/s
GRAPH: HPA_Demo (#nodes = 5, #executions = 200)
NODE: CAPTURE1: CaptureNode: avg = 40163 usecs, min/max = 39879 / 85762 usecs, #executions = 200
NODE: DSS_M2M1: display_m2m: avg = 7722 usecs, min/max = 7672 / 7969 usecs, #executions = 200
NODE: VPAC_MSC2: ScalerNode: avg = 10174 usecs, min/max = 10116 / 10274 usecs, #executions = 200
NODE: A72-0: InvoV4L2EncoderNode: avg = 37328 usecs, min/max = 30680 / 57197 usecs, #executions = 200
NODE: DISPLAY1: DisplayNode: avg = 17819 usecs, min/max = 91 / 31299 usecs, #executions = 200
PERF: TOTAL: avg = 40059 usecs, min/max = 30634 / 60881 usecs, #executions = 127
PERF: TOTAL: 24.96 FPS

Graph: Capture->displayM2M->scaler->encoder

|------>srv->display
Summary of CPU load,
====================
CPU: mpu1_0: TOTAL LOAD = 14.95 % ( HWI = 0.41 %, SWI = 0.10 % )
CPU: mcu2_0: TOTAL LOAD = 9. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
CPU: mcu2_1: TOTAL LOAD = 1. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
HWA: MSC1: LOAD = 20.31 % ( 107 MP/s )
HWA: GPU : LOAD = 23.93 % ( 40 MP/s )
DDR: READ BW: AVG = 1507 MB/s, PEAK = 9816 MB/s
DDR: WRITE BW: AVG = 1468 MB/s, PEAK = 19622 MB/s
DDR: TOTAL BW: AVG = 2975 MB/s, PEAK = 29438 MB/s
GRAPH: HPA_Demo (#nodes = 6, #executions = 471)
NODE: CAPTURE1: CaptureNode: avg = 26806 usecs, min/max = 131 / 75798 usecs, #executions = 471
NODE: DSS_M2M1: display_m2m: avg = 7791 usecs, min/max = 7673 / 8561 usecs, #executions = 471
NODE: VPAC_MSC2: ScalerNode: avg = 10702 usecs, min/max = 10083 / 12775 usecs, #executions = 471
NODE: A72-0: InvoV4L2EncoderNode: avg = 38216 usecs, min/max = 30734 / 59165 usecs, #executions = 471
NODE: A72-0: SrvInvoNode: avg = 12377 usecs, min/max = 11111 / 26213 usecs, #executions = 471
NODE: DISPLAY1: DisplayNode: avg = 16969 usecs, min/max = 124 / 33551 usecs, #executions = 471
PERF: TOTAL: avg = 51505 usecs, min/max = 4 / 127462 usecs, #executions = 178
PERF: TOTAL: 19.41 FPS
==========================
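As a quick sanity check on the two logs above, the reported FPS can be recomputed from the average graph time on the "PERF: TOTAL" lines. This is only a minimal sketch: the dictionary keys are illustrative labels, not names from the SDK, and the two averages are copied from the logs.

```python
# Average graph execution times in microseconds, copied from the
# "PERF: TOTAL: avg = ..." lines of the two logs above.
# The dictionary keys are illustrative labels only.
PERF_TOTAL_AVG_US = {
    "encoder + display": 40059,
    "encoder + srv + display": 51505,
}


def fps_from_period_us(period_us: float) -> float:
    """Frames per second for a graph that finishes one frame every period_us."""
    return 1_000_000.0 / period_us


fps_by_graph = {
    name: fps_from_period_us(avg_us) for name, avg_us in PERF_TOTAL_AVG_US.items()
}

for name, fps in fps_by_graph.items():
    print(f"{name}: {fps:.2f} FPS")
```

The recomputed values agree with the 24.96 FPS and 19.41 FPS in the logs to within rounding, so the drop corresponds to roughly 11.4 ms of extra time per frame in the second graph.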

Could you help me with this?

Regards,

Damon

0 Lucas Weaver over 2 years ago in reply to Damon@INVO

TI__Genius 13765 points

Hi Damon,

If you run this encoder node in isolation, does this get the expected performance or does it have a reduced performance from your expectation in isolation as well?

Regards,

Lucas

0 Damon@INVO over 2 years ago in reply to Lucas Weaver

Intellectual 690 points

Hi Lucas,

Lucas Weaver said:
If you run this encoder node in isolation, does this get the expected performance or does it have a reduced performance from your expectation in isolation as well?

Yes, the frame rate is 25 fps when running the graph below (there are 4 camera inputs, each at 25 fps):

Damon@INVO said:
Graph: Capture->displayM2M->scaler->encoder

|------>display

I tried using TIDL_OD instead of SRV, and the frame rate is 25 fps too.

Graph: Capture->displayM2M->scaler->encoder

|------>img_preproc->tidl_od->draw_detec

So it seems that the frame rate drop only occurs when the encoder and SRV run in parallel. Both of them run on the A72 core, but the total load of mpu1_0 is not very high. What do you think about this?

Regards,

Damon

+1 Lucas Weaver over 2 years ago in reply to Damon@INVO

TI__Genius 13765 points

Hi Damon,

If you remove encoder or the preproc, is this performance as expected?

Also, which OpenVX target are you assigning the encoder and SRV nodes to? If they are assigned to the same target, I would suggest moving them to different A72 targets, as we have multiple targets available on the A72. (A "target" in this context is essentially a thread, so if both nodes are assigned to the same thread, one node can block execution of the other.)

Finally, are there enough buffers between the nodes, and has the pipeline depth been increased to account for the new node?
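The shared-target point can be illustrated with a rough back-of-the-envelope model using the per-node averages from the second perf log in this thread. It is a simplification, not a description of the TIOVX scheduler: it assumes a target runs one node at a time, that the cameras deliver a frame every 40 ms (25 fps), and it ignores every other node and all scheduling overhead. The function names are made up for this sketch.

```python
# Per-node average times in microseconds, taken from the second perf log
# in this thread (both nodes were reported on target A72-0).
ENCODER_US = 38216  # InvoV4L2EncoderNode
SRV_US = 12377      # SrvInvoNode

# Assumption: 25 fps cameras deliver one frame every 40 ms.
CAPTURE_PERIOD_US = 40_000


def fps_shared_target(node_times_us, source_period_us):
    """Nodes on one target run back to back, so their times add up."""
    period_us = max(sum(node_times_us), source_period_us)
    return 1_000_000.0 / period_us


def fps_separate_targets(node_times_us, source_period_us):
    """Nodes on different targets can overlap; the slowest stage sets the pace."""
    period_us = max(max(node_times_us), source_period_us)
    return 1_000_000.0 / period_us


shared = fps_shared_target([ENCODER_US, SRV_US], CAPTURE_PERIOD_US)
separate = fps_separate_targets([ENCODER_US, SRV_US], CAPTURE_PERIOD_US)

print(f"same target:      {shared:.2f} FPS")
print(f"separate targets: {separate:.2f} FPS")
```

Under these assumptions the serialized case comes out a little under 20 FPS, which is close to the measured 19.41 FPS, while the overlapped case is limited only by the camera rate. A low mpu1_0 load is compatible with this picture if the encoder node spends most of its time blocked waiting on the hardware rather than computing, although the thread does not confirm that.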

Regards,

Lucas

0 Damon@INVO over 2 years ago in reply to Lucas Weaver

Intellectual 690 points

Hi Lucas,

Lucas Weaver said:
If you remove encoder or the preproc, is this performance as expected?

Yes, it is 25 fps, which is the expected performance.

I extended the SRV kernel to support the target 'A72_1', then assigned SRV to A72_1 and the encoder to A72_0. The performance improved to 24.5 fps. I am satisfied with this now.

Thank you for your help!

Regards,

Damon
