TDA4VM: Is there any video encoding example on TDA4 SDK_08_02?

Part Number: TDA4VM

Hi TI,

I saw that SDK_08_02 has been released. Is there a video encoding example in TDA4 SDK_08_02?

Regards,

Damon

  • Hello Damon,

    We don't have any examples of video encoding in the PSDK RTOS 8.2 release below:

    https://www.ti.com/tool/download/PROCESSOR-SDK-RTOS-J721E

    However, we do have some video encoding examples in the edge AI SDK below:

    https://software-dl.ti.com/jacinto7/esd/edgeai-sdk-j721e/latest/exports/docs/sdk_overview.html

    Regards,

    Lucas

  • Hi Lucas,

    Thank you for your help. I took a look at the example you provided. Is the entire stream based on the GStreamer framework?

    I'm trying to implement it with OpenVX instead. I created a user kernel running on the A72 that encapsulates the V4L2 driver (based on the vxe_vxd/encoder example in SDK_Linux). I then created a graph (capture->displayM2M->scaler->encoder) for testing and successfully encoded a 4-channel camera stream at 25 fps.

    Then I added an SRV node to the graph: capture->displayM2M->scaler->encoder

                                                                                                                         |------>srv->display

    The frame rate then dropped to about 19 fps (see the second log below). I don't understand why adding the SRV node causes the frame rate to drop, and I'm not sure how to get it back up to 25 fps.

    The performance information of these two graphs is as follows:

    Graph: Capture->displayM2M->scaler->encoder

                                                                 |------>display

    Summary of CPU load,
    ====================
    CPU: mpu1_0: TOTAL LOAD = 14.86 % ( HWI = 0.31 %, SWI = 0.10 % )
    CPU: mcu2_0: TOTAL LOAD = 10. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
    CPU: mcu2_1: TOTAL LOAD = 1. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
    HWA: MSC1: LOAD = 24.81 % ( 138 MP/s )
    DDR: READ BW: AVG = 873 MB/s, PEAK = 9196 MB/s
    DDR: WRITE BW: AVG = 1196 MB/s, PEAK = 8401 MB/s
    DDR: TOTAL BW: AVG = 2069 MB/s, PEAK = 17597 MB/s
    GRAPH: HPA_Demo (#nodes = 5, #executions = 200)
    NODE: CAPTURE1: CaptureNode: avg = 40163 usecs, min/max = 39879 / 85762 usecs, #executions = 200
    NODE: DSS_M2M1: display_m2m: avg = 7722 usecs, min/max = 7672 / 7969 usecs, #executions = 200
    NODE: VPAC_MSC2: ScalerNode: avg = 10174 usecs, min/max = 10116 / 10274 usecs, #executions = 200
    NODE: A72-0: InvoV4L2EncoderNode: avg = 37328 usecs, min/max = 30680 / 57197 usecs, #executions = 200
    NODE: DISPLAY1: DisplayNode: avg = 17819 usecs, min/max = 91 / 31299 usecs, #executions = 200
    PERF: TOTAL: avg = 40059 usecs, min/max = 30634 / 60881 usecs, #executions = 127
    PERF: TOTAL: 24.96 FPS


    Graph: Capture->displayM2M->scaler->encoder

                                                                 |------>srv->display                                                                    
    Summary of CPU load,
    ====================
    CPU: mpu1_0: TOTAL LOAD = 14.95 % ( HWI = 0.41 %, SWI = 0.10 % )
    CPU: mcu2_0: TOTAL LOAD = 9. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
    CPU: mcu2_1: TOTAL LOAD = 1. 0 % ( HWI = 0. 0 %, SWI = 0. 0 % )
    HWA: MSC1: LOAD = 20.31 % ( 107 MP/s )
    HWA: GPU : LOAD = 23.93 % ( 40 MP/s )
    DDR: READ BW: AVG = 1507 MB/s, PEAK = 9816 MB/s
    DDR: WRITE BW: AVG = 1468 MB/s, PEAK = 19622 MB/s
    DDR: TOTAL BW: AVG = 2975 MB/s, PEAK = 29438 MB/s
    GRAPH: HPA_Demo (#nodes = 6, #executions = 471)
    NODE: CAPTURE1: CaptureNode: avg = 26806 usecs, min/max = 131 / 75798 usecs, #executions = 471
    NODE: DSS_M2M1: display_m2m: avg = 7791 usecs, min/max = 7673 / 8561 usecs, #executions = 471
    NODE: VPAC_MSC2: ScalerNode: avg = 10702 usecs, min/max = 10083 / 12775 usecs, #executions = 471
    NODE: A72-0: InvoV4L2EncoderNode: avg = 38216 usecs, min/max = 30734 / 59165 usecs, #executions = 471
    NODE: A72-0: SrvInvoNode: avg = 12377 usecs, min/max = 11111 / 26213 usecs, #executions = 471
    NODE: DISPLAY1: DisplayNode: avg = 16969 usecs, min/max = 124 / 33551 usecs, #executions = 471
    PERF: TOTAL: avg = 51505 usecs, min/max = 4 / 127462 usecs, #executions = 178
    PERF: TOTAL: 19.41 FPS
    ==========================

    Could you help me with this?

    Regards,

    Damon

  • Hi Damon,

    If you run this encoder node in isolation, does this get the expected performance or does it have a reduced performance from your expectation in isolation as well?

    Regards,

    Lucas

  • Hi Lucas,

    Yes, without the SRV node the encoder reaches the expected performance. The frame rate is 25 fps when running the graph below (there are 4 camera inputs, each at 25 fps):

    Graph: Capture->displayM2M->scaler->encoder

                                                                 |------>display

    I also tried using TIDL_OD instead of SRV, and the frame rate is 25 fps in that case too.

    Graph: Capture->displayM2M->scaler->encoder

                                                                 |------>img_preproc->tidl_od->draw_detec

    So it seems that the frame rate drop only occurs when the encoder and SRV run in parallel. Both of them are running on the A72 core, but the total load of mpu1_0 is not very high. What do you think about this?

    Regards,

    Damon

  • Hi Damon,

    If you remove the encoder or the preproc, is the performance as expected?

    Also, which OpenVX target are you assigning the encoder and SRV nodes to? If they are assigned to the same target, I would suggest moving them to different A72 targets, as we have multiple options for targets running on the A72. (A "target" in this context is essentially a thread, so if both nodes are assigned to the same thread, one node can block execution of the other.)
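
    As a rough illustration of why a shared target matters, the sketch below plugs the average node times from the second log above into a deliberately simplified model. It assumes that nodes on the same target run back to back, that the busiest target (or the 40 ms camera period) sets the frame period, and it ignores the other nodes and any scheduling overhead. The function name and the target labels are made up for this example and are not part of the SDK.

```python
# Average node execution times in microseconds, copied from the
# performance log of the graph that includes the SRV branch.
ENCODER_US = 38216          # NODE: A72-0: InvoV4L2EncoderNode
SRV_US = 12377              # NODE: A72-0: SrvInvoNode
CAPTURE_PERIOD_US = 40000   # 25 fps cameras -> one frame every 40 ms


def steady_state_fps(target_to_node_times, source_period_us):
    """Estimate pipelined throughput under a simplified model.

    Assumption: each target is a single thread that runs its nodes one
    after another, so the busiest target (or the source period, if it is
    longer) determines the frame period.
    """
    busiest_us = max(sum(times) for times in target_to_node_times.values())
    period_us = max(busiest_us, source_period_us)
    return 1e6 / period_us


# Encoder and SRV on the same target: their times add up.
shared = steady_state_fps({"A72_0": [ENCODER_US, SRV_US]}, CAPTURE_PERIOD_US)

# Encoder and SRV on different targets: they can overlap.
split = steady_state_fps(
    {"A72_0": [ENCODER_US], "A72_1": [SRV_US]}, CAPTURE_PERIOD_US
)

print(f"shared target:    {shared:.1f} fps")
print(f"separate targets: {split:.1f} fps")
```

    With both nodes on one target the model gives roughly 19.8 fps, which is in the same range as the 19.41 FPS in the log. With separate targets the camera period becomes the limit again and the estimate returns to 25 fps.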

    Finally, are there sufficient buffers between the nodes, and was the pipeline depth increased to account for the new node?

    Regards,

    Lucas

  • Hi Lucas,

    Yes, with either one removed the graph runs at 25 fps, which is the expected performance.

    I extended the SRV kernel to support the target 'A72_1', then assigned SRV to A72_1 and the encoder to A72_0. The performance improved to 24.5 fps, and I am satisfied with this for now.

    Thank you for your help!

    Regards,

    Damon