
TDA4VMXEVM: Surround view performance drop compared to Multi-cam use case

Part Number: TDA4VMXEVM

Hello,

We have been working on a new 3MP sensor integration and are seeing a performance drop in the Surround view use case. Event-based scheduling is used in both use cases. The same events and parameters are used, and currently the only difference between the two use cases is the following:
  - Mosaic node is used in the Multi-cam use case | SRV node is used in the Surround view use case

The LDC node is not being used at the moment in the multi-cam use case.
By comparing the performance statistics, we observed that the DDR load is higher in the SRV use case (average total bandwidth of 5317 MB/s versus 3794 MB/s). Our assumption is that the SRV node, which stitches the four input images, loads the DDR more heavily than the Mosaic node. The performance logs are attached.

Could this be resolved in some way (e.g. by changing the OPP_mode -> OPP_nominal/OPP_high etc., as was possible on TDA3)? Or by slowing the OpenGL_SRV_node down to ~30 000 usec, which could decrease the DDR load without breaking the pipelining?

Any suggestions or recommendations would be appreciated!

Summary of CPU load,
====================

CPU: mpu1_0: TOTAL LOAD =   0.67 % ( HWI =   0. 8 %, SWI =   0. 0 % )
CPU: mcu2_0: TOTAL LOAD =  23.81 % ( HWI =   0.58 %, SWI =   0.67 % )
CPU: mcu2_1: TOTAL LOAD =  47.51 % ( HWI =   4.21 %, SWI =   0.96 % )
CPU:  c6x_1: TOTAL LOAD =   0. 5 % ( HWI =   0. 2 %, SWI =   0. 1 % )
CPU:  c6x_2: TOTAL LOAD =   0. 5 % ( HWI =   0. 2 %, SWI =   0. 1 % )
CPU:  c7x_1: TOTAL LOAD =   0.10 % ( HWI =   0. 6 %, SWI =   0. 3 % )


HWA performance statistics,
===========================

HWA:   VISS: LOAD =  64.57 % ( 548 MP/s )
HWA:   MSC1: LOAD =  47.95 % ( 740 MP/s )


DDR performance statistics,
===========================

DDR: READ  BW: AVG =   2275 MB/s, PEAK =   4024 MB/s
DDR: WRITE BW: AVG =   1519 MB/s, PEAK =   2680 MB/s
DDR: TOTAL BW: AVG =   3794 MB/s, PEAK =   6704 MB/s


Detailed CPU performance/memory statistics,
===========================================

CPU: mcu2_0: TASK:           IPC_RX:   0. 6 %
CPU: mcu2_0: TASK:       REMOTE_SRV:   0. 0 %
CPU: mcu2_0: TASK:         TIVX_CPU:  20.77 %
CPU: mcu2_0: TASK:      IPC_TEST_RX:   0. 0 %
CPU: mcu2_0: TASK:      IPC_TEST_TX:   0. 0 %
CPU: mcu2_0: TASK:      IPC_TEST_TX:   0. 0 %
CPU: mcu2_0: TASK:      IPC_TEST_TX:   0. 0 %
CPU: mcu2_0: TASK:      IPC_TEST_TX:   0. 0 %
CPU: mcu2_0: TASK:      IPC_TEST_TX:   0. 0 %

CPU: mcu2_0: HEAP:   DDR_SHARED_MEM: size =    4194304 B, free =    1616384 B ( 38 % unused)

CPU: mcu2_1: TASK:           IPC_RX:   0. 7 %
CPU: mcu2_1: TASK:       REMOTE_SRV:   0.85 %
CPU: mcu2_1: TASK:         TIVX_CPU:   0. 0 %
CPU: mcu2_1: TASK:          TIVX_NF:   0. 0 %
CPU: mcu2_1: TASK:        TIVX_LDC1:   0. 0 %
CPU: mcu2_1: TASK:        TIVX_MSC1:  19.30 %
CPU: mcu2_1: TASK:        TIVX_MSC2:   0. 0 %
CPU: mcu2_1: TASK:         TIVX_SDE:   0. 0 %
CPU: mcu2_1: TASK:         TIVX_DOF:   0. 0 %
CPU: mcu2_1: TASK:       TIVX_VISS1:  17.67 %
CPU: mcu2_1: TASK:       TIVX_CAPT1:   3.25 %
CPU: mcu2_1: TASK:       TIVX_CAPT2:   0. 0 %
CPU: mcu2_1: TASK:       TIVX_DISP1:   1.14 %
CPU: mcu2_1: TASK:       TIVX_DISP2:   0. 0 %
CPU: mcu2_1: TASK:       TIVX_VDEC1:   0. 0 %
CPU: mcu2_1: TASK:       TIVX_VDEC2:   0. 0 %

CPU: mcu2_1: HEAP:   DDR_SHARED_MEM: size =   16777216 B, free =   16633088 B ( 99 % unused)
CPU: mcu2_1: HEAP:  DDR_NON_CACHE_M: size =   67108864 B, free =   47210496 B (  6 % unused)

CPU:  c6x_1: TASK:           IPC_RX:   0. 0 %
CPU:  c6x_1: TASK:       REMOTE_SRV:   0. 0 %
CPU:  c6x_1: TASK:         TIVX_CPU:   0. 0 %
CPU:  c6x_1: TASK:      IPC_TEST_RX:   0. 0 %
CPU:  c6x_1: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c6x_1: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c6x_1: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c6x_1: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c6x_1: TASK:      IPC_TEST_TX:   0. 0 %

CPU:  c6x_1: HEAP:   DDR_SHARED_MEM: size =   16777216 B, free =   16775168 B ( 99 % unused)
CPU:  c6x_1: HEAP:           L2_MEM: size =     229376 B, free =     229376 B (100 % unused)
CPU:  c6x_1: HEAP:  DDR_SCRATCH_MEM: size =   50331648 B, free =   50331648 B ( 14 % unused)

CPU:  c6x_2: TASK:           IPC_RX:   0. 0 %
CPU:  c6x_2: TASK:       REMOTE_SRV:   0. 0 %
CPU:  c6x_2: TASK:         TIVX_CPU:   0. 0 %
CPU:  c6x_2: TASK:      IPC_TEST_RX:   0. 0 %
CPU:  c6x_2: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c6x_2: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c6x_2: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c6x_2: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c6x_2: TASK:      IPC_TEST_TX:   0. 0 %

CPU:  c6x_2: HEAP:   DDR_SHARED_MEM: size =   16777216 B, free =   16775168 B ( 99 % unused)
CPU:  c6x_2: HEAP:           L2_MEM: size =     229376 B, free =     229376 B (100 % unused)
CPU:  c6x_2: HEAP:  DDR_SCRATCH_MEM: size =   50331648 B, free =   50331648 B ( 14 % unused)

CPU:  c7x_1: TASK:           IPC_RX:   0. 0 %
CPU:  c7x_1: TASK:       REMOTE_SRV:   0. 0 %
CPU:  c7x_1: TASK:         TIVX_CPU:   0. 0 %
CPU:  c7x_1: TASK:      IPC_TEST_RX:   0. 0 %
CPU:  c7x_1: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c7x_1: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c7x_1: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c7x_1: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c7x_1: TASK:      IPC_TEST_TX:   0. 0 %

CPU:  c7x_1: HEAP:   DDR_SHARED_MEM: size =  469762048 B, free =  469762048 B (  8 % unused)
CPU:  c7x_1: HEAP:           L3_MEM: size =    8159232 B, free =    8159232 B (100 % unused)
CPU:  c7x_1: HEAP:           L2_MEM: size =     491520 B, free =     491520 B (100 % unused)
CPU:  c7x_1: HEAP:           L1_MEM: size =      16384 B, free =      16384 B (100 % unused)
CPU:  c7x_1: HEAP:  DDR_SCRATCH_MEM: size =   67108864 B, free =   67108864 B ( 36 % unused)


GRAPH:         graph_85 (#nodes =   5, #executions =   3976)
 NODE:   CAPTURE1:             Capture_node: avg =  33262 usecs, min/max =    290 /  74632 usecs, #executions =       3976
 NODE: VPAC_VISS1:          VISS_Processing: avg =  29665 usecs, min/max =  22944 /  52935 usecs, #executions =       3976
 NODE:     IPU1-1:               2A_AlgNode: avg =   7794 usecs, min/max =   6641 /  16933 usecs, #executions =       3976
 NODE:  VPAC_MSC1:               MosaicNode: avg =  21721 usecs, min/max =  20448 /  56443 usecs, #executions =       3976
 NODE:   DISPLAY1:             Display_node: avg =  15301 usecs, min/max =    129 /  29845 usecs, #executions =       3976

 PERF:            TOTAL: avg =  33343 usecs, min/max =  27706 /  80106 usecs, #executions =       3977

 PERF:            TOTAL:   29.99 FPS

Summary of CPU load,
====================

CPU: mpu1_0: TOTAL LOAD =   5.47 % ( HWI =   0.23 %, SWI =   0.21 % )
CPU: mcu2_0: TOTAL LOAD =  60.96 % ( HWI =   2.80 %, SWI =   2. 0 % )
CPU: mcu2_1: TOTAL LOAD =  43.74 % ( HWI =   6.78 %, SWI =   2.15 % )
CPU:  c6x_1: TOTAL LOAD =   0. 6 % ( HWI =   0. 2 %, SWI =   0. 2 % )
CPU:  c6x_2: TOTAL LOAD =   0. 6 % ( HWI =   0. 2 %, SWI =   0. 2 % )
CPU:  c7x_1: TOTAL LOAD =   0.12 % ( HWI =   0. 6 %, SWI =   0. 3 % )


HWA performance statistics,
===========================

HWA:   VISS: LOAD =  69.37 % ( 482 MP/s )
HWA:   GPU : LOAD =  62.94 % ( 93 MP/s )


DDR performance statistics,
===========================

DDR: READ  BW: AVG =   3045 MB/s, PEAK =   3199 MB/s
DDR: WRITE BW: AVG =   2272 MB/s, PEAK =   2378 MB/s
DDR: TOTAL BW: AVG =   5317 MB/s, PEAK =   5577 MB/s


Detailed CPU performance/memory statistics,
===========================================

CPU: mcu2_0: TASK:           IPC_RX:   0. 7 %
CPU: mcu2_0: TASK:       REMOTE_SRV:   0. 3 %
CPU: mcu2_0: TASK:         TIVX_CPU:  54.39 %
CPU: mcu2_0: TASK:      IPC_TEST_RX:   0. 0 %
CPU: mcu2_0: TASK:      IPC_TEST_TX:   0. 0 %
CPU: mcu2_0: TASK:      IPC_TEST_TX:   0. 0 %
CPU: mcu2_0: TASK:      IPC_TEST_TX:   0. 0 %
CPU: mcu2_0: TASK:      IPC_TEST_TX:   0. 0 %
CPU: mcu2_0: TASK:      IPC_TEST_TX:   0. 0 %

CPU: mcu2_0: HEAP:   DDR_SHARED_MEM: size =    4194304 B, free =    1616384 B ( 38 % unused)

CPU: mcu2_1: TASK:           IPC_RX:   0.17 %
CPU: mcu2_1: TASK:       REMOTE_SRV:   2.41 %
CPU: mcu2_1: TASK:         TIVX_CPU:   0. 0 %
CPU: mcu2_1: TASK:          TIVX_NF:   0. 0 %
CPU: mcu2_1: TASK:        TIVX_LDC1:   0. 0 %
CPU: mcu2_1: TASK:        TIVX_MSC1:   0. 0 %
CPU: mcu2_1: TASK:        TIVX_MSC2:   0. 0 %
CPU: mcu2_1: TASK:         TIVX_SDE:   0. 0 %
CPU: mcu2_1: TASK:         TIVX_DOF:   0. 0 %
CPU: mcu2_1: TASK:       TIVX_VISS1:  26.84 %
CPU: mcu2_1: TASK:       TIVX_CAPT1:   4.21 %
CPU: mcu2_1: TASK:       TIVX_CAPT2:   0. 0 %
CPU: mcu2_1: TASK:       TIVX_DISP1:   1. 9 %
CPU: mcu2_1: TASK:       TIVX_DISP2:   0. 0 %
CPU: mcu2_1: TASK:       TIVX_VDEC1:   0. 0 %
CPU: mcu2_1: TASK:       TIVX_VDEC2:   0. 0 %

CPU: mcu2_1: HEAP:   DDR_SHARED_MEM: size =   16777216 B, free =   16631808 B ( 99 % unused)
CPU: mcu2_1: HEAP:  DDR_NON_CACHE_M: size =   67108864 B, free =   47210496 B (  6 % unused)

CPU:  c6x_1: TASK:           IPC_RX:   0. 0 %
CPU:  c6x_1: TASK:       REMOTE_SRV:   0. 0 %
CPU:  c6x_1: TASK:         TIVX_CPU:   0. 0 %
CPU:  c6x_1: TASK:      IPC_TEST_RX:   0. 0 %
CPU:  c6x_1: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c6x_1: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c6x_1: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c6x_1: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c6x_1: TASK:      IPC_TEST_TX:   0. 0 %

CPU:  c6x_1: HEAP:   DDR_SHARED_MEM: size =   16777216 B, free =   16772608 B ( 99 % unused)
CPU:  c6x_1: HEAP:           L2_MEM: size =     229376 B, free =     229376 B (100 % unused)
CPU:  c6x_1: HEAP:  DDR_SCRATCH_MEM: size =   50331648 B, free =   50331648 B ( 14 % unused)

CPU:  c6x_2: TASK:           IPC_RX:   0. 0 %
CPU:  c6x_2: TASK:       REMOTE_SRV:   0. 0 %
CPU:  c6x_2: TASK:         TIVX_CPU:   0. 0 %
CPU:  c6x_2: TASK:      IPC_TEST_RX:   0. 0 %
CPU:  c6x_2: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c6x_2: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c6x_2: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c6x_2: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c6x_2: TASK:      IPC_TEST_TX:   0. 0 %

CPU:  c6x_2: HEAP:   DDR_SHARED_MEM: size =   16777216 B, free =   16775168 B ( 99 % unused)
CPU:  c6x_2: HEAP:           L2_MEM: size =     229376 B, free =     229376 B (100 % unused)
CPU:  c6x_2: HEAP:  DDR_SCRATCH_MEM: size =   50331648 B, free =   50331648 B ( 14 % unused)

CPU:  c7x_1: TASK:           IPC_RX:   0. 0 %
CPU:  c7x_1: TASK:       REMOTE_SRV:   0. 0 %
CPU:  c7x_1: TASK:         TIVX_CPU:   0. 0 %
CPU:  c7x_1: TASK:      IPC_TEST_RX:   0. 0 %
CPU:  c7x_1: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c7x_1: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c7x_1: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c7x_1: TASK:      IPC_TEST_TX:   0. 0 %
CPU:  c7x_1: TASK:      IPC_TEST_TX:   0. 0 %

CPU:  c7x_1: HEAP:   DDR_SHARED_MEM: size =  469762048 B, free =  469762048 B (  8 % unused)
CPU:  c7x_1: HEAP:           L3_MEM: size =    8159232 B, free =    8159232 B (100 % unused)
CPU:  c7x_1: HEAP:           L2_MEM: size =     491520 B, free =     491520 B (100 % unused)
CPU:  c7x_1: HEAP:           L1_MEM: size =      16384 B, free =      16384 B (100 % unused)
CPU:  c7x_1: HEAP:  DDR_SCRATCH_MEM: size =   67108864 B, free =   67108864 B ( 36 % unused)


GRAPH:            3DSRV (#nodes =   5, #executions =   3235)
 NODE:   CAPTURE1:             Capture_node: avg =  12745 usecs, min/max =    278 /  51708 usecs, #executions =       3235
 NODE: VPAC_VISS1:          VISS_Processing: avg =  34664 usecs, min/max =  27130 /  46695 usecs, #executions =       3235
 NODE:     IPU1-1:               2A_AlgNode: avg =  22100 usecs, min/max =  10769 /  26049 usecs, #executions =       3235
 NODE:      A72-0:          OpenGL_SRV_Node: avg =  22262 usecs, min/max =  18238 /  41672 usecs, #executions =       3235
 NODE:   DISPLAY1:             Display_node: avg =   9175 usecs, min/max =     83 /  34132 usecs, #executions =       3235

 PERF:            TOTAL: avg =  34914 usecs, min/max =  29351 /  79836 usecs, #executions =       3236

 PERF:            TOTAL:   28.64 FPS
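
As a rough cross-check of the two graph summaries above, the sketch below converts the average PERF TOTAL period into frames per second and lists any node whose average execution time exceeds the frame period. The 30 FPS target, and the idea that a pipelined graph's steady-state rate is bounded by its slowest stage, are assumptions made for illustration. The timings are copied from the logs; the function and variable names are hypothetical.

```python
# Average node execution times in usecs, copied from the two logs above.
MULTI_CAM = {
    "Capture_node": 33262,
    "VISS_Processing": 29665,
    "2A_AlgNode": 7794,
    "MosaicNode": 21721,
    "Display_node": 15301,
}
SRV = {
    "Capture_node": 12745,
    "VISS_Processing": 34664,
    "2A_AlgNode": 22100,
    "OpenGL_SRV_Node": 22262,
    "Display_node": 9175,
}

TARGET_FPS = 30  # assumption: the pipeline is expected to sustain 30 FPS


def fps_from_period(avg_usecs):
    """Convert an average frame period in usecs to frames per second."""
    return 1_000_000 / avg_usecs


def nodes_over_budget(node_avgs, target_fps=TARGET_FPS):
    """Return the nodes whose average time exceeds one frame period."""
    budget_usecs = 1_000_000 / target_fps
    return [name for name, avg in node_avgs.items() if avg > budget_usecs]


if __name__ == "__main__":
    print("multi-cam FPS:", round(fps_from_period(33343), 2))
    print("SRV FPS:", round(fps_from_period(34914), 2))
    print("multi-cam over budget:", nodes_over_budget(MULTI_CAM))
    print("SRV over budget:", nodes_over_budget(SRV))
```

With the logged values, only VISS_Processing in the SRV graph exceeds the roughly 33 333 usec frame period. That would be consistent with the drop to 28.64 FPS if VISS is being slowed by the additional DDR traffic.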


Regards,
Todor

  • Hello Todor,

    Can you run PVRTune on the platform to get GPU specific statistics? You can get PVRTune here:

    https://www.imgtec.com/developers/powervr-sdk-tools/pvrtune/

    You will need the EVM connected to ethernet.

    Once you download PowerVR SDK containing PVRTune, you will need to copy PVRTuneDeveloper/PVRPerfServer/Linux_armv8_64/PVRPerfServerDeveloper to the target platform and run it.

    Now you can run PVRTuneDeveloperGUI for your platform (Windows or Linux) from PVRTuneDeveloper/GUI/<platform>/

    The GUI application can talk to PVRPerfServer and collect GPU statistics. This will give us some clue about GPU loading. You can save the file and share it with us.

    The GPU frequency can be changed by adding a device tree entry to:

    <your kernel>/arch/arm64/boot/dts/ti/k3-j721e-common-proc-board.dts

    Entry for GPU frequency (example: reducing to 375 MHz from 750 MHz):

    &gpu {
     assigned-clocks = <&k3_clks 125 0>;
     assigned-clock-rates = <375000000>;
    };
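
    When trying several frequencies, it can help to generate the fragment above rather than edit it by hand. The following is a minimal sketch only: the helper name gpu_clock_fragment is hypothetical, and the clock specifier (125 0) is simply the one used in the entry above.

```python
def gpu_clock_fragment(rate_hz, dev_id=125, clk_id=0):
    """Render the &gpu device tree entry for a given clock rate in Hz.

    dev_id and clk_id default to the values used in the entry above.
    """
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    return (
        "&gpu {\n"
        f"\tassigned-clocks = <&k3_clks {dev_id} {clk_id}>;\n"
        f"\tassigned-clock-rates = <{rate_hz}>;\n"
        "};\n"
    )


if __name__ == "__main__":
    # Half of the 750 MHz rate mentioned above.
    print(gpu_clock_fragment(750_000_000 // 2))
```

    The generated text still has to be placed in the board .dts file and the device tree rebuilt.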

    If you want to change it at run time for experimentation, you can use k3conf: https://git.ti.com/cgit/k3conf/k3conf/

     k3conf show device | grep GPU

    Show current frequency:

     k3conf dump clock 125

    Set frequency to 375MHz:

    k3conf set clock 125 0 375000000

    By the way, PVRTune shows the actual GPU frequency, so you can confirm that the change has taken effect.
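
    For repeated experiments, the k3conf calls above could be wrapped in a small script. This is a sketch under assumptions, not a TI-provided interface: the function names are hypothetical, the device and clock IDs are the ones used in the commands above, and the script only runs k3conf if it is found on the PATH.

```python
import shutil
import subprocess

GPU_DEV_ID = 125  # device ID used in the k3conf commands above
GPU_CLK_ID = 0    # clock ID used in the k3conf commands above


def dump_clock_cmd(dev_id=GPU_DEV_ID):
    """Build the argument list for 'k3conf dump clock <dev_id>'."""
    return ["k3conf", "dump", "clock", str(dev_id)]


def set_clock_cmd(freq_hz, dev_id=GPU_DEV_ID, clk_id=GPU_CLK_ID):
    """Build the argument list for 'k3conf set clock <dev> <clk> <Hz>'."""
    if freq_hz <= 0:
        raise ValueError("freq_hz must be positive")
    return ["k3conf", "set", "clock", str(dev_id), str(clk_id), str(freq_hz)]


def set_gpu_frequency(freq_hz):
    """Set the GPU clock and dump it back; return False if k3conf is absent."""
    if shutil.which("k3conf") is None:
        return False
    subprocess.run(set_clock_cmd(freq_hz), check=True)
    subprocess.run(dump_clock_cmd(), check=True)
    return True


if __name__ == "__main__":
    # Only print the command here; call set_gpu_frequency() on the target.
    print(" ".join(set_clock_cmd(375_000_000)))
```

    On the target, set_gpu_frequency(375_000_000) would issue the same two commands shown above, so a launch script could call it before starting the use case.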

    Regards

    Hemant

     

     

  • Hello Hemant,

    The PVRTune output file is attached: https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/8737.out.7z

    Regards,
    Todor

  • Thank you Todor.

    Looking at the PVRTune capture, the GPU is consuming only about 1.4 GB/s of DDR bandwidth. This agrees with the difference in bandwidth between the Mosaic and SRV use cases.
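
    The comparison can be reproduced from the average DDR figures in the two logs posted earlier in the thread. The sketch below only subtracts the logged averages; the dictionary and function names are hypothetical.

```python
# Average DDR bandwidth in MB/s, copied from the two performance logs.
MOSAIC = {"read": 2275, "write": 1519, "total": 3794}
SRV = {"read": 3045, "write": 2272, "total": 5317}


def bandwidth_delta(base, other):
    """Return other minus base for each bandwidth category, in MB/s."""
    return {key: other[key] - base[key] for key in base}


delta = bandwidth_delta(MOSAIC, SRV)
print(delta)  # {'read': 770, 'write': 753, 'total': 1523}
```

    The average total rises by 1523 MB/s, about 1.5 GB/s, which is in the same range as the 1.4 GB/s that PVRTune attributes to the GPU.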

    My initial guess is that slowing the GPU down may increase the frame time and reduce performance. But it is still a good data point to have.

    Can you reduce the GPU frequency by half? Let us see whether the total frame time goes up by 15-20 ms.

    Regards

    Hemant

  • Hello Hemant,

    Thank you for the support and suggestions!
    Reducing the GPU frequency to 200 MHz resolved the performance issue. We will proceed to fine-tune the frequency to best fit the use case.

    Is there a built-in way on TDA4 to set the GPU frequency from the use case, besides calling the external k3conf tool?

    Regards,
    Todor

  • Todor,

    Thank you for the update. If you get an opportunity, would you mind sharing a PVRTune capture of your use case with the reduced frequency? This will help us.

    Unfortunately, we do not have a GPU frequency scaling mechanism on TDA4x other than k3conf or device tree.

    Regards

    Hemant

  • Hello Hemant,

    The PVRTune of our use case with reduced frequency is attached.
    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/srv_5F00_out.7z

    Regards,
    Todor

  • Hello Todor,

    Thank you for the updated PVRTune capture. Can we close this thread now?

    Regards

    Hemant

  • Hello Hemant,

    Yes, the thread can be closed.

    Regards,
    Todor