TDA4VM: How to improve the frame rate of multi-node graph running?

Part Number: TDA4VM

Dear Experts,

The graph I am running now has about 30 nodes, and its input comes from four cameras at 25 FPS. Previously, when the graph had fewer than 10 nodes, the frame rate reached about 25 FPS. Now it is only about 16 FPS. Do you have any suggestions? Looking forward to your reply.

Regards,

Xin

  • Hi Xin,

    I believe you are running one graph with 30 nodes, right?

    In this case, you would have to identify which node is the bottleneck, as the rate of graph execution depends on the slowest node.

    Are you running the graph in pipelined mode? If not, I would suggest doing that first.

    Another suggestion would be to give more buffer depth at the input of the slowest node, so that the previous node is not blocked by it.

    If you have a direct path from capture to display with additional branches off this path, I would suggest splitting it into separate graphs so that the nodes in the branches do not affect the main pipeline.

    Regards,

    Nikhil
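
The point about the slowest node can be made concrete with a rough steady-state model. The sketch below is only an illustration: it assumes every node runs on its own resource and that enough buffers are available. `sequential_fps` and `pipelined_fps` are hypothetical helper names, and the stage times are made-up values, not measurements from this thread.

```python
def sequential_fps(stage_usecs):
    """Frame rate when a frame passes through every stage before the next one starts."""
    return 1_000_000 / sum(stage_usecs)


def pipelined_fps(stage_usecs):
    """Frame rate when stages overlap on different frames; the slowest stage sets the pace."""
    return 1_000_000 / max(stage_usecs)


# Illustrative per-stage execution times in microseconds (made-up values).
stages = [10_000, 40_000, 5_000]

print(f"non-pipelined: {sequential_fps(stages):.2f} FPS")
print(f"pipelined:     {pipelined_fps(stages):.2f} FPS")
```

In this model, shortening or splitting the 40 ms stage is the only change that raises the pipelined rate.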

  • Hi Nikhil,

    These are the performance statistics printed while the program is running. Can you find anything in them?

    Regards,

    Xin

    Summary of CPU load,
    ====================
    
    CPU: mpu1_0: TOTAL LOAD =  66.29 % ( HWI =   2.24 %, SWI =   0. 0 % )
    CPU: mcu2_0: TOTAL LOAD =  12. 0 % ( HWI =   0. 0 %, SWI =   0. 0 % )
    CPU: mcu2_1: TOTAL LOAD =   1. 0 % ( HWI =   0. 0 %, SWI =   0. 0 % )
    CPU:  c6x_1: TOTAL LOAD =   4. 0 % ( HWI =   0. 0 %, SWI =   0. 0 % )
    CPU:  c6x_2: TOTAL LOAD =  54. 0 % ( HWI =   0. 0 %, SWI =   0. 0 % )
    CPU:  c7x_1: TOTAL LOAD =  82. 0 % ( HWI =   0. 0 %, SWI =   0. 0 % )
    
    
    HWA performance statistics,
    ===========================
    
    HWA:   MSC0: LOAD =  44.99 % ( 220 MP/s )
    HWA:   MSC1: LOAD =   1.88 % ( 2 MP/s )
    HWA:   GPU : LOAD =  94.29 % ( 47 MP/s )
    
    
    DDR performance statistics,
    ===========================
    
    DDR: READ  BW: AVG =   4612 MB/s, PEAK =  12294 MB/s
    DDR: WRITE BW: AVG =   2631 MB/s, PEAK =   8064 MB/s
    DDR: TOTAL BW: AVG =   7243 MB/s, PEAK =  20358 MB/s
    
    
    Detailed CPU performance/memory statistics,
    ===========================================
    
    DDR_SHARED_MEM: Alloc's: 228 alloc's of 239193566 bytes
    DDR_SHARED_MEM: Free's : 11 free's  of 607212 bytes
    DDR_SHARED_MEM: Open's : 217 allocs  of 238586354 bytes
    DDR_SHARED_MEM: Total size: 536870912 bytes
    
    CPU: mcu2_0: TASK:           IPC_RX:   0.45 %
    CPU: mcu2_0: TASK:       REMOTE_SRV:   0.13 %
    CPU: mcu2_0: TASK:        LOAD_TEST:   0. 0 %
    CPU: mcu2_0: TASK:       TIVX_CPU_0:   0. 0 %
    CPU: mcu2_0: TASK:        TIVX_V1NF:   0. 0 %
    CPU: mcu2_0: TASK:      TIVX_V1LDC1:   0. 0 %
    CPU: mcu2_0: TASK:       TIVX_V1SC1:   5.28 %
    CPU: mcu2_0: TASK:      TIVX_V1MSC2:   0.86 %
    CPU: mcu2_0: TASK:       TIVXVVISS1:   0. 0 %
    CPU: mcu2_0: TASK:       TIVX_CAPT1:   0.54 %
    CPU: mcu2_0: TASK:       TIVX_CAPT2:   0. 0 %
    CPU: mcu2_0: TASK:       TIVX_DISP1:   0.50 %
    CPU: mcu2_0: TASK:       TIVX_DISP2:   0. 0 %
    CPU: mcu2_0: TASK:       TIVX_CSITX:   0.62 %
    CPU: mcu2_0: TASK:       TIVX_CAPT3:   0. 0 %
    CPU: mcu2_0: TASK:       TIVX_CAPT4:   0. 0 %
    CPU: mcu2_0: TASK:       TIVX_CAPT5:   0. 0 %
    CPU: mcu2_0: TASK:       TIVX_CAPT6:   0. 0 %
    CPU: mcu2_0: TASK:       TIVX_CAPT7:   0. 0 %
    CPU: mcu2_0: TASK:       TIVX_CAPT8:   0. 0 %
    CPU: mcu2_0: TASK:      TIVX_DPM2M1:   2.89 %
    CPU: mcu2_0: TASK:      TIVX_DPM2M2:   0.90 %
    CPU: mcu2_0: TASK:      TIVX_DPM2M3:   0. 0 %
    CPU: mcu2_0: TASK:      TIVX_DPM2M4:   0. 0 %
    
    CPU: mcu2_0: HEAP:    DDR_LOCAL_MEM: size =   16777216 B, free =   16684544 B ( 99 % unused)
    CPU: mcu2_0: HEAP:           L3_MEM: size =     262144 B, free =     261888 B ( 99 % unused)
    
    CPU: mcu2_1: TASK:           IPC_RX:   0. 0 %
    CPU: mcu2_1: TASK:       REMOTE_SRV:   0.10 %
    CPU: mcu2_1: TASK:        LOAD_TEST:   0. 0 %
    CPU: mcu2_1: TASK:       TIVX_CPU_1:   0. 0 %
    CPU: mcu2_1: TASK:         TIVX_SDE:   0. 0 %
    CPU: mcu2_1: TASK:         TIVX_DOF:   0. 0 %
    CPU: mcu2_1: TASK:      IPC_TEST_RX:   0. 0 %
    CPU: mcu2_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu2_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu2_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu2_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU: mcu2_1: TASK:      IPC_TEST_TX:   0. 0 %
    
    CPU: mcu2_1: HEAP:    DDR_LOCAL_MEM: size =   16777216 B, free =   16773376 B ( 99 % unused)
    CPU: mcu2_1: HEAP:           L3_MEM: size =     262144 B, free =     262144 B (100 % unused)
    
    CPU:  c6x_1: TASK:           IPC_RX:   0.16 %
    CPU:  c6x_1: TASK:       REMOTE_SRV:   0. 1 %
    CPU:  c6x_1: TASK:        LOAD_TEST:   0. 0 %
    CPU:  c6x_1: TASK:         TIVX_CPU:   3.52 %
    CPU:  c6x_1: TASK:      IPC_TEST_RX:   0. 0 %
    CPU:  c6x_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c6x_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c6x_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c6x_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c6x_1: TASK:      IPC_TEST_TX:   0. 0 %
    
    CPU:  c6x_1: HEAP:    DDR_LOCAL_MEM: size =   16777216 B, free =   16749312 B ( 99 % unused)
    CPU:  c6x_1: HEAP:           L2_MEM: size =     229376 B, free =          0 B (  0 % unused)
    CPU:  c6x_1: HEAP:  DDR_SCRATCH_MEM: size =   50331648 B, free =   50331648 B (100 % unused)
    
    CPU:  c6x_2: TASK:           IPC_RX:   0.29 %
    CPU:  c6x_2: TASK:       REMOTE_SRV:   0. 1 %
    CPU:  c6x_2: TASK:        LOAD_TEST:   0. 0 %
    CPU:  c6x_2: TASK:         TIVX_CPU:  50.83 %
    CPU:  c6x_2: TASK:      IPC_TEST_RX:   0. 0 %
    CPU:  c6x_2: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c6x_2: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c6x_2: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c6x_2: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c6x_2: TASK:      IPC_TEST_TX:   0. 0 %
    
    CPU:  c6x_2: HEAP:    DDR_LOCAL_MEM: size =   16777216 B, free =   16563968 B ( 98 % unused)
    CPU:  c6x_2: HEAP:           L2_MEM: size =     229376 B, free =     229376 B (100 % unused)
    CPU:  c6x_2: HEAP:  DDR_SCRATCH_MEM: size =   50331648 B, free =   50331648 B (100 % unused)
    
    CPU:  c7x_1: TASK:           IPC_RX:   0. 9 %
    CPU:  c7x_1: TASK:       REMOTE_SRV:   0. 0 %
    CPU:  c7x_1: TASK:        LOAD_TEST:   0. 0 %
    CPU:  c7x_1: TASK:      TIVX_C71_P1:  81.34 %
    CPU:  c7x_1: TASK:      TIVX_C71_P2:   0. 0 %
    CPU:  c7x_1: TASK:      TIVX_C71_P3:   0. 0 %
    CPU:  c7x_1: TASK:      TIVX_C71_P4:   0. 0 %
    CPU:  c7x_1: TASK:      TIVX_C71_P5:   0. 0 %
    CPU:  c7x_1: TASK:      TIVX_C71_P6:   0. 0 %
    CPU:  c7x_1: TASK:      TIVX_C71_P7:   0. 0 %
    CPU:  c7x_1: TASK:      TIVX_C71_P8:   0. 0 %
    CPU:  c7x_1: TASK:      IPC_TEST_RX:   0. 0 %
    CPU:  c7x_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c7x_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c7x_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c7x_1: TASK:      IPC_TEST_TX:   0. 0 %
    CPU:  c7x_1: TASK:      IPC_TEST_TX:   0. 0 %
    
    CPU:  c7x_1: HEAP:    DDR_LOCAL_MEM: size =  268435456 B, free =   72954112 B ( 27 % unused)
    CPU:  c7x_1: HEAP:           L3_MEM: size =    8159232 B, free =          0 B (  0 % unused)
    CPU:  c7x_1: HEAP:           L2_MEM: size =     458752 B, free =     458752 B (100 % unused)
    CPU:  c7x_1: HEAP:           L1_MEM: size =      16384 B, free =          0 B (  0 % unused)
    CPU:  c7x_1: HEAP:  DDR_SCRATCH_MEM: size =  385875968 B, free =  380768841 B ( 98 % unused)
    
    GRAPH:         Demo (#nodes =  25, #executions =   1536)
     NODE:       CAPTURE1:                    node1: avg =   2696 usecs, min/max =    101 /  99295 usecs, #executions =       1536
     NODE:       DSS_M2M1:                    node2: avg =  10117 usecs, min/max =   9463 /  11776 usecs, #executions =       1536
     NODE:      VPAC_MSC1:                    node3: avg =  15560 usecs, min/max =  13275 /  23731 usecs, #executions =       1536
     NODE:      VPAC_MSC1:                    node4: avg =  15047 usecs, min/max =  13193 /  18528 usecs, #executions =       1536
     NODE:          DSP-1:                    node5: avg =    351 usecs, min/max =    316 /   2128 usecs, #executions =       1536
     NODE:       DSP_C7-1:                    node6: avg =  10131 usecs, min/max =   7724 /  12055 usecs, #executions =       1536
     NODE:          DSP-2:                    node7: avg =  11282 usecs, min/max =   8966 /  31544 usecs, #executions =       1536
     NODE:          A72-3:                    node8: avg =    224 usecs, min/max =     66 /   3332 usecs, #executions =       1536
     NODE:          A72-0:                    node9: avg =   3998 usecs, min/max =   1804 /  19368 usecs, #executions =       1536
     NODE:       DSS_M2M2:                   node10: avg =   1245 usecs, min/max =    529 /   3615 usecs, #executions =       1536
     NODE:      VPAC_MSC2:                   node11: avg =   1380 usecs, min/max =    848 /   2970 usecs, #executions =       1536
     NODE:          DSP-1:                   node12: avg =    363 usecs, min/max =    320 /   2453 usecs, #executions =       1536
     NODE:       DSP_C7-1:                   node13: avg =   7590 usecs, min/max =   6715 /   8843 usecs, #executions =       1536
     NODE:       DSP_C7-1:                   node14: avg =   9974 usecs, min/max =   9258 /  11519 usecs, #executions =       1536
     NODE:          DSP-2:                   node15: avg =   6931 usecs, min/max =   4764 /  22559 usecs, #executions =       1542
     NODE:          A72-1:                   node16: avg =   2836 usecs, min/max =   1000 /  32044 usecs, #executions =       1542
     NODE:          DSP-2:                   node17: avg =  15096 usecs, min/max =  10510 /  30192 usecs, #executions =       1542
     NODE:          A72-2:                   node18: avg =  12061 usecs, min/max =   5835 /  45587 usecs, #executions =       1542
     NODE:          A72-0:                   node19: avg =  43282 usecs, min/max =  22534 / 145379 usecs, #executions =       1542
     NODE:          CSITX:                   node20: avg =  25800 usecs, min/max =  25706 /  27098 usecs, #executions =       1542
     NODE:       DISPLAY1:                   node21: avg =  21145 usecs, min/max =     87 /  44923 usecs, #executions =       1542
     NODE:          DSP-1:                   node22: avg =   1211 usecs, min/max =   1078 /   7291 usecs, #executions =       1542
     NODE:       DSP_C7-1:                   node23: avg =  26584 usecs, min/max =  22861 /  30125 usecs, #executions =       1542
     NODE:          DSP-2:                   node24: avg =    948 usecs, min/max =    295 /   2416 usecs, #executions =       1542
     NODE:          A72-0:                   node25: avg =    456 usecs, min/max =    186 /   7799 usecs, #executions =       1542
    
     PERF:            TOTAL: avg =  66897 usecs, min/max =     18 / 338107 usecs, #executions =        309
    
     PERF:            TOTAL:   14.94 FPS
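
The node statistics above can be ranked with a short script. The following is a minimal sketch that assumes the log format shown in this post; the regular expressions and the `parse_nodes` helper are illustrative and not part of any TI tool. Only a few lines of the log are embedded to keep the example short.

```python
import re

# A few lines copied from the statistics above.
LOG = """
 NODE:      VPAC_MSC1:                    node3: avg =  15560 usecs, min/max =  13275 /  23731 usecs, #executions =       1536
 NODE:          A72-0:                   node19: avg =  43282 usecs, min/max =  22534 / 145379 usecs, #executions =       1542
 NODE:          CSITX:                   node20: avg =  25800 usecs, min/max =  25706 /  27098 usecs, #executions =       1542
 NODE:       DSP_C7-1:                   node23: avg =  26584 usecs, min/max =  22861 /  30125 usecs, #executions =       1542
 PERF:            TOTAL: avg =  66897 usecs, min/max =     18 / 338107 usecs, #executions =        309
"""

NODE_RE = re.compile(r"NODE:\s*(\S+):\s*(\S+):\s*avg =\s*(\d+) usecs")
TOTAL_RE = re.compile(r"PERF:\s*TOTAL:\s*avg =\s*(\d+) usecs")


def parse_nodes(text):
    """Return (node, target, avg_usecs) tuples, slowest first."""
    rows = [(m.group(2), m.group(1), int(m.group(3))) for m in NODE_RE.finditer(text)]
    return sorted(rows, key=lambda row: row[2], reverse=True)


nodes = parse_nodes(LOG)
for name, target, avg in nodes:
    print(f"{name:>7} on {target:<10} avg = {avg / 1000:.1f} ms")

# The reported FPS is the reciprocal of the average total time per frame.
total_usecs = int(TOTAL_RE.search(LOG).group(1))
fps = 1_000_000 / total_usecs
print(f"{fps:.2f} FPS")
```

The computed rate agrees with the 14.94 FPS printed in the log, up to rounding.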
    
    

  • Hi Xin,

    Could you give the name of these nodes and their functionality?

    It would be better to have a block diagram of these nodes and how they are connected.

    I see that node 19 on A72-0 takes about 43 ms on average. What is happening in this node?

    Regards,

    Nikhil

  • Hi Nikhil,

    The node types are as follows.

    Regards,

    Xin

     NODE:       CAPTURE1:                    node1: CaptureNode
     NODE:       DSS_M2M1:                    node2: Displaym2mNode
     NODE:      VPAC_MSC1:                    node3: ScalerNode
     NODE:      VPAC_MSC1:                    node4: ScalerNode
     NODE:          DSP-1:                    node5: PreProcNode
     NODE:       DSP_C7-1:                    node6: TIDLNode
     NODE:          DSP-2:                    node7: PostProcNode
     NODE:          A72-3:                    node8: Custom send data node
     NODE:          A72-0:                    node9: CustomSrvNode
     NODE:       DSS_M2M2:                   node10: Displaym2mNode
     NODE:      VPAC_MSC2:                   node11: ScalerNode
     NODE:          DSP-1:                   node12: PreProcNode
     NODE:       DSP_C7-1:                   node13: TIDLNode
     NODE:       DSP_C7-1:                   node14: TIDLNode
     NODE:          DSP-2:                   node15: PostProcNode
     NODE:          A72-1:                   node16: Custom send data node
     NODE:          DSP-2:                   node17: PostProcNode
     NODE:          A72-2:                   node18: Custom send data node
     NODE:          A72-0:                   node19: CustomSrvNode
     NODE:          CSITX:                   node20: CsiTxNode
     NODE:       DISPLAY1:                   node21: DisplayNode
     NODE:          DSP-1:                   node22: PreProcNode
     NODE:       DSP_C7-1:                   node23: TIDLNode
     NODE:          DSP-2:                   node24: PostProcNode
     NODE:          A72-0:                   node25: Custom send data node

  • Hi Xin,

    Thank you for sharing this information. Could you please let me know how these nodes are interconnected?

    What are the buffer depths provided for each node?

    Are you running this in pipeline mode or non-pipeline mode?

    Which node parameters are graph parameters here?

    Regards,

    Nikhil

  • Hi Nikhil,

    Only the CaptureNode parameter is a graph parameter, and I run this in pipeline mode. I set the buffer depth according to each node's average execution time: 2 for less than 5000 microseconds, 4 for 5000 to 10000 microseconds, 6 for 10000 to 15000 microseconds, 8 for 15000 to 20000 microseconds, and 10 for more than 20000 microseconds.

    It looks like the execution time of node 19 on A72-0 is too long, which increases the overall average elapsed time and reduces the FPS. Am I right?

    Regards,

    Xin
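
The buffer-depth rule described in this post can be written as a small lookup. This is a sketch of the rule as stated, not a recommendation; `buffer_depth` is a hypothetical helper, and the treatment of a value that falls exactly on a threshold is an assumption, since the post does not say.

```python
def buffer_depth(avg_usecs):
    """Map a node's average execution time (microseconds) to a buffer depth.

    Thresholds follow the rule described in the post above. How a value exactly
    on a boundary is treated is an assumption made here.
    """
    thresholds = [(5_000, 2), (10_000, 4), (15_000, 6), (20_000, 8)]
    for limit, depth in thresholds:
        if avg_usecs < limit:
            return depth
    return 10


# Average times taken from the statistics posted earlier in the thread.
for name, avg in [("node5", 351), ("node13", 7590), ("node6", 10131),
                  ("node3", 15560), ("node19", 43282)]:
    print(name, buffer_depth(avg))
```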

  • Hi Xin,

    "It looks like the A72-0 node 19 execution time is too long and the overall average elapsed time is longer, which reduces the FPS. Am I right?"

    Yes, if it is a single graph, then the slowest node determines the FPS (if sufficient buffer depth is not given to this node).

    Regards,

    Nikhil
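
A complementary check is to add up the average times of nodes that share a target. The sketch below assumes that nodes mapped to the same target run one after another and that each node runs once per frame, so the summed time per target gives a rough upper bound on the frame rate. That is an assumption of this sketch, not something stated in the thread. The (target, average) pairs are copied from the statistics posted earlier, and `busy_time_per_target` is a hypothetical helper name.

```python
from collections import defaultdict

# (target, avg_usecs) for node1..node25, copied from the posted statistics.
NODE_AVG = [
    ("CAPTURE1", 2696), ("DSS_M2M1", 10117), ("VPAC_MSC1", 15560),
    ("VPAC_MSC1", 15047), ("DSP-1", 351), ("DSP_C7-1", 10131),
    ("DSP-2", 11282), ("A72-3", 224), ("A72-0", 3998),
    ("DSS_M2M2", 1245), ("VPAC_MSC2", 1380), ("DSP-1", 363),
    ("DSP_C7-1", 7590), ("DSP_C7-1", 9974), ("DSP-2", 6931),
    ("A72-1", 2836), ("DSP-2", 15096), ("A72-2", 12061),
    ("A72-0", 43282), ("CSITX", 25800), ("DISPLAY1", 21145),
    ("DSP-1", 1211), ("DSP_C7-1", 26584), ("DSP-2", 948),
    ("A72-0", 456),
]


def busy_time_per_target(node_avg):
    """Sum the average node times (microseconds) for each target."""
    totals = defaultdict(int)
    for target, avg in node_avg:
        totals[target] += avg
    return dict(totals)


totals = busy_time_per_target(NODE_AVG)
busiest = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:4]
for target, usecs in busiest:
    print(f"{target:<10} {usecs:>6} usecs per frame -> at most {1_000_000 / usecs:.1f} FPS")
```

Under these assumptions the busiest targets are the ones to examine first. As a sanity check, 54279 usecs per frame at 14.94 FPS is about 81 % busy, close to the 81.34 % reported for TIVX_C71_P1.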