TDA4VM: Performance number of the Custom Application on psdk_rtos_auto_j7_06_02_00_21

Part Number: TDA4VM
Hi All,

We are using psdk_rtos_auto_j7_06_02_00_21, and TI has released TIDL and MMALIB patches to us via CDDS.

We have used:

TIDL patch  : tidl_j7_01_01_01_01

MMA library : mmalib_01_01_00_02

We are trying to run a (16,16,3) model and have modified our application to support it, following the suggestions made by Mr. Shyam at https://e2e.ti.com/support/processors/f/791/p/898935/3365185#pi320966=1

Our application runs, but the performance of the TIDL and PreProc nodes has degraded compared to the earlier (8,8,3) model.
 
SDK: /home/anshuman/psdk_rtos_auto_j7_06_02_00_21/ (with the vision_apps patch released in May)
 
Model used by this application: patched TIDL and MMALIB, MobileNet with new anchors and L1 regularization (16,16,3), tidl_io_new_anchor_l11
 
 
GRAPH:      OpenVxGraph (#nodes =  10, #executions =     51)
 NODE:   CAPTURE1:             capture_node: avg =   2519 usecs, min/max =     65 /  57684 usecs, #executions =         51
 NODE:      DSP-1:        colorConvert_node: avg =  12448 usecs, min/max =  12293 /  12608 usecs, #executions =         51
 NODE:  VPAC_MSC1:               ScalerNode: avg =   2553 usecs, min/max =   2514 /   2668 usecs, #executions =         51
 NODE:      DSP-1:              PreProcNode: avg =  32792 usecs, min/max =  32774 /  32851 usecs, #executions =         51
 NODE:   DSP_C7-1:                 TIDLNode: avg =  38589 usecs, min/max =  38561 /  38692 usecs, #executions =         51
 NODE:      DSP-1:             tracker_node: avg =    215 usecs, min/max =    192 /    628 usecs, #executions =         51
 NODE:      DSP-2:    DrawBoxDetectionsNode: avg =   3693 usecs, min/max =   3395 /   3748 usecs, #executions =         51
 NODE:  VPAC_MSC1:               MosaicNode: avg =   8437 usecs, min/max =   6067 /  23143 usecs, #executions =         51
 NODE:   DISPLAY1:              DisplayNode: avg =   8352 usecs, min/max =    105 /  16750 usecs, #executions =         51
 NODE:      DSP-2:           op_signal_node: avg =   2058 usecs, min/max =    740 /   2160 usecs, #executions =         51

 PERF:           FILEIO: avg =      0 usecs, min/max = 4294967295 /      0 usecs, #executions =          0
 PERF:            TOTAL: avg =  90729 usecs, min/max =  33052 / 103703 usecs, #executions =         54

 PERF:            TOTAL:   11. 2 FPS

Overall, the FPS has dropped by a factor of 3 compared to the application running the (8,8,3) model.
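For reference, the FPS figure can be cross-checked from the TOTAL average in the log. The sketch below is only an illustration: the helper names `fps_x100_from_avg_usecs` and `format_fps` are hypothetical and not part of the SDK. It converts an average frame time in microseconds to FPS scaled by 100. For avg = 90729 usecs this gives 11.02 FPS, which matches the "11. 2" in the log if we assume the fractional part is printed space-padded instead of zero-padded.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not an SDK function): convert an average per-frame
 * time in microseconds to frames per second, scaled by 100 so that two
 * decimal places survive integer arithmetic. */
static uint32_t fps_x100_from_avg_usecs(uint32_t avg_usecs)
{
    assert(avg_usecs > 0u); /* guard against division by zero */
    return (uint32_t)((100ull * 1000000ull) / avg_usecs);
}

/* Hypothetical helper: format the scaled value as "II.FF" with a
 * zero-padded fraction, so 1102 becomes "11.02" and not "11. 2". */
static void format_fps(uint32_t fps_x100, char *buf, size_t len)
{
    snprintf(buf, len, "%u.%02u",
             (unsigned)(fps_x100 / 100u), (unsigned)(fps_x100 % 100u));
}
```

Calling `fps_x100_from_avg_usecs(90729u)` and passing the result to `format_fps` produces the string "11.02".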

We also see that the capture node now reports a much lower time. Is there a specific reason for that?

On the earlier SDK version, the capture node reported:

NODE:   CAPTURE1:             Capture_node: avg =  33228 usecs, min/max =  30035 /  37423 usecs, #executions =        125

We also see that the TIDL node and the PreProc node take longer because of the (16,16,3) processing.

How can we reduce the processing time of these two nodes?
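One way to read the statistics above is per target instead of per node, since colorConvert_node, PreProcNode and tracker_node are all listed under DSP-1. The sketch below sums the logged averages for a given target. It assumes that nodes mapped to the same target run one after another, so the sum is a rough lower bound on the frame period that target can sustain; that is an assumption on our side, not something the log states. The type `node_stat_t` and the function `target_busy_usecs` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record for one NODE line of the graph statistics. */
typedef struct {
    const char *target;   /* target name as printed in the log */
    uint32_t    avg_usecs; /* average node time from the log    */
} node_stat_t;

/* Averages copied from the graph statistics posted above. */
static const node_stat_t k_nodes[] = {
    { "CAPTURE1",   2519u }, /* capture_node          */
    { "DSP-1",     12448u }, /* colorConvert_node     */
    { "VPAC_MSC1",  2553u }, /* ScalerNode            */
    { "DSP-1",     32792u }, /* PreProcNode           */
    { "DSP_C7-1",  38589u }, /* TIDLNode              */
    { "DSP-1",       215u }, /* tracker_node          */
    { "DSP-2",      3693u }, /* DrawBoxDetectionsNode */
    { "VPAC_MSC1",  8437u }, /* MosaicNode            */
    { "DISPLAY1",   8352u }, /* DisplayNode           */
    { "DSP-2",      2058u }, /* op_signal_node        */
};

/* Sum the average time of every node mapped to the given target. */
static uint32_t target_busy_usecs(const node_stat_t *nodes, size_t count,
                                  const char *target)
{
    uint32_t sum = 0u;
    assert(nodes != NULL && target != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(nodes[i].target, target) == 0) {
            sum += nodes[i].avg_usecs;
        }
    }
    return sum;
}
```

With these numbers, DSP-1 adds up to 45455 usecs per frame against 38589 usecs for DSP_C7-1, so under the stated assumption the DSP-1 nodes together weigh more than the TIDL node alone.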

  • Anshuman,

    The lower time reported for the capture node is a bug, and it can be fixed by changing one value.

    In your copy of app_tidl_cam, find the line below:

    add_graph_parameter_by_node_index(obj->graph, obj->captureObj.node, 0);

    Please change it to:

    add_graph_parameter_by_node_index(obj->graph, obj->captureObj.node, 1);

    Regarding TIDL performance for 16-bit inference, what standalone performance do you observe (using the CCS-based approach with the TIDL test bench on target)?

    Regarding PreProc performance, the version you currently have is not optimized for either 8-bit or 16-bit. The upcoming SDK release should support optimized 16-bit output and also use UDMA, so you should observe better performance.


    Regards,
    Shyam