TDA4VM: Performance number of the Custom Application on psdk_rtos_auto_j7_06_02_00_21

Kumar Anshuman


Part Number: TDA4VM

Hi All,

We are using psdk_rtos_auto_j7_06_02_00_21, and TI has released TIDL and MMALIB patches to us through CDDS.

We have used

TIDL patch : tidl_j7_01_01_01_01

MMA Library : mmalib_01_01_00_02

We are trying to run a (16,16,3) model and have modified our application to support it, following Mr. Shyam's suggestions at https://e2e.ti.com/support/processors/f/791/p/898935/3365185#pi320966=1

Our application runs, but the performance of the TIDL and PreProc nodes has degraded compared to the earlier (8,8,3) model.

On SDK: /home/anshuman/psdk_rtos_auto_j7_06_02_00_21/ (with the vision_apps patch released in May)

This application uses Model: Patched TIDL and MMALIB, Mobilenet new anchor L1 regularization (16,16,3) tidl_io_new_anchor_l11

GRAPH:      OpenVxGraph (#nodes =  10, #executions =     51)
 NODE:   CAPTURE1:             capture_node: avg =   2519 usecs, min/max =     65 /  57684 usecs, #executions =         51
 NODE:      DSP-1:        colorConvert_node: avg =  12448 usecs, min/max =  12293 /  12608 usecs, #executions =         51
 NODE:  VPAC_MSC1:               ScalerNode: avg =   2553 usecs, min/max =   2514 /   2668 usecs, #executions =         51
 NODE:      DSP-1:              PreProcNode: avg =  32792 usecs, min/max =  32774 /  32851 usecs, #executions =         51
 NODE:   DSP_C7-1:                 TIDLNode: avg =  38589 usecs, min/max =  38561 /  38692 usecs, #executions =         51
 NODE:      DSP-1:             tracker_node: avg =    215 usecs, min/max =    192 /    628 usecs, #executions =         51
 NODE:      DSP-2:    DrawBoxDetectionsNode: avg =   3693 usecs, min/max =   3395 /   3748 usecs, #executions =         51
 NODE:  VPAC_MSC1:               MosaicNode: avg =   8437 usecs, min/max =   6067 /  23143 usecs, #executions =         51
 NODE:   DISPLAY1:              DisplayNode: avg =   8352 usecs, min/max =    105 /  16750 usecs, #executions =         51
 NODE:      DSP-2:           op_signal_node: avg =   2058 usecs, min/max =    740 /   2160 usecs, #executions =         51

 PERF:           FILEIO: avg =      0 usecs, min/max = 4294967295 /      0 usecs, #executions =          0
 PERF:            TOTAL: avg =  90729 usecs, min/max =  33052 / 103703 usecs, #executions =         54

 PERF:            TOTAL:   11. 2 FPS

Overall, FPS has degraded by a factor of 3 compared to the application running the (8,8,3) model.

We see that the capture node is reporting a much shorter time. Is there any specific reason for that?

On the earlier SDK version, the capture node reported:

NODE:   CAPTURE1:             Capture_node: avg =  33228 usecs, min/max =  30035 /  37423 usecs, #executions =        125

We also see that the TIDL node and the PreProc node are taking longer because of the (16,16,3) processing.

How can we reduce the processing time of these two nodes?


Shyam Jagannathan


Anshuman,

The capture node reporting a lower time is a bug, which can be fixed by updating one value.

In your copy of app_tidl_cam, find the following line:

add_graph_parameter_by_node_index(obj->graph, obj->captureObj.node, 0);

Please change it to:

add_graph_parameter_by_node_index(obj->graph, obj->captureObj.node, 1);

Regarding TIDL performance for 16-bit inference, what standalone performance do you observe (the CCS-based approach using the TIDL testbench on target)?

Regarding PreProc performance, the version you currently have is not optimized for either 8-bit or 16-bit. The next SDK release should support optimized 16-bit output and also use UDMA, so you should observe better performance.

Regards,
Shyam
