TDA4VM: MobilenetV2 performance question

Part Number: TDA4VM

Hello.

In the "Jacinto™ Automotive Processor TI Deep Learning (TIDL) Library Overview" I read that MobileNet V2 on 224px images should reach 441 FPS.

I downloaded MobilenetV2-1.0.onnx from the ONNX GitHub repo (https://github.com/onnx/models/tree/master/vision/classification/mobilenet), put it into the tidl_j7_01_00_01_00/ti_dl/test/testvecs/models/public/onnx folder, and then ran tidlModelImportTool:

./out/tidl_model_import.out ../../test/testvecs/config/import/public/onnx/tidl_import_mobileNetv2.txt

Here is the console output:
ONNX Model (Proto) File  : ../../test/testvecs/models/public/onnx/mobilenetv2-1.0.onnx  
TIDL Network File      : ../../test/testvecs/config/tidl_models/onnx/tidl_net_mobilenetv2.bin  
TIDL IO Info File      : ../../test/testvecs/config/tidl_models/onnx/tidl_io_mobilenetv2_  

~~~~~Running TIDL in PC emulation mode to collect Activations range for each layer~~~~~

Processing config file #0 : /home/ylyudkevich/tda4vm/psdk_rtos_auto_j7_06_01_01_12/tidl_j7_01_00_01_00/ti_dl/utils/tidlModelImport/tempDir/qunat_stats_config.txt
 ----------------------- TIDL Process with REF_ONLY FLOW ------------------------

#    0 . .. T     563.50  ... A :   895, 0.0000, 0.0000,   847 .... .....
#    1 . .. T     544.81  ... A :   557, 0.0000, 0.0000,   743 .... .....
#    2 . .. T     540.26  ... A :   442, 0.0000, 0.0000,   509 .... .....
#    3 . .. T     540.83  ... A :   498, 0.0000, 0.2500,   646 .... .....
#    4 . .. T     551.26  ... A :   538, 0.0000, 0.2000,   787 .... .....
------------------ Network Compiler Traces -----------------------------
Main iteration numer: 0....
Preparing for memory allocation : internal iteration number: 0
successful Memory allocation

-------------------- Network Compiler : Analysis Results are available --------------------

****************************************************
**                ALL MODEL CHECK PASSED          **
****************************************************


After that, I copied tidl_net_mobilenetv2.bin and tidl_io_mobilenetv2_1.bin to the /opt/vision_sdk/ folder on the SD card and created a config for this network (based on app_oc.cfg). Next, I booted the TDA4VM board from this SD card, navigated to the /opt/vision_sdk folder, and set up the environment:
source ./vision_apps_init.sh

After that, I ran the TIDL example:
./vx_app_tidl.out --cfg my_app_oc.conf

The application runs successfully, but I get only about 27 FPS.

So my questions are:
 1) Is it correct to run mobilenetv2-1.0 with vx_app_tidl.out?
 2) Why do I get only about 30 FPS? What am I doing wrong?

Thanks in advance.

  • Yuri,

    Since you are running it from vision_apps, it is locked to the display VSYNC frequency, and therefore you are getting ~30 FPS. You will get the free-running FPS when the TIDL standalone application is run (without vision_apps).

    - Subhajit
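
One way to picture the VSYNC lock is a simplified pacing model, sketched below in C. It assumes each frame is released only on a display refresh, so the frame period rounds up to a whole number of refresh intervals. The function name `vsync_locked_fps` is hypothetical, and the 30 Hz refresh rate in the usage note is an assumption inferred from the ~30 FPS figure above; neither is taken from vision_apps.

```c
/* Simplified pacing model (an assumption, not vision_apps code):
 * a frame that takes inference_ms is held until the next display refresh,
 * so the effective frame period is a whole number of refresh intervals. */
double vsync_locked_fps(double inference_ms, double refresh_hz)
{
    double refresh_ms;
    long intervals;

    if (refresh_hz <= 0.0) {
        return 0.0; /* no valid refresh rate, nothing to compute */
    }

    refresh_ms = 1000.0 / refresh_hz;

    /* Round the inference time up to whole refresh intervals. */
    intervals = (long)(inference_ms / refresh_ms);
    if ((double)intervals * refresh_ms < inference_ms) {
        intervals++;
    }
    if (intervals < 1) {
        intervals = 1; /* at most one frame per refresh */
    }

    return 1000.0 / ((double)intervals * refresh_ms);
}
```

Under this model, `vsync_locked_fps(1000.0 / 441.0, 30.0)` gives about 30 FPS even though the network alone could run much faster, while a 40 ms inference on the same display would drop to about 15 FPS.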

  • Thank you for your reply.

    Is it locked to the display VSYNC even if I choose console mode in the config?

  • Is there any way to turn off VSYNC locking?

    Do I understand correctly that standalone TIDL is only possible when the board is configured for "no boot" mode and the application is loaded via CCS?

  • I have modified vx_app_tidl so that my app_run_graph_for_one_frame() now looks like this:

    static void app_run_graph_for_one_frame(AppObj *obj, char *curFileName)
    {
        vx_char input_file_name[APP_MAX_FILE_PATH];

        appPerfPointBegin(&obj->total_perf);

        snprintf(input_file_name, APP_MAX_FILE_PATH-1, "%s/%s",
              obj->input_file_path,
              curFileName
              );

        #ifdef APP_DEBUG
        printf("app_tidl: Reading input file %s ... \n", input_file_name);
        #endif

        appPerfPointBegin(&obj->fileio_perf);

        /* Read input from file and populate the input tensors */
        readInput(obj, obj->context, obj->config, obj->input_tensors, input_file_name);

        appPerfPointEnd(&obj->fileio_perf);

        #ifdef APP_DEBUG
        printf("app_tidl: Reading input file %s ... Done.\n", input_file_name);
        #endif

        #ifdef APP_DEBUG
        printf("app_tidl: Running graph ... \n");
        #endif

        /* Execute the network */
        vxProcessGraph(obj->graph);

        #ifdef APP_DEBUG
        printf("app_tidl: Running graph ... Done.\n");
        #endif

        appPerfPointEnd(&obj->total_perf);
    }

    I have also removed the Draw2D creation and initialization from the app_init() function.


    After this modification I get as much as 86 FPS on mobilenetv2-1.0.
    Is this the maximum performance I can achieve when running the network with Vision_SDK?

  • I do not quite understand how the FPS measured with CCS and standalone TIDL correlates with the FPS on the target device, e.g. when Linux is running on the A72 (for tasks such as transmitting health status over CAN and Ethernet) and my network is running on the C7x. How can I measure C7x performance under such conditions?

  • The standalone application does not load Linux or any other software, so it reports the ideal FPS: TIDL running on the C7x with no additional pressure from other cores accessing memory.

  • The Object Classification demo is intentionally slowed down so that viewers can read the top 5 classes shown on the screen. This is configured by the "delay" parameter in the app_oc.cfg file; for the demo it is currently set to 1000 ms, so the task on the A72 sleeps for 1000 ms before it picks up the next frame for classification.

    You can set this to 0 and observe a boost in performance. The file to change is apps/basic_demos/app_linux_fs_files/app_oc.cfg.

    Modify this file before the binaries are transferred to the SD card, i.e. before you run make linux_fs_install_sd.


    Regards,
    Shyam
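
The effect of the delay can be estimated with a small calculation. The sketch below assumes the loop processes one frame and then sleeps for the configured delay, so the frame period is the sum of the two. `fps_with_delay` is a hypothetical helper, not part of vision_apps, and the 2 ms frame time in the usage note is only an illustrative value.

```c
/* Hypothetical helper: frames per second when the application sleeps for
 * delay_ms after each frame that takes frame_ms to process.
 * Assumes processing and sleeping happen strictly one after the other. */
double fps_with_delay(double frame_ms, double delay_ms)
{
    double period_ms = frame_ms + delay_ms;

    /* Guard against a zero or negative period. */
    return period_ms > 0.0 ? 1000.0 / period_ms : 0.0;
}
```

For example, `fps_with_delay(2.0, 1000.0)` is just under 1 FPS, while `fps_with_delay(2.0, 0.0)` is 500 FPS, which shows how strongly a nonzero delay dominates the measured rate.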

  • Hello. Thank you for your answer.

    I did it a different way: I added another performance counter to the vx_app_tidl application that measures the duration of a single graph run.

    For example, on mobilenetv2 a single graph run takes 1.7 ms. With stress running in parallel ('stress -c 2 --vm 10 --vm-bytes 128M --vm-stride 64', just to put the memory bus under load), it takes 2.9 ms.
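
A counter like this can be approximated on Linux with `clock_gettime`. The sketch below is not the code that was added to vx_app_tidl: `time_call_ms`, `ms_to_fps`, and the stand-in workload `dummy_graph_run` are hypothetical names, and in the real application the timed call would be `vxProcessGraph(obj->graph)`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime under strict C modes */
#endif
#include <time.h>

typedef void (*work_fn)(void *arg);

/* Stand-in for the real graph execution; it only burns some CPU time. */
static void dummy_graph_run(void *arg)
{
    volatile unsigned long sink = 0;
    unsigned long i;

    (void)arg;
    for (i = 0; i < 1000000UL; i++) {
        sink += i;
    }
}

/* Wall-clock duration of one call to fn(arg), in milliseconds. */
static double time_call_ms(work_fn fn, void *arg)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec) * 1000.0 +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1.0e6;
}

/* Convert a per-frame duration into an equivalent frame rate. */
static double ms_to_fps(double ms)
{
    return ms > 0.0 ? 1000.0 / ms : 0.0;
}
```

A call such as `time_call_ms(dummy_graph_run, NULL)` returns the elapsed time of one run. Converting the figures above, `ms_to_fps(1.7)` is roughly 588 FPS and `ms_to_fps(2.9)` is roughly 345 FPS for the graph run alone, excluding file I/O.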