[FAQ] AM62A7-Q1: Deploy MMYOLO on AM62A, YOLOV8 nano as example.

Part Number: AM62A7-Q1

MMYOLO is an open-source repo for quick YOLO model evaluation. Deploying MMYOLO models on TI processors such as AM6x or TDAx requires some modifications to this repo.

This FAQ uses YOLOv8 nano as an example to demonstrate how to run YOLO models on a TI SoC.

  • Step 1 Set up the required environments

    We recommend using pyenv to manage the environments. Please follow these steps to install pyenv: https://github.com/TexasInstruments/edgeai-tensorlab/blob/3de61dfa503c408346c3bcd029f49a25e42a8a73/edgeai-benchmark/docs/setup_instructions.md#environment 

    Then set up the environment for each repo:

    MMYOLO:

    Download the patch file: https://github.com/user-attachments/files/16469333/0001-2024-Aug-2-mmyolo.commit-8c4d9dc5.-model-surgery-with-edgeai-modeloptimization.txt 

    git clone https://github.com/open-mmlab/mmyolo.git
    cd mmyolo
    git checkout 8c4d9dc5
    git am ../0001-2024-Aug-2-mmyolo.commit-8c4d9dc5.-model-surgery-with-edgeai-modeloptimization.txt
    cd ..

    This patch modifies the ONNX deployment process and the model implementation. The git am command above assumes the patch file was downloaded into the directory where git clone was run.

    Create a new Python virtual environment with pyenv:

    pyenv virtualenv 3.10 mmyolo
    pyenv activate mmyolo

    Install the requirements:

    cd mmyolo
    pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
    ./setup.sh
    mim install -v -e .
    pip install albumentations==1.3.1
    pip install numpy==1.26.4
    pip install onnxruntime

    For more details, refer to https://github.com/TexasInstruments/edgeai-tensorlab/issues/7 
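    The pinned versions above are easy to get wrong when several environments coexist. As a minimal sketch (not part of MMYOLO or the TI tools), the script below compares the installed packages against the pins listed in this step. The helper names installed_version and check_pins are hypothetical; only the standard library importlib.metadata is used.

```python
from importlib import metadata

# Pins copied from the install commands in this step.
PINS = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "torchaudio": "2.0.2",
    "albumentations": "1.3.1",
    "numpy": "1.26.4",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        # CUDA wheels report a local suffix such as "2.0.1+cu118"; ignore it.
        base = found.split("+")[0] if found else None
        if base != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

    Run it inside the activated mmyolo environment; no output means every pin matches.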

    EDGEAI BENCHMARK:

    git clone https://github.com/TexasInstruments/edgeai-tensorlab.git

    Create a new pyenv virtual environment in a new terminal:

    pyenv virtualenv 3.10 benchmark
    pyenv activate benchmark

    Run the setup for benchmark:

    cd edgeai-tensorlab/edgeai-benchmark
    git checkout 3de61dfa503c408346c3bcd029f49a25e42a8a73
    ./setup_pc.sh

    For more details, refer to https://github.com/TexasInstruments/edgeai-tensorlab/blob/3de61dfa503c408346c3bcd029f49a25e42a8a73/edgeai-benchmark/docs/setup_instructions.md 

    Note: SDK 10.0.x is used in this FAQ.

  • Step 2 Download the dataset and train your model in MMYOLO

    cd mmyolo

    Download the dataset as instructed: https://github.com/open-mmlab/mmyolo/blob/8c4d9dc503dc8e327bec8147e8dc97124052f693/docs/en/useful_tools/download_dataset.md 

    Train your model:

    python tools/train.py configs/yolov8/yolov8_n_syncbn_fast_8xb16-500e_coco.py --model-surgery 2

  • Step 3 Test and validate your trained weights in the torch environment:

    python tools/test.py configs/yolov8/yolov8_n_syncbn_fast_8xb16-500e_coco.py work_dirs/yolov8_n_syncbn_fast_8xb16-500e_coco/yolov8_nano_lite_640x640_20231118_checkpoint.pth --model-surgery 2 --show-dir show_results
    

    Benchmark with the TI-trained state_dict

    Inference results:

  • Step 4 Convert the torch model to ONNX and test it in onnxruntime:

    cd mmyolo 
    pyenv activate mmyolo
    python projects/easydeploy/tools/export_onnx.py \
        configs/yolov8/yolov8_n_syncbn_fast_8xb16-500e_coco.py \
        work_dirs/yolov8_n_syncbn_fast_8xb16-500e_coco/yolov8_nano_lite_640x640_20231118_checkpoint.pth \
        --work-dir work_dirs/yolov8_n_syncbn_fast_8xb16-500e_coco \
        --img-size 640 640 \
        --batch 1 \
        --device cpu \
        --simplify \
        --opset 11 \
        --pre-topk 1000 \
        --keep-topk 100 \
        --iou-threshold 0.65 \
        --score-threshold 0.25 \
        --export-type YOLOv5 \
        --model-surgery 2
    

    The ONNX model and the prototxt file will be generated.

    Then validate the generated ONNX model in the benchmark. Note that the easydeploy project in mmyolo will not work!

    cd (your own path)/edgeai-benchmark
    pyenv activate benchmark
    

    Make a few modifications to the benchmark to enable evaluation of YOLOv8:
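    The export command passes four post-processing values: --score-threshold, --pre-topk, --iou-threshold and --keep-topk. To illustrate what such values typically control, here is a minimal, class-agnostic greedy NMS sketch in pure Python. It is a generic illustration, not the easydeploy or TIDL implementation; the function names and the (box, score) input layout are assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets, score_threshold=0.25, pre_topk=1000,
                      iou_threshold=0.65, keep_topk=100):
    """dets: list of (box, score). Returns the detections kept by greedy NMS."""
    # 1. Drop low-confidence candidates.
    cands = [d for d in dets if d[1] >= score_threshold]
    # 2. Keep only the pre_topk highest scores before NMS.
    cands = sorted(cands, key=lambda d: d[1], reverse=True)[:pre_topk]
    kept = []
    for box, score in cands:
        # 3. Keep a box only if it does not overlap a kept box too much.
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
        # 4. Stop once keep_topk detections have been collected.
        if len(kept) == keep_topk:
            break
    return kept
```

    The defaults mirror the values used in the export command above. Raising the IoU threshold keeps more overlapping boxes, while raising the score threshold removes weak detections before NMS runs.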

    diff --git a/edgeai-benchmark/configs/detection_additional.py b/edgeai-benchmark/configs/detection_additional.py
    index a5e946a286..b16262a93d 100644
    --- a/edgeai-benchmark/configs/detection_additional.py
    +++ b/edgeai-benchmark/configs/detection_additional.py
    @@ -156,11 +156,11 @@ def get_configs(settings, work_dir):
                 session=onnx_session_type(**sessions.get_onnx_session_cfg(settings, work_dir=work_dir, input_mean=(0.0, 0.0, 0.0), input_scale=(0.003921568627, 0.003921568627, 0.003921568627)),
                     runtime_options=settings.runtime_options_onnx_np2(
                          det_options=True, ext_options={'object_detection:meta_arch_type': 8,
    -                     'object_detection:meta_layers_names_list':f'../edgeai-modelzoo-cl/models/vision/detection/coco/edgeai-mmyolo-gplv3/yolov8_nano_lite_640x640_20231118_model.prototxt',
    +                     'object_detection:meta_layers_names_list':f'/home/ht/edgeai/mmyolo/work_dirs/yolov8_n_syncbn_fast_8xb16-500e_coco/yolov8_nano_lite_640x640_20231118_checkpoint.prototxt',
                          'advanced_options:output_feature_16bit_names_list':''
                          },
                          ),
    -                model_path=f'../edgeai-modelzoo-cl/models/vision/detection/coco/edgeai-mmyolo-gplv3/yolov8_nano_lite_640x640_20231118_model.onnx'),
    +                model_path=f'/home/ht/edgeai/mmyolo/work_dirs/yolov8_n_syncbn_fast_8xb16-500e_coco/yolov8_nano_lite_640x640_20231118_checkpoint.onnx'),
                 postprocess=postproc_transforms.get_transform_detection_yolov5_onnx(squeeze_axis=None, normalized_detections=False, resize_with_pad=True, formatter=postprocess.DetectionBoxSL2BoxLS()), #TODO: check this
                 metric=dict(label_offset_pred=datasets.coco_det_label_offset_80to90(label_offset=1)),
                 model_info=dict(metric_reference={'accuracy_ap[.5:.95]%':34.5}, model_shortlist=70, compact_name='yolov8-nano-lite-640x640-gplv3', shortlisted=False)
    diff --git a/edgeai-benchmark/settings_base.yaml b/edgeai-benchmark/settings_base.yaml
    index a9bac3aaf5..c4815231a3 100644
    --- a/edgeai-benchmark/settings_base.yaml
    +++ b/edgeai-benchmark/settings_base.yaml
    @@ -13,7 +13,7 @@ target_device : null
     tensor_bits : 8
     
     # number of frames for inference
    -num_frames : 5000 #1000 #10000 #50000
    +num_frames : 100 #1000 #10000 #50000
     
     # number of frames to be used for post training quantization / calibration
     calibration_frames : 25 #50
    @@ -32,7 +32,7 @@ calibration_iterations : 25 #50
     configs_path : './configs'
     
     # folder where models are available
    -models_path : '../edgeai-modelzoo/models'
    +models_path : '../../edgeai-modelzoo-cl/models'
     
     # create your datasets under this folder
     datasets_path : './dependencies/datasets'
    @@ -52,7 +52,7 @@ session_type_dict : {'onnx':'onnxrt', 'tflite':'tflitert', 'mxnet':'tvmdlr'}
     #   examples: ['resnet18.onnx', 'resnet50_v1.tflite'] ['classification'] ['imagenet1k'] ['torchvision'] ['coco']
     #   examples: [cl-0000, od-2020, ss-2580, cl-3090, od-5120, ss-5710, cl-6360, od-8020, od-8200, od-8220, od-8420, ss-8610, kd-7060, 3dod-7100, 6dpose-7200, ss-7618]
     #   examples: [cl-0000, od-2020, cl-6360, od-8200, od-8270, od-8410, ss-8610, ss-8630, ss-8710, ss-8720]
    -model_selection : null
    +model_selection : [od-8870]
     
     # model_shortlist can be a number, which indicates a predefined shortlist, and a fraction of the models will be selected
     # model_shortlist and model_selection are complimentary - they can be used together.
    @@ -100,7 +100,7 @@ detection_top_k : 200
     verbose : False
     
     # save detection, segmentation, human pose estimation output
    -save_output : False
    +save_output : True
     
     # it defines if we want to use udp postprocessing in human pose estimation.
     # Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased
    @@ -114,10 +114,10 @@ flip_test : False
     target_device_preset : True
     
     # enable use of additional models - the actual model files may be in a different modelzoo repo (for example edgeai-modelzoo-cl)
    -additional_models : False
    +additional_models : True
     
     # enable use of experimental models - these model files may not be available in modelzoo in some cases
    -experimental_models : False
    +experimental_models : True
     
     # dataset type to use if there are multiple variants for each dataset
     # imagenetv2c is available for quick download - so use it in the release branch
    

    Set tidl_offload to False in settings_base.yaml to run in onnxruntime.

    Refer to my log of the onnxruntime run:
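    Editing settings_base.yaml by hand each time you switch between onnxruntime and TIDLRT is error-prone. The sketch below rewrites simple top-level "key : value" lines and keeps trailing comments. It is a hypothetical helper, not part of edgeai-benchmark. The key name tidl_offload is taken from the settings dump in the logs of this FAQ, and the sample text in the usage note is a shortened stand-in for the real file.

```python
import re


def set_yaml_values(text, overrides):
    """Rewrite top-level `key : value` lines, keeping trailing comments.

    Only handles simple scalar lines; values that contain '#' are not supported.
    """
    out = []
    for line in text.splitlines():
        # key, separator, value (up to the first '#'), optional trailing comment
        m = re.match(r"^(\w+)(\s*:\s*)([^#]*?)(\s*(#.*)?)$", line)
        if m and m.group(1) in overrides:
            key, sep, tail = m.group(1), m.group(2), m.group(4)
            line = f"{key}{sep}{overrides[key]}{tail}"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "# number of frames for inference\n"
        "num_frames : 5000 #1000 #10000 #50000\n"
        "tidl_offload : True\n"
    )
    print(set_yaml_values(sample, {"num_frames": 100, "tidl_offload": False}))
```

    To apply it to the real file, read settings_base.yaml into a string, pass it through set_yaml_values, and write the result back. Keep a backup first.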

    (benchmark_10_0) ht@ht-OMEN:~/edgeai/edgeai-tensorlab/edgeai-benchmark$ ./run_benchmarks_pc.sh AM62A
    TARGET_SOC:     AM62A
    TARGET_MACHINE: pc
    DEBUG MODE:     false @ ht-OMEN:5678
    => Recommend to use the alternate script run_benchmarks_parallelbash_pc.sh
       for import and inference across models in parallel.
    TIDL_TOOLS_PATH=/home/ht/edgeai/edgeai-tensorlab/edgeai-benchmark/tools/AM62A/tidl_tools
    LD_LIBRARY_PATH=/home/ht/edgeai/edgeai-tensorlab/edgeai-benchmark/tools/AM62A/tidl_tools
    PYTHONPATH=:
    ===================================================================
    argv: ['./scripts/benchmark_modelzoo.py', 'settings_import_on_pc.yaml', '--target_device', 'AM62A', '--run_inference', 'False']
    settings: {'include_files': None, 'pipeline_type': 'accuracy', 'num_frames': 100, 'calibration_frames': 25, 'calibration_iterations': 25, 'configs_path': './configs', 'models_path': '../../edgeai-modelzoo-cl/models', 'modelartifacts_path': './work_dirs/modelartifacts/AM62A', 'modelpackage_path': './work_dirs/modelpackage/AM62A', 'datasets_path': './dependencies/datasets', 'target_device': 'AM62A', 'target_machine': 'pc', 'run_suffix': None, 'parallel_devices': None, 'parallel_processes': 1, 'tensor_bits': 8, 'runtime_options': {'advanced_options:quantization_scale_type': 4}, 'run_import': True, 'run_inference': False, 'run_missing': True, 'detection_threshold': 0.3, 'detection_top_k': 200, 'detection_nms_threshold': None, 'detection_keep_top_k': None, 'save_output': True, 'num_output_frames': 50, 'model_selection': ['od-8870'], 'model_shortlist': None, 'model_exclusion': None, 'task_selection': None, 'runtime_selection': None, 'session_type_dict': {'onnx': 'onnxrt', 'tflite': 'tflitert', 'mxnet': 'tvmdlr'}, 'dataset_type_dict': {'imagenet': 'imagenetv2c'}, 'dataset_selection': None, 'dataset_loading': True, 'config_range': None, 'enable_logging': True, 'verbose': False, 'capture_log': False, 'additional_models': True, 'experimental_models': True, 'rewrite_results': False, 'with_udp': True, 'flip_test': False, 'model_transformation_dict': None, 'report_perfsim': False, 'tidl_offload': False, 'input_optimization': None, 'run_dir_tree_depth': None, 'target_device_preset': True, 'fast_calibration_factor': 0.5, 'param_template_file': None, 'settings_file': 'settings_import_on_pc.yaml', 'basic_keys': ['include_files', 'pipeline_type', 'num_frames', 'calibration_frames', 'calibration_iterations', 'configs_path', 'models_path', 'modelartifacts_path', 'modelpackage_path', 'datasets_path', 'target_device', 'target_machine', 'run_suffix', 'parallel_devices', 'parallel_processes', 'tensor_bits', 'runtime_options', 'run_import', 'run_inference', 'run_missing', 
'detection_threshold', 'detection_top_k', 'detection_nms_threshold', 'detection_keep_top_k', 'save_output', 'num_output_frames', 'model_selection', 'model_shortlist', 'model_exclusion', 'task_selection', 'runtime_selection', 'session_type_dict', 'dataset_type_dict', 'dataset_selection', 'dataset_loading', 'config_range', 'enable_logging', 'verbose', 'capture_log', 'additional_models', 'experimental_models', 'rewrite_results', 'with_udp', 'flip_test', 'model_transformation_dict', 'report_perfsim', 'tidl_offload', 'input_optimization', 'run_dir_tree_depth', 'target_device_preset', 'fast_calibration_factor', 'param_template_file', 'settings_file'], 'dataset_cache': None}
    work_dir: ./work_dirs/modelartifacts/AM62A/8bits
    Using model configs from Python module: ./configs
    
    INFO:20250105-163154: loading dataset - category:coco variant:coco
    
    INFO:20250105-163154: dataset exists - will reuse - ./dependencies/datasets/coco
    loading annotations into memory...
    Done (t=0.50s)
    creating index...
    index created!
    loading annotations into memory...
    Done (t=0.60s)
    creating index...
    index created!
    configs to run: ['od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx']
    number of configs: 1
    
    INFO:20250105-163204: parallel_run - parallel_processes:1 parallel_devices=[0]
    TASKS                                                       |          |     0% 0/1| [< ]
    INFO:20250105-163204: starting process on parallel_device - 0   0%|          || 0/1 [00:00<?, ?it/s]
    
    INFO:20250105-163204: starting - od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx
    INFO:20250105-163204: model_path - /home/ht/edgeai/mmyolo/work_dirs/yolov8_n_syncbn_fast_8xb16-500e_coco/yolov8_nano_lite_640x640_20231118_checkpoint.onnx
    INFO:20250105-163204: model_file - /home/ht/edgeai/edgeai-tensorlab/edgeai-benchmark/work_dirs/modelartifacts/AM62A/8bits/od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx/model/yolov8_nano_lite_640x640_20231118_checkpoint.onnx
    INFO:20250105-163204: quant_file - /home/ht/edgeai/edgeai-tensorlab/edgeai-benchmark/work_dirs/modelartifacts/AM62A/8bits/od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx/model/yolov8_nano_lite_640x640_20231118_checkpoint_qparams.prototxt
    Downloading 1/1: /home/ht/edgeai/mmyolo/work_dirs/yolov8_n_syncbn_fast_8xb16-500e_coco/yolov8_nano_lite_640x640_20231118_checkpoint.onnx
    Download done for /home/ht/edgeai/mmyolo/work_dirs/yolov8_n_syncbn_fast_8xb16-500e_coco/yolov8_nano_lite_640x640_20231118_checkpoint.onnx
    Downloading 1/1: /home/ht/edgeai/mmyolo/work_dirs/yolov8_n_syncbn_fast_8xb16-500e_coco/yolov8_nano_lite_640x640_20231118_checkpoint.onnx
    Download done for /home/ht/edgeai/mmyolo/work_dirs/yolov8_n_syncbn_fast_8xb16-500e_coco/yolov8_nano_lite_640x640_20231118_checkpoint.onnx
    Converted model is valid!
    
    INFO:20250105-163204: running - od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx
    INFO:20250105-163204: pipeline_config - {'task_type': 'detection', 'dataset_category': 'coco', 'calibration_dataset': <edgeai_benchmark.datasets.coco_det.COCODetection object at 0x760624a86fb0>, 'input_dataset': <edgeai_benchmark.datasets.coco_det.COCODetection object at 0x760624a870a0>, 'preprocess': <edgeai_benchmark.preprocess.PreProcessTransforms object at 0x760624a64f10>, 'session': <edgeai_benchmark.sessions.onnxrt_session.ONNXRTSession object at 0x760624a64f70>, 'postprocess': <edgeai_benchmark.postprocess.PostProcessTransforms object at 0x760624a65240>, 'metric': {'label_offset_pred': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 13, 12: 14, 13: 15, 14: 16, 15: 17, 16: 18, 17: 19, 18: 20, 19: 21, 20: 22, 21: 23, 22: 24, 23: 25, 24: 27, 25: 28, 26: 31, 27: 32, 28: 33, 29: 34, 30: 35, 31: 36, 32: 37, 33: 38, 34: 39, 35: 40, 36: 41, 37: 42, 38: 43, 39: 44, 40: 46, 41: 47, 42: 48, 43: 49, 44: 50, 45: 51, 46: 52, 47: 53, 48: 54, 49: 55, 50: 56, 51: 57, 52: 58, 53: 59, 54: 60, 55: 61, 56: 62, 57: 63, 58: 64, 59: 65, 60: 67, 61: 70, 62: 72, 63: 73, 64: 74, 65: 75, 66: 76, 67: 77, 68: 78, 69: 79, 70: 80, 71: 81, 72: 82, 73: 84, 74: 85, 75: 86, 76: 87, 77: 88, 78: 89, 79: 90, 80: 91}}, 'model_info': {'metric_reference': {'accuracy_ap[.5:.95]%': 34.5}, 'model_shortlist': 70, 'compact_name': 'yolov8-nano-lite-640x640-gplv3', 'shortlisted': False}}
    INFO:20250105-163204: import  - od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx - this may take some time...================================ import model =============
    
    INFO:20250105-163206: import completed  - od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx - 1 sec
    
    SUCCESS:20250105-163206: benchmark results - {}
    TASKS                                                       | 100%|██████████|| 1/1 [00:01<00:00,  1.87s/it]
    
    
    -------------------------------------------------------------------
    argv: ['./scripts/benchmark_modelzoo.py', 'settings_import_on_pc.yaml', '--target_device', 'AM62A', '--run_import', 'False']
    settings: {'include_files': None, 'pipeline_type': 'accuracy', 'num_frames': 100, 'calibration_frames': 25, 'calibration_iterations': 25, 'configs_path': './configs', 'models_path': '../../edgeai-modelzoo-cl/models', 'modelartifacts_path': './work_dirs/modelartifacts/AM62A', 'modelpackage_path': './work_dirs/modelpackage/AM62A', 'datasets_path': './dependencies/datasets', 'target_device': 'AM62A', 'target_machine': 'pc', 'run_suffix': None, 'parallel_devices': None, 'parallel_processes': 1, 'tensor_bits': 8, 'runtime_options': {'advanced_options:quantization_scale_type': 4}, 'run_import': False, 'run_inference': True, 'run_missing': True, 'detection_threshold': 0.3, 'detection_top_k': 200, 'detection_nms_threshold': None, 'detection_keep_top_k': None, 'save_output': True, 'num_output_frames': 50, 'model_selection': ['od-8870'], 'model_shortlist': None, 'model_exclusion': None, 'task_selection': None, 'runtime_selection': None, 'session_type_dict': {'onnx': 'onnxrt', 'tflite': 'tflitert', 'mxnet': 'tvmdlr'}, 'dataset_type_dict': {'imagenet': 'imagenetv2c'}, 'dataset_selection': None, 'dataset_loading': True, 'config_range': None, 'enable_logging': True, 'verbose': False, 'capture_log': False, 'additional_models': True, 'experimental_models': True, 'rewrite_results': False, 'with_udp': True, 'flip_test': False, 'model_transformation_dict': None, 'report_perfsim': False, 'tidl_offload': False, 'input_optimization': None, 'run_dir_tree_depth': None, 'target_device_preset': True, 'fast_calibration_factor': 0.5, 'param_template_file': None, 'settings_file': 'settings_import_on_pc.yaml', 'basic_keys': ['include_files', 'pipeline_type', 'num_frames', 'calibration_frames', 'calibration_iterations', 'configs_path', 'models_path', 'modelartifacts_path', 'modelpackage_path', 'datasets_path', 'target_device', 'target_machine', 'run_suffix', 'parallel_devices', 'parallel_processes', 'tensor_bits', 'runtime_options', 'run_import', 'run_inference', 'run_missing', 
'detection_threshold', 'detection_top_k', 'detection_nms_threshold', 'detection_keep_top_k', 'save_output', 'num_output_frames', 'model_selection', 'model_shortlist', 'model_exclusion', 'task_selection', 'runtime_selection', 'session_type_dict', 'dataset_type_dict', 'dataset_selection', 'dataset_loading', 'config_range', 'enable_logging', 'verbose', 'capture_log', 'additional_models', 'experimental_models', 'rewrite_results', 'with_udp', 'flip_test', 'model_transformation_dict', 'report_perfsim', 'tidl_offload', 'input_optimization', 'run_dir_tree_depth', 'target_device_preset', 'fast_calibration_factor', 'param_template_file', 'settings_file'], 'dataset_cache': None}
    work_dir: ./work_dirs/modelartifacts/AM62A/8bits
    Using model configs from Python module: ./configs
    
    INFO:20250105-163208: loading dataset - category:coco variant:coco
    
    INFO:20250105-163208: dataset exists - will reuse - ./dependencies/datasets/coco
    loading annotations into memory...
    Done (t=0.51s)
    creating index...
    index created!
    loading annotations into memory...
    Done (t=0.61s)
    creating index...
    index created!
    configs to run: ['od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx']
    number of configs: 1
    
    INFO:20250105-163217: parallel_run - parallel_processes:1 parallel_devices=[0]
    TASKS                                                       |          |     0% 0/1| [< ]
    INFO:20250105-163218: starting process on parallel_device - 0   0%|          || 0/1 [00:00<?, ?it/s]
    
    INFO:20250105-163218: starting - od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx
    INFO:20250105-163218: model_path - /home/ht/edgeai/mmyolo/work_dirs/yolov8_n_syncbn_fast_8xb16-500e_coco/yolov8_nano_lite_640x640_20231118_checkpoint.onnx
    INFO:20250105-163218: model_file - /home/ht/edgeai/edgeai-tensorlab/edgeai-benchmark/work_dirs/modelartifacts/AM62A/8bits/od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx/model/yolov8_nano_lite_640x640_20231118_checkpoint.onnx
    INFO:20250105-163218: quant_file - /home/ht/edgeai/edgeai-tensorlab/edgeai-benchmark/work_dirs/modelartifacts/AM62A/8bits/od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx/model/yolov8_nano_lite_640x640_20231118_checkpoint_qparams.prototxt
    
    INFO:20250105-163218: running - od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx
    INFO:20250105-163218: pipeline_config - {'task_type': 'detection', 'dataset_category': 'coco', 'calibration_dataset': <edgeai_benchmark.datasets.coco_det.COCODetection object at 0x70cd57a76fb0>, 'input_dataset': <edgeai_benchmark.datasets.coco_det.COCODetection object at 0x70cd57a770a0>, 'preprocess': <edgeai_benchmark.preprocess.PreProcessTransforms object at 0x70cd57a54f10>, 'session': <edgeai_benchmark.sessions.onnxrt_session.ONNXRTSession object at 0x70cd57a54f70>, 'postprocess': <edgeai_benchmark.postprocess.PostProcessTransforms object at 0x70cd57a55240>, 'metric': {'label_offset_pred': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 13, 12: 14, 13: 15, 14: 16, 15: 17, 16: 18, 17: 19, 18: 20, 19: 21, 20: 22, 21: 23, 22: 24, 23: 25, 24: 27, 25: 28, 26: 31, 27: 32, 28: 33, 29: 34, 30: 35, 31: 36, 32: 37, 33: 38, 34: 39, 35: 40, 36: 41, 37: 42, 38: 43, 39: 44, 40: 46, 41: 47, 42: 48, 43: 49, 44: 50, 45: 51, 46: 52, 47: 53, 48: 54, 49: 55, 50: 56, 51: 57, 52: 58, 53: 59, 54: 60, 55: 61, 56: 62, 57: 63, 58: 64, 59: 65, 60: 67, 61: 70, 62: 72, 63: 73, 64: 74, 65: 75, 66: 76, 67: 77, 68: 78, 69: 79, 70: 80, 71: 81, 72: 82, 73: 84, 74: 85, 75: 86, 76: 87, 77: 88, 78: 89, 79: 90, 80: 91}}, 'model_info': {'metric_reference': {'accuracy_ap[.5:.95]%': 34.5}, 'model_shortlist': 70, 'compact_name': 'yolov8-nano-lite-640x640-gplv3', 'shortlisted': False}}
    INFO:20250105-163218: infer  - od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoininfer : od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-| 100%|##########|| 100/100 [00:05<00:00, 17.82it/s]
    
    INFO:20250105-163224: infer completed  - od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx - 6 secLoading and preparing results...
    DONE (t=0.00s)
    creating index...
    index created!
    Running per image evaluation...
    Evaluate annotation type *bbox*
    DONE (t=0.17s).
    Accumulating evaluation results...
    DONE (t=0.25s).
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.333
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.418
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.354
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.105
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.374
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.504
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.290
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.371
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.375
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.107
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.397
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.588
    
    
    SUCCESS:20250105-163224: benchmark results - {'infer_path': 'od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx', 'accuracy_ap[.5:.95]%': 33.273675, 'accuracy_ap50%': 41.753303, 'num_subgraphs': 0, 'infer_time_core_ms': 46.713805, 'infer_time_subgraph_ms': 0.0, 'ddr_transfer_mb': 0.0, 'perfsim_time_ms': 0.0, 'perfsim_ddr_transfer_mb': 0.0, 'perfsim_gmacs': 0.0}
    TASKS                                                       | 100%|██████████|| 1/1 [00:07<00:00,  7.00s/it]
    
    
    -------------------------------------------------------------------
    settings: {'include_files': None, 'pipeline_type': 'accuracy', 'num_frames': 100, 'calibration_frames': 25, 'calibration_iterations': 25, 'configs_path': './configs', 'models_path': '../../edgeai-modelzoo-cl/models', 'modelartifacts_path': './work_dirs/modelartifacts/', 'modelpackage_path': './work_dirs/modelpackage/', 'datasets_path': './dependencies/datasets', 'target_device': None, 'target_machine': 'pc', 'run_suffix': None, 'parallel_devices': None, 'parallel_processes': 1, 'tensor_bits': 8, 'runtime_options': None, 'run_import': True, 'run_inference': True, 'run_missing': True, 'detection_threshold': 0.3, 'detection_top_k': 200, 'detection_nms_threshold': None, 'detection_keep_top_k': None, 'save_output': True, 'num_output_frames': 50, 'model_selection': ['od-8870'], 'model_shortlist': None, 'model_exclusion': None, 'task_selection': None, 'runtime_selection': None, 'session_type_dict': {'onnx': 'onnxrt', 'tflite': 'tflitert', 'mxnet': 'tvmdlr'}, 'dataset_type_dict': {'imagenet': 'imagenetv2c'}, 'dataset_selection': None, 'dataset_loading': True, 'config_range': None, 'enable_logging': True, 'verbose': False, 'capture_log': False, 'additional_models': True, 'experimental_models': True, 'rewrite_results': False, 'with_udp': True, 'flip_test': False, 'model_transformation_dict': None, 'report_perfsim': False, 'tidl_offload': False, 'input_optimization': None, 'run_dir_tree_depth': None, 'target_device_preset': True, 'fast_calibration_factor': None, 'param_template_file': None, 'skip_pattern': '_package', 'settings_file': 'settings_import_on_pc.yaml', 'basic_keys': ['include_files', 'pipeline_type', 'num_frames', 'calibration_frames', 'calibration_iterations', 'configs_path', 'models_path', 'modelartifacts_path', 'modelpackage_path', 'datasets_path', 'target_device', 'target_machine', 'run_suffix', 'parallel_devices', 'parallel_processes', 'tensor_bits', 'runtime_options', 'run_import', 'run_inference', 'run_missing', 'detection_threshold', 'detection_top_k', 
'detection_nms_threshold', 'detection_keep_top_k', 'save_output', 'num_output_frames', 'model_selection', 'model_shortlist', 'model_exclusion', 'task_selection', 'runtime_selection', 'session_type_dict', 'dataset_type_dict', 'dataset_selection', 'dataset_loading', 'config_range', 'enable_logging', 'verbose', 'capture_log', 'additional_models', 'experimental_models', 'rewrite_results', 'with_udp', 'flip_test', 'model_transformation_dict', 'report_perfsim', 'tidl_offload', 'input_optimization', 'run_dir_tree_depth', 'target_device_preset', 'fast_calibration_factor', 'param_template_file', 'skip_pattern', 'settings_file'], 'dataset_cache': None}
    results found for 3 models
    Report generated at ./work_dirs/modelartifacts/
    -------------------------------------------------------------------
    ===================================================================
    (benchmark_10_0) ht@ht-OMEN:~/edgeai/edgeai-tensorlab/edgeai-benchmark$ 
    
    
    

    Inference results from the benchmark (onnxruntime, 100 frames):

     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.333
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.418
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.354
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.105
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.374
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.504
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.290
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.371
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.375
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.107
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.397
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.588
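    The same numbers appear in machine-readable form on the "SUCCESS: ... benchmark results - {...}" lines of the log. As a minimal sketch, the helper below extracts those dictionaries from saved log text and reports the gap to a reference AP. The function names are hypothetical; the line format and the 34.5 reference value (metric_reference in the config shown earlier) are taken from this FAQ.

```python
import ast

MARKER = "benchmark results - "


def parse_benchmark_results(log_text):
    """Return the non-empty result dicts printed on SUCCESS lines."""
    results = []
    for line in log_text.splitlines():
        if line.lstrip().startswith("SUCCESS:") and MARKER in line:
            payload = line.split(MARKER, 1)[1].strip()
            # The payload is printed as a Python dict literal.
            parsed = ast.literal_eval(payload)
            if parsed:  # the import-only pass prints an empty dict
                results.append(parsed)
    return results


def ap_gap(result, reference_ap):
    """Difference in AP points between the measured and reference AP[.5:.95]."""
    return round(result["accuracy_ap[.5:.95]%"] - reference_ap, 3)
```

    For the onnxruntime run above, ap_gap(result, 34.5) gives about -1.2 points. Since only 100 frames were evaluated here, treat this comparison as a sanity check and not as a full accuracy measurement.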

  • Step 5 Run model in TIDLRT

    Set tidl_offload to True in settings_base.yaml to run in TIDLRT mode.

    Remove the folder for this model under work_dir, then refer to my log of the TIDLRT run:
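    The folder removal can also be scripted. This is a minimal sketch, assuming the artifact folders sit directly under the work_dir printed in the log (./work_dirs/modelartifacts/AM62A/8bits) and that their names start with the model id followed by an underscore, as in "od-8870_onnxrt_...". The helper name remove_model_artifacts is hypothetical.

```python
import shutil
from pathlib import Path


def remove_model_artifacts(work_dir, model_id):
    """Delete every artifact folder under work_dir whose name starts with model_id."""
    removed = []
    for path in sorted(Path(work_dir).glob(f"{model_id}_*")):
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(path.name)
    return removed


if __name__ == "__main__":
    # work_dir as printed in the log of this FAQ; adjust it to your checkout.
    print(remove_model_artifacts("./work_dirs/modelartifacts/AM62A/8bits", "od-8870"))
```

    Run it from the edgeai-benchmark directory before starting the benchmark again. The log of the TIDLRT run follows.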

    (benchmark_10_0) ht@ht-OMEN:~/edgeai/edgeai-tensorlab/edgeai-benchmark$ ./run_benchmarks_pc.sh AM62A
    TARGET_SOC:     AM62A
    TARGET_MACHINE: pc
    DEBUG MODE:     false @ ht-OMEN:5678
    => Recommend to use the alternate script run_benchmarks_parallelbash_pc.sh
       for import and inference across models in parallel.
    TIDL_TOOLS_PATH=/home/ht/edgeai/edgeai-tensorlab/edgeai-benchmark/tools/AM62A/tidl_tools
    LD_LIBRARY_PATH=/home/ht/edgeai/edgeai-tensorlab/edgeai-benchmark/tools/AM62A/tidl_tools
    PYTHONPATH=:
    ===================================================================
    argv: ['./scripts/benchmark_modelzoo.py', 'settings_import_on_pc.yaml', '--target_device', 'AM62A', '--run_inference', 'False']
    settings: {'include_files': None, 'pipeline_type': 'accuracy', 'num_frames': 100, 'calibration_frames': 25, 'calibration_iterations': 25, 'configs_path': './configs', 'models_path': '../../edgeai-modelzoo-cl/models', 'modelartifacts_path': './work_dirs/modelartifacts/AM62A', 'modelpackage_path': './work_dirs/modelpackage/AM62A', 'datasets_path': './dependencies/datasets', 'target_device': 'AM62A', 'target_machine': 'pc', 'run_suffix': None, 'parallel_devices': None, 'parallel_processes': 1, 'tensor_bits': 8, 'runtime_options': {'advanced_options:quantization_scale_type': 4}, 'run_import': True, 'run_inference': False, 'run_missing': True, 'detection_threshold': 0.3, 'detection_top_k': 200, 'detection_nms_threshold': None, 'detection_keep_top_k': None, 'save_output': True, 'num_output_frames': 50, 'model_selection': ['od-8870'], 'model_shortlist': None, 'model_exclusion': None, 'task_selection': None, 'runtime_selection': None, 'session_type_dict': {'onnx': 'onnxrt', 'tflite': 'tflitert', 'mxnet': 'tvmdlr'}, 'dataset_type_dict': {'imagenet': 'imagenetv2c'}, 'dataset_selection': None, 'dataset_loading': True, 'config_range': None, 'enable_logging': True, 'verbose': False, 'capture_log': False, 'additional_models': True, 'experimental_models': True, 'rewrite_results': False, 'with_udp': True, 'flip_test': False, 'model_transformation_dict': None, 'report_perfsim': False, 'tidl_offload': True, 'input_optimization': None, 'run_dir_tree_depth': None, 'target_device_preset': True, 'fast_calibration_factor': 0.5, 'param_template_file': None, 'settings_file': 'settings_import_on_pc.yaml', 'basic_keys': ['include_files', 'pipeline_type', 'num_frames', 'calibration_frames', 'calibration_iterations', 'configs_path', 'models_path', 'modelartifacts_path', 'modelpackage_path', 'datasets_path', 'target_device', 'target_machine', 'run_suffix', 'parallel_devices', 'parallel_processes', 'tensor_bits', 'runtime_options', 'run_import', 'run_inference', 'run_missing', 
    'detection_threshold', 'detection_top_k', 'detection_nms_threshold', 'detection_keep_top_k', 'save_output', 'num_output_frames', 'model_selection', 'model_shortlist', 'model_exclusion', 'task_selection', 'runtime_selection', 'session_type_dict', 'dataset_type_dict', 'dataset_selection', 'dataset_loading', 'config_range', 'enable_logging', 'verbose', 'capture_log', 'additional_models', 'experimental_models', 'rewrite_results', 'with_udp', 'flip_test', 'model_transformation_dict', 'report_perfsim', 'tidl_offload', 'input_optimization', 'run_dir_tree_depth', 'target_device_preset', 'fast_calibration_factor', 'param_template_file', 'settings_file'], 'dataset_cache': None}
    work_dir: ./work_dirs/modelartifacts/AM62A/8bits
    Using model configs from Python module: ./configs
    
    INFO:20250105-163417: loading dataset - category:coco variant:coco
    
    INFO:20250105-163417: dataset exists - will reuse - ./dependencies/datasets/coco
    loading annotations into memory...
    Done (t=0.50s)
    creating index...
    index created!
    loading annotations into memory...
    Done (t=0.60s)
    creating index...
    index created!
    configs to run: ['od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx']
    number of configs: 1
    
    INFO:20250105-163427: parallel_run - parallel_processes:1 parallel_devices=[0]
    TASKS                                                       |          |     0% 0/1| [< ]
    INFO:20250105-163427: starting process on parallel_device - 0   0%|          || 0/1 [00:00<?, ?it/s]
    
    INFO:20250105-163427: starting - od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx
    INFO:20250105-163427: model_path - /home/ht/edgeai/mmyolo/work_dirs/yolov8_n_syncbn_fast_8xb16-500e_coco/yolov8_nano_lite_640x640_20231118_checkpoint.onnx
    INFO:20250105-163427: model_file - /home/ht/edgeai/edgeai-tensorlab/edgeai-benchmark/work_dirs/modelartifacts/AM62A/8bits/od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx/model/yolov8_nano_lite_640x640_20231118_checkpoint.onnx
    INFO:20250105-163427: quant_file - /home/ht/edgeai/edgeai-tensorlab/edgeai-benchmark/work_dirs/modelartifacts/AM62A/8bits/od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx/model/yolov8_nano_lite_640x640_20231118_checkpoint_qparams.prototxt
    Downloading 1/1: /home/ht/edgeai/mmyolo/work_dirs/yolov8_n_syncbn_fast_8xb16-500e_coco/yolov8_nano_lite_640x640_20231118_checkpoint.onnx
    Download done for /home/ht/edgeai/mmyolo/work_dirs/yolov8_n_syncbn_fast_8xb16-500e_coco/yolov8_nano_lite_640x640_20231118_checkpoint.onnx
    Downloading 1/1: /home/ht/edgeai/mmyolo/work_dirs/yolov8_n_syncbn_fast_8xb16-500e_coco/yolov8_nano_lite_640x640_20231118_checkpoint.onnx
    Download done for /home/ht/edgeai/mmyolo/work_dirs/yolov8_n_syncbn_fast_8xb16-500e_coco/yolov8_nano_lite_640x640_20231118_checkpoint.onnx
    Converted model is valid!
    
    INFO:20250105-163427: running - od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx
    INFO:20250105-163427: pipeline_config - {'task_type': 'detection', 'dataset_category': 'coco', 'calibration_dataset': <edgeai_benchmark.datasets.coco_det.COCODetection object at 0x7ec0f2782fb0>, 'input_dataset': <edgeai_benchmark.datasets.coco_det.COCODetection object at 0x7ec0f27830a0>, 'preprocess': <edgeai_benchmark.preprocess.PreProcessTransforms object at 0x7ec0f2760f10>, 'session': <edgeai_benchmark.sessions.onnxrt_session.ONNXRTSession object at 0x7ec0f2760f70>, 'postprocess': <edgeai_benchmark.postprocess.PostProcessTransforms object at 0x7ec0f2761240>, 'metric': {'label_offset_pred': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 13, 12: 14, 13: 15, 14: 16, 15: 17, 16: 18, 17: 19, 18: 20, 19: 21, 20: 22, 21: 23, 22: 24, 23: 25, 24: 27, 25: 28, 26: 31, 27: 32, 28: 33, 29: 34, 30: 35, 31: 36, 32: 37, 33: 38, 34: 39, 35: 40, 36: 41, 37: 42, 38: 43, 39: 44, 40: 46, 41: 47, 42: 48, 43: 49, 44: 50, 45: 51, 46: 52, 47: 53, 48: 54, 49: 55, 50: 56, 51: 57, 52: 58, 53: 59, 54: 60, 55: 61, 56: 62, 57: 63, 58: 64, 59: 65, 60: 67, 61: 70, 62: 72, 63: 73, 64: 74, 65: 75, 66: 76, 67: 77, 68: 78, 69: 79, 70: 80, 71: 81, 72: 82, 73: 84, 74: 85, 75: 86, 76: 87, 77: 88, 78: 89, 79: 90, 80: 91}}, 'model_info': {'metric_reference': {'accuracy_ap[.5:.95]%': 34.5}, 'model_shortlist': 70, 'compact_name': 'yolov8-nano-lite-640x640-gplv3', 'shortlisted': False}}
    INFO:20250105-163427: import  - od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx - this may take some time...
    ========================= [Model Compilation Started] =========================
    
    Model compilation will perform the following stages:
    1. Parsing
    2. Graph Optimization
    3. Quantization & Calibration
    4. Memory Planning
    
    ============================== [Version Summary] ==============================
    
    -------------------------------------------------------------------------------
    |          TIDL Tools Version          |              10_00_08_00             |
    -------------------------------------------------------------------------------
    |         C7x Firmware Version         |              10_00_02_00             |
    -------------------------------------------------------------------------------
    |            Runtime Version           |            1.14.0+10000005           |
    -------------------------------------------------------------------------------
    |          Model Opset Version         |                  11                  |
    -------------------------------------------------------------------------------
    
    NOTE: The runtime version here specifies ONNXRT_VERSION+TIDL_VERSION
    Ex: 1.14.0+1000XXXX -> ONNXRT 1.14.0 and a TIDL_VERSION 10.00.XX.XX
    
    ============================== [Parsing Started] ==============================
    
    yolov8 is meta arch name 
    yolov8
    Number of OD backbone nodes = 158 
    Size of odBackboneNodeIds = 158 
    
    ------------------------- Subgraph Information Summary -------------------------
    -------------------------------------------------------------------------------
    |          Core           |      No. of Nodes       |   Number of Subgraphs   |
    -------------------------------------------------------------------------------
    | C7x                     |                     281 |                       1 |
    | CPU                     |                       0 |                       x |
    -------------------------------------------------------------------------------
    ============================= [Parsing Completed] =============================
    
    TIDL Meta pipeLine (proto) file  : /home/ht/edgeai/edgeai-tensorlab/edgeai-benchmark/work_dirs/modelartifacts/AM62A/8bits/od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx/model/yolov8_nano_lite_640x640_20231118_checkpoint.prototxt  
    yolov8
    ==================== [Optimization for subgraph_0 Started] ====================
    
    [TIDL Import] [PARSER] WARNING: Requested output data convert layer is not added to the network, It is currently not optimal
    ----------------------------- Optimization Summary -----------------------------
    -------------------------------------------------------------------------------------
    |            Layer           | Nodes before optimization | Nodes after optimization |
    -------------------------------------------------------------------------------------
    | TIDL_OdOutputReformatLayer |                         0 |                        1 |
    | TIDL_BatchNormLayer        |                         0 |                        1 |
    | TIDL_ConcatLayer           |                        13 |                       13 |
    | TIDL_ReLULayer             |                        57 |                        0 |
    | TIDL_ResizeLayer           |                         2 |                        2 |
    | TIDL_SliceLayer            |                         8 |                       16 |
    | TIDL_ConvolutionLayer      |                        63 |                       63 |
    | TIDL_EltWiseLayer          |                         8 |                        6 |
    | TIDL_DetectionOutputLayer  |                         0 |                        1 |
    | TIDL_CastLayer             |                         1 |                        0 |
    | TIDL_PoolingLayer          |                         6 |                        6 |
    -------------------------------------------------------------------------------------
    
    =================== [Optimization for subgraph_0 Completed] ===================
    
    The soft limit is 10240
    The hard limit is 10240
    MEM: Init ... !!!
    MEM: Init ... Done !!!
     0.0s:  VX_ZONE_INIT:Enabled
     0.13s:  VX_ZONE_ERROR:Enabled
     0.16s:  VX_ZONE_WARNING:Enabled
     0.2197s:  VX_ZONE_INIT:[tivxInit:190] Initialization Done !!!
    ============= [Quantization & Calibration for subgraph_0 Started] =============
    
    [TIDL Import] [QUANTIZATION] WARNING: Could not open /home/ht/edgeai/edgeai-tensorlab/edgeai-benchmark/work_dirs/modelartifacts/AM62A/8bits/od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx/model/yolov8_nano_lite_640x640_20231118_checkpoint_qparams.prototxt for importing mixed precision info.
    This will be generated after model compilation.
    Parameters unavailable, running calibration
    
    -------- Running Calibration in Float Mode to Collect Tensor Statistics --------
    [=========================>                                                   ] 33 %
    INFO:20250105-163527: parallel_run - num_total_tasks:1 len(queued_tasks):0 len(process_dict):1 len(result_list):0
    [=============================================================================] 100 %
    
    ----------------- Fixed-point Calibration Iteration [1 / 12]: -----------------
    Parameters unavailable, running calibration
    [=============================================================================] 100 %
    
    ----------------- Fixed-point Calibration Iteration [2 / 12]: -----------------
    Parameters unavailable, running calibration
    [===================>                                                         ] 25 %
    INFO:20250105-163627: parallel_run - num_total_tasks:1 len(queued_tasks):0 len(process_dict):1 len(result_list):0
    [=============================================================================] 100 %
    
    ----------------- Fixed-point Calibration Iteration [3 / 12]: -----------------
    Parameters unavailable, running calibration
    [=============================================================================] 100 %
    
    ----------------- Fixed-point Calibration Iteration [4 / 12]: -----------------
    Parameters unavailable, running calibration
    [======================================================================>      ] 91 %
    INFO:20250105-163727: parallel_run - num_total_tasks:1 len(queued_tasks):0 len(process_dict):1 len(result_list):0
    [=============================================================================] 100 %
    
    ----------------- Fixed-point Calibration Iteration [5 / 12]: -----------------
    Parameters unavailable, running calibration
    [=============================================================================] 100 %
    
    ----------------- Fixed-point Calibration Iteration [6 / 12]: -----------------
    Parameters unavailable, running calibration
    [=============================================================================] 100 %
    
    ----------------- Fixed-point Calibration Iteration [7 / 12]: -----------------
    Parameters unavailable, running calibration
    [============================================>                                ] 58 %
    INFO:20250105-163827: parallel_run - num_total_tasks:1 len(queued_tasks):0 len(process_dict):1 len(result_list):0
    [=============================================================================] 100 %
    
    ----------------- Fixed-point Calibration Iteration [8 / 12]: -----------------
    Parameters unavailable, running calibration
    [=============================================================================] 100 %
    
    ----------------- Fixed-point Calibration Iteration [9 / 12]: -----------------
    Parameters unavailable, running calibration
    [=============================================================================] 100 %
    
    ----------------- Fixed-point Calibration Iteration [10 / 12]: -----------------
    Parameters unavailable, running calibration
    [===================>                                                         ] 25 %
    INFO:20250105-163927: parallel_run - num_total_tasks:1 len(queued_tasks):0 len(process_dict):1 len(result_list):0
    [=============================================================================] 100 %
    
    ----------------- Fixed-point Calibration Iteration [11 / 12]: -----------------
    Parameters unavailable, running calibration
    [=============================================================================] 100 %
    
    ----------------- Fixed-point Calibration Iteration [12 / 12]: -----------------
    Parameters unavailable, running calibration
    [================================================================>            ] 83 %
    INFO:20250105-164028: parallel_run - num_total_tasks:1 len(queued_tasks):0 len(process_dict):1 len(result_list):0
    [=============================================================================] 100 %
    
    Parameters unavailable, running calibration
    Output network quant params prototxt file path: /home/ht/edgeai/edgeai-tensorlab/edgeai-benchmark/work_dirs/modelartifacts/AM62A/8bits/od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx/artifacts/tempDir/subgraph_0_tidl_net_quant_params.prototxt
    Calibrated quant parameters stored in protoTxt format
    ==================== [Quantization & Calibration Completed] ====================
    
    ========================== [Memory Planning Started] ==========================
    
    
    ------------------------- Network Compiler Traces ------------------------------
    Successful Memory Allocation
    Successful Workload Creation
    
    ========================= [Memory Planning Completed] =========================
    
    ======================== Subgraph Compiled Successfully ========================
    
    
    
    ================================ import model =============
    MEM: Deinit ... !!!
    MEM: Alloc's: 26 alloc's of 171097576 bytes 
    MEM: Free's : 26 free's  of 171097576 bytes 
    MEM: Open's : 0 allocs  of 0 bytes 
    MEM: Deinit ... Done !!!
    
    INFO:20250105-164033: import completed  - od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx - 366 sec
    
    SUCCESS:20250105-164033: benchmark results - {}
    TASKS                                                       | 100%|██████████|| 1/1 [06:06<00:00, 366.36s/it]
    TASKS                                                       | 100%|██████████|| 1/1 [06:06<00:00, 366.26s/it]
    
    -------------------------------------------------------------------
    argv: ['./scripts/benchmark_modelzoo.py', 'settings_import_on_pc.yaml', '--target_device', 'AM62A', '--run_import', 'False']
    settings: {'include_files': None, 'pipeline_type': 'accuracy', 'num_frames': 100, 'calibration_frames': 25, 'calibration_iterations': 25, 'configs_path': './configs', 'models_path': '../../edgeai-modelzoo-cl/models', 'modelartifacts_path': './work_dirs/modelartifacts/AM62A', 'modelpackage_path': './work_dirs/modelpackage/AM62A', 'datasets_path': './dependencies/datasets', 'target_device': 'AM62A', 'target_machine': 'pc', 'run_suffix': None, 'parallel_devices': None, 'parallel_processes': 1, 'tensor_bits': 8, 'runtime_options': {'advanced_options:quantization_scale_type': 4}, 'run_import': False, 'run_inference': True, 'run_missing': True, 'detection_threshold': 0.3, 'detection_top_k': 200, 'detection_nms_threshold': None, 'detection_keep_top_k': None, 'save_output': True, 'num_output_frames': 50, 'model_selection': ['od-8870'], 'model_shortlist': None, 'model_exclusion': None, 'task_selection': None, 'runtime_selection': None, 'session_type_dict': {'onnx': 'onnxrt', 'tflite': 'tflitert', 'mxnet': 'tvmdlr'}, 'dataset_type_dict': {'imagenet': 'imagenetv2c'}, 'dataset_selection': None, 'dataset_loading': True, 'config_range': None, 'enable_logging': True, 'verbose': False, 'capture_log': False, 'additional_models': True, 'experimental_models': True, 'rewrite_results': False, 'with_udp': True, 'flip_test': False, 'model_transformation_dict': None, 'report_perfsim': False, 'tidl_offload': True, 'input_optimization': None, 'run_dir_tree_depth': None, 'target_device_preset': True, 'fast_calibration_factor': 0.5, 'param_template_file': None, 'settings_file': 'settings_import_on_pc.yaml', 'basic_keys': ['include_files', 'pipeline_type', 'num_frames', 'calibration_frames', 'calibration_iterations', 'configs_path', 'models_path', 'modelartifacts_path', 'modelpackage_path', 'datasets_path', 'target_device', 'target_machine', 'run_suffix', 'parallel_devices', 'parallel_processes', 'tensor_bits', 'runtime_options', 'run_import', 'run_inference', 'run_missing', 
    'detection_threshold', 'detection_top_k', 'detection_nms_threshold', 'detection_keep_top_k', 'save_output', 'num_output_frames', 'model_selection', 'model_shortlist', 'model_exclusion', 'task_selection', 'runtime_selection', 'session_type_dict', 'dataset_type_dict', 'dataset_selection', 'dataset_loading', 'config_range', 'enable_logging', 'verbose', 'capture_log', 'additional_models', 'experimental_models', 'rewrite_results', 'with_udp', 'flip_test', 'model_transformation_dict', 'report_perfsim', 'tidl_offload', 'input_optimization', 'run_dir_tree_depth', 'target_device_preset', 'fast_calibration_factor', 'param_template_file', 'settings_file'], 'dataset_cache': None}
    work_dir: ./work_dirs/modelartifacts/AM62A/8bits
    Using model configs from Python module: ./configs
    
    INFO:20250105-164036: loading dataset - category:coco variant:coco
    
    INFO:20250105-164036: dataset exists - will reuse - ./dependencies/datasets/coco
    loading annotations into memory...
    Done (t=0.50s)
    creating index...
    index created!
    loading annotations into memory...
    Done (t=0.60s)
    creating index...
    index created!
    configs to run: ['od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx']
    number of configs: 1
    
    INFO:20250105-164045: parallel_run - parallel_processes:1 parallel_devices=[0]
    TASKS                                                       |          |     0% 0/1| [< ]
    INFO:20250105-164045: starting process on parallel_device - 0   0%|          || 0/1 [00:00<?, ?it/s]
    
    INFO:20250105-164045: starting - od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx
    INFO:20250105-164045: model_path - /home/ht/edgeai/mmyolo/work_dirs/yolov8_n_syncbn_fast_8xb16-500e_coco/yolov8_nano_lite_640x640_20231118_checkpoint.onnx
    INFO:20250105-164045: model_file - /home/ht/edgeai/edgeai-tensorlab/edgeai-benchmark/work_dirs/modelartifacts/AM62A/8bits/od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx/model/yolov8_nano_lite_640x640_20231118_checkpoint.onnx
    INFO:20250105-164045: quant_file - /home/ht/edgeai/edgeai-tensorlab/edgeai-benchmark/work_dirs/modelartifacts/AM62A/8bits/od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx/model/yolov8_nano_lite_640x640_20231118_checkpoint_qparams.prototxt
    
    INFO:20250105-164045: running - od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx
    INFO:20250105-164045: pipeline_config - {'task_type': 'detection', 'dataset_category': 'coco', 'calibration_dataset': <edgeai_benchmark.datasets.coco_det.COCODetection object at 0x7ece4d382fb0>, 'input_dataset': <edgeai_benchmark.datasets.coco_det.COCODetection object at 0x7ece4d3830a0>, 'preprocess': <edgeai_benchmark.preprocess.PreProcessTransforms object at 0x7ece4d360f10>, 'session': <edgeai_benchmark.sessions.onnxrt_session.ONNXRTSession object at 0x7ece4d360f70>, 'postprocess': <edgeai_benchmark.postprocess.PostProcessTransforms object at 0x7ece4d361240>, 'metric': {'label_offset_pred': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 13, 12: 14, 13: 15, 14: 16, 15: 17, 16: 18, 17: 19, 18: 20, 19: 21, 20: 22, 21: 23, 22: 24, 23: 25, 24: 27, 25: 28, 26: 31, 27: 32, 28: 33, 29: 34, 30: 35, 31: 36, 32: 37, 33: 38, 34: 39, 35: 40, 36: 41, 37: 42, 38: 43, 39: 44, 40: 46, 41: 47, 42: 48, 43: 49, 44: 50, 45: 51, 46: 52, 47: 53, 48: 54, 49: 55, 50: 56, 51: 57, 52: 58, 53: 59, 54: 60, 55: 61, 56: 62, 57: 63, 58: 64, 59: 65, 60: 67, 61: 70, 62: 72, 63: 73, 64: 74, 65: 75, 66: 76, 67: 77, 68: 78, 69: 79, 70: 80, 71: 81, 72: 82, 73: 84, 74: 85, 75: 86, 76: 87, 77: 88, 78: 89, 79: 90, 80: 91}}, 'model_info': {'metric_reference': {'accuracy_ap[.5:.95]%': 34.5}, 'model_shortlist': 70, 'compact_name': 'yolov8-nano-lite-640x640-gplv3', 'shortlisted': False}}
    INFO:20250105-164045: infer  - od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx - this may take some time...
    libtidl_onnxrt_EP loaded 0x57e89bc0deb0 
    Final number of subgraphs created are : 1, - Offloaded Nodes - 281, Total Nodes - 281 
    The soft limit is 10240
    The hard limit is 10240
    MEM: Init ... !!!
    MEM: Init ... Done !!!
     0.0s:  VX_ZONE_INIT:Enabled
     0.13s:  VX_ZONE_ERROR:Enabled
     0.16s:  VX_ZONE_WARNING:Enabled
     0.1972s:  VX_ZONE_INIT:[tivxInit:190] Initialization Done !!!
    infer : od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-|  15%|#5        || 15/100 [00:57<05:25,  3.83s/it]
    INFO:20250105-164145: parallel_run - num_total_tasks:1 len(queued_tasks):0 len(process_dict):1 len(result_list):0
    infer : od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-|  30%|###       || 30/100 [01:54<04:28,  3.83s/it]
    INFO:20250105-164245: parallel_run - num_total_tasks:1 len(queued_tasks):0 len(process_dict):1 len(result_list):0
    infer : od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-|  45%|####5     || 45/100 [02:52<03:30,  3.82s/it]
    INFO:20250105-164345: parallel_run - num_total_tasks:1 len(queued_tasks):0 len(process_dict):1 len(result_list):0
    infer : od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-|  60%|######    || 60/100 [03:49<02:32,  3.82s/it]
    INFO:20250105-164445: parallel_run - num_total_tasks:1 len(queued_tasks):0 len(process_dict):1 len(result_list):0
    infer : od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-|  78%|#######8  || 78/100 [04:58<01:24,  3.83s/it]
    INFO:20250105-164545: parallel_run - num_total_tasks:1 len(queued_tasks):0 len(process_dict):1 len(result_list):0
    infer : od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-|  93%|#########3|| 93/100 [05:55<00:26,  3.84s/it]
    INFO:20250105-164645: parallel_run - num_total_tasks:1 len(queued_tasks):0 len(process_dict):1 len(result_list):0
    infer : od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-| 100%|##########|| 100/100 [06:22<00:00,  3.83s/it]
    MEM: Deinit ... !!!
    MEM: Alloc's: 26 alloc's of 50689407 bytes 
    MEM: Free's : 26 free's  of 50689407 bytes 
    MEM: Open's : 0 allocs  of 0 bytes 
    MEM: Deinit ... Done !!!
    
    INFO:20250105-164709: infer completed  - od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx - 383 sec
    Loading and preparing results...
    DONE (t=0.00s)
    creating index...
    index created!
    Running per image evaluation...
    Evaluate annotation type *bbox*
    DONE (t=0.16s).
    Accumulating evaluation results...
    DONE (t=0.25s).
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.338
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.421
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.368
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.104
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.377
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.501
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.294
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.371
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.374
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.105
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.396
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.581
    
    
    SUCCESS:20250105-164709: benchmark results - {'infer_path': 'od-8870_onnxrt_work_dirs_yolov8_n_syncbn_fast_8xb16-500e_coco_yolov8_nano_lite_640x640_20231118_checkpoint_onnx', 'accuracy_ap[.5:.95]%': 33.825025, 'accuracy_ap50%': 42.100531, 'num_subgraphs': 0, 'infer_time_core_ms': 3806.334489, 'infer_time_subgraph_ms': 0.0, 'ddr_transfer_mb': 0.0, 'perfsim_time_ms': 22.92888, 'perfsim_ddr_transfer_mb': 35.94, 'perfsim_gmacs': 4.3904}
    TASKS                                                       | 100%|██████████|| 1/1 [06:24<00:00, 384.01s/it]
    TASKS                                                       | 100%|██████████|| 1/1 [06:23<00:00, 383.91s/it]
    
    -------------------------------------------------------------------
    settings: {'include_files': None, 'pipeline_type': 'accuracy', 'num_frames': 100, 'calibration_frames': 25, 'calibration_iterations': 25, 'configs_path': './configs', 'models_path': '../../edgeai-modelzoo-cl/models', 'modelartifacts_path': './work_dirs/modelartifacts/', 'modelpackage_path': './work_dirs/modelpackage/', 'datasets_path': './dependencies/datasets', 'target_device': None, 'target_machine': 'pc', 'run_suffix': None, 'parallel_devices': None, 'parallel_processes': 1, 'tensor_bits': 8, 'runtime_options': None, 'run_import': True, 'run_inference': True, 'run_missing': True, 'detection_threshold': 0.3, 'detection_top_k': 200, 'detection_nms_threshold': None, 'detection_keep_top_k': None, 'save_output': True, 'num_output_frames': 50, 'model_selection': ['od-8870'], 'model_shortlist': None, 'model_exclusion': None, 'task_selection': None, 'runtime_selection': None, 'session_type_dict': {'onnx': 'onnxrt', 'tflite': 'tflitert', 'mxnet': 'tvmdlr'}, 'dataset_type_dict': {'imagenet': 'imagenetv2c'}, 'dataset_selection': None, 'dataset_loading': True, 'config_range': None, 'enable_logging': True, 'verbose': False, 'capture_log': False, 'additional_models': True, 'experimental_models': True, 'rewrite_results': False, 'with_udp': True, 'flip_test': False, 'model_transformation_dict': None, 'report_perfsim': False, 'tidl_offload': True, 'input_optimization': None, 'run_dir_tree_depth': None, 'target_device_preset': True, 'fast_calibration_factor': None, 'param_template_file': None, 'skip_pattern': '_package', 'settings_file': 'settings_import_on_pc.yaml', 'basic_keys': ['include_files', 'pipeline_type', 'num_frames', 'calibration_frames', 'calibration_iterations', 'configs_path', 'models_path', 'modelartifacts_path', 'modelpackage_path', 'datasets_path', 'target_device', 'target_machine', 'run_suffix', 'parallel_devices', 'parallel_processes', 'tensor_bits', 'runtime_options', 'run_import', 'run_inference', 'run_missing', 'detection_threshold', 'detection_top_k', 
    'detection_nms_threshold', 'detection_keep_top_k', 'save_output', 'num_output_frames', 'model_selection', 'model_shortlist', 'model_exclusion', 'task_selection', 'runtime_selection', 'session_type_dict', 'dataset_type_dict', 'dataset_selection', 'dataset_loading', 'config_range', 'enable_logging', 'verbose', 'capture_log', 'additional_models', 'experimental_models', 'rewrite_results', 'with_udp', 'flip_test', 'model_transformation_dict', 'report_perfsim', 'tidl_offload', 'input_optimization', 'run_dir_tree_depth', 'target_device_preset', 'fast_calibration_factor', 'param_template_file', 'skip_pattern', 'settings_file'], 'dataset_cache': None}
    results found for 3 models
    Report generated at ./work_dirs/modelartifacts/
    -------------------------------------------------------------------
    ===================================================================

    Inference results:

    benchmark:

     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.338
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.421
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.368
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.104
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.377
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.501
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.294
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.371
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.374
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.105
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.396
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.581
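
    To compare the compiled model against its reference accuracy, the final "benchmark results" line of the log can be parsed and checked against the 'metric_reference' value (34.5) printed in pipeline_config. The following is a minimal sketch, not part of edgeai-benchmark: the helper name parse_results is hypothetical, and the log line is shortened to the keys used here. Note that this run evaluated only 100 frames (num_frames: 100), so the gap is indicative rather than a full COCO validation result.

```python
import ast

# The final "SUCCESS: ... benchmark results - {...}" line from the log above,
# shortened here to the keys this sketch uses.
log_line = (
    "SUCCESS:20250105-164709: benchmark results - "
    "{'accuracy_ap[.5:.95]%': 33.825025, 'accuracy_ap50%': 42.100531, "
    "'perfsim_time_ms': 22.92888, 'perfsim_gmacs': 4.3904}"
)

# Reference value taken from 'metric_reference' in the pipeline_config log line.
REFERENCE_AP = 34.5


def parse_results(line: str) -> dict:
    """Return the dict printed after 'benchmark results - ' (hypothetical helper)."""
    _, _, payload = line.partition("benchmark results - ")
    # The payload is a Python dict literal, so literal_eval parses it safely.
    return ast.literal_eval(payload)


results = parse_results(log_line)
ap = results["accuracy_ap[.5:.95]%"]
drop = REFERENCE_AP - ap
print(f"AP[.5:.95] = {ap:.2f}% (reference {REFERENCE_AP}%, drop {drop:.2f} points)")
```

    With the numbers from this run, the quantized 8-bit model is within about 0.7 AP points of the reference.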

  • Step 6 Run the model in TIDLRT on the EVM

    ./run_benchmarks_pc.sh AM62A