
TDA4VM: Model compilation/conversion is stuck for custom dataset trained yolov5 model

Part Number: TDA4VM

Tool/software:

Hi Team,

I have a TDA4VM kit.

Using the repo below, I trained a yolov5m6 model on my custom dataset. My dataset has only 1 class.

https://github.com/TexasInstruments/edgeai-yolov5 

After training, I do the ONNX conversion with the command below:

`python export.py --weights runs/exp/weights/best.pt --img-size=640 --optimize --batch 1 --simplify --opset=11 --export-nms`


This gives me two files: "best.onnx" and "best.prototxt".

The contents of the "best.prototxt" file look like this:

name: "yolo_v3"
tidl_yolo {
yolo_param {
input: "498"
anchor_width: 8.984375
anchor_width: 21.453125
anchor_width: 16.5
anchor_height: 11.5
anchor_height: 18.828125
anchor_height: 40.8125
}
yolo_param {
input: "560"
anchor_width: 43.15625
anchor_width: 39.4375
anchor_width: 86.4375
anchor_height: 32.40625
anchor_height: 70.3125
anchor_height: 64.3125
}
yolo_param {
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

I am currently working with SDK 8.5 on TDA4VM and using it to do the model compilation. I am using the following repo for compilation:

https://github.com/TexasInstruments/edgeai-benchmark/tree/r8.5

Sometimes the model conversion gets stuck after the iterations print "TIDL Process with REF_ONLY FLOW". I am facing the same problem with SDK 9 as well.
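In case it helps reproduce this, the equivalent standalone compile (outside the edgeai-benchmark scripts) looks roughly like the sketch below. The option keys and paths follow the edgeai-tidl-tools OSRT examples and are assumptions, not my exact edgeai-benchmark settings.

```python
# Rough sketch of a standalone TIDL compile via TI's onnxruntime build,
# with the debug level raised to get the per-layer trace.
# Paths and option values are placeholders/assumptions.
import onnxruntime as rt

compile_options = {
    "tidl_tools_path": "/path/to/tidl_tools",        # location of the TIDL tools
    "artifacts_folder": "./model-artifacts/yolov5",   # compiled artifacts go here
    "tensor_bits": 8,
    "debug_level": 2,                                  # produces the trace shown below
}

session = rt.InferenceSession(
    "best.onnx",
    providers=["TIDLCompilationProvider", "CPUExecutionProvider"],
    provider_options=[compile_options, {}],
)
```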

Below you can find the debug trace of this run with debug level 2:

work_dir = ./work_dirs/modelartifacts/TDA4VM/8bits
packaged_dir = ./work_dirs/modelartifacts/TDA4VM_package/8bits
loading annotations into memory...
Done (t=0.02s)
creating index...
index created!
Image Dir : ./dependencies/datasets/calib/coco/calib
Annotations File : ./dependencies/datasets/calib/coco/annotations/instances_calib.json
--------------------------Loading Datasets------------------------------
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Image Dir : ./dependencies/datasets/val/coco/valid
Annotations File : ./dependencies/datasets/val/coco/annotations/instances_valid.json
Pipeline configuration : {'yolov5m_Aug_640': {'task_type': 'detection', 'calibration_dataset': <edgeai_benchmark.datasets.coco_det.COCODetection object at 0x7ff8c5cd3c10>, 'input_dataset': <edgeai_benchmark.datasets.coco_det.COCODetection object at 0x7ff8c5cec290>, 'preprocess': <edgeai_benchmark.preprocess.PreProcessTransforms object at 0x7ff8e630f910>, 'session': <edgeai_benchmark.sessions.onnxrt_session.ONNXRTSession object at 0x7ff8e630f990>, 'postprocess': <edgeai_benchmark.postprocess.PostProcessTransforms object at 0x7ff8b50f5610>, 'metric': {0: 1}, 'model_info': {'metric_reference': {'accuracy_ap[.5:.95]%': 37.4}}}}
Inside Running accuracy
Settings : {'include_files': None, 'pipeline_type': 'accuracy', 'num_frames': 2, 'calibration_frames': 2, 'calibration_iterations': 1, 'configs_path': './configs', 'models_path': './edgeai-modelzoo/models/', 'modelartifacts_path': './work_dirs/modelartifacts/TDA4VM', 'datasets_path': './dependencies/datasets', 'target_device': 'TDA4VM', 'target_machine': 'pc', 'run_suffix': None, 'parallel_devices': [0], 'tensor_bits': 8, 'runtime_options': None, 'run_import': True, 'run_inference': True, 'run_missing': True, 'detection_threshold': 0.3, 'detection_top_k': 300, 'detection_nms_threshold': None, 'detection_keep_top_k': None, 'save_output': True, 'num_output_frames': 50, 'model_selection': None, 'model_shortlist': None, 'model_exclusion': None, 'task_selection': None, 'runtime_selection': None, 'session_type_dict': {'onnx': 'onnxrt', 'tflite': 'tflitert', 'mxnet': 'tvmdlr'}, 'dataset_type_dict': {'imagenet': 'coco'}, 'dataset_selection': None, 'dataset_loading': True, 'config_range': None, 'enable_logging': True, 'verbose': True, 'capture_log': False, 'experimental_models': True, 'rewrite_results': False, 'with_udp': False, 'flip_test': False, 'model_transformation_dict': None, 'report_perfsim': False, 'tidl_offload': True, 'input_optimization': None, 'run_dir_tree_depth': None, 'num_classes': 1, 'basic_keys': ['include_files', 'pipeline_type', 'num_frames', 'calibration_frames', 'calibration_iterations', 'configs_path', 'models_path', 'modelartifacts_path', 'datasets_path', 'target_device', 'target_machine', 'run_suffix', 'parallel_devices', 'tensor_bits', 'runtime_options', 'run_import', 'run_inference', 'run_missing', 'detection_threshold', 'detection_top_k', 'detection_nms_threshold', 'detection_keep_top_k', 'save_output', 'num_output_frames', 'model_selection', 'model_shortlist', 'model_exclusion', 'task_selection', 'runtime_selection', 'session_type_dict', 'dataset_type_dict', 'dataset_selection', 'dataset_loading', 'config_range', 'enable_logging', 'verbose', 'capture_log', 'experimental_models', 'rewrite_results', 'with_udp', 'flip_test', 'model_transformation_dict', 'report_perfsim', 'tidl_offload', 'input_optimization', 'run_dir_tree_depth', 'num_classes'], 'dataset_cache': None}
...

I have another yolov5m6 model trained using the same repo (edgeai-yolov5) from TI and a similar dataset (slightly fewer images). That one compiles successfully, but most of the models get stuck at the same point. I also tried yolov5s6 and it's still the same.


Currently I am stuck at this point. Thank you in advance for your help.

Edit: I noticed that the models trained with the ADAM optimizer don't compile, whereas the ones trained with SGD do. I looked at the weights of the convolution layers: a model trained with ADAM has weights that are exactly 0, whereas with SGD the weights are very small, on the order of 1e-9. Could this be the reason the models fail to compile?
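For reference, the zero weights can be seen directly in the exported ONNX file, roughly like this (a minimal sketch using the onnx and numpy packages, not my exact checking script; "best.onnx" is the file from the export command above):

```python
# Sketch: report, per convolution weight tensor, how many entries are exactly 0
# and the smallest non-zero magnitude.
import numpy as np
import onnx
from onnx import numpy_helper

model = onnx.load("best.onnx")

for init in model.graph.initializer:
    w = numpy_helper.to_array(init)
    if w.ndim != 4:  # convolution weights are 4-D: (out_ch, in_ch, kH, kW)
        continue
    nonzero = w[w != 0]
    min_abs = np.abs(nonzero).min() if nonzero.size else 0.0
    print(f"{init.name}: shape={w.shape}, zeros={np.mean(w == 0):.2%}, "
          f"smallest non-zero |w|={min_abs:.3e}")
```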

Thanks and regards

Keerthitheja S C

  • Hi Keerthitheja,

    I do apologize for not getting to your question earlier. Is this still a blocking issue for you?

    As a baseline, have you tried running one of the pretrained models given in the repository? Have you also seen this behavior on the newest SDK (9.2)? If possible, could you try this route and see if the issue persists?

    Best,

    Asha

  • Hi,

    Yes, I still see this issue. The issue occurs only with the models trained using the Adam optimizer; compilation works correctly when the model is trained with SGD. I looked at the pretrained models and they are also trained with SGD, so they compile correctly. I haven't tried SDK 9.2 because our inference code on TDA4VM is written against SDK 8.5.

    I also noticed another issue: if I change the output dimensions of the yolov5 model in the onnx file, the SDK does not pick up the changes. The default output dimensions are [300, 6], but I changed the output detections to have dimensions of [300, 8] by editing the onnx file as below.
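    Roughly, the edit was along these lines (a sketch using the onnx Python API, not my exact script; the file names and the output index are placeholders):

```python
# Sketch: change the last dimension of the exported detection output
# from 6 to 8 by editing the graph output's shape in the ONNX file.
import onnx

model = onnx.load("best.onnx")

det_out = model.graph.output[0]               # detection output, shape [300, 6]
dims = det_out.type.tensor_type.shape.dim
dims[1].dim_value = 8                         # [300, 6] -> [300, 8]

onnx.save(model, "best_300x8.onnx")
```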


    When I compile the above model, the compiled binaries still have [300, 6] dims, whereas you can see from the image that the output of the Concat node has dimensions [None, 8]. My question is: is it possible to make the change this way, or will the SDK only accept specific output dimensions?