
TDA4VM: graph resolve error while compiling onnx model using TIDL

Part Number: TDA4VM


Hello,

Background:
I have trained a model using the repository https://github.com/TexasInstruments/edgeai-yolov5

No modifications were made to the network; it was only trained on custom data for a few epochs.

The pretrained network at https://github.com/TexasInstruments/edgeai-yolov5/tree/master/pretrained_models/models/yolov5s6_640_ti_lite was used.

After exporting to ONNX and trying to compile on TDA4VM, I get the following error:

RuntimeExceptionTraceback (most recent call last)
<ipython-input-4-bb2e73c005ce> in <module>
     30 so = rt.SessionOptions()
     31 EP_list = ['TIDLCompilationProvider','CPUExecutionProvider']
---> 32 sess = rt.InferenceSession(onnx_model_path ,providers=EP_list, provider_options=[compile_options, {}], sess_options=so)
     33 
     34 input_details = sess.get_inputs()

/usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in __init__(self, path_or_bytes, sess_options, providers, provider_options)
    281 
    282         try:
--> 283             self._create_inference_session(providers, provider_options)
    284         except RuntimeError:
    285             if self._enable_fallback:

/usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in _create_inference_session(self, providers, provider_options)
    313 
    314         # initialize the C++ InferenceSession
--> 315         sess.initialize_session(providers, provider_options)
    316 
    317         self._sess = sess

RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: /home/a0133185/ti/GIT_C7x_MMA_TIDL/c7x-mma-tidl/ti_dl/release/build_cloud/test/onnxruntime/onnxruntime/core/providers/tidl/tidl_execution_provider.cc:170 virtual std::vector<std::unique_ptr<onnxruntime::ComputeCapability> > onnxruntime::TidlExecutionProvider::GetCapability(const onnxruntime::GraphViewer&, const std::vector<const onnxruntime::KernelRegistry*>&) const graph_build.Resolve().IsOK() was false. 

The compilation options used are as below:

import os
import onnxruntime as rt

compile_options = {
    'tidl_tools_path': os.environ['TIDL_TOOLS_PATH'],
    'artifacts_folder': output_dir,
    'tensor_bits': 16,
    'accuracy_level': 0,
    'advanced_options:calibration_frames': len(calib_images),
    'advanced_options:calibration_iterations': 3  # used if accuracy_level = 1
}
so = rt.SessionOptions()
EP_list = ['TIDLCompilationProvider', 'CPUExecutionProvider']
sess = rt.InferenceSession(onnx_model_path, providers=EP_list,
                           provider_options=[compile_options, {}], sess_options=so)

I am not sure about the error. It says the failure happened during initialization (maybe a memory issue?).
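
For reference, the graph itself can be sanity-checked outside of TIDL with the standard onnx Python package (a minimal sketch; the filename is illustrative):

import onnx

# Load the exported model and run the standard graph checker; this
# catches structural problems independently of the TIDL provider.
model = onnx.load("yolov5s6_640_ti_lite.onnx")  # illustrative filename
onnx.checker.check_model(model)
print("inputs:", [i.name for i in model.graph.input])
print("outputs:", [o.name for o in model.graph.output])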

Can you please check?
Thanks in advance.
  • Hi, let me ask you a few questions. Which PSDK version are you using on your board? 8.0? Also, which example are you running?

    thank you,

    Paula

  • Hi, while we wait for our experts to respond, please also check out our webinar on YOLOv5. It addresses model compilation configuration steps and key considerations.

    https://training.ti.com/process-efficient-object-detection-using-yolov5-and-tda4x-processors

    Regards,

    Manisha

  • Hi, I am using the Edge AI cloud, so I am not aware of the PSDK version. I re-trained the Yolov5s6-ti-lite model as explained above, then converted it to ONNX using the same repo.

    I am using the cloud's custom-model-onnx.ipynb example. My ONNX model is passed as input, and in the compile options I have also given the path of the prototxt, roughly as sketched below.
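
    The extra options look roughly like this (option names follow the edgeai-tidl-tools object-detection examples; the meta_arch_type value and prototxt path here are placeholders for my setup):

    # Object-detection options added on top of compile_options above.
    od_options = {
        'object_detection:meta_arch_type': 6,  # value used in TI's YOLO examples (assumption)
        'object_detection:meta_layers_names_list': 'yolov5s6_640_ti_lite_metaarch.prototxt',  # placeholder path
    }
    compile_options.update(od_options)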

  • Hi Aniket, have you tried our YOLOv5 model from the model zoo? If so, do you see the same issue?

    If not, you can give it a try. You can find it in the TI EdgeAI cloud in the prebuild-models folder.

    Eventually, you would probably want to use your own trained YOLOv5, but the above check could help rule things out.

    Another option, mentioned in the YOLOv5 webinar, is to use TI's edgeai-tidl-tools. You can clone it from TI's GitHub (https://github.com/TexasInstruments/edgeai-tidl-tools). Just FYI, edgeai-tidl-tools offers compilation on your Linux machine; inference can run on an EVM or on your Linux machine (host emulation), roughly as sketched below.
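
    As a rough sketch (the provider name is from TI's ONNX Runtime fork; both paths are placeholders for your own setup), inference with the compiled artifacts looks like:

    import onnxruntime as rt

    # After compilation, inference reuses the generated artifacts.
    runtime_options = {
        'tidl_tools_path': '/path/to/tidl_tools',   # placeholder
        'artifacts_folder': '/path/to/artifacts',   # placeholder
    }
    EP_list = ['TIDLExecutionProvider', 'CPUExecutionProvider']
    sess = rt.InferenceSession('model.onnx', providers=EP_list,
                               provider_options=[runtime_options, {}],
                               sess_options=rt.SessionOptions())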

    thank you,

    Paula

  • Hi Aniket, I tried to compile YOLOv5 from prebuild-models, and the kernel died. It is probably a TIDL tools version mismatch (I will double-check with some experts internally). Anyhow, you can give edgeai-tidl-tools from TI's GitHub a try, as mentioned in the webinar.

    thank you,

    Paula

  • Hi Aniket, in the TI EdgeAI Cloud I was able to compile YOLOv5 using the model from the model zoo (which is in the cloud's workspace inside the prebuild-models folder). Please see the attached custom-model notebook below. One thing to keep in mind is to use 640x640 images (the input preparation this implies is sketched below). Please try your model with this notebook and let us know if you still face the same issue.

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/custom_2D00_model_2D00_onnx.ipynb
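
    The input preparation amounts to something like this (a minimal sketch; the [0, 1] scaling and NCHW layout are assumptions based on typical YOLOv5 ONNX exports, so check them against the notebook):

    import numpy as np
    from PIL import Image

    def load_input(path):
        # Resize to the 640x640 input the model expects.
        img = Image.open(path).convert('RGB').resize((640, 640))
        # Scale to [0, 1] and rearrange HWC -> NCHW (assumptions).
        x = np.asarray(img, dtype=np.float32) / 255.0
        return np.expand_dims(x.transpose(2, 0, 1), axis=0)  # (1, 3, 640, 640)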

    thank you,

    Paula

  • Hi Paula,
    As we want to train the model on our dataset, we were trying to use the edge AI GitHub repo.

    We tried the best.pt file provided in the weights folder and converted it to ONNX using the same repo, but still could not compile it on the cloud.

    Testing only the prebuilt ONNX would not help much, as we eventually want to train on our dataset.

  • I still get the same error with the above script.

  • Aniket, a question: if you run the model using only ARM (without the TIDL ONNX Execution Providers), does it work? This could help us as a first debug step; if it still fails on ARM, then it is something else.

    You can take a look at vcls-onnx-arm.ipynb for details on how to run it on ARM only.
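
    Concretely, that just means creating the session with only the CPU provider, roughly (model path and input tensor are placeholders):

    import onnxruntime as rt

    # No TIDL providers at all: the whole graph runs on the ARM/CPU
    # execution provider, which isolates TIDL from the failure.
    sess = rt.InferenceSession('model.onnx', providers=['CPUExecutionProvider'])
    outputs = sess.run(None, {sess.get_inputs()[0].name: input_tensor})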

    thank you,

    Paula

  • Hi Paula,

    I could run my ONNX model on ARM using the script you mentioned above, vcls-onnx-arm.ipynb.

    It is very slow; it took ~1700 ms for Yolov5s6 at 640x640.

    I then checked my earlier code again; I get the error whenever I use TIDLCompilationProvider.

    I can share my ONNX model with you. Please let me know where to upload it.

    Thanks.

  • Hi Aniket, thanks for sharing your model. After checking it, we think this is related to PyTorch 1.10 creating some spurious unconnected nodes; it seems PyTorch 1.10 produces multiple subgraphs for the NMS operator. Please move to PyTorch 1.7 and re-export the model. This should fix the issue. If you want to verify the export yourself, one way to spot such nodes is sketched below.
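
    A minimal sketch using the standard onnx package (the filename is a placeholder):

    import onnx

    model = onnx.load('yolov5s6_640_ti_lite.onnx')  # placeholder filename
    consumed = {name for node in model.graph.node for name in node.input}
    graph_outputs = {o.name for o in model.graph.output}

    # A node is dangling if none of its outputs feed another node
    # or a declared graph output.
    for node in model.graph.node:
        if not any(o in consumed or o in graph_outputs for o in node.output):
            print('dangling node:', node.name or node.op_type)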

    thank you,

    Paula

  • Hi Paula, I tried the following things:
    1. Exported the model in a PyTorch 1.7 environment. If I use the --export-nms flag, then I get the following error:

    Traceback (most recent call last):
      File "export.py", line 252, in <module>
        main(opt)
      File "export.py", line 247, in main
        run(**vars(opt))
      File "export.py", line 200, in run
        y_export = nms_export(y)
      File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 726, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/yolov5/edgeai-yolov5-master/models/common.py", line 322, in forward
        return non_max_suppression_export(x[0], conf_thres=self.conf, iou_thres=self.iou, classes=self.classes)
      File "/home/yolov5/edgeai-yolov5-master/utils/general.py", line 602, in non_max_suppression_export
        conf, j = cls_conf.max(1, keepdim=True)
    RuntimeError: cannot perform reduction function max on tensor with no elements because the operation does not have an identity
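
    (For context, this error appears whenever max() is reduced over an empty tensor, which suggests cls_conf had no candidate boxes during tracing. A minimal illustration with the PyTorch version used here; the class count is arbitrary:)

    import torch

    # Zero candidate boxes, 80 classes (illustrative shape): reducing
    # this empty tensor raises the same RuntimeError seen during export.
    cls_conf = torch.empty(0, 80)
    conf, j = cls_conf.max(1, keepdim=True)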

    To get around this error, I did not use the above flag, so NMS is not included. The export worked, but when I run it on the cloud I again get a dead-kernel error. If I only use the ARM CPU, then it works as before.

    2. I also tried training the model itself in a PyTorch 1.7 environment. This was done, but I get the same errors as above.

    Please suggest.

    Thanks.

  • Hi Aniket, please send us your model trained in PyTorch 1.7 again so we can take a look.

    thank you,

    Paula

  • Hi Aniket, thanks for sharing your models. After opening yolov5s6_640_ti_lite_torch17.onnx, I see three unconnected outputs. PyTorch 1.9 seems to work OK. Please give it a try and let us know your results.

    Thank you,

    Paula

  • Hi Paula, thanks, the models you shared are working.