
TDA4VM: graph resolve error while compiling onnx model using TIDL

Part Number: TDA4VM


Hello,

Background:
I have trained a model using the repository https://github.com/TexasInstruments/edgeai-yolov5

No modifications were made to the network; it was only trained on custom data for a few epochs.

The pretrained network at https://github.com/TexasInstruments/edgeai-yolov5/tree/master/pretrained_models/models/yolov5s6_640_ti_lite was used.

After exporting to ONNX and trying to compile on TDA4VM, I get the following error:

RuntimeExceptionTraceback (most recent call last)
<ipython-input-4-bb2e73c005ce> in <module>
     30 so = rt.SessionOptions()
     31 EP_list = ['TIDLCompilationProvider','CPUExecutionProvider']
---> 32 sess = rt.InferenceSession(onnx_model_path ,providers=EP_list, provider_options=[compile_options, {}], sess_options=so)
     33 
     34 input_details = sess.get_inputs()

/usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in __init__(self, path_or_bytes, sess_options, providers, provider_options)
    281 
    282         try:
--> 283             self._create_inference_session(providers, provider_options)
    284         except RuntimeError:
    285             if self._enable_fallback:

/usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in _create_inference_session(self, providers, provider_options)
    313 
    314         # initialize the C++ InferenceSession
--> 315         sess.initialize_session(providers, provider_options)
    316 
    317         self._sess = sess

RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: /home/a0133185/ti/GIT_C7x_MMA_TIDL/c7x-mma-tidl/ti_dl/release/build_cloud/test/onnxruntime/onnxruntime/core/providers/tidl/tidl_execution_provider.cc:170 virtual std::vector<std::unique_ptr<onnxruntime::ComputeCapability> > onnxruntime::TidlExecutionProvider::GetCapability(const onnxruntime::GraphViewer&, const std::vector<const onnxruntime::KernelRegistry*>&) const graph_build.Resolve().IsOK() was false. 

The compilation options used are as below:

import os
import onnxruntime as rt

compile_options = {
    'tidl_tools_path': os.environ['TIDL_TOOLS_PATH'],
    'artifacts_folder': output_dir,
    'tensor_bits': 16,
    'accuracy_level': 0,
    'advanced_options:calibration_frames': len(calib_images),
    'advanced_options:calibration_iterations': 3  # used if accuracy_level = 1
}
so = rt.SessionOptions()
EP_list = ['TIDLCompilationProvider', 'CPUExecutionProvider']
sess = rt.InferenceSession(onnx_model_path, providers=EP_list,
                           provider_options=[compile_options, {}], sess_options=so)

I am not sure about the error. It says the failure happened during initialization (maybe a memory issue?).
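
For reference, the graph itself can be sanity-checked outside of TIDL with the standard onnx Python package (a minimal sketch; the filename is illustrative):

import onnx

# Load the exported model and run the standard graph checker; this
# catches structural problems independently of the TIDL provider.
model = onnx.load("yolov5s6_640_ti_lite.onnx")  # illustrative filename
onnx.checker.check_model(model)
print("inputs:", [i.name for i in model.graph.input])
print("outputs:", [o.name for o in model.graph.output])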

Can you please check?
Thanks in advance.
  • Hi, let me ask you a few questions. Which PSDK version are you using on your board? 8.0? Also, which example are you running?

    thank you,

    Paula

  • Hi, while we wait for our experts to respond, please also check out our webinar on YOLOv5. It addresses model compilation configuration steps and key considerations.

    https://training.ti.com/process-efficient-object-detection-using-yolov5-and-tda4x-processors

    Regards,

    Manisha

  • Hi, I am using the Edge AI cloud, so I am not aware of the PSDK version. I re-trained the Yolov5s6-ti-lite model as explained above, then converted it to ONNX using the same repo.

    I am using the cloud's custom-model-onnx.ipynb example. My ONNX model is passed as input, and in the compile options I have also given the path of the prototxt, roughly as sketched below.
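
    The extra options look roughly like this (option names follow the edgeai-tidl-tools object-detection examples; the meta_arch_type value and prototxt path here are placeholders for my setup):

    # Object-detection options added on top of compile_options above.
    od_options = {
        'object_detection:meta_arch_type': 6,  # value used in TI's YOLO examples (assumption)
        'object_detection:meta_layers_names_list': 'yolov5s6_640_ti_lite_metaarch.prototxt',  # placeholder path
    }
    compile_options.update(od_options)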

  • Hi Aniket, have you tried our YOLOv5 model from the model zoo? If so, do you see the same issue?

    If not, you can give it a try. You can find it in the TI EdgeAI cloud in the prebuild-models folder.

    Eventually, you would probably want to use your own trained YOLOv5, but the above check could help rule things out.

    Another option, mentioned in the YOLOv5 webinar, is to use TI's edgeai-tidl-tools. You can clone it from TI's GitHub (https://github.com/TexasInstruments/edgeai-tidl-tools). Just FYI, edgeai-tidl-tools offers compilation on your Linux machine; inference can run on an EVM or on your Linux machine (host emulation), roughly as sketched below.
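
    As a rough sketch (the provider name is from TI's ONNX Runtime fork; both paths are placeholders for your own setup), inference with the compiled artifacts looks like:

    import onnxruntime as rt

    # After compilation, inference reuses the generated artifacts.
    runtime_options = {
        'tidl_tools_path': '/path/to/tidl_tools',   # placeholder
        'artifacts_folder': '/path/to/artifacts',   # placeholder
    }
    EP_list = ['TIDLExecutionProvider', 'CPUExecutionProvider']
    sess = rt.InferenceSession('model.onnx', providers=EP_list,
                               provider_options=[runtime_options, {}],
                               sess_options=rt.SessionOptions())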

    thank you,

    Paula

  • Hi Aniket, I tried to compile YOLOv5 from prebuild-models, and the kernel died. It is probably a TIDL tools version mismatch (I will double-check with some experts internally). Anyhow, you can give edgeai-tidl-tools from TI's GitHub a try, as mentioned in the webinar.

    thank you,

    Paula

  • Hi Aniket, in the TI EdgeAI Cloud I was able to compile YOLOv5 using the model from the model zoo (which is in the cloud's workspace inside the prebuild-models folder). Please see the attached custom-model notebook below. One thing to keep in mind is to use 640x640 images (the input preparation this implies is sketched below). Please try your model with this notebook and let us know if you still face the same issue.

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/custom_2D00_model_2D00_onnx.ipynb
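
    The input preparation amounts to something like this (a minimal sketch; the [0, 1] scaling and NCHW layout are assumptions based on typical YOLOv5 ONNX exports, so check them against the notebook):

    import numpy as np
    from PIL import Image

    def load_input(path):
        # Resize to the 640x640 input the model expects.
        img = Image.open(path).convert('RGB').resize((640, 640))
        # Scale to [0, 1] and rearrange HWC -> NCHW (assumptions).
        x = np.asarray(img, dtype=np.float32) / 255.0
        return np.expand_dims(x.transpose(2, 0, 1), axis=0)  # (1, 3, 640, 640)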

    thank you,

    Paula

  • Hi Paula,
    As we want to train the model on our dataset, we were trying to use the edge AI GitHub repo.

    We tried the best.pt file provided in the weights folder and converted it to ONNX using the same repo, but still could not compile it on the cloud.

    Testing only the prebuilt ONNX would not help much, as we eventually want to train on our dataset.

  • I still get the same error with the above script.

  • Aniket, a question: if you run the model using only ARM (without the TIDL ONNX Execution Providers), does it work? This could help us as a first debug step; if it still fails on ARM, then it is something else.

    You can take a look at vcls-onnx-arm.ipynb for details on how to run it on ARM only.
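
    Concretely, that just means creating the session with only the CPU provider, roughly (model path and input tensor are placeholders):

    import onnxruntime as rt

    # No TIDL providers at all: the whole graph runs on the ARM/CPU
    # execution provider, which isolates TIDL from the failure.
    sess = rt.InferenceSession('model.onnx', providers=['CPUExecutionProvider'])
    outputs = sess.run(None, {sess.get_inputs()[0].name: input_tensor})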

    thank you,

    Paula

  • Hi Paula,

    I could run my ONNX model on ARM using the script you mentioned above, vcls-onnx-arm.ipynb.

    It is very slow; it took ~1700 ms for Yolov5s6 at 640x640.

    I then checked my earlier code again; I get the error whenever I use TIDLCompilationProvider.

    I can share my ONNX model with you. Please let me know where to upload it.

    Thanks.

  • Hi Aniket, thanks for sharing your model. After checking it, we think this is related to PyTorch 1.10 creating some spurious unconnected nodes; it seems PyTorch 1.10 produces multiple subgraphs for the NMS operator. Please move to PyTorch 1.7 and re-export the model. This should fix the issue. If you want to verify the export yourself, one way to spot such nodes is sketched below.
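
    A minimal sketch using the standard onnx package (the filename is a placeholder):

    import onnx

    model = onnx.load('yolov5s6_640_ti_lite.onnx')  # placeholder filename
    consumed = {name for node in model.graph.node for name in node.input}
    graph_outputs = {o.name for o in model.graph.output}

    # A node is dangling if none of its outputs feed another node
    # or a declared graph output.
    for node in model.graph.node:
        if not any(o in consumed or o in graph_outputs for o in node.output):
            print('dangling node:', node.name or node.op_type)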

    thank you,

    Paula

  • Hi Paula, I tried the following things:
    1. Exported the model in a PyTorch 1.7 environment. If I use the --export-nms flag, then I get the following error:

    Traceback (most recent call last):
      File "export.py", line 252, in <module>
        main(opt)
      File "export.py", line 247, in main
        run(**vars(opt))
      File "export.py", line 200, in run
        y_export = nms_export(y)
      File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 726, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/yolov5/edgeai-yolov5-master/models/common.py", line 322, in forward
        return non_max_suppression_export(x[0], conf_thres=self.conf, iou_thres=self.iou, classes=self.classes)
      File "/home/yolov5/edgeai-yolov5-master/utils/general.py", line 602, in non_max_suppression_export
        conf, j = cls_conf.max(1, keepdim=True)
    RuntimeError: cannot perform reduction function max on tensor with no elements because the operation does not have an identity
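
    (For context, this error appears whenever max() is reduced over an empty tensor, which suggests cls_conf had no candidate boxes during tracing. A minimal illustration with the PyTorch version used here; the class count is arbitrary:)

    import torch

    # Zero candidate boxes, 80 classes (illustrative shape): reducing
    # this empty tensor raises the same RuntimeError seen during export.
    cls_conf = torch.empty(0, 80)
    conf, j = cls_conf.max(1, keepdim=True)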

    To get around this error, I did not use the above flag, so NMS is not included. The export worked, but when I run it on the cloud I again get a dead-kernel error. If I only use the ARM CPU, then it works as before.

    2. I also tried training the model itself in a PyTorch 1.7 environment. This was done, but I get the same errors as above.

    Please suggest.

    Thanks.

  • Hi Aniket, please send us your model trained in PyTorch 1.7 again so we can take a look.

    thank you,

    Paula

  • Hi Aniket, thanks for sharing your models. After opening yolov5s6_640_ti_lite_torch17.onnx, I see three unconnected outputs. PyTorch 1.9 seems to work OK. Please give it a try and let us know your results.

    Thank you,

    Paula

  • Hi Paula, thanks, the models you shared are working.