AM69A: Not able to run model from compiled TIDL artifacts on cloud device

Part Number: AM69A

Hi all,

I used the Docker-based setup on a Linux PC to compile the yolov5l model provided by the TI model zoo. Compilation succeeds, and I am also able to run the model with the generated TIDL artifacts for the AM69A device inside the Docker-based setup on my Linux PC.

I then copied the artifacts folder to the cloud and tried to run the model with these artifacts on an AM69A device through the cloud service.

This is the main part of the code I am running:

import onnxruntime as rt

onnx_model_path = '/home/root/notebooks/custom_models/yolov5l6_640_ti_lite_47p1_65p6.onnx'
delegate_options = {}
so = rt.SessionOptions()
delegate_options['artifacts_folder'] = '/home/root/notebooks/custom-artifacts/yolov5l/'
# optional_options (additional TIDL runtime options) is defined earlier in the notebook
delegate_options.update(optional_options)
EP_list = ['TIDLExecutionProvider', 'CPUExecutionProvider']
sess = rt.InferenceSession(onnx_model_path, providers=EP_list, provider_options=[delegate_options, {}], sess_options=so)

input_details = sess.get_inputs()
output_details = sess.get_outputs()

These are the errors I get when I try to run the model in the onnxrt environment:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in __init__(self, path_or_bytes, sess_options, providers, provider_options)
    282         try:
--> 283             self._create_inference_session(providers, provider_options)
    284         except RuntimeError:

/usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in _create_inference_session(self, providers, provider_options)
    314         # initialize the C++ InferenceSession
--> 315         sess.initialize_session(providers, provider_options)
    316 

RuntimeError: std::exception

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-9-4e829141b539> in <module>
      1 EP_list = ['TIDLExecutionProvider','CPUExecutionProvider']
----> 2 sess = rt.InferenceSession(onnx_model_path ,providers=EP_list, provider_options=[delegate_options, {}], sess_options=so)
      3 
      4 input_details = sess.get_inputs()
      5 output_details = sess.get_outputs()

/usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in __init__(self, path_or_bytes, sess_options, providers, provider_options)
    284         except RuntimeError:
    285             if self._enable_fallback:
--> 286                 print("EP Error using {}".format(self._providers))
    287                 print("Falling back to {} and retrying.".format(self._fallback_providers))
    288                 self._create_inference_session(self._fallback_providers, None)

AttributeError: 'InferenceSession' object has no attribute '_providers'

The model I picked is this one: https://github.com/TexasInstruments/edgeai-yolov5/blob/master/pretrained_models/models/detection/coco/edgeai-yolov5/yolov5l6_640_ti_lite_47p1_65p6.onnx.link

Thanks

Akhilesh

  • Hello Akhilesh,

    Due to a regional holiday, half of our team is out of office this week, so please expect delayed responses.
    Apologies for the inconvenience, and thank you for your patience.

    -Josue

  • Hi Akhilesh,

    Can you share the release version you are using? Also, please share the logs observed when running with debug_level = 2 on the EVM.
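
    (For reference, a minimal sketch of where this option goes, reusing the names from your snippet above; debug_level is one of the TIDL delegate options, as also visible in your printed delegate config later in this thread:)

    # Enable verbose TIDL runtime logging in the provider options
    delegate_options['debug_level'] = 2

    EP_list = ['TIDLExecutionProvider', 'CPUExecutionProvider']
    sess = rt.InferenceSession(onnx_model_path, providers=EP_list,
                               provider_options=[delegate_options, {}], sess_options=so)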

    Regards,

    Anand

  • Hi Anand,

    I am using SDK 09_00_00_06, which is the latest one, inside the Docker-based setup on my Linux machine.

    Regarding running on the cloud, I now see the kernel die every time I try to run the model using the compiled TIDL artifacts. I am not sure why, but I have hit this issue several times before.

    Thanks

  • Hello Akhilesh,

    Can you help with the requested logs so that we can analyze the issue you are reporting?

    I now see the kernel die every time I try to run the model using the compiled TIDL artifacts. I am not sure why, but I have hit this issue several times before.

    Again, please share the debug terminal logs for this.

    Thanks,

  • Hi,

    These are the logs I got in the log file:

    Internal Error: do_recv() expected MSG_ID 5001, got 0!
    
    Stack trace:
      [bt] (0) /usr/lib/libti_inference_client.so(StackTrace[abi:cxx11](unsigned long, unsigned long)+0x1ed) [0x7fb76bfdc05d]
      [bt] (1) /usr/lib/libti_inference_client.so(send_once(unsigned int, void*, int, progressbar*, progressbar*)+0x838) [0x7fb76bfdbac8]
      [bt] (2) /usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so(+0x8cc9d) [0x7fb76c27ac9d]
      [bt] (3) /usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so(+0xc77f3) [0x7fb76c2b57f3]
      [bt] (4) /usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so(+0x74433) [0x7fb76c262433]
      [bt] (5) /usr/bin/python3(_PyCFunction_FastCallDict+0x35c) [0x566b0c]
      [bt] (6) /usr/bin/python3() [0x50a2c3]
      [bt] (7) /usr/bin/python3(_PyEval_EvalFrameDefault+0x444) [0x50bcb4]
      [bt] (8) /usr/bin/python3() [0x509459]
      [bt] (9) /usr/bin/python3() [0x50a18d]
      [bt] (10) /usr/bin/python3(_PyEval_EvalFrameDefault+0x444) [0x50bcb4]
      [bt] (11) /usr/bin/python3() [0x507a64]
      [bt] (12) /usr/bin/python3(_PyFunction_FastCallDict+0x2e2) [0x508d42]
      [bt] (13) /usr/bin/python3() [0x5946d1]
      [bt] (14) /usr/bin/python3() [0x549cff]
      [bt] (15) /usr/bin/python3() [0x5513f1]
    
    

    These are the logs printed in the Jupyter notebook:

    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    /usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in __init__(self, path_or_bytes, sess_options, providers, provider_options)
        282         try:
    --> 283             self._create_inference_session(providers, provider_options)
        284         except RuntimeError:
    
    /usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in _create_inference_session(self, providers, provider_options)
        314         # initialize the C++ InferenceSession
    --> 315         sess.initialize_session(providers, provider_options)
        316 
    
    RuntimeError: std::exception
    
    During handling of the above exception, another exception occurred:
    
    AttributeError                            Traceback (most recent call last)
    <ipython-input-13-0d5c7f313037> in <module>
    ----> 1 sess = rt.InferenceSession(onnx_model_path ,providers=EP_list, provider_options=[delegate_options, {}], sess_options=so)
          2 
          3 input_details = sess.get_inputs()
          4 output_details = sess.get_outputs()
    
    /usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in __init__(self, path_or_bytes, sess_options, providers, provider_options)
        284         except RuntimeError:
        285             if self._enable_fallback:
    --> 286                 print("EP Error using {}".format(self._providers))
        287                 print("Falling back to {} and retrying.".format(self._fallback_providers))
        288                 self._create_inference_session(self._fallback_providers, None)
    
    AttributeError: 'InferenceSession' object has no attribute '_providers'

    Thanks

  • Hi Akhilesh,

    I checked internally: the cloud currently supports SDK 8.6 and has not yet migrated to SDK 9.0, so this error is expected.

    A few workarounds for this, in order of recommendation:

    (1) If you have a board available locally, use it with SDK 9.0 instead of the cloud to run inference with the artifacts you have already generated.

    (2) Compile the model on the cloud itself; that ensures the compilation is compatible with the inference version supported on the board.

    (3) Compile the model offline using the tools (tidl_tools) obtained from https://github.com/TexasInstruments/edgeai-tidl-tools/releases/tag/08_06_00_03 and run inference the current way, by copying the artifacts to the cloud.

    As a sanity test, I would request you to first run with the CPUExecutionProvider only, without the TIDLExecutionProvider in the inference session's providers list, to ensure the environment works correctly.
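
    (A minimal CPU-only sketch, reusing the paths from your snippet above; no TIDL artifacts are needed for this:)

    import onnxruntime as rt

    onnx_model_path = '/home/root/notebooks/custom_models/yolov5l6_640_ti_lite_47p1_65p6.onnx'
    so = rt.SessionOptions()

    # CPU only: no TIDLExecutionProvider, so nothing TIDL-specific can fail here
    sess = rt.InferenceSession(onnx_model_path, providers=['CPUExecutionProvider'], sess_options=so)
    print([i.name for i in sess.get_inputs()], [o.name for o in sess.get_outputs()])

    If this runs cleanly, the ONNX Runtime environment itself is fine and the failure is isolated to the TIDL execution provider.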

    Let me know if you observe any issues in trying the above.

    Regards,

    Anand

  • Thanks Anand. I'll try it and let you know. My apologies for the delay; I was on vacation.

    Also, let me know how I can check the SDK version on the cloud, for future reference.
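
    (One quick check from a notebook cell is below; whether the reported onnxruntime version maps cleanly to an SDK release is an assumption that would need TI's release notes to confirm:)

    import onnxruntime as rt

    # The TIDL-enabled onnxruntime build version generally tracks the SDK release
    print(rt.__version__)
    print(rt.get_available_providers())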

    Regards

    Akhilesh

  • Thanks Anand. My apologies for the delay; I was on vacation.

    Here is what I did:

    I compiled the yolov5l model (downloaded from the TI model zoo) with the latest SDK version (9.xxx) inside a Docker-based setup on an Ubuntu machine. After compiling, I copied the model artifacts onto the AM69A device I have with me (also running the latest 9.xx SDK). When I try to run the model using the TIDL artifacts, I get the following error:

    python3 file.py
    
    args,  Namespace(compile=False, disable_offload=False, run_model_zoo=False, models=[])
    Available execution providers :  ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
    
    Running_Model :  yolov5l6_640_ti_lite_47p1_65p6
    
    platform.machine() :  aarch64
    model config :  {'model_path': '/opt/edgeai-gst-apps/akhilesh/models/yolov5l6_640_ti_lite_47p1_65p6.onnx', 'mean': [0, 0, 0], 'scale': [0.003921568627, 0.003921568627, 0.003921568627], 'num_images': 100, 'num_classes': 91, 'model_type': 'od', 'od_type': 'YoloV5', 'session_name': 'onnxrt', 'framework': 'onnxrt', 'meta_layers_names_list': '/opt/edgeai-gst-apps/akhilesh/models/yolov5l6_640_ti_lite_metaarch.prototxt', 'meta_arch_type': 6}
    delegate :  {'artifacts_folder': '/opt/edgeai-gst-apps/akhilesh/model-artifacts/yolov5l6_640_ti_lite_47p1_65p6/', 'platform': 'J7', 'version': '7.2', 'tensor_bits': 8, 'debug_level': 2, 'max_num_subgraphs': 16, 'deny_list': '', 'deny_list:layer_type': '', 'deny_list:layer_name': '', 'model_type': '', 'accuracy_level': 1, 'advanced_options:calibration_frames': 2, 'advanced_options:calibration_iterations': 5, 'advanced_options:output_feature_16bit_names_list': '', 'advanced_options:params_16bit_names_list': '', 'advanced_options:mixed_precision_factor': -1, 'advanced_options:quantization_scale_type': 0, 'advanced_options:high_resolution_optimization': 0, 'advanced_options:pre_batchnorm_fold': 1, 'ti_internal_nc_flag': 1601, 'advanced_options:activation_clipping': 1, 'advanced_options:weight_clipping': 1, 'advanced_options:bias_calibration': 1, 'advanced_options:add_data_convert_ops': 3, 'advanced_options:channel_wise_quantization': 0, 'advanced_options:inference_mode': 0, 'advanced_options:num_cores': 1}
    ***** executing model *****
    model path :  /opt/edgeai-gst-apps/akhilesh/models/yolov5l6_640_ti_lite_47p1_65p6.onnx
    libtidl_onnxrt_EP loaded 0x831b680
    artifacts_folder                                = /opt/edgeai-gst-apps/akhilesh/model-artifacts/yolov5l6_640_ti_lite_47p1_65p6/
    debug_level                                     = 2
    target_priority                                 = 0
    max_pre_empt_delay                              = 340282346638528859811704183484516925440.000000
    Final number of subgraphs created are : 1, - Offloaded Nodes - 459, Total Nodes - 459
    In TIDL_createStateInfer
    Compute on node : TIDLExecutionProvider_TIDL_0_0
    ************ in TIDL_subgraphRtCreate ************
     APP: Init ... !!!
    MEM: Init ... !!!
    MEM: Initialized DMA HEAP (fd=5) !!!
    MEM: Init ... Done !!!
    IPC: Init ... !!!
    IPC: Init ... Done !!!
    REMOTE_SERVICE: Init ... !!!
    REMOTE_SERVICE: Init ... Done !!!
     87559.288066 s: GTC Frequency = 200 MHz
    APP: Init ... Done !!!
     87559.288127 s:  VX_ZONE_INIT:Enabled
     87559.288139 s:  VX_ZONE_ERROR:Enabled
     87559.288146 s:  VX_ZONE_WARNING:Enabled
     87559.289080 s:  VX_ZONE_INIT:[tivxInitLocal:130] Initialization Done !!!
     87559.289952 s:  VX_ZONE_INIT:[tivxHostInitLocal:101] Initialization Done for HOST !!!
     87559.386806 s:  VX_ZONE_ERROR:[ownContextSendCmd:822] Command ack message returned failure cmd_status: -1
     87559.386837 s:  VX_ZONE_ERROR:[ownContextSendCmd:862] tivxEventWait() failed.
     87559.386849 s:  VX_ZONE_ERROR:[ownNodeKernelInit:527] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
     87559.386860 s:  VX_ZONE_ERROR:[ownNodeKernelInit:528] Please be sure the target callbacks have been registered for this core
     87559.386873 s:  VX_ZONE_ERROR:[ownNodeKernelInit:529] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
     87559.386883 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:583] kernel init for node 0, kernel com.ti.tidl:1:1 ... failed !!!
     87559.386897 s:  VX_ZONE_ERROR:[vxVerifyGraph:2055] Node kernel init failed
     87559.386906 s:  VX_ZONE_ERROR:[vxVerifyGraph:2109] Graph verify failed
    TIDL_RT_OVX: ERROR: Verifying TIDL graph ... Failed !!!
    TIDL_RT_OVX: ERROR: Verify OpenVX graph failed
    ************ TIDL_subgraphRtCreate done ************
    created session for model.
    height, width, channel, batch, floating_model:  640 640 3 1 True
    running session for the model
    input image shape :  (1, 3, 640, 640)
     *******   In TIDL_subgraphRtInvoke  ********
     87559.553195 s:  VX_ZONE_ERROR:[ownContextSendCmd:822] Command ack message returned failure cmd_status: -1
     87559.553223 s:  VX_ZONE_ERROR:[ownContextSendCmd:862] tivxEventWait() failed.
     87559.553235 s:  VX_ZONE_ERROR:[ownNodeKernelInit:527] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
     87559.553245 s:  VX_ZONE_ERROR:[ownNodeKernelInit:528] Please be sure the target callbacks have been registered for this core
     87559.553254 s:  VX_ZONE_ERROR:[ownNodeKernelInit:529] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
     87559.553265 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:583] kernel init for node 0, kernel com.ti.tidl:1:1 ... failed !!!
     87559.553277 s:  VX_ZONE_ERROR:[vxVerifyGraph:2055] Node kernel init failed
     87559.553286 s:  VX_ZONE_ERROR:[vxVerifyGraph:2109] Graph verify failed
     87559.553402 s:  VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:799] graph is not in a state required to be scheduled
     87559.553412 s:  VX_ZONE_ERROR:[vxProcessGraph:734] schedule graph failed
     87559.553420 s:  VX_ZONE_ERROR:[vxProcessGraph:739] wait graph failed
    ERROR: Running TIDL graph ... Failed !!!
    Sub Graph Stats 2363.000000 1152.000000 18446744056666288.000000
    *******  TIDL_subgraphRtInvoke done  ********
    infer_time :  0.00429534912109375
    bencharking the model
    stats :  {'ts:run_start': 24290079386725, 'ts:run_end': 24290083282245, 'ddr:read_start': 0, 'ddr:read_end': 0, 'ddr:write_start': 0, 'ddr:write_end': 0, 'ts:subgraph_detections_copy_in_start': 24290079625030, 'ts:subgraph_detections_copy_in_end': 24290081988070, 'ts:subgraph_detections_proc_start': 24290081988130, 'ts:subgraph_detections_proc_end': 24290083140430, 'ts:subgraph_detections_copy_out_start': 17179869185, 'ts:subgraph_detections_copy_out_end': 136606608}
    copy_time, sub_graphs_proc_time, totaltime :  -17040899537 1152300 3895520
    height, width, channel, batch, floating_model:  640 640 3 1 True
    running session for the model
    input image shape :  (1, 3, 640, 640)
    *******   In TIDL_subgraphRtInvoke  ********
     87559.687179 s:  VX_ZONE_ERROR:[ownContextSendCmd:822] Command ack message returned failure cmd_status: -1
     87559.687209 s:  VX_ZONE_ERROR:[ownContextSendCmd:862] tivxEventWait() failed.
     87559.687219 s:  VX_ZONE_ERROR:[ownNodeKernelInit:527] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
     87559.687229 s:  VX_ZONE_ERROR:[ownNodeKernelInit:528] Please be sure the target callbacks have been registered for this core
     87559.687238 s:  VX_ZONE_ERROR:[ownNodeKernelInit:529] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
     87559.687248 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:583] kernel init for node 0, kernel com.ti.tidl:1:1 ... failed !!!
     87559.687261 s:  VX_ZONE_ERROR:[vxVerifyGraph:2055] Node kernel init failed
     87559.687270 s:  VX_ZONE_ERROR:[vxVerifyGraph:2109] Graph verify failed
     87559.687389 s:  VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:799] graph is not in a state required to be scheduled
     87559.687399 s:  VX_ZONE_ERROR:[vxProcessGraph:734] schedule graph failed
     87559.687409 s:  VX_ZONE_ERROR:[vxProcessGraph:739] wait graph failed
    ERROR: Running TIDL graph ... Failed !!!
    Sub Graph Stats 2316.000000 1177.000000 18446744056666288.000000
    *******  TIDL_subgraphRtInvoke done  ********
    infer_time :  0.00419163703918457
    bencharking the model
    stats :  {'ts:run_start': 24290213424755, 'ts:run_end': 24290217261095, 'ddr:read_start': 0, 'ddr:read_end': 0, 'ddr:write_start': 0, 'ddr:write_end': 0, 'ts:subgraph_detections_copy_in_start': 24290213635775, 'ts:subgraph_detections_copy_in_end': 24290215952535, 'ts:subgraph_detections_proc_start': 24290215952595, 'ts:subgraph_detections_proc_end': 24290217130230, 'ts:subgraph_detections_copy_out_start': 17179869185, 'ts:subgraph_detections_copy_out_end': 136606608}
    copy_time, sub_graphs_proc_time, totaltime :  -17040945817 1177635 3836340
    total_proc_time, sub_graphs_time :   34089.577214 2.329935
    Output shape:  1
    
    Saving image to  /opt/edgeai-gst-apps/akhilesh/output_images
    out shape :  (640, 640, 3)
    Image saved!!!!!
    
    
    Completed_Model :     1, Name : yolov5l6_640_ti_lite_47p1_65p6                    , Total time :   17044.79, Offload Time :       1.16 , DDR RW MBs : 0, Output File : py_out_yolov5l6_640_ti_lite_47p1_65p6_test.jpg
    
    
    ************ in TIDL_subgraphRtDelete ************
      87559.727427 s:  VX_ZONE_INIT:[tivxHostDeInitLocal:115] De-Initialization Done for HOST !!!
     87559.731878 s:  VX_ZONE_INIT:[tivxDeInitLocal:193] De-Initialization Done !!!
    APP: Deinit ... !!!
    REMOTE_SERVICE: Deinit ... !!!
    REMOTE_SERVICE: Deinit ... Done !!!
    IPC: Deinit ... !!!
    IPC: DeInit ... Done !!!
    MEM: Deinit ... !!!
    DDR_SHARED_MEM: Alloc's: 7 alloc's of 89868620 bytes
    DDR_SHARED_MEM: Free's : 7 free's  of 89868620 bytes
    DDR_SHARED_MEM: Open's : 0 allocs  of 0 bytes
    MEM: Deinit ... Done !!!
    APP: Deinit ... Done !!!
    

    Also, when I tried to run the same compiled model inside the Docker-based setup using the TIDL artifacts, these are the logs: logs_docker_tidl.txt

    Since the logs are too large, I am attaching them as a file.

    One more thing: I was able to run the model on the AM69A board without using the TIDL artifacts.

    Also, let me know how I can check the SDK version on the cloud, for future reference.

    Regards

    Akhilesh

  • Hi Akhilesh,

    Thanks for trying this out. Can you share logs with debug_level = 2 enabled for inference? yolov5l is a large model, and I suspect we may be running out of memory during TIDL inference here. The logs should give a more detailed error message.

    Regards,

    Anand 

  • Hi Anand, those are already debug_level = 2 logs. You can see in the printed delegate options above that I set it to 2.

    Thanks

    Akhilesh

  • Hi Akhilesh,

    I am able to run the model at my end, using the 9.1 tidl_tools in Docker for compilation and the corresponding SDK for inference.

    I suspect there is some other compatibility issue at your end. Can you confirm that the models in the existing Python examples (onnxrt_ep.py) run fine on your setup?
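
    (For example, assuming the standard edgeai-tidl-tools layout; the flags below are inferred from the args printed at the top of your log above:)

    # On the x86 PC / Docker: compile the default example models
    cd edgeai-tidl-tools/examples/osrt_python/ort
    python3 onnxrt_ep.py -c

    # On the AM69A target: run inference with the generated artifacts
    python3 onnxrt_ep.py

    # Optional: run fully on CPU, without TIDL offload (disable_offload)
    python3 onnxrt_ep.py -d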

    Regards,

    Anand

  • Hi Anand, 

    Are you running this model on an AM69A device? I was also able to run it inside Docker using the TIDL artifacts, but I was not able to run it on the AM69A board using the TIDL artifacts.

    I was able to run other models, like yolox-s and yolox-m, on the AM69A board using TIDL artifacts.

    Regards,

    Akhilesh

  • Akhilesh, please try this out with the 9.1 SDK; hopefully the issue will be resolved and we can close this thread.

    Regards,

    Anand 

  • Sure. Let me try it and I will let you know.

    Regards

    Akhilesh