AM69A: Not able to run model from compiled TIDL artifacts on cloud device

Part Number: AM69A

Hi all,

I used the Docker-based setup on a Linux PC to compile the yolov5l model provided by the TI model zoo. Compilation succeeds, and I am also able to run the model with the generated TIDL artifacts for the AM69A device inside the Docker-based setup on my Linux PC.

I then copied the artifacts folder to the cloud and tried to run the model with these artifacts on an AM69A device through the cloud service.

This is the main part of the code I am running:

import onnxruntime as rt

onnx_model_path = '/home/root/notebooks/custom_models/yolov5l6_640_ti_lite_47p1_65p6.onnx'
delegate_options = {}
so = rt.SessionOptions()
delegate_options['artifacts_folder'] = '/home/root/notebooks/custom-artifacts/yolov5l/'
# optional_options (additional TIDL runtime options) is defined earlier in the notebook
delegate_options.update(optional_options)
EP_list = ['TIDLExecutionProvider', 'CPUExecutionProvider']
sess = rt.InferenceSession(onnx_model_path, providers=EP_list, provider_options=[delegate_options, {}], sess_options=so)

input_details = sess.get_inputs()
output_details = sess.get_outputs()

These are the errors I get when I try to run the model in the onnxrt environment:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in __init__(self, path_or_bytes, sess_options, providers, provider_options)
    282         try:
--> 283             self._create_inference_session(providers, provider_options)
    284         except RuntimeError:

/usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in _create_inference_session(self, providers, provider_options)
    314         # initialize the C++ InferenceSession
--> 315         sess.initialize_session(providers, provider_options)
    316 

RuntimeError: std::exception

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-9-4e829141b539> in <module>
      1 EP_list = ['TIDLExecutionProvider','CPUExecutionProvider']
----> 2 sess = rt.InferenceSession(onnx_model_path ,providers=EP_list, provider_options=[delegate_options, {}], sess_options=so)
      3 
      4 input_details = sess.get_inputs()
      5 output_details = sess.get_outputs()

/usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in __init__(self, path_or_bytes, sess_options, providers, provider_options)
    284         except RuntimeError:
    285             if self._enable_fallback:
--> 286                 print("EP Error using {}".format(self._providers))
    287                 print("Falling back to {} and retrying.".format(self._fallback_providers))
    288                 self._create_inference_session(self._fallback_providers, None)

AttributeError: 'InferenceSession' object has no attribute '_providers'

The model I picked is this one: https://github.com/TexasInstruments/edgeai-yolov5/blob/master/pretrained_models/models/detection/coco/edgeai-yolov5/yolov5l6_640_ti_lite_47p1_65p6.onnx.link

Thanks

Akhilesh

  • Hello Akhilesh,

    Due to a regional holiday, half of our team is out of office this week, so please expect delayed responses.
    Apologies for the inconvenience, and thank you for your patience.

    -Josue

  • Hi Akhilesh,

    Can you share the release version you are using? Also, please share the logs observed when running with debug_level = 2 on the EVM.
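
    (For reference, a minimal sketch of where this option goes, reusing the names from your snippet above; debug_level is one of the TIDL delegate options, as also visible in your printed delegate config later in this thread:)

    # Enable verbose TIDL runtime logging in the provider options
    delegate_options['debug_level'] = 2

    EP_list = ['TIDLExecutionProvider', 'CPUExecutionProvider']
    sess = rt.InferenceSession(onnx_model_path, providers=EP_list,
                               provider_options=[delegate_options, {}], sess_options=so)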

    Regards,

    Anand

  • Hi Anand,

    I am using SDK 09_00_00_06, which is the latest one, inside the Docker-based setup on my Linux machine.

    Regarding running on the cloud, I now see the kernel die every time I try to run the model using the compiled TIDL artifacts. I am not sure why, but I have hit this issue several times before.

    Thanks

  • Hello Akhilesh,

    Can you help with the requested logs so that we can analyze the issue you are reporting?

    I now see the kernel die every time I try to run the model using the compiled TIDL artifacts. I am not sure why, but I have hit this issue several times before.

    Again, please share the debug terminal logs for this.

    Thanks,

  • Hi,

    These are the logs I got in the log file:

    Internal Error: do_recv() expected MSG_ID 5001, got 0!
    
    Stack trace:
      [bt] (0) /usr/lib/libti_inference_client.so(StackTrace[abi:cxx11](unsigned long, unsigned long)+0x1ed) [0x7fb76bfdc05d]
      [bt] (1) /usr/lib/libti_inference_client.so(send_once(unsigned int, void*, int, progressbar*, progressbar*)+0x838) [0x7fb76bfdbac8]
      [bt] (2) /usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so(+0x8cc9d) [0x7fb76c27ac9d]
      [bt] (3) /usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so(+0xc77f3) [0x7fb76c2b57f3]
      [bt] (4) /usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so(+0x74433) [0x7fb76c262433]
      [bt] (5) /usr/bin/python3(_PyCFunction_FastCallDict+0x35c) [0x566b0c]
      [bt] (6) /usr/bin/python3() [0x50a2c3]
      [bt] (7) /usr/bin/python3(_PyEval_EvalFrameDefault+0x444) [0x50bcb4]
      [bt] (8) /usr/bin/python3() [0x509459]
      [bt] (9) /usr/bin/python3() [0x50a18d]
      [bt] (10) /usr/bin/python3(_PyEval_EvalFrameDefault+0x444) [0x50bcb4]
      [bt] (11) /usr/bin/python3() [0x507a64]
      [bt] (12) /usr/bin/python3(_PyFunction_FastCallDict+0x2e2) [0x508d42]
      [bt] (13) /usr/bin/python3() [0x5946d1]
      [bt] (14) /usr/bin/python3() [0x549cff]
      [bt] (15) /usr/bin/python3() [0x5513f1]
    
    

    These are the logs printed in the Jupyter notebook:

    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    /usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in __init__(self, path_or_bytes, sess_options, providers, provider_options)
        282         try:
    --> 283             self._create_inference_session(providers, provider_options)
        284         except RuntimeError:
    
    /usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in _create_inference_session(self, providers, provider_options)
        314         # initialize the C++ InferenceSession
    --> 315         sess.initialize_session(providers, provider_options)
        316 
    
    RuntimeError: std::exception
    
    During handling of the above exception, another exception occurred:
    
    AttributeError                            Traceback (most recent call last)
    <ipython-input-13-0d5c7f313037> in <module>
    ----> 1 sess = rt.InferenceSession(onnx_model_path ,providers=EP_list, provider_options=[delegate_options, {}], sess_options=so)
          2 
          3 input_details = sess.get_inputs()
          4 output_details = sess.get_outputs()
    
    /usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in __init__(self, path_or_bytes, sess_options, providers, provider_options)
        284         except RuntimeError:
        285             if self._enable_fallback:
    --> 286                 print("EP Error using {}".format(self._providers))
        287                 print("Falling back to {} and retrying.".format(self._fallback_providers))
        288                 self._create_inference_session(self._fallback_providers, None)
    
    AttributeError: 'InferenceSession' object has no attribute '_providers'

    Thanks

  • Hi Akhilesh,

    I checked internally: the cloud currently supports SDK 8.6 and has not yet migrated to SDK 9.0, so this error is expected.

    A few workarounds for this, in order of recommendation:

    (1) If you have a board available locally, use it with SDK 9.0 instead of the cloud to run inference with the artifacts you have already generated.

    (2) Compile the model on the cloud itself; that ensures the compilation is compatible with the inference version supported on the board.

    (3) Compile the model offline using the tools (tidl_tools) obtained from https://github.com/TexasInstruments/edgeai-tidl-tools/releases/tag/08_06_00_03 and run inference the current way, by copying the artifacts to the cloud.

    As a sanity test, I would request you to first run with the CPUExecutionProvider only, without the TIDLExecutionProvider in the inference session's providers list, to ensure the environment works correctly.
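
    (A minimal CPU-only sketch, reusing the paths from your snippet above; no TIDL artifacts are needed for this:)

    import onnxruntime as rt

    onnx_model_path = '/home/root/notebooks/custom_models/yolov5l6_640_ti_lite_47p1_65p6.onnx'
    so = rt.SessionOptions()

    # CPU only: no TIDLExecutionProvider, so nothing TIDL-specific can fail here
    sess = rt.InferenceSession(onnx_model_path, providers=['CPUExecutionProvider'], sess_options=so)
    print([i.name for i in sess.get_inputs()], [o.name for o in sess.get_outputs()])

    If this runs cleanly, the ONNX Runtime environment itself is fine and the failure is isolated to the TIDL execution provider.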

    Let me know if you observe any issues in trying the above.

    Regards,

    Anand

  • Thanks Anand. I'll try it and let you know. My apologies for the delay; I was on vacation.

    Also, let me know how I can check the SDK version on the cloud, for future reference.
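
    (One quick check from a notebook cell is below; whether the reported onnxruntime version maps cleanly to an SDK release is an assumption that would need TI's release notes to confirm:)

    import onnxruntime as rt

    # The TIDL-enabled onnxruntime build version generally tracks the SDK release
    print(rt.__version__)
    print(rt.get_available_providers())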

    Regards

    Akhilesh

  • Thanks Anand. My apologies for the delay; I was on vacation.

    Here is what I did:

    I compiled the yolov5l model (downloaded from the TI model zoo) with the latest SDK version (9.xxx) inside a Docker-based setup on an Ubuntu machine. After compiling, I copied the model artifacts onto the AM69A device I have with me (also running the latest 9.xx SDK). When I try to run the model using the TIDL artifacts, I get the following error:

    python3 file.py
    
    args,  Namespace(compile=False, disable_offload=False, run_model_zoo=False, models=[])
    Available execution providers :  ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
    
    Running_Model :  yolov5l6_640_ti_lite_47p1_65p6
    
    platform.machine() :  aarch64
    model config :  {'model_path': '/opt/edgeai-gst-apps/akhilesh/models/yolov5l6_640_ti_lite_47p1_65p6.onnx', 'mean': [0, 0, 0], 'scale': [0.003921568627, 0.003921568627, 0.003921568627], 'num_images': 100, 'num_classes': 91, 'model_type': 'od', 'od_type': 'YoloV5', 'session_name': 'onnxrt', 'framework': 'onnxrt', 'meta_layers_names_list': '/opt/edgeai-gst-apps/akhilesh/models/yolov5l6_640_ti_lite_metaarch.prototxt', 'meta_arch_type': 6}
    delegate :  {'artifacts_folder': '/opt/edgeai-gst-apps/akhilesh/model-artifacts/yolov5l6_640_ti_lite_47p1_65p6/', 'platform': 'J7', 'version': '7.2', 'tensor_bits': 8, 'debug_level': 2, 'max_num_subgraphs': 16, 'deny_list': '', 'deny_list:layer_type': '', 'deny_list:layer_name': '', 'model_type': '', 'accuracy_level': 1, 'advanced_options:calibration_frames': 2, 'advanced_options:calibration_iterations': 5, 'advanced_options:output_feature_16bit_names_list': '', 'advanced_options:params_16bit_names_list': '', 'advanced_options:mixed_precision_factor': -1, 'advanced_options:quantization_scale_type': 0, 'advanced_options:high_resolution_optimization': 0, 'advanced_options:pre_batchnorm_fold': 1, 'ti_internal_nc_flag': 1601, 'advanced_options:activation_clipping': 1, 'advanced_options:weight_clipping': 1, 'advanced_options:bias_calibration': 1, 'advanced_options:add_data_convert_ops': 3, 'advanced_options:channel_wise_quantization': 0, 'advanced_options:inference_mode': 0, 'advanced_options:num_cores': 1}
    ***** executing model *****
    model path :  /opt/edgeai-gst-apps/akhilesh/models/yolov5l6_640_ti_lite_47p1_65p6.onnx
    libtidl_onnxrt_EP loaded 0x831b680
    artifacts_folder                                = /opt/edgeai-gst-apps/akhilesh/model-artifacts/yolov5l6_640_ti_lite_47p1_65p6/
    debug_level                                     = 2
    target_priority                                 = 0
    max_pre_empt_delay                              = 340282346638528859811704183484516925440.000000
    Final number of subgraphs created are : 1, - Offloaded Nodes - 459, Total Nodes - 459
    In TIDL_createStateInfer
    Compute on node : TIDLExecutionProvider_TIDL_0_0
    ************ in TIDL_subgraphRtCreate ************
     APP: Init ... !!!
    MEM: Init ... !!!
    MEM: Initialized DMA HEAP (fd=5) !!!
    MEM: Init ... Done !!!
    IPC: Init ... !!!
    IPC: Init ... Done !!!
    REMOTE_SERVICE: Init ... !!!
    REMOTE_SERVICE: Init ... Done !!!
     87559.288066 s: GTC Frequency = 200 MHz
    APP: Init ... Done !!!
     87559.288127 s:  VX_ZONE_INIT:Enabled
     87559.288139 s:  VX_ZONE_ERROR:Enabled
     87559.288146 s:  VX_ZONE_WARNING:Enabled
     87559.289080 s:  VX_ZONE_INIT:[tivxInitLocal:130] Initialization Done !!!
     87559.289952 s:  VX_ZONE_INIT:[tivxHostInitLocal:101] Initialization Done for HOST !!!
     87559.386806 s:  VX_ZONE_ERROR:[ownContextSendCmd:822] Command ack message returned failure cmd_status: -1
     87559.386837 s:  VX_ZONE_ERROR:[ownContextSendCmd:862] tivxEventWait() failed.
     87559.386849 s:  VX_ZONE_ERROR:[ownNodeKernelInit:527] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
     87559.386860 s:  VX_ZONE_ERROR:[ownNodeKernelInit:528] Please be sure the target callbacks have been registered for this core
     87559.386873 s:  VX_ZONE_ERROR:[ownNodeKernelInit:529] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
     87559.386883 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:583] kernel init for node 0, kernel com.ti.tidl:1:1 ... failed !!!
     87559.386897 s:  VX_ZONE_ERROR:[vxVerifyGraph:2055] Node kernel init failed
     87559.386906 s:  VX_ZONE_ERROR:[vxVerifyGraph:2109] Graph verify failed
    TIDL_RT_OVX: ERROR: Verifying TIDL graph ... Failed !!!
    TIDL_RT_OVX: ERROR: Verify OpenVX graph failed
    ************ TIDL_subgraphRtCreate done ************
    created session for model.
    height, width, channel, batch, floating_model:  640 640 3 1 True
    running session for the model
    input image shape :  (1, 3, 640, 640)
     *******   In TIDL_subgraphRtInvoke  ********
     87559.553195 s:  VX_ZONE_ERROR:[ownContextSendCmd:822] Command ack message returned failure cmd_status: -1
     87559.553223 s:  VX_ZONE_ERROR:[ownContextSendCmd:862] tivxEventWait() failed.
     87559.553235 s:  VX_ZONE_ERROR:[ownNodeKernelInit:527] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
     87559.553245 s:  VX_ZONE_ERROR:[ownNodeKernelInit:528] Please be sure the target callbacks have been registered for this core
     87559.553254 s:  VX_ZONE_ERROR:[ownNodeKernelInit:529] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
     87559.553265 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:583] kernel init for node 0, kernel com.ti.tidl:1:1 ... failed !!!
     87559.553277 s:  VX_ZONE_ERROR:[vxVerifyGraph:2055] Node kernel init failed
     87559.553286 s:  VX_ZONE_ERROR:[vxVerifyGraph:2109] Graph verify failed
     87559.553402 s:  VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:799] graph is not in a state required to be scheduled
     87559.553412 s:  VX_ZONE_ERROR:[vxProcessGraph:734] schedule graph failed
     87559.553420 s:  VX_ZONE_ERROR:[vxProcessGraph:739] wait graph failed
    ERROR: Running TIDL graph ... Failed !!!
    Sub Graph Stats 2363.000000 1152.000000 18446744056666288.000000
    *******  TIDL_subgraphRtInvoke done  ********
    infer_time :  0.00429534912109375
    bencharking the model
    stats :  {'ts:run_start': 24290079386725, 'ts:run_end': 24290083282245, 'ddr:read_start': 0, 'ddr:read_end': 0, 'ddr:write_start': 0, 'ddr:write_end': 0, 'ts:subgraph_detections_copy_in_start': 24290079625030, 'ts:subgraph_detections_copy_in_end': 24290081988070, 'ts:subgraph_detections_proc_start': 24290081988130, 'ts:subgraph_detections_proc_end': 24290083140430, 'ts:subgraph_detections_copy_out_start': 17179869185, 'ts:subgraph_detections_copy_out_end': 136606608}
    copy_time, sub_graphs_proc_time, totaltime :  -17040899537 1152300 3895520
    height, width, channel, batch, floating_model:  640 640 3 1 True
    running session for the model
    input image shape :  (1, 3, 640, 640)
    *******   In TIDL_subgraphRtInvoke  ********
     87559.687179 s:  VX_ZONE_ERROR:[ownContextSendCmd:822] Command ack message returned failure cmd_status: -1
     87559.687209 s:  VX_ZONE_ERROR:[ownContextSendCmd:862] tivxEventWait() failed.
     87559.687219 s:  VX_ZONE_ERROR:[ownNodeKernelInit:527] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
     87559.687229 s:  VX_ZONE_ERROR:[ownNodeKernelInit:528] Please be sure the target callbacks have been registered for this core
     87559.687238 s:  VX_ZONE_ERROR:[ownNodeKernelInit:529] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
     87559.687248 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:583] kernel init for node 0, kernel com.ti.tidl:1:1 ... failed !!!
     87559.687261 s:  VX_ZONE_ERROR:[vxVerifyGraph:2055] Node kernel init failed
     87559.687270 s:  VX_ZONE_ERROR:[vxVerifyGraph:2109] Graph verify failed
     87559.687389 s:  VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:799] graph is not in a state required to be scheduled
     87559.687399 s:  VX_ZONE_ERROR:[vxProcessGraph:734] schedule graph failed
     87559.687409 s:  VX_ZONE_ERROR:[vxProcessGraph:739] wait graph failed
    ERROR: Running TIDL graph ... Failed !!!
    Sub Graph Stats 2316.000000 1177.000000 18446744056666288.000000
    *******  TIDL_subgraphRtInvoke done  ********
    infer_time :  0.00419163703918457
    bencharking the model
    stats :  {'ts:run_start': 24290213424755, 'ts:run_end': 24290217261095, 'ddr:read_start': 0, 'ddr:read_end': 0, 'ddr:write_start': 0, 'ddr:write_end': 0, 'ts:subgraph_detections_copy_in_start': 24290213635775, 'ts:subgraph_detections_copy_in_end': 24290215952535, 'ts:subgraph_detections_proc_start': 24290215952595, 'ts:subgraph_detections_proc_end': 24290217130230, 'ts:subgraph_detections_copy_out_start': 17179869185, 'ts:subgraph_detections_copy_out_end': 136606608}
    copy_time, sub_graphs_proc_time, totaltime :  -17040945817 1177635 3836340
    total_proc_time, sub_graphs_time :   34089.577214 2.329935
    Output shape:  1
    
    Saving image to  /opt/edgeai-gst-apps/akhilesh/output_images
    out shape :  (640, 640, 3)
    Image saved!!!!!
    
    
    Completed_Model :     1, Name : yolov5l6_640_ti_lite_47p1_65p6                    , Total time :   17044.79, Offload Time :       1.16 , DDR RW MBs : 0, Output File : py_out_yolov5l6_640_ti_lite_47p1_65p6_test.jpg
    
    
    ************ in TIDL_subgraphRtDelete ************
      87559.727427 s:  VX_ZONE_INIT:[tivxHostDeInitLocal:115] De-Initialization Done for HOST !!!
     87559.731878 s:  VX_ZONE_INIT:[tivxDeInitLocal:193] De-Initialization Done !!!
    APP: Deinit ... !!!
    REMOTE_SERVICE: Deinit ... !!!
    REMOTE_SERVICE: Deinit ... Done !!!
    IPC: Deinit ... !!!
    IPC: DeInit ... Done !!!
    MEM: Deinit ... !!!
    DDR_SHARED_MEM: Alloc's: 7 alloc's of 89868620 bytes
    DDR_SHARED_MEM: Free's : 7 free's  of 89868620 bytes
    DDR_SHARED_MEM: Open's : 0 allocs  of 0 bytes
    MEM: Deinit ... Done !!!
    APP: Deinit ... Done !!!
    

    Also, when I tried to run the same compiled model inside the Docker-based setup using the TIDL artifacts, these are the logs: logs_docker_tidl.txt

    Since the logs are too large, I am attaching them as a file.

    One more thing: I was able to run the model on the AM69A board without using the TIDL artifacts.

    Also, let me know how I can check the SDK version on the cloud, for future reference.

    Regards

    Akhilesh

  • Hi Akhilesh,

    Thanks for trying this out. Can you share logs with debug_level = 2 enabled for inference? yolov5l is a large model, and I suspect we may be running out of memory during TIDL inference here. The logs should give a more detailed error message.

    Regards,

    Anand 

  • Hi Anand, those are already debug_level = 2 logs. You can see in the printed delegate options above that I set it to 2.

    Thanks

    Akhilesh

  • Hi Akhilesh,

    I am able to run the model at my end, using the 9.1 tidl_tools in Docker for compilation and the corresponding SDK for inference.

    I suspect there is some other compatibility issue at your end. Can you confirm that the models in the existing Python examples (onnxrt_ep.py) run fine on your setup?
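
    (For example, assuming the standard edgeai-tidl-tools layout; the flags below are inferred from the args printed at the top of your log above:)

    # On the x86 PC / Docker: compile the default example models
    cd edgeai-tidl-tools/examples/osrt_python/ort
    python3 onnxrt_ep.py -c

    # On the AM69A target: run inference with the generated artifacts
    python3 onnxrt_ep.py

    # Optional: run fully on CPU, without TIDL offload (disable_offload)
    python3 onnxrt_ep.py -d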

    Regards,

    Anand

  • Hi Anand, 

    Are you running this model on an AM69A device? I was also able to run it inside Docker using the TIDL artifacts, but I was not able to run it on the AM69A board using the TIDL artifacts.

    I was able to run other models, like yolox-s and yolox-m, on the AM69A board using TIDL artifacts.

    Regards,

    Akhilesh

  • Akhilesh, please try this out with the 9.1 SDK; hopefully the issue will be resolved and we can close this thread.

    Regards,

    Anand 

  • Sure. Let me try it and I will let you know.

    Regards

    Akhilesh