
TDA4VM: Custom model creation problem (edge-ai tidl-tools)

Part Number: TDA4VM

Hello,
I have a TDA4VM Jacinto J7 EVM kit. I trained a custom model following the github.com/.../edgeai-yolov5 repo. I started the custom training with

python3 train.py --data data.yaml --cfg yolov5l6.yaml --weights 'yolov5l6.pt' --batch-size 40

then ran the ONNX conversion with

python3 export.py --weights run/exp5/weights/best.pt --img 640 --batch 1 --simplify --export-nms --opset 11

As a result I got the .prototxt and .onnx files, which I copied into the "models" folder created by the github.com/.../edgeai-tidl-tools repository, and then made the changes below.

I shared the terminal log file.

tidl tools repo installation

sefau18@ubuntu:~$ git clone https://github.com/TexasInstruments/edgeai-tidl-tools.git
Cloning into 'edgeai-tidl-tools'...
remote: Enumerating objects: 2166, done.
remote: Counting objects: 100% (460/460), done.
remote: Compressing objects: 100% (138/138), done.
remote: Total 2166 (delta 349), reused 380 (delta 306), pack-reused 1706
Receiving objects: 100% (2166/2166), 10.48 MiB | 3.40 MiB/s, done.
Resolving deltas: 100% (1335/1335), done.
sefau18@ubuntu:~$ export DEVICE=j7
sefau18@ubuntu:~$ cd edgeai-tidl-tools
sefau18@ubuntu:~/edgeai-tidl-tools$ pip3 install -r requirements_pc.txt
Defaulting to user installation because normal site-packages is not writeable
Collecting git+https://github.com/kumardesappan/caffe2onnx (from -r requirements_pc.txt (line 12))
Cloning https://github.com/kumardesappan/caffe2onnx to /tmp/pip-req-build-96mf9b4d
Running command git clone --filter=blob:none -q https://github.com/kumardesappan/caffe2onnx /tmp/pip-req-build-96mf9b4d
Resolved https://github.com/kumardesappan/caffe2onnx to commit b7e73feed3bbc5ddbdf25b87af93a2bae596055d
Preparing metadata (setup.py) ... done
Collecting dlr==1.10.0

../edgeai-tidl-tools/examples/osrt_python/common_utils.py add

'best' : {
    'model_path' : os.path.join('/home/sefau18/edgeai-tidl-tools/models/public/best.onnx'),
    'mean': [0, 0, 0],
    'std' : [0.003921568627, 0.003921568627, 0.003921568627],
    'num_images' : numImages,
    'num_classes': 36,
    'model_type': 'od',
    'od_type' : 'YoloV5',
    'framework' : '',
    'meta_layers_names_list' : os.path.join('/home/sefau18/edgeai-tidl-tools/models/public/best.prototxt'),
    'session_name' : 'onnxrt',
    'meta_arch_type' : 6
},
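
Before compiling, a quick sanity check (a minimal sketch, assuming the same absolute paths as in the entry above) can confirm that both referenced files exist:

# Sanity check: confirm the files referenced by the 'best' entry exist.
# (Paths assumed to match the entry above.)
import os

for path in ('/home/sefau18/edgeai-tidl-tools/models/public/best.onnx',
             '/home/sefau18/edgeai-tidl-tools/models/public/best.prototxt'):
    print(path, '->', 'OK' if os.path.isfile(path) else 'MISSING')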

../edgeai-tidl-tools/examples/osrt_python/ort/onnxrt_ep.py edit

from

models = ['cl-ort-resnet18-v1', 'cl-ort-caffe_squeezenet_v1_1', 'ss-ort-deeplabv3lite_mobilenetv2', 'od-ort-ssd-lite_mobilenetv2_fpn']

to

models = ['best']

and run

sefau18@ubuntu:~/edgeai-tidl-tools$ cd examples/osrt_python/ort
sefau18@ubuntu:~/edgeai-tidl-tools/examples/osrt_python/ort$ python3 onnxrt_ep.py -c
Available execution providers : ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
Running 1 Models - ['best']
Running_Model : best
TIDL Meta PipeLine (Proto) File : /home/sefau18/edgeai-tidl-tools/models/public/best.prototxt
yolo_v3
yolo_v3
Number of OD backbone nodes = 0
Size of odBackboneNodeIds = 0
Preliminary subgraphs created = 0
Final number of subgraphs created are : 0, - Offloaded Nodes - 0, Total Nodes - 1
TIDL Meta PipeLine (Proto) File : /home/sefau18/edgeai-tidl-tools/models/public/best.prototxt
yolo_v3
yolo_v3
Number of OD backbone nodes = 0

But the creation of model artifacts is interrupted. What could be the problem?

Thanks in advance

  • Hi Sefa,

    Can you share the error that you get?

  • Can you please try yolov5s6?

    You can first try the yolov5s6 model that we have shared, and after that try the model that you trained.

    We are just trying to understand whether this crash is due to a memory shortage, because yolov5l6 is a large model.

  • Adding the "onnx shape inference" part to the onnxrt_ep.py file fixed the problem. Now the artifacts folder is created.

    # onnx shape inference
    if not os.path.isfile(os.path.join(models_base_path, model + '_shape.onnx')):
        print("Writing model with shapes after running onnx shape inference -- ", os.path.join(models_base_path, model + '_shape.onnx'))
        onnx.shape_inference.infer_shapes_path(config['model_path'], config['model_path'])  # os.path.join(models_base_path, model + '_shape.onnx')
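
    For reference, the same shape inference can also be run standalone (a minimal sketch; the absolute model path is assumed to match my config above):

    # Standalone ONNX shape inference (model path assumed from the config above).
    import onnx

    model_path = '/home/sefau18/edgeai-tidl-tools/models/public/best.onnx'
    # infer_shapes_path reads the model from disk and writes the shape-inferred
    # model to the given output path (here in place, as in the snippet above).
    onnx.shape_inference.infer_shapes_path(model_path, model_path)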

    sefau18@ubuntu:~/edgeai-tidl-tools/examples/osrt_python/ort$ python3 onnxrt_ep.py -c
    Available execution providers : ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
    Running 1 Models - ['best']
    Running_Model : best
    Writing model with shapes after running onnx shape inference -- ../../../models/public/best_shape.onnx
    TIDL Meta PipeLine (Proto) File : /home/sefau18/edgeai-tidl-tools/models/public/best.prototxt
    yolo_v3
    yolo_v3
    Number of OD backbone nodes = 0
    Size of odBackboneNodeIds = 0
    Preliminary subgraphs created = 0
    Final number of subgraphs created are : 0, - Offloaded Nodes - 0, Total Nodes - 1
    TIDL Meta PipeLine (Proto) File : /home/sefau18/edgeai-tidl-tools/models/public/best.prototxt
    yolo_v3
    yolo_v3
    Number of OD backbone nodes = 0

    best/
    ├── allowedNode.txt
    ├── best.onnx
    ├── detections_tidl_io_1.bin
    ├── detections_tidl_net.bin
    ├── onnxrtMetaData.txt
    ├── param.yaml
    └── tempDir
        ├── detections_calib_raw_data.bin
        ├── detections_tidl_io_1.bin
        ├── detections_tidl_io__LayerPerChannelMean.bin
        ├── detections_tidl_io_.perf_sim_config.txt
        ├── detections_tidl_io_.qunat_stats_config.txt
        ├── detections_tidl_io__stats_tool_out.bin
        ├── detections_tidl_net
        │   ├── bufinfolog.csv
        │   ├── bufinfolog.txt
        │   └── perfSimInfo.bin
        ├── detections_tidl_net.bin
        ├── detections_tidl_net.bin.layer_info.txt
        ├── detections_tidl_net.bin_netLog.txt

    https://software-dl.ti.com/jacinto7/esd/processor-sdk-linux-sk-tda4vm/latest/exports/docs/inference_models.html

    As mentioned in the document above, I manually create the folder structure for inference using edge AI apps (Python) on the TDA4VM.

    Then I copy the folder I created to the TDA4VM.

    Then I edit the object_detection.yaml file on the TDA4VM.

    title: "YoloV5 Object Detection Test"
    log_level: 2
    inputs:
    input0:
    source: /dev/video2
    format: jpeg
    width: 1280
    height: 720
    framerate: 30
    input1:
    source: /opt/edge_ai_apps/data/videos/video_0000_h264.mp4
    format: h264_sw
    width: 1280
    height: 720
    framerate: 30
    loop: True
    input2:
    source: /dev/video18
    width: 1936
    height: 1100
    format: rggb12

    Here is the error I got on TDA4VM.

    root@j7-evm:/opt/edge_ai_apps# ./init_script.sh
    IMX390 Camera 0 detected
    device = /dev/video18
    name = imx390 10-0021
    format = [fmt:SRGGB12_1X12/1936x1100 field: none]
    subdev_id = /dev/v4l-subdev7
    isp_required = yes
    ldc_required = yes
    root@j7-evm:/opt/edge_ai_apps# cd apps_python/
    root@j7-evm:/opt/edge_ai_apps/apps_python# ./app_edgeai.py ../configs/object_detection.yaml
    libtidl_onnxrt_EP loaded 0x2a7d53c0
    Final number of subgraphs created are : 1, - Offloaded Nodes - 434, Total Nodes - 434
    2022-06-15 05:46:45.764130779 [E:onnxruntime:, inference_session.cc:1310 operator()] Exception during initialization: /usr/src/debug/onnxruntime/1.7.0-r0_psdkla_4/git/include/onnxruntime/core/graph/graph.h:1299 onnxruntime::Node* onnxruntime::Graph::NodeAtIndexImpl(onnxruntime::NodeIndex) const node_index < nodes_.size() was false. Validating no unexpected access using an invalid node_index. Got:28271 Max:1
    Traceback (most recent call last):
      File "./app_edgeai.py", line 71, in <module>
        main(sys.argv)
      File "./app_edgeai.py", line 45, in main
        demo = EdgeAIDemo(config)
      File "/opt/edge_ai_apps/apps_python/edge_ai_class.py", line 74, in __init__
        model_obj = config_parser.Model(model_config)

    I'm not sure I'm following the right steps.

    Thank you for your answers

  • Does the model inference (using the compiled artifacts) work when using the script provided in edgeai-tidl-tools?
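
    For context, the example script creates the ONNX Runtime session roughly as below. This is a sketch based on the tracebacks in this thread; the delegate option keys are assumptions, so please treat examples/osrt_python in edgeai-tidl-tools as the authoritative version.

    import onnxruntime as rt

    so = rt.SessionOptions()
    # TIDLExecutionProvider consumes the compiled artifacts; CPUExecutionProvider
    # is the fallback for unsupported nodes. The option keys below are assumptions
    # based on the edgeai-tidl-tools examples.
    delegate_options = {
        'tidl_tools_path': '/path/to/tidl_tools',            # hypothetical path
        'artifacts_folder': '../../../model-artifacts/best',
    }
    ep_list = ['TIDLExecutionProvider', 'CPUExecutionProvider']
    sess = rt.InferenceSession('best.onnx', providers=ep_list,
                               provider_options=[delegate_options, {}],
                               sess_options=so)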

  • As I suggested earlier, please try yolov5s6 and make sure it works.

  • https://github.com/TexasInstruments/edgeai-yolov5/tree/master/pretrained_models/models/yolov5s6_640_ti_lite/weights

    If I download the "best.pt" file from the above link and do exactly the same, I get the same error.

    root@j7-evm:/opt/edge_ai_apps/apps_python# ./app_edgeai.py ../configs/object_detection.yaml
    libtidl_onnxrt_EP loaded 0x1973190
    Final number of subgraphs created are : 1, - Offloaded Nodes - 298, Total Nodes - 298
    2022-06-15 07:30:20.151819751 [E:onnxruntime:, inference_session.cc:1310 operator()] Exception during initialization: /usr/src/debug/onnxruntime/1.7.0-r0_psdkla_4/git/include/onnxruntime/core/graph/graph.h:1299 onnxruntime::Node* onnxruntime::Graph::NodeAtIndexImpl(onnxruntime::NodeIndex) const node_index < nodes_.size() was false. Validating no unexpected access using an invalid node_index. Got:28271 Max:1
    Traceback (most recent call last):
      File "./app_edgeai.py", line 71, in <module>
        main(sys.argv)
      File "./app_edgeai.py", line 45, in main
        demo = EdgeAIDemo(config)
      File "/opt/edge_ai_apps/apps_python/edge_ai_class.py", line 74, in __init__
        model_obj = config_parser.Model(model_config)
      File "/opt/edge_ai_apps/apps_python/config_parser.py", line 136, in __init__
        self.run_time = RunTime(self)
      File "/opt/edge_ai_apps/apps_python/run_times.py", line 109, in __init__
        self.interpreter = onnxruntime.InferenceSession(params.model_path,\
      File "/usr/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 283, in __init__
        self._create_inference_session(providers, provider_options)
      File "/usr/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 315, in _create_inference_session
        sess.initialize_session(providers, provider_options)
    onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: /usr/src/debug/onnxruntime/1.7.0-r0_psdkla_4/git/include/onnxruntime/core/graph/graph.h:1299 onnxruntime::Node* onnxruntime::Graph::NodeAtIndexImpl(onnxruntime::NodeIndex) const node_index < nodes_.size() was false. Validating no unexpected access using an invalid node_index. Got:28271 Max:1

    https://github.com/TexasInstruments/edgeai-yolov5/blob/master/pretrained_models/modelartifacts/8bits/od-8100_onnxrt_coco_edgeai-yolov5_yolov5s6_640_ti_lite_37p4_56p0_onnx.tar.gz.link

    But I have no problem running the previously compiled model on the TDA4VM. Everything works as it should and I can see the inference results on the TDA4VM screen.

  • With my custom model, I can run inference with the following command.

    sefau18@ubuntu:~/edgeai-tidl-tools/examples/osrt_python/ort$ python3 onnxrt_ep.py -d
    Available execution providers : ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
    Running 1 Models - ['best']
    Running_Model : best
    Writing model with shapes after running onnx shape inference -- ../../../models/public/best_shape.onnx
    Saving image to ../../../output_images/
    Completed_Model : 1, Name : best , Total time : 459.70, Offload Time : 0.00 , DDR RW MBs : 0, Output File : py_out_best_ADE_val_00001801.jpg

    py_out_best_ADE_val_00001801.jpg

    Likewise, the output of the yolov5s6 model, which I compiled myself, works just as well.

    py_out_yolov5s6_ADE_val_00001801.jpg

  • Shape inference is required for ONNX models, so that TIDL can compile them correctly.

    Has your issue been resolved?

  • Inference works fine on PC. But if I copy the folders to the EVM and run the script mentioned in the repo, I get an error.

    Available execution providers : ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
    Running 1 Models - ['best']
    Running_Model : best
    libtidl_onnxrt_EP loaded 0x2d61e220
    ******** WARNING ******* : Could not open ../../../model-artifacts//best//allowedNode.txt for reading... Entire model will run on ARM without any delegation to TIDL !
    Final number of subgraphs created are : 1, - Offloaded Nodes - 0, Total Nodes - 0
    ******** WARNING ******* : Could not open ../../../model-artifacts//best//allowedNode.txt for reading... Entire model will run on ARM without any delegation to TIDL !
    Final number of subgraphs created are : 1, - Offloaded Nodes - 0, Total Nodes - 0
    ******** WARNING ******* : Could not open ../../../model-artifacts//best//allowedNode.txt for reading... Entire model will run on ARM without any delegation to TIDL !
    Final number of subgraphs created are : 1, - Offloaded Nodes - 0, Total Nodes - 65535
    ./scripts/run_python_examples.sh: line 23: 3351 Killed python3 onnxrt_ep.py

  • If I do the same operations with the pre-trained yolov5s6 model, I get a different error.

    Available execution providers : ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
    Running 1 Models - ['yolov5s6']
    Running_Model : yolov5s6
    Traceback (most recent call last):
      File "onnxrt_ep.py", line 251, in <module>
        run_model(model, mIdx)
      File "onnxrt_ep.py", line 168, in run_model
        sess = rt.InferenceSession(config['model_path'] ,providers=EP_list, provider_options=[delegate_options, {}], sess_options=so)
      File "/usr/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 283, in __init__
        self._create_inference_session(providers, provider_options)
      File "/usr/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 310, in _create_inference_session
        sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
    onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from /opt/edgeai-tidl-tools/model-artifacts/yolov5s6/model/yolov5s6.onnx failed:/usr/src/debug/onnxruntime/1.7.0-r0_psdkla_4/git/onnxruntime/core/graph/model.cc:101 onnxruntime::Model::Model(onnx::ModelProto&&, const PathString&, const IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&) ModelProto does not have a graph.
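
    One way to narrow this down is to check whether the deployed .onnx file is a valid model at all (a minimal sketch using the standard onnx package; the path is taken from the error above):

    # Quick validity check for the deployed model file.
    import onnx

    model = onnx.load('/opt/edgeai-tidl-tools/model-artifacts/yolov5s6/model/yolov5s6.onnx')
    onnx.checker.check_model(model)  # raises if the ModelProto is malformed
    print('graph nodes:', len(model.graph.node))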

  • Hi,

    It looks like the model artifacts were not copied to the expected path. Before concluding anything, as a sanity check of the setup, could you please run the default model examples in Python by reverting the models list to

    models = ['cl-ort-resnet18-v1', 'cl-ort-caffe_squeezenet_v1_1', 'ss-ort-deeplabv3lite_mobilenetv2', 'od-ort-ssd-lite_mobilenetv2_fpn']

  • Here are the original results. The inference results appear to have been added to the "output_images" folder.

    APP: Deinit ... !!!
    REMOTE_SERVICE: Deinit ... !!!
    REMOTE_SERVICE: Deinit ... Done !!!
    IPC: Deinit ... !!!
    IPC: DeInit ... Done !!!
    MEM: Deinit ... !!!
    MEM: Alloc's: 10 alloc's of 18610376 bytes
    MEM: Free's : 10 free's of 18610376 bytes
    MEM: Open's : 0 allocs of 0 bytes
    MEM: Deinit ... Done !!!
    APP: Deinit ... Done !!!
    Available execution providers : ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
    Running 4 Models - ['cl-ort-resnet18-v1', 'cl-ort-caffe_squeezenet_v1_1', 'ss-ort-deeplabv3lite_mobilenetv2', 'od-ort-ssd-lite_mobilenetv2_fpn']
    Running_Model : cl-ort-resnet18-v1
    2022-06-22 13:57:50.723167216 [W:onnxruntime:, graph.cc:3106 CleanUnusedInitializers] Removing initializer 'layer4.0.downsample.1.num_batches_tracked'. It is not used by any node and should be removed from the model.
    2022-06-22 13:57:50.723244407 [W:onnxruntime:, graph.cc:3106 CleanUnusedInitializers] Removing initializer 'layer4.0.bn2.num_batches_tracked'. It is not used by any node and should be removed from the model.
    2022-06-22 13:57:50.723301998 [W:onnxruntime:, graph.cc:3106 CleanUnusedInitializers] Removing initializer 'layer4.1.bn2.num_batches_tracked'. It is not used by any node and should be removed from the model.

  • Hi,

    Thanks. It looks like the setup is good.

    Now let us figure out the error: "Could not open ../../../model-artifacts//best//allowedNode.txt"

    Could you verify that these files exist:

    1) model-artifacts/best/allowedNode.txt

    2) the model at models/public/best_shape.onnx

    Also, just to avoid confusion, can you please update the dictionary in /edgeai-tidl-tools/examples/osrt_python/common_utils.py so that 'model_path' uses models_base_path instead of the absolute path:

    'best' : {
        'model_path' : os.path.join(models_base_path, 'best.onnx'),
        'mean': [0, 0, 0],
        .
        .
    }

  • Here is the error I got after making the change you mentioned.

    Running 1 Models - ['best']
    Running_Model : best
    libtidl_onnxrt_EP loaded 0x39b3120
    Final number of subgraphs created are : 1, - Offloaded Nodes - 434, Total Nodes - 434
    2022-06-23 06:37:14.949809627 [E:onnxruntime:, inference_session.cc:1310 operator()] Exception during initialization: /usr/src/debug/onnxruntime/1.7.0-r0_psdkla_4/git/include/onnxruntime/core/graph/graph.h:1299 onnxruntime::Node* onnxruntime::Graph::NodeAtIndexImpl(onnxruntime::NodeIndex) const node_index < nodes_.size() was false. Validating no unexpected access using an invalid node_index. Got:28271 Max:1
    Traceback (most recent call last):
      File "onnxrt_ep.py", line 251, in <module>
        run_model(model, mIdx)
      File "onnxrt_ep.py", line 168, in run_model
        sess = rt.InferenceSession(config['model_path'] ,providers=EP_list, provider_options=[delegate_options, {}], sess_options=so)
      File "/usr/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 283, in __init__
        self._create_inference_session(providers, provider_options)
      File "/usr/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 315, in _create_inference_session
        sess.initialize_session(providers, provider_options)
    onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: /usr/src/debug/onnxruntime/1.7.0-r0_psdkla_4/git/include/onnxruntime/core/graph/graph.h:1299 onnxruntime::Node* onnxruntime::Graph::NodeAtIndexImpl(onnxruntime::NodeIndex) const node_index < nodes_.size() was false. Validating no unexpected access using an invalid node_index. Got:28271 Max:1

  • Hi,

    Could you run the model without delegate mode (i.e., without TIDL offload), by running python3 onnxrt_ep.py -d?

  • Hi Muhammed,
    If I do as you say, the code freezes and gives no results. However, there seems to be no problem with the original models.
    models = ['best']

    root@j7-evm:/opt/edgeai-tidl-tools/examples/osrt_python/ort# python3 onnxrt_ep.py -d
    Available execution providers : ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
    Running 1 Models - ['best']
    Running_Model : best

    models = ['cl-ort-resnet18-v1', 'cl-ort-caffe_squeezenet_v1_1', 'ss-ort-deeplabv3lite_mobilenetv2', 'od-ort-ssd-lite_mobilenetv2_fpn']

    root@j7-evm:/opt/edgeai-tidl-tools/examples/osrt_python/ort# python3 onnxrt_ep.py -d
    Available execution providers : ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
    Running 4 Models - ['cl-ort-resnet18-v1', 'cl-ort-caffe_squeezenet_v1_1', 'ss-ort-deeplabv3lite_mobilenetv2', 'od-ort-ssd-lite_mobilenetv2_fpn']
    Running_Model : cl-ort-resnet18-v1
    2022-06-24 06:48:00.375196036 [W:onnxruntime:, graph.cc:3106 CleanUnusedInitializers] Removing initializer 'layer4.0.downsample.1.num_batches_tracked'. It is not used by any node and should be removed from the model.
    2022-06-24 06:48:00.375263804 [W:onnxruntime:, graph.cc:3106 CleanUnusedInitializers] Removing initializer 'layer4.0.bn2.num_batches_tracked'. It is not used by any node and should be removed from the model.
    2022-06-24 06:48:00.375909706 [W:onnxruntime:, graph.cc:3106 CleanUnusedInitializers] Removing initializer 'layer4.1.bn2.num_batches_tracked'. It is not used by any node and should be removed from the model.
    2022-06-24 06:48:00.375963148 [W:onnxruntime:, graph.cc:3106 CleanUnusedInitializers] Removing initializer 'layer2.0.bn1.num_batches_tracked'. It is not used by any node and should be removed from the model.
    2022-06-24 06:48:00.376396804 [W:onnxruntime:, graph.cc:3106 CleanUnusedInitializers] Removing initializer 'layer1.1.bn2.num_batches_tracked'. It is not used by any node and should be removed from the model.
    2022-06-24 06:48:00.376982218 [W:onnxruntime:, graph.cc:3106 CleanUnusedInitializers] Removing initializer 'layer3.0.bn2.num_batches_tracked'. It is not used by any node and should be removed from the model.
    2022-06-24 06:48:00.377027725 [W:onnxruntime:, graph.cc:3106 CleanUnusedInitializers] Removing initializer 'layer1.0.bn2.num_batches_tracked'. It is not used by any node and should be removed from the model.
    2022-06-24 06:48:00.377247281 [W:onnxruntime:, graph.cc:3106 CleanUnusedInitializers] Removing initializer 'layer1.1.bn1.num_batches_tracked'. It is not used by any node and should be removed from the model.
    2022-06-24 06:48:00.377464381 [W:onnxruntime:, graph.cc:3106 CleanUnusedInitializers] Removing initializer 'layer2.0.downsample.1.num_batches_tracked'. It is not used by any node and should be removed from the model.
    2022-06-24 06:48:00.377673067 [W:onnxruntime:, graph.cc:3106 CleanUnusedInitializers] Removing initializer 'layer1.0.bn1.num_batches_tracked'. It is not used by any node and should be removed from the model.
    2022-06-24 06:48:00.377883242 [W:onnxruntime:, graph.cc:3106 CleanUnusedInitializers] Removing initializer 'layer2.0.bn2.num_batches_tracked'. It is not used by any node and should be removed from the model.
    2022-06-24 06:48:00.378252270 [W:onnxruntime:, graph.cc:3106 CleanUnusedInitializers] Removing initializer 'layer4.1.bn1.num_batches_tracked'. It is not used by any node and should be removed from the model.
    2022-06-24 06:48:00.378544674 [W:onnxruntime:, graph.cc:3106 CleanUnusedInitializers] Removing initializer 'layer4.0.bn1.num_batches_tracked'. It is not used by any node and should be removed from the model.

  • Hi BEKER,

    As we verified, the setup is good and it is this particular model that is failing. Most likely this is because of insufficient memory. Could you run the same command in a PC environment (x86) to verify the model?
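
    A quick way to check available memory on the target before running the model (a minimal sketch, Linux-only):

    # Print total and available memory from /proc/meminfo (Linux-only).
    for line in open('/proc/meminfo'):
        if line.startswith(('MemTotal', 'MemAvailable')):
            print(line.strip())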

  • Hi Muhammed,
    I had already run inference in an x86 environment; you can check the comment above where I shared the traffic-sign images. I believe my problem is with creating the folders for the model artifacts.

    Thank you in advance for your help.

  • Hi BEKER,

    The model-artifacts folder is generated when you run the Python example with the -c option, as you have done above.

    From the error it doesn't look like a misplaced model-artifacts folder. We can try a smaller textbook model and follow the same procedure to narrow down the issue. (You can refer to the TI model zoo, https://github.com/TexasInstruments/edgeai-modelzoo, for models.)

  • Hi Muhammed,
    If I do the same operations with the pre-trained yolov5s6 model, I get a different error.

    Available execution providers : ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
    Running 1 Models - ['yolov5s6']
    Running_Model : yolov5s6
    Traceback (most recent call last):
      File "onnxrt_ep.py", line 251, in <module>
        run_model(model, mIdx)
      File "onnxrt_ep.py", line 168, in run_model
        sess = rt.InferenceSession(config['model_path'] ,providers=EP_list, provider_options=[delegate_options, {}], sess_options=so)
      File "/usr/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 283, in __init__
        self._create_inference_session(providers, provider_options)
      File "/usr/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 310, in _create_inference_session
        sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
    onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from /opt/edgeai-tidl-tools/model-artifacts/yolov5s6/model/yolov5s6.onnx failed:/usr/src/debug/onnxruntime/1.7.0-r0_psdkla_4/git/onnxruntime/core/graph/model.cc:101 onnxruntime::Model::Model(onnx::ModelProto&&, const PathString&, const IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&) ModelProto does not have a graph.

  • Hi Muhammed,

    Let me explain what we did step-by-step from the very beginning so that we can better understand the problem.

    We have the EVM kit, whose images I have shared, and one GMSL camera. We have one screen for the output of the EVM kit. We connect to the EVM over SSH from a PC running Ubuntu 18.04, using VS Code. Using edge_ai_apps we can take images from the camera and test the demos live.

    Step 1


    https://www.ti.com/tool/download/PROCESSOR-SDK-LINUX-SK-TDA4VM

    We wrote the image file downloaded from the link above to the SD card with Balena Etcher.


    Step 2

    https://github.com/TexasInstruments/edgeai-yolov5

    We train the yolov5 model on our custom data with this repository, and do the ONNX model conversion with the same repo. Now we have two files:

    1- best.onnx

    2- best.prototxt


    Step 3

    https://github.com/TexasInstruments/edgeai-tidl-tools

    We create the artifacts folder for the "best.onnx" and "best.prototxt" files using the edgeai-tidl-tools repository.

    sefau18@ubuntu:~/edgeai-tidl-tools/examples/osrt_python/ort$ python3 onnxrt_ep.py -c
    Available execution providers : ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
    Running 1 Models - ['best']
    Running_Model : best
    Writing model with shapes after running onnx shape inference -- ../../../models/public/best_shape.onnx
    TIDL Meta PipeLine (Proto) File : ../../../models/public/best.prototxt
    yolo_v3
    yolo_v3
    Number of OD backbone nodes = 0
    Size of odBackboneNodeIds = 0
    Preliminary subgraphs created = 0
    Final number of subgraphs created are : 0, - Offloaded Nodes - 0, Total Nodes - 1
    TIDL Meta PipeLine (Proto) File : ../../../models/public/best.prototxt
    yolo_v3
    yolo_v3
    Number of OD backbone nodes = 0


    Step 4

    We can run inference in the x86 environment. The model works as expected.

    sefau18@ubuntu:~/edgeai-tidl-tools/examples/osrt_python/ort$ python3 onnxrt_ep.py -d
    Available execution providers : ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
    Running 1 Models - ['best']
    Running_Model : best
    Writing model with shapes after running onnx shape inference -- ../../../models/public/best_shape.onnx
    Saving image to ../../../output_images/
    Completed_Model : 1, Name : best , Total time : 301.03, Offload Time : 0.00 , DDR RW MBs : 0, Output File : py_out_best_ADE_val_00001801.jpg


    Step 5

    Our artifacts folder created in step 3 looks like this.


    sefau18@ubuntu:~/edgeai-tidl-tools/model-artifacts/best$ tree
    .
    ├── tempDir
    │   ├── detections_calib_raw_data.bin
    │   ├── detections_tidl_io_1.bin
    │   ├── detections_tidl_io__LayerPerChannelMean.bin
    │   ├── detections_tidl_io_.perf_sim_config.txt
    │   ├── detections_tidl_io_.qunat_stats_config.txt
    │   ├── detections_tidl_io__stats_tool_out.bin
    │   ├── detections_tidl_net
    │   │   ├── bufinfolog.csv
    │   │   ├── bufinfolog.txt
    │   │   └── perfSimInfo.bin
    │   ├── detections_tidl_net.bin
    │   ├── detections_tidl_net.bin.layer_info.txt
    │   ├── detections_tidl_net.bin_netLog.txt
    │   ├── detections_tidl_net.bin_paramDebug.csv
    │   ├── detections_tidl_net.bin.svg
    │   ├── graphvizInfo.txt
    │   └── runtimes_visualization.svg


    Step 6

    https://software-dl.ti.com/jacinto7/esd/processor-sdk-linux-sk-tda4vm/latest/exports/docs/inference_models.html

    We manually create the artifacts folder shown in the link above.

    sefau18@ubuntu:~/edgeai-tidl-tools/model-artifacts/best-manual$ tree
    .
    ├── artifacts
    │   ├── allowedNode.txt
    │   ├── detections_tidl_io_1.bin
    │   ├── detections_tidl_net.bin
    │   ├── detections_tidl_net.bin_netLog.txt
    │   ├── detections_tidl_net.bin.svg
    │   ├── onnxrtMetaData.txt
    │   └── runtimes_visualization.svg
    ├── model
    │   ├── best.onnx
    │   └── best.prototxt
    └── param.yaml

    2 directories, 10 files
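
    A sketch of how this layout can be assembled (a hypothetical helper for illustration only; source paths are assumed from the earlier steps):

    # Hypothetical helper to assemble the manual artifacts layout shown above.
    import os
    import shutil

    src = os.path.expanduser('~/edgeai-tidl-tools')
    dst = os.path.join(src, 'model-artifacts', 'best-manual')

    os.makedirs(os.path.join(dst, 'artifacts'), exist_ok=True)
    os.makedirs(os.path.join(dst, 'model'), exist_ok=True)

    # Model files come from models/public (step 2), compiled outputs from
    # model-artifacts/best (step 3). Remaining artifacts (netLog, .svg, ...)
    # can be copied the same way.
    for f in ('best.onnx', 'best.prototxt'):
        shutil.copy(os.path.join(src, 'models', 'public', f),
                    os.path.join(dst, 'model', f))
    for f in ('allowedNode.txt', 'detections_tidl_io_1.bin',
              'detections_tidl_net.bin', 'onnxrtMetaData.txt'):
        shutil.copy(os.path.join(src, 'model-artifacts', 'best', f),
                    os.path.join(dst, 'artifacts', f))
    shutil.copy(os.path.join(src, 'model-artifacts', 'best', 'param.yaml'), dst)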


    Step 7

    We install the edgeai-tidl-tools repository on the EVM. Then we copy the artifacts folder we created manually to the EVM.

    root@j7-evm:/opt/edgeai-tidl-tools# export DEVICE=j7
    root@j7-evm:/opt/edgeai-tidl-tools# source ./setup.sh
    .
    .
    .
    inflating: opencv-4.1.0/samples/wp8/OpenCVXaml/OpenCVXaml/Properties/AppManifest.xml
    inflating: opencv-4.1.0/samples/wp8/OpenCVXaml/OpenCVXaml/Properties/AssemblyInfo.cs
    inflating: opencv-4.1.0/samples/wp8/OpenCVXaml/OpenCVXaml/Properties/WMAppManifest.xml
    creating: opencv-4.1.0/samples/wp8/OpenCVXaml/OpenCVXaml/Resources/
    inflating: opencv-4.1.0/samples/wp8/OpenCVXaml/OpenCVXaml/Resources/AppResources.Designer.cs
    inflating: opencv-4.1.0/samples/wp8/OpenCVXaml/OpenCVXaml/Resources/AppResources.resx
    inflating: opencv-4.1.0/samples/wp8/readme.txt


    Step 8

    examples/osrt_python/common_utils.py edit

    models_configs = {
        # ONNX RT OOB Models
        'best' : {
            'model_path' : os.path.join(models_base_path, 'best.onnx'),
            'mean': [0, 0, 0],
            'std' : [0.003921568627, 0.003921568627, 0.003921568627],
            'num_images' : numImages,
            'num_classes': 36,
            'model_type': 'od',
            'od_type' : 'YoloV5',
            'framework' : '',
            'meta_layers_names_list' : os.path.join(models_base_path, 'best.prototxt'),
            'session_name' : 'onnxrt',
            'meta_arch_type' : 6
        },


    Step 9

    examples/osrt_python/ort/onnxrt_ep.py edit

    models = ['best']

    Step 10

    We can make inferences on the EVM. It works fine just like the x86 environment.

    root@j7-evm:/opt/edgeai-tidl-tools/examples/osrt_python/ort# python3 onnxrt_ep.py -d
    Available execution providers : ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
    Running 1 Models - ['best']
    Running_Model : best
    Saving image to ../../../output_images/
    Completed_Model : 1, Name : best , Total time : 12447.09, Offload Time : 0.00 , DDR RW MBs : 0, Output File : py_out_best_ADE_val_00001801.jpg


    Step 11

    On the EVM, we manually copy the "best" folder we created in step 6 into the /opt/model_zoo folder.

    object_detection.yaml edit

    title: "Object Detection Demo"
    log_level: 2
    inputs:
    input0:
    source: /dev/video2
    format: jpeg
    width: 1280
    height: 720
    framerate: 30
    input1:
    source: /opt/edge_ai_apps/data/videos/video_0000_h264.mp4
    format: h264_sw
    width: 1280
    height: 720
    framerate: 30
    loop: True
    input2:
    source: /opt/edge_ai_apps/data/images/%04d.jpg
    width: 1280
    height: 720
    index: 0


    Step 12

    We run the edge_ai_apps example on the EVM.

    Below is the error we got.

    root@j7-evm:/opt/edge_ai_apps/apps_python# ./app_edgeai.py ../configs/object_detection.yaml
    Traceback (most recent call last):
      File "./app_edgeai.py", line 71, in <module>
        main(sys.argv)
      File "./app_edgeai.py", line 45, in main
        demo = EdgeAIDemo(config)
      File "/opt/edge_ai_apps/apps_python/edge_ai_class.py", line 74, in __init__
        model_obj = config_parser.Model(model_config)
      File "/opt/edge_ai_apps/apps_python/config_parser.py", line 104, in __init__
        self.mean = params['session']['input_mean']
    KeyError: 'input_mean'
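
    The KeyError indicates that config_parser.py reads the mean from params['session']['input_mean'], while our param.yaml only has mean under preprocess. A quick check sketch (assuming PyYAML is available; the key name is taken from the traceback above):

    # Check whether param.yaml has the key config_parser.py reads.
    import yaml

    with open('param.yaml') as f:
        params = yaml.safe_load(f)

    session = params.get('session', {})
    print('session keys:', sorted(session.keys()))
    print('has input_mean:', 'input_mean' in session)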


    Step 13

    Here is the original param.yaml file.

    postprocess:
      data_layout: NCHW
      detection_thr: 0.3
    preprocess:
      crop:
      - 640
      - 640
      data_layout: NCHW
      mean:
      - 0
      - 0
      - 0
      resize:
      - 640
      - 640
      scale:
      - 0.003921568627
      - 0.003921568627
      - 0.003921568627
    session:
      artifacts_folder: ''


    Step 14

    Then we edit the param.yaml file, using the other examples as a reference.

    postprocess:
      data_layout: NCHW
      detection_thr: 0.3
      formatter:
        dst_indices:
        - 4
        - 5
        name: DetectionBoxSL2BoxLS
        src_indices:
        - 5
        - 4
      ignore_index: null
      normalized_detections: false
      resize_with_pad: true
      save_output: false
      shuffle_indices: null
      squeeze_axis: null
    preprocess:
      crop:
      - 640
      - 640


    Step 15

    This is the error we get when we try again.

    root@j7-evm:/opt/edge_ai_apps/apps_python# ./app_edgeai.py ../configs/object_detection.yaml
    libtidl_onnxrt_EP loaded 0xdc00a70
    Final number of subgraphs created are : 1, - Offloaded Nodes - 434, Total Nodes - 434
    2022-07-06 12:51:44.042351690 [E:onnxruntime:, inference_session.cc:1310 operator()] Exception during initialization: /usr/src/debug/onnxruntime/1.7.0-r0_psdkla_4/git/include/onnxruntime/core/graph/graph.h:1299 onnxruntime::Node* onnxruntime::Graph::NodeAtIndexImpl(onnxruntime::NodeIndex) const node_index < nodes_.size() was false. Validating no unexpected access using an invalid node_index. Got:28271 Max:1
    Traceback (most recent call last):
      File "./app_edgeai.py", line 71, in <module>
        main(sys.argv)
      File "./app_edgeai.py", line 45, in main
        demo = EdgeAIDemo(config)
      File "/opt/edge_ai_apps/apps_python/edge_ai_class.py", line 74, in __init__
        model_obj = config_parser.Model(model_config)
      File "/opt/edge_ai_apps/apps_python/config_parser.py", line 136, in __init__
        self.run_time = RunTime(self)
      File "/opt/edge_ai_apps/apps_python/run_times.py", line 109, in __init__
        self.interpreter = onnxruntime.InferenceSession(params.model_path,\
      File "/usr/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 283, in __init__
        self._create_inference_session(providers, provider_options)
      File "/usr/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 315, in _create_inference_session
        sess.initialize_session(providers, provider_options)
    onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: /usr/src/debug/onnxruntime/1.7.0-r0_psdkla_4/git/include/onnxruntime/core/graph/graph.h:1299 onnxruntime::Node* onnxruntime::Graph::NodeAtIndexImpl(onnxruntime::NodeIndex) const node_index < nodes_.size() was false. Validating no unexpected access using an invalid node_index. Got:28271 Max:1

    At this point, our search for a solution continues; we are not sure these are the right steps. Thank you in advance for your help.

  • Do you have any ideas on the subject?

  • Let's continue the discussion in the other thread that you started: e2e.ti.com/.../4164033