AM69A: edgeai-tidl-tools Accelerator Fatal Error

I am encountering errors while setting up the edgeai-tidl-tools environment for GPU-based PTQ and model compilation, and I am unable to proceed with testing the provided examples.

Could you please help identify the issue?

Accelerator Fatal Error: This file was compiled: -acc=gpu -gpu=cc50 -gpu=cc60 -gpu=cc60 -gpu=cc70 -gpu=cc75 -gpu=cc80 -gpu=cc80 -gpu=cc86 -gpu=cc90 -acc=host o
Rebuild this file with -gpu=cc89 to use NVIDIA Tesla GPU 0
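As a quick sanity check on what this message is saying, the following is a minimal Python sketch that compares the compute capabilities listed in the "compiled" line with the one the runtime asks for. The function names are hypothetical helpers written for this post, not part of edgeai-tidl-tools, and the message text is copied from the error above (including its truncated ending).

```python
import re

# Error text copied verbatim from the message above (the first line is truncated in the log).
ERROR_TEXT = """Accelerator Fatal Error: This file was compiled: -acc=gpu -gpu=cc50 -gpu=cc60 -gpu=cc60 -gpu=cc70 -gpu=cc75 -gpu=cc80 -gpu=cc80 -gpu=cc86 -gpu=cc90 -acc=host o
Rebuild this file with -gpu=cc89 to use NVIDIA Tesla GPU 0
"""


def compiled_targets(message: str) -> set:
    """Compute capabilities named by -gpu=ccNN flags on the 'compiled' line."""
    compiled_line = next(
        (line for line in message.splitlines() if "This file was compiled" in line),
        "",
    )
    return {int(cc) for cc in re.findall(r"-gpu=cc(\d+)", compiled_line)}


def requested_targets(message: str) -> set:
    """Compute capabilities the runtime asks for in 'Rebuild this file with' lines."""
    return {int(cc) for cc in re.findall(r"Rebuild this file with -gpu=cc(\d+)", message)}


def missing_targets(message: str) -> set:
    """Requested capabilities that are absent from the compiled list."""
    return requested_targets(message) - compiled_targets(message)


if __name__ == "__main__":
    print("compiled for:", sorted(compiled_targets(ERROR_TEXT)))
    print("missing:", sorted(missing_targets(ERROR_TEXT)))
```

For this message the compiled list contains cc50 through cc90 but not cc89, which is the value the runtime requests for my GPU.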

The environment details and error logs are below.

My local PC environment:

- Ubuntu 22.04
- CUDA 12.2
- NVIDIA driver 535.183.01
- GPU: NVIDIA RTX 4500 Ada

The edgeai-tidl-tools Docker environment is unchanged.

Here are the example commands:

mkdir build && cd build
cmake ../examples && make -j && cd ..
source ./scripts/run_python_examples.sh

Here is the error log:

root@93fa3515e700:/home/root# source ./scripts/run_python_examples.sh 
X64 Architecture
1

Available execution providers : ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']

Running 5 Models - ['cl-ort-resnet18-v1', 'od-ort-ssd-lite_mobilenetv2_fpn', 'cl-ort-resnet18-v1_4batch', 'cl-ort-resnet18-v1_low_latency', 'ss-ort-deeplabv3lite_mobilenetv2']


Running_Model : cl-ort-resnet18-v1 


Running_Model : od-ort-ssd-lite_mobilenetv2_fpn 


Running_Model : cl-ort-resnet18-v1_4batch 


Running_Model : cl-ort-resnet18-v1_low_latency 


Running_Model : ss-ort-deeplabv3lite_mobilenetv2 


Running shape inference on model ../../../models/public/resnet18_opset9.onnx 


Running shape inference on model ../../../models/public/ssd-lite_mobilenetv2_fpn.onnx 


Running shape inference on model ../../../models/public/resnet18_opset9_4batch.onnx 


Running shape inference on model ../../../models/public/resnet18_opset9.onnx 


Running shape inference on model ../../../models/public/deeplabv3lite_mobilenetv2.onnx 

Process Process-4:
Traceback (most recent call last):
 File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
 self.run()
 File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
 self._target(*self._args, **self._kwargs)
 File "/home/root/examples/osrt_python/ort/onnxrt_ep.py", line 308, in run_model
 onnx.shape_inference.infer_shapes_path(
 File "/usr/local/lib/python3.10/dist-packages/onnx/shape_inference.py", line 79, in infer_shapes_path
 C.infer_shapes_path(model_path, output_path, check_type, strict_mode, data_prop)
onnx.onnx_cpp2py_export.checker.ValidationError: Unable to parse proto from file: ../../../models/public/resnet18_opset9.onnx. Please check if it is a valid protobuf file of proto. 
========================= [Model Compilation Started] =========================

Model compilation will perform the following stages:
1. Parsing
2. Graph Optimization
3. Quantization & Calibration
4. Memory Planning

============================== [Version Summary] ==============================

-------------------------------------------------------------------------------
| TIDL Tools Version | 10_01_00_01 |
-------------------------------------------------------------------------------
| C7x Firmware Version | 10_01_00_01 |
-------------------------------------------------------------------------------
| Runtime Version | 1.15.0 |
-------------------------------------------------------------------------------
| Model Opset Version | 11 |
-------------------------------------------------------------------------------

NOTE: The runtime version here specifies ONNXRT_VERSION+TIDL_VERSION
Ex: 1.14.0+1000XXXX -> ONNXRT 1.14.0 and a TIDL_VERSION 10.00.XX.XX

============================== [Parsing Started] ==============================

[TIDL Import] [PARSER] WARNING: Network not identified as Object Detection network : (1) Ignore if network is not Object Detection network (2) If network is Object Detection network, please specify "model_type":"OD" as part of OSRT compilation options

------------------------- Subgraph Information Summary -------------------------
-------------------------------------------------------------------------------
| Core | No. of Nodes | Number of Subgraphs |
-------------------------------------------------------------------------------
| C7x | 124 | 1 |
| CPU | 0 | x |
-------------------------------------------------------------------------------
========================= [Model Compilation Started] =========================

Model compilation will perform the following stages:
1. Parsing
2. Graph Optimization
3. Quantization & Calibration
4. Memory Planning

============================== [Version Summary] ==============================

-------------------------------------------------------------------------------
| TIDL Tools Version | 10_01_00_01 |
-------------------------------------------------------------------------------
| C7x Firmware Version | 10_01_00_01 |
-------------------------------------------------------------------------------
| Runtime Version | 1.15.0 |
-------------------------------------------------------------------------------
| Model Opset Version | 11 |
-------------------------------------------------------------------------------

NOTE: The runtime version here specifies ONNXRT_VERSION+TIDL_VERSION
Ex: 1.14.0+1000XXXX -> ONNXRT 1.14.0 and a TIDL_VERSION 10.00.XX.XX

============================== [Parsing Started] ==============================

ssd is meta arch name 

Number of OD backbone nodes = 159 
Size of odBackboneNodeIds = 159 
============================= [Parsing Completed] =============================


------------------------- Subgraph Information Summary -------------------------
-------------------------------------------------------------------------------
| Core | No. of Nodes | Number of Subgraphs |
-------------------------------------------------------------------------------
| C7x | 478 | 1 |
| CPU | 0 | x |
-------------------------------------------------------------------------------
========================= [Model Compilation Started] =========================

Model compilation will perform the following stages:
1. Parsing
2. Graph Optimization
3. Quantization & Calibration
4. Memory Planning

============================== [Version Summary] ==============================

-------------------------------------------------------------------------------
| TIDL Tools Version | 10_01_00_01 |
-------------------------------------------------------------------------------
| C7x Firmware Version | 10_01_00_01 |
-------------------------------------------------------------------------------
| Runtime Version | 1.15.0 |
-------------------------------------------------------------------------------
| Model Opset Version | 11 |
-------------------------------------------------------------------------------

NOTE: The runtime version here specifies ONNXRT_VERSION+TIDL_VERSION
Ex: 1.14.0+1000XXXX -> ONNXRT 1.14.0 and a TIDL_VERSION 10.00.XX.XX

============================== [Parsing Started] ==============================

[TIDL Import] [PARSER] WARNING: Network not identified as Object Detection network : (1) Ignore if network is not Object Detection network (2) If network is Object Detection network, please specify "model_type":"OD" as part of OSRT compilation options
============================= [Parsing Completed] =============================


------------------------- Subgraph Information Summary -------------------------
-------------------------------------------------------------------------------
| Core | No. of Nodes | Number of Subgraphs |
-------------------------------------------------------------------------------
| C7x | 52 | 1 |
| CPU | 0 | x |
-------------------------------------------------------------------------------
========================= [Model Compilation Started] =========================

Model compilation will perform the following stages:
1. Parsing
2. Graph Optimization
3. Quantization & Calibration
4. Memory Planning

============================== [Version Summary] ==============================

-------------------------------------------------------------------------------
| TIDL Tools Version | 10_01_00_01 |
-------------------------------------------------------------------------------
| C7x Firmware Version | 10_01_00_01 |
-------------------------------------------------------------------------------
| Runtime Version | 1.15.0 |
-------------------------------------------------------------------------------
| Model Opset Version | 11 |
-------------------------------------------------------------------------------

NOTE: The runtime version here specifies ONNXRT_VERSION+TIDL_VERSION
Ex: 1.14.0+1000XXXX -> ONNXRT 1.14.0 and a TIDL_VERSION 10.00.XX.XX

============================== [Parsing Started] ==============================

[TIDL Import] [PARSER] WARNING: Network not identified as Object Detection network : (1) Ignore if network is not Object Detection network (2) If network is Object Detection network, please specify "model_type":"OD" as part of OSRT compilation options
============================= [Parsing Completed] =============================


------------------------- Subgraph Information Summary -------------------------
-------------------------------------------------------------------------------
| Core | No. of Nodes | Number of Subgraphs |
-------------------------------------------------------------------------------
| C7x | 52 | 1 |
| CPU | 0 | x |
-------------------------------------------------------------------------------
============================= [Parsing Completed] =============================

==================== [Optimization for subgraph_0 Started] ====================

[TIDL Import] [PARSER] WARNING: Requested output data convert layer is not added to the network, It is currently not optimal
----------------------------- Optimization Summary -----------------------------
--------------------------------------------------------------------------------
| Layer | Nodes before optimization | Nodes after optimization |
--------------------------------------------------------------------------------
| TIDL_ArgMaxLayer | 1 | 1 |
| TIDL_ConcatLayer | 2 | 2 |
| TIDL_ReLULayer | 43 | 0 |
| TIDL_ResizeLayer | 2 | 2 |
| TIDL_ConvolutionLayer | 62 | 62 |
| TIDL_EltWiseLayer | 12 | 10 |
| TIDL_CastLayer | 2 | 0 |
--------------------------------------------------------------------------------

=================== [Optimization for subgraph_0 Completed] ===================

The soft limit is 10240
The hard limit is 10240
MEM: Init ... !!!
MEM: Init ... Done !!!
 0.0s: VX_ZONE_INIT:Enabled
 0.6s: VX_ZONE_ERROR:Enabled
 0.8s: VX_ZONE_WARNING:Enabled
 0.3462s: VX_ZONE_INIT:[tivxInit:190] Initialization Done !!!
============= [Quantization & Calibration for subgraph_0 Started] =============

Accelerator Fatal Error: This file was compiled: -acc=gpu -gpu=cc50 -gpu=cc60 -gpu=cc60 -gpu=cc70 -gpu=cc75 -gpu=cc80 -gpu=cc80 -gpu=cc86 -gpu=cc90 -acc=host o
Rebuild this file with -gpu=cc89 to use NVIDIA Tesla GPU 0
Rebuild this file with -gpu=cc89 to use NVIDIA Tesla GPU 1
 File: /sdk/OSRT/Build/J784S4/c7x-mma-tidl/ti_dl/algo/src/ref/tidl_conv2d_base_ref.c
 Function: _Z24TIDL_refConv2dKernelFastILi3EffffEvPT0_PT1_PT2_PT3_S7_S7_iiiiiiiiiiiiiiiiiiijiiiiii:462
 Line: 472

==================== [Optimization for subgraph_0 Started] ====================

==================== [Optimization for subgraph_0 Started] ====================

TIDL Meta pipeLine (proto) file : ../../../models/public/ssd-lite_mobilenetv2_fpn.prototxt 
ssd
[TIDL Import] WARNING: Image dimensions is not provided, Please provide it as part of prior_box_param in form of either (img_w & img_h) or img_size. Proceeding with img_w = 512 and img_h = 512 in prior box decoding
[TIDL Import] WARNING: Image dimensions is not provided, Please provide it as part of prior_box_param in form of either (img_w & img_h) or img_size. Proceeding with img_w = 512 and img_h = 512 in prior box decoding
[TIDL Import] WARNING: Image dimensions is not provided, Please provide it as part of prior_box_param in form of either (img_w & img_h) or img_size. Proceeding with img_w = 512 and img_h = 512 in prior box decoding
[TIDL Import] WARNING: Image dimensions is not provided, Please provide it as part of prior_box_param in form of either (img_w & img_h) or img_size. Proceeding with img_w = 512 and img_h = 512 in prior box decoding
[TIDL Import] WARNING: Image dimensions is not provided, Please provide it as part of prior_box_param in form of either (img_w & img_h) or img_size. Proceeding with img_w = 512 and img_h = 512 in prior box decoding
[TIDL Import] WARNING: Image dimensions is not provided, Please provide it as part of prior_box_param in form of either (img_w & img_h) or img_size. Proceeding with img_w = 512 and img_h = 512 in prior box decoding
==================== [Optimization for subgraph_0 Started] ====================

----------------------------- Optimization Summary -----------------------------
---------------------------------------------------------------------------------
| Layer | Nodes before optimization | Nodes after optimization |
---------------------------------------------------------------------------------
| TIDL_ReLULayer | 17 | 0 |
| TIDL_FlattenLayer | 1 | 0 |
| TIDL_ConvolutionLayer | 20 | 20 |
| TIDL_EltWiseLayer | 10 | 8 |
| TIDL_InnerProductLayer | 1 | 1 |
| TIDL_CastLayer | 1 | 0 |
| TIDL_PoolingLayer | 2 | 2 |
---------------------------------------------------------------------------------

=================== [Optimization for subgraph_0 Completed] ===================

----------------------------- Optimization Summary -----------------------------
---------------------------------------------------------------------------------
| Layer | Nodes before optimization | Nodes after optimization |
---------------------------------------------------------------------------------
| TIDL_ReLULayer | 17 | 0 |
| TIDL_FlattenLayer | 1 | 0 |
| TIDL_ConvolutionLayer | 20 | 20 |
| TIDL_EltWiseLayer | 10 | 8 |
| TIDL_InnerProductLayer | 1 | 1 |
| TIDL_CastLayer | 1 | 0 |
| TIDL_PoolingLayer | 2 | 2 |
---------------------------------------------------------------------------------

=================== [Optimization for subgraph_0 Completed] ===================

The soft limit is 10240
The hard limit is 10240
MEM: Init ... !!!
MEM: Init ... Done !!!
 0.0s: VX_ZONE_INIT:Enabled
 0.7s: VX_ZONE_ERROR:Enabled
 0.9s: VX_ZONE_WARNING:Enabled
 0.3726s: VX_ZONE_INIT:[tivxInit:190] Initialization Done !!!
============= [Quantization & Calibration for subgraph_0 Started] =============

The soft limit is 10240
The hard limit is 10240
MEM: Init ... !!!
MEM: Init ... Done !!!
 0.0s: VX_ZONE_INIT:Enabled
 0.7s: VX_ZONE_ERROR:Enabled
 0.11s: VX_ZONE_WARNING:Enabled
 0.4235s: VX_ZONE_INIT:[tivxInit:190] Initialization Done !!!
[TIDL Import] [PARSER] WARNING: Requested output data convert layer is not added to the network, It is currently not optimal
[TIDL Import] [PARSER] WARNING: Requested output data convert layer is not added to the network, It is currently not optimal
Accelerator Fatal Error: This file was compiled: -acc=gpu -gpu=cc50 -gpu=cc60 -gpu=cc60 -gpu=cc70 -gpu=cc75 -gpu=cc80 -gpu=cc80 -gpu=cc86 -gpu=cc90 -acc=host o
Rebuild this file with -gpu=cc89 to use NVIDIA Tesla GPU 0
Rebuild this file with -gpu=cc89 to use NVIDIA Tesla GPU 1
 File: /sdk/OSRT/Build/J784S4/c7x-mma-tidl/ti_dl/algo/src/ref/tidl_conv2d_base_ref.c
 Function: _Z20TIDL_refConv2dKernelIffffEvPKT_PKT0_PKT1_PT2_SA_SA_iiiiiijjiiiiiiiiiiijiiiiii:300
 Line: 309

============= [Quantization & Calibration for subgraph_0 Started] =============

----------------------------- Optimization Summary -----------------------------
-------------------------------------------------------------------------------------
| Layer | Nodes before optimization | Nodes after optimization |
-------------------------------------------------------------------------------------
| TIDL_OdOutputReformatLayer | 0 | 2 |
| TIDL_ReLULayer | 52 | 0 |
| TIDL_ResizeLayer | 2 | 2 |
| TIDL_ConvolutionLayer | 90 | 90 |
| TIDL_EltWiseLayer | 14 | 12 |
| TIDL_DetectionOutputLayer | 0 | 1 |
| TIDL_CastLayer | 1 | 0 |
-------------------------------------------------------------------------------------

=================== [Optimization for subgraph_0 Completed] ===================

Accelerator Fatal Error: This file was compiled: -acc=gpu -gpu=cc50 -gpu=cc60 -gpu=cc60 -gpu=cc70 -gpu=cc75 -gpu=cc80 -gpu=cc80 -gpu=cc86 -gpu=cc90 -acc=host o
Rebuild this file with -gpu=cc89 to use NVIDIA Tesla GPU 0
Rebuild this file with -gpu=cc89 to use NVIDIA Tesla GPU 1
 File: /sdk/OSRT/Build/J784S4/c7x-mma-tidl/ti_dl/algo/src/ref/tidl_conv2d_base_ref.c
 Function: _Z20TIDL_refConv2dKernelIffffEvPKT_PKT0_PKT1_PT2_SA_SA_iiiiiijjiiiiiiiiiiijiiiiii:300
 Line: 309

The soft limit is 10240
The hard limit is 10240
MEM: Init ... !!!
MEM: Init ... Done !!!
 0.0s: VX_ZONE_INIT:Enabled
 0.7s: VX_ZONE_ERROR:Enabled
 0.9s: VX_ZONE_WARNING:Enabled
 0.3514s: VX_ZONE_INIT:[tivxInit:190] Initialization Done !!!
============= [Quantization & Calibration for subgraph_0 Started] =============

Accelerator Fatal Error: This file was compiled: -acc=gpu -gpu=cc50 -gpu=cc60 -gpu=cc60 -gpu=cc70 -gpu=cc75 -gpu=cc80 -gpu=cc80 -gpu=cc86 -gpu=cc90 -acc=host o
Rebuild this file with -gpu=cc89 to use NVIDIA Tesla GPU 0
Rebuild this file with -gpu=cc89 to use NVIDIA Tesla GPU 1
 File: /sdk/OSRT/Build/J784S4/c7x-mma-tidl/ti_dl/algo/src/ref/tidl_conv2d_base_ref.c
 Function: _Z24TIDL_refConv2dKernelFastILi3EffffEvPT0_PT1_PT2_PT3_S7_S7_iiiiiiiiiiiiiiiiiiijiiiiii:462
 Line: 472
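
Separately from the accelerator error, the log above also contains an ONNX ValidationError ("Unable to parse proto from file") for resnet18_opset9.onnx. I do not know the cause. A minimal sketch of a first check, assuming the problem could be a missing, empty, or non-binary file (for example an HTML error page or a Git LFS pointer saved in place of the model), is below. The function name and the marker list are my own illustration and use only the Python standard library.

```python
from pathlib import Path

# Byte prefixes that suggest a text file was saved where a binary model was expected.
# This list is an assumption for illustration, not an exhaustive check.
TEXT_MARKERS = (b"<!doctype", b"<html", b"version https://git-lfs")


def describe_model_file(path) -> str:
    """Classify a model file as 'missing', 'empty', 'text-not-protobuf', or 'plausible'."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    if p.stat().st_size == 0:
        return "empty"
    with p.open("rb") as f:
        # Look only at the first bytes, ignoring leading whitespace and letter case.
        head = f.read(64).lstrip().lower()
    if head.startswith(TEXT_MARKERS):
        return "text-not-protobuf"
    # A 'plausible' result does not prove the protobuf is valid, only that
    # none of the cheap failure signs above were found.
    return "plausible"


if __name__ == "__main__":
    # Path taken from the log; it is relative to the directory the example runs in.
    print(describe_model_file("../../../models/public/resnet18_opset9.onnx"))
```

A "plausible" result only rules out the simple cases; a truncated download would still need to be checked by reloading the model with onnx itself.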