Hi,
I am encountering errors while setting up the edgeai-tidl-tools environment for GPU-based PTQ and model compilation, and I am unable to proceed with testing the provided examples.
Could you please help me identify the issue? The main error is:
Accelerator Fatal Error: This file was compiled: -acc=gpu -gpu=cc50 -gpu=cc60 -gpu=cc60 -gpu=cc70 -gpu=cc75 -gpu=cc80 -gpu=cc80 -gpu=cc86 -gpu=cc90 -acc=host o
Rebuild this file with -gpu=cc89 to use NVIDIA Tesla GPU 0
The environment details and the full error log are below.
My local PC environment:
- Ubuntu 22.04
- CUDA 12.2
- NVIDIA driver 535.183.01
- GPU: NVIDIA RTX 4500 Ada
The edgeai-tidl-tools Docker environment has not been modified.
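The fatal error quoted above lists the GPU targets that tidl_conv2d_base_ref.c was compiled for (cc50, cc60, cc70, cc75, cc80, cc86 and cc90) and asks for cc89. The RTX 4500 Ada is an Ada Lovelace GPU with compute capability 8.9, so it is not among those targets. The shell sketch below checks such a target list against a given compute capability. `cc_in_build_targets` is a hypothetical helper, not part of edgeai-tidl-tools, and the `nvidia-smi` query in the comment assumes a driver that supports the `compute_cap` field.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of edgeai-tidl-tools): succeed if a compute
# capability such as "89" appears among the -gpu=ccXX targets in an
# "Accelerator Fatal Error" line, fail otherwise.
cc_in_build_targets() {
  local line="$1" want="$2"
  local targets
  # Pull out every ccNN token, drop the "cc" prefix, and de-duplicate.
  targets=$(printf '%s\n' "$line" | grep -oE 'cc[0-9]+' | sed 's/^cc//' | sort -u)
  # Exact whole-line match against the wanted capability.
  printf '%s\n' "$targets" | grep -qx "$want"
}

# First line of the fatal error from the log (trailing truncation omitted).
err='Accelerator Fatal Error: This file was compiled: -acc=gpu -gpu=cc50 -gpu=cc60 -gpu=cc60 -gpu=cc70 -gpu=cc75 -gpu=cc80 -gpu=cc80 -gpu=cc86 -gpu=cc90 -acc=host'

# The local GPU's capability can usually be read with (driver-dependent):
#   nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
local_cc=89   # RTX 4500 Ada (Ada Lovelace) is compute capability 8.9

if cc_in_build_targets "$err" "$local_cc"; then
  echo "cc${local_cc} is a build target"
else
  echo "cc${local_cc} is NOT a build target"
fi
```

Run against the error line from this log, it reports that cc89 is not a build target, which matches the "Rebuild this file with -gpu=cc89" message.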
Here are the commands used to build and run the examples:
mkdir build && cd build
cmake ../examples && make -j && cd ..
source ./scripts/run_python_examples.sh
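Because the five models run in separate processes, their output is interleaved in the log. One way to make a run like this easier to report is to capture it to a file and count the distinct error lines. This is a minimal bash sketch; the helper `summarize_errors`, the file name `run.log`, and the match patterns are illustrative assumptions, not part of the TIDL scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each distinct error-like line in a log file,
# prefixed by how often it occurs, most frequent first.
summarize_errors() {
  grep -E 'Fatal Error|Rebuild this file|ValidationError|Traceback' "$1" \
    | sort | uniq -c | sort -rn
}

# Example usage inside the container, from /home/root:
#   source ./scripts/run_python_examples.sh 2>&1 | tee run.log
#   summarize_errors run.log
```

Sourcing the script inside a pipeline runs it in a subshell, so any environment changes it makes are not kept afterwards; that is harmless when the goal is only to capture output.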
Here is the error log:
root@93fa3515e700:/home/root# source ./scripts/run_python_examples.sh
X64 Architecture
1
Available execution providers : ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
Running 5 Models - ['cl-ort-resnet18-v1', 'od-ort-ssd-lite_mobilenetv2_fpn', 'cl-ort-resnet18-v1_4batch', 'cl-ort-resnet18-v1_low_latency', 'ss-ort-deeplabv3lite_mobilenetv2']
Running_Model : cl-ort-resnet18-v1
Running_Model : od-ort-ssd-lite_mobilenetv2_fpn
Running_Model : cl-ort-resnet18-v1_4batch
Running_Model : cl-ort-resnet18-v1_low_latency
Running_Model : ss-ort-deeplabv3lite_mobilenetv2
Running shape inference on model ../../../models/public/resnet18_opset9.onnx
Running shape inference on model ../../../models/public/ssd-lite_mobilenetv2_fpn.onnx
Running shape inference on model ../../../models/public/resnet18_opset9_4batch.onnx
Running shape inference on model ../../../models/public/resnet18_opset9.onnx
Running shape inference on model ../../../models/public/deeplabv3lite_mobilenetv2.onnx
Process Process-4:
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/root/examples/osrt_python/ort/onnxrt_ep.py", line 308, in run_model
onnx.shape_inference.infer_shapes_path(
File "/usr/local/lib/python3.10/dist-packages/onnx/shape_inference.py", line 79, in infer_shapes_path
C.infer_shapes_path(model_path, output_path, check_type, strict_mode, data_prop)
onnx.onnx_cpp2py_export.checker.ValidationError: Unable to parse proto from file: ../../../models/public/resnet18_opset9.onnx. Please check if it is a valid protobuf file of proto.
========================= [Model Compilation Started] =========================
Model compilation will perform the following stages:
1. Parsing
2. Graph Optimization
3. Quantization & Calibration
4. Memory Planning
============================== [Version Summary] ==============================
-------------------------------------------------------------------------------
| TIDL Tools Version | 10_01_00_01 |
-------------------------------------------------------------------------------
| C7x Firmware Version | 10_01_00_01 |
-------------------------------------------------------------------------------
| Runtime Version | 1.15.0 |
-------------------------------------------------------------------------------
| Model Opset Version | 11 |
-------------------------------------------------------------------------------
NOTE: The runtime version here specifies ONNXRT_VERSION+TIDL_VERSION
Ex: 1.14.0+1000XXXX -> ONNXRT 1.14.0 and a TIDL_VERSION 10.00.XX.XX
============================== [Parsing Started] ==============================
[TIDL Import] [PARSER] WARNING: Network not identified as Object Detection network : (1) Ignore if network is not Object Detection network (2) If network is Object Detection network, please specify "model_type":"OD" as part of OSRT compilation options
------------------------- Subgraph Information Summary -------------------------
-------------------------------------------------------------------------------
| Core | No. of Nodes | Number of Subgraphs |
-------------------------------------------------------------------------------
| C7x | 124 | 1 |
| CPU | 0 | x |
-------------------------------------------------------------------------------
========================= [Model Compilation Started] =========================
Model compilation will perform the following stages:
1. Parsing
2. Graph Optimization
3. Quantization & Calibration
4. Memory Planning
============================== [Version Summary] ==============================
-------------------------------------------------------------------------------
| TIDL Tools Version | 10_01_00_01 |
-------------------------------------------------------------------------------
| C7x Firmware Version | 10_01_00_01 |
-------------------------------------------------------------------------------
| Runtime Version | 1.15.0 |
-------------------------------------------------------------------------------
| Model Opset Version | 11 |
-------------------------------------------------------------------------------
NOTE: The runtime version here specifies ONNXRT_VERSION+TIDL_VERSION
Ex: 1.14.0+1000XXXX -> ONNXRT 1.14.0 and a TIDL_VERSION 10.00.XX.XX
============================== [Parsing Started] ==============================
ssd is meta arch name
Number of OD backbone nodes = 159
Size of odBackboneNodeIds = 159
============================= [Parsing Completed] =============================
------------------------- Subgraph Information Summary -------------------------
-------------------------------------------------------------------------------
| Core | No. of Nodes | Number of Subgraphs |
-------------------------------------------------------------------------------
| C7x | 478 | 1 |
| CPU | 0 | x |
-------------------------------------------------------------------------------
========================= [Model Compilation Started] =========================
Model compilation will perform the following stages:
1. Parsing
2. Graph Optimization
3. Quantization & Calibration
4. Memory Planning
============================== [Version Summary] ==============================
-------------------------------------------------------------------------------
| TIDL Tools Version | 10_01_00_01 |
-------------------------------------------------------------------------------
| C7x Firmware Version | 10_01_00_01 |
-------------------------------------------------------------------------------
| Runtime Version | 1.15.0 |
-------------------------------------------------------------------------------
| Model Opset Version | 11 |
-------------------------------------------------------------------------------
NOTE: The runtime version here specifies ONNXRT_VERSION+TIDL_VERSION
Ex: 1.14.0+1000XXXX -> ONNXRT 1.14.0 and a TIDL_VERSION 10.00.XX.XX
============================== [Parsing Started] ==============================
[TIDL Import] [PARSER] WARNING: Network not identified as Object Detection network : (1) Ignore if network is not Object Detection network (2) If network is Object Detection network, please specify "model_type":"OD" as part of OSRT compilation options
============================= [Parsing Completed] =============================
------------------------- Subgraph Information Summary -------------------------
-------------------------------------------------------------------------------
| Core | No. of Nodes | Number of Subgraphs |
-------------------------------------------------------------------------------
| C7x | 52 | 1 |
| CPU | 0 | x |
-------------------------------------------------------------------------------
========================= [Model Compilation Started] =========================
Model compilation will perform the following stages:
1. Parsing
2. Graph Optimization
3. Quantization & Calibration
4. Memory Planning
============================== [Version Summary] ==============================
-------------------------------------------------------------------------------
| TIDL Tools Version | 10_01_00_01 |
-------------------------------------------------------------------------------
| C7x Firmware Version | 10_01_00_01 |
-------------------------------------------------------------------------------
| Runtime Version | 1.15.0 |
-------------------------------------------------------------------------------
| Model Opset Version | 11 |
-------------------------------------------------------------------------------
NOTE: The runtime version here specifies ONNXRT_VERSION+TIDL_VERSION
Ex: 1.14.0+1000XXXX -> ONNXRT 1.14.0 and a TIDL_VERSION 10.00.XX.XX
============================== [Parsing Started] ==============================
[TIDL Import] [PARSER] WARNING: Network not identified as Object Detection network : (1) Ignore if network is not Object Detection network (2) If network is Object Detection network, please specify "model_type":"OD" as part of OSRT compilation options
============================= [Parsing Completed] =============================
------------------------- Subgraph Information Summary -------------------------
-------------------------------------------------------------------------------
| Core | No. of Nodes | Number of Subgraphs |
-------------------------------------------------------------------------------
| C7x | 52 | 1 |
| CPU | 0 | x |
-------------------------------------------------------------------------------
============================= [Parsing Completed] =============================
==================== [Optimization for subgraph_0 Started] ====================
[TIDL Import] [PARSER] WARNING: Requested output data convert layer is not added to the network, It is currently not optimal
----------------------------- Optimization Summary -----------------------------
--------------------------------------------------------------------------------
| Layer | Nodes before optimization | Nodes after optimization |
--------------------------------------------------------------------------------
| TIDL_ArgMaxLayer | 1 | 1 |
| TIDL_ConcatLayer | 2 | 2 |
| TIDL_ReLULayer | 43 | 0 |
| TIDL_ResizeLayer | 2 | 2 |
| TIDL_ConvolutionLayer | 62 | 62 |
| TIDL_EltWiseLayer | 12 | 10 |
| TIDL_CastLayer | 2 | 0 |
--------------------------------------------------------------------------------
=================== [Optimization for subgraph_0 Completed] ===================
The soft limit is 10240
The hard limit is 10240
MEM: Init ... !!!
MEM: Init ... Done !!!
0.0s: VX_ZONE_INIT:Enabled
0.6s: VX_ZONE_ERROR:Enabled
0.8s: VX_ZONE_WARNING:Enabled
0.3462s: VX_ZONE_INIT:[tivxInit:190] Initialization Done !!!
============= [Quantization & Calibration for subgraph_0 Started] =============
Accelerator Fatal Error: This file was compiled: -acc=gpu -gpu=cc50 -gpu=cc60 -gpu=cc60 -gpu=cc70 -gpu=cc75 -gpu=cc80 -gpu=cc80 -gpu=cc86 -gpu=cc90 -acc=host o
Rebuild this file with -gpu=cc89 to use NVIDIA Tesla GPU 0
Rebuild this file with -gpu=cc89 to use NVIDIA Tesla GPU 1
File: /sdk/OSRT/Build/J784S4/c7x-mma-tidl/ti_dl/algo/src/ref/tidl_conv2d_base_ref.c
Function: _Z24TIDL_refConv2dKernelFastILi3EffffEvPT0_PT1_PT2_PT3_S7_S7_iiiiiiiiiiiiiiiiiiijiiiiii:462
Line: 472
==================== [Optimization for subgraph_0 Started] ====================
==================== [Optimization for subgraph_0 Started] ====================
TIDL Meta pipeLine (proto) file : ../../../models/public/ssd-lite_mobilenetv2_fpn.prototxt
ssd
[TIDL Import] WARNING: Image dimensions is not provided, Please provide it as part of prior_box_param in form of either (img_w & img_h) or img_size. Proceeding with img_w = 512 and img_h = 512 in prior box decoding
[TIDL Import] WARNING: Image dimensions is not provided, Please provide it as part of prior_box_param in form of either (img_w & img_h) or img_size. Proceeding with img_w = 512 and img_h = 512 in prior box decoding
[TIDL Import] WARNING: Image dimensions is not provided, Please provide it as part of prior_box_param in form of either (img_w & img_h) or img_size. Proceeding with img_w = 512 and img_h = 512 in prior box decoding
[TIDL Import] WARNING: Image dimensions is not provided, Please provide it as part of prior_box_param in form of either (img_w & img_h) or img_size. Proceeding with img_w = 512 and img_h = 512 in prior box decoding
[TIDL Import] WARNING: Image dimensions is not provided, Please provide it as part of prior_box_param in form of either (img_w & img_h) or img_size. Proceeding with img_w = 512 and img_h = 512 in prior box decoding
[TIDL Import] WARNING: Image dimensions is not provided, Please provide it as part of prior_box_param in form of either (img_w & img_h) or img_size. Proceeding with img_w = 512 and img_h = 512 in prior box decoding
==================== [Optimization for subgraph_0 Started] ====================
----------------------------- Optimization Summary -----------------------------
---------------------------------------------------------------------------------
| Layer | Nodes before optimization | Nodes after optimization |
---------------------------------------------------------------------------------
| TIDL_ReLULayer | 17 | 0 |
| TIDL_FlattenLayer | 1 | 0 |
| TIDL_ConvolutionLayer | 20 | 20 |
| TIDL_EltWiseLayer | 10 | 8 |
| TIDL_InnerProductLayer | 1 | 1 |
| TIDL_CastLayer | 1 | 0 |
| TIDL_PoolingLayer | 2 | 2 |
---------------------------------------------------------------------------------
=================== [Optimization for subgraph_0 Completed] ===================
----------------------------- Optimization Summary -----------------------------
---------------------------------------------------------------------------------
| Layer | Nodes before optimization | Nodes after optimization |
---------------------------------------------------------------------------------
| TIDL_ReLULayer | 17 | 0 |
| TIDL_FlattenLayer | 1 | 0 |
| TIDL_ConvolutionLayer | 20 | 20 |
| TIDL_EltWiseLayer | 10 | 8 |
| TIDL_InnerProductLayer | 1 | 1 |
| TIDL_CastLayer | 1 | 0 |
| TIDL_PoolingLayer | 2 | 2 |
---------------------------------------------------------------------------------
=================== [Optimization for subgraph_0 Completed] ===================
The soft limit is 10240
The hard limit is 10240
MEM: Init ... !!!
MEM: Init ... Done !!!
0.0s: VX_ZONE_INIT:Enabled
0.7s: VX_ZONE_ERROR:Enabled
0.9s: VX_ZONE_WARNING:Enabled
0.3726s: VX_ZONE_INIT:[tivxInit:190] Initialization Done !!!
============= [Quantization & Calibration for subgraph_0 Started] =============
The soft limit is 10240
The hard limit is 10240
MEM: Init ... !!!
MEM: Init ... Done !!!
0.0s: VX_ZONE_INIT:Enabled
0.7s: VX_ZONE_ERROR:Enabled
0.11s: VX_ZONE_WARNING:Enabled
0.4235s: VX_ZONE_INIT:[tivxInit:190] Initialization Done !!!
[TIDL Import] [PARSER] WARNING: Requested output data convert layer is not added to the network, It is currently not optimal
[TIDL Import] [PARSER] WARNING: Requested output data convert layer is not added to the network, It is currently not optimal
Accelerator Fatal Error: This file was compiled: -acc=gpu -gpu=cc50 -gpu=cc60 -gpu=cc60 -gpu=cc70 -gpu=cc75 -gpu=cc80 -gpu=cc80 -gpu=cc86 -gpu=cc90 -acc=host o
Rebuild this file with -gpu=cc89 to use NVIDIA Tesla GPU 0
Rebuild this file with -gpu=cc89 to use NVIDIA Tesla GPU 1
File: /sdk/OSRT/Build/J784S4/c7x-mma-tidl/ti_dl/algo/src/ref/tidl_conv2d_base_ref.c
Function: _Z20TIDL_refConv2dKernelIffffEvPKT_PKT0_PKT1_PT2_SA_SA_iiiiiijjiiiiiiiiiiijiiiiii:300
Line: 309
============= [Quantization & Calibration for subgraph_0 Started] =============
----------------------------- Optimization Summary -----------------------------
-------------------------------------------------------------------------------------
| Layer | Nodes before optimization | Nodes after optimization |
-------------------------------------------------------------------------------------
| TIDL_OdOutputReformatLayer | 0 | 2 |
| TIDL_ReLULayer | 52 | 0 |
| TIDL_ResizeLayer | 2 | 2 |
| TIDL_ConvolutionLayer | 90 | 90 |
| TIDL_EltWiseLayer | 14 | 12 |
| TIDL_DetectionOutputLayer | 0 | 1 |
| TIDL_CastLayer | 1 | 0 |
-------------------------------------------------------------------------------------
=================== [Optimization for subgraph_0 Completed] ===================
Accelerator Fatal Error: This file was compiled: -acc=gpu -gpu=cc50 -gpu=cc60 -gpu=cc60 -gpu=cc70 -gpu=cc75 -gpu=cc80 -gpu=cc80 -gpu=cc86 -gpu=cc90 -acc=host o
Rebuild this file with -gpu=cc89 to use NVIDIA Tesla GPU 0
Rebuild this file with -gpu=cc89 to use NVIDIA Tesla GPU 1
File: /sdk/OSRT/Build/J784S4/c7x-mma-tidl/ti_dl/algo/src/ref/tidl_conv2d_base_ref.c
Function: _Z20TIDL_refConv2dKernelIffffEvPKT_PKT0_PKT1_PT2_SA_SA_iiiiiijjiiiiiiiiiiijiiiiii:300
Line: 309
The soft limit is 10240
The hard limit is 10240
MEM: Init ... !!!
MEM: Init ... Done !!!
0.0s: VX_ZONE_INIT:Enabled
0.7s: VX_ZONE_ERROR:Enabled
0.9s: VX_ZONE_WARNING:Enabled
0.3514s: VX_ZONE_INIT:[tivxInit:190] Initialization Done !!!
============= [Quantization & Calibration for subgraph_0 Started] =============
Accelerator Fatal Error: This file was compiled: -acc=gpu -gpu=cc50 -gpu=cc60 -gpu=cc60 -gpu=cc70 -gpu=cc75 -gpu=cc80 -gpu=cc80 -gpu=cc86 -gpu=cc90 -acc=host o
Rebuild this file with -gpu=cc89 to use NVIDIA Tesla GPU 0
Rebuild this file with -gpu=cc89 to use NVIDIA Tesla GPU 1
File: /sdk/OSRT/Build/J784S4/c7x-mma-tidl/ti_dl/algo/src/ref/tidl_conv2d_base_ref.c
Function: _Z24TIDL_refConv2dKernelFastILi3EffffEvPT0_PT1_PT2_PT3_S7_S7_iiiiiiiiiiiiiiiiiiijiiiiii:462
Line: 472
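The shape-inference failure is separate from the GPU error: onnx could not parse ../../../models/public/resnet18_opset9.onnx, and the log shows two of the five models (cl-ort-resnet18-v1 and cl-ort-resnet18-v1_low_latency) running shape inference on that same file. The log does not say whether the file is missing, empty, or only partially written. The sketch below checks just the first two cases; `check_model_file` is a hypothetical helper, and the optional `onnx.checker.check_model` call in the comment assumes the onnx Python package is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: basic sanity checks on a model file.
# Returns 0 if the file exists and is non-empty, 1 otherwise.
check_model_file() {
  local f="$1"
  if [ ! -e "$f" ]; then
    echo "missing: $f"
    return 1
  fi
  if [ ! -s "$f" ]; then
    echo "empty: $f"
    return 1
  fi
  echo "ok ($(wc -c < "$f") bytes): $f"
  # Stricter structural check, if the onnx Python package is available:
  #   python3 -c 'import onnx, sys; onnx.checker.check_model(sys.argv[1])' "$f"
}

# Example (path as printed in the log; adjust to the actual working directory):
#   check_model_file ../../../models/public/resnet18_opset9.onnx
```

A non-empty file can still be truncated or corrupt, so passing this check does not rule out the parse error; the onnx checker line is the more meaningful test.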