J721EXSOMXEVM: Unable to compile custom convolution model

Vaibhav Kashera

Greetings,

I have been trying to compile a fully convolutional neural network for text recognition using edgeai-tidl-tools. The model runs fine when using only CPUExecutionProvider in the EP_list, but fails when we try to compile it with TIDLExecutionProvider.

Here are the compile options:

import os

output_dir = 'fcnn-artifacts'
num_bits = 8
accuracy = 1
onnx_model_path = 'fcnn.onnx'

compile_options = {
    'tidl_tools_path': os.environ['TIDL_TOOLS_PATH'],
    'artifacts_folder': output_dir,
    'tensor_bits': num_bits,
    'accuracy_level': accuracy,
    'advanced_options:calibration_frames': len(calib_images),
    'advanced_options:calibration_iterations': 3,  # used if accuracy_level = 1
    'debug_level': 3,
    # Comma separated string of operator types as defined by ONNX runtime, ex "MaxPool, Concat"
    'deny_list': ""
}

Here are the logs during the segmentation fault:

Running shape inference on model ../../../models/public/vgg_fcn_text_opset11.onnx 

tidl_tools_path                                 = /home/root/tidl_tools 
artifacts_folder                                = ../../../model-artifacts//fcnn/ 
tidl_tensor_bits                                = 16 
debug_level                                     = 3 
num_tidl_subgraphs                              = 16 
tidl_denylist                                   = 
tidl_denylist_layer_name                        = 
tidl_denylist_layer_type                         = 
tidl_allowlist_layer_name                        = 
model_type                                      =  
tidl_calibration_accuracy_level                 = 7 
tidl_calibration_options:num_frames_calibration = 3 
tidl_calibration_options:bias_calibration_iterations = 5 
mixed_precision_factor = -1.000000 
model_group_id = 0 
power_of_2_quantization                         = 2 
enable_high_resolution_optimization             = 0 
pre_batchnorm_fold                              = 1 
add_data_convert_ops                          = 3 
output_feature_16bit_names_list                 =  
m_params_16bit_names_list                       =  
reserved_compile_constraints_flag               = 1601 
ti_internal_reserved_1                          = 

 ****** WARNING : Network not identified as Object Detection network : (1) Ignore if network is not Object Detection network (2) If network is Object Detection network, please specify "model_type":"OD" as part of OSRT compilation options******

Supported TIDL layer type ---         Reshape -- model/block1_conv1/BiasAdd__6 
Supported TIDL layer type ---            Conv -- model/block1_conv1/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block1_conv1/Relu 
Supported TIDL layer type ---            Conv -- model/block1_conv2/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block1_conv2/Relu 
Supported TIDL layer type ---         MaxPool -- model/block1_pool/MaxPool 
Supported TIDL layer type ---            Conv -- model/block2_conv1/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block2_conv1/Relu 
Supported TIDL layer type ---            Conv -- model/block2_conv2/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block2_conv2/Relu 
Supported TIDL layer type ---         MaxPool -- model/block2_pool/MaxPool 
Supported TIDL layer type ---            Conv -- model/block3_conv1/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block3_conv1/Relu 
Supported TIDL layer type ---            Conv -- model/block3_conv2/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block3_conv2/Relu 
Supported TIDL layer type ---            Conv -- model/block3_conv3/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block3_conv3/Relu 
Supported TIDL layer type ---         MaxPool -- model/block3_pool/MaxPool 
Supported TIDL layer type ---       Transpose -- Transpose__70 
Unsupported (import) TIDL layer type for ONNX op type --- Shape 
Unsupported (TIDL check) TIDL layer type ---          Gather 
Supported TIDL layer type ---            Cast -- model/reshape/Shape__46 

 Unsupported slice - axis parameters, in Slice -- model/reshape/strided_slice 
Unsupported (TIDL check) TIDL layer type ---           Slice 
Unsupported (TIDL check) TIDL layer type ---          Concat 
Supported TIDL layer type ---            Cast -- model/reshape/Reshape__55 
Segmentation fault (core dumped)
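
For anyone following along: the unsupported operators reported in the log above can be collected automatically instead of by eye. This is a minimal sketch, assuming the log lines keep the wording shown in this thread (other releases may print different text). The helper names `unsupported_ops` and `deny_list_string` are hypothetical and not part of edgeai-tidl-tools.

```python
import re

# Line formats copied from the import log in this thread (assumption: the
# wording is stable; "import" and "imports" both appear in the logs here).
_UNSUPPORTED_PATTERNS = [
    re.compile(r"Unsupported \(imports?\) TIDL layer type for ONNX op type ---\s*(\w+)"),
    re.compile(r"Unsupported \(TIDL check\) TIDL layer type ---\s*(\w+)"),
]


def unsupported_ops(log_text):
    """Return unsupported ONNX op types in first-seen order, without duplicates."""
    seen = []
    for line in log_text.splitlines():
        for pattern in _UNSUPPORTED_PATTERNS:
            match = pattern.search(line)
            if match and match.group(1) not in seen:
                seen.append(match.group(1))
    return seen


def deny_list_string(ops):
    """Join op types into the comma-separated form used by 'deny_list'."""
    return ", ".join(ops)


sample_log = """\
Supported TIDL layer type ---       Transpose -- Transpose__70
Unsupported (import) TIDL layer type for ONNX op type --- Shape
Unsupported (TIDL check) TIDL layer type ---          Gather
Supported TIDL layer type ---            Cast -- model/reshape/Shape__46
Unsupported (TIDL check) TIDL layer type ---           Slice
Unsupported (TIDL check) TIDL layer type ---          Concat
"""

print(deny_list_string(unsupported_ops(sample_log)))
```

The resulting string is only a starting point for 'deny_list'. Ops reported as supported may still need to be denied if they sit between unsupported ones.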

When we put certain layers into the deny list:

delegate_options['deny_list'] = "Shape, Concat, Slice, Gather, Reshape, Transpose"

**********  Frame Index 1 : Running float inference **********
2024-01-04 05:21:57.262513053 [E:onnxruntime:, sequential_executor.cc:339 Execute] Non-zero status code returned while running Concat node. Name:'model/reshape/Reshape/shape_Concat__54' Status Message: /home/a0496663/work/edgeaitidltools/rel90/onnx/onnxruntime_bit/onnxruntime/onnxruntime/core/providers/cpu/tensor/concat.cc:72 onnxruntime::common::Status onnxruntime::ConcatBase::PrepareForCompute(onnxruntime::OpKernelContext*, const std::vector<const onnxruntime::Tensor*>&, onnxruntime::Prepare&) const inputs_n_rank == inputs_0_rank was false. Ranks of input data are different, cannot concatenate them. expected rank: 4 got: 1

Traceback (most recent call last):
  File "/home/root/examples/osrt_python/ort/fcnn_onnxrt_ep.py", line 266, in <module>
    run_model(model, mIdx)
  File "/home/root/examples/osrt_python/ort/fcnn_onnxrt_ep.py", line 215, in run_model
    preds = sess.run([output_name], {input_name: test_img})
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 188, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Concat node. Name:'model/reshape/Reshape/shape_Concat__54' Status Message: /home/a0496663/work/edgeaitidltools/rel90/onnx/onnxruntime_bit/onnxruntime/onnxruntime/core/providers/cpu/tensor/concat.cc:72 onnxruntime::common::Status onnxruntime::ConcatBase::PrepareForCompute(onnxruntime::OpKernelContext*, const std::vector<const onnxruntime::Tensor*>&, onnxruntime::Prepare&) const inputs_n_rank == inputs_0_rank was false. Ranks of input data are different, cannot concatenate them. expected rank: 4 got: 1

************ in TIDL_subgraphRtDelete ************ 
 ************ in TIDL_subgraphRtDelete ************ 
 MEM: Deinit ... !!!
MEM: Alloc's: 54 alloc's of 183110924 bytes 
MEM: Free's : 54 free's  of 183110924 bytes 
MEM: Open's : 0 allocs  of 0 bytes 
MEM: Deinit ... Done !!!
************ in TIDL_subgraphRtDelete ************ 
Segmentation fault (core dumped)
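
A note on the Concat error above: ONNX Concat requires every input to have the same rank, and the message reports input 0 with rank 4 and a later input with rank 1. The sketch below only mirrors that rank check, to make the message easier to read. The shapes are hypothetical values chosen to reproduce the "expected rank: 4 got: 1" wording and are not taken from the actual model.

```python
def first_rank_mismatch(shapes):
    """Compare every input's rank against input 0, as the Concat check does.

    Returns (input_index, expected_rank, actual_rank) for the first input
    whose rank differs, or None when all ranks agree.
    """
    expected_rank = len(shapes[0])
    for index, shape in enumerate(shapes[1:], start=1):
        if len(shape) != expected_rank:
            return index, expected_rank, len(shape)
    return None


# Hypothetical shapes: a 4-D feature tensor followed by a 1-D shape vector.
inputs = [(1, 16, 64, 256), (2,)]

mismatch = first_rank_mismatch(inputs)
if mismatch is not None:
    index, expected, got = mismatch
    print(f"input {index}: expected rank: {expected} got: {got}")
```

Since the failing node builds the shape argument for a Reshape, its inputs would normally all be rank-1 vectors. A rank-4 first input may therefore mean that the wrong tensor is reaching this node after partitioning, but that is an assumption and not confirmed in this thread.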

We are unable to get past these internal compilation errors, even though the model is primarily a convolutional model with a few extra operators added by the conversion from TensorFlow to ONNX. Please advise on how to get past this error, as it is critical to our pipeline.

Regards,

Vaibhav Kashera

over 2 years ago

0 Pratik Kedar over 2 years ago

TI__Mastermind 24041 points

Hi,

Vaibhav Kashera said:
but fails when we try to compile it with TIDLExecutionProvider.

Can you explain how you are using TIDLExecutionProvider for model compilation? It is used with TIDL-RT for inference.

Can you share a few observations?

As I see it, the model fails right after Supported TIDL layer type --- Cast -- model/reshape/Reshape__55 when the deny list is empty.

Can you share the debug_level = 2 logs for model inference?

0 Vaibhav Kashera over 2 years ago in reply to Pratik Kedar

Prodigy 60 points

Hi Pratik,

Thank you for your reply. With the deny list left empty and debug_level set to 2, here are the logs:

Available execution providers :  ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']

Running 1 Models - ['fcnn']


Running_Model :  fcnn  


Running shape inference on model ../../../models/public/vgg_fcn_text_opset11.onnx 

tidl_tools_path                                 = /home/root/tidl_tools 
artifacts_folder                                = ../../../model-artifacts//fcnn/ 
tidl_tensor_bits                                = 16 
debug_level                                     = 2 
num_tidl_subgraphs                              = 16 
tidl_denylist                                   = 
tidl_denylist_layer_name                        = 
tidl_denylist_layer_type                         = 
tidl_allowlist_layer_name                        = 
model_type                                      =  
tidl_calibration_accuracy_level                 = 7 
tidl_calibration_options:num_frames_calibration = 3 
tidl_calibration_options:bias_calibration_iterations = 5 
mixed_precision_factor = -1.000000 
model_group_id = 0 
power_of_2_quantization                         = 2 
enable_high_resolution_optimization             = 0 
pre_batchnorm_fold                              = 1 
add_data_convert_ops                          = 3 
output_feature_16bit_names_list                 =  
m_params_16bit_names_list                       =  
reserved_compile_constraints_flag               = 1601 
ti_internal_reserved_1                          = 

 ****** WARNING : Network not identified as Object Detection network : (1) Ignore if network is not Object Detection network (2) If network is Object Detection network, please specify "model_type":"OD" as part of OSRT compilation options******

Supported TIDL layer type ---         Reshape -- model/block1_conv1/BiasAdd__6 
Supported TIDL layer type ---            Conv -- model/block1_conv1/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block1_conv1/Relu 
Supported TIDL layer type ---            Conv -- model/block1_conv2/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block1_conv2/Relu 
Supported TIDL layer type ---         MaxPool -- model/block1_pool/MaxPool 
Supported TIDL layer type ---            Conv -- model/block2_conv1/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block2_conv1/Relu 
Supported TIDL layer type ---            Conv -- model/block2_conv2/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block2_conv2/Relu 
Supported TIDL layer type ---         MaxPool -- model/block2_pool/MaxPool 
Supported TIDL layer type ---            Conv -- model/block3_conv1/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block3_conv1/Relu 
Supported TIDL layer type ---            Conv -- model/block3_conv2/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block3_conv2/Relu 
Supported TIDL layer type ---            Conv -- model/block3_conv3/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block3_conv3/Relu 
Supported TIDL layer type ---         MaxPool -- model/block3_pool/MaxPool 
Supported TIDL layer type ---       Transpose -- Transpose__70 
Unsupported (import) TIDL layer type for ONNX op type --- Shape 
Unsupported (TIDL check) TIDL layer type ---          Gather 
Supported TIDL layer type ---            Cast -- model/reshape/Shape__46 

 Unsupported slice - axis parameters, in Slice -- model/reshape/strided_slice 
Unsupported (TIDL check) TIDL layer type ---           Slice 
Unsupported (TIDL check) TIDL layer type ---          Concat 
Supported TIDL layer type ---            Cast -- model/reshape/Reshape__55 
Segmentation fault (core dumped)
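
The provider list printed at the top of this log includes both TIDLCompilationProvider and TIDLExecutionProvider, which relates to the earlier question about which provider is used for compilation. Below is a minimal sketch of choosing between them. It assumes compilation goes through TIDLCompilationProvider and inference through TIDLExecutionProvider, each with a CPU fallback. `select_providers` is a hypothetical helper, and the session call appears only as a comment because it needs TI's onnxruntime build.

```python
def select_providers(compile_mode, available):
    """Pick the execution-provider list for compilation or for inference.

    Assumption: 'TIDLCompilationProvider' is used to generate artifacts and
    'TIDLExecutionProvider' to run them, each followed by a CPU fallback.
    """
    wanted = "TIDLCompilationProvider" if compile_mode else "TIDLExecutionProvider"
    if wanted not in available:
        raise ValueError(f"{wanted} is not available in this onnxruntime build")
    return [wanted, "CPUExecutionProvider"]


# Provider names as printed in the log above.
available = ["TIDLExecutionProvider", "TIDLCompilationProvider", "CPUExecutionProvider"]

ep_list = select_providers(compile_mode=True, available=available)
print(ep_list)

# With TI's onnxruntime build, the list would then be passed along these lines
# (not executed here; 'compile_options' is the dict from the first post):
#   sess = onnxruntime.InferenceSession(
#       onnx_model_path,
#       providers=ep_list,
#       provider_options=[compile_options, {}],
#   )
```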

If any other information is needed on my end, please let me know.

Thanks,

Vaibhav Kashera

0 Vaibhav Kashera over 2 years ago in reply to Vaibhav Kashera

Prodigy 60 points

Dear Pratik,

Attaching the ONNX model here for your information as well. Again, we are very thankful for the effort you are putting in on your end, as this is an urgent issue we are facing.

vgg_fcn_text_opset11.onnx.zip

Regards,

Vaibhav Kashera

0 Pratik Kedar over 2 years ago in reply to Vaibhav Kashera

TI__Mastermind 24041 points

Hi,

Thanks for sharing the logs.

Can you share the SVG file for the compiled model?

In particular, I am interested in understanding which layer causes the seg fault (the layer after model/reshape/Reshape__55).

Before that, can you share which SDK tag of the edgeai-tidl-tools repo you are using? If it is not the latest (9.1.0.5), can you re-run the above experiment with that tag and share the updated logs and SVG file?

0 Vaibhav Kashera over 2 years ago in reply to Pratik Kedar

Prodigy 60 points

Hi Pratik,

Thanks for pointing out the newer SDK tag. However, even with the GitHub tag 09_01_00_05 of the latest tools, compilation still ends in a segmentation fault. The logs are included below.

Also, since the compilation ends in a segmentation fault, no SVG file is generated with this or older SDK versions. The only file created is allowedNode.txt, which is empty.

Kindly advise on the next debug steps.

Running shape inference on model ../../../models/public/vgg_fcn_text_opset11.onnx 

tidl_tools_path                                 = /home/root/tidl_tools 
artifacts_folder                                = ../../../model-artifacts//fcnn/ 
tidl_tensor_bits                                = 8 
debug_level                                     = 2 
num_tidl_subgraphs                              = 16 
tidl_denylist                                   = 
tidl_denylist_layer_name                        = 
tidl_denylist_layer_type                         = 
tidl_allowlist_layer_name                        = 
model_type                                      =  
tidl_calibration_accuracy_level                 = 7 
tidl_calibration_options:num_frames_calibration = 2 
tidl_calibration_options:bias_calibration_iterations = 5 
mixed_precision_factor = -1.000000 
model_group_id = 0 
power_of_2_quantization                         = 2 
ONNX QDQ Enabled                                = 0 
enable_high_resolution_optimization             = 0 
pre_batchnorm_fold                              = 1 
add_data_convert_ops                          = 3 
output_feature_16bit_names_list                 =  
m_params_16bit_names_list                       =  
reserved_compile_constraints_flag               = 1601 
ti_internal_reserved_1                          = 


 ****** WARNING : Network not identified as Object Detection network : (1) Ignore if network is not Object Detection network (2) If network is Object Detection network, please specify "model_type":"OD" as part of OSRT compilation options******

Supported TIDL layer type ---         Reshape -- model/block1_conv1/BiasAdd__6 
Supported TIDL layer type ---            Conv -- model/block1_conv1/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block1_conv1/Relu 
Supported TIDL layer type ---            Conv -- model/block1_conv2/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block1_conv2/Relu 
Supported TIDL layer type ---         MaxPool -- model/block1_pool/MaxPool 
Supported TIDL layer type ---            Conv -- model/block2_conv1/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block2_conv1/Relu 
Supported TIDL layer type ---            Conv -- model/block2_conv2/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block2_conv2/Relu 
Supported TIDL layer type ---         MaxPool -- model/block2_pool/MaxPool 
Supported TIDL layer type ---            Conv -- model/block3_conv1/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block3_conv1/Relu 
Supported TIDL layer type ---            Conv -- model/block3_conv2/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block3_conv2/Relu 
Supported TIDL layer type ---            Conv -- model/block3_conv3/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block3_conv3/Relu 
Supported TIDL layer type ---         MaxPool -- model/block3_pool/MaxPool 
Unsupported (imports) TIDL layer type for ONNX op type --- Shape 
Supported TIDL layer type ---          Gather -- Gather__75 
Supported TIDL layer type ---            Cast -- model/reshape/Shape__46 
Supported TIDL layer type ---           Slice -- model/reshape/strided_slice 
Unsupported (TIDL check) TIDL layer type ---          Concat 
Supported TIDL layer type ---            Cast -- model/reshape/Reshape__55 
Supported TIDL layer type ---       Transpose -- Transpose__70 
Segmentation fault (core dumped)
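
Since several runs are posted in this thread, it can help to compare the option values each run prints before the layer list. The following is a minimal sketch, assuming the "name = value" layout shown in the logs here. `parse_option_echo` and `diff_options` are hypothetical helpers, and the two sample blocks contain only a few lines copied from the runs above.

```python
def parse_option_echo(log_text):
    """Parse 'name = value' lines from the printed option block into a dict."""
    options = {}
    for line in log_text.splitlines():
        if "=" not in line:
            continue  # skip layer lines, warnings, and blank lines
        name, _, value = line.partition("=")
        options[name.strip()] = value.strip()
    return options


def diff_options(old, new):
    """Return {name: (old_value, new_value)} for options that differ.

    An option present on one side only is reported with None on the other.
    """
    changed = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changed[name] = (old.get(name), new.get(name))
    return changed


earlier_run = """\
tidl_tensor_bits                                = 16
debug_level                                     = 2
tidl_calibration_options:num_frames_calibration = 3
power_of_2_quantization                         = 2
"""

latest_run = """\
tidl_tensor_bits                                = 8
debug_level                                     = 2
tidl_calibration_options:num_frames_calibration = 2
power_of_2_quantization                         = 2
ONNX QDQ Enabled                                = 0
"""

changes = diff_options(parse_option_echo(earlier_run), parse_option_echo(latest_run))
for name, (before, after) in changes.items():
    print(f"{name}: {before} -> {after}")
```

A diff like this makes it easier to tell whether a change in the layer report comes from the SDK version or from different compile settings.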

0 Pratik Kedar over 2 years ago in reply to Vaibhav Kashera

TI__Mastermind 24041 points

Thanks for sharing your observations on the ToT SDK tag.

I have the debug_level 2 logs as well. I will check this issue internally with the dev team and try to get back to you within a week.

Thank you for your patience.

0 Pratik Kedar over 2 years ago in reply to Pratik Kedar

TI__Mastermind 24041 points

Hi,

Per my initial discussion with the team, the issue may be coming from the OSRT side.

I have filed a JIRA; tentatively, the fix could be available in the next release, 9.2.

Adding the JIRA link for TI's internal tracking purposes.

jira.itg.ti.com/.../TIDL-3758
