J721EXSOMXEVM: Unable to compile custom convolution model

Part Number: J721EXSOMXEVM

Greetings,

I have been trying to compile a fully convolutional neural network for text recognition using edgeai-tidl-tools. The model runs fine when only CPUExecutionProvider is in the EP_list, but fails when we try to compile it with TIDLExecutionProvider.

Here are the compile options:

import os

output_dir = 'fcnn-artifacts'
num_bits = 8
accuracy = 1
onnx_model_path = 'fcnn.onnx'

compile_options = {
    'tidl_tools_path': os.environ['TIDL_TOOLS_PATH'],
    'artifacts_folder': output_dir,
    'tensor_bits': num_bits,
    'accuracy_level': accuracy,
    'advanced_options:calibration_frames': len(calib_images),
    'advanced_options:calibration_iterations': 3,  # used if accuracy_level = 1
    'debug_level': 3,
    # Comma separated string of operator types as defined by ONNX runtime, ex "MaxPool, Concat"
    'deny_list': ""
}

Here are the logs during the segmentation fault:

Running shape inference on model ../../../models/public/vgg_fcn_text_opset11.onnx 

tidl_tools_path                                 = /home/root/tidl_tools 
artifacts_folder                                = ../../../model-artifacts//fcnn/ 
tidl_tensor_bits                                = 16 
debug_level                                     = 3 
num_tidl_subgraphs                              = 16 
tidl_denylist                                   = 
tidl_denylist_layer_name                        = 
tidl_denylist_layer_type                         = 
tidl_allowlist_layer_name                        = 
model_type                                      =  
tidl_calibration_accuracy_level                 = 7 
tidl_calibration_options:num_frames_calibration = 3 
tidl_calibration_options:bias_calibration_iterations = 5 
mixed_precision_factor = -1.000000 
model_group_id = 0 
power_of_2_quantization                         = 2 
enable_high_resolution_optimization             = 0 
pre_batchnorm_fold                              = 1 
add_data_convert_ops                          = 3 
output_feature_16bit_names_list                 =  
m_params_16bit_names_list                       =  
reserved_compile_constraints_flag               = 1601 
ti_internal_reserved_1                          = 

 ****** WARNING : Network not identified as Object Detection network : (1) Ignore if network is not Object Detection network (2) If network is Object Detection network, please specify "model_type":"OD" as part of OSRT compilation options******

Supported TIDL layer type ---         Reshape -- model/block1_conv1/BiasAdd__6 
Supported TIDL layer type ---            Conv -- model/block1_conv1/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block1_conv1/Relu 
Supported TIDL layer type ---            Conv -- model/block1_conv2/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block1_conv2/Relu 
Supported TIDL layer type ---         MaxPool -- model/block1_pool/MaxPool 
Supported TIDL layer type ---            Conv -- model/block2_conv1/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block2_conv1/Relu 
Supported TIDL layer type ---            Conv -- model/block2_conv2/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block2_conv2/Relu 
Supported TIDL layer type ---         MaxPool -- model/block2_pool/MaxPool 
Supported TIDL layer type ---            Conv -- model/block3_conv1/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block3_conv1/Relu 
Supported TIDL layer type ---            Conv -- model/block3_conv2/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block3_conv2/Relu 
Supported TIDL layer type ---            Conv -- model/block3_conv3/BiasAdd 
Supported TIDL layer type ---            Relu -- model/block3_conv3/Relu 
Supported TIDL layer type ---         MaxPool -- model/block3_pool/MaxPool 
Supported TIDL layer type ---       Transpose -- Transpose__70 
Unsupported (import) TIDL layer type for ONNX op type --- Shape 
Unsupported (TIDL check) TIDL layer type ---          Gather 
Supported TIDL layer type ---            Cast -- model/reshape/Shape__46 

 Unsupported slice - axis parameters, in Slice -- model/reshape/strided_slice 
Unsupported (TIDL check) TIDL layer type ---           Slice 
Unsupported (TIDL check) TIDL layer type ---          Concat 
Supported TIDL layer type ---            Cast -- model/reshape/Reshape__55 
Segmentation fault (core dumped)
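
A log like the one above can be summarized programmatically to see which op types were rejected and which layer was reported last before the crash. The following is a minimal sketch, not part of edgeai-tidl-tools: `summarize_tidl_log` is a hypothetical helper, and its regular expressions are written against the exact log wording quoted in this thread, so other tool versions may need different patterns.

```python
import re

# Patterns mirror the log lines quoted in this thread (assumption: other
# tool versions may word these lines differently).
SUPPORTED = re.compile(r"^\s*Supported TIDL layer type\s+-+\s+(\S+)\s+--\s+(\S+)")
UNSUPPORTED = re.compile(
    r"^\s*Unsupported \((?:imports?|TIDL check)\) TIDL layer type.*?-{3}\s+(\S+)"
)


def summarize_tidl_log(text):
    """Collect supported layers, unsupported op types, and the last entry before a crash."""
    supported, unsupported = [], []
    last_seen, crashed = None, False
    for line in text.splitlines():
        if "Segmentation fault" in line:
            crashed = True
            break
        match = SUPPORTED.match(line)
        if match:
            supported.append((match.group(1), match.group(2)))
            last_seen = match.group(2)  # node name
            continue
        match = UNSUPPORTED.match(line)
        if match:
            if match.group(1) not in unsupported:
                unsupported.append(match.group(1))
            last_seen = match.group(1)  # op type only; these lines carry no node name
    return {
        "supported": supported,
        "unsupported": unsupported,
        "last_seen": last_seen,
        "crashed": crashed,
    }


# Tail of the log from this post.
sample_log = """
Supported TIDL layer type ---       Transpose -- Transpose__70
Unsupported (import) TIDL layer type for ONNX op type --- Shape
Unsupported (TIDL check) TIDL layer type ---          Gather
Supported TIDL layer type ---            Cast -- model/reshape/Shape__46

 Unsupported slice - axis parameters, in Slice -- model/reshape/strided_slice
Unsupported (TIDL check) TIDL layer type ---           Slice
Unsupported (TIDL check) TIDL layer type ---          Concat
Supported TIDL layer type ---            Cast -- model/reshape/Reshape__55
Segmentation fault (core dumped)
"""

summary = summarize_tidl_log(sample_log)
print(summary["unsupported"], summary["last_seen"], summary["crashed"])
```

The last reported entry only shows where the log stops; it does not by itself prove which layer caused the fault.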

When we add certain layer types to the deny list, the run fails with the following error:


delegate_options['deny_list'] = "Shape, Concat, Slice, Gather, Reshape, Transpose"
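
As the comment in the compile options notes, the deny list is a comma-separated string of ONNX operator types. A small sketch of building that string from a list follows; `make_deny_list` is a hypothetical helper, not a TIDL API, and it only formats the string. Denied operator types are expected to fall back to the CPU provider rather than be offloaded to TIDL.

```python
def make_deny_list(op_types):
    """Join ONNX op type names into a comma-separated string, dropping blanks and duplicates."""
    unique = []
    for op in op_types:
        op = op.strip()
        if op and op not in unique:
            unique.append(op)
    return ", ".join(unique)


# Op types taken from the deny list used in this post; the repeated "Concat"
# shows that duplicates are removed.
deny = make_deny_list(
    ["Shape", "Concat", "Slice", "Gather", "Reshape", "Transpose", "Concat"]
)
print(deny)
```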

**********  Frame Index 1 : Running float inference **********
2024-01-04 05:21:57.262513053 [E:onnxruntime:, sequential_executor.cc:339 Execute] Non-zero status code returned while running Concat node. Name:'model/reshape/Reshape/shape_Concat__54' Status Message: /home/a0496663/work/edgeaitidltools/rel90/onnx/onnxruntime_bit/onnxruntime/onnxruntime/core/providers/cpu/tensor/concat.cc:72 onnxruntime::common::Status onnxruntime::ConcatBase::PrepareForCompute(onnxruntime::OpKernelContext*, const std::vector<const onnxruntime::Tensor*>&, onnxruntime::Prepare&) const inputs_n_rank == inputs_0_rank was false. Ranks of input data are different, cannot concatenate them. expected rank: 4 got: 1

Traceback (most recent call last):
  File "/home/root/examples/osrt_python/ort/fcnn_onnxrt_ep.py", line 266, in <module>
    run_model(model, mIdx)
  File "/home/root/examples/osrt_python/ort/fcnn_onnxrt_ep.py", line 215, in run_model
    preds = sess.run([output_name], {input_name: test_img})
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 188, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Concat node. Name:'model/reshape/Reshape/shape_Concat__54' Status Message: /home/a0496663/work/edgeaitidltools/rel90/onnx/onnxruntime_bit/onnxruntime/onnxruntime/core/providers/cpu/tensor/concat.cc:72 onnxruntime::common::Status onnxruntime::ConcatBase::PrepareForCompute(onnxruntime::OpKernelContext*, const std::vector<const onnxruntime::Tensor*>&, onnxruntime::Prepare&) const inputs_n_rank == inputs_0_rank was false. Ranks of input data are different, cannot concatenate them. expected rank: 4 got: 1

************ in TIDL_subgraphRtDelete ************ 
 ************ in TIDL_subgraphRtDelete ************ 
 MEM: Deinit ... !!!
MEM: Alloc's: 54 alloc's of 183110924 bytes 
MEM: Free's : 54 free's  of 183110924 bytes 
MEM: Open's : 0 allocs  of 0 bytes 
MEM: Deinit ... Done !!!
************ in TIDL_subgraphRtDelete ************ 
Segmentation fault (core dumped)

We are unable to get past these internal compilation errors, even though the model is primarily a convolutional model with a few extra operators introduced by the TensorFlow-to-ONNX conversion. Please advise on how to get past this error, as it is critical to our pipeline.

Regards,

Vaibhav Kashera

  • Hi,

    "but fails when we try to compile it with TIDLExecutionProvider."

    Can you explain how you are using TIDLExecutionProvider for model compilation? It is used by TIDL-RT for inference.

    Can you share a few observations?

    As I see it, with an empty deny list the model fails right after "Supported TIDL layer type --- Cast -- model/reshape/Reshape__55".

    Can you help me with debug_level = 2 logs for model inference?
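
For context on the provider question: the provider list printed later in this thread contains both TIDLCompilationProvider and TIDLExecutionProvider. In TI's edgeai-tidl-tools examples the former is typically used for the compile pass and the latter for the inference pass. The sketch below only builds the session arguments; `session_kwargs` is a hypothetical helper, and the commented-out `InferenceSession` call assumes TI's build of onnxruntime is installed.

```python
def session_kwargs(mode, tidl_options):
    """Return providers/provider_options for a compile or an inference pass."""
    tidl_provider = {
        "compile": "TIDLCompilationProvider",
        "infer": "TIDLExecutionProvider",
    }.get(mode)
    if tidl_provider is None:
        raise ValueError(f"mode must be 'compile' or 'infer', got {mode!r}")
    return {
        # CPUExecutionProvider stays in the list so non-offloaded ops can still run.
        "providers": [tidl_provider, "CPUExecutionProvider"],
        # One options dict per provider, in the same order.
        "provider_options": [dict(tidl_options), {}],
    }


compile_kwargs = session_kwargs("compile", {"artifacts_folder": "fcnn-artifacts"})
print(compile_kwargs["providers"])

# With TI's onnxruntime build installed, the session would be created roughly as:
# import onnxruntime as rt
# sess = rt.InferenceSession("fcnn.onnx", **compile_kwargs)
```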

  • Hi Pratik,

    Thank you for your reply. With the deny list left empty and debug_level = 2, here are the logs:

    Available execution providers :  ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
    
    Running 1 Models - ['fcnn']
    
    
    Running_Model :  fcnn  
    
    
    Running shape inference on model ../../../models/public/vgg_fcn_text_opset11.onnx 
    
    tidl_tools_path                                 = /home/root/tidl_tools 
    artifacts_folder                                = ../../../model-artifacts//fcnn/ 
    tidl_tensor_bits                                = 16 
    debug_level                                     = 2 
    num_tidl_subgraphs                              = 16 
    tidl_denylist                                   = 
    tidl_denylist_layer_name                        = 
    tidl_denylist_layer_type                         = 
    tidl_allowlist_layer_name                        = 
    model_type                                      =  
    tidl_calibration_accuracy_level                 = 7 
    tidl_calibration_options:num_frames_calibration = 3 
    tidl_calibration_options:bias_calibration_iterations = 5 
    mixed_precision_factor = -1.000000 
    model_group_id = 0 
    power_of_2_quantization                         = 2 
    enable_high_resolution_optimization             = 0 
    pre_batchnorm_fold                              = 1 
    add_data_convert_ops                          = 3 
    output_feature_16bit_names_list                 =  
    m_params_16bit_names_list                       =  
    reserved_compile_constraints_flag               = 1601 
    ti_internal_reserved_1                          = 
    
     ****** WARNING : Network not identified as Object Detection network : (1) Ignore if network is not Object Detection network (2) If network is Object Detection network, please specify "model_type":"OD" as part of OSRT compilation options******
    
    Supported TIDL layer type ---         Reshape -- model/block1_conv1/BiasAdd__6 
    Supported TIDL layer type ---            Conv -- model/block1_conv1/BiasAdd 
    Supported TIDL layer type ---            Relu -- model/block1_conv1/Relu 
    Supported TIDL layer type ---            Conv -- model/block1_conv2/BiasAdd 
    Supported TIDL layer type ---            Relu -- model/block1_conv2/Relu 
    Supported TIDL layer type ---         MaxPool -- model/block1_pool/MaxPool 
    Supported TIDL layer type ---            Conv -- model/block2_conv1/BiasAdd 
    Supported TIDL layer type ---            Relu -- model/block2_conv1/Relu 
    Supported TIDL layer type ---            Conv -- model/block2_conv2/BiasAdd 
    Supported TIDL layer type ---            Relu -- model/block2_conv2/Relu 
    Supported TIDL layer type ---         MaxPool -- model/block2_pool/MaxPool 
    Supported TIDL layer type ---            Conv -- model/block3_conv1/BiasAdd 
    Supported TIDL layer type ---            Relu -- model/block3_conv1/Relu 
    Supported TIDL layer type ---            Conv -- model/block3_conv2/BiasAdd 
    Supported TIDL layer type ---            Relu -- model/block3_conv2/Relu 
    Supported TIDL layer type ---            Conv -- model/block3_conv3/BiasAdd 
    Supported TIDL layer type ---            Relu -- model/block3_conv3/Relu 
    Supported TIDL layer type ---         MaxPool -- model/block3_pool/MaxPool 
    Supported TIDL layer type ---       Transpose -- Transpose__70 
    Unsupported (import) TIDL layer type for ONNX op type --- Shape 
    Unsupported (TIDL check) TIDL layer type ---          Gather 
    Supported TIDL layer type ---            Cast -- model/reshape/Shape__46 
    
     Unsupported slice - axis parameters, in Slice -- model/reshape/strided_slice 
    Unsupported (TIDL check) TIDL layer type ---           Slice 
    Unsupported (TIDL check) TIDL layer type ---          Concat 
    Supported TIDL layer type ---            Cast -- model/reshape/Reshape__55 
    Segmentation fault (core dumped)

    If any other information is needed on my end, please let me know.

    Thanks,

    Vaibhav Kashera

  • Dear Pratik,

    I am attaching the ONNX model here for your reference as well (vgg_fcn_text_opset11.onnx.zip). Again, we are very thankful for the effort you are putting in on your end, as this is an urgent issue for us.

    Regards,

    Vaibhav Kashera

  • Hi,

    Thanks for sharing the logs.

    Can you share the SVG file for the compiled model?

    In particular, I am interested in finding out which layer triggers the seg fault (the layer after model/reshape/Reshape__55).

    Before that, can you share which SDK tag you are using for the edgeai-tidl-tools repo? If it is not the latest (9.1.0.5), can you re-run the above experiment with that tag and share the updated logs and SVG file?

  • Hi Pratik,

    Thanks for pointing out the newer SDK tag. However, even with the GitHub tag 09_01_00_05 of the latest tools, we are still getting the same logs, which are included below.

    Also, since compilation ends in a segmentation fault, no SVG file is generated with this or older SDK versions. The only file created is allowedNode.txt, which is empty.
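
One quick way to confirm what the compile step left behind is to list the artifacts folder with file sizes. This is a minimal sketch using only the standard library; `describe_artifacts` and `has_svg` are hypothetical helpers, and the temporary folder stands in for the real artifacts directory.

```python
import tempfile
from pathlib import Path


def describe_artifacts(folder):
    """Return {relative file path: size in bytes} for every file under folder."""
    root = Path(folder)
    return {
        str(p.relative_to(root)): p.stat().st_size
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def has_svg(folder):
    """True if at least one .svg file exists anywhere under folder."""
    return next(Path(folder).rglob("*.svg"), None) is not None


# Stand-in for the artifacts folder: a single empty allowedNode.txt,
# matching the situation described in this post.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "allowedNode.txt").touch()
    report = describe_artifacts(tmp)
    found_svg = has_svg(tmp)

print(report, found_svg)
```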

    Kindly advise on the next debug steps.

    Running shape inference on model ../../../models/public/vgg_fcn_text_opset11.onnx 
    
    tidl_tools_path                                 = /home/root/tidl_tools 
    artifacts_folder                                = ../../../model-artifacts//fcnn/ 
    tidl_tensor_bits                                = 8 
    debug_level                                     = 2 
    num_tidl_subgraphs                              = 16 
    tidl_denylist                                   = 
    tidl_denylist_layer_name                        = 
    tidl_denylist_layer_type                         = 
    tidl_allowlist_layer_name                        = 
    model_type                                      =  
    tidl_calibration_accuracy_level                 = 7 
    tidl_calibration_options:num_frames_calibration = 2 
    tidl_calibration_options:bias_calibration_iterations = 5 
    mixed_precision_factor = -1.000000 
    model_group_id = 0 
    power_of_2_quantization                         = 2 
    ONNX QDQ Enabled                                = 0 
    enable_high_resolution_optimization             = 0 
    pre_batchnorm_fold                              = 1 
    add_data_convert_ops                          = 3 
    output_feature_16bit_names_list                 =  
    m_params_16bit_names_list                       =  
    reserved_compile_constraints_flag               = 1601 
    ti_internal_reserved_1                          = 
    
    
     ****** WARNING : Network not identified as Object Detection network : (1) Ignore if network is not Object Detection network (2) If network is Object Detection network, please specify "model_type":"OD" as part of OSRT compilation options******
    
    Supported TIDL layer type ---         Reshape -- model/block1_conv1/BiasAdd__6 
    Supported TIDL layer type ---            Conv -- model/block1_conv1/BiasAdd 
    Supported TIDL layer type ---            Relu -- model/block1_conv1/Relu 
    Supported TIDL layer type ---            Conv -- model/block1_conv2/BiasAdd 
    Supported TIDL layer type ---            Relu -- model/block1_conv2/Relu 
    Supported TIDL layer type ---         MaxPool -- model/block1_pool/MaxPool 
    Supported TIDL layer type ---            Conv -- model/block2_conv1/BiasAdd 
    Supported TIDL layer type ---            Relu -- model/block2_conv1/Relu 
    Supported TIDL layer type ---            Conv -- model/block2_conv2/BiasAdd 
    Supported TIDL layer type ---            Relu -- model/block2_conv2/Relu 
    Supported TIDL layer type ---         MaxPool -- model/block2_pool/MaxPool 
    Supported TIDL layer type ---            Conv -- model/block3_conv1/BiasAdd 
    Supported TIDL layer type ---            Relu -- model/block3_conv1/Relu 
    Supported TIDL layer type ---            Conv -- model/block3_conv2/BiasAdd 
    Supported TIDL layer type ---            Relu -- model/block3_conv2/Relu 
    Supported TIDL layer type ---            Conv -- model/block3_conv3/BiasAdd 
    Supported TIDL layer type ---            Relu -- model/block3_conv3/Relu 
    Supported TIDL layer type ---         MaxPool -- model/block3_pool/MaxPool 
    Unsupported (imports) TIDL layer type for ONNX op type --- Shape 
    Supported TIDL layer type ---          Gather -- Gather__75 
    Supported TIDL layer type ---            Cast -- model/reshape/Shape__46 
    Supported TIDL layer type ---           Slice -- model/reshape/strided_slice 
    Unsupported (TIDL check) TIDL layer type ---          Concat 
    Supported TIDL layer type ---            Cast -- model/reshape/Reshape__55 
    Supported TIDL layer type ---       Transpose -- Transpose__70 
    Segmentation fault (core dumped)

  • Thanks for sharing your observations on the tip-of-tree (ToT) SDK tag.

    I have the debug_level 2 logs as well. I will check this issue with the dev team internally and try to get back to you within a week.

    Thank you for your patience.

  • Hi,

    Per my initial discussion with the team, it seems the issue could be coming from the OSRT side.

    I have filed a JIRA; tentatively, the fix could be available in the next release, 9.2.

    Adding the JIRA link for TI's internal tracking purposes.

    jira.itg.ti.com/.../TIDL-3758