This thread has been locked.


AM62A7: Problem Compiling Custom Model

Part Number: AM62A7



Greetings!

I’ve been trying to offload the inference of a custom model to the C7x/MMA accelerator of the AM62A board, without much success. Unlike the vast majority of the problems discussed on the forum and in the TIDL Tools / EdgeAI Academy material, the inference I need to perform does not involve image processing. Instead, the model classifies metadata structured in a pandas DataFrame. The model was trained with scikit-learn and exported to both ONNX and TFLite, using skl2onnx and TFLiteConverter (via Keras) respectively. Both converted models run without problems on the ARM cores, but not on the C7x/MMA accelerator.

When trying to compile the TFLite model, only the tempDir is generated in the custom-artifacts folder. For the ONNX model, I’ve tried two ways of compiling it: adapting the Jupyter notebooks available in the examples folder, and adapting the onnxrt_ep.py script from the examples/osrt_python/ort folder. The latter seems to yield better results, as it generates more files inside tempDir and provides more extensive debugging information. However, the script hangs during ‘Quantization & Calibration for subgraph_0’, leaving the following files inside the custom-artifacts folder:

  • allowedNode.txt
  • onnxrtMetaData.txt
  • /tempDir
    • graphvizInfo.txt
    • runtimes_visualization.svg
    • subgraph_0_calib_raw_data.bin
    • subgraph_0_tidl_io_1.bin
    • subgraph_0_tidl_net.bin
    • subgraph_0_tidl_net.bin.layer_info.txt
    • subgraph_0_tidl_net.bin.svg
    • subgraph_0_tidl_net.bin_netLog.txt

The script does emit the following error/warning moments before hanging:

============= [Quantization & Calibration for subgraph_0 Started] =============

2025-01-08 14:46:34.193707173 [E:onnxruntime:, sequential_executor.cc:494 ExecuteKernel] Non-zero status code returned while running Gemm node. Name:'gemm_token_0' Status Message: /root/onnxruntime/onnxruntime/core/providers/cpu/math/gemm_helper.h:14 onnxruntime::GemmHelper::GemmHelper(const onnxruntime::TensorShape&, bool, const onnxruntime::TensorShape&, bool, const onnxruntime::TensorShape&) left.NumDimensions() == 2 || left.NumDimensions() == 1 was false. 

Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/root/examples/osrt_python/ort/onnxrt_ep.py", line 415, in run_model
    classified_data = run_prediction(sess, scaled_data)
  File "/home/root/examples/osrt_python/ort/onnxrt_ep.py", line 142, in run_prediction
    predictions = session.run([output_name], {input_name: input_data})[0]
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 200, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Gemm node. Name:'gemm_token_0' Status Message: /root/onnxruntime/onnxruntime/core/providers/cpu/math/gemm_helper.h:14 onnxruntime::GemmHelper::GemmHelper(const onnxruntime::TensorShape&, bool, const onnxruntime::TensorShape&, bool, const onnxruntime::TensorShape&) left.NumDimensions() == 2 || left.NumDimensions() == 1 was false. 
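For reference, the GemmHelper assertion above says the left-hand input to Gemm must be rank 1 or 2; this is roughly how my script shapes a tabular sample before calling run (variable names are illustrative, not my exact code):

```python
import numpy as np

def to_onnx_input(features):
    """Shape one sample of tabular features into the 2-D
    (batch, num_features) float32 array that the ONNX Runtime Gemm
    implementation expects (input rank must be 1 or 2)."""
    arr = np.asarray(features, dtype=np.float32)
    if arr.ndim == 1:
        arr = arr.reshape(1, -1)  # add a batch dimension
    return arr

sample = to_onnx_input([0.1, 0.2, 0.3, 0.4])
print(sample.shape)  # (1, 4)
```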

Following the advice in the TIDL Tools GitHub repository to add possibly problematic nodes to the model’s deny_list, I modified the model_configs.py file to look like this:

    "myModel": create_model_config(
        source=AttrDict(
            infer_shape=True,
        ),
        preprocess=AttrDict(
        ),
        session=AttrDict(
            session_name="onnxrt",
            model_path=os.path.join(models_base_path, "myModel.onnx"),
        ),
        task_type="other",
        extra_info=AttrDict(
        ),
        optional_options={
            'deny_list' : 'Gemm',
        }
    ),

However, although the node is successfully added to the list of unsupported nodes, the script still hangs and gives the exact same output.

------------------------------------------------------------------------------------------------------------------------------------------------------
|         Node          |       Node Name       |                                               Reason                                               |
------------------------------------------------------------------------------------------------------------------------------------------------------
| Gemm                  | gemm                  | Node gemm added to unsupported nodes as specified in deny list                                     |

The Docker image I used to perform the above compilations was configured for SDK version 10_00_08_00, which matches the SDK on my AM62A, and the ONNX Runtime version is 1.14.0. To verify both the Docker and board setups, I successfully executed some of the examples in the examples/jupyter_notebooks folder on both the ARM cores and the C7x/MMA accelerator.

However, since the onnxrt_ep.py script focuses on image-processing examples, I had to comment out, modify, and add inference and data pre-processing methods to compile my custom model. As you can imagine, it is not straightforward to assess the impact of these modifications on the entire compilation process. It is worth noting that the compilation process successfully identifies a set of nodes that should be offloaded to the accelerator, as well as a set of nodes that could have been offloaded but were not, for the reasons reported during compilation. I am attaching my modified version of the onnxrt_ep.py script, with a #### ADDED #### comment on top of each piece of code I added:

e2e_onnxrt_ep.zip

Best regards,

Giann.

  • Hi Giann,

    I can see you've done your homework here and have put substantial effort into this. I am glad to see this, and we'll seek a resolution to the challenges you're facing. 

From the log and notes, it looks like the GEMM / matrix-multiply layers are not running with TIDL. I would be curious to see the warnings/logs when you do not deny this layer type, and what the configuration looks like for that layer in the original model file. If you are comfortable sharing, a screenshot of that layer (or layers) from Netron would be perfect.

So when your layer is not run with TIDL on the accelerator, it falls back to the ONNX Runtime implementation of that GEMM layer, and ONNX Runtime is throwing errors about the rank of the inputs to the GEMM layer.

    Please try setting the TIDL_RT_ONNX_VARDIM environment variable to "1" in the calling linux environment.

    export TIDL_RT_ONNX_VARDIM=1

    This tells TIDL to squeeze / strip unneeded dimensions when passing data back to ONNX. By default, TIDL tends to use a 6-D representation -- this env variable is meant to resolve this. 

    • Worth noting this option should not be needed with 10.1 SDK release (repo tagged 10_01_00_02 for this). AM62A's release of this SDK will be within the next couple of weeks. 
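    If you are driving compilation from a Python script, setting the variable from within the process should be equivalent to exporting it in the shell; just make sure it is set before the session is created. A sketch:

```python
import os

# Set before onnxruntime / the TIDL delegate libraries are used,
# so it belongs near the very top of the compilation script.
os.environ["TIDL_RT_ONNX_VARDIM"] = "1"

print(os.environ["TIDL_RT_ONNX_VARDIM"])  # 1
```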

    The process is likely hanging because onnxrt_ep.py forks a child process for each model you are compiling. If your model hits a fault during compilation, that child does not exit correctly in Python, and you can use KeyboardInterrupt to exit safely.
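    To illustrate the pattern (a sketch, not the actual onnxrt_ep.py code), a join with a timeout is one way to avoid waiting indefinitely on a faulted child:

```python
from multiprocessing import Process

def compile_model(name):
    # Stand-in for the per-model compile work that onnxrt_ep.py
    # performs in a child process.
    print(f"compiling {name}")

p = Process(target=compile_model, args=("myModel",))
p.start()
p.join(timeout=600)      # don't block forever on a faulted child
if p.is_alive():
    p.terminate()        # reclaim the hung child; artifacts may be partial
    p.join()
```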

    BR,
    Reese

  • Hi Reese.

    Thank you for your quick reply! Apologies for the delay, I just wanted to make sure I tested your suggestions properly before getting back to you.

    Setting the flag allowed the compilation process to finish. However, I’m still not able to offload the inference to the C7x/MMA accelerator. Importing the unaltered artifacts folder to the AM62A and running my Jupyter notebook gives me the following output:

    Fail: [ONNXRuntimeError] : 1 : FAIL : Create state function failed. Return value:-1

    When comparing my results to other compiled models, the main difference I see is that, for correctly compiled models, binaries are generated in the artifacts folder itself (as well as in the tempDir folder).

    myModel custom-artifacts folder

    myModel tempDir folder

    Correctly Compiled Model - Custom Artifacts Folder

    Are these binaries what ONNX Runtime uses to offload the instructions? Could you point me to the part of the compilation script that generates them? I wonder if I omitted the binary-file generation by commenting out the following sections:

        ...
        # Post-Processing for inference
        output_image_file_name = "py_out_" + model + "_" + os.path.basename(input_image[i % len(input_image)])
        output_bin_file_name = output_image_file_name.replace(".jpg", "") + ".bin"
        ...
        # Generate param.yaml after model compilation
        if args.compile or args.disable_offload:
            gen_param_yaml(
                delegate_options["artifacts_folder"], config , int(height), int(width)
            )

    If I try to run my application after moving the binaries from the tempDir folder to the custom-artifacts folder, I get the following output:

    Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running TIDL_0 node. Name:'TIDLExecutionProvider_TIDL_0_0' Status Message: TIDL Compute Invoke Failed.

    As per your request, I’m also attaching the logs of the compilation process with and without the deny list, and the generated .svg files for the global runtime and for the subgraph0.

    0508.e2e.zip

    Log without deny list

    root@ac9ed4158784:/home/root/examples/osrt_python/ort# python3 onnxrt_ep.py -c
    Available execution providers :  ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
    
    Running 1 Models - ['myModel']
    
    
    Running_Model :  myModel  
    
    
    Running shape inference on model ../../../models/public/myModel.onnx 
    
    ========================= [Model Compilation Started] =========================
    
    Model compilation will perform the following stages:
    1. Parsing
    2. Graph Optimization
    3. Quantization & Calibration
    4. Memory Planning
    
    ============================== [Version Summary] ==============================
    
    -------------------------------------------------------------------------------
    |          TIDL Tools Version          |              10_00_08_00             |
    -------------------------------------------------------------------------------
    |         C7x Firmware Version         |              10_00_02_00             |
    -------------------------------------------------------------------------------
    |            Runtime Version           |            1.14.0+10000005           |
    -------------------------------------------------------------------------------
    |          Model Opset Version         |                  16                  |
    -------------------------------------------------------------------------------
    
    NOTE: The runtime version here specifies ONNXRT_VERSION+TIDL_VERSION
    Ex: 1.14.0+1000XXXX -> ONNXRT 1.14.0 and a TIDL_VERSION 10.00.XX.XX
    
    ============================== [Parsing Started] ==============================
    
    [TIDL Import] [PARSER] WARNING: Network not identified as Object Detection network : (1) Ignore if network is not Object Detection network (2) If network is Object Detection network, please specify "model_type":"OD" as part of OSRT compilation options
    
    ------------------------- Subgraph Information Summary -------------------------
    -------------------------------------------------------------------------------
    |          Core           |      No. of Nodes       |   Number of Subgraphs   |
    -------------------------------------------------------------------------------
    | C7x                     |                       4 |                       4 |
    | CPU                     |                       9 |                       x |
    -------------------------------------------------------------------------------
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    |         Node          |       Node Name       |                                               Reason                                               |
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    | Gemm                  | gemm                  | Bias tensor input should be a vector (1, N) and N should match output dimension                    |
    | Gemm                  | gemm_token_0          | Bias tensor input should be a vector (1, N) and N should match output dimension                    |
    | Gemm                  | gemm_token_1          | Bias tensor input should be a vector (1, N) and N should match output dimension                    |
    | Gemm                  | gemm_token_2          | Bias tensor input should be a vector (1, N) and N should match output dimension                    |
    | Identity              | Identity              | Identity layer with a input node is not supported, only Identity layer at graph input is supported |
    | ArgMax                | ArgMax                | Only axis = -3 is supported                                                                        |
    | ArrayFeatureExtractor | ArrayFeatureExtractor | Layer type not supported by TIDL                                                                   |
    | Reshape               | Reshape               | Input volume should be equal to output volume                                                      |
    | Cast                  | Cast1                 | Subgraph does not have any compute node                                                            |
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    ============================= [Parsing Completed] =============================
    
    ==================== [Optimization for subgraph_0 Started] ====================
    
    ----------------------------- Optimization Summary -----------------------------
    ------------------------------------------------------------------------------
    |        Layer        | Nodes before optimization | Nodes after optimization |
    ------------------------------------------------------------------------------
    | TIDL_BatchNormLayer |                         0 |                        1 |
    | TIDL_ReLULayer      |                         1 |                        0 |
    ------------------------------------------------------------------------------
    
    =================== [Optimization for subgraph_0 Completed] ===================
    
    The soft limit is 10240
    The hard limit is 10240
    MEM: Init ... !!!
    MEM: Init ... Done !!!
     0.0s:  VX_ZONE_INIT:Enabled
     0.11s:  VX_ZONE_ERROR:Enabled
     0.13s:  VX_ZONE_WARNING:Enabled
     0.1850s:  VX_ZONE_INIT:[tivxInit:190] Initialization Done !!!
    ============= [Quantization & Calibration for subgraph_0 Started] =============
    
    ==================== [Optimization for subgraph_1 Started] ====================
    
    ----------------------------- Optimization Summary -----------------------------
    ------------------------------------------------------------------------------
    |        Layer        | Nodes before optimization | Nodes after optimization |
    ------------------------------------------------------------------------------
    | TIDL_BatchNormLayer |                         0 |                        1 |
    | TIDL_ReLULayer      |                         1 |                        0 |
    ------------------------------------------------------------------------------
    
    =================== [Optimization for subgraph_1 Completed] ===================
    
    ============= [Quantization & Calibration for subgraph_1 Started] =============
    
    ==================== [Optimization for subgraph_2 Started] ====================
    
    ----------------------------- Optimization Summary -----------------------------
    ------------------------------------------------------------------------------
    |        Layer        | Nodes before optimization | Nodes after optimization |
    ------------------------------------------------------------------------------
    | TIDL_BatchNormLayer |                         0 |                        1 |
    | TIDL_ReLULayer      |                         1 |                        0 |
    ------------------------------------------------------------------------------
    
    =================== [Optimization for subgraph_2 Completed] ===================
    
    ============= [Quantization & Calibration for subgraph_2 Started] =============
    
    ==================== [Optimization for subgraph_3 Started] ====================
    
    [TIDL Import] [PARSER] WARNING: Requested output data convert layer is not added to the network, It is currently not optimal
    ----------------------------- Optimization Summary -----------------------------
    ----------------------------------------------------------------------------
    |       Layer       | Nodes before optimization | Nodes after optimization |
    ----------------------------------------------------------------------------
    | TIDL_SoftMaxLayer |                         1 |                        1 |
    ----------------------------------------------------------------------------
    
    =================== [Optimization for subgraph_3 Completed] ===================
    
    ============= [Quantization & Calibration for subgraph_3 Started] =============
    
    MEM: Deinit ... !!!
    MEM: Alloc's: 104 alloc's of 76040456 bytes 
    MEM: Free's : 104 free's  of 76040456 bytes 
    MEM: Open's : 0 allocs  of 0 bytes 
    MEM: Deinit ... Done !!!

    Log with deny list

    root@ac9ed4158784:/home/root/examples/osrt_python/ort# python3 onnxrt_ep.py -c
    Available execution providers :  ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
    
    Running 1 Models - ['myModel']
    
    
    Running_Model :  myModel  
    
    
    Running shape inference on model ../../../models/public/myModel.onnx 
    
    ========================= [Model Compilation Started] =========================
    
    Model compilation will perform the following stages:
    1. Parsing
    2. Graph Optimization
    3. Quantization & Calibration
    4. Memory Planning
    
    ============================== [Version Summary] ==============================
    
    -------------------------------------------------------------------------------
    |          TIDL Tools Version          |              10_00_08_00             |
    -------------------------------------------------------------------------------
    |         C7x Firmware Version         |              10_00_02_00             |
    -------------------------------------------------------------------------------
    |            Runtime Version           |            1.14.0+10000005           |
    -------------------------------------------------------------------------------
    |          Model Opset Version         |                  16                  |
    -------------------------------------------------------------------------------
    
    NOTE: The runtime version here specifies ONNXRT_VERSION+TIDL_VERSION
    Ex: 1.14.0+1000XXXX -> ONNXRT 1.14.0 and a TIDL_VERSION 10.00.XX.XX
    
    ============================== [Parsing Started] ==============================
    
    [TIDL Import] [PARSER] WARNING: Network not identified as Object Detection network : (1) Ignore if network is not Object Detection network (2) If network is Object Detection network, please specify "model_type":"OD" as part of OSRT compilation options
    
    ------------------------- Subgraph Information Summary -------------------------
    -------------------------------------------------------------------------------
    |          Core           |      No. of Nodes       |   Number of Subgraphs   |
    -------------------------------------------------------------------------------
    | C7x                     |                       4 |                       4 |
    | CPU                     |                       9 |                       x |
    -------------------------------------------------------------------------------
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    |         Node          |       Node Name       |                                               Reason                                               |
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    | Gemm                  | gemm                  | Node gemm added to unsupported nodes as specified in deny list                                     |
    | Gemm                  | gemm_token_0          | Node gemm_token_0 added to unsupported nodes as specified in deny list                             |
    | Gemm                  | gemm_token_1          | Node gemm_token_1 added to unsupported nodes as specified in deny list                             |
    | Gemm                  | gemm_token_2          | Node gemm_token_2 added to unsupported nodes as specified in deny list                             |
    | Identity              | Identity              | Identity layer with a input node is not supported, only Identity layer at graph input is supported |
    | ArgMax                | ArgMax                | Only axis = -3 is supported                                                                        |
    | ArrayFeatureExtractor | ArrayFeatureExtractor | Layer type not supported by TIDL                                                                   |
    | Reshape               | Reshape               | Input volume should be equal to output volume                                                      |
    | Cast                  | Cast1                 | Subgraph does not have any compute node                                                            |
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    ============================= [Parsing Completed] =============================
    
    ==================== [Optimization for subgraph_0 Started] ====================
    
    ----------------------------- Optimization Summary -----------------------------
    ------------------------------------------------------------------------------
    |        Layer        | Nodes before optimization | Nodes after optimization |
    ------------------------------------------------------------------------------
    | TIDL_BatchNormLayer |                         0 |                        1 |
    | TIDL_ReLULayer      |                         1 |                        0 |
    ------------------------------------------------------------------------------
    
    =================== [Optimization for subgraph_0 Completed] ===================
    
    The soft limit is 10240
    The hard limit is 10240
    MEM: Init ... !!!
    MEM: Init ... Done !!!
     0.0s:  VX_ZONE_INIT:Enabled
     0.11s:  VX_ZONE_ERROR:Enabled
     0.13s:  VX_ZONE_WARNING:Enabled
     0.1727s:  VX_ZONE_INIT:[tivxInit:190] Initialization Done !!!
    ============= [Quantization & Calibration for subgraph_0 Started] =============
    
    ==================== [Optimization for subgraph_1 Started] ====================
    
    ----------------------------- Optimization Summary -----------------------------
    ------------------------------------------------------------------------------
    |        Layer        | Nodes before optimization | Nodes after optimization |
    ------------------------------------------------------------------------------
    | TIDL_BatchNormLayer |                         0 |                        1 |
    | TIDL_ReLULayer      |                         1 |                        0 |
    ------------------------------------------------------------------------------
    
    =================== [Optimization for subgraph_1 Completed] ===================
    
    ============= [Quantization & Calibration for subgraph_1 Started] =============
    
    ==================== [Optimization for subgraph_2 Started] ====================
    
    ----------------------------- Optimization Summary -----------------------------
    ------------------------------------------------------------------------------
    |        Layer        | Nodes before optimization | Nodes after optimization |
    ------------------------------------------------------------------------------
    | TIDL_BatchNormLayer |                         0 |                        1 |
    | TIDL_ReLULayer      |                         1 |                        0 |
    ------------------------------------------------------------------------------
    
    =================== [Optimization for subgraph_2 Completed] ===================
    
    ============= [Quantization & Calibration for subgraph_2 Started] =============
    
    ==================== [Optimization for subgraph_3 Started] ====================
    
    [TIDL Import] [PARSER] WARNING: Requested output data convert layer is not added to the network, It is currently not optimal
    ----------------------------- Optimization Summary -----------------------------
    ----------------------------------------------------------------------------
    |       Layer       | Nodes before optimization | Nodes after optimization |
    ----------------------------------------------------------------------------
    | TIDL_SoftMaxLayer |                         1 |                        1 |
    ----------------------------------------------------------------------------
    
    =================== [Optimization for subgraph_3 Completed] ===================
    
    ============= [Quantization & Calibration for subgraph_3 Started] =============
    
    MEM: Deinit ... !!!
    MEM: Alloc's: 104 alloc's of 76040456 bytes 
    MEM: Free's : 104 free's  of 76040456 bytes 
    MEM: Open's : 0 allocs  of 0 bytes 
    MEM: Deinit ... Done !!!

    Thank you for the support!

  • Hi Giann,

    Reese is out this week and won't be able to respond until next week.

    Regards,

    Jianzhong

  • Hi Giann,

    Thanks for your patience while I was out. I appreciate the effort in collecting information in your response.

    Your model artifacts folder does in fact look incomplete. As you noted, it is missing the subgraph binaries, which are copied into the top-level artifacts folder upon completion (automatically; you are not missing a step). The files in tempDir are not completely finished and are not expected to offload correctly -- they are intermediate files.

    Are these binaries what the onnxruntime uses to offload the instructions? Could you point me to the part of the compilation script that generates these binaries?

    Those subgraph_N_tidl_net.bin and ..._io_1.bin files are the binaries that TIDL reads to set up the network and run with acceleration. Throughout compilation, intermediate versions of these are used as the basis for different points in the compilation process. The ones you have in tempDir are probably applicable to 32-bit float mode, since calibration/quantization did not complete.
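    If it helps, a quick way to confirm that compilation ran to completion is to check that the net/io binaries made it into the top-level artifacts folder (a rough sketch; adjust the path and patterns to your layout):

```python
from pathlib import Path

def artifacts_complete(folder):
    """Compilation finished only if the subgraph net/io binaries were
    copied up from tempDir into the top-level artifacts folder."""
    folder = Path(folder)
    nets = list(folder.glob("subgraph_*_tidl_net.bin"))
    ios = list(folder.glob("subgraph_*_tidl_io_*.bin"))
    return bool(nets) and bool(ios)

# e.g. artifacts_complete("custom-artifacts/myModel") -> True when done
```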

    In onnxrt_ep.py, the network is set up to run compilation when the InferenceSession is created, targeting the TIDLCompilationProvider:

    The second phase (from a user's perspective) happens after passing inputs to the runtime (at least as many as the advanced_options:calibration_frames parameter). Once enough arrive, it runs those inputs through in floating-point mode to capture some statistics, after which it calibrates a few times to find a good float-to-fixed-point translation function.
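    For a tabular model like yours, that just means the compile-mode loop has to feed at least that many samples through session.run -- roughly like this (a sketch with made-up names, not the example code):

```python
import numpy as np

def run_calibration(sess, samples, calibration_frames=8):
    """Push at least `calibration_frames` tabular samples through the
    compilation session so TIDL can collect range statistics for
    quantization. `sess` is assumed to be an InferenceSession created
    with the TIDLCompilationProvider."""
    input_name = sess.get_inputs()[0].name
    results = []
    for i in range(calibration_frames):
        x = np.asarray(samples[i % len(samples)], dtype=np.float32)
        results.append(sess.run(None, {input_name: x.reshape(1, -1)}))
    return results
```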

    It looks like your model, with and without the deny list, hung during the optimization phase (before calibration) on subgraph_3, which contains only a softmax layer. It would be worth deny-listing this layer type to see if that's the issue here. Does your SoftMax layer match the criteria outlined on the supported-ops page?

    I am curious what is wrong with the GEMM configuration such that the layer is denied. Do you have a bias for this layer that matches the dimensions of the output?
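    For reference, the constraint your parser log printed ("Bias tensor input should be a vector (1, N) and N should match output dimension") can be checked offline against the exported model's initializers -- something like this sketch (numpy only, shapes are hypothetical):

```python
import numpy as np

def gemm_bias_ok(bias, n_out):
    """Check the TIDL constraint from the parser log: the Gemm bias
    should be a (1, N) vector (or a plain length-N vector), with N
    equal to the layer's output dimension."""
    b = np.asarray(bias)
    return b.shape in ((n_out,), (1, n_out))

print(gemm_bias_ok(np.zeros((1, 16)), 16))  # True
print(gemm_bias_ok(np.zeros((16, 1)), 16))  # False: column vector
```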

    BR,
    Reese

  • Hello Reese!

    Thank you for the great explanation; the compilation process is much clearer to me now. I’ve been trying out your suggestions, as well as some other modifications, to check their impact on the results. Starting with the Softmax layer: denying it does not seem to make any positive impact on the compilation process. However, I think you correctly identified that my model is probably not respecting the criteria outlined on the supported-ops page. I’ll modify the model to ensure that both the Softmax and the Gemm layers comply with the criteria and get back to you once I have news.

    For what it’s worth, as part of another debugging effort, I retrained my model using PyTorch instead of scikit-learn to check if there is any difference, and indeed there is. The layers of the model trained with PyTorch all seem to be supported by the compiler from the very beginning:

    The compilation process shows no warnings, even though calibration and optimization never finish:

    Running 1 Models - ['model_pytorch_v0']
    
    
    Running_Model :  model_pytorch_v0  
    
    
    Running shape inference on model ../../../models/public/model_pytorch_v0.onnx 
    
    ========================= [Model Compilation Started] =========================
    
    Model compilation will perform the following stages:
    1. Parsing
    2. Graph Optimization
    3. Quantization & Calibration
    4. Memory Planning
    
    ============================== [Version Summary] ==============================
    
    -------------------------------------------------------------------------------
    |          TIDL Tools Version          |              10_00_08_00             |
    -------------------------------------------------------------------------------
    |         C7x Firmware Version         |              10_00_02_00             |
    -------------------------------------------------------------------------------
    |            Runtime Version           |            1.14.0+10000005           |
    -------------------------------------------------------------------------------
    |          Model Opset Version         |                  17                  |
    -------------------------------------------------------------------------------
    
    NOTE: The runtime version here specifies ONNXRT_VERSION+TIDL_VERSION
    Ex: 1.14.0+1000XXXX -> ONNXRT 1.14.0 and a TIDL_VERSION 10.00.XX.XX
    
    ============================== [Parsing Started] ==============================
    
    [TIDL Import] [PARSER] WARNING: Network not identified as Object Detection network : (1) Ignore if network is not Object Detection network (2) If network is Object Detection network, please specify "model_type":"OD" as part of OSRT compilation options
    
    ------------------------- Subgraph Information Summary -------------------------
    -------------------------------------------------------------------------------
    |          Core           |      No. of Nodes       |   Number of Subgraphs   |
    -------------------------------------------------------------------------------
    | C7x                     |                       7 |                       1 |
    | CPU                     |                       0 |                       x |
    -------------------------------------------------------------------------------
    ============================= [Parsing Completed] =============================
    
    ==================== [Optimization for subgraph_0 Started] ====================
    
    ----------------------------- Optimization Summary -----------------------------
    ---------------------------------------------------------------------------------
    |          Layer         | Nodes before optimization | Nodes after optimization |
    ---------------------------------------------------------------------------------
    | TIDL_ReLULayer         |                         3 |                        0 |
    | TIDL_InnerProductLayer |                         4 |                        4 |
    ---------------------------------------------------------------------------------
    
    =================== [Optimization for subgraph_0 Completed] ===================
    
    The soft limit is 10240
    The hard limit is 10240
    MEM: Init ... !!!
    MEM: Init ... Done !!!
     0.0s:  VX_ZONE_INIT:Enabled
     0.14s:  VX_ZONE_ERROR:Enabled
     0.16s:  VX_ZONE_WARNING:Enabled
     0.1510s:  VX_ZONE_INIT:[tivxInit:190] Initialization Done !!!
    ============= [Quantization & Calibration for subgraph_0 Started] =============
    
    MEM: Deinit ... !!!
    MEM: Alloc's: 26 alloc's of 29725377 bytes 
    MEM: Free's : 26 free's  of 29725377 bytes 
    MEM: Open's : 0 allocs  of 0 bytes 
    MEM: Deinit ... Done !!!

    I’ve also tried using the model optimizer script described here on the PyTorch model. The optimization log was the following:

    >>> optimize("model_pytorch_v0.onnx", "/home/root/models/public/optimized_model_pytorch_v2.onnx")
    [INFO]:Enabled pre-processing shape inference
    [INFO]:[1/20] Convert_unsqueeze_to_reshape optimization : Enabled
    [INFO]:[2/20] Convert_instancenorm_to_layernorm optimization : Enabled
    [INFO]:[3/20] Expand_slice_across_multiple_axis optimization : Enabled
    [INFO]:[4/20] Expand_layernorm_to_component_ops optimization : Enabled
    [INFO]:[5/20] Convert_conv_large_pad_to_smaller_kernel optimization : Disabled
    [INFO]:[6/20] Push_large_channel_dim_to_height_for_width_wise_softmax optimization : Enabled
    [INFO]:[7/20] Convert_softmax_axis_height_to_width optimization : Enabled
    [INFO]:[8/20] Convert_softmax_axis_channel_to_width optimization : Enabled
    [INFO]:[9/20] Convert_batchnorm_input_to_4d optimization : Enabled
    [INFO]:[10/20] Convert_gather_with_single_index_to_slice optimization : Enabled
    [INFO]:[11/20] Convert_matmul_to_conv_1x1s1 optimization : Disabled
    [INFO]:[12/20] Convert_gemm_to_matmul_and_add optimization : Enabled
    [INFO]:[13/20] Convert_large_global_avg_pooling_to_matmul optimization : Enabled
    [INFO]:[14/20] Push_matmul_channel_in_height optimization : Disabled
    [INFO]:[15/20] Convert_reducemean_to_matmul optimization : Disabled
    [INFO]:[16/20] Convert_maxpool_to_cascaded_maxpool optimization : Disabled
    [INFO]:[17/20] Split_batch_dim_to_parallel_input_branches optimization : Disabled
    [INFO]:[18/20] Convert_concat_axis_width_to_channel optimization : Disabled
    [INFO]:[19/20] Attention_block_optimization optimization : Disabled
    [INFO]:[20/20] Convert_resize_params_size_to_scale optimization : Enabled
    [INFO]:Enabled post-processing shape inference
    [INFO]:Saved modified model at /home/root/models/public/optimized_model_pytorch_v2.onnx

    And the generated model (without denying the Gemm layer) looks like this:

    But the compilation log of the optimized model looks just like my previous efforts with the model trained with Sklearn:

    Running 1 Models - ['optimized_model_pytorch_v0']
    Running_Model :  optimized_model_pytorch_v0  
    Running shape inference on model ../../../models/public/optimized_model_pytorch_v3.onnx 
    
    ========================= [Model Compilation Started] =========================
    
    Model compilation will perform the following stages:
    1. Parsing
    2. Graph Optimization
    3. Quantization & Calibration
    4. Memory Planning
    
    ============================== [Version Summary] ==============================
    
    -------------------------------------------------------------------------------
    |          TIDL Tools Version          |              10_00_08_00             |
    -------------------------------------------------------------------------------
    |         C7x Firmware Version         |              10_00_02_00             |
    -------------------------------------------------------------------------------
    |            Runtime Version           |            1.14.0+10000005           |
    -------------------------------------------------------------------------------
    |          Model Opset Version         |                  17                  |
    -------------------------------------------------------------------------------
    
    NOTE: The runtime version here specifies ONNXRT_VERSION+TIDL_VERSION
    Ex: 1.14.0+1000XXXX -> ONNXRT 1.14.0 and a TIDL_VERSION 10.00.XX.XX
    
    ============================== [Parsing Started] ==============================
    
    [TIDL Import] [PARSER] WARNING: Network not identified as Object Detection network : (1) Ignore if network is not Object Detection network (2) If network is Object Detection network, please specify "model_type":"OD" as part of OSRT compilation options
    
    ------------------------- Subgraph Information Summary -------------------------
    -------------------------------------------------------------------------------
    |          Core           |      No. of Nodes       |   Number of Subgraphs   |
    -------------------------------------------------------------------------------
    | C7x                     |                       3 |                       3 |
    | CPU                     |                       4 |                       x |
    -------------------------------------------------------------------------------
    ---------------------------------------------------------------------------------------------------------
    | Node |   Node Name  |                                     Reason                                      |
    ---------------------------------------------------------------------------------------------------------
    | Gemm | gemm         | Bias tensor input should be a vector (1, N) and N should match output dimension |
    | Gemm | gemm_token_0 | Bias tensor input should be a vector (1, N) and N should match output dimension |
    | Gemm | gemm_token_1 | Bias tensor input should be a vector (1, N) and N should match output dimension |
    | Gemm | gemm_token_2 | Bias tensor input should be a vector (1, N) and N should match output dimension |
    ---------------------------------------------------------------------------------------------------------
    ============================= [Parsing Completed] =============================
    
    ==================== [Optimization for subgraph_0 Started] ====================
    
    ----------------------------- Optimization Summary -----------------------------
    ------------------------------------------------------------------------------
    |        Layer        | Nodes before optimization | Nodes after optimization |
    ------------------------------------------------------------------------------
    | TIDL_BatchNormLayer |                         0 |                        1 |
    | TIDL_ReLULayer      |                         1 |                        0 |
    ------------------------------------------------------------------------------
    
    =================== [Optimization for subgraph_0 Completed] ===================
    
    The soft limit is 10240
    The hard limit is 10240
    MEM: Init ... !!!
    MEM: Init ... Done !!!
     0.0s:  VX_ZONE_INIT:Enabled
     0.11s:  VX_ZONE_ERROR:Enabled
     0.12s:  VX_ZONE_WARNING:Enabled
     0.1500s:  VX_ZONE_INIT:[tivxInit:190] Initialization Done !!!
    ============= [Quantization & Calibration for subgraph_0 Started] =============
    
    ==================== [Optimization for subgraph_1 Started] ====================
    
    ----------------------------- Optimization Summary -----------------------------
    ------------------------------------------------------------------------------
    |        Layer        | Nodes before optimization | Nodes after optimization |
    ------------------------------------------------------------------------------
    | TIDL_BatchNormLayer |                         0 |                        1 |
    | TIDL_ReLULayer      |                         1 |                        0 |
    ------------------------------------------------------------------------------
    
    =================== [Optimization for subgraph_1 Completed] ===================
    
    ============= [Quantization & Calibration for subgraph_1 Started] =============
    
    ==================== [Optimization for subgraph_2 Started] ====================
    
    ----------------------------- Optimization Summary -----------------------------
    ------------------------------------------------------------------------------
    |        Layer        | Nodes before optimization | Nodes after optimization |
    ------------------------------------------------------------------------------
    | TIDL_BatchNormLayer |                         0 |                        1 |
    | TIDL_ReLULayer      |                         1 |                        0 |
    ------------------------------------------------------------------------------
    
    =================== [Optimization for subgraph_2 Completed] ===================
    
    ============= [Quantization & Calibration for subgraph_2 Started] =============
    
    MEM: Deinit ... !!!
    MEM: Alloc's: 78 alloc's of 61042707 bytes 
    MEM: Free's : 78 free's  of 61042707 bytes 
    MEM: Open's : 0 allocs  of 0 bytes 
    MEM: Deinit ... Done !!!
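    For reference, the Gemm rejection reason in the table above ("Bias tensor input should be a vector (1, N) and N should match output dimension") is purely a shape constraint. A minimal numpy sketch of the layout the parser appears to expect (hypothetical sizes, not taken from my actual model):

    ```python
    import numpy as np

    # Hypothetical Gemm shapes: Y = X @ W + B with K inputs and N outputs.
    K, N = 8, 4
    X = np.random.rand(1, K).astype(np.float32)   # activation, shape (1, K)
    W = np.random.rand(K, N).astype(np.float32)   # weight, shape (K, N)
    B = np.random.rand(1, N).astype(np.float32)   # bias kept as a (1, N) row vector

    Y = X @ W + B
    assert B.shape == (1, N)   # bias is a (1, N) vector
    assert Y.shape == (1, N)   # N matches the output dimension
    ```

    If the exporter emitted the bias initializer with a different shape (e.g. (N,) or (N, 1)), reshaping it to (1, N) in the ONNX file might be enough to satisfy this check; I have not verified that yet.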

    I think my next debugging efforts will consist of modifying the model to ensure the criteria outlined in the supported ops page are respected from the very beginning.

    Thank you for the great support. I will get back to you once I have an update.

    Best regards,

    Giann.

  • Hi Giann,

    For what it’s worth, as part of another debugging effort, I’ve retrained my model using PyTorch instead of Sklearn to check if there is any difference, and indeed there is. The layers of the model trained with PyTorch all seem to be supported by the compiler from the very beginning:

    That is definitely worth knowing. Model export from the training framework into the inference/exchange format (e.g. PyTorch -> ONNX) can affect the resulting model: the export can add layers or configurations that were not entirely intentional, especially when a layer, operation, or parameter exists in one framework but not the other.

    It is interesting to learn that you had an issue with SKLearn's export but not PyTorch's. Stranger still that even the PyTorch-exported model gives trouble, in that compilation hangs.

    I think my next debugging efforts will consist of modifying the model to ensure the criteria outlined in the supported ops page are respected from the very beginning.

    Yes, this would be a great step. You may find that model export is the cause of these 'unsupported' configurations. The tidl-onnx-model-optimizer is good for programmatically applying such fixes; alternatively, onnx-modifier is a good GUI tool, though more tedious when changes need to be reproduced.

    The compilation process presents no warnings, even though the calibration + optimization is never finished:

    Let's consider this further as well, because even when your model shows 100% of layers are delegated to TIDL for acceleration, you aren't getting usable results. As you're running this again, please add the following and share the log:

    • Set debug_level=2 in the delegate options passed while initializing the model through runtime API
      • You can do this by adding an 'optional_options' dictionary to the model_config entry with debug_level set OR by setting debug_level globally in common_utils.py
    • `export TIDL_RT_DEBUG=1`
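    As a rough illustration of the first point (the dictionary key names here follow the osrt_python examples and should be treated as assumptions, not a verified API), the per-model override might look like:

    ```python
    # Hypothetical model_config entry for onnxrt_ep.py; the key names are
    # assumptions based on the osrt_python examples, not a verified API.
    model_configs = {
        "optimized_model_pytorch_v0": {
            "model_path": "../../../models/public/optimized_model_pytorch_v3.onnx",
            "session_name": "onnxrt",
            "optional_options": {
                "debug_level": 2,   # verbose TIDL import/runtime logging
            },
        },
    }

    opts = model_configs["optimized_model_pytorch_v0"]["optional_options"]
    ```

    with `export TIDL_RT_DEBUG=1` set in the shell before launching the compilation script.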

    BR,
    Reese