Tool/software:
Greetings!
I’ve been trying to offload the inference of a custom model to the C7x/MMA accelerator of the AM62A board without much success. Unlike the vast majority of the problems discussed on this forum and in the TIDL Tools / EdgeAI Academy material, the inference I need to perform does not involve image processing. Instead, the model classifies metadata structured in a pandas DataFrame. The model was trained with the scikit-learn library and exported to both ONNX and TFLite, using skl2onnx and the TFLiteConverter (through Keras) respectively. Both converted models run without problems on the ARM cores, but not on the C7x/MMA accelerator.
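For reference, the ONNX export follows the usual skl2onnx flow; the snippet below is only a simplified sketch with placeholder data and a stand-in classifier, not my exact model:

import numpy as np
from sklearn.neural_network import MLPClassifier
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# placeholder data standing in for the real metadata DataFrame (16 features)
n_features = 16
X = np.random.rand(200, n_features).astype(np.float32)
y = np.random.randint(0, 2, 200)

# stand-in classifier, not my actual model
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300).fit(X, y)

# the exported graph takes a single 2-D float input of shape (batch, n_features)
initial_type = [("input", FloatTensorType([None, n_features]))]
onnx_model = convert_sklearn(clf, initial_types=initial_type)

with open("myModel.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())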
When trying to compile the TFLite model, only the tempDir is generated in the custom-artifacts folder. For the ONNX model, I’ve tried two ways of compiling it: adapting the Jupyter notebooks available in the examples folder, and adapting the onnxrt_ep.py script from the examples/osrt_python/ort folder. The latter seems to yield better results, as it generates more files inside tempDir and provides more extensive debugging information (a simplified sketch of the compilation session it creates is included right after the file list below). However, the script hangs during the ‘Quantization & Calibration for subgraph_0’ step, generating only the following files inside the custom-artifacts folder:
- allowedNode.txt
- onnxrtMetaData.txt
- /tempDir
- graphvizInfo.txt
- runtimes_visualization.svg
- subgraph_0_calib_raw_data.bin
- subgraph_0_tidl_io_1.bin
- subgraph_0_tidl_net.bin
- subgraph_0_tidl_net.bin.layer_info.txt
- subgraph_0_tidl_net.bin.svg
- subgraph_0_tidl_net.bin_netLog.txt
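For context, the compilation session that my adapted script creates is essentially the one from the stock onnxrt_ep.py; the sketch below is simplified, the paths and calibration settings are placeholders, and the full option dictionary is normally built in common_utils.py:

import os
import onnxruntime as rt

# delegate options for compilation, trimmed down from the defaults in common_utils.py
compile_options = {
    "tidl_tools_path": os.environ.get("TIDL_TOOLS_PATH", ""),
    "artifacts_folder": "custom-artifacts/myModel/",
    "tensor_bits": 8,
    "debug_level": 1,
    "advanced_options:calibration_frames": 3,
    "advanced_options:calibration_iterations": 3,
    "deny_list": "",  # later populated from model_configs.py optional_options
}

so = rt.SessionOptions()
sess = rt.InferenceSession(
    "models/myModel.onnx",
    providers=["TIDLCompilationProvider", "CPUExecutionProvider"],
    provider_options=[compile_options, {}],
    sess_options=so,
)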
Moments before hanging, it prints the following error/warning:
============= [Quantization & Calibration for subgraph_0 Started] =============
2025-01-08 14:46:34.193707173 [E:onnxruntime:, sequential_executor.cc:494 ExecuteKernel] Non-zero status code returned while running Gemm node. Name:'gemm_token_0' Status Message: /root/onnxruntime/onnxruntime/core/providers/cpu/math/gemm_helper.h:14 onnxruntime::GemmHelper::GemmHelper(const onnxruntime::TensorShape&, bool, const onnxruntime::TensorShape&, bool, const onnxruntime::TensorShape&) left.NumDimensions() == 2 || left.NumDimensions() == 1 was false.
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/root/examples/osrt_python/ort/onnxrt_ep.py", line 415, in run_model
classified_data = run_prediction(sess, scaled_data)
File "/home/root/examples/osrt_python/ort/onnxrt_ep.py", line 142, in run_prediction
predictions = session.run([output_name], {input_name: input_data})[0]
File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 200, in run
return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Gemm node. Name:'gemm_token_0' Status Message: /root/onnxruntime/onnxruntime/core/providers/cpu/math/gemm_helper.h:14 onnxruntime::GemmHelper::GemmHelper(const onnxruntime::TensorShape&, bool, const onnxruntime::TensorShape&, bool, const onnxruntime::TensorShape&) left.NumDimensions() == 2 || left.NumDimensions() == 1 was false.
Following the advice on the TIDL tools GitHub repository to add potentially problematic nodes to the model’s deny_list, I modified the model_configs.py file to look like this:
"myModel": create_model_config(
source=AttrDict(
infer_shape=True,
),
preprocess=AttrDict(
),
session=AttrDict(
session_name="onnxrt",
model_path=os.path.join(models_base_path, "myModel.onnx"),
),
task_type="other",
extra_info=AttrDict(
),
optional_options={
    'deny_list': 'Gemm',  # keep Gemm nodes on the ARM core instead of offloading them to the C7x/MMA
}
),
However, although the node is successfully added to the list of unsupported nodes, as shown below, the script still hangs and gives exactly the same output.
------------------------------------------------------------------------------------------------------------------------------------------------------
| Node | Node Name | Reason |
------------------------------------------------------------------------------------------------------------------------------------------------------
| Gemm | gemm | Node gemm added to unsupported nodes as specified in deny list |
The Docker container I used to perform the above compilations was configured for SDK version 10_00_08_00, which matches the SDK on my AM62A, and the ONNX Runtime version is 1.14.0. To sanity-check both the Docker and board setups, I successfully ran some of the examples from the examples/jupyter_notebooks folder on both the ARM cores and the C7x/MMA accelerator.
However, since the onnxrt_ep.py script focuses on image-processing examples, I had to comment out, modify, and add inference and data pre-processing methods to compile my custom model. As you can imagine, it is not straightforward to assess the impact of these modifications on the overall compilation process. It is worth noting that the compilation process does successfully identify a set of nodes that should be offloaded to the accelerator, as well as a set of nodes that could have been offloaded but, for the reasons reported during compilation, were not. I am attaching my modified version of the onnxrt_ep.py script, with a #### ADDED #### comment on top of each block of code I added.
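For reference, the core of what I added, simplified, is shown below; the column selection is a placeholder for my actual pre-processing, run_prediction is the function that appears in the traceback above, and np comes from the numpy import already present in onnxrt_ep.py:

#### ADDED ####
import pandas as pd

#### ADDED ####
def preprocess(df):
    # keep only the numeric metadata columns and cast to float32, so the
    # session receives a plain 2-D (num_samples, num_features) array
    return df.select_dtypes(include=["number"]).to_numpy(dtype=np.float32)

#### ADDED ####
def run_prediction(session, input_data):
    input_name = session.get_inputs()[0].name
    output_name = session.get_outputs()[0].name
    predictions = session.run([output_name], {input_name: input_data})[0]
    return predictions

#### ADDED #### (inside run_model)
scaled_data = preprocess(metadata_df)           # metadata_df is my input DataFrame
classified_data = run_prediction(sess, scaled_data)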
Best regards,
Giann.





