SK-TDA4VM: TIDL freeze when running Gemm op in ONNX Runtime

Part Number: SK-TDA4VM

Dear TI experts,

My team and I are experiencing an issue when executing an ONNX model containing a Gemm operator using TIDL 08.04.00.06 on our TI-TDA4VM with ONNX Runtime. The Python process freezes during execution. The problem seems to persist after the freeze: other models that previously ran successfully also freeze until the device is rebooted.
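For anyone trying to reproduce this: because the process hangs instead of exiting, it can help to launch the script under a timeout so the calling shell or test loop gets control back. Below is a minimal sketch using only the Python standard library. The helper name `run_with_timeout` and the timeout value are our own choices for illustration and are not part of TIDL or ONNX Runtime. If the child is blocked inside an uninterruptible driver call, killing it may not return either, so treat this as a convenience rather than a fix.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd in a child process and report whether it finished in time.

    Returns ("finished", returncode) or ("timeout", None).
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return "timeout", None
    return "finished", completed.returncode
```

It could be called as `run_with_timeout([sys.executable, "run_custom_model.py", "--model-path", "fail.onnx", "--model-artifacts", "fail/", "--debug-level", "3"], 60)`, where 60 seconds is an arbitrary limit.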

We have also tested many other parameter combinations for this operator, and with those the freeze does not occur.
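To make such a sweep systematic, the combinations can be enumerated up front. The sketch below is only an illustration of that idea: it lists Gemm configurations over `transA`, `transB` and the M, K, N sizes, with input shapes that follow the ONNX Gemm definition (Y = alpha * A' * B' + beta * C). The function name `gemm_cases` and the dictionary layout are hypothetical, and it does not reflect the exact set of combinations we tried.

```python
import itertools


def gemm_cases(m_values, k_values, n_values):
    """Yield one dict per Gemm configuration with consistent input shapes."""
    for m, k, n, trans_a, trans_b in itertools.product(
        m_values, k_values, n_values, (0, 1), (0, 1)
    ):
        yield {
            "transA": trans_a,
            "transB": trans_b,
            # ONNX Gemm: A is (M, K), or (K, M) when transA is set.
            "a_shape": (k, m) if trans_a else (m, k),
            # B is (K, N), or (N, K) when transB is set.
            "b_shape": (n, k) if trans_b else (k, n),
            # Y is always (M, N); an optional C must broadcast to this shape.
            "y_shape": (m, n),
        }
```

Each dict can then be turned into a single-node model and run on the device, which narrows down which combination triggers the freeze.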

This is the command we used to run the model with TIDL and ONNX Runtime, together with its console output:

root@tda4vm-sk:~# python3 run_custom_model.py --model-path fail.onnx --model-artifacts fail/ --debug-level 3
libtidl_onnxrt_EP loaded 0x296bd560
artifacts_folder = fail/
debug_level = 3
target_priority = 0
max_pre_empt_delay = 340282346638528859811704183484516925440.000000
Final number of subgraphs created are : 1, - Offloaded Nodes - 1, Total Nodes - 1
In TIDL_createStateInfer
Compute on node : TIDLExecutionProvider_TIDL_0_0
************ in TIDL_subgraphRtCreate ************
APP: Init ... !!!
MEM: Init ... !!!
MEM: Initialized DMA HEAP (fd=4) !!!
MEM: Init ... Done !!!
IPC: Init ... !!!
IPC: Init ... Done !!!
REMOTE_SERVICE: Init ... !!!
REMOTE_SERVICE: Init ... Done !!!
3002408.839539 s: GTC Frequency = 200 MHz
APP: Init ... Done !!!
3002408.839772 s: VX_ZONE_INIT:Enabled
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

And this is the source code for the Python script used in the console output above:

"""run_custom_model.py"""
import argparse
import os

import numpy as np
import onnxruntime as rt


def run_model(*, model_path, model_artifacts, debug_level):
    delegate_options = {
        "tidl_tools_path": os.environ["TIDL_TOOLS_PATH"],
        "artifacts_folder": model_artifacts,
        "platform": "J7",
        "debug_level": debug_level,
    }
    so = rt.SessionOptions()
    sess = rt.InferenceSession(
        model_path,
        providers=["TIDLExecutionProvider"],
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX


Here is a link to Google Drive for the model that causes the freeze:

https://drive.google.com/file/d/1vl23tDDLhXz71USfhQCCjHd1rnD6DP3D/view?usp=sharing

And here is a link to another model, with different parameters for the Gemm operator, that runs successfully without freezing:

https://drive.google.com/file/d/1vtsfQq6LEo1DI88DaHe_WeltBu-75i0U/view?usp=sharing
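To see how the two models differ, the Gemm attributes can be dumped and compared. This is a rough sketch, assuming the `onnx` Python package is installed on the machine doing the comparison. The function names `gemm_attributes` and `diff_attributes` are made up for this post. It only compares node attributes (`alpha`, `beta`, `transA`, `transB`), so differences in input or weight shapes would need a separate check.

```python
def gemm_attributes(model_path):
    """Return a list with one {attribute_name: value} dict per Gemm node."""
    # Imported here so diff_attributes below works without onnx installed.
    import onnx
    from onnx import helper

    model = onnx.load(model_path)
    return [
        {attr.name: helper.get_attribute_value(attr) for attr in node.attribute}
        for node in model.graph.node
        if node.op_type == "Gemm"
    ]


def diff_attributes(failing, working):
    """Return {name: (failing_value, working_value)} for attributes that differ.

    A value of None means the attribute is absent, so the ONNX default applies.
    """
    diff = {}
    for name in sorted(set(failing) | set(working)):
        a, b = failing.get(name), working.get(name)
        if a != b:
            diff[name] = (a, b)
    return diff
```

For single-node models like ours, `diff_attributes(gemm_attributes("fail.onnx")[0], gemm_attributes(other_path)[0])` would list the differing attributes, where `other_path` stands for wherever the working model was saved.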

If you could help us find a way to fix this freeze and run this model, we would be most thankful.