TDA4VM: TIDL -> onnx -> TIDL runtime data dimension unmatch problem

Baidong Chu

Part Number: TDA4VM

Hello everyone.

I am trying to use edgeai-tidl-tools to convert my own ONNX model to TIDL. However, my model contains a layer that TIDL does not support.

I've found that at runtime, if such a layer exists, the model is split into two TIDL subgraphs and one ONNX layer, and the data flow is TIDL1 -> ONNX layer -> TIDL2.

However, in TIDL, 4-D NCHW data is automatically converted to 6-D 1*1*NCHW, and when the TIDL1 result is fed into the ONNX layer, a data dimension mismatch error occurs.

I cannot find how to fix this. Is there a way to disable the unsqueeze operation, or to convert the data format between TIDL and ONNX automatically?

Thanks a lot.

over 1 year ago

0 Pratik Kedar over 1 year ago

TI__Mastermind 24041 points

Hi,

Internally, TIDL works on 6-dimensional data (1x1xNCHW); the handshake between OSRT and TIDL-RT is handled internally.
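To make the shape relationship concrete, here is a minimal sketch of how a 6-D TIDL shape maps back to 4-D NCHW. The helper name `tidl_shape_to_nchw` is hypothetical and is not part of TIDL or ONNX Runtime; it only illustrates that the two extra dimensions are leading singletons.

```python
def tidl_shape_to_nchw(shape):
    """Drop leading singleton dimensions so a 1x1xNCHW shape becomes NCHW.

    Hypothetical helper for illustration only; not a TIDL or ORT API.
    """
    shape = tuple(int(d) for d in shape)
    if len(shape) < 4:
        raise ValueError(f"expected at least 4 dimensions, got {shape}")
    extra = shape[:-4]
    if any(d != 1 for d in extra):
        # Only pure padding dimensions (size 1) can be dropped safely.
        raise ValueError(f"leading dimensions are not all 1: {shape}")
    return shape[-4:]


print(tidl_shape_to_nchw((1, 1, 1, 16, 32, 32)))  # (1, 16, 32, 32)
```

Because the extra dimensions have size 1, the element count and memory order are unchanged; only the reported rank differs.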

Baidong Chu said:
when the TIDL1 result is fed into the ONNX layer, a data dimension mismatch error occurs.

Can you share some more insight into this issue? Are there any logs I can look into? Is it the case that the output tensor dimensions of OSRT (without TIDL offload) are not the same as those of TIDL inference (assuming the 1x1xNCHW format)?

0 Baidong Chu over 1 year ago in reply to Pratik Kedar

Prodigy 10 points

Thank you very much for your reply.

For my ONNX model, the data dimension is (1*C*H*W), and I have some MaxPool layers with kernel sizes 5 and 9, which are not supported by TIDL. You can see 3 MaxPool layers in the picture.

I am now using edgeai-tidl-tools on an x86 PC to compile the model to TIDL bin files with a Python script. My code is based on onnxrt_ep.py, and I use the following code to create the ORT session. The delegate options are the same as those in common_utils.py.

import onnxruntime as ort

# delegate_options: the same compile options as in common_utils.py
so = ort.SessionOptions()
EP_list = ['TIDLCompilationProvider', 'CPUExecutionProvider']
sess = ort.InferenceSession(model_path, providers=EP_list, provider_options=[delegate_options, {}], sess_options=so)

The model is split into two TIDL parts because of the MaxPool layers.

Preliminary subgraphs created = 2 
Final number of subgraphs created are : 2, - Offloaded Nodes - 194, Total Nodes - 197
TIDL ALLOWLISTING LAYER CHECK -- TIDL_PoolingLayer '': kernel size 9x9 with stride 1x1 not supported
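The summary lines above can be read programmatically. The following is a minimal sketch, assuming the log format is exactly as printed in this post; the function name `summarize_compile_log` and the regular expressions are illustrative and not part of the TIDL tools.

```python
import re

# Patterns written against the log lines quoted in this thread (an assumption;
# other tool versions may format these lines differently).
SUMMARY_RE = re.compile(
    r"Final number of subgraphs created are\s*:\s*(\d+),"
    r"\s*-\s*Offloaded Nodes\s*-\s*(\d+),\s*Total Nodes\s*-\s*(\d+)"
)
UNSUPPORTED_RE = re.compile(r"TIDL ALLOWLISTING LAYER CHECK -- (.+)")


def summarize_compile_log(text):
    """Return (summary dict or None, list of unsupported-layer reasons)."""
    summary = None
    reasons = []
    for line in text.splitlines():
        m = SUMMARY_RE.search(line)
        if m:
            subgraphs, offloaded, total = map(int, m.groups())
            summary = {
                "subgraphs": subgraphs,
                "offloaded": offloaded,
                "total": total,
                "not_offloaded": total - offloaded,  # nodes left to the CPU provider
            }
        m = UNSUPPORTED_RE.search(line)
        if m:
            reasons.append(m.group(1).strip())
    return summary, reasons
```

For the log above this gives 194 of 197 nodes offloaded, which leaves 3 nodes outside TIDL, consistent with the three MaxPool layers mentioned earlier.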

After the compilation session is created, I feed in the calibration image and run the session with the following code.

labels = [out.name for out in sess.get_outputs()]
blobs = sess.run(labels, {"data": image})

The first TIDL part, before the MaxPool layers, is compiled to bin files successfully, and I found the following files in tempDir of the artifacts folder.

After the first part was compiled, the following error occurred and the script crashed. I think this is because the dimensions of the TIDL output do not match the input expected by the next ORT node.

In TIDL_runtimesPostProcessNet 4
2024-07-16 04:26:59.048648762 [E:onnxruntime:, sequential_executor.cc:494 ExecuteKernel] Non-zero status code returned while running ReorderInput node. Name:'ReorderInput' Status Message: /onnxruntime/onnxruntime/contrib_ops/cpu/nchwc_ops.cc:17 virtual onnxruntime::common::Status onnxruntime::contrib::ReorderInput::Compute(onnxruntime::OpKernelContext*) const X_rank == 4 was false.
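The error message carries three useful facts: the failing node type, the source location, and the check that failed. Below is a small sketch for pulling them out of such a message; the regular expression is written against the message quoted above and is an assumption about its format, not an ONNX Runtime API.

```python
import re

# Matches messages of the form quoted in this thread; other ONNX Runtime
# errors may be worded differently (assumption).
ORT_ERROR_RE = re.compile(
    r"while running (?P<op>\w+) node\. Name:'(?P<name>[^']*)'.*?"
    r"(?P<source>\S+\.cc:\d+).*\bconst (?P<check>.+?) was false"
)


def parse_ort_error(message):
    """Return a dict with op, name, source and check, or None if no match."""
    m = ORT_ERROR_RE.search(message)
    return m.groupdict() if m else None
```

Applied to the message above, this yields the op `ReorderInput`, the source `nchwc_ops.cc:17`, and the failed check `X_rank == 4`, i.e. the CPU node received a tensor whose rank was not 4.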

When I use EP_list = ['CPUExecutionProvider'] to run with ORT only, the model executes normally.

I would appreciate it if you could tell me how to solve the problem. Thank you very much.

0 Pratik Kedar over 1 year ago in reply to Baidong Chu

TI__Mastermind 24041 points

Hi,

Thanks for the clarification.

Yes, it seems to be failing during the import process, specifically in the second subgraph.

Have you tried this experiment on the latest TIDL tools, 9.2.9.0? If not, I recommend trying it and sharing your observations.

Please share the debug_level 2 log along with the model file and compilation options here so I can try to reproduce it on my end.

0 Baidong Chu over 1 year ago in reply to Pratik Kedar

Prodigy 10 points

Hi,

Thank you very much for your reply. Sorry it took me some time to update to the new version.

I've updated edgeai-tidl-tools to 9.2.9.0. However, the problem remains.

Sorry, I cannot share the original ONNX model, but the problem can be reproduced in the following way.

Let me upload a zip file. It contains two files: test.onnx and compile.py.

test.onnx has only three layers: Conv -> Pool -> Conv.

compile.py is the Python script that compiles the ONNX model to TIDL; it contains my options.

I set "deny_list:layer_type": "MaxPool", so the Pool layer will not be compiled. After the first Conv layer is compiled, an error is shown: "Compute(onnxruntime::OpKernelContext*) const X_rank == 4 was false."

I'd appreciate it if you could help me with this problem. Thank you very much!
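As a rough sketch of how such an option might be layered on top of a base set of compile options: only the `"deny_list:layer_type"` key is taken from this post. The other keys mirror field names printed in the compile log in this thread and are assumptions about the actual option names; the paths are placeholders and `build_delegate_options` is a hypothetical helper.

```python
def build_delegate_options(base_options, optional_options=None):
    """Return a new dict with optional options overriding the base options."""
    merged = dict(base_options)            # copy so the base dict is not mutated
    merged.update(optional_options or {})
    return merged


# Assumed base options; key names follow the log fields, values are placeholders.
base = {
    "tidl_tools_path": "/path/to/tidl_tools",
    "artifacts_folder": "../model-artifacts/test/",
    "debug_level": 2,
}

# Keep every MaxPool node out of the TIDL subgraphs (key taken from this post).
delegate_options = build_delegate_options(base, {"deny_list:layer_type": "MaxPool"})
print(delegate_options["deny_list:layer_type"])  # MaxPool
```

The resulting dict is what would be passed as the first entry of `provider_options` when the session is created.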

6560.test.zip

0 Pratik Kedar over 1 year ago in reply to Baidong Chu

TI__Mastermind 24041 points

Sure, let me try the shared model. I will try explicitly denying the pool layer to verify this observation.

0 Pratik Kedar over 1 year ago in reply to Pratik Kedar

TI__Mastermind 24041 points

Hi,

I tried compiling the shared model above with MaxPool added to the deny list; however, I am not able to see the issue you mentioned.

I am using the latest SDK, 9.2.9.0; here is the log for your reference.

osrt_python/advanced_examples/unit_tests_validation/ort$ python3 onnxrt_ep.py -c
Available execution providers :  ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']

Running 1 Models - ['test']


Running_Model :  test


Running shape inference on model ../unit_test_models/test.onnx

tidl_tools_path                                 = /home/pratik/edgeai-tidl-tools/j721e/lucid-test/edgeai-tidl-tools/tidl_tools
artifacts_folder                                = ../model-artifacts//test/
tidl_tensor_bits                                = 8
debug_level                                     = 2
num_tidl_subgraphs                              = 16
tidl_denylist                                   = MaxPool
tidl_denylist_layer_name                        =
tidl_denylist_layer_type                         =
tidl_allowlist_layer_name                        =
model_type                                      =
tidl_calibration_accuracy_level                 = 7
tidl_calibration_options:num_frames_calibration = 1
tidl_calibration_options:bias_calibration_iterations = 1
mixed_precision_factor = -1.000000
model_group_id = 0
power_of_2_quantization                         = 2
ONNX QDQ Enabled                                = 0
enable_high_resolution_optimization             = 0
pre_batchnorm_fold                              = 1
add_data_convert_ops                            = 3
output_feature_16bit_names_list                 =
m_params_16bit_names_list                       =
m_single_core_layers_names_list                    =
reserved_compile_constraints_flag               = 1601
ti_internal_reserved_1                          =


 ****** WARNING : Network not identified as Object Detection network : (1) Ignore if network is not Object Detection network (2) If network is Object Detection network, please specify "model_type":"OD" as part of OSRT compilation options******

Supported TIDL layer type ---            Conv -- /0/Conv
Op type 'MaxPool'  added to unsupported nodes as specified in deny list
Supported TIDL layer type ---            Conv -- /2/Conv

Preliminary subgraphs created = 2
Final number of subgraphs created are : 2, - Offloaded Nodes - 2, Total Nodes - 3
Node in deny list...delegated to ARM --- layer type - MaxPool, Node name - /1/MaxPool
Running runtimes graphviz - /home/pratik/edgeai-tidl-tools/j721e/lucid-test/edgeai-tidl-tools/tidl_tools/tidl_graphVisualiser_runtimes.out ../model-artifacts//test//allowedNode.txt ../model-artifacts//test//tempDir/graphvizInfo.txt ../model-artifacts//test//tempDir/runtimes_visualization.svg
*** In TIDL_createStateImportFunc ***
Compute on node : TIDLExecutionProvider_TIDL_0_0
  0,            Conv, 3, 1, input, /0/Conv_output_0

Input tensor name -  input
Output tensor name - /0/Conv_output_0
*** In TIDL_createStateImportFunc ***
Compute on node : TIDLExecutionProvider_TIDL_1_1
  0,            Conv, 3, 1, /1/MaxPool_output_0, output

Input tensor name -  /1/MaxPool_output_0
Output tensor name - output
 Graph Domain TO version : 8In TIDL_onnxRtImportInit subgraph_name=subgraph_0
Layer 0, subgraph id subgraph_0, name=/0/Conv_output_0
Layer 1, subgraph id subgraph_0, name=input
In TIDL_runtimesOptimizeNet: LayerIndex = 3, dataIndex = 2

 ************** Frame index 1 : Running float import *************
In TIDL_runtimesPostProcessNet
In TIDL_runtimesPostProcessNet 1
In TIDL_runtimesPostProcessNet 2
In TIDL_runtimesPostProcessNet 3
****************************************************
**                ALL MODEL CHECK PASSED          **
****************************************************

In TIDL_runtimesPostProcessNet 4
************ in TIDL_subgraphRtCreate ************
 The soft limit is 2048
The hard limit is 2048
MEM: Init ... !!!
MEM: Init ... Done !!!
 0.0s:  VX_ZONE_INIT:Enabled
 0.5s:  VX_ZONE_ERROR:Enabled
 0.21s:  VX_ZONE_WARNING:Enabled
 0.2472s:  VX_ZONE_INIT:[tivxInit:190] Initialization Done !!!

--------------------------------------------
TIDL Memory size requiement (record wise):
MemRecNum   , Space               , Attribute   , Alignment   , Size(KBytes), BasePtr
0           , DDR Cacheable       , Persistent  ,  128, 19.25   , 0x00000000
1           , DDR Cacheable       , Persistent  ,  128, 0.64    , 0x00000000
2           , DDR Cacheable       , Scratch     ,  128, 16.00   , 0x00000000
3           , DDR Cacheable       , Scratch     ,  128, 4.00    , 0x00000000
4           , DDR Cacheable       , Scratch     ,  128, 56.00   , 0x00000000
5           , DDR Cacheable       , Persistent  ,  128, 257.88  , 0x00000000
6           , DDR Cacheable       , Scratch     ,  128, 33549.04, 0x00000000
7           , DDR Cacheable       , Scratch     ,  128, 0.12    , 0x00000000
8           , DDR Cacheable       , Scratch     ,  128, 49152.12, 0x00000000
9           , DDR Cacheable       , Scratch     ,  128, 65539.00, 0x00000000
10          , DDR Cacheable       , Persistent  ,  128, 274.51  , 0x00000000
11          , DDR Cacheable       , Scratch     ,  128, 512.25  , 0x00000000
12          , DDR Cacheable       , Persistent  ,  128, 0.12    , 0x00000000
13          , DDR Cacheable       , Persistent  ,  128, 6235.20 , 0x00000000
14          , DDR Cacheable       , Persistent  ,  128, 0.00    , 0x00000000
--------------------------------------------
Total memory size requirement (space wise):
Mem Space , Size(KBytes)
DDR Cacheable, 155616.13
--------------------------------------------
NOTE: Memory requirement in host emulation can be different from the same on EVM
      To get the actual TIDL memory requirement make sure to run on EVM with
      debugTraceLevel = 2

--------------------------------------------
TIDL init call from ivision API

--------------------------------------------
TIDL Memory size requiement (record wise):
MemRecNum   , Space               , Attribute   , Alignment   , Size(KBytes), BasePtr
0           , DDR Cacheable       , Persistent  ,  128, 19.25   , 0xaf267000
1           , DDR Cacheable       , Persistent  ,  128, 0.64    , 0xb1a2c000
2           , DDR Cacheable       , Scratch     ,  128, 16.00   , 0xac5c2000
3           , DDR Cacheable       , Scratch     ,  128, 4.00    , 0xb1a2b000
4           , DDR Cacheable       , Scratch     ,  128, 56.00   , 0xac5b4000
5           , DDR Cacheable       , Persistent  ,  128, 257.88  , 0xab47a000
6           , DDR Cacheable       , Scratch     ,  128, 33549.04, 0x55f3c000
7           , DDR Cacheable       , Scratch     ,  128, 0.12    , 0xb1a2a000
8           , DDR Cacheable       , Scratch     ,  128, 49152.12, 0x52f3b000
9           , DDR Cacheable       , Scratch     ,  128, 65539.00, 0x4ef3a000
10          , DDR Cacheable       , Persistent  ,  128, 274.51  , 0xab435000
11          , DDR Cacheable       , Scratch     ,  128, 512.25  , 0xab3b4000
12          , DDR Cacheable       , Persistent  ,  128, 0.12    , 0xb1a29000
13          , DDR Cacheable       , Persistent  ,  128, 6235.20 , 0x5d6c6000
14          , DDR Cacheable       , Persistent  ,  128, 0.00    , 0xb1774000
--------------------------------------------
Total memory size requirement (space wise):
Mem Space , Size(KBytes)
DDR Cacheable, 155616.13
--------------------------------------------
NOTE: Memory requirement in host emulation can be different from the same on EVM
      To get the actual TIDL memory requirement make sure to run on EVM with
      debugTraceLevel = 2

--------------------------------------------
Alg Init for Layer # -    1
Alg Init for Layer # -    2
Alg Init for Layer # -    3
PREEMPTION: Adding a new priority object for targetPriority = 2, handle = 0x7e62af267000
PREEMPTION: Now total number of priority objects = 1 at priorityId = 2,    with new memRec of base = 0x7e62b1a29000 and size = 128
PREEMPTION: Requesting context memory addr for handle 0x7e62af267000, return Addr = 0x7e628401a1b8
************ TIDL_subgraphRtCreate done ************
 *******   In TIDL_subgraphRtInvoke  ********
TIDL_activate is called with handle : af267000
Core 0 Alg Process for Layer # -    0, layer type 0
Core 0 Alg Process for Layer # -    1, layer type 29
Processing Layer # -    1
End of Layer # -    1 with outPtrs[0] = 0x7e6255f3c000
Core 0 Alg Process for Layer # -    2, layer type 1
Processing Layer # -    2
End of Layer # -    2 with outPtrs[0] = 0x7e6255fff100
Core 0 Alg Process for Layer # -    3, layer type 29
Processing Layer # -    3
End of Layer # -    3 with outPtrs[0] = 0x7e625dcdd000
Core 0 Alg Process for Layer # -    4, layer type 0
TIDL_process is completed with handle : af267000
 Layer,   Layer Cycles,kernelOnlyCycles, coreLoopCycles,LayerSetupCycles,dmaPipeupCycles, dmaPipeDownCycles, PrefetchCycles,copyKerCoeffCycles,LayerDeinitCycles,LastBlockCycles, paddingTrigger,    paddingWait,LayerWithoutPad,LayerHandleCopy,   BackupCycles,  RestoreCycles,Multic7xContextCopyCycles,
     0,              0,              0,              0,              0,              0,                 0,              0,                 0,              0,              0,              0,              0,              0,              0,              0,              0,              0,
     1,              0,              0,              0,              0,              0,                 0,              0,                 0,              0,              0,              0,              0,              0,              0,              0,              0,              0,
     2,              0,              0,              0,              0,              0,                 0,              0,                 0,              0,              0,              0,              0,              0,              0,              0,              0,              0,
     3,              0,              0,              0,              0,              0,                 0,              0,                 0,              0,              0,              0,              0,              0,              0,              0,              0,              0,
     0,              0,              0,              0,              0,              0,                 0,              0,                 0,              0,              0,              0,              0,              0,              0,              0,              0,              0,
 Sum of Layer Cycles 0
Sub Graph Stats 63.000000 237519.000000 13400.000000
*******  TIDL_subgraphRtInvoke done  ********

**********  Frame Index 1 : Running fixed point mode for calibration **********
In TIDL_runtimesPostProcessNet
In TIDL_runtimesPostProcessNet 1
In TIDL_runtimesPostProcessNet 2
In TIDL_runtimesPostProcessNet 3
Empty prototxt path, running calibration

~~~~~Running TIDL in PC emulation mode to collect Activations range for each layer~~~~~

Processing config file #0 : /home/pratik/edgeai-tidl-tools/j721e/lucid-test/edgeai-tidl-tools/examples/osrt_python/advanced_examples/unit_tests_validation/model-artifacts/test/tempDir/subgraph_0_tidl_io_.qunat_stats_config.txt
 Freeing memory for user provided Net
 ----------------------- TIDL Process with REF_ONLY FLOW ------------------------

#    0 . .. T     273.24  .... ..... ... .... .....


 *****************   Calibration iteration number 0 started ************************



Empty prototxt path, running calibration

~~~~~Running TIDL in PC emulation mode to collect Activations range for each layer~~~~~

Processing config file #0 : /home/pratik/edgeai-tidl-tools/j721e/lucid-test/edgeai-tidl-tools/examples/osrt_python/advanced_examples/unit_tests_validation/model-artifacts/test/tempDir/subgraph_0_tidl_io_.qunat_stats_config.txt
 Freeing memory for user provided Net
 ----------------------- TIDL Process with REF_ONLY FLOW ------------------------

#    0 . .. T     282.00  .... ..... ... .... .....


 *****************   Calibration iteration number 0 completed ************************



Empty prototxt path, running calibration

------------------ Network Compiler Traces -----------------------------
successful Memory allocation
successful Workload Creation
****************************************************
**                ALL MODEL CHECK PASSED          **
****************************************************

In TIDL_runtimesPostProcessNet 4
 Graph Domain TO version : 8In TIDL_onnxRtImportInit subgraph_name=subgraph_1
Layer 0, subgraph id subgraph_1, name=output
Layer 1, subgraph id subgraph_1, name=/1/MaxPool_output_0
In TIDL_runtimesOptimizeNet: LayerIndex = 3, dataIndex = 2

 ************** Frame index 1 : Running float import *************
In TIDL_runtimesPostProcessNet
In TIDL_runtimesPostProcessNet 1
In TIDL_runtimesPostProcessNet 2
In TIDL_runtimesPostProcessNet 3
****************************************************
**                ALL MODEL CHECK PASSED          **
****************************************************

In TIDL_runtimesPostProcessNet 4
************ in TIDL_subgraphRtCreate ************

--------------------------------------------
TIDL Memory size requiement (record wise):
MemRecNum   , Space               , Attribute   , Alignment   , Size(KBytes), BasePtr
0           , DDR Cacheable       , Persistent  ,  128, 19.25   , 0x00000000
1           , DDR Cacheable       , Persistent  ,  128, 0.64    , 0x00000000
2           , DDR Cacheable       , Scratch     ,  128, 16.00   , 0x00000000
3           , DDR Cacheable       , Scratch     ,  128, 4.00    , 0x00000000
4           , DDR Cacheable       , Scratch     ,  128, 56.00   , 0x00000000
5           , DDR Cacheable       , Persistent  ,  128, 256.90  , 0x00000000
6           , DDR Cacheable       , Scratch     ,  128, 8642.50 , 0x00000000
7           , DDR Cacheable       , Scratch     ,  128, 0.12    , 0x00000000
8           , DDR Cacheable       , Scratch     ,  128, 288.12  , 0x00000000
9           , DDR Cacheable       , Scratch     ,  128, 16902.00, 0x00000000
10          , DDR Cacheable       , Persistent  ,  128, 274.51  , 0x00000000
11          , DDR Cacheable       , Scratch     ,  128, 512.25  , 0x00000000
12          , DDR Cacheable       , Persistent  ,  128, 0.12    , 0x00000000
13          , DDR Cacheable       , Persistent  ,  128, 6234.76 , 0x00000000
14          , DDR Cacheable       , Persistent  ,  128, 0.00    , 0x00000000
--------------------------------------------
Total memory size requirement (space wise):
Mem Space , Size(KBytes)
DDR Cacheable, 33207.18
--------------------------------------------
NOTE: Memory requirement in host emulation can be different from the same on EVM
      To get the actual TIDL memory requirement make sure to run on EVM with
      debugTraceLevel = 2

--------------------------------------------
TIDL init call from ivision API

--------------------------------------------
TIDL Memory size requiement (record wise):
MemRecNum   , Space               , Attribute   , Alignment   , Size(KBytes), BasePtr
0           , DDR Cacheable       , Persistent  ,  128, 19.25   , 0xac598000
1           , DDR Cacheable       , Persistent  ,  128, 0.64    , 0xb0df9000
2           , DDR Cacheable       , Scratch     ,  128, 16.00   , 0xac594000
3           , DDR Cacheable       , Scratch     ,  128, 4.00    , 0xafc2e000
4           , DDR Cacheable       , Scratch     ,  128, 56.00   , 0xac586000
5           , DDR Cacheable       , Persistent  ,  128, 256.90  , 0xab2d1000
6           , DDR Cacheable       , Scratch     ,  128, 8642.50 , 0x5c654000
7           , DDR Cacheable       , Scratch     ,  128, 0.12    , 0xafc2d000
8           , DDR Cacheable       , Scratch     ,  128, 288.12  , 0xab288000
9           , DDR Cacheable       , Scratch     ,  128, 16902.00, 0x467fd000
10          , DDR Cacheable       , Persistent  ,  128, 274.51  , 0xab243000
11          , DDR Cacheable       , Scratch     ,  128, 512.25  , 0xab1c2000
12          , DDR Cacheable       , Persistent  ,  128, 0.12    , 0xafc2c000
13          , DDR Cacheable       , Persistent  ,  128, 6234.76 , 0x5c03d000
14          , DDR Cacheable       , Persistent  ,  128, 0.00    , 0xaf96b000
--------------------------------------------
Total memory size requirement (space wise):
Mem Space , Size(KBytes)
DDR Cacheable, 33207.18
--------------------------------------------
NOTE: Memory requirement in host emulation can be different from the same on EVM
      To get the actual TIDL memory requirement make sure to run on EVM with
      debugTraceLevel = 2

--------------------------------------------
Alg Init for Layer # -    1
Alg Init for Layer # -    2
Alg Init for Layer # -    3
PREEMPTION: Adding a new priority object for targetPriority = 2, handle = 0x7e62ac598000
PREEMPTION: Now total number of priority objects = 2 at priorityId = 2,    with new memRec of base = 0x7e62afc2c000 and size = 128
PREEMPTION: Requesting context memory addr for handle 0x7e62ac598000, return Addr = 0x7e628401a1b8
************ TIDL_subgraphRtCreate done ************
 *******   In TIDL_subgraphRtInvoke  ********
TIDL_deactivate is called with handle : af267000
TIDL_activate is called with handle : ac598000
Core 0 Alg Process for Layer # -    0, layer type 0
Core 0 Alg Process for Layer # -    1, layer type 29
Processing Layer # -    1
End of Layer # -    1 with outPtrs[0] = 0x7e625c654000
Core 0 Alg Process for Layer # -    2, layer type 1
Processing Layer # -    2
End of Layer # -    2 with outPtrs[0] = 0x7e625ce94680
Core 0 Alg Process for Layer # -    3, layer type 29
Processing Layer # -    3
End of Layer # -    3 with outPtrs[0] = 0x7e62ab339000
Core 0 Alg Process for Layer # -    4, layer type 0
TIDL_process is completed with handle : ac598000
 Layer,   Layer Cycles,kernelOnlyCycles, coreLoopCycles,LayerSetupCycles,dmaPipeupCycles, dmaPipeDownCycles, PrefetchCycles,copyKerCoeffCycles,LayerDeinitCycles,LastBlockCycles, paddingTrigger,    paddingWait,LayerWithoutPad,LayerHandleCopy,   BackupCycles,  RestoreCycles,Multic7xContextCopyCycles,
     0,              0,              0,              0,              0,              0,                 0,              0,                 0,              0,              0,              0,              0,              0,              0,              0,              0,              0,
     1,              0,              0,              0,              0,              0,                 0,              0,                 0,              0,              0,              0,              0,              0,              0,              0,              0,              0,
     2,              0,              0,              0,              0,              0,                 0,              0,                 0,              0,              0,              0,              0,              0,              0,              0,              0,              0,
     3,              0,              0,              0,              0,              0,                 0,              0,                 0,              0,              0,              0,              0,              0,              0,              0,              0,              0,
     0,              0,              0,              0,              0,              0,                 0,              0,                 0,              0,              0,              0,              0,              0,              0,              0,              0,              0,
 Sum of Layer Cycles 0
Sub Graph Stats 720.000000 60768.000000 309.000000
*******  TIDL_subgraphRtInvoke done  ********

**********  Frame Index 1 : Running fixed point mode for calibration **********
In TIDL_runtimesPostProcessNet
In TIDL_runtimesPostProcessNet 1
In TIDL_runtimesPostProcessNet 2
In TIDL_runtimesPostProcessNet 3
Empty prototxt path, running calibration

~~~~~Running TIDL in PC emulation mode to collect Activations range for each layer~~~~~

Processing config file #0 : /home/pratik/edgeai-tidl-tools/j721e/lucid-test/edgeai-tidl-tools/examples/osrt_python/advanced_examples/unit_tests_validation/model-artifacts/test/tempDir/subgraph_1_tidl_io_.qunat_stats_config.txt
 Freeing memory for user provided Net
 ----------------------- TIDL Process with REF_ONLY FLOW ------------------------

#    0 . .. T      67.93  .... ..... ... .... .....


 *****************   Calibration iteration number 0 started ************************



Empty prototxt path, running calibration

~~~~~Running TIDL in PC emulation mode to collect Activations range for each layer~~~~~

Processing config file #0 : /home/pratik/edgeai-tidl-tools/j721e/lucid-test/edgeai-tidl-tools/examples/osrt_python/advanced_examples/unit_tests_validation/model-artifacts/test/tempDir/subgraph_1_tidl_io_.qunat_stats_config.txt
 Freeing memory for user provided Net
 ----------------------- TIDL Process with REF_ONLY FLOW ------------------------

#    0 . .. T      51.05  .... ..... ... .... .....


 *****************   Calibration iteration number 0 completed ************************



Empty prototxt path, running calibration

------------------ Network Compiler Traces -----------------------------
successful Memory allocation
successful Workload Creation
****************************************************
**                ALL MODEL CHECK PASSED          **
****************************************************

In TIDL_runtimesPostProcessNet 4
Completed model -  test.onnx
************ in TIDL_subgraphRtDelete ************
 PREEMPTION: Removing priroty object with handle = 0x7e62af267000 and targetPriority = 2,      Number of obejcts left are = 1, removed object with base  = 0x7e62afc2c000 and size =128
************ in TIDL_subgraphRtDelete ************
 TIDL_deactivate is called with handle : ac598000
PREEMPTION: Removing priroty object with handle = 0x7e62ac598000 and targetPriority = 2,      Number of obejcts left are = 0, removed object with base  = 0x7e62b1a29000 and size =128
MEM: Deinit ... !!!
MEM: Alloc's: 50 alloc's of 250351926 bytes
MEM: Free's : 50 free's  of 250351926 bytes
MEM: Open's : 0 allocs  of 0 bytes
MEM: Deinit ... Done !!!

Can you please double-check the compile options and the input you are passing to the ONNX graph?

Thanks

0 Baidong Chu over 1 year ago in reply to Pratik Kedar

Prodigy 10 points

Thank you very much for your test.

I've copied test.onnx to .../models/public and added the following config to model_configs.py.

'test' : {
        'model_path' : os.path.join(models_base_path, 'test.onnx'),
        'mean': [0, 0, 0],
        'scale' : [1, 1, 1],
        'num_images' : 1,
        'session_name' : 'onnxrt' ,
        'model_type': 'seg',
        'optional_options' : 
        {
            "deny_list:layer_type": "MaxPool",
        }
    },

After running onnxrt_ep.py with the "-c" parameter, the same issue occurred.

onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running ReorderInput node. Name:'ReorderInput' Status Message: /onnxruntime/onnxruntime/contrib_ops/cpu/nchwc_ops.cc:17 virtual onnxruntime::common::Status onnxruntime::contrib::ReorderInput::Compute(onnxruntime::OpKernelContext*) const X_rank == 4 was false.

I used the default configs, so I am wondering if you could share your configs so I can check what is wrong.

Thank you very much.

0 Pratik Kedar over 1 year ago in reply to Baidong Chu

TI__Mastermind 24041 points

Hi,

I have tried the advanced example scripts inside the examples directory; can you please check the same? (My compilation options are also visible in the log above; can you please confirm them?)
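One way to confirm the options is to parse the `key = value` dump at the top of each compile log and compare the two. This is a minimal sketch, assuming the dump format shown in the log above; `parse_option_dump` and `diff_options` are hypothetical helpers, and only the option block (not the whole log) should be passed in.

```python
import re

# Keys without spaces only, so a line such as "ONNX QDQ Enabled = 0" is skipped.
OPTION_RE = re.compile(r"^\s*([\w:]+)\s*=\s*(.*?)\s*$")


def parse_option_dump(text):
    """Parse 'key = value' lines into a dict of strings (empty values allowed)."""
    options = {}
    for line in text.splitlines():
        m = OPTION_RE.match(line)
        if m:
            options[m.group(1)] = m.group(2)
    return options


def diff_options(mine, theirs):
    """Return {key: (my_value, their_value)} for every key that differs."""
    keys = sorted(set(mine) | set(theirs))
    return {k: (mine.get(k), theirs.get(k)) for k in keys if mine.get(k) != theirs.get(k)}


theirs = parse_option_dump("""
tidl_tensor_bits                                = 8
debug_level                                     = 2
tidl_denylist                                   = MaxPool
tidl_denylist_layer_type                         =
""")
mine = dict(theirs, debug_level="0")  # hypothetical local value, for illustration
print(diff_options(mine, theirs))  # {'debug_level': ('0', '2')}
```

Note that in the log above `tidl_denylist` is `MaxPool` while `tidl_denylist_layer_type` is empty, so it may be worth checking whether a local dump shows the same two fields.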

Thanks

0 Baidong Chu over 1 year ago in reply to Pratik Kedar

Prodigy 10 points

Hi,

I used the same parameters as yours and still cannot pass the compilation. I have checked the READMEs and searched the forums for a solution, and still cannot find the problem.

I want to ask why, in my case, the n * c * h * w data is converted to 1 * 1 * n * c * h * w automatically. I found that in this README, the data in the network SVG images is in n * c * h * w format; however, in my model artifacts there are 2 extra dimensions. I am wondering if there are any configs that can control this.

I found similar problems in the forum, but there are no clear solutions. I think they are all caused by MaxPool layers in YOLOv8 that cannot be converted to TIDL.

link1

link2

0 Pratik Kedar over 1 year ago in reply to Baidong Chu

TI__Mastermind 24041 points

Hi,

Clarification on results:

1) I used the model you shared (test.onnx); my edgeai-tidl-tools repo is checked out at tag 9.2.9.0 (TDA4VM/AM68PA).

2) The model compilation flow (as mentioned above) was run using advanced_examples; the logs are available above.

I suspect some environment-level nuances; could you redo the experiment?

Clarification on dims:

Baidong Chu said:
I want to ask why, in my case, the n * c * h * w data is converted to 1 * 1 * n * c * h * w automatically. I found that in this README, the data in the network SVG images is in n * c * h * w format; however, in my model artifacts there are 2 extra dimensions. I am wondering if there are any configs that can control this.

Internally, TIDL works in a 6-dimensional flow (1x1xNCHW); this was added from SDK 9.0 onward to support transformer models.
