TDA4VM: TIDL -> ONNX -> TIDL runtime data dimension mismatch problem

Part Number: TDA4VM


Hello everyone.

I am trying to use edgeai-tidl-tools to convert my own ONNX model to TIDL. However, my model contains a layer that TIDL does not support.

I have found that at runtime, if such a layer exists, the model is split into two TIDL subgraphs and one ONNX layer, and the data flows TIDL1 -> ONNX layer -> TIDL2.

However, TIDL automatically converts 4-D NCHW data to 6-D 1*1*NCHW, so when the TIDL1 result is fed to the ONNX layer, a data dimension mismatch error occurs.

I cannot find a way to fix this. Is there any way to disable the unsqueeze operation, or to convert the data format between TIDL and ONNX automatically?

Thanks a lot.

  • Hi,

    TIDL works internally on 6-dimensional data (1x1xNCHW); the handshake between OSRT and TIDL-RT is handled internally.

    "TIDL1 result is input to onnx layer, there will be data dimension unmatch error."

    Can you share some more insight into this issue? Are there any logs that I can look into? Is it the case that the output tensor dimensions of OSRT (without TIDL offload) are not the same as those of TIDL inference (assuming the 1x1xNCHW format)?

  • Thank you very much for your reply.

    For my ONNX model, the data dimension is (1*C*H*W), and I have some MaxPool layers with kernel sizes 5 and 9, which TIDL does not support. You can see 3 MaxPool layers in the picture.

    I am now using edgeai-tidl-tools on an x86 PC to compile the model to TIDL bin files with a Python script. My code is based on onnxrt_ep.py, and I use the following code to create the ORT session. The delegate options are the same as those in common_utils.py.

    import onnxruntime as ort

    # model_path and delegate_options are defined elsewhere in the script
    so = ort.SessionOptions()
    EP_list = ['TIDLCompilationProvider', 'CPUExecutionProvider']
    sess = ort.InferenceSession(model_path, providers=EP_list, provider_options=[delegate_options, {}], sess_options=so)

    The model is divided into two TIDL parts because of the MaxPool layers.

    Preliminary subgraphs created = 2 
    Final number of subgraphs created are : 2, - Offloaded Nodes - 194, Total Nodes - 197
    TIDL ALLOWLISTING LAYER CHECK -- TIDL_PoolingLayer '': kernel size 9x9 with stride 1x1 not supported 
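The summary line in the log above reports how many nodes were offloaded to TIDL. A minimal sketch of extracting those counts to see how many nodes fall back to the CPU, assuming the line keeps exactly the format shown in this thread (other tool versions may print it differently). The helper name `parse_subgraph_summary` is hypothetical.

```python
import re

# Matches the summary line format seen in this thread's logs
# (assumption: the format is stable for this tools version).
_SUMMARY_RE = re.compile(
    r"Final number of subgraphs created are : (\d+), "
    r"- Offloaded Nodes - (\d+), Total Nodes - (\d+)"
)


def parse_subgraph_summary(line):
    """Return (subgraphs, offloaded, total, on_cpu), or None if no match."""
    match = _SUMMARY_RE.search(line)
    if match is None:
        return None
    subgraphs, offloaded, total = (int(g) for g in match.groups())
    return subgraphs, offloaded, total, total - offloaded


line = ("Final number of subgraphs created are : 2, "
        "- Offloaded Nodes - 194, Total Nodes - 197")
print(parse_subgraph_summary(line))  # (2, 194, 197, 3)
```

For this log the result is three CPU nodes, which is consistent with the three MaxPool layers mentioned above.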

    After the compilation ORT session is created, I feed in the calibration image and run the session with the following code.

    labels = [out.name for out in sess.get_outputs()]
    blobs = sess.run(labels, {"data": image})

    The first TIDL part, before the MaxPool layers, is compiled to bin files successfully, and I found the following files in tempDir of the artifacts folder.

    After the first part was compiled, the following error occurred and the script crashed. I think this is because the dimension of the TIDL output does not match the input expected by the next ORT node.

    In TIDL_runtimesPostProcessNet 4
    2024-07-16 04:26:59.048648762 [E:onnxruntime:, sequential_executor.cc:494 ExecuteKernel] Non-zero status code returned while running ReorderInput node. Name:'ReorderInput' Status Message: /onnxruntime/onnxruntime/contrib_ops/cpu/nchwc_ops.cc:17 virtual onnxruntime::common::Status onnxruntime::contrib::ReorderInput::Compute(onnxruntime::OpKernelContext*) const X_rank == 4 was false. 
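The failing `ReorderInput` kernel lives in `contrib_ops/cpu/nchwc_ops.cc`. In stock ONNX Runtime that node is inserted by the NCHWc layout optimization, which runs only at `ORT_ENABLE_ALL`. One diagnostic that may be worth trying, sketched below, is to cap the graph optimization level at `ORT_ENABLE_EXTENDED` so that the CPU partition is not rewritten with NCHWc nodes. Whether TI's ONNX Runtime build behaves the same way, and whether this prevents the 6-D tensor from failing the rank check, is an assumption that this thread does not confirm. The helper name `make_session_options` is hypothetical.

```python
def make_session_options(ort):
    """Build SessionOptions with layout (NCHWc) optimizations left out.

    `ort` is the imported onnxruntime module; it is passed in as an
    argument so the helper does not import it at definition time.
    """
    so = ort.SessionOptions()
    # ORT_ENABLE_EXTENDED stops short of the layout optimizations that
    # ORT_ENABLE_ALL adds (assumption: TI's build follows stock ORT here).
    so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
    return so


# Intended use in the compile script (not executed here):
#   import onnxruntime as ort
#   so = make_session_options(ort)
#   sess = ort.InferenceSession(model_path, providers=EP_list,
#                               provider_options=[delegate_options, {}],
#                               sess_options=so)
```

If the error disappears with this setting, that would point at the CPU-side layout rewrite rather than at the TIDL artifacts themselves.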

    When I use EP_list = ['CPUExecutionProvider'] to run with ORT only, the model executes normally.

    I would appreciate it if you could tell me how to solve the problem. Thank you very much.

  • Hi,

    Thanks for the clarification.

    Yes, it seems to be failing during the import process, specifically in the second subgraph.

    Have you tried this experiment on the latest TIDL tools, 9.2.9.0? If not, I recommend trying it and sharing your observations.

    Please share the debug_level 2 log along with the model file and compilation options here so I can try to reproduce it on my end.

  • Hi,

    Thank you very much for your reply. Sorry it took me some time to update to the new version.

    I've updated edgeai-tidl-tools to 9.2.9.0. However, the problem remains.

    Sorry, I cannot share the original ONNX model, but the problem can be reproduced in the following way.

    I am uploading a zip file that contains two files: test.onnx and compile.py.

    test.onnx has only three layers: Conv -> Pool -> Conv.

    compile.py is the Python script that compiles the ONNX model to TIDL; it contains my options.

    I set "deny_list:layer_type": "MaxPool", so the Pool layer is not compiled. After the first Conv layer is compiled, an error is raised: "Compute(onnxruntime::OpKernelContext*) const X_rank == 4 was false."
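To illustrate why denying MaxPool yields two TIDL subgraphs with one CPU node between them, here is a simplified sketch for a linear chain of nodes. It is not the real partitioner in the tools; the function name `partition` and the "TIDL"/"CPU" labels are hypothetical.

```python
def partition(op_types, deny_types):
    """Group consecutive nodes by where they would run.

    Returns a list of (target, [op_types]) tuples, where target is
    "TIDL" for offloaded runs and "CPU" for denied nodes. Only valid
    for a linear chain; real graphs need a proper graph traversal.
    """
    groups = []
    for op in op_types:
        target = "CPU" if op in deny_types else "TIDL"
        if groups and groups[-1][0] == target:
            # Same target as the previous node: extend the current group.
            groups[-1][1].append(op)
        else:
            groups.append((target, [op]))
    return groups


print(partition(["Conv", "MaxPool", "Conv"], {"MaxPool"}))
# [('TIDL', ['Conv']), ('CPU', ['MaxPool']), ('TIDL', ['Conv'])]
```

Each boundary between a "TIDL" group and a "CPU" group is a point where a tensor crosses between TIDL and ONNX Runtime, which is where the reported rank check fails.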

    I'd appreciate it if you could help me with this problem. Thank you very much!

    6560.test.zip

  • Sure, let me try the shared model. I will explicitly deny the pool layer to verify this observation.

  • Hi,

    I tried compiling the shared model with MaxPool added to the deny list; however, I am not able to reproduce the issue described above.

    I am using the latest SDK, 9.2.9.0; here is the log for your reference.

    osrt_python/advanced_examples/unit_tests_validation/ort$ python3 onnxrt_ep.py -c
    Available execution providers :  ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
    
    Running 1 Models - ['test']
    
    
    Running_Model :  test
    
    
    Running shape inference on model ../unit_test_models/test.onnx
    
    tidl_tools_path                                 = /home/pratik/edgeai-tidl-tools/j721e/lucid-test/edgeai-tidl-tools/tidl_tools
    artifacts_folder                                = ../model-artifacts//test/
    tidl_tensor_bits                                = 8
    debug_level                                     = 2
    num_tidl_subgraphs                              = 16
    tidl_denylist                                   = MaxPool
    tidl_denylist_layer_name                        =
    tidl_denylist_layer_type                         =
    tidl_allowlist_layer_name                        =
    model_type                                      =
    tidl_calibration_accuracy_level                 = 7
    tidl_calibration_options:num_frames_calibration = 1
    tidl_calibration_options:bias_calibration_iterations = 1
    mixed_precision_factor = -1.000000
    model_group_id = 0
    power_of_2_quantization                         = 2
    ONNX QDQ Enabled                                = 0
    enable_high_resolution_optimization             = 0
    pre_batchnorm_fold                              = 1
    add_data_convert_ops                            = 3
    output_feature_16bit_names_list                 =
    m_params_16bit_names_list                       =
    m_single_core_layers_names_list                    =
    reserved_compile_constraints_flag               = 1601
    ti_internal_reserved_1                          =
    
    
     ****** WARNING : Network not identified as Object Detection network : (1) Ignore if network is not Object Detection network (2) If network is Object Detection network, please specify "model_type":"OD" as part of OSRT compilation options******
    
    Supported TIDL layer type ---            Conv -- /0/Conv
    Op type 'MaxPool'  added to unsupported nodes as specified in deny list
    Supported TIDL layer type ---            Conv -- /2/Conv
    
    Preliminary subgraphs created = 2
    Final number of subgraphs created are : 2, - Offloaded Nodes - 2, Total Nodes - 3
    Node in deny list...delegated to ARM --- layer type - MaxPool, Node name - /1/MaxPool
    Running runtimes graphviz - /home/pratik/edgeai-tidl-tools/j721e/lucid-test/edgeai-tidl-tools/tidl_tools/tidl_graphVisualiser_runtimes.out ../model-artifacts//test//allowedNode.txt ../model-artifacts//test//tempDir/graphvizInfo.txt ../model-artifacts//test//tempDir/runtimes_visualization.svg
    *** In TIDL_createStateImportFunc ***
    Compute on node : TIDLExecutionProvider_TIDL_0_0
      0,            Conv, 3, 1, input, /0/Conv_output_0
    
    Input tensor name -  input
    Output tensor name - /0/Conv_output_0
    *** In TIDL_createStateImportFunc ***
    Compute on node : TIDLExecutionProvider_TIDL_1_1
      0,            Conv, 3, 1, /1/MaxPool_output_0, output
    
    Input tensor name -  /1/MaxPool_output_0
    Output tensor name - output
     Graph Domain TO version : 8In TIDL_onnxRtImportInit subgraph_name=subgraph_0
    Layer 0, subgraph id subgraph_0, name=/0/Conv_output_0
    Layer 1, subgraph id subgraph_0, name=input
    In TIDL_runtimesOptimizeNet: LayerIndex = 3, dataIndex = 2
    
     ************** Frame index 1 : Running float import *************
    In TIDL_runtimesPostProcessNet
    In TIDL_runtimesPostProcessNet 1
    In TIDL_runtimesPostProcessNet 2
    In TIDL_runtimesPostProcessNet 3
    ****************************************************
    **                ALL MODEL CHECK PASSED          **
    ****************************************************
    
    In TIDL_runtimesPostProcessNet 4
    ************ in TIDL_subgraphRtCreate ************
     The soft limit is 2048
    The hard limit is 2048
    MEM: Init ... !!!
    MEM: Init ... Done !!!
     0.0s:  VX_ZONE_INIT:Enabled
     0.5s:  VX_ZONE_ERROR:Enabled
     0.21s:  VX_ZONE_WARNING:Enabled
     0.2472s:  VX_ZONE_INIT:[tivxInit:190] Initialization Done !!!
    
    --------------------------------------------
    TIDL Memory size requiement (record wise):
    MemRecNum   , Space               , Attribute   , Alignment   , Size(KBytes), BasePtr
    0           , DDR Cacheable       , Persistent  ,  128, 19.25   , 0x00000000
    1           , DDR Cacheable       , Persistent  ,  128, 0.64    , 0x00000000
    2           , DDR Cacheable       , Scratch     ,  128, 16.00   , 0x00000000
    3           , DDR Cacheable       , Scratch     ,  128, 4.00    , 0x00000000
    4           , DDR Cacheable       , Scratch     ,  128, 56.00   , 0x00000000
    5           , DDR Cacheable       , Persistent  ,  128, 257.88  , 0x00000000
    6           , DDR Cacheable       , Scratch     ,  128, 33549.04, 0x00000000
    7           , DDR Cacheable       , Scratch     ,  128, 0.12    , 0x00000000
    8           , DDR Cacheable       , Scratch     ,  128, 49152.12, 0x00000000
    9           , DDR Cacheable       , Scratch     ,  128, 65539.00, 0x00000000
    10          , DDR Cacheable       , Persistent  ,  128, 274.51  , 0x00000000
    11          , DDR Cacheable       , Scratch     ,  128, 512.25  , 0x00000000
    12          , DDR Cacheable       , Persistent  ,  128, 0.12    , 0x00000000
    13          , DDR Cacheable       , Persistent  ,  128, 6235.20 , 0x00000000
    14          , DDR Cacheable       , Persistent  ,  128, 0.00    , 0x00000000
    --------------------------------------------
    Total memory size requirement (space wise):
    Mem Space , Size(KBytes)
    DDR Cacheable, 155616.13
    --------------------------------------------
    NOTE: Memory requirement in host emulation can be different from the same on EVM
          To get the actual TIDL memory requirement make sure to run on EVM with
          debugTraceLevel = 2
    
    --------------------------------------------
    TIDL init call from ivision API
    
    --------------------------------------------
    TIDL Memory size requiement (record wise):
    MemRecNum   , Space               , Attribute   , Alignment   , Size(KBytes), BasePtr
    0           , DDR Cacheable       , Persistent  ,  128, 19.25   , 0xaf267000
    1           , DDR Cacheable       , Persistent  ,  128, 0.64    , 0xb1a2c000
    2           , DDR Cacheable       , Scratch     ,  128, 16.00   , 0xac5c2000
    3           , DDR Cacheable       , Scratch     ,  128, 4.00    , 0xb1a2b000
    4           , DDR Cacheable       , Scratch     ,  128, 56.00   , 0xac5b4000
    5           , DDR Cacheable       , Persistent  ,  128, 257.88  , 0xab47a000
    6           , DDR Cacheable       , Scratch     ,  128, 33549.04, 0x55f3c000
    7           , DDR Cacheable       , Scratch     ,  128, 0.12    , 0xb1a2a000
    8           , DDR Cacheable       , Scratch     ,  128, 49152.12, 0x52f3b000
    9           , DDR Cacheable       , Scratch     ,  128, 65539.00, 0x4ef3a000
    10          , DDR Cacheable       , Persistent  ,  128, 274.51  , 0xab435000
    11          , DDR Cacheable       , Scratch     ,  128, 512.25  , 0xab3b4000
    12          , DDR Cacheable       , Persistent  ,  128, 0.12    , 0xb1a29000
    13          , DDR Cacheable       , Persistent  ,  128, 6235.20 , 0x5d6c6000
    14          , DDR Cacheable       , Persistent  ,  128, 0.00    , 0xb1774000
    --------------------------------------------
    Total memory size requirement (space wise):
    Mem Space , Size(KBytes)
    DDR Cacheable, 155616.13
    --------------------------------------------
    NOTE: Memory requirement in host emulation can be different from the same on EVM
          To get the actual TIDL memory requirement make sure to run on EVM with
          debugTraceLevel = 2
    
    --------------------------------------------
    Alg Init for Layer # -    1
    Alg Init for Layer # -    2
    Alg Init for Layer # -    3
    PREEMPTION: Adding a new priority object for targetPriority = 2, handle = 0x7e62af267000
    PREEMPTION: Now total number of priority objects = 1 at priorityId = 2,    with new memRec of base = 0x7e62b1a29000 and size = 128
    PREEMPTION: Requesting context memory addr for handle 0x7e62af267000, return Addr = 0x7e628401a1b8
    ************ TIDL_subgraphRtCreate done ************
     *******   In TIDL_subgraphRtInvoke  ********
    TIDL_activate is called with handle : af267000
    Core 0 Alg Process for Layer # -    0, layer type 0
    Core 0 Alg Process for Layer # -    1, layer type 29
    Processing Layer # -    1
    End of Layer # -    1 with outPtrs[0] = 0x7e6255f3c000
    Core 0 Alg Process for Layer # -    2, layer type 1
    Processing Layer # -    2
    End of Layer # -    2 with outPtrs[0] = 0x7e6255fff100
    Core 0 Alg Process for Layer # -    3, layer type 29
    Processing Layer # -    3
    End of Layer # -    3 with outPtrs[0] = 0x7e625dcdd000
    Core 0 Alg Process for Layer # -    4, layer type 0
    TIDL_process is completed with handle : af267000
     Layer,   Layer Cycles,kernelOnlyCycles, coreLoopCycles,LayerSetupCycles,dmaPipeupCycles, dmaPipeDownCycles, PrefetchCycles,copyKerCoeffCycles,LayerDeinitCycles,LastBlockCycles, paddingTrigger,    paddingWait,LayerWithoutPad,LayerHandleCopy,   BackupCycles,  RestoreCycles,Multic7xContextCopyCycles,
         0,              0,              0,              0,              0,              0,                 0,              0,                 0,              0,              0,              0,              0,              0,              0,              0,              0,              0,
         1,              0,              0,              0,              0,              0,                 0,              0,                 0,              0,              0,              0,              0,              0,              0,              0,              0,              0,
         2,              0,              0,              0,              0,              0,                 0,              0,                 0,              0,              0,              0,              0,              0,              0,              0,              0,              0,
         3,              0,              0,              0,              0,              0,                 0,              0,                 0,              0,              0,              0,              0,              0,              0,              0,              0,              0,
         0,              0,              0,              0,              0,              0,                 0,              0,                 0,              0,              0,              0,              0,              0,              0,              0,              0,              0,
     Sum of Layer Cycles 0
    Sub Graph Stats 63.000000 237519.000000 13400.000000
    *******  TIDL_subgraphRtInvoke done  ********
    
    **********  Frame Index 1 : Running fixed point mode for calibration **********
    In TIDL_runtimesPostProcessNet
    In TIDL_runtimesPostProcessNet 1
    In TIDL_runtimesPostProcessNet 2
    In TIDL_runtimesPostProcessNet 3
    Empty prototxt path, running calibration
    
    ~~~~~Running TIDL in PC emulation mode to collect Activations range for each layer~~~~~
    
    Processing config file #0 : /home/pratik/edgeai-tidl-tools/j721e/lucid-test/edgeai-tidl-tools/examples/osrt_python/advanced_examples/unit_tests_validation/model-artifacts/test/tempDir/subgraph_0_tidl_io_.qunat_stats_config.txt
     Freeing memory for user provided Net
     ----------------------- TIDL Process with REF_ONLY FLOW ------------------------
    
    #    0 . .. T     273.24  .... ..... ... .... .....
    
    
     *****************   Calibration iteration number 0 started ************************
    
    
    
    Empty prototxt path, running calibration
    
    ~~~~~Running TIDL in PC emulation mode to collect Activations range for each layer~~~~~
    
    Processing config file #0 : /home/pratik/edgeai-tidl-tools/j721e/lucid-test/edgeai-tidl-tools/examples/osrt_python/advanced_examples/unit_tests_validation/model-artifacts/test/tempDir/subgraph_0_tidl_io_.qunat_stats_config.txt
     Freeing memory for user provided Net
     ----------------------- TIDL Process with REF_ONLY FLOW ------------------------
    
    #    0 . .. T     282.00  .... ..... ... .... .....
    
    
     *****************   Calibration iteration number 0 completed ************************
    
    
    
    Empty prototxt path, running calibration
    
    ------------------ Network Compiler Traces -----------------------------
    successful Memory allocation
    successful Workload Creation
    ****************************************************
    **                ALL MODEL CHECK PASSED          **
    ****************************************************
    
    In TIDL_runtimesPostProcessNet 4
     Graph Domain TO version : 8In TIDL_onnxRtImportInit subgraph_name=subgraph_1
    Layer 0, subgraph id subgraph_1, name=output
    Layer 1, subgraph id subgraph_1, name=/1/MaxPool_output_0
    In TIDL_runtimesOptimizeNet: LayerIndex = 3, dataIndex = 2
    
     ************** Frame index 1 : Running float import *************
    In TIDL_runtimesPostProcessNet
    In TIDL_runtimesPostProcessNet 1
    In TIDL_runtimesPostProcessNet 2
    In TIDL_runtimesPostProcessNet 3
    ****************************************************
    **                ALL MODEL CHECK PASSED          **
    ****************************************************
    
    In TIDL_runtimesPostProcessNet 4
    ************ in TIDL_subgraphRtCreate ************
    
    --------------------------------------------
    TIDL Memory size requiement (record wise):
    MemRecNum   , Space               , Attribute   , Alignment   , Size(KBytes), BasePtr
    0           , DDR Cacheable       , Persistent  ,  128, 19.25   , 0x00000000
    1           , DDR Cacheable       , Persistent  ,  128, 0.64    , 0x00000000
    2           , DDR Cacheable       , Scratch     ,  128, 16.00   , 0x00000000
    3           , DDR Cacheable       , Scratch     ,  128, 4.00    , 0x00000000
    4           , DDR Cacheable       , Scratch     ,  128, 56.00   , 0x00000000
    5           , DDR Cacheable       , Persistent  ,  128, 256.90  , 0x00000000
    6           , DDR Cacheable       , Scratch     ,  128, 8642.50 , 0x00000000
    7           , DDR Cacheable       , Scratch     ,  128, 0.12    , 0x00000000
    8           , DDR Cacheable       , Scratch     ,  128, 288.12  , 0x00000000
    9           , DDR Cacheable       , Scratch     ,  128, 16902.00, 0x00000000
    10          , DDR Cacheable       , Persistent  ,  128, 274.51  , 0x00000000
    11          , DDR Cacheable       , Scratch     ,  128, 512.25  , 0x00000000
    12          , DDR Cacheable       , Persistent  ,  128, 0.12    , 0x00000000
    13          , DDR Cacheable       , Persistent  ,  128, 6234.76 , 0x00000000
    14          , DDR Cacheable       , Persistent  ,  128, 0.00    , 0x00000000
    --------------------------------------------
    Total memory size requirement (space wise):
    Mem Space , Size(KBytes)
    DDR Cacheable, 33207.18
    --------------------------------------------
    NOTE: Memory requirement in host emulation can be different from the same on EVM
          To get the actual TIDL memory requirement make sure to run on EVM with
          debugTraceLevel = 2
    
    --------------------------------------------
    TIDL init call from ivision API
    
    --------------------------------------------
    TIDL Memory size requiement (record wise):
    MemRecNum   , Space               , Attribute   , Alignment   , Size(KBytes), BasePtr
    0           , DDR Cacheable       , Persistent  ,  128, 19.25   , 0xac598000
    1           , DDR Cacheable       , Persistent  ,  128, 0.64    , 0xb0df9000
    2           , DDR Cacheable       , Scratch     ,  128, 16.00   , 0xac594000
    3           , DDR Cacheable       , Scratch     ,  128, 4.00    , 0xafc2e000
    4           , DDR Cacheable       , Scratch     ,  128, 56.00   , 0xac586000
    5           , DDR Cacheable       , Persistent  ,  128, 256.90  , 0xab2d1000
    6           , DDR Cacheable       , Scratch     ,  128, 8642.50 , 0x5c654000
    7           , DDR Cacheable       , Scratch     ,  128, 0.12    , 0xafc2d000
    8           , DDR Cacheable       , Scratch     ,  128, 288.12  , 0xab288000
    9           , DDR Cacheable       , Scratch     ,  128, 16902.00, 0x467fd000
    10          , DDR Cacheable       , Persistent  ,  128, 274.51  , 0xab243000
    11          , DDR Cacheable       , Scratch     ,  128, 512.25  , 0xab1c2000
    12          , DDR Cacheable       , Persistent  ,  128, 0.12    , 0xafc2c000
    13          , DDR Cacheable       , Persistent  ,  128, 6234.76 , 0x5c03d000
    14          , DDR Cacheable       , Persistent  ,  128, 0.00    , 0xaf96b000
    --------------------------------------------
    Total memory size requirement (space wise):
    Mem Space , Size(KBytes)
    DDR Cacheable, 33207.18
    --------------------------------------------
    NOTE: Memory requirement in host emulation can be different from the same on EVM
          To get the actual TIDL memory requirement make sure to run on EVM with
          debugTraceLevel = 2
    
    --------------------------------------------
    Alg Init for Layer # -    1
    Alg Init for Layer # -    2
    Alg Init for Layer # -    3
    PREEMPTION: Adding a new priority object for targetPriority = 2, handle = 0x7e62ac598000
    PREEMPTION: Now total number of priority objects = 2 at priorityId = 2,    with new memRec of base = 0x7e62afc2c000 and size = 128
    PREEMPTION: Requesting context memory addr for handle 0x7e62ac598000, return Addr = 0x7e628401a1b8
    ************ TIDL_subgraphRtCreate done ************
     *******   In TIDL_subgraphRtInvoke  ********
    TIDL_deactivate is called with handle : af267000
    TIDL_activate is called with handle : ac598000
    Core 0 Alg Process for Layer # -    0, layer type 0
    Core 0 Alg Process for Layer # -    1, layer type 29
    Processing Layer # -    1
    End of Layer # -    1 with outPtrs[0] = 0x7e625c654000
    Core 0 Alg Process for Layer # -    2, layer type 1
    Processing Layer # -    2
    End of Layer # -    2 with outPtrs[0] = 0x7e625ce94680
    Core 0 Alg Process for Layer # -    3, layer type 29
    Processing Layer # -    3
    End of Layer # -    3 with outPtrs[0] = 0x7e62ab339000
    Core 0 Alg Process for Layer # -    4, layer type 0
    TIDL_process is completed with handle : ac598000
     Layer,   Layer Cycles,kernelOnlyCycles, coreLoopCycles,LayerSetupCycles,dmaPipeupCycles, dmaPipeDownCycles, PrefetchCycles,copyKerCoeffCycles,LayerDeinitCycles,LastBlockCycles, paddingTrigger,    paddingWait,LayerWithoutPad,LayerHandleCopy,   BackupCycles,  RestoreCycles,Multic7xContextCopyCycles,
         0,              0,              0,              0,              0,              0,                 0,              0,                 0,              0,              0,              0,              0,              0,              0,              0,              0,              0,
         1,              0,              0,              0,              0,              0,                 0,              0,                 0,              0,              0,              0,              0,              0,              0,              0,              0,              0,
         2,              0,              0,              0,              0,              0,                 0,              0,                 0,              0,              0,              0,              0,              0,              0,              0,              0,              0,
         3,              0,              0,              0,              0,              0,                 0,              0,                 0,              0,              0,              0,              0,              0,              0,              0,              0,              0,
         0,              0,              0,              0,              0,              0,                 0,              0,                 0,              0,              0,              0,              0,              0,              0,              0,              0,              0,
     Sum of Layer Cycles 0
    Sub Graph Stats 720.000000 60768.000000 309.000000
    *******  TIDL_subgraphRtInvoke done  ********
    
    **********  Frame Index 1 : Running fixed point mode for calibration **********
    In TIDL_runtimesPostProcessNet
    In TIDL_runtimesPostProcessNet 1
    In TIDL_runtimesPostProcessNet 2
    In TIDL_runtimesPostProcessNet 3
    Empty prototxt path, running calibration
    
    ~~~~~Running TIDL in PC emulation mode to collect Activations range for each layer~~~~~
    
    Processing config file #0 : /home/pratik/edgeai-tidl-tools/j721e/lucid-test/edgeai-tidl-tools/examples/osrt_python/advanced_examples/unit_tests_validation/model-artifacts/test/tempDir/subgraph_1_tidl_io_.qunat_stats_config.txt
     Freeing memory for user provided Net
     ----------------------- TIDL Process with REF_ONLY FLOW ------------------------
    
    #    0 . .. T      67.93  .... ..... ... .... .....
    
    
     *****************   Calibration iteration number 0 started ************************
    
    
    
    Empty prototxt path, running calibration
    
    ~~~~~Running TIDL in PC emulation mode to collect Activations range for each layer~~~~~
    
    Processing config file #0 : /home/pratik/edgeai-tidl-tools/j721e/lucid-test/edgeai-tidl-tools/examples/osrt_python/advanced_examples/unit_tests_validation/model-artifacts/test/tempDir/subgraph_1_tidl_io_.qunat_stats_config.txt
     Freeing memory for user provided Net
     ----------------------- TIDL Process with REF_ONLY FLOW ------------------------
    
    #    0 . .. T      51.05  .... ..... ... .... .....
    
    
     *****************   Calibration iteration number 0 completed ************************
    
    
    
    Empty prototxt path, running calibration
    
    ------------------ Network Compiler Traces -----------------------------
    successful Memory allocation
    successful Workload Creation
    ****************************************************
    **                ALL MODEL CHECK PASSED          **
    ****************************************************
    
    In TIDL_runtimesPostProcessNet 4
    Completed model -  test.onnx
    ************ in TIDL_subgraphRtDelete ************
     PREEMPTION: Removing priroty object with handle = 0x7e62af267000 and targetPriority = 2,      Number of obejcts left are = 1, removed object with base  = 0x7e62afc2c000 and size =128
    ************ in TIDL_subgraphRtDelete ************
     TIDL_deactivate is called with handle : ac598000
    PREEMPTION: Removing priroty object with handle = 0x7e62ac598000 and targetPriority = 2,      Number of obejcts left are = 0, removed object with base  = 0x7e62b1a29000 and size =128
    MEM: Deinit ... !!!
    MEM: Alloc's: 50 alloc's of 250351926 bytes
    MEM: Free's : 50 free's  of 250351926 bytes
    MEM: Open's : 0 allocs  of 0 bytes
    MEM: Deinit ... Done !!!
    

    Can you please double-check the compile options and the input you are passing to the ONNX graph?

    Thanks

  • Thank you very much for your test.

    I've copied test.onnx to .../models/public and added the following config to model_configs.py.

    'test' : {
        'model_path' : os.path.join(models_base_path, 'test.onnx'),
        'mean': [0, 0, 0],
        'scale' : [1, 1, 1],
        'num_images' : 1,
        'session_name' : 'onnxrt',
        'model_type': 'seg',
        'optional_options' : {
            "deny_list:layer_type": "MaxPool",
        }
    },
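A minimal sketch of how a model config's `optional_options` might be layered on top of the base delegate options before they are passed as provider options. The key names are taken from this thread; the paths are placeholders, the helper `build_delegate_options` is hypothetical, and the actual merge logic in the example scripts may differ.

```python
def build_delegate_options(base_options, model_config):
    """Return a new dict: base options overridden by optional_options."""
    merged = dict(base_options)  # copy so the base dict is not mutated
    merged.update(model_config.get("optional_options", {}))
    return merged


# Placeholder values; only the key names come from this thread.
base = {
    "tidl_tools_path": "/path/to/tidl_tools",
    "artifacts_folder": "/path/to/model-artifacts/test",
    "debug_level": 2,
}
test_config = {
    "session_name": "onnxrt",
    "model_type": "seg",
    "optional_options": {"deny_list:layer_type": "MaxPool"},
}

options = build_delegate_options(base, test_config)
print(options["deny_list:layer_type"])  # MaxPool
```

Printing the merged dict before creating the session makes it easy to compare against the option dump in the log posted earlier. That log shows `tidl_denylist = MaxPool` with `tidl_denylist_layer_type` empty, which may indicate that a different deny-list key was used in that run; the thread does not confirm this.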

    After running onnxrt_ep.py with the "-c" parameter, the same issue occurred.

    onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running ReorderInput node. Name:'ReorderInput' Status Message: /onnxruntime/onnxruntime/contrib_ops/cpu/nchwc_ops.cc:17 virtual onnxruntime::common::Status onnxruntime::contrib::ReorderInput::Compute(onnxruntime::OpKernelContext*) const X_rank == 4 was false.

    I used the default configs, so I am wondering if you could share your configs so that I can check what is wrong.

    Thank you very much.

  • Hi,

    I used the advanced example scripts inside the examples directory; can you please try the same? My compilation options are also visible in the log above; can you please confirm that yours match?

    Thanks 

  • Hi,

    I used the same parameters as yours and still cannot get through compilation. I have checked the READMEs and searched the forums for a solution, but still cannot find the cause.

    I would like to ask why, in my case, the n * c * h * w data is automatically converted to 1 * 1 * n * c * h * w. I found that in this README, the data in the network SVG images is in n * c * h * w format; however, in my model artifacts there are 2 extra dimensions. I am wondering whether any config option controls this.

    I found similar problems on the forum, but none has a clear solution. I think they are all caused by MaxPool layers in YOLOv8 that cannot be converted to TIDL.

    link1

    link2

  • Hi,

    Clarification on results:

    1) I used the model you shared (test.onnx); my edgeai-tidl-tools repo is checked out at tag 9.2.9.0 (TDA4VM/AM68PA).

    2) The model was compiled using advanced_examples, as mentioned above; the logs are available above.

    I suspect some environment-level differences; could you redo the experiment?

    Clarification on dims:

    "I want to ask that why in my case, the n * c * h * w data is converted to 1 * 1 * n * c * h * w automatically. I found that in this README, data in network svg images are in n * c * h * w format, however in my model artifacts, there are 2 extra dimensions. I am wondering if there are any configs that can control this?"

    Internally, TIDL works in a 6-dimensional flow (1x1xNCHW); this was added from SDK 9.0 onward to support transformer models.
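The relationship between the two layouts can be sketched on shape tuples alone: the 6-D form is the 4-D NCHW shape left-padded with singleton dimensions, so no data is reordered. The helper names `to_tidl_dims` and `from_tidl_dims` and the example shape are hypothetical; this only illustrates the bookkeeping and is not how the runtime handshake is implemented.

```python
def to_tidl_dims(shape, rank=6):
    """Left-pad a shape with 1s up to `rank` (e.g. NCHW -> 1x1xNCHW)."""
    if len(shape) > rank:
        raise ValueError(f"shape {shape} has more than {rank} dims")
    return (1,) * (rank - len(shape)) + tuple(shape)


def from_tidl_dims(shape, rank=4):
    """Drop leading singleton dims until `rank` dims remain."""
    shape = tuple(shape)
    extra = len(shape) - rank
    # Only leading dims equal to 1 can be dropped without losing data.
    if extra < 0 or any(d != 1 for d in shape[:extra]):
        raise ValueError(f"cannot reduce {shape} to rank {rank}")
    return shape[extra:]


print(to_tidl_dims((1, 3, 224, 224)))          # (1, 1, 1, 3, 224, 224)
print(from_tidl_dims((1, 1, 1, 3, 224, 224)))  # (1, 3, 224, 224)
```

A rank-4 check such as the `X_rank == 4` assertion in the error above would pass on the second result but not on the first.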