TDA4VH-Q1: Fatal Error at Memory Planning during compilation

Part Number: TDA4VH-Q1

Hello,

Following up on my previous post (which I hope the "Ask a related question" button has linked here), I have updated the flattening process to include the batch dimension inside the flattened spatial dimensions. The quantization process now produces output detections, and the projected features turned out to be non-zero. But...

Now the compilation process fails at the Memory Planning stage without giving any hint on how to solve it. Can someone help me understand what this error is and how to fix it?

For the record: this is TIDL version 10.00.04, and 8-bit and 16-bit quantization give the same error.
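
For context, the flattening update mentioned above is along these lines (a simplified sketch with illustrative shapes; the actual model differs):

    import torch

    # Sketch: fold the batch dimension into the flattened spatial dimension,
    # so downstream layers see a single N*H*W axis instead of a separate batch.
    x = torch.randn(2, 64, 16, 32)                     # [N, C, H, W]
    tokens = x.permute(0, 2, 3, 1).reshape(1, -1, 64)  # [1, N*H*W, C]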

------------------ Fixed-point Calibration Iteration [2 / 2]: ------------------


==================== [Quantization & Calibration Completed] ====================

========================== [Memory Planning Started] ==========================


------------------------- Network Compiler Traces ------------------------------
Successful Memory Allocation
Floating point exception (core dumped)
[TIDL Import] FATAL ERROR: Network Compiler failed to execute - Memory planning failed with return code - 34816 -- [tidl_import_core.cpp, 1002]
[TIDL Import] Aborting

Process finished with exit code 1

  • Hello Cem,

    Due to a holiday in India, responses will be delayed by a day or two.

    Josue

  • This error seems to have been caused by the slice-add I used to replace torch.sum(). torch.sum() was creating a ReduceSum node in the ONNX graph, which is not supported by TIDL, causing TIDL to split the network into two parts and exclude ReduceSum from the TIDL-deployable layers.
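
    For context, the slice-add was along these lines (a simplified sketch, assuming the sum ran over the first dimension):

    import torch

    # Slice-add replacement for torch.sum: add the slices explicitly so that
    # no ReduceSum node appears in the exported ONNX graph.
    feat = torch.randn(4, 8, 16, 32)
    out = feat[0] + feat[1] + feat[2] + feat[3]
    assert torch.allclose(out, feat.sum(dim=0))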

    I have since replaced the summation with a convolutional layer, and it now seems to work on a single core.

    My ultimate goal is to deploy onto multiple cores, so I have set num_cores = 2 and inference_mode = 2 to compile the network for multi-core usage. Now I am getting the error below. I have no idea which layers the LayerNum values refer to, or which layer it is that does not support Workload and Batch processing with more than one batch.

    Any help would be appreciated. Fixing errors by completely blind trial and error is getting frustrating, and it is a real drain on our workforce!

    ------------------ Fixed-point Calibration Iteration [1 / 1]: ------------------


    ==================== [Quantization & Calibration Completed] ====================

    ========================== [Memory Planning Started] ==========================


    ------------------------- Network Compiler Traces ------------------------------
    ERROR : [file:src/netanalysis.c, func:createBatchGroupInfoPostLayerExecutionDecision, line:3661] LayerNum=511643664 doesnt support Workload and Batch Processing and has more than 1 batch

    ========================= [Memory Planning Completed] =========================

    [TIDL Import] ERROR: Could not open /home/ct22/remote-pycharm/edgeai_tidl_tools/model-artifacts/ipm_mslr_ph_15x/tempDir/subgraph_0_tidl_net/perfSimInfo.bin -- [tidl_import_core.cpp, 1034]
    Rerunning network compiler...
    ========================== [Memory Planning Started] ==========================


    ------------------------- Network Compiler Traces ------------------------------
    ERROR : [file:src/netanalysis.c, func:createBatchGroupInfoPostLayerExecutionDecision, line:3661] LayerNum=884035600 doesnt support Workload and Batch Processing and has more than 1 batch

    ========================= [Memory Planning Completed] =========================

    [TIDL Import] ERROR: Could not open /home/ct22/remote-pycharm/edgeai_tidl_tools/model-artifacts/ipm_mslr_ph_15x/tempDir/subgraph_0_tidl_net/perfSimInfo.bin -- [tidl_import_core.cpp, 1034]
    [TIDL Import] [PARSER] WARNING:
    ********************************************************************
    * Network compiler returned with error or didn't executed *
    * This model can only be used on PC/Host emulation mode *
    * It is not expected to work on target/EVM *
    ********************************************************************

  • Hi,

    torch.sum() was creating a ReduceSum node in the ONNX graph, which is not supported by TIDL, causing TIDL to split the network into two parts and exclude ReduceSum from the TIDL-deployable layers.

    Yes, ReduceSum is not supported by TIDL; however, you can replace the ReduceSum layer with a MatMul.

    For example, let's say you want to do a ReduceSum along the height dimension: take a row vector of ones and multiply it with the input feature map.
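
    A minimal sketch of this idea (assuming an [N, C, H, W] feature map and a sum over H):

    import torch

    # A (1 x H) row of ones, matmul'd against the (H x W) spatial slices,
    # sums over H, i.e. it matches torch.sum(x, dim=2, keepdim=True) but
    # exports to ONNX as MatMul instead of ReduceSum.
    x = torch.randn(1, 8, 16, 32)       # [N, C, H, W]
    ones = torch.ones(1, x.shape[2])    # [1, H]
    summed = torch.matmul(ones, x)      # -> [N, C, 1, W]
    assert torch.allclose(summed, x.sum(dim=2, keepdim=True), atol=1e-5)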

  • Hello,

    I tried your suggestion. Memory Planning gave a strange error, and when I try to deploy on the TI chip I get many errors as well. To repeat: these errors occur when I try to deploy the model on multiple cores, and that is the part I need help with.

    Compilation error:

    ==================== [Quantization & Calibration Completed] ====================

    ========================== [Memory Planning Started] ==========================


    ------------------------- Network Compiler Traces ------------------------------
    Successful Memory Allocation
    Successful Workload Creation

    ========================= [Memory Planning Completed] =========================

    Rerunning network compiler...
    ========================== [Memory Planning Started] ==========================


    ------------------------- Network Compiler Traces ------------------------------
    ERROR : [file:src/dataflow_iter.c, func:degradeIterParams, line:1753] Free Space in L2 is negative

    ========================= [Memory Planning Completed] =========================

    Segmentation fault (core dumped)
    [TIDL Import] WARNING: System command failed with return code : 35584. Skipping Graph Visualization.
    Segmentation fault (core dumped)
    [TIDL Import] WARNING: System command failed with return code : 35584. Skipping Graph Visualization.
    ======================== Subgraph Compiled Successfully ========================

    TI Errors:

    602020.719757 s:  VX_ZONE_INIT:Enabled
    602020.719770 s:  VX_ZONE_ERROR:Enabled
    602020.719781 s:  VX_ZONE_WARNING:Enabled
    602020.720423 s:  VX_ZONE_INIT:[tivxPlatformCreateTargetId:124] Added target MPU-0
    602020.720519 s:  VX_ZONE_INIT:[tivxPlatformCreateTargetId:124] Added target MPU-1
    602020.720641 s:  VX_ZONE_INIT:[tivxPlatformCreateTargetId:124] Added target MPU-2
    602020.720733 s:  VX_ZONE_INIT:[tivxPlatformCreateTargetId:124] Added target MPU-3
    602020.720747 s:  VX_ZONE_INIT:[tivxInitLocal:136] Initialization Done !!!
    602020.721667 s:  VX_ZONE_INIT:[tivxHostInitLocal:106] Initialization Done for HOST !!!
    602020.892709 s:  VX_ZONE_ERROR:[ownContextSendCmd:885] Command ack message returned failure cmd_status: -1
    602020.892742 s:  VX_ZONE_ERROR:[ownNodeKernelInit:592] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
    602020.892754 s:  VX_ZONE_ERROR:[ownNodeKernelInit:593] Please be sure the target callbacks have been registered for this core
    602020.892765 s:  VX_ZONE_ERROR:[ownNodeKernelInit:594] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
    602020.892777 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:620] kernel init for node 0, kernel com.ti.tidl:2:8 ... failed !!!
    602020.892804 s:  VX_ZONE_ERROR:[vxVerifyGraph:2254] Node kernel init failed
    602020.892815 s:  VX_ZONE_ERROR:[vxVerifyGraph:2311] Graph verify failed
    TIDL_RT_OVX: ERROR: Verifying TIDL graph ... Failed !!!
    TIDL_RT_OVX: ERROR: Verify OpenVX graph failed
    RT-Profile: TIDLRT_init_profiling
    tidlrt_create            :      217395810 ns,
    tidl_rt_ovx_Init         :       41637163 ns,
    vxCreateContext          :         832789 ns,
    init_tidl_tiovx          :       31998840 ns,
    create_graph_tidl_tiovx  :        9906286 ns,
    verify_graph_tidl_tiovx  :      128450963 ns,
    tivxTIDLLoadKernels      :          19601 ns,
    mapConfig                :        1079962 ns,
    tivxAddKernelTIDL        :          71957 ns,
    mapNetwork               :       30313995 ns,
    setCreateParams          :         233727 ns,
    setArgs                  :         277093 ns,
    vxCreateUserDataObject   :          28001 ns,
    vxMapUserDataObject      :       19889854 ns,
    memcopy_network_buffer   :       10387180 ns,
    vxUnmapUserDataObject    :           5880 ns,
    ************ TIDL_subgraphRtCreate done ************
    TIDL Runtime created successfully
    execution starts...
    *******   In TIDL_subgraphRtInvoke  ********
    602021.021250 s:  VX_ZONE_ERROR:[ownContextSendCmd:885] Command ack message returned failure cmd_status: -1
    602021.021288 s:  VX_ZONE_ERROR:[ownNodeKernelInit:592] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
    602021.021302 s:  VX_ZONE_ERROR:[ownNodeKernelInit:593] Please be sure the target callbacks have been registered for this core
    602021.021312 s:  VX_ZONE_ERROR:[ownNodeKernelInit:594] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
    602021.021328 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:620] kernel init for node 0, kernel com.ti.tidl:2:8 ... failed !!!
    602021.021345 s:  VX_ZONE_ERROR:[vxVerifyGraph:2254] Node kernel init failed
    602021.021355 s:  VX_ZONE_ERROR:[vxVerifyGraph:2311] Graph verify failed
    602021.021412 s:  VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:919] graph is not in a state required to be scheduled
    602021.021423 s:  VX_ZONE_ERROR:[vxProcessGraph:844] schedule graph failed
    602021.021430 s:  VX_ZONE_ERROR:[vxProcessGraph:849] wait graph failed
    ERROR: Running TIDL graph ... Failed !!!
    Sub Graph Stats 3785.000000 21782.000000 17797103087559424.000000
    *******  TIDL_subgraphRtInvoke done  ********
    2024-10-04 08:04:56.398716326 [E:onnxruntime:, sequential_executor.cc:494 ExecuteKernel] Non-zero status code returned while running TIDL_0 node. Name:'TIDLExecutionProvider_TIDL_0_0' Status Message: TIDL Compute Invoke Failed.
    Traceback (most recent call last):
      File "/root/***
        tidl_infer.run()
      File "/root/***
        output = list(self.sess.run(None, {input_name: input_data,
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib/python3.12/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 200, in run
        return self._sess.run(output_names, input_feed, run_options)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running TIDL_0 node. Name:'TIDLExecutionProvider_TIDL_0_0' Status Message: TIDL Compute Invoke Failed.
    ************ in TIDL_subgraphRtDelete ************
    602021.680441 s:  VX_ZONE_INIT:[tivxHostDeInitLocal:120] De-Initialization Done for HOST !!!
    602021.684990 s:  VX_ZONE_INIT:[tivxDeInitLocal:206] De-Initialization Done !!!
    APP: Deinit ... !!!
    REMOTE_SERVICE: Deinit ... !!!
    REMOTE_SERVICE: Deinit ... Done !!!
    602021.686021 s: IPC: Deinit ... !!!
    602021.686999 s: IPC: DeInit ... Done !!!
    602021.687036 s: MEM: Deinit ... !!!
    602021.687049 s: DDR_SHARED_MEM: Alloc's: 16 alloc's of 73530804 bytes
    602021.687058 s: DDR_SHARED_MEM: Free's : 16 free's  of 73530804 bytes
    602021.687065 s: DDR_SHARED_MEM: Open's : 0 allocs  of 0 bytes
    602021.687075 s: MEM: Deinit ... Done !!!
    APP: Deinit ... Done !!!

  • Additionally, using MatMul with a ones vector, the model doesn't even work on a single core; it gives the same errors as above.

    Whereas with a convolution with unit weights, zero bias, and gradients disabled, single-core compilation is successful and the model works on the TI chip:

    ========================= [Memory Planning Completed] =========================

    ======================== Subgraph Compiled Successfully ========================

    So, MatMul-based summation doesn't work on single core or multi core.

    Convolution-based summation works on single core but not on multi core.

    Slice+Add (feat[0] + feat[1]...) works on single core but not on multi core.
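
    For reference, the convolution-based summation that works on a single core is along these lines (a simplified sketch, assuming the sum runs over K feature maps stacked in the channel dimension):

    import torch
    import torch.nn as nn

    # 1x1 convolution with fixed unit weights and no bias: the output equals
    # the channel-wise sum, i.e. torch.sum(x, dim=1, keepdim=True), but it
    # exports to ONNX as Conv instead of ReduceSum.
    K = 4
    conv_sum = nn.Conv2d(K, 1, kernel_size=1, bias=False)
    with torch.no_grad():
        conv_sum.weight.fill_(1.0)
    conv_sum.weight.requires_grad_(False)   # freeze so training cannot alter it

    x = torch.randn(1, K, 16, 32)
    assert torch.allclose(conv_sum(x), x.sum(dim=1, keepdim=True), atol=1e-5)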

  • Hi Cem,

    I apologize for the delay in getting back to you. 

    To summarize your progress and the issues you are seeing:

    I tried your suggestion. Memory Planning gave a strange error, and when I try to deploy on the TI chip I get many errors as well. To repeat: these errors occur when I try to deploy the model on multiple cores, and that is the part I need help with.

    After taking Pratik's suggestion, you are able to run single-core inference, but you are seeing issues with multi-core.

    Additionally, using MatMul with a ones vector, the model doesn't even work on a single core; it gives the same errors as above.

    Whereas with a convolution with unit weights, zero bias, and gradients disabled, single-core compilation is successful and the model works on the TI chip:

    ========================= [Memory Planning Completed] =========================

    ======================== Subgraph Compiled Successfully ========================

    So, MatMul-based summation doesn't work on single core or multi core.

    Convolution-based summation works on single core but not on multi core.

    Slice+Add (feat[0] + feat[1]...) works on single core but not on multi core.

    Based on your latest response, you were able to get some ReduceSum workarounds running on single core, but not on multi-core.

    Looking at your related thread, you have a multi-camera (2 cameras = 2 batches) application that you want to run on 2 DSPs; in that case, are you looking to do parallel batch processing?

    My ultimate goal is to deploy onto multiple cores, so I have set num_cores = 2 and inference_mode = 2 to compile the network for multi-core usage.

    Inference mode = 2 corresponds to TIDL_inferenceModeLowLatency. Based on some of your description, are you trying to get the same functionality as TIDL_inferenceModeHighThroughput instead (definitions of both these modes here)? This would correspond to inference_mode = 1.
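
    For reference, in the edgeai-tidl-tools OSRT flow these settings are passed through the compile options, roughly like this (a sketch; the option keys are taken from the public examples and may vary between SDK versions):

    import onnxruntime as rt

    compile_options = {
        "tidl_tools_path": "/path/to/tidl_tools",         # placeholder paths
        "artifacts_folder": "/path/to/model-artifacts",
        "advanced_options:num_cores": 2,                  # number of C7x cores
        "advanced_options:inference_mode": 1,             # 1 = high throughput,
                                                          # 2 = low latency
    }
    sess = rt.InferenceSession(
        "model.onnx",
        providers=["TIDLCompilationProvider", "CPUExecutionProvider"],
        provider_options=[compile_options, {}],
    )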

    Best,

    Asha

  • Hello Asha,

    Looking at your related thread, you have a multi-camera (2 cameras = 2 batches) application that you want to run on 2 DSPs; in that case, are you looking to do parallel batch processing?

    Our main goal is to use multi-core processing for multiple cameras. Due to the nature of Conv2d in the torch library, each camera has to be placed in the batch dimension (Conv2d accepts 4 dimensions: batch, channels, height, and width). We are open to suggestions if there are other ways to implement multi-camera, multi-DSP processing without using the batch dimension, because adding data to and removing data from the batch dimension is also problematic with TIDL.
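
    To illustrate what I mean (shapes are just examples):

    import torch

    # Two camera frames stacked along the batch dimension so that a single
    # Conv2d pass covers both cameras.
    cam0 = torch.randn(3, 512, 960)            # [C, H, W]
    cam1 = torch.randn(3, 512, 960)
    frames = torch.stack([cam0, cam1], dim=0)  # [2, 3, 512, 960]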

    Inference mode = 2 corresponds to TIDL_inferenceModeLowLatency. Based on some of your description, are you trying to get the same functionality as TIDL_inferenceModeHighThroughput instead (definitions of both these modes here)? This would correspond to inference_mode = 1.

    I cannot use inference_mode = 1; it throws this error:

    ==================== [Optimization for subgraph_0 Started] ====================

    [TIDL Import] [PARSER] WARNING: Requested input data convert layer is not added to the network, It is currently not optimal
    [TIDL Import] ERROR: High Throughtput Inference Mode is not supported when partial batch is detected in graph -- [tidl_import_core.cpp, 2836]
    [TIDL Import] ERROR: Network Optimization failed - Failed in function: TIDL_runtimesOptimizeNet -- [tidl_runtimes_import_common.cpp, 1232]
    [TIDL Import] [PARSER] ERROR: - Failed in function: TIDL_computeImportFunc -- [tidl_onnxRtImport_EP.cpp, 1449]