
AM62A7: edgeai_yolov5 model fails inference on AM62A

Part Number: AM62A7


Hello, I trained a model based on edgeai_yolov5, then set "export TIDL_RT_ONNX_VARDIM=1" and compiled the model with edgeai_benchmark. However, when I ran the compiled model on the AM62A board, an error occurred: kernel init for node 0, kernel com.ti.tidl:1:3 ... failed !!!. I have uploaded the error log and the model file I used, and I hope to get your reply. Thank you.

Attachment: file0918.zip

  • Hi Zhuangyh,

    We received your message. We will look at the data you provided and get back to you. Some of the experts on this matter are out this week, so please expect some delay in the response.

    Best regards,

    Qutaiba

  • Hi Zhuangyh,

    The error message was kernel init for node 0, kernel com.ti.tidl:1:3 ... failed !!!.

    And thanks for including the logs too -- this is helpful.

      1115.672579 s:  VX_ZONE_ERROR:[ownContextSendCmd:868] Command ack message returned failure cmd_status: -1
      1115.672622 s:  VX_ZONE_ERROR:[ownNodeKernelInit:584] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
      1115.672636 s:  VX_ZONE_ERROR:[ownNodeKernelInit:585] Please be sure the target callbacks have been registered for this core
      1115.672650 s:  VX_ZONE_ERROR:[ownNodeKernelInit:586] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
      1115.672666 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:583] kernel init for node 0, kernel com.ti.tidl:1:3 ... failed !!!
      1115.672688 s:  VX_ZONE_ERROR:[vxVerifyGraph:2055] Node kernel init failed
      1115.672700 s:  VX_ZONE_ERROR:[vxVerifyGraph:2109] Graph verify failed
      1115.672827 s:  VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:812] graph is not in a state required to be scheduled
      1115.672841 s:  VX_ZONE_ERROR:[vxProcessGraph:747] schedule graph failed
      1115.672853 s:  VX_ZONE_ERROR:[vxProcessGraph:752] wait graph failed
    ERROR: Running TIDL graph ... Failed !!!
    2024-09-18 02:22:31.509956502 [E:onnxruntime:, sequential_executor.cc:494 ExecuteKernel] Non-zero status code returned while running MaxPool node. Name:'MaxPool_72' Status Message: Unsupported pooling size.
    terminate called after throwing an instance of 'Ort::Exception'
      what():  Non-zero status code returned while running MaxPool node. Name:'MaxPool_72' Status Message: Unsupported pooling size.
    Aborted (core dumped)

    I first noticed the MaxPool that is throwing the error -- this is a 13x13 pooling kernel, which is larger than what the TIDL runtime supports. We usually work around this by implementing large kernels as multiple cascaded smaller kernels (e.g., one 5x5 becomes two 3x3's in sequence); this is functionally identical. You can try the tidl_onnx_model_optimizer tool to apply model update rules, namely tidl_convert_maxpool_to_cascaded_maxpool. A sketch of the equivalence follows.
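    To make the cascade equivalence concrete, here is a small self-contained numpy sketch (an illustration, not TI code; the choice of three 5x5 pools for a 13x13 is my example, and the actual optimizer rule may decompose the kernel differently). It checks that one 13x13 stride-1, same-padded max pool produces exactly the same output as three cascaded 5x5 stride-1 pools:

      import numpy as np

      def maxpool2d(x, k):
          """Naive stride-1, 'same'-padded 2D max pool over an HxW array."""
          p = k // 2
          xp = np.pad(x, p, mode="constant", constant_values=-np.inf)
          out = np.empty_like(x)
          for i in range(x.shape[0]):
              for j in range(x.shape[1]):
                  out[i, j] = xp[i:i + k, j:j + k].max()
          return out

      x = np.random.rand(20, 20).astype(np.float32)
      single = maxpool2d(x, 13)                              # the size TIDL rejects
      cascade = maxpool2d(maxpool2d(maxpool2d(x, 5), 5), 5)  # cascaded, supported sizes
      assert np.allclose(single, cascade)

    Because max is associative, taking the max over three nested 5x5 neighborhoods covers the same 13x13 window (5 + 4 + 4 = 13), so the outputs match bit-for-bit.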

    I'm not certain this is the cause of the entire issue, though. There are error messages preceding this ONNX-level one, and there is not enough information in your log to be certain. Could you please rerun with the following configuration:

    • echo $EDGEAI_VERSION     ### I need to know your SDK version
    • export TIDL_RT_DEBUG=1    ### enable verbose TIDL runtime logging
    • /opt/vx_app_arm_remote_log.out &  ### run in the background for more informative messages from the remote cores
    • We need to pass an extra option to the onnxruntime. Since you are using edgeai-gst-apps, this requires a small code edit (see the sketch after this list).
      • In /usr/lib/python3.12/site-packages/edgeai_dl_inferer.py on the target filesystem, find the 'runtime_options' provided as an argument to the onnxruntime.InferenceSession function call.
      • Add 'debug_level': 2 to the runtime_options dictionary. This will impact performance; reduce it to 1 for a smaller performance hit.
        • This is necessary for more verbose OpenVX / TIOVX level error messages.
    • Run app_edgeai.py with the command line arguments "-n" and "-v".
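    For the debug_level edit, the change might look roughly like this (an illustrative sketch only -- the exact variable names and surrounding code in edgeai_dl_inferer.py vary by SDK version, and the paths below are placeholders):

      import onnxruntime

      # Placeholder options dict -- in edgeai_dl_inferer.py this dictionary
      # already exists and carries the TIDL delegate settings; only the
      # 'debug_level' entry is new.
      runtime_options = {
          'artifacts_folder': '/opt/model_zoo/my_model/artifacts',  # placeholder path
          'debug_level': 2,  # the added line: 2 = most verbose, 1 = lighter overhead
      }

      # The session is created along these lines; the added key reaches the
      # TIDL execution provider through provider_options.
      sess = onnxruntime.InferenceSession(
          '/opt/model_zoo/my_model/model.onnx',  # placeholder path
          providers=['TIDLExecutionProvider', 'CPUExecutionProvider'],
          provider_options=[runtime_options, {}],
      )

    With debug_level set and the remote log running, a rerun should capture the TIOVX-level messages behind the kernel init failure.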

    Could you then respond with the new log, assuming the issue isn't resolved by the maxpool optimization rule I noted above?


    BR,
    Reese