
AM62A7: edgeai_yolov5 model fails inference on AM62A

Part Number: AM62A7


Hello, I trained a model based on edgeai_yolov5, then set "export TIDL_RT_ONNX_VARDIM=1" and compiled the model with edgeai_benchmark. However, when I ran the compiled model on the AM62A board, an error occurred: kernel init for node 0, kernel com.ti.tidl:1:3 ... failed !!!. I have uploaded the error log and the model file I used, and I hope to get your reply. Thank you.

Attachment: file0918.zip

  • Hi Zhuangyh,

    We received your message. We will look at the data you provided and get back to you. Some of the experts on this matter are out this week, so please expect some delay in the response.

    Best regards,

    Qutaiba

  • Hi Zhuangyh,

    The error message was kernel init for node 0, kernel com.ti.tidl:1:3 ... failed !!!.

    And thanks for including the logs too -- this is helpful.

      1115.672579 s:  VX_ZONE_ERROR:[ownContextSendCmd:868] Command ack message returned failure cmd_status: -1
      1115.672622 s:  VX_ZONE_ERROR:[ownNodeKernelInit:584] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
      1115.672636 s:  VX_ZONE_ERROR:[ownNodeKernelInit:585] Please be sure the target callbacks have been registered for this core
      1115.672650 s:  VX_ZONE_ERROR:[ownNodeKernelInit:586] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
      1115.672666 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:583] kernel init for node 0, kernel com.ti.tidl:1:3 ... failed !!!
      1115.672688 s:  VX_ZONE_ERROR:[vxVerifyGraph:2055] Node kernel init failed
      1115.672700 s:  VX_ZONE_ERROR:[vxVerifyGraph:2109] Graph verify failed
      1115.672827 s:  VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:812] graph is not in a state required to be scheduled
      1115.672841 s:  VX_ZONE_ERROR:[vxProcessGraph:747] schedule graph failed
      1115.672853 s:  VX_ZONE_ERROR:[vxProcessGraph:752] wait graph failed
    ERROR: Running TIDL graph ... Failed !!!
    2024-09-18 02:22:31.509956502 [E:onnxruntime:, sequential_executor.cc:494 ExecuteKernel] Non-zero status code returned while running MaxPool node. Name:'MaxPool_72' Status Message: Unsupported pooling size.
    terminate called after throwing an instance of 'Ort::Exception'
      what():  Non-zero status code returned while running MaxPool node. Name:'MaxPool_72' Status Message: Unsupported pooling size.
    Aborted (core dumped)

    I first noticed the MaxPool that is throwing the error -- this is a 13x13 pooling kernel, which is larger than what the TIDL runtime supports. We usually work around this by implementing large kernels as multiple cascaded smaller kernels (e.g., one 5x5 becomes two 3x3's in sequence); this is functionally identical. You can try the tidl_onnx_model_optimizer tool to apply model update rules, namely tidl_convert_maxpool_to_cascaded_maxpool. A sketch of the equivalence follows.
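    To make the cascade equivalence concrete, here is a small self-contained numpy sketch (an illustration, not TI code; the choice of three 5x5 pools for a 13x13 is my example, and the actual optimizer rule may decompose the kernel differently). It checks that one 13x13 stride-1, same-padded max pool produces exactly the same output as three cascaded 5x5 stride-1 pools:

      import numpy as np

      def maxpool2d(x, k):
          """Naive stride-1, 'same'-padded 2D max pool over an HxW array."""
          p = k // 2
          xp = np.pad(x, p, mode="constant", constant_values=-np.inf)
          out = np.empty_like(x)
          for i in range(x.shape[0]):
              for j in range(x.shape[1]):
                  out[i, j] = xp[i:i + k, j:j + k].max()
          return out

      x = np.random.rand(20, 20).astype(np.float32)
      single = maxpool2d(x, 13)                              # the size TIDL rejects
      cascade = maxpool2d(maxpool2d(maxpool2d(x, 5), 5), 5)  # cascaded, supported sizes
      assert np.allclose(single, cascade)

    Because max is associative, taking the max over three nested 5x5 neighborhoods covers the same 13x13 window (5 + 4 + 4 = 13), so the outputs match bit-for-bit.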

    I'm not certain this is the cause of the entire issue, though. There are error messages preceding this ONNX-level one, and there is not enough information in your log to be certain. Could you please rerun with the following configuration:

    • echo $EDGEAI_VERSION     ### I need to know your SDK version
    • export TIDL_RT_DEBUG=1    ### enable verbose TIDL runtime logging
    • /opt/vx_app_arm_remote_log.out &  ### run in the background for more informative messages from the remote cores
    • We need to pass an extra option to the onnxruntime. Since you are using edgeai-gst-apps, this requires a small code edit (see the sketch after this list).
      • In /usr/lib/python3.12/site-packages/edgeai_dl_inferer.py on the target filesystem, find the 'runtime_options' provided as an argument to the onnxruntime.InferenceSession function call.
      • Add 'debug_level': 2 to the runtime_options dictionary. This will impact performance; reduce it to 1 for a smaller performance hit.
        • This is necessary for more verbose OpenVX / TIOVX level error messages.
    • Run app_edgeai.py with the command line arguments "-n" and "-v".
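    For the debug_level edit, the change might look roughly like this (an illustrative sketch only -- the exact variable names and surrounding code in edgeai_dl_inferer.py vary by SDK version, and the paths below are placeholders):

      import onnxruntime

      # Placeholder options dict -- in edgeai_dl_inferer.py this dictionary
      # already exists and carries the TIDL delegate settings; only the
      # 'debug_level' entry is new.
      runtime_options = {
          'artifacts_folder': '/opt/model_zoo/my_model/artifacts',  # placeholder path
          'debug_level': 2,  # the added line: 2 = most verbose, 1 = lighter overhead
      }

      # The session is created along these lines; the added key reaches the
      # TIDL execution provider through provider_options.
      sess = onnxruntime.InferenceSession(
          '/opt/model_zoo/my_model/model.onnx',  # placeholder path
          providers=['TIDLExecutionProvider', 'CPUExecutionProvider'],
          provider_options=[runtime_options, {}],
      )

    With debug_level set and the remote log running, a rerun should capture the TIOVX-level messages behind the kernel init failure.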

    Could you then respond with the new log, assuming the issue isn't resolved by the maxpool optimization rule I noted above?


    BR,
    Reese