Hello,
I am trying to run a YOLOX model on the AM68PA board using ONNX Runtime for object detection in C++. The model I compiled is the pretrained yolox_nano_ti_lite_26p1_41p8.onnx, downloaded from https://github.com/TexasInstruments/edgeai-tensorlab/tree/main/edgeai-yolox/pretrained_models
I successfully compiled the model with EdgeAI-TIDL-Tools (version 9.2.6) on an x86 machine and generated the model artifacts with the onnxrt_ep.py script. The configuration I used for this model is:
{
    'model_path' : os.path.join(models_base_path, 'yolox_nano_ti_lite_26p1_41p8.onnx'),
    'mean': [0, 0, 0],
    'scale' : [1, 1, 1],
    'num_images' : 3,
    'num_classes': 80,
    'model_type': 'od',
    'od_type' : 'YoloV5',
    'framework' : '',
    'session_name' : 'onnxrt',
    'meta_layers_names_list' : os.path.join(models_base_path, 'yolox_nano_ti_lite_metaarch.prototxt'),
    'meta_arch_type' : 6
}
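For context, the compilation itself was done the standard way through the TIDL compilation provider in ONNX Runtime. Below is a minimal sketch of what onnxrt_ep.py does with this configuration, as I understand it; the delegate option names are taken from the edgeai-tidl-tools Python examples and may differ slightly in other versions, and the paths are placeholders for my setup:

import os
import onnxruntime as rt

# Minimal sketch of the compilation step (option names as I understand them
# from the edgeai-tidl-tools Python examples; adjust to your version/layout).
delegate_options = {
    "tidl_tools_path": os.environ.get("TIDL_TOOLS_PATH", ""),    # x86 TIDL tools
    "artifacts_folder": "model-artifacts/yolox_nano_ti_lite/",   # where artifacts are written
    "platform": "J7",
    "tensor_bits": 8,
    "advanced_options:calibration_frames": 3,                    # matches 'num_images'
    # Object-detection meta architecture, matching the config above:
    "object_detection:meta_layers_names_list": "yolox_nano_ti_lite_metaarch.prototxt",
    "object_detection:meta_arch_type": 6,
}

so = rt.SessionOptions()
sess = rt.InferenceSession(
    "yolox_nano_ti_lite_26p1_41p8.onnx",
    providers=["TIDLCompilationProvider", "CPUExecutionProvider"],
    provider_options=[delegate_options, {}],
    sess_options=so,
)
# Running the calibration images through sess.run(...) then generates the artifacts.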
I executed the compiled model on the AM68PA board using the example code in https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/examples/osrt_cpp/ort/onnx_main.cpp. While the execution completes, I do not receive any detection results. The bounding box output also seems incorrect. Here is an example of the inference output on a sample image:
user@tda4vm-sk:~/edgeai-tidl-tools$ ./bin/Release/ort_main -f "model-artifacts/yolox_nano_ti_lite/" -v 0 -i "test_data/ADE_val_00001801.jpg" -c 1
***** Display run Config: start *****
verbose level set to: 0
accelerated mode set to: 1
device mem set to: 1
loop count set to: 1
model path set to:
model artifacts path set to: model-artifacts/yolox_nano_ti_lite/
image path set to: test_data/ADE_val_00001801.jpg
device_type set to: cpu
labels path set to: test_data/labels.txt
num of threads set to: 4
num of results set to: 5
num of warmup runs set to: 2
***** Display run Config: end *****
[07:28:41.000.000000]:INFO:[runInference:0316] accelerated mode
[07:28:41.000.000076]:INFO:[runInference:0319] artifacts: model-artifacts/yolox_nano_ti_lite/
libtidl_onnxrt_EP loaded 0x3e1df930
Final number of subgraphs created are : 1, - Offloaded Nodes - 272, Total Nodes - 272
APP: Init ... !!!
MEM: Init ... !!!
MEM: Initialized DMA HEAP (fd=5) !!!
MEM: Init ... Done !!!
IPC: Init ... !!!
IPC: Init ... Done !!!
REMOTE_SERVICE: Init ... !!!
REMOTE_SERVICE: Init ... Done !!!
62759.329229 s: GTC Frequency = 200 MHz
APP: Init ... Done !!!
62759.330526 s: VX_ZONE_INIT:Enabled
62759.331575 s: VX_ZONE_ERROR:Enabled
62759.331594 s: VX_ZONE_WARNING:Enabled
62759.332342 s: VX_ZONE_INIT:[tivxPlatformCreateTargetId:116] Added target MPU-0
62759.332452 s: VX_ZONE_INIT:[tivxPlatformCreateTargetId:116] Added target MPU-1
62759.332549 s: VX_ZONE_INIT:[tivxPlatformCreateTargetId:116] Added target MPU-2
62759.332653 s: VX_ZONE_INIT:[tivxPlatformCreateTargetId:116] Added target MPU-3
62759.332668 s: VX_ZONE_INIT:[tivxInitLocal:136] Initialization Done !!!
62759.333198 s: VX_ZONE_INIT:[tivxHostInitLocal:101] Initialization Done for HOST !!!
[07:28:42.458.458114]:INFO:[runInference:0342] Loaded model model-artifacts/yolox_nano_ti_lite//yolox_nano_ti_lite_26p1_41p8.onnx
[07:28:42.458.458199]:INFO:[printTensorInfo:0216] number of inputs:1
[07:28:42.458.458222]:INFO:[printTensorInfo:0217] number of outputs: 1
[07:28:42.458.458245]:INFO:[printTensorInfo:0218] input(0) name: images
[07:28:42.458.458263]:INFO:[printTensorInfo:0223] Input 0 : name=images
[07:28:42.458.458281]:INFO:[printTensorInfo:0230] Input 0 : type=1
[07:28:42.458.458297]:INFO:[printTensorInfo:0233] Input 0 : num_dims=4
[07:28:42.458.458314]:INFO:[printTensorInfo:0236] Input 0 : dim 0=1
[07:28:42.458.458329]:INFO:[printTensorInfo:0236] Input 0 : dim 1=3
[07:28:42.458.458343]:INFO:[printTensorInfo:0236] Input 0 : dim 2=416
[07:28:42.458.458358]:INFO:[printTensorInfo:0236] Input 0 : dim 3=416
[07:28:42.552.552204]:INFO:[preprocImage:0127] template NCHW
[07:28:42.564.564517]:INFO:[runInference:0474] invoked
[07:28:42.564.564592]:INFO:[runInference:0475] average time: 4.066000 ms
[07:28:42.564.564789]:INFO:[prepDetectionResult:0453] preparing detection result
[07:28:42.565.565004]:INFO:[prepDetectionResult:0518] box with score:1.000000, threshold set to:0.300000
[07:28:42.565.565049]:INFO:[prepDetectionResult:0518] box with score:2.000000, threshold set to:0.300000
[07:28:42.565.565076]:INFO:[prepDetectionResult:0518] box with score:2.000000, threshold set to:0.300000
[07:28:42.565.565100]:INFO:[prepDetectionResult:0518] box with score:2.000000, threshold set to:0.300000
[07:28:42.565.565125]:INFO:[prepDetectionResult:0518] box with score:7.000000, threshold set to:0.300000
[07:28:42.565.565150]:INFO:[prepDetectionResult:0518] box with score:7.000000, threshold set to:0.300000
[07:28:42.565.565175]:INFO:[prepDetectionResult:0518] box with score:9.000000, threshold set to:0.300000
[07:28:42.565.565200]:INFO:[prepDetectionResult:0518] box with score:58.000000, threshold set to:0.300000
[07:28:42.570.570924]:INFO:[runInference:0607]
Completed_Model : 0, Name : yolox_nano_ti_lite, Total time : 4.066000, Offload Time : 0 , DDR RW MBs : 0, Output File : cpp_out_yolox_nano_ti_lite.jpg
62759.515855 s: VX_ZONE_INIT:[tivxHostDeInitLocal:115] De-Initialization Done for HOST !!!
62759.517251 s: VX_ZONE_INIT:[tivxDeInitLocal:204] De-Initialization Done !!!
APP: Deinit ... !!!
REMOTE_SERVICE: Deinit ... !!!
REMOTE_SERVICE: Deinit ... Done !!!
IPC: Deinit ... !!!
IPC: DeInit ... Done !!!
MEM: Deinit ... !!!
DDR_SHARED_MEM: Alloc's: 10 alloc's of 8512640 bytes
DDR_SHARED_MEM: Free's : 10 free's of 8512640 bytes
DDR_SHARED_MEM: Open's : 0 allocs of 0 bytes
MEM: Deinit ... Done !!!
APP: Deinit ... Done !!!
[07:28:42.596.596231]:DEBUG:[~ModelInfo:0407] DESTRUCTOR
However, execution of the pretrained ONNX model on an x86 machine using https://github.com/TexasInstruments/edgeai-yolox/blob/main/demo/ONNXRuntime/onnx_inference.py works fine and results in one correct detection.
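For completeness, that x86 check with the original (uncompiled) model is essentially plain ONNX Runtime on the CPU. A simplified sketch of what I ran is below; note the actual demo script also letterboxes/pads the input, whereas this only resizes to 416x416 and skips mean/scale, matching the compile configuration above:

import cv2
import numpy as np
import onnxruntime as rt

# Simplified x86 sanity check with the original model on the CPU execution provider.
sess = rt.InferenceSession("yolox_nano_ti_lite_26p1_41p8.onnx",
                           providers=["CPUExecutionProvider"])

img = cv2.imread("test_data/ADE_val_00001801.jpg")
img = cv2.resize(img, (416, 416)).astype(np.float32)
inp = img.transpose(2, 0, 1)[np.newaxis, ...]         # 1 x 3 x 416 x 416, NCHW

outputs = sess.run(None, {sess.get_inputs()[0].name: inp})
print([o.shape for o in outputs])                      # raw head output, before decoding/NMS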
Could you please help me understand what might be going wrong and how I can get the model to run correctly on the edge device? Are my compilation configurations correct?
Thank you for your assistance.
Best regards
Hi Rosti,
I executed the compiled model on the AM68PA board using the example code in https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/examples/osrt_cpp/ort/onnx_main.cpp. While the execution completes, I do not receive any detection results. The bounding box output also seems incorrect. Here is an example of the inference output on a sample image:
Can you clarify if host emulation with this script results in a correct detection?
Best,
Asha
Hi Asha,
Thank you for your response. I appreciate your assistance with this matter.
Could you clarify what you mean by host emulation? Could you please provide more details or guidance on how to perform this emulation? Are there specific tools or software I should use for this process?
Thank you very much for your help.
Best regards
Rosti
Hi Asha,
Additionally, I have already run the inference code onnx_main.cpp on an x86 machine, which I now understand is what you refer to as "host emulation mode." The results on the x86 machine were the same as, or very similar to, those on the TDA4VM board: the box scores were of the same order of magnitude and likewise incorrect.
Hi Rosti,
I apologize for the lack of response on my end for this issue. Is this still a blocking issue for you?
If so, could you provide the following information so I can understand the problem better?
However, execution of the pretrained ONNX model on an x86 machine using https://github.com/TexasInstruments/edgeai-yolox/blob/main/demo/ONNXRuntime/onnx_inference.py works fine and results in one correct detection.
When using that script, did you run it with the same input test image as in edgeai-tidl-tools and still see the differing results?
You've done the compilation with the onnxrt_ep.py script; have you tried inference on the AM68PA board and/or host emulation on an x86 machine and seen the same issue?
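For clarity, by host emulation inference I mean loading the compiled artifacts on the x86 machine through the TIDL execution provider in ONNX Runtime (rather than the compilation provider). A minimal sketch is below; the paths are placeholders and should be adjusted to wherever the .onnx and artifacts sit in your setup:

import onnxruntime as rt

# Host-emulation inference on x86: same artifacts as on the board, but loaded
# through the TIDLExecutionProvider on the PC. Paths below are placeholders.
delegate_options = {
    "tidl_tools_path": "/path/to/tidl_tools",                    # x86 TIDL tools used at compile time
    "artifacts_folder": "model-artifacts/yolox_nano_ti_lite/",   # compiled artifacts
    "debug_level": 0,
}

sess = rt.InferenceSession(
    "model-artifacts/yolox_nano_ti_lite/yolox_nano_ti_lite_26p1_41p8.onnx",
    providers=["TIDLExecutionProvider", "CPUExecutionProvider"],
    provider_options=[delegate_options, {}],
    sess_options=rt.SessionOptions(),
)
# Running sess.run(...) with the same preprocessed input as on the board helps
# separate device-side issues from compilation/configuration issues.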
Best,
Asha