SK-AM69: Model accuracy low while using TIDLExecutionProvider in onnxruntime

Part Number: SK-AM69
Other Parts Discussed in Thread: AM69A

Tool/software:

While running inference with CPUExecutionProvider, the keypoints and bounding boxes are correct, but with TIDLExecutionProvider the outputs are wrong (incorrect detections, low accuracy).

Please help. I suspect the issue is with model compilation and artifact generation.

The code I used to compile the model and generate the artifacts:


import os
import sys
import shutil

import onnxruntime as rt
import onnx
import onnx.shape_inference
import numpy as np
from PIL import Image

os.environ["TIDL_RT_PERFSTATS"] = "1"

if __name__ == "__main__":
    _, model_path, calibration_images_path, out_dir_path = sys.argv

    tidl_tools_path = os.environ["TIDL_TOOLS_PATH"]

    os.makedirs(out_dir_path, exist_ok=True)

    # Run shape inference and write the annotated model into the output directory
    out_model_name = os.path.splitext(os.path.basename(model_path))[0] + "_with_shapes.onnx"
    out_model_path = os.path.join(out_dir_path, out_model_name)
    onnx.shape_inference.infer_shapes_path(model_path, out_model_path)

    # Start from a clean artifacts directory
    artifacts_dir = os.path.join(out_dir_path, "tidl_output")
    try:
        shutil.rmtree(artifacts_dir)
    except FileNotFoundError:
        pass

    os.makedirs(artifacts_dir, exist_ok=False)

    so = rt.SessionOptions()
    print("Available execution providers : ", rt.get_available_providers())

    calibration_images = [
        os.path.join(calibration_images_path, name)
        for name in os.listdir(calibration_images_path)
    ]

    num_calibration_frames = len(calibration_images)
    num_calibration_iterations = 10
    compilation_options = {
        "platform": "AM69A",

        "tidl_tools_path": tidl_tools_path,
        "artifacts_folder": artifacts_dir,

        "tensor_bits": 8,
        "model_type": "OD",
        "object_detection:meta_arch_type": 6,
        "object_detection:meta_layers_names_list": os.path.splitext(model_path)[0] + ".prototxt",

        "debug_level": 300,

        "advanced_options:calibration_frames": num_calibration_frames,
        "advanced_options:calibration_iterations": num_calibration_iterations,
    }

    desired_eps = ['TIDLCompilationProvider', 'CPUExecutionProvider']
    sess = rt.InferenceSession(
        out_model_path,
        providers=desired_eps,
        provider_options=[compilation_options, {}],
        sess_options=so
    )

    input_details, = sess.get_inputs()
    batch_size, channel, height, width = input_details.shape
    print(f"Input shape: {input_details.shape}")

    assert isinstance(batch_size, str) or batch_size == 1
    assert channel == 3

    input_name = input_details.name
    input_type = input_details.type

    print(f'Input "{input_name}": {input_type}')

    assert input_type == 'tensor(float)'

    # Feed every calibration image through the session to drive quantization
    for image_path in calibration_images:
        img = Image.open(image_path).convert("RGB").resize((640, 640))
        input_data = np.asarray(img).astype(np.float32).transpose((2, 0, 1))
        input_data = np.expand_dims(input_data, 0)

        sess.run(None, {input_name: input_data})
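
For reference, once compilation finishes, the artifacts can be sanity-checked on the same host by running one image through both providers and diffing the outputs. A minimal sketch, reusing out_model_path and artifacts_dir from the script above (test_image_path is a placeholder):

import numpy as np
import onnxruntime as rt
from PIL import Image

def run_once(providers, provider_options):
    sess = rt.InferenceSession(out_model_path, providers=providers,
                               provider_options=provider_options)
    name = sess.get_inputs()[0].name
    # Same 640x640 RGB float preprocessing as the compile script
    img = Image.open(test_image_path).convert("RGB").resize((640, 640))
    x = np.asarray(img).astype(np.float32).transpose((2, 0, 1))[None, ...]
    return sess.run(None, {name: x})

cpu_out = run_once(['CPUExecutionProvider'], [{}])
tidl_out = run_once(['TIDLExecutionProvider', 'CPUExecutionProvider'],
                    [{"artifacts_folder": artifacts_dir}, {}])

for i, (a, b) in enumerate(zip(cpu_out, tidl_out)):
    print(f"Output {i}: CPU {a.shape} vs TIDL {b.shape}")
    if a.shape == b.shape:
        print("  max abs diff:", float(np.abs(a - b).max()))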


The model is yoloxpose_s_8xb32-300e_coco-640, trained on custom data with 4 keypoints. The meta-architecture prototxt:

name: "yolox"
tidl_yolo {
  yolo_param {
    input: "276"
    anchor_width: 8.0
    anchor_height: 8.0
  }
  yolo_param {
    input: "308"
    anchor_width: 16.0
    anchor_height: 16.0
  }
  yolo_param {
    input: "339"
    anchor_width: 32.0
    anchor_height: 32.0
  }
  detection_output_param {
    num_classes: 1
    share_location: true
    background_label_id: -1
    nms_param {
      nms_threshold: 0.45
      top_k: 200
    }
    code_type: CODE_TYPE_YOLO_X
    keep_top_k: 200
    confidence_threshold: 0.3
    num_keypoint: 4
    keypoint_confidence: true
  }
  name: "yolox"
  in_width: 640
  in_height: 640
  output: "dets"
  output: "labels"
}
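
For reference, the head input names above ("276", "308", "339") must match tensor names that actually exist in the exported ONNX graph; a quick hedged check with the onnx package (the model filename is assumed):

import onnx

model = onnx.load("yoloxpose_s_lite_coco-640.onnx")
produced = {out for node in model.graph.node for out in node.output}
for head in ("276", "308", "339"):
    print(head, "found" if head in produced else "MISSING from graph")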


Deploy config:

_base_ = ['./pose-detection_static.py', '../_base_/backends/onnxruntime.py']

onnx_config = dict(
    output_names=['detections'],
)

codebase_config = dict(
    post_processing=dict(
        score_threshold=0.05,
        iou_threshold=0.5,
        max_output_boxes_per_class=200,
        pre_top_k=5000,
        keep_top_k=100,
        background_label_id=-1,
    ))


  • Hi Venkat,

    A couple of questions. 

    1. Can you please send me your model to try out? 

    2. Why are you starting out with the "complicated approach"?  

    It is easiest to compile and run a model with OSRT first.  Basically:

    1. Go to examples/osrt_python/ort

    2. Edit ../model_configs.py and add your model in the ONNX section

    3. python3 ./onnxrt_ep.py -c -m <model_name_you_set_in_model_configs>

    Regards,

    Chris

  • Thanks, Chris, for the quick reply. I tried the way you suggested and still have the same issue:
    the points are not correct when I use TIDLExecutionProvider. I feel the issue is with my config.


    1. Can you please send me your model to try out?
    Answer: Since the model is trained on medical data, I have to get permission from my compliance team (I am working on FDA Class 2/Class 3 medical devices). I will train on different data and share that instead.

    2. Why are you starting out with the "complicated approach"?
    Answer: I felt this was an easy way to do it :(


    Is this config correct? (Still the same issue: accuracy is low with TIDLExecutionProvider.)
    Note: I didn't make any edits to common_utils.py.

        "pose-ort-yoloxpose_s_lite_coco-640": create_model_config(
            task_type="detection",
            source=dict(
                model_url="",
                meta_arch_url="",
                infer_shape=True,
            ),
            preprocess=dict(
                resize=640,
                crop=640,
                data_layout="NCHW",
                pad_color=[114, 114, 114],
                resize_with_pad=[True, "corner"],
                reverse_channels=True,
            ),
            session=dict(
                session_name="onnxrt",
                model_path=os.path.join("/root/ti2/edgeai-tidl-tools/my_model/onnx_model", "yoloxpose_s_lite_coco-640.onnx"),
                meta_layers_names_list=os.path.join("/root/ti2/edgeai-tidl-tools/my_model/onnx_model", "yoloxpose_s_lite_coco-640.prototxt"),
                meta_arch_type=6,
                input_mean=[0, 0, 0],
                input_scale=[1, 1, 1],
                input_optimization=True,
            ),
            postprocess=dict(
                formatter="DetectionBoxSL2BoxLS",
                resize_with_pad=True,
                keypoint=True,
                object6dpose=False,
                normalized_detections=False,
                shuffle_indices=None,
                squeeze_axis=None,
                reshape_list=[(-1, 18)],  # 18 values per detection: bbox(4) + obj_conf(1) + cls_conf(1) + keypoints(4×3=12)
                ignore_index=None,
            ),
            extra_info=dict(
                od_type="yoloxpose",
                framework="MMPose",
                num_images=numImages,
                num_classes=1,
                label_offset_type="1to1",
                label_offset=0,
            ),
        ),

    root@539aa78ad277:~/ti2/edgeai-tidl-tools/examples/osrt_python/ort# python3 ./onnxrt_ep.py -c -m pose-ort-yoloxpose_s_lite_coco-640
    Available execution providers :  ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
    
    Running 1 Models - ['pose-ort-yoloxpose_s_lite_coco-640']
    
    
    Running_Model :  pose-ort-yoloxpose_s_lite_coco-640  
    
    
    Running shape inference on model /root/ti2/edgeai-tidl-tools/kenny_pos/onnx_model/yoloxpose_s_lite_coco-640.onnx 
    
    ========================= [Model Compilation Started] =========================
    
    Model compilation will perform the following stages:
    1. Parsing
    2. Graph Optimization
    3. Quantization & Calibration
    4. Memory Planning
    
    ============================== [Version Summary] ==============================
    
    -------------------------------------------------------------------------------
    |          TIDL Tools Version          |              11_00_06_00             |
    -------------------------------------------------------------------------------
    |         C7x Firmware Version         |              11_00_00_00             |
    -------------------------------------------------------------------------------
    |            Runtime Version           |                1.15.0                |
    -------------------------------------------------------------------------------
    |          Model Opset Version         |                  17                  |
    -------------------------------------------------------------------------------
    
    ============================== [Parsing Started] ==============================
    
    yolox is meta arch name 
    yolox
    Number of OD backbone nodes = 219 
    
    ------------------------- Subgraph Information Summary -------------------------
    -------------------------------------------------------------------------------
    |          Core           |      No. of Nodes       |   Number of Subgraphs   |
    -------------------------------------------------------------------------------
    | C7x                     |                     327 |                       1 |
    | CPU                     |                       0 |                       x |
    -------------------------------------------------------------------------------
    ============================= [Parsing Completed] =============================
    
    TIDL Meta pipeLine (proto) file  : /root/ti2/edgeai-tidl-tools/kenny_pos/onnx_model/yoloxpose_s_lite_coco-640.prototxt  
    yolox
    yolox
    ==================== [Optimization for subgraph_0 Started] ====================
    
    [TIDL Import] [PARSER] WARNING: Requested output data convert layer is not added to the network, It is currently not optimal
    ----------------------------- Optimization Summary -----------------------------
    -------------------------------------------------------------------------------------
    |            Layer           | Nodes before optimization | Nodes after optimization |
    -------------------------------------------------------------------------------------
    | TIDL_OdOutputReformatLayer |                         0 |                        1 |
    | TIDL_DetectionOutputLayer  |                         0 |                        1 |
    | TIDL_EltWiseLayer          |                         7 |                        7 |
    | TIDL_ConcatLayer           |                        16 |                       16 |
    | TIDL_ReLULayer             |                        86 |                        0 |
    | TIDL_ResizeLayer           |                         2 |                        2 |
    | TIDL_ConvolutionLayer      |                       102 |                      102 |
    | TIDL_PoolingLayer          |                         6 |                        6 |
    -------------------------------------------------------------------------------------
    
    Total nodes in subgraph: 138
    
    =================== [Optimization for subgraph_0 Completed] ===================
    
    The soft limit is 10240
    The hard limit is 10240
    MEM: Init ... !!!
    MEM: Init ... Done !!!
     0.0s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_INFO
     0.4s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_ERROR
     0.5s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_WARNING
     0.84s:  VX_ZONE_INFO: [ownAddTargetKernelInternal:189] registered kernel vx_tutorial_graph.phase_rgb on target DSP_C7-2
     0.939s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-0 
     0.961s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-1 
     0.983s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-2 
     0.996s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-3 
     0.1015s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-1 
     0.1028s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-1_PRI_2 
     0.1044s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-1_PRI_3 
     0.1055s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-1_PRI_4 
     0.1070s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-1_PRI_5 
     0.1081s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-1_PRI_6 
     0.1094s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-1_PRI_7 
     0.1107s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-1_PRI_8 
     0.1122s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-2 
     0.1134s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-2_PRI_2 
     0.1148s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-2_PRI_3 
     0.1162s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-2_PRI_4 
     0.1177s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-2_PRI_5 
     0.1190s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-2_PRI_6 
     0.1201s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-2_PRI_7 
     0.1215s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-2_PRI_8 
     0.1229s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-3 
     0.1243s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-3_PRI_2 
     0.1257s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-3_PRI_3 
     0.1269s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-3_PRI_4 
     0.1283s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-3_PRI_5 
     0.1295s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-3_PRI_6 
     0.1309s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-3_PRI_7 
     0.1328s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-3_PRI_8 
     0.1345s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-4 
     0.1360s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-4_PRI_2 
     0.1375s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-4_PRI_3 
     0.1386s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-4_PRI_4 
     0.1398s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-4_PRI_5 
     0.1411s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-4_PRI_6 
     0.1423s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-4_PRI_7 
     0.1438s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-4_PRI_8 
     0.1454s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MCU2-0 
     0.1468s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC_NF 
     0.1482s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC_LDC1 
     0.1495s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC_MSC1 
     0.1508s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC_MSC2 
     0.1520s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC_VISS1 
     0.1533s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE1 
     0.1546s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE2 
     0.1561s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE3 
     0.1577s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE4 
     0.1589s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE5 
     0.1608s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE6 
     0.1623s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE7 
     0.1638s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE8 
     0.1655s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE9 
     0.1665s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE10 
     0.1675s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE11 
     0.1691s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE12 
     0.1702s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DISPLAY1 
     0.1714s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DISPLAY2 
     0.1729s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CSITX 
     0.1746s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CSITX2 
     0.1758s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSS_M2M1 
     0.1771s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSS_M2M2 
     0.1785s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSS_M2M3 
     0.1795s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSS_M2M4 
     0.1809s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC1_FC 
     0.1823s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MCU2-1 
     0.1836s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DMPAC_SDE 
     0.1852s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DMPAC_DOF 
     0.1868s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MCU3-0 
     0.1881s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MCU3-1 
     0.1899s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MCU4-0 
     0.1911s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC2_NF 
     0.1921s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC2_LDC1 
     0.1935s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC2_MSC1 
     0.1947s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC2_MSC2 
     0.1958s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC2_VISS1 
     0.1972s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC2_FC 
     0.1988s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MCU4-1 
     0.1989s:  VX_ZONE_INFO: [tivxInit:152] Initialization Done !!!
     0.1991s:  VX_ZONE_INFO: Globally Disabled VX_ZONE_INFO
    ============= [Quantization & Calibration for subgraph_0 Started] =============
    
    
    -------- Running Calibration in Float Mode to Collect Tensor Statistics --------
    [=============================================================================] 100 %
    
    ------------------ Fixed-point Calibration Iteration [1 / 5]: ------------------
    [=============================================================================] 100 %
    
    ------------------ Fixed-point Calibration Iteration [2 / 5]: ------------------
    [=============================================================================] 100 %
    
    ------------------ Fixed-point Calibration Iteration [3 / 5]: ------------------
    [=============================================================================] 100 %
    
    ------------------ Fixed-point Calibration Iteration [4 / 5]: ------------------
    [=============================================================================] 100 %
    
    ------------------ Fixed-point Calibration Iteration [5 / 5]: ------------------
    [=============================================================================] 100 %
    
    ==================== [Quantization & Calibration Completed] ====================
    
    ========================== [Memory Planning Started] ==========================
    
    
    ------------------------- Network Compiler Traces ------------------------------
    Successful Memory Allocation
    Successful Workload Creation
    
    ========================= [Memory Planning Completed] =========================
    
    Rerunning network compiler...
    ========================== [Memory Planning Started] ==========================
    
    
    ------------------------- Network Compiler Traces ------------------------------
    Successful Memory Allocation
    Successful Workload Creation
    
    ========================= [Memory Planning Completed] =========================
    
    ======================== Subgraph Compiled Successfully ========================
    
    
    
    
     
    Completed_Model :     1, Name : pose-ort-yoloxpose_s_lite_coco-640                , Total time :   44495.52, Offload Time :    7496.82 , DDR RW MBs : 0, Output Image File : py_out_pose-ort-yoloxpose_s_lite_coco-640_ADE_val_00001801.jpg, Output Bin File : py_out_pose-ort-yoloxpose_s_lite_coco-640_ADE_val_00001801.bin
     
     
    MEM: Deinit ... !!!
    MEM: Alloc's: 26 alloc's of 260410961 bytes 
    MEM: Free's : 26 free's  of 260410961 bytes 
    MEM: Open's : 0 allocs  of 0 bytes 
    MEM: Deinit ... Done !!!


    My inference code (AI-generated):

    import onnxruntime as ort
    import numpy as np
    import cv2
    
    MODEL_PATH = "/zzken_pose/pose-ort-yoloxpose_s_lite_coco-640/model/yoloxpose_s_lite_coco-640.onnx"
    IMAGE_PATH = "/zzken_pose/data/WIN_20231105_07_09_36_Pro_frame_0001.jpg"
    ARTIFACTS_DIR= "/zzken_pose/pose-ort-yoloxpose_s_lite_coco-640/artifacts"
    
    KEYPOINT_NAMES = ['point_1', 'point_2', 'point_3', 'point_4']
    SKELETON = [[0, 1], [1, 2], [2, 3], [3, 0]]
    
    def preprocess_correct(image_path, input_shape=(640, 640)):
        """Correct preprocessing - NO normalization to [0,1]"""
        img = cv2.imread(image_path)
        orig_h, orig_w = img.shape[:2]
        
        # Resize to 640x640
        img = cv2.resize(img, input_shape)
        
        # Convert BGR to RGB
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        
        # Convert to float32 but keep in [0,255] range
        img = img.astype(np.float32)  # NO division by 255!
        
        # Convert to CHW format
        img = np.transpose(img, (2, 0, 1))
        
        # Add batch dimension
        img = np.expand_dims(img, axis=0)
        
        return img, (orig_h, orig_w)
    
    def get_best_detection(outputs, orig_shape, min_confidence=0.5):
        """Get only the highest confidence detection"""
        predictions = outputs[0][0]  # Shape: (N, 18)
        
        orig_h, orig_w = orig_shape
        scale_x = orig_w / 640
        scale_y = orig_h / 640
        
        print(f"Original image: {orig_w} x {orig_h}")
        print(f"Scale factors: x={scale_x:.3f}, y={scale_y:.3f}")
        print(f"Analyzing {len(predictions)} predictions for best detection...")
        
        best_detection = None
        best_confidence = 0.0
        
        for i, pred in enumerate(predictions):
            # Parse: [x1, y1, x2, y2, obj_conf, cls_conf, 4×(x,y,conf)]
            x1, y1, x2, y2, obj_conf, cls_conf = pred[:6]
            
            # Skip low confidence detections
            if obj_conf < min_confidence:
                continue
            
            # Check if this is the best so far
            if obj_conf > best_confidence:
                best_confidence = obj_conf
                
                # Extract keypoints
                keypoints_data = pred[6:]
                keypoints = keypoints_data.reshape(4, 3)  # 4 points × (x,y,conf)
                
                # Scale to original image size
                scaled_bbox = [x1 * scale_x, y1 * scale_y, x2 * scale_x, y2 * scale_y]
                scaled_keypoints = keypoints.copy()
                scaled_keypoints[:, 0] *= scale_x  # Scale x coordinates
                scaled_keypoints[:, 1] *= scale_y  # Scale y coordinates
                
                best_detection = {
                    'bbox': scaled_bbox,
                    'obj_conf': obj_conf,
                    'cls_conf': cls_conf,
                    'keypoints': scaled_keypoints,
                    'raw_bbox': [x1, y1, x2, y2],
                    'raw_keypoints': keypoints
                }
                
                print(f"  New best detection #{i+1} with confidence {obj_conf:.3f}")
        
        if best_detection:
            print(f"\n🏆 BEST DETECTION FOUND:")
            print(f"  Confidence: {best_detection['obj_conf']:.3f}")
            print(f"  Raw bbox (640x640): [{best_detection['raw_bbox'][0]:.1f}, {best_detection['raw_bbox'][1]:.1f}, {best_detection['raw_bbox'][2]:.1f}, {best_detection['raw_bbox'][3]:.1f}]")
            print(f"  Scaled bbox (original): [{best_detection['bbox'][0]:.1f}, {best_detection['bbox'][1]:.1f}, {best_detection['bbox'][2]:.1f}, {best_detection['bbox'][3]:.1f}]")
            
            print(f"  Raw keypoints (640x640 coords):")
            for j, (x, y, conf) in enumerate(best_detection['raw_keypoints']):
                print(f"    {KEYPOINT_NAMES[j]}: ({x:.1f}, {y:.1f}) conf={conf:.3f}")
            
            print(f"  Scaled keypoints (original image coords):")
            for j, (x, y, conf) in enumerate(best_detection['keypoints']):
                print(f"    {KEYPOINT_NAMES[j]}: ({x:.1f}, {y:.1f}) conf={conf:.3f}")
        
        return best_detection
    
    def visualize_best_detection(image_path, detection, output_path="best_detection.jpg"):
        """Visualize only the best detection"""
        img = cv2.imread(image_path)
        
        print(f"\nVisualizing best detection on {img.shape[1]}x{img.shape[0]} image")
        
        # Use bright colors for the best detection
        bbox_color = (0, 255, 0)  # Bright green
        point_colors = [
            (255, 0, 0),    # Blue for point 1
            (0, 255, 255),  # Yellow for point 2  
            (255, 0, 255),  # Magenta for point 3
            (0, 165, 255)   # Orange for point 4
        ]
        
        # Draw bounding box
        bbox = detection['bbox']
        x1, y1, x2, y2 = map(int, bbox)
        cv2.rectangle(img, (x1, y1), (x2, y2), bbox_color, 4)
        
        print(f"Drawing bbox: ({x1}, {y1}) to ({x2}, {y2})")
        
        # Draw keypoints with individual colors
        keypoints = detection['keypoints']
        valid_points = []
        
        for j, (x, y, conf) in enumerate(keypoints):
            if conf > 0.3:
                x, y = int(x), int(y)
                color = point_colors[j % len(point_colors)]
                
                # Draw point with individual color
                cv2.circle(img, (x, y), 15, color, -1)
                cv2.circle(img, (x, y), 18, (255, 255, 255), 3)  # White border
                
                # Add point label
                cv2.putText(img, f'{KEYPOINT_NAMES[j]}', (x+25, y-25), 
                           cv2.FONT_HERSHEY_SIMPLEX, 1.2, color, 3)
                cv2.putText(img, f'{j+1}', (x-10, y+10), 
                           cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 3)
                
                # Add confidence
                cv2.putText(img, f'{conf:.2f}', (x+25, y+5), 
                           cv2.FONT_HERSHEY_SIMPLEX, 0.8, color, 2)
                
                valid_points.append((j, x, y, conf))
                print(f"  Point {j+1} ({KEYPOINT_NAMES[j]}): ({x}, {y}) conf={conf:.3f}")
        
        # Draw skeleton connections
        for connection in SKELETON:
            kpt1_idx, kpt2_idx = connection[0], connection[1]
            if (kpt1_idx < len(keypoints) and kpt2_idx < len(keypoints) and 
                keypoints[kpt1_idx][2] > 0.3 and keypoints[kpt2_idx][2] > 0.3):
                
                x1_line, y1_line = int(keypoints[kpt1_idx][0]), int(keypoints[kpt1_idx][1])
                x2_line, y2_line = int(keypoints[kpt2_idx][0]), int(keypoints[kpt2_idx][1])
                
                # Use gradient color for skeleton
                cv2.line(img, (x1_line, y1_line), (x2_line, y2_line), (0, 255, 255), 5)
                print(f"  Connection: Point {kpt1_idx+1} to Point {kpt2_idx+1}")
        
        # Add title with confidence
        cv2.putText(img, f'BEST DETECTION - Confidence: {detection["obj_conf"]:.3f}', 
                   (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 255, 0), 4)
        
        # Add summary text
        cv2.putText(img, f'Valid keypoints: {len(valid_points)}/4', 
                   (50, 100), cv2.FONT_HERSHEY_SIMPLEX, 1.2, (255, 255, 255), 3)
        
        cv2.imwrite(output_path, img)
        print(f"\n✅ Best detection visualization saved as {output_path}")
        
        return len(valid_points)
    
    def create_clean_visualization(image_path, detection, output_path="best_detection_clean.jpg"):
        """Create a clean visualization with just the keypoints and connections"""
        img = cv2.imread(image_path)
        
        # Draw only keypoints and skeleton (no bbox, minimal text)
        point_colors = [
            (255, 0, 0),    # Blue for point 1
            (0, 255, 255),  # Yellow for point 2  
            (255, 0, 255),  # Magenta for point 3
            (0, 165, 255)   # Orange for point 4
        ]
        
        keypoints = detection['keypoints']
        
        # Draw skeleton first (behind points)
        for connection in SKELETON:
            kpt1_idx, kpt2_idx = connection[0], connection[1]
            if (kpt1_idx < len(keypoints) and kpt2_idx < len(keypoints) and 
                keypoints[kpt1_idx][2] > 0.3 and keypoints[kpt2_idx][2] > 0.3):
                
                x1_line, y1_line = int(keypoints[kpt1_idx][0]), int(keypoints[kpt1_idx][1])
                x2_line, y2_line = int(keypoints[kpt2_idx][0]), int(keypoints[kpt2_idx][1])
                
                cv2.line(img, (x1_line, y1_line), (x2_line, y2_line), (0, 255, 255), 6)
        
        # Draw keypoints on top
        for j, (x, y, conf) in enumerate(keypoints):
            if conf > 0.3:
                x, y = int(x), int(y)
                color = point_colors[j % len(point_colors)]
                
                cv2.circle(img, (x, y), 20, color, -1)
                cv2.circle(img, (x, y), 23, (255, 255, 255), 4)
                
                # Just the point number
                cv2.putText(img, f'{j+1}', (x-12, y+12), 
                           cv2.FONT_HERSHEY_SIMPLEX, 1.2, (255, 255, 255), 4)
        
        cv2.imwrite(output_path, img)
        print(f"Clean visualization saved as {output_path}")
    
    def main():
        print("=== FINDING BEST POSE DETECTION ===")
        
        so = ort.SessionOptions()
        runtime_options = {
            "artifacts_folder": ARTIFACTS_DIR,
        }
        desired_eps = ['TIDLExecutionProvider', 'CPUExecutionProvider']
        # desired_eps = ['CPUExecutionProvider']
        # Load model
        session = ort.InferenceSession(
            MODEL_PATH,
            providers=desired_eps,
            provider_options=[runtime_options, {}],  # One dict per provider
            sess_options=so
        )
        input_name = session.get_inputs()[0].name
        
        # Use corrected preprocessing
        input_tensor, orig_shape = preprocess_correct(IMAGE_PATH)
        
        # Run inference
        outputs = session.run(None, {input_name: input_tensor})
    
        # Get only the best detection
        best_detection = get_best_detection(outputs, orig_shape, min_confidence=0.5)
        
        if best_detection:
            print(f"\n{'='*60}")
            print(f"SUCCESS! Found best detection with {best_detection['obj_conf']:.3f} confidence")
            print(f"{'='*60}")
            
            # Create detailed visualization
            valid_points = visualize_best_detection(IMAGE_PATH, best_detection, "best_detection_detailed.jpg")
            
            # Create clean visualization
            create_clean_visualization(IMAGE_PATH, best_detection, "best_detection_clean.jpg")
            
            print(f"\n📊 SUMMARY:")
            print(f"  • Best confidence: {best_detection['obj_conf']:.3f}")
            print(f"  • Valid keypoints: {valid_points}/4")
            print(f"  • Bounding box area: {(best_detection['bbox'][2]-best_detection['bbox'][0]) * (best_detection['bbox'][3]-best_detection['bbox'][1]):.0f} pixels")
            
            return best_detection
            
        else:
            print("❌ No high-confidence detections found. Try lowering min_confidence.")
            return None
    
    if __name__ == "__main__":
        best = main()

    Output:

    === FINDING BEST POSE DETECTION ===
    libtidl_onnxrt_EP loaded 0x76daa80 
    Final number of subgraphs created are : 1, - Offloaded Nodes - 327, Total Nodes - 327 
    APP: Init ... !!!
     12328.534051 s: MEM: Init ... !!!
     12328.539472 s: MEM: Initialized DMA HEAP (fd=5) !!!
     12328.539643 s: MEM: Init ... Done !!!
     12328.539695 s: IPC: Init ... !!!
     12329.079910 s: IPC: Init ... Done !!!
    REMOTE_SERVICE: Init ... !!!
    REMOTE_SERVICE: Init ... Done !!!
     12329.632986 s: GTC Frequency = 200 MHz
    APP: Init ... Done !!!
     12329.633145 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_ERROR
     12329.633176 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_WARNING
     12329.633200 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_INFO
     12329.633905 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-0 
     12329.634061 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-1 
     12329.634183 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-2 
     12329.634293 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-3 
     12329.634327 s:  VX_ZONE_INFO: [tivxInitLocal:202] Initialization Done !!!
     12329.634356 s:  VX_ZONE_INFO: Globally Disabled VX_ZONE_INFO
    Original image: 1920 x 1080
    Scale factors: x=3.000, y=1.688
    Analyzing 200 predictions for best detection...
      New best detection #1 with confidence 0.902
    
    🏆 BEST DETECTION FOUND:
      Confidence: 0.902
      Raw bbox (640x640): [288.0, 55.1, 410.6, 99.4]
      Scaled bbox (original): [863.9, 93.0, 1231.8, 167.8]
      Raw keypoints (640x640 coords):
        point_1: (414.6, 39.4) conf=0.073
        point_2: (311.4, 229.5) conf=0.700
        point_3: (164.7, 99.2) conf=0.927
        point_4: (420.1, 158.9) conf=0.938
      Scaled keypoints (original image coords):
        point_1: (1243.9, 66.5) conf=0.073
        point_2: (934.2, 387.3) conf=0.700
        point_3: (494.2, 167.3) conf=0.927
        point_4: (1260.2, 268.2) conf=0.938
    
    ============================================================
    SUCCESS! Found best detection with 0.902 confidence
    ============================================================
    
    Visualizing best detection on 1920x1080 image
    Drawing bbox: (863, 93) to (1231, 167)
      Point 2 (point_2): (934, 387) conf=0.700
      Point 3 (point_3): (494, 167) conf=0.927
      Point 4 (point_4): (1260, 268) conf=0.938
      Connection: Point 2 to Point 3
      Connection: Point 3 to Point 4
    
    ✅ Best detection visualization saved as best_detection_detailed.jpg
    Clean visualization saved as best_detection_clean.jpg
    
    📊 SUMMARY:
      • Best confidence: 0.902
      • Valid keypoints: 3/4
      • Bounding box area: 27506 pixels
    APP: Deinit ... !!!
    REMOTE_SERVICE: Deinit ... !!!
    REMOTE_SERVICE: Deinit ... Done !!!
     12330.608466 s: IPC: Deinit ... !!!
     12331.175421 s: IPC: DeInit ... Done !!!
     12331.175493 s: MEM: Deinit ... !!!
     12331.175514 s: DDR_SHARED_MEM: Alloc's: 7 alloc's of 19749280 bytes 
     12331.175527 s: DDR_SHARED_MEM: Free's : 7 free's  of 19749280 bytes 
     12331.175538 s: DDR_SHARED_MEM: Open's : 0 allocs  of 0 bytes 
     12331.175557 s: MEM: Deinit ... Done !!!
    APP: Deinit ... Done !!!
    root@am69-sk:/zzken_pose# 

    Regards,

    venkat

  • Comparing CPU and TIDL outputs, I see a shape mismatch:
    Output 0: CPU (1, 39, 18) vs TIDL (1, 200, 18)
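
    For context, the TIDL row count equals keep_top_k: 200 from the prototxt, so some rows may be padded or low-score entries. A hedged way to align the two outputs before diffing (cpu_outputs/tidl_outputs stand for the raw session.run results; the 0.3 threshold mirrors confidence_threshold):

        import numpy as np

        def valid_rows(dets, thr=0.3):
            # Row layout per the model config: bbox(4) + obj_conf + cls_conf + 4x(x, y, conf)
            dets = np.asarray(dets).reshape(-1, 18)
            return dets[dets[:, 4] > thr]  # column 4 = obj_conf

        cpu_valid = valid_rows(cpu_outputs[0])
        tidl_valid = valid_rows(tidl_outputs[0])
        print("CPU valid:", cpu_valid.shape, "TIDL valid:", tidl_valid.shape)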

  • Hi Venkat,

    I see the issue, but without the model (or an example model) I cannot debug it. One thought, and this is just conjecture without the model: the middle tensor could be of a different data type. For example, if the middle output tensor is interpreted as int8 when it is actually float (or some other type), that could produce a size difference.


    The model would help here. It does not need to be trained, as long as the output shapes are the same. It could also be a subset of an untrained model that exhibits this behavior.

    Regards,

    Chris 

  • Hi Chris,

    I switched to 16-bit and added around four layers to the deny list, and it's working now! The issue seemed to be related to INT8 quantization and a few specific layers; I've sketched the change below.

    Thanks a lot for your help. Going forward, I’ll make sure to include the model in any questions I raise.
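
    For anyone hitting the same problem, this is only a rough sketch of the compile-option change: the option names follow edgeai-tidl-tools (check your release), and the denied node names are placeholders since I cannot share the model.

        compilation_options = {
            # ... platform, artifacts_folder, calibration options as before ...
            "tensor_bits": 16,  # was 8; 16-bit removes most of the quantization error
            # Keep the four problematic layers on the Arm core instead of the C7x.
            "deny_list:layer_name": "/node_a, /node_b, /node_c, /node_d",  # placeholder names
        }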

    Regards,
    Venkat