SK-AM69: Model accuracy low while using TIDLExecutionProvider in onnxruntime

Part Number: SK-AM69
Other Parts Discussed in Thread: AM69A

Tool/software:

While running inference with CPUExecutionProvider, the keypoints and bounding boxes are correct, but with TIDLExecutionProvider the outputs are wrong (incorrect detections, low accuracy).

Please help. I suspect the issue is with model compilation and artifact generation.

The code I used to compile the model and generate the artifacts:


import os
import sys
import shutil

import onnxruntime as rt
import onnx
import onnx.shape_inference
import numpy as np
from PIL import Image

os.environ["TIDL_RT_PERFSTATS"] = "1"

if __name__ == "__main__":
    _, model_path, calibration_images_path, out_dir_path = sys.argv

    tidl_tools_path = os.environ["TIDL_TOOLS_PATH"]

    os.makedirs(out_dir_path, exist_ok=True)

    # Run shape inference and write the annotated model into the output directory
    out_model_name = os.path.splitext(os.path.basename(model_path))[0] + "_with_shapes.onnx"
    out_model_path = os.path.join(out_dir_path, out_model_name)
    onnx.shape_inference.infer_shapes_path(model_path, out_model_path)

    # Start from a clean artifacts directory
    artifacts_dir = os.path.join(out_dir_path, "tidl_output")
    try:
        shutil.rmtree(artifacts_dir)
    except FileNotFoundError:
        pass

    os.makedirs(artifacts_dir, exist_ok=False)

    so = rt.SessionOptions()
    print("Available execution providers : ", rt.get_available_providers())

    calibration_images = [
        os.path.join(calibration_images_path, name)
        for name in os.listdir(calibration_images_path)
    ]

    num_calibration_frames = len(calibration_images)
    num_calibration_iterations = 10
    compilation_options = {
        "platform": "AM69A",

        "tidl_tools_path": tidl_tools_path,
        "artifacts_folder": artifacts_dir,

        "tensor_bits": 8,
        "model_type": "OD",
        "object_detection:meta_arch_type": 6,
        "object_detection:meta_layers_names_list": os.path.splitext(model_path)[0] + ".prototxt",

        "debug_level": 300,

        "advanced_options:calibration_frames": num_calibration_frames,
        "advanced_options:calibration_iterations": num_calibration_iterations,
    }

    desired_eps = ['TIDLCompilationProvider', 'CPUExecutionProvider']
    sess = rt.InferenceSession(
        out_model_path,
        providers=desired_eps,
        provider_options=[compilation_options, {}],
        sess_options=so
    )

    input_details, = sess.get_inputs()
    batch_size, channel, height, width = input_details.shape
    print(f"Input shape: {input_details.shape}")

    assert isinstance(batch_size, str) or batch_size == 1
    assert channel == 3

    input_name = input_details.name
    input_type = input_details.type

    print(f'Input "{input_name}": {input_type}')

    assert input_type == 'tensor(float)'

    # Feed every calibration image through the session to drive quantization
    for image_path in calibration_images:
        img = Image.open(image_path).convert("RGB").resize((640, 640))
        input_data = np.asarray(img).astype(np.float32).transpose((2, 0, 1))
        input_data = np.expand_dims(input_data, 0)

        sess.run(None, {input_name: input_data})
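
For reference, once compilation finishes, the artifacts can be sanity-checked on the same host by running one image through both providers and diffing the outputs. A minimal sketch, reusing out_model_path and artifacts_dir from the script above (test_image_path is a placeholder):

import numpy as np
import onnxruntime as rt
from PIL import Image

def run_once(providers, provider_options):
    sess = rt.InferenceSession(out_model_path, providers=providers,
                               provider_options=provider_options)
    name = sess.get_inputs()[0].name
    # Same 640x640 RGB float preprocessing as the compile script
    img = Image.open(test_image_path).convert("RGB").resize((640, 640))
    x = np.asarray(img).astype(np.float32).transpose((2, 0, 1))[None, ...]
    return sess.run(None, {name: x})

cpu_out = run_once(['CPUExecutionProvider'], [{}])
tidl_out = run_once(['TIDLExecutionProvider', 'CPUExecutionProvider'],
                    [{"artifacts_folder": artifacts_dir}, {}])

for i, (a, b) in enumerate(zip(cpu_out, tidl_out)):
    print(f"Output {i}: CPU {a.shape} vs TIDL {b.shape}")
    if a.shape == b.shape:
        print("  max abs diff:", float(np.abs(a - b).max()))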


The model is yoloxpose_s_8xb32-300e_coco-640, trained on custom data with 4 keypoints. The meta-architecture prototxt:

name: "yolox"
tidl_yolo {
  yolo_param {
    input: "276"
    anchor_width: 8.0
    anchor_height: 8.0
  }
  yolo_param {
    input: "308"
    anchor_width: 16.0
    anchor_height: 16.0
  }
  yolo_param {
    input: "339"
    anchor_width: 32.0
    anchor_height: 32.0
  }
  detection_output_param {
    num_classes: 1
    share_location: true
    background_label_id: -1
    nms_param {
      nms_threshold: 0.45
      top_k: 200
    }
    code_type: CODE_TYPE_YOLO_X
    keep_top_k: 200
    confidence_threshold: 0.3
    num_keypoint: 4
    keypoint_confidence: true
  }
  name: "yolox"
  in_width: 640
  in_height: 640
  output: "dets"
  output: "labels"
}
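
For reference, the head input names above ("276", "308", "339") must match tensor names that actually exist in the exported ONNX graph; a quick hedged check with the onnx package (the model filename is assumed):

import onnx

model = onnx.load("yoloxpose_s_lite_coco-640.onnx")
produced = {out for node in model.graph.node for out in node.output}
for head in ("276", "308", "339"):
    print(head, "found" if head in produced else "MISSING from graph")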


Deploy config:

_base_ = ['./pose-detection_static.py', '../_base_/backends/onnxruntime.py']

onnx_config = dict(
    output_names=['detections'],
)

codebase_config = dict(
    post_processing=dict(
        score_threshold=0.05,
        iou_threshold=0.5,
        max_output_boxes_per_class=200,
        pre_top_k=5000,
        keep_top_k=100,
        background_label_id=-1,
    ))


  • Hi Venkat,

    A couple of questions. 

    1. Can you please send me your model to try out? 

    2. Why are you starting out with the "complicated approach"?  

    It is easiest to compile and run a model with OSRT first.  Basically:

    1. Go to examples/osrt_python/ort

    2. Edit ../model_configs.py and add your model in the ONNX section

    3. python3 ./onnxrt_ep.py -c -m <model_name_you_set_in_model_configs>

    Regards,

    Chris

  • Thanks, Chris, for the quick reply. I tried the way you suggested and still have the same issue:
    the points are not correct when I use TIDLExecutionProvider. I feel the issue is with my config.


    1. Can you please send me your model to try out?
    Answer: Since the model is trained on medical data, I have to get permission from my compliance team (I am working on FDA Class 2/Class 3 medical devices). I will train on different data and share that instead.

    2. Why are you starting out with the "complicated approach"?
    Answer: I felt this was an easy way to do it :(


    Is this config correct? (Still the same issue: accuracy is low with TIDLExecutionProvider.)
    Note: I didn't make any edits to common_utils.py.

        "pose-ort-yoloxpose_s_lite_coco-640": create_model_config(
            task_type="detection",
            source=dict(
                model_url="",
                meta_arch_url="",
                infer_shape=True,
            ),
            preprocess=dict(
                resize=640,
                crop=640,
                data_layout="NCHW",
                pad_color=[114, 114, 114],
                resize_with_pad=[True, "corner"],
                reverse_channels=True,
            ),
            session=dict(
                session_name="onnxrt",
                model_path=os.path.join("/root/ti2/edgeai-tidl-tools/my_model/onnx_model", "yoloxpose_s_lite_coco-640.onnx"),
                meta_layers_names_list=os.path.join("/root/ti2/edgeai-tidl-tools/my_model/onnx_model", "yoloxpose_s_lite_coco-640.prototxt"),
                meta_arch_type=6,
                input_mean=[0, 0, 0],
                input_scale=[1, 1, 1],
                input_optimization=True,
            ),
            postprocess=dict(
                formatter="DetectionBoxSL2BoxLS",
                resize_with_pad=True,
                keypoint=True,
                object6dpose=False,
                normalized_detections=False,
                shuffle_indices=None,
                squeeze_axis=None,
                reshape_list=[(-1, 18)],  # 18 values per detection: bbox(4) + obj_conf(1) + cls_conf(1) + keypoints(4×3=12)
                ignore_index=None,
            ),
            extra_info=dict(
                od_type="yoloxpose",
                framework="MMPose",
                num_images=numImages,
                num_classes=1,
                label_offset_type="1to1",
                label_offset=0,
            ),
        ),

    root@539aa78ad277:~/ti2/edgeai-tidl-tools/examples/osrt_python/ort# python3 ./onnxrt_ep.py -c -m pose-ort-yoloxpose_s_lite_coco-640
    Available execution providers :  ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
    
    Running 1 Models - ['pose-ort-yoloxpose_s_lite_coco-640']
    
    
    Running_Model :  pose-ort-yoloxpose_s_lite_coco-640  
    
    
    Running shape inference on model /root/ti2/edgeai-tidl-tools/kenny_pos/onnx_model/yoloxpose_s_lite_coco-640.onnx 
    
    ========================= [Model Compilation Started] =========================
    
    Model compilation will perform the following stages:
    1. Parsing
    2. Graph Optimization
    3. Quantization & Calibration
    4. Memory Planning
    
    ============================== [Version Summary] ==============================
    
    -------------------------------------------------------------------------------
    |          TIDL Tools Version          |              11_00_06_00             |
    -------------------------------------------------------------------------------
    |         C7x Firmware Version         |              11_00_00_00             |
    -------------------------------------------------------------------------------
    |            Runtime Version           |                1.15.0                |
    -------------------------------------------------------------------------------
    |          Model Opset Version         |                  17                  |
    -------------------------------------------------------------------------------
    
    ============================== [Parsing Started] ==============================
    
    yolox is meta arch name 
    yolox
    Number of OD backbone nodes = 219 
    
    ------------------------- Subgraph Information Summary -------------------------
    -------------------------------------------------------------------------------
    |          Core           |      No. of Nodes       |   Number of Subgraphs   |
    -------------------------------------------------------------------------------
    | C7x                     |                     327 |                       1 |
    | CPU                     |                       0 |                       x |
    -------------------------------------------------------------------------------
    ============================= [Parsing Completed] =============================
    
    TIDL Meta pipeLine (proto) file  : /root/ti2/edgeai-tidl-tools/kenny_pos/onnx_model/yoloxpose_s_lite_coco-640.prototxt  
    yolox
    yolox
    ==================== [Optimization for subgraph_0 Started] ====================
    
    [TIDL Import] [PARSER] WARNING: Requested output data convert layer is not added to the network, It is currently not optimal
    ----------------------------- Optimization Summary -----------------------------
    -------------------------------------------------------------------------------------
    |            Layer           | Nodes before optimization | Nodes after optimization |
    -------------------------------------------------------------------------------------
    | TIDL_OdOutputReformatLayer |                         0 |                        1 |
    | TIDL_DetectionOutputLayer  |                         0 |                        1 |
    | TIDL_EltWiseLayer          |                         7 |                        7 |
    | TIDL_ConcatLayer           |                        16 |                       16 |
    | TIDL_ReLULayer             |                        86 |                        0 |
    | TIDL_ResizeLayer           |                         2 |                        2 |
    | TIDL_ConvolutionLayer      |                       102 |                      102 |
    | TIDL_PoolingLayer          |                         6 |                        6 |
    -------------------------------------------------------------------------------------
    
    Total nodes in subgraph: 138
    
    =================== [Optimization for subgraph_0 Completed] ===================
    
    The soft limit is 10240
    The hard limit is 10240
    MEM: Init ... !!!
    MEM: Init ... Done !!!
     0.0s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_INFO
     0.4s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_ERROR
     0.5s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_WARNING
     0.84s:  VX_ZONE_INFO: [ownAddTargetKernelInternal:189] registered kernel vx_tutorial_graph.phase_rgb on target DSP_C7-2
     0.939s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-0 
     0.961s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-1 
     0.983s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-2 
     0.996s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-3 
     0.1015s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-1 
     0.1028s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-1_PRI_2 
     0.1044s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-1_PRI_3 
     0.1055s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-1_PRI_4 
     0.1070s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-1_PRI_5 
     0.1081s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-1_PRI_6 
     0.1094s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-1_PRI_7 
     0.1107s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-1_PRI_8 
     0.1122s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-2 
     0.1134s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-2_PRI_2 
     0.1148s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-2_PRI_3 
     0.1162s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-2_PRI_4 
     0.1177s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-2_PRI_5 
     0.1190s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-2_PRI_6 
     0.1201s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-2_PRI_7 
     0.1215s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-2_PRI_8 
     0.1229s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-3 
     0.1243s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-3_PRI_2 
     0.1257s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-3_PRI_3 
     0.1269s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-3_PRI_4 
     0.1283s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-3_PRI_5 
     0.1295s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-3_PRI_6 
     0.1309s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-3_PRI_7 
     0.1328s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-3_PRI_8 
     0.1345s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-4 
     0.1360s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-4_PRI_2 
     0.1375s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-4_PRI_3 
     0.1386s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-4_PRI_4 
     0.1398s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-4_PRI_5 
     0.1411s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-4_PRI_6 
     0.1423s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-4_PRI_7 
     0.1438s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSP_C7-4_PRI_8 
     0.1454s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MCU2-0 
     0.1468s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC_NF 
     0.1482s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC_LDC1 
     0.1495s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC_MSC1 
     0.1508s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC_MSC2 
     0.1520s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC_VISS1 
     0.1533s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE1 
     0.1546s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE2 
     0.1561s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE3 
     0.1577s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE4 
     0.1589s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE5 
     0.1608s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE6 
     0.1623s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE7 
     0.1638s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE8 
     0.1655s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE9 
     0.1665s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE10 
     0.1675s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE11 
     0.1691s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CAPTURE12 
     0.1702s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DISPLAY1 
     0.1714s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DISPLAY2 
     0.1729s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CSITX 
     0.1746s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target CSITX2 
     0.1758s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSS_M2M1 
     0.1771s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSS_M2M2 
     0.1785s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSS_M2M3 
     0.1795s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DSS_M2M4 
     0.1809s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC1_FC 
     0.1823s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MCU2-1 
     0.1836s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DMPAC_SDE 
     0.1852s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target DMPAC_DOF 
     0.1868s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MCU3-0 
     0.1881s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MCU3-1 
     0.1899s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MCU4-0 
     0.1911s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC2_NF 
     0.1921s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC2_LDC1 
     0.1935s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC2_MSC1 
     0.1947s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC2_MSC2 
     0.1958s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC2_VISS1 
     0.1972s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target VPAC2_FC 
     0.1988s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MCU4-1 
     0.1989s:  VX_ZONE_INFO: [tivxInit:152] Initialization Done !!!
     0.1991s:  VX_ZONE_INFO: Globally Disabled VX_ZONE_INFO
    ============= [Quantization & Calibration for subgraph_0 Started] =============
    
    
    -------- Running Calibration in Float Mode to Collect Tensor Statistics --------
    [=============================================================================] 100 %
    
    ------------------ Fixed-point Calibration Iteration [1 / 5]: ------------------
    [=============================================================================] 100 %
    
    ------------------ Fixed-point Calibration Iteration [2 / 5]: ------------------
    [=============================================================================] 100 %
    
    ------------------ Fixed-point Calibration Iteration [3 / 5]: ------------------
    [=============================================================================] 100 %
    
    ------------------ Fixed-point Calibration Iteration [4 / 5]: ------------------
    [=============================================================================] 100 %
    
    ------------------ Fixed-point Calibration Iteration [5 / 5]: ------------------
    [=============================================================================] 100 %
    
    ==================== [Quantization & Calibration Completed] ====================
    
    ========================== [Memory Planning Started] ==========================
    
    
    ------------------------- Network Compiler Traces ------------------------------
    Successful Memory Allocation
    Successful Workload Creation
    
    ========================= [Memory Planning Completed] =========================
    
    Rerunning network compiler...
    ========================== [Memory Planning Started] ==========================
    
    
    ------------------------- Network Compiler Traces ------------------------------
    Successful Memory Allocation
    Successful Workload Creation
    
    ========================= [Memory Planning Completed] =========================
    
    ======================== Subgraph Compiled Successfully ========================
    
    
    
    
     
    Completed_Model :     1, Name : pose-ort-yoloxpose_s_lite_coco-640                , Total time :   44495.52, Offload Time :    7496.82 , DDR RW MBs : 0, Output Image File : py_out_pose-ort-yoloxpose_s_lite_coco-640_ADE_val_00001801.jpg, Output Bin File : py_out_pose-ort-yoloxpose_s_lite_coco-640_ADE_val_00001801.bin
     
     
    MEM: Deinit ... !!!
    MEM: Alloc's: 26 alloc's of 260410961 bytes 
    MEM: Free's : 26 free's  of 260410961 bytes 
    MEM: Open's : 0 allocs  of 0 bytes 
    MEM: Deinit ... Done !!!


    My inference code (AI-generated):

    import onnxruntime as ort
    import numpy as np
    import cv2
    
    MODEL_PATH = "/zzken_pose/pose-ort-yoloxpose_s_lite_coco-640/model/yoloxpose_s_lite_coco-640.onnx"
    IMAGE_PATH = "/zzken_pose/data/WIN_20231105_07_09_36_Pro_frame_0001.jpg"
    ARTIFACTS_DIR= "/zzken_pose/pose-ort-yoloxpose_s_lite_coco-640/artifacts"
    
    KEYPOINT_NAMES = ['point_1', 'point_2', 'point_3', 'point_4']
    SKELETON = [[0, 1], [1, 2], [2, 3], [3, 0]]
    
    def preprocess_correct(image_path, input_shape=(640, 640)):
        """Correct preprocessing - NO normalization to [0,1]"""
        img = cv2.imread(image_path)
        orig_h, orig_w = img.shape[:2]
        
        # Resize to 640x640
        img = cv2.resize(img, input_shape)
        
        # Convert BGR to RGB
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        
        # Convert to float32 but keep in [0,255] range
        img = img.astype(np.float32)  # NO division by 255!
        
        # Convert to CHW format
        img = np.transpose(img, (2, 0, 1))
        
        # Add batch dimension
        img = np.expand_dims(img, axis=0)
        
        return img, (orig_h, orig_w)
    
    def get_best_detection(outputs, orig_shape, min_confidence=0.5):
        """Get only the highest confidence detection"""
        predictions = outputs[0][0]  # Shape: (N, 18)
        
        orig_h, orig_w = orig_shape
        scale_x = orig_w / 640
        scale_y = orig_h / 640
        
        print(f"Original image: {orig_w} x {orig_h}")
        print(f"Scale factors: x={scale_x:.3f}, y={scale_y:.3f}")
        print(f"Analyzing {len(predictions)} predictions for best detection...")
        
        best_detection = None
        best_confidence = 0.0
        
        for i, pred in enumerate(predictions):
            # Parse: [x1, y1, x2, y2, obj_conf, cls_conf, 4×(x,y,conf)]
            x1, y1, x2, y2, obj_conf, cls_conf = pred[:6]
            
            # Skip low confidence detections
            if obj_conf < min_confidence:
                continue
            
            # Check if this is the best so far
            if obj_conf > best_confidence:
                best_confidence = obj_conf
                
                # Extract keypoints
                keypoints_data = pred[6:]
                keypoints = keypoints_data.reshape(4, 3)  # 4 points × (x,y,conf)
                
                # Scale to original image size
                scaled_bbox = [x1 * scale_x, y1 * scale_y, x2 * scale_x, y2 * scale_y]
                scaled_keypoints = keypoints.copy()
                scaled_keypoints[:, 0] *= scale_x  # Scale x coordinates
                scaled_keypoints[:, 1] *= scale_y  # Scale y coordinates
                
                best_detection = {
                    'bbox': scaled_bbox,
                    'obj_conf': obj_conf,
                    'cls_conf': cls_conf,
                    'keypoints': scaled_keypoints,
                    'raw_bbox': [x1, y1, x2, y2],
                    'raw_keypoints': keypoints
                }
                
                print(f"  New best detection #{i+1} with confidence {obj_conf:.3f}")
        
        if best_detection:
            print(f"\n🏆 BEST DETECTION FOUND:")
            print(f"  Confidence: {best_detection['obj_conf']:.3f}")
            print(f"  Raw bbox (640x640): [{best_detection['raw_bbox'][0]:.1f}, {best_detection['raw_bbox'][1]:.1f}, {best_detection['raw_bbox'][2]:.1f}, {best_detection['raw_bbox'][3]:.1f}]")
            print(f"  Scaled bbox (original): [{best_detection['bbox'][0]:.1f}, {best_detection['bbox'][1]:.1f}, {best_detection['bbox'][2]:.1f}, {best_detection['bbox'][3]:.1f}]")
            
            print(f"  Raw keypoints (640x640 coords):")
            for j, (x, y, conf) in enumerate(best_detection['raw_keypoints']):
                print(f"    {KEYPOINT_NAMES[j]}: ({x:.1f}, {y:.1f}) conf={conf:.3f}")
            
            print(f"  Scaled keypoints (original image coords):")
            for j, (x, y, conf) in enumerate(best_detection['keypoints']):
                print(f"    {KEYPOINT_NAMES[j]}: ({x:.1f}, {y:.1f}) conf={conf:.3f}")
        
        return best_detection
    
    def visualize_best_detection(image_path, detection, output_path="best_detection.jpg"):
        """Visualize only the best detection"""
        img = cv2.imread(image_path)
        
        print(f"\nVisualizing best detection on {img.shape[1]}x{img.shape[0]} image")
        
        # Use bright colors for the best detection
        bbox_color = (0, 255, 0)  # Bright green
        point_colors = [
            (255, 0, 0),    # Blue for point 1
            (0, 255, 255),  # Yellow for point 2  
            (255, 0, 255),  # Magenta for point 3
            (0, 165, 255)   # Orange for point 4
        ]
        
        # Draw bounding box
        bbox = detection['bbox']
        x1, y1, x2, y2 = map(int, bbox)
        cv2.rectangle(img, (x1, y1), (x2, y2), bbox_color, 4)
        
        print(f"Drawing bbox: ({x1}, {y1}) to ({x2}, {y2})")
        
        # Draw keypoints with individual colors
        keypoints = detection['keypoints']
        valid_points = []
        
        for j, (x, y, conf) in enumerate(keypoints):
            if conf > 0.3:
                x, y = int(x), int(y)
                color = point_colors[j % len(point_colors)]
                
                # Draw point with individual color
                cv2.circle(img, (x, y), 15, color, -1)
                cv2.circle(img, (x, y), 18, (255, 255, 255), 3)  # White border
                
                # Add point label
                cv2.putText(img, f'{KEYPOINT_NAMES[j]}', (x+25, y-25), 
                           cv2.FONT_HERSHEY_SIMPLEX, 1.2, color, 3)
                cv2.putText(img, f'{j+1}', (x-10, y+10), 
                           cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 3)
                
                # Add confidence
                cv2.putText(img, f'{conf:.2f}', (x+25, y+5), 
                           cv2.FONT_HERSHEY_SIMPLEX, 0.8, color, 2)
                
                valid_points.append((j, x, y, conf))
                print(f"  Point {j+1} ({KEYPOINT_NAMES[j]}): ({x}, {y}) conf={conf:.3f}")
        
        # Draw skeleton connections
        for connection in SKELETON:
            kpt1_idx, kpt2_idx = connection[0], connection[1]
            if (kpt1_idx < len(keypoints) and kpt2_idx < len(keypoints) and 
                keypoints[kpt1_idx][2] > 0.3 and keypoints[kpt2_idx][2] > 0.3):
                
                x1_line, y1_line = int(keypoints[kpt1_idx][0]), int(keypoints[kpt1_idx][1])
                x2_line, y2_line = int(keypoints[kpt2_idx][0]), int(keypoints[kpt2_idx][1])
                
                # Use gradient color for skeleton
                cv2.line(img, (x1_line, y1_line), (x2_line, y2_line), (0, 255, 255), 5)
                print(f"  Connection: Point {kpt1_idx+1} to Point {kpt2_idx+1}")
        
        # Add title with confidence
        cv2.putText(img, f'BEST DETECTION - Confidence: {detection["obj_conf"]:.3f}', 
                   (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 255, 0), 4)
        
        # Add summary text
        cv2.putText(img, f'Valid keypoints: {len(valid_points)}/4', 
                   (50, 100), cv2.FONT_HERSHEY_SIMPLEX, 1.2, (255, 255, 255), 3)
        
        cv2.imwrite(output_path, img)
        print(f"\n✅ Best detection visualization saved as {output_path}")
        
        return len(valid_points)
    
    def create_clean_visualization(image_path, detection, output_path="best_detection_clean.jpg"):
        """Create a clean visualization with just the keypoints and connections"""
        img = cv2.imread(image_path)
        
        # Draw only keypoints and skeleton (no bbox, minimal text)
        point_colors = [
            (255, 0, 0),    # Blue for point 1
            (0, 255, 255),  # Yellow for point 2  
            (255, 0, 255),  # Magenta for point 3
            (0, 165, 255)   # Orange for point 4
        ]
        
        keypoints = detection['keypoints']
        
        # Draw skeleton first (behind points)
        for connection in SKELETON:
            kpt1_idx, kpt2_idx = connection[0], connection[1]
            if (kpt1_idx < len(keypoints) and kpt2_idx < len(keypoints) and 
                keypoints[kpt1_idx][2] > 0.3 and keypoints[kpt2_idx][2] > 0.3):
                
                x1_line, y1_line = int(keypoints[kpt1_idx][0]), int(keypoints[kpt1_idx][1])
                x2_line, y2_line = int(keypoints[kpt2_idx][0]), int(keypoints[kpt2_idx][1])
                
                cv2.line(img, (x1_line, y1_line), (x2_line, y2_line), (0, 255, 255), 6)
        
        # Draw keypoints on top
        for j, (x, y, conf) in enumerate(keypoints):
            if conf > 0.3:
                x, y = int(x), int(y)
                color = point_colors[j % len(point_colors)]
                
                cv2.circle(img, (x, y), 20, color, -1)
                cv2.circle(img, (x, y), 23, (255, 255, 255), 4)
                
                # Just the point number
                cv2.putText(img, f'{j+1}', (x-12, y+12), 
                           cv2.FONT_HERSHEY_SIMPLEX, 1.2, (255, 255, 255), 4)
        
        cv2.imwrite(output_path, img)
        print(f"Clean visualization saved as {output_path}")
    
    def main():
        print("=== FINDING BEST POSE DETECTION ===")
        
        so = ort.SessionOptions()
        runtime_options = {
            "artifacts_folder": ARTIFACTS_DIR,
        }
        desired_eps = ['TIDLExecutionProvider', 'CPUExecutionProvider']
        # desired_eps = ['CPUExecutionProvider']
        # Load model
        session = ort.InferenceSession(
            MODEL_PATH,
            providers=desired_eps,
            provider_options=[runtime_options, {}],  # One dict per provider
            sess_options=so
        )
        input_name = session.get_inputs()[0].name
        
        # Use corrected preprocessing
        input_tensor, orig_shape = preprocess_correct(IMAGE_PATH)
        
        # Run inference
        outputs = session.run(None, {input_name: input_tensor})
    
        # Get only the best detection
        best_detection = get_best_detection(outputs, orig_shape, min_confidence=0.5)
        
        if best_detection:
            print(f"\n{'='*60}")
            print(f"SUCCESS! Found best detection with {best_detection['obj_conf']:.3f} confidence")
            print(f"{'='*60}")
            
            # Create detailed visualization
            valid_points = visualize_best_detection(IMAGE_PATH, best_detection, "best_detection_detailed.jpg")
            
            # Create clean visualization
            create_clean_visualization(IMAGE_PATH, best_detection, "best_detection_clean.jpg")
            
            print(f"\n📊 SUMMARY:")
            print(f"  • Best confidence: {best_detection['obj_conf']:.3f}")
            print(f"  • Valid keypoints: {valid_points}/4")
            print(f"  • Bounding box area: {(best_detection['bbox'][2]-best_detection['bbox'][0]) * (best_detection['bbox'][3]-best_detection['bbox'][1]):.0f} pixels")
            
            return best_detection
            
        else:
            print("❌ No high-confidence detections found. Try lowering min_confidence.")
            return None
    
    if __name__ == "__main__":
        best = main()

    Output:

    === FINDING BEST POSE DETECTION ===
    libtidl_onnxrt_EP loaded 0x76daa80 
    Final number of subgraphs created are : 1, - Offloaded Nodes - 327, Total Nodes - 327 
    APP: Init ... !!!
     12328.534051 s: MEM: Init ... !!!
     12328.539472 s: MEM: Initialized DMA HEAP (fd=5) !!!
     12328.539643 s: MEM: Init ... Done !!!
     12328.539695 s: IPC: Init ... !!!
     12329.079910 s: IPC: Init ... Done !!!
    REMOTE_SERVICE: Init ... !!!
    REMOTE_SERVICE: Init ... Done !!!
     12329.632986 s: GTC Frequency = 200 MHz
    APP: Init ... Done !!!
     12329.633145 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_ERROR
     12329.633176 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_WARNING
     12329.633200 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_INFO
     12329.633905 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-0 
     12329.634061 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-1 
     12329.634183 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-2 
     12329.634293 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:169] Added target MPU-3 
     12329.634327 s:  VX_ZONE_INFO: [tivxInitLocal:202] Initialization Done !!!
     12329.634356 s:  VX_ZONE_INFO: Globally Disabled VX_ZONE_INFO
    Original image: 1920 x 1080
    Scale factors: x=3.000, y=1.688
    Analyzing 200 predictions for best detection...
      New best detection #1 with confidence 0.902
    
    🏆 BEST DETECTION FOUND:
      Confidence: 0.902
      Raw bbox (640x640): [288.0, 55.1, 410.6, 99.4]
      Scaled bbox (original): [863.9, 93.0, 1231.8, 167.8]
      Raw keypoints (640x640 coords):
        point_1: (414.6, 39.4) conf=0.073
        point_2: (311.4, 229.5) conf=0.700
        point_3: (164.7, 99.2) conf=0.927
        point_4: (420.1, 158.9) conf=0.938
      Scaled keypoints (original image coords):
        point_1: (1243.9, 66.5) conf=0.073
        point_2: (934.2, 387.3) conf=0.700
        point_3: (494.2, 167.3) conf=0.927
        point_4: (1260.2, 268.2) conf=0.938
    
    ============================================================
    SUCCESS! Found best detection with 0.902 confidence
    ============================================================
    
    Visualizing best detection on 1920x1080 image
    Drawing bbox: (863, 93) to (1231, 167)
      Point 2 (point_2): (934, 387) conf=0.700
      Point 3 (point_3): (494, 167) conf=0.927
      Point 4 (point_4): (1260, 268) conf=0.938
      Connection: Point 2 to Point 3
      Connection: Point 3 to Point 4
    
    ✅ Best detection visualization saved as best_detection_detailed.jpg
    Clean visualization saved as best_detection_clean.jpg
    
    📊 SUMMARY:
      • Best confidence: 0.902
      • Valid keypoints: 3/4
      • Bounding box area: 27506 pixels
    APP: Deinit ... !!!
    REMOTE_SERVICE: Deinit ... !!!
    REMOTE_SERVICE: Deinit ... Done !!!
     12330.608466 s: IPC: Deinit ... !!!
     12331.175421 s: IPC: DeInit ... Done !!!
     12331.175493 s: MEM: Deinit ... !!!
     12331.175514 s: DDR_SHARED_MEM: Alloc's: 7 alloc's of 19749280 bytes 
     12331.175527 s: DDR_SHARED_MEM: Free's : 7 free's  of 19749280 bytes 
     12331.175538 s: DDR_SHARED_MEM: Open's : 0 allocs  of 0 bytes 
     12331.175557 s: MEM: Deinit ... Done !!!
    APP: Deinit ... Done !!!
    root@am69-sk:/zzken_pose# 

    Regards,

    venkat

  • Comparing CPU and TIDL outputs, I see a shape mismatch:
    Output 0: CPU (1, 39, 18) vs TIDL (1, 200, 18)
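
    For context, the TIDL row count equals keep_top_k: 200 from the prototxt, so some rows may be padded or low-score entries. A hedged way to align the two outputs before diffing (cpu_outputs/tidl_outputs stand for the raw session.run results; the 0.3 threshold mirrors confidence_threshold):

        import numpy as np

        def valid_rows(dets, thr=0.3):
            # Row layout per the model config: bbox(4) + obj_conf + cls_conf + 4x(x, y, conf)
            dets = np.asarray(dets).reshape(-1, 18)
            return dets[dets[:, 4] > thr]  # column 4 = obj_conf

        cpu_valid = valid_rows(cpu_outputs[0])
        tidl_valid = valid_rows(tidl_outputs[0])
        print("CPU valid:", cpu_valid.shape, "TIDL valid:", tidl_valid.shape)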

  • Hi Venkat,

    I see the issue, but without the model (or an example model) I cannot debug it. One thought, and this is just conjecture without the model: the middle tensor could be of a different data type. For example, if the middle output tensor is interpreted as int8 when it is actually float (or some other type), that could produce a size difference.


    The model would help here. It does not need to be trained, as long as the output shapes are the same. It could also be a subset of an untrained model that exhibits this behavior.

    Regards,

    Chris 

  • Hi Chris,

    I switched to 16-bit and added around four layers to the deny list, and it's working now! The issue seemed to be related to INT8 quantization and a few specific layers; I've sketched the change below.

    Thanks a lot for your help. Going forward, I’ll make sure to include the model in any questions I raise.
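
    For anyone hitting the same problem, this is only a rough sketch of the compile-option change: the option names follow edgeai-tidl-tools (check your release), and the denied node names are placeholders since I cannot share the model.

        compilation_options = {
            # ... platform, artifacts_folder, calibration options as before ...
            "tensor_bits": 16,  # was 8; 16-bit removes most of the quantization error
            # Keep the four problematic layers on the Arm core instead of the C7x.
            "deny_list:layer_name": "/node_a, /node_b, /node_c, /node_d",  # placeholder names
        }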

    Regards,
    Venkat