PROCESSOR-SDK-J721S2: model test

wang.rui48

Hi TI expert,

The attached is the model to be tested.

After converting the model with sdk-j721s2-09_01_00_06, we used the converted bin model file for verification, and the result is completely wrong. modified_model is the modified model file.

5 months ago

0 Christina Kuruvilla 5 months ago

TI__Expert 5210 points

Hi Wang,

Is this related to a previous E2E, Jira, or email request? If so, can you share some information about it?

Please see this FAQ for the guidelines for what information we need. https://e2e.ti.com/support/processors-group/processors---internal/f/processors---internal-forum/1500222/faq-tda4vh-faq-tda4x-am62a-information-required-while-reporting-issue-on-tidl-ti-deep-learning-solution-module-of-processor-sdk-on-tda4x-am6xa

Warm regards,
Christina

0 Adam Hua 5 months ago in reply to Christina Kuruvilla

TI__Expert 4910 points

Hi Christina,

Let me fill in the list for the customer:

Property	Details
Device	J721S2 (example)
SDK Version	9.1
TIDL firmware version	9.1
TIDL Tools Version	9.1
Issue Category	Accuracy issue
Example AI model	In customer's original post
Compilation method	Using TIDL import configuration
Compilation log
Compilation artifacts
Inference method
Inference log
Inference artifacts

The major problem reported is accuracy loss.

I tried this model on edgeai tidl tools 10.1.2. Here is the ONNX result:

/cfs-file/__key/communityserver-discussions-components-files/791/modify_5F00_pc_5F00_onnx.npy

and the result of 8-bit quantization on PC:

/cfs-file/__key/communityserver-discussions-components-files/791/modify_5F00_pc_5F00_8bit.npy

Comparing the two npy files directly, the accuracy loss is small.
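
A direct comparison of two saved output tensors can be sketched as below. This is a minimal illustration rather than the exact check used in this thread: the function name `compare_outputs` and the chosen metrics (maximum and mean absolute difference, maximum difference relative to the reference range, and signal-to-quantization-noise ratio) are assumptions.

```python
import numpy as np


def compare_outputs(ref, test):
    """Return simple error metrics between a reference tensor and a quantized one."""
    ref = np.asarray(ref, dtype=np.float64).ravel()
    test = np.asarray(test, dtype=np.float64).ravel()
    if ref.shape != test.shape:
        raise ValueError(f"size mismatch: {ref.size} vs {test.size}")

    diff = test - ref
    max_abs = float(np.max(np.abs(diff)))
    mean_abs = float(np.mean(np.abs(diff)))

    # Largest error expressed as a fraction of the reference dynamic range
    ref_range = float(ref.max() - ref.min())
    rel_max = max_abs / ref_range if ref_range > 0 else float("nan")

    # Signal-to-quantization-noise ratio in dB (infinite when the tensors match)
    noise_power = float(np.mean(diff ** 2))
    signal_power = float(np.mean(ref ** 2))
    if noise_power == 0:
        sqnr_db = float("inf")
    else:
        sqnr_db = float(10.0 * np.log10(signal_power / noise_power))

    return {
        "max_abs_diff": max_abs,
        "mean_abs_diff": mean_abs,
        "rel_max_diff": rel_max,
        "sqnr_db": sqnr_db,
    }
```

With the two attachments downloaded, a call such as `compare_outputs(np.load("modify_pc_onnx.npy"), np.load("modify_pc_8bit.npy"))` would report the metrics; the local file names are assumed from the attachment links. A small numerical difference does not guarantee a good image after postprocessing, so the visual check is still needed.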

This is pending the customer validating the result with their postprocessing.

Regards

Adam

0 wang.rui48 5 months ago in reply to Adam Hua

Prodigy 220 points

As shown in the figure below, the left picture is the result of the tool test, and the right picture is the output result of the model on the PC.

We have tried many quantization methods, but the results are not correct. Please provide an import configuration file that can correctly quantize the model.

0 Adam Hua 5 months ago in reply to wang.rui48

TI__Expert 4910 points

As we discussed locally, you found that my PC result with edgeai tidl tools 10.1.2 is correct.

Adam Hua said:
and the result of 8-bit quantization on PC:

But your result using rtos tools 9.1 is poor.

We will try edgeai tidl tools 9.1 to see if this also works and you can try that too. We will update tomorrow.

Regards,

Adam

0 Christina Kuruvilla 5 months ago in reply to Adam Hua

TI__Expert 5210 points

Hi Wang and Adam,

Thank you for all the details and files. This is most likely an issue with 9.1, as it is an older version of TIDL tools with many bugs that the 10.1 version fixes. For accuracy, especially with quantization, I recommend using 10.1 if possible. Is there any reason why you want to stay with 9.1 instead of moving to 10.1?

Warm regards,

Christina

0 Adam Hua 5 months ago in reply to Christina Kuruvilla

TI__Expert 4910 points

Hi Christina,

I have discussed this topic with the customer. They declined; you can check with Fredy Zhang (FAE responsible for BYD) for more details.

Regards,

Adam

0 wang.rui48 5 months ago in reply to Christina Kuruvilla

Prodigy 220 points

Hi, we informed your engineers very early on that the engineering board on our side runs version 9.1, so the 10.1 tools cannot be used.

0 Adam Hua 5 months ago in reply to wang.rui48

TI__Expert 4910 points

I have tested on edgeai tidl tools 09 01 08, and the import process gets stuck with the following log:

(tidl_09_02) ht@ht-OMEN:~/edgeai/edgeai-tidl-tools/examples/osrt_python/ort$ python3 onnxrt_ep.py -c -m hsh_modify
Available execution providers :  ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']

Running 1 Models - ['hsh_modify']


Running_Model :  hsh_modify  


Running shape inference on model ../../../models/public/hsh_model_modified.onnx 

tidl_tools_path                                 = /home/ht/edgeai/edgeai-tidl-tools/tidl_tools 
artifacts_folder                                = ../../../model-artifacts//hsh_modify/ 
tidl_tensor_bits                                = 8 
debug_level                                     = 5 
num_tidl_subgraphs                              = 16 
tidl_denylist                                   = 
tidl_denylist_layer_name                        = 
tidl_denylist_layer_type                         = 
tidl_allowlist_layer_name                        = 
model_type                                      =  
tidl_calibration_accuracy_level                 = 7 
tidl_calibration_options:num_frames_calibration = 2 
tidl_calibration_options:bias_calibration_iterations = 5 
mixed_precision_factor = -1.000000 
model_group_id = 0 
power_of_2_quantization                         = 2 
ONNX QDQ Enabled                                = 0 
enable_high_resolution_optimization             = 0 
pre_batchnorm_fold                              = 1 
add_data_convert_ops                          = 3 
output_feature_16bit_names_list                 =  
m_params_16bit_names_list                       =  
reserved_compile_constraints_flag               = 1601 
ti_internal_reserved_1                          = 


 ****** WARNING : Network not identified as Object Detection network : (1) Ignore if network is not Object Detection network (2) If network is Object Detection network, please specify "model_type":"OD" as part of OSRT compilation options******

Supported TIDL layer type ---         Reshape -- Reshape_11 
Supported TIDL layer type ---       Transpose -- Transpose_12 
Supported TIDL layer type ---         Reshape -- Reshape_21 
Supported TIDL layer type ---           Slice -- Slice_41 
Supported TIDL layer type ---           Slice -- Slice_36 
Supported TIDL layer type ---           Slice -- Slice_31 
Supported TIDL layer type ---           Slice -- Slice_26 
Supported TIDL layer type ---          Concat -- Concat_42 
Supported TIDL layer type ---       Transpose -- Transpose_43 
Supported TIDL layer type ---            Conv -- Conv_44 
Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_45 
Supported TIDL layer type ---            Conv -- Conv_46 
Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_47 
Supported TIDL layer type ---         MaxPool -- MaxPool_48 
Supported TIDL layer type ---            Conv -- Conv_49 
Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_50 
Supported TIDL layer type ---            Conv -- Conv_51 
Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_52 
Supported TIDL layer type ---         MaxPool -- MaxPool_53 
Supported TIDL layer type ---            Conv -- Conv_54 
Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_55 
Supported TIDL layer type ---            Conv -- Conv_56 
Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_57 
Supported TIDL layer type ---         MaxPool -- MaxPool_58 
Supported TIDL layer type ---            Conv -- Conv_59 
Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_60 
Supported TIDL layer type ---            Conv -- Conv_61 
Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_62 
Supported TIDL layer type ---         MaxPool -- MaxPool_63 
Supported TIDL layer type ---            Conv -- Conv_64 
Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_65 
Supported TIDL layer type ---            Conv -- Conv_66 
Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_67 
Supported TIDL layer type ---   ConvTranspose -- ConvTranspose_68 
Supported TIDL layer type ---          Concat -- Concat_69 
Supported TIDL layer type ---            Conv -- Conv_70 
Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_71 
Supported TIDL layer type ---            Conv -- Conv_72 
Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_73 
Supported TIDL layer type ---   ConvTranspose -- ConvTranspose_74 
Supported TIDL layer type ---          Concat -- Concat_75 
Supported TIDL layer type ---            Conv -- Conv_76 
Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_77 
Supported TIDL layer type ---            Conv -- Conv_78 
Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_79 
Supported TIDL layer type ---   ConvTranspose -- ConvTranspose_80 
Supported TIDL layer type ---          Concat -- Concat_81 
Supported TIDL layer type ---            Conv -- Conv_82 
Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_83 
Supported TIDL layer type ---            Conv -- Conv_84 
Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_85 
Supported TIDL layer type ---   ConvTranspose -- ConvTranspose_86 
Supported TIDL layer type ---          Concat -- Concat_87 
Supported TIDL layer type ---            Conv -- Conv_88 
Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_89 
Supported TIDL layer type ---            Conv -- Conv_90 
Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_91 
Supported TIDL layer type ---            Conv -- Conv_92 
Supported TIDL layer type ---           Slice -- Slice_112 
Supported TIDL layer type ---           Slice -- Slice_107 
Supported TIDL layer type ---           Slice -- Slice_102 
Supported TIDL layer type ---           Slice -- Slice_97 
Supported TIDL layer type ---          Concat -- Concat_113 
Supported TIDL layer type ---       Transpose -- Transpose_117 
Supported TIDL layer type ---         Reshape -- Reshape_126 
Supported TIDL layer type ---       Transpose -- Transpose_127 
Supported TIDL layer type ---         Reshape -- Reshape_143 

Preliminary subgraphs created = 1 
Final number of subgraphs created are : 1, - Offloaded Nodes - 67, Total Nodes - 67 
SUGGESTION -- [TIDL_Deconv2DLayer]  Please change to Upsample/Resize if possible. Upsample/Resize will be more efficient.  
SUGGESTION -- [TIDL_Deconv2DLayer]  Please change to Upsample/Resize if possible. Upsample/Resize will be more efficient.  
SUGGESTION -- [TIDL_Deconv2DLayer]  Please change to Upsample/Resize if possible. Upsample/Resize will be more efficient.  
SUGGESTION -- [TIDL_Deconv2DLayer]  Please change to Upsample/Resize if possible. Upsample/Resize will be more efficient.  
Running runtimes graphviz - /home/ht/edgeai/edgeai-tidl-tools/tidl_tools/tidl_graphVisualiser_runtimes.out ../../../model-artifacts//hsh_modify//allowedNode.txt ../../../model-artifacts//hsh_modify//tempDir/graphvizInfo.txt ../../../model-artifacts//hsh_modify//tempDir/runtimes_visualization.svg 
*** In TIDL_createStateImportFunc *** 
Compute on node : TIDLExecutionProvider_TIDL_0_0
  0,         Reshape, 2, 1, noise_img, 58
  1,       Transpose, 1, 1, 58, 59
  2,         Reshape, 2, 1, 59, 68
  3,           Slice, 5, 1, 68, 73
  4,           Slice, 5, 1, 68, 78
  5,           Slice, 5, 1, 68, 83
  6,           Slice, 5, 1, 68, 88
  7,          Concat, 4, 1, 73, 89
  8,       Transpose, 1, 1, 89, 90
  9,            Conv, 3, 1, 90, 91
 10,       LeakyRelu, 1, 1, 91, 92
 11,            Conv, 3, 1, 92, 93
 12,       LeakyRelu, 1, 1, 93, 94
 13,         MaxPool, 1, 1, 94, 95
 14,            Conv, 3, 1, 95, 96
 15,       LeakyRelu, 1, 1, 96, 97
 16,            Conv, 3, 1, 97, 98
 17,       LeakyRelu, 1, 1, 98, 99
 18,         MaxPool, 1, 1, 99, 100
 19,            Conv, 3, 1, 100, 101
 20,       LeakyRelu, 1, 1, 101, 102
 21,            Conv, 3, 1, 102, 103
 22,       LeakyRelu, 1, 1, 103, 104
 23,         MaxPool, 1, 1, 104, 105
 24,            Conv, 3, 1, 105, 106
 25,       LeakyRelu, 1, 1, 106, 107
 26,            Conv, 3, 1, 107, 108
 27,       LeakyRelu, 1, 1, 108, 109
 28,         MaxPool, 1, 1, 109, 110
 29,            Conv, 3, 1, 110, 111
 30,       LeakyRelu, 1, 1, 111, 112
 31,            Conv, 3, 1, 112, 113
 32,       LeakyRelu, 1, 1, 113, 114
 33,   ConvTranspose, 3, 1, 114, 115
 34,          Concat, 2, 1, 115, 116
 35,            Conv, 3, 1, 116, 117
 36,       LeakyRelu, 1, 1, 117, 118
 37,            Conv, 3, 1, 118, 119
 38,       LeakyRelu, 1, 1, 119, 120
 39,   ConvTranspose, 3, 1, 120, 121
 40,          Concat, 2, 1, 121, 122
 41,            Conv, 3, 1, 122, 123
 42,       LeakyRelu, 1, 1, 123, 124
 43,            Conv, 3, 1, 124, 125
 44,       LeakyRelu, 1, 1, 125, 126
 45,   ConvTranspose, 3, 1, 126, 127
 46,          Concat, 2, 1, 127, 128
 47,            Conv, 3, 1, 128, 129
 48,       LeakyRelu, 1, 1, 129, 130
 49,            Conv, 3, 1, 130, 131
 50,       LeakyRelu, 1, 1, 131, 132
 51,   ConvTranspose, 3, 1, 132, 133
 52,          Concat, 2, 1, 133, 134
 53,            Conv, 3, 1, 134, 135
 54,       LeakyRelu, 1, 1, 135, 136
 55,            Conv, 3, 1, 136, 137
 56,       LeakyRelu, 1, 1, 137, 138
 57,            Conv, 3, 1, 138, 139
 58,           Slice, 5, 1, 139, 144
 59,           Slice, 5, 1, 139, 149
 60,           Slice, 5, 1, 139, 154
 61,           Slice, 5, 1, 139, 159
 62,          Concat, 4, 1, 144, 160
 63,       Transpose, 1, 1, 160, 164
 64,         Reshape, 2, 1, 164, 173
 65,       Transpose, 1, 1, 173, 174
 66,         Reshape, 2, 1, 174, sharp_output

Input tensor name -  noise_img 
Output tensor name - sharp_output
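
The log above reports that all 67 nodes were offloaded into a single subgraph before the import stalled. A small helper for pulling that summary out of a saved log can be sketched as follows. The function name `parse_offload_summary` is hypothetical, and the regular expression assumes the exact wording of the summary line printed above, which may differ between tool versions.

```python
import re

# Matches the summary line as it appears in the log above
SUMMARY_RE = re.compile(
    r"Final number of subgraphs created are\s*:\s*(\d+),"
    r"\s*-\s*Offloaded Nodes\s*-\s*(\d+),"
    r"\s*Total Nodes\s*-\s*(\d+)"
)


def parse_offload_summary(log_text):
    """Extract subgraph and node counts from a compilation log, or None if absent."""
    match = SUMMARY_RE.search(log_text)
    if match is None:
        return None
    subgraphs, offloaded, total = (int(g) for g in match.groups())
    # Count the per-layer "Supported TIDL layer type" lines as a cross-check
    supported = len(
        re.findall(r"^\s*Supported TIDL layer type", log_text, flags=re.MULTILINE)
    )
    return {
        "subgraphs": subgraphs,
        "offloaded_nodes": offloaded,
        "total_nodes": total,
        "fully_offloaded": offloaded == total,
        "supported_layer_lines": supported,
    }
```

A result of `None` means the summary line was never printed, which would point to a failure earlier than the one shown here.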

It seems that there are some issues with TIDL version 9.1. We therefore propose a TIDL backport, which uses your SDK 9.1 with TIDL 10.1.

Regards,

Adam

0 wang.rui48 5 months ago in reply to Adam Hua

Prodigy 220 points

Could you elaborate on what the problem is?

0 Adam Hua 5 months ago in reply to wang.rui48

TI__Expert 4910 points

As we discussed locally, the status of the issue is:

1. This model does not work correctly on SDK 9.1 but works on 10.1.

2. The customer will evaluate this model on the EVM with 10.1.

3. If the evaluation on the EVM with 10.1 works well, the customer will backport TIDL 10.1 to 9.1.

Regards,

Adam

0 Adam Hua 5 months ago in reply to Adam Hua

TI__Expert 4910 points

onnxrt_ep_no_post.py

import onnxruntime as rt
import time
import os
import sys
import numpy as np
import PIL
from PIL import Image, ImageFont, ImageDraw, ImageEnhance
import argparse
import re
import multiprocessing
import platform
import shutil

current = os.path.dirname(os.path.realpath(__file__))
parent = os.path.dirname(current)

sys.path.append(parent)
from common_utils import *
from model_configs import *

from common import postprocess_utils as formatter_transform

mutex_lock = multiprocessing.Lock()

model_optimizer_found = False
if platform.machine() != "aarch64":
    try:
        from osrt_model_tools.onnx_tools.tidl_onnx_model_optimizer import optimize

        model_optimizer_found = True
    except ModuleNotFoundError as e:
        print("Skipping import of model optimizer")

required_options = {
    "tidl_tools_path": tidl_tools_path,
    "artifacts_folder": artifacts_folder,
}

parser = argparse.ArgumentParser()
parser.add_argument(
    "-c", "--compile", action="store_true", help="Run in Model compilation mode"
)
parser.add_argument(
    "-d", "--disable_offload", action="store_true", help="Disable offload to TIDL"
)
parser.add_argument(
    "-z", "--run_model_zoo", action="store_true", help="Run model zoo models"
)
parser.add_argument(
    "-o",
    "--graph_optimize",
    action="store_true",
    help="Run ONNX model optimization through onnx-graph-surgeon-tidl",
)
parser.add_argument(
    "-m",
    "--models",
    action="append",
    default=[],
    help="Model name to be added to the list to run",
)
parser.add_argument(
    "-n", "--ncpus", type=int, default=None, help="Number of threads to spawn"
)
args = parser.parse_args()
os.environ["TIDL_RT_PERFSTATS"] = "1"

# Ort Session Options
so = rt.SessionOptions()
so.log_severity_level = 3

print("Available execution providers : ", rt.get_available_providers())

calib_images = [
    "../../../test_data/airshow.jpg",
    "../../../test_data/ADE_val_00001801.jpg",
]
class_test_images = ["../../../test_data/airshow.jpg"]
od_test_images = ["../../../test_data/ADE_val_00001801.jpg"]
seg_test_images = ["../../../test_data/ADE_val_00001801.jpg"]

# Initialize semaphore for multi-threading
sem = multiprocessing.Semaphore(0)
if platform.machine() == "aarch64":
    ncpus = 1
else:
    if args.ncpus and args.ncpus > 0 and args.ncpus < os.cpu_count():
        ncpus = args.ncpus
    else:
        ncpus = os.cpu_count()

idx = 0
nthreads = 0
run_count = 0

if "SOC" in os.environ:
    SOC = os.environ["SOC"]
else:
    print("Please export SOC var to proceed")
    exit(-1)

# Enforce compilation on x86 only
if platform.machine() == "aarch64" and args.compile == True:
    print(
        "Compilation of models is only supported on x86 machine \n\
        Please do the compilation on PC and copy artifacts for running on TIDL devices "
    )
    exit(-1)

# Disable compilation and offload for AM62 (ARM only analytics)
if SOC == "am62":
    args.disable_offload = True
    args.compile = False

def get_benchmark_output(interpreter):
    '''
    Returns benchmark data

    :param interpreter: Runtime session
    :return: Copy time
    :return: Processing time
    :return: Total time
    '''
    benchmark_dict = interpreter.get_TI_benchmark_data()
    proc_time = copy_time = 0
    cp_in_time = cp_out_time = 0
    subgraphIds = []
    for stat in benchmark_dict.keys():
        if "proc_start" in stat:
            value = stat.split("ts:subgraph_")
            value = value[1].split("_proc_start")
            subgraphIds.append(value[0])
    for i in range(len(subgraphIds)):
        proc_time += (
            benchmark_dict["ts:subgraph_" + str(subgraphIds[i]) + "_proc_end"]
            - benchmark_dict["ts:subgraph_" + str(subgraphIds[i]) + "_proc_start"]
        )
        cp_in_time += (
            benchmark_dict["ts:subgraph_" + str(subgraphIds[i]) + "_copy_in_end"]
            - benchmark_dict["ts:subgraph_" + str(subgraphIds[i]) + "_copy_in_start"]
        )
        cp_out_time += (
            benchmark_dict["ts:subgraph_" + str(subgraphIds[i]) + "_copy_out_end"]
            - benchmark_dict["ts:subgraph_" + str(subgraphIds[i]) + "_copy_out_start"]
        )
        copy_time += cp_in_time + cp_out_time
    copy_time = copy_time if len(subgraphIds) == 1 else 0
    totaltime = benchmark_dict["ts:run_end"] - benchmark_dict["ts:run_start"]
    return copy_time, proc_time, totaltime


def infer_image(sess, image_files, config):
    '''
    Invoke the runtime session

    :param sess: Runtime session
    :param image_files: List of input image filename
    :param config: Configuration dictionary
    :return: Input Images
    :return: Output tensors
    :return: Total Processing time
    :return: Subgraphs Processing time
    :return: Height of input tensor
    :return: Width of input tensor
    '''

    # Get input details from the session
    input_details = sess.get_inputs()
    input_name = input_details[0].name
    floating_model = input_details[0].type == "tensor(float)"
    height = input_details[0].shape[2]
    width = input_details[0].shape[3]
    channel = input_details[0].shape[1]
    batch = input_details[0].shape[0]
    imgs = []
    shape = [batch, channel, height, width]
    input_shape = input_details[0].shape
    input_data = np.random.random(input_shape).astype(np.float32) * (1 - 0) 
    print(len(input_details))
    if len(input_details)>1:
        print(len(input_details))
        input_name2 = input_details[1].name
        if input_details[1].type == "tensor(float)":
            input_data2 = np.random.random(input_details[1].shape).astype(np.float32) * (1 - 0) 
        else:
            input_data2 = np.random.randint(0, 2000, size=input_details[1].shape).astype(np.int32)
    # Prepare the input data
    # input_data = np.zeros(shape)
    # for i in range(batch):
    #     imgs.append(
    #         Image.open(image_files[i])
    #         .convert("RGB")
    #         .resize((width, height), PIL.Image.LANCZOS)
    #     )
    #     temp_input_data = np.expand_dims(imgs[i], axis=0)
    #     temp_input_data = np.transpose(temp_input_data, (0, 3, 1, 2))
    #     input_data[i] = temp_input_data[0]
    # if floating_model:
    #     input_data = np.float32(input_data)
    #     for mean, scale, ch in zip(
    #         config["session"]["input_mean"],
    #         config["session"]["input_scale"],
    #         range(input_data.shape[1]),
    #     ):
    #         input_data[:, ch, :, :] = (input_data[:, ch, :, :] - mean) * scale
    # else:
    #     input_data = np.uint8(input_data)
    #     config["session"]["input_mean"] = [0, 0, 0]
    #     config["session"]["input_scale"] = [1, 1, 1]

    data = np.fromfile("/home/ht/customer/BYD/1425.model/0.bin", dtype=np.uint16).astype(np.float32)
    raw_vis = data.reshape(1056, 1920) / 4096
    print(raw_vis)
    input_data[0,0,:,:] = raw_vis
    # Invoke the session
    start_time = time.time()
    
    if len(input_details)>1:
        print(len(input_details))
        output = list(sess.run(None, {input_name: input_data, input_name2: input_data2}))
    else:
        output = list(sess.run(None, {input_name: input_data}))
    stop_time = time.time()
    infer_time = stop_time - start_time

    copy_time, sub_graphs_proc_time, totaltime = get_benchmark_output(sess)
    proc_time = totaltime - copy_time

    return imgs, output, proc_time, sub_graphs_proc_time, height, width


def run_model(model, mIdx):
    '''
    Run a single model

    :param model: Name of the model
    :param mIdx: Run number
    '''
    print("\nRunning_Model : ", model, " \n")
    if platform.machine() != "aarch64":
        mutex_lock.acquire()
        download_model(models_configs, model)
        mutex_lock.release()

    config = models_configs[model]

    # Run graph optimization
    if args.graph_optimize:
        if model_optimizer_found:
            if (args.compile or args.disable_offload) and (
                platform.machine() != "aarch64"
            ):
                copy_path = config["model_path"][:-5] + "_org.onnx"
                # Check if copy path exists and prompt for permission to overwrite
                if os.path.isfile(copy_path):
                    overwrite_permission = input(
                        f"\033[96mThe file {copy_path} exists, do you want to overwrite? [Y/n] \033[00m"
                    )
                    if overwrite_permission != "Y":
                        print("Aborting run...")
                        sys.exit(-1)
                    else:
                        print(
                            f"\033[93m[WARNING] File {copy_path} will be overwritten\033[00m"
                        )

                shutil.copy2(config["model_path"], copy_path)
                print(
                    f"\033[93mOptimization Enabled: Moving {config['model_path']} to {copy_path} before overwriting by optimization\033[00m"
                )
                optimize(
                    model=config["model_path"], out_model=config["model_path"]
                )
            else:
                print(
                    "Model optimization is only supported in compilation or disabled offload mode on x86 machines"
                )
        else:
            print("Model optimizer not found, -o flag has no effect")

    # Set input images
    config = models_configs[model]
    if config["task_type"] == "classification":
        test_images = class_test_images
    elif config["task_type"] == "detection":
        test_images = od_test_images
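    elif config["task_type"] == "segmentation":
        test_images = seg_test_images
    else:
        # Fallback: other task types would otherwise leave test_images undefined
        test_images = class_test_images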
    
    # Set delegate options 
    delegate_options = {}
    delegate_options.update(required_options)
    delegate_options.update(optional_options)
    if "optional_options" in config:
        delegate_options.update(config["optional_options"])

    delegate_options["artifacts_folder"] = (
        delegate_options["artifacts_folder"] + "/" + model + "/artifacts"
    )

    # Disabling onnxruntime optimizations for vision transformers
    if model == "cl-ort-deit-tiny":
        so.graph_optimization_level = rt.GraphOptimizationLevel.ORT_DISABLE_ALL

    if config["task_type"] == "detection":
        delegate_options["object_detection:meta_layers_names_list"] = config["session"].get("meta_layers_names_list", "")
        delegate_options["object_detection:meta_arch_type"] = config["session"].get("meta_arch_type", -1)

    # Create/Cleanup artifacts_folder
    if args.compile or args.disable_offload:
        os.makedirs(delegate_options["artifacts_folder"], exist_ok=True)
        for root, dirs, files in os.walk(
            delegate_options["artifacts_folder"], topdown=False
        ):
            [os.remove(os.path.join(root, f)) for f in files]
            [os.rmdir(os.path.join(root, d)) for d in dirs]

    if args.compile == True:
        input_image = calib_images
        import onnx

        log = f'\nRunning shape inference on model {config["session"]["model_path"]} \n'
        print(log)

        # Run shape inference on the model
        onnx.shape_inference.infer_shapes_path(
            config["session"]["model_path"], config["session"]["model_path"]
        )
    else:
        input_image = test_images

    numFrames = config["extra_info"]["num_images"]
    if args.compile:
        if numFrames > delegate_options["advanced_options:calibration_frames"]:
            numFrames = delegate_options["advanced_options:calibration_frames"]

    # Create the Inference Session
    if args.disable_offload:
        # Using default EP if offload is disabled
        EP_list = ["CPUExecutionProvider"]
        sess = rt.InferenceSession(
            config["session"]["model_path"], providers=EP_list, sess_options=so
        )
    elif args.compile:
        # Using TIDL Compilation Provider if compiling the model
        EP_list = ["TIDLCompilationProvider", "CPUExecutionProvider"]
        sess = rt.InferenceSession(
            config["session"]["model_path"],
            providers=EP_list,
            provider_options=[delegate_options, {}],
            sess_options=so,
        )
    else:
        # Using TIDL Execution Provider if running the inference
        EP_list = ["TIDLExecutionProvider", "CPUExecutionProvider"]
        sess = rt.InferenceSession(
            config["session"]["model_path"],
            providers=EP_list,
            provider_options=[delegate_options, {}],
            sess_options=so,
        )

    # Adding input_details and output_details to configuration
    input_details = sess.get_inputs()
    input_name = input_details[0].name
    type = input_details[0].type
    height = input_details[0].shape[2]
    width = input_details[0].shape[3]
    channel = input_details[0].shape[1]
    batch = input_details[0].shape[0]
    shape = [batch, channel, height, width]
    input_details = {"name": input_name, "shape": shape, "type": type}

    output_details = sess.get_outputs()
    output_name = output_details[0].name
    type = output_details[0].type
    num_class = output_details[0].shape[1]
    batch = output_details[0].shape[0]
    shape = [batch, num_class]
    output_details = {"name": output_name, "shape": shape, "type": type}

    config["session"]["input_details"] = [input_details]
    config["session"]["output_details"] = [output_details]

    # Set the formatter for post-processing
    if "formatter" in config["postprocess"]:
        formatter = config["postprocess"]["formatter"]
        if isinstance(formatter, str):
            formatter_name = formatter
            formatter = getattr(formatter_transform, formatter_name)()
        elif isinstance(formatter, dict) and "type" in formatter:
            formatter_name = formatter.pop("type")
            formatter = getattr(formatter_transform, formatter_name)(**formatter)
        config["postprocess"]["formatter"] = formatter

    for i in range(numFrames):
        start_index = i % len(input_image)
        input_details = sess.get_inputs()
        batch = input_details[0].shape[0]

        input_images = []
        # For batch processing different images are needed for a single input
        for j in range(batch):
            input_images.append(input_image[(start_index + j) % len(input_image)])

        # Invoke the session
        imgs, output, proc_time, sub_graph_time, height, width = infer_image(sess, input_images, config)

        total_proc_time = (
            total_proc_time + proc_time
            if ("total_proc_time" in locals())
            else proc_time
        )
        sub_graphs_time = (
            sub_graphs_time + sub_graph_time
            if ("sub_graphs_time" in locals())
            else sub_graph_time
        )

    total_proc_time = total_proc_time / 1000000
    sub_graphs_time = sub_graphs_time / 1000000

    # Post-Processing for inference
    output_image_file_name = "py_out_" + model + "_" + os.path.basename(input_image[i % len(input_image)])
    output_bin_file_name = output_image_file_name.replace(".jpg", "") + ".bin"
    
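    # Save every output tensor as .npy. A separate loop variable keeps the
    # frame index `i`, used in the timing log below, from being overwritten.
    os.makedirs(output_binary_folder, exist_ok=True)
    for out_idx, out_tensor in enumerate(output):
        np.save(os.path.join(output_binary_folder, model + str(out_idx) + ".npy"), out_tensor)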
    # if args.compile == False:
    #     images = []
    #     output_tensors = []
    #     if config["task_type"] == "classification":
    #         for j in range(batch):
    #             classes, image = get_class_labels(output[0][j], imgs[j])
    #             print("\n", classes)
    #             images.append(image)
    #             output_tensors.append(
    #                 np.array(output[0][j], dtype=np.float32).flatten()
    #             )
    #     elif config["task_type"] == "detection":
    #         for j in range(batch):
    #             classes, image = det_box_overlay(
    #                 output,
    #                 imgs[j],
    #                 config["extra_info"]["od_type"],
    #                 config["extra_info"]["framework"],
    #             )
    #             images.append(image)
    #             output_np = np.array([], dtype=np.float32)
    #             for tensor in output:
    #                 output_np = np.concatenate(
    #                     (output_np, np.array(tensor, dtype=np.float32).flatten())
    #                 )
    #             output_tensors.append(output_np)
    #     elif config["task_type"] == "segmentation":
    #         for j in range(batch):
    #             imgs[j] = imgs[j].resize(
    #                 (output[0][j].shape[-1], output[0][j].shape[-2]), PIL.Image.LANCZOS
    #             )
    #             classes, image = seg_mask_overlay(output[0][j], imgs[j])
    #             images.append(image)
    #             output_tensors.append(
    #                 np.array(output[0][j], dtype=np.float32).flatten()
    #             )
    #     else:
    #         print("\nInvalid task type ", config["task_type"])

    #     # Save the output images and output tensors
    #     for j in range(batch):
    #         output_image_file_name = "py_out_" + model + "_" + os.path.basename(input_images[j])
    #         print("\nSaving image to ", output_images_folder)
    #         if not os.path.exists(output_images_folder):
    #             os.makedirs(output_images_folder)
    #         images[j].save(output_images_folder + output_image_file_name, "JPEG")
    #         print("\nSaving output tensor to ", output_binary_folder)
    #         if not os.path.exists(output_binary_folder):
    #             os.makedirs(output_binary_folder)
    #         output_bin_file_name = output_image_file_name.replace(".jpg", "") + ".bin"
    #         output_tensors[j].tofile(output_binary_folder + output_bin_file_name)

    # Generate param.yaml after model compilation
    # if args.compile or args.disable_offload:
    #     gen_param_yaml(
    #         delegate_options["artifacts_folder"], config, int(height), int(width)
    #     )

    log = f"\n \nCompleted_Model : {mIdx+1:5d}, Name : {model:50s}, Total time : {total_proc_time/(i+1):10.2f}, Offload Time : {sub_graphs_time/(i+1):10.2f} , DDR RW MBs : 0, Output Image File : {output_image_file_name}, Output Bin File : {output_bin_file_name}\n \n "  # {classes} \n \n'
    print(log)
    if ncpus > 1:
        sem.release()


if len(args.models) > 0:
    models = args.models
else:
    models = ["cl-ort-resnet18-v1", "od-ort-ssd-lite_mobilenetv2_fpn"]
    if SOC == "am69a":
        # Model to demonstrate multi core parallel batch processing
        models.append("cl-ort-resnet18-v1_4batch")
        # Model to demonstrate multi core low latency inference
        models.append("cl-ort-resnet18-v1_low_latency")
    if SOC not in ("am62a", "am67a"):
        models.append("ss-ort-deeplabv3lite_mobilenetv2")
if args.run_model_zoo:
    models = [
        "od-8020_onnxrt_coco_edgeai-mmdet_ssd_mobilenetv2_lite_512x512_20201214_model_onnx",
        "od-8200_onnxrt_coco_edgeai-mmdet_yolox_nano_lite_416x416_20220214_model_onnx",
        "ss-8610_onnxrt_ade20k32_edgeai-tv_deeplabv3plus_mobilenetv2_edgeailite_512x512_20210308_outby4_onnx",
        "od-8220_onnxrt_coco_edgeai-mmdet_yolox_s_lite_640x640_20220221_model_onnx",
        "cl-6360_onnxrt_imagenet1k_fbr-pycls_regnetx-200mf_onnx",
    ]
log = f"\nRunning {len(models)} Models - {models}\n"
print(log)


def join_one(nthreads):
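    '''
    Wait for one spawned process to finish (signalled through the semaphore)

    :param nthreads: Number of processes currently running
    :return: Updated count of running processes
    '''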
    global run_count
    sem.acquire()
    run_count = run_count + 1
    return nthreads - 1

def spawn_one(models, idx, nthreads):
    '''
    Spawn a process

    :param models: Name of the model to run
    :param idx: Index
    :param nthreads: Thread count
    '''
    p = multiprocessing.Process(
        target=run_model,
        args=(
            models,
            idx,
        ),
    )
    p.start()
    return idx + 1, nthreads + 1


# Run the models using multi-processing if possible
if ncpus > 1:
    for t in range(min(len(models), ncpus)):
        idx, nthreads = spawn_one(models[idx], idx, nthreads)

    while idx < len(models):
        nthreads = join_one(nthreads)
        idx, nthreads = spawn_one(models[idx], idx, nthreads)

    for n in range(nthreads):
        nthreads = join_one(nthreads)
else:
    for mIdx, model in enumerate(models):
        run_model(model, mIdx)

Please check the script I used. I modified the input handling and the postprocessing so that it reads in your input files.

Here are the changes I added to the model configs:

    "hsh_modify": create_model_config(
        source=AttrDict(
            model_url="dummy",
            infer_shape=True,
        ),
        preprocess=AttrDict(
            resize=256,
            crop=224,
            data_layout="NCHW",
            resize_with_pad=False,
            reverse_channels=False,
        ),
        session=AttrDict(
            session_name="onnxrt",
            model_path=os.path.join(models_base_path, "hsh_model_modified.onnx"),
            input_mean=[123.675, 116.28, 103.53],
            input_scale=[0.017125, 0.017507, 0.017429],
            input_optimization=True,
        ),
        task_type="classification",
        extra_info=AttrDict(num_images=numImages, num_classes=1000),
    ),
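
For reference, the input_mean and input_scale values in this config match the usual ImageNet normalization expressed on a 0..255 pixel range (each scale is roughly 1 / (255 * std)). A minimal sketch of that arithmetic, assuming the (pixel - mean) * scale convention and an HWC RGB input; the helper names normalize_hwc and to_nchw are illustrative and are not taken from the script:

```python
import numpy as np

# Values copied from the "hsh_modify" config.
INPUT_MEAN = [123.675, 116.28, 103.53]
INPUT_SCALE = [0.017125, 0.017507, 0.017429]


def normalize_hwc(img_hwc, mean=INPUT_MEAN, scale=INPUT_SCALE):
    """Apply (pixel - mean) * scale per channel to an HWC image in 0..255."""
    img = np.asarray(img_hwc, dtype=np.float32)
    mean = np.asarray(mean, dtype=np.float32)
    scale = np.asarray(scale, dtype=np.float32)
    # Broadcasting applies the per-channel values along the last axis.
    return (img - mean) * scale


def to_nchw(img_hwc):
    """Reorder one HWC image into a single-image NCHW batch."""
    return np.expand_dims(np.transpose(img_hwc, (2, 0, 1)), axis=0)
```

If the input fed to the model on the PC and on the board is not normalized the same way, the outputs will differ regardless of quantization, so this is worth checking first.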

Regards,

Adam

0 Adam Hua 5 months ago in reply to Christina Kuruvilla

TI__Expert 4910 points

Hi Christina,

Current status:

The customer is trying to reproduce my result with edgeai tidl tools 10.01.00.02.

If they evaluate their model successfully on SDK 10.1, they will try TIDL backporting.
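
One way to make the comparison repeatable on the customer side is to compute a few summary numbers between the float ONNX output and the 8-bit output. A minimal sketch, assuming both outputs are saved as .npy arrays with the same number of elements; compare_outputs and the file names in the comment are hypothetical:

```python
import numpy as np


def compare_outputs(ref, test):
    """Return simple error metrics between a reference and a test tensor."""
    ref = np.asarray(ref, dtype=np.float64).ravel()
    test = np.asarray(test, dtype=np.float64).ravel()
    if ref.shape != test.shape:
        raise ValueError(f"size mismatch: {ref.size} vs {test.size}")
    diff = np.abs(ref - test)
    denom = np.linalg.norm(ref) * np.linalg.norm(test)
    # Cosine similarity is undefined if either tensor is all zeros.
    cosine = float(np.dot(ref, test) / denom) if denom > 0 else float("nan")
    return {
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
        "cosine_similarity": cosine,
    }


# Example usage (hypothetical file names):
# print(compare_outputs(np.load("float_output.npy"), np.load("8bit_output.npy")))
```

These numbers only describe the raw tensors. The final judgment still has to come from running the customer's own postprocessing on both outputs.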

Regards,

Adam

0 Christina Kuruvilla 5 months ago in reply to Adam Hua

TI__Expert 5210 points

Thank you Adam for all the updates. Don't hesitate to reach out if there is anything that may be needed to help.

Warm regards,

Christina

0 Adam Hua 4 months ago in reply to wang.rui48

TI__Expert 4910 points

Hi,

As you said inference gets stuck on the SoC, can you upload your inference log here? I will also try to reproduce your problem today.

Regards,

Adam

0 Adam Hua 4 months ago in reply to wang.rui48

TI__Expert 4910 points

I am able to reproduce this issue on sk-am68a. Here is the inference log:

root@am68a-sk:/opt/edgeai/edgeai-tidl-tools/examples/osrt_python/ort# python3 onnxrt_ep_no_post.py -m hsh_modify                                                                                                                                                                   
Available execution providers :  ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']

Running 1 Models - ['hsh_modify']


Running_Model :  hsh_modify  

libtidl_onnxrt_EP loaded 0x111455a0 
artifacts_folder                                = ../../../model-artifacts//hsh_modify/artifacts 
debug_level                                     = 2 
target_priority                                 = 0 
max_pre_empt_delay                              = 340282346638528859811704183484516925440.000000 
Final number of subgraphs created are : 1, - Offloaded Nodes - 67, Total Nodes - 67 
In TIDL_createStateInfer 
Compute on node : TIDLExecutionProvider_TIDL_0_0
************ in TIDL_subgraphRtCreate ************ 
 APP: Init ... !!!
   522.798054 s: MEM: Init ... !!!
   522.798124 s: MEM: Initialized DMA HEAP (fd=5) !!!
   522.798281 s: MEM: Init ... Done !!!
   522.798306 s: IPC: Init ... !!!
   522.856548 s: IPC: Init ... Done !!!
REMOTE_SERVICE: Init ... !!!
REMOTE_SERVICE: Init ... Done !!!
   522.863889 s: GTC Frequency = 200 MHz
APP: Init ... Done !!!
   522.866517 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_ERROR
   522.866555 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_WARNING
   522.866566 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_INFO
   522.871411 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:134] Added target MPU-0 
   522.871571 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:134] Added target MPU-1 
   522.871707 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:134] Added target MPU-2 
   522.871839 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:134] Added target MPU-3 
   522.871856 s:  VX_ZONE_INFO: [tivxInitLocal:126] Initialization Done !!!
   522.871868 s:  VX_ZONE_INFO: Globally Disabled VX_ZONE_INFO
[C7x_1 ]    522.913392 s: PREEMPTION: Requesting memory of size 3014656 for targetPriority = 256
[C7x_1 ]    522.913417 s: 
[C7x_1 ]    522.913437 s: --------------------------------------------
[C7x_1 ]    522.913463 s: TIDL Memory size requiement (record wise):
[C7x_1 ]    522.913503 s: MemRecNum   , Space               , Attribute   , Alignment   , Size(KBytes), BasePtr     
[C7x_1 ]    522.913548 s: 0           , DDR Cacheable       , Persistent  ,  128, 19.27   , 0x00000000
[C7x_1 ]    522.913591 s: 1           , DDR Cacheable       , Persistent  ,  128, 0.65    , 0x00000000
[C7x_1 ]    522.913633 s: 2           , L1D                 , Scratch     ,  128, 16.00   , 0x00000000
[C7x_1 ]    522.913673 s: 3           , L2                  , Scratch     ,  128, 448.00  , 0x00000000
[C7x_1 ]    522.913714 s: 4           , L3/MSMC             , Scratch     ,  128, 2944.00 , 0x00000000
[C7x_1 ]    522.913755 s: 5           , DDR Cacheable       , Persistent  ,  128, 1017.87 , 0x00000000
[C7x_1 ]    522.913796 s: 6           , DDR Cacheable       , Scratch     ,  128, 9.00    , 0x00000000
[C7x_1 ]    522.913837 s: 7           , DDR Cacheable       , Persistent  ,  128, 76965.25, 0x00000000
[C7x_1 ]    522.913878 s: 8           , DDR Cacheable       , Scratch     ,  128, 0.13    , 0x00000000
[C7x_1 ]    522.913918 s: 9           , DDR Cacheable       , Scratch     ,  128, 3.13    , 0x00000000
[C7x_1 ]    522.913959 s: 10          , DDR Cacheable       , Persistent  ,  128, 908.39  , 0x00000000
[C7x_1 ]    522.914000 s: 11          , DDR Cacheable       , Scratch     ,  128, 512.25  , 0x00000000
[C7x_1 ]    522.914041 s: 12          , DDR Cacheable       , Persistent  ,  128, 2944.00 , 0x00000000
[C7x_1 ]    522.914082 s: 13          , DDR Cacheable       , Persistent  ,  128, 1482.07 , 0x00000000
[C7x_1 ]    522.914122 s: 14          , DDR Cacheable       , Persistent  ,  128, 0.00    , 0x00000000
[C7x_1 ]    522.914166 s: 15          , DDR Cacheable       , Persistent  ,  128, 7589.00 , 0x00000000
[C7x_1 ]    522.914197 s: --------------------------------------------
[C7x_1 ]    522.914222 s: Total memory size requirement (space wise):
[C7x_1 ]    522.914243 s: Mem Space , Size(KBytes)
[C7x_1 ]    522.914261 s: L1D       , 16.00   
[C7x_1 ]    522.914277 s: L2        , 448.00  
[C7x_1 ]    522.914294 s: L3/MSMC   , 2944.00 
[C7x_1 ]    522.914313 s: DDR Cacheable, 91451.01
[C7x_1 ]    522.914335 s: --------------------------------------------
[C7x_1 ]    522.914370 s: NOTE: Memory requirement in host emulation can be different from the same on EVM
[C7x_1 ]    522.914409 s:       To get the actual TIDL memory requirement make sure to run on EVM with 
[C7x_1 ]    522.914432 s:       debugTraceLevel = 2
[C7x_1 ]    522.914441 s: 
[C7x_1 ]    522.914461 s: --------------------------------------------
[C7x_1 ]    522.915957 s: TIDL init call from ivision API 
[C7x_1 ]    522.915975 s: 
[C7x_1 ]    522.915992 s: --------------------------------------------
[C7x_1 ]    522.916017 s: TIDL Memory size requiement (record wise):
[C7x_1 ]    522.916057 s: MemRecNum   , Space               , Attribute   , Alignment   , Size(KBytes), BasePtr     
[C7x_1 ]    522.916101 s: 0           , DDR Cacheable       , Persistent  ,  128, 19.27   , 0x17027000
[C7x_1 ]    522.916142 s: 1           , DDR Cacheable       , Persistent  ,  128, 0.65    , 0x1702bf00
[C7x_1 ]    522.916188 s: 2           , L1D                 , Scratch     ,  128, 16.00   , 0x64e00000
[C7x_1 ]    522.916229 s: 3           , L2                  , Scratch     ,  128, 448.00  , 0x64800000
[C7x_1 ]    522.916270 s: 4           , L3/MSMC             , Scratch     ,  128, 2944.00 , 0x70020000
[C7x_1 ]    522.916312 s: 5           , DDR Cacheable       , Persistent  ,  128, 1017.87 , 0x1702c300
[C7x_1 ]    522.916352 s: 6           , DDR Cacheable       , Scratch     ,  128, 9.00    , 0x00000000
RT-Profile: TIDLRT_init_profiling 
[C7x_1 ]    522.916393 s: 7           , DDR Cacheable       , Persistent  ,  128, 76965.25, 0x1712ac00
tidlrt_create            :      193225336 ns,
tidl_rt_ovx_Init         :       78126339 ns,
vxCreateContext          :        2234285 ns,
init_tidl_tiovx          :       10836423 ns,
[C7x_1 ]    522.916433 s: 8           , DDR Cacheable       , Scratch     ,  128, 0.13    , 0x00002400
create_graph_tidl_tiovx  :       14287798 ns,
verify_graph_tidl_tiovx  :       86185671 ns,
tivxTIDLLoadKernels      :          25890 ns,
mapConfig                :         472790 ns,
[C7x_1 ]    522.916473 s: 9           , DDR Cacheable       , Scratch     ,  128, 3.13    , 0x00002800
tivxAddKernelTIDL        :          47890 ns,
mapNetwork               :        9763218 ns,
setCreateParams          :         235625 ns,
setArgs                  :         288680 ns,
[C7x_1 ]    522.916513 s: 10          , DDR Cacheable       , Persistent  ,  128, 908.39  , 0x1bc54200
vxCreateUserDataObject   :          26525 ns,
vxMapUserDataObject      :        6776062 ns,
memcopy_network_buffer   :        2954126 ns,
vxUnmapUserDataObject    :           4480 ns,
[C7x_1 ]    522.916555 s: 11          , DDR Cacheable       , Scratch     ,  128, 512.25  , 0x00003800
************ TIDL_subgraphRtCreate done ************ 
[C7x_1 ]    522.916597 s: 12          , DDR Cacheable       , Persistent  ,  128, 2944.00 , 0x1bd37500
[C7x_1 ]    522.916638 s: 13          , DDR Cacheable       , Persistent  ,  128, 1482.07 , 0x1c017600
[C7x_1 ]    522.916678 s: 14          , DDR Cacheable       , Persistent  ,  128, 0.00    , 0x1c18a000
[C7x_1 ]    522.916719 s: 15          , DDR Cacheable       , Persistent  ,  128, 7589.00 , 0x1c18a200
[C7x_1 ]    522.916748 s: --------------------------------------------
[C7x_1 ]    522.916773 s: Total memory size requirement (space wise):
[C7x_1 ]    522.916792 s: Mem Space , Size(KBytes)
[C7x_1 ]    522.916810 s: L1D       , 16.00   
[C7x_1 ]    522.916826 s: L2        , 448.00  
[C7x_1 ]    522.916843 s: L3/MSMC   , 2944.00 
[C7x_1 ]    522.916861 s: DDR Cacheable, 91451.01
[C7x_1 ]    522.916883 s: --------------------------------------------
[C7x_1 ]    522.916918 s: NOTE: Memory requirement in host emulation can be different from the same on EVM
[C7x_1 ]    522.916956 s:       To get the actual TIDL memory requirement make sure to run on EVM with 
[C7x_1 ]    522.916980 s:       debugTraceLevel = 2
[C7x_1 ]    522.916989 s: 
[C7x_1 ]    522.917009 s: --------------------------------------------
[C7x_1 ]    522.919445 s: Alg Init for Layer # -    1
[C7x_1 ]    522.919564 s: Alg Init for Layer # -    2
[C7x_1 ]    522.919644 s: Alg Init for Layer # -    3
[C7x_1 ]    522.919713 s: Alg Init for Layer # -    4
[C7x_1 ]    522.919775 s: Alg Init for Layer # -    5
[C7x_1 ]    522.919857 s: Alg Init for Layer # -    6
[C7x_1 ]    522.919958 s: Alg Init for Layer # -    7
[C7x_1 ]    522.920040 s: Alg Init for Layer # -    8
[C7x_1 ]    522.920121 s: Alg Init for Layer # -    9
[C7x_1 ]    522.920336 s: Alg Init for Layer # -   10
[C7x_1 ]    522.920411 s: Alg Init for Layer # -   11
[C7x_1 ]    522.920511 s: Alg Init for Layer # -   12
[C7x_1 ]    522.921074 s: Alg Init for Layer # -   13
[C7x_1 ]    522.921224 s: Alg Init for Layer # -   14
[C7x_1 ]    522.922292 s: Alg Init for Layer # -   15
[C7x_1 ]    522.922434 s: Alg Init for Layer # -   16
[C7x_1 ]    522.922563 s: Alg Init for Layer # -   17
[C7x_1 ]    522.922975 s: Alg Init for Layer # -   18
[C7x_1 ]    522.923128 s: Alg Init for Layer # -   19
[C7x_1 ]    522.923783 s: Alg Init for Layer # -   20
[C7x_1 ]    522.923939 s: Alg Init for Layer # -   21
[C7x_1 ]    522.924054 s: Alg Init for Layer # -   22
1
[C7x_1 ]    522.924505 s: Alg Init for Layer # -   23
[C7x_1 ]    522.924683 s: Alg Init for Layer # -   24
[C7x_1 ]    522.925359 s: Alg Init for Layer # -   25
[C7x_1 ]    522.925540 s: Alg Init for Layer # -   26
[C7x_1 ]    522.925658 s: Alg Init for Layer # -   27
[C7x_1 ]    522.926515 s: Alg Init for Layer # -   28
[C7x_1 ]    522.926728 s: Alg Init for Layer # -   29
[C7x_1 ]    522.930405 s: Alg Init for Layer # -   30
[C7x_1 ]    522.930636 s: Alg Init for Layer # -   31
[C7x_1 ]    522.930755 s: Alg Init for Layer # -   32
[C7x_1 ]    522.933340 s: Alg Init for Layer # -   33
[C7x_1 ]    522.933654 s: Alg Init for Layer # -   34
[C7x_1 ]    522.940085 s: Alg Init for Layer # -   35
[C7x_1 ]    522.940406 s: Alg Init for Layer # -   36
[C7x_1 ]    522.949784 s: Alg Init for Layer # -   37
[C7x_1 ]    522.949960 s: Alg Init for Layer # -   38
[C7x_1 ]    522.959704 s: Alg Init for Layer # -   39
[C7x_1 ]    522.959947 s: Alg Init for Layer # -   40
[C7x_1 ]    522.963625 s: Alg Init for Layer # -   41
[C7x_1 ]    522.963870 s: Alg Init for Layer # -   42
[C7x_1 ]    522.966381 s: Alg Init for Layer # -   43
[C7x_1 ]    522.966561 s: Alg Init for Layer # -   44
[C7x_1 ]    522.970673 s: Alg Init for Layer # -   45
[C7x_1 ]    522.970887 s: Alg Init for Layer # -   46
[C7x_1 ]    522.971623 s: Alg Init for Layer # -   47
[C7x_1 ]    522.971838 s: Alg Init for Layer # -   48
[C7x_1 ]    522.972602 s: Alg Init for Layer # -   49
[C7x_1 ]    522.972786 s: Alg Init for Layer # -   50
[C7x_1 ]    522.974862 s: Alg Init for Layer # -   51
[C7x_1 ]    522.975059 s: Alg Init for Layer # -   52
[C7x_1 ]    522.975770 s: Alg Init for Layer # -   53
[C7x_1 ]    522.975971 s: Alg Init for Layer # -   54
[C7x_1 ]    522.976296 s: Alg Init for Layer # -   55
[C7x_1 ]    522.976478 s: Alg Init for Layer # -   56
[C7x_1 ]    522.979262 s: Alg Init for Layer # -   57
[C7x_1 ]    522.979458 s: Alg Init for Layer # -   58
[C7x_1 ]    522.980588 s: Alg Init for Layer # -   59
[C7x_1 ]    522.980782 s: Alg Init for Layer # -   60
[C7x_1 ]    522.981666 s: Alg Init for Layer # -   61
[C7x_1 ]    522.981760 s: Alg Init for Layer # -   64
[C7x_1 ]    522.981847 s: Alg Init for Layer # -   62
[C7x_1 ]    522.981934 s: Alg Init for Layer # -   63
[C7x_1 ]    522.982019 s: Alg Init for Layer # -   65
[C7x_1 ]    522.982275 s: Alg Init for Layer # -   66
[C7x_1 ]    522.982391 s: Alg Init for Layer # -   67
[C7x_1 ]    522.982466 s: Alg Init for Layer # -   68
[C7x_1 ]    522.982532 s: Alg Init for Layer # -   69
[C7x_1 ]    522.982596 s: Alg Init for Layer # -   70
[C7x_1 ]    522.982748 s: PREEMPTION: Adding a new priority object for targetPriority = 256, handle = 117027000
[C7x_1 ]    522.982810 s: PREEMPTION: Now total number of priority objects = 1 at priorityId = 256,    with new memRec of base = 11bd37500 and size = 3014656
[C7x_1 ]    522.982881 s: PREEMPTION: Requesting context memory addr for handle 117027000, return Addr = b0f79a68
[C7x_1 ]    522.982912 s: Print preEmption Hnadle during init stage :
[C7x_1 ]    522.982938 s: ProcTime,      ctxSize,       dataId
[C7x_1 ]    522.982966 s: 0.000,         7288,            0
[C7x_1 ]    522.982992 s: 0.891,      2034936,            1
[C7x_1 ]    522.983015 s: 0.006,      2034936,            2
[C7x_1 ]    522.983039 s: 6.343,         7288,            3
[C7x_1 ]    522.983062 s: 0.006,         7288,            4
[C7x_1 ]    522.983084 s: 3.487,         7288,            5
[C7x_1 ]    522.983106 s: 3.487,         7288,            6
[C7x_1 ]    522.983129 s: 3.487,         7288,            7
[C7x_1 ]    522.983151 s: 3.487,         7288,            8
[C7x_1 ]    522.983185 s: 0.083,         7288,            9
[[0.3137207  0.28808594 0.26220703 ... 0.26660156 0.2685547  0.        ]
 [0.3737793  0.32202148 0.30786133 ... 0.16992188 0.28808594 0.1430664 ]
 [0.30786133 0.41259766 0.3371582  ... 0.28808594 0.2824707  0.24804688]
 ...
 [0.30004883 0.22607422 0.19555664 ... 0.17895508 0.25048828 0.15454102]
 [0.32470703 0.34423828 0.39257812 ... 0.28442383 0.23486328 0.32983398]
 [0.3955078  0.27270508 0.39111328 ... 0.17456055 0.2199707  0.22607422]]
[C7x_1 ]    522.983209 s: 0.006,         7288,           10
[C7x_1 ]    522.983232 s: 0.407,      2035192,           11
[C7x_1 ]    522.983255 s: 1.844,         7288,           12

[C7x_1 ]    522.983278 s: 3.539,         7288,           13
[C7x_1 ]    522.983301 s: 3.576,         7288,           14
[C7x_1 ]    522.983324 s: 3.539,         7288,           15
[C7x_1 ]    522.983347 s: 2.232,         7288,           16
[C7x_1 ]    522.983370 s: 1.333,         7288,           17
[C7x_1 ]    522.983392 s: 1.780,         7288,           18
[C7x_1 ]    522.983415 s: 1.792,         7288,           19
[C7x_1 ]    522.983438 s: 1.780,         7288,           20
[C7x_1 ]    522.983461 s: 0.892,      2034936,           21
[C7x_1 ]    522.983484 s: 0.664,         7288,           22
[C7x_1 ]    522.983506 s: 0.893,         7288,           23
[C7x_1 ]    522.983529 s: 1.296,         7288,           24
[C7x_1 ]    522.983552 s: 0.893,         7288,           25
[C7x_1 ]    522.983574 s: 0.448,      1031416,           26
[C7x_1 ]    522.983597 s: 0.685,         7288,           27
[C7x_1 ]    522.983620 s: 0.227,      2055416,           28
[C7x_1 ]    522.983642 s: 1.417,         7288,           29
[C7x_1 ]    522.983665 s: 0.450,         7288,           30
[C7x_1 ]    522.983688 s: 0.227,       515320,           31
[C7x_1 ]    522.983711 s: 0.663,      1023224,           32
[C7x_1 ]    522.983734 s: 0.079,      1023224,           33
[C7x_1 ]    522.983757 s: 2.578,      1023224,           34
[C7x_1 ]    522.983781 s: 0.079,      1023224,           35
[C7x_1 ]    522.983804 s: 0.341,         7288,           36
[C7x_1 ]    522.983826 s: 0.900,         7288,           37
[C7x_1 ]    522.983850 s: 7.795,      2119416,           38
[C7x_1 ]    522.983872 s: 0.150,      2055416,           39
[C7x_1 ]    522.983896 s: 1.417,         7288,           40
[C7x_1 ]    522.983918 s: 0.227,      2055416,           41
[C7x_1 ]    522.983941 s: 0.518,         7288,           42
[C7x_1 ]    522.983963 s: 1.786,         7288,           43
[C7x_1 ]    522.983986 s: 5.192,         7288,           44
[C7x_1 ]    522.984008 s: 0.893,         7288,           45
[C7x_1 ]    522.984031 s: 1.296,         7288,           46
[C7x_1 ]    522.984054 s: 0.893,         7288,           47
[C7x_1 ]    522.984076 s: 1.335,         7288,           48
[C7x_1 ]    522.984099 s: 3.561,         7288,           49
[C7x_1 ]    522.984121 s: 2.790,         7288,           50
[C7x_1 ]    522.984144 s: 1.780,         7288,           51
[C7x_1 ]    522.984172 s: 1.792,         7288,           52
[C7x_1 ]    522.984196 s: 1.780,         7288,           53
[C7x_1 ]    522.984218 s: 2.655,         7288,           54
[C7x_1 ]    522.984241 s: 7.110,         7288,           55
[C7x_1 ]    522.984264 s: 5.486,         7288,           56
[C7x_1 ]    522.984287 s: 3.539,         7288,           57
[C7x_1 ]    522.984309 s: 3.577,         7288,           58
[C7x_1 ]    522.984331 s: 3.539,         7288,           59
[C7x_1 ]    522.984354 s: 2.030,         7288,           60
[C7x_1 ]    522.984377 s: 0.002,         7288,           61
[C7x_1 ]    522.984400 s: 0.002,         7288,           64
[C7x_1 ]    522.984422 s: 0.002,         7288,           62
[C7x_1 ]    522.984444 s: 0.002,         7288,           63
 *******   In TIDL_subgraphRtInvoke  ******** 
[C7x_1 ]    522.984466 s: 0.243,      2035192,           65
[C7x_1 ]    522.984488 s: 0.288,         7288,           66
[C7x_1 ]    522.984510 s: 0.006,         7288,           67
[C7x_1 ]    522.984533 s: 6.343,      2034936,           68
[C7x_1 ]    522.984555 s: 0.006,      2034936,           69
[C7x_1 ]    522.984578 s: 1.824,         7288,           70
[C7x_1 ]    522.984600 s: 0.000,            0,           71
[C7x_1 ]    522.984669 s: TIDL_initializeHandleForPreemption is completed 
[C7x_1 ]    524.252040 s: TIDL_process is started with handle : 117027000 
[C7x_1 ]    524.252084 s: PREEMPTION: Requesting UNLOCK for priroty object and targetPriority 256 is serviced
[C7x_1 ]    524.252131 s: PREEMPTION: Requesting LOCK for priroty object with handle = 117027000 and targetPriority 256
[C7x_1 ]    524.252186 s: PREEMPTION: Request of LOCK for priroty object with handle = 117027000 and targetPriority 256 is serviced with state 0
[C7x_1 ]    524.252282 s: TIDL_activate is called with handle : 117027000 - Copying handle of size 19736 from 117027000 to 702f2000 
[C7x_1 ]    524.252368 s: PREEMPTION: Requesting UNLOCK for priroty object and targetPriority 256
[C7x_1 ]    524.252417 s: PREEMPTION: Requesting UNLOCK for priroty object and targetPriority 256 is serviced
[C7x_1 ]    524.252464 s: PREEMPTION: Requesting LOCK for priroty object with handle = 117027000 and targetPriority 256
[C7x_1 ]    524.252518 s: PREEMPTION: Request of LOCK for priroty object with handle = 117027000 and targetPriority 256 is serviced with state 0
[C7x_1 ]    524.252560 s: Core 0 Alg Process for Layer # -    1, layer type 29
[C7x_1 ]    524.252589 s: Processing Layer # -    1
[C7x_1 ]    524.252984 s: Core 0 End of Layer # -    1 with outPtrs[0] = 70020000
[C7x_1 ]    524.253022 s: Core 0 Alg Process for Layer # -    2, layer type 38
[C7x_1 ]    524.253048 s: Processing Layer # -    2
[C7x_1 ]    524.253080 s: Core 0 End of Layer # -    2 with outPtrs[0] = 70020000
[C7x_1 ]    524.253114 s: Core 0 Alg Process for Layer # -    3, layer type 41
[C7x_1 ]    524.253138 s: Processing Layer # -    3
[C7x_1 ]    524.274440 s: Core 0 End of Layer # -    3 with outPtrs[0] = 11712ac00
[C7x_1 ]    524.274476 s: Core 0 Alg Process for Layer # -    4, layer type 38
[C7x_1 ]    524.274500 s: Processing Layer # -    4
[C7x_1 ]    524.274531 s: Core 0 End of Layer # -    4 with outPtrs[0] = 11712ac00
[C7x_1 ]    524.274565 s: Core 0 Alg Process for Layer # -    5, layer type 14
[C7x_1 ]    524.274588 s: Processing Layer # -    5
[C7x_1 ]    524.287551 s: Core 0 End of Layer # -    5 with outPtrs[0] = 70020000
[C7x_1 ]    524.287586 s: Core 0 Alg Process for Layer # -    6, layer type 14
[C7x_1 ]    524.287611 s: Processing Layer # -    6
[C7x_1 ]    524.300561 s: Core 0 End of Layer # -    6 with outPtrs[0] = 7009bc80
[C7x_1 ]    524.300596 s: Core 0 Alg Process for Layer # -    7, layer type 14
[C7x_1 ]    524.300621 s: Processing Layer # -    7
[C7x_1 ]    524.313582 s: Core 0 End of Layer # -    7 with outPtrs[0] = 70117900
[C7x_1 ]    524.313616 s: Core 0 Alg Process for Layer # -    8, layer type 14
[C7x_1 ]    524.313641 s: Processing Layer # -    8
[C7x_1 ]    524.326588 s: Core 0 End of Layer # -    8 with outPtrs[0] = 70193580
[C7x_1 ]    524.326621 s: Core 0 Alg Process for Layer # -    9, layer type 12
[C7x_1 ]    524.326647 s: Processing Layer # -    9
[C7x_1 ]    524.328766 s: Core 0 End of Layer # -    9 with outPtrs[0] = 11712ac00
[C7x_1 ]    524.328809 s: Core 0 Alg Process for Layer # -   10, layer type 38
[C7x_1 ]    524.328835 s: Processing Layer # -   10
[C7x_1 ]    524.328868 s: Core 0 End of Layer # -   10 with outPtrs[0] = 11712ac00
[C7x_1 ]    524.328903 s: Core 0 Alg Process for Layer # -   11, layer type 29
[C7x_1 ]    524.328927 s: Processing Layer # -   11
[C7x_1 ]    524.329525 s: Core 0 End of Layer # -   11 with outPtrs[0] = 70020000
[C7x_1 ]    524.329560 s: Core 0 Alg Process for Layer # -   12, layer type 1
[C7x_1 ]    524.329586 s: Processing Layer # -   12
[C7x_1 ]    524.330449 s: Core 0 End of Layer # -   12 with outPtrs[0] = 11712ac00
[C7x_1 ]    524.330485 s: Core 0 Alg Process for Layer # -   13, layer type 8
[C7x_1 ]    524.330509 s: Processing Layer # -   13
[C7x_1 ]    524.333195 s: Core 0 End of Layer # -   13 with outPtrs[0] = 1180ef800
[C7x_1 ]    524.333231 s: Core 0 Alg Process for Layer # -   14, layer type 1
[C7x_1 ]    524.333255 s: Processing Layer # -   14
[C7x_1 ]    524.336298 s: Core 0 End of Layer # -   14 with outPtrs[0] = 119591c00
[C7x_1 ]    524.336333 s: Core 0 Alg Process for Layer # -   15, layer type 8
[C7x_1 ]    524.336357 s: Processing Layer # -   15
[C7x_1 ]    524.339043 s: Core 0 End of Layer # -   15 with outPtrs[0] = 11712ac00
[C7x_1 ]    524.339080 s: Core 0 Alg Process for Layer # -   16, layer type 2
[C7x_1 ]    524.339105 s: Processing Layer # -   16
[C7x_1 ]    524.339901 s: Core 0 End of Layer # -   16 with outPtrs[0] = 118dec400
[C7x_1 ]    524.339937 s: Core 0 Alg Process for Layer # -   17, layer type 1
[C7x_1 ]    524.339960 s: Processing Layer # -   17
[C7x_1 ]    524.341094 s: Core 0 End of Layer # -   17 with outPtrs[0] = 1185cd000
[C7x_1 ]    524.341130 s: Core 0 Alg Process for Layer # -   18, layer type 8
[C7x_1 ]    524.341155 s: Processing Layer # -   18
[C7x_1 ]    524.342539 s: Core 0 End of Layer # -   18 with outPtrs[0] = 118dec400
[C7x_1 ]    524.342576 s: Core 0 Alg Process for Layer # -   19, layer type 1
[C7x_1 ]    524.342599 s: Processing Layer # -   19
[C7x_1 ]    524.344400 s: Core 0 End of Layer # -   19 with outPtrs[0] = 1195a8800
[C7x_1 ]    524.344436 s: Core 0 Alg Process for Layer # -   20, layer type 8
[C7x_1 ]    524.344460 s: Processing Layer # -   20
[C7x_1 ]    524.345843 s: Core 0 End of Layer # -   20 with outPtrs[0] = 1185cd000
[C7x_1 ]    524.345878 s: Core 0 Alg Process for Layer # -   21, layer type 2
[C7x_1 ]    524.345902 s: Processing Layer # -   21
[C7x_1 ]    524.346259 s: Core 0 End of Layer # -   21 with outPtrs[0] = 70020000
[C7x_1 ]    524.346295 s: Core 0 Alg Process for Layer # -   22, layer type 1
[C7x_1 ]    524.346319 s: Processing Layer # -   22
[C7x_1 ]    524.347068 s: Core 0 End of Layer # -   22 with outPtrs[0] = 118d89400
[C7x_1 ]    524.347102 s: Core 0 Alg Process for Layer # -   23, layer type 8
[C7x_1 ]    524.347126 s: Processing Layer # -   23
[C7x_1 ]    524.347854 s: Core 0 End of Layer # -   23 with outPtrs[0] = 1191d7800
[C7x_1 ]    524.347890 s: Core 0 Alg Process for Layer # -   24, layer type 1
[C7x_1 ]    524.347916 s: Processing Layer # -   24
[C7x_1 ]    524.349401 s: Core 0 End of Layer # -   24 with outPtrs[0] = 1195b5c00
[C7x_1 ]    524.349437 s: Core 0 Alg Process for Layer # -   25, layer type 8
[C7x_1 ]    524.349462 s: Processing Layer # -   25
[C7x_1 ]    524.350189 s: Core 0 End of Layer # -   25 with outPtrs[0] = 118d89400
[C7x_1 ]    524.350225 s: Core 0 Alg Process for Layer # -   26, layer type 2
[C7x_1 ]    524.350249 s: Processing Layer # -   26
[C7x_1 ]    524.350448 s: Core 0 End of Layer # -   26 with outPtrs[0] = 70020000
[C7x_1 ]    524.350482 s: Core 0 Alg Process for Layer # -   27, layer type 1
[C7x_1 ]    524.350506 s: Processing Layer # -   27
[C7x_1 ]    524.351239 s: Core 0 End of Layer # -   27 with outPtrs[0] = 119167800
[C7x_1 ]    524.351274 s: Core 0 Alg Process for Layer # -   28, layer type 8
[C7x_1 ]    524.351299 s: Processing Layer # -   28
[C7x_1 ]    524.351691 s: Core 0 End of Layer # -   28 with outPtrs[0] = 70020000
[C7x_1 ]    524.351726 s: Core 0 Alg Process for Layer # -   29, layer type 1
[C7x_1 ]    524.351752 s: Processing Layer # -   29
[C7x_1 ]    524.354317 s: Core 0 End of Layer # -   29 with outPtrs[0] = 119356c00
[C7x_1 ]    524.354359 s: Core 0 Alg Process for Layer # -   30, layer type 8
[C7x_1 ]    524.354386 s: Processing Layer # -   30
[C7x_1 ]    524.354791 s: Core 0 End of Layer # -   30 with outPtrs[0] = 119167800
[C7x_1 ]    524.354828 s: Core 0 Alg Process for Layer # -   31, layer type 2
[C7x_1 ]    524.354853 s: Processing Layer # -   31
[C7x_1 ]    524.354980 s: Core 0 End of Layer # -   31 with outPtrs[0] = 70020000
[C7x_1 ]    524.355016 s: Core 0 Alg Process for Layer # -   32, layer type 1
[C7x_1 ]    524.355041 s: Processing Layer # -   32
[C7x_1 ]    524.355849 s: Core 0 End of Layer # -   32 with outPtrs[0] = 7009c080
[C7x_1 ]    524.355884 s: Core 0 Alg Process for Layer # -   33, layer type 8
[C7x_1 ]    524.355909 s: Processing Layer # -   33
[C7x_1 ]    524.356137 s: Core 0 End of Layer # -   33 with outPtrs[0] = 70020000
[C7x_1 ]    524.356174 s: Core 0 Alg Process for Layer # -   34, layer type 1
[C7x_1 ]    524.356199 s: Processing Layer # -   34
[C7x_1 ]    524.359443 s: Core 0 End of Layer # -   34 with outPtrs[0] = 70118080
[C7x_1 ]    524.359481 s: Core 0 Alg Process for Layer # -   35, layer type 8
[C7x_1 ]    524.359505 s: Processing Layer # -   35
[C7x_1 ]    524.359736 s: Core 0 End of Layer # -   35 with outPtrs[0] = 70020000
[C7x_1 ]    524.359772 s: Core 0 Alg Process for Layer # -   36, layer type 11
[C7x_1 ]    524.359798 s: Processing Layer # -   36
[C7x_1 ]    524.361007 s: Core 0 End of Layer # -   36 with outPtrs[0] = 119356c00
[C7x_1 ]    524.361044 s: Core 0 Alg Process for Layer # -   37, layer type 12
[C7x_1 ]    524.361070 s: Processing Layer # -   37
[C7x_1 ]    524.361425 s: Core 0 End of Layer # -   37 with outPtrs[0] = 1195af000
[C7x_1 ]    524.361460 s: Core 0 Alg Process for Layer # -   38, layer type 1
[C7x_1 ]    524.361487 s: Processing Layer # -   38
[C7x_1 ]    524.371372 s: Core 0 End of Layer # -   38 with outPtrs[0] = 70020000
[C7x_1 ]    524.371410 s: Core 0 Alg Process for Layer # -   39, layer type 8
[C7x_1 ]    524.371434 s: Processing Layer # -   39
[C7x_1 ]    524.371818 s: Core 0 End of Layer # -   39 with outPtrs[0] = 70020000
[C7x_1 ]    524.371854 s: Core 0 Alg Process for Layer # -   40, layer type 1
[C7x_1 ]    524.371879 s: Processing Layer # -   40
[C7x_1 ]    524.374443 s: Core 0 End of Layer # -   40 with outPtrs[0] = 119167800
[C7x_1 ]    524.374484 s: Core 0 Alg Process for Layer # -   41, layer type 8
[C7x_1 ]    524.374510 s: Processing Layer # -   41
[C7x_1 ]    524.374907 s: Core 0 End of Layer # -   41 with outPtrs[0] = 70020000
[C7x_1 ]    524.374943 s: Core 0 Alg Process for Layer # -   42, layer type 11
[C7x_1 ]    524.374967 s: Processing Layer # -   42
[C7x_1 ]    524.375562 s: Core 0 End of Layer # -   42 with outPtrs[0] = 119167800
[C7x_1 ]    524.375598 s: Core 0 Alg Process for Layer # -   43, layer type 12
[C7x_1 ]    524.375624 s: Processing Layer # -   43
[C7x_1 ]    524.376264 s: Core 0 End of Layer # -   43 with outPtrs[0] = 119585c00
[C7x_1 ]    524.376299 s: Core 0 Alg Process for Layer # -   44, layer type 1
[C7x_1 ]    524.376325 s: Processing Layer # -   44
[C7x_1 ]    524.382860 s: Core 0 End of Layer # -   44 with outPtrs[0] = 118d89400
[C7x_1 ]    524.382902 s: Core 0 Alg Process for Layer # -   45, layer type 8
[C7x_1 ]    524.382928 s: Processing Layer # -   45
[C7x_1 ]    524.383659 s: Core 0 End of Layer # -   45 with outPtrs[0] = 119179800
[C7x_1 ]    524.383694 s: Core 0 Alg Process for Layer # -   46, layer type 1
[C7x_1 ]    524.383718 s: Processing Layer # -   46
[C7x_1 ]    524.385203 s: Core 0 End of Layer # -   46 with outPtrs[0] = 119981c00
[C7x_1 ]    524.385239 s: Core 0 Alg Process for Layer # -   47, layer type 8
[C7x_1 ]    524.385264 s: Processing Layer # -   47
[C7x_1 ]    524.385991 s: Core 0 End of Layer # -   47 with outPtrs[0] = 1195a3800
[C7x_1 ]    524.386027 s: Core 0 Alg Process for Layer # -   48, layer type 11
[C7x_1 ]    524.386051 s: Processing Layer # -   48


This is possibly a TIDL bug. Please wait for internal debugging.
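The trace above stops after "Processing Layer # -   48" with no matching "End of Layer" line. That may point at where execution stalls, although the capture could also simply have been cut off. As a minimal sketch for anyone reproducing this, the snippet below scans a saved log for layers that started but never reported an end. The function name `find_unfinished_layers` is hypothetical, and the patterns only assume the trace format shown in the log above.

```python
import re

# Patterns match the trace format shown in the log above.
START_RE = re.compile(r"Alg Process for Layer # -\s*(\d+), layer type (\d+)")
END_RE = re.compile(r"End of Layer # -\s*(\d+)")


def find_unfinished_layers(log_text):
    """Return {layer_index: layer_type} for layers that started but never logged an end."""
    started = {}
    for line in log_text.splitlines():
        match = START_RE.search(line)
        if match:
            started[int(match.group(1))] = int(match.group(2))
            continue
        match = END_RE.search(line)
        if match:
            # A layer that reports its end is no longer pending.
            started.pop(int(match.group(1)), None)
    return started


# Last few lines of the trace posted above.
sample = """\
[C7x_1 ]    524.385239 s: Core 0 Alg Process for Layer # -   47, layer type 8
[C7x_1 ]    524.385264 s: Processing Layer # -   47
[C7x_1 ]    524.385991 s: Core 0 End of Layer # -   47 with outPtrs[0] = 1195a3800
[C7x_1 ]    524.386027 s: Core 0 Alg Process for Layer # -   48, layer type 11
[C7x_1 ]    524.386051 s: Processing Layer # -   48
"""
print(find_unfinished_layers(sample))  # {48: 11}
```

On a full log, an empty result would mean every layer that started also finished.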

Regards,

Adam

0 Adam Hua 4 months ago in reply to Adam Hua

TI__Expert 4910 points

Hi,

I am also uploading the config file to make this easier to reproduce:

    "hsh_modify": create_model_config(
        source=AttrDict(
            model_url="dummy",
            infer_shape=True,
        ),
        preprocess=AttrDict(
            resize=256,
            crop=224,
            data_layout="NCHW",
            resize_with_pad=False,
            reverse_channels=False,
        ),
        session=AttrDict(
            session_name="onnxrt",
            model_path=os.path.join(models_base_path, "hsh_model_modified.onnx"),
            input_mean=[123.675, 116.28, 103.53],
            input_scale=[0.017125, 0.017507, 0.017429],
            input_optimization=True,
        ),
        task_type="classification",
        extra_info=AttrDict(num_images=numImages, num_classes=1000),
    ),
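
For reference, here is a minimal sketch of the per-channel normalization that `input_mean` and `input_scale` describe, assuming the `(x - mean) * scale` convention used in the commented-out preprocessing of the example script. The helper name `normalize_nchw` is hypothetical and is not part of edgeai-tidl-tools. Note that the modified script below feeds the raw frame directly and does not run this step in Python.

```python
import numpy as np


def normalize_nchw(x, mean, scale):
    """Apply (x - mean) * scale per channel to an NCHW tensor."""
    x = np.asarray(x, dtype=np.float32)
    # Reshape to (1, C, 1, 1) so the values broadcast over batch, height and width.
    mean = np.asarray(mean, dtype=np.float32).reshape(1, -1, 1, 1)
    scale = np.asarray(scale, dtype=np.float32).reshape(1, -1, 1, 1)
    return (x - mean) * scale


# Values taken from the config above; the 2x2 input is only a placeholder.
input_mean = [123.675, 116.28, 103.53]
input_scale = [0.017125, 0.017507, 0.017429]
dummy = np.full((1, 3, 2, 2), 128.0, dtype=np.float32)
normalized = normalize_nchw(dummy, input_mean, input_scale)
print(normalized[0, :, 0, 0])
```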

Modified infer_image function in onnxrt_ep.py:

def infer_image(sess, image_files, config):
    '''
    Invoke the runtime session

    :param sess: Runtime session
    :param image_files: List of input image filename
    :param config: Configuration dictionary
    :return: Input Images
    :return: Output tensors
    :return: Total Processing time
    :return: Subgraphs Processing time
    :return: Height of input tensor
    :return: Width of input tensor
    '''

    # Get input details from the session
    input_details = sess.get_inputs()
    input_name = input_details[0].name
    floating_model = input_details[0].type == "tensor(float)"
    height = input_details[0].shape[2]
    width = input_details[0].shape[3]
    channel = input_details[0].shape[1]
    batch = input_details[0].shape[0]
    imgs = []
    shape = [batch, channel, height, width]
    input_shape = input_details[0].shape
    input_data = np.random.random(input_shape).astype(np.float32) * (1 - 0) 
    print(len(input_details))
    if len(input_details)>1:
        print(len(input_details))
        input_name2 = input_details[1].name
        if input_details[1].type == "tensor(float)":
            input_data2 = np.random.random(input_details[1].shape).astype(np.float32) * (1 - 0) 
        else:
            input_data2 = np.random.randint(0, 2000, size=input_details[1].shape).astype(np.int32)
    # Prepare the input data
    # input_data = np.zeros(shape)
    # for i in range(batch):
    #     imgs.append(
    #         Image.open(image_files[i])
    #         .convert("RGB")
    #         .resize((width, height), PIL.Image.LANCZOS)
    #     )
    #     temp_input_data = np.expand_dims(imgs[i], axis=0)
    #     temp_input_data = np.transpose(temp_input_data, (0, 3, 1, 2))
    #     input_data[i] = temp_input_data[0]
    # if floating_model:
    #     input_data = np.float32(input_data)
    #     for mean, scale, ch in zip(
    #         config["session"]["input_mean"],
    #         config["session"]["input_scale"],
    #         range(input_data.shape[1]),
    #     ):
    #         input_data[:, ch, :, :] = (input_data[:, ch, :, :] - mean) * scale
    # else:
    #     input_data = np.uint8(input_data)
    #     config["session"]["input_mean"] = [0, 0, 0]
    #     config["session"]["input_scale"] = [1, 1, 1]

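    # Load the raw 16-bit frame from 0.bin and scale it by 1/4096
    # (this assumes 12-bit samples stored in uint16 words).
    data = np.fromfile("0.bin", dtype=np.uint16).astype(np.float32)
    raw_vis = data.reshape(1056, 1920) / 4096
    print(raw_vis)
    # Overwrite channel 0 of the first batch entry with the real frame.
    input_data[0,0,:,:] = raw_vis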
    # Invoke the session
    start_time = time.time()
    
    if len(input_details)>1:
        print(len(input_details))
        output = list(sess.run(None, {input_name: input_data, input_name2: input_data2}))
    else:
        output = list(sess.run(None, {input_name: input_data}))
    stop_time = time.time()
    infer_time = stop_time - start_time

    copy_time, sub_graphs_proc_time, totaltime = get_benchmark_output(sess)
    proc_time = totaltime - copy_time

    return imgs, output, proc_time, sub_graphs_proc_time, height, width

For the model file, please use /cfs-file/__key/communityserver-discussions-components-files/791/modified_5F00_model.zip

The input file 0.bin is in /cfs-file/__key/communityserver-discussions-components-files/791/1425.model.zip
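
If the reshape in the script fails on your side, the file size is the first thing to check. Below is a minimal sketch of a loader that validates the element count before reshaping. The function name `load_raw_frame` and its default arguments are assumptions taken from the hard-coded values in the script above (1056 x 1920, divide by 4096), and the byte order is assumed to match the host.

```python
import os
import tempfile

import numpy as np


def load_raw_frame(path, height=1056, width=1920, full_scale=4096.0):
    """Read a uint16 raw frame and return it as a float32 NCHW tensor of shape (1, 1, H, W)."""
    data = np.fromfile(path, dtype=np.uint16)
    expected = height * width
    if data.size != expected:
        raise ValueError(
            f"{path}: expected {expected} samples for {height}x{width}, got {data.size}"
        )
    # Same scaling as the script above: divide the raw samples by 4096.
    return data.astype(np.float32).reshape(1, 1, height, width) / full_scale


# Self-contained demo on a tiny synthetic file instead of the real 0.bin.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    np.arange(6, dtype=np.uint16).tofile(demo_path)
    frame = load_raw_frame(demo_path, height=2, width=3)
print(frame.shape, frame.dtype)
```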

Regards,

Adam

0 Adam Hua 4 months ago in reply to Adam Hua

TI__Expert 4910 points

The SDK version is 10.1 and the edgeai-tidl-tools version is 10.01.00.02.

0 Christina Kuruvilla 4 months ago in reply to Adam Hua

TI__Expert 5210 points

Hello,

I have added this to Jira TIDL-7532. I will also try to reproduce it on my end and keep you updated.

Warm regards,

Christina

0 Christina Kuruvilla 4 months ago in reply to Christina Kuruvilla

TI__Expert 5210 points

Hi Adam and Wang,

Could you provide some context on how this model was created? We are investigating the layer info, and this would help us figure out why the layer names in the output do not correspond to the real layer names of the model.

Also, if you have a copy of the correct outputs of the model, that would also be helpful for us when validating.
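
If it helps, here is a minimal sketch of one way to compare a reference output against a TIDL output numerically. The function name `compare_outputs` and the choice of metrics are assumptions for illustration, not an official TIDL validation procedure.

```python
import numpy as np


def compare_outputs(reference, candidate):
    """Return simple error metrics between two output tensors of identical shape."""
    ref = np.asarray(reference, dtype=np.float64).ravel()
    cand = np.asarray(candidate, dtype=np.float64).ravel()
    if ref.shape != cand.shape:
        raise ValueError(f"size mismatch: {ref.size} vs {cand.size}")
    diff = np.abs(ref - cand)
    denom = np.linalg.norm(ref) * np.linalg.norm(cand)
    # Cosine similarity is undefined when either tensor is all zeros.
    cosine = float(ref @ cand / denom) if denom > 0 else float("nan")
    return {
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
        "cosine_similarity": cosine,
    }


# Placeholder tensors; real outputs saved as .npy files could be read with np.load.
reference = np.array([1.0, 2.0, 3.0])
candidate = np.array([1.0, 2.0, 3.5])
metrics = compare_outputs(reference, candidate)
print(metrics)
```

A cosine similarity close to 1 together with a small maximum difference would suggest that only quantization noise separates the two outputs.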

Warm regards,
Christina

0 wang.rui48 4 months ago in reply to Christina Kuruvilla

Prodigy 220 points

Running the ONNX model in the onnxruntime inference engine produces correct results, and at that level there is no problem with the output names you mentioned. Also, what do you mean by "model"? Is it the model definition script in the PyTorch or TensorFlow framework?

0 Christina Kuruvilla 4 months ago in reply to wang.rui48

TI__Expert 5210 points

Hi Wang,

Yes, I was asking about hsh_model_modified.onnx, which I believe is based on hsh_model.onnx, and for any information on what was used to create and train it (assuming PyTorch). Did you test on onnxruntime with 8-bit as well?

Thank you for the confirmation that the onnxruntime inference results in correct outputs. I was wondering if the TIDL PC emulation (before running on device) had also given you correct outputs? From my testing, PC also shows some mismatch due to the layer info.

Warm regards,

Christina

0 wang.rui48 4 months ago in reply to Christina Kuruvilla

Prodigy 220 points

The model hsh_model_modified.onnx is the model file I gave to Adam. He also tested the 8-bit results in onnxruntime and tested with tidl_pc, and the results were all fine. As for why hsh_model.onnx was modified: without the modification, the Gather operator reports an error. This modification was also discussed with Adam.

0 Adam Hua 4 months ago in reply to wang.rui48

TI__Expert 4910 points

Hi Wang,

Can you try SDK 11.0 with the new TIDL released last week? If SDK 11.0 works, we can backport the TIDL fix.

Regards,

Adam
