PROCESSOR-SDK-J721S2: model test

Part Number: PROCESSOR-SDK-J721S2

Tool/software:

Hi TI expert,

      The model to be tested is attached.

      After converting the model with sdk-j721s2-09_01_00_06, we used the converted bin model file for verification, and the result is completely wrong. modified_model is the modified model file.

modified_model

model.zip

0.png

vis_0.png

  • Hi Wang, 

    Is this related to a previous E2E, Jira or email request? If so, can you share some information about it?

    Please see this FAQ for the guidelines for what information we need. https://e2e.ti.com/support/processors-group/processors---internal/f/processors---internal-forum/1500222/faq-tda4vh-faq-tda4x-am62a-information-required-while-reporting-issue-on-tidl-ti-deep-learning-solution-module-of-processor-sdk-on-tda4x-am6xa

    Warm regards,
    Christina

  • Hi Christina,

    Let me fill the list for the customer:

    Property                 Details
    Device                   J721S2 (example)
    SDK Version              9.1
    TIDL firmware version    9.1
    TIDL Tools Version       9.1
    Issue Category           Accuracy issue
    Example AI model         In customer's original post
    Compilation method
    Compilation log
    Compilation artifacts
    Inference method
    Inference log
    Inference artifacts

    The major problem reported is accuracy loss.

    I tried this model with edgeai tidl tools 10.1.2. This is the ONNX result:

    /cfs-file/__key/communityserver-discussions-components-files/791/modify_5F00_pc_5F00_onnx.npy

    and this is the result of 8-bit quantization on PC:

    /cfs-file/__key/communityserver-discussions-components-files/791/modify_5F00_pc_5F00_8bit.npy

    Comparing the two npy files directly, the accuracy loss is small.

    Pending: the customer needs to validate the result with their post-processing.
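
    A direct comparison of two output tensors like this can be sketched as follows. This is only an illustration, not the exact check that was run: the metric choice (maximum and mean absolute difference, SNR, cosine similarity) and the function name compare_outputs are assumptions, and the demo uses synthetic data instead of the attached files.

```python
import numpy as np


def compare_outputs(reference, quantized):
    """Return simple error metrics between a reference tensor and a quantized one."""
    ref = np.asarray(reference, dtype=np.float64).ravel()
    qnt = np.asarray(quantized, dtype=np.float64).ravel()
    if ref.size != qnt.size:
        raise ValueError(f"element count mismatch: {ref.size} vs {qnt.size}")

    diff = ref - qnt
    noise_power = float(np.mean(diff ** 2))
    signal_power = float(np.mean(ref ** 2))
    if noise_power == 0.0:
        snr_db = float("inf")       # identical tensors
    elif signal_power == 0.0:
        snr_db = float("-inf")      # all-zero reference with non-zero error
    else:
        snr_db = 10.0 * float(np.log10(signal_power / noise_power))

    norm_product = float(np.linalg.norm(ref) * np.linalg.norm(qnt))
    cosine = float(np.dot(ref, qnt)) / norm_product if norm_product > 0.0 else float("nan")

    return {
        "max_abs_diff": float(np.max(np.abs(diff))),
        "mean_abs_diff": float(np.mean(np.abs(diff))),
        "snr_db": snr_db,
        "cosine_similarity": cosine,
    }


if __name__ == "__main__":
    # Synthetic stand-ins for the two tensors. With the real attachments one
    # would call np.load() on the downloaded .npy files instead.
    rng = np.random.default_rng(0)
    ref = rng.random((1, 1, 8, 8)).astype(np.float32)
    qnt = np.round(ref * 255.0) / 255.0   # crude 8-bit rounding, for the demo only
    print(compare_outputs(ref, qnt))
```

    A small numeric difference here does not guarantee that the final image looks right, which is why the check with the customer's own post-processing is still needed.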

    Regards

    Adam

  • As shown in the figure below, the left picture is the result of the tool test, and the right picture is the output result of the model on the PC.

    We have tried many quantization methods, but the results are not correct. Please provide an import configuration file that can correctly quantize the model.
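
    For context, the quantization-related settings that the edgeai-tidl-tools OSRT flow takes as compile options can be collected in one place, as in the sketch below. It is not a known-good configuration for this model. The option keys are the ones used by the edgeai-tidl-tools Python examples as far as we know and should be checked against the tools version in use. The helper name build_quant_options and the layer name "layer_out" are hypothetical.

```python
def build_quant_options(tensor_bits=8, accuracy_level=1, calibration_frames=2,
                        calibration_iterations=5, output_16bit_layers=()):
    """Build a dict of quantization-related compile options (illustrative only)."""
    if tensor_bits not in (8, 16, 32):
        raise ValueError("tensor_bits is expected to be 8, 16 or 32")

    options = {
        "tensor_bits": tensor_bits,
        "accuracy_level": accuracy_level,
        "advanced_options:calibration_frames": calibration_frames,
        "advanced_options:calibration_iterations": calibration_iterations,
    }
    if output_16bit_layers:
        # Comma-separated names of layers whose outputs should be kept in 16 bit.
        options["advanced_options:output_feature_16bit_names_list"] = ", ".join(
            output_16bit_layers
        )
    return options


if __name__ == "__main__":
    # Baseline 8-bit settings.
    print(build_quant_options())
    # Full 16-bit run, useful to tell quantization loss apart from other errors.
    print(build_quant_options(tensor_bits=16))
    # Mixed precision: "layer_out" is a placeholder, not a name from the model.
    print(build_quant_options(output_16bit_layers=["layer_out"]))
```

    In the example script such a dictionary would be merged into the delegate options, for instance through the "optional_options" entry of the model configuration. Comparing an 8-bit run with a 16-bit run is a common way to see whether an accuracy problem comes from quantization at all.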

  • Hi 

    As we discussed locally, you confirmed that my PC result with edgeai tidl tools 10.1.2 is correct, but your result using the RTOS tools 9.1 is poor.

    We will try edgeai tidl tools 9.1 to see whether it also works, and you can try that too. We will update tomorrow.

    Regards,

    Adam

  • Hi Wang and Adam,

    Thank you for all the details and files. This is most likely an issue with 9.1, as it is an older version of the TIDL tools with many bugs that the 10.1 version fixes. For accuracy, and especially for quantization, I recommend using 10.1 if possible. Is there any reason why you want to stay with 9.1 instead of moving to 10.1?

    Warm regards,

    Christina

  • Hi Christina,

    I have discussed this topic with the customer. They declined to move to 10.1; you can check with Fredy Zhang (FAE responsible for BYD) for more details.

    Regards,

    Adam

  • Hi, we informed your engineers very early on that the engineering board on our side runs version 9.1, so the 10.1 tools cannot be used.

  • Hi 

    I tested with edgeai tidl tools 09 01 08, and the import process got stuck with the following log:

    (tidl_09_02) ht@ht-OMEN:~/edgeai/edgeai-tidl-tools/examples/osrt_python/ort$ python3 onnxrt_ep.py -c -m hsh_modify
    Available execution providers :  ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
    
    Running 1 Models - ['hsh_modify']
    
    
    Running_Model :  hsh_modify  
    
    
    Running shape inference on model ../../../models/public/hsh_model_modified.onnx 
    
    tidl_tools_path                                 = /home/ht/edgeai/edgeai-tidl-tools/tidl_tools 
    artifacts_folder                                = ../../../model-artifacts//hsh_modify/ 
    tidl_tensor_bits                                = 8 
    debug_level                                     = 5 
    num_tidl_subgraphs                              = 16 
    tidl_denylist                                   = 
    tidl_denylist_layer_name                        = 
    tidl_denylist_layer_type                         = 
    tidl_allowlist_layer_name                        = 
    model_type                                      =  
    tidl_calibration_accuracy_level                 = 7 
    tidl_calibration_options:num_frames_calibration = 2 
    tidl_calibration_options:bias_calibration_iterations = 5 
    mixed_precision_factor = -1.000000 
    model_group_id = 0 
    power_of_2_quantization                         = 2 
    ONNX QDQ Enabled                                = 0 
    enable_high_resolution_optimization             = 0 
    pre_batchnorm_fold                              = 1 
    add_data_convert_ops                          = 3 
    output_feature_16bit_names_list                 =  
    m_params_16bit_names_list                       =  
    reserved_compile_constraints_flag               = 1601 
    ti_internal_reserved_1                          = 
    
    
     ****** WARNING : Network not identified as Object Detection network : (1) Ignore if network is not Object Detection network (2) If network is Object Detection network, please specify "model_type":"OD" as part of OSRT compilation options******
    
    Supported TIDL layer type ---         Reshape -- Reshape_11 
    Supported TIDL layer type ---       Transpose -- Transpose_12 
    Supported TIDL layer type ---         Reshape -- Reshape_21 
    Supported TIDL layer type ---           Slice -- Slice_41 
    Supported TIDL layer type ---           Slice -- Slice_36 
    Supported TIDL layer type ---           Slice -- Slice_31 
    Supported TIDL layer type ---           Slice -- Slice_26 
    Supported TIDL layer type ---          Concat -- Concat_42 
    Supported TIDL layer type ---       Transpose -- Transpose_43 
    Supported TIDL layer type ---            Conv -- Conv_44 
    Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_45 
    Supported TIDL layer type ---            Conv -- Conv_46 
    Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_47 
    Supported TIDL layer type ---         MaxPool -- MaxPool_48 
    Supported TIDL layer type ---            Conv -- Conv_49 
    Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_50 
    Supported TIDL layer type ---            Conv -- Conv_51 
    Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_52 
    Supported TIDL layer type ---         MaxPool -- MaxPool_53 
    Supported TIDL layer type ---            Conv -- Conv_54 
    Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_55 
    Supported TIDL layer type ---            Conv -- Conv_56 
    Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_57 
    Supported TIDL layer type ---         MaxPool -- MaxPool_58 
    Supported TIDL layer type ---            Conv -- Conv_59 
    Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_60 
    Supported TIDL layer type ---            Conv -- Conv_61 
    Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_62 
    Supported TIDL layer type ---         MaxPool -- MaxPool_63 
    Supported TIDL layer type ---            Conv -- Conv_64 
    Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_65 
    Supported TIDL layer type ---            Conv -- Conv_66 
    Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_67 
    Supported TIDL layer type ---   ConvTranspose -- ConvTranspose_68 
    Supported TIDL layer type ---          Concat -- Concat_69 
    Supported TIDL layer type ---            Conv -- Conv_70 
    Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_71 
    Supported TIDL layer type ---            Conv -- Conv_72 
    Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_73 
    Supported TIDL layer type ---   ConvTranspose -- ConvTranspose_74 
    Supported TIDL layer type ---          Concat -- Concat_75 
    Supported TIDL layer type ---            Conv -- Conv_76 
    Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_77 
    Supported TIDL layer type ---            Conv -- Conv_78 
    Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_79 
    Supported TIDL layer type ---   ConvTranspose -- ConvTranspose_80 
    Supported TIDL layer type ---          Concat -- Concat_81 
    Supported TIDL layer type ---            Conv -- Conv_82 
    Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_83 
    Supported TIDL layer type ---            Conv -- Conv_84 
    Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_85 
    Supported TIDL layer type ---   ConvTranspose -- ConvTranspose_86 
    Supported TIDL layer type ---          Concat -- Concat_87 
    Supported TIDL layer type ---            Conv -- Conv_88 
    Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_89 
    Supported TIDL layer type ---            Conv -- Conv_90 
    Supported TIDL layer type ---       LeakyRelu -- LeakyRelu_91 
    Supported TIDL layer type ---            Conv -- Conv_92 
    Supported TIDL layer type ---           Slice -- Slice_112 
    Supported TIDL layer type ---           Slice -- Slice_107 
    Supported TIDL layer type ---           Slice -- Slice_102 
    Supported TIDL layer type ---           Slice -- Slice_97 
    Supported TIDL layer type ---          Concat -- Concat_113 
    Supported TIDL layer type ---       Transpose -- Transpose_117 
    Supported TIDL layer type ---         Reshape -- Reshape_126 
    Supported TIDL layer type ---       Transpose -- Transpose_127 
    Supported TIDL layer type ---         Reshape -- Reshape_143 
    
    Preliminary subgraphs created = 1 
    Final number of subgraphs created are : 1, - Offloaded Nodes - 67, Total Nodes - 67 
    SUGGESTION -- [TIDL_Deconv2DLayer]  Please change to Upsample/Resize if possible. Upsample/Resize will be more efficient.  
    SUGGESTION -- [TIDL_Deconv2DLayer]  Please change to Upsample/Resize if possible. Upsample/Resize will be more efficient.  
    SUGGESTION -- [TIDL_Deconv2DLayer]  Please change to Upsample/Resize if possible. Upsample/Resize will be more efficient.  
    SUGGESTION -- [TIDL_Deconv2DLayer]  Please change to Upsample/Resize if possible. Upsample/Resize will be more efficient.  
    Running runtimes graphviz - /home/ht/edgeai/edgeai-tidl-tools/tidl_tools/tidl_graphVisualiser_runtimes.out ../../../model-artifacts//hsh_modify//allowedNode.txt ../../../model-artifacts//hsh_modify//tempDir/graphvizInfo.txt ../../../model-artifacts//hsh_modify//tempDir/runtimes_visualization.svg 
    *** In TIDL_createStateImportFunc *** 
    Compute on node : TIDLExecutionProvider_TIDL_0_0
      0,         Reshape, 2, 1, noise_img, 58
      1,       Transpose, 1, 1, 58, 59
      2,         Reshape, 2, 1, 59, 68
      3,           Slice, 5, 1, 68, 73
      4,           Slice, 5, 1, 68, 78
      5,           Slice, 5, 1, 68, 83
      6,           Slice, 5, 1, 68, 88
      7,          Concat, 4, 1, 73, 89
      8,       Transpose, 1, 1, 89, 90
      9,            Conv, 3, 1, 90, 91
     10,       LeakyRelu, 1, 1, 91, 92
     11,            Conv, 3, 1, 92, 93
     12,       LeakyRelu, 1, 1, 93, 94
     13,         MaxPool, 1, 1, 94, 95
     14,            Conv, 3, 1, 95, 96
     15,       LeakyRelu, 1, 1, 96, 97
     16,            Conv, 3, 1, 97, 98
     17,       LeakyRelu, 1, 1, 98, 99
     18,         MaxPool, 1, 1, 99, 100
     19,            Conv, 3, 1, 100, 101
     20,       LeakyRelu, 1, 1, 101, 102
     21,            Conv, 3, 1, 102, 103
     22,       LeakyRelu, 1, 1, 103, 104
     23,         MaxPool, 1, 1, 104, 105
     24,            Conv, 3, 1, 105, 106
     25,       LeakyRelu, 1, 1, 106, 107
     26,            Conv, 3, 1, 107, 108
     27,       LeakyRelu, 1, 1, 108, 109
     28,         MaxPool, 1, 1, 109, 110
     29,            Conv, 3, 1, 110, 111
     30,       LeakyRelu, 1, 1, 111, 112
     31,            Conv, 3, 1, 112, 113
     32,       LeakyRelu, 1, 1, 113, 114
     33,   ConvTranspose, 3, 1, 114, 115
     34,          Concat, 2, 1, 115, 116
     35,            Conv, 3, 1, 116, 117
     36,       LeakyRelu, 1, 1, 117, 118
     37,            Conv, 3, 1, 118, 119
     38,       LeakyRelu, 1, 1, 119, 120
     39,   ConvTranspose, 3, 1, 120, 121
     40,          Concat, 2, 1, 121, 122
     41,            Conv, 3, 1, 122, 123
     42,       LeakyRelu, 1, 1, 123, 124
     43,            Conv, 3, 1, 124, 125
     44,       LeakyRelu, 1, 1, 125, 126
     45,   ConvTranspose, 3, 1, 126, 127
     46,          Concat, 2, 1, 127, 128
     47,            Conv, 3, 1, 128, 129
     48,       LeakyRelu, 1, 1, 129, 130
     49,            Conv, 3, 1, 130, 131
     50,       LeakyRelu, 1, 1, 131, 132
     51,   ConvTranspose, 3, 1, 132, 133
     52,          Concat, 2, 1, 133, 134
     53,            Conv, 3, 1, 134, 135
     54,       LeakyRelu, 1, 1, 135, 136
     55,            Conv, 3, 1, 136, 137
     56,       LeakyRelu, 1, 1, 137, 138
     57,            Conv, 3, 1, 138, 139
     58,           Slice, 5, 1, 139, 144
     59,           Slice, 5, 1, 139, 149
     60,           Slice, 5, 1, 139, 154
     61,           Slice, 5, 1, 139, 159
     62,          Concat, 4, 1, 144, 160
     63,       Transpose, 1, 1, 160, 164
     64,         Reshape, 2, 1, 164, 173
     65,       Transpose, 1, 1, 173, 174
     66,         Reshape, 2, 1, 174, sharp_output
    
    Input tensor name -  noise_img 
    Output tensor name - sharp_output 
    
    
    

    It seems that there are some issues with TIDL version 9.1. We therefore propose a TIDL backport, which keeps your SDK 9.1 but uses TIDL 10.1.
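
    One thing the log does show is that the whole graph was offloaded: 67 of 67 nodes in a single subgraph. A small sketch for pulling that summary out of a saved compile log is given below. The function name parse_offload_summary is an assumption, and the pattern only matches the exact summary line printed in the log above.

```python
import re

# Matches the summary line printed by the compile step in the log above.
_SUMMARY = re.compile(
    r"Final number of subgraphs created are : (\d+), "
    r"- Offloaded Nodes - (\d+), Total Nodes - (\d+)"
)


def parse_offload_summary(log_text):
    """Return subgraph/offload counts from a compile log, or None if not found."""
    match = _SUMMARY.search(log_text)
    if match is None:
        return None
    subgraphs, offloaded, total = (int(group) for group in match.groups())
    return {
        "subgraphs": subgraphs,
        "offloaded": offloaded,
        "total": total,
        "fully_offloaded": offloaded == total,
    }


if __name__ == "__main__":
    line = ("Final number of subgraphs created are : 1, "
            "- Offloaded Nodes - 67, Total Nodes - 67")
    print(parse_offload_summary(line))
```

    With every node offloaded, CPU fallback for unsupported layers does not appear to be a factor in this run.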

    Regards,

    Adam

  • Could you elaborate on what the problem is?

  • Hi 

    As we discussed locally, the status of the issue is:

    1. This model does not work correctly on the 9.1 SDK but works on 10.1.

    2. The customer will evaluate this model on the EVM with 10.1.

    3. If the evaluation on the EVM with 10.1 works well, the customer will backport TIDL 10.1 to 9.1.

    Regards,

    Adam

  • import onnxruntime as rt
    import time
    import os
    import sys
    import numpy as np
    import PIL
    from PIL import Image, ImageFont, ImageDraw, ImageEnhance
    import argparse
    import re
    import multiprocessing
    import platform
    import shutil
    
    current = os.path.dirname(os.path.realpath(__file__))
    parent = os.path.dirname(current)
    
    sys.path.append(parent)
    from common_utils import *
    from model_configs import *
    
    from common import postprocess_utils as formatter_transform
    
    mutex_lock = multiprocessing.Lock()
    
    model_optimizer_found = False
    if platform.machine() != "aarch64":
        try:
            from osrt_model_tools.onnx_tools.tidl_onnx_model_optimizer import optimize
    
            model_optimizer_found = True
        except ModuleNotFoundError as e:
            print("Skipping import of model optimizer")
    
    required_options = {
        "tidl_tools_path": tidl_tools_path,
        "artifacts_folder": artifacts_folder,
    }
    
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-c", "--compile", action="store_true", help="Run in Model compilation mode"
    )
    parser.add_argument(
        "-d", "--disable_offload", action="store_true", help="Disable offload to TIDL"
    )
    parser.add_argument(
        "-z", "--run_model_zoo", action="store_true", help="Run model zoo models"
    )
    parser.add_argument(
        "-o",
        "--graph_optimize",
        action="store_true",
        help="Run ONNX model optimization through onnx-graph-surgeon-tidl",
    )
    parser.add_argument(
        "-m",
        "--models",
        action="append",
        default=[],
        help="Model name to be added to the list to run",
    )
    parser.add_argument(
        "-n", "--ncpus", type=int, default=None, help="Number of threads to spawn"
    )
    args = parser.parse_args()
    os.environ["TIDL_RT_PERFSTATS"] = "1"
    
    # Ort Session Options
    so = rt.SessionOptions()
    so.log_severity_level = 3
    
    print("Available execution providers : ", rt.get_available_providers())
    
    calib_images = [
        "../../../test_data/airshow.jpg",
        "../../../test_data/ADE_val_00001801.jpg",
    ]
    class_test_images = ["../../../test_data/airshow.jpg"]
    od_test_images = ["../../../test_data/ADE_val_00001801.jpg"]
    seg_test_images = ["../../../test_data/ADE_val_00001801.jpg"]
    
    # Initialize semaphore for multi-threading
    sem = multiprocessing.Semaphore(0)
    if platform.machine() == "aarch64":
        ncpus = 1
    else:
        if args.ncpus and args.ncpus > 0 and args.ncpus < os.cpu_count():
            ncpus = args.ncpus
        else:
            ncpus = os.cpu_count()
    
    idx = 0
    nthreads = 0
    run_count = 0
    
    if "SOC" in os.environ:
        SOC = os.environ["SOC"]
    else:
        print("Please export SOC var to proceed")
        exit(-1)
    
    # Enforce compilation on x86 only
    if platform.machine() == "aarch64" and args.compile == True:
        print(
            "Compilation of models is only supported on x86 machine \n\
            Please do the compilation on PC and copy artifacts for running on TIDL devices "
        )
        exit(-1)
    
    # Disable compilation and offload for AM62 (ARM only analytics)
    if SOC == "am62":
        args.disable_offload = True
        args.compile = False
    
    def get_benchmark_output(interpreter):
        '''
        Returns benchmark data
    
        :param interpreter: Runtime session
        :return: Copy time
        :return: Processing time
        :return: Total time
        '''
        benchmark_dict = interpreter.get_TI_benchmark_data()
        proc_time = copy_time = 0
        cp_in_time = cp_out_time = 0
        subgraphIds = []
        for stat in benchmark_dict.keys():
            if "proc_start" in stat:
                value = stat.split("ts:subgraph_")
                value = value[1].split("_proc_start")
                subgraphIds.append(value[0])
        for i in range(len(subgraphIds)):
            proc_time += (
                benchmark_dict["ts:subgraph_" + str(subgraphIds[i]) + "_proc_end"]
                - benchmark_dict["ts:subgraph_" + str(subgraphIds[i]) + "_proc_start"]
            )
            cp_in_time += (
                benchmark_dict["ts:subgraph_" + str(subgraphIds[i]) + "_copy_in_end"]
                - benchmark_dict["ts:subgraph_" + str(subgraphIds[i]) + "_copy_in_start"]
            )
            cp_out_time += (
                benchmark_dict["ts:subgraph_" + str(subgraphIds[i]) + "_copy_out_end"]
                - benchmark_dict["ts:subgraph_" + str(subgraphIds[i]) + "_copy_out_start"]
            )
            copy_time += cp_in_time + cp_out_time
        copy_time = copy_time if len(subgraphIds) == 1 else 0
        totaltime = benchmark_dict["ts:run_end"] - benchmark_dict["ts:run_start"]
        return copy_time, proc_time, totaltime
    
    
    def infer_image(sess, image_files, config):
        '''
        Invoke the runtime session
    
        :param sess: Runtime session
        :param image_files: List of input image filename
        :param config: Configuration dictionary
        :return: Input Images
        :return: Output tensors
        :return: Total Processing time
        :return: Subgraphs Processing time
        :return: Height of input tensor
        :return: Width of input tensor
        '''
    
        # Get input details from the session
        input_details = sess.get_inputs()
        input_name = input_details[0].name
        floating_model = input_details[0].type == "tensor(float)"
        height = input_details[0].shape[2]
        width = input_details[0].shape[3]
        channel = input_details[0].shape[1]
        batch = input_details[0].shape[0]
        imgs = []
        shape = [batch, channel, height, width]
        input_shape = input_details[0].shape
        input_data = np.random.random(input_shape).astype(np.float32) * (1 - 0) 
        print(len(input_details))
        if len(input_details)>1:
            print(len(input_details))
            input_name2 = input_details[1].name
            if input_details[1].type == "tensor(float)":
                input_data2 = np.random.random(input_details[1].shape).astype(np.float32) * (1 - 0) 
            else:
                input_data2 = np.random.randint(0, 2000, size=input_details[1].shape).astype(np.int32)
        # Prepare the input data
        # input_data = np.zeros(shape)
        # for i in range(batch):
        #     imgs.append(
        #         Image.open(image_files[i])
        #         .convert("RGB")
        #         .resize((width, height), PIL.Image.LANCZOS)
        #     )
        #     temp_input_data = np.expand_dims(imgs[i], axis=0)
        #     temp_input_data = np.transpose(temp_input_data, (0, 3, 1, 2))
        #     input_data[i] = temp_input_data[0]
        # if floating_model:
        #     input_data = np.float32(input_data)
        #     for mean, scale, ch in zip(
        #         config["session"]["input_mean"],
        #         config["session"]["input_scale"],
        #         range(input_data.shape[1]),
        #     ):
        #         input_data[:, ch, :, :] = (input_data[:, ch, :, :] - mean) * scale
        # else:
        #     input_data = np.uint8(input_data)
        #     config["session"]["input_mean"] = [0, 0, 0]
        #     config["session"]["input_scale"] = [1, 1, 1]
    
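        # Load the raw test frame: 1056x1920 uint16 samples from a local file.
        # Dividing by 4096 (2**12) scales the data to [0, 1) if the samples are 12-bit (assumed).
        data = np.fromfile("/home/ht/customer/BYD/1425.model/0.bin", dtype=np.uint16).astype(np.float32)
        raw_vis = data.reshape(1056, 1920) / 4096
        print(raw_vis)
        # Overwrite the random input with the real frame (batch 0, channel 0)
        input_data[0,0,:,:] = raw_vis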
        # Invoke the session
        start_time = time.time()
        
        if len(input_details)>1:
            print(len(input_details))
            output = list(sess.run(None, {input_name: input_data, input_name2: input_data2}))
        else:
            output = list(sess.run(None, {input_name: input_data}))
        stop_time = time.time()
        infer_time = stop_time - start_time
    
        copy_time, sub_graphs_proc_time, totaltime = get_benchmark_output(sess)
        proc_time = totaltime - copy_time
    
        return imgs, output, proc_time, sub_graphs_proc_time, height, width
    
    
    def run_model(model, mIdx):
        '''
        Run a single model
    
        :param model: Name of the model
        :param mIdx: Run number
        '''
        print("\nRunning_Model : ", model, " \n")
        if platform.machine() != "aarch64":
            mutex_lock.acquire()
            download_model(models_configs, model)
            mutex_lock.release()
    
        config = models_configs[model]
    
        # Run graph optimization
        if args.graph_optimize:
            if model_optimizer_found:
                if (args.compile or args.disable_offload) and (
                    platform.machine() != "aarch64"
                ):
                    copy_path = config["model_path"][:-5] + "_org.onnx"
                    # Check if copy path exists and prompt for permission to overwrite
                    if os.path.isfile(copy_path):
                        overwrite_permission = input(
                            f"\033[96mThe file {copy_path} exists, do you want to overwrite? [Y/n] \033[00m"
                        )
                        if overwrite_permission != "Y":
                            print("Aborting run...")
                            sys.exit(-1)
                        else:
                            print(
                                f"\033[93m[WARNING] File {copy_path} will be overwritten\033[00m"
                            )
    
                    shutil.copy2(config["model_path"], copy_path)
                    print(
                        f"\033[93mOptimization Enabled: Moving {config['model_path']} to {copy_path} before overwriting by optimization\033[00m"
                    )
                    optimize(
                        model=config["model_path"], out_model=config["model_path"]
                    )
                else:
                    print(
                        "Model optimization is only supported in compilation or disabled offload mode on x86 machines"
                    )
            else:
                print("Model optimizer not found, -o flag has no effect")
    
        # Set input images
        config = models_configs[model]
        if config["task_type"] == "classification":
            test_images = class_test_images
        elif config["task_type"] == "detection":
            test_images = od_test_images
        elif config["task_type"] == "segmentation":
            test_images = seg_test_images
        
        # Set delegate options 
        delegate_options = {}
        delegate_options.update(required_options)
        delegate_options.update(optional_options)
        if "optional_options" in config:
            delegate_options.update(config["optional_options"])
    
        delegate_options["artifacts_folder"] = (
            delegate_options["artifacts_folder"] + "/" + model + "/artifacts"
        )
    
        # Disabling onnxruntime optimizations for vision transformers
        if model == "cl-ort-deit-tiny":
            so.graph_optimization_level = rt.GraphOptimizationLevel.ORT_DISABLE_ALL
    
        if config["task_type"] == "detection":
            delegate_options["object_detection:meta_layers_names_list"] = config["session"].get("meta_layers_names_list", "")
            delegate_options["object_detection:meta_arch_type"] = config["session"].get("meta_arch_type", -1)
    
        # Create/Cleanup artifacts_folder
        if args.compile or args.disable_offload:
            os.makedirs(delegate_options["artifacts_folder"], exist_ok=True)
            for root, dirs, files in os.walk(
                delegate_options["artifacts_folder"], topdown=False
            ):
                [os.remove(os.path.join(root, f)) for f in files]
                [os.rmdir(os.path.join(root, d)) for d in dirs]
    
        if args.compile == True:
            input_image = calib_images
            import onnx
    
            log = f'\nRunning shape inference on model {config["session"]["model_path"]} \n'
            print(log)
    
            # Run shape inference on the model
            onnx.shape_inference.infer_shapes_path(
                config["session"]["model_path"], config["session"]["model_path"]
            )
        else:
            input_image = test_images
    
        numFrames = config["extra_info"]["num_images"]
        if args.compile:
            if numFrames > delegate_options["advanced_options:calibration_frames"]:
                numFrames = delegate_options["advanced_options:calibration_frames"]
    
        # Create the Inference Session
        if args.disable_offload:
            # Using default EP if offload is disabled
            EP_list = ["CPUExecutionProvider"]
            sess = rt.InferenceSession(
                config["session"]["model_path"], providers=EP_list, sess_options=so
            )
        elif args.compile:
            # Using TIDL Compilation Provider if compiling the model
            EP_list = ["TIDLCompilationProvider", "CPUExecutionProvider"]
            sess = rt.InferenceSession(
                config["session"]["model_path"],
                providers=EP_list,
                provider_options=[delegate_options, {}],
                sess_options=so,
            )
        else:
            # Using TIDL Execution Provider if running the inference
            EP_list = ["TIDLExecutionProvider", "CPUExecutionProvider"]
            sess = rt.InferenceSession(
                config["session"]["model_path"],
                providers=EP_list,
                provider_options=[delegate_options, {}],
                sess_options=so,
            )
    
        # Adding input_details and output_details to configuration
        input_details = sess.get_inputs()
        input_name = input_details[0].name
        type = input_details[0].type
        height = input_details[0].shape[2]
        width = input_details[0].shape[3]
        channel = input_details[0].shape[1]
        batch = input_details[0].shape[0]
        shape = [batch, channel, height, width]
        input_details = {"name": input_name, "shape": shape, "type": type}
    
        output_details = sess.get_outputs()
        output_name = output_details[0].name
        type = output_details[0].type
        num_class = output_details[0].shape[1]
        batch = output_details[0].shape[0]
        shape = [batch, num_class]
        output_details = {"name": output_name, "shape": shape, "type": type}
    
        config["session"]["input_details"] = [input_details]
        config["session"]["output_details"] = [output_details]
    
        # Set the formatter for post-processing
        if "formatter" in config["postprocess"]:
            formatter = config["postprocess"]["formatter"]
            if isinstance(formatter, str):
                formatter_name = formatter
                formatter = getattr(formatter_transform, formatter_name)()
            elif isinstance(formatter, dict) and "type" in formatter:
                formatter_name = formatter.pop("type")
                formatter = getattr(formatter_transform, formatter_name)(**formatter)
            config["postprocess"]["formatter"] = formatter
    
        for i in range(numFrames):
            start_index = i % len(input_image)
            input_details = sess.get_inputs()
            batch = input_details[0].shape[0]
    
            input_images = []
            # For batch processing different images are needed for a single input
            for j in range(batch):
                input_images.append(input_image[(start_index + j) % len(input_image)])
    
            # Invoke the session
            imgs, output, proc_time, sub_graph_time, height, width = infer_image(sess, input_images, config)
    
            total_proc_time = (
                total_proc_time + proc_time
                if ("total_proc_time" in locals())
                else proc_time
            )
            sub_graphs_time = (
                sub_graphs_time + sub_graph_time
                if ("sub_graphs_time" in locals())
                else sub_graph_time
            )
    
        total_proc_time = total_proc_time / 1000000
        sub_graphs_time = sub_graphs_time / 1000000
    
        # Post-Processing for inference
        output_image_file_name = "py_out_" + model + "_" + os.path.basename(input_image[i % len(input_image)])
        output_bin_file_name = output_image_file_name.replace(".jpg", "") + ".bin"
        
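        # Save each raw output tensor as <model><index>.npy for offline comparison
        os.makedirs(output_binary_folder, exist_ok=True)
        for i in range(len(output)):
            np.save(os.path.join(output_binary_folder, model + str(i) + ".npy"), output[i])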
        # if args.compile == False:
        #     images = []
        #     output_tensors = []
        #     if config["task_type"] == "classification":
        #         for j in range(batch):
        #             classes, image = get_class_labels(output[0][j], imgs[j])
        #             print("\n", classes)
        #             images.append(image)
        #             output_tensors.append(
        #                 np.array(output[0][j], dtype=np.float32).flatten()
        #             )
        #     elif config["task_type"] == "detection":
        #         for j in range(batch):
        #             classes, image = det_box_overlay(
        #                 output,
        #                 imgs[j],
        #                 config["extra_info"]["od_type"],
        #                 config["extra_info"]["framework"],
        #             )
        #             images.append(image)
        #             output_np = np.array([], dtype=np.float32)
        #             for tensor in output:
        #                 output_np = np.concatenate(
        #                     (output_np, np.array(tensor, dtype=np.float32).flatten())
        #                 )
        #             output_tensors.append(output_np)
        #     elif config["task_type"] == "segmentation":
        #         for j in range(batch):
        #             imgs[j] = imgs[j].resize(
        #                 (output[0][j].shape[-1], output[0][j].shape[-2]), PIL.Image.LANCZOS
        #             )
        #             classes, image = seg_mask_overlay(output[0][j], imgs[j])
        #             images.append(image)
        #             output_tensors.append(
        #                 np.array(output[0][j], dtype=np.float32).flatten()
        #             )
        #     else:
        #         print("\nInvalid task type ", config["task_type"])
    
        #     # Save the output images and output tensors
        #     for j in range(batch):
        #         output_image_file_name = "py_out_" + model + "_" + os.path.basename(input_images[j])
        #         print("\nSaving image to ", output_images_folder)
        #         if not os.path.exists(output_images_folder):
        #             os.makedirs(output_images_folder)
        #         images[j].save(output_images_folder + output_image_file_name, "JPEG")
        #         print("\nSaving output tensor to ", output_binary_folder)
        #         if not os.path.exists(output_binary_folder):
        #             os.makedirs(output_binary_folder)
        #         output_bin_file_name = output_image_file_name.replace(".jpg", "") + ".bin"
        #         output_tensors[j].tofile(output_binary_folder + output_bin_file_name)
    
        # Generate param.yaml after model compilation
        # if args.compile or args.disable_offload:
        #     gen_param_yaml(
        #         delegate_options["artifacts_folder"], config, int(height), int(width)
        #     )
    
        log = f"\n \nCompleted_Model : {mIdx+1:5d}, Name : {model:50s}, Total time : {total_proc_time/(i+1):10.2f}, Offload Time : {sub_graphs_time/(i+1):10.2f} , DDR RW MBs : 0, Output Image File : {output_image_file_name}, Output Bin File : {output_bin_file_name}\n \n "  # {classes} \n \n'
        print(log)
        if ncpus > 1:
            sem.release()
    
    
    if len(args.models) > 0:
        models = args.models
    else:
        models = ["cl-ort-resnet18-v1", "od-ort-ssd-lite_mobilenetv2_fpn"]
        if SOC == "am69a":
            # Model to demonstrate multi core parallel batch processing
            models.append("cl-ort-resnet18-v1_4batch")
            # Model to demonstrate multi core low latency inference
            models.append("cl-ort-resnet18-v1_low_latency")
        if SOC not in ("am62a", "am67a"):
            models.append("ss-ort-deeplabv3lite_mobilenetv2")
    if args.run_model_zoo:
        models = [
            "od-8020_onnxrt_coco_edgeai-mmdet_ssd_mobilenetv2_lite_512x512_20201214_model_onnx",
            "od-8200_onnxrt_coco_edgeai-mmdet_yolox_nano_lite_416x416_20220214_model_onnx",
            "ss-8610_onnxrt_ade20k32_edgeai-tv_deeplabv3plus_mobilenetv2_edgeailite_512x512_20210308_outby4_onnx",
            "od-8220_onnxrt_coco_edgeai-mmdet_yolox_s_lite_640x640_20220221_model_onnx",
            "cl-6360_onnxrt_imagenet1k_fbr-pycls_regnetx-200mf_onnx",
        ]
    log = f"\nRunning {len(models)} Models - {models}\n"
    print(log)
    
    
    def join_one(nthreads):
        '''
        Join the thread
    
        :param nthreads: Thread count
        '''
        global run_count
        sem.acquire()
        run_count = run_count + 1
        return nthreads - 1
    
    def spawn_one(models, idx, nthreads):
        '''
        Spawn a process
    
        :param models: Name of the model to run
        :param idx: Index
        :param nthreads: Thread count
        '''
        p = multiprocessing.Process(
            target=run_model,
            args=(
                models,
                idx,
            ),
        )
        p.start()
        return idx + 1, nthreads + 1
    
    
    # Run the models using multi-processing if possible
    if ncpus > 1:
        for t in range(min(len(models), ncpus)):
            idx, nthreads = spawn_one(models[idx], idx, nthreads)
    
        while idx < len(models):
            nthreads = join_one(nthreads)
            idx, nthreads = spawn_one(models[idx], idx, nthreads)
    
        for n in range(nthreads):
            nthreads = join_one(nthreads)
    else:
        for mIdx, model in enumerate(models):
            run_model(model, mIdx)

    Please check the script I used. I modified the input and post-processing steps to read in your input files.
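
    The script saves each raw output tensor as a `.npy` file, so two runs (for example the float ONNX run and the 8-bit run) can be compared offline. The following is a minimal sketch of such a comparison, not part of the original script; the helper name `compare_tensors` and the chosen metrics are assumptions made for illustration.

```python
import numpy as np


def compare_tensors(reference, candidate):
    """Hypothetical helper: simple error metrics between two same-shaped arrays."""
    ref = np.asarray(reference, dtype=np.float64).ravel()
    cand = np.asarray(candidate, dtype=np.float64).ravel()
    if ref.shape != cand.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {cand.shape}")
    if ref.size == 0:
        raise ValueError("empty tensors cannot be compared")

    abs_err = np.abs(cand - ref)
    norm_product = np.linalg.norm(ref) * np.linalg.norm(cand)
    # Cosine similarity is undefined when either tensor is all zeros.
    cosine = float(np.dot(ref, cand) / norm_product) if norm_product > 0 else float("nan")
    return {
        "max_abs_err": float(abs_err.max()),
        "mean_abs_err": float(abs_err.mean()),
        "cosine_similarity": cosine,
    }


if __name__ == "__main__":
    # Synthetic data stands in for the real dumps in this demo.
    rng = np.random.default_rng(0)
    ref = rng.random((1, 8, 8)).astype(np.float32)
    cand = ref + 0.01
    print(compare_tensors(ref, cand))
```

    For real dumps, load each file with `np.load(...)` and pass the two arrays to the helper. The file locations depend on where each run wrote its output folder.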

    Here are the changes I added to the model configs:

        "hsh_modify": create_model_config(
            source=AttrDict(
                model_url="dummy",
                infer_shape=True,
            ),
            preprocess=AttrDict(
                resize=256,
                crop=224,
                data_layout="NCHW",
                resize_with_pad=False,
                reverse_channels=False,
            ),
            session=AttrDict(
                session_name="onnxrt",
                model_path=os.path.join(models_base_path, "hsh_model_modified.onnx"),
                input_mean=[123.675, 116.28, 103.53],
                input_scale=[0.017125, 0.017507, 0.017429],
                input_optimization=True,
            ),
            task_type="classification",
            extra_info=AttrDict(num_images=numImages, num_classes=1000),
        ),
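
    To make the `input_mean` and `input_scale` entries concrete, here is a small sketch of the arithmetic they usually describe. It assumes the common convention that each channel is normalized as `(pixel - mean) * scale`; the helper name `normalize_nchw` is hypothetical. Whether this step runs in the script or elsewhere depends on the tool version and options, so treat it only as an illustration.

```python
import numpy as np

# Values copied from the "hsh_modify" session config.
INPUT_MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
INPUT_SCALE = np.array([0.017125, 0.017507, 0.017429], dtype=np.float32)


def normalize_nchw(image_hwc):
    """Hypothetical helper: HxWx3 image -> 1x3xHxW float32 tensor.

    Assumes per-channel normalization of the form (pixel - mean) * scale.
    """
    img = np.asarray(image_hwc, dtype=np.float32)
    if img.ndim != 3 or img.shape[2] != 3:
        raise ValueError(f"expected an HxWx3 image, got shape {img.shape}")
    # Mean and scale broadcast over the last (channel) axis.
    img = (img - INPUT_MEAN) * INPUT_SCALE
    # Reorder HWC -> CHW and add a batch dimension to match data_layout="NCHW".
    return np.transpose(img, (2, 0, 1))[np.newaxis, ...]


if __name__ == "__main__":
    dummy = np.zeros((224, 224, 3), dtype=np.uint8)  # stand-in for a cropped image
    tensor = normalize_nchw(dummy)
    print(tensor.shape, tensor.dtype)
```

    The scale values are approximately the reciprocals of the standard deviations 58.395, 57.12 and 57.375, so the same step can also be written as `(pixel - mean) / std`.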

    Regards,

    Adam

  • Hi Christina,

    Current status:

    The customer is trying to reproduce my result with edgeai-tidl-tools 10.01.00.02.

    If they evaluate their model successfully on SDK 10.1, they will try TIDL backporting.

    Regards,

    Adam

  • Thank you Adam for all the updates. Don't hesitate to reach out if there is anything that may be needed to help. 

    Warm regards,

    Christina

  • Hi,

    You mentioned that inference gets stuck on the SoC. Could you upload your inference log here? I will also try to reproduce the problem today.

    Regards,

    Adam

  • Hi 

    I am able to reproduce this issue on sk-am68a. Here is the inference log:

    root@am68a-sk:/opt/edgeai/edgeai-tidl-tools/examples/osrt_python/ort# python3 onnxrt_ep_no_post.py -m hsh_modify
    Available execution providers :  ['TIDLExecutionProvider', 'TIDLCompilationProvider', 'CPUExecutionProvider']
    
    Running 1 Models - ['hsh_modify']
    
    
    Running_Model :  hsh_modify  
    
    libtidl_onnxrt_EP loaded 0x111455a0 
    artifacts_folder                                = ../../../model-artifacts//hsh_modify/artifacts 
    debug_level                                     = 2 
    target_priority                                 = 0 
    max_pre_empt_delay                              = 340282346638528859811704183484516925440.000000 
    Final number of subgraphs created are : 1, - Offloaded Nodes - 67, Total Nodes - 67 
    In TIDL_createStateInfer 
    Compute on node : TIDLExecutionProvider_TIDL_0_0
    ************ in TIDL_subgraphRtCreate ************ 
     APP: Init ... !!!
       522.798054 s: MEM: Init ... !!!
       522.798124 s: MEM: Initialized DMA HEAP (fd=5) !!!
       522.798281 s: MEM: Init ... Done !!!
       522.798306 s: IPC: Init ... !!!
       522.856548 s: IPC: Init ... Done !!!
    REMOTE_SERVICE: Init ... !!!
    REMOTE_SERVICE: Init ... Done !!!
       522.863889 s: GTC Frequency = 200 MHz
    APP: Init ... Done !!!
       522.866517 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_ERROR
       522.866555 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_WARNING
       522.866566 s:  VX_ZONE_INFO: Globally Enabled VX_ZONE_INFO
       522.871411 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:134] Added target MPU-0 
       522.871571 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:134] Added target MPU-1 
       522.871707 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:134] Added target MPU-2 
       522.871839 s:  VX_ZONE_INFO: [tivxPlatformCreateTargetId:134] Added target MPU-3 
       522.871856 s:  VX_ZONE_INFO: [tivxInitLocal:126] Initialization Done !!!
       522.871868 s:  VX_ZONE_INFO: Globally Disabled VX_ZONE_INFO
    [C7x_1 ]    522.913392 s: PREEMPTION: Requesting memory of size 3014656 for targetPriority = 256
    [C7x_1 ]    522.913417 s: 
    [C7x_1 ]    522.913437 s: --------------------------------------------
    [C7x_1 ]    522.913463 s: TIDL Memory size requiement (record wise):
    [C7x_1 ]    522.913503 s: MemRecNum   , Space               , Attribute   , Alignment   , Size(KBytes), BasePtr     
    [C7x_1 ]    522.913548 s: 0           , DDR Cacheable       , Persistent  ,  128, 19.27   , 0x00000000
    [C7x_1 ]    522.913591 s: 1           , DDR Cacheable       , Persistent  ,  128, 0.65    , 0x00000000
    [C7x_1 ]    522.913633 s: 2           , L1D                 , Scratch     ,  128, 16.00   , 0x00000000
    [C7x_1 ]    522.913673 s: 3           , L2                  , Scratch     ,  128, 448.00  , 0x00000000
    [C7x_1 ]    522.913714 s: 4           , L3/MSMC             , Scratch     ,  128, 2944.00 , 0x00000000
    [C7x_1 ]    522.913755 s: 5           , DDR Cacheable       , Persistent  ,  128, 1017.87 , 0x00000000
    [C7x_1 ]    522.913796 s: 6           , DDR Cacheable       , Scratch     ,  128, 9.00    , 0x00000000
    [C7x_1 ]    522.913837 s: 7           , DDR Cacheable       , Persistent  ,  128, 76965.25, 0x00000000
    [C7x_1 ]    522.913878 s: 8           , DDR Cacheable       , Scratch     ,  128, 0.13    , 0x00000000
    [C7x_1 ]    522.913918 s: 9           , DDR Cacheable       , Scratch     ,  128, 3.13    , 0x00000000
    [C7x_1 ]    522.913959 s: 10          , DDR Cacheable       , Persistent  ,  128, 908.39  , 0x00000000
    [C7x_1 ]    522.914000 s: 11          , DDR Cacheable       , Scratch     ,  128, 512.25  , 0x00000000
    [C7x_1 ]    522.914041 s: 12          , DDR Cacheable       , Persistent  ,  128, 2944.00 , 0x00000000
    [C7x_1 ]    522.914082 s: 13          , DDR Cacheable       , Persistent  ,  128, 1482.07 , 0x00000000
    [C7x_1 ]    522.914122 s: 14          , DDR Cacheable       , Persistent  ,  128, 0.00    , 0x00000000
    [C7x_1 ]    522.914166 s: 15          , DDR Cacheable       , Persistent  ,  128, 7589.00 , 0x00000000
    [C7x_1 ]    522.914197 s: --------------------------------------------
    [C7x_1 ]    522.914222 s: Total memory size requirement (space wise):
    [C7x_1 ]    522.914243 s: Mem Space , Size(KBytes)
    [C7x_1 ]    522.914261 s: L1D       , 16.00   
    [C7x_1 ]    522.914277 s: L2        , 448.00  
    [C7x_1 ]    522.914294 s: L3/MSMC   , 2944.00 
    [C7x_1 ]    522.914313 s: DDR Cacheable, 91451.01
    [C7x_1 ]    522.914335 s: --------------------------------------------
    [C7x_1 ]    522.914370 s: NOTE: Memory requirement in host emulation can be different from the same on EVM
    [C7x_1 ]    522.914409 s:       To get the actual TIDL memory requirement make sure to run on EVM with 
    [C7x_1 ]    522.914432 s:       debugTraceLevel = 2
    [C7x_1 ]    522.914441 s: 
    [C7x_1 ]    522.914461 s: --------------------------------------------
    [C7x_1 ]    522.915957 s: TIDL init call from ivision API 
    [C7x_1 ]    522.915975 s: 
    [C7x_1 ]    522.915992 s: --------------------------------------------
    [C7x_1 ]    522.916017 s: TIDL Memory size requiement (record wise):
    [C7x_1 ]    522.916057 s: MemRecNum   , Space               , Attribute   , Alignment   , Size(KBytes), BasePtr     
    [C7x_1 ]    522.916101 s: 0           , DDR Cacheable       , Persistent  ,  128, 19.27   , 0x17027000
    [C7x_1 ]    522.916142 s: 1           , DDR Cacheable       , Persistent  ,  128, 0.65    , 0x1702bf00
    [C7x_1 ]    522.916188 s: 2           , L1D                 , Scratch     ,  128, 16.00   , 0x64e00000
    [C7x_1 ]    522.916229 s: 3           , L2                  , Scratch     ,  128, 448.00  , 0x64800000
    [C7x_1 ]    522.916270 s: 4           , L3/MSMC             , Scratch     ,  128, 2944.00 , 0x70020000
    [C7x_1 ]    522.916312 s: 5           , DDR Cacheable       , Persistent  ,  128, 1017.87 , 0x1702c300
    [C7x_1 ]    522.916352 s: 6           , DDR Cacheable       , Scratch     ,  128, 9.00    , 0x00000000
    RT-Profile: TIDLRT_init_profiling 
    [C7x_1 ]    522.916393 s: 7           , DDR Cacheable       , Persistent  ,  128, 76965.25, 0x1712ac00
    tidlrt_create            :      193225336 ns,
    tidl_rt_ovx_Init         :       78126339 ns,
    vxCreateContext          :        2234285 ns,
    init_tidl_tiovx          :       10836423 ns,
    [C7x_1 ]    522.916433 s: 8           , DDR Cacheable       , Scratch     ,  128, 0.13    , 0x00002400
    create_graph_tidl_tiovx  :       14287798 ns,
    verify_graph_tidl_tiovx  :       86185671 ns,
    tivxTIDLLoadKernels      :          25890 ns,
    mapConfig                :         472790 ns,
    [C7x_1 ]    522.916473 s: 9           , DDR Cacheable       , Scratch     ,  128, 3.13    , 0x00002800
    tivxAddKernelTIDL        :          47890 ns,
    mapNetwork               :        9763218 ns,
    setCreateParams          :         235625 ns,
    setArgs                  :         288680 ns,
    [C7x_1 ]    522.916513 s: 10          , DDR Cacheable       , Persistent  ,  128, 908.39  , 0x1bc54200
    vxCreateUserDataObject   :          26525 ns,
    vxMapUserDataObject      :        6776062 ns,
    memcopy_network_buffer   :        2954126 ns,
    vxUnmapUserDataObject    :           4480 ns,
    [C7x_1 ]    522.916555 s: 11          , DDR Cacheable       , Scratch     ,  128, 512.25  , 0x00003800
    ************ TIDL_subgraphRtCreate done ************ 
    [C7x_1 ]    522.916597 s: 12          , DDR Cacheable       , Persistent  ,  128, 2944.00 , 0x1bd37500
    [C7x_1 ]    522.916638 s: 13          , DDR Cacheable       , Persistent  ,  128, 1482.07 , 0x1c017600
    [C7x_1 ]    522.916678 s: 14          , DDR Cacheable       , Persistent  ,  128, 0.00    , 0x1c18a000
    [C7x_1 ]    522.916719 s: 15          , DDR Cacheable       , Persistent  ,  128, 7589.00 , 0x1c18a200
    [C7x_1 ]    522.916748 s: --------------------------------------------
    [C7x_1 ]    522.916773 s: Total memory size requirement (space wise):
    [C7x_1 ]    522.916792 s: Mem Space , Size(KBytes)
    [C7x_1 ]    522.916810 s: L1D       , 16.00   
    [C7x_1 ]    522.916826 s: L2        , 448.00  
    [C7x_1 ]    522.916843 s: L3/MSMC   , 2944.00 
    [C7x_1 ]    522.916861 s: DDR Cacheable, 91451.01
    [C7x_1 ]    522.916883 s: --------------------------------------------
    [C7x_1 ]    522.916918 s: NOTE: Memory requirement in host emulation can be different from the same on EVM
    [C7x_1 ]    522.916956 s:       To get the actual TIDL memory requirement make sure to run on EVM with 
    [C7x_1 ]    522.916980 s:       debugTraceLevel = 2
    [C7x_1 ]    522.916989 s: 
    [C7x_1 ]    522.917009 s: --------------------------------------------
    [C7x_1 ]    522.919445 s: Alg Init for Layer # -    1
    [C7x_1 ]    522.919564 s: Alg Init for Layer # -    2
    [C7x_1 ]    522.919644 s: Alg Init for Layer # -    3
    [C7x_1 ]    522.919713 s: Alg Init for Layer # -    4
    [C7x_1 ]    522.919775 s: Alg Init for Layer # -    5
    [C7x_1 ]    522.919857 s: Alg Init for Layer # -    6
    [C7x_1 ]    522.919958 s: Alg Init for Layer # -    7
    [C7x_1 ]    522.920040 s: Alg Init for Layer # -    8
    [C7x_1 ]    522.920121 s: Alg Init for Layer # -    9
    [C7x_1 ]    522.920336 s: Alg Init for Layer # -   10
    [C7x_1 ]    522.920411 s: Alg Init for Layer # -   11
    [C7x_1 ]    522.920511 s: Alg Init for Layer # -   12
    [C7x_1 ]    522.921074 s: Alg Init for Layer # -   13
    [C7x_1 ]    522.921224 s: Alg Init for Layer # -   14
    [C7x_1 ]    522.922292 s: Alg Init for Layer # -   15
    [C7x_1 ]    522.922434 s: Alg Init for Layer # -   16
    [C7x_1 ]    522.922563 s: Alg Init for Layer # -   17
    [C7x_1 ]    522.922975 s: Alg Init for Layer # -   18
    [C7x_1 ]    522.923128 s: Alg Init for Layer # -   19
    [C7x_1 ]    522.923783 s: Alg Init for Layer # -   20
    [C7x_1 ]    522.923939 s: Alg Init for Layer # -   21
    [C7x_1 ]    522.924054 s: Alg Init for Layer # -   22
    1
    [C7x_1 ]    522.924505 s: Alg Init for Layer # -   23
    [C7x_1 ]    522.924683 s: Alg Init for Layer # -   24
    [C7x_1 ]    522.925359 s: Alg Init for Layer # -   25
    [C7x_1 ]    522.925540 s: Alg Init for Layer # -   26
    [C7x_1 ]    522.925658 s: Alg Init for Layer # -   27
    [C7x_1 ]    522.926515 s: Alg Init for Layer # -   28
    [C7x_1 ]    522.926728 s: Alg Init for Layer # -   29
    [C7x_1 ]    522.930405 s: Alg Init for Layer # -   30
    [C7x_1 ]    522.930636 s: Alg Init for Layer # -   31
    [C7x_1 ]    522.930755 s: Alg Init for Layer # -   32
    [C7x_1 ]    522.933340 s: Alg Init for Layer # -   33
    [C7x_1 ]    522.933654 s: Alg Init for Layer # -   34
    [C7x_1 ]    522.940085 s: Alg Init for Layer # -   35
    [C7x_1 ]    522.940406 s: Alg Init for Layer # -   36
    [C7x_1 ]    522.949784 s: Alg Init for Layer # -   37
    [C7x_1 ]    522.949960 s: Alg Init for Layer # -   38
    [C7x_1 ]    522.959704 s: Alg Init for Layer # -   39
    [C7x_1 ]    522.959947 s: Alg Init for Layer # -   40
    [C7x_1 ]    522.963625 s: Alg Init for Layer # -   41
    [C7x_1 ]    522.963870 s: Alg Init for Layer # -   42
    [C7x_1 ]    522.966381 s: Alg Init for Layer # -   43
    [C7x_1 ]    522.966561 s: Alg Init for Layer # -   44
    [C7x_1 ]    522.970673 s: Alg Init for Layer # -   45
    [C7x_1 ]    522.970887 s: Alg Init for Layer # -   46
    [C7x_1 ]    522.971623 s: Alg Init for Layer # -   47
    [C7x_1 ]    522.971838 s: Alg Init for Layer # -   48
    [C7x_1 ]    522.972602 s: Alg Init for Layer # -   49
    [C7x_1 ]    522.972786 s: Alg Init for Layer # -   50
    [C7x_1 ]    522.974862 s: Alg Init for Layer # -   51
    [C7x_1 ]    522.975059 s: Alg Init for Layer # -   52
    [C7x_1 ]    522.975770 s: Alg Init for Layer # -   53
    [C7x_1 ]    522.975971 s: Alg Init for Layer # -   54
    [C7x_1 ]    522.976296 s: Alg Init for Layer # -   55
    [C7x_1 ]    522.976478 s: Alg Init for Layer # -   56
    [C7x_1 ]    522.979262 s: Alg Init for Layer # -   57
    [C7x_1 ]    522.979458 s: Alg Init for Layer # -   58
    [C7x_1 ]    522.980588 s: Alg Init for Layer # -   59
    [C7x_1 ]    522.980782 s: Alg Init for Layer # -   60
    [C7x_1 ]    522.981666 s: Alg Init for Layer # -   61
    [C7x_1 ]    522.981760 s: Alg Init for Layer # -   64
    [C7x_1 ]    522.981847 s: Alg Init for Layer # -   62
    [C7x_1 ]    522.981934 s: Alg Init for Layer # -   63
    [C7x_1 ]    522.982019 s: Alg Init for Layer # -   65
    [C7x_1 ]    522.982275 s: Alg Init for Layer # -   66
    [C7x_1 ]    522.982391 s: Alg Init for Layer # -   67
    [C7x_1 ]    522.982466 s: Alg Init for Layer # -   68
    [C7x_1 ]    522.982532 s: Alg Init for Layer # -   69
    [C7x_1 ]    522.982596 s: Alg Init for Layer # -   70
    [C7x_1 ]    522.982748 s: PREEMPTION: Adding a new priority object for targetPriority = 256, handle = 117027000
    [C7x_1 ]    522.982810 s: PREEMPTION: Now total number of priority objects = 1 at priorityId = 256,    with new memRec of base = 11bd37500 and size = 3014656
    [C7x_1 ]    522.982881 s: PREEMPTION: Requesting context memory addr for handle 117027000, return Addr = b0f79a68
    [C7x_1 ]    522.982912 s: Print preEmption Hnadle during init stage :
    [C7x_1 ]    522.982938 s: ProcTime,      ctxSize,       dataId
    [C7x_1 ]    522.982966 s: 0.000,         7288,            0
    [C7x_1 ]    522.982992 s: 0.891,      2034936,            1
    [C7x_1 ]    522.983015 s: 0.006,      2034936,            2
    [C7x_1 ]    522.983039 s: 6.343,         7288,            3
    [C7x_1 ]    522.983062 s: 0.006,         7288,            4
    [C7x_1 ]    522.983084 s: 3.487,         7288,            5
    [C7x_1 ]    522.983106 s: 3.487,         7288,            6
    [C7x_1 ]    522.983129 s: 3.487,         7288,            7
    [C7x_1 ]    522.983151 s: 3.487,         7288,            8
    [C7x_1 ]    522.983185 s: 0.083,         7288,            9
    [[0.3137207  0.28808594 0.26220703 ... 0.26660156 0.2685547  0.        ]
     [0.3737793  0.32202148 0.30786133 ... 0.16992188 0.28808594 0.1430664 ]
     [0.30786133 0.41259766 0.3371582  ... 0.28808594 0.2824707  0.24804688]
     ...
     [0.30004883 0.22607422 0.19555664 ... 0.17895508 0.25048828 0.15454102]
     [0.32470703 0.34423828 0.39257812 ... 0.28442383 0.23486328 0.32983398]
     [0.3955078  0.27270508 0.39111328 ... 0.17456055 0.2199707  0.22607422]]
    [C7x_1 ]    522.983209 s: 0.006,         7288,           10
    [C7x_1 ]    522.983232 s: 0.407,      2035192,           11
    [C7x_1 ]    522.983255 s: 1.844,         7288,           12
    
    [C7x_1 ]    522.983278 s: 3.539,         7288,           13
    [C7x_1 ]    522.983301 s: 3.576,         7288,           14
    [C7x_1 ]    522.983324 s: 3.539,         7288,           15
    [C7x_1 ]    522.983347 s: 2.232,         7288,           16
    [C7x_1 ]    522.983370 s: 1.333,         7288,           17
    [C7x_1 ]    522.983392 s: 1.780,         7288,           18
    [C7x_1 ]    522.983415 s: 1.792,         7288,           19
    [C7x_1 ]    522.983438 s: 1.780,         7288,           20
    [C7x_1 ]    522.983461 s: 0.892,      2034936,           21
    [C7x_1 ]    522.983484 s: 0.664,         7288,           22
    [C7x_1 ]    522.983506 s: 0.893,         7288,           23
    [C7x_1 ]    522.983529 s: 1.296,         7288,           24
    [C7x_1 ]    522.983552 s: 0.893,         7288,           25
    [C7x_1 ]    522.983574 s: 0.448,      1031416,           26
    [C7x_1 ]    522.983597 s: 0.685,         7288,           27
    [C7x_1 ]    522.983620 s: 0.227,      2055416,           28
    [C7x_1 ]    522.983642 s: 1.417,         7288,           29
    [C7x_1 ]    522.983665 s: 0.450,         7288,           30
    [C7x_1 ]    522.983688 s: 0.227,       515320,           31
    [C7x_1 ]    522.983711 s: 0.663,      1023224,           32
    [C7x_1 ]    522.983734 s: 0.079,      1023224,           33
    [C7x_1 ]    522.983757 s: 2.578,      1023224,           34
    [C7x_1 ]    522.983781 s: 0.079,      1023224,           35
    [C7x_1 ]    522.983804 s: 0.341,         7288,           36
    [C7x_1 ]    522.983826 s: 0.900,         7288,           37
    [C7x_1 ]    522.983850 s: 7.795,      2119416,           38
    [C7x_1 ]    522.983872 s: 0.150,      2055416,           39
    [C7x_1 ]    522.983896 s: 1.417,         7288,           40
    [C7x_1 ]    522.983918 s: 0.227,      2055416,           41
    [C7x_1 ]    522.983941 s: 0.518,         7288,           42
    [C7x_1 ]    522.983963 s: 1.786,         7288,           43
    [C7x_1 ]    522.983986 s: 5.192,         7288,           44
    [C7x_1 ]    522.984008 s: 0.893,         7288,           45
    [C7x_1 ]    522.984031 s: 1.296,         7288,           46
    [C7x_1 ]    522.984054 s: 0.893,         7288,           47
    [C7x_1 ]    522.984076 s: 1.335,         7288,           48
    [C7x_1 ]    522.984099 s: 3.561,         7288,           49
    [C7x_1 ]    522.984121 s: 2.790,         7288,           50
    [C7x_1 ]    522.984144 s: 1.780,         7288,           51
    [C7x_1 ]    522.984172 s: 1.792,         7288,           52
    [C7x_1 ]    522.984196 s: 1.780,         7288,           53
    [C7x_1 ]    522.984218 s: 2.655,         7288,           54
    [C7x_1 ]    522.984241 s: 7.110,         7288,           55
    [C7x_1 ]    522.984264 s: 5.486,         7288,           56
    [C7x_1 ]    522.984287 s: 3.539,         7288,           57
    [C7x_1 ]    522.984309 s: 3.577,         7288,           58
    [C7x_1 ]    522.984331 s: 3.539,         7288,           59
    [C7x_1 ]    522.984354 s: 2.030,         7288,           60
    [C7x_1 ]    522.984377 s: 0.002,         7288,           61
    [C7x_1 ]    522.984400 s: 0.002,         7288,           64
    [C7x_1 ]    522.984422 s: 0.002,         7288,           62
    [C7x_1 ]    522.984444 s: 0.002,         7288,           63
     *******   In TIDL_subgraphRtInvoke  ******** 
    [C7x_1 ]    522.984466 s: 0.243,      2035192,           65
    [C7x_1 ]    522.984488 s: 0.288,         7288,           66
    [C7x_1 ]    522.984510 s: 0.006,         7288,           67
    [C7x_1 ]    522.984533 s: 6.343,      2034936,           68
    [C7x_1 ]    522.984555 s: 0.006,      2034936,           69
    [C7x_1 ]    522.984578 s: 1.824,         7288,           70
    [C7x_1 ]    522.984600 s: 0.000,            0,           71
    [C7x_1 ]    522.984669 s: TIDL_initializeHandleForPreemption is completed 
    [C7x_1 ]    524.252040 s: TIDL_process is started with handle : 117027000 
    [C7x_1 ]    524.252084 s: PREEMPTION: Requesting UNLOCK for priroty object and targetPriority 256 is serviced
    [C7x_1 ]    524.252131 s: PREEMPTION: Requesting LOCK for priroty object with handle = 117027000 and targetPriority 256
    [C7x_1 ]    524.252186 s: PREEMPTION: Request of LOCK for priroty object with handle = 117027000 and targetPriority 256 is serviced with state 0
    [C7x_1 ]    524.252282 s: TIDL_activate is called with handle : 117027000 - Copying handle of size 19736 from 117027000 to 702f2000 
    [C7x_1 ]    524.252368 s: PREEMPTION: Requesting UNLOCK for priroty object and targetPriority 256
    [C7x_1 ]    524.252417 s: PREEMPTION: Requesting UNLOCK for priroty object and targetPriority 256 is serviced
    [C7x_1 ]    524.252464 s: PREEMPTION: Requesting LOCK for priroty object with handle = 117027000 and targetPriority 256
    [C7x_1 ]    524.252518 s: PREEMPTION: Request of LOCK for priroty object with handle = 117027000 and targetPriority 256 is serviced with state 0
    [C7x_1 ]    524.252560 s: Core 0 Alg Process for Layer # -    1, layer type 29
    [C7x_1 ]    524.252589 s: Processing Layer # -    1
    [C7x_1 ]    524.252984 s: Core 0 End of Layer # -    1 with outPtrs[0] = 70020000
    [C7x_1 ]    524.253022 s: Core 0 Alg Process for Layer # -    2, layer type 38
    [C7x_1 ]    524.253048 s: Processing Layer # -    2
    [C7x_1 ]    524.253080 s: Core 0 End of Layer # -    2 with outPtrs[0] = 70020000
    [C7x_1 ]    524.253114 s: Core 0 Alg Process for Layer # -    3, layer type 41
    [C7x_1 ]    524.253138 s: Processing Layer # -    3
    [C7x_1 ]    524.274440 s: Core 0 End of Layer # -    3 with outPtrs[0] = 11712ac00
    [C7x_1 ]    524.274476 s: Core 0 Alg Process for Layer # -    4, layer type 38
    [C7x_1 ]    524.274500 s: Processing Layer # -    4
    [C7x_1 ]    524.274531 s: Core 0 End of Layer # -    4 with outPtrs[0] = 11712ac00
    [C7x_1 ]    524.274565 s: Core 0 Alg Process for Layer # -    5, layer type 14
    [C7x_1 ]    524.274588 s: Processing Layer # -    5
    [C7x_1 ]    524.287551 s: Core 0 End of Layer # -    5 with outPtrs[0] = 70020000
    [C7x_1 ]    524.287586 s: Core 0 Alg Process for Layer # -    6, layer type 14
    [C7x_1 ]    524.287611 s: Processing Layer # -    6
    [C7x_1 ]    524.300561 s: Core 0 End of Layer # -    6 with outPtrs[0] = 7009bc80
    [C7x_1 ]    524.300596 s: Core 0 Alg Process for Layer # -    7, layer type 14
    [C7x_1 ]    524.300621 s: Processing Layer # -    7
    [C7x_1 ]    524.313582 s: Core 0 End of Layer # -    7 with outPtrs[0] = 70117900
    [C7x_1 ]    524.313616 s: Core 0 Alg Process for Layer # -    8, layer type 14
    [C7x_1 ]    524.313641 s: Processing Layer # -    8
    [C7x_1 ]    524.326588 s: Core 0 End of Layer # -    8 with outPtrs[0] = 70193580
    [C7x_1 ]    524.326621 s: Core 0 Alg Process for Layer # -    9, layer type 12
    [C7x_1 ]    524.326647 s: Processing Layer # -    9
    [C7x_1 ]    524.328766 s: Core 0 End of Layer # -    9 with outPtrs[0] = 11712ac00
    [C7x_1 ]    524.328809 s: Core 0 Alg Process for Layer # -   10, layer type 38
    [C7x_1 ]    524.328835 s: Processing Layer # -   10
    [C7x_1 ]    524.328868 s: Core 0 End of Layer # -   10 with outPtrs[0] = 11712ac00
    [C7x_1 ]    524.328903 s: Core 0 Alg Process for Layer # -   11, layer type 29
    [C7x_1 ]    524.328927 s: Processing Layer # -   11
    [C7x_1 ]    524.329525 s: Core 0 End of Layer # -   11 with outPtrs[0] = 70020000
    [C7x_1 ]    524.329560 s: Core 0 Alg Process for Layer # -   12, layer type 1
    [C7x_1 ]    524.329586 s: Processing Layer # -   12
    [C7x_1 ]    524.330449 s: Core 0 End of Layer # -   12 with outPtrs[0] = 11712ac00
    [C7x_1 ]    524.330485 s: Core 0 Alg Process for Layer # -   13, layer type 8
    [C7x_1 ]    524.330509 s: Processing Layer # -   13
    [C7x_1 ]    524.333195 s: Core 0 End of Layer # -   13 with outPtrs[0] = 1180ef800
    [C7x_1 ]    524.333231 s: Core 0 Alg Process for Layer # -   14, layer type 1
    [C7x_1 ]    524.333255 s: Processing Layer # -   14
    [C7x_1 ]    524.336298 s: Core 0 End of Layer # -   14 with outPtrs[0] = 119591c00
    [C7x_1 ]    524.336333 s: Core 0 Alg Process for Layer # -   15, layer type 8
    [C7x_1 ]    524.336357 s: Processing Layer # -   15
    [C7x_1 ]    524.339043 s: Core 0 End of Layer # -   15 with outPtrs[0] = 11712ac00
    [C7x_1 ]    524.339080 s: Core 0 Alg Process for Layer # -   16, layer type 2
    [C7x_1 ]    524.339105 s: Processing Layer # -   16
    [C7x_1 ]    524.339901 s: Core 0 End of Layer # -   16 with outPtrs[0] = 118dec400
    [C7x_1 ]    524.339937 s: Core 0 Alg Process for Layer # -   17, layer type 1
    [C7x_1 ]    524.339960 s: Processing Layer # -   17
    [C7x_1 ]    524.341094 s: Core 0 End of Layer # -   17 with outPtrs[0] = 1185cd000
    [C7x_1 ]    524.341130 s: Core 0 Alg Process for Layer # -   18, layer type 8
    [C7x_1 ]    524.341155 s: Processing Layer # -   18
    [C7x_1 ]    524.342539 s: Core 0 End of Layer # -   18 with outPtrs[0] = 118dec400
    [C7x_1 ]    524.342576 s: Core 0 Alg Process for Layer # -   19, layer type 1
    [C7x_1 ]    524.342599 s: Processing Layer # -   19
    [C7x_1 ]    524.344400 s: Core 0 End of Layer # -   19 with outPtrs[0] = 1195a8800
    [C7x_1 ]    524.344436 s: Core 0 Alg Process for Layer # -   20, layer type 8
    [C7x_1 ]    524.344460 s: Processing Layer # -   20
    [C7x_1 ]    524.345843 s: Core 0 End of Layer # -   20 with outPtrs[0] = 1185cd000
    [C7x_1 ]    524.345878 s: Core 0 Alg Process for Layer # -   21, layer type 2
    [C7x_1 ]    524.345902 s: Processing Layer # -   21
    [C7x_1 ]    524.346259 s: Core 0 End of Layer # -   21 with outPtrs[0] = 70020000
    [C7x_1 ]    524.346295 s: Core 0 Alg Process for Layer # -   22, layer type 1
    [C7x_1 ]    524.346319 s: Processing Layer # -   22
    [C7x_1 ]    524.347068 s: Core 0 End of Layer # -   22 with outPtrs[0] = 118d89400
    [C7x_1 ]    524.347102 s: Core 0 Alg Process for Layer # -   23, layer type 8
    [C7x_1 ]    524.347126 s: Processing Layer # -   23
    [C7x_1 ]    524.347854 s: Core 0 End of Layer # -   23 with outPtrs[0] = 1191d7800
    [C7x_1 ]    524.347890 s: Core 0 Alg Process for Layer # -   24, layer type 1
    [C7x_1 ]    524.347916 s: Processing Layer # -   24
    [C7x_1 ]    524.349401 s: Core 0 End of Layer # -   24 with outPtrs[0] = 1195b5c00
    [C7x_1 ]    524.349437 s: Core 0 Alg Process for Layer # -   25, layer type 8
    [C7x_1 ]    524.349462 s: Processing Layer # -   25
    [C7x_1 ]    524.350189 s: Core 0 End of Layer # -   25 with outPtrs[0] = 118d89400
    [C7x_1 ]    524.350225 s: Core 0 Alg Process for Layer # -   26, layer type 2
    [C7x_1 ]    524.350249 s: Processing Layer # -   26
    [C7x_1 ]    524.350448 s: Core 0 End of Layer # -   26 with outPtrs[0] = 70020000
    [C7x_1 ]    524.350482 s: Core 0 Alg Process for Layer # -   27, layer type 1
    [C7x_1 ]    524.350506 s: Processing Layer # -   27
    [C7x_1 ]    524.351239 s: Core 0 End of Layer # -   27 with outPtrs[0] = 119167800
    [C7x_1 ]    524.351274 s: Core 0 Alg Process for Layer # -   28, layer type 8
    [C7x_1 ]    524.351299 s: Processing Layer # -   28
    [C7x_1 ]    524.351691 s: Core 0 End of Layer # -   28 with outPtrs[0] = 70020000
    [C7x_1 ]    524.351726 s: Core 0 Alg Process for Layer # -   29, layer type 1
    [C7x_1 ]    524.351752 s: Processing Layer # -   29
    [C7x_1 ]    524.354317 s: Core 0 End of Layer # -   29 with outPtrs[0] = 119356c00
    [C7x_1 ]    524.354359 s: Core 0 Alg Process for Layer # -   30, layer type 8
    [C7x_1 ]    524.354386 s: Processing Layer # -   30
    [C7x_1 ]    524.354791 s: Core 0 End of Layer # -   30 with outPtrs[0] = 119167800
    [C7x_1 ]    524.354828 s: Core 0 Alg Process for Layer # -   31, layer type 2
    [C7x_1 ]    524.354853 s: Processing Layer # -   31
    [C7x_1 ]    524.354980 s: Core 0 End of Layer # -   31 with outPtrs[0] = 70020000
    [C7x_1 ]    524.355016 s: Core 0 Alg Process for Layer # -   32, layer type 1
    [C7x_1 ]    524.355041 s: Processing Layer # -   32
    [C7x_1 ]    524.355849 s: Core 0 End of Layer # -   32 with outPtrs[0] = 7009c080
    [C7x_1 ]    524.355884 s: Core 0 Alg Process for Layer # -   33, layer type 8
    [C7x_1 ]    524.355909 s: Processing Layer # -   33
    [C7x_1 ]    524.356137 s: Core 0 End of Layer # -   33 with outPtrs[0] = 70020000
    [C7x_1 ]    524.356174 s: Core 0 Alg Process for Layer # -   34, layer type 1
    [C7x_1 ]    524.356199 s: Processing Layer # -   34
    [C7x_1 ]    524.359443 s: Core 0 End of Layer # -   34 with outPtrs[0] = 70118080
    [C7x_1 ]    524.359481 s: Core 0 Alg Process for Layer # -   35, layer type 8
    [C7x_1 ]    524.359505 s: Processing Layer # -   35
    [C7x_1 ]    524.359736 s: Core 0 End of Layer # -   35 with outPtrs[0] = 70020000
    [C7x_1 ]    524.359772 s: Core 0 Alg Process for Layer # -   36, layer type 11
    [C7x_1 ]    524.359798 s: Processing Layer # -   36
    [C7x_1 ]    524.361007 s: Core 0 End of Layer # -   36 with outPtrs[0] = 119356c00
    [C7x_1 ]    524.361044 s: Core 0 Alg Process for Layer # -   37, layer type 12
    [C7x_1 ]    524.361070 s: Processing Layer # -   37
    [C7x_1 ]    524.361425 s: Core 0 End of Layer # -   37 with outPtrs[0] = 1195af000
    [C7x_1 ]    524.361460 s: Core 0 Alg Process for Layer # -   38, layer type 1
    [C7x_1 ]    524.361487 s: Processing Layer # -   38
    [C7x_1 ]    524.371372 s: Core 0 End of Layer # -   38 with outPtrs[0] = 70020000
    [C7x_1 ]    524.371410 s: Core 0 Alg Process for Layer # -   39, layer type 8
    [C7x_1 ]    524.371434 s: Processing Layer # -   39
    [C7x_1 ]    524.371818 s: Core 0 End of Layer # -   39 with outPtrs[0] = 70020000
    [C7x_1 ]    524.371854 s: Core 0 Alg Process for Layer # -   40, layer type 1
    [C7x_1 ]    524.371879 s: Processing Layer # -   40
    [C7x_1 ]    524.374443 s: Core 0 End of Layer # -   40 with outPtrs[0] = 119167800
    [C7x_1 ]    524.374484 s: Core 0 Alg Process for Layer # -   41, layer type 8
    [C7x_1 ]    524.374510 s: Processing Layer # -   41
    [C7x_1 ]    524.374907 s: Core 0 End of Layer # -   41 with outPtrs[0] = 70020000
    [C7x_1 ]    524.374943 s: Core 0 Alg Process for Layer # -   42, layer type 11
    [C7x_1 ]    524.374967 s: Processing Layer # -   42
    [C7x_1 ]    524.375562 s: Core 0 End of Layer # -   42 with outPtrs[0] = 119167800
    [C7x_1 ]    524.375598 s: Core 0 Alg Process for Layer # -   43, layer type 12
    [C7x_1 ]    524.375624 s: Processing Layer # -   43
    [C7x_1 ]    524.376264 s: Core 0 End of Layer # -   43 with outPtrs[0] = 119585c00
    [C7x_1 ]    524.376299 s: Core 0 Alg Process for Layer # -   44, layer type 1
    [C7x_1 ]    524.376325 s: Processing Layer # -   44
    [C7x_1 ]    524.382860 s: Core 0 End of Layer # -   44 with outPtrs[0] = 118d89400
    [C7x_1 ]    524.382902 s: Core 0 Alg Process for Layer # -   45, layer type 8
    [C7x_1 ]    524.382928 s: Processing Layer # -   45
    [C7x_1 ]    524.383659 s: Core 0 End of Layer # -   45 with outPtrs[0] = 119179800
    [C7x_1 ]    524.383694 s: Core 0 Alg Process for Layer # -   46, layer type 1
    [C7x_1 ]    524.383718 s: Processing Layer # -   46
    [C7x_1 ]    524.385203 s: Core 0 End of Layer # -   46 with outPtrs[0] = 119981c00
    [C7x_1 ]    524.385239 s: Core 0 Alg Process for Layer # -   47, layer type 8
    [C7x_1 ]    524.385264 s: Processing Layer # -   47
    [C7x_1 ]    524.385991 s: Core 0 End of Layer # -   47 with outPtrs[0] = 1195a3800
    [C7x_1 ]    524.386027 s: Core 0 Alg Process for Layer # -   48, layer type 11
    [C7x_1 ]    524.386051 s: Processing Layer # -   48
    
    
    

    This is possibly a TIDL bug. Please wait for internal debugging.
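
    The trace above ends at "Processing Layer # -   48" with no matching "End of Layer" line. For longer logs, a small script can find such layers automatically. The following is a minimal sketch, not part of the TIDL tooling: the function names are hypothetical, the patterns are taken only from the line shapes visible in this trace, and it assumes the log covers a single frame (layer numbers would repeat otherwise).

```python
import re

# Patterns mirror the line shapes seen in the trace; other TIDL versions may differ.
START = re.compile(r"Processing Layer # -\s*(\d+)")
END = re.compile(r"End of Layer # -\s*(\d+)")
STAMP = re.compile(r"(\d+\.\d+) s:")


def unfinished_layers(log_text):
    """Return layers that have a 'Processing Layer' line but no 'End of Layer' line."""
    started, ended = [], set()
    for line in log_text.splitlines():
        m = START.search(line)
        if m:
            started.append(int(m.group(1)))
            continue
        m = END.search(line)
        if m:
            ended.add(int(m.group(1)))
    return [n for n in started if n not in ended]


def layer_durations(log_text):
    """Map layer number to the seconds between its 'Processing' and 'End' lines."""
    starts, durations = {}, {}
    for line in log_text.splitlines():
        stamp = STAMP.search(line)
        if not stamp:
            continue
        t = float(stamp.group(1))
        m = START.search(line)
        if m:
            starts[int(m.group(1))] = t
            continue
        m = END.search(line)
        if m and int(m.group(1)) in starts:
            layer = int(m.group(1))
            durations[layer] = t - starts[layer]
    return durations


# Last five lines of the trace above, used as sample input.
SAMPLE = """\
[C7x_1 ]    524.385239 s: Core 0 Alg Process for Layer # -   47, layer type 8
[C7x_1 ]    524.385264 s: Processing Layer # -   47
[C7x_1 ]    524.385991 s: Core 0 End of Layer # -   47 with outPtrs[0] = 1195a3800
[C7x_1 ]    524.386027 s: Core 0 Alg Process for Layer # -   48, layer type 11
[C7x_1 ]    524.386051 s: Processing Layer # -   48
"""

if __name__ == "__main__":
    print("unfinished layers:", unfinished_layers(SAMPLE))
    print("durations (s):", layer_durations(SAMPLE))
```

    Sorting the durations also gives a quick view of which layers dominate the runtime.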

    Regards,

    Adam

  • Hi,

    I am also uploading the config file to make the issue easier to reproduce:

        "hsh_modify": create_model_config(
            source=AttrDict(
                model_url="dummy",
                infer_shape=True,
            ),
            preprocess=AttrDict(
                resize=256,
                crop=224,
                data_layout="NCHW",
                resize_with_pad=False,
                reverse_channels=False,
            ),
            session=AttrDict(
                session_name="onnxrt",
                model_path=os.path.join(models_base_path, "hsh_model_modified.onnx"),
                input_mean=[123.675, 116.28, 103.53],
                input_scale=[0.017125, 0.017507, 0.017429],
                input_optimization=True,
            ),
            task_type="classification",
            extra_info=AttrDict(num_images=numImages, num_classes=1000),
        ),
    

    Modified infer_image function from onnxrt_ep.py:

    def infer_image(sess, image_files, config):
        '''
        Invoke the runtime session
    
        :param sess: Runtime session
        :param image_files: List of input image filename
        :param config: Configuration dictionary
        :return: Input Images
        :return: Output tensors
        :return: Total Processing time
        :return: Subgraphs Processing time
        :return: Height of input tensor
        :return: Width of input tensor
        '''
    
        # Get input details from the session
        input_details = sess.get_inputs()
        input_name = input_details[0].name
        floating_model = input_details[0].type == "tensor(float)"
        height = input_details[0].shape[2]
        width = input_details[0].shape[3]
        channel = input_details[0].shape[1]
        batch = input_details[0].shape[0]
        imgs = []
        shape = [batch, channel, height, width]
        input_shape = input_details[0].shape
        # Start from random float32 data in [0, 1); channel 0 is overwritten with the raw frame below
        input_data = np.random.random(input_shape).astype(np.float32)
        print(len(input_details))
        if len(input_details) > 1:
            # Second model input: random data matching its declared type
            input_name2 = input_details[1].name
            if input_details[1].type == "tensor(float)":
                input_data2 = np.random.random(input_details[1].shape).astype(np.float32)
            else:
                input_data2 = np.random.randint(0, 2000, size=input_details[1].shape).astype(np.int32)
        # Prepare the input data
        # input_data = np.zeros(shape)
        # for i in range(batch):
        #     imgs.append(
        #         Image.open(image_files[i])
        #         .convert("RGB")
        #         .resize((width, height), PIL.Image.LANCZOS)
        #     )
        #     temp_input_data = np.expand_dims(imgs[i], axis=0)
        #     temp_input_data = np.transpose(temp_input_data, (0, 3, 1, 2))
        #     input_data[i] = temp_input_data[0]
        # if floating_model:
        #     input_data = np.float32(input_data)
        #     for mean, scale, ch in zip(
        #         config["session"]["input_mean"],
        #         config["session"]["input_scale"],
        #         range(input_data.shape[1]),
        #     ):
        #         input_data[:, ch, :, :] = (input_data[:, ch, :, :] - mean) * scale
        # else:
        #     input_data = np.uint8(input_data)
        #     config["session"]["input_mean"] = [0, 0, 0]
        #     config["session"]["input_scale"] = [1, 1, 1]
    
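        # Load the raw frame (headerless uint16, 1056 x 1920) and scale it by 1/4096
        data = np.fromfile("0.bin", dtype=np.uint16).astype(np.float32)
        raw_vis = data.reshape(1056, 1920) / 4096
        print(raw_vis)
        # Overwrite batch 0, channel 0 with the scaled frame
        input_data[0, 0, :, :] = raw_vis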
        # Invoke the session
        start_time = time.time()
        
        if len(input_details) > 1:
            output = list(sess.run(None, {input_name: input_data, input_name2: input_data2}))
        else:
            output = list(sess.run(None, {input_name: input_data}))
        stop_time = time.time()
        infer_time = stop_time - start_time
    
        copy_time, sub_graphs_proc_time, totaltime = get_benchmark_output(sess)
        proc_time = totaltime - copy_time
    
        return imgs, output, proc_time, sub_graphs_proc_time, height, width

    For the model file, please use /cfs-file/__key/communityserver-discussions-components-files/791/modified_5F00_model.zip 

    The input file 0.bin is in /cfs-file/__key/communityserver-discussions-components-files/791/1425.model.zip 
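
    The raw-input step in the modified script hard-codes the frame size and would fail with a generic reshape error if 0.bin had a different size. A slightly more defensive version could look like the sketch below. The helper names are hypothetical, and the 4096 divisor is simply carried over from the script (it suggests 12-bit samples, but that is an assumption). Unlike the script, the tensor is zero-initialized rather than random so that repeated runs are comparable.

```python
import numpy as np


def load_raw_plane(path, height, width, full_scale=4096.0):
    """Read a headerless uint16 frame and divide it by full_scale (float32 result)."""
    data = np.fromfile(path, dtype=np.uint16)
    expected = height * width
    if data.size != expected:
        raise ValueError(f"{path}: expected {expected} samples, found {data.size}")
    return data.reshape(height, width).astype(np.float32) / np.float32(full_scale)


def fill_first_channel(input_shape, plane):
    """Build a zero NCHW tensor and place the plane in batch 0, channel 0."""
    tensor = np.zeros(input_shape, dtype=np.float32)
    tensor[0, 0, :, :] = plane
    return tensor
```

    With the dimensions used in the script, the call would be load_raw_plane("0.bin", 1056, 1920), followed by fill_first_channel(input_details[0].shape, plane).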

    Regards,

    Adam

  • The SDK version is 10.1, and the edgeai tidl tools version is 10.01.00.02.

  • Hello,

    I have added this to Jira TIDL-7532. I will also reproduce on my end and keep you updated.

    Warm regards,

    Christina

  • Hi Adam and Wang,

    Could you provide some context on how this model was created? We are investigating the layer info, and this would help us figure out why the layer names in the output do not correspond to the actual layer names in the model.

    Also, if you have a copy of the correct outputs of the model, that would also be helpful for us when validating. 
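
    When reference outputs are available as .npy files (for example a float onnxruntime result and an 8-bit result), a few summary numbers make the comparison less subjective than inspecting values by eye. The sketch below is one possible way to do this; the function name and the choice of metrics are assumptions, not part of the TIDL tools.

```python
import numpy as np


def compare_outputs(reference, candidate):
    """Return max/mean absolute difference and cosine similarity of two tensors."""
    ref = np.asarray(reference, dtype=np.float64).ravel()
    cand = np.asarray(candidate, dtype=np.float64).ravel()
    if ref.size != cand.size:
        raise ValueError(f"size mismatch: {ref.size} vs {cand.size}")
    diff = np.abs(ref - cand)
    denom = np.linalg.norm(ref) * np.linalg.norm(cand)
    # Cosine similarity is undefined if either tensor is all zeros.
    cosine = float(np.dot(ref, cand) / denom) if denom > 0 else float("nan")
    return {
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
        "cosine_similarity": cosine,
    }
```

    With two files saved locally under hypothetical names, the call would be compare_outputs(np.load("onnx_float.npy"), np.load("tidl_8bit.npy")). A cosine similarity close to 1 together with a small maximum difference suggests that quantization itself is not the source of a completely wrong result.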

    Warm regards,
    Christina

  • Running the ONNX model in the onnxruntime inference engine produces correct results, and at that level there is no problem with the output names you mentioned. Also, what do you mean by "model"? Do you mean the model definition script in the PyTorch or TensorFlow framework?

  • Hi Wang, 

    Yes, I was asking about hsh_model_modified.onnx, which I believe is based on hsh_model.onnx, and about what was used to create and train it (assuming PyTorch). Did you test on onnxruntime with 8-bit as well?

    Thank you for confirming that onnxruntime inference produces correct outputs. Did the TIDL PC emulation (before running on device) also give you correct outputs? In my testing, PC emulation also shows some mismatch due to the layer info.

    Warm regards,

    Christina

  • The model hsh_model_modified.onnx is the model file I gave to Adam. He also tested the 8-bit results in onnxruntime and with tidl_pc, and the results were all fine. As for why hsh_model.onnx was modified: without the modification, an error is reported in the Gather operator. This modification was also discussed with Adam.

  • Hi Wang,

    Can you try SDK 11.0 with the new TIDL released last week? If SDK 11.0 works, we can backport the TIDL fix.

    Regards,

    Adam