TDA4VH-Q1: Unable to correctly run a model compiled with c7x_codegen set to 1 on TDA4VH

Part Number: TDA4VH-Q1
Other Parts Discussed in Thread: TDA4VH, AM69A

Hi,

I tried to run a model compiled with TVM on the TDA4VH but ran into some problems. Below is the process I used to compile and run the model.

Model compilation

Environment variables:

export SOC=am69a
export TIDL_TOOLS_PATH=/root/zilou.cao/edgeai-tidl-tools/tidl_tools/
export PSDKR_PATH=/opt/ti-processor-sdk-rtos-j784s4-evm-09_00_00_02
export TIDL_INSTALL_PATH=/opt/ti-processor-sdk-rtos-j784s4-evm-09_00_00_02
export ARM64_GCC_PATH=/opt/ti-processor-sdk-linux-adas-j784s4-evm-09_00_00_08/external-toolchain-dir/arm-gnu-toolchain-11.3.rel1-x86_64-aarch64-none-linux-gnu/
export CGT7X_ROOT=/opt/ti-cgt-c7000_3.1.0.LTS

Input ONNX model:

8535.img_stage_sim_1V.zip

I made some modifications to the example script examples/osrt_python/tvm_dlr/tvm-compilation-onnx-example.py. The compilation script is below; note that I have set c7x_codegen to 1:

import os
import sys
import shutil
import argparse
import numpy as np
# directory reach
current = os.path.dirname(os.path.realpath(__file__))
parent = os.path.dirname(current)
# setting path
sys.path.append(parent)
from common_utils import *
from model_configs import *

parser = argparse.ArgumentParser()
parser.add_argument('--num_bits', dest='num_bits', default=16, type=int, choices=[8, 16, 32], help='number of bits used for quantization (use 32 for float-mode TIDL subgraphs)')
parser.add_argument('--num_subgraphs', dest='num_subgraphs_max', default=16, type=int, help='maximum number of TIDL subgraphs for offload (actual number of subgraphs may be less than this)')
parser.add_argument('--pc-inference', dest='device', action='store_false', help='compile for inference on PC')
parser.add_argument('--num_calib_images', dest='calib_iters', default=4, type=int, help='number of images to use for calibration')

args = parser.parse_args()

# model specifics
model_path = "/tvm/img_stage_sim_1V.onnx"
model_output_directory = '/tvm/bev_patr1_tvm/bevdet_patr1'

images_input = np.random.rand(1, 3, 256, 704).astype(np.float32)
rot_input = np.random.rand(1, 1, 4, 4).astype(np.float32)
intrin_input = np.random.rand(1, 1, 3, 3).astype(np.float32)
post_rot_input = np.random.rand(1, 1, 3, 3).astype(np.float32)
post_trans_input = np.random.rand(1, 1, 3).astype(np.float32)
bda_input = np.random.rand(1, 3, 3).astype(np.float32)

input_name_to_shape = {
    'images': images_input.shape,
    'rot': rot_input.shape,
    'intrin': intrin_input.shape,
    'post_rot': post_rot_input.shape,
    'post_trans': post_trans_input.shape,
    'bda': bda_input.shape
}

# TIDL compiler specifics
# We are compiling the model for the SOC given by the environment,
# using the TIDL tools distributed with SDK 9.0
DEVICE = os.environ['SOC']
SDK_VERSION = (9, 0)

# convert the model to relay IR format
import onnx
from tvm import relay
print(model_path)
onnx_model = onnx.load(model_path)
mod, params = relay.frontend.from_onnx(onnx_model,
                    shape=input_name_to_shape)

if args.device:
    build_target = 'llvm -device=arm_cpu -mtriple=aarch64-linux-gnu'
    cross_cc_args = {'cc' : os.path.join(os.environ['ARM64_GCC_PATH'], 'bin', 'aarch64-none-linux-gnu-gcc')}
    model_output_directory = model_output_directory+'_device'
else:
    build_target = 'llvm'
    cross_cc_args = {}

# image preprocessing for calibration
def preprocess_for_onnx_mobilenetv2(image_path):
    from PIL import Image
    import numpy as np

    # Load the image with Pillow
    image = Image.open(image_path)

    # Resize image to the expected input size of the ONNX model
    # Note: The resize function expects the size in (width, height) format
    image = image.resize((704, 256))

    # Convert image to RGB just in case it is not
    image = image.convert('RGB')

    # Convert the image to a NumPy array
    image_np = np.array(image)

    # Transpose the image array to [channels, height, width] format as expected by ONNX
    image_np = image_np.transpose((2, 0, 1))

    # Normalize the image
    # This normalization is standard for MobileNet but may need to be adjusted for other models
    image_np = image_np / 255.0
    image_np = image_np.astype(np.float32)

    # Add a batch dimension by expanding the dimensions
    image_np = np.expand_dims(image_np, axis=0)

    return image_np

# recreate the output directory, clearing any previous artifacts
shutil.rmtree(model_output_directory, ignore_errors=True)
os.makedirs(model_output_directory, exist_ok=True)

from tvm.relay.backend.contrib import tidl

assert args.num_bits in [8, 16, 32]
assert args.num_subgraphs_max <= 16

# Use advanced calibration for 8-bit quantization
# Use simple calibration for 16-bit quantization and float-mode
advanced_options = {
    8 :  {
            'calibration_iterations' : 10,
            # below options are set to default values, include here for reference
            'quantization_scale_type' : 0,
            'high_resolution_optimization' : 0,
            'pre_batchnorm_fold' : 1,
            # below options are only overwritable at accuracy level 9, otherwise ignored
            'activation_clipping' : 1,
            'weight_clipping' : 1,
            'bias_calibration' : 1,
            'channel_wise_quantization' : 0,
            },
    16 : {
            'calibration_iterations' : 10,
            'pre_batchnorm_fold' : 1,
            # below options are only overwritable at accuracy level 9, otherwise ignored
            'activation_clipping' : 1,
            'weight_clipping' : 1,
            'bias_calibration' : 1,
            'channel_wise_quantization' : 0,
            },
    32 : {
            'calibration_iterations' : 1,
            }
}

calib_input_list = [{"images": images_input, "rot": rot_input, "intrin": intrin_input, "post_rot": post_rot_input, "post_trans": post_trans_input, "bda": bda_input}]
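
# Note (sketch): the calibration inputs above are random tensors and the
# preprocess_for_onnx_mobilenetv2 helper defined earlier is never called.
# For meaningful 8/16-bit calibration, real images could be used for the
# 'images' entry instead, e.g. (with 'calib_0.png' as a placeholder path):
#   calib_input_list = [{"images": preprocess_for_onnx_mobilenetv2('calib_0.png'),
#                        "rot": rot_input, "intrin": intrin_input,
#                        "post_rot": post_rot_input, "post_trans": post_trans_input,
#                        "bda": bda_input}]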

# Create the TIDL compiler with appropriate parameters
compiler = tidl.TIDLCompiler(
    DEVICE,
    SDK_VERSION,
    tidl_tools_path = os.environ['TIDL_TOOLS_PATH'],
    artifacts_folder = model_output_directory,
    tensor_bits = args.num_bits,
    debug_level = 3,
    max_num_subgraphs = args.num_subgraphs_max,
    c7x_codegen = 1,
    accuracy_level = (1 if args.num_bits == 8 else 0),
    advanced_options = advanced_options[args.num_bits],
    deny_list = "nn.batch_norm",
    )

# partition the graph into TIDL operations and TVM operations
mod, status = compiler.enable(mod, params, calib_input_list)

# build the relay module into deployables
with tidl.build_config(tidl_compiler=compiler):
    graph, lib, params = relay.build_module.build(mod, target=build_target, params=params)

# remove nodes / params not needed for inference
tidl.remove_tidl_params(params)

# save the deployables
path_lib = os.path.join(model_output_directory, 'deploy_lib.so')
path_graph = os.path.join(model_output_directory, 'deploy_graph.json')
path_params = os.path.join(model_output_directory, 'deploy_params.params')

lib.export_library(path_lib, **cross_cc_args)
with open(path_graph, "w") as fo:
    fo.write(graph)
with open(path_params, "wb") as fo:
    fo.write(relay.save_param_dict(params))

Compilation process log:
compilation_my_custom_model.zip

Compilation output:

bevdet_patr1_device.zip

I then copied the compiled model to the TDA4VH and ran it with the following Python script. The model loads normally, but the script hangs at the line res = model.run(...).

Run script:

import time
import platform
import os
import sys
from PIL import Image
import numpy as np
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-z','--run_model_zoo', action='store_true',  help='Run model zoo models')
args = parser.parse_args()
# directory reach
current = os.path.dirname(os.path.realpath(__file__))
parent = os.path.dirname(current)
# setting path
sys.path.append(parent)
from common_utils import *
from model_configs import *


def model_create_and_run(model_dir):
    from dlr import DLRModel
    print(f'\n\nRunning Inference on Model -  {model_dir}\n')

    model = DLRModel(model_dir, 'cpu')
    # print(model.get_input_names())
    # print(model.get_output_names())
    # print(model.get_input_dtypes())
    
    print("Success load model")

    proc_time = 0.0
    input_npy_path = "image_stage_input/"
    batches_num = 6
    for batch in range(batches_num):
      bda = np.load(input_npy_path + "bda.npy")
      img = np.load(input_npy_path + "img.npy")[:, batch, :, :, :]
      # slicing batch:batch+1 keeps the per-camera dimension and is
      # non-empty when batch == 0
      intrin = np.load(input_npy_path + "intrin.npy")[:, batch:batch+1, :, :]
      post_rot = np.load(input_npy_path + "post_rot.npy")[:, batch:batch+1, :, :]
      post_tran = np.load(input_npy_path + "post_tran.npy")[:, batch:batch+1, :]
      sensor2ego = np.load(input_npy_path + "sensor2ego.npy")[:, batch:batch+1, :, :]
      start_time = time.time()
      print(3)
      print(img.shape)
      res = model.run({"images" : img, "rot": sensor2ego, "intrin": intrin, "post_rot": post_rot, "post_trans": post_tran, "bda": bda})
      np.save(input_npy_path + str(batch) + "_tvm_bevdet_part1_depth.npy", res[1])
      np.save(input_npy_path + str(batch) + "_tvm_bevdet_part1_image_feature.npy", res[0])
      print("run...")
      stop_time = time.time()                                                                                                                                                                                        
      proc_time += (stop_time - start_time)*1000   

    print(f'\n Processing time in ms : {proc_time/batches_num:10.1f}\n') 

model_input_directory = '../../../bevdet_patr1_device'
model_create_and_run(model_input_directory)

Model Input:

image_stage_input.zip

Runtime log: according to the log, the script stays inside model.run without ever returning.

The edgeai-tidl-tools repository branch is 09_00_00_00; the SDK version is ti-processor-sdk-rtos-j784s4-evm-09_00_00_02.

When I set c7x_codegen to 0 (without modifying any other code), the script runs normally on the board and the model outputs results correctly. The sketch below shows the single argument that differs between the two runs.
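
For reference, a minimal sketch of the TIDLCompiler call from the compilation script above, with every argument unchanged except the one flag being toggled:

compiler = tidl.TIDLCompiler(
    DEVICE,
    SDK_VERSION,
    tidl_tools_path = os.environ['TIDL_TOOLS_PATH'],
    artifacts_folder = model_output_directory,
    tensor_bits = args.num_bits,
    debug_level = 3,
    max_num_subgraphs = args.num_subgraphs_max,
    c7x_codegen = 0,   # 1: hangs inside model.run on the board; 0: runs and produces outputs
    accuracy_level = (1 if args.num_bits == 8 else 0),
    advanced_options = advanced_options[args.num_bits],
    deny_list = "nn.batch_norm",
    )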

  • Hi,

    Could you clarify which EdgeAI TIDL Tools Tag you are using? For J784S4 TI Processor SDK RTOS 09.00.00.02, you would need to check out with tag 09_00_00_06 or 09_00_00_07 as outlined here: https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/docs/version_compatibility_table.md

    Thank you,

    Fabiana

  • The tag is 09_00_00_00. Do I need to change it to 09_00_00_06 or 09_00_00_07?

  • Hi,

    Yes, could you try this and see if the issue persists?

    Thank you,

    Fabiana

  • I tried compiling MobileNetV2 under tag 09_00_00_06 with c7x_codegen set to 1, but still couldn't run it.

    I only modified SDK_VERSION and c7x_codegen in the model compilation script.

    The following is the runtime log; "hhhhhhhhhhhhhhhhh" is a marker I printed just before model.run:

    [C7x_1 ]   2204.620426 s: 0.017,       220760,           45
    [C7x_1 ]   2204.620447 s: 0.015,       220760,           46
    [C7x_1 ]   2204.620469 s: 0.017,        67160,           47
    [C7x_1 ]   2204.620490 s: 0.007,        36312,           48
    [C7x_1 ]   2204.620515 s: 0.017,       189912,           49
    [C7x_1 ]   2204.620538 s: 0.033,        33880,           50
    [C7x_1 ]   2204.620560 s: 0.017,        13528,           51
    [C7x_1 ]   2204.620582 s: 0.025,        60760,           52
    [C7x_1 ]   2204.620603 s: 0.014,        60760,           53
    [C7x_1 ]   2204.620625 s: 0.024,        21592,           54
    [C7x_1 ]   2204.620646 s: 0.006,        13528,           55
    [C7x_1 ]   2204.620668 s: 0.025,        60760,           56
    [C7x_1 ]   2204.620689 s: 0.014,        60760,           57
    [C7x_1 ]   2204.620711 s: 0.024,        21592,           58
    [C7x_1 ]   2204.620732 s: 0.006,        13528,           59
    [C7x_1 ]   2204.620754 s: 0.025,        52696,           60
    [C7x_1 ]   2204.620776 s: 0.014,        52696,           61
    [C7x_1 ]   2204.620798 s: 0.041,        21336,           62
    [C7x_1 ]   2204.620819 s: 0.054,        68312,           63
    [C7x_1 ]   2204.620840 s: 0.009,         6872,           64
    [C7x_1 ]   2204.620862 s: 0.133,         6744,           65
    [C7x_1 ]   2204.620883 s: 0.006,         5464,           66
    [C7x_1 ]   2204.620905 s: 0.000,            0,           67
    [C7x_1 ]   2204.620976 s: TIDL_initializeHandleForPreemption is completed
    [19:57:28] /root/zilou.cao/neo-ai-dlr/3rdparty/tvm/src/runtime/contrib/tidl/tidl_runtime.cc:677: #TVM# TVMRT_create tidl_tvm_0: 0x7d0f260
    [19:57:28] /root/zilou.cao/neo-ai-dlr/src/dlr.cc:243: Error: No metadata file was found!
    []
    hhhhhhhhhhhhhhhhh
    [C7x_1 ]   2205.214934 s: TIDL_activate is called with handle : 8004400
    [C7x_1 ]   2205.214974 s: TIDL_initDmaUtils returned Error Code for handle: 8004400

  • Hi,

    Another engineer has been looped in and is looking into this issue.

    Thank you,

    Fabiana

  • Hi,

    Can you set the env vars "TIDL_RT_DEBUG=3 TVM_RT_DEBUG=4" to get a more detailed C7x debug log? See https://software-dl.ti.com/codegen/docs/tvm/tvm_tidl_users_guide/infering.html#debugging-inference (a minimal example of setting these variables follows below).

    Thanks!

    -Yuan
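
    A minimal sketch of one way to apply those variables, assuming they are set through os.environ at the top of the run script before dlr is imported (exporting them in the shell before launching Python would work equally well):

    import os

    # Set the TIDL/TVM runtime debug levels before 'dlr' is imported so the
    # runtime picks them up when the model executes (assumed placement).
    os.environ["TIDL_RT_DEBUG"] = "3"
    os.environ["TVM_RT_DEBUG"] = "4"

    from dlr import DLRModel  # import only after the env vars are in place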