
TDA4VM: can't compile custom model using edge-AI studio model analyzer

Part Number: TDA4VM

Hi, I'm using the model analyzer from edge-AI studio to test ONNX model inference time.

The model can be compiled with edge-AI tools (version = master) without error.

However, some models can't be compiled successfully using the model analyzer: the kernel dies while executing sess.run.

Is it possible to run model inference using both edge-AI studio and edge-AI tools? Thanks!

Here's my modified ipynb

#!/usr/bin/env python
# coding: utf-8

# # Custom Model Compilation and Inference using Onnx runtime 
# 
# In this example notebook, we describe how to take a pre-trained classification model and compile it using ***Onnx runtime*** to generate deployable artifacts that can be deployed on the target using the ***Onnx*** interface. 
#  
#  - Pre-trained model: `resnet18v2` model trained on ***ImageNet*** dataset using ***Onnx***  
#  
# In particular, we will show how to
# - compile the model (during heterogeneous model compilation, layers that are supported will be offloaded to the `TI-DSP` and artifacts needed for inference are generated)
# - use the generated artifacts for inference
# - perform input preprocessing and output postprocessing
# - enable debug logs
# - use deny-layer compilation option to isolate possible problematic layers and create additional model subgraphs
# - use the generated subgraphs artifacts for inference
# - perform input preprocessing and output postprocessing
#     
# ## Onnx Runtime based work flow
# 
# The diagram below describes the steps for Onnx Runtime based work flow. 
# 
# Note:
#  - The user needs to compile models (sub-graph creation and quantization) on a PC to generate model artifacts.
#  - The generated artifacts can then be used to run inference on the target.
# 
# <img src=docs/images/onnx_work_flow_2.png width="400">

# In[1]:


import os
import tqdm
import cv2
import numpy as np
import onnxruntime as rt
import shutil
from scripts.utils import imagenet_class_to_name, download_model
import matplotlib.pyplot as plt
from pathlib import Path
from IPython.display import Markdown as md
from scripts.utils import loggerWritter
from scripts.utils import get_svg_path
import onnx


# ## Define utility function to preprocess input images
# Below, we define a utility function to preprocess images for `resnet18v2`. This function takes a path as input, loads the image and preprocesses it for generic ***Onnx*** inference. The steps are as follows: 
# 
#  1. load image
#  2. convert BGR image to RGB
#  3. scale image so that the short edge is 256 pixels
#  4. center-crop image to 224x224 pixels
#  5. apply per-channel pixel scaling and mean subtraction
# 
# 
# - Note: If you are using a custom model or a model that was trained using a different framework, please remember to define your own utility function.

# In[2]:


def preprocess_for_onnx_resent18v2(image_path):
    
    # read the image using openCV
    img = cv2.imread(image_path)
    
    # convert to RGB
    img = img[:,:,::-1]
    
    # Most of the onnx models are trained using
    # 224x224 images. The general rule of thumb
    # is to scale the input image while preserving
    # the original aspect ratio so that the
    # short edge is 256 pixels, and then
    # center-crop the scaled image to 224x224
    orig_height, orig_width, _ = img.shape
    short_edge = min(img.shape[:2])
    new_height = (orig_height * 256) // short_edge
    new_width = (orig_width * 256) // short_edge
    img = cv2.resize(img, (new_width, new_height), interpolation=cv2.INTER_CUBIC)

    # note: this modified example center-crops to 640x256 (width x height)
    # instead of the 224x224 crop described in the markdown above
    startx = new_width//2 - (640//2)
    starty = new_height//2 - (256//2)
    img = img[starty:starty+256, startx:startx+640]
    
    # apply scaling and mean subtraction.
    # if your model is built with an input
    # normalization layer, then you might
    # need to skip this
    img = img.astype('float32')
    for mean, scale, ch in zip([128, 128, 128], [0.0078125, 0.0078125, 0.0078125], range(img.shape[2])):
        img[:,:,ch] = (img[:,:,ch] - mean) * scale
    img = np.expand_dims(img,axis=0)
    img = np.transpose(img, (0, 3, 1, 2))
    
    return img

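# Since the custom models in this notebook are fed with pre-dumped float32
# tensors (*.raw files) rather than images, a small helper along these lines
# could replace the repeated read/frombuffer/reshape pattern used in the cells
# further down. This is only a sketch; `load_raw_tensor` is a hypothetical name
# and is not used by the rest of the notebook.

def load_raw_tensor(raw_path, shape):
    # read a little-endian float32 dump and reshape it to the expected tensor shape
    with open(raw_path, 'rb') as raw_file:
        data = np.frombuffer(raw_file.read(), dtype=np.float32)
    return data.reshape(shape)

# example (hypothetical): kernel_params = load_raw_tensor(calib_images[0][0], (1, 134))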

# ## Compile the model
# In this step, we create an Onnx runtime session with the `tidl_model_import_onnx` library to generate artifacts that offload the supported portion of the DL model to the TI DSP.
#  - `sess` is created with the options below to calibrate the model for 8-bit fixed point inference
#    
#     * **artifacts_folder** - folder where all the compilation artifacts needed for inference are stored 
#     * **tidl_tools_path** - os.getenv('TIDL_TOOLS_PATH'), path to `TIDL` compilation tools 
#     * **tensor_bits** - 8 or 16, is the number of bits to be used for quantization 
#     * **advanced_options:calibration_frames**  - number of images to be used for calibration
#      
#     ``` 
#     compile_options = {
#         'tidl_tools_path' : os.environ['TIDL_TOOLS_PATH'],
#         'artifacts_folder' : output_dir,
#         'tensor_bits' : 16,
#         'accuracy_level' : 0,
#         'advanced_options:calibration_frames' : len(calib_images), 
#         'advanced_options:calibration_iterations' : 3 # used if accuracy_level = 1
#     }
#     ``` 
#     
# - Note: The paths to the `TIDL` compilation tools and the `aarch64` `GCC` compiler are required for model compilation; both are accessed by this notebook through the predefined environment variables `TIDL_TOOLS_PATH` and `ARM64_GCC_PATH`. The example usage of both variables is demonstrated in the cell below, and a quick sanity check is sketched after this list. 
# - `accuracy_level` is set to 0 in this example. For better accuracy, set `accuracy_level = 1`. This option results in more time for compilation but better inference accuracy. 
# Compilation status log for accuracy_level = 1 is currently not implemented in this notebook. This will be added in future versions. 
# - Please refer to TIDL user guide for further advanced options.
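# 
# A quick optional sanity check that both environment variables are visible to the notebook kernel (just a sketch; `os.getenv` returns None if a variable is not set):
# 
#     ``` 
#     print('TIDL_TOOLS_PATH =', os.getenv('TIDL_TOOLS_PATH'))
#     print('ARM64_GCC_PATH  =', os.getenv('ARM64_GCC_PATH'))
#     ``` 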

# In[3]:



# calib_images = [
# 'sample-images/frame_0.raw',
# ]

# calib_images = [[
# 'sample-images/rnn_feat_input_99_0_0.raw',
# 'sample-images/hidden_state_0_99_0_0.raw',
# 'sample-images/hidden_state_1_99_0_0.raw'
# ]]

calib_images = [[
'sample-images/kernel_params_99.raw',
'sample-images/mask_branch_99.raw',
'sample-images/reg_branch_99.raw'
]]


output_dir = 'custom-artifacts/onnx/split_final_1.onnx'
onnx_model_path = 'models/public/onnx/split_final_1.onnx'
download_model(onnx_model_path)
onnx.shape_inference.infer_shapes_path(onnx_model_path, onnx_model_path)


# ### Compilation knobs  (optional - In case of debugging accuracy)
# If model accuracy at 8 bits is not good, users can try compiling the same model at 16 bits with an accuracy level of 1. This reduces performance, but gives a good accuracy baseline.
# As a second step, users can try to improve the 8-bit accuracy by increasing the number of calibration frames and iterations, in order to get closer to the 16-bit + accuracy level 1 results.
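# 
# As a minimal sketch of these two steps (the values are illustrative only and not tuned for this model):
# 
#     ``` 
#     # step 1: establish an accuracy reference
#     num_bits = 16
#     accuracy = 1
# 
#     # step 2: back to 8 bits, with more calibration effort
#     num_bits = 8
#     accuracy = 1
#     compile_options['advanced_options:calibration_frames'] = 20
#     compile_options['advanced_options:calibration_iterations'] = 10
#     ``` 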

# In[4]:


#compilation options - knobs to tweak 
num_bits = 8
accuracy = 1


# ### Layers debug (optional - In case of debugging)
# Debug_level 3 gives layer information and warnings/errors which can be useful during debug. Users can capture the compilation logs by giving a path to the "loggerWritter" helper function.
# 
# Another technique is to use deny_list to exclude layers from running on TIDL and create additional subgraphs, in order to isolate issues.

# In[5]:


from scripts.utils import loggerWritter

log_dir = Path("logs")
log_dir.mkdir(parents=True, exist_ok=True)

# stdout and stderr saved to a *.log file.  
with loggerWritter("logs/custom-model-onnx"):
    
    # model compilation options
    compile_options = {
        'tidl_tools_path' : os.environ['TIDL_TOOLS_PATH'],
        'artifacts_folder' : output_dir,
        'tensor_bits' : num_bits,
        'accuracy_level' : accuracy,
        'advanced_options:calibration_frames' : len(calib_images), 
        'advanced_options:calibration_iterations' : 3, # used if accuracy_level = 1
        'debug_level' : 1,
#         "deny_list":"Resize, Reshape, Transpose" #Comma separated string of operator types as defined by ONNX runtime, ex "MaxPool, Concat"
#         "deny_list": "" # rnn
        "deny_list":"Slice, Split" #final_1
    }


# <div class="alert alert-block alert-info">
# <b>Note:</b> 'deny_list' is used in the above cell to keep the "Slice" and "Split" operators out of the TIDL offload and create additional subgraphs; adjust or remove it depending on which layers you need to isolate.
# </div>

# In[6]:


# create the output dir if not present
# clear the directory
os.makedirs(output_dir, exist_ok=True)
for root, dirs, files in os.walk(output_dir, topdown=False):
    [os.remove(os.path.join(root, f)) for f in files]
    [os.rmdir(os.path.join(root, d)) for d in dirs]
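
# An equivalent way to reset the artifacts folder (shutil is already imported
# above); shown only as a commented-out alternative sketch:
#
# shutil.rmtree(output_dir, ignore_errors=True)
# os.makedirs(output_dir, exist_ok=True)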


# In[7]:


# #head

# so = rt.SessionOptions()
# EP_list = ['TIDLCompilationProvider','CPUExecutionProvider']
# sess = rt.InferenceSession(onnx_model_path ,providers=EP_list, provider_options=[compile_options, {}], sess_options=so)

# input_details = sess.get_inputs()

# with open(calib_images[0], 'rb') as raw_file:
#         img = raw_file.read()
#         img = np.frombuffer(img, dtype=np.float32)
#         img = img.reshape((1, 3, 256, 640))

# for i in range(5):
#     output = sess.run(None, {input_details[0].name: img})
    
# for num in tqdm.trange(len(calib_images)):
#     output = list(sess.run(None, {input_details[0].name: img}))


# In[8]:


# #rnn

# so = rt.SessionOptions()
# EP_list = ['TIDLCompilationProvider','CPUExecutionProvider']
# sess = rt.InferenceSession(onnx_model_path ,providers=EP_list, provider_options=[compile_options, {}], sess_options=so)

# input_details = sess.get_inputs()

# with open(calib_images[0][0], 'rb') as raw_file:
#     rnn_feat_input = raw_file.read()
#     rnn_feat_input = np.frombuffer(rnn_feat_input, dtype=np.float32)
#     rnn_feat_input = rnn_feat_input.reshape((1, 128, 1, 1))
# with open(calib_images[0][1], 'rb') as raw_file:
#     hidden_h = raw_file.read()
#     hidden_h = np.frombuffer(hidden_h, dtype=np.float32)
#     hidden_h = hidden_h.reshape((1, 128, 1, 1))
# with open(calib_images[0][2], 'rb') as raw_file:
#     hidden_c = raw_file.read()   
#     hidden_c = np.frombuffer(hidden_c, dtype=np.float32)
#     hidden_c = hidden_c.reshape((1, 128, 1, 1)) 
 
# print(rnn_feat_input.shape)
# print(hidden_h.shape)   
# print(hidden_c.shape)
# print(input_details[0].shape)
# print(input_details[1].shape)   
# print(input_details[2].shape)

# for i in range(5):
#     output = sess.run(None, 
#                 {input_details[0].name: rnn_feat_input, 
#                 input_details[1].name: hidden_h, 
#                 input_details[2].name: hidden_c})
    
# for num in tqdm.trange(len(calib_images)):
#     output = list(sess.run(None, 
#                 {input_details[0].name: rnn_feat_input, 
#                 input_details[1].name: hidden_h, 
#                 input_details[2].name: hidden_c}))


# In[ ]:


#final

so = rt.SessionOptions()
EP_list = ['TIDLCompilationProvider','CPUExecutionProvider']
sess = rt.InferenceSession(onnx_model_path, providers=EP_list, provider_options=[compile_options, {}], sess_options=so)

input_details = sess.get_inputs()
num_ins = 1

with open(calib_images[0][0], 'rb') as raw_file:
    kernel_params = raw_file.read()
    kernel_params = np.frombuffer(kernel_params, dtype=np.float32)
    kernel_params = kernel_params.reshape((num_ins, 134))
with open(calib_images[0][1], 'rb') as raw_file:
    mask_branch = raw_file.read()
    mask_branch = np.frombuffer(mask_branch, dtype=np.float32)
    mask_branch = mask_branch.reshape((1, 64, 32, 80))
with open(calib_images[0][2], 'rb') as raw_file:
    reg_branch = raw_file.read()   
    reg_branch = np.frombuffer(reg_branch, dtype=np.float32)
    reg_branch = reg_branch.reshape((1, 64, 32, 80)) 

print(kernel_params.shape)
print(mask_branch.shape)   
print(reg_branch.shape)

floating_model = (input_details[0].type == 'tensor(float)')
if not floating_model:
        kernel_params = np.uint8(kernel_params)
        mask_branch = np.uint8(mask_branch)
        reg_branch = np.uint8(reg_branch)

for i in range(5):
    output = sess.run(
                None, 
                {input_details[0].name: kernel_params, 
                input_details[1].name: mask_branch, 
                input_details[2].name: reg_branch})
    
for num in tqdm.trange(len(calib_images)):
    output = list(sess.run(
                None, 
                {input_details[0].name: kernel_params, 
                input_details[1].name: mask_branch, 
                input_details[2].name: reg_branch}))


# ### Subgraphs visualization  (optional - In case of debugging models and subgraphs)
# Running the cell below gives links to the complete graph and TIDL subgraph visualizations. This, along with the "deny_list" feature explained above, offers tools for checking and isolating issues in NN model layers.

# In[ ]:


subgraph_link = get_svg_path(output_dir)
for sg in subgraph_link:
    hl_text = os.path.join(*Path(sg).parts[4:])
    sg_rel = os.path.join('../', sg)
    display(md("[{}]({})".format(hl_text,sg_rel)))


# ## Use compiled model for inference
# Then, using ***Onnx*** runtime with the ***`libtidl_onnxrt_EP`*** inference library, we run the model and collect benchmark data.

# In[ ]:


# EP_list = ['TIDLExecutionProvider','CPUExecutionProvider']

# sess = rt.InferenceSession(onnx_model_path ,providers=EP_list, provider_options=[compile_options, {}], sess_options=so)
# #Running inference several times to get an stable performance output
# for i in range(5):
#     output = list(sess.run(None, {input_details[0].name : preprocess_for_onnx_resent18v2('sample-images/elephant.bmp')}))

# for idx, cls in enumerate(output[0].squeeze().argsort()[-5:][::-1]):
#     print('[%d] %s' % (idx, '/'.join(imagenet_class_to_name(cls))))
    
# from scripts.utils import plot_TI_performance_data, plot_TI_DDRBW_data, get_benchmark_output
# stats = sess.get_TI_benchmark_data()
# fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10,5))
# plot_TI_performance_data(stats, axis=ax)
# plt.show()

# tt, st, rb, wb = get_benchmark_output(stats)
# print(f'Statistics : \n Inferences Per Second   : {1000.0/tt :7.2f} fps')
# print(f' Inference Time Per Image : {tt :7.2f} ms  \n DDR BW Per Image        : {rb+ wb : 7.2f} MB')


# In[ ]:


# # head inference

# EP_list = ['TIDLExecutionProvider','CPUExecutionProvider']

# sess = rt.InferenceSession(onnx_model_path ,providers=EP_list, provider_options=[compile_options, {}], sess_options=so)
# #Running inference several times to get an stable performance output

# with open(calib_images[0], 'rb') as raw_file:
#         img = raw_file.read()
#         img = np.frombuffer(img, dtype=np.float32)
#         img = img.reshape((1, 3, 256, 640))

# for i in range(5):
#     output = sess.run(None, {input_details[0].name: img})

# # for idx, cls in enumerate(output[0].squeeze().argsort()[-5:][::-1]):
# #     print('[%d] %s' % (idx, '/'.join(imagenet_class_to_name(cls))))
    
# from scripts.utils import plot_TI_performance_data, plot_TI_DDRBW_data, get_benchmark_output
# stats = sess.get_TI_benchmark_data()
# fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10,5))
# plot_TI_performance_data(stats, axis=ax)
# plt.show()

# tt, st, rb, wb = get_benchmark_output(stats)
# print(f'Statistics : \n Inferences Per Second   : {1000.0/tt :7.2f} fps')
# print(f' Inference Time Per Image : {tt :7.2f} ms  \n DDR BW Per Image        : {rb+ wb : 7.2f} MB')


# In[ ]:


# # rnn inference

# EP_list = ['TIDLExecutionProvider','CPUExecutionProvider']

# sess = rt.InferenceSession(onnx_model_path ,providers=EP_list, provider_options=[compile_options, {}], sess_options=so)
# #Running inference several times to get an stable performance output

# with open(calib_images[0][0], 'rb') as raw_file:
#     rnn_feat_input = raw_file.read()
#     rnn_feat_input = np.frombuffer(rnn_feat_input, dtype=np.float32)
#     rnn_feat_input = rnn_feat_input.reshape((1, 128, 1, 1))
# with open(calib_images[0][1], 'rb') as raw_file:
#     hidden_h = raw_file.read()
#     hidden_h = np.frombuffer(hidden_h, dtype=np.float32)
#     hidden_h = hidden_h.reshape((1, 128, 1, 1))
# with open(calib_images[0][2], 'rb') as raw_file:
#     hidden_c = raw_file.read()   
#     hidden_c = np.frombuffer(hidden_c, dtype=np.float32)
#     hidden_c = hidden_c.reshape((1, 128, 1, 1)) 

# for i in range(5):
#     output = sess.run(None, 
#                 {input_details[0].name: rnn_feat_input, 
#                 input_details[1].name: hidden_h, 
#                 input_details[2].name: hidden_c})

# # for idx, cls in enumerate(output[0].squeeze().argsort()[-5:][::-1]):
# #     print('[%d] %s' % (idx, '/'.join(imagenet_class_to_name(cls))))
    
# from scripts.utils import plot_TI_performance_data, plot_TI_DDRBW_data, get_benchmark_output
# stats = sess.get_TI_benchmark_data()
# fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10,5))
# plot_TI_performance_data(stats, axis=ax)
# plt.show()

# tt, st, rb, wb = get_benchmark_output(stats)
# print(f'Statistics : \n Inferences Per Second   : {1000.0/tt :7.2f} fps')
# print(f' Inference Time Per Image : {tt :7.2f} ms  \n DDR BW Per Image        : {rb+ wb : 7.2f} MB')


# In[ ]:


# final_1 model inference

EP_list = ['TIDLExecutionProvider','CPUExecutionProvider']

sess = rt.InferenceSession(onnx_model_path, providers=EP_list, provider_options=[compile_options, {}], sess_options=so)
# refresh the input details for this newly created inference session
input_details = sess.get_inputs()
# Running inference several times to get a stable performance output

num_ins = 1
with open(calib_images[0][0], 'rb') as raw_file:
    kernel_params = raw_file.read()
    kernel_params = np.frombuffer(kernel_params, dtype=np.float32)
    kernel_params = kernel_params.reshape((num_ins, 134))
with open(calib_images[0][1], 'rb') as raw_file:
    mask_branch = raw_file.read()
    mask_branch = np.frombuffer(mask_branch, dtype=np.float32)
    mask_branch = mask_branch.reshape((1, 64, 32, 80))
with open(calib_images[0][2], 'rb') as raw_file:
    reg_branch = raw_file.read()   
    reg_branch = np.frombuffer(reg_branch, dtype=np.float32)
    reg_branch = reg_branch.reshape((1, 64, 32, 80)) 
    
for i in range(5):
    output = sess.run(
                None, 
                {input_details[0].name: kernel_params, 
                input_details[1].name: mask_branch, 
                input_details[2].name: reg_branch})

# for idx, cls in enumerate(output[0].squeeze().argsort()[-5:][::-1]):
#     print('[%d] %s' % (idx, '/'.join(imagenet_class_to_name(cls))))
    
from scripts.utils import plot_TI_performance_data, plot_TI_DDRBW_data, get_benchmark_output
stats = sess.get_TI_benchmark_data()
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10,5))
plot_TI_performance_data(stats, axis=ax)
plt.show()

tt, st, rb, wb = get_benchmark_output(stats)
print(f'Statistics : \n Inferences Per Second   : {1000.0/tt :7.2f} fps')
print(f' Inference Time Per Image : {tt :7.2f} ms  \n DDR BW Per Image        : {rb+ wb : 7.2f} MB')


# ## EVM's console logs (optional - in case of inference failure)
# 
# To copy console logs from the EVM to the TI EdgeAI Cloud user's workspace, go to "Help -> Troubleshooting -> EVM console log" on TI's EdgeAI Cloud landing page.
# 
# Alternatively, from the workspace, open and run evm-console-log.ipynb