TDA4VM: Inference with Custom Artifacts Kills Kernel in Edge AI Studio

Part Number: TDA4VM

I'm attempting to run calibration inference on the pre-trained YOLOX human pose estimation model from the Model Zoo and then write a custom post-processing function that adds the keypoint and skeleton drawing for judging the pose itself.

I'm using the Edge AI Cloud tool, following the custom ONNX model example but using the Human Pose Estimation notebook as a reference. However, after I run the calibration inference, I can't seem to run inference again with the custom artifacts generated by the calibration run. Even in a totally new notebook, whenever I point to my custom artifacts in the rt.InferenceSession() call, the kernel dies every time I run it.
I'm not sure if I just don't have the directory structure set up right, or if I missed something in the backend. I've attached the Python code from the notebook I created in the Edge AI Cloud tool, along with my log files. The error log just shows a double free or corruption error, but I couldn't figure out the specific issue causing it.
#!/usr/bin/env python
# coding: utf-8
# In[1]:
import os
import re
import sys
import cv2
import tqdm
import onnx
import math
import copy
import shutil
import platform
import itertools
import numpy as np
import onnxruntime as rt
import ipywidgets as widgets
 
custon-model-onnx_err.log
double free or corruption (!prev)
2821.custon-model-onnx_out.log
  • Hi Knitter, probably what is missing is to add "meta_arch_type" and "meta_layers_names_list" to "compile_options". Example below:

    # stdout and stderr saved to a *.log file.
    with loggerWritter("logs/custon-model-onnx"):
        compile_options = {
            'tidl_tools_path' : os.environ['TIDL_TOOLS_PATH'],
            'artifacts_folder' : output_dir,
            'tensor_bits' : 8,
            'accuracy_level' : 1,
            'advanced_options:calibration_frames' : len(calib_images),
            'advanced_options:calibration_iterations' : 3, # used if accuracy_level = 1
            'object_detection:meta_arch_type': 6,
            'object_detection:meta_layers_names_list': f'CustomModels/Yolop/yolop_640_ti_lite_metaarch.prototxt',
        }

    You can find *.prototxt files inside model-zoo. Example path: /home/root/notebooks/model-zoo/modelartifacts/8bits/kd-7060_onnxrt_coco_edgeai-yolox_yolox_s_pose_ti_lite_49p5_78p0_onnx/model/yolox_s_pose_ti_lite_metaarch.prototxt

    Also, you can comment out (or delete) 'deny_list' : "MaxPool". That option was added only as an example of creating subgraphs, as a possible debugging tool for denying layers which could have an issue.
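    As a reference, here is a minimal sketch of how compile_options is typically passed to onnxruntime for the calibration/compilation run in the custom-model-onnx flow (onnx_model_path, output_dir, calib_images and the preprocess() helper are placeholders, not taken from your attached notebook):

    import os
    import onnxruntime as rt

    # make sure the folder referenced by 'artifacts_folder' exists
    os.makedirs(output_dir, exist_ok=True)

    so = rt.SessionOptions()
    EP_list = ['TIDLCompilationProvider', 'CPUExecutionProvider']

    # the compilation provider reads compile_options and writes the artifacts into
    # 'artifacts_folder' while the calibration frames are pushed through the model
    sess = rt.InferenceSession(onnx_model_path, providers=EP_list,
                               provider_options=[compile_options, {}], sess_options=so)

    input_name = sess.get_inputs()[0].name
    for image_path in calib_images:
        input_data = preprocess(image_path)  # hypothetical helper returning an NCHW float32 array
        sess.run(None, {input_name: input_data})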

    thank you,

    Paula

  • Hi Paula!

    Thanks for your reply! I updated my compile options to:

        compile_options = {
            'tidl_tools_path' : os.environ['TIDL_TOOLS_PATH'],
            'artifacts_folder' : output_dir,
            'tensor_bits' : num_bits,
            'accuracy_level' : accuracy,
            'advanced_options:calibration_frames' : len(calib_images),
            'advanced_options:calibration_iterations' : 3, # used if accuracy_level = 1 
            'object_detection:meta_arch_type': 6,
            'object_detection:meta_layers_names_list': f'/home/root/notebooks/prebuilt-models/8bits/kd-7060_onnxrt_coco_edgeai-yolox_yolox_s_pose_ti_lite_49p5_78p0_onnx/model/yolox_s_pose_ti_lite_metaarch.prototxt'
        }

    The kernel still dies after I try to run inference with my custom artifacts. Should I also be adding the "meta_arch_type" and "meta_layers_names_list" options to the delegate options I pass for the inference run with my custom artifacts?:

    delegate_options = {
        'artifacts_folder': './custom-artifacts/onnx/yolox_s_pose_ti_lite_49p5_78p0.onnx'
    }

    so0 = rt.SessionOptions()
    EP_list = ['TIDLCompilationProvider','CPUExecutionProvider']

    sess0 = rt.InferenceSession(onnx_model_path_EdgeAIcloud, providers=EP_list, provider_options=[delegate_options, {}], sess_options=so0)

  • Hi Whitney, yes, please use the same compilation options as delegate options. Let me share with you a notebook example for yolox_s_lite that I have, in case it helps.
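
    As a minimal sketch, "same options" could look like this (assuming the compile_options dictionary from the compilation step is still in scope, and output_dir is the artifacts folder written during calibration):

    # reuse the compilation options as the delegate options for the inference run,
    # with 'artifacts_folder' pointing at the generated artifacts directory
    # (a folder, not the .onnx model file itself)
    delegate_options = dict(compile_options)
    delegate_options['artifacts_folder'] = output_dir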

    thank you,

    Paula

    custom-model-onnx-yolovx.ipynb

  • Also, for "yolox_s_pose_ti_lite" model I see we use mix precision to improve accuracy. Example below

    'advanced_options:output_feature_16bit_names_list': '513, 758, 883, 1008, 756, 753, 878, 881, 1003, 1006',
    'object_detection:meta_arch_type': 6,
    'object_detection:meta_layers_names_list': f'prebuilt-models/8bits/kd-7060_onnxrt_coco_edgeai-yolox_yolox_s_pose_ti_lite_49p5_78p0_onnx/model/yolox_s_pose_ti_lite_metaarch.prototxt',

    thank you,

    Paula

  • Hi Paula!

    Thank you for providing the example notebook. I tried running it as-is (pointing at a different sample image, since the one specified didn't exist for me), but it also causes the kernel to die when calling the sess = rt.InferenceSession() function.

  • Hi Whitney, you can run evm-console-log.ipynb and see if there are any errors or messages there that could give us a clue about the issue.

    However, my guess is that you probably haven't unzipped the model. That is probably also true for the pose estimation model you were trying. I will take a note to see if we can have the models unzipped in users' workspaces by default.

    For now, you can extract all the models by running extract.sh inside notebooks/prebuilt-models/8bits/

    Or you can extract a particular model. Example below:

    user@1c26caa7764f:/home/root/notebooks/prebuilt-models/8bits$ find . -name "*od-8220_onnxrt_coco_edgeai-mmdet_yolox_s_lite_640x640_20220221_model_onnx.tar.gz" -exec tar --one-top-level -zxvf "{}" \;

    Another trick, for custom models, is that you can convert the notebooks to Python scripts and run them from the terminal to get more error information, instead of just a dead kernel. Example below:

    In the terminal you can type bash, then:

    user@1c26caa7764f:/home/root/notebooks$ jupyter nbconvert --to script custom-model-onnx-yolox.ipynb
    [NbConvertApp] Converting notebook custom-model-onnx-yolox.ipynb to script
    [NbConvertApp] Writing 7765 bytes to custom-model-onnx-yolox.py
    user@1c26caa7764f:/home/root/notebooks$ python3 custom-model-onnx-yolox.py

    Thank you,

    Paula

  • Hi Paula,

    You are right, I forgot the models need to be unzipped every time a new EVM session is started.

    I did also happen to find the issue causing my original notebook to hang and the kernel to die. I had a copy+paste error when setting the EP_list variable for the execution inference run after the calibration inference run. So I was trying to run calibration inference twice and recompile the model without realizing it, which causes some sort of memory allocation issue in Jupyter and kills the kernel. 

    So for the second inference run, the EP_list variable should have been: EP_list = ['TIDLExecutionProvider','CPUExecutionProvider']

    But I still had it set to: EP_list = ['TIDLCompilationProvider','CPUExecutionProvider']
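
    For anyone else who hits this, the two runs side by side (a sketch, with onnx_model_path, compile_options and delegate_options as defined earlier in the notebook):

    # Run 1 - compilation/calibration: generates the artifacts in 'artifacts_folder'
    EP_list = ['TIDLCompilationProvider', 'CPUExecutionProvider']
    sess = rt.InferenceSession(onnx_model_path, providers=EP_list,
                               provider_options=[compile_options, {}], sess_options=rt.SessionOptions())

    # Run 2 - inference with the generated artifacts: must use the execution provider
    EP_list = ['TIDLExecutionProvider', 'CPUExecutionProvider']
    sess = rt.InferenceSession(onnx_model_path, providers=EP_list,
                               provider_options=[delegate_options, {}], sess_options=rt.SessionOptions())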

    I have hit a new issue, though. The skeleton keypoints are in the wrong place in the image (not on the person). I know the human pose YOLOX model can only handle one subject in the frame at a time, so I'm thinking the spot in the background the keypoints are being placed on must look like a person somehow (it's the ceiling rafters in my basement, haha). I'm going to try cropping that spot out of the image and rerunning the model in the morning.

  • Hi Whitney, I am glad to hear that you are making progress =). I am checking with my team whether we can host uncompressed model artifacts in the next release to avoid this issue.

    One thing: if the accuracy is not OK, you can first try running the model at 16-bit (num_bits = 16) and see if that helps. If so, you can increase calibration_frames, maybe to 15 or 25, and a calibration_iterations of 25 is probably OK; we sometimes use 50.
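
    For example (a sketch only, adjusting the compile_options used for the compilation run; the exact values depend on your calibration set):

    compile_options['tensor_bits'] = 16                               # i.e. num_bits = 16
    compile_options['advanced_options:calibration_frames'] = 25      # e.g. 15-25 frames
    compile_options['advanced_options:calibration_iterations'] = 25  # 25 is usually OK, sometimes 50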

    thank you,

    Paula

  • Hi Paula!

    I reran calibration with the model at 16-bit and more sample images. The skeleton keypoints of the pose now match the subject in the photo, but instead of being placed over the subject, the skeleton ends up in the upper left-hand corner of the image and is much smaller (see attached screenshot). Adding more calibration images doesn't seem to have any impact on this result. What would you recommend?

  • Hi Whitney,

    You can try commenting out the lines below.

    # plot the output using matplotlib
    #plt.rcParams["figure.figsize"]=20,20
    #plt.rcParams['figure.dpi'] = 200 # 200 e.g. is really fine, but slower

    If that doesn't help, please take a look at single_img_visualise() inside the workspace's notebooks/scripts/utils.py.

    thank you,

    Paula

  • Hi Whitney, I saw your post on Hackster.io (Practicing Yoga with AI: Human Pose Estimation on the TDA4VM - Hackster.io). Congratulations, it is pretty nice! And OK, your issue with post-processing was due to the image size =). I will make a note of it in case this issue arises again for someone else.
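
    For future readers: one common way an image-size mismatch shows up is keypoints left in model-input coordinates and drawn on the full-size image. A rough sketch of the rescaling that kind of mismatch needs (assuming a 640x640 model input and keypoints returned as (x, y) pairs; the function name and shapes are placeholders, not from the actual notebook):

    import numpy as np

    def rescale_keypoints(keypoints, orig_w, orig_h, model_size=640):
        """Map (x, y) keypoints from model-input coordinates back to the original image."""
        kp = np.array(keypoints, dtype=np.float32).reshape(-1, 2)
        kp[:, 0] *= orig_w / model_size  # scale x back to the original width
        kp[:, 1] *= orig_h / model_size  # scale y back to the original height
        return kp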

    I will mark this thread as "TI Thinks Resolved".

    thank you,

    Paula