TDA4VM: Deploying custom semantic segmentation onnx model

Florence Teste

Part Number: TDA4VM

Tool/software:

Hi,

I’m trying to perform inference on a TDA4VM board using a custom onnx model for semantic segmentation. I’ve tried two approaches:

1. Generate inference artifacts using the onnxruntime with TIDLCompilationProvider and the following provider options

TIDL_OPTIONS = {
   'tidl_tools_path' : get_tidl_tools_path(),
   'artifacts_folder' : WORKDIR + 'artifacts/',
   "platform":"J7",
   "version":"7.2",
   'debug_level' : 2,
   'tensor_bits' : 8,
   "ti_internal_nc_flag" : 1601,
   'advanced_options:calibration_frames' : 10,
   'advanced_options:calibration_iterations' : 5,
   'accuracy_level' : 1,
}

This returns no errors, and produces the required artifacts, but inference on the board based on these artifacts and onnxruntime with TIDLExecutionProvider yields severely degraded performance (near zero accuracy) for my model.

2. Pre-quantize the model using onnxruntime.quantization.quantize_static, and then try to generate artifacts using the TIDLCompilationProvider with provider options

TIDL_OPTIONS = {
   'tidl_tools_path' : get_tidl_tools_path(),
   'artifacts_folder' : WORKDIR + 'artifacts/',
   "platform":"J7",
   "version":"7.2",
   'debug_level' : 2,
   'tensor_bits' : 8,
   "ti_internal_nc_flag" : 1601,
   'advanced_options:prequantized_model': 1,
   'accuracy_level' : 9,
}

This also returns no errors, but does not produce all the required artifacts in the artifacts folder. It produces the allowedNode.txt and onnxrtMetaData.txt files, but no .bin files; the .bin files are present in the tempDir subdirectory of the artifacts folder, but the file sizes of these artifacts is 4x what they should be (indicating float data rather than int8), and they cannot be used successfully for inference on the board. Note also that running the pre-quantized model on CPU yields high accuracy network predictions, so I’m sure that my network can be quantized to int8 without substantial loss of accuracy.

This problem seems to be independent of the actual model I’m using. Below, I’ve attached a minimal example of a randomly initialized 2 layer conv net that, when compiled+evaluated according to approach 1 above, yields low quality predictions, while pre-quantizing yields much better quality predictions on CPU, but no .bin artifacts are produced at compilation time. If you would like to run this script to see the problem yourself:

For approach one, run
1. python3 minimal_workflow.py -c [in the TI dev docker with standard setup]
2. scp -r ti_comp_test minimal_workflow.py tda4vm: [i.e. copy script and model to board]
3. python3 minimal_workflow.py -e [on the board]
For approach two, run
1. python3 minimal_workflow.py -p [in the TI dev docker]
2. ls ti_comp_test/artifacts/

Step 1c and 2a print accuracy numbers that illustrate the effect of quantization; note that the numbers for 1c are much worse than those for 2a. Step 2b shows that no .bin files are generated when compiling pre-quantized models.

[Note that step 1a does show some warnings in stdout, but no fatal error is encountered; there are no warnings similar to these for my semantic segmentation model]

Any help regarding this would be highly appreciated! I’d prefer to use approach 2 (since this gives me more flexibility regarding the quantization algorithm), but fixing approach 1 would also be a step forward! Thanks for your help!

over 1 year ago

0 Florence Teste over 1 year ago

Prodigy 30 points

It looks like the python script I mentioned in the post wasn't properly attached, so I'm posting it here, inline.

import os
import argparse
import numpy as np
import onnxruntime as rt


def get_tidl_tools_path():
    if 'TIDL_TOOLS_PATH' in os.environ: return os.environ['TIDL_TOOLS_PATH']
    if os.path.exists('/home/root/tidl_tools'): return '/home/root/tidl_tools'
    if os.path.exists('/home/root/edgeai-tidl-tools'): return '/home/root/edgeai-tidl-tools'
    else: raise FileNotFoundError('Please set TIDL_TOOLS_PATH environment variable')

WORKDIR = './ti_comp_test/'
TIDL_PARAMS = {
    'tidl_tools_path' : get_tidl_tools_path(),
    'artifacts_folder' : WORKDIR + 'artifacts/',
    "platform":"J7",
    "version":"7.2",
    'debug_level' : 2,
    'tensor_bits' : 8,
    "ti_internal_nc_flag" : 1601,
}
IN_SHAPE=(1,3,128,128)
np.set_printoptions(precision=3)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('-c','--compile', action='store_true', help='Compile model with TI quantization')
    parser.add_argument('-p','--prequant_compile', action='store_true', help='Compile model with onxruntime quantization')
    parser.add_argument('-e','--eval', action='store_true', help='Eval model')
    
    args = parser.parse_args()
    
    assert sum([args.compile, args.prequant_compile, args.eval]) == 1, "Please select one of the options"
    
    if not args.eval:
        rm_folder(WORKDIR)
        os.makedirs(WORKDIR, exist_ok=True)
    
    if args.compile:
        compile_model(pre_quantize=False)
    elif args.prequant_compile:
        compile_model(pre_quantize=True)
    elif args.eval:
        eval_model()
    else: raise NotImplementedError


def compile_model(pre_quantize=True):    
    make_onnx_model(pre_quantize)
    os.mkdir(TIDL_PARAMS['artifacts_folder'])
    engine = rt.InferenceSession(
        WORKDIR + 'model.onnx', 
        providers=['TIDLCompilationProvider', 'CPUExecutionProvider'],
        provider_options=[
            dict(
            **TIDL_PARAMS,
            **(
                {
                    'accuracy_level': 9,
                    'advanced_options:prequantized_model': 1,
                } if pre_quantize else {
                    'advanced_options:calibration_frames' : 10, 
                    'advanced_options:calibration_iterations' : 5,
                    'accuracy_level' : 1,
                })
            ),{}]
    )
    run(engine, '')
    if pre_quantize: 
        # os.rename(WORKDIR + 'artifacts/tempDir/subgraph_0_tidl_net.bin', WORKDIR + 'artifacts/subgraph_0_tidl_net.bin')
        # os.rename(WORKDIR + 'artifacts/tempDir/subgraph_0_tidl_io_1.bin', WORKDIR + 'artifacts/subgraph_0_tidl_io_1.bin')
        run(rt.InferenceSession(WORKDIR + 'model.onnx', providers=['CPUExecutionProvider']), '/chip_results.npy')
        compare_results()
    # rm_folder(WORKDIR + 'artifacts/tempDir')


def eval_model():
    engine = rt.InferenceSession(
        WORKDIR + 'model.onnx', 
        providers=['TIDLExecutionProvider', 'CPUExecutionProvider'],
        provider_options=[TIDL_PARAMS, {}]
    )
    run(engine, '/chip_results.npy')
    compare_results()


def compare_results():
    x = np.squeeze(np.load(WORKDIR+'host_results.npy'))
    y = np.squeeze(np.load(WORKDIR+'chip_results.npy'))
    print('Sample outputs from float comp', x.reshape(-1)[::150000])
    print('Sample outputs from int8  comp', y.reshape(-1)[::150000])
    print('Mean absolute difference', np.abs(x-y).mean())


def run(engine, save_path):
    out = [engine.run(None, x)[0] for x in make_images()]
    if save_path:
        np.save(WORKDIR+save_path, out[0])


def make_images():
    return [{'input.1': x} for x in np.linspace(0,100,128**2*3*10,dtype=np.float32).reshape(10,*IN_SHAPE)]


def make_onnx_model(pre_quantize=True):
    import torch    
    from torch import nn
    import onnx
    from onnx.version_converter import convert_version
    
    net = nn.Sequential(cbr(3,32),cbr(32,64))
    torch.onnx.export(net, torch.randn(*IN_SHAPE), WORKDIR + 'model.onnx', opset_version=17)
    net = onnx.load(WORKDIR + 'model.onnx')
    net = convert_version(net, 18)
    net.ir_version = 8
    onnx.save(net, WORKDIR + 'model.onnx')
    run(
        rt.InferenceSession(WORKDIR + 'model.onnx', providers=['CPUExecutionProvider']),
        'host_results.npy'
    )
    if pre_quantize: 
        prequantize_onnx_model(WORKDIR + 'model.onnx')


def cbr(d1, d2):
    from torch import nn
    return nn.Sequential(nn.Conv2d(d1,d2,3), nn.BatchNorm2d(d2), nn.ReLU())


def prequantize_onnx_model(onnx_model_path):
    from onnxruntime.quantization import quantize_static
    from onnxruntime.quantization.calibrate import CalibrationDataReader
    
    calibration_images = make_images()  
    
    class DR(CalibrationDataReader):
        def __init__(self) -> None:
            super().__init__()
            self.enum=None
            
        def get_next(self):
            if self.enum is None:
                self.enum = iter(calibration_images)
            return next(self.enum, None)
        
        def rewind(self):
            self.enum=None
    
    quantize_static(
        onnx_model_path,
        onnx_model_path,
        DR(),
        per_channel=False,
    )


def rm_folder(folder):
    if os.path.exists(folder):
        for root, dirs, files in os.walk(folder, topdown=False):
            [os.remove(os.path.join(root, f)) for f in files]
            [os.rmdir(os.path.join(root, d)) for d in dirs]
        os.rmdir(folder)


if __name__ == '__main__':
    main()

Processors

Processors forum

TDA4VM: Deploying custom semantic segmentation onnx model