
PROCESSOR-SDK-J721S2: Compilation Options for Prequantized TFLite model

Part Number: PROCESSOR-SDK-J721S2

Hi,

I converted an existing TensorFlow model to a post-training quantized (PTQ) TFLite model. I used the following code to do so:

import tensorflow as tf
import numpy as np

# calibration images, shape (num_samples, H, W, C)
images = np.load("D:\\quantization\\images.npy")
print(images.shape)


def representative_data_gen():
    for i in images:
        # each yielded item is a list with one float32 array per model input
        yield [i.astype(np.float32)]


# load directly from the SavedModel directory; from_keras_model() expects a
# tf.keras.Model object, not the result of tf.saved_model.load()
converter = tf.lite.TFLiteConverter.from_saved_model("D:\\quantization\\saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen

# restrict to int8 builtins to force full-integer quantization
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_model_quant = converter.convert()

with open("D:\\quantization\\calib.tflite", 'wb') as f:
    f.write(tflite_model_quant)
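For clarity on the representative-dataset contract, here is a minimal, NumPy-only sketch of the generator pattern above; the image array, its shape, and the sample count are made up for illustration:

```python
import numpy as np

# stand-in for np.load("images.npy"); shape (num_samples, H, W, C) is assumed
images = np.random.randint(0, 256, size=(4, 8, 8, 3), dtype=np.uint8)

def representative_data_gen():
    # the converter expects each item to be a list of float32 arrays,
    # one per model input, matching the model's input signature
    # (including the batch dimension, if the model's input has one)
    for i in images:
        yield [i.astype(np.float32)]

samples = list(representative_data_gen())
print(len(samples), samples[0][0].dtype, samples[0][0].shape)
```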

As per my understanding, there are two possibilities now:

Option I:  load the model as-is and just compile it.

Option II: compile and recalibrate the model.

I am using OSRT (TFLite), and I am unsure which options should be chosen for compilation.

As mentioned here: https://github.com/TexasInstruments/edgeai-torchvision/blob/master/docs/pixel2pixel/Quantization.md#instructions-for-compiling-models-in-open-source-runtimes-of-tidl-80-august-2021-onwards

compile_options = {"tidl_tools_path": os.environ['TIDL_TOOLS_PATH'],
                   "artifacts_folder": model_artifacts_path,
                   "tensor_bits": 8,
                   "accuracy_level": 0,  # (Option I?) to avoid further calibration in TIDL
                   "advanced_options:quantization_scale_type": 1,  # to use power-of-2 quantization
                   "deny_list": "",  # deny_list takes a comma-separated string of op types, not an int
                   }
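For reference, this is how I am passing the compile options to the TIDL import delegate, following the edgeai-tidl-tools OSRT examples; the delegate file name, paths, and option values are my assumptions and may differ per SDK version:

```python
# Sketch: offline compilation via the TIDL TFLite import delegate
# (assumed flow; verify names against your edgeai-tidl-tools install)
import os
import tflite_runtime.interpreter as tflite

tidl_tools_path = os.environ['TIDL_TOOLS_PATH']
compile_options = {
    "tidl_tools_path": tidl_tools_path,
    "artifacts_folder": "./model-artifacts",   # assumed output directory
    "tensor_bits": 8,
    "accuracy_level": 0,
    "advanced_options:quantization_scale_type": 1,
}

# the *_import_* delegate compiles the model and writes the artifacts
tidl_delegate = tflite.load_delegate(
    os.path.join(tidl_tools_path, 'tidl_model_import_tflite.so'),
    compile_options)
interpreter = tflite.Interpreter(
    model_path="D:\\quantization\\calib.tflite",
    experimental_delegates=[tidl_delegate])
interpreter.allocate_tensors()
# running a few calibration frames through interpreter.invoke()
# completes the import step
```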

However, an answer on the forum suggests setting advanced_options:quantization_scale_type to 3 (https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1232355/processor-sdk-j721s2-how-to-import-pre-quantized-network-tflite-cnn-models-using-tidl-without-calibration).

And the options mentioned here are:

https://github.com/TexasInstruments/edgeai-torchvision/blob/master/docs/pixel2pixel/Quantization.md#instructions-for-ptq-compiling-models-in-open-source-runtimes-of-tidl

Does this mean that the options in the link above are the default options, and that if we do not want to calibrate further and just compile, we set accuracy_level to 0? And if it is a PTQ TFLite model, should quantization_scale_type be set to 3?

Also, if the accuracy with PTQ is not good enough, can a Quantization-Aware Trained (QAT) model in TensorFlow be compiled with the OSRT Python API? (Ref: https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1101961/tda4vm-import-quantized-model-from-tflite-to-tidl) However, the documentation explicitly mentions only PTQ TFLite support.

Thanks in advance.

Best

Ashay