PROCESSOR-SDK-J722S: Question about running inference with the default model (MobileNetV1)

Part Number: PROCESSOR-SDK-J722S

Hi Team:

On TIDL 10.00.08.00,

I tried to convert the default model (MobileNetV1) once with input = uint8 and once with input = float.

I found that the model with input = uint8 performs a dequantize step at the input, while the rest of the network is identical to the model with input = float.

However, the model with input = uint8 actually has a shorter inference time.

I'm not sure why this happens.

Thanks for your kind help.

Best regards,

Ken
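
For reference, a minimal sketch of how such a latency comparison could be scripted. Assumptions not taken from this thread: the two models were already compiled with onnxrt_ep.py, the model paths and artifact folders below are placeholders, and the provider name "TIDLExecutionProvider" plus the "artifacts_folder" option key follow the edgeai-tidl-tools examples.

import time

import numpy as np
import onnxruntime as ort

def time_model(model_path, artifacts_dir, runs=50):
    # Option keys mirror the edgeai-tidl-tools examples and may differ between
    # SDK versions -- check the onnxrt_ep.py shipped with your SDK.
    options = {"artifacts_folder": artifacts_dir}
    sess = ort.InferenceSession(
        model_path,
        providers=["TIDLExecutionProvider", "CPUExecutionProvider"],
        provider_options=[options, {}],
    )
    inp = sess.get_inputs()[0]
    dtype = np.uint8 if "uint8" in inp.type else np.float32
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    data = np.zeros(shape, dtype=dtype)

    sess.run(None, {inp.name: data})  # warm-up run
    start = time.time()
    for _ in range(runs):
        sess.run(None, {inp.name: data})
    return (time.time() - start) / runs

print("uint8 input:", time_model("mobilenet_v1_uint8.onnx", "./artifacts_uint8"))
print("float input:", time_model("mobilenet_v1_float.onnx", "./artifacts_float"))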

  • Hi Ken,

    I do not understand your question.  Please elaborate. 

    Chris

  • Hi Ken,

    The DSP is optimized for integer math, so despite the dequantize step, the actual computation still runs in int. That is what causes the shorter inference time. TIDL is optimized for int8 and int16.

    Regards,

    Christina
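
To make the point above concrete, here is a small NumPy-only sketch (not TIDL code; the scales, zero point, and shapes are invented for illustration) showing how a leading dequantize can be folded into a layer's rescale, so the multiply-accumulates still run on integers and only a single float scaling remains:

import numpy as np

rng = np.random.default_rng(0)

x_u8 = rng.integers(0, 256, size=(1, 16), dtype=np.uint8)    # uint8 input tensor
in_scale, in_zp = 0.02, 128                                   # input quant params (invented)
w_i8 = rng.integers(-127, 128, size=(16, 8), dtype=np.int8)   # int8 weights
w_scale = 0.005                                               # weight scale (invented)

# Reference path: dequantize first, then compute in float.
x_f32 = (x_u8.astype(np.float32) - in_zp) * in_scale
y_float = x_f32 @ (w_i8.astype(np.float32) * w_scale)

# Integer path: accumulate in int32, apply the combined scale once at the end.
acc_i32 = (x_u8.astype(np.int32) - in_zp) @ w_i8.astype(np.int32)
y_int = acc_i32.astype(np.float32) * (in_scale * w_scale)

print(np.allclose(y_float, y_int, atol=1e-4))  # True: same result, integer MACs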

  • Hi Christina:
    Sorry for the delayed response due to my personal leave.

    I missed the result below:

    Apologies for the confusion, but may I kindly ask if my understanding is correct?
    If the original ONNX file has input = uint8, then after converting it through onnxrt_ep.py (with tensor_bits=8), the inference will run in int.
    However, if the original ONNX file has input = float, then after converting it through onnxrt_ep.py (with tensor_bits=8), the inference will still run in float.

    Thanks for your kind help.

    Best Regards,
    Ken.

  • Hi, Ken,

    Yes, you have the right understanding. 

    Warm Regards,

    Christina

  • Hi,
    Sorry to bother you, but I would like to ask:

    If the input is uint8 and becomes float after dequantization, how can the computations in the middle still be executed in uint8?

    Thanks for your kind help.

    Best regards,
    Ken

  • Hi Ken,

    It depends on where and how you are running the network. When running pure ONNX on the host, the model is 32-bit float. When running on the device (TIDL), the model's inputs are ints, and the model runs in either 16-bit or 8-bit. I do not understand what you mean by dequantization. We take a 32-bit FP ONNX model and then quantize it into an integer approximation.

    Chris

  • Hi Chris:

    The TFLite model in model_zoo/TFL-CL-0000-mobileNetV1-mlperf/model/mobilenet_v1_1.0_224.tflite is:
    input (u8) --> dequantize (u8 to float) --> Conv (float)

    That's why I'm confused.

    The possible model workflows are:

    1. Input (U8) → Dequantize (U8 to Float) → Other layers (Float)
    2. Input (Float) → Other layers (Float)
    3. Input (U8) → Other layers (U8)

    Which of these workflows can be used to generate a U8 model via onnxrt_ep.py?

    Additionally, I would like to confirm:
    Is tensor_bits set in model_configs.py?
    For example, in the following format:

    optional_options=AttrDict( debug_level=6, tensor_bits=8 )

    Thanks for your kind help.
    Best Regards,
    Ken
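
One quick way to confirm which of the three flows above a given TFLite file follows is to read its input and output tensor details with the TFLite interpreter. A minimal sketch, assuming TensorFlow is installed and using the model-zoo path quoted above:

import tensorflow as tf

model_path = "model_zoo/TFL-CL-0000-mobileNetV1-mlperf/model/mobilenet_v1_1.0_224.tflite"
interpreter = tf.lite.Interpreter(model_path=model_path)
interpreter.allocate_tensors()

# 'dtype' is the graph input/output type; 'quantization' holds (scale, zero_point)
# for quantized tensors.
for detail in interpreter.get_input_details():
    print("input :", detail["name"], detail["dtype"], detail["quantization"])
for detail in interpreter.get_output_details():
    print("output:", detail["name"], detail["dtype"], detail["quantization"])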
  • Hi Ken,

    All TIDL execution is either 8-bit or 16-bit int. There is no float support on the device. The flows are:

    1. Int input -> 8/16-bit flow through the network -> 8/16-bit out

    2. Float input -> 8/16-bit flow through the network -> 8/16-bit out

    Set tensor_bits to 8. If your data is coming in as float (not from a camera, as camera data is int), you may need an input conversion layer from float to int, depending on your TIDL version.

    Regards,

    Chris
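
For completeness, a sketch of where tensor_bits typically goes when compiling through the ONNX Runtime TIDL flow. The provider name and option keys below follow the edgeai-tidl-tools examples rather than anything stated in this thread, and the model path and calibration data are placeholders, so treat this as an illustration and check the onnxrt_ep.py and model_configs.py shipped with your SDK:

import os

import numpy as np
import onnxruntime as ort

compile_options = {
    "tidl_tools_path": os.environ.get("TIDL_TOOLS_PATH", ""),  # TIDL tools install dir
    "artifacts_folder": "./model-artifacts/mobilenet_v1",       # where artifacts are written
    "tensor_bits": 8,                                           # 8- or 16-bit quantization
    "debug_level": 0,
}

sess = ort.InferenceSession(
    "mobilenet_v1_float.onnx",  # placeholder: your float ONNX model
    providers=["TIDLCompilationProvider", "CPUExecutionProvider"],
    provider_options=[compile_options, {}],
)

# Feed a representative frame so the compiler can calibrate activation ranges;
# real calibration should use several frames of actual input data.
inp = sess.get_inputs()[0]
calib = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder calibration frame
sess.run(None, {inp.name: calib})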