PROCESSOR-SDK-J722S: Question about running inference with the default model (MobileNetV1)

Part Number: PROCESSOR-SDK-J722S

Hi Team:

On TIDL 10.00.08.00,

I tried to convert the default model (MobileNetV1) once with input = uint8 and once with input = float.

I found that the model with input = uint8 performs a dequantize step at the input, while the rest of the network is identical to the model with input = float.

However, the model with input = uint8 actually has a shorter inference time.

I'm not sure why this happens.

Thanks for your kind help.

Best regards,

Ken
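
For reference, a minimal sketch of how such a latency comparison could be scripted. Assumptions not taken from this thread: the two models were already compiled with onnxrt_ep.py, the model paths and artifact folders below are placeholders, and the provider name "TIDLExecutionProvider" plus the "artifacts_folder" option key follow the edgeai-tidl-tools examples.

import time

import numpy as np
import onnxruntime as ort

def time_model(model_path, artifacts_dir, runs=50):
    # Option keys mirror the edgeai-tidl-tools examples and may differ between
    # SDK versions -- check the onnxrt_ep.py shipped with your SDK.
    options = {"artifacts_folder": artifacts_dir}
    sess = ort.InferenceSession(
        model_path,
        providers=["TIDLExecutionProvider", "CPUExecutionProvider"],
        provider_options=[options, {}],
    )
    inp = sess.get_inputs()[0]
    dtype = np.uint8 if "uint8" in inp.type else np.float32
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    data = np.zeros(shape, dtype=dtype)

    sess.run(None, {inp.name: data})  # warm-up run
    start = time.time()
    for _ in range(runs):
        sess.run(None, {inp.name: data})
    return (time.time() - start) / runs

print("uint8 input:", time_model("mobilenet_v1_uint8.onnx", "./artifacts_uint8"))
print("float input:", time_model("mobilenet_v1_float.onnx", "./artifacts_float"))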

  • Hi Ken,

    I do not understand your question.  Please elaborate. 

    Chris

  • Hi Ken,

    The DSP is optimized for integer math, so despite the dequantize step, the actual computation still runs in int. That is what causes the shorter inference time. TIDL is optimized for int8 and int16.

    Regards,

    Christina
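
To make the point above concrete, here is a small NumPy-only sketch (not TIDL code; the scales, zero point, and shapes are invented for illustration) showing how a leading dequantize can be folded into a layer's rescale, so the multiply-accumulates still run on integers and only a single float scaling remains:

import numpy as np

rng = np.random.default_rng(0)

x_u8 = rng.integers(0, 256, size=(1, 16), dtype=np.uint8)    # uint8 input tensor
in_scale, in_zp = 0.02, 128                                   # input quant params (invented)
w_i8 = rng.integers(-127, 128, size=(16, 8), dtype=np.int8)   # int8 weights
w_scale = 0.005                                               # weight scale (invented)

# Reference path: dequantize first, then compute in float.
x_f32 = (x_u8.astype(np.float32) - in_zp) * in_scale
y_float = x_f32 @ (w_i8.astype(np.float32) * w_scale)

# Integer path: accumulate in int32, apply the combined scale once at the end.
acc_i32 = (x_u8.astype(np.int32) - in_zp) @ w_i8.astype(np.int32)
y_int = acc_i32.astype(np.float32) * (in_scale * w_scale)

print(np.allclose(y_float, y_int, atol=1e-4))  # True: same result, integer MACs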

  • Hi Christina:
    Sorry for the delayed response due to my personal leave.

    I missed the result below:

    Apologies for the confusion, but may I kindly ask if my understanding is correct?
    If the original ONNX file has input = uint8, then after converting it through onnxrt_ep.py (with tensor_bits=8), the inference will run in int.
    However, if the original ONNX file has input = float, then after converting it through onnxrt_ep.py (with tensor_bits=8), the inference will still run in float.

    Thanks for your kind help.

    Best Regards,
    Ken.

  • Hi, Ken,

    Yes, you have the right understanding. 

    Warm Regards,

    Christina

  • Hi,
    Sorry to bother you, but I would like to ask:

    If the input is uint8 and becomes float after dequantization, how can the computations in the middle still be executed in uint8?

    Thanks for your kind help.

    Best regards,
    Ken

  • Hi Ken,

    It depends on where and how you are running the network. When running pure ONNX on the host, the model is 32-bit float. When running on the device (TIDL), the model's inputs are ints, and the model runs in either 16-bit or 8-bit. I do not understand what you mean by dequantization. We take a 32-bit FP ONNX model and then quantize it into an integer approximation.

    Chris

  • Hi Chris:

    The TFLite model in model_zoo/TFL-CL-0000-mobileNetV1-mlperf/model/mobilenet_v1_1.0_224.tflite is:
    input (u8) --> dequantize (u8 to float) --> Conv (float)

    That's why I'm confused.

    The possible model workflows are:

    1. Input (U8) → Dequantize (U8 to Float) → Other layers (Float)
    2. Input (Float) → Other layers (Float)
    3. Input (U8) → Other layers (U8)

    Which of these workflows can be used to generate a U8 model via onnxrt_ep.py?

    Additionally, I would like to confirm:
    Is tensor_bits set in model_configs.py?
    For example, in the following format:

    optional_options=AttrDict( debug_level=6, tensor_bits=8 )

    Thanks for your kind help.
    Best Regards,
    Ken
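
One quick way to confirm which of the three flows above a given TFLite file follows is to read its input and output tensor details with the TFLite interpreter. A minimal sketch, assuming TensorFlow is installed and using the model-zoo path quoted above:

import tensorflow as tf

model_path = "model_zoo/TFL-CL-0000-mobileNetV1-mlperf/model/mobilenet_v1_1.0_224.tflite"
interpreter = tf.lite.Interpreter(model_path=model_path)
interpreter.allocate_tensors()

# 'dtype' is the graph input/output type; 'quantization' holds (scale, zero_point)
# for quantized tensors.
for detail in interpreter.get_input_details():
    print("input :", detail["name"], detail["dtype"], detail["quantization"])
for detail in interpreter.get_output_details():
    print("output:", detail["name"], detail["dtype"], detail["quantization"])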
  • Hi Ken,

    All TIDL execution is either 8-bit or 16-bit int. There is no float support on the device. The flows are:

    1. Int input -> 8/16-bit flow through the network -> 8/16-bit out

    2. Float input -> 8/16-bit flow through the network -> 8/16-bit out

    Set tensor_bits to 8. If your data is coming in as float (not from a camera, as camera data is int), you may need an input conversion layer from float to int, depending on your TIDL version.

    Regards,

    Chris
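
For completeness, a sketch of where tensor_bits typically goes when compiling through the ONNX Runtime TIDL flow. The provider name and option keys below follow the edgeai-tidl-tools examples rather than anything stated in this thread, and the model path and calibration data are placeholders, so treat this as an illustration and check the onnxrt_ep.py and model_configs.py shipped with your SDK:

import os

import numpy as np
import onnxruntime as ort

compile_options = {
    "tidl_tools_path": os.environ.get("TIDL_TOOLS_PATH", ""),  # TIDL tools install dir
    "artifacts_folder": "./model-artifacts/mobilenet_v1",       # where artifacts are written
    "tensor_bits": 8,                                           # 8- or 16-bit quantization
    "debug_level": 0,
}

sess = ort.InferenceSession(
    "mobilenet_v1_float.onnx",  # placeholder: your float ONNX model
    providers=["TIDLCompilationProvider", "CPUExecutionProvider"],
    provider_options=[compile_options, {}],
)

# Feed a representative frame so the compiler can calibrate activation ranges;
# real calibration should use several frames of actual input data.
inp = sess.get_inputs()[0]
calib = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder calibration frame
sess.run(None, {inp.name: calib})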