TDA4VM: [TIDL] The value range of the 8-bit fixed-point output differs greatly from the true (floating-point) value range.

Part Number: TDA4VM


Hi all.

I use the TIDL 8-bit fixed-point model (trained with QAT) to get a set of outputs, and since I set writeTraceLevel to 3, I can directly get the corresponding floating-point values for the outputs. Using the same input, a set of floating-point outputs can be obtained from the PyTorch floating-point model.

Histograms are drawn with logarithmic and linear y-axes, respectively, to compare the two sets of outputs.

The top is the PyTorch floating-point output and the bottom is the TIDL output.
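
For reference, a minimal sketch of how such a histogram comparison could be plotted, assuming both outputs have already been saved as flat float arrays (the file names below are placeholders):

    import numpy as np
    import matplotlib.pyplot as plt

    # Placeholders: one array from the PyTorch float model, one parsed from the
    # TIDL layer trace written with writeTraceLevel = 3.
    pytorch_out = np.load("pytorch_output.npy").ravel()
    tidl_out = np.load("tidl_trace_output.npy").ravel()

    fig, axes = plt.subplots(2, 2, figsize=(10, 6))
    for row, (name, data) in enumerate([("PyTorch float", pytorch_out),
                                        ("TIDL 8-bit", tidl_out)]):
        for col, log_y in enumerate([True, False]):
            axes[row, col].hist(data, bins=256, log=log_y)
            axes[row, col].set_title(name + (" (log y-axis)" if log_y else " (linear y-axis)"))
            axes[row, col].set_xlabel("feature value")
            axes[row, col].set_ylabel("count")
    plt.tight_layout()
    plt.show()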

From the linear-coordinate histograms, the output distributions of TIDL and PyTorch appear to have the same shape, but their value ranges are very different. This can lead to very large errors in subsequent post-processing.
I was wondering whether it is possible to make the value range of the TIDL output match the true values by multiplying it by a scaling factor? Or is there another way to improve the accuracy of the TIDL fixed-point output?

Best regards.

Lance

  • Hi Lance,

    A few questions:

    Regards,

    Anshu

  • 1. Yes. I used QAT.

    2. The two graphs above depict the distribution of the output FEATURE in PyTorch; the difference is that the y-axis of the left image uses a logarithmic scale while the y-axis of the right image uses a linear scale. The two graphs below depict the distribution of the same output feature in TIDL.

  • The x-axis of the histogram is the value of the FEATURE and the y-axis is the number of occurrences of that value.

  • Hi Anshu,

    This issue is seriously blocking the progress of our algorithm deployment, and I hope you can provide support soon. Thank you.

    Regards,

    Lance

  • Hi Lance,

    Quantization Aware Training (QAT) is a proper training method and it can change the model parameters as well as the model output. Comparing the output of a QAT model to the original float model may not always be appropriate, because QAT will change the model.

    If no change were required, then QAT would not be needed in the first place - but QAT adapts the model parameters and the feature maps to overcome the loss due to quantization.

    Now it all depends on the loss function that you are using. If you are doing semantic segmentation, which uses a Softmax during training and an ArgMax during inference, the scale of the output feature map doesn't have much significance. So the loss that you are using may not help QAT to preserve the scale of the output.

    But if you are using Object Detection, Depth Prediction, etc. and have used the correct loss function during QAT, it is likely that the output scale will be preserved, because the output has a relation to the scale of the feature map.

    In summary, QAT changes the model. The loss function used during QAT determines whether the scale of the output will be preserved or not.

  • We would also like to know a bit about the post-processing that you mentioned. What kind of post-processing is it? Was that post-processing not part of the loss function during QAT?

  • 1. We use this output to do bbox regression. It is part of the loss function.

    2. We use multi-task training during QAT; does that matter?

  • >>We use multi-task training during QAT; does that matter?

    What does the above statement mean? Whether it is single-task or multi-task training, the whole model has to be wrapped in QuantTrainModule during QAT (a minimal sketch of this wrapping is shown below). You cannot do it part by part.

    Question: Was the whole model used for QAT? Were there any Convolution/BN/ReLU or similar layers that were NOT wrapped inside the QuantTrainModule when you did QAT? If any such part was not included in QuantTrainModule, then obviously those layers are not finetuned for the QAT model.
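
    For reference, a minimal sketch of what wrapping the whole model looks like, following the devkit's Quantization.md documentation. The model below is only a stand-in for your own multi-task network and the input shape is a placeholder; double-check the exact arguments against the devkit version you use:

        import torch
        import torchvision
        from pytorch_jacinto_ai import xnn

        # Stand-in for your own complete multi-task model; nothing should be left outside.
        model = torchvision.models.mobilenet_v2()

        # Dummy input matching the input size your network expects (placeholder shape).
        dummy_input = torch.rand((1, 3, 384, 768))

        # Wrap the WHOLE model for QAT; after wrapping, the original model is in model.module.
        model = xnn.quantize.QuantTrainModule(model, dummy_input=dummy_input)

        # ...continue with the usual training loop on the wrapped model...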

  • The purpose of QAT is NOT to make the output of the fixed-point model look similar to the output of the floating-point model. The only purpose of QAT is to minimize your loss. It may do so by changing the model parameters as well as the scale or range of the feature maps. It all depends on your loss function.

    But the loss at the end of QAT should be similar to the loss obtained during floating-point training. Also, the accuracy at the end of QAT should be similar to the floating-point accuracy.

    But if some part of your model was not wrapped in QuantTrainModule, then we can't guarantee that the model has learned to overcome quantization errors, even if your loss and accuracy are good.

  • Yeah, I understand your reply.

    All the layers that we used have been trained with QAT and with our loss function.

    I want to know: if we use QAT (8-bit) and our loss function to train multi-task learning (like 2D detection + depth prediction, or others), can the errors be bigger than when training only one task? Or must one of the tasks fail (i.e., produce a big error relative to the float model)?

    Another question: I want to know which data-layer normalization is better for QAT, -128/128 or -128/1.

    Thanks.

  • Regarding data normalization, we have tried the typical torchvision normalization:

    input_mean: [123.675, 116.28, 103.53],
    input_scale: [0.017125, 0.017507, 0.017429],

    We have also tried:

    input_mean: 128
    input_scale: 1/64

    Both of these work fine (the first sketch after this reply shows the equivalent preprocessing in PyTorch).

    We have done QAT for models similar to the ones that are described here: https://git.ti.com/cgit/jacinto-ai-devkit/pytorch-jacinto-ai-devkit/about/docs/Multi_Task_Learning.md

    Since you mentioned Depth prediction, let me add this: Depth prediction was especially tricky with quantization as it is a pure regression task. The range of the output (depth) can become really large. Remember that the output range can be much larger than the groundtruth (used for training) range. Imagine the output range is 0 to +512: if we quantize it to 8 bits, a change of 1 in the quantized domain corresponds to a change of 2 in float, which doesn't give good precision. The solution was to restrict the range of the output by adding an explicit clip layer at the end of the model to control the range (a minimal sketch of such a clip layer is shown after this reply).

    I think we have to look at your model in more detail if we are to understand this better. I'll forward this thread to Zhong Ming, who arranged the last call, to get his view on how to proceed.
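
    For reference, a minimal sketch of the preprocessing those mean/scale pairs correspond to, i.e. (x - mean) * scale applied to a 0-255 input (the NCHW tensor layout is an assumption):

        import torch

        # Channel-wise mean/scale for an NCHW float tensor with values in [0, 255].
        input_mean = torch.tensor([123.675, 116.28, 103.53]).view(1, 3, 1, 1)
        input_scale = torch.tensor([0.017125, 0.017507, 0.017429]).view(1, 3, 1, 1)

        def normalize(img_0_255):
            # Equivalent to torchvision Normalize with std = 1 / scale.
            return (img_0_255 - input_mean) * input_scale

    And a minimal sketch of the explicit clip layer mentioned above for depth prediction, assuming a plain nn.Module backbone; the clip range used here is only an example and should match the depth range your model actually needs:

        import torch
        import torch.nn as nn

        class DepthWithClip(nn.Module):
            def __init__(self, backbone, max_depth=128.0):
                super().__init__()
                self.backbone = backbone    # your existing depth network (placeholder)
                self.max_depth = max_depth  # example value only

            def forward(self, x):
                depth = self.backbone(x)
                # Explicit clip at the end of the model to restrict the output range,
                # so that 8-bit quantization of this output keeps enough precision.
                return torch.clamp(depth, min=0.0, max=self.max_depth)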

  • Hi Manu,

    We want to know how much consistency the QAT model can guarantee between inference on a PC and inference on TIDL (8-bit).

    With the same QAT model, our results when run in Python differ greatly from the results when run on TIDL after importing. It seems that the TIDL QAT model runs with a large margin of error.

  • Before I answer your question, let me highlight something. I added some guidelines in the devkit documentation.

    https://git.ti.com/cgit/jacinto-ai-devkit/pytorch-jacinto-ai-devkit/about/docs/Quantization.md

    See the section titled:

    "Guidelines For Training To Get Best Accuracy With Quantization"

    To understand some of the limitations of QuantTrainModule, also see the section titled:

    "Important Notes - read carefully"

    Please read and follow those guidelines.

  • Hi,

    You may need to share the onnx model to us to understand the situation better. Before that:

    Could you please confirm whether you have followed the guidelines explained in the previous post, and if there are any deviations, please describe them.

    Could you please elaborate on what you mean by this: "It seems that the TIDL QAT model runs with a large margin of error."

  • Thank you very much, Manu. Today I will put together an email to Zhong Ming to try to describe as clearly as possible the problems we are facing.

  • BTW, set quantizationStyle = 3 in the TIDL import config for QAT models (an illustrative config fragment is shown below).
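
    For reference, an illustrative fragment of a TIDL import config with this setting. The point here is only quantizationStyle = 3; the other parameters and file names are placeholders that depend on your model and setup:

        modelType          = 2
        numParamBits       = 8
        quantizationStyle  = 3
        inputNetFile       = "model_qat.onnx"
        outputNetFile      = "model_qat_net.bin"
        outputParamsFile   = "model_qat_io_"

    Here modelType = 2 selects an ONNX model and numParamBits = 8 selects 8-bit parameters, while quantizationStyle = 3 is the setting to use for models trained with QAT.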