TDA4VM: [TIDL] The value range of the 8-bit fixed-point output differs greatly from the true (floating-point) value range.

Part Number: TDA4VM


Hi all.

I use the TIDL 8-bit fixed-point model (trained with QAT) to get a set of outputs, and since I set writeTraceLevel to 3, I can directly get the corresponding floating-point values for the outputs. Using the same input, a set of floating-point outputs can be obtained from the PyTorch floating-point model.

Histograms are drawn with logarithmic and linear y-axes, respectively, to compare the two sets of outputs.

The top is the PyTorch floating-point output and the bottom is the TIDL output.
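
For reference, a minimal sketch of how such a histogram comparison could be plotted, assuming both outputs have already been saved as flat float arrays (the file names below are placeholders):

    import numpy as np
    import matplotlib.pyplot as plt

    # Placeholders: one array from the PyTorch float model, one parsed from the
    # TIDL layer trace written with writeTraceLevel = 3.
    pytorch_out = np.load("pytorch_output.npy").ravel()
    tidl_out = np.load("tidl_trace_output.npy").ravel()

    fig, axes = plt.subplots(2, 2, figsize=(10, 6))
    for row, (name, data) in enumerate([("PyTorch float", pytorch_out),
                                        ("TIDL 8-bit", tidl_out)]):
        for col, log_y in enumerate([True, False]):
            axes[row, col].hist(data, bins=256, log=log_y)
            axes[row, col].set_title(name + (" (log y-axis)" if log_y else " (linear y-axis)"))
            axes[row, col].set_xlabel("feature value")
            axes[row, col].set_ylabel("count")
    plt.tight_layout()
    plt.show()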

From the linear-coordinate histograms, the output distributions of TIDL and PyTorch appear to have the same shape, but their value ranges are very different. This can lead to very large errors in subsequent post-processing.
I was wondering whether it is possible to make the value range of the TIDL output match the true values by multiplying it by a scaling factor? Or is there another way to improve the accuracy of the TIDL fixed-point output?

Best regards.

Lance

  • Hi Lance,

    A few questions:

    Regards,

    Anshu

  • 1. Yes. I used QAT.

    2. The two graphs above depict the distribution of the output FEATURE in PyTorch; the difference is that the y-axis of the left image uses a logarithmic scale while the y-axis of the right image uses a linear scale. The two graphs below depict the distribution of the same output feature in TIDL.

  • The x-axis of the histogram is the value of the FEATURE and the y-axis is the number of occurrences of that value.

  • Hi Anshu,

    This issue is seriously blocking the progress of our algorithm deployment, and I hope you can provide support soon. Thank you.

    Regards,

    Lance

  • Hi Lance,

    Quantization Aware Training (QAT) is a proper training method and it can change the model parameters as well as the model output. Comparing the output of a QAT model to the original float model may not always be appropriate, because QAT will change the model.

    If no change were required, then QAT would not be needed in the first place - but QAT adapts the model parameters and the feature maps to overcome the loss due to quantization.

    Now it all depends on the loss function that you are using. If you are doing semantic segmentation, which uses a Softmax during training and an ArgMax during inference, the scale of the output feature map doesn't have much significance. So the loss that you are using may not help QAT to preserve the scale of the output.

    But if you are using Object Detection, Depth Prediction, etc. and have used the correct loss function during QAT, it is likely that the output scale will be preserved, because the output has a relation to the scale of the feature map.

    In summary, QAT changes the model. The loss function used during QAT determines whether the scale of the output will be preserved or not.

  • We would also like to know a bit about the post-processing that you mentioned. What kind of post-processing is it? Was that post-processing not part of the loss function during QAT?

  • 1. We use this output to do bbox regression. It is part of the loss function.

    2. We use multi-task training during QAT; does that matter?

  • >>We use multi-task training during QAT; does that matter?

    What does the above statement mean? Whether it is single-task or multi-task training, the whole model has to be wrapped in QuantTrainModule during QAT (a minimal sketch of this wrapping is shown below). You cannot do it part by part.

    Question: Was the whole model used for QAT? Were there any Convolution/BN/ReLU or similar layers that were NOT wrapped inside the QuantTrainModule when you did QAT? If any such part was not included in QuantTrainModule, then obviously those layers are not finetuned for the QAT model.
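
    For reference, a minimal sketch of what wrapping the whole model looks like, following the devkit's Quantization.md documentation. The model below is only a stand-in for your own multi-task network and the input shape is a placeholder; double-check the exact arguments against the devkit version you use:

        import torch
        import torchvision
        from pytorch_jacinto_ai import xnn

        # Stand-in for your own complete multi-task model; nothing should be left outside.
        model = torchvision.models.mobilenet_v2()

        # Dummy input matching the input size your network expects (placeholder shape).
        dummy_input = torch.rand((1, 3, 384, 768))

        # Wrap the WHOLE model for QAT; after wrapping, the original model is in model.module.
        model = xnn.quantize.QuantTrainModule(model, dummy_input=dummy_input)

        # ...continue with the usual training loop on the wrapped model...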

  • The purpose of QAT is NOT to make the output of the fixed-point model look similar to the output of the floating-point model. The only purpose of QAT is to minimize your loss. It may do so by changing the model parameters as well as the scale or range of the feature maps. It all depends on your loss function.

    But the loss at the end of QAT should be similar to the loss obtained during floating-point training. Also, the accuracy at the end of QAT should be similar to the floating-point accuracy.

    But if some part of your model was not wrapped in QuantTrainModule, then we can't guarantee that the model has learned to overcome quantization errors, even if your loss and accuracy are good.

  • Yeah, I understand your reply.

    All the layers that we used have been trained with QAT and with our loss function.

    I want to know: if we use QAT (8-bit) and our loss function to train multi-task learning (like 2D detection + depth prediction, or others), can the errors be bigger than when training only one task? Or must one of the tasks fail (i.e., produce a big error relative to the float model)?

    Another question: I want to know which data-layer normalization is better for QAT, -128/128 or -128/1.

    Thanks.

  • Regarding data normalization, we have tried the typical torchvision normalization:

    input_mean: [123.675, 116.28, 103.53],
    input_scale: [0.017125, 0.017507, 0.017429],

    We have also tried:

    input_mean: 128
    input_scale: 1/64

    Both of these work fine (the first sketch after this reply shows the equivalent preprocessing in PyTorch).

    We have done QAT for models similar to the ones that are described here: https://git.ti.com/cgit/jacinto-ai-devkit/pytorch-jacinto-ai-devkit/about/docs/Multi_Task_Learning.md

    Since you mentioned Depth prediction, let me add this: Depth prediction was especially tricky with quantization as it is a pure regression task. The range of the output (depth) can become really large. Remember that the output range can be much larger than the groundtruth (used for training) range. Imagine the output range is 0 to +512: if we quantize it to 8 bits, a change of 1 in the quantized domain corresponds to a change of 2 in float, which doesn't give good precision. The solution was to restrict the range of the output by adding an explicit clip layer at the end of the model to control the range (a minimal sketch of such a clip layer is shown after this reply).

    I think we have to look at your model in more detail if we are to understand this better. I'll forward this thread to Zhong Ming, who arranged the last call, to get his view on how to proceed.
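
    For reference, a minimal sketch of the preprocessing those mean/scale pairs correspond to, i.e. (x - mean) * scale applied to a 0-255 input (the NCHW tensor layout is an assumption):

        import torch

        # Channel-wise mean/scale for an NCHW float tensor with values in [0, 255].
        input_mean = torch.tensor([123.675, 116.28, 103.53]).view(1, 3, 1, 1)
        input_scale = torch.tensor([0.017125, 0.017507, 0.017429]).view(1, 3, 1, 1)

        def normalize(img_0_255):
            # Equivalent to torchvision Normalize with std = 1 / scale.
            return (img_0_255 - input_mean) * input_scale

    And a minimal sketch of the explicit clip layer mentioned above for depth prediction, assuming a plain nn.Module backbone; the clip range used here is only an example and should match the depth range your model actually needs:

        import torch
        import torch.nn as nn

        class DepthWithClip(nn.Module):
            def __init__(self, backbone, max_depth=128.0):
                super().__init__()
                self.backbone = backbone    # your existing depth network (placeholder)
                self.max_depth = max_depth  # example value only

            def forward(self, x):
                depth = self.backbone(x)
                # Explicit clip at the end of the model to restrict the output range,
                # so that 8-bit quantization of this output keeps enough precision.
                return torch.clamp(depth, min=0.0, max=self.max_depth)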

  • Hi Manu,

    We want to know how much consistency the QAT model can guarantee between inference on a PC and inference on TIDL (8-bit).

    With the same QAT model, our results when run in Python differ greatly from the results when run on TIDL after importing. It seems that the TIDL QAT model runs with a large margin of error.

  • Before I answer your question, let me highlight something. I added some guidelines in the devkit documentation.

    https://git.ti.com/cgit/jacinto-ai-devkit/pytorch-jacinto-ai-devkit/about/docs/Quantization.md

    See the section titled:

    "Guidelines For Training To Get Best Accuracy With Quantization"

    To understand some of the limitations of QuantTrainModule, also see the section titled:

    "Important Notes - read carefully"

    Please read and follow those guidelines.

  • Hi,

    You may need to share the onnx model to us to understand the situation better. Before that:

    Could you please confirm whether you have followed the guidelines explained in the previous post, and if there are any deviations, please describe them.

    Could you please elaborate on what you mean by this: "It seems that the TIDL QAT model runs with a large margin of error."

  • Thank you very much, Manu. Today I will put together an email to Zhong Ming to try to describe as clearly as possible the problems we are facing.

  • BTW, set quantizationStyle = 3 in the TIDL import config for QAT models (an illustrative config fragment is shown below).
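
    For reference, an illustrative fragment of a TIDL import config with this setting. The point here is only quantizationStyle = 3; the other parameters and file names are placeholders that depend on your model and setup:

        modelType          = 2
        numParamBits       = 8
        quantizationStyle  = 3
        inputNetFile       = "model_qat.onnx"
        outputNetFile      = "model_qat_net.bin"
        outputParamsFile   = "model_qat_io_"

    Here modelType = 2 selects an ONNX model and numParamBits = 8 selects 8-bit parameters, while quantizationStyle = 3 is the setting to use for models trained with QAT.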