TDA4VM: The output of Pytorch model of QAT is different from ONNX model

chen yizhao

Part Number: TDA4VM

I followed the QAT documentation to run QAT on our PyTorch model; QAT replaces every ReLU layer with a PACT2 layer. I then convert the PyTorch model to an ONNX model so that it can be deployed on TDA4. But I found a problem when converting the PyTorch model to ONNX.

As shown above, the left side is the output of the QAT PyTorch model and the right side is the output of the ONNX model; they are different. I wonder whether the ONNX model can implement the function of PACT2, because the ONNX output is not a power of 2. The ONNX model can only clip; it cannot constrain the output to powers of 2. If the ONNX model cannot do that, why is QAT useful for improving the accuracy of the TDA4 INT8 model?

Pls answer my question asap, thx for your support.

over 4 years ago

+1 Manu Mathew over 4 years ago

TI__Genius 11436 points

The PyTorch QAT operations match those of TIDL. TIDL will quantize the ONNX model and use it for inference, so the TIDL output will be similar to that of PyTorch (note that this is not an exact bit match, but it is sufficient to achieve good accuracy).

So if you run that QAT onnx model in onnxruntime, it will not generate the expected output. It has to be run with quantization to get the correct output - and TIDL can run it with quantization.
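To illustrate the difference between a float clip and a quantized activation, here is a minimal sketch in plain Python. It is not the devkit's or TIDL's code: the helper names, the unsigned 8-bit grid (256 levels) and the clip range of 4.0 are assumptions chosen for illustration.

```python
import math

def clip_only(x, lo, hi):
    """What a float clip does: bound the value, keep full precision."""
    return min(max(x, lo), hi)

def clip_and_quantize(x, lo, hi, num_levels=256):
    """Clip, then snap to a uniform grid (assumed: 256 levels over [lo, hi))."""
    step = (hi - lo) / num_levels
    # The highest representable value is one step below hi.
    clipped = min(max(x, lo), hi - step)
    return math.floor((clipped - lo) / step + 0.5) * step + lo

for v in [0.013, 0.51, 1.2345, 3.999, 7.0]:
    print(v, clip_only(v, 0.0, 4.0), clip_and_quantize(v, 0.0, 4.0))
```

With a power-of-2 range such as 4.0, the step is 4/256 = 2^(-6), so every quantized output is a multiple of a power of 2. The clip-only path returns in-range values unchanged, which is why running the exported float model on its own does not reproduce the quantized outputs.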

Does this answer your question?

0 chen yizhao over 4 years ago in reply to Manu Mathew

Prodigy 60 points

Thx for your reply. I get your point and want to double-check it: the PyTorch QAT output will be similar to the TIDL INT8 output, and we do not need to care about the output of the ONNX model.

I have another question about QAT. During QAT, the PACT2 layer automatically constrains the feature map to powers of 2, and its parameters, including the clip range and the power-of-2 scale, are also generated automatically. Can I fix the scale value for some layers? For example, for some layer outputs PACT2 uses a step of 2^(-1), giving 0, 0.5, 1, but I want a step of 2^(-3), giving 0, 0.125, 0.250, so I would have to change the scale value. Can I do this? Please give me some advice. Thx a lot!

0 Manu Mathew over 4 years ago in reply to chen yizhao

TI__Genius 11436 points

To use the specified range during QAT, you can use torch.nn.Hardtanh (with min_val and max_val specified) for that layer. Another option is to use xnn.layers.PAct2 with clip_range specified.

See some examples here:

https://git.ti.com/cgit/jacinto-ai/pytorch-jacinto-ai-devkit/tree/modules/pytorch_jacinto_ai/xnn/layers/activation.py#n184

https://git.ti.com/cgit/jacinto-ai/pytorch-jacinto-ai-devkit/tree/modules/pytorch_jacinto_ai/xnn/layers/activation.py#n172

https://git.ti.com/cgit/jacinto-ai/pytorch-jacinto-ai-devkit/tree/modules/pytorch_jacinto_ai/xvision/models/pixel2pixel/pixel2pixelnet_utils.py#n44

0 chen yizhao over 4 years ago in reply to Manu Mathew

Prodigy 60 points

Thx, but I don't want to use fixed range, I want to use fixed scale value.

https://git.ti.com/cgit/jacinto-ai/pytorch-jacinto-ai-devkit/tree/modules/pytorch_jacinto_ai/xnn/quantize/quant_train_module.py#n374

As shown in the link above, the scale value is automatically calculated. Can I set it to fixed value for some layers? And how can I do that?

0 Manu Mathew over 4 years ago in reply to chen yizhao

TI__Genius 11436 points

According to the current implementation, the scale value and the range are tightly linked - they cannot be separated. For signed output, scale_value = 128/abs(max(max_value, min_value))

So, by specifying/fixing the range, we are specifying the scale_value indirectly. There is no other way in the code to specify the scale value directly.
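As a worked example of fixing the scale through the range, the sketch below inverts the signed formula quoted above. It assumes that the largest absolute bound of the range is what sets the scale, and the function names are hypothetical, not part of the devkit. Only the signed 8-bit case is covered, since that is the only formula given in this thread.

```python
def scale_from_range(min_value, max_value):
    """Signed 8-bit scale; assumes the largest absolute bound sets the range."""
    return 128.0 / max(abs(min_value), abs(max_value))

def symmetric_range_for_step(step):
    """Clip range (min, max) whose signed 8-bit step size equals `step`."""
    bound = 128.0 * step
    return -bound, bound

# A desired step of 2^(-3) = 0.125 corresponds to a clip range of (-16, 16).
lo, hi = symmetric_range_for_step(2 ** -3)
scale = scale_from_range(lo, hi)
print(lo, hi, scale, 1.0 / scale)
```

Under these assumptions, a range of (-16, 16) passed as min_val and max_val to torch.nn.Hardtanh, as suggested earlier in the thread, would give a scale of 8 and therefore a step of 0.125.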

0 chen yizhao over 4 years ago in reply to Manu Mathew

Prodigy 60 points

Okay, thx for your answer. Another question: is there any difference between these two functions?

https://git.ti.com/cgit/jacinto-ai/pytorch-jacinto-ai-devkit/tree/modules/pytorch_jacinto_ai/xnn/layers/activation.py#n184

https://git.ti.com/cgit/jacinto-ai/pytorch-jacinto-ai-devkit/tree/modules/pytorch_jacinto_ai/xnn/layers/activation.py#n172

0 Manu Mathew over 4 years ago in reply to chen yizhao

TI__Genius 11436 points

No - there is no difference if you do QAT. Hardtanh will also be converted to PAct2 when you start QAT.

If you put a Hardtanh, but don't do QAT, I am not sure what it will be exported as in the ONNX. If it is exported as clip layer, then there is no issue. But if it is exported as Hardtanh ONNX operator, then it may not be natively supported in TIDL.

0 chen yizhao over 4 years ago in reply to Manu Mathew

Prodigy 60 points

Hi another question about QAT

https://git.ti.com/cgit/jacinto-ai/pytorch-jacinto-ai-devkit/tree/modules/pytorch_jacinto_ai/xnn/layers/function.py#n178

https://git.ti.com/cgit/jacinto-ai/pytorch-jacinto-ai-devkit/tree/modules/pytorch_jacinto_ai/xnn/layers/function.py#n179

In the code above, why use rand_val = 0.5 and torch.floor()? Why not use torch.round()?

0 Manu Mathew over 4 years ago in reply to chen yizhao

TI__Genius 11436 points

torch.round() mode is also supported, but that seems to round ties away from zero (toward positive infinity for positive numbers and toward negative infinity for negative numbers).

https://git.ti.com/cgit/jacinto-ai/pytorch-jacinto-ai-devkit/tree/modules/pytorch_jacinto_ai/xnn/layers/function.py#n184

What I wanted was round towards positive infinity - which can differ from the other case in the case of negative numbers, so it was implemented explicitly. See this comparison:
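The two tie-breaking rules described here can be compared with a short plain-Python sketch. The function names are hypothetical, and the code only illustrates the arithmetic; it is not copied from the devkit.

```python
import math

def round_up(x):
    """Ties go toward +infinity: floor(x + 0.5)."""
    return math.floor(x + 0.5)

def round_away_from_zero(x):
    """Ties go away from zero (+inf for x > 0, -inf for x < 0)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

for x in [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]:
    print(f"{x:5.1f}  floor(x+0.5)={round_up(x):3d}  "
          f"away_from_zero={round_away_from_zero(x):3d}")
```

The two rules agree for positive inputs and differ only on negative ties: -2.5 becomes -2 with floor(x + 0.5) but -3 when rounding away from zero. Library rounding functions may use yet another tie rule (Python's built-in round, for example, rounds ties to the nearest even integer), so it is worth checking the tie behavior of whichever function is used.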

0 chen yizhao over 4 years ago in reply to Manu Mathew

Prodigy 60 points

Thx for your support, will contact again if having another question!

0 chen yizhao over 4 years ago in reply to Manu Mathew

Prodigy 60 points

Hi, another question about QAT.

I know that PACT2 layer will set the feature map value to be power of 2. Will weight and bias value be set to power of 2 during QAT?

0 chen yizhao over 4 years ago in reply to Manu Mathew

Prodigy 60 points

Hi, another question. Why use nn.model.module instead of nn.model when loading the QAT model? Is there any difference between nn.model.module and nn.model?

0 Manu Mathew over 4 years ago in reply to chen yizhao

TI__Genius 11436 points

Sure. I shall close this thread then.

0 Manu Mathew over 4 years ago in reply to chen yizhao

TI__Genius 11436 points

It's something about the internal implementation detail of the QAT module. It's not due to functionality.

0 Manu Mathew over 4 years ago in reply to chen yizhao

TI__Genius 11436 points

Yes - feature map, weights and biases will be quantized to power of 2 during QAT.

0 chen yizhao over 4 years ago in reply to Manu Mathew

Prodigy 60 points

Thx for your reply. Another question about QAT

In the QAT document, you mention that the same module should not be re-used multiple times within the module, so that the feature map range estimation is correct. In your example, you separate the ReLU modules. So my question is: is there any difference between before and after separating the ReLU modules? And do we need to do something similar for other modules, such as Convolutional and TransposeConv?

+1 Manu Mathew over 4 years ago in reply to chen yizhao

TI__Genius 11436 points

It's not wrong to use the same module multiple times. It's just that, if the same module is used in multiple places, the feature map range collected will be the same for all those places - and that can hurt accuracy.

In fact, in some cases, it is necessary to use the same module more than once - for example Siamese networks with shared weights.

So using the same module more than once is not wrong - but be careful and know what you are doing - because in quantization, the ranges are important. If the ranges are incorrect, the accuracy will drop.
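The effect of sharing one activation module can be sketched with a toy range tracker in plain Python. The RangeTracker class is a hypothetical stand-in for a module that records the range of what passes through it; the 128-level step and the sample values are assumptions for illustration only.

```python
class RangeTracker:
    """Toy stand-in for an activation module that records the range it sees."""

    def __init__(self):
        self.max_abs = 0.0

    def __call__(self, values):
        # Keep the largest magnitude seen across all calls.
        self.max_abs = max(self.max_abs, max(abs(v) for v in values))
        return values

    def step(self, num_levels=128):
        """Quantization step implied by the recorded range."""
        return self.max_abs / num_levels

small = [0.1, -0.3, 0.25]    # feature map with a narrow range
large = [12.0, -30.0, 7.5]   # feature map with a wide range

# One shared module sees both feature maps and records a single range.
shared = RangeTracker()
shared(small)
shared(large)

# Separate modules record one range per feature map.
act_a, act_b = RangeTracker(), RangeTracker()
act_a(small)
act_b(large)

print(shared.step(), act_a.step(), act_b.step())
```

In this sketch the shared tracker ends up with the wide range, so the narrow feature map would be quantized with a step about 100 times coarser than it needs. Separate modules avoid that, which matches the accuracy concern described above.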

0 chen yizhao over 4 years ago in reply to Manu Mathew

Prodigy 60 points

Thx for your answer, I get your point.

https://git.ti.com/cgit/jacinto-ai/pytorch-jacinto-ai-devkit/tree/modules/pytorch_jacinto_ai/xnn/layers/function.py#n178

Another question about QAT. As I mentioned before, QAT uses rand_val = 0.5 and torch.floor() to constrain the feature map to powers of two. Is the operation the same in TIDL? Will TIDL quantize the feature map by adding 0.5 and applying floor()?

0 Manu Mathew over 4 years ago in reply to chen yizhao

TI__Genius 11436 points

Sorry - I missed replying to this. Yes - this is what TIDL does.

0 Manu Mathew over 4 years ago in reply to Manu Mathew

TI__Genius 11436 points

Please feel free to open another thread if you have more questions.
