
CCS/TDA4VM: The model gets a poor result after QAT

Part Number: TDA4VM

Tool/software: Code Composer Studio

Dear Sir,

I am using Quantization Aware Training (QAT), incorporating it into an existing PyTorch training script. I followed the example provided on the official website step by step:

#********************************program start*****************************

import os
import torch

from pytorch_jacinto_ai import xnn

# create your model here:
model = build_detector()

# create a dummy input - this is required to analyze the model - fill in the input image size expected by your model.
dummy_input = torch.rand((1,3,384,768))

# wrap your model in xnn.quantize.QuantTrainModule. 
# once it is wrapped, the actual model is in model.module
model = xnn.quantize.QuantTrainModule(model, dummy_input=dummy_input)

# load your pretrained weights here into model.module
pretrained_data = torch.load(pretrained_path)
model.module.load_state_dict(pretrained_data)

# your training loop here with loss, backward, optimizer and scheduler.
# this is the usual training loop - but use a lower learning rate such as 5e-5
....
....

# save the model - the trained module is in model.module
torch.save(model.module.state_dict(), os.path.join(save_path,'model.pth'))
torch.onnx.export(model.module, dummy_input, os.path.join(save_path,'model.onnx'), export_params=True, verbose=False)

#********************************program end*****************************
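For reference, the elided training loop can be sketched as below. This is a minimal, hypothetical stand-in: the stand-in model, random inputs, and MSE loss are placeholders for the real detector, dataset, and detection loss; only the recommended settings (low learning rate of 5e-5, a scheduler, 25 epochs) follow the TI guidance.

```python
import torch

# stand-in model purely for illustration - the real code trains model.module
# returned by xnn.quantize.QuantTrainModule
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3, padding=1),
                            torch.nn.ReLU(),
                            torch.nn.AdaptiveAvgPool2d(1),
                            torch.nn.Flatten(),
                            torch.nn.Linear(8, 4))

# low learning rate as recommended for QAT fine-tuning
optimizer = torch.optim.SGD(model.parameters(), lr=5e-5, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=25)

for epoch in range(2):                    # 25 epochs in the actual run
    x = torch.rand(2, 3, 96, 192)         # stand-in for real training images
    target = torch.rand(2, 4)             # stand-in for real targets
    loss = torch.nn.functional.mse_loss(model(x), target)  # stand-in loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

The input size is reduced here only to keep the sketch cheap to run; the actual model expects (1, 3, 384, 768) as in the dummy_input above.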

The learning rate is 5e-5 and the number of epochs is 25. During QAT, the loss is very large and does not converge. After completing the above process, I obtained a new model.

However, when I use the new model from QAT to do inference, the results are very poor.

Before QAT, I get a detection result as follows:

After QAT, the detection result is as follows:

The results are consistent on both the TDA4VM and the PC.

No other code has been modified. Why has the accuracy of the QAT-trained model declined so much?