Hello,
I have a pre-trained model that I want to use on NPU.
The issue is that when quantizing it with TINPUTinyMLQATFxModule there is a big drop in performance (the loss goes from 0.25 to 0.34), whereas this does not happen when I quantize it with torchao (the loss goes from 0.25 to 0.26).
The attached zip contains the quantization scripts as well as the quantization logs: run_qat_ti.zip
I am using a very simple model:
self.cnn_backbone = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=8, kernel_size=(3, 1), padding=(1, 0), stride=(2, 1)),
    nn.BatchNorm2d(num_features=8),
    nn.ReLU(),
    nn.Conv2d(in_channels=8, out_channels=16, kernel_size=(3, 1), padding=(1, 0), stride=(2, 1)),
    nn.BatchNorm2d(num_features=16),
    nn.ReLU(),
    nn.Conv2d(in_channels=16, out_channels=32, kernel_size=(3, 1), padding=(1, 0), stride=(1, 1)),
    nn.BatchNorm2d(num_features=32),
    nn.ReLU()
)
self.flatten = nn.Flatten()
dummy_input = torch.randn(1, 1, input_size, 1)
conv_output_size = self.cnn_backbone(dummy_input).view(1, -1).size(1)

# MLP Head
self.mlp_head = nn.Sequential(
    nn.Linear(conv_output_size, 64),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(64, output_size)
)
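For context, here is a self-contained sketch of the same model put through PyTorch's stock FX-graph QAT flow (prepare_qat_fx / convert_fx), which is the kind of pipeline the TI FX wrapper builds on, as far as I understand. The input/output sizes, the "qnnpack" backend choice, and the skipped training loop are placeholders, not my real setup:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qat_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_qat_fx, convert_fx

input_size, output_size = 64, 4  # placeholder sizes, not my real config


class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn_backbone = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=(3, 1), padding=(1, 0), stride=(2, 1)),
            nn.BatchNorm2d(8), nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=(3, 1), padding=(1, 0), stride=(2, 1)),
            nn.BatchNorm2d(16), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=(3, 1), padding=(1, 0), stride=(1, 1)),
            nn.BatchNorm2d(32), nn.ReLU(),
        )
        self.flatten = nn.Flatten()
        # Infer the flattened feature size with a dummy forward pass
        dummy_input = torch.randn(1, 1, input_size, 1)
        conv_output_size = self.cnn_backbone(dummy_input).view(1, -1).size(1)
        self.mlp_head = nn.Sequential(
            nn.Linear(conv_output_size, 64), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(64, output_size),
        )

    def forward(self, x):
        return self.mlp_head(self.flatten(self.cnn_backbone(x)))


# "qnnpack" is the ARM/mobile-style backend; it is also built into x86 wheels
torch.backends.quantized.engine = "qnnpack"

model = SmallCNN().train()
example_inputs = (torch.randn(1, 1, input_size, 1),)
qconfig_mapping = get_default_qat_qconfig_mapping("qnnpack")

# Insert fake-quant observers and fuse conv+bn for QAT
prepared = prepare_qat_fx(model, qconfig_mapping, example_inputs)
# ... fine-tune `prepared` here with the usual training loop ...
prepared.eval()

# Convert to a real int8 model
quantized = convert_fx(prepared)
out = quantized(*example_inputs)
print(out.shape)  # torch.Size([1, 4])
```

With torchao I get essentially the same accuracy as float, so I suspect the difference is in how the TI wrapper configures the observers/quantizers rather than in the model itself.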
Some models that started from a lower loss showed an even bigger performance drop when quantized with TI's library. Do you have any idea why this happens?
Best,
Gal Pascual