Hello TI guys:
We are following the process described in the official edgeai-torchvision documentation (github.com/.../Quantization.md) to run QAT on our binocular (stereo) depth estimation model. However, after QAT the quantized int8 model shows a large gap in the EPE metric compared with the floating-point fp32 model.
The accuracy loss is around 20% (fp32 EPE: 2.1733, int8 EPE: 2.6181); please see the attached training logs for the exact values.
When performing QAT on our stereo depth estimation model, the official QAT tool does not support some operations in the network (we could not find them covered in the documentation), so we used workarounds to get past these problems. We are now unsure whether these workarounds caused the accuracy loss during QAT.
The problems we encountered are as follows:
1. model.py implements a binocular depth estimation model with a Siamese (twin) network structure, which requires weight sharing in the feature extraction part. Is the following sharing method correct?
```
# The forward function in model.py starts at line 236.
# conv1, conv2 and conv3 are the layers whose parameters must be shared
# between the two branches. Is aliasing the modules like this correct?
self.conv1_2 = self.conv1
self.conv2_2 = self.conv2
self.conv3_2 = self.conv3
```
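Our understanding is that aliasing the modules as above makes both attribute names point to the same `nn.Module` object, so the parameters are shared. A more conventional way to express the Siamese structure in PyTorch is to keep a single set of layers and call them on both inputs inside `forward()`; below is a minimal sketch with hypothetical layer names and sizes (not taken from our model.py):

```
import torch
import torch.nn as nn

class SiameseFeatureExtractor(nn.Module):
    """Minimal sketch: both images pass through the SAME conv modules,
    so the weights are shared by construction (layer sizes are placeholders)."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(64, 128, 3, stride=1, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def extract(self, x):
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        return self.relu(self.conv3(x))

    def forward(self, left, right):
        # Calling the same modules on both inputs gives true weight sharing
        # without creating conv*_2 aliases.
        return self.extract(left), self.extract(right)
```

One point we would like TI to confirm is whether reusing the same module on both inputs (so the same activation observers see both branches during QAT) behaves differently from the aliasing approach shown above.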
2. The binocular network requires a cost volume operation, but the cost volume is not supported when converting to the XNN model. How should this operation be handled? (A generic sketch of the operations involved is shown below.)
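To make the question concrete, a typical concatenation-style cost volume (a generic sketch, not our exact implementation; `max_disp` and the tensor shapes are placeholders) looks roughly like this, which shows the per-disparity slicing and the 5D output involved:

```
import torch

def build_cost_volume(feat_left, feat_right, max_disp):
    # Generic GC-Net/PSMNet-style concatenation cost volume (sketch only).
    # feat_left / feat_right: [B, C, H, W]; output: [B, 2C, max_disp, H, W].
    b, c, h, w = feat_left.shape
    volume = feat_left.new_zeros(b, 2 * c, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            volume[:, :c, d] = feat_left
            volume[:, c:, d] = feat_right
        else:
            # Shift the right features by d pixels before pairing them with
            # the left features; the invalid border stays zero-filled.
            volume[:, :c, d, :, d:] = feat_left[:, :, :, d:]
            volume[:, c:, d, :, d:] = feat_right[:, :, :, :-d]
    return volume
```

The per-disparity indexing and the 5D output are the parts we expect the converter to reject; our question is how TI recommends handling them.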
3. If the network has to be cut into two parts, before and after the cost volume, because the cost volume part cannot be converted, how can the part of the network after the cost volume accept multiple inputs whose channel counts are greater than 4? (A rough sketch of what we mean by splitting follows this question.)
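To show what we have in mind by splitting, here is a rough PyTorch-level sketch of separating the model into a pre-cost-volume part and a post-cost-volume part and exporting each one on its own (module names, channel counts and input shapes are all placeholders, and the cost volume is assumed to be flattened to a 4D tensor before being fed to the second part). Whether such an input with more than 4 channels can actually be imported on the device side is exactly what we would like to confirm:

```
import torch
import torch.nn as nn

class FeatureNet(nn.Module):
    # Part that runs before the cost volume (placeholder layers).
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())

    def forward(self, image):
        return self.backbone(image)

class MatchingNet(nn.Module):
    # Part that runs after the cost volume; its input is the already-built
    # cost volume, here assumed flattened to 4D with in_ch (> 4) channels.
    def __init__(self, in_ch=64):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, cost_volume):
        return self.head(cost_volume)

feat_net, match_net = FeatureNet().eval(), MatchingNet().eval()

# Export the two halves as separate graphs; the cost volume itself would be
# computed outside these graphs, between the two inference calls.
torch.onnx.export(feat_net, torch.randn(1, 3, 256, 512), "feature_net.onnx",
                  input_names=["image"], output_names=["features"])
torch.onnx.export(match_net, torch.randn(1, 64, 128, 256), "matching_net.onnx",
                  input_names=["cost_volume"], output_names=["disparity"])
```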
We hope the relevant TI experts can answer the three questions above.
PS: The attachments contain our model code and the training logs of the int8 and fp32 models.
Thanks very much!

Attachments: 3365.develop_simple.rar, QAT_train_int8.log, train_fp32.log