Hello TI guys:
We are following the process described in the official edgeai-torchvision documentation (github.com/.../Quantization.md) to run QAT on our binocular (stereo) depth estimation model. However, after QAT the quantized int8 model shows a large gap in the EPE metric compared with the floating-point fp32 model.
The accuracy loss is around 20% (fp32 EPE: 2.1733, int8 EPE: 2.6181); please see the attached training logs for the exact values.
When performing QAT on our stereo depth estimation model, the official QAT tool does not support some operations in the network (we could not find them covered in the documentation), so we used workarounds to get past these problems. We are now unsure whether these workarounds caused the accuracy loss during QAT.
The problems we encountered are as follows:
1. model.py implements a binocular depth estimation model with a Siamese (twin) network structure, which requires weight sharing in the feature extraction part. Is the following sharing method correct?
```
# The forward function in model.py starts at line 236.
# conv1, conv2 and conv3 are the layers whose parameters must be shared
# between the two branches. Is aliasing the modules like this correct?
self.conv1_2 = self.conv1
self.conv2_2 = self.conv2
self.conv3_2 = self.conv3
```
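Our understanding is that aliasing the modules as above makes both attribute names point to the same `nn.Module` object, so the parameters are shared. A more conventional way to express the Siamese structure in PyTorch is to keep a single set of layers and call them on both inputs inside `forward()`; below is a minimal sketch with hypothetical layer names and sizes (not taken from our model.py):

```
import torch
import torch.nn as nn

class SiameseFeatureExtractor(nn.Module):
    """Minimal sketch: both images pass through the SAME conv modules,
    so the weights are shared by construction (layer sizes are placeholders)."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(64, 128, 3, stride=1, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def extract(self, x):
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        return self.relu(self.conv3(x))

    def forward(self, left, right):
        # Calling the same modules on both inputs gives true weight sharing
        # without creating conv*_2 aliases.
        return self.extract(left), self.extract(right)
```

One point we would like TI to confirm is whether reusing the same module on both inputs (so the same activation observers see both branches during QAT) behaves differently from the aliasing approach shown above.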
2. The binocular network requires a cost volume operation, but the cost volume is not supported when converting to the XNN model. How should this operation be handled? (A generic sketch of the operations involved is shown below.)
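To make the question concrete, a typical concatenation-style cost volume (a generic sketch, not our exact implementation; `max_disp` and the tensor shapes are placeholders) looks roughly like this, which shows the per-disparity slicing and the 5D output involved:

```
import torch

def build_cost_volume(feat_left, feat_right, max_disp):
    # Generic GC-Net/PSMNet-style concatenation cost volume (sketch only).
    # feat_left / feat_right: [B, C, H, W]; output: [B, 2C, max_disp, H, W].
    b, c, h, w = feat_left.shape
    volume = feat_left.new_zeros(b, 2 * c, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            volume[:, :c, d] = feat_left
            volume[:, c:, d] = feat_right
        else:
            # Shift the right features by d pixels before pairing them with
            # the left features; the invalid border stays zero-filled.
            volume[:, :c, d, :, d:] = feat_left[:, :, :, d:]
            volume[:, c:, d, :, d:] = feat_right[:, :, :, :-d]
    return volume
```

The per-disparity indexing and the 5D output are the parts we expect the converter to reject; our question is how TI recommends handling them.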
3. If the network has to be cut into two parts, before and after the cost volume, because the cost volume part cannot be converted, how can the part of the network after the cost volume accept multiple inputs whose channel counts are greater than 4? (A rough sketch of what we mean by splitting follows this question.)
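To show what we have in mind by splitting, here is a rough PyTorch-level sketch of separating the model into a pre-cost-volume part and a post-cost-volume part and exporting each one on its own (module names, channel counts and input shapes are all placeholders, and the cost volume is assumed to be flattened to a 4D tensor before being fed to the second part). Whether such an input with more than 4 channels can actually be imported on the device side is exactly what we would like to confirm:

```
import torch
import torch.nn as nn

class FeatureNet(nn.Module):
    # Part that runs before the cost volume (placeholder layers).
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())

    def forward(self, image):
        return self.backbone(image)

class MatchingNet(nn.Module):
    # Part that runs after the cost volume; its input is the already-built
    # cost volume, here assumed flattened to 4D with in_ch (> 4) channels.
    def __init__(self, in_ch=64):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, cost_volume):
        return self.head(cost_volume)

feat_net, match_net = FeatureNet().eval(), MatchingNet().eval()

# Export the two halves as separate graphs; the cost volume itself would be
# computed outside these graphs, between the two inference calls.
torch.onnx.export(feat_net, torch.randn(1, 3, 256, 512), "feature_net.onnx",
                  input_names=["image"], output_names=["features"])
torch.onnx.export(match_net, torch.randn(1, 64, 128, 256), "matching_net.onnx",
                  input_names=["cost_volume"], output_names=["disparity"])
```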
We hope the relevant TI experts can answer the three questions above.
PS: The attachments contain our model code and the training logs of the int8 and fp32 models.
Thanks very much!

Attachments: 3365.develop_simple.rar, QAT_train_int8.log, train_fp32.log