
TDA4VM: Quantization Aware Training (QAT) using edgeai-torchvision repository

Part Number: TDA4VM
Hi,

The training flow I'm using is as follows (a minimal sketch is shown after the list):

  • Taking a trained model that I want to use and loading its weights
  • Replacing the relevant layers with the TI implementations (Upsample -> xnn.layers.ResizeWith, Concat -> xnn.layers.CatBlock)
  • Wrapping the modified model with the xnn.quantize.QuantTrainModule class
  • Training for 50 epochs with "small" learning rate values
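
A minimal sketch of this flow, assuming xnn comes from the edgeai-torchvision install (the exact import path, the ResizeWith/CatBlock call signatures and the QuantTrainModule arguments are assumptions based on the repository's examples, so please verify them there):

    import torch
    import torch.nn as nn
    # assumption: the xnn import path depends on the edgeai-torchvision release
    from torchvision.edgeailite import xnn

    class TinyNet(nn.Module):
        """Stand-in for the real model, only to illustrate the flow."""
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(3, 8, 3, padding=1)
            self.bn = nn.BatchNorm2d(8)
            self.relu = nn.ReLU()
            # manual TI-friendly replacements, done before wrapping:
            self.up = xnn.layers.ResizeWith(scale_factor=2, mode='nearest')  # instead of nn.Upsample
            self.cat = xnn.layers.CatBlock()                                 # instead of torch.cat

        def forward(self, x):
            y = self.relu(self.bn(self.conv(x)))
            return self.cat((self.up(y), self.up(y)))

    model = TinyNet()
    # model.load_state_dict(torch.load('float_model.pth'))  # trained float weights go here

    # wrap the modified model for QAT
    dummy_input = torch.rand(1, 3, 128, 128)
    qat_model = xnn.quantize.QuantTrainModule(model, dummy_input=dummy_input)

    # fine-tune for ~50 epochs with a small learning rate
    optimizer = torch.optim.SGD(qat_model.parameters(), lr=1e-4, momentum=0.9)
    for epoch in range(50):
        if epoch == 25:  # optionally freeze BN stats and quant ranges part-way, as in the guide
            xnn.utils.freeze_bn(qat_model)
            xnn.layers.freeze_quant_range(qat_model)
        # ... usual training loop over qat_model goes here ...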

These steps were repeated with many hyperparameter changes (LR changes, adding/removing weight decay):

  • Tried with/without freezing the BatchNorm layers and the quantization ranges as advised in the guide (using xnn.utils.freeze_bn(model) and xnn.layers.freeze_quant_range(model))
  • During training, the training loss is much larger than what I get without the QAT flow (about 5x larger), and none of the described experiments improves it
The inference flow I'm using with the trained QAT model (sketched after the list):
  • When testing the trained model, I use torch.load + load_state_dict, loading the weights into the model after the same layer replacements (concat, upsample) done before training
  • The tested model has severe localization problems; the accuracy drops dramatically
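
The loading side as a sketch (same assumptions about the xnn import and the wrapper as above; TinyNet is the stand-in model from the previous sketch):

    import torch
    from torchvision.edgeailite import xnn  # assumption: adjust to your install

    # rebuild the model exactly as before QAT training (same layer replacements),
    # wrap it, and only then load the QAT checkpoint
    model = TinyNet()
    dummy_input = torch.rand(1, 3, 128, 128)
    qat_model = xnn.quantize.QuantTrainModule(model, dummy_input=dummy_input)

    checkpoint = torch.load('qat_checkpoint.pth', map_location='cpu')
    qat_model.load_state_dict(checkpoint)  # or checkpoint['state_dict'], depending on how it was saved
    qat_model.eval()

    with torch.no_grad():
        out = qat_model(torch.rand(1, 3, 128, 128))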

I would appreciate your guidance on whether the described flow seems reasonable and whether there is anything else worth trying (hyperparameter tuning or other methods).
Thanks in advance!
  • Hi,

    Your explanation seems correct. But let us analyze further to see if you have missed anything.

    1. Can you try the QAT flow on a classification model and make sure the flow works? (A minimal sanity-check sketch follows below.)

    2. Please share the ONNX model exported with QAT (from the flow above).
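
    For point 1, a hedged sketch of such a sanity check on a stock torchvision classifier (the xnn import path, the QuantTrainModule arguments and the export call are assumptions; the QAT scripts in edgeai-torchvision are the reference):

    import torch
    import torchvision
    from torchvision.edgeailite import xnn  # assumption: adjust to your install

    model = torchvision.models.resnet18(pretrained=True)
    dummy_input = torch.rand(1, 3, 224, 224)
    qat_model = xnn.quantize.QuantTrainModule(model, dummy_input=dummy_input)

    # ... fine-tune qat_model for a few epochs on the classification data ...

    # export to ONNX for sharing and inspection
    qat_model.eval()
    torch.onnx.export(qat_model, dummy_input, 'resnet18_qat.onnx', opset_version=11)
    # note: some releases export qat_model.module instead; check the QAT documentation in the repo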

  • Hi,

    I have hit the same issue when trying to QAT a YOLOX model, although I didn't take step 2 of your flow. In my case, the loss decreased in the first few epochs, then blew up and the training soon crashed with NaNs. I don't think this is related to the LR, but I somehow fixed it; here are my attempts:

    - Using weight_decay for all params in the model optimizer (for QAT I use SGD with weight_decay, momentum and nesterov; see the optimizer sketch after this list)

    - Making the task easier (in my case, YOLOX uses multi-scale training; I fixed the scale to (640, 640), and the loss and the whole QAT process became much more stable)
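
    For illustration, the optimizer setup I mean (plain torch.optim; the exact values here are placeholders):

    import torch

    optimizer = torch.optim.SGD(
        model.parameters(),   # weight_decay applied to all params
        lr=1e-3,
        momentum=0.9,
        weight_decay=5e-4,
        nesterov=True,
    )
    # and the input scale kept fixed at (640, 640) instead of multi-scale training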

    But I'm just wondering whether I should do step 2, replacing layers with xnn.layers before wrapping the model, because from the EdgeAI_MMDet code it looks like you don't need to do this manually.

  • - Making the task easier (in my case, YOLOX uses multi-scale training; I fixed the scale to (640, 640), and the loss and the whole QAT process became much more stable)

    [Manu] This is a good observation. It makes sense: a fixed image resize may make the whole training easier, especially with QAT.

    - But I'm just wondering whether I should do step 2, replacing layers with xnn.layers before wrapping the model, because from the EdgeAI_MMDet code it looks like you don't need to do this manually.

    [Manu] I did not understand. Can you please explain this further?

    By "should I do step 2" I mean:

    Should my training flow include the step of replacing the relevant layers with the TI implementations (Upsample -> xnn.layers.ResizeWith, Concat -> xnn.layers.CatBlock), or not?

    I didn't do this before wrapping the model, but it still works, and the official code in Edgeai_MMdet doesn't seem to do this either.

    >>>I didn't do this before wrapping the model, but it still works

    The changes are for the model to work efficiently in TIDL and to get good accuracy. By saying "it still works" do you mean it worked as expected in TIDL?

    Regarding Upsample/interpolation, if the code is using torch.nn.functional.interpolate, the following may be sufficient: https://github.com/TexasInstruments/edgeai-mmdetection/blob/master/tools/train.py#L193

    But if it is using the torch.nn.Upsample module, then you may have to change it to use xnn.layers.ResizeWith.
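
    A sketch of what that module swap might look like (the ResizeWith constructor arguments are an assumption; check xnn.layers in edgeai-torchvision):

    import torch.nn as nn
    from torchvision.edgeailite import xnn  # assumption: adjust to your install

    def replace_upsample_with_resize(module):
        """Recursively replace nn.Upsample modules with xnn.layers.ResizeWith."""
        for name, child in module.named_children():
            if isinstance(child, nn.Upsample):
                setattr(module, name,
                        xnn.layers.ResizeWith(scale_factor=child.scale_factor, mode=child.mode))
            else:
                replace_upsample_with_resize(child)

    # call before wrapping with QuantTrainModule:
    # replace_upsample_with_resize(model)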

    Thanks. Yes, "it still works" means it worked as expected in TIDL (from the results of a few test images so far), and the ONNX graph looks clean, I think. I have checked the QAT documentation and I'm confused: it mentions that one also needs to use xnn.layers.AddBlock for add and xnn.layers.CatBlock for concat. Is all of this necessary? Or is it done automatically by wrapping with xnn.quantize.QuantTrainModule?

  •  Use of xnn.layers.AddBlock for add and xnn.layers.CatBlock for concat will make sure that there are Clip layers after those operators to inform correct quantization ranges to TIDL. This is for accuracy.

    In the absence of those Clip layers in the ONNX model, TIDL will have to compute the ranges during calibration, which may be suboptimal.
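
    A sketch of how those modules are used in a forward function (the call signatures of AddBlock/CatBlock are assumptions; verify against xnn.layers):

    import torch.nn as nn
    from torchvision.edgeailite import xnn  # assumption: adjust to your install

    class FuseBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
            # modules instead of bare operators, so the QAT wrapper can learn their
            # output ranges and the exported ONNX carries Clip layers after them
            self.add = xnn.layers.AddBlock()  # instead of: x1 + x2
            self.cat = xnn.layers.CatBlock()  # instead of: torch.cat((x1, x2), dim=1)

        def forward(self, x):
            x1 = self.conv1(x)
            x2 = self.conv2(x)
            y = self.add((x1, x2))            # assumed signature: a tuple/list of tensors
            return self.cat((x1, y))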

    Thanks a lot. Let me check whether I got your point right:

    >>>| Use of xnn.layers.AddBlock for add and xnn.layers.CatBlock for concat will make sure that there are Clip layers after those operators to inform correct quantization ranges to TIDL. This is for accuracy.

    Does that mean the output may be out of range, leading to an unknown result?

    >>>| In the absence of those Clip layers in the ONNX model, TIDL will have to compute the ranges during calibration, which may be suboptimal.

    Does that mean TIDL will fix this bug given more time?

    >>>Does that mean the output may be out of range, leading to an unknown result?

    Depends on the calibration set.

    >>>Does that mean TIDL will fix this bug given more time?

    This is not a bug, but rather a feature of TIDL. If Clip range is missing for a certain layer, TIDL will compute it during calibration. But if it is present, it will not compute it, but merely use what is available.

    Got it. So the standard process is:

    1. Train a model regularly

    2. Replace nn.functional.interpolate / nn.Upsample with xnn.layers.ResizeWith, the add operator with xnn.layers.AddBlock, and torch.cat with xnn.layers.CatBlock, etc.

    3. Wrap the model with xnn.quantize.QuantTrainModule

    4. Train the QAT model

    Thanks. Another ambiguous problem I have met is this:

    My network class contains a loss computation function; during training, the loss is computed internally in the forward function after the network output is obtained,

    e.g.:

    if self.training:
        out = self.head(x)
        loss = self.loss_fn(torch.cat(out, 1))
        return loss
    else:
        out_reg = self.head_reg(x)
        out_obj = self.head_obj(x)
        out_cls = self.head_cls(x)
        return torch.cat((out_reg, out_obj.sigmoid(), out_cls.sigmoid()), 1)

    I have checked that the score in my task drops a lot when I test on the TDA4 compared to PyTorch.

    One probable issue is the torch.cat, because the output includes reg, obj and cls: obj and cls are between (0, 1) while reg's range is much larger, so quantizing them together will be a problem. I'm trying not to use xnn.layers.CatBlock, but I don't know whether that will fix it; otherwise I'll try not to use torch.cat at all.

    Another tricky thing I want to ask about: should I move the loss computation out of the network? loss_fn contains lots of tensor operators, and I don't know whether keeping it inside the network would cause problems during QAT.
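
    For reference, the restructuring I am considering (the names here are from my own code, not from the TI repositories): keep loss_fn outside the model so that only the tensor-producing layers are wrapped for QAT, and concatenate the differently-ranged outputs only in post-processing, outside the quantized graph.

    import torch.nn as nn

    class Head(nn.Module):
        def __init__(self, in_ch, num_classes):
            super().__init__()
            self.head_reg = nn.Conv2d(in_ch, 4, 1)
            self.head_obj = nn.Conv2d(in_ch, 1, 1)
            self.head_cls = nn.Conv2d(in_ch, num_classes, 1)

        def forward(self, x):
            # the network only produces raw outputs: no loss, no post-processing
            return self.head_reg(x), self.head_obj(x), self.head_cls(x)

    # training: the loss is computed outside the QAT-wrapped model
    #   out_reg, out_obj, out_cls = qat_model(images)
    #   loss = loss_fn(out_reg, out_obj, out_cls, targets)

    # inference: concatenate only in post-processing, so reg (large range) and the
    # obj/cls sigmoids (0..1) are not forced into a single quantization range
    #   preds = torch.cat((out_reg, out_obj.sigmoid(), out_cls.sigmoid()), dim=1)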

    You can first try your float model in onnxruntime (with onnxruntime-tidl you can set tensor_bits: 32 for float simulation mode on PC). You can also use the 16-bit mode (tensor_bits: 16). This will rule out basic issues, including pre/post-processing. Once all that is sorted out, you can check the accuracy with 8-bit quantization.
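
    For example, a quick float sanity check of the exported ONNX model in plain onnxruntime (tensor_bits itself is a TIDL compile option used with onnxruntime-tidl and is not shown here):

    import numpy as np
    import onnxruntime as ort

    sess = ort.InferenceSession('model_qat.onnx', providers=['CPUExecutionProvider'])
    input_name = sess.get_inputs()[0].name
    x = np.random.rand(1, 3, 640, 640).astype(np.float32)  # replace with a real preprocessed image
    out = sess.run(None, {input_name: x})
    # compare `out` with the PyTorch output for the same preprocessed input to rule out
    # pre/post-processing mismatches before moving to the 16-bit and 8-bit TIDL runs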

    Is onnxruntime-tidl a package like onnxruntime that I can simply use in Python, or do I have to compile it and use it from C?

  • It is TI's fork of onnxruntime with the ability to offload to TIDL in the backend. It is a package that is installed when you run the setup of edgeai-tidl-tools or edgeai-benchmark

    https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/setup.sh#L293

    https://github.com/TexasInstruments/edgeai-benchmark/blob/master/setup_pc.sh#L109

    Could you please also help me check this out?

    In the edgeai-yolox repository, at https://github.com/TexasInstruments/edgeai-yolox/blob/main/yolox/models/yolo_head.py#L165,

    you can see that there is a lot of post-processing in the forward function. Will this post-processing affect the quantization? Should I move it all out of the forward function?

    Hi, could you please help me with my question? Thanks.

    edgeai-yolox is our repository and is supported in our model zoo and in the edgeai-benchmark (https://github.com/TexasInstruments/edgeai-benchmark/tree/master) compilation tool.

    After training in edgeai-yolox, you can export an ONNX model and prototxt using this export script. Then you can compile the model with edgeai-benchmark.

    Take a look at how a yolox config is to be defined for compilation: https://github.com/TexasInstruments/edgeai-benchmark/blob/master/configs/detection_experimental.py#L59

    These are the scripts to be used for custom model compilation:

    https://github.com/TexasInstruments/edgeai-benchmark/blob/master/run_custom_pc.sh

    https://github.com/TexasInstruments/edgeai-benchmark/blob/master/scripts/benchmark_custom.py

    In benchmark_custom.py, add a yolox config to pipeline_configs (you can remove the existing ones) and then run run_custom_pc.sh.
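
    Roughly, the edit in benchmark_custom.py is along these lines (an illustrative outline only; copy the actual fields from the yolox entry in detection_experimental.py linked above):

    # inside scripts/benchmark_custom.py (outline only, not a working config)
    pipeline_configs = {
        'od-custom-yolox': dict(
            # fill in by copying the yolox config from configs/detection_experimental.py
            # and pointing it at your own exported onnx model and prototxt:
            #   - preprocessing (resize, mean/scale)
            #   - session / runtime options (model path, tensor_bits, ...)
            #   - postprocessing and metric
        ),
    }
    # then run: ./run_custom_pc.sh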

    What I explained above is the Post Training Quantization (PTQ) workflow. We also have mixed-precision support: it is possible to put certain layers alone into 16 bits. For example, see this: https://github.com/TexasInstruments/edgeai-benchmark/blob/master/configs/detection_experimental.py#L65

    We have seen that putting the last convolution layers that produce the predictions into 16 bits improves the accuracy significantly.
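
    For example, the relevant fragment of the runtime/compile options in such a config might look like this (the layer names are placeholders; use the actual output convolution names from your ONNX graph and verify the option key against the linked config):

    # illustrative fragment of the TIDL runtime/compile options for mixed precision
    runtime_options = {
        'tensor_bits': 8,
        # keep the final prediction convolutions in 16 bits for better accuracy
        'advanced_options:output_feature_16bit_names_list': '<conv_out_1>, <conv_out_2>, <conv_out_3>',
    }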

    Yes, thanks for your detailed reply. But we mostly use QAT, and I'm just wondering about the post-processing in the model's forward function: will something like make_grid (which has nothing to do with the network but is still in the forward function) affect the quantized training?