TDA4VM: TDA4VM

Alex Spivakovsky

Part Number: TDA4VM

Hello,

I have a question concerning QAT.

I would like to quantize some model layers to 8 bits and run inference on other layers with 16-bit precision. I made some changes in edgeai-torchvision so that those (convolutional) layers are not replaced with QuantTrainConv2d and do not receive PACT2. As a result, those layers do not get a Clip layer in the ONNX graph.

Now I would like to compile the model and create artifacts in TIDL. I don't want to change the already computed clip ranges for the 8-bit layers, but I would like to compute them for the "untouched" layers over a 16-bit range. How can this be done?

Thank you,

Alex.

over 2 years ago

0 Manu Mathew over 2 years ago

TI__Genius 11386 points

TIDL should insert Clip parameters during model import if the Clip layer is missing. But we have not verified the scenario that you are trying out. Please try out your model in TIDL and let us know how it goes.

0 Alex Spivakovsky over 2 years ago in reply to Manu Mathew

Prodigy 160 points

Thank you for the answer,

My first solution will not work, because TIDL will still compute a Clip for the layers where it is absent. I found another workaround: setting quantize_enable = False in the QuantTrainConv2d class for the layers I don't want to quantize. In this case the Clip is computed for all layers. When exporting artifacts in TIDL, I set those non-quantized layers to run in 16 bits in the compile options.

However when I run the onnxruntime in the execution mode after exporting the artifacts, the output does not match the pytorch model.

To my understanding, when I set quantize_enable = False, QAT is not performed on those particular layers.

Could you point out a possible issue here?

Thank you,

Alex.

0 Manu Mathew over 2 years ago in reply to Alex Spivakovsky

TI__Genius 11386 points

I have not tested setting quantize_enable to False for selected layers. If you can step through the code and make sure that it functions as you expect, that would be good. Note that the quantization is performed in the QuantTrainPAct2 module.

>>However when I run the onnxruntime in the execution mode after exporting the artifacts, the output does not match the pytorch model.

Can you describe more about this? Which pytorch model? The float model or QAT model? Which onnxruntime? The original onnxruntime or the onnxruntime with tidl offload?

0 Alex Spivakovsky over 2 years ago in reply to Manu Mathew

Prodigy 160 points

I'm talking about the model after QAT. To my understanding, during inference it still uses floating-point weights, but clips the activations using the values set in the PACT2 blocks during QAT. I'm using onnxruntime with TIDL offload on the ONNX graph exported with the Clip layers added by QAT. I set accuracy_level to 0 in the compile_options to avoid TIDL calibration, and use a single ONNX session run to create the artifacts.

After the artifacts are created, I use TIDLExecutionProvider to run images through the model. Then I compare the ONNX output to the output of the model in QuantTestModule from torchvision edgeailite/xnn. My understanding is that the outputs should match exactly.
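For reference, the compile and inference setup described above can be sketched roughly as follows. This is a minimal sketch, not the code used in this thread: the helper names build_tidl_options and create_session are hypothetical, the keys tidl_tools_path and artifacts_folder and the provider name TIDLCompilationProvider are assumed to follow the edgeai-tidl-tools examples, and the key advanced_options:output_feature_16bit_names_list is an assumption that should be checked against the documentation of the SDK version in use. Creating the session requires TI's build of onnxruntime.

```python
def build_tidl_options(artifacts_folder, tidl_tools_path, tensor_bits=8,
                       accuracy_level=0, debug_level=0, layers_16bit=()):
    """Collect the TIDL options discussed in this thread into one dict."""
    options = {
        "tidl_tools_path": tidl_tools_path,    # assumed key name
        "artifacts_folder": artifacts_folder,  # assumed key name
        "tensor_bits": tensor_bits,            # 8 or 16
        "accuracy_level": accuracy_level,      # 0 is used here to avoid TIDL calibration
        "debug_level": debug_level,            # 3 prints per-layer statistics
    }
    if layers_16bit:
        # Assumed key name: comma-separated output names of layers to run in 16 bits.
        options["advanced_options:output_feature_16bit_names_list"] = ",".join(layers_16bit)
    return options


def create_session(model_path, options, compile_model):
    """Create a compilation or execution session (needs TI's onnxruntime build)."""
    import onnxruntime as ort  # imported lazily so the helper above works without it

    provider = "TIDLCompilationProvider" if compile_model else "TIDLExecutionProvider"
    return ort.InferenceSession(
        model_path,
        providers=[provider, "CPUExecutionProvider"],
        provider_options=[options, {}],
    )


if __name__ == "__main__":
    opts = build_tidl_options("./artifacts", "./tidl_tools", layers_16bit=["conv_a", "conv_b"])
    for key, value in opts.items():
        print(key, "=", value)
```

A compilation session would be run once on a representative input to write the artifacts, after which a second session with compile_model=False reads them back for inference.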

0 Manu Mathew over 2 years ago in reply to Alex Spivakovsky

TI__Genius 11386 points

I understand now. The outputs may not match exactly, because there may be small differences in the way TIDL and PyTorch implement certain operators. We have been using accuracy measurements to see whether QAT is improving the accuracy or not.

0 Alex Spivakovsky over 2 years ago in reply to Manu Mathew

Prodigy 160 points

The problem is that the outputs differ by a pretty high margin. The model has only convolutional and resize layers. All convolutional layers have clips. How can I check where the differences come from? The _tidl_net.bin_paramDebug.csv file is pretty much unusable for understanding this.

0 Manu Mathew over 2 years ago in reply to Alex Spivakovsky

TI__Genius 11386 points

Can you try setting debug_level to 3 in the delegate_options given to the execution provider?

https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/examples/osrt_python/common_utils.py#L63

https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/examples/osrt_python/ort/onnxrt_ep.py#L137

The layer trace files (float values) for each layer will be dumped in /tmp folder. The file names start with tidl_..
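A quick way to check whether any trace files were actually written is to list the files with that prefix. This is a minimal sketch: the helper name find_tidl_traces is hypothetical, and the only things taken from the post above are the /tmp location and the tidl_ prefix.

```python
from pathlib import Path


def find_tidl_traces(trace_dir="/tmp", prefix="tidl_"):
    """Return files in trace_dir whose names start with prefix, sorted by name."""
    return sorted(p for p in Path(trace_dir).glob(prefix + "*") if p.is_file())


if __name__ == "__main__":
    traces = find_tidl_traces()
    print(f"found {len(traces)} trace file(s)")
    for path in traces:
        print(path.name, path.stat().st_size, "bytes")
```

If the list is empty after an inference run with debug_level set to 3, the traces were not dumped to that folder.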

0 Alex Spivakovsky over 2 years ago in reply to Manu Mathew

Prodigy 160 points

Hi, I did what you suggested. For some reason no files are created. It does print per-layer statistics like the ones below, which I don't know how to interpret:

Starting Layer # - 74
Processing Layer # - 74
73 512.00000 -64.00000 64.00000 3
End of Layer # - 74 with outPtrs[0] = 0x7f9815362000

0 Manu Mathew over 2 years ago in reply to Alex Spivakovsky

TI__Genius 11386 points

Hi, You can get the information here:

https://software-dl.ti.com/jacinto7/esd/processor-sdk-rtos-jacinto7/08_04_00_06/exports/docs/tidl_j721e_08_04_00_16/ti_dl/docs/user_guide_html/md_tidl_fsg_steps_to_debug_mismatch.html

(This actually describes using the underlying TIDL-RT directly, not through the Python interface, but it still gives insights and answers the question that you asked above.)

Also this may be useful: https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/docs/tidl_osr_debug.md

0 Alex Spivakovsky over 2 years ago in reply to Manu Mathew

Prodigy 160 points

Hi, I'm still working on solving my problem. There is one thing that caught my eye when setting debug_level=3 for the TIDLExecutionProvider.

It prints the scale, min and max clip values for each layer exactly as provided in the ONNX file after QAT. However, when I look at the first layer's info, it gives me the following:

Starting Layer # - 1
0 128.00000 0.00000 255.00000 1
Processing Layer # - 1
1 256.00000 0.00000 1.00000 0
End of Layer # - 1 with outPtrs[0] = 0x7f72a003107e

There is a Clip layer right after the input and before the first Conv layer: 1 256.00000 0.00000 1.00000 0. Indeed, the input image is divided by 255 so that the input values are between 0 and 1.0. But what about this one: 0 128.00000 0.00000 255.00000 1? Where does it come from?

0 Manu Mathew over 2 years ago in reply to Alex Spivakovsky

TI__Genius 11386 points

I have looped in the TIDL expert to look into this. Is it possible for you to attach a sample onnx file?

0 Alex Spivakovsky over 2 years ago in reply to Manu Mathew

Prodigy 160 points

RCF360_for_conti_split_24_07_2022_ShortRange_TDA4_quantization_DepthNet.zip

Hi Manu, please find onnx file attached inside the ZIP archive per your request.

Thank you,

Alex.

0 kumar.desappan over 2 years ago in reply to Alex Spivakovsky

TI__Mastermind 22145 points

0 128.00000 0.00000 255.00000 1 is the information for the input tensor (data ID 0). The input is treated as signed int8 (the last 1 specifies the element type), so the scale is selected as 128. The min and max clip values here are not right, but they are not used in any computation, so they can be safely ignored.

I hope this particular question is just for information. Are you observing any issue specific to this input tensor's min and max? If yes, please let me know.
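Based on the field description above (data ID, scale, min, max, element type), the per-layer statistics lines printed at debug_level 3 can be pulled out of a captured log with a small parser. This is a sketch under that reading of the format; the names TensorStats and parse_stats_line are hypothetical, and the numeric element type codes are left uninterpreted.

```python
from typing import NamedTuple, Optional


class TensorStats(NamedTuple):
    data_id: int
    scale: float
    min_value: float
    max_value: float
    element_type: int  # raw code as printed; its meaning is not decoded here


def parse_stats_line(line: str) -> Optional[TensorStats]:
    """Parse a line like '73 512.00000 -64.00000 64.00000 3'; return None otherwise."""
    parts = line.split()
    if len(parts) != 5:
        return None
    try:
        return TensorStats(int(parts[0]), float(parts[1]), float(parts[2]),
                           float(parts[3]), int(parts[4]))
    except ValueError:
        # Lines such as 'Starting Layer # - 74' also have five tokens but are not numeric.
        return None


if __name__ == "__main__":
    log = [
        "Starting Layer # - 74",
        "Processing Layer # - 74",
        "73 512.00000 -64.00000 64.00000 3",
        "End of Layer # - 74 with outPtrs[0] = 0x7f9815362000",
    ]
    for line in log:
        stats = parse_stats_line(line)
        if stats is not None:
            print(stats)
```

Comparing the parsed scale, min and max against the Clip values in the ONNX file is one way to confirm that TIDL picked up the QAT ranges.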

0 Alex Spivakovsky over 2 years ago in reply to kumar.desappan

Prodigy 160 points

Hi Kumar,

I'm having accuracy issues between the PyTorch model output after QAT and the TIDL onnxruntime inference after the artifacts are exported. To my understanding, I should expect similar or very close results, as each convolutional layer already has its activation range precomputed during QAT.

However, this is not the case. I tried to understand where the issue comes from and even made a really simple sub-model which has only a single convolution. Again, I exported the artifacts and tested inference. To my surprise, there were already very big differences.

Some details, here is my simple model:

The runtimes_visualization.svg looks OK:

However, the out_tidl_net_bin.svg looks weird. Where does that BatchNorm layer come from?

The input to the model is a floating-point, normalized image, so the data is in the range [0,1].

The convolution has 32 channels. Strangely, the first channel gives zero error for every entry of the 96x896 output, but all other channels show a big mismatch already at this early stage.
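A per-channel comparison of this kind can be sketched with NumPy as below. This is an illustrative helper, not the code from this thread: the name per_channel_error is hypothetical, an NCHW layout is assumed, and synthetic arrays stand in for the saved PyTorch and TIDL outputs.

```python
import numpy as np


def per_channel_error(reference, candidate):
    """Return (max_abs_error, mean_abs_error) per channel for NCHW tensors."""
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    if ref.shape != cand.shape or ref.ndim != 4:
        raise ValueError(f"expected matching NCHW shapes, got {ref.shape} and {cand.shape}")
    diff = np.abs(ref - cand)
    reduce_axes = (0, 2, 3)  # reduce over batch, height and width; keep channels
    return diff.max(axis=reduce_axes), diff.mean(axis=reduce_axes)


if __name__ == "__main__":
    # Synthetic stand-ins: channel 0 matches, channel 1 is off by a constant.
    ref = np.zeros((1, 2, 4, 4))
    cand = ref.copy()
    cand[:, 1] += 0.25
    max_err, mean_err = per_channel_error(ref, cand)
    for ch, (mx, mn) in enumerate(zip(max_err, mean_err)):
        print(f"channel {ch}: max abs error {mx:.6f}, mean abs error {mn:.6f}")
```

With real data, the two arguments would be the arrays loaded from the PyTorch output file and the array returned by the onnxruntime session.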

I can provide you with the onnx file, input/output data and artifacts if you need.

Thank you,

Alex.

0 kumar.desappan over 2 years ago in reply to Alex Spivakovsky

TI__Mastermind 22145 points

The first Clip layer is converted to a BatchNorm with clip activation (with scale and Bias =0). The second Clip is merged into the Convolution as its clip activation.

I am not sure why only one channel is producing the right results.

Please share the model, input/output data along with the model compilation code that you have used to reproduce this issue.

BTW, are the results fine in PC emulation mode for this model?

0 Alex Spivakovsky over 2 years ago in reply to kumar.desappan

Prodigy 160 points

Hi Kumar,

Please find attached depthnet.zip0131.test.zip

Inside you'll find:

1. test.onnx - the onnx model

2. test.npy - img to inference as a numpy array

3. output.npy - output of the torch model

4. python_code.py - the code I use to create artifacts and inference.

To your question, all my experiments are done in PC emulation mode using the TIDLExecutionProvider for the onnxruntime.

0 kumar.desappan over 2 years ago in reply to Alex Spivakovsky

TI__Mastermind 22145 points

Hi Alex,

We will try and get back on this.

In the meantime, can you please try the 16-bit mode and share your observations?

'tensor_bits': 16,

0 Alex Spivakovsky over 2 years ago in reply to kumar.desappan

Prodigy 160 points

Hi Kumar,

I tried 16 bits and the result was even worse. The maximum error was 0.2, compared to 0.04 with 8 bits. Waiting for your investigation results.

Thank you,

Alex.

0 Manu Mathew over 2 years ago in reply to Alex Spivakovsky

TI__Genius 11386 points

An 8-bit QAT model would do very badly in 16-bit inference. I think Kumar was suggesting to try the float model in 16-bit.

0 Alex Spivakovsky over 2 years ago in reply to kumar.desappan

Prodigy 160 points

Hi Kumar, I ran a really simple experiment that demonstrates the problem I face. I exported a tiny ONNX model with a single convolution with two identical kernels, as in the image below:

As you can see the input to the model is a single channel 5x5 image. I used a single value for all 5x5 entries to test whether the output of the model follows the math and the torch model.

The 5x5 example of the input:

The output of the model:

The calculation should be:

Input Clip (Range 0-1, Scale 256) : floor(0.8 *256 +0.5) / 256 = 0.80078125

Convolution does not influence calculations

Output Clip (Range 0-4, Scale 64) : floor(0.80078125 *64 + 0.5)/64 = 0.796875

However, as can be seen, TIDL gives 0.78125, a difference of 0.015625.
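The calculation above can be reproduced in a few lines of Python. This is a sketch of the arithmetic only, assuming (as stated above) that the convolution passes the value through unchanged and that each Clip clamps first and then rounds to the nearest multiple of 1/scale. The function name fake_quantize is hypothetical, and the TIDL value is the one reported in this post, not something the code computes.

```python
import math


def fake_quantize(x, scale, lo, hi):
    """Clamp x to [lo, hi], then round to the nearest multiple of 1/scale."""
    clipped = min(max(x, lo), hi)
    return math.floor(clipped * scale + 0.5) / scale


# Input Clip: range 0-1, scale 256.
after_input_clip = fake_quantize(0.8, scale=256, lo=0.0, hi=1.0)
# Output Clip: range 0-4, scale 64 (convolution assumed to pass the value through).
after_output_clip = fake_quantize(after_input_clip, scale=64, lo=0.0, hi=4.0)

tidl_output = 0.78125  # value reported from the TIDL run in this post

print("after input clip: ", after_input_clip)
print("after output clip:", after_output_clip)
print("difference to TIDL:", after_output_clip - tidl_output)
```

The difference is exactly one quantization step of the output Clip (1/64).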

The table below summarizes a few cases I tested. The torch result always complies with the math behind the quantization, whereas the TIDL result does not.

Thank you,

Alex.

0 Manu Mathew over 2 years ago in reply to Alex Spivakovsky

TI__Genius 11386 points

Hi Alex,

Kumar has been on travel for a few days. I expect that he will reply as soon as he is back.

Regards, Manu

0 Alex Spivakovsky over 2 years ago in reply to Manu Mathew

Prodigy 160 points

Hi Manu,

Thank you for your update. Meanwhile I opened another issue, could you please take a look?

https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1175872/tda4vm-how-to-instruct-tidl-to-run-particular-layer-with-16-bits-accuracy

Thank you,

Alex.

0 Manu Mathew over 2 years ago in reply to Alex Spivakovsky

TI__Genius 11386 points

I have forwarded this new thread to the TIDL expert and he shall respond in that thread. Let me answer the original question.

I had a discussion with Kumar. In TIDL, the fixed-point implementations use approximations to get the best performance with 8 bits, so the initial Clip layer will not be represented exactly. This is probably the reason for the difference in output. The output is therefore not expected to bit-match between the QAT output from PyTorch and TIDL. There are also approximations in the TIDL implementation of convolution bias and resize layers.

(Note: It may be possible to avoid that initial clip layer by setting quantize_in to False during QAT, see https://github.com/TexasInstruments/edgeai-torchvision/blob/master/torchvision/edgeailite/xnn/quantize/quant_graph_module.py#L57, and then giving input data in the range 0-255.)

The point that I was trying to highlight earlier is: "We have been using accuracy measurements to see if QAT is improving the accuracy or not."

You can see the classification models marked qat-p2 in the following report: https://github.com/TexasInstruments/edgeai-modelzoo/blob/master/modelartifacts/report_tda4vm.csv

and also the results here: https://github.com/TexasInstruments/edgeai-torchvision/blob/master/docs/pixel2pixel/Quantization.md
