TIDL quantization and the CVPR paper

Hello TI team,

I read TI's paper "Sparse, Quantized, Full Frame CNN for Low Power Embedded Devices" to better understand how quantization is done in TIDL. I would be thankful if you could answer my questions about the paper.

1) After quantization, operations (multiplications and accumulations) are done in fixed-point format (say 8 bits). The results of these operations, however, need a much larger bit width; for instance, multiplying two 8-bit values may need up to 16 bits. How does TIDL solve this issue? The paper does not explain this point. I.e., during inference, is there switching between floating- and fixed-point formats? How is that done?

2) In TIDL, there is no training done after sparsification and quantization of the network. Is that OK? In the relevant literature, including your paper, re-training is needed.

Thanks a lot

Best,

Safwan

 

  • Hi Safwan,

    1. For instance, multiplying two 8-bit values may need up to 16 bits. How does TIDL solve this issue?
    Ans: We don't switch between floating- and fixed-point formats; we convert these 16-bit/32-bit accumulators to 8-bit fixed point without using any floating point. The one exception is the DetectionOutput layer, where the input of that layer is converted from fixed point to floating point. A minimal sketch of the fixed-point idea follows after these answers.

    2. In TIDL, there is no training done after sparsification and quantization of the network. Is that OK? In the relevant literature, including your paper, re-training is needed.
    Ans: Re-training is required after sparsification but not after quantization. For more details, please refer to the Caffe-Jacinto models on GitHub.
    github.com/.../caffe-jacinto-models
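
    To make answer 1 a bit more concrete, here is a minimal sketch of the idea (an illustration only, not the actual TIDL code): products of 8-bit values are accumulated in a 32-bit register, and the accumulator is brought back to 8 bits with an integer shift and saturation, so no floating point is involved. The function name and the shift amount (6) are just illustrative per-layer choices.

    ```python
    import numpy as np

    def requantize_to_int8(acc32, right_shift):
        """Bring a 32-bit accumulator back to the signed 8-bit range using
        only integer arithmetic (shift plus saturation)."""
        scaled = acc32 >> right_shift            # integer scaling, no float
        return int(np.clip(scaled, -128, 127))   # saturate to the int8 range

    # Example: dot product of two int8 vectors, accumulated in 32 bits.
    a = np.array([120, -90, 60], dtype=np.int8)
    b = np.array([110, 100, -50], dtype=np.int8)
    acc = int(np.dot(a.astype(np.int32), b.astype(np.int32)))
    print(acc)                          # 1200: fits in 32 bits, not in 8
    print(requantize_to_int8(acc, 6))   # 18 after scaling back to 8 bits
    ```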

    Thanks,
    Praveen
  • Hi Praveen,

    1) If two tensors (scalars for simplicity), each 8 bits and each close to the maximum of its range, were multiplied and the result converted back to 8 bits, there would clearly be an overflow. Are the details of the forward pass of a quantized net documented somewhere?
    2) The link to Caffe-Jacinto does not seem very relevant to me. I am using TensorFlow and could not find relevant info there.

    Thank you.
    Best,
    Safwan
  • 1. It is not a direct 16-bit to 8-bit conversion. The 16/32-bit accumulators are quantized to 8 bits using a scale factor. The logic is very similar to the one described in the blog below, but it is implemented fully with fixed-point operations and uses dynamic range computation; a sketch of this scheme is included below.
    petewarden.com/.../
    2. We don't support sparsification using TensorFlow, so sparsification is not relevant to your case.
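
    As a rough sketch of that scheme (following the gemmlowp-style approach from the blog, not necessarily TIDL's internal implementation): the dynamic range of a tensor is turned into a scale factor offline, the scale is expressed as an integer multiplier plus a right shift, and the 32-bit accumulator is then requantized to 8 bits at inference using integer operations only. The names and constants below (compute_scale, QBITS, the example ranges) are illustrative assumptions.

    ```python
    import numpy as np

    QBITS = 15  # fixed-point precision of the integer multiplier (assumption)

    def compute_scale(tensor_min, tensor_max):
        """Map an observed dynamic range onto signed 8-bit values
        (real_value ~ scale * int8_value). Done offline, during calibration."""
        max_abs = max(abs(tensor_min), abs(tensor_max), 1e-8)
        return max_abs / 127.0

    def scale_to_fixed_point(real_multiplier):
        """Express a real multiplier in (0, 1) as an integer multiplier plus a
        shift, so it can later be applied with integer operations only."""
        return int(round(real_multiplier * (1 << QBITS))), QBITS

    def requantize(acc32, multiplier, shift):
        """Apply the fixed-point multiplier to a 32-bit accumulator with a
        rounding shift and saturate to the int8 range (integer math only)."""
        rounded = (acc32 * multiplier + (1 << (shift - 1))) >> shift
        return int(np.clip(rounded, -128, 127))

    # Offline (calibration): derive scales from the observed ranges.
    in_scale = compute_scale(-2.0, 2.0)
    w_scale = compute_scale(-0.5, 0.5)
    out_scale = compute_scale(-8.0, 8.0)
    # The accumulator is in units of (in_scale * w_scale); to emit int8
    # outputs in units of out_scale we need this combined multiplier.
    mult, shift = scale_to_fixed_point(in_scale * w_scale / out_scale)

    # Online (inference): int8 inputs, int32 accumulation, int8 output.
    x = np.array([100, -120, 50], dtype=np.int8)
    w = np.array([90, 80, -60], dtype=np.int8)
    acc = int(np.dot(x.astype(np.int32), w.astype(np.int32)))
    print(requantize(acc, mult, shift))   # int8 result, no float at inference
    ```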