TIDL quantization and the CVPR paper

Hello TI team,

I read TI's paper "Sparse, Quantized, Full Frame CNN for Low Power Embedded Devices" to better understand how quantization is done in TIDL. I would be thankful if you could answer my questions about the paper.

1) After quantization, operations (multiplications and accumulations) are done in fixed-point format (say 8 bits). The results of these operations, however, need a much larger bit width; for instance, multiplying two 8-bit values may need up to 16 bits. How does TIDL solve this issue? The paper does not explain this point. I.e., during inference, is there switching between floating- and fixed-point formats? How is that done?

2) In TIDL, there is no training done after sparsification and quantization of the network. Is that OK? In the relevant literature, including your paper, re-training is needed.

Thanks a lot

Best,

Safwan

 

  • Hi Safwan,

    1. For instance, multiplying two 8-bit values may need up to 16 bits. How does TIDL solve this issue?
    Ans: We don't switch between floating- and fixed-point formats; we convert these 16-bit/32-bit accumulators to 8-bit fixed point without using any floating point. The one exception is the DetectionOutput layer, where the input of that layer is converted from fixed point to floating point. A minimal sketch of the fixed-point idea follows after these answers.

    2. In TIDL, there is no training done after sparsification and quantization of the network. Is that OK? In the relevant literature, including your paper, re-training is needed.
    Ans: Re-training is required after sparsification but not after quantization. For more details, please refer to the Caffe-Jacinto models on GitHub.
    github.com/.../caffe-jacinto-models
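
    To make answer 1 a bit more concrete, here is a minimal sketch of the idea (an illustration only, not the actual TIDL code): products of 8-bit values are accumulated in a 32-bit register, and the accumulator is brought back to 8 bits with an integer shift and saturation, so no floating point is involved. The function name and the shift amount (6) are just illustrative per-layer choices.

    ```python
    import numpy as np

    def requantize_to_int8(acc32, right_shift):
        """Bring a 32-bit accumulator back to the signed 8-bit range using
        only integer arithmetic (shift plus saturation)."""
        scaled = acc32 >> right_shift            # integer scaling, no float
        return int(np.clip(scaled, -128, 127))   # saturate to the int8 range

    # Example: dot product of two int8 vectors, accumulated in 32 bits.
    a = np.array([120, -90, 60], dtype=np.int8)
    b = np.array([110, 100, -50], dtype=np.int8)
    acc = int(np.dot(a.astype(np.int32), b.astype(np.int32)))
    print(acc)                          # 1200: fits in 32 bits, not in 8
    print(requantize_to_int8(acc, 6))   # 18 after scaling back to 8 bits
    ```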

    Thanks,
    Praveen
  • Hi Praveen,

    1) If two tensors (scalars for simplicity), each 8 bits and each close to the maximum of its range, were multiplied and the result converted back to 8 bits, there would clearly be an overflow. Are the details of the forward pass of a quantized net documented somewhere?
    2) The link to Caffe-Jacinto does not seem very relevant to me. I am using TensorFlow and could not find relevant info there.

    Thank you.
    Best,
    Safwan
  • 1. It is not a direct 16-bit to 8-bit conversion. The 16/32-bit accumulators are quantized to 8 bits using a scale factor. The logic is very similar to the one described in the blog below, but it is implemented fully with fixed-point operations and uses dynamic range computation; a sketch of this scheme is included below.
    petewarden.com/.../
    2. We don't support sparsification using TensorFlow, so sparsification is not relevant to your case.
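
    As a rough sketch of that scheme (following the gemmlowp-style approach from the blog, not necessarily TIDL's internal implementation): the dynamic range of a tensor is turned into a scale factor offline, the scale is expressed as an integer multiplier plus a right shift, and the 32-bit accumulator is then requantized to 8 bits at inference using integer operations only. The names and constants below (compute_scale, QBITS, the example ranges) are illustrative assumptions.

    ```python
    import numpy as np

    QBITS = 15  # fixed-point precision of the integer multiplier (assumption)

    def compute_scale(tensor_min, tensor_max):
        """Map an observed dynamic range onto signed 8-bit values
        (real_value ~ scale * int8_value). Done offline, during calibration."""
        max_abs = max(abs(tensor_min), abs(tensor_max), 1e-8)
        return max_abs / 127.0

    def scale_to_fixed_point(real_multiplier):
        """Express a real multiplier in (0, 1) as an integer multiplier plus a
        shift, so it can later be applied with integer operations only."""
        return int(round(real_multiplier * (1 << QBITS))), QBITS

    def requantize(acc32, multiplier, shift):
        """Apply the fixed-point multiplier to a 32-bit accumulator with a
        rounding shift and saturate to the int8 range (integer math only)."""
        rounded = (acc32 * multiplier + (1 << (shift - 1))) >> shift
        return int(np.clip(rounded, -128, 127))

    # Offline (calibration): derive scales from the observed ranges.
    in_scale = compute_scale(-2.0, 2.0)
    w_scale = compute_scale(-0.5, 0.5)
    out_scale = compute_scale(-8.0, 8.0)
    # The accumulator is in units of (in_scale * w_scale); to emit int8
    # outputs in units of out_scale we need this combined multiplier.
    mult, shift = scale_to_fixed_point(in_scale * w_scale / out_scale)

    # Online (inference): int8 inputs, int32 accumulation, int8 output.
    x = np.array([100, -120, 50], dtype=np.int8)
    w = np.array([90, 80, -60], dtype=np.int8)
    acc = int(np.dot(x.astype(np.int32), w.astype(np.int32)))
    print(requantize(acc, mult, shift))   # int8 result, no float at inference
    ```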