
TDA4VM-Q1: An issue with the weight decay setting when using Pytorch to train a model

Part Number: TDA4VM-Q1

Dear Sir,

We observed that "github.com/.../tidl_fsg_quantization.md" suggests: "the weight decay factor should not be too small. We have used a weight decay factor of 1e-4 for training several networks and we highly recommend a similar value. Using small values such as 1e-5 is not recommended."
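
For reference, in PyTorch this factor is passed through the optimizer's weight_decay argument. A minimal sketch (the model, learning rate, and momentum below are placeholders for illustration, not our actual training setup):

```python
import torch
import torch.nn as nn

# Placeholder model for illustration only
model = nn.Linear(10, 2)

# weight_decay=1e-4 as recommended in tidl_fsg_quantization.md;
# lr and momentum are assumed placeholder values
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
```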

We want to know under what scenarios or conditions this recommendation was derived, because in our experiments we observed that the quantization error was smaller when setting a larger weight decay, such as 1e-2.

  • Your observation is correct - quantization error will be smaller when weight decay is higher. That is why we suggested not using too small a weight decay. However, too high a weight decay can hurt accuracy as well. Basically, you have to find optimal values for all these hyperparameters to get the best results.
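
As a rough sketch of such a hyperparameter search, one can sweep a few candidate values around the recommended 1e-4 and compare results. Here train_model and measure_quantization_error are hypothetical placeholders for your own training loop and quantization-error measurement, not TIDL APIs:

```python
import torch
import torch.nn as nn

def build_optimizer(model, weight_decay, lr=0.01):
    # Weight decay is applied by the optimizer as L2 regularization
    return torch.optim.SGD(model.parameters(), lr=lr,
                           momentum=0.9, weight_decay=weight_decay)

# Sweep candidate weight-decay values and compare the resulting
# quantization error for each trained model
for wd in (1e-5, 1e-4, 1e-3, 1e-2):
    model = nn.Linear(10, 2)  # toy stand-in for the real network
    optimizer = build_optimizer(model, wd)
    # train_model(model, optimizer)                 # hypothetical
    # err = measure_quantization_error(model)       # hypothetical
    # print(f"weight_decay={wd:g} -> quantization error={err}")
```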