TDA4VM-Q1: An issue with the weight decay setting when using PyTorch to train a model

jirui gao

Part Number: TDA4VM-Q1

Dear Sir,

We observed that the document github.com/.../tidl_fsg_quantization.md suggests that the weight decay factor should not be too small: "We have used a weight decay factor of 1e-4 for training several networks and we highly recommend a similar value. Using small values such as 1e-5 is not recommended."

Under what scenarios or conditions was this conclusion reached? We ask because in our experiments the quantization error was smaller when we set a larger weight decay, such as 1e-2.

over 1 year ago

Manu Mathew over 1 year ago

TI__Genius 11466 points

Your observation is correct: quantization error is smaller when weight decay is higher, which is why we suggested not using too small a weight decay. However, too high a weight decay can hurt accuracy as well. You have to find the optimal values for these hyperparameters to get the best results.
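As a rough illustration of why a narrower weight range tends to quantize better, here is a minimal pure-Python sketch. It assumes symmetric per-tensor 8-bit quantization with the scale set by the largest absolute weight, which is an assumption made for illustration and not necessarily the scheme TIDL uses. The weight lists are synthetic stand-ins, not trained weights: `wide` mimics a layer with a few large outliers, and `narrow` mimics the same layer with the outliers shrunk, as stronger weight decay would tend to do. In PyTorch itself the setting is the `weight_decay` argument of optimizers such as `torch.optim.SGD`. The sketch does not model the accuracy loss from too much weight decay.

```python
import random


def quantize_dequantize(weights, num_bits=8):
    """Symmetric per-tensor quantization (assumed scheme, for illustration).

    The scale is set by the largest absolute weight, so a single outlier
    coarsens the quantization step for every weight in the tensor.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def mean_abs_error(weights, num_bits=8):
    """Mean absolute difference between weights and their quantized values."""
    dequantized = quantize_dequantize(weights, num_bits)
    return sum(abs(w - d) for w, d in zip(weights, dequantized)) / len(weights)


random.seed(0)

# Synthetic stand-in for a layer's weights: mostly small values.
base = [random.gauss(0.0, 0.05) for _ in range(10000)]

# Hypothetical "low weight decay" case: a few large outliers survive training.
wide = base + [2.0, -2.0]

# Hypothetical "higher weight decay" case: the outliers have been shrunk.
narrow = base + [0.2, -0.2]

err_wide = mean_abs_error(wide)
err_narrow = mean_abs_error(narrow)

print(f"mean abs error, wide range:   {err_wide:.6f}")
print(f"mean abs error, narrow range: {err_narrow:.6f}")
```

In this sketch the wide-range tensor shows the larger error because its quantization step is set by the outliers. In practice, the weight decay value would still need to be swept (for example 1e-5, 1e-4, 1e-3, 1e-2) while checking both floating-point accuracy and accuracy after quantization.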
