Dear Sir,
We noticed that the document github.com/.../tidl_fsg_quantization.md gives the following recommendation: "The weight decay factor should not be too small. We have used a weight decay factor of 1e-4 for training several networks and we highly recommend a similar value. Using small values such as 1e-5 is not recommended."
We would like to know under what scenarios or conditions this recommendation was derived. We ask because, in our experiments, the quantization error was smaller when we used a larger weight decay, such as 1e-2.
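
To make clear what we mean by quantization error, here is a minimal sketch of one way it could be measured. It assumes symmetric per-tensor 8-bit quantization, which may differ from the scheme TIDL actually uses. The function names and the synthetic weights are hypothetical, for illustration only; they are not our real models or results.

```python
import numpy as np


def fake_quantize(w, bits=8):
    """Quantize and dequantize w with a symmetric per-tensor scale (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1            # 127 for 8 bits
    max_abs = float(np.max(np.abs(w)))
    if max_abs == 0.0:
        return w.copy()                   # all-zero tensor: nothing to quantize
    scale = max_abs / qmax                # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale


def quant_mse(w, bits=8):
    """Mean squared error between w and its quantize-dequantize round trip."""
    return float(np.mean((w - fake_quantize(w, bits)) ** 2))


# Synthetic stand-ins for weight tensors (hypothetical data).
rng = np.random.default_rng(0)
w_wide = rng.normal(0.0, 1.0, size=10_000)   # wider weight range
w_narrow = 0.1 * w_wide                      # narrower range, same shape

print("MSE, wide range:  ", quant_mse(w_wide))
print("MSE, narrow range:", quant_mse(w_narrow))
```

In this sketch the absolute error shrinks with the weight range because the quantization step is proportional to the largest absolute weight. A larger weight decay tends to reduce weight magnitudes, which is one possible explanation for what we observed.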