Hello, I am very interested in your paper, and thank you for the implementation. I have a question about your code.

In this line (MetaQuant/meta_utils/meta_quantized_module.py, line 86 in 3169e0b):

    self.meta_weight = self.weight - \
        lr * (self.calibrated_grads \
        + (self.weight.grad.data - self.calibrated_grads.data).detach())

why not use `self.calibrated_grads` directly? Instead, you use the refined gradients, `self.weight.grad`. Furthermore, the weights have already been updated in the main function using the refined gradients, so I am confused about why the refined gradients are used again here.
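
To make the question concrete, here is a minimal sketch in plain Python of the two alternatives as I understand them. It uses scalars instead of tensors, the function names and numbers are made up for illustration, and `.detach()` is left out because it only affects backpropagation, not the forward value.

```python
import math


def meta_weight_as_in_repo(weight, lr, calibrated_grad, refined_grad):
    # Forward value of:
    #   weight - lr * (calibrated + (refined - calibrated).detach())
    # The calibrated terms cancel numerically, leaving the refined gradient.
    return weight - lr * (calibrated_grad + (refined_grad - calibrated_grad))


def meta_weight_direct(weight, lr, calibrated_grad):
    # The alternative I am asking about: step with the calibrated gradient only.
    return weight - lr * calibrated_grad


# Hypothetical scalar values, chosen only for illustration.
weight, lr = 1.0, 0.5
calibrated_grad, refined_grad = 0.25, 0.75

repo_value = meta_weight_as_in_repo(weight, lr, calibrated_grad, refined_grad)
direct_value = meta_weight_direct(weight, lr, calibrated_grad)
refined_step = weight - lr * refined_grad

print(repo_value, direct_value, refined_step)
```

So, if I read it correctly, the forward value of `self.meta_weight` is the same as a step with the refined gradient, and it differs from a step with `self.calibrated_grads` alone. Is that the intention?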