Hello,
I am trying to understand how MetaQuant works. From the paper, it seems that the straight-through estimator (STE) is not required at all. However, in the code, STE appears to be used to derive a gradient:
https://github.com/csyhhu/MetaQuant/blob/master/utils/quantize.py#L25
https://github.com/csyhhu/MetaQuant/blob/master/utils/quantize.py#L41
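To be concrete about what I mean by STE, here is a minimal sketch in plain Python. It is not taken from the MetaQuant repository: the function names `sign_forward` and `sign_backward_ste` and the sample values are hypothetical, and I am assuming a simple sign quantizer for illustration. The forward pass quantizes, while the backward pass treats the quantizer as the identity and passes the upstream gradient through unchanged.

```python
# Hypothetical illustration of a straight-through estimator (STE).
# Not the MetaQuant implementation; names and values are made up.

def sign_forward(weights):
    """Forward pass: quantize each weight to -1.0 or +1.0."""
    return [1.0 if w >= 0 else -1.0 for w in weights]


def sign_backward_ste(grad_output):
    """Backward pass under STE.

    The true derivative of sign() is zero almost everywhere, so STE
    pretends the quantizer is the identity and returns the upstream
    gradient unchanged.
    """
    return list(grad_output)


weights = [0.3, -1.2, 0.0, 0.7]          # full-precision weights
upstream_grad = [0.5, -0.1, 0.2, 0.05]   # gradient w.r.t. quantized weights

quantized = sign_forward(weights)
grad_wrt_weights = sign_backward_ste(upstream_grad)
```

My question is whether the two linked lines follow this pattern, or whether they do something different.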
It would be very helpful if you could help me understand this point.