Hi!
Thanks for this juicy little piece of code!
Just out of curiosity: I noticed that your implementation uses nn.LayerNorm with its default denominator constant eps=1e-5, whereas other implementations (DINO [here] and the ViT in timm [here]) explicitly set this parameter to eps=1e-6.
I know it is a small detail, but small details sometimes matter a lot for getting better models.
Do you think the model is sensitive to this kind of parameter change? Have you ever tried/noticed it?
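
To make the question concrete, here is a minimal pure-Python sketch of what I mean. It is my own illustration, not code from either repository: the `layer_norm` and `output_std` helpers and the sample activations are hypothetical, and the affine weight and bias are left out. It assumes the usual formula (x - mean) / sqrt(var + eps) with the biased variance.

```python
import math

def layer_norm(x, eps):
    """Normalize a list of floats as (x - mean) / sqrt(var + eps), biased variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def output_std(x, eps):
    """Standard deviation of the normalized output; 1.0 would mean eps has no effect."""
    y = layer_norm(x, eps)
    m = sum(y) / len(y)
    return math.sqrt(sum((v - m) ** 2 for v in y) / len(y))

# Hypothetical activations: one with unit-scale spread, one nearly constant.
wide = [-1.0, 0.0, 1.0, 2.0]          # variance 1.25
narrow = [v * 1e-3 for v in wide]     # variance 1.25e-6, comparable to eps

for name, x in [("wide", wide), ("narrow", narrow)]:
    print(name, output_std(x, 1e-5), output_std(x, 1e-6))
```

For the wide input both settings give an output standard deviation of essentially 1, so eps does not matter. For the narrow input, whose variance is close to eps, the output is shrunk to roughly 0.33 with eps=1e-5 but only to roughly 0.75 with eps=1e-6. So the two settings should only differ in practice if some tokens have very low variance across the feature dimension.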
Thanks!