Congratulations on the great progress you have all made in deep learning and neural nets.
I found something interesting in model initialization: the results differ a lot when the BN module is initialized from a normal distribution, compared with setting the weight to 1 and the bias to 0.
There must be something I’m missing.
I hope you can help me figure this out.
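
To make the comparison concrete, here is a minimal sketch in plain Python. It assumes "BN" means batch normalization with a per-feature scale (weight) and shift (bias). The helper `batch_norm`, the input batch, and the choice of N(0, 1) for the normal initialization are assumptions made for illustration only, not taken from any particular library or model.

```python
import random
import statistics

def batch_norm(xs, weight, bias, eps=1e-5):
    """Normalize one feature over a batch, then scale and shift it."""
    mean = statistics.fmean(xs)
    var = statistics.pvariance(xs, mu=mean)
    return [weight * (x - mean) / (var + eps) ** 0.5 + bias for x in xs]

random.seed(0)
# Hypothetical pre-BN activations for a single feature.
batch = [random.gauss(3.0, 2.0) for _ in range(1000)]

# Constant init: weight = 1, bias = 0, so the output keeps mean ~0 and std ~1.
out_default = batch_norm(batch, weight=1.0, bias=0.0)

# Normal init (assumed N(0, 1)): the scale is random and may be small or
# negative, so the output std becomes |weight| instead of 1.
w = random.gauss(0.0, 1.0)
out_normal = batch_norm(batch, weight=w, bias=0.0)

std_default = statistics.pstdev(out_default)
std_normal = statistics.pstdev(out_normal)
print(std_default, std_normal, abs(w))
```

In this sketch the only difference between the two runs is the scale applied after normalization, so a randomly drawn weight directly rescales (and possibly flips the sign of) every activation that leaves the BN layer. That may be one source of the large difference, though the actual cause depends on the model and framework used.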