I saw that in the original model, they disable the bias in the conv layers and add a bias in the scale layers. Since the batch norm layers in MXNet already have both the scale and the bias, I am wondering whether it would make any difference if I left the bias term enabled in the conv layers.
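To make the question concrete, here is a rough sketch in plain Python (not MXNet code) of what I would expect to happen. It assumes batch norm normalizes each channel with the statistics of the current batch, and it only looks at a single channel. The names `batch_norm`, `conv_out`, `conv_bias`, `gamma` and `beta` are made up for illustration and are not taken from the original model.

```python
import math
import random

def batch_norm(values, gamma, beta, eps=1e-5):
    """Normalize one channel with its batch statistics, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

random.seed(0)
# Stand-in for one channel of conv output computed without a bias.
conv_out = [random.gauss(0.0, 1.0) for _ in range(64)]
conv_bias = 0.7            # hypothetical conv bias for this channel
gamma, beta = 1.5, -0.3    # hypothetical batch norm scale and shift

without_bias = batch_norm(conv_out, gamma, beta)
with_bias = batch_norm([v + conv_bias for v in conv_out], gamma, beta)

# A constant added to every value shifts the mean by the same constant,
# so the mean subtraction removes it and the variance is unchanged.
max_diff = max(abs(a - b) for a, b in zip(without_bias, with_bias))
print(max_diff)
```

If this reasoning holds, the two outputs differ only by floating-point rounding, which would suggest that a conv bias placed directly before batch norm mostly adds redundant parameters. I have not checked whether the same holds when batch norm uses its moving statistics at inference time.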