In the pretraining `get_loss` function, `loss_lm` is reduced with `.mean()`. Because of this, the mean is taken over all positions, so the zero losses at unmasked positions dilute the result as if they were correct predictions. I think we should replace the mean with an explicit numerator / denominator, as in the TensorFlow implementation:

```python
loss_lm = (loss_lm * masked_weights.float()).mean()
```

to

```python
loss_lm_numerator = (loss_lm * masked_weights.float()).sum()
loss_lm_denominator = masked_weights.sum() + 1e-5
loss_lm = loss_lm_numerator / loss_lm_denominator
```

Is this correct?
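To illustrate the dilution effect, here is a toy sketch (plain Python, not the repo's actual tensor code; the function names are made up for this example) comparing a plain mean against a masked mean on a per-token loss vector:

```python
def plain_mean(losses, weights):
    # Divides by ALL positions, so zeroed losses at unmasked
    # positions drag the average down.
    total = sum(l * w for l, w in zip(losses, weights))
    return total / len(losses)

def masked_mean(losses, weights, eps=1e-5):
    # Divides only by the number of masked positions
    # (numerator / denominator with a small epsilon, as in the
    # TensorFlow-style fix proposed above).
    numerator = sum(l * w for l, w in zip(losses, weights))
    denominator = sum(weights) + eps
    return numerator / denominator

# Toy example: 2 of 8 positions are masked, each with loss 4.0.
losses  = [4.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0]
weights = [1,   0,   0,   1,   0,   0,   0,   0]

plain_mean(losses, weights)   # 1.0  -- diluted by the 6 unmasked positions
masked_mean(losses, weights)  # ~4.0 -- average loss over masked tokens only
```

With `.mean()`, adding more unmasked positions lowers the reported loss even though the model's predictions on the masked tokens are unchanged.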