We should probably add the gradient norm to the TensorBoard plots so we can see how convergence behaves over time.
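Something like this could work as a starting point. It is only a rough sketch: the helper names `global_grad_norm` and `log_grad_norm` and the tag `train/grad_norm` are made up for illustration, and `writer` is assumed to be anything with an `add_scalar(tag, value, step)` method, such as a TensorBoard `SummaryWriter`. Gradients are passed as flat lists of floats here to keep the example framework-free.

```python
import math


def global_grad_norm(grads):
    """L2 norm of all gradient values, treated as one flat vector.

    `grads` is an iterable of flat sequences of floats, one per parameter
    (an assumption made to keep this sketch independent of any framework).
    """
    return math.sqrt(sum(g * g for grad in grads for g in grad))


def log_grad_norm(writer, grads, step, tag="train/grad_norm"):
    """Log the global gradient norm as a scalar and return it.

    `writer` only needs an `add_scalar(tag, value, step)` method; the
    default tag name is hypothetical.
    """
    norm = global_grad_norm(grads)
    writer.add_scalar(tag, norm, step)
    return norm
```

Calling `log_grad_norm(writer, grads, step)` once per optimizer step, after the backward pass, would give one curve in the scalars tab. If the training loop is in PyTorch and already clips gradients, `torch.nn.utils.clip_grad_norm_` returns the total norm, so that value could be logged directly instead.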