Thank you for your great work; it has been incredibly helpful!
I have a question about training on your released dataset. During training, the gradient norm mostly stays below 0.5, but at certain steps it spikes sharply, as shown in the attached image. These spikes disrupt training and have even triggered a PyTorch watchdog interrupt.
Could you please offer any advice on how to resolve this issue?