Abnormal Too Large Grad Norm #8

Thank you for your great work; it has been incredibly helpful!

I have a question about training on your released dataset. During training, the gradient norm mostly stays at or below 0.5. At certain steps, however, it spikes sharply, as shown in the attached image. These spikes disrupt training and even trigger a PyTorch watchdog interrupt.

Could you please provide some guidance or advice on how to resolve this issue?

<img width="612" height="395" alt="Image" src="https://github.com/user-attachments/assets/bbce8657-bc0b-4a61-ab63-dee4d79eb161" />
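In case it helps narrow things down, here is a minimal sketch of how the spiking steps could be flagged from the logged gradient norms. It is not taken from the repository: the class name `GradNormSpikeDetector`, its `window`, `factor`, and `warmup` parameters, and the sample norms are all hypothetical and chosen only for illustration. A step is flagged when its norm is non-finite or exceeds a multiple of the recent median.

```python
import math
from collections import deque
from statistics import median


class GradNormSpikeDetector:
    """Flags steps whose gradient norm is far above the recent median.

    Hypothetical helper for illustration; thresholds are assumptions.
    """

    def __init__(self, window=100, factor=10.0, warmup=10):
        self.history = deque(maxlen=window)  # recent non-spike norms only
        self.factor = factor                 # spike = factor x running median
        self.warmup = warmup                 # steps to observe before flagging

    def is_spike(self, grad_norm):
        # NaN or inf is always treated as a spike.
        if not math.isfinite(grad_norm):
            return True
        # After warmup, compare against the median of recent normal steps.
        if len(self.history) >= self.warmup:
            if grad_norm > self.factor * median(self.history):
                return True
        # Only non-spike values enter the history, so one outlier
        # does not raise the baseline for later steps.
        self.history.append(grad_norm)
        return False


# Made-up norms resembling the behaviour described above.
norms = [0.5, 0.48, 0.52, 0.5, 0.49, 37.0, 0.51, float("nan"), 0.5]
detector = GradNormSpikeDetector(window=50, factor=10.0, warmup=5)
spike_steps = [step for step, n in enumerate(norms) if detector.is_spike(n)]
print(spike_steps)
```

In a PyTorch loop, the value passed to `is_spike` could be the total norm returned by `torch.nn.utils.clip_grad_norm_`, converted with `float()`. Logging the batch for each flagged step, or skipping `optimizer.step()` on those steps, might then show whether specific samples are responsible. In a multi-GPU run every rank would need to reach the same decision, otherwise the processes could fall out of sync.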
