Fix GitHub issue #792: Fast gradient clipping ignores ignore_index masking by HuanyuZhang · Pull Request #808 · meta-pytorch/opacus

HuanyuZhang · 2026-03-06T16:31:02Z

Summary:
Context/Motivation: Fixes #792

When using fast/ghost gradient clipping for NLP tasks, DPLossFastGradientClipping
computes per-sample mean loss via .mean(dim=1), which divides by the full sequence
length. This ignores the ignore_index parameter from the criterion (e.g.,
CrossEntropyLoss(ignore_index=-100)), causing masked/padded positions to dilute
the loss. For tasks like SQuAD where only a few tokens are real targets out of a
long sequence, the loss becomes orders of magnitude too small, preventing training.
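To make the size of the dilution concrete, here is a small arithmetic sketch in plain Python. The sequence length, target positions, and per-token loss value are hypothetical numbers chosen for illustration, not taken from issue #792, and the two helper functions are not part of Opacus. It assumes, as PyTorch's CrossEntropyLoss does with reduction="none", that ignored positions contribute a loss of zero.

```python
# Hypothetical values for illustration only; not taken from the issue report.
IGNORE_INDEX = -100


def full_length_mean(token_losses):
    """Average over every position, which is what .mean(dim=1) does per sample."""
    return sum(token_losses) / len(token_losses)


def masked_mean(token_losses, targets):
    """Average only over positions whose target is not IGNORE_INDEX."""
    kept = [loss for loss, t in zip(token_losses, targets) if t != IGNORE_INDEX]
    return sum(kept) / max(len(kept), 1)


seq_len = 384
targets = [IGNORE_INDEX] * seq_len
targets[10] = 7  # two real targets, e.g. an answer span boundary
targets[11] = 3

# Ignored positions are assumed to carry zero loss; real ones carry 2.0 each.
token_losses = [2.0 if t != IGNORE_INDEX else 0.0 for t in targets]

diluted = full_length_mean(token_losses)          # 4.0 / 384
expected = masked_mean(token_losses, targets)     # 4.0 / 2
ratio = expected / diluted                        # seq_len / number of real targets
print(diluted, expected, ratio)
```

With these numbers the full-length mean is smaller than the masked mean by a factor equal to the sequence length divided by the number of real targets.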

This diff:

- Modified DPLossFastGradientClipping.__call__() to check for ignore_index on the
  criterion and compute mean only over non-ignored positions when present
- Added regression test github_issue_test.py verifying ignore_index is respected
  for both mean and sum reductions, plus a backwards-compatibility test for the
  no-masking case
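The masked mean described in the first item can be sketched roughly as follows. This is not the code from the PR: per_sample_masked_mean is a hypothetical helper, and it assumes the criterion was built with reduction="none" and that logits have shape (batch, seq, classes). It only illustrates the idea of checking for ignore_index and dividing by the count of non-ignored positions.

```python
import torch
import torch.nn as nn


def per_sample_masked_mean(criterion, logits, targets):
    """Hypothetical helper: per-sample mean loss over non-ignored positions.

    logits: (batch, seq, classes); targets: (batch, seq). Returns shape (batch,).
    Assumes `criterion` uses reduction="none".
    """
    # CrossEntropyLoss expects the class dimension second: (batch, classes, seq).
    token_loss = criterion(logits.transpose(1, 2), targets)  # (batch, seq)

    ignore_index = getattr(criterion, "ignore_index", None)
    if ignore_index is None:
        # No masking information available: fall back to the plain mean.
        return token_loss.mean(dim=1)

    mask = (targets != ignore_index).to(token_loss.dtype)
    # clamp avoids division by zero for samples with no real targets.
    return (token_loss * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)


torch.manual_seed(0)
criterion = nn.CrossEntropyLoss(ignore_index=-100, reduction="none")
logits = torch.randn(2, 6, 5)
targets = torch.full((2, 6), -100, dtype=torch.long)
targets[0, 0] = 1                      # sample 0: one real target out of six
targets[1, 2], targets[1, 4] = 3, 0    # sample 1: two real targets out of six

masked = per_sample_masked_mean(criterion, logits, targets)
naive = criterion(logits.transpose(1, 2), targets).mean(dim=1)
print(masked, naive)
```

In this toy batch the naive per-sample mean is smaller than the masked mean by factors of 6 and 3 respectively, matching the ratio of sequence length to real targets in each sample.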

Differential Revision: D95489302

meta-codesync · 2026-03-06T16:31:09Z

@HuanyuZhang has exported this pull request. If you are a Meta employee, you can view the originating Diff in D95489302.

…ore_index masking (meta-pytorch#808)

Reviewed By: aparna-aketi
Differential Revision: D95489302

meta-codesync · 2026-03-09T17:57:11Z

This pull request has been merged in 8493eeb.

meta-cla Bot added the CLA Signed label Mar 6, 2026

meta-codesync Bot added fb-exported meta-exported labels Mar 6, 2026

HuanyuZhang force-pushed the export-D95489302 branch 2 times, most recently from c2efd81 to 1e0a727 March 7, 2026 14:40

HuanyuZhang force-pushed the export-D95489302 branch from 1e0a727 to 9b82a22 March 9, 2026 16:01

meta-codesync Bot closed this in 8493eeb Mar 9, 2026

facebook-github-tools Bot added the Merged label Mar 9, 2026

HuanyuZhang wants to merge 1 commit into meta-pytorch:main from HuanyuZhang:export-D95489302
