Skip to content

Fix adaptive noise accounting#807

Closed
david-stan wants to merge 6 commits into
meta-pytorch:mainfrom
JetBrains-Research:fix-adaptive-noise-accounting
Closed

Fix adaptive noise accounting#807
david-stan wants to merge 6 commits into
meta-pytorch:mainfrom
JetBrains-Research:fix-adaptive-noise-accounting

Conversation

@david-stan
Copy link
Copy Markdown
Contributor

@david-stan david-stan commented Feb 2, 2026

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Docs change / refactoring / dependency upgrade

Motivation and Context / Related issue

This PR fixes incorrect privacy accounting for AdaClipDPOptimizer.

The AdaClip optimizer internally adjusts the noise_multiplier based on Theorem 1 from the AdaClip paper:
σ_eff = (σ^-2 - (2σ_u)^-2)^(-1/2)

where σ is the user-provided noise multiplier and σ_u is the unclipped_num_std parameter.

The adjusted σ_eff is used internally for noise generation during training. However, privacy accountants (rdp, prv,...) were using this adjusted value instead of the original user-provided σ for privacy budget calculations, resulting in incorrect epsilon values.

Introduced an accounting_noise_multiplier property that:

  • Returns noise_multiplier for standard DPOptimizer (backward compatible)
  • Returns the original user-provided value for AdaClipDPOptimizer (before adjustment)
  • Is used by the accountant hook for correct privacy accounting

The internal noise generation continues to use the adjusted value as intended by the AdaClip algorithm, while privacy accounting now uses the correct original value.

Update!!

As of commit 1f9283a
Refactored the noise adjustment to be calculated locally within the add_noise() method where it's actually needed, rather than modifying self.noise_multiplier at initialization.

Now:

  • self.noise_multiplier always holds the original user-provided value (σ)
  • The adjusted value (σ_eff) is calculated on-the-fly only when adding noise to gradients
  • Privacy accountants automatically use the correct original value

This is simpler and more maintainable than introducing separate properties or modifying the accounting system.

How Has This Been Tested (if it applies)

Created comprehensive test suite in opacus/tests/accounting_noise_multiplier_test.py with 7 test cases:

  1. test_adaclip_preserves_noise_multiplier: Verifies AdaClip keeps original noise_multiplier unchanged
  2. test_adaclip_with_zero_noise: Edge case with zero noise_multiplier
  3. test_accountant_uses_original_noise_multiplier: Validates accountant.step() uses correct value
  4. test_privacy_accounting_with_adaclip_e2e: End-to-end integration test with PrivacyEngine and actual training loop
  5. test_comparison_dpoptimizer_vs_adaclip_accounting: Compares accounting behavior between standard and AdaClip optimizers
  6. test_adaclip_noise_adjustment_calculation: Validates the Theorem 1 formula produces correct results
  7. test_adaclip_constraint_validation: Ensures proper error when noise_multiplier ≥ 2 * unclipped_num_std

Checklist

  • The documentation is up-to-date with the changes I made.
  • I have read the CONTRIBUTING document and completed the CLA (see CONTRIBUTING).
  • All tests passed, and additional code has been covered with new tests.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 2, 2026
@meta-codesync
Copy link
Copy Markdown

meta-codesync Bot commented Feb 2, 2026

@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this in D92044480. (Because this pull request was imported automatically, there will not be any future comments.)

…nting_noise_multiplier` property and simplify privacy accounting logic
@iden-kalemaj
Copy link
Copy Markdown
Contributor

iden-kalemaj commented Feb 16, 2026

Thank you @david-stan for fixing this subtle bug. The fix looks good to me.

We also have an implementation of adaptive clipping + fast gradient clipping here, which incurs a similar bug. I'm curious if you've used adaptive clipping with fast gradient clipping? Let me know if you'd like to fix that one too. If not I can open an issue.

Please see the lint error.

@iden-kalemaj iden-kalemaj self-assigned this Feb 17, 2026
@coveralls
Copy link
Copy Markdown

Pull Request Test Coverage Report for Build 21669348710

Details

  • 96 of 98 (97.96%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.9%) to 79.078%

Changes Missing Coverage Covered Lines Changed/Added Lines %
opacus/optimizers/adaclipoptimizer.py 7 8 87.5%
opacus/tests/accounting_noise_multiplier_test.py 89 90 98.89%
Totals Coverage Status
Change from base Build 22050507086: 0.9%
Covered Lines: 5813
Relevant Lines: 7351

💛 - Coveralls

@david-stan
Copy link
Copy Markdown
Contributor Author

Thank you @david-stan for fixing this subtle bug. The fix looks good to me.

We also have an implementation of adaptive clipping + fast gradient clipping here, which incurs a similar bug. I'm curious if you've used adaptive clipping with fast gradient clipping? Let me know if you'd like to fix that one too. If not I can open an issue.

Please see the lint error.

I will fix those as well. We use fast gradient clipping as well, except for trl!

@david-stan
Copy link
Copy Markdown
Contributor Author

Also, kinda (un)related question regarding optimizer step..
On the line:

p.grad /= self.expected_batch_size * self.accumulated_iterations

why do we divide .grads by statistical average (expected_batch_size), if we at that moment already know the exact number of poisson samples?

@iden-kalemaj
Copy link
Copy Markdown
Contributor

iden-kalemaj commented Feb 20, 2026

Also, kinda (un)related question regarding optimizer step.. On the line:

p.grad /= self.expected_batch_size * self.accumulated_iterations

why do we divide .grads by statistical average (expected_batch_size), if we at that moment already know the exact number of poisson samples?

This comes from the DP-SGD algorithm and its analysis (https://arxiv.org/pdf/1607.00133). Using expected batch size ensures we have an unbiased estimator of the gradient. The epxected_batch_size is computed from the probability with which we sample datapoints.

@iden-kalemaj
Copy link
Copy Markdown
Contributor

@david-stan this looks good to me, can you please fix the failing lint error?

@iden-kalemaj iden-kalemaj self-requested a review February 20, 2026 21:42
@iden-kalemaj
Copy link
Copy Markdown
Contributor

Also, kinda (un)related question regarding optimizer step.. On the line:

p.grad /= self.expected_batch_size * self.accumulated_iterations

why do we divide .grads by statistical average (expected_batch_size), if we at that moment already know the exact number of poisson samples?

Actually, the more important reason we use expected_batch_size is that the actual batch size is nonprivate and we do not want to leak it.

@iden-kalemaj
Copy link
Copy Markdown
Contributor

@david-stan one last lint fix is needed to pass the isort

@meta-codesync meta-codesync Bot closed this in 9d3c2b0 Mar 6, 2026
@meta-codesync
Copy link
Copy Markdown

meta-codesync Bot commented Mar 6, 2026

This pull request has been merged in 9d3c2b0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants