Fix incorrect sample_rate with WeightedRandomSampler (Issue #813) by intagliated · Pull Request #816 · meta-pytorch/opacus

intagliated · 2026-04-27T18:43:18Z

Problem Statement

When using WeightedRandomSampler with a DataLoader, Opacus's make_private() and make_private_with_epsilon() were computing an incorrect sample_rate.

The original logic derived the rate as 1 / len(data_loader). However, len(data_loader) is the number of batches per epoch, and with a WeightedRandomSampler that count follows from the sampler's num_samples rather than from the dataset size. As a result, the privacy budget was consumed significantly faster than reported, effectively breaking the Differential Privacy (DP) guarantees.

Impact: In a dataset of 100k samples with a batch size of 16, the rate was computed as 0.0078 instead of the correct 0.00016—a ~781x discrepancy in epsilon consumption.
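
The two rate formulas can be compared with plain arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and `num_samples = 2_048` is an assumed sampler setting (not stated in this PR), chosen because it yields 128 batches and therefore a rate of about 0.0078 under the original formula.

```python
import math


def legacy_sample_rate(num_batches: int) -> float:
    """Rate as the PR describes the original logic: 1 / len(data_loader)."""
    return 1.0 / num_batches


def dataset_sample_rate(batch_size: int, dataset_size: int) -> float:
    """Rate as the PR describes the fix: batch_size / N."""
    return batch_size / dataset_size


dataset_size = 100_000  # N, from the PR's example
batch_size = 16         # from the PR's example

# Plain shuffling visits every sample once per epoch, so both formulas agree.
full_epoch_batches = math.ceil(dataset_size / batch_size)  # 6250 batches
uniform_legacy = legacy_sample_rate(full_epoch_batches)

# ASSUMPTION: a weighted sampler configured to draw only 2,048 indices per epoch.
num_samples = 2_048
weighted_batches = math.ceil(num_samples / batch_size)     # 128 batches
weighted_legacy = legacy_sample_rate(weighted_batches)

fixed = dataset_sample_rate(batch_size, dataset_size)

print(f"uniform, legacy formula:  {uniform_legacy:.5f}")
print(f"weighted, legacy formula: {weighted_legacy:.5f}")
print(f"batch_size / N:           {fixed:.5f}")
```

With plain shuffling the two formulas coincide, which is why the problem only surfaces once the sampler's length is decoupled from the dataset size.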

Solution

The calculation was refactored to be mathematically consistent by grounding the rate in the absolute dataset length and explicit batch size.

Key Changes:

Standardized Formula: Implemented $\text{sample\_rate} = \frac{\text{batch\_size}}{N}$ across all sampler types to ensure the accountant matches the physical sampling probability.
Metadata Privacy: Added support for metadata_epsilon, allowing Laplace noise injection into the dataset size $N$ used for accounting. This protects metadata privacy while maintaining a perfect 1.0x ratio between the accountant and the sampler.
Robust Detection: Added logic to extract batch_size from either the DataLoader or BatchSampler to handle NoneType edge cases in newer PyTorch versions.
User Safety: Added an explicit UserWarning when WeightedRandomSampler is detected to ensure transparency in how the privacy rate is derived.
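
The changes listed above could look roughly like the following. This is a minimal sketch under stated assumptions, not the PR's actual diff: `resolve_batch_size`, `uses_weighted_sampler`, `noisy_dataset_size`, and `compute_sample_rate` are hypothetical names; the loader is handled by duck typing so that `types.SimpleNamespace` stand-ins can replace real `DataLoader` objects; the sampler is detected by class name; and the Laplace draw assumes a sensitivity of 1 for the dataset size and uses Python's `random` module, which is not a suitable noise source for production DP.

```python
import random
import warnings
from types import SimpleNamespace


def resolve_batch_size(data_loader) -> int:
    """Read batch_size from the loader, falling back to its batch_sampler."""
    batch_size = getattr(data_loader, "batch_size", None)
    if batch_size is None:
        # DataLoader.batch_size is None when a custom batch_sampler is supplied.
        batch_sampler = getattr(data_loader, "batch_sampler", None)
        batch_size = getattr(batch_sampler, "batch_size", None)
    if batch_size is None:
        raise ValueError("Could not determine batch_size from the data loader")
    return batch_size


def uses_weighted_sampler(data_loader) -> bool:
    """Check the loader's sampler and its batch_sampler's inner sampler by class name."""
    batch_sampler = getattr(data_loader, "batch_sampler", None)
    candidates = (
        getattr(data_loader, "sampler", None),
        getattr(batch_sampler, "sampler", None),
    )
    return any(type(s).__name__ == "WeightedRandomSampler" for s in candidates)


def noisy_dataset_size(n: int, metadata_epsilon: float, rng: random.Random) -> float:
    """Add Laplace(0, 1 / metadata_epsilon) noise to N (ASSUMPTION: sensitivity 1)."""
    # The difference of two i.i.d. exponentials with rate metadata_epsilon
    # is Laplace-distributed with scale 1 / metadata_epsilon.
    noise = rng.expovariate(metadata_epsilon) - rng.expovariate(metadata_epsilon)
    return max(1.0, n + noise)


def compute_sample_rate(data_loader, metadata_epsilon=None, rng=None) -> float:
    """sample_rate = batch_size / N, independent of the sampler type."""
    if uses_weighted_sampler(data_loader):
        warnings.warn(
            "WeightedRandomSampler detected: sample_rate is derived from "
            "batch_size / len(dataset), not from len(data_loader).",
            UserWarning,
        )
    n = float(len(data_loader.dataset))
    if metadata_epsilon is not None:
        n = noisy_dataset_size(int(n), metadata_epsilon, rng or random.Random())
    return min(1.0, resolve_batch_size(data_loader) / n)


# Stand-in for a DataLoader built with an explicit batch_sampler (batch_size is None).
loader = SimpleNamespace(
    dataset=range(100_000),
    batch_size=None,
    batch_sampler=SimpleNamespace(batch_size=16, sampler=None),
    sampler=None,
)
print(compute_sample_rate(loader))
```

Clamping the result to at most 1.0 and the noisy size to at least 1.0 are defensive choices of this sketch, so that a large noise draw cannot produce an invalid sampling probability.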

Verification

Validated using a dedicated audit script (verify_randomness.py) comparing the expected privacy ratio against the actual consumption.

| Sampler Type | Before Fix (Ratio) | After Fix (Ratio) | Status |
| --- | --- | --- | --- |
| Standard Shuffle | 1.0x | 1.0x | ✅ PASS |
| WeightedRandomSampler | 781.2x | 1.0x | ✅ PASS |

…rch#813) WeightedRandomSampler caused sample_rate to be computed from num_samples instead of the dataset size, silently burning epsilon 781x faster than expected. Fix: compute sample_rate as batch_size / len(dataset), which is correct for all sampler types. Also adds a UserWarning when WeightedRandomSampler is detected. The same fix is applied to DPDataLoader.from_data_loader().

meta-codesync · 2026-04-27T18:46:42Z

@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this in D102646860. (Because this pull request was imported automatically, there will not be any future comments.)

intagliated · 2026-05-24T09:10:07Z

Hi @HuanyuZhang, I noticed this PR was imported into Meta's internal system (D98158224) and assigned a month ago.
#814 is a PR on the same issue and it was already merged. Just checking in to see if there will be any feedback from the internal review team or if there are any additional changes needed on my end to move this toward a merge.

Thanks for your time.

meta-cla Bot added the CLA Signed label on Apr 27, 2026

Merge branch 'meta-pytorch:main' into fix-weighted-sampler-final

2d7f6f9

intagliated wants to merge 2 commits into meta-pytorch:main from intagliated:fix-weighted-sampler-final
