## 🐛 Bug
When a `DataLoader` uses `WeightedRandomSampler`, `make_private_with_epsilon`
silently computes `sample_rate` from `len(sampler)` (e.g. 128 = `num_samples`)
instead of the full dataset size. An early estimate put this at
`sample_rate` ≈ 0.31 instead of ≈ 0.000007 (a 45,000x difference); the
reproduction below measures 0.125 instead of 0.00016 (781x). Either way, the
privacy budget burns far faster than intended, potentially within a single
epoch, with no warning or error.
Related to #600 (which notes the sampler is replaced) but this issue
focuses on the privacy accounting consequence: epsilon tracking is
silently invalid.
### To Reproduce
```python
import torch
from torch.utils.data import TensorDataset, DataLoader, WeightedRandomSampler
from opacus import PrivacyEngine

# Simulate a dataset of 100,000 samples
X = torch.randn(100_000, 10)
y = torch.randint(0, 2, (100_000,))
dataset = TensorDataset(X, y)

# WeightedRandomSampler with 128 samples drawn per epoch
weights = torch.ones(100_000)
sampler = WeightedRandomSampler(weights, num_samples=128, replacement=True)
loader = DataLoader(dataset, batch_size=16, sampler=sampler)

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    epochs=1,
    target_epsilon=8.0,
    target_delta=1e-5,
    max_grad_norm=1.0,
)

print("real sample_rate (correct): ", 16 / len(dataset))
print("sample_rate used by accountant: ",
      optimizer.expected_batch_size / len(loader.dataset))
print("original batch_size:", 16)
print("expected_batch_size:", optimizer.expected_batch_size)
print("len(dataset): ", len(dataset))
print("len(loader.sampler):", len(loader.sampler))
# Expected: ~0.00016 (16 / 100_000)
# Actual: 0.125 (16 / 128)
```
Observed: no warning or error is raised.

Confirmed output from running the reproduction script:

`sample_rate used by accountant: 0.125`

The correct `sample_rate` should be 16 / 100_000 = 0.00016.

Ratio: 0.125 / 0.00016 ≈ 781x, i.e. the accountant sees a sample rate about 781x larger than intended.

(The earlier estimate of 45,000x assumed a `num_samples=1` edge case;
the real multiplier depends on `num_samples` in the sampler.
With `num_samples=128` and `batch_size=16`, the ratio is 781x.)
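
For anyone who wants to check the arithmetic without installing Opacus, here is a minimal sketch in plain Python. It assumes the accountant's rate equals one over the number of batches per epoch (`1 / len(data_loader)`), which is consistent with the 0.125 observed above but is my reading rather than something confirmed here. The helper names are hypothetical.

```python
import math


def accountant_sample_rate(batch_size: int, sampler_len: int) -> float:
    """Rate implied by the number of batches per epoch.

    Assumption: sample_rate = 1 / len(data_loader), where len(data_loader)
    is ceil(len(sampler) / batch_size) when drop_last is False.
    """
    num_batches = math.ceil(sampler_len / batch_size)
    return 1 / num_batches


def intended_sample_rate(batch_size: int, dataset_len: int) -> float:
    """Rate a user would expect: batch size over the full dataset size."""
    return batch_size / dataset_len


# Numbers taken from the reproduction script
used = accountant_sample_rate(batch_size=16, sampler_len=128)
intended = intended_sample_rate(batch_size=16, dataset_len=100_000)
ratio = used / intended

print(f"used={used}, intended={intended}, ratio={ratio:.2f}x")
```

Changing `sampler_len` shows how the multiplier scales with `num_samples`: the smaller the sampler relative to the dataset, the larger the gap.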
### Expected behavior
Either:

- Raise a warning when `WeightedRandomSampler` is detected, explaining
  that privacy accounting may be incorrect. Suggested warning (minimal fix):

```python
import warnings

from torch.utils.data import WeightedRandomSampler

if isinstance(data_loader.sampler, WeightedRandomSampler):
    warnings.warn(
        "WeightedRandomSampler detected. Opacus replaces it with "
        "UniformWithReplacementSampler for Poisson sampling. "
        "Privacy accounting uses batch_size/dataset_size as sample_rate. "
        "Your epsilon tracking may be incorrect.",
        UserWarning,
    )
```

- Or recompute `sample_rate` after the sampler is replaced:

```python
# After sampler replacement, recompute sample_rate:
sample_rate = batch_size / len(new_data_loader.dataset)
```

Workaround until fixed: replace `WeightedRandomSampler` with `shuffle=True`.
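
Until something like the warning lands upstream, a user-side guard could be run before calling `make_private_with_epsilon`. The sketch below is only an illustration: `warn_if_sampler_shrinks_epoch` is a hypothetical helper, not part of Opacus, and it checks the more general condition `len(sampler) != len(dataset)` rather than the sampler type. It is duck-typed, so the demo uses a stand-in object instead of a real `DataLoader`.

```python
import warnings
from types import SimpleNamespace


def warn_if_sampler_shrinks_epoch(data_loader) -> bool:
    """Hypothetical guard (not an Opacus API).

    Warns when the sampler yields a different number of indices per epoch
    than the dataset holds, since a sample rate derived from the number of
    batches per epoch would then differ from batch_size / len(dataset).
    Returns True if a warning was issued.
    """
    sampler_len = len(data_loader.sampler)
    dataset_len = len(data_loader.dataset)
    if sampler_len != dataset_len:
        warnings.warn(
            f"len(sampler)={sampler_len} differs from "
            f"len(dataset)={dataset_len}; the sample_rate used for privacy "
            "accounting may not match batch_size / len(dataset).",
            UserWarning,
        )
        return True
    return False


# Stand-in for the loader in the reproduction (128 draws from 100,000 rows)
fake_loader = SimpleNamespace(sampler=range(128), dataset=range(100_000))

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged = warn_if_sampler_shrinks_epoch(fake_loader)

print("flagged:", flagged, "| warnings captured:", len(caught))
```

With a real loader the call would be `warn_if_sampler_shrinks_epoch(loader)`, since `DataLoader` exposes both `.sampler` and `.dataset`.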
### Environment
- PyTorch Version: 2.11.0+cu128
- OS: Linux
- Python version: 3.10.20
- CUDA/cuDNN version: 12.8
- GPU: NVIDIA RTX 5090 (Blackwell architecture, 32GB VRAM)
- Opacus version: 1.5.4
## Additional context
The `len(loader.sampler) = 100000` after `make_private_with_epsilon`
proves Opacus successfully replaced the sampler. The `sample_rate = 0.125`
proves it was computed from the old sampler before replacement.
The inconsistency between these two values is the bug.
Happy to open a PR fixing the sample_rate recomputation in
`privacy_engine.py`.