WeightedRandomSampler causes silent epsilon miscalculation

## 🐛 Bug
When `DataLoader` uses `WeightedRandomSampler`, `make_private_with_epsilon`
silently computes `sample_rate` from `len(sampler)` (e.g. 128 = num_samples)
instead of the full dataset size. In the reproduction below this produces
`sample_rate = 0.125` (16 / 128) instead of `0.00016` (16 / 100,000), a
roughly 781x difference, which can burn the entire privacy budget in a
single epoch with no warning or error.

**Related to #600**, which notes that the sampler is replaced. This issue
focuses on the privacy accounting consequence: epsilon tracking is
silently invalid.

### Please reproduce using our template Colab


### To Reproduce
```python
import torch
from torch.utils.data import TensorDataset, DataLoader, WeightedRandomSampler
from opacus import PrivacyEngine

# Simulate a dataset of 100,000 samples
X = torch.randn(100_000, 10)
y = torch.randint(0, 2, (100_000,))
dataset = TensorDataset(X, y)

# WeightedRandomSampler with 128 samples drawn per epoch
weights = torch.ones(100_000)
sampler = WeightedRandomSampler(weights, num_samples=128, replacement=True)
loader = DataLoader(dataset, batch_size=16, sampler=sampler)

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    epochs=1,
    target_epsilon=8.0,
    target_delta=1e-5,
    max_grad_norm=1.0,
)

# Compare the intended rate with the rate implied by the optimizer
print("real sample_rate (correct):      ", 16 / len(dataset))
print("sample_rate used by accountant:  ",
      optimizer.expected_batch_size / len(loader.dataset))
print("original batch_size:", 16)
print("expected_batch_size:", optimizer.expected_batch_size)
print("len(dataset):       ", len(dataset))
print("len(loader.sampler):", len(loader.sampler))
# Expected sample_rate: 0.00016 (16 / 100_000)
# Actual sample_rate:   0.125   (16 / 128)
```
**Observed:** No warning or error is raised. Confirmed output from running
the reproduce script:

    sample_rate used by accountant: 0.125

The correct `sample_rate` is 16 / 100_000 = 0.00016. The ratio is
0.125 / 0.00016 ≈ **781x**, so the accountant works with a sample rate
781x larger than intended and the privacy budget is consumed far faster
than expected.

An earlier estimate of 45,000x assumed the `num_samples=1` edge case. The
real multiplier depends on `num_samples` in the sampler: with
`num_samples=128` and `batch_size=16`, it is 781x.
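
The arithmetic behind these numbers can be checked without Opacus. The sketch below assumes the rate is derived as `1 / len(data_loader)`, which is consistent with the observed 0.125. The variable names are illustrative only.

```python
import math

dataset_size = 100_000   # len(dataset)
num_samples = 128        # WeightedRandomSampler(num_samples=...)
batch_size = 16

# A DataLoader over this sampler yields ceil(num_samples / batch_size) batches.
num_batches = math.ceil(num_samples / batch_size)

# Assumption: the accountant's rate is taken as 1 / len(data_loader).
observed_rate = 1 / num_batches

# Rate the user intends: one batch drawn from the full dataset.
intended_rate = batch_size / dataset_size

ratio = observed_rate / intended_rate
print(num_batches, observed_rate, intended_rate, round(ratio, 2))
```

When `num_samples` is a multiple of `batch_size`, the ratio reduces to `dataset_size / num_samples`, so a smaller `num_samples` makes the mismatch proportionally worse.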

## Expected behavior
Either:

1. Raise a warning when `WeightedRandomSampler` is detected, explaining
   that privacy accounting may be incorrect, or
2. Recompute `sample_rate` after the sampler is replaced.

Suggested warning (minimal fix):

```python
import warnings
from torch.utils.data import WeightedRandomSampler

# `data_loader` is the loader passed to make_private_with_epsilon
if isinstance(data_loader.sampler, WeightedRandomSampler):
    warnings.warn(
        "WeightedRandomSampler detected. Opacus replaces it with "
        "UniformWithReplacementSampler for Poisson sampling. "
        "Privacy accounting uses batch_size/dataset_size as sample_rate. "
        "Your epsilon tracking may be incorrect.",
        UserWarning,
    )
```

Suggested recomputation:

```python
# After sampler replacement, recompute sample_rate:
sample_rate = batch_size / len(new_data_loader.dataset)
```

Workaround until fixed: replace `WeightedRandomSampler` with `shuffle=True`.

### Environment

- PyTorch Version: 2.11.0+cu128
- OS: Linux
- Python version: 3.10.20 
- CUDA/cuDNN version: 12.8
- GPU: NVIDIA RTX 5090 (Blackwell architecture, 32GB VRAM)
- Opacus version: 1.5.4

## Additional context

The `len(loader.sampler) = 100000` after `make_private_with_epsilon` 
proves Opacus successfully replaced the sampler. The `sample_rate = 0.125` 
proves it was computed from the old sampler before replacement. 
The inconsistency between these two values is the bug.
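
Until a fix lands, a user-side check along these lines could surface the inconsistency before training. This is only a sketch: `sample_rate_mismatch` is a hypothetical helper, not part of Opacus, the `tolerance` of 2.0 is an arbitrary choice, and it assumes the rate is derived as `1 / len(data_loader)`.

```python
def sample_rate_mismatch(num_batches, batch_size, dataset_size, tolerance=2.0):
    """Compare the rate implied by the loader length with batch_size / dataset_size.

    Hypothetical helper; returns (implied_rate, intended_rate, is_suspicious).
    """
    implied_rate = 1 / num_batches              # assumed: 1 / len(data_loader)
    intended_rate = batch_size / dataset_size   # one batch out of the full dataset
    ratio = implied_rate / intended_rate
    is_suspicious = ratio > tolerance or ratio < 1 / tolerance
    return implied_rate, intended_rate, is_suspicious


# Values from the reproduction: len(loader) == 8, batch_size == 16,
# len(dataset) == 100_000
implied, intended, suspicious = sample_rate_mismatch(8, 16, 100_000)
if suspicious:
    print(f"sample_rate mismatch: implied {implied} vs intended {intended}")
```

With a real loader, the arguments would be `len(loader)`, `loader.batch_size`, and `len(loader.dataset)`, evaluated before calling `make_private_with_epsilon`.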

Happy to open a PR fixing the sample_rate recomputation in 
`privacy_engine.py`.
