Potential mismatch between FEASGM-style normalization and SGM-based privacy accounting #819

## 🐛 Bug



I would like to report a potential privacy-accounting mismatch related to DP-SGD with Poisson sampling and gradient normalization.

I found that Opacus versions v1.0.0--v1.6.0 appear to implement a floor-based Expected-Averaged Subsampled Gaussian Mechanism (FEASGM), as indicated by the floor-based expected-batch-size normalization in the implementation:

https://github.com/meta-pytorch/opacus/blame/v1.6.0/opacus/privacy_engine.py#L441C10-L441C10

However, the privacy accounting appears to rely on the standard Subsampled Gaussian Mechanism (SGM) analysis.

Our auditing shows that, in some small-dataset, high-dimensional-output settings where the normalizing factor differs between two neighboring datasets, the privacy leakage can be large enough that the two datasets become almost fully distinguishable. We further analyzed the privacy guarantee under the f-DP framework:

https://academic.oup.com/jrsssb

Our analysis suggests that this occurs because the privacy guarantee deteriorates as the output dimension increases and can vanish as the output dimension approaches infinity.
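
A toy sketch of why the output dimension can matter, under simplifying assumptions that are not part of the analysis above: the gradient signal is ignored, the released vector is pure Gaussian noise divided by the normalizer, and the two hypothetical normalizers `m0` and `m1` stand in for two neighboring dataset sizes. The function names and the midpoint threshold test are illustrative only; this is not the auditing procedure from the paper.

```python
import random

def mean_square(dim, scale, rng):
    """Average squared coordinate of a dim-dimensional N(0, scale^2 I) draw."""
    return sum(rng.gauss(0.0, scale) ** 2 for _ in range(dim)) / dim

def distinguishing_accuracy(dim, m0, m1, sigma=1.0, trials=100, seed=0):
    """Accuracy of a threshold test telling noise/m0 apart from noise/m1.

    Toy model: only the noise scale, sigma / normalizer, differs between
    the two hypotheses (the gradient signal is ignored on purpose).
    """
    rng = random.Random(seed)
    s0, s1 = sigma / m0, sigma / m1
    threshold = (s0 ** 2 + s1 ** 2) / 2  # midpoint of the two variances
    larger_is_0 = s0 > s1
    correct = 0
    for _ in range(trials):
        # Draw once under each hypothesis and classify by the threshold.
        guess0 = (mean_square(dim, s0, rng) > threshold) == larger_is_0
        guess1 = (mean_square(dim, s1, rng) > threshold) == (not larger_is_0)
        correct += guess0 + guess1
    return correct / (2 * trials)

if __name__ == "__main__":
    # Hypothetical normalizers 19 and 20; accuracy is printed per dimension.
    for dim in (10, 100, 4000):
        print(dim, distinguishing_accuracy(dim, m0=19, m1=20))
```

In this toy model the per-coordinate variance differs by a fixed ratio, and the average squared coordinate concentrates around its mean as `dim` grows, so the simple threshold test becomes more reliable in higher dimensions.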

This issue is related to the prior bug report:

https://github.com/meta-pytorch/opacus/issues/571

However, our findings differ from and extend that report.

As we understand it, issue #571 made the following observations:

1. In small-dataset settings, the empirical privacy leakage was estimated to be around 2.5.
2. Averaging by the realized batch size, which we denote as Averaged SGM (ASGM), may keep the leakage within the SGM guarantee (see https://github.com/meta-pytorch/opacus/issues/571#issuecomment-1458465792).
3. In large-dataset settings, the privacy leakage may be bounded by the SGM guarantee.

In contrast, our analysis identifies the following additional findings:

1. In some small-dataset and high-dimensional-output settings, the privacy leakage can be significantly larger, and two neighboring datasets can become almost fully distinguishable.
2. Averaging by the realized batch size does not necessarily make the privacy leakage bounded by the SGM guarantee; ASGM can still leak more privacy than what is captured by SGM-based accounting.
3. In some large-dataset regimes, our analysis suggests that, under practical parameter settings, the actual privacy guarantee can still be weaker than the guarantee reported by the SGM-based accountant.
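
For reference, one way to write the three mechanisms discussed above is sketched below. The notation is an assumption made here for illustration: $\bar g_i$ is the clipped per-sample gradient with clipping norm $C$, $B$ is the Poisson-sampled batch with rate $q$, $N$ is the dataset size, $\sigma$ is the noise multiplier, and $d$ is the output dimension. The definitions in the paper may differ in detail, for example in how ASGM handles an empty batch or where the noise is added.

$$
\begin{aligned}
\text{SGM:}\quad & M(D) = \sum_{i \in B} \bar g_i + \mathcal{N}(0, \sigma^2 C^2 I_d) \\
\text{FEASGM:}\quad & M(D) = \frac{1}{\lfloor Nq \rfloor}\Big(\sum_{i \in B} \bar g_i + \mathcal{N}(0, \sigma^2 C^2 I_d)\Big) \\
\text{ASGM:}\quad & M(D) = \frac{1}{|B|}\Big(\sum_{i \in B} \bar g_i + \mathcal{N}(0, \sigma^2 C^2 I_d)\Big)
\end{aligned}
$$

Under this reading, FEASGM is a fixed rescaling of SGM whenever $\lfloor Nq \rfloor$ takes the same value for both neighboring datasets, so a gap relative to SGM-based accounting can only arise when the normalizer changes with the dataset size.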


## To Reproduce

This is not a runtime crash, so there is no traceback. The issue concerns the formal mechanism being accounted for.

The detailed auditing procedure, including the experimental setup, reproduction steps, empirical results, and theoretical analysis, is provided in our paper on arXiv:

https://arxiv.org/abs/2605.15648

## Expected behavior

I think it would be helpful if Opacus emitted a warning, or documented, when the implemented mechanism may not exactly match the standard SGM assumption used by the privacy accountant. Clarifying this point would help users better understand the privacy guarantee that Opacus actually provides.

In particular, such a warning or documentation could ask users to check whether the floor-based expected-batch-size normalizer changes between neighboring dataset sizes, e.g., from $\lfloor Nq \rfloor$ to $\lfloor (N+1)q \rfloor$, such as for datasets with 190 and 191 records. If these two normalizers differ, a potential mismatch may arise between the FEASGM-style implementation and the SGM-based privacy accounting.
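
The check described above could be sketched roughly as follows. The helper names `floor_normalizer` and `normalizer_changes` are hypothetical and not part of Opacus, and the sampling rates in the usage lines are illustrative values chosen here, not taken from the experiments. The sketch uses plain floating-point multiplication, so results near an integer boundary depend on rounding.

```python
import math

def floor_normalizer(n_records, sample_rate):
    """Floor-based expected batch size, floor(N * q)."""
    return math.floor(n_records * sample_rate)

def normalizer_changes(n_records, sample_rate):
    """True if floor(N * q) differs from floor((N + 1) * q)."""
    return (floor_normalizer(n_records, sample_rate)
            != floor_normalizer(n_records + 1, sample_rate))

if __name__ == "__main__":
    # Hypothetical rate for which the normalizer moves from 18 to 19.
    print(normalizer_changes(190, 0.0995))  # True
    # Hypothetical rate for which it stays at 19 for both sizes.
    print(normalizer_changes(190, 0.1))     # False
```

A warning built on such a check would only flag the condition under which the two normalizers differ; it would not by itself quantify the resulting privacy loss.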

I look forward to discussing this issue further with the maintainers and hearing your thoughts on this potential mismatch.
