🐛 Bug
I would like to report a potential privacy-accounting mismatch related to DP-SGD with Poisson sampling and gradient normalization.
I found that Opacus versions v1.0.0--v1.6.0 appear to implement a floor-based Expected-Averaged Subsampled Gaussian Mechanism (FEASGM), as indicated by the floor-based expected-batch-size normalization in the implementation:
https://github.com/meta-pytorch/opacus/blame/v1.6.0/opacus/privacy_engine.py#L441C10-L441C10
However, the privacy accounting appears to rely on the standard SGM-based analysis.
Our auditing shows that, in some small-dataset and high-dimensional-output settings, when the normalizing factor differs between two neighboring datasets, the privacy leakage can be very large: two neighboring datasets can become almost fully distinguishable. We further performed a privacy analysis under the f-DP framework:
https://academic.oup.com/jrsssb
Our analysis suggests that this occurs because the privacy guarantee deteriorates as the output dimension increases and can vanish as the output dimension approaches infinity.
This issue is related to the prior bug report:
#571
However, our findings differ from and extend that report.
As we understand it, issue #571 made the following observations:
- In large-dataset settings, the privacy leakage may be bounded by the SGM guarantee.
In contrast, our analysis identifies the following additional findings:
- In some small-dataset and high-dimensional-output settings, the privacy leakage can be significantly larger, and two neighboring datasets can become almost fully distinguishable.
- Averaging by the realized batch size does not necessarily make the privacy leakage bounded by the SGM guarantee; ASGM can still leak more privacy than what is captured by SGM-based accounting.
- In some large-dataset regimes, our analysis suggests that, under practical parameter settings, the actual privacy guarantee can still be weaker than the guarantee reported by the SGM-based accountant.
To Reproduce
This is not a runtime crash, so there is no traceback. The issue concerns the formal mechanism being accounted for.
The detailed auditing procedure, including the experimental setup, reproduction steps, empirical results, and theoretical analysis, is provided in our paper on arXiv:
https://arxiv.org/abs/2605.15648
Expected behavior
I think it would be helpful if Opacus provided a warning or documentation explaining when the implemented mechanism may not exactly match the standard SGM assumption used by the privacy accountant. Clarifying this point would help users better understand the privacy guarantee provided by Opacus.
In particular, such a warning or documentation could ask users to check whether the floor-based expected-batch-size normalizer changes between neighboring dataset sizes, e.g., from $\lfloor Nq \rfloor$ to $\lfloor (N+1)q \rfloor$, such as for datasets with 190 and 191 records. If these two normalizers differ, a potential mismatch may arise between the FEASGM-style implementation and the SGM-based privacy accounting.
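As a rough illustration of the check suggested above, the following sketch compares $\lfloor Nq \rfloor$ with $\lfloor (N+1)q \rfloor$ for a given dataset size and sampling rate. It is not Opacus code: the helper names `floor_normalizer` and `normalizer_changes` are hypothetical, and the sampling rate `q = 10/191` is an assumed value chosen only so that the 190/191 pair straddles an integer boundary. Exact rational arithmetic is used so that floating-point rounding near the boundary does not affect the result; an actual implementation may compute the normalizer differently.

```python
from fractions import Fraction
from math import floor


def floor_normalizer(n_records: int, sample_rate: Fraction) -> int:
    """Floor-based expected batch size, floor(N * q), in exact arithmetic."""
    return floor(n_records * Fraction(sample_rate))


def normalizer_changes(n_records: int, sample_rate: Fraction) -> bool:
    """Return True if adding one record changes floor(N * q)."""
    return floor_normalizer(n_records, sample_rate) != floor_normalizer(
        n_records + 1, sample_rate
    )


# Hypothetical sampling rate (an assumption, not taken from the report).
q = Fraction(10, 191)

print(floor_normalizer(190, q), floor_normalizer(191, q))  # 9 10
print(normalizer_changes(190, q))                 # True: normalizers differ
print(normalizer_changes(190, Fraction(1, 19)))   # False: both floors equal 10
```

A check of this kind could back a warning at setup time: when `normalizer_changes` returns `True` for the dataset size in use, the two neighboring datasets would be normalized by different constants, which is the situation described in this report.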
I look forward to discussing this issue further with the maintainers and hearing your thoughts on this potential mismatch.