UPSTREAM PR #30864: Ml dsa speedup #670
Open
loci-dev wants to merge 6 commits into
added 2 commits
April 16, 2026 20:02
Also slightly refactor the ML-KEM version to share the necessary defines, and add a daily CI run to check both (presently, for just some platforms with known working valgrind support).
Don't declassify rho_prime, that needs to stay protected.
Move constish_time_non_zero() to <internal/constant_time.h> as requested by reviewers, and rename it constish_time_true(), better reflecting the expected 0/1 boolean input.
added 3 commits
April 18, 2026 15:06
- New CONSTTIME_SECRET_VECTOR() and CONSTTIME_DECLASSIFY_VECTOR() macros simplify CT labeling of ML-DSA vectors and avoid incorrect sizing.
- New constant_time_declassify_u32() inline function mirrors a similar function in BoringSSL. With this we declassify the pass/fail output of rejection tests, rather than their numeric inputs, matching similar code in BoringSSL.
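The idea of declassifying only the pass/fail outcome of a rejection test can be sketched as below. This is an illustration under stated assumptions, not the code in this PR: every `sketch_` name is hypothetical, and the declassify helper is an identity stand-in for whatever a CT-instrumented build would do to tell the checker that a value is now public.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a declassify helper.  In an instrumented
 * build this is where the checker would be told that `v` is public;
 * in this sketch it is simply the identity.
 */
static inline uint32_t sketch_declassify_u32(uint32_t v)
{
    return v;
}

/*
 * Branch-free comparison: returns 0xFFFFFFFF if a >= b, else 0.
 * Assumes both inputs are below 2^31.
 */
static inline uint32_t sketch_ct_ge_u32(uint32_t a, uint32_t b)
{
    return ((a - b) >> 31) - 1;
}

/*
 * Returns 1 if any (secret) coefficient magnitude reaches `bound`,
 * i.e. the candidate must be rejected, and 0 otherwise.
 */
static int sketch_reject(const uint32_t *abs_coeffs, size_t n, uint32_t bound)
{
    uint32_t exceeded = 0;
    size_t i;

    /* Accumulate the comparison masks without branching on secrets. */
    for (i = 0; i < n; i++)
        exceeded |= sketch_ct_ge_u32(abs_coeffs[i], bound);

    /* Only the one-bit outcome becomes public; the inputs stay secret. */
    return sketch_declassify_u32(exceeded) != 0;
}
```

The coefficients are never declassified, so any later secret-dependent branch on them would still be flagged by the checker; only the accept/reject decision is treated as public.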
Use rank not 2 in ML-KEM decap classify_bytes
This mirrors the corresponding code in ML-KEM and works under
the same conditions/assumptions. Also adjusted related
functions with unnecessary 2-layers of constant_time selects
where one suffices (now also matching BoringSSL).
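For readers unfamiliar with the pattern, a single conditional subtraction with and without a value barrier might look like the sketch below. It is an assumption-laden illustration rather than the OpenSSL source: the `sketch_` names are hypothetical, and the barrier is modelled with a `volatile` temporary. Whether a given compiler keeps the barrier-free variant branch-free is exactly the kind of condition the commit message refers to.

```c
#include <stdint.h>

#define SKETCH_Q 8380417u /* ML-DSA modulus q = 2^23 - 2^13 + 1 */

/* Hypothetical barrier: hides the mask's value from the optimizer. */
static inline uint32_t sketch_value_barrier(uint32_t v)
{
    volatile uint32_t tmp = v;

    return tmp;
}

/* Returns a where mask is all-ones, b where mask is zero. */
static inline uint32_t sketch_select(uint32_t mask, uint32_t a, uint32_t b)
{
    return (mask & a) | (~mask & b);
}

/* Input in [0, 2q), output in [0, q); mask passes through a barrier. */
static uint32_t sketch_reduce_once_barrier(uint32_t x)
{
    uint32_t sub = x - SKETCH_Q;      /* wraps around when x < q */
    uint32_t mask = 0u - (sub >> 31); /* all-ones iff x < q */

    return sketch_select(sketch_value_barrier(mask), x, sub);
}

/* Same computation with the barrier dropped: one plain select. */
static uint32_t sketch_reduce_once(uint32_t x)
{
    uint32_t sub = x - SKETCH_Q;
    uint32_t mask = 0u - (sub >> 31);

    return sketch_select(mask, x, sub);
}
```

Both variants compute the same result; the barrier only constrains how the compiler may optimize the select, at the cost of an extra store and load in this model.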
Intentionally uses the constant time instrumentation PR as its
merge-base, so it should be merged after that has baked in for a few
days and shows working CT tests in daily CI runs.
Sample before/after performance pairs and percent throughput
increases for one X86_64 CPU:

                      keygens/s    sign/s   verify/s
ML-DSA-44 (before)      18728.3    6061.2    23251.6
ML-DSA-44 (after)       21077.2    7392.4    27244.3
ML-DSA-44 (gain)          12.5%     22.0%      17.2%
ML-DSA-65 (before)      10084.3    3603.0    13988.6
ML-DSA-65 (after)       11197.9    4549.7    16208.4
ML-DSA-65 (gain)          11.0%     26.3%      15.9%
ML-DSA-87 (before)       7184.8    2917.3     8141.0
ML-DSA-87 (after)        8132.4    3693.7     9430.7
ML-DSA-87 (gain)          13.2%     26.6%      15.8%
and here's the same for an Apple silicon M2:

                      keygens/s    sign/s   verify/s
ML-DSA-44 (before)      17235.7    3099.3    15744.5
ML-DSA-44 (after)       21855.2    4907.6    22849.0
ML-DSA-44 (gain)          26.8%     58.3%      45.1%
ML-DSA-65 (before)       9165.8    1908.5    10058.3
ML-DSA-65 (after)       11262.7    3069.6    14348.1
ML-DSA-65 (gain)          22.9%     60.8%      42.6%
ML-DSA-87 (before)       6596.1    1563.6     6330.8
ML-DSA-87 (after)        8404.9    2584.6     8767.6
ML-DSA-87 (gain)          27.4%     65.3%      38.5%
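The percentage rows follow from each before/after pair as (after / before - 1) * 100. A minimal sketch of that arithmetic, with a hypothetical helper name:

```c
/*
 * Percent throughput increase from a "before" and an "after" rate,
 * both in operations per second.  Hypothetical helper for illustration.
 */
static double sketch_pct_gain(double before, double after)
{
    return (after / before - 1.0) * 100.0;
}
```

For example, the X86_64 ML-DSA-44 keygen pair (18728.3, 21077.2) gives roughly 12.5, and the M2 ML-DSA-44 sign pair (3099.3, 4907.6) gives roughly 58.3, matching the reported figures after rounding.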
7df28a7 to
231580f
Compare
421b135 to
770bf14
Compare
Note
Source pull request: openssl/openssl#30864
Drop value barrier from ML-DSA reduce_once
[ The second commit is the substance of this PR, the first commit is just the CT tests from #30863 ]
This mirrors the corresponding code in ML-KEM and works under
the same conditions/assumptions.
Intentionally uses the constant time instrumentation PR as its
merge-base, so it should be merged after that has baked in for a few
days and shows working CT tests in daily CI runs.
Sample before/after performance pairs for one X86_64 CPU: