UPSTREAM PR #30864: Ml dsa speedup by loci-dev · Pull Request #670 · auroralabs-loci/openssl

loci-dev · 2026-04-17T05:01:03Z

Note

Source pull request: openssl/openssl#30864

Drop value barrier from ML-DSA reduce_once

[ The second commit is the substance of this PR; the first commit is just the CT tests from #30863. ]

This mirrors the corresponding code in ML-KEM and works under
the same conditions/assumptions.
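For readers unfamiliar with the pattern, here is a minimal sketch of what dropping the value barrier from a reduce_once-style helper looks like. It assumes ML-DSA coefficients held in uint32_t, inputs already in [0, 2q), and q = 8380417. The names ML_DSA_Q_SKETCH, value_barrier_u32_sketch, reduce_once_with_barrier and reduce_once_no_barrier are hypothetical; this is not the code from the PR. The barrier hides the mask from the optimizer so it cannot, for example, turn the select into a branch. A comment in BoringSSL's ML-KEM reduce_once notes that such barriers also impede auto-vectorization, which plausibly explains much of the speedup.

```c
#include <assert.h>
#include <stdint.h>

/* ML-DSA modulus q = 2^23 - 2^13 + 1 = 8380417 (FIPS 204). */
#define ML_DSA_Q_SKETCH 8380417u

/*
 * Hypothetical value barrier: the empty asm hides v from the optimizer,
 * so it cannot reason about the mask (e.g. to turn a select into a branch).
 */
static inline uint32_t value_barrier_u32_sketch(uint32_t v)
{
#if defined(__GNUC__) || defined(__clang__)
    __asm__("" : "+r"(v));
#else
    volatile uint32_t tmp = v;   /* portable but slower fallback */
    v = tmp;
#endif
    return v;
}

/* Before: constant-time select with the mask passed through a barrier. */
static inline uint32_t reduce_once_with_barrier(uint32_t x)
{
    uint32_t subtracted = x - ML_DSA_Q_SKETCH;             /* wraps if x < q */
    uint32_t mask = value_barrier_u32_sketch(0u - (subtracted >> 31));

    assert(x < 2 * ML_DSA_Q_SKETCH);
    return (mask & x) | (~mask & subtracted);              /* x < q ? x : x - q */
}

/* After: the same branch-free select, without the barrier. */
static inline uint32_t reduce_once_no_barrier(uint32_t x)
{
    uint32_t subtracted = x - ML_DSA_Q_SKETCH;
    uint32_t mask = 0u - (subtracted >> 31);               /* all-ones iff x < q */

    assert(x < 2 * ML_DSA_Q_SKETCH);
    return (mask & x) | (~mask & subtracted);
}
```

Both variants return identical values. The only difference is how much freedom the compiler has with the surrounding loops. That freedom is also why the "same conditions/assumptions" caveat matters: correctness in constant time then relies on the compiler not choosing to branch.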

Intentionally uses the constant time instrumentation PR as its
merge-base, so it is to be merged after that has baked in for a few
days and shows working CT tests in daily CI runs.

Sample before/after performance pairs for one x86_64 CPU:

                keygens/s    sign/s  verify/s
   -  ML-DSA-44   18066.4    6014.1   23375.7
   +  ML-DSA-44   20404.4    7105.4   26455.0
   -  ML-DSA-65   10131.3    3567.9   14172.5
   +  ML-DSA-65   11148.6    4358.6   15762.0
   -  ML-DSA-87    7239.2    2912.2    8214.2
   +  ML-DSA-87    8098.4    3518.5    9299.8
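The percentage rows in the later benchmark tables are simple relative changes between each before/after pair. A small helper, the hypothetical pct_increase() below, reproduces them. Applied to the ML-DSA-44 sign/s pair above (6014.1 to 7105.4), it gives roughly an 18.1% increase.

```c
#include <assert.h>

/*
 * Percent throughput increase of "after" over "before":
 *     100 * (after - before) / before
 * Applied to one column (keygens/s, sign/s or verify/s) of a row pair.
 */
static double pct_increase(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (after - before) / before;
}
```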

Checklist

documentation is added or updated
tests are added or updated

Also slightly refactor the ML-KEM version to share the necessary defines, and add a daily CI run to check both (presently, for just some platforms with known working valgrind support).

Don't declassify rho_prime, that needs to stay protected.

Move constish_time_non_zero() to <internal/constant_time.h> as requested by reviewers, and rename it constish_time_true(), better reflecting the expected 0/1 boolean input.

- New CONSTTIME_SECRET_VECTOR() and CONSTTIME_DECLASSIFY_VECTOR() macros simplify CT labeling of ML-DSA vectors and avoid incorrect sizing.
- New constant_time_declassify_u32() inline function mirrors a similar function in BoringSSL. With it, we declassify the pass/fail output of rejection tests rather than their numeric inputs, matching similar code in BoringSSL.
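A minimal sketch of how this kind of labeling can work with valgrind's memcheck follows. It assumes <valgrind/memcheck.h> may be available and falls back to no-ops otherwise. All names here (CT_SKETCH_*, POLY_SKETCH, VECTOR_SKETCH, ct_sketch_declassify_u32, ct_sketch_norm_ok) are hypothetical and only modeled on the macros described above. The vector layout (a pointer to num_poly polynomials of 256 coefficients) is also an assumption, and the rejection check is simplified to compare raw coefficients against a bound. Marking secret bytes as undefined makes memcheck report any branch or memory index that depends on them. Declassifying only the 0/1 result of a rejection test keeps its inputs protected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Use memcheck client requests when the header is available. */
#if defined(__has_include)
# if __has_include(<valgrind/memcheck.h>)
#  include <valgrind/memcheck.h>
#  define CT_SKETCH_HAVE_MEMCHECK 1
# endif
#endif

#ifdef CT_SKETCH_HAVE_MEMCHECK
/* "Undefined" bytes: memcheck flags branches/indices that depend on them. */
# define CT_SKETCH_SECRET(p, n)     ((void)VALGRIND_MAKE_MEM_UNDEFINED((p), (n)))
/* "Defined" again: the bytes are now considered safe to reveal. */
# define CT_SKETCH_DECLASSIFY(p, n) ((void)VALGRIND_MAKE_MEM_DEFINED((p), (n)))
#else
# define CT_SKETCH_SECRET(p, n)     ((void)(p), (void)(n))
# define CT_SKETCH_DECLASSIFY(p, n) ((void)(p), (void)(n))
#endif

/* Hypothetical ML-DSA-style vector: num_poly polynomials of 256 coefficients. */
typedef struct { uint32_t coeff[256]; } POLY_SKETCH;
typedef struct { POLY_SKETCH *poly; size_t num_poly; } VECTOR_SKETCH;

/* The length is derived from the vector itself, so callers cannot mis-size it. */
#define CT_SKETCH_SECRET_VECTOR(v) \
    CT_SKETCH_SECRET((v)->poly, (v)->num_poly * sizeof(*(v)->poly))
#define CT_SKETCH_DECLASSIFY_VECTOR(v) \
    CT_SKETCH_DECLASSIFY((v)->poly, (v)->num_poly * sizeof(*(v)->poly))

/* Declassify one 32-bit value (e.g. a 0/1 pass/fail flag) and return it. */
static inline uint32_t ct_sketch_declassify_u32(uint32_t v)
{
    CT_SKETCH_DECLASSIFY(&v, sizeof(v));
    return v;
}

/* Simplified rejection test: are all coefficients below bound? */
static int ct_sketch_norm_ok(const VECTOR_SKETCH *v, uint32_t bound)
{
    uint32_t bad = 0;
    size_t i, j;

    assert(bound >= 1 && bound < (1u << 31));
    for (i = 0; i < v->num_poly; i++)
        for (j = 0; j < 256; j++)
            /* Top bit set iff coeff >= bound (coefficients < 2^31 assumed). */
            bad |= (bound - 1 - v->poly[i].coeff[j]) >> 31;
    /* Reveal only the 0/1 outcome, not the coefficients themselves. */
    return ct_sketch_declassify_u32(bad) == 0;
}
```

Run the resulting binary under `valgrind --tool=memcheck` to get the checks. Outside valgrind, or without the header, the macros have no effect.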

Use rank not 2 in ML-KEM decap classify_bytes

This mirrors the corresponding code in ML-KEM and works under the same conditions/assumptions. Also adjusts related functions that used two layers of constant_time selects where one suffices (now also matching BoringSSL).

Intentionally uses the constant time instrumentation PR as its merge-base, so it is to be merged after that has baked in for a few days and shows working CT tests in daily CI runs.

Sample before/after performance pairs and percent throughput increases for one x86_64 CPU:

                keygens/s    sign/s  verify/s
   -  ML-DSA-44   18728.3    6061.2   23251.6
   +  ML-DSA-44   21077.2    7392.4   27244.3
      ML-DSA-44     12.5%     22.0%     17.2%
   -  ML-DSA-65   10084.3    3603.0   13988.6
   +  ML-DSA-65   11197.9    4549.7   16208.4
      ML-DSA-65     11.0%     26.3%     15.9%
   -  ML-DSA-87    7184.8    2917.3    8141.0
   +  ML-DSA-87    8132.4    3693.7    9430.7
      ML-DSA-87     13.2%     26.6%     15.8%

and here's the same for an Apple silicon M2:

                keygens/s    sign/s  verify/s
   -  ML-DSA-44   17235.7    3099.3   15744.5
   +  ML-DSA-44   21855.2    4907.6   22849.0
      ML-DSA-44     26.8%     58.3%     45.1%
   -  ML-DSA-65    9165.8    1908.5   10058.3
   +  ML-DSA-65   11262.7    3069.6   14348.1
      ML-DSA-65     22.9%     60.8%     42.6%
   -  ML-DSA-87    6596.1    1563.6    6330.8
   +  ML-DSA-87    8404.9    2584.6    8767.6
      ML-DSA-87     27.4%     65.3%     38.5%
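To illustrate the kind of simplification described above, here is a hypothetical example of the pattern, not the functions the PR actually changes. The centered absolute value of a coefficient x in [0, q) can be computed with two selects: first center x, then negate it if negative. A single select also works, because for x > (q-1)/2 the result is simply q - x. All names below are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define Q_SKETCH      8380417u               /* ML-DSA modulus q */
#define HALF_Q_SKETCH ((Q_SKETCH - 1) / 2)   /* (q - 1) / 2 = 4190208 */

/* All-ones if a < b, else 0. Valid when a, b < 2^31. */
static inline uint32_t ct_lt_mask_sketch(uint32_t a, uint32_t b)
{
    return 0u - ((a - b) >> 31);
}

/* Branch-free select: mask ? a : b, for an all-ones or all-zero mask. */
static inline uint32_t ct_select_sketch(uint32_t mask, uint32_t a, uint32_t b)
{
    return (mask & a) | (~mask & b);
}

/* Two layers: center x into [-(q-1)/2, (q-1)/2], then take |.|. */
static uint32_t abs_centered_two_selects(uint32_t x)
{
    uint32_t centered, neg;

    assert(x < Q_SKETCH);
    centered = ct_select_sketch(ct_lt_mask_sketch(HALF_Q_SKETCH, x),
                                x - Q_SKETCH, x);   /* may wrap "negative" */
    neg = 0u - (centered >> 31);                    /* all-ones if negative */
    return ct_select_sketch(neg, 0u - centered, centered);
}

/* One layer: the same result directly, since |x - q| = q - x. */
static uint32_t abs_centered_one_select(uint32_t x)
{
    assert(x < Q_SKETCH);
    return ct_select_sketch(ct_lt_mask_sketch(HALF_Q_SKETCH, x),
                            Q_SKETCH - x, x);
}
```

Both functions agree for every x in [0, q). The one-select form does less work and leaves fewer dependent operations for the compiler to schedule.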

Viktor Dukhovni added 2 commits April 16, 2026 20:02

Add valgrind CT support to ML-DSA

3bc9a41

Also slightly refactor the ML-KEM version to share the necessary defines, and add a daily CI run to check both (presently, for just some platforms with known working valgrind support).

fixup! Add valgrind CT support to ML-DSA

bc8f0e4

Don't declassify rho_prime, that needs to stay protected.

loci-dev had a problem deploying to PROD__AL_DEMO April 17, 2026 05:01 — with GitHub Actions Failure

fixup! fixup! Add valgrind CT support to ML-DSA

65f3b5b

Move constish_time_non_zero() to <internal/constant_time.h> as requested by reviewers, and rename it constish_time_true(), better reflecting the expected 0/1 boolean input.

loci-dev force-pushed the main branch from be164c5 to b5b7577 April 18, 2026 03:32

Viktor Dukhovni added 3 commits April 18, 2026 15:06

fixup! fixup! fixup! fixup! Add valgrind CT support to ML-DSA

352e7af

Use rank not 2 in ML-KEM decap classify_bytes

loci-dev force-pushed the loci/pr-30864-ml-dsa-speedup branch from 7df28a7 to 231580f April 20, 2026 03:43

loci-dev had a problem deploying to PROD__AL_DEMO April 20, 2026 03:43 — with GitHub Actions Failure

loci-dev force-pushed the main branch 5 times, most recently from 421b135 to 770bf14 April 28, 2026 03:44


loci-dev wants to merge 6 commits into main from loci/pr-30864-ml-dsa-speedup
