UPSTREAM PR #30926: SHA3: code refactor #680

Open: loci-dev wants to merge 2 commits into main from loci/pr-30926-sha3_expand

@loci-dev

Note

Source pull request: openssl/openssl#30926

Remove macro tables to make debugging easier.
The low-level, platform-specific code has been moved into the crypto layer. sha3, shake, keccak and cshake-keccak now have separate creator functions. (The intention is to remove higher-level EVP calls from algorithms such as cshake, tuplehash and kmac.)
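As a rough illustration of what replacing a macro table with explicit creator functions can look like, here is a minimal sketch. It is not the code from this PR: `toy_ctx`, `TOY_NEWCTX`, `toy_new`, `toy_sha3_new` and `toy_shake_new` are hypothetical names, and the context is reduced to three fields. The rate formula (1600 minus twice the security strength, in bits) and the padding bytes 0x06 (SHA-3) and 0x1f (SHAKE) follow FIPS 202.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified context: NOT OpenSSL's KECCAK1600_CTX. */
typedef struct {
    size_t block_size;  /* sponge rate in bytes */
    size_t md_size;     /* fixed output length in bytes; 0 for XOFs */
    unsigned char pad;  /* domain-separation padding byte */
} toy_ctx;

/*
 * Before: a macro stamps out one near-identical function per variant.
 * The body exists only after preprocessing, so a debugger cannot step
 * through it line by line.
 */
#define TOY_NEWCTX(name, bitlen, padval)                     \
    toy_ctx *name##_newctx_macro(void)                       \
    {                                                        \
        toy_ctx *c = calloc(1, sizeof(*c));                  \
        if (c == NULL)                                       \
            return NULL;                                     \
        c->block_size = (1600 - 2 * (bitlen)) / 8;           \
        c->md_size = (bitlen) / 8;                           \
        c->pad = (padval);                                   \
        return c;                                            \
    }

TOY_NEWCTX(toy_sha3_256, 256, 0x06)

/* After: allocation and setup written once as ordinary code. */
static toy_ctx *toy_new(size_t bitlen, size_t md_size, unsigned char pad)
{
    toy_ctx *c;

    /* Callers pass constants, so a bad value is a programming error. */
    assert(bitlen > 0 && bitlen % 8 == 0 && bitlen < 800);

    c = calloc(1, sizeof(*c));
    if (c == NULL)
        return NULL;
    c->block_size = (1600 - 2 * bitlen) / 8;
    c->md_size = md_size;
    c->pad = pad;
    return c;
}

/* One explicit creator per algorithm family. */
toy_ctx *toy_sha3_new(size_t bitlen)
{
    return toy_new(bitlen, bitlen / 8, 0x06);
}

toy_ctx *toy_shake_new(size_t bitlen)
{
    return toy_new(bitlen, 0, 0x1f);
}

/* Per-variant entry points shrink to one-line delegations. */
toy_ctx *toy_sha3_256_newctx(void)
{
    return toy_sha3_new(256);
}

toy_ctx *toy_shake_128_newctx(void)
{
    return toy_shake_new(128);
}
```

Both `toy_sha3_256_newctx_macro()` and `toy_sha3_256_newctx()` return an equivalent context; the difference is that the second has a body a debugger can step through, and other algorithms can call `toy_sha3_new()` directly.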

Checklist
  • documentation is added or updated
  • tests are added or updated

slontis added 2 commits April 22, 2026 12:58
Remove macro tables to make debugging easier.
The low-level, platform-specific code has been moved into the crypto layer.
sha3, shake, keccak and cshake-keccak now have separate creator functions.
(The intention is to remove higher-level EVP calls from algorithms such as
cshake, tuplehash and kmac.)
@loci-review

loci-review Bot commented Apr 23, 2026

Overview

This SHA3 provider refactoring improves performance across all 11 modified digest context initialization functions. Out of 20,086 total functions analyzed, 13 were modified, 4 added, and 6 removed. Response times improved 5-11% (19-46 ns), while throughput times improved 82-87% (81-113 ns) through code consolidation.

Binaries analyzed:

  • libcrypto.so: -0.072% power consumption (-182.62 nJ)
  • libssl.so: 0.0% change
  • openssl: 0.0% change

Function Analysis

High-impact functions:

  • cshake_keccak_256_newctx (libcrypto.so): Response time improved 11.15% (411.44 ns → 365.58 ns), throughput time improved 87.39% (129.31 ns → 16.30 ns). Refactored from 13 to 8 basic blocks, delegating initialization to ossl_cshake_keccak_new() helper.

  • cshake_keccak_128_newctx (libcrypto.so): Response time improved 11.14% (414.01 ns → 367.89 ns), throughput time improved 85.89% (131.87 ns → 18.61 ns). Similar consolidation pattern.
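The percentages quoted for the two high-impact functions can be re-derived from the base and target timings. The sketch below is only a convenience for checking that arithmetic; `pct_improvement` and `print_reported_timings` are hypothetical helper names, and the numbers in the table are copied from the bullets above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Percentage reduction from base to target; positive means faster. */
double pct_improvement(double base_ns, double target_ns)
{
    /* A non-positive base would make the ratio meaningless. */
    assert(base_ns > 0.0);
    return (base_ns - target_ns) / base_ns * 100.0;
}

/* Prints one line per measurement quoted in the review. */
void print_reported_timings(void)
{
    static const struct {
        const char *label;
        double base_ns;
        double target_ns;
    } rows[] = {
        { "cshake_keccak_256_newctx response",   411.44, 365.58 },
        { "cshake_keccak_256_newctx throughput", 129.31,  16.30 },
        { "cshake_keccak_128_newctx response",   414.01, 367.89 },
        { "cshake_keccak_128_newctx throughput", 131.87,  18.61 },
    };
    size_t i;

    for (i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
        printf("%-38s %6.2f%%\n", rows[i].label,
               pct_improvement(rows[i].base_ns, rows[i].target_ns));
}
```

Calling `print_reported_timings()` from a small `main` prints the four percentages, which should agree with the reported figures to within rounding.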

Moderate-impact functions:

  • keccak_256_newctx: -6.21% response time, -83.84% throughput time. Delegates to ossl_keccak_new().
  • sha3_256_newctx: -5.89% response time, -83.53% throughput time. Delegates to ossl_sha3_new().
  • shake_128_newctx: -5.40% response time, -83.74% throughput time. Delegates to ossl_shake_new().

Six additional SHA3/Keccak/SHAKE variants show similar 5-6% response time improvements and 82-83% throughput time reductions.

Code changes: Eliminated ~355 lines of macro-based code generation, replacing inline initialization (CRYPTO_zalloc + ossl_sha3_init) with delegation to four centralized helper functions. Control flow simplified from 11-13 basic blocks to 7-9 blocks, branching reduced from 3-4 to 1-2 branches, stack frames reduced from 32 to 16 bytes. Platform-specific hardware acceleration (S390X, ARMv8.2) moved from inline code to helpers.
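One way to picture moving platform-specific selection out of each `newctx` function and into a shared helper is sketched below. This is an assumption-laden illustration, not the PR's code: `toy_keccak_ctx`, `toy_absorb_generic`, `toy_absorb_hw`, `toy_have_hw_sha3` and `toy_keccak_new` are all hypothetical, the capability probe is hard-wired to 0, and the absorb routine omits the Keccak-f permutation entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct toy_keccak_ctx toy_keccak_ctx;

/* Hypothetical backend signature; returns the number of leftover bytes. */
typedef size_t (*toy_absorb_fn)(toy_keccak_ctx *ctx,
                                const unsigned char *in, size_t len);

struct toy_keccak_ctx {
    unsigned char state[200];  /* 1600-bit sponge state */
    size_t block_size;         /* rate in bytes */
    toy_absorb_fn absorb;      /* backend chosen once, at creation time */
};

/*
 * Portable stand-in: XORs whole blocks into the state and returns the
 * number of bytes left over. The permutation step is omitted.
 */
size_t toy_absorb_generic(toy_keccak_ctx *ctx,
                          const unsigned char *in, size_t len)
{
    size_t i;

    while (len >= ctx->block_size) {
        for (i = 0; i < ctx->block_size; i++)
            ctx->state[i] ^= in[i];
        in += ctx->block_size;
        len -= ctx->block_size;
    }
    return len;
}

/* Stand-in for an accelerated backend; here it reuses the generic path. */
size_t toy_absorb_hw(toy_keccak_ctx *ctx,
                     const unsigned char *in, size_t len)
{
    return toy_absorb_generic(ctx, in, len);
}

/* Hypothetical capability probe; a real build would query CPU features. */
int toy_have_hw_sha3(void)
{
    return 0;
}

/* Central helper: allocation, sizing and backend selection in one place. */
toy_keccak_ctx *toy_keccak_new(size_t bitlen)
{
    toy_keccak_ctx *c;

    assert(bitlen > 0 && bitlen % 8 == 0 && bitlen < 800);

    c = calloc(1, sizeof(*c));
    if (c == NULL)
        return NULL;
    c->block_size = (1600 - 2 * bitlen) / 8;
    c->absorb = toy_have_hw_sha3() ? toy_absorb_hw : toy_absorb_generic;
    return c;
}

/* The provider-side entry point is left with a single call. */
toy_keccak_ctx *toy_keccak_256_newctx(void)
{
    return toy_keccak_new(256);
}
```

With the selection inside `toy_keccak_new()`, each entry point has one call and one return path, which is consistent with the drop in basic blocks and branches described above, though the sketch makes no claim about the actual generated code.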

Flame Graph Comparison

Base version:
Flame Graph: libcrypto.so::sha3_prov.c_cshake_keccak_256_newctx

Target version:
Flame Graph: libcrypto.so::sha3_prov.c_cshake_keccak_256_newctx

Base version shows flat call structure with direct CRYPTO_zalloc (168 ns) and ossl_keccak_init (111 ns) calls. Target introduces ossl_cshake_keccak_new (346 ns) wrapper consolidating initialization logic. Despite deeper call stack (4→5 levels), execution is 11.1% faster due to reduced branching overhead and better code organization.

Additional Findings

This refactoring is part of OpenSSL's provider architecture modernization, replacing macro-generated code with explicit, maintainable implementations. The changes affect SHA3 digest context creation—called once per digest operation in TLS handshakes, signature verification, and cryptographic hashing. While not the primary computational bottleneck (actual hashing dominates), the improvements contribute to overall efficiency in high-throughput scenarios. No GPU/ML operations affected; changes are CPU-only cryptographic provider code.

loci-dev force-pushed the main branch 4 times, most recently from 421b135 to 770bf14, on April 28, 2026 03:44