UPSTREAM PR #30967: Unicode codepoints fit in 32 bit values. #684
loci-dev wants to merge 2 commits into
instead of unsigned long
The symbol presence test fails for NO_DEPRECATED builds if you use modern CPP practices for definitions. This is the result of my accepting that fixing it will be as PTSD-inducing as walking into my parents' bedroom at an inopportune time, and fixing it anyway. Better me, who has less time left to live with the mental trauma, than a younger developer.
Overview
Analysis of 20,084 functions across OpenSSL binaries reveals type safety refactoring (unsigned long → uint32_t for Unicode) with minimal performance impact. Modified: 14 functions (0.07%), New: 2, Removed: 0.
Power Consumption:
Commits: e117289 "Unicode codepoints fit in 32 bits. Use uint32_t" and 6630c34 (Perl build adjustments) by Bob Beck.

Function Analysis
Significant Improvements:
Notable Regressions:
Source Code Correlation: Type migration to uint32_t enables 32-bit register optimizations and semantic correctness. API consolidation (UTF8_getc/putc → ossl_utf8_getc_internal/ossl_utf8_putc_internal) yields 11-18% per-call improvements. Changes are confined to the ASN.1 encoding module (certificate parsing, string conversion), not cryptographic hot paths. Other analyzed functions showed minor changes consistent with type safety improvements.

Flame Graph Comparison
Function: UTF8_getc@@OPENSSL_4.0.0 (libcrypto.so) - illustrates wrapper refactoring pattern
Base version executes all UTF-8 decoding inline (88.6 ns self-time). Target delegates to

Additional Findings
No Critical Path Impact: All changes are in ASN.1 string processing (certificate parsing, PKCS#12 conversion), not in performance-critical cryptographic operations (AES, SHA, SSL record processing). Modified functions execute during certificate validation and import/export, which are infrequent operations where nanosecond changes are negligible compared to millisecond-scale I/O and validation costs.
Code Quality Benefits: uint32_t provides correct Unicode semantics (max U+10FFFF fits in 21 bits), improves portability across 32/64-bit platforms, and enables compiler optimizations. Internal API consolidation supports OpenSSL 4.1 deprecation strategy while maintaining backward compatibility.


Note
Source pull request: openssl/openssl#30967
Using "unsigned long" for these makes no sense: the type is potentially far too big on some platforms,
and on others it produces "always true" warnings if you check the range,
as noticed by @mbroz.
Instead of using "whatever size the platform feels like" along with prayer and clean living, how about
we just use a uint32_t for everything.
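
To illustrate the point about range checks, here is a minimal sketch of a code point validity test written against uint32_t. The helper name `sketch_is_valid_codepoint` is hypothetical and is not part of OpenSSL; it only shows that with a fixed 32-bit type the comparison against U+10FFFF means the same thing on every platform, whereas an unsigned long is 32 bits wide on some ABIs and 64 bits on others.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper for illustration only (not an OpenSSL API).
 * Returns true if cp is a Unicode scalar value: at most U+10FFFF
 * and not in the UTF-16 surrogate range U+D800..U+DFFF.
 */
static bool sketch_is_valid_codepoint(uint32_t cp)
{
    if (cp > 0x10FFFF)
        return false;               /* beyond the Unicode code space */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;               /* surrogates are not scalar values */
    return true;
}
```

Because the parameter width is fixed, the same checks compile without width-dependent warnings regardless of how large the platform makes unsigned long.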
The only vestiges of unsigned long then end up being the undocumented UTF8_[putc|getc] APIs,
which have been around since the days of yore when any function you wrote should be made public
because that was the way. Therefore we deprecate these functions with no planned replacement.
If someone screams that they must have them, we can expose the internal uint32_t-based interface in a
follow-on, but let's see if this can be flensed.
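
As a rough sketch of the shape described above, the following shows a uint32_t-based decoder with a thin unsigned long wrapper on top of it. Both function names and the `size_t` length parameter are assumptions made for illustration; the real internal functions in the PR may have different signatures and error codes. Only the wrapper's parameter list mirrors the existing `UTF8_getc(const unsigned char *, int, unsigned long *)` prototype.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical uint32_t-based decoder (illustration only).
 * Returns the number of bytes consumed, 0 for empty input,
 * or -1 for malformed, overlong, surrogate, or out-of-range input.
 */
static int sketch_utf8_getc(const unsigned char *s, size_t len, uint32_t *out)
{
    uint32_t cp, min;
    size_t need, i;

    if (len == 0)
        return 0;
    if (s[0] < 0x80) {                       /* single-byte ASCII */
        *out = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {
        need = 2; cp = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {
        need = 3; cp = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {
        need = 4; cp = s[0] & 0x07; min = 0x10000;
    } else {
        return -1;                           /* invalid lead byte */
    }
    if (len < need)
        return -1;                           /* truncated sequence */
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;                       /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return -1;                           /* overlong, too large, or surrogate */
    *out = cp;
    return (int)need;
}

/* Hypothetical legacy-style wrapper that keeps an unsigned long out-parameter. */
static int sketch_legacy_getc(const unsigned char *s, int len, unsigned long *val)
{
    uint32_t cp = 0;
    int ret;

    if (len < 0)
        return -1;
    ret = sketch_utf8_getc(s, (size_t)len, &cp);
    if (ret > 0)
        *val = cp;                           /* widening is always lossless */
    return ret;
}
```

Keeping all decoding logic in the uint32_t function means the unsigned long type survives only at the deprecated boundary, which is the arrangement the description argues for.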
Checklist