Allocation-free decode: 4.5x faster, 29x less memory on escape-heavy input #3
Open
usiegj00 wants to merge 1 commit into
…lation and regex
Profiling a production SMTP server (Crystal 1.19, escape-heavy Japanese
newsletter traffic) attributed three top allocation sites to this decoder:
- from_quoted_printable built a String per =XX escape via
"#{chars.next_char}#{chars.next_char}" (String::Builder + String each),
then ran a Regex match per escape (bstr =~ /\A[0-9A-fa-f]{2}\z/)
- decode_size made two full-body String#scan regex passes only to size
the output buffer
Changes:
- decode escapes with Char#to_i?(16) arithmetic: zero allocations per
escape, same accepted grammar (upper/lowercase hex, =CRLF soft break)
- decode_size returns the encoded bytesize as an upper bound; QP decoding
never expands and decode() already trims via Slice.new(buf, appender.size)
- replace the scaffold placeholder spec with 16 examples covering decode,
encode, soft/hard breaks, multi-byte UTF-8, lowercase hex, error cases,
and a 76-column wrap round-trip
Benchmark (34 KB escape-heavy body, --release):
before: 1.52 ms, 3.27 MB allocated per decode
after: 339 us, 112 KB allocated per decode (4.5x faster, 29x less)
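The escape decoding described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the shard's actual code: `hex_pair_to_byte` is a hypothetical helper name, and the only stdlib call relied on is `Char#to_i?(16)`, which returns `nil` for a non-hex character instead of raising.

```crystal
# Hypothetical helper (not part of the shard): turns the two characters
# that follow '=' into a single byte without building any String.
def hex_pair_to_byte(hi : Char, lo : Char) : UInt8?
  # Char#to_i?(16) yields 0..15 for [0-9A-Fa-f] and nil otherwise.
  h = hi.to_i?(16) || return nil
  l = lo.to_i?(16) || return nil
  # High nibble shifted left by four bits, combined with the low nibble.
  ((h << 4) | l).to_u8
end

p hex_pair_to_byte('E', '3') # uppercase hex digits
p hex_pair_to_byte('e', '3') # lowercase hex digits decode to the same byte
p hex_pair_to_byte('G', '0') # 'G' is not a hex digit, so the result is nil
```

Because both digits are handled as `Char` and `Int32` values, nothing in this path allocates on the heap, which is the property the commit relies on.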
usiegj00 added a commit to aluminumio/crystal-mime that referenced this pull request on Jun 10, 2026
Upstream arcage/crystal-quotedprintable has been dormant since May 2021. Our fork carries the allocation-free decode fix (4.5x faster, 29x less allocation on escape-heavy bodies) plus a real spec suite. Upstream PR: arcage/crystal-quotedprintable#3. Bump to 0.1.18.
Hi! We use this shard in a production SMTP server (via crystal-mime) processing escape-heavy quoted-printable bodies (Japanese newsletter traffic). Heap profiling attributed several of our top allocation sites to the decoder, so we optimized it and would like to contribute the fix upstream.
What was allocating
- `from_quoted_printable` built a String per `=XX` escape via `"#{chars.next_char}#{chars.next_char}"` (one String::Builder + one String each) and then ran a Regex match per escape
- `decode_size` ran two full-body `String#scan` regex passes just to compute the output buffer size

Changes

- Escapes are decoded with `Char#to_i?(16)` arithmetic: zero allocations per escape, identical accepted grammar (upper/lowercase hex, `=\r\n` soft breaks, same `InvalidEncodedData` errors)
- `decode_size` returns the encoded bytesize as an upper bound; QP decoding never expands its input, and `decode` already trims the result via `Slice.new(buf, appender.size)`
- The scaffold placeholder spec (`false.should eq(true)`) is replaced with 16 examples covering decode/encode, soft and hard line breaks, multi-byte UTF-8, lowercase hex digits, the escaped equals sign, error cases, and a 76-column wrap round-trip

Benchmark

34 KB escape-heavy body, `--release`:

before: 1.52 ms, 3.27 MB allocated per decode
after: 339 us, 112 KB allocated per decode

4.5x faster, 29x less allocation. No public API changes.
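To illustrate why the encoded bytesize is a safe upper bound for the output buffer, here is a small self-contained sketch. It is not the shard's implementation: `decode_qp_sketch` is a hypothetical name, it raises `ArgumentError` rather than the shard's `InvalidEncodedData`, and it trims with `Slice#[]` instead of `Slice.new(buf, appender.size)`.

```crystal
# Hypothetical sketch (not the shard's API): decodes quoted-printable
# into a buffer sized by the input, then trims to the bytes written.
def decode_qp_sketch(encoded : String) : Bytes
  src = encoded.to_slice
  # Upper bound: a literal byte maps 1:1, "=XX" (3 bytes) becomes 1 byte,
  # and a "=\r\n" soft break emits nothing, so output never exceeds input.
  buf = Bytes.new(src.size)
  written = 0
  i = 0
  while i < src.size
    byte = src[i]
    if byte != '='.ord
      buf[written] = byte
      written += 1
      i += 1
      next
    end
    # An '=' must be followed by at least two more bytes.
    raise ArgumentError.new("truncated escape at byte #{i}") unless i + 2 < src.size
    if src[i + 1] == '\r'.ord && src[i + 2] == '\n'.ord
      i += 3 # soft line break: consumed, nothing written
      next
    end
    hi = src[i + 1].chr.to_i?(16) || raise ArgumentError.new("invalid escape at byte #{i}")
    lo = src[i + 2].chr.to_i?(16) || raise ArgumentError.new("invalid escape at byte #{i}")
    buf[written] = ((hi << 4) | lo).to_u8
    written += 1
    i += 3
  end
  # Trim the over-allocated buffer; the returned slice shares its memory.
  buf[0, written]
end

decoded = decode_qp_sketch("=E3=81=82=\r\nabc")
puts String.new(decoded)
puts decoded.size <= "=E3=81=82=\r\nabc".bytesize
```

The single up-front allocation replaces any sizing pass over the body, at the cost of some slack in the buffer when the input is escape-heavy.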
Happy to adjust anything — and thanks for the shard!