Allocation-free decode: 4.5x faster, 29x less memory on escape-heavy input by usiegj00 · Pull Request #3 · arcage/crystal-quotedprintable

usiegj00 · 2026-06-10T07:29:36Z

Hi! We use this shard in a production SMTP server (via crystal-mime) processing escape-heavy quoted-printable bodies (Japanese newsletter traffic). Heap profiling attributed several of our top allocation sites to the decoder, so we optimized it and would like to contribute the fix upstream.

What was allocating

- from_quoted_printable built a String per =XX escape via "#{chars.next_char}#{chars.next_char}" (one String::Builder + one String each) and then ran a Regex match per escape
- decode_size ran two full-body String#scan regex passes just to compute the output buffer size
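
For illustration, the per-escape cost can be approximated with a small comparison. This is a rough sketch, not the shard's code: both helper names are hypothetical, the regex is a plain two-hex-digit check rather than the shard's exact pattern, and Benchmark.memory only reports bytes allocated while the block runs.

```crystal
require "benchmark"

# Hypothetical reconstruction of the old pattern: interpolate two chars
# into a String, validate it with a regex, then parse it.
def hex_pair_via_string(a : Char, b : Char) : UInt8?
  bstr = "#{a}#{b}" # allocates a String::Builder and a String
  return nil unless bstr =~ /\A[0-9A-Fa-f]{2}\z/
  bstr.to_u8(16)
end

# Hypothetical allocation-free variant: Char#to_i?(16) returns nil for
# non-hex chars, so no String or Regex is involved.
def hex_pair_via_arithmetic(a : Char, b : Char) : UInt8?
  hi = a.to_i?(16)
  lo = b.to_i?(16)
  if hi && lo
    ((hi << 4) | lo).to_u8
  end
end

string_bytes = Benchmark.memory { 10_000.times { hex_pair_via_string('C', '3') } }
arith_bytes = Benchmark.memory { 10_000.times { hex_pair_via_arithmetic('C', '3') } }
puts "string + regex: #{string_bytes} bytes, arithmetic: #{arith_bytes} bytes"
```

The exact byte counts depend on the compiler version and build flags, so none are quoted here.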

Changes

- Escapes are decoded with Char#to_i?(16) arithmetic: zero allocations per escape, identical accepted grammar (upper/lowercase hex, =\r\n soft breaks, same InvalidEncodedData errors)
- decode_size returns the encoded bytesize as an upper bound; QP decoding never expands its input, and decode already trims the result via Slice.new(buf, appender.size)
- The scaffold placeholder spec (false.should eq(true)) is replaced with 16 examples covering decode/encode, soft and hard line breaks, multi-byte UTF-8, lowercase hex digits, the escaped equals sign, error cases, and a 76-column wrap round-trip
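
The first two changes can be sketched together as follows. This is a simplified illustration under stated assumptions, not the code in this PR: qp_decode_sketch is a hypothetical name, ArgumentError stands in for the shard's InvalidEncodedData, and the result is trimmed with a subslice instead of an appender.

```crystal
# Hypothetical helper, not the shard's API: decodes quoted-printable text
# into bytes without allocating anything per escape.
def qp_decode_sketch(input : String) : Bytes
  src = input.to_slice
  # Decoding never expands the input, so the encoded bytesize is a safe
  # upper bound for the output buffer.
  buf = Bytes.new(src.size)
  written = 0
  i = 0
  while i < src.size
    byte = src[i]
    if byte != '='.ord
      # Literal byte: copy it through unchanged.
      buf[written] = byte
      written += 1
      i += 1
      next
    end
    # Both an escape (=XX) and a soft break (=\r\n) need two more bytes.
    # ArgumentError is a stand-in for the shard's InvalidEncodedData.
    raise ArgumentError.new("truncated escape at byte #{i}") if i + 2 >= src.size
    if src[i + 1] == '\r'.ord && src[i + 2] == '\n'.ord
      i += 3 # soft line break: emits nothing
      next
    end
    # Char#to_i?(16) accepts upper- and lowercase hex and returns nil otherwise.
    hi = src[i + 1].chr.to_i?(16)
    lo = src[i + 2].chr.to_i?(16)
    if hi && lo
      buf[written] = ((hi << 4) | lo).to_u8
      written += 1
      i += 3
    else
      raise ArgumentError.new("invalid escape at byte #{i}")
    end
  end
  # Trim the over-sized buffer to the bytes actually written.
  buf[0, written]
end

puts String.new(qp_decode_sketch("caf=C3=A9 au lait=\r\n, s'il vous pla=c3=aet"))
```

Sizing the buffer from the encoded bytesize trades a slightly over-sized allocation for skipping the two sizing passes entirely.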

Benchmark

34 KB escape-heavy body, --release:

| | before | after |
|---|---|---|
| per decode | 1.52 ms / 3.27 MB allocated | 339 µs / 112 KB allocated |

4.5x faster, 29x less allocation. No public API changes.

Happy to adjust anything — and thanks for the shard!

…lation and regex

Profiling a production SMTP server (Crystal 1.19, escape-heavy Japanese newsletter traffic) attributed three top allocation sites to this decoder:

- from_quoted_printable built a String per =XX escape via "#{chars.next_char}#{chars.next_char}" (String::Builder + String each), then ran a Regex match per escape (bstr =~ /\A[0-9A-fa-f]{2}\z/)
- decode_size made two full-body String#scan regex passes only to size the output buffer

Changes:

- decode escapes with Char#to_i?(16) arithmetic: zero allocations per escape, same accepted grammar (upper/lowercase hex, =CRLF soft break)
- decode_size returns the encoded bytesize as an upper bound; QP decoding never expands and decode() already trims via Slice.new(buf, appender.size)
- replace the scaffold placeholder spec with 16 examples covering decode, encode, soft/hard breaks, multi-byte UTF-8, lowercase hex, error cases, and a 76-column wrap round-trip

Benchmark (34 KB escape-heavy body, --release):

before: 1.52 ms, 3.27 MB allocated per decode
after: 339 us, 112 KB allocated per decode (4.5x faster, 29x less)

Upstream arcage/crystal-quotedprintable has been dormant since May 2021. Our fork carries the allocation-free decode fix (4.5x faster, 29x less allocation on escape-heavy bodies) plus a real spec suite. Upstream PR: arcage/crystal-quotedprintable#3. Bump to 0.1.18.

usiegj00 wants to merge 1 commit into arcage:master from aluminumio:perf/allocation-free-decode
