Allocation-free decode: 4.5x faster, 29x less memory on escape-heavy input #3

Open
usiegj00 wants to merge 1 commit into arcage:master from aluminumio:perf/allocation-free-decode
Conversation

@usiegj00

Hi! We use this shard in a production SMTP server (via crystal-mime) processing escape-heavy quoted-printable bodies (Japanese newsletter traffic). Heap profiling attributed several of our top allocation sites to the decoder, so we optimized it and would like to contribute the fix upstream.

What was allocating

  • from_quoted_printable built a String per =XX escape via "#{chars.next_char}#{chars.next_char}" (one String::Builder + one String each) and then ran a Regex match per escape
  • decode_size ran two full-body String#scan regex passes just to compute the output buffer size

Changes

  • Escapes are decoded with Char#to_i?(16) arithmetic — zero allocations per escape, identical accepted grammar (upper/lowercase hex, =\r\n soft breaks, same InvalidEncodedData errors)
  • decode_size returns the encoded bytesize as an upper bound; QP decoding never expands its input, and decode already trims the result via Slice.new(buf, appender.size)
  • The scaffold placeholder spec (false.should eq(true)) is replaced with 16 examples covering decode/encode, soft and hard line breaks, multi-byte UTF-8, lowercase hex digits, the escaped equals sign, error cases, and a 76-column wrap round-trip
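The first two changes can be illustrated with a minimal sketch. This is not the shard's actual code: `decode_hex_pair` and `decode_qp_sketch` are hypothetical names, and `ArgumentError` stands in for the shard's `InvalidEncodedData`. It only shows the general shape, which is hex digits combined with `Char#to_i?(16)` arithmetic, and an output buffer sized to the encoded bytesize and then trimmed to the bytes actually written.

```crystal
# Hypothetical sketch; names and error type are illustrative, not the shard's API.

# Decodes one "=XX" escape from its two hex digits with Char#to_i?(16).
# No intermediate String is built on the success path.
def decode_hex_pair(hi : Char, lo : Char) : UInt8
  h = hi.to_i?(16) || raise ArgumentError.new("invalid hex digit: #{hi.inspect}")
  l = lo.to_i?(16) || raise ArgumentError.new("invalid hex digit: #{lo.inspect}")
  ((h << 4) | l).to_u8
end

# Decodes a quoted-printable String into Bytes. The buffer is sized to the
# encoded bytesize, an upper bound because escapes and soft breaks only ever
# shrink the output; the result is trimmed to the bytes written.
def decode_qp_sketch(input : String) : Bytes
  src = input.to_slice
  buf = Bytes.new(src.size)
  written = 0
  i = 0
  while i < src.size
    byte = src[i]
    if byte != '='.ord
      # Literal byte: copy through unchanged.
      buf[written] = byte
      written += 1
      i += 1
    elsif i + 2 < src.size && src[i + 1] == '\r'.ord && src[i + 2] == '\n'.ord
      # Soft line break "=\r\n": emits nothing.
      i += 3
    elsif i + 2 < src.size
      # "=XX" escape: three input bytes become one output byte.
      buf[written] = decode_hex_pair(src[i + 1].unsafe_chr, src[i + 2].unsafe_chr)
      written += 1
      i += 3
    else
      raise ArgumentError.new("truncated escape at byte #{i}")
    end
  end
  buf[0, written]
end

puts String.new(decode_qp_sketch("=E3=81=82 a=3Db, soft=\r\nbreak"))
```

Every branch consumes at least as many input bytes as it writes, which is why the encoded bytesize is a safe buffer size and no sizing pass is needed.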

Benchmark

34 KB escape-heavy body, --release:

              before                         after
per decode    1.52 ms / 3.27 MB allocated    339 µs / 112 KB allocated

4.5x faster, 29x less allocation. No public API changes.
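For anyone who wants to observe the per-escape allocation difference in isolation, here is a rough sketch using the standard library's `Benchmark.memory` and `Benchmark.realtime`. It is not the benchmark used for the numbers above: both helper methods are hypothetical stand-ins for the two strategies, the iteration count is arbitrary, and the output will vary by machine and compiler flags.

```crystal
require "benchmark"

# Hypothetical stand-ins for the two per-escape strategies described in this
# PR; neither is the shard's actual code.

# Builds a two-character String per escape, then parses it (allocates).
def hex_pair_via_string(hi : Char, lo : Char) : UInt8
  "#{hi}#{lo}".to_u8(16)
end

# Combines the two hex digits arithmetically (no String is created).
def hex_pair_via_arithmetic(hi : Char, lo : Char) : UInt8
  ((hi.to_i(16) << 4) | lo.to_i(16)).to_u8
end

ITERATIONS = 100_000 # arbitrary, chosen only for illustration

# A running checksum keeps the results in use so the calls are not optimized away.
checksum = 0_u64

string_bytes = Benchmark.memory do
  ITERATIONS.times { checksum += hex_pair_via_string('E', '3') }
end
arith_bytes = Benchmark.memory do
  ITERATIONS.times { checksum += hex_pair_via_arithmetic('E', '3') }
end

string_time = Benchmark.realtime do
  ITERATIONS.times { checksum += hex_pair_via_string('E', '3') }
end
arith_time = Benchmark.realtime do
  ITERATIONS.times { checksum += hex_pair_via_arithmetic('E', '3') }
end

puts "string:     #{string_bytes} bytes allocated, #{string_time.total_milliseconds} ms"
puts "arithmetic: #{arith_bytes} bytes allocated, #{arith_time.total_milliseconds} ms"
puts "checksum:   #{checksum}"
```

Building with `--release`, as in the benchmark above, gives more representative timings.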

Happy to adjust anything — and thanks for the shard!

…lation and regex

Profiling a production SMTP server (Crystal 1.19, escape-heavy Japanese
newsletter traffic) attributed three top allocation sites to this decoder:

- from_quoted_printable built a String per =XX escape via
  "#{chars.next_char}#{chars.next_char}" (String::Builder + String each),
  then ran a Regex match per escape (bstr =~ /\A[0-9A-fa-f]{2}\z/)
- decode_size made two full-body String#scan regex passes only to size
  the output buffer

Changes:
- decode escapes with Char#to_i?(16) arithmetic: zero allocations per
  escape, same accepted grammar (upper/lowercase hex, =CRLF soft break)
- decode_size returns the encoded bytesize as an upper bound; QP decoding
  never expands and decode() already trims via Slice.new(buf, appender.size)
- replace the scaffold placeholder spec with 16 examples covering decode,
  encode, soft/hard breaks, multi-byte UTF-8, lowercase hex, error cases,
  and a 76-column wrap round-trip

Benchmark (34 KB escape-heavy body, --release):
  before: 1.52 ms, 3.27 MB allocated per decode
  after:  339 us,  112 KB allocated per decode  (4.5x faster, 29x less)
usiegj00 added a commit to aluminumio/crystal-mime that referenced this pull request Jun 10, 2026
Upstream arcage/crystal-quotedprintable has been dormant since May 2021.
Our fork carries the allocation-free decode fix (4.5x faster, 29x
less allocation on escape-heavy bodies) plus a real spec suite.
Upstream PR: arcage/crystal-quotedprintable#3. Bump to 0.1.18.