feat(vanilla-epoll): add json-tls profile (HTTP/1.1 over TLS 1.3) #952
enghitalo wants to merge 5 commits into
Wires the json-tls profile: the existing /json/{count}?m={m} endpoint
served over TLS on :8081, the only gap that kept vanilla off the json-tls
board. The JSON serializer is reused verbatim, so responses are byte-for-byte
identical to the plaintext /json (validated).
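A minimal Python sketch of why sharing one serializer gives byte-for-byte parity and a Content-Length that cannot drift from the body. The dataset, the `{"items": ...}` wrapper, and `handle_tls` are hypothetical stand-ins (the real code is V and its item schema is not shown here); this sketch also measures the finished body, whereas the PR precomputes the length from the same values the body emits.

```python
import json

# Hypothetical read-only dataset; only price/quantity/total are named in the PR.
ITEMS = [{"id": i, "price": i + 1, "quantity": (i % 5) + 1} for i in range(100)]


def write_json_into(ro, out: bytearray, count: int, m: int) -> None:
    """Append a full HTTP/1.1 response for /json/{count}?m={m} to `out`."""
    items = [
        {**item, "total": item["price"] * item["quantity"] * m}
        for item in ro[:count]
    ]
    body = json.dumps({"items": items}).encode()  # wrapper shape is assumed
    # The framed length comes from the exact bytes written below.
    out += b"HTTP/1.1 200 OK\r\n"
    out += b"Content-Type: application/json\r\n"
    out += b"Content-Length: " + str(len(body)).encode() + b"\r\n\r\n"
    out += body


def handle_tls(ro, path: str, m: int) -> bytes:
    """Stateless handler for the TLS port: only /json/{count}, 404 elsewhere."""
    out = bytearray()
    count = path[len("/json/"):] if path.startswith("/json/") else ""
    if count.isdigit():
        write_json_into(ro, out, int(count), m)
    else:
        out += b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"
    return bytes(out)
```

Because both transports call the same function, any difference between the plaintext and TLS responses would have to come from the transport layer, not the serializer.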
main.v
- Extract a transport-agnostic `write_json_into(ro, mut out, count, m)` from
the WorkerCtx serializer; the plaintext method now delegates (no behavior
change). Content-Length is precomputed from the same values the body emits,
so the framed length can never desync (no response-splitting surface).
- Add a stateless TLS request_handler that serves ONLY /json (404 elsewhere),
keeping the TLS port's attack surface minimal. It captures the read-only
`ro`; no make_state (avoids the TLS worker's stateful path).
- Start a second http_server on TLS_PORT (env, default 8081) with
tls_config from /certs/server.{crt,key} (TLS_CERT/TLS_KEY overridable);
fails loud if the cert is present but the key is missing, self-signs only
when no cert is mounted. Spawned before the blocking plaintext run().
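The cert/key resolution rule in the last bullet can be sketched as below. This is an illustration in Python, not the V code; `resolve_tls_material` and `tls_port` are hypothetical names, and only the env variables and default paths come from the description above.

```python
import os


def tls_port(env=os.environ) -> int:
    """TLS_PORT from the environment, defaulting to 8081."""
    return int(env.get("TLS_PORT", "8081"))


def resolve_tls_material(env=os.environ, exists=os.path.exists):
    """Return ("mounted", cert, key) or ("self-signed", None, None).

    Raises if a cert is present without its key, instead of silently
    falling back to a self-signed certificate.
    """
    cert = env.get("TLS_CERT", "/certs/server.crt")
    key = env.get("TLS_KEY", "/certs/server.key")
    if not exists(cert):
        # No cert mounted at all: the only case where self-signing is allowed.
        return ("self-signed", None, None)
    if not exists(key):
        raise FileNotFoundError(
            f"TLS cert {cert} is present but key {key} is missing"
        )
    return ("mounted", cert, key)
```

Failing loudly on a half-mounted pair keeps a misconfigured deployment from serving an unexpected self-signed certificate.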
Dockerfile
- Build Mbed TLS 4.1.0 from the pinned release tarball and compile with
`-d vanilla_tls`; runtime carries the shared libs (incl. libtfpsacrypto,
the 4.x TF-PSA-Crypto split) + ldconfig. EXPOSE 8081.
- THREAD-SAFETY (load-bearing): enable MBEDTLS_THREADING_C +
MBEDTLS_THREADING_PTHREAD. vanilla runs N TLS worker threads, all driving
TLS 1.3 handshakes through Mbed TLS's GLOBAL PSA key store; without the
threading mutex the concurrent handshakes race on key slots -> a
heap-use-after-free under load (ASan: psa_wipe_key_slot frees a slot
another thread is mid-memcpy in psa_hmac_setup; crashes ~c1024 with
"double free or corruption"). With it, the load that 100%-crashed now
survives clean (c1024 137k rps in-container, c1536 ok).
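As an analogy only, the following Python toy shows the shape of the fix: a process-global slot table whose allocate, read, and wipe operations all take one mutex, so no worker can wipe a slot while another is copying from it. `KeyStore` is a made-up class, not Mbed TLS's API, and Python cannot reproduce the actual use-after-free.

```python
import threading


class KeyStore:
    """Toy stand-in for a process-global key-slot table."""

    def __init__(self, n_slots: int = 32):
        self._slots = [None] * n_slots
        self._lock = threading.Lock()  # plays the role of the threading mutex

    def import_key(self, material: bytes) -> int:
        with self._lock:
            for i, slot in enumerate(self._slots):
                if slot is None:
                    self._slots[i] = bytearray(material)
                    return i
        raise RuntimeError("no free key slot")

    def copy_key(self, slot_id: int) -> bytes:
        with self._lock:
            return bytes(self._slots[slot_id])

    def destroy_key(self, slot_id: int) -> None:
        with self._lock:
            slot = self._slots[slot_id]
            slot[:] = bytes(len(slot))  # wipe before releasing the slot
            self._slots[slot_id] = None


def handshake_loop(store, worker_id, rounds, errors):
    """Each round mimics one handshake: import a key, use it, destroy it."""
    material = bytes([worker_id]) * 16
    for _ in range(rounds):
        slot_id = store.import_key(material)
        if store.copy_key(slot_id) != material:
            errors.append(worker_id)
        store.destroy_key(slot_id)


def run(n_workers: int = 8, rounds: int = 500):
    store, errors = KeyStore(), []
    threads = [
        threading.Thread(target=handshake_loop, args=(store, w, rounds, errors))
        for w in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```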
meta.json: add "json-tls" to tests.
Validated in-container with the real mounted cert: all 3 validate.sh
json-tls assertions pass (HTTP/1.1 + ALPN http/1.1, item schema +
total==price*quantity*m on 7:2/23:11/50:1, Content-Type application/json),
byte-for-byte parity with plaintext /json, keep-alive, and no heap
corruption under concurrent TLS load.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
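For illustration, the `total == price*quantity*m` assertion could be checked along these lines. This is a hedged sketch, not `validate.sh`: `check_items` is a hypothetical helper, and it assumes the `7:2 / 23:11 / 50:1` cases are `count:m` pairs and that the response carries exactly `count` items.

```python
REQUIRED_FIELDS = {"price", "quantity", "total"}


def check_items(items, count: int, m: int):
    """Return a list of problems; an empty list means the response passed."""
    problems = []
    if len(items) != count:  # assumed property: one item per requested count
        problems.append(f"expected {count} items, got {len(items)}")
    for i, item in enumerate(items):
        missing = REQUIRED_FIELDS - item.keys()
        if missing:
            problems.append(f"item {i} is missing {sorted(missing)}")
            continue
        expected = item["price"] * item["quantity"] * m
        if item["total"] != expected:
            problems.append(f"item {i}: total {item['total']} != {expected}")
    return problems
```

Returning a list of problems rather than raising on the first one makes a failed run easier to diagnose, since every mismatching item is reported at once.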
/benchmark -f vanilla-epoll
Bump the vanilla lib pin 15bd57e -> 7ca36f6 (perf/ktls-record-offload), which adds kTLS TX+RX after the Mbed TLS handshake: the kernel does TLS 1.3 record AES-128-GCM, so the epoll TLS worker's steady-state read/write become plain recv()/send(). This removes ALL per-record userspace crypto and the per-record PSA key-store mutex that capped the userspace path.

The handshake still runs in Mbed TLS (amortized by keep-alive); the suite is pinned to TLS_AES_128_GCM_SHA256 and TLS 1.3 tickets are disabled so the kernel record sequence starts at 0.

The 5 changed TLS files are identical at 15bd57e and lib HEAD, so this pin is exactly "15bd57e + kTLS", a clean A/B vs the userspace json-tls already in this PR. Key derivation was cross-checked byte-identical against OpenSSL HKDF; the crypto_info layout/split/rec_seq were verified against linux/tls.h + RFC 8446 §7.

On a host without the `tls` kernel module it falls back cleanly to userspace TLS (the one-time stderr log prints `[ktls] engaged` or `[ktls] fallback`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
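To make the key derivation and crypto_info split concrete, here is a Python sketch of RFC 8446 §7.3 traffic-key derivation packed into the 40-byte `tls12_crypto_info_aes_gcm_128` layout from linux/tls.h. It is not the lib's C code; the function names are illustrative, and the all-zero starting `rec_seq` relies on the commit's statement that tickets are disabled.

```python
import hashlib
import hmac
import struct

TLS_1_3_VERSION = 0x0304      # linux/tls.h
TLS_CIPHER_AES_GCM_128 = 51   # linux/tls.h


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869) with SHA-256."""
    out, block, counter = b"", b"", 1
    while len(out) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        out += block
        counter += 1
    return out[:length]


def hkdf_expand_label(secret: bytes, label: bytes, context: bytes, length: int) -> bytes:
    """HKDF-Expand-Label (RFC 8446 section 7.1)."""
    full_label = b"tls13 " + label
    info = (
        struct.pack("!H", length)
        + bytes([len(full_label)]) + full_label
        + bytes([len(context)]) + context
    )
    return hkdf_expand(secret, info, length)


def crypto_info_aes_gcm_128(traffic_secret: bytes) -> bytes:
    """Pack keys for one direction: info, iv[8], key[16], salt[4], rec_seq[8]."""
    key = hkdf_expand_label(traffic_secret, b"key", b"", 16)
    write_iv = hkdf_expand_label(traffic_secret, b"iv", b"", 12)
    salt, iv = write_iv[:4], write_iv[4:]  # the 12-byte IV is split 4 + 8
    rec_seq = bytes(8)                     # record sequence starts at 0
    return struct.pack(
        "=HH8s16s4s8s",
        TLS_1_3_VERSION, TLS_CIPHER_AES_GCM_128, iv, key, salt, rec_seq,
    )
```

One such blob is built from the server's application traffic secret for TX and one from the client's for RX.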
Update: json-tls now runs over kTLS (kernel record offload)

Pushed a follow-up that bumps the vanilla lib pin

Why: the prior userspace run landed at 512k rps (~76% TLS tax). Root cause: Mbed TLS 4's PSA crypto uses a process-global key store, and

Clean A/B: the 5 changed TLS files are byte-identical at

Local validation (this dev box has no

A one-time stderr line logs

🤖 Generated with Claude Code
/benchmark -f vanilla-epoll
kTLS result: json-tls 512k → 1,502,783 rps (2.93×)

The kTLS pin engaged on the bench host: json-tls jumped from the userspace 512,243 to 1,502,783 rps, cutting the TLS tax from ~76% to ~30% (plain json is 2.15M) at slightly lower CPU (6163% vs 6473%). That lands vanilla-epoll in the json-tls top tier (just under swerver's 1.52M), up from mid-pack.

Also note

This PR is now a solid, mergeable result: correct json-tls (validate passes), fast via kTLS where the

Still ~21% behind the io_uring leaders (zix 1.90M / sark 1.83M / ioxide 1.79M). The remaining gap is structural, not crypto: epoll's single-acceptor + per-edge syscalls vs io_uring multishot/batched, plus the 4096-handshake startup storm still going through Mbed TLS's global PSA mutex. Closing it (SO_REUSEPORT per-worker accept to parallelize the handshake storm + accept) is a separate follow-up.

🤖 Generated with Claude Code
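The headline figures follow from the quoted throughput numbers; a small Python check, taking "TLS tax" to mean the fraction of plaintext throughput lost and using the rounded 2.15M plaintext figure from the comment:

```python
def tls_tax(tls_rps: float, plain_rps: float) -> float:
    """Fraction of plaintext throughput lost when serving over TLS."""
    return 1 - tls_rps / plain_rps


userspace_rps = 512_243
ktls_rps = 1_502_783
plain_rps = 2_150_000      # "plain json is 2.15M" (rounded in the comment)
zix_rps = 1_900_000        # io_uring leader quoted above

print(f"speedup: {ktls_rps / userspace_rps:.2f}x")               # speedup: 2.93x
print(f"userspace tax: {tls_tax(userspace_rps, plain_rps):.0%}")  # userspace tax: 76%
print(f"kTLS tax: {tls_tax(ktls_rps, plain_rps):.0%}")            # kTLS tax: 30%
print(f"gap to zix: {1 - ktls_rps / zix_rps:.0%}")                # gap to zix: 21%
```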
Behavior-neutral cleanup of the kTLS lib commit: extract a ktls_recv helper (symmetry with ktls_send) and use a single goto-cleanup in vtls_enable_ktls that scrubs all userspace key material on every path. Same kTLS behavior that benchmarked at 1.50M rps json-tls; re-pinned so the benchmarked code == the final reviewed code. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
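The single-cleanup pattern maps onto `try`/`finally` in Python. The sketch below is an analogy to the C `goto cleanup` described in this commit, with hypothetical `derive` and `install` callbacks; Python cannot guarantee that no other copies of the key bytes exist, so it only illustrates the control flow.

```python
def scrub(buf: bytearray) -> None:
    """Overwrite a mutable buffer with zeros in place."""
    buf[:] = bytes(len(buf))


def enable_offload(traffic_secret: bytearray, derive, install) -> bool:
    """Derive keys, try to install them, and scrub on every exit path.

    derive(secret) -> (key, iv) as bytearrays; install(key, iv) raises
    OSError when the kernel offload is unavailable.
    """
    key, iv = bytearray(), bytearray()
    try:
        key, iv = derive(traffic_secret)
        install(key, iv)
        return True
    except OSError:
        return False
    finally:
        # One cleanup point, reached on success and on failure.
        for buf in (traffic_secret, key, iv):
            scrub(buf)
```

Routing every exit through one cleanup block means a newly added early return cannot skip the scrub.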
…2AV#79) kTLS record offload landed on the vanilla lib main via enghitalo/vanilla#79, so re-point the pin from the (now-merged) feature commit to main @5137a9a. Same code that benchmarked json-tls at 1.50M rps (2.93x over userspace). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
/benchmark -f vanilla-epoll --save
What
Wires the json-tls profile for `vanilla-epoll`: the existing `/json/{count}?m={m}` endpoint served over HTTP/1.1 + TLS 1.3 on `:8081`. This was the only gap keeping vanilla off the json-tls board: the JSON path already existed and is correct; this PR adds the TLS transport and declares the test.

The JSON serializer is reused verbatim, so json-tls responses are byte-for-byte identical to the plaintext `/json` (verified).

How
main.v
- Extract a transport-agnostic `write_json_into(ro, mut out, count, m)`; the plaintext `WorkerCtx` method now delegates (no behavior change). Content-Length is precomputed from the same values the body emits, so the framed length can never desync (no response-splitting surface).
- Add a stateless TLS `request_handler` that serves only `/json` (404 elsewhere), keeping the TLS port's surface minimal. It captures the read-only `ro`; no `make_state` (sidesteps the TLS worker's stateful path).
- Start a second `http_server` on `TLS_PORT` (env, default `8081`) with `tls_config` from `/certs/server.{crt,key}` (`TLS_CERT`/`TLS_KEY` overridable). Fails loud if the cert is present but the key is missing; self-signs only when no cert is mounted.

Dockerfile
- Build Mbed TLS 4.1.0 from the pinned release tarball and compile with `-d vanilla_tls`. Runtime carries the shared libs (incl. `libtfpsacrypto`, the 4.x TF-PSA-Crypto split) + `ldconfig`. `EXPOSE 8081`.
- Thread-safety (load-bearing): enable `MBEDTLS_THREADING_C` + `MBEDTLS_THREADING_PTHREAD`. vanilla runs N TLS worker threads, all driving TLS 1.3 handshakes through Mbed TLS's global PSA key store; without the threading mutex the concurrent handshakes race on key slots, causing a heap-use-after-free under load (ASan: `psa_wipe_key_slot` frees a slot another thread is mid-`memcpy` in `psa_hmac_setup`; crashes ~c1024 with `double free or corruption`). With it enabled, the load that 100%-crashed now survives clean.

meta.json: add `"json-tls"` to `tests`.

Validation
Built the image and ran it with the repo cert bind-mounted at `/certs`:
- All 3 `validate.sh` json-tls assertions pass: protocol negotiation (HTTP/1.1, ALPN `http/1.1` over TLS 1.3), response schema + `total == price*quantity*m` on `7:2 / 23:11 / 50:1`, `Content-Type: application/json`.
- Byte-for-byte parity: json-tls `/json` output identical to plaintext `/json`.
- Keep-alive (`num_connects=1`, connection reused), the profile's defining property.

🤖 Generated with Claude Code