
perf(vanilla-io_uring): serve /static via static_assets (precompression + borrowed send)#951

Open
enghitalo wants to merge 2 commits into MDA2AV:main from enghitalo:perf/vanilla-io_uring-static-borrowed-send

@enghitalo commented Jun 29, 2026

Contributor

What changed since the original

The original PR did a per-entry core.queue_buf borrowed send for each (identity-only) preloaded asset. This version supersedes it: /static now goes through the lib's audited static_assets module, mounted at /static/ via url_prefix (enghitalo/vanilla#80). The module both negotiates the precompressed .br/.gz sibling per Accept-Encoding AND emits small assets via core.queue_buf borrowed send (the #75 mechanism, folded into the module in enghitalo/vanilla#81). The result is one audited static path, shared with vanilla-epoll (#954).
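To make "negotiates the precompressed sibling" concrete, here is a minimal Python sketch of the lookup. It is not the static_assets API (that module is V and its internals aren't shown in this PR); `select_variant`, `parse_accept_encoding`, and the `assets` dict are hypothetical names. Assumptions: preloaded files are keyed by name with optional `.br`/`.gz` siblings, br wins ties, and identity is always an acceptable fallback.

```python
from __future__ import annotations

# Hypothetical sketch only: these names are NOT the static_assets API.


def parse_accept_encoding(header: str) -> dict[str, float]:
    """Parse e.g. 'br;q=1, gzip;q=0.8' into {'br': 1.0, 'gzip': 0.8}."""
    prefs: dict[str, float] = {}
    for part in header.split(","):
        token, _, params = part.partition(";")
        token = token.strip().lower()
        if not token:
            continue
        q = 1.0  # RFC 9110: a coding without a q parameter has weight 1
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs[token] = q
    return prefs


def select_variant(path: str, accept_encoding: str, assets: dict[str, bytes],
                   prefix: str = "/static/") -> tuple[bytes, dict[str, str]] | None:
    """Map a request path under `prefix` to a preloaded body plus response headers."""
    if not path.startswith(prefix):
        return None
    key = path[len(prefix):]
    if not key or ".." in key.split("/") or key not in assets:
        return None  # unknown asset or path-traversal attempt
    prefs = parse_accept_encoding(accept_encoding)
    best: tuple[float, str, bytes] | None = None
    for coding, suffix in (("br", ".br"), ("gzip", ".gz")):  # br listed first, so it wins ties
        q = prefs.get(coding, prefs.get("*", 0.0))
        sibling = assets.get(key + suffix)
        if q > 0 and sibling is not None and (best is None or q > best[0]):
            best = (q, coding, sibling)
    headers = {"Vary": "Accept-Encoding"}  # the body depends on Accept-Encoding
    if best is None:
        return assets[key], headers  # identity fallback
    headers["Content-Encoding"] = best[1]
    return best[2], headers
```

With the static profile's `br;q=1, gzip;q=0.8`, a request for /static/app.js resolves to the app.js.br sibling with `Content-Encoding: br`. A fuller version would also honor an explicit `identity;q=0` and emit per-variant ETags; both are omitted here.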

Why

The old /static handler preloaded only the identity files (skipping the .br/.gz siblings) and ignored Accept-Encoding, so it shipped the uncompressed body even though the static profile sends Accept-Encoding: br;q=1, gzip;q=0.8. Two wins now instead of one:

  1. Precompression: ships the ~4× smaller .br body. This is the dominant fix; vanilla-io_uring currently collapses on static (~3.5K rps / 213 MB at static-6800).
  2. Borrowed direct send: the worker sends the preloaded, immutable bytes directly (core.queue_buf) instead of copying them through the per-connection write buffer (the original PR's win, kept).
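A minimal Python model of the borrowed-send-with-fallback behavior, following the vanilla commit message quoted later in this thread (queue_buf returns false on backends that can't borrow-send, and the response is then copied into the write buffer). `Connection`, `queue_buf`, and `emit` here are hypothetical stand-ins, not vanilla's core API; a `memoryview` plays the role of a borrowed pointer.

```python
from __future__ import annotations

# Hypothetical model: Connection, queue_buf and emit are illustrative names,
# not vanilla's core API.


class Connection:
    def __init__(self, can_borrow: bool) -> None:
        self.can_borrow = can_borrow             # e.g. io_uring: True; epoll/TLS: False
        self.write_buf = bytearray()             # per-connection write buffer (copy path)
        self.borrowed: list[memoryview] = []     # views the worker sends directly

    def queue_buf(self, data: bytes) -> bool:
        """Queue a zero-copy view of `data`; return False if borrowing is unsupported."""
        if not self.can_borrow:
            return False
        self.borrowed.append(memoryview(data))   # no copy: the view references `data`
        return True


def emit(conn: Connection, response: bytes) -> None:
    # Safe only because preloaded responses are never mutated or freed while serving.
    if not conn.queue_buf(response):
        conn.write_buf += response               # fallback: copy, as before
```

The design point is that the borrowed path is only sound because the preloaded bytes live for the server's lifetime; a per-request buffer could not be lent this way.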

Drops the hand-rolled StaticFile / static_response / content_type.

Verification

Built cleanly (v -prod). The io_uring + static_assets + queue_buf path was verified DB-free on the io_uring backend (the entry itself needs Postgres at boot, so this used a minimal harness on the same lib): /static/app.js with Accept-Encoding: br → Content-Encoding: br, 47,275 B (vs 204,800 B identity); components.css.br delivered byte-exact (33,268 B). For the epoll sibling, the same module gives +83% rps / −52% bandwidth in a 2-core A/B (#954); io_uring should see a larger jump because it starts from the broken-static baseline.
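A quick arithmetic check that the verification numbers back the "~4× smaller" claim, using only the byte counts reported above:

```python
identity_bytes = 204_800  # /static/app.js, identity body (from the verification above)
br_bytes = 47_275         # same asset served with Content-Encoding: br

ratio = identity_bytes / br_bytes
saved = 1 - br_bytes / identity_bytes
print(f"{ratio:.2f}x smaller, {saved:.1%} fewer body bytes")  # 4.33x smaller, 76.9% fewer body bytes
```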

Pin

Bumps the vanilla pin to main + #80 (url_prefix) + #81 (queue_buf emit). Depends on both, which were opened against enghitalo/vanilla: #80 is already merged, #81 is still open. The Dockerfile pins the #81 branch commit so CI builds today; the pin moves to vanilla main once #81 merges.

🤖 Generated with Claude Code

@enghitalo enghitalo marked this pull request as ready for review June 29, 2026 02:24
@enghitalo enghitalo force-pushed the perf/vanilla-io_uring-static-borrowed-send branch from 293a86a to c060506 Compare June 29, 2026 02:24
@enghitalo enghitalo changed the title [DRAFT/blocked] perf(vanilla-io_uring): serve static via queue_buf borrowed send perf(vanilla-io_uring): serve static via queue_buf borrowed send Jun 29, 2026
@enghitalo

Contributor Author

/benchmark -f vanilla-io_uring

@github-actions

Contributor

👋 /benchmark request received. A collaborator will review and approve the run.

@github-actions

Contributor

Benchmark Results

Framework: vanilla-io_uring | Test: all tests

| Test | Conn | RPS | CPU | Mem | Δ RPS | Δ Mem |
|---|---:|---:|---:|---:|---:|---:|
| limited-conn | 4096 | 2,239,353 | 5030.2% | 1.5GiB | -0.5% | +7.1% |
| json | 4096 | 2,341,764 | 6550.1% | 1.3GiB | +3.1% | +8.3% |
| json-comp | 512 | 2,154,861 | 6054.4% | 1.0GiB | +0.9% | +8.7% |
| json-comp | 4096 | 2,881,726 | 6527.7% | 1.3GiB | +5.2% | ~0% |
| json-comp | 16384 | 2,759,968 | 6186.5% | 1.8GiB | +37.1% | ~0% |
| upload | 32 | 2,609 | 1823.8% | 1.4GiB | -1.0% | +27.3% |
| upload | 256 | 3,068 | 3618.3% | 1.2GiB | +2.5% | +9.1% |
| api-4 | 256 | 28,703 | 355.0% | 2.1GiB | -0.9% | +10.5% |
| api-16 | 1024 | 28,341 | 1489.6% | 2.3GiB | ~0% | +9.5% |
| static | 1024 | 456,287 | 4091.6% | 1.0GiB | +80.9% | -28.6% |
| static | 4096 | 432,376 | 6296.6% | 1.2GiB | +7032.6% | -29.4% |
| static | 6800 | 418,648 | 6384.2% | 1.3GiB | +11749.6% | -27.8% |
| async-db | 1024 | 10,562 | 5042.7% | 1.7GiB | -1.1% | +13.3% |
| crud | 4096 | 222,266 | 1443.6% | 1.8GiB | -1.1% | ~0% |
| fortunes | 1024 | 17 | 1439.7% | 1.7GiB | -58.5% | +41.7% |
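Assuming Δ RPS is the relative change against the previously recorded run for the same profile (the bot output doesn't define it), the pre-PR static throughput can be backed out from the table. This is a derived estimate, not a separate measurement:

```python
def baseline(new: float, delta_pct: float) -> float:
    """Invert delta_pct = (new - old) / old * 100 to recover the earlier value."""
    return new / (1 + delta_pct / 100)


# (connections, RPS after this PR, Δ RPS %) for the static rows in the benchmark table
static_rows = [(1024, 456_287, 80.9), (4096, 432_376, 7032.6), (6800, 418_648, 11749.6)]
for conns, rps, delta in static_rows:
    print(f"static-{conns}: ~{baseline(rps, delta):,.0f} rps before, {rps:,} rps after")
```

For static-6800 this gives roughly 3,533 rps before the change, consistent with the ~3.5K rps quoted in the description. The Δ values are rounded to 0.1%, so the recovered baselines are approximate.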
Full log
[info] waiting for server...
[info] server ready

[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   48.53ms   17.20ms   39.10ms   171.70ms    5.00s

  1261432 requests in 15.00s, 1259960 responses
  Throughput: 83.98K req/s
  Bandwidth:  26.18MB/s
  Status codes: 2xx=1259960, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 1259960 / 1259960 responses (100.0%)
  Latency overflow (>5s): 4096
  Reconnects: 4317
  Per-template: 59744,62391,63489,66116,65385,65822,67612,67830,67357,66465,66263,65427,64008,64273,65591,65080,57220,50694,53889,55304
  Per-template-ok: 59744,62391,63489,66116,65385,65822,67612,67830,67357,66465,66263,65427,64008,64273,65591,65080,57220,50694,53889,55304
[info] CPU 602.9% | Mem 1.5GiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   18.89ms   14.90ms   32.70ms   171.60ms   204.90ms

  3232140 requests in 15.00s, 3232140 responses
  Throughput: 215.44K req/s
  Bandwidth:  68.17MB/s
  Status codes: 2xx=3232140, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 3232138 / 3232140 responses (100.0%)
  Reconnects: 14168
  Per-template: 147026,151021,158411,163600,164874,165470,167613,165014,164977,168927,167840,166164,165470,166739,168714,168889,163135,154821,148670,144763
  Per-template-ok: 147026,151021,158411,163600,164874,165470,167613,165014,164977,168927,167840,166164,165470,166739,168714,168889,163135,154821,148670,144763
[info] CPU 1352.9% | Mem 1.7GiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  200
  Templates: 20
  Expected:  200
  Duration:  15s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   18.37ms   14.60ms   32.30ms   179.20ms   215.50ms

  3333995 requests in 15.00s, 3333995 responses
  Throughput: 222.23K req/s
  Bandwidth:  70.30MB/s
  Status codes: 2xx=3333995, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 3333985 / 3333995 responses (100.0%)
  Reconnects: 14710
  Per-template: 152257,158457,162624,169377,170837,169675,168932,167936,170297,173311,172600,170331,170299,172247,169604,169729,169740,164888,156972,153872
  Per-template-ok: 152257,158457,162624,169377,170837,169675,168932,167936,170297,173311,172600,170331,170299,172247,169604,169729,169740,164888,156972,153872
[info] CPU 1443.6% | Mem 1.8GiB

=== Best: 222266 req/s (CPU: 1443.6%, Mem: 1.8GiB) ===
[info] input BW: 19.08MB/s (avg template: 90 bytes)
[info] saved results/crud/4096/vanilla-io_uring.json
httparena-bench-vanilla-io_uring
httparena-bench-vanilla-io_uring

==============================================
=== vanilla-io_uring / fortunes / 1024c (tool=gcannon) ===
==============================================
[info] resetting postgres for a clean per-profile baseline
[info] starting postgres sidecar
httparena-postgres
[info] postgres ready (seeded)
[info] waiting for server...
[info] server ready

[run 1/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency      0us      0us      0us      0us      0us

  0 requests in 5.00s, 0 responses
  Throughput: 0 req/s
  Bandwidth:  0B/s
  Status codes: 2xx=0, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 0 / 0 responses (0.0%)
[info] CPU 112.2% | Mem 817MiB

[run 2/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    3.04s    3.02s    4.70s    4.76s    4.76s

  38 requests in 5.00s, 38 responses
  Throughput: 7 req/s
  Bandwidth:  184.46KB/s
  Status codes: 2xx=38, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 38 / 38 responses (100.0%)
[info] CPU 1458.3% | Mem 1.3GiB

[run 3/3]
gcannon v0.5.3
  Target:    localhost:8080/fortunes
  Threads:   64
  Conns:     1024 (16/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    3.41s    4.11s    5.00s    5.00s    5.00s

  88 requests in 5.00s, 88 responses
  Throughput: 17 req/s
  Bandwidth:  427.16KB/s
  Status codes: 2xx=88, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 88 / 88 responses (100.0%)
  Latency overflow (>5s): 11
[info] CPU 1439.7% | Mem 1.7GiB

=== Best: 17 req/s (CPU: 1439.7%, Mem: 1.7GiB) ===
[info] saved results/fortunes/1024/vanilla-io_uring.json
httparena-bench-vanilla-io_uring
httparena-bench-vanilla-io_uring
[info] skip: vanilla-io_uring does not subscribe to baseline-h2
[info] skip: vanilla-io_uring does not subscribe to static-h2
[info] skip: vanilla-io_uring does not subscribe to baseline-h2c
[info] skip: vanilla-io_uring does not subscribe to json-h2c
[info] skip: vanilla-io_uring does not subscribe to baseline-h3
[info] skip: vanilla-io_uring does not subscribe to static-h3
[info] skip: vanilla-io_uring does not subscribe to gateway-64
[info] skip: vanilla-io_uring does not subscribe to gateway-h3
[info] skip: vanilla-io_uring does not subscribe to production-stack
[info] skip: vanilla-io_uring does not subscribe to unary-grpc
[info] skip: vanilla-io_uring does not subscribe to unary-grpc-tls
[info] skip: vanilla-io_uring does not subscribe to stream-grpc
[info] skip: vanilla-io_uring does not subscribe to stream-grpc-tls
[info] skip: vanilla-io_uring does not subscribe to echo-ws
[info] skip: vanilla-io_uring does not subscribe to echo-ws-pipeline
[info] rebuilding site/data/*.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/frameworks.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-16-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/api-4-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/async-db-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/crud-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/fortunes-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-16384.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/json-comp-512.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/limited-conn-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-1024.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-4096.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/static-6800.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-256.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/upload-32.json
[updated] /home/diogo/actions-runner/_work/HttpArena/HttpArena/site/data/current.json
[info] done
httparena-postgres
httparena-redis
[info] restoring loopback MTU to 65536

@enghitalo enghitalo marked this pull request as draft June 29, 2026 13:16
enghitalo added a commit to enghitalo/vanilla that referenced this pull request Jun 29, 2026
emit_into now hands a preloaded (small) asset's precomputed response to the worker
to send DIRECTLY (borrowed) when the backend supports it (io_uring core.queue_buf),
instead of always copying it through the per-connection write buffer. The bytes are
immutable for the server's lifetime, so borrowing is safe; queue_buf returns false
on any backend that can't borrow-send (epoll, TLS, non-Linux), where `out << response`
stays the path — so behavior is unchanged there.

This folds the io_uring static borrowed-send (the per-entry queue_buf in
MDA2AV/HttpArena#951) into the one audited static path, so both backends can serve
/static through static_assets: precompressed negotiation + ETag/Vary + sendfile(2)
for large bodies (epoll) + queue_buf borrowed send for small bodies (io_uring).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…on + borrowed send)

Supersedes the per-entry queue_buf change: /static now goes through the lib's
audited static_assets module (mounted at /static/ via url_prefix), which both
negotiates the precompressed .br/.gz sibling per Accept-Encoding AND emits small
assets via core.queue_buf borrowed send — the MDA2AV#75 mechanism, folded into the
module (enghitalo/vanilla#81) so it is the ONE static path shared with vanilla-epoll.

Two wins instead of one:
- precompression: the static profile sends `Accept-Encoding: br;q=1`, so this ships
  the ~4x smaller .br body instead of the raw file (the former identity-only map
  ignored Accept-Encoding) — the dominant fix for this profile;
- borrowed direct send: the preloaded, immutable bytes are sent by the worker
  directly (core.queue_buf), never copied through the per-connection write buffer.

Verified DB-free on the io_uring backend: /static/app.js + Accept-Encoding: br ->
Content-Encoding: br, 47,275 B; components.css.br delivered byte-exact (33,268 B).
Drops the hand-rolled StaticFile / static_response / content_type.

Bumps the vanilla pin to main + enghitalo/vanilla#80 (url_prefix) + enghitalo/vanilla#81 (queue_buf emit).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@enghitalo enghitalo force-pushed the perf/vanilla-io_uring-static-borrowed-send branch from c060506 to 2ff607a Compare June 29, 2026 14:36
@enghitalo enghitalo changed the title perf(vanilla-io_uring): serve static via queue_buf borrowed send perf(vanilla-io_uring): serve /static via static_assets (precompression + borrowed send) Jun 29, 2026
@enghitalo enghitalo marked this pull request as ready for review June 29, 2026 14:39
…V#81 merged)

Both lib deps are now on vanilla main: enghitalo/vanilla#80 (static_assets
url_prefix) and enghitalo/vanilla#81 (core.queue_buf borrowed send). Repin from the #81 branch
commit to the merged main commit. No entry change; rebuilds clean (v -prod).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@enghitalo

Contributor Author

/benchmark -f vanilla-io_uring

@github-actions

Contributor

👋 /benchmark request received. A collaborator will review and approve the run.

