
docs(benchmarks): reproducible gateway, PII, and evidence benchmarks (#119) #164

Merged: sergeyenin merged 1 commit into main from docs/benchmarks-119 on Jun 3, 2026

Conversation

Contributor

@sergeyenin sergeyenin commented Jun 3, 2026

Summary

Closes the last P1 in credibility epic #108 — reproducible benchmarks for the README "< 15 ms excluding upstream" claim.

  • BenchmarkGatewayPipelineOverhead — full ServeHTTP path with OPA policy, PII scan, local mock upstream, response scan, signed evidence write (rate limits raised for bench stability).
  • make benchmarks / scripts/run-benchmarks.sh — runs gateway + existing BenchmarkPIIScan + BenchmarkEvidenceStore, prints a markdown table with go version, OS, CPU, commit, and raw go test lines.
  • docs/reference/benchmarks.md — methodology, scope, exclusions (WAN RTT; retry/fallback, pending #138 "Provider fallback chains (error-driven, sovereignty-respecting)" and #139 "Retries with backoff (recorded as evidence fact)"), interpretation guide.
  • Links from README, LIMITATIONS, docs index, and request-lifecycle throughput section.

Local sample (Apple M1 Max, make benchmarks): gateway ~5.5 ms/req, PII ~0.08 ms/scan, evidence ~1360 writes/s — under the 15 ms budget with mock upstream.

Test plan

  • make benchmarks succeeds
  • scripts/check-claim-discipline.sh passes
  • Gateway bench no longer hits 429 rate limit

Closes #119


Note

Low Risk
Documentation and benchmark harness only; no production gateway behavior changes beyond a new test with elevated rate limits for stability.

Overview
Adds a reproducible proof-bar for the README “under 15 ms excluding upstream” pipeline claim: operators run make benchmarks (or scripts/run-benchmarks.sh) to get a markdown table of gateway overhead, PII scan latency, and evidence write throughput on their machine.

New gateway benchmark BenchmarkGatewayPipelineOverhead exercises a full non-streaming ServeHTTP path against a local httptest upstream (OPA, PII, response scan, signed evidence), with rate limits raised so the bench does not 429. The runner aggregates that benchmark with existing BenchmarkPIIScan and BenchmarkEvidenceStore, records Go/OS/CPU/commit, and dumps raw go test lines.

docs/reference/benchmarks.md documents methodology, what is in/out of scope (no WAN RTT, no retry/fallback until Epic #113), and how to interpret results. LIMITATIONS and doc indexes now point at reproducible benchmarks instead of a vague “forthcoming” note; README and the request-lifecycle doc link make benchmarks alongside the optional docker/hey load harness.

Reviewed by Cursor Bugbot for commit fa7270f.

…rks (#119)

Add BenchmarkGatewayPipelineOverhead (ServeHTTP with local mock upstream),
scripts/run-benchmarks.sh, and make benchmarks to emit a markdown table with
hardware metadata. Document methodology in docs/reference/benchmarks.md and
link from README, LIMITATIONS, and the request-lifecycle doc.

Closes #119

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.


		if w.Code != http.StatusOK {
			b.Fatalf("status %d: %s", w.Code, w.Body.String())
		}
	}

Gateway bench cost query drift

Medium Severity

BenchmarkGatewayPipelineOverhead times repeated ServeHTTP calls against one SQLite evidence store that grows every iteration. Each request runs callerCostTotals, which scans accumulating rows via CostByAgent, so measured ns/op rises during the run and overstates steady per-request overhead versus a fixed-size store.
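The drift described here can be illustrated with a small counting model. This is a sketch under simplifying assumptions: it does not touch SQLite, `scanCosts` is a hypothetical function, and bounding the store is only one possible mitigation, not necessarily the fix applied to this repository. Each simulated request writes one row and then scans every stored row.

```go
package main

import "fmt"

// scanCosts simulates `requests` sequential requests against a store.
// Each request appends one evidence row and then runs an aggregate that
// scans all stored rows. It returns the rows scanned by each request.
// If maxRows > 0, the store is recreated empty once it holds maxRows rows
// (a hypothetical way to keep the per-request scan bounded).
func scanCosts(requests, maxRows int) []int {
	rows := 0
	scanned := make([]int, requests)
	for i := range scanned {
		if maxRows > 0 && rows >= maxRows {
			rows = 0 // start over with a fresh store
		}
		rows++            // evidence write for this request
		scanned[i] = rows // aggregate scans every row currently stored
	}
	return scanned
}

func main() {
	growing := scanCosts(1000, 0)   // one store for the whole run
	bounded := scanCosts(1000, 100) // store never exceeds 100 rows

	fmt.Println("growing store: first", growing[0], "last", growing[len(growing)-1])
	fmt.Println("bounded store: first", bounded[0], "last", bounded[len(bounded)-1])
}
```

In the growing case the work per request rises linearly with the iteration index, so a mean over the run depends on how many iterations the benchmark executed; in the bounded case the per-request work stays within a fixed range.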


@sergeyenin sergeyenin merged commit 93c9d53 into main Jun 3, 2026
9 checks passed
@sergeyenin sergeyenin deleted the docs/benchmarks-119 branch June 3, 2026 11:12

Development

Successfully merging this pull request may close these issues.

Publish reproducible benchmarks (overhead, evidence throughput, PII latency)
