Bulletproof pre-publish polish: SDK fixes, server hardening, /metrics, integration tests by anfocic · Pull Request #1 · anfocic/pagetally

anfocic · 2026-05-08T12:12:17Z

Summary

Final pass before publishing the pagetally npm package. SDK gets correctness fixes; server gets the middleware stack a public-internet OSS deploy actually needs; integration tests run against a real Postgres in CI.

Client SDK (`client/`)

stripQueryParams now strips #hash in addition to ?query
auto-track listens for hashchange so hash-routed SPAs emit pageviews
TTFB is now also captured via a navigation PerformanceObserver (the synchronous read at init returned 0 if the entry wasn't ready yet)
bfcache restore (pageshow.persisted) now fires a fresh pageview, so we no longer emit a pageleave with no matching pageview
npm hygiene: sideEffects: false, engines.node >=18, publishConfig.access: public, README/LICENSE moved into prepack so npm pack reproduces the published tarball

Server hardening (`server/`)

Per-IP rate limiting via tower_governor + SmartIpKeyExtractor — /collect 60 burst / ~120 rpm, /contact 3 burst / ~5 rpm
STATS_ORIGINS env scopes CORS for /stats/*; /collect keeps Any
Security headers always on: CSP default-src 'none'; frame-ancestors 'none', X-Content-Type-Options, Referrer-Policy, X-Frame-Options. HSTS gated on BEHIND_TLS=1
15s TimeoutLayer; graceful shutdown on SIGTERM/SIGINT drains in-flight inserts
Per-field input validation on /collect (site_id ≤64, path ≤2048, referrer ≤253, event name ≤64), NaN/Inf scrubbed from performance metrics, x-country constrained to ASCII alpha
x-request-id injected if absent, propagated if the caller sends one
LOG_FORMAT=json flips tracing-subscriber to one-line JSON for log shippers
GET /metrics (axum-prometheus) for Prometheus scraping; unauthenticated by convention, README spells out the firewall expectation

Tests

12 unit tests (validation, email)
13 integration tests via #[sqlx::test] against a fresh Postgres per test — covers /health, /collect happy path + rejection paths + body-limit, /stats/* with and without admin token, response-header surface, request-id propagation, /metrics text format, pageleave clamping
46 client vitest tests pass

Observability

Structured JSON logs (LOG_FORMAT=json)
/metrics Prometheus endpoint
scripts/loadtest.sh wraps oha for quick collect-burst, collect-spread, stats-read smoke tests

Reference numbers (release build, local Postgres)

Scenario	Throughput	p99
`/collect` single-IP burst	~71k rps	<3 ms
`/stats/summary` reads	~20k rps	~5 ms

Treat as floors, not guarantees.

Test plan

cargo test — 12 unit + 13 integration green
cargo clippy --all-targets -- -D warnings clean
cargo audit clean (RUSTSEC-2023-0071 documented inert via server/.cargo/audit.toml)
npm test — 46/46
npm pack --dry-run — 13.8 KB, 9 files
Local end-to-end: server boots, /health, /metrics, /collect from SDK all verified
CI green on this branch

Client SDK fixes: - stripQueryParams now strips URL fragment (#hash) in addition to ?query, so hash-route state can't leak into stored paths - auto-track listens for hashchange so hash-routed SPAs emit pageviews - TTFB now also observed via PerformanceObserver navigation entries; the synchronous read at init returned 0 if the nav entry wasn't ready yet - Analytics fires a fresh pageview on bfcache restore (pageshow.persisted), so we no longer emit a pageleave with no matching pageview - performance.ts: hoist cleanup ahead of flush to avoid TDZ-fragile capture Package hygiene for npm: - sideEffects: false for tree-shaking - engines.node >=18 - publishConfig.access: public - move README/LICENSE copy from prepublishOnly to prepack so npm pack also reproduces a complete tarball - include README.md and LICENSE in the files allowlist alongside dist/ Docs: - README: operator hardening checklist (ADMIN_TOKEN, ALLOWED_SITES, rate limit at proxy, x-country trust, access log retention) - README: privacy notes warning consumers about path-segment PII and arbitrary track() props - env examples: revert ALLOWED_SITES to commented-out default, matching the public-OSS plug-and-play story

…isory Add per-field validation at the ingest boundary so the 16KB body limit can't be spent stuffing one giant value into an indexed column, and so malformed metric floats can't poison stats queries: - site_id: 1..=64 chars - path: 1..=2048 chars - referrer: <=253 chars - event name: 1..=64 chars - performance metrics: NaN / +-Infinity dropped (Postgres percentile_cont panics on NaN, and the values are useless either way) - x-country: require ASCII alpha, length 2 (previously any 2-byte unicode would slip through and be uppercased) Cover the new logic with unit tests in `types::tests`. Also add `server/.cargo/audit.toml` ignoring RUSTSEC-2023-0071. The `rsa` Marvin-attack advisory reaches us only via `sqlx-mysql`, which we disable at compile time (`sqlx` is built with `default-features = false` and only the `postgres` driver). `cargo audit` reads `Cargo.lock` and flags the unused crate anyway. The note documents the rationale and flags revisiting once `rsa >= 0.10` ships.

…shutdown Defense-in-depth pass. Everything ships on by default with sensible limits; operators can dial it down via env. Per-IP rate limiting (tower_governor with SmartIpKeyExtractor): - /collect: 60 burst, ~120 req/min steady — generous because SPAs fire pageleave + pageview close together on every route change - /contact: 3 burst, ~5 req/min — tight, this endpoint sends real email through Resend and is the obvious abuse target - Reads X-Forwarded-For / X-Real-IP first, falls back to TCP peer via ConnectInfo<SocketAddr>. Make sure a reverse proxy sets one of those. CORS: - /collect keeps Any origin (browsers POST from every embedding site) - /stats/* honors STATS_ORIGINS env (comma-separated list). Defaults to Any to avoid breaking existing setups, but the README now nudges operators to lock it down to their dashboard origin. Security response headers (always on): - X-Content-Type-Options: nosniff - Referrer-Policy: no-referrer - X-Frame-Options: DENY HSTS gated on BEHIND_TLS=1 so it doesn't ship on plain-HTTP dev hosts. Reliability: - 15s request timeout via TimeoutLayer — bounds tail latency on a slow Postgres so the pool can't get pinned forever. - Graceful shutdown on SIGTERM/SIGINT via with_graceful_shutdown so in-flight inserts get a chance to drain instead of being killed mid- query during a deploy. Config: - New env vars STATS_ORIGINS, BEHIND_TLS documented in README and deploy/pagetally.env.example. Tests: cargo test 12 pass, clippy --all-targets -D warnings clean, cargo audit clean (modulo the previously-documented inert RUSTSEC-2023-0071).

Integration tests (server/tests/api.rs, gated on DATABASE_URL via the sqlx::test macro that builds a fresh DB per test from migrations): - /health responds 200 ok - /collect inserts the expected row for a pageview - /collect rejects an unknown site_id when ALLOWED_SITES is set (403) - /collect rejects an oversize path (400) - /collect rejects a request that blows the 16KB body cap - /stats/* gives 401 without and with the wrong bearer token - /stats/summary returns rows once events land for the requested site - pageleave dur is server-clamped to the 30-minute ceiling - security headers (CSP, X-Frame-Options, X-Content-Type-Options, Referrer-Policy) and x-request-id are emitted on every response - when the caller already sets x-request-id, we propagate it instead of rolling a new uuid (request-correlation across reverse proxies) Middleware additions in lib.rs: - Content-Security-Policy: default-src 'none'; frame-ancestors 'none'. Every response is JSON or plain text — nothing we serve should ever load subresources or execute script, and 'frame-ancestors' restates the X-Frame-Options posture for browsers that drop the legacy header. - SetRequestIdLayer + PropagateRequestIdLayer using MakeRequestUuid so every request has a stable id in logs and the same id is echoed back to the caller. - TraceLayer customized with explicit make_span / on_response at INFO so structured access logs are predictable instead of relying on the defaults silently changing across tower-http versions. Cargo: tower-http gains the request-id and util features. tower gets its util feature for ServiceExt::oneshot in tests. Adds http-body-util as a dev-dep for body-collecting in tests. Verification: cargo test now runs 12 unit + 12 integration tests (24 total) green against a real Postgres. clippy --all-targets -D warnings clean. CI already provisions Postgres so this lights up without further config.

Logging: - LOG_FORMAT=json switches the tracing subscriber from pretty text to one-line JSON records, the form most log shippers (Loki, Vector, Datadog, CloudWatch) ingest directly. Each line carries a timestamp, level, target, message, and any structured fields (including the per-request span id). Default stays text so a `cargo run` is still human-readable. Load testing (scripts/loadtest.sh, requires oha): - collect-burst sustained POST /collect from one IP — confirms the rate-limit kicks in (lots of 429s) without buckling - collect-spread parallel runs from distinct X-Forwarded-For values to exercise the real ingest path - stats-read hammer /stats/summary, optionally with ADMIN_TOKEN Reference numbers from a release build against local Postgres are captured in the README so we can tell when a change regresses throughput by an order of magnitude. They are smoke-test floors, not throughput guarantees. README: new env vars (LOG_FORMAT, RUST_LOG) added to the configuration table and a Load testing section pointing at the script.

axum-prometheus drops a tower layer onto every route, recording the standard HTTP histogram (request count, latency, status, route label) into a process-global recorder, and serves the textfile-format snapshot on GET /metrics. Operators get latency / error rate / per-route traffic without bolting on anything else. Two router constructors so tests stay parallel-safe: - router(state) — no metrics, used by all integration tests - router_with_metrics(state) — installs the Prometheus recorder and the /metrics route. Only main.rs calls it, and exactly once per process. The recorder is global (metrics-rs design); calling pair() twice would panic. The split keeps test isolation clean without resorting to a OnceLock dance. /metrics is intentionally unauthenticated — Prometheus convention is to expose it on an internal interface and rely on the network boundary. The README spells this out so an operator who forgets to firewall it isn't surprised. Tests: 13 integration tests now (added one that asserts /metrics renders the expected Prometheus textfile).

anfocic added 6 commits May 8, 2026 12:37

anfocic merged commit 18b4e8d into master May 9, 2026
2 of 3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bulletproof pre-publish polish: SDK fixes, server hardening, /metrics, integration tests#1

Bulletproof pre-publish polish: SDK fixes, server hardening, /metrics, integration tests#1
anfocic merged 6 commits into
masterfrom
chore/oss-publish-polish

anfocic commented May 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

anfocic commented May 8, 2026

Summary

Client SDK (client/)

Server hardening (server/)

Tests

Observability

Reference numbers (release build, local Postgres)

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Client SDK (`client/`)

Server hardening (`server/`)