feat: per-request HTTP middleware via WebSocket #37

Merged
tito merged 32 commits into main from feat/middleware-websocket
Apr 23, 2026

tito commented Apr 6, 2026

Closes #36

Summary

Adds support for external middleware services that can inspect, block, or rewrite HTTP requests and responses in real time over a JSON protocol. Greyproxy handles all networking, TLS termination, and MITM certificate generation; the middleware only sees structured JSON and returns decisions.

Two transports, same wire protocol:

  • Stdio — greyproxy spawns the middleware as a child process and talks over stdin/stdout (NDJSON). No port, no second terminal, greyproxy owns the lifecycle. Recommended for local deployments.
  • WebSocket — greyproxy dials the middleware over a persistent WS connection. For shared services, remote middleware, or languages where stdio framing is awkward.

Each middleware is configured individually with --middleware-cmd '…' (stdio) or --middleware ws://… (WS). Both flags are repeatable and can be mixed freely; the YAML equivalents live under greyproxy.middlewares. Multiple middlewares cascade in declaration order: each sees the previous one's (possibly rewritten) output, and deny/block short-circuits the chain.

What's in the protocol

  • hello exchange with version negotiation (min_version/max_version overlap), declared hooks, declared filters, optional friendly name, and a max_body_bytes opt-out for large bodies.
  • Hook types: http-request (pre-upstream) and http-response (post-upstream, with the original request inlined for context).
  • Filters evaluated inside greyproxy before serialization, so non-matching traffic has zero overhead. Supported: host (glob), path (regex), method, content_type (glob), container (glob), tls, and llm — the last one piggybacks on greyproxy's built-in LLM dissector mapping (Anthropic/OpenAI/Google/OpenRouter + user-defined providers), so adding a provider in the UI takes effect on the next request with no middleware restart.
  • Decisions: allow, deny, passthrough, block, rewrite. Rewrite headers go through a hop-by-hop + credential denylist (Authorization, Cookie, Set-Cookie, Host, Connection, Transfer-Encoding, …) so a buggy or compromised middleware cannot silently escalate auth or reroute requests.
  • Fail-closed by default: a missing/timed-out/crashed middleware causes the request to be denied (403) or the response blocked (502). Operators opt in to passthrough by setting on_disconnect: allow per middleware (recommended for observation-only middlewares).
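
For orientation, a hypothetical Go mirror of the hello reply described above. The key names come from this description and the test plan (name, min_version/max_version, max_body_bytes, filters and the filter keys); the `type` and `hooks` keys, the struct layout and the Go types are assumptions, and docs/middleware.md remains the reference for the real wire format. Rejecting any non-hello type mirrors the hello type validation fix described in the review commit.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HookFilter uses the filter keys named in the PR. Types and JSON tags are
// assumptions; nil pointers mean "don't care".
type HookFilter struct {
	Host        string `json:"host,omitempty"`         // glob
	Path        string `json:"path,omitempty"`         // regex
	Method      string `json:"method,omitempty"`       // exact match
	ContentType string `json:"content_type,omitempty"` // glob
	Container   string `json:"container,omitempty"`    // glob
	TLS         *bool  `json:"tls,omitempty"`
	LLM         *bool  `json:"llm,omitempty"`
}

// HookSpec declares one hook: "http-request" or "http-response".
type HookSpec struct {
	Type    string     `json:"type"`
	Filters HookFilter `json:"filters"`
}

// HelloMsg is a hypothetical mirror of the middleware's hello reply.
type HelloMsg struct {
	Type         string     `json:"type"`
	Name         string     `json:"name,omitempty"`
	MinVersion   int        `json:"min_version,omitempty"`
	MaxVersion   int        `json:"max_version,omitempty"`
	Hooks        []HookSpec `json:"hooks"`
	MaxBodyBytes int64      `json:"max_body_bytes,omitempty"`
}

// parseHello decodes a hello reply and refuses any other message type,
// so a wrong reply can never mark the connection ready.
func parseHello(data []byte) (HelloMsg, error) {
	var h HelloMsg
	if err := json.Unmarshal(data, &h); err != nil {
		return HelloMsg{}, fmt.Errorf("decode hello: %w", err)
	}
	if h.Type != "hello" {
		return HelloMsg{}, fmt.Errorf("expected hello, got %q", h.Type)
	}
	return h, nil
}

func main() {
	raw := []byte(`{"type":"hello","name":"secret-scanner","min_version":1,"max_version":1,
		"hooks":[{"type":"http-request","filters":{"llm":true}}]}`)
	h, err := parseHello(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(h.Name, h.Hooks[0].Type, *h.Hooks[0].Filters.LLM) // secret-scanner http-request true
}
```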

Hook points

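Four hooks (request and response, for plain HTTP and for MITM), wired through the proxy pipeline at these call sites:

  • Plain HTTP request/response in the HTTP/1.1 handler
  • MITM request (Step 1.5) and response (Step 4a) in the sniffer
  • HTTP/2 path on the MITM pipeline (h2Handler), which calls the same MITM request/response hooks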

Response bodies are decompressed before being sent to the middleware, and Content-Encoding is stripped on rewrite so the client does not try to decompress a rewritten body that is now plaintext.

UI

  • New Settings → Middlewares tab listing every configured middleware (kind badge: ws / stdio, name, hooks, filters, last connect status).
  • New /api/middlewares endpoint backing it.
  • Activity rows now show a per-middleware badge (friendly name from hello, glyph encodes the action) for any request/response touched by a middleware. Plain HTTP transactions are recorded in Activity the same way MITM ones already were, so middleware effects are visible regardless of transport.

Reliability

  • Reconnect with exponential backoff (100 ms → 2 s, ±20 % jitter) on both transports.
  • A connection that stayed up ≥ 5 s before dropping resets the backoff, so a middleware that restarts after a healthy run reconnects within a few hundred ms instead of inheriting the capped delay from earlier failures.
  • Stdio child runs in its own process group; on respawn or proxy exit the whole subtree is killed, including grandchildren spawned by wrappers like uv run.
  • Default per-message timeout 10 s (was 2 s) — generous enough for middlewares that call out to an LLM/scanner; operators can lower it per middleware in YAML.

Examples

Seven Python examples under examples/, each a single file runnable with uv run middleware.py. All use a small shared helper (examples/_lib/greyproxy_middleware.py) that auto-detects the transport from GREYPROXY_TRANSPORT, so the same source runs unchanged under either stdio or WS.

| Example | What it does |
|---|---|
| middleware-passthrough-py | Logs and allows everything. Copy as a template. |
| middleware-command-stripper-py | Strips dangerous shell commands from LLM responses. |
| middleware-pii-redactor-py | Bidirectional PII redaction; the upstream LLM never sees real PII. |
| middleware-secret-scanner-py | Blocks outbound requests containing leaked secrets. |
| middleware-cost-tracker-py | Parses OpenAI/Anthropic token usage and logs cumulative spend. |
| middleware-audit-log-py | Writes a structured JSONL audit trail of every request/response. |
| middleware-rtk-compress-py | Compresses noisy tool_result output via rtk to save context-window tokens. |

Documentation

  • New docs/middleware.md covering both transports, the full wire protocol, version negotiation, filters, decision shapes, body handling, the three timeouts, the rewrite header denylist, and how to write a middleware in any language.
  • Settings → Middlewares tab includes an inline infobox pointing at the docs.

Test plan

  • go build ./... passes
  • go test ./... passes (existing tests unaffected)
  • Manual (stdio): `greyproxy serve --middleware-cmd 'uv run examples/middleware-passthrough-py/middleware.py'` — requests flow through and are logged with a name badge in Activity
  • Manual (ws): start passthrough in one terminal, `greyproxy serve --middleware ws://localhost:9000/middleware` in another — same behaviour
  • Manual (cascade): mix `--middleware-cmd` and `--middleware` — declaration order respected, deny short-circuits
  • Manual (fail-closed): kill the middleware mid-flight with default `on_disconnect` — request rejected with 403; flip to `on_disconnect: allow` — request flows through
  • Manual (rewrite denylist): middleware tries to set `Authorization` on rewrite — header dropped, warning logged
  • Manual (llm filter): middleware declares `filters: { llm: true }` — only LLM traffic dispatched; add a user-defined provider in the UI — next request matches without middleware restart
  • Manual: `--middleware http://…` normalised to `ws://…`

tito closed this Apr 8, 2026
tito reopened this Apr 9, 2026
tito added 28 commits April 14, 2026 11:44

Add support for an external middleware service that can inspect, block,
or rewrite HTTP requests and responses in real time over a persistent
WebSocket connection using a JSON protocol.

New middleware package (internal/greyproxy/middleware/):
- types.go: wire message types (Hello, HookSpec, Decision, etc.)
- client.go: WebSocket client with multiplexing, reconnect, hello exchange
- filter.go: filter evaluation (glob, regex, exact match) with compiled cache

New hook points (internal/gostx/):
- proxy_hook.go: GlobalProxyRequestHook, GlobalProxyResponseHook (plain HTTP)
- mitm_hook.go: GlobalMitmRequestMiddlewareHook (Step 1.5),
  GlobalMitmResponseHook (Step 4a) for the MITM pipeline

Call sites:
- handler.go: request hook before RoundTrip, response hook after
- sniffer.go: request hook after hold hook / before credential substituter,
  response hook before writing to client / before round-trip hook

Wire-up:
- main.go: --middleware CLI flag with http->ws URL normalization
- program.go: config loading, client startup, all 4 hook registrations
- config.go: MiddlewareConfig struct
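
A minimal sketch of the http->ws normalisation in main.go, assuming it only rewrites the URL scheme. The function name is hypothetical, and mapping https:// to wss:// and rejecting other schemes are assumptions of this sketch rather than confirmed behaviour.

```go
package main

import (
	"fmt"
	"net/url"
)

// normalizeMiddlewareURL accepts ws(s):// as-is and maps http:// to ws://.
// The https -> wss mapping and the error for other schemes are assumptions.
func normalizeMiddlewareURL(raw string) (string, error) {
	u, err := url.Parse(raw) // url.Parse lower-cases the scheme
	if err != nil {
		return "", err
	}
	switch u.Scheme {
	case "ws", "wss":
		// already a WebSocket URL
	case "http":
		u.Scheme = "ws"
	case "https":
		u.Scheme = "wss"
	default:
		return "", fmt.Errorf("unsupported middleware URL scheme %q", u.Scheme)
	}
	return u.String(), nil
}

func main() {
	s, err := normalizeMiddlewareURL("http://localhost:9000/middleware")
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // ws://localhost:9000/middleware
}
```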

Includes docs/middleware.md and six Python example middlewares (run with uv run):
passthrough, command stripper, PII redactor, secret scanner,
cost tracker, audit log.

The h2Handler.ServeHTTP (HTTP/2 code path) was missing middleware
request and response hooks. When clients connect via SOCKS5 to HTTPS
endpoints, the decrypted stream uses HTTP/2 and routes through
h2Handler, which only had the observability hook but not the middleware
hooks (GlobalMitmRequestMiddlewareHook, GlobalMitmResponseHook).
This meant WebSocket middleware never received http-response messages
for HTTP/2 traffic.

When the middleware rewrites a response body (e.g., stripping dangerous
commands), the new body is uncompressed plaintext. If the original
response had Content-Encoding (zstd, gzip, etc.), the header was
preserved, causing the client to fail decompression on the uncompressed
body. Now Content-Encoding and Transfer-Encoding are removed on rewrite.

The proxy now decompresses gzip/deflate/zstd response bodies before
sending them to the middleware WebSocket. Without this, compressed
responses (common with HTTP/2 clients like Node.js) were sent as raw
bytes that the middleware couldn't inspect, so pattern matching (e.g.,
dangerous command stripping) silently failed on compressed content.

Lets a middleware subscribe to LLM traffic with {"llm": true} in its
hello filter instead of duplicating greyproxy's host/path→decoder map.
The proxy evaluates EndpointRegistry.Match() on every hook invocation
(no caching, so a rule toggled in the UI takes effect on the next
request) and passes isLLM into MatchesFilter. Disabled registry rules
naturally do not match because Match() returns "" for them.

  nil   = no LLM gating
  true  = only requests the endpoint registry resolves to a decoder
  false = only non-LLM traffic
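
A small sketch of how the tri-state gate and the registry lookup fit together. The registry here is a toy stand-in (exact host, path prefix, enabled flag), not greyproxy's EndpointRegistry; only the nil/true/false semantics and the "disabled rules return an empty match" behaviour come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// endpointRule is a toy stand-in for an endpoint registry entry.
type endpointRule struct {
	Host, PathPrefix, Decoder string
	Enabled                   bool
}

type endpointRegistry []endpointRule

// Match returns the decoder name, or "" when nothing matches. Disabled
// rules are skipped, so they never make a request count as LLM traffic.
func (r endpointRegistry) Match(host, path string) string {
	for _, rule := range r {
		if rule.Enabled && rule.Host == host && strings.HasPrefix(path, rule.PathPrefix) {
			return rule.Decoder
		}
	}
	return ""
}

// llmGate applies the tri-state filter: nil = no gating, true = LLM only,
// false = non-LLM only. isLLM is recomputed per request, never cached.
func llmGate(filter *bool, isLLM bool) bool {
	return filter == nil || *filter == isLLM
}

func main() {
	reg := endpointRegistry{
		{Host: "api.anthropic.com", PathPrefix: "/v1/messages", Decoder: "anthropic", Enabled: true},
		{Host: "api.openai.com", PathPrefix: "/v1/", Decoder: "openai", Enabled: false},
	}
	onlyLLM := true
	for _, req := range [][2]string{{"api.anthropic.com", "/v1/messages"}, {"api.openai.com", "/v1/chat/completions"}} {
		isLLM := reg.Match(req[0], req[1]) != ""
		fmt.Println(req[0], llmGate(&onlyLLM, isLLM))
	}
}
```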

--middleware is now repeatable and the YAML key becomes `middlewares:`
(list). Multiple middlewares run sequentially: each sees the previous
one's (possibly rewritten) output as its input, and deny/block
short-circuits the chain.

Per hook type the wiring builds an ordered list of (client, filters)
pairs, and each global hook iterates it:
- allow/passthrough continues the cascade
- rewrite mutates working state so the next step sees the new version
- deny/block stops the chain and returns immediately
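
The iteration rule can be sketched as a single loop. Decision, Request and Step are trimmed, hypothetical shapes (filters are assumed to have been applied before a step is called), and the header denylist is deliberately left out of this sketch.

```go
package main

import "fmt"

// Decision is a trimmed, hypothetical decision shape.
type Decision struct {
	Action  string            // allow, passthrough, deny, block, rewrite
	Headers map[string]string // rewrite: headers to set
	Body    []byte            // rewrite: replacement body, nil = unchanged
}

// Request is the working state that flows through the cascade.
type Request struct {
	Headers map[string]string
	Body    []byte
}

// Step stands for one (client, filters) pair that already matched.
type Step func(Request) Decision

// runCascade runs steps in declaration order. Rewrites update the working
// request (its header map is mutated in place) so later steps see them;
// deny/block return immediately. Unknown actions fall through like allow.
func runCascade(req Request, steps []Step) (Request, Decision) {
	final := Decision{Action: "allow"}
	for _, step := range steps {
		d := step(req)
		switch d.Action {
		case "deny", "block":
			return req, d
		case "rewrite":
			for k, v := range d.Headers { // real code filters these first
				req.Headers[k] = v
			}
			if d.Body != nil {
				req.Body = d.Body
			}
			final.Action = "rewrite"
		}
	}
	return req, final
}

func main() {
	redact := func(r Request) Decision {
		return Decision{Action: "rewrite", Body: []byte("card: [REDACTED]")}
	}
	audit := func(r Request) Decision {
		fmt.Printf("audit sees %q\n", r.Body)
		return Decision{Action: "allow"}
	}
	in := Request{Headers: map[string]string{}, Body: []byte("card: 4111 1111 1111 1111")}
	out, d := runCascade(in, []Step{redact, audit})
	fmt.Println(d.Action, string(out.Body))
}
```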

Request-side cascades mutate req in place (plain HTTP and MITM).
Response-side cascades track a working (status, headers, body) tuple
and flush it back via the returned decision, since the MITM response
hook receives its info struct by value and the plain HTTP response
path applies rewrites through the decision struct.

Decision gains a Tags map[string]any field (structlog-style) so a
middleware can emit per-request metadata on any action. Tags are
preserved per middleware with no cross-middleware merging, which
matters for the upcoming Activity integration.

Non-trivial middleware decisions (deny, block, rewrite, or silent
allow/passthrough carrying structured tags) from the MITM response
cascade now surface in the Activity view. Scope is intentionally
limited to MITM response for this first pass; request-side denies
and the plain-HTTP path can follow.

Correlation is the key challenge: the middleware response hook and
the round-trip persistence hook both fire for the same request but
are separate callbacks with no shared key. A short request id is now
generated once at the top of the sniffer's httpRoundTrip() and
threaded through HTTPRoundTripInfo → MitmRoundTripInfo so both hooks
see the same id. The middleware cascade stashes row-worthy events in
a per-process map keyed by request id; the persistence hook drains
and writes them to middleware_events with the freshly-created
transaction id. A TTL sweeper reaps orphan buckets if a request never
reaches the persistence hook.

Schema (migration 13): middleware_events with composite
(transaction_kind, transaction_id) index for the cheap join. Only
mutating actions or tag-emitting decisions produce a row, so silent
middlewares stay invisible and don't bloat the table.

Activity rendering: QueryActivity does one extra query per page to
load events for the fetched rows and attach them to ActivityItem;
the activity table shows per-event badges next to the URL and lists
them grouped by middleware in the detail panel with diff summaries,
durations, and raw tags.

Plain-HTTP traffic (non-TLS upstreams, local dev servers, HTTP-only LLM
endpoints) was invisible in Activity: only the TCP connection rows were
logged and nothing ever populated http_transactions for this path. The
middleware cascade already ran on these requests and could deny or
rewrite them, but since no transaction row existed, the new Phase C
Activity integration had nothing to attach middleware events to.

This adds the symmetric piece:

- New ProxyRoundTripInfo + GlobalProxyRoundTripHook in gostx, fired
  after the response has been written back to the client. Body capture
  is gated on the hook being set, so the default path has no new
  overhead when nobody consumes it.

- The plain-HTTP handler generates a RequestID once at the top of
  proxyRoundTrip() and stores it in ctx via gostx.WithRequestID. Both
  the middleware cascade hooks and the round-trip hook read it from
  ctx to correlate decisions with the transaction row.

- The plain-HTTP request and response middleware cascades in program.go
  now stash middleware_events (deny/block/rewrite and tagged
  allow/passthrough) under the RequestID, matching the MITM response
  cascade behavior.

- program.go installs GlobalProxyRoundTripHook unconditionally. It
  calls CreateHttpTransaction with the captured request/response data,
  drains any stashed middleware events and writes them, then publishes
  EventTransactionNew so the Activity live feed updates.

Net effect: a plain HTTP request through greyproxy now produces an
http_transactions row, shows up in Activity with method/url/status/
duration, and surfaces middleware event badges identically to a
MITM-intercepted HTTPS request.

A middleware can now return an optional "name" field in its hello
response. The proxy stores it alongside the URL and displays it in
the Activity view instead of the raw ws:// URL, which was unreadable
once more than one middleware was in the cascade.

- middleware.HelloMsg gains Name; the Client stores it and exposes
  Name() alongside HookSpecs() / MaxBodyBytes().
- program.go captures it into clientHook and threads it through every
  stash site (plain HTTP request/response + MITM response cascades).
- Migration 14 adds middleware_events.middleware_name as a nullable
  column. WriteMiddlewareEvent writes it; LoadMiddlewareEventsForActivity
  reads it.
- MiddlewareEventSummary.DisplayLabel() prefers the name and falls back
  to the URL, so middlewares that didn't upgrade still render sensibly.
- Activity UI shows the friendly label in the row badge ("rtk-compress:
  rewrite") and in the detail panel, keeping the URL as a tooltip and
  as parenthetical text next to the name.
- rtk-compress example declares name: "rtk-compress".
- docs/middleware.md describes the field as optional-but-recommended.

Follows up the middleware-name work. The activity row badge now shows
the friendly name declared in the middleware's hello (or the URL as
fallback), so a cascade with multiple middlewares reads as
"✎ rtk-compress" rather than "rewrite". The action type is encoded
redundantly via both the background color and a small unicode glyph:

  ✗ red   -- deny / block
  ✎ amber -- rewrite
  ♯ blue  -- tagged-allow / tagged-passthrough

The action text moves into the tooltip ("rewrite by rtk-compress
(ws://localhost:9000/middleware)") where it is still discoverable but
does not crowd the row.

User feedback: the ws:// URL is noise in the UI. The badge tooltip no
longer includes it, and the detail panel no longer appends it in
parentheses after the name. The middleware URL is still stored in
middleware_events.middleware_url for provenance (API / debugging),
but the Activity view shows only the friendly name.

Events whose middleware did not declare a name still fall back to the
URL as the display label (via MiddlewareEventSummary.DisplayLabel),
so anonymous middlewares remain identifiable.

The destination td had a blanket `truncate` class: overflow hidden,
single line, ellipsis. Fine when the td contained only the URL, but
the middleware event badges appended after the URL were getting
clipped by overflow:hidden whenever the URL pushed the row width to
the column edge.

Restructure the HTTP branch as a flex container: credentials icon
(shrink-0) + URL (truncate, min-w-0) + middleware badges (shrink-0).
The URL shrinks with ellipsis, the badges stay visible on the right.
Connection branch gets its own inner truncate div so the existing
host text keeps its truncation behavior unchanged.

Two related gaps surfaced once the rtk-compress middleware was run
against real Claude Code HTTPS traffic to api.anthropic.com:

1. The MITM request middleware hook rewrote bodies successfully but
   never stashed middleware_events rows, so no badge ever showed in
   Activity for HTTPS traffic. Plain HTTP (request + response) and
   MITM response cascades stashed; only MITM request didn't.

2. The MITM path only carried the RequestID inside HTTPRoundTripInfo,
   not via ctx. The MITM request hook receives `(ctx, req, container)`
   with no info struct, so it had no way to read the id that Phase C
   assumes is the correlation key.

Both are fixed by:

- Moving NewRequestID / WithRequestID / RequestIDFromContext down into
  the sniffing package so both the sniffer (generator) and gostx
  (consumer) share the same unexported ctx key. gostx/proxy_hook.go
  becomes a thin wrapper that delegates to sniffing. The handler/http
  and cmd/greyproxy call sites keep using gostx.* unchanged.

- Sniffer's httpRoundTrip() now also writes the id to ctx immediately
  after generating it, so every downstream hook (hold, middleware
  request, middleware response, round-trip persist) sees the same id
  via RequestIDFromContext.

- MITM request cascade in program.go now stashes events for deny /
  rewrite / tagged-allow / tagged-passthrough under the RequestID,
  matching the three other cascades. The MITM response persist hook
  already drains pending events after CreateHttpTransaction, so these
  new events land on the same transaction row.

- The rtk-compress example middleware was also updated as part of the
  same investigation: LOG_CMD narrowed back to commands that produce
  severity-tagged output (tail/journalctl/dmesg/less/more/*.log),
  pick_mode returns None again for unknown shapes, and rtk_compress
  special-cases `rtk log` to omit the `-` arg (which rtk treats as a
  literal filename for the log subcommand, unlike json/diff).

Documents the rtk tool-output compressor example and ships a reproducible
test setup (fake Anthropic server + client) that measures the before/after
byte delta to prove the rewrite path is wired up end-to-end.

main.go had a struct-field misalignment that failed gofmt; the feedback
file was committed to this branch by mistake (it concerns an unrelated
Greywall project).

…aders

Addresses a cluster of correctness, security, and simplicity issues found
during PR review. Each one individually was small; together they change the
semantics operators should rely on, so the doc updates are part of the same
commit.

client.go

- Hello type validation returned a nil err on mismatch (`return err` after
  a successful ReadJSON), so a server replying with a wrong type silently
  succeeded. Now returns a real error and the connection is dropped.
- The read loop previously killed the entire connection on any JSON
  unmarshal error, which drained every in-flight request to a default
  decision. Now malformed frames are logged and skipped; only transport
  errors trigger reconnect.
- Send() held the client-wide mutex across the WebSocket write, so a slow
  peer stalled reads of pending/hooks. Writes now use a dedicated writeMu.
- Pending entries track whether they were a request or response, so
  drainPending() returns the correct default action (block/passthrough
  for response, deny/allow for request) instead of always emitting a
  request-shaped deny.
- Decision gains a Fallback field (json:"-") carrying the reason when the
  Decision was synthesised locally. Cascades log this at warn so operators
  can distinguish "middleware allowed" from "middleware was down".

Fail-closed default

- OnDisconnect now defaults to "deny" rather than "allow". A policy
  middleware (secret scanner, PII redactor) that crashes or is unreachable
  should not let traffic flow through silently. Advisory-only middleware
  (audit, cost tracker) must set on_disconnect: allow explicitly, which
  the docs now frame as a deliberate opt-in.
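
The fallback rule from the client.go and fail-closed notes reduces to a small table. This sketch uses a hypothetical fallbackDecision (the PR mentions a helper named ActionForTimeoutKind, but its signature is not shown) and treats any on_disconnect value other than "allow" as fail-closed, which is an assumption.

```go
package main

import "fmt"

type HookKind int

const (
	RequestHook HookKind = iota
	ResponseHook
)

type Decision struct {
	Action   string `json:"action"`
	Fallback string `json:"-"` // why the decision was synthesised locally; never sent
}

// fallbackDecision picks the action used when the middleware is missing,
// timed out, or crashed: fail closed unless on_disconnect is "allow".
func fallbackDecision(kind HookKind, onDisconnect, reason string) Decision {
	failOpen := onDisconnect == "allow"
	action := "block" // response hook, fail closed
	switch {
	case kind == RequestHook && failOpen:
		action = "allow"
	case kind == RequestHook:
		action = "deny"
	case failOpen:
		action = "passthrough"
	}
	return Decision{Action: action, Fallback: reason}
}

func main() {
	d := fallbackDecision(ResponseHook, "", "timeout after 10s")
	fmt.Printf("%s (fallback: %s)\n", d.Action, d.Fallback)
}
```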

Rewrite header denylist

- A middleware's `rewrite` decision previously merged into req.Header and
  resp.Header with no filter, so a compromised middleware could overwrite
  Authorization, Cookie, or Host. MergeRewriteHeaders strips hop-by-hop
  headers (RFC 7230 §6.1) and credential/identity headers before applying.
  Rejected keys are logged.
- Response rewrites also drop Content-Encoding when a fresh body is
  supplied, so the next cascade step doesn't try to gunzip plaintext.

Unknown actions

- IsKnownAction is checked on every decision; unknown actions still fall
  through to allow/passthrough (safest default for forward compatibility)
  but now emit a warn log naming the middleware and action. One typo
  shouldn't silently bypass policy without a trace.

Response hook had request_body always empty

- The plain-HTTP response cascade read RequestBodyFromContext(ctx), but
  no code ever called WithRequestBody, so ResponseMsg.RequestBody was
  always nil. ProxyRequestHook now returns (ctx, decision) and the
  request cascade stashes the captured body on ctx so the response
  cascade can include it.

Filter cache leak

- filterCache was a global `map[*HookFilter]*compiledFilter` keyed by
  pointer identity. On reconnect the hello response produced a fresh
  HookFilter pointer, so the cache grew indefinitely. Compiled regexes
  now live on the HookFilter itself behind a sync.Once and are GC'd with
  the filter.
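
Roughly, the per-filter compile-once pattern looks like this. The field set is reduced to host, path and content type; using path.Match for globs and mime.ParseMediaType for parameter stripping are choices of this sketch, and an invalid regex never matching is an assumption.

```go
package main

import (
	"fmt"
	"mime"
	"path"
	"regexp"
	"strings"
	"sync"
)

// HookFilter keeps its compiled regex on itself, so it is garbage-collected
// with the filter instead of living in a global pointer-keyed cache.
type HookFilter struct {
	Host        string // glob, e.g. "*.anthropic.com"
	Path        string // regex
	ContentType string // glob, compared without parameters

	once    sync.Once
	pathRe  *regexp.Regexp
	compErr error
}

func (f *HookFilter) compiled() (*regexp.Regexp, error) {
	f.once.Do(func() {
		if f.Path != "" {
			f.pathRe, f.compErr = regexp.Compile(f.Path)
		}
	})
	return f.pathRe, f.compErr
}

// Matches reports whether a request passes every non-empty field.
func (f *HookFilter) Matches(host, urlPath, contentType string) bool {
	if f.Host != "" {
		if ok, _ := path.Match(strings.ToLower(f.Host), strings.ToLower(host)); !ok {
			return false
		}
	}
	re, err := f.compiled()
	if err != nil {
		return false // an invalid regex never matches
	}
	if re != nil && !re.MatchString(urlPath) {
		return false
	}
	if f.ContentType != "" {
		mt, _, err := mime.ParseMediaType(contentType) // drops "; charset=..."
		if err != nil {
			return false
		}
		if ok, _ := path.Match(f.ContentType, mt); !ok {
			return false
		}
	}
	return true
}

func main() {
	f := &HookFilter{Host: "*.anthropic.com", Path: `^/v1/messages$`, ContentType: "application/json"}
	fmt.Println(f.Matches("api.anthropic.com", "/v1/messages", "application/json; charset=utf-8"))
}
```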

Refactor

- The four near-duplicate cascades in program.go (~400 lines) are now one
  runRequestCascade + one runResponseCascade. The transport-specific hook
  entry points (plain HTTP / MITM, request / response) translate to the
  neutral cascade result type, so a fix in the iteration logic applies to
  all four paths at once.

All six non-rtk examples now send a "name" field in their hello response.
Greyproxy uses this for activity badges; without it, the rows show the
full ws:// URL instead, which is noisy. rtk already had this.

New tests lock in the guarantees the PR review surfaced:

- Hello type validation: a server replying with the wrong type must not
  mark the client ready.
- Fallback actions: request-hook timeout returns deny (default) or allow
  (opt-in); response-hook timeout returns block (default) or passthrough
  (opt-in); Fallback reason is set so callers can log it.
- Drain on disconnect: an in-flight ResponseMsg Send gets a
  response-shaped default (block) not a request-shaped one (deny).
- Malformed frame: a garbage WebSocket frame is skipped without dropping
  the connection; a later valid decision still reaches the waiting Send.
- Header denylist: MergeRewriteHeaders refuses Authorization, Cookie,
  Host, Set-Cookie, and hop-by-hop headers (case-insensitively) while
  applying safe headers. This is the security-critical regression guard
  against a compromised middleware escalating credentials.
- Filter match semantics: host glob with leading *. wildcard, path regex
  compiled-once caching, LLM gate, and content-type parameter stripping.
- NewID uniqueness and hex shape.
- ActionForTimeoutKind / IsKnownAction / BodyChanged helpers.

main_test.go installs a no-op logger so the cascade fallback paths don't
nil-panic under `go test` (the binary installs a real logger via
logger.SetDefault; tests have to do it themselves).

The old backoff went 100ms → 10s doubling and never reset across the outer
for loop. Once the cap was reached (after ~7 disconnects), every subsequent
restart-reconnect-restart cycle sat at 10s. Middleware development flows
(auto-reload, container restart) suffered the most.

Three changes:

- Cap lowered from 10s to 2s. An LLM request at default timeout_ms=2s
  doesn't benefit from a longer reconnect window; the request has already
  fallen back by then.
- Backoff resets to the initial 100ms when the previous connection was
  up for at least 5 seconds ("healthy"). A working middleware that
  restarts now reconnects within a few hundred ms, not seconds.
- Added ±20% jitter so multiple greyproxy instances (or multiple
  middlewares behind the same outage) don't reconnect in lockstep.

Docs: clarify the three distinct timeouts (hello 5s, per-message timeout_ms,
reconnect backoff) — the old doc only mentioned the hello one in passing.

Test added: asserts jitter stays inside the ±20% envelope.

2 seconds is too tight once the middleware is non-trivial. Real policy
middlewares regularly offload their decision to another LLM (PII
classification, prompt-injection detection, policy evaluation on model
output), or to a slow local scanner. 2s was an artefact of treating the
middleware as a pure-regex predicate; the protocol supports more than that.

10s gives LLM-offloaded middlewares a realistic budget without feeling
unbounded. Operators whose middleware is purely local can drop `timeout_ms`
as low as they like in YAML — the docs now flag this as the shape of good
config. CLI-only middlewares take the default.

The test TestClient_DefaultTimeoutGenerous pins the default so a future
revert has to touch the test too.

Config.TimeoutMs default is now resolved inside middleware.New rather than
duplicated in buildMiddlewareConfigs. Single source of truth; YAML values
still override when present.

Option B from the review: middlewares declare a [min_version, max_version]
range in their hello response, the proxy picks the highest integer in the
overlap of that range and [1, ProtocolVersion]. No overlap refuses the
connection with a readable error naming both ranges. Omitting both bounds
is equivalent to declaring [1,1], so every example middleware already in
the repo keeps working without any wire change.
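
The selection rule is small enough to show directly. A sketch with a hypothetical function name, where proxyMax stands in for ProtocolVersion so that a future v2 proxy can be exercised; treating a single omitted bound as 1 extends the "omit both = [1,1]" rule and is an assumption.

```go
package main

import "fmt"

// negotiateVersion returns the highest version in the overlap of the
// middleware's declared [minV, maxV] and the proxy's [1, proxyMax].
// Omitted bounds arrive as 0 and are treated as 1.
func negotiateVersion(minV, maxV, proxyMax int) (int, error) {
	if minV == 0 {
		minV = 1
	}
	if maxV == 0 {
		maxV = 1
	}
	lo, hi := minV, maxV
	if lo < 1 {
		lo = 1
	}
	if hi > proxyMax {
		hi = proxyMax
	}
	if lo > hi {
		return 0, fmt.Errorf("no common protocol version: middleware supports [%d,%d], proxy supports [1,%d]",
			minV, maxV, proxyMax)
	}
	return hi, nil
}

func main() {
	v, err := negotiateVersion(0, 0, 1)
	fmt.Println(v, err) // 1 <nil>
	_, err = negotiateVersion(2, 3, 1)
	fmt.Println(err)
}
```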

Why bother now while ProtocolVersion is still 1:

- The mechanism has to exist *before* we bump. If we ship v2 without
  negotiation, every middleware in the wild silently sees the wrong shape
  until its author reads a changelog. With negotiation, a v2 proxy
  connecting to a v1-only middleware picks v1 (if v1 is still supported)
  or refuses the connection with a clear message (if v1 has been retired)
  rather than hanging on a field the middleware never filled in.
- Agreed version is logged at connect, so operators can see which version
  each middleware negotiated without guessing from code.

Examples updated to declare min/max explicitly — acts as documentation
and as a pin against a future proxy that retires v1.

Tests cover the full matrix plus the backwards-compat path (omitted
bounds) and the refused-connection path (middleware requires v>proxy).

Settings page now has a "Middlewares" tab listing every configured
middleware with a live connection state badge, the URL, the negotiated
protocol version, its declared hooks, and the effective timeout_ms /
on_disconnect policy. Read-only: middleware configuration is owned by
CLI flags and greyproxy.yml, not the runtime store, so this page does
not offer mutation.

The UI hits GET /api/middlewares which returns the current snapshot.
Plumbing:

- middleware.Client gets URL() / TimeoutMs() / OnDisconnect() /
  IsConnected() getters. IsConnected reads c.conn!=nil under the mu
  lock so the flag tracks reconnects without an event bus.
- greyproxy.MiddlewareStatus is the wire/struct shape for the API.
- api.Shared gains a MiddlewareStatusesFn closure field; the api
  package stays free of any middleware-package import (no cycle).
- cmd/greyproxy sets the closure after creating the clients. Each
  call to the handler runs the closure fresh, so a middleware that
  goes down surfaces immediately without UI state drifting.
- A "Refresh" button reloads on demand; switching to the tab also
  triggers a load.

Smoke-tested end to end: connected state flips from true to false when
the upstream middleware is killed, without restarting greyproxy.

Operators can now point greyproxy at a command instead of a URL:

    greyproxy serve --middleware-cmd 'uv run ./mw.py'

Greyproxy spawns the child, owns its lifecycle, and talks NDJSON on
stdin/stdout. Reconnection and fallback decisions are identical to the
WebSocket path — the child crashing triggers the same exp-backoff
respawn that a WS disconnect does. Same wire protocol, same hello
exchange, same version negotiation, same header denylist, same
per-message timeout. The only difference is framing.

Rationale: every existing example middleware ships its own WebSocket
server boilerplate (~30 lines), makes the operator manage a port, and
requires two terminals. For local single-host deployments that's
friction with no upside. The stdio path matches how MCP servers are
typically launched and reduces "start my middleware" to one flag.

Transport layout:

- internal/greyproxy/middleware/transport.go introduces a Transport
  interface (WriteMessage / ReadMessage / Close) and two
  implementations: wsTransport (gorilla, extracted from the previous
  inline code, no logic change) and stdioTransport (exec.CommandContext,
  bufio.Scanner on stdout, bounded stderr forwarder into the logger).
- Client is now transport-agnostic: New() picks the dialer based on
  whether Config.URL or Config.Command is set, and the rest of the
  client (hello, pending map, Send, drain, fallback) doesn't care.
- stdioTransport.Close closes stdin first (so a well-behaved child
  exits on EOF), waits stdioCloseGrace (2s), then SIGKILLs. Prevents
  zombies when the proxy exits.
- Hello timeout now works on any transport: readMessageWithTimeout
  runs ReadMessage in a goroutine and closes the transport to unblock
  it on timeout. Previously we relied on WS-specific SetReadDeadline.

Config + CLI:

- Config gains Command []string and Name string. Exactly one of URL or
  Command must be set; YAML entries with both are skipped with a
  warning.
- splitCommand parses --middleware-cmd with shell-like rules (quotes,
  backslash escapes) but never invokes a shell. Operators who need
  shell features pass "sh -c '...'" explicitly.
- MiddlewareStatus gains Kind ("ws" | "stdio") so the UI can
  distinguish the two in the Middlewares tab.
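
A sketch of shell-like splitting that never calls a shell. splitArgs is a hypothetical name and its rules (single quotes fully literal, a backslash escaping the next character elsewhere, errors on empty input, unterminated quotes or a trailing backslash) are assumptions modelled loosely on POSIX sh, not a copy of greyproxy's splitCommand.

```go
package main

import (
	"errors"
	"fmt"
	"unicode"
)

// splitArgs splits s into argv with shell-like quoting, without a shell.
// Inside double quotes a backslash escapes any character, which is looser
// than POSIX. Globs, $VARS and pipes are kept as literal text.
func splitArgs(s string) ([]string, error) {
	var (
		args    []string
		cur     []rune
		inArg   bool // distinguishes '' (an empty argument) from no argument
		quote   rune // 0, '\'' or '"'
		escaped bool
	)
	for _, r := range s {
		switch {
		case escaped:
			cur, escaped = append(cur, r), false
		case r == '\\' && quote != '\'':
			escaped, inArg = true, true
		case quote != 0:
			if r == quote {
				quote = 0
			} else {
				cur = append(cur, r)
			}
		case r == '\'' || r == '"':
			quote, inArg = r, true
		case unicode.IsSpace(r):
			if inArg {
				args = append(args, string(cur))
				cur, inArg = cur[:0], false
			}
		default:
			cur, inArg = append(cur, r), true
		}
	}
	if escaped {
		return nil, errors.New("trailing backslash")
	}
	if quote != 0 {
		return nil, fmt.Errorf("unterminated %c quote", quote)
	}
	if inArg {
		args = append(args, string(cur))
	}
	if len(args) == 0 {
		return nil, errors.New("empty command")
	}
	return args, nil
}

func main() {
	argv, err := splitArgs(`uv run 'examples/my mw.py' --name "pii \"strict\""`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", argv)
}
```

Running `sh -c '...'` through the same splitter yields `sh`, `-c` and the quoted script as a single argument, which is how shell features stay an explicit opt-in.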

Tests:

- TestSplitCommand covers quoting, escapes, leading/trailing
  whitespace, unterminated quotes, empty input.
- TestStdioTransport_HelloAndDecision re-execs the test binary as a
  fake middleware (main_test.go reads GREYPROXY_FAKE_MW and acts
  accordingly) so the full spawn→hello→request→decision→close cycle
  runs without depending on Python or a separate fixture binary.
- TestStdioTransport_ChildExit_TriggersReadError pins the "middleware
  died mid-conversation → ReadMessage returns error" behaviour that
  triggers the client's reconnect loop.
- TestStdioTransport_CloseKillsChild asserts the SIGKILL path fires
  within the grace window.

Smoke-tested end to end: same Python passthrough middleware used under
both --middleware ws://... and --middleware-cmd 'uv run mw.py', plus
a blocking scenario through the secret-scanner.

examples/_lib/greyproxy_middleware.py is a small shared library (no
external deps beyond the existing websockets for ws mode) that hides
the transport from middleware authors. The author writes two functions:

    def handle_request(msg): ...
    def handle_response(msg): ...
    run(name="my-mw", handle_request=handle_request, handle_response=handle_response)

run() picks the transport at launch time. If GREYPROXY_TRANSPORT=stdio
is in the env (set by greyproxy when it spawns the child), the helper
speaks NDJSON on stdin/stdout. Otherwise it starts a WebSocket server
on $GREYPROXY_WS_PORT (default 9000). The handler code is identical.

Important stdio property: stdout is the protocol. The helper redirects
all logging to stderr on startup and replaces any preinstalled handlers
so a middleware that configured its own logger can't accidentally
corrupt the wire by writing to stdout.
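
One way to get that "all logging goes to stderr" guarantee in Python is sketched below; route_logging_to_stderr is a hypothetical name and the helper's real implementation may differ. `logging.basicConfig(force=True)` (Python 3.8+) replaces the root handlers; the loop additionally strips stdout handlers that a middleware attached to its own named loggers.

```python
import logging
import sys


def route_logging_to_stderr(level=logging.INFO):
    """Send log records to stderr and drop any handler that writes to stdout."""
    # force=True removes and closes existing root handlers before installing
    # a single StreamHandler on stderr.
    logging.basicConfig(stream=sys.stderr, level=level, force=True)
    # Named loggers keep their own handlers, so strip stdout ones explicitly.
    for obj in list(logging.root.manager.loggerDict.values()):
        if isinstance(obj, logging.Logger):
            for handler in list(obj.handlers):
                if getattr(handler, "stream", None) is sys.stdout:
                    obj.removeHandler(handler)
```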

The helper also ships the decision builders (allow, deny, rewrite_request,
passthrough, block, rewrite_response) and decode_body, all of which were
previously copy-pasted into every example. Handlers that raise are
caught and fall back to allow (requests) or passthrough (responses), so
the stream survives one bad request.
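
The fallback behaviour can be sketched as a small dispatch wrapper. The builder names allow and passthrough come from the helper, but the dict shapes they return below are assumptions, as is the dispatch function itself.

```python
import logging

log = logging.getLogger("mw")


# Hypothetical decision shapes; the real helper's builders may return more fields.
def allow():
    return {"action": "allow"}


def passthrough():
    return {"action": "passthrough"}


FALLBACK = {"http-request": allow, "http-response": passthrough}


def dispatch(hook, handler, msg):
    """Run a user handler; on any exception, log it and return the hook's safe default."""
    try:
        return handler(msg)
    except Exception:
        log.exception("%s handler raised; falling back", hook)
        return FALLBACK[hook]()
```

Failing open keeps the conversation alive; a policy middleware that must fail closed would map http-request to a deny builder instead.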

Rewrote two examples to use the helper:

- middleware-passthrough-py: the canonical template. Drops from ~150
  lines (inline WS server + duplicated decision helpers) to ~45 lines
  of actual logic.
- middleware-secret-scanner-py: demonstrates a policy middleware. Also
  smaller, and now emits tags on block so the operator sees which
  pattern matched in the Activity view.

Other examples (pii-redactor, command-stripper, cost-tracker, audit-log,
rtk-compress) are intentionally left in their inline form for now; they
stand as proof that the older pattern keeps working. They can migrate
later with no urgency, because the wire protocol is identical.

The Middlewares tab in Settings now shows a purple "stdio" or blue "ws"
badge next to each middleware's name, so an operator can tell at a
glance whether a given entry is a child process owned by greyproxy or
an external WebSocket service. The URL column already showed
"stdio:<cmd>" for spawned children; the badge is the quick visual cue.

Small layout tweak alongside: name + kind pill are now in a flex group
on the left so they stay together when the name wraps.
…tdio

Three issues surfaced when trying
`--middleware-cmd 'uv run examples/middleware-rtk-compress-py/middleware.py'`:

1. Zombie children across respawns.

   A command like `uv run mw.py` is a wrapper: uv forks the real
   Python interpreter as a grandchild. stdioTransport.Close only
   signalled t.cmd.Process (uv), so when uv died, Python was
   reparented to init and kept holding whatever ports it had bound.
   The next respawn failed with "address already in use", and the
   cycle repeated forever.

   Fix: set Setpgid=true on the exec.Cmd so the child starts in its own
   process group, and on the SIGKILL path signal -pgid so the whole
   subtree dies together. Split into transport_unix.go /
   transport_windows.go because SysProcAttr.Setpgid is unix-only.
   Verified end to end: the greyproxy→uv→python tree vanishes on
   SIGTERM to greyproxy, and the port held by the grandchild is
   released immediately.
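
   The same process-group technique, sketched in Python for readers who
   want to reproduce the fix outside Go (unix-only, like
   transport_unix.go). spawn_middleware and kill_tree are hypothetical
   names. start_new_session=True runs setsid() in the child, which also
   makes it a process-group leader; on Python 3.11+, process_group=0 is
   the closer match to Setpgid.

```python
import os
import signal
import subprocess


def spawn_middleware(argv):
    """Start the child as leader of a new process group (pgid == child pid)."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        start_new_session=True,  # setsid(): new session and new process group
    )


def kill_tree(proc):
    """SIGKILL the whole group so wrapper grandchildren (uv -> python) die too."""
    try:
        os.killpg(proc.pid, signal.SIGKILL)  # signals every process in the group
    except ProcessLookupError:
        pass  # group already gone
    return proc.wait()
```

   With `sh -c 'sleep 30 & wait'` as the child, calling only proc.kill()
   would leave sleep holding the stdout pipe; kill_tree closes it, so a
   reader sees EOF immediately.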

2. rtk example still embedded its own WebSocket server.

   Only passthrough and secret-scanner were ported to the helper
   library in the previous commit; rtk still had inline
   asyncio+websockets code. When spawned via --middleware-cmd, it
   opened port 9000 instead of speaking NDJSON on stdout, so
   greyproxy's hello read timed out and the respawn loop hit issue (1).

   Fix: rewrite rtk to import `run`/`allow`/`rewrite_request`/
   `decode_body` from examples/_lib and call run() at module scope.
   Core logic (pick_mode, rtk_compress, Anthropic/OpenAI walkers) is
   unchanged. Also removed three leftover `print()` debug statements
   that would corrupt the stdout frame stream in stdio mode.

3. Cosmetic: the "middleware connected" log showed an empty `url=` for
   stdio entries because it read clientURLs[i] (the config URL, which
   is empty for command: entries) instead of c.URL() (the endpoint
   string the client actually uses). The stderr prefix also showed
   `mw[?]`, because the CLI --middleware-cmd flag doesn't supply a
   Name. The dialer now falls back to filepath.Base(command[0])
   (`mw[uv]`) until the real name arrives in the hello, and the
   connected log uses c.URL() / c.Kind().
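
   The naming fallback in (3), sketched in Python: os.path.basename
   stands in for Go's filepath.Base, and display_name / relay_stderr
   are hypothetical names. The prefix is recomputed per line so it
   switches as soon as the hello supplies a name.

```python
import os


def display_name(argv, hello_name=None):
    """Name shown for a child: the hello name once known, else basename(argv[0])."""
    return hello_name or os.path.basename(argv[0])


def relay_stderr(lines, argv, get_hello_name, out):
    """Copy each child stderr line to `out`, prefixed with mw[<name>]."""
    for line in lines:
        out.write(f"mw[{display_name(argv, get_hello_name())}] {line}")
```
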
audit-log, command-stripper, cost-tracker, pii-redactor now use
examples/_lib/greyproxy_middleware.py the same way passthrough,
secret-scanner, and rtk-compress already do. The author writes
handle_request / handle_response and calls run(); the library picks
stdio or WebSocket transport based on how greyproxy launched it.

Every example in the repo now supports both transports without code
changes. Three additional wins from the conversion:

- Inline asyncio + websockets boilerplate (roughly 40 lines per file)
  is gone. Each example is now the decision logic and almost nothing
  else.
- cost-tracker, command-stripper, and pii-redactor emit tags on every
  mutating decision (cost.model / cost.usd / command-stripper.flags /
  pii.redacted / pii.restored). These show up in the Activity UI
  badges and in stashed event metadata, so operators can see per-
  request what a middleware did without reading the middleware's own
  logs.
- All stray print() calls are gone. In stdio mode stdout is the
  protocol, so any print() corrupts frames; the helper forces logging
  to stderr but handler code still has to avoid print().
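
Since the helper only redirects logging, a middleware author who wants a
safety net against stray print() calls could add one themselves. This is
a hypothetical hardening, not something the helper does: keep a private
handle on the real stdout for frames and rebind sys.stdout to stderr.

```python
import sys


def claim_stdout():
    """Return the real stdout for protocol frames; route print() to stderr."""
    protocol = sys.stdout
    sys.stdout = sys.stderr  # print() now writes to stderr, not the wire
    return protocol
```

This only covers Python-level writes. Subprocesses or C extensions that
write to file descriptor 1 directly would still reach the wire.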

Smoke-tested all four under --middleware-cmd: each negotiates v1,
declares the expected hooks, and surfaces in /api/middlewares with
kind=stdio, connected=true.
…dleware

The page still read as if WebSocket was the only option: the opening
paragraph said "connects over a persistent WebSocket", the Overview
diagram labelled the edge "JSON/WS", Quick Start started with
`uv run middleware.py` in one terminal and `--middleware ws://...` in
another, and the "Writing a middleware" section still described the
middleware as a WebSocket server. Since stdio is now the preferred
launch path for local middlewares, the whole framing needs to land
the transport choice early and present stdio first in every example.

Changes:

- Rewrote the opening summary around "two transports, same wire
  protocol, pick one per middleware". Explicit guidance: stdio for
  local, WS for shared/remote.
- Neutralised the Overview diagram: message arrows are JSON, the
  transport annotation sits alongside.
- Quick Start leads with --middleware-cmd one-liner, then WS.
- Examples table updated to seven entries (rtk-compress was missing)
  and the preamble now explains that the shared helper makes every
  example dual-transport.
- Configuration section: --middleware-cmd introduced alongside
  --middleware, with guidance on shell semantics (no implicit sh -c;
  argv is split with shell-like quoting only).
- YAML sample shows a command: entry first, then a url: entry, each
  with the config fields a real operator actually cares about
  (name, on_disconnect, auth_header).
- Connection lifecycle: mentions both transports; adds the stdio-
  specific note about process-group ownership so operators
  understand why the child tree exits cleanly when greyproxy does.
- Writing a middleware: removed the "WebSocket server" framing;
  now leads with the Python helper (run(handle_request=...)) and
  has a separate "Other languages" subsection listing the wire
  requirements for stdio and WS.
- Multiple s/WebSocket/transport/ in places where the text said
  "WebSocket" but meant either framing.
- Fixed the "6 examples" count → 7.

No protocol change; this is a docs-only reframing.
tito added 2 commits April 22, 2026 17:42
The info block at the top of the Middlewares tab said middlewares
"connect over a persistent WebSocket" and only pointed operators at
--middleware. That is now wrong, since stdio is the preferred launch
path for local deployments. Rewritten to:

- lead with "two transports, same JSON protocol"
- describe each transport with its CLI flag and YAML shape in a short
  bulleted list
- link out to docs/middleware.md for the full protocol

Also updated the "no middlewares configured" empty-state hint so a
first-time operator sees both options.

Rendered HTML verified: stdio mentioned, --middleware-cmd mentioned,
stale "over a persistent WebSocket" string gone.

Previous version had a bulleted list explaining each transport and a
link to docs/middleware.md. That's mini-documentation, and the docs
aren't hosted online yet anyway. Reduced to a single sentence naming
the three ways to configure (--middleware-cmd, --middleware, yaml) and
a reminder that the list is read-only at runtime. Detailed guidance
belongs in docs/middleware.md once it has a URL.
tito force-pushed the feat/middleware-websocket branch from 0bb8fc2 to 505606e April 22, 2026 23:57
tito marked this pull request as ready for review April 22, 2026 23:58
tito added 2 commits April 22, 2026 18:07
…cket

# Conflicts:
#	internal/gostx/internal/util/sniffing/sniffer.go
Discard errors via `_ =` on defer Close() and `go func() { _ = c.Start() }()`
so golangci-lint passes; switch one os.Setenv in a *testing.T helper to
t.Setenv. No behaviour change.
tito merged commit 8bbf32f into main Apr 23, 2026
3 checks passed
tito deleted the feat/middleware-websocket branch April 23, 2026 00:19


Development

Successfully merging this pull request may close these issues.

feat: per-request HTTP middleware via WebSocket

1 participant