Web UI WS state machine: outbox + seq + ack + keepalive #172
Merged
seamus-brady merged 1 commit into main on Apr 26, 2026
Replaces the per-node monotonic seq (diagnostic-only) with a per-
client outbox keyed on stable client_id. Every server-pushed frame
is appended to that client's outbox before being shipped, the
client acks by seq, the server prunes acked frames. On reconnect
the client opens with `?since=N` and the server replays the gap
from the outbox; when the gap is too wide (frames pruned out) it
falls back to a full session_history rebuild.
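
The append / ack / replay cycle described above can be modelled roughly as follows. This is a minimal TypeScript sketch for illustration only, not the Gleam code in `src/web/outbox.gleam`; the `Outbox` class, its method names, and the `maxFrames` cap are assumptions, and age-based pruning is omitted.

```typescript
// Hypothetical model of a per-client outbox; the real module is Gleam and
// its API may differ.
type Frame = { seq: number; body: string };

type ReplayResult =
  | { kind: "UpToDate" }
  | { kind: "Replay"; frames: Frame[] }
  | { kind: "TooOld" };

class Outbox {
  private frames: Frame[] = [];
  private nextSeq = 1;
  private readonly maxFrames: number; // assumed size cap, value not from the PR

  constructor(maxFrames: number) {
    this.maxFrames = maxFrames;
  }

  // Store the frame before it is shipped, evicting the oldest on overflow.
  append(body: string): Frame {
    const frame: Frame = { seq: this.nextSeq++, body };
    this.frames.push(frame);
    if (this.frames.length > this.maxFrames) this.frames.shift();
    return frame;
  }

  // Drop every frame the client has confirmed receiving.
  ack(seq: number): void {
    this.frames = this.frames.filter((f) => f.seq > seq);
  }

  // Decide what a client reconnecting with ?since=N needs.
  replaySince(since: number): ReplayResult {
    const lastSeq = this.nextSeq - 1;
    if (since >= lastSeq) return { kind: "UpToDate" };
    const oldestKept = this.frames[0]?.seq ?? this.nextSeq;
    // Some frame after `since` was already pruned, so the gap is unrecoverable.
    if (since < oldestKept - 1) return { kind: "TooOld" };
    return { kind: "Replay", frames: this.frames.filter((f) => f.seq > since) };
  }
}
```

In this model a `TooOld` result is the case where the server would fall back to the full `session_history` rebuild.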
Adds WS keepalive — server emits Ping every 25s; client replies
Pong. Idle proxies and OS-level NAT can no longer silently drop
the connection during long agent cycles.
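
A rough sketch of the ping-and-re-arm loop, written in TypeScript with an injected scheduler so it can be exercised without real timers. The function and type names are hypothetical; the actual server side is the `KeepaliveTick` selector arm in `src/web/gui.gleam`. Only the 25s interval and the `{type:"ping"}` shape come from this PR.

```typescript
// Hypothetical model of the server keepalive loop; names are illustrative.
type Send = (frame: string) => void;
type Schedule = (delayMs: number, fn: () => void) => void;

const KEEPALIVE_INTERVAL_MS = 25000; // interval stated in the PR

// Send a ping on every tick, then re-arm so the next tick fires one
// interval later. Stops re-arming once the socket is reported closed.
function startKeepalive(send: Send, schedule: Schedule, isOpen: () => boolean): void {
  const tick = (): void => {
    if (!isOpen()) return;
    send(JSON.stringify({ type: "ping" }));
    schedule(KEEPALIVE_INTERVAL_MS, tick);
  };
  schedule(KEEPALIVE_INTERVAL_MS, tick);
}
```

With real timers the scheduler could simply be `(ms, fn) => { setTimeout(fn, ms); }`.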
Three operator-reported symptoms this fixes:
1. Long-running cycle, no live notification, refresh shows the
reply. Caused by silent WS dropouts (no keepalive); reply
delivered into a dead notify_subject. Now: keepalive prevents
the dropout, OR if it happens anyway, reconnect-with-since
replays the missed frames.
2. User-typed message disappears. Caused by user_message_ack
being dropped from inFlightUserMessages BEFORE cog had
persisted the message — race with reconnect-mid-cycle wiping
the DOM. Now: ack is purely a UI hint; in-flight tracking
stays until renderSessionHistory observes the message in the
server's authoritative view.
3. No live agent_progress / thinking / tool updates during
work. Caused by the same dropout as symptom 1: notifications fire
into a dead WS. Same fix: keepalive + outbox replay.
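
The fix for symptom 2 can be sketched as below: an ack only flags the entry, and entries leave the in-flight set only when the server history contains them. This is an illustrative TypeScript model; the `PendingMessage` and `HistoryEntry` shapes, the `id` field used for matching, and both function names are assumptions, not the page's actual JS.

```typescript
// Hypothetical shapes; the real client keeps inFlightUserMessages in page JS.
type PendingMessage = { id: string; text: string; acked: boolean };
type HistoryEntry = { id: string; role: "user" | "agent"; text: string };

// user_message_ack is only a UI hint: flag the entry, never remove it.
function markAcked(inFlight: PendingMessage[], id: string): PendingMessage[] {
  return inFlight.map((m) => (m.id === id ? { ...m, acked: true } : m));
}

// Called when session history is rendered: keep only the messages the
// server's authoritative view does not contain yet.
function reconcileInFlight(
  inFlight: PendingMessage[],
  serverHistory: HistoryEntry[],
): PendingMessage[] {
  return inFlight.filter(
    (m) => !serverHistory.some((e) => e.role === "user" && e.id === m.id),
  );
}
```

Messages returned by `reconcileInFlight` would still need a locally rendered placeholder after a reconnect wipes the DOM.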
Implementation:
* `src/web/outbox.gleam` — pure ring buffer with seq, ack,
replay_since (UpToDate / Replay / TooOld), age-prune, size cap
* `src/web/client_registry.gleam` — OTP actor wrapping per-
client_id Outboxes; survives WS process death, hourly janitor
drops idle clients
* `src/web/gui.gleam` — `WsState` gains `client_id`, `registry`,
`keepalive_subject`. New `ws_send` helper routes every message
through `client_registry.append`; ~25 direct
`mist.send_text_frame` callsites converted. `?since=N` parsed
from URL; ws_on_init either spawns history query (fresh
connect or TooOld) or fires ReplayReady (gap fits in outbox).
New ClientMessage handlers for `Ack(seq)` and `Pong`. New
KeepaliveTick selector arm sends `Ping` and re-arms.
* `src/web/protocol.gleam` — `encode_server_message_with_seq`
takes explicit seq; `encode_server_message_body` returns
seq-less body for the outbox; `splice_seq` re-emits a stored
body with its original seq during replay. Old global
`monotonic_seq` FFI call removed.
* `src/web/html.gleam` — both ws_connect_js (chat + admin
pages) and the mobile page's connect() track lastSeenSeq in
sessionStorage, build wsUrl with `&since=N`, send periodic
`{type:"ack", seq:N}` every 5s, reply pong on `{type:"ping"}`.
user_message_ack handler no longer drops in-flight entries.
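
The client-side behaviour in the last bullet might look roughly like this. It is a hedged TypeScript sketch, not the JS emitted by `src/web/html.gleam`: the helper names, the `{type:"pong"}` reply shape, and the assumption that the base URL already carries a query string are all illustrative.

```typescript
// Hypothetical client-side helpers; the real code may be structured differently.
type ServerFrame = { type: string; seq?: number };

// Append the last seen seq so the server can replay the gap. Assumes baseUrl
// already has a query string, since the PR builds the URL with `&since=N`.
function buildWsUrl(baseUrl: string, lastSeenSeq: number): string {
  return baseUrl + "&since=" + lastSeenSeq;
}

// Process one incoming frame: returns the updated lastSeenSeq and any reply
// the client should send immediately.
function handleFrame(
  raw: string,
  lastSeenSeq: number,
): { lastSeenSeq: number; reply: string | null } {
  const frame = JSON.parse(raw) as ServerFrame;
  const seq = typeof frame.seq === "number" ? frame.seq : 0;
  const next = Math.max(lastSeenSeq, seq);
  // Assumed pong shape; the PR only says the client replies pong on ping.
  const reply = frame.type === "ping" ? JSON.stringify({ type: "pong" }) : null;
  return { lastSeenSeq: next, reply };
}

// Payload for the periodic ack (sent every 5s in the PR).
function ackPayload(lastSeenSeq: number): string {
  return JSON.stringify({ type: "ack", seq: lastSeenSeq });
}
```

In a browser, `lastSeenSeq` would be read from and written back to `sessionStorage` around these calls, so it survives a page refresh within the same tab.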
Tests: 2167 passing. New: 12 outbox tests covering append, ack,
replay (UpToDate / Replay / TooOld with boundary at oldest_kept-1),
size cap, age prune, end-to-end reconnect flow. Updated
seq_increments test to confirm the new "explicit seq pass-through"
contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary

Replaces the per-node monotonic `seq` (diagnostic-only) with a per-client outbox keyed on stable `client_id`. Every server-pushed frame is appended to that client's outbox before being shipped, the client acks by `seq`, and the server prunes acked frames. On reconnect the client opens with `?since=N`; the server replays the gap from the outbox or falls back to a full `session_history` rebuild when the gap exceeds the buffer.

Adds WS keepalive: the server emits `Ping` every 25s and the client replies `Pong`. Idle proxies and OS-level NAT can no longer silently drop the connection during long agent cycles.

Three reported symptoms this fixes
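
| Symptom | Cause | Fix |
| --- | --- | --- |
| Long-running cycle, no live notification, refresh shows the reply | Silent WS dropouts (no keepalive); reply delivered into a dead `notify_subject` | Keepalive prevents the dropout; if it happens anyway, reconnect-with-since replays the missed frames |
| User-typed message disappears | `user_message_ack` dropped from `inFlightUserMessages` BEFORE cog persisted; race with reconnect-mid-cycle wiping the DOM | Ack is purely a UI hint; in-flight tracking stays until `renderSessionHistory` observes the message in the server's authoritative view |
| No live `agent_progress` / `thinking` / tool updates during work | Same dropout as the first symptom: notifications fire into a dead WS | Keepalive + outbox replay |

What's in the box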

* `src/web/outbox.gleam`: pure ring buffer with seq, ack, `replay_since` (UpToDate / Replay / TooOld with boundary handling), age-prune, size cap. 12 unit tests.
* `src/web/client_registry.gleam`: OTP actor wrapping per-`client_id` Outboxes. Survives WS process death so reconnects under the same id replay correctly. Hourly janitor drops idle clients.
* `src/web/gui.gleam`: `WsState` gains `client_id`, `registry`, `keepalive_subject`. New `ws_send` helper routes every message through `client_registry.append`. ~25 direct `mist.send_text_frame` call sites converted. `?since=N` parsed from URL; `ws_on_init` either spawns the history query (fresh / TooOld) or fires `ReplayReady` (gap fits). New `Ack(seq)` and `Pong` client-message handlers. `KeepaliveTick` selector arm emits `Ping` and re-arms.
* `src/web/protocol.gleam`: `encode_server_message_with_seq` takes explicit seq; `encode_server_message_body` returns the seq-less body for the outbox; `splice_seq` re-emits a stored body with its original seq during replay. Old global `monotonic_seq` FFI removed; `encode_server_message` kept as a back-compat alias that emits seq=0.
* `src/web/html.gleam`: both `ws_connect_js` (chat + admin) and the mobile page's `connect()` track `lastSeenSeq` in `sessionStorage`, build `wsUrl` with `&since=N`, send periodic `{type:"ack", seq:N}` every 5s, reply pong on `{type:"ping"}`. `user_message_ack` handler no longer drops in-flight entries.

What is preserved
The full UI surface — uploads (POST /upload + attachment refs in user_message), all admin tabs (Narrative, Log, Scheduler, Cycles, Planner, D' Safety, D' Config, Comms, Affect, Skills, Memory, Documents), question/answer flow, history browsing, search, approve/reject — all unchanged. Every server message type retains its existing JSON shape; only the seq field's semantics changed (per-node → per-client).
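
To illustrate how a message body can keep its shape while the seq is attached per client, here is a small TypeScript sketch in the spirit of `encode_server_message_body` and `splice_seq`. The function names, the example message fields, and the position of `seq` in the JSON are assumptions; the real encoders are Gleam and may place the field differently.

```typescript
// Hypothetical analogue of the body/seq split; not the Gleam implementation.
type Body = Record<string, unknown>;

// Encode once without a seq; this is what the outbox would store.
function encodeBody(message: Body): string {
  return JSON.stringify(message);
}

// Attach a seq to a stored body, on first send or on replay, leaving every
// other field untouched.
function spliceSeq(storedBody: string, seq: number): string {
  const parsed = JSON.parse(storedBody) as Body;
  return JSON.stringify({ ...parsed, seq });
}
```

Because the stored body carries no seq, replaying it with its original seq yields the same frame the client would have received the first time.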
Test plan

* `gleam build` clean
* `gleam format` clean
* `gleam test`: 2190 passing (gained 12 outbox tests + 11 from concurrent main merges)
* `gleam run`: confirm boot, observe `[outbox]` debug entries on first WS connect

🤖 Generated with Claude Code