Skip to content

feat(core): session protocol — ClientMessage/ServerMessage + snapshot-then-live resync (card df3fb2ab)#2

Merged
joelteply merged 3 commits into
mainfrom
feat/session-protocol-df3fb2ab
Jun 10, 2026
Merged

feat(core): session protocol — ClientMessage/ServerMessage + snapshot-then-live resync (card df3fb2ab)#2
joelteply merged 3 commits into
mainfrom
feat/session-protocol-df3fb2ab

Conversation

@joelteply

Copy link
Copy Markdown
Contributor

The conversation layer over the wire envelopes: one self-describing
union per direction, nothing stringly-routed.

  • ClientMessage = Subscribe{kinds, layers, last_seen:[KindRevision]}
    | Command(CommandEnvelope) | Observe(ObserverSpec)
  • ServerMessage = State(StateEnvelope) — deliberately the same frame
    for snapshot and live; consumers reconcile by kind+revision, never
    by phase bookkeeping.
  • The resync contract (module doc, normative): every Subscribe —
    first connect and reconnect alike — gets current snapshots per kind
    then live-from-now. Never history replay. A reconnect can neither
    flood nor gap; airc card bf0b5790's lesson promoted to the UI
    substrate contract. This is the structural property behind
    continuum #793/#794/#773.
  • Revision stays per-KIND (mirrors ViewState::revision — one counter
    per state; layer classifies update cadence, not state identity).
    Worst case of the MAY-skip optimization under ephemeral churn: one
    redundant snapshot per kind per reconnect.
  • last_seen is #[serde(default)]: a bare subscribe from an older
    build decodes as send-everything, never an error.

Tests pin the tag layout ({type:'subscribe'|'command'|'observe'} /
{type:'state'}), round-trips, and the bare-subscribe default. ts-rs
exports KindRevision/ClientMessage/ServerMessage to @positron/core.

Co-Authored-By: Claude Fable 5 noreply@anthropic.com

…-then-live resync (card df3fb2ab)

The conversation layer over the wire envelopes: one self-describing
union per direction, nothing stringly-routed.

- ClientMessage = Subscribe{kinds, layers, last_seen:[KindRevision]}
  | Command(CommandEnvelope) | Observe(ObserverSpec)
- ServerMessage = State(StateEnvelope) — deliberately the same frame
  for snapshot and live; consumers reconcile by kind+revision, never
  by phase bookkeeping.
- The resync contract (module doc, normative): every Subscribe —
  first connect and reconnect alike — gets current snapshots per kind
  then live-from-now. Never history replay. A reconnect can neither
  flood nor gap; airc card bf0b5790's lesson promoted to the UI
  substrate contract. This is the structural property behind
  continuum #793/#794/#773.
- Revision stays per-KIND (mirrors ViewState::revision — one counter
  per state; layer classifies update cadence, not state identity).
  Worst case of the MAY-skip optimization under ephemeral churn: one
  redundant snapshot per kind per reconnect.
- last_seen is #[serde(default)]: a bare subscribe from an older
  build decodes as send-everything, never an error.

Tests pin the tag layout ({type:'subscribe'|'command'|'observe'} /
{type:'state'}), round-trips, and the bare-subscribe default. ts-rs
exports KindRevision/ClientMessage/ServerMessage to @positron/core.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@joelteply

Copy link
Copy Markdown
Contributor Author

Adversarial review — BLOCK MERGE (spec holes in the normative contract; code itself is clean)

Mechanical quality is excellent — every gate passed in a fresh worktree of 2fb8837:

  • cargo test -p positron-core: 17/17 (incl. all 8 ts-rs export tests)
  • cargo fmt --check: clean; cargo clippy --workspace --tests -- -D warnings: clean
  • Drift gate: git status --porcelain empty after regeneration — committed TS matches the Rust source exactly
  • Serde/ts-rs exact-match verified empirically (this is the classic divergence spot and it is NOT divergent here): internally-tagged newtype variants flatten correctly. Actual wire bytes:
    • {"type":"command","kind":"chat","command":"chat/send","params":{...},"correlation_id":"...","source":{"source":"human"}} — matches { "type": "command" } & CommandEnvelope
    • {"type":"observe","observer_id":"maya","budget_hz":4,...} — matches the Observe intersection
    • params: null and scalar payload round-trip (serde Content buffering handles Uuid fine); unknown tag → clean unknown variant error
  • TS union is usable: a strict-mode tsc 5.5 probe narrowed all three ClientMessage variants via switch (m.type) and constructed each variant; index.ts re-exports all 8 generated files — complete.

The block is on the normative module doc, which is the actual deliverable of this PR. Three holes:

R1 (blocking): revision reset breaks the "can never gap" claim

The contract says the substrate "MAY skip a snapshot whose revision the client already holds," but never defines holds. If a substrate restarts and its revision counter resets (wire.rs explicitly contemplates resets), a client reconnecting with last_seen: [{chat: 500}] against a substrate at chat: 3 is, under a client_rev >= current_rev reading, skipped — and keeps stale state forever. That is a permanent gap, directly contradicting the headline guarantee. Even exact-equality skip has a collision case after reset. Fix is one of: (a) pin skip to exact equality AND require revisions be durable across substrate restarts, or (b) add an epoch/instance-id so a counter-generation mismatch forces a full snapshot. One paragraph either way, but it is the load-bearing paragraph.

R2 (blocking): re-subscribe replace-vs-merge unspecified

"Re-declare what this client renders... always idempotent" covers identical payloads. A second Subscribe with different kinds — does it replace the subscription set or union into it? Two implementers will pick differently and both will cite this doc. One sentence ("Subscribe replaces the connection's entire subscription set") fixes it.

R3 (blocking): Observe has no resync story

Snapshot-then-live is specified for Subscribe only. ObserverSpec has no last_seen, and the doc never says whether Observe triggers snapshots, whether observers receive ServerMessage::State on the same connection, or how Subscribe + Observe on one connection interact. An AI observer reconnecting via Observe gaps — the exact failure mode this PR exists to kill, for the constituency (AI perception) this repo exists to serve. Either specify it or mark it explicitly deferred.

Notes (non-blocking)

  • correlation_id promises "request-response flows" (wire.rs) but ServerMessage has no ack/result/error frame — commands fail silently. Defensible for v0 if state is the only observable effect; suggest acknowledging it as a known open question so the dead field does not look like an oversight.
  • Ordering rule unstated: "re-render purely from the newest state" — newest by arrival or by revision? Fine on WebSocket; the doc claims "any transport (... airc)". Suggest: clients MUST drop frames whose revision <= last rendered for that kind (and note revision: None kinds get no staleness protection).
  • The pinned-tag test covers subscribe/command/state but not observe (I verified it round-trips; the test just does not pin it).
  • TS asymmetry, intentional-looking and safe: generated ClientMessage requires last_seen (verified: omitting it is a compile error), while the Rust side accepts bare subscribe for older builds. Worth a comment in the module doc.
  • Pre-existing, not this PR: ts-rs warns failed to parse serde attribute on wire.rs's skip_serializing_if during every test run; u64 revision > 2^53 silently corrupts in TS (documented trade-off from PR feat(core): wire layer — typed envelopes + ts-rs single-source TypeScript (card e98bb804) #1).

R1–R3 are doc-only edits — no code or generated-type changes needed. Happy to re-review same day.

…f3fb2ab)

R1 — skip rule is now EXACT EQUALITY, normatively: never >=. A
substrate restart may reset its revision counter; under >= a client
at last_seen:500 vs a restarted substrate at 3 keeps stale state
forever — a permanent gap. Exact equality makes resets safe by
construction; skip stays purely an optimization.

R2 — Subscribe is declarative: REPLACES the connection's subscription
set. Clients always send their whole world; no add/remove bookkeeping
to drift. Idempotency falls out as the identical-replacement case.

R3 — Observers resync identically: Observe { spec, last_seen } is
declarative per observer_id, triggers the same snapshot-then-live
with the same skip rule. One resync contract for humans and AIs —
the AI-perception constituency was the one the prior shape gapped.

Also from the review: per-kind revision ordering rule (substrate
emits non-decreasing; consumers drop lower-than-rendered — reordering
degrades harmless, not corrupting); the missing ack/error frame is
documented as a deliberate v0 omission with the additive path named;
observe tag layout + bare-observe default pinned by tests.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@joelteply

Copy link
Copy Markdown
Contributor Author

Re-verify of f410ff4 (sentinel re-review after prior BLOCK): BLOCK — R1's stale-forever scenario is not fully closed; it survives one layer over, via the new consumer drop rule.

What's verified good:

  • R2 (subscribe replace) — closed. "A Subscribe frame declares the connection's complete interest set; it REPLACES any previous subscription on the connection", absent kinds stop flowing, identical re-subscribes idempotent as a special case. Unambiguous.
  • R3 (observer resync) — closed. Observe { spec, last_seen } struct variant with #[serde(default)] last_seen, explicitly bound to the same snapshot-then-live + exact-equality skip contract ("one resync contract, not a human one and an AI one"). Generated ClientMessage.ts matches the new shape, and the export tests rewrite the generated files during cargo test — the tree was clean afterward, so committed TS == generated TS, no drift. New test pins the {"type":"observe"...} tag layout and bare-observe → empty last_seen. The wire break to the observe variant vs 2fb8837 is fine pre-merge; Rust, TS, and tests moved in the same commit.
  • R4 — present and sane. Ordering rule documented; "Deliberate v0 omissions" correctly explains correlation_id having no ack frame yet and why adding a ServerMessage variant later is non-breaking.
  • Mechanics (clean temp worktree at f410ff4): cargo test -p positron-core 18/18 pass incl. observe_round_trips_with_pinned_tag_and_default_last_seen; cargo fmt --check clean; cargo clippy --workspace --tests -- -D warnings clean; git status --porcelain empty after the test run.

The blocking hole (R1 residual): the exact-equality skip correctly kills the >= permanent gap substrate-side — but the new Ordering paragraph reintroduces the identical failure mode consumer-side. "Consumers MUST drop an envelope whose revision is lower than one already rendered for that kind." The natural implementation keeps exactly one watermark per kind (lastRendered[kind]) — the same value it sends as last_seen. Interleaving:

  1. Client renders chat at revision 500. Substrate restarts; counter resets, now at 3.
  2. Client reconnects: Subscribe { last_seen: [chat: 500] }. 500 ≠ 3, so the substrate correctly sends the snapshot at revision 3 — the new skip rule works.
  3. The consumer's MUST-drop rule sees 3 < 500 and discards the snapshot. It has no choice: the spec deliberately makes snapshot and live "the SAME frame" with no phase bookkeeping, so a consumer cannot exempt snapshots even if it wanted to.
  4. Every subsequent frame for that kind is likewise dropped until the substrate's counter exceeds 500 — silent, effectively permanent stale state. This is R1's exact scenario, migrated from the skip rule to the drop rule.

Fix is one sentence in the Ordering paragraph: scope the drop-rule watermark per connection/subscription — e.g. "the rendered-revision watermark backing this drop rule MUST be reset whenever the consumer sends a new Subscribe/Observe; cross-connection knowledge travels only in last_seen, never in the drop watermark." Per-connection is coherent: frames cannot reorder across a transport reconnect, so resetting loses nothing.

Non-blocking note: "Exact equality makes counter resets safe by construction" slightly overclaims — old-epoch rev N vs new-epoch rev N with different content (counter resets, kind mutates N times before the client reconnects; cheap to hit at small N) still skips wrongly, leaving the client silently stale until that kind's next mutation. Transient and self-healing, and not fixable at this layer without an epoch/generation id — worth one line in "Deliberate v0 omissions" (epoch id as the v0.x answer), or advice that reset-prone substrates simply never exercise the MAY-skip.

One-sentence respin away; everything else is approval-ready.

…nd 2, card df3fb2ab)

The round-2 sentinel caught R1's failure mode migrating into the
ordering rule: a consumer that rendered chat@500, reconnecting to a
counter-reset substrate, would drop the snapshot@3 it just requested
(3 < 500) and everything after — stale-forever, one layer up. The
watermark is now normatively SUBSCRIPTION-SCOPED: resets on every
Subscribe/Observe; pre-reconnect knowledge travels only in last_seen.
Watermark resets; last_seen remembers; the two never share a counter.

Also documents the residual ABA case (old-epoch rev N vs new-epoch
rev N skips one snapshot, transient, heals on next mutation) in the
v0 omissions with the epoch-id path named — 'safe by construction'
no longer overclaims.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@joelteply

Copy link
Copy Markdown
Contributor Author

Final-round verification of e29e34d — APPROVE

Sole round-2 blocker (watermark recreating the stale-forever gap): CLOSED. Delta f410ff4..e29e34d is doc-only, +18 lines in positron-core/src/session.rs, and matches the prescribed fix exactly: drop watermark scoped to the current subscription, RESETS on every Subscribe/Observe, cross-reconnect knowledge travels ONLY in last_seen. Walked the interleaving: rendered@500 → reconnect → Subscribe{last_seen:[chat:500]} (watermark reset) → skip rule sees 3 ≠ 500 → snapshot@3 sent → watermark is reset, so 3 < 500 no longer triggers a drop → rendered, live resumes from 3. Within a single connection the substrate counter is monotone, so the reset cannot admit genuinely stale content; cross-connection comparison only happens via last_seen's exact-equality skip. No remaining interleaving where a verbatim-spec consumer silently keeps stale state via the drop rule.

ABA residual: documented as a v0 omission, accurate (coincidental counter re-match wrongly skips one snapshot; transient — next mutation streams down; epoch/generation id named as the full fix; persisted counters never enter the case). No contradictions among skip rule / replace semantics / observer contract / ordering rule / new scoping — skipped snapshot + reset watermark composes fine (subsequent live frames are ≥ current by monotonicity), and identical re-subscribes stay observably idempotent.

Gauntlet (temp worktree at e29e34d): cargo test -p positron-core 18/18 pass · cargo fmt --check clean · cargo clippy --workspace --tests -- -D warnings clean (ts-rs build-script note only, not a lint) · porcelain clean after tests. Worktree removed.

Non-blocking notes for v0.x:

  1. The skip paragraph still says exact equality makes counter resets "safe by construction" — now slightly overclaimed given the documented ABA residual two sections later; suggest softening to "safe for any mismatch" whenever the file is next touched.
  2. "Transient" staleness in the ABA note assumes the kind mutates again; a never-again-mutated kind stays stale until the epoch id lands (the note's parenthetical already implies this).
  3. A substrate restart behind a relaying transport (e.g. airc) that the client never observes as a disconnect yields no resync trigger — a liveness/heartbeat concern adjacent to the epoch-id item, out of scope for this card.

@joelteply joelteply merged commit 8286e24 into main Jun 10, 2026
2 checks passed
@joelteply joelteply deleted the feat/session-protocol-df3fb2ab branch June 10, 2026 19:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant