feat(control-plane): runtime-node attach + heartbeat + version-floor #145

Draft
Sawmonabo wants to merge 7 commits into develop from feat/plan-003-phase-3-control-plane-attach

@Sawmonabo Sawmonabo commented Jun 6, 2026

Summary

Plan-003 Phase 3 — Control-Plane Attach + Heartbeat Services + Version-Floor Enforcement. Brings tests P1–P10 green:

  • Contract narrowing (T3.0) — capabilityupdate.healthChanges.state narrowed to the 2-value RuntimeNodeHealthState (online | degraded) daemon self-report enum; offline/revoked become unconstructable (least-privilege, I-003-2).
  • Migration (T3.1) — Plan-003-owned control-plane Postgres v3: runtime_node_attachments + runtime_node_presence.
  • Attach service (T3.2–T3.5) — NULL-floor unconditional admission; floor comparison with admit-in-read-only (I-003-1); typed VERSION_FLOOR_EXCEEDED on below-floor writes (never ejected); single-active-session + reconnect reactivation; revocation terminal; multi-node coexistence without changing session identity.
  • Heartbeat service (T3.6) — presence ingestion + server-derived degraded(30s)/offline(60s) staleness sweep recorded as coordination-record updates, not durable events (ADR-017 §Server-Derived Runtime-Node Lifecycle Events).
  • Capability-update (T3.9) + router (T3.8) — discovery-snapshot refresh; runtimeNodeRouter (attach/heartbeat/capabilityupdate/detach) composed onto the Plan-008-bootstrap tRPC host.
  • I-003-3 — attach/detach never mutate session_memberships (T3.7).
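The attach rules in the summary (multi-node coexistence, single-active-session, reconnect reactivation, terminal revocation) can be sketched as a small in-memory model. This is an illustration only, not the PR's Postgres-backed AttachService: `InMemoryAttachments`, `AttachRefused`, and the 3-value `SimplifiedState` are hypothetical names, and the order in which the two refusals are checked is an assumption. Only the two wire code strings are taken from the task notes.

```typescript
// Simplified model: the real rows carry the 5-value NodeState. "active" here
// stands for any state that is neither offline nor revoked (an assumption).
type SimplifiedState = "active" | "offline" | "revoked";

interface AttachmentRow {
  nodeId: string;
  sessionId: string;
  state: SimplifiedState;
}

// Hypothetical throwable; only the two code strings come from the PR text.
class AttachRefused extends Error {
  readonly code: string;
  constructor(code: string, message: string) {
    super(message);
    this.code = code;
  }
}

class InMemoryAttachments {
  private readonly rows: AttachmentRow[] = [];

  stateOf(nodeId: string, sessionId: string): SimplifiedState | undefined {
    return this.find(nodeId, sessionId)?.state;
  }

  attach(nodeId: string, sessionId: string): void {
    const own = this.find(nodeId, sessionId);
    // Revocation is terminal: a revoked row is never reactivated.
    if (own !== undefined && own.state === "revoked") {
      throw new AttachRefused("runtimenode.attach_revoked", "attachment is revoked");
    }
    // Single-active-session is per node, so other nodes on the same session are fine.
    const activeElsewhere = this.rows.some(
      (r) => r.nodeId === nodeId && r.sessionId !== sessionId && r.state === "active",
    );
    if (activeElsewhere) {
      throw new AttachRefused("runtimenode.attach_conflict", "node is active on another session");
    }
    if (own === undefined) {
      this.rows.push({ nodeId, sessionId, state: "active" });
    } else {
      own.state = "active"; // reconnect reactivates the offline row
    }
  }

  // Resolves the node's single active attachment by nodeId, as the T3.7 notes describe.
  detach(nodeId: string, reason: "disconnect" | "revoke"): void {
    const active = this.rows.find((r) => r.nodeId === nodeId && r.state === "active");
    if (active === undefined) return;
    active.state = reason === "revoke" ? "revoked" : "offline";
  }

  private find(nodeId: string, sessionId: string): AttachmentRow | undefined {
    return this.rows.find((r) => r.nodeId === nodeId && r.sessionId === sessionId);
  }
}
```

In the PR itself the per-node constraint is enforced by a partial unique index rather than an application-level scan, so the model above mirrors the outcome, not the mechanism.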

Task DAG

plan: 3
phase: 3
pr: 145
tasks:
  - id: T3.0
    title: "Contract: narrow capabilityupdate.healthChanges.state to the 2-value RuntimeNodeHealthState daemon self-report enum"
    target_paths:
      - packages/contracts/src/runtime-node.ts
      - packages/contracts/src/__tests__/runtime-node.test.ts
      - docs/architecture/contracts/api-payload-contracts.md
    depends_on: []
    dispatch_mode: sequential
    role: implementer
    spec_coverage:
      - "Spec-003 line 57"
      - "Spec-003 line 65"
    verifies_invariant: [I-003-2]
    blocked_on: []
    acceptance_criteria:
      - "Unit (contract): no Phase-3 P-AC; inverts the shipped 5-value-accept conformance test (runtime-node.test.ts:306-319) so offline/revoked/registering reject and online/degraded accept"
    contract_provides:
      - "RuntimeNodeCapabilityUpdateRequest.healthChanges.state (narrowed to RuntimeNodeHealthState: online | degraded)"
    contract_consumes:
      - RuntimeNodeHealthStateSchema
    consumes_resolution:
      RuntimeNodeHealthStateSchema: "shipped Phase-1 export packages/contracts/src/runtime-node.ts:109 (RuntimeNodeHealthStateSchema = z.enum(['online','degraded'])); swap the healthChanges.state member from NodeStateSchema (runtime-node.ts:236) to this hoisted 2-value enum"
    notes: "Role is implementer, not contract-author: narrows an already-shipped contract, inverts its shipped conformance test, and flips the synchronized api-payload-contracts.md wire-doc block — all co-committed (the narrowed schema fails the old test, so the inversion cannot lag the type). No same-level task consumes its output (T3.9 is downstream at L5), so contract-author run-first-in-level semantics buy nothing. Precedes all control-plane service tasks; specifically precedes T3.9."
  - id: T3.1
    title: "Control-plane migration v3: CREATE runtime_node_attachments + runtime_node_presence (Postgres, Plan-003-owned)"
    target_paths:
      - packages/control-plane/src/migrations/0003-runtime-nodes.ts
      - packages/control-plane/src/migrations/__tests__/0003-runtime-nodes.test.ts
      - packages/control-plane/src/sessions/migration-runner.ts
      - packages/control-plane/src/sessions/__tests__/migration-runner.test.ts
      - packages/control-plane/src/migrations/__tests__/runtime-node-upstream-anchors.test.ts
      - packages/control-plane/src/migrations/__tests__/migration-shape.test.ts
      - packages/control-plane/src/migrations/__tests__/0002-session-invites.test.ts
      - packages/control-plane/src/presence/__tests__/presence-register-service.test.ts
      - packages/control-plane/src/sessions/__tests__/session-directory-service.test.ts
    depends_on: []
    dispatch_mode: sequential
    role: implementer
    spec_coverage:
      - "Spec-003 line 91"
    verifies_invariant: []
    blocked_on: []
    acceptance_criteria:
      - "Migration-shape test: applying 0003 against a DB migrated through 0002 creates both tables with the exact column set, the state CHECK enum, the composite (node_id, session_id) uniqueness, and the presence PK; idempotent under the runner"
      - "Cross-plan suite stays green: registering v3 in the shared runner reconciles all forced-amendment consumers (no red test outside target_paths)"
    contract_provides: []
    contract_consumes: []
    consumes_resolution: {}
    notes: "Tables are Plan-003-owned (cross-plan-dependencies.md line 41; -- Owner: Plan-003 stamps in shared-postgres-schema.md), CREATEd here, not by Plan-001. Append { version: 3, sql } to MIGRATIONS after Plan-002's v2 0002-session-invites (0002 is taken; reusing it breaks the runner's monotonic sequence). Precedes T3.2 (attachments) and T3.6 (presence). Verifies no invariant directly — substrate for I-003-1/I-003-3/I-003-5 persistence. ORCHESTRATOR target_paths reconciliation (advisor-validated, 4->9 files): registering v3 in the shared applyMigrations is one atomic change whose blast radius is empirically 5 test files / 7 assertions (corrected validate cmd `pnpm --filter @ai-sidekicks/control-plane test` — the buggy trailing `run` positional had hidden 3 of 5). Blast radius confirmed CONTAINED to control-plane (no external applyMigrations/schema_migrations importer; runtime-daemon's same-named fn is a separate SQLite runner). Folded all 5 into T3.1 (atomic: one commit = v3 exists + runner applies it + every consumer reconciled), matching the Plan-002 Amendment 2 fix-in-place precedent. (1) migrations/__tests__/0003-runtime-nodes.test.ts — restored co-located shape test the Phase-A transcription dropped (plan line 414), mirrors 0002 T1-T8. (2) sessions/__tests__/migration-runner.test.ts — R1 len 2->3 + v3 anchor/table probes; R2 [v1,v2]->[v1,v2,v3]. (3) migrations/__tests__/runtime-node-upstream-anchors.test.ts (Plan-003-owned) — its header note (d) is a self-documenting tripwire: flip assertion (3) ABSENT->PRESENT (`.toBe(true)`) + retitle; AND home the full-schema I-002-3 carve-out here (new assertion (4): the only durable presence-NAMED table is runtime_node_presence — runtime-node liveness, NOT collaborative Yjs-Awareness presence; cite Spec-003 §Default-Behavior + ADR-017 §Server-Derived + shared-postgres-schema.md). 
(4) presence/__tests__/presence-register-service.test.ts (Plan-002 I-002-3 re-verify) + (5) migrations/__tests__/migration-shape.test.ts third test — these broke ONLY because they used applyMigrations as a v2-shortcut that now also applies v3; restore their DOCUMENTED v1->v2 scoping (migration-shape header lines 27-29; presence test-2 comment) via direct-exec, NOT a Plan-003 carve-out mutating a Plan-002 assertion. migration-shape test 3 keeps the runner (the runner IS its subject): assert the runner's v1->fullset delta is the 3 tables + retitle. (6) migrations/__tests__/0002-session-invites.test.ts T7 + (7) sessions/__tests__/session-directory-service.test.ts x2 — idempotency tests: a bare 2->3 bump would make T7's 'no-op' title lie, so bring the DB fully to v1+v2+v3 before the no-op assertion (T7: applyMigrations catch-up then re-call asserting no-op); session-directory x2 already fully-migrate in beforeEach so they are clean [v1,v2]->[v1,v2,v3] array bumps. Every edited Plan-001/002-owned file carries an in-file amendment note naming Plan-003 PR #145 (mirrors the Cross-Plan Amendment 2 reference pattern)."
  - id: T3.2
    title: "Attach service: NULL-floor unconditional admission + upsert/reconnect + typed CONFLICT refusals"
    target_paths:
      - packages/contracts/src/error.ts
      - packages/contracts/src/__tests__/error.test.ts
      - docs/architecture/contracts/error-contracts.md
      - packages/control-plane/src/runtime-nodes/attach-service.ts
      - packages/control-plane/src/runtime-nodes/errors.ts
      - packages/control-plane/src/runtime-nodes/__tests__/attach-service.test.ts
    depends_on: [T3.1]
    dispatch_mode: sequential
    role: implementer
    spec_coverage:
      - "Spec-003 line 53"
    verifies_invariant: [I-003-3, I-003-5]
    blocked_on: []
    acceptance_criteria:
      - "P1: RuntimeNodeAttach with NULL min_client_version floor admits all daemon versions"
      - "P9: node actively attached elsewhere is refused a second active attach with typed CONFLICT (409); reconnect after detach reactivates the offline row (I-003-5)"
      - "P10: re-attach of a revoked row is refused with typed CONFLICT (409) — revocation is terminal"
    contract_provides:
      - RUNTIME_NODE_ATTACH_CONFLICT_CODE
      - RUNTIME_NODE_ATTACH_REVOKED_CODE
    contract_consumes: []
    consumes_resolution: {}
    notes: "Two-part task — (A) author the attach-conflict WIRE CONTRACT, (B) build the service that throws it. Advisor-validated scope expansion: the original DAG's empty contract fields missed this gap; the plan T3.2 step says the exception carries 'a dotted-lowercase code registered in error-contracts.md per the existing convention' but names no string, so it is fresh design authored here (NOT inlined as a bare string — the exact contract-deferral anti-pattern). (A) CONTRACT — add TWO code constants to packages/contracts/src/error.ts mirroring the RESOURCE_LIMIT_EXCEEDED_CODE two-liner (type alias + const ONLY; NO Details/Error/Schema block — code+message-only per the registry-only convention; no AC needs structured details and a conflicting-session-id detail would risk cross-session info-leak): RUNTIME_NODE_ATTACH_CONFLICT_CODE = 'runtimenode.attach_conflict' (P9, TRANSIENT — node already actively attached on another session; resolvable after detach) and RUNTIME_NODE_ATTACH_REVOKED_CODE = 'runtimenode.attach_revoked' (P10, TERMINAL — this session's attachment row is revoked; never retry). Domain token 'runtimenode' matches the method namespace (runtimenode.attach) and avoids the runtime_node.* event-name collision (separator differs). Add 2 conformance assertions to contracts/src/__tests__/error.test.ts (mirror the error.test.ts:64 .toBe pattern). Add a Runtime Node table to error-contracts.md Error Codes section with both codes at HTTP 409 (mirror the Invite/Membership tables). (B) SERVICE — create runtime-nodes/errors.ts with TWO throwables mirroring sessions/errors.ts ResourceLimitExceededException but WITHOUT the details field (plain extends Error: readonly code = the CONST, constructor(message: string)): RuntimeNodeAttachConflictException + RuntimeNodeAttachRevokedException (T3.4 extends this file with the version-floor throwable). Create attach-service.ts (Querier-injected AttachService, mirrors MembershipService constructor-injection). 
Upsert is INSERT ... ON CONFLICT (node_id, session_id) DO UPDATE on idx_node_attachments_node; cross-session second-active refusal catches Postgres 23505 on idx_node_attachments_active and throws RuntimeNodeAttachConflictException; revoked re-attach throws RuntimeNodeAttachRevokedException. SERVICE-BOUNDARY verification only (Review Note [T3.2 service-boundary verification]): tests assert AttachService.attach THROWS the typed exception; the to-409 envelope projection is deferred to T3.4. depends_on [T3.1] (tables must exist). HEAD of the attach-service.ts writer chain; precedes T3.3, T3.5, T3.7, T3.9 and provides the AttachService class T3.8 wires."
  - id: T3.3
    title: "Attach service: version-floor comparison — at/above floor read/write, below floor admit read-only"
    target_paths:
      - packages/contracts/src/event.ts
      - packages/contracts/src/__tests__/session-event.test.ts
      - packages/control-plane/src/runtime-nodes/attach-service.ts
      - packages/control-plane/src/runtime-nodes/__tests__/attach-service.test.ts
    depends_on: [T3.2]
    dispatch_mode: sequential
    role: implementer
    spec_coverage:
      - "Spec-003 line 53"
    verifies_invariant: [I-003-1]
    blocked_on: []
    acceptance_criteria:
      - "P2: client_version >= floor admits with full read/write"
      - "P3: client_version < floor admits in read-only state — node remains joined and reads succeed (I-003-1)"
    contract_provides:
      - compareEventEnvelopeVersion
    contract_consumes:
      - "RuntimeNodeAttachRequest.clientVersion"
      - "RuntimeNodeAttachResponse.readOnly"
    consumes_resolution:
      "RuntimeNodeAttachRequest.clientVersion": "shipped Phase-1 contract: RuntimeNodeAttachRequest.clientVersion: EventEnvelopeVersion (packages/contracts/src/runtime-node.ts:128; api-payload-contracts.md:490 — branded MAJOR.MINOR semver, semver-aware comparison per ADR-018 Decision #1). Ratified by Tier-3 audit."
      "RuntimeNodeAttachResponse.readOnly": "shipped Phase-1 contract: RuntimeNodeAttachResponse.readOnly: boolean (packages/contracts/src/runtime-node.ts:172,191; api-payload-contracts.md:497 — derived below-floor permission flag, orthogonal to NodeState; comment notes it is populated by the Phase-3 attach service). T3.3 fills the field; it does NOT extend the schema."
    notes: "depends_on [T3.2] (semantic — extends AttachService). Verified readOnly is in the SHIPPED Zod RuntimeNodeAttachResponseSchema (runtime-node.ts:191), not doc-only, so this is not a schema-extension gap. File-serialization edge on attach-service.ts: placed at a level strictly after T3.2 and before T3.5/T3.7/T3.9/T3.4 to avoid same-file same-level collision. Precedes T3.4 (which needs the read-only state). ORCHESTRATOR scope expansion (advisor-validated): authors compareEventEnvelopeVersion in contracts/src/event.ts — the canonical total ordering of the EventEnvelopeVersion value type — consumed in-PR by T3.3 (below-floor read-only verdict) and re-derived by T3.4 (below-floor write-refusal); the daemon's envelope version negotiation (ADR-018 sections 6/7/10) is the third consumer. Contracts is the only shared ancestor; a control-plane-local helper would force the daemon to depend upward or re-implement the compare (lexical 10<9 bug, twice). Hand-rolled numeric MAJOR.MINOR tuple compare, NOT the semver lib (strict 2-segment type needs no coercion/ranges/prerelease and no new dep). #deriveReadOnly parses the DB floor via EventEnvelopeVersionSchema.parse (NEVER an as-cast) so a malformed floor throws loud rather than NaN-bypassing to read-write. Comparator unit tests co-locate in session-event.test.ts (where EventEnvelopeVersion is already tested), not a new event.test.ts. See Review Note [T3.3 version comparator -> contracts]."
  - id: T3.4
    title: "Write-after-read-only-attach returns typed VERSION_FLOOR_EXCEEDED; node not detached"
    target_paths:
      - packages/control-plane/src/runtime-nodes/errors.ts
      - packages/control-plane/src/runtime-nodes/attach-service.ts
      - packages/control-plane/src/runtime-nodes/runtime-node-router.factory.ts
      - packages/control-plane/src/sessions/trpc.ts
    depends_on: [T3.3, T3.8]
    dispatch_mode: sequential
    role: implementer
    spec_coverage:
      - "Spec-003 line 123"
    verifies_invariant: [I-003-1]
    blocked_on: []
    acceptance_criteria:
      - "P4: read-only-attached daemon's write returns typed VERSION_FLOOR_EXCEEDED; node remains joined (no detach)"
    contract_provides: []
    contract_consumes:
      - VERSION_FLOOR_EXCEEDED_CODE
      - compareEventEnvelopeVersion
    consumes_resolution:
      VERSION_FLOOR_EXCEEDED_CODE: "shipped Phase-2 contract constant: VERSION_FLOOR_EXCEEDED_CODE + the VersionFloorExceededError wire shape/schema in packages/contracts/src/error.ts (Plan-001 T1.5/T2.3); wire code version.floor_exceeded -> HTTP 409 (error-contracts.md:266). Import the constant, do NOT re-spell the string literal."
    notes: "Extends errors.ts (adds VersionFloorExceededException — class extends Error with readonly code = VERSION_FLOOR_EXCEEDED_CODE, mirroring sessions/errors.ts ResourceLimitExceededException). depends_on [T3.3] (read-only state) AND [T3.8]: T3.4 adds a catch-arm to runtime-node-router.factory.ts, which T3.8 CREATEs — adding a catch-arm to a not-yet-existing file is an unsatisfiable forward reference (mirror of the T3.7->T3.8 hazard). The row text confirms the errorFormatter is one 'the T3.8 sibling router reuses' — same physical router file, no second router named. ANALYST-RESOLVED in-DAG (the rows describe the surfaces but do not state this edge explicitly). T3.4 is terminal; T3.8's deps exclude it, so no cycle. Two-part wiring: router catch-arm rethrows as TRPCError CONFLICT (409); errorFormatter on shared t builder (sessions/trpc.ts) projects onto shape.data.aisError. Attachment row left intact (no revoked/offline, no session_memberships change). ALSO wires catch-arms for T3.2's RuntimeNodeAttachConflictException + RuntimeNodeAttachRevokedException (the deferred P9/P10 to-409 envelope), evolves SessionRouterAisError to optional-details, and does the AisWireException base-class refactor (the shared t formatter reaches 4 typed branches here) — see Review Note [T3.2 attach-conflict wire codes + T3.4/T3.8 envelope evolution]. Re-derives the read-only verdict at write time via compareEventEnvelopeVersion(clientVersion, floor) (no persisted read_only column exists — recompute from the attachment client_version + the session floor); consumes the T3.3 contracts comparator in-DAG."
  - id: T3.5
    title: "Multiple nodes attach to one session without changing session identity"
    target_paths:
      - packages/control-plane/src/runtime-nodes/__tests__/attach-service.test.ts
    depends_on: [T3.2, T3.3]
    dispatch_mode: sequential
    role: implementer
    spec_coverage:
      - "Spec-003 line 49"
      - "Spec-003 line 122"
    verifies_invariant: [I-003-3]
    blocked_on: []
    acceptance_criteria:
      - "P5: multiple runtime nodes attach to the same session without changing session identity"
    contract_provides: []
    contract_consumes: []
    consumes_resolution: {}
    notes: "Audit semantic dep is [T3.2]. Added operational edge [T3.3] for attach-service.ts file-serialization (six same-file writers must occupy distinct levels — same-level overlap is rejected even in sequential mode). Inserts under composite (node_id, session_id) uniqueness so two distinct nodes share one session_id without re-creating the session or mutating sessions. ORCHESTRATOR (T3.3 precedent): test-only — P5 is ALREADY satisfied by T3.2's upsert (ON CONFLICT (node_id, session_id)) + the per-NODE (not per-session) idx_node_attachments_active partial-unique (0003-runtime-nodes.ts:122-123), so no attach-service.ts production change is needed; T3.5 is the P5 characterization test (two distinct nodes both admitted active to one session; the sessions row is byte-unchanged and no new session row is created — session identity preserved). target_paths corrected to the actual edit surface (attach-service.test.ts); the [T3.3] operational edge now serializes the shared test file."
  - id: T3.6
    title: "Heartbeat service: presence ingestion + sweep-driven degraded/offline transitions (coordination record, no durable event)"
    target_paths:
      - packages/control-plane/src/runtime-nodes/heartbeat-service.ts
      - packages/control-plane/src/runtime-nodes/__tests__/heartbeat-service.test.ts
    depends_on: [T3.1]
    dispatch_mode: sequential
    role: implementer
    spec_coverage:
      - "Spec-003 line 59"
      - "Spec-003 line 60"
      - "Spec-003 line 61"
    verifies_invariant: []
    blocked_on: []
    acceptance_criteria:
      - "P6: heartbeat ingestion updates runtime_node_presence; a node aging past 30s then 60s is demoted by the staleness sweep to degraded then offline (coordination-record transition; no durable event in V1)"
    contract_provides: []
    contract_consumes: []
    consumes_resolution: {}
    notes: "New file, disjoint from the attach chain; depends_on [T3.1] only (presence table). Periodic staleness sweep at 5s (finer than the 15s cadence, bounding detection lag to <=5s). Server-derived degraded (>30s) / offline (>60s) are coordination-record writes to runtime_node_presence.health_state, NOT durable runtime_node.* events (ADR-017 §Server-Derived — V1.1-gated). Demotion is sweep-driven not ingest-driven (a dead node sends nothing). Backs T3.8's runtimenode.heartbeat procedure. Verifies no invariant directly (health-state lifecycle)."
  - id: T3.7
    title: "I-003-3 enforcement: attach/detach never mutate session_memberships; detach path"
    target_paths:
      - packages/control-plane/src/runtime-nodes/attach-service.ts
    depends_on: [T3.2, T3.5]
    dispatch_mode: sequential
    role: implementer
    spec_coverage:
      - "Spec-003 line 47"
      - "Spec-003 line 51"
    verifies_invariant: [I-003-3]
    blocked_on: []
    acceptance_criteria:
      - "P7: RuntimeNodeAttach does not mutate session_memberships"
      - "P8: RuntimeNodeDetach leaves session_memberships unchanged"
    contract_provides: []
    contract_consumes: []
    consumes_resolution: {}
    notes: "Audit semantic dep is [T3.2]. Added operational edge [T3.5] for attach-service.ts file-serialization. Detach resolves the node's single active attachment by nodeId (unambiguous per I-003-5) and updates runtime_node_attachments.state (offline for clean disconnect, revoked for trust revocation) + runtime_node_presence.health_state, leaving membership rows untouched. The detach method authored here backs T3.8's runtimenode.detach procedure, so T3.7 PRECEDES T3.8 (the original DAG mis-sequenced T3.7 after T3.8 — an unsatisfiable forward reference; corrected here)."
  - id: T3.9
    title: "Capability-update service: control-plane discovery-snapshot refresh (AttachService.updateCapabilities)"
    target_paths:
      - packages/control-plane/src/runtime-nodes/attach-service.ts
    depends_on: [T3.0, T3.2, T3.7]
    dispatch_mode: sequential
    role: implementer
    spec_coverage:
      - "Spec-003 line 52"
      - "Spec-003 line 57"
      - "Spec-003 line 76"
      - "Spec-003 line 65"
    verifies_invariant: [I-003-2, I-003-3, I-003-5]
    blocked_on: []
    acceptance_criteria:
      - "Unit (no P-AC): capabilities snapshot refreshed on the active row; healthChanges.state online against a registering attachment is rejected (I-003-2 registering->online guard, typed CONFLICT 409); updatedAt is server now() while attached_at is unchanged; no session_memberships write; control-plane path emits no durable runtime_node.* event (ADR-017)"
    contract_provides: []
    contract_consumes:
      - "RuntimeNodeCapabilityUpdateRequest.healthChanges.state (narrowed: online | degraded)"
      - RuntimeNodeCapabilityUpdateResponseSchema
    consumes_resolution:
      "RuntimeNodeCapabilityUpdateRequest.healthChanges.state (narrowed: online | degraded)": "in-DAG: provided by T3.0 contract_provides (healthChanges.state narrowed from 5-value NodeState to 2-value RuntimeNodeHealthState). offline/revoked are rejected at the schema boundary, so the I-003-2 guard here is only the residual registering->online state-context refusal."
      RuntimeNodeCapabilityUpdateResponseSchema: "shipped Phase-1 contract: RuntimeNodeCapabilityUpdateResponseSchema (packages/contracts/src/runtime-node.ts:314) — response left unchanged by T3.0 (state stays full NodeState, the server-derived liveness projection). Map RETURNING node_id/state + now() AS updated_at -> nodeId/state/updatedAt and parse through it."
    notes: "depends_on [T3.0] (narrowed request contract) AND [T3.2] (AttachService class). Added operational edge [T3.7] for attach-service.ts file-serialization. updateCapabilities is a method ON AttachService (cohesive with attach/detach — all own the runtime_node_attachments row lifecycle; NOT a fragmented second class). In a transaction(): resolve single active attachment by nodeId FOR UPDATE, refresh capabilities JSONB, apply healthChanges.state under the I-003-2 registering->online guard. updatedAt is transaction-time server clock (RETURNING now()), NOT a stored column (schema has only attached_at, must not overwrite). PRECEDES T3.8 (backs runtimenode.capabilityupdate); its after-T3.8 file numbering is append-only, not execution order."
  - id: T3.8
    title: "Route registration: createRuntimeNodeRouter factory + sibling composition into the Plan-008 tRPC host"
    target_paths:
      - packages/control-plane/src/runtime-nodes/runtime-node-router.factory.ts
    depends_on: [T3.2, T3.6, T3.7, T3.9]
    dispatch_mode: sequential
    role: implementer
    spec_coverage:
      - "Spec-003 line 52"
    verifies_invariant: []
    blocked_on: []
    acceptance_criteria:
      - "No standalone assertion — enables P1-P10 transport (mounts the four runtimenode.* procedures)"
    contract_provides: []
    contract_consumes:
      - "runtimenode.attach / runtimenode.heartbeat / runtimenode.capabilityupdate / runtimenode.detach (procedure names + request/response schemas)"
    consumes_resolution:
      "runtimenode.attach / runtimenode.heartbeat / runtimenode.capabilityupdate / runtimenode.detach (procedure names + request/response schemas)": "ratified by Tier-3 audit — Runtime-Node Method-Name Registry, api-payload-contracts.md:528-539 (all four procedures: mutation; request/response schemas named). dotted-camelCase method names are METHOD_NAME_FORMAT-valid (Plan-007 ipc/registry.ts rejects the underscore runtime_node.* style). Backing service methods are in-DAG: attach->AttachService.attach (T3.2), heartbeat->HeartbeatService ingest (T3.6), detach->AttachService detach path (T3.7), capabilityupdate->AttachService.updateCapabilities (T3.9)."
    notes: "depends_on [T3.2, T3.6, T3.7, T3.9] — the four backing service methods; none may be a forward reference. CREATEs runtime-node-router.factory.ts and composes it as a sibling runtimeNodeRouter merged into the root router in server/host.ts alongside createSessionRouter (CP-003-2). T3.4 later adds a catch-arm to this file, hence T3.4 depends_on T3.8. Verifies no invariant (transport wiring)."
levels:
  - [T3.0, T3.1]
  - [T3.2, T3.6]
  - [T3.3]
  - [T3.5]
  - [T3.7]
  - [T3.9]
  - [T3.8]
  - [T3.4]
status: ready
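
The `levels` list above is consistent with a longest-path layering of the `depends_on` edges: a task's level is one more than the deepest level among its dependencies. The sketch below recomputes the levels from the edges copied out of the DAG. `computeLevels` is a hypothetical helper, not the orchestrator's actual algorithm, and the alphabetical ordering within a level is an assumption made only to keep the output deterministic.

```typescript
type Dag = Record<string, string[]>;

// depends_on edges copied from the task DAG above.
const dependsOn: Dag = {
  "T3.0": [],
  "T3.1": [],
  "T3.2": ["T3.1"],
  "T3.3": ["T3.2"],
  "T3.4": ["T3.3", "T3.8"],
  "T3.5": ["T3.2", "T3.3"],
  "T3.6": ["T3.1"],
  "T3.7": ["T3.2", "T3.5"],
  "T3.8": ["T3.2", "T3.6", "T3.7", "T3.9"],
  "T3.9": ["T3.0", "T3.2", "T3.7"],
};

// Longest-path layering: level(task) = 1 + max(level(dep)), or 0 with no deps.
function computeLevels(dag: Dag): string[][] {
  const depth = new Map<string, number>();
  const visiting = new Set<string>();

  const depthOf = (id: string): number => {
    const known = depth.get(id);
    if (known !== undefined) return known;
    if (visiting.has(id)) throw new Error(`cycle through ${id}`);
    const deps = dag[id];
    if (deps === undefined) throw new Error(`unknown task ${id}`);
    visiting.add(id);
    let d = 0;
    for (const dep of deps) d = Math.max(d, depthOf(dep) + 1);
    visiting.delete(id);
    depth.set(id, d);
    return d;
  };

  const levels: string[][] = [];
  for (const id of Object.keys(dag).sort()) {
    const d = depthOf(id);
    const bucket = levels[d] ?? (levels[d] = []);
    bucket.push(id);
  }
  return levels;
}
```

Because each same-file writer of attach-service.ts was given an operational edge to the previous one, this layering places them on distinct levels without any extra file-collision logic.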

Test plan

  • P1: NULL floor admits all daemon versions
  • P2: client_version ≥ floor → read/write attachment
  • P3: client_version < floor → read-only attachment (joined, reads succeed)
  • P4: read-only-attached write returns typed VERSION_FLOOR_EXCEEDED; node not detached
  • P5: multiple runtime nodes attach to one session without changing session identity
  • P6: heartbeat ingestion updates presence; staleness sweep demotes degraded(30s)→offline(60s) as coordination-record transitions (no durable event in V1)
  • P7: attach does not mutate session_memberships
  • P8: detach leaves session_memberships unchanged
  • P9: single-active-session enforced; reconnect reactivates the offline row
  • P10: re-attach of a revoked row is refused (revocation terminal)
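
The P6 staleness thresholds can be sketched as a pure classification plus a demote-only sweep. The 30s and 60s cut-offs and the strict "greater than" comparison follow the T3.6 notes; the row shape, the function names, and the rule that the sweep never promotes a row (promotion being left to heartbeat ingestion) are assumptions for illustration, not the PR's HeartbeatService.

```typescript
type Health = "online" | "degraded" | "offline";

const DEGRADED_AFTER_MS = 30_000;
const OFFLINE_AFTER_MS = 60_000;

// Hypothetical row shape standing in for runtime_node_presence.
interface PresenceRow {
  nodeId: string;
  lastHeartbeatMs: number;
  health: Health;
}

const SEVERITY: Record<Health, number> = { online: 0, degraded: 1, offline: 2 };

// Server-derived health from heartbeat age alone.
function classifyStaleness(lastHeartbeatMs: number, nowMs: number): Health {
  const ageMs = nowMs - lastHeartbeatMs;
  if (ageMs > OFFLINE_AFTER_MS) return "offline";
  if (ageMs > DEGRADED_AFTER_MS) return "degraded";
  return "online";
}

// Demote-only sweep: a row keeps its current health unless the derived one is worse.
function sweep(rows: PresenceRow[], nowMs: number): PresenceRow[] {
  return rows.map((row) => {
    const derived = classifyStaleness(row.lastHeartbeatMs, nowMs);
    return SEVERITY[derived] > SEVERITY[row.health] ? { ...row, health: derived } : row;
  });
}
```

Running such a sweep on a timer (the notes mention a 5s interval) is what lets a dead node be demoted even though it sends nothing.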

Review Notes

  • [T3.2 service-boundary verification] The wire layer lands last (router T3.8 @ L6; catch-arm + errorFormatter T3.4 @ L7), but T3.2's P9/P10 are written as wire behavior (typed CONFLICT 409) at L1. This is structurally forced — T3.2 cannot depend on the catch-arm without cycling (T3.4 → T3.3 → T3.2). T3.2 therefore verifies at the service boundary: AttachService.attach throws the typed exception on cross-session-active (Postgres 23505) / revoked re-attach, and reactivates the offline row on reconnect. The "→ 409 envelope" assertion is deferred to T3.4 (where the catch-arm lands). T3.2 MUST NOT create runtime-node-router.factory.ts (T3.8 owns it). T3.3 P2/P3 and T3.4 P4 are already service-level / end-to-end respectively, so this only applies to T3.2.
  • [Phase E Refs] Add ADR-018 §Decision #4 to the Refs trailer + squash message — it is the version-floor-enforcement authority this phase implements (ADR-017 is the reconciliation backdrop). Final Refs: ADR-017, ADR-018, Plan-003, Spec-003.
  • [T3.1 cross-plan test reconciliation — expected, not scope-creep] Registering migration v3 in the shared applyMigrations forces amendments to 5 test files beyond the new migration (blast radius empirically confirmed CONTAINED to control-plane). All 5 are folded into T3.1 as one atomic commit (Plan-002 Amendment 2 fix-in-place precedent). Why a Plan-003 task touches Plan-001/002 files: (a) migration-runner.test.ts / session-directory-service.test.ts — version-count + anchor-array bumps (new registered version). (b) 0002-session-invites.test.ts T7 — idempotency test brought fully to v1+v2+v3 so the "re-call is a no-op" assertion stays true (not a hollow 2→3 bump). (c) presence-register-service.test.ts (tests 1&2) + migration-shape.test.ts (test 3 only) — these I-002-3 guards broke only because they used applyMigrations as a v2-shortcut that now also applies v3; the fix restores their documented v1→v2 scoping (migration-shape.test.ts:27-29; presence test-2 comment) rather than mutating a Plan-002 invariant assertion from a Plan-003 task. The full-schema I-002-3 carve-out is homed in the Plan-003-owned runtime-node-upstream-anchors.test.ts (its header note (d) prescribes exactly this), where new assertion (4) pins that the only durable presence-named table is runtime_node_presence — runtime-node liveness, a distinct domain from the collaborative Yjs-Awareness presence I-002-3 governs (cites Spec-003 + ADR-017 + shared-postgres-schema.md). I-002-3's teeth are preserved at full-schema scope, in-lane. Each edited Plan-001/002 file carries an in-file amendment note naming this PR.
  • [T3.2 attach-conflict wire codes + T3.4/T3.8 envelope evolution] T3.2 authors two code+message-only wire error codes — runtimenode.attach_conflict (P9, transient) + runtimenode.attach_revoked (P10, terminal) — in contracts/src/error.ts + error-contracts.md (new §Runtime Node table, HTTP 409). No details shape by design: the registry-only convention (8 of the existing 409 codes are code+message), no AC needs structured details, and a conflicting-session-id detail would leak a session the caller may not access. The current SessionRouterAisError envelope (trpc.ts:49-53) types details as required (ResourceLimitExceededDetails) only because resource-limit is the sole projected error today. T3.4/T3.8 obligation (carry forward — do not lose): when wiring the runtime-node router catch-arm + reusing the shared t errorFormatter, (a) evolve the envelope to accept code+message-only errors (make details optional, or give the runtime-node router its own envelope), and (b) the formatter then matches 4 typed exceptions (resource-limit + version-floor + the 2 attach-conflict) — at/above the trpc.ts:27-29 + sessions/errors.ts:14-16 documented "3+ branches → AisWireException base class" refactor trigger, so do the base-class refactor at that point. T3.2 itself stays at the service boundary (throwables only; no formatter wiring).
  • [T3.3 version comparator → contracts] T3.3 authors compareEventEnvelopeVersion(a, b): -1 | 0 | 1 in contracts/src/event.ts, co-located with the EventEnvelopeVersion value type it orders — completing the type's API, not speculative generalization. Placement = dependency direction: the comparator is consumed in-PR by T3.3 (below-floor read-only verdict) and re-derived by T3.4 (below-floor write-refusal — there is no persisted read_only column, so the verdict is recomputed at write time from the attachment client_version + the session floor), and per ADR-018 §6/§7/§10 the daemon also orders EventEnvelopeVersion values for envelope negotiation + upcaster keying. Contracts is the only shared ancestor of both packages; a control-plane-local helper would force the daemon to depend upward into control-plane or re-implement the compare (the lexical "10" < "9" bug, twice). Hand-rolled, not the semver lib: the type is strictly 2-segment MAJOR.MINOR (EVENT_ENVELOPE_VERSION_PATTERN), so a numeric tuple compare is trivially correct while semver needs .coerce() padding, a new control-plane dependency, and irrelevant patch/prerelease/range semantics. Floor-bypass guard: the comparator takes brand-validated inputs (the brand is the proof of well-formedness); the only unbranded input is the DB floor (min_client_version), read through EventEnvelopeVersionSchema.parse(floor), never an as cast, so a malformed floor throws a loud data-integrity error instead of split(".").map(Number) yielding NaN (every NaN comparison is false → silent admit to read-write). A malformed-floor test pins that .parse throws. Tests: the comparator's unit tests co-locate in session-event.test.ts (where EventEnvelopeVersion is already tested), and attach-service.test.ts line 524's T3.2 placeholder (below-floor → readOnly=false) is deliberately flipped to readOnly=true (the T3.2 author staged it as a visible behavior change, not a silent regression).
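
The comparator and floor guard described in the last note can be sketched as below. This is a minimal stand-in, not the shipped compareEventEnvelopeVersion: it takes plain strings and validates them itself, whereas the note says the real comparator takes brand-validated inputs and only the DB floor goes through a schema parse. `VERSION_PATTERN`, `parseMajorMinor`, `compareVersions`, and `deriveReadOnly` are hypothetical names, and the regex is an assumed approximation of EVENT_ENVELOPE_VERSION_PATTERN.

```typescript
// Assumed approximation of the 2-segment MAJOR.MINOR pattern.
const VERSION_PATTERN = /^(0|[1-9]\d*)\.(0|[1-9]\d*)$/;

// Validate before converting, so a malformed value throws instead of becoming NaN.
function parseMajorMinor(raw: string): [number, number] {
  const match = VERSION_PATTERN.exec(raw);
  if (match === null) throw new Error(`malformed MAJOR.MINOR version: "${raw}"`);
  return [Number(match[1]), Number(match[2])];
}

// Numeric tuple compare: MAJOR first, then MINOR.
function compareVersions(a: string, b: string): -1 | 0 | 1 {
  const [aMajor, aMinor] = parseMajorMinor(a);
  const [bMajor, bMinor] = parseMajorMinor(b);
  if (aMajor !== bMajor) return aMajor < bMajor ? -1 : 1;
  if (aMinor !== bMinor) return aMinor < bMinor ? -1 : 1;
  return 0;
}

// NULL floor admits read/write unconditionally; below-floor admits read-only.
function deriveReadOnly(clientVersion: string, floor: string | null): boolean {
  if (floor === null) return false;
  return compareVersions(clientVersion, floor) < 0;
}
```

A plain string comparison would order "1.10" before "1.9"; the numeric compare orders it after, which is the lexical bug the note refers to. Because the verdict is a pure function of the client version and the floor, it can be recomputed at write time without a persisted read-only column.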

Refs: ADR-017, Plan-003, Spec-003
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Sawmonabo and others added 7 commits June 6, 2026 02:22
Narrow the capabilityupdate request `healthChanges.state` field from the 5-value
`NodeState` to the 2-value `RuntimeNodeHealthState` (online|degraded) — the same
self-reported-health enum `attach`/`heartbeat` already carry — so all three
daemon-self-report surfaces are consistent. The illegal `offline`/`revoked`/
`registering` self-report is now unconstructable at the schema boundary rather
than runtime-rejected: `offline` is server-derived liveness-death (the staleness
sweep, T3.6), `revoked` is an authority-issued trust decision (detach/admin,
T3.7), and `registering -> online` is daemon-declaration-driven (T3.9) — none is
daemon-self-reportable (I-003-2 least-privilege).

Swap the schema member NodeStateSchema -> RuntimeNodeHealthStateSchema; the
single-T input-inference cast stays (mechanism now identical to the heartbeat
request schema). Invert the shipped 5-value conformance test to reject, reconcile
every comment that called healthChanges.state the 5-value NodeState, and flip the
api-payload-contracts.md wire-shape mirror in lockstep so the doc never leads the
code. The response `state: NodeState` and the response schema are unchanged — the
request-narrow/response-broad asymmetry is intentional (daemon asserts narrow,
server reports broad). attach/heartbeat untouched.
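The request-narrow/response-broad asymmetry can be illustrated without the schema library. The sketch below is an assumption-laden stand-in: it uses plain TypeScript unions and a hand-written guard instead of the real `NodeStateSchema` / `RuntimeNodeHealthStateSchema`, and `parseHealthChange` is a hypothetical helper name.

```typescript
// The five states the server may report (values taken from the commit text).
const NODE_STATES = ["registering", "online", "degraded", "offline", "revoked"] as const;
type NodeState = (typeof NODE_STATES)[number];

// The two states a daemon is allowed to self-report.
const RUNTIME_NODE_HEALTH_STATES = ["online", "degraded"] as const;
type RuntimeNodeHealthState = (typeof RUNTIME_NODE_HEALTH_STATES)[number];

// Request side: narrow. Response side: broad.
interface HealthChangeRequest {
  state: RuntimeNodeHealthState;
}
interface HealthChangeResponse {
  state: NodeState;
}

function isRuntimeNodeHealthState(value: unknown): value is RuntimeNodeHealthState {
  return (
    typeof value === "string" &&
    (RUNTIME_NODE_HEALTH_STATES as readonly string[]).indexOf(value) !== -1
  );
}

// Hypothetical boundary parser: anything outside the 2-value enum is
// rejected before it can become a HealthChangeRequest.
function parseHealthChange(input: { state: unknown }): HealthChangeRequest {
  if (!isRuntimeNodeHealthState(input.state)) {
    throw new Error(`state is not daemon-self-reportable: ${String(input.state)}`);
  }
  return { state: input.state };
}

// The server can still answer with any of the five states.
const exampleResponse: HealthChangeResponse = { state: "offline" };
```

At the type level, `const bad: HealthChangeRequest = { state: "offline" }` does not compile, which is the "unconstructable" property the commit describes; the runtime guard covers untyped wire input.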

Refs: Spec-003, Plan-003, ADR-014, ADR-018

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CREATE runtime_node_attachments + runtime_node_presence (Plan-003-owned)
as control-plane Postgres migration version 3, reproduced verbatim from
shared-postgres-schema.md (state CHECK, composite (node_id, session_id)
UNIQUE, partial-active UNIQUE for I-003-5 single-active-session, presence
PK). Register v3 in the migration runner's MIGRATIONS array after v2.

Co-located 0003-runtime-nodes.test.ts pins column set, both CHECK enums,
both unique indexes, presence PK, FK enforcement, and runner idempotency.
Forced cross-plan amendment: reconcile 5 applyMigrations-dependent tests
(runtime-node-upstream-anchors, migration-shape, 0002-session-invites,
presence-register-service, session-directory-service) for migration v3.

Refs: Plan-003 Phase 3 T3.1, Spec-003 line 91
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
AttachService (Querier-injected) admits a runtime node into a session via an
atomic upsert. P1: a NULL min_client_version floor admits every daemon version
with readOnly=false, landing the row at registering. P9: a cross-session
second-active attach trips the idx_node_attachments_active partial-unique
(23505, from either the INSERT or the offline-reactivating DO UPDATE) and is
refused with the typed RuntimeNodeAttachConflictException (I-003-5). P10: a
re-attach against a terminal revoked row updates zero rows and is refused with
RuntimeNodeAttachRevokedException; an offline row is reactivated instead.

Adds two registry-only wire codes (runtimenode.attach_conflict,
runtimenode.attach_revoked) to @ai-sidekicks/contracts and error-contracts.md
(HTTP 409, code+message only, no details, avoiding cross-session info-leak).
The tRPC catch-arm + errorFormatter that project these to the 409 envelope are
deferred to T3.4/T3.8 (the service throws at the service boundary).
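One possible shape for the deferred projection is sketched below, assuming the "AisWireException base class" refactor mentioned in the PR description goes ahead. Only the two wire codes, the 409 status, and the exception class names come from this PR; the base-class members, `WireErrorEnvelope`, and `toWireEnvelope` are hypothetical, and the real tRPC `errorFormatter` wiring is not shown.

```typescript
// Hypothetical base class; the PR names it only as a planned refactor.
abstract class AisWireException extends Error {
  abstract readonly code: string;
  abstract readonly httpStatus: number;

  constructor(message: string) {
    super(message);
    // Keeps instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = new.target.name;
  }
}

class RuntimeNodeAttachConflictException extends AisWireException {
  readonly code = "runtimenode.attach_conflict";
  readonly httpStatus = 409;
}

class RuntimeNodeAttachRevokedException extends AisWireException {
  readonly code = "runtimenode.attach_revoked";
  readonly httpStatus = 409;
}

// Assumed envelope shape: details is optional so code+message-only errors fit.
interface WireErrorEnvelope {
  code: string;
  message: string;
  httpStatus: number;
  details?: Record<string, unknown>;
}

// One instanceof branch replaces a per-exception chain. Non-wire errors
// return null so the caller can fall back to its default handling.
function toWireEnvelope(error: unknown): WireErrorEnvelope | null {
  if (!(error instanceof AisWireException)) return null;
  return { code: error.code, message: error.message, httpStatus: error.httpStatus };
}
```

Because the envelope carries no `details` for these two codes, nothing about the conflicting session can leak through the error path.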

Preserves I-003-3 (attach never mutates session_memberships, asserted by both
byte-identity and row-count) and I-003-5 (single active attachment, enforced by
the partial-unique index, not application TOCTOU).
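The attach outcomes (P1, P9, P10, and offline reactivation) can be modelled in memory to make the index semantics concrete. This is an illustrative sketch only: the real enforcement is the Postgres partial-unique index, not application code, and `InMemoryAttachments`, the reduced state set, and the result-union return type (instead of typed exceptions) are all assumptions made for the example.

```typescript
// Reduced state set for the sketch; the real CHECK enum is not reproduced here.
type AttachmentState = "registering" | "offline" | "revoked";

type AttachResult =
  | { ok: true; state: "registering" }
  | { ok: false; reason: "conflict" | "revoked" };

class InMemoryAttachments {
  // nodeId -> sessionId -> state. The nesting models the composite
  // (node_id, session_id) UNIQUE: one row per pair.
  private readonly byNode = new Map<string, Map<string, AttachmentState>>();

  attach(nodeId: string, sessionId: string): AttachResult {
    const sessions = this.byNode.get(nodeId) ?? new Map<string, AttachmentState>();

    // Revoked is terminal: the upsert would update zero rows.
    if (sessions.get(sessionId) === "revoked") {
      return { ok: false, reason: "revoked" };
    }

    // Models the partial-unique index: a node may hold at most one active
    // row, so an active row in any other session is a conflict. This also
    // covers reactivating an offline row while another session is active.
    const activeElsewhere = Array.from(sessions.entries()).some(
      ([otherSession, state]) => otherSession !== sessionId && state === "registering",
    );
    if (activeElsewhere) {
      return { ok: false, reason: "conflict" };
    }

    sessions.set(sessionId, "registering");
    this.byNode.set(nodeId, sessions);
    return { ok: true, state: "registering" };
  }

  // Test helper standing in for detach/sweep/admin transitions.
  mark(nodeId: string, sessionId: string, state: "offline" | "revoked"): void {
    this.byNode.get(nodeId)?.set(sessionId, state);
  }
}
```

Two different nodes attaching to the same session both succeed in this model, since the uniqueness is per node, not per session.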

Refs: Plan-003, Spec-003
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
HeartbeatService (Querier-injected, mirroring AttachService) owns the
runtime-node liveness axis on runtime_node_presence, distinct from the
attachment-slot axis that attach/detach own.

ingest upserts the presence row on the server clock with the daemon's 2-value
self-report (online|degraded). A sweep-demoted node that resumes heartbeating is
restored to online without passing through offline (P6 hysteresis recovery).

sweepStaleness is the server-derived demotion: one idempotent, transition-only
UPDATE...RETURNING that drives rows stale past 30s to degraded and past 60s to
offline (Spec-003 lines 59-61). It writes ONLY the coordination record and emits
no durable runtime_node.* event; those are V1.1-gated (ADR-017).
STALENESS_SWEEP_INTERVAL_MS is 5s, finer than the 15s cadence, bounding
detection lag to one sweep interval. The periodic scheduler and tRPC router are
deferred to T3.8.
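The sweep's transition-only behavior can be sketched as a pure function over presence rows. This is a simplified model, not the shipped SQL: the thresholds and the 5s interval come from the commit text, while `PresenceRow`, `SEVERITY`, `staleTarget`, and `worstCaseDetectionMs` are hypothetical names, and treating "past 30s" as a strict `>` is an assumption.

```typescript
const DEGRADED_AFTER_MS = 30_000;
const OFFLINE_AFTER_MS = 60_000;
const STALENESS_SWEEP_INTERVAL_MS = 5_000;

type PresenceState = "online" | "degraded" | "offline";

interface PresenceRow {
  nodeId: string;
  state: PresenceState;
  lastHeartbeatAtMs: number;
}

// Ordering used to make the sweep demote-only.
const SEVERITY: Record<PresenceState, number> = { online: 0, degraded: 1, offline: 2 };

function staleTarget(ageMs: number): PresenceState {
  if (ageMs > OFFLINE_AFTER_MS) return "offline";
  if (ageMs > DEGRADED_AFTER_MS) return "degraded";
  return "online";
}

// Mutates only rows that need a demotion and returns them, mirroring
// UPDATE ... RETURNING. Running it twice at the same instant returns [].
function sweepStaleness(rows: PresenceRow[], nowMs: number): PresenceRow[] {
  const transitioned: PresenceRow[] = [];
  for (const row of rows) {
    const target = staleTarget(nowMs - row.lastHeartbeatAtMs);
    if (SEVERITY[target] > SEVERITY[row.state]) {
      row.state = target;
      transitioned.push(row);
    }
  }
  return transitioned;
}

// Heartbeat ingestion: the server clock stamps the row and the daemon's
// self-report replaces whatever the sweep had set.
function ingest(row: PresenceRow, selfReport: "online" | "degraded", nowMs: number): void {
  row.state = selfReport;
  row.lastHeartbeatAtMs = nowMs;
}

// Upper bound on when a crossing is observed: threshold plus one sweep interval.
function worstCaseDetectionMs(thresholdMs: number): number {
  return thresholdMs + STALENESS_SWEEP_INTERVAL_MS;
}
```

Because the sweep never promotes, recovery happens only through `ingest`, which is how a sweep-demoted node returns to online on its next heartbeat.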

14 tests cover ingest create/update, offline self-report rejection, the
degraded/offline demotions including the degraded->offline progression,
hysteresis and offline->online recovery, multi-node multiplicity, sweep
idempotency, and the presence-only write boundary.

Refs: ADR-017, Plan-003, Spec-003
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wire AttachService.#deriveReadOnly to compare the daemon's clientVersion
against the session's min_client_version floor. A daemon at or above the
floor is admitted read-write (P2); a daemon below the floor is admitted
read-only (P3) — it remains joined and reads succeed, never ejected
(I-003-1 / ADR-018 Decision #4). The VERSION_FLOOR_EXCEEDED write refusal
on a read-only daemon's subsequent write is T3.4's.

Author compareEventEnvelopeVersion in packages/contracts/src/event.ts: a
hand-rolled numeric MAJOR.MINOR tuple compare (not semver), the canonical
total ordering of the EventEnvelopeVersion value type. It lives in
contracts because both the control-plane floor gate and the daemon's
envelope negotiation (ADR-018 Decision #1) compare these values, and
contracts is their only shared ancestor — a consumer-local helper would
re-introduce the lexical "10" < "9" ordering bug. The raw DB floor is
parsed+branded at the service boundary (never an as-cast), so a malformed
floor throws at parse time instead of reaching the comparator as NaN.
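A minimal sketch of the comparator and the read-only derivation it feeds, under stated assumptions: the brand type, the regex, and `brandVersion` are hypothetical stand-ins for the real `EventEnvelopeVersion` brand and `EventEnvelopeVersionSchema.parse`, and `deriveReadOnly` is written here as a free function rather than the private `AttachService.#deriveReadOnly` method.

```typescript
// Assumed brand shape; the real EventEnvelopeVersion brand lives in contracts.
type EventEnvelopeVersion = string & { readonly __brand: "EventEnvelopeVersion" };

// Assumed strict MAJOR.MINOR pattern.
const MAJOR_MINOR = /^(0|[1-9]\d*)\.(0|[1-9]\d*)$/;

// Hypothetical parse-and-brand step: the only way to obtain the brand.
function brandVersion(raw: string): EventEnvelopeVersion {
  if (!MAJOR_MINOR.test(raw)) {
    throw new Error(`malformed EventEnvelopeVersion: ${JSON.stringify(raw)}`);
  }
  return raw as EventEnvelopeVersion;
}

function toTuple(version: EventEnvelopeVersion): [number, number] {
  // Defaults are unreachable for branded input; they only satisfy the types.
  const [major = 0, minor = 0] = version.split(".").map(Number);
  return [major, minor];
}

// Numeric tuple compare: MAJOR first, then MINOR.
function compareEventEnvelopeVersion(
  a: EventEnvelopeVersion,
  b: EventEnvelopeVersion,
): -1 | 0 | 1 {
  const [aMajor, aMinor] = toTuple(a);
  const [bMajor, bMinor] = toTuple(b);
  if (aMajor !== bMajor) return aMajor < bMajor ? -1 : 1;
  if (aMinor !== bMinor) return aMinor < bMinor ? -1 : 1;
  return 0;
}

// NULL floor admits read-write unconditionally; below floor is read-only.
// The raw DB floor is parsed here, at the boundary, never cast.
function deriveReadOnly(clientVersion: EventEnvelopeVersion, rawFloor: string | null): boolean {
  if (rawFloor === null) return false;
  return compareEventEnvelopeVersion(clientVersion, brandVersion(rawFloor)) < 0;
}
```

The lexical bug the commit mentions is visible with plain strings: `"10.0" < "9.0"` is `true` under string comparison, whereas the tuple compare orders `10.0` above `9.0`.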

Refs: ADR-018, Plan-003, Spec-003
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a P5 characterization block to the AttachService suite: two distinct
runtime nodes attach to one session as co-active rows, the sessions row
stays byte-for-byte unchanged, and no new session is created — multi-node
coexistence without changing session identity. The shipped T3.2 path
already satisfies P5 (the (node_id, session_id) conflict arbiter + the
per-node active index admit multi-node-per-session, and attach never
writes sessions), so this is test-only; no production change. Includes
the multi-node I-003-3 complement (session_memberships untouched).

Refs: Plan-003, Spec-003
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>