feat(runtime): AcpxRuntime supervisor registry with respawn + auto-resume (#273) #465
Merged
… contract (#273)

Delete the session map entry on close() instead of setting status="stopped", so listSessions() no longer returns closed sessions, matching the real AcpRuntime behavior and the AgentRuntime interface doc ("List all active sessions"). Updated the one collateral test in agent-runtime.test.ts that was asserting the old non-contract behavior; matrix now 12/0.
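A minimal sketch of the contract this commit describes, assuming a runtime that keeps its sessions in a `Map`. `SketchRuntime`, `Session`, and the id scheme are hypothetical stand-ins, not grove's actual classes; only `close()` and `listSessions()` are names taken from the commit message.

```typescript
// Hypothetical, simplified runtime used only to illustrate the contract.
type SessionStatus = "running" | "crashed";

interface Session {
  id: string;
  status: SessionStatus;
}

class SketchRuntime {
  private readonly sessions = new Map<string, Session>();
  private nextId = 1;

  spawn(): Session {
    const session: Session = { id: `s${this.nextId++}`, status: "running" };
    this.sessions.set(session.id, session);
    return session;
  }

  // Remove the entry outright instead of marking it "stopped",
  // so a closed session can never be returned by listSessions().
  close(id: string): void {
    this.sessions.delete(id);
  }

  // "List all active sessions": only entries still present in the map.
  listSessions(): Session[] {
    return [...this.sessions.values()];
  }
}

const runtime = new SketchRuntime();
const first = runtime.spawn();
const second = runtime.spawn();
runtime.close(first.id);
const remainingIds = runtime.listSessions().map((s) => s.id);
```

After `close(first.id)`, `remainingIds` contains only the id of `second`. A status-flag approach would have needed every caller of `listSessions()` to filter out stopped entries.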
Add `agentTaskId?: string` to McpDeps and thread GROVE_AGENT_TASK_ID from the stdio MCP server env into every grove_claim call so the supervisor can release the agent's leases when its AgentTask permanently dies.
Replace the onDead TODO no-op in createTaskControllerWiring with a real implementation: when a slot permanently dies, list all active claims and release those whose context.agentTaskId matches the dead slotId.
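The release step can be sketched as follows, assuming claims are held in an in-memory store. `SketchClaimStore`, `Claim`, and `releaseClaimsForDeadSlot` are hypothetical names for illustration; the real wiring lives in `createTaskControllerWiring` and its store API may differ.

```typescript
interface Claim {
  claimId: string;
  // agentTaskId is optional: claims made outside an AgentTask carry none.
  context: { agentTaskId?: string };
}

// Hypothetical in-memory store standing in for wherever active claims live.
class SketchClaimStore {
  private readonly active = new Map<string, Claim>();

  add(claim: Claim): void {
    this.active.set(claim.claimId, claim);
  }

  // Returns a copy, so callers may release claims while iterating.
  listActive(): Claim[] {
    return [...this.active.values()];
  }

  release(claimId: string): boolean {
    return this.active.delete(claimId);
  }
}

// Release only the claims stamped with the dead slot's id; leave the rest alone.
function releaseClaimsForDeadSlot(store: SketchClaimStore, slotId: string): string[] {
  const released: string[] = [];
  for (const claim of store.listActive()) {
    if (claim.context.agentTaskId === slotId) {
      store.release(claim.claimId);
      released.push(claim.claimId);
    }
  }
  return released;
}

const store = new SketchClaimStore();
store.add({ claimId: "c1", context: { agentTaskId: "slot-a" } });
store.add({ claimId: "c2", context: { agentTaskId: "slot-b" } });
store.add({ claimId: "c3", context: {} });
const releasedIds = releaseClaimsForDeadSlot(store, "slot-a");
const survivingIds = store.listActive().map((c) => c.claimId);
```

Matching on `context.agentTaskId` means unstamped claims and claims belonging to other slots are never touched when one slot dies.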
…ad lease release (#273)
…ors blocking pre-push typecheck

- AgentDisconnectedError: split readonly parameter property into separate field decl + constructor assignment (erasableSyntaxOnly compat)
- task-controller-wiring.test.ts: add missing putAgentTaskSpec/listAgentTasks to fakeTaskStore; fix onRespawnCalls type to AcpxRespawnEvent (not Parameters<> nesting); import AcpxRespawnEvent; fix biome import order + empty block warnings
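For context on the first fix: TypeScript's `erasableSyntaxOnly` option rejects constructs that require emitted runtime code, and parameter properties are one of them. The sketch below shows the general shape of such a split; the `sessionId` field and the error message are hypothetical, since the commit does not show the class's real members.

```typescript
// Before (rejected under erasableSyntaxOnly, because the parameter property
// makes the compiler emit a `this.sessionId = sessionId` assignment):
//
//   class AgentDisconnectedError extends Error {
//     constructor(readonly sessionId: string) {
//       super(`agent disconnected: ${sessionId}`);
//     }
//   }

// After: an explicit field declaration plus a constructor assignment.
class AgentDisconnectedError extends Error {
  readonly sessionId: string; // hypothetical field, for illustration only

  constructor(sessionId: string) {
    super(`agent disconnected: ${sessionId}`);
    this.name = "AgentDisconnectedError";
    this.sessionId = sessionId;
  }
}

const disconnected = new AgentDisconnectedError("s1");
```

The two forms behave the same at runtime; the second simply spells out what the parameter property used to generate, so the type annotations can be stripped without any code transformation.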
Closes #273.
Builds `AcpxSupervisor`, a grove-owned registry of acpx runtime handles with death detection, respawn, best-effort auto-resume, and `SessionLost` surfacing on `AgentTask`, adopted behind a flag. Also delivers the #210 runtime-adapter conformance matrix.

## Phases (each TDD'd, two-stage reviewed)

- `AgentDisconnectedError` + optional `onDisconnect` on `AgentRuntime`; `AcpRuntime` detects an unexpected child exit (new `onExit` on `LaunchResult`), marks the session `crashed`, and rejects the in-flight `send()` with `stopReason: "error"`. Intentional `close()` never fires it.
- `AcpxSupervisor` (ensure/get/stop/list, idempotent + single-flight, `AgentRuntime` façade). New shared `runRuntimeAdapterMatrix(label, factory)` runs Mock + Acp + Supervisor through one conformance suite. (Caught + fixed a real `MockRuntime.close()` contract bug along the way.)
- `running → resuming → running`, fresh session within the shared runtime (new `wireSessionId`, never the dead one; see #319, "bug(acp): parser rejects all acpx session/update frames as _sessionMismatch"), exponential backoff + `maxRespawns → dead`. Monotonic per-slot `seq` survives the respawn boundary (no reset).
- `SessionLost` on `AgentTask`. New `Resuming`/`SessionLost` condition types; a thin wiring layer (`src/server/acpx-supervisor-wiring.ts`) translates respawn events into task conditions: a transient blip stays `Running`, a permanent death goes `Failed`.
- `selectRuntime` wraps in the supervisor behind `GROVE_SUPERVISOR=1`; the server activates the respawn→task wiring. Claims carry `context.agentTaskId` (stamped via MCP from `GROVE_AGENT_TASK_ID`) so `onDead` releases exactly that task's leases instead of stranding them.

## Key design decisions (the issue is ahead of current code in several ways)

- `AcpRuntime`, not `AcpxRuntime`; the registry holds it.
- `AcpRuntime` for all slots, not one per slot. grove's `AcpRuntime` already spawns one adapter subprocess per `spawn()`, so process-per-slot isolation (the issue's intent) is preserved while a shared client gives a single monotonic id counter and one event sink to demux. Recorded in the design doc.
- `session/load` (upstream-unsupported; consistent with grove-direct-acp). `SessionLost` is always surfaced; `seq`/`acpxRecordId` are forward-compatible if `session/load` ever lands.
- `seq` lives at the eventSink, not `publishTurnToNexus` (which has no production caller today).

Design + plan:
`docs/superpowers/specs/2026-05-29-acpx-supervisor-design.md`, `docs/superpowers/plans/2026-05-29-acpx-supervisor.md`.

## Verification

- `bun test` exit is a pre-existing coverage threshold on `use-text-input.ts` (exists on `main`), unrelated to this branch.
- `bun run build` green (after `bun install` for the ask-user SDK); typecheck shows only the pre-existing `packages/ask-user` `@anthropic-ai/sdk` errors.

## Deferred / called out honestly

- `tests/e2e/acpx-supervisor-respawn-tmux.ts` + runbook). It needs a live grove+Nexus stack; two spots are marked `TODO(verify-on-stack)` (acpx child-PID discovery, AgentTask PUT schema/readiness). Do not treat the respawn path as E2E-validated until it runs green on a stack.
- `seq` is not wire-observable today: it lives on the in-process eventSink (→ `AcpSessionStore`, TUI-local), so the E2E asserts AgentTask phase + `SessionLost` condition + sessionId-change instead. Seq continuity is covered by the unit test `acpx-supervisor.respawn.test.ts`. Exposing seq over HTTP/SSE would be a separate follow-up.

🤖 Generated with Claude Code
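
For readers following the respawn phase described above (`running → resuming → running`, exponential backoff, `maxRespawns → dead`, `seq` surviving the boundary), here is a rough sketch of that policy as a pure state machine. Everything in it is an assumption for illustration: the `Slot` shape, the function names, the base delay, the limit of 3, and the choice to count respawns cumulatively are not taken from the PR's code.

```typescript
type SlotState = "running" | "resuming" | "dead";

interface Slot {
  state: SlotState;
  respawns: number; // respawns performed so far (assumed cumulative)
  seq: number; // monotonic per slot; never reset on respawn
  wireSessionId: string;
}

// Assumed policy parameters; the PR does not state the real values.
const BASE_DELAY_MS = 100;
const MAX_RESPAWNS = 3;

// Exponential backoff: 100 ms, 200 ms, 400 ms, ...
function backoffDelayMs(attempt: number): number {
  return BASE_DELAY_MS * 2 ** attempt;
}

// Called when the child exits unexpectedly: either schedule a respawn
// or give up permanently once the respawn budget is spent.
function onChildExit(slot: Slot): { next: Slot; delayMs: number | null } {
  if (slot.respawns >= MAX_RESPAWNS) {
    return { next: { ...slot, state: "dead" }, delayMs: null };
  }
  return {
    next: { ...slot, state: "resuming" },
    delayMs: backoffDelayMs(slot.respawns),
  };
}

// Called once a fresh session exists: new wireSessionId, same seq counter.
function onRespawned(slot: Slot, freshSessionId: string): Slot {
  return {
    ...slot,
    state: "running",
    respawns: slot.respawns + 1,
    wireSessionId: freshSessionId,
  };
}

// Stamp the next event; the counter carries across respawns.
function nextSeq(slot: Slot): number {
  slot.seq += 1;
  return slot.seq;
}

let slot: Slot = { state: "running", respawns: 0, seq: 0, wireSessionId: "w1" };
nextSeq(slot);
nextSeq(slot);
const firstExit = onChildExit(slot);
slot = onRespawned(firstExit.next, "w2");
const seqAfterRespawn = nextSeq(slot);
```

In this sketch the first unexpected exit schedules a respawn after 100 ms, the slot comes back with `wireSessionId` `"w2"`, and the next event is stamped with `seq` 3 rather than restarting at 1.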