Merged
22 commits
976b649
docs(runtime): AcpxSupervisor registry design (#273)
windoliver May 29, 2026
4cebea1
docs(runtime): AcpxSupervisor implementation plan + OQ2 resolution (#…
windoliver May 29, 2026
0e4b04b
feat(runtime): AgentDisconnectedError + onDisconnect hook (#273)
windoliver May 29, 2026
8f44bf4
feat(runtime): detect unexpected acpx exit, reject in-flight send (#273)
windoliver May 29, 2026
6ce35d0
test(runtime): cover in-flight send() rejection on mid-turn disconnec…
windoliver May 29, 2026
ea060a5
test(runtime): extract in-process ACP launch helper (#273)
windoliver May 29, 2026
9c0e504
test(runtime): shared runtime adapter conformance matrix (#210, #273)
windoliver May 29, 2026
17fcf80
fix(runtime): MockRuntime.close removes session to match AgentRuntime…
windoliver May 29, 2026
0f30e63
feat(runtime): AcpxSupervisor registry core + matrix row (#273)
windoliver May 29, 2026
4c01efc
fix(runtime): supervisor listSessionEntities dedupe + split record co…
windoliver May 29, 2026
76a5dc7
docs(runtime): record shared-AcpRuntime decision + correct Phase 3 re…
windoliver May 29, 2026
46ba33b
feat(runtime): supervisor respawn + auto-resume + backoff (#273)
windoliver May 29, 2026
4280e42
test(runtime): seq continuity across turns + respawn (#273)
windoliver May 29, 2026
96ac065
feat(core): Resuming + SessionLost AgentTask conditions (#273)
windoliver May 29, 2026
577ae16
refactor(core): export upsertCondition for reuse (#273)
windoliver May 29, 2026
4ac2753
feat(server): wire supervisor respawn events to AgentTask conditions …
windoliver May 29, 2026
05116cb
feat(runtime): adopt AcpxSupervisor behind GROVE_SUPERVISOR flag (#273)
windoliver May 29, 2026
000a0d6
feat(server): activate supervisor respawn->AgentTask wiring + lease r…
windoliver May 29, 2026
4f4654e
feat(mcp): stamp agentTaskId into claim context for task linkage (#273)
windoliver May 30, 2026
ee4de41
feat(server): release task's leases on permanent agent death (#273)
windoliver May 30, 2026
e2e94e8
test(e2e): supervisor kill-PID respawn harness + runbook; harden onDe…
windoliver May 30, 2026
e3816b7
fix(types): resolve erasableSyntaxOnly + AgentTaskStore interface err…
windoliver May 30, 2026
84 changes: 84 additions & 0 deletions docs/superpowers/plans/2026-05-29-acpx-supervisor-e2e-runbook.md
@@ -0,0 +1,84 @@
# AcpxSupervisor respawn E2E — Runbook (#273, Phase 5.3)

> **Status: the E2E script `tests/e2e/acpx-supervisor-respawn-tmux.ts` is authored but NOT-YET-RUN.**
> It was written without executing against a live grove+Nexus stack. Treat its
> assertions as *intended*, not *verified*, until a real run goes green. Do not
> claim the respawn path is end-to-end validated on the strength of the script alone.

## What it proves (when run green)

A real acpx agent, spawned through `AcpxSupervisor` (`GROVE_SUPERVISOR=1`), is
`kill -9`'d mid-flight and the supervisor:

1. detects the death (Phase 1 `onDisconnect`),
2. respawns a fresh session within the shared runtime (Phase 3),
3. surfaces it on the bound `AgentTask` as a `SessionLost=True` condition while
keeping the task `Running` — not `Failed` (Phase 4 wiring),
4. binds a new `status.sessionId` (respawn, not the dead session).

All four are observable over `GET /api/agent-tasks/:id`.
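The four checks can be written as a pure function over two `GET /api/agent-tasks/:id` responses: one captured before the kill and one after. This is a minimal sketch, not the script's actual assertions. The `AgentTaskView` shape is an assumption. The runbook names `phase`, `status.sessionId` and a `SessionLost=True` condition, but the exact JSON layout (including whether conditions use a Kubernetes-style `{ type, status }` pair) should be checked against a real response.

```typescript
// Hypothetical response shape; verify against the real GET /api/agent-tasks/:id body.
interface Condition {
  type: string;
  status: "True" | "False" | "Unknown";
}

interface AgentTaskView {
  status: {
    phase: string;
    sessionId?: string;
    conditions?: Condition[];
  };
}

// Returns human-readable failures; an empty array means every check passed.
function checkRespawnObserved(before: AgentTaskView, after: AgentTaskView): string[] {
  const failures: string[] = [];
  const sessionLost = (after.status.conditions ?? []).find((c) => c.type === "SessionLost");

  // (1)+(3): the death was detected and surfaced as SessionLost=True ...
  if (sessionLost?.status !== "True") {
    failures.push(`expected SessionLost=True, got ${sessionLost?.status ?? "absent"}`);
  }
  // (3): ... while the task stays Running rather than Failed.
  if (after.status.phase !== "Running") {
    failures.push(`expected phase Running, got ${after.status.phase}`);
  }
  // (2)+(4): a fresh session is bound, distinct from the one that was killed.
  if (!after.status.sessionId) {
    failures.push("expected a sessionId after respawn");
  } else if (after.status.sessionId === before.status.sessionId) {
    failures.push("sessionId unchanged: no respawn observed");
  }
  return failures;
}
```

Returning a list instead of throwing on the first mismatch lets a single poll report every unmet condition. That is useful on a first real run, when several things may be off at once.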

## What it deliberately does NOT assert — and why

**Monotonic `seq` across the kill boundary is not wire-observable.** The
supervisor stamps `seq` on the in-process `AcpRuntimeEvent` sink
(`src/core/acpx-supervisor.ts` → `routeEvent`), which feeds `AcpSessionStore`
(TUI-local) only. It is **not** exposed on any HTTP/SSE endpoint today, so a
black-box E2E cannot see it. Seq continuity is instead covered by the unit test:

```
src/core/acpx-supervisor.respawn.test.ts
→ "seq is strictly increasing across turns and respawn (no reset)"
```

If wire-level seq observability is ever wanted, that's a separate follow-up
(expose `seq` on the agent-event/watch stream), out of scope for #273.
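Here is a sketch of the property that unit test pins down. It assumes, hypothetically, that the supervisor keeps one counter per logical session and that a respawn swaps only the backing acpx session, never the counter. `SeqStamper`, `stamp` and `rebind` are illustrative names, not the actual `routeEvent` code.

```typescript
interface StampedEvent {
  seq: number;
  sessionId: string;
  kind: string;
}

// Hypothetical: one stamper per supervised (logical) session, owned by the
// supervisor rather than by any single acpx session.
class SeqStamper {
  private nextSeq = 1;
  private sessionId: string;

  constructor(sessionId: string) {
    this.sessionId = sessionId;
  }

  // Stamps an event routed from whichever acpx session is currently bound.
  stamp(kind: string): StampedEvent {
    return { seq: this.nextSeq++, sessionId: this.sessionId, kind };
  }

  // On respawn only the backing session changes; nextSeq is deliberately kept.
  rebind(newSessionId: string): void {
    this.sessionId = newSessionId;
  }
}

// The invariant: every seq is greater than the one before it, with no reset.
function isStrictlyIncreasing(events: readonly StampedEvent[]): boolean {
  let prev = -Infinity;
  for (const e of events) {
    if (e.seq <= prev) return false;
    prev = e.seq;
  }
  return true;
}
```

Resetting the counter inside `rebind` is the regression such a test guards against. Any consumer ordering events by `seq` would then see the sequence restart after the kill.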

## Prerequisites

- Docker running with a healthy Nexus stack (when this runbook was written,
`docker ps` showed several `nexus-*-nexus-1 (healthy)` containers), OR a local
grove server the script can reach on `SERVER_PORT`.
- A real ACP adapter installed for the chosen `runtime` (default `codex` →
`@zed-industries/codex-acp`). Without it, spawn fails before respawn is testable.
- `tmux` available.
- Valid agent auth so the spawned agent can reach Nexus. Per project memory on
per-worktree key isolation, run from a fresh directory; the script creates a
fresh temp workdir and runs `git init` in it on every run.
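The Docker prerequisite can be checked mechanically before a run. This minimal sketch assumes you feed it the output of `docker ps --format '{{.Names}} {{.Status}}'`, where the Status column reads like `Up 3 hours (healthy)`. The `^nexus-.+-nexus-1$` name pattern mirrors the container names above and is an assumption; adjust it for your stack. `healthyNexusContainers` is a hypothetical helper, not part of the script.

```typescript
// Matches container names like "nexus-foo-nexus-1" (pattern assumed from the runbook).
const NEXUS_CONTAINER = /^nexus-.+-nexus-1$/;

// Parses `docker ps --format '{{.Names}} {{.Status}}'` output and returns the
// names of Nexus containers whose health check currently reports "(healthy)".
function healthyNexusContainers(dockerPsOutput: string): string[] {
  return dockerPsOutput
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .flatMap((line) => {
      // Container names contain no spaces, so the first space ends the name.
      const space = line.indexOf(" ");
      if (space === -1) return [];
      const name = line.slice(0, space);
      const status = line.slice(space + 1);
      return NEXUS_CONTAINER.test(name) && status.includes("(healthy)") ? [name] : [];
    });
}
```

An empty result is a reason to stop before spawning anything. Otherwise a Nexus outage would surface later as a confusing spawn or bind failure rather than a respawn failure.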

## Run

```bash
GROVE_SUPERVISOR=1 bun run tests/e2e/acpx-supervisor-respawn-tmux.ts
# flags: --keep (leave tmux+workdir), --attach (print attach cmd & wait), --timeout <ms>
```

## Two spots that will likely need a tweak on first real run

Both are marked `TODO(verify-on-stack)` in the script:

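1. **acpx child-PID discovery** (`findAcpxChildPid`). The script `pgrep`s for
`codex-acp|claude-agent-acp|gemini .*--acp|acp` and kills the highest PID.
When matched against full command lines, the bare trailing `acp` alternative
matches any argv containing `acp` (including this script's own
`acpx-supervisor-respawn-tmux.ts` invocation), so it subsumes the other
alternatives and can hit unrelated processes. On a real box, list the matches
once after the agent binds (`pgrep -fl` on macOS/BSD, `pgrep -af` on Linux),
confirm the exact adapter argv, and pin the pattern (ideally also scoping it to
descendants of the grove server PID so a sibling test's agent isn't killed).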

2. **AgentTask PUT schema + readiness**. The script PUTs
`{ id, worktree, runtime, role, prompt, dependsOn, generation, createdAt }`.
Confirm against `src/server/routes/agent-tasks.ts` (`putAgentTaskSpec`) and
the `AgentTaskSpecRecord` shape; adjust fields if the server rejects the body.
Also confirm the bind actually produces `phase: "Running"` + a `sessionId`
(the controller must be enabled — the script sets `GROVE_TASK_CONTROLLER=1`).
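For the first spot, the descendant scoping can be sketched as a pure function over a process table. A table such as the output of `ps -A -o pid=,ppid=,args=` should work with both procps and BSD `ps`. `ProcEntry`, `parsePs` and `findAdapterPid` are hypothetical names, and the `/codex-acp/` pattern in the usage below is only illustrative; pin it to the argv you actually observe.

```typescript
interface ProcEntry {
  pid: number;
  ppid: number;
  args: string;
}

// Parses `ps -A -o pid=,ppid=,args=` output: "<pid> <ppid> <full argv>" per line.
function parsePs(output: string): ProcEntry[] {
  const entries: ProcEntry[] = [];
  for (const line of output.split("\n")) {
    const m = /^\s*(\d+)\s+(\d+)\s+(.*)$/.exec(line);
    if (m) entries.push({ pid: Number(m[1]), ppid: Number(m[2]), args: m[3] ?? "" });
  }
  return entries;
}

// Returns the highest PID among descendants of rootPid whose argv matches
// `pattern`, or undefined. Walking only rootPid's subtree means an adapter
// spawned under a sibling test's server is never a candidate.
// Pass a pattern without the `g` flag, since RegExp.test is stateful with it.
function findAdapterPid(procs: ProcEntry[], rootPid: number, pattern: RegExp): number | undefined {
  const childrenOf = new Map<number, ProcEntry[]>();
  for (const p of procs) {
    const list = childrenOf.get(p.ppid) ?? [];
    list.push(p);
    childrenOf.set(p.ppid, list);
  }

  let best: number | undefined;
  const seen = new Set<number>([rootPid]); // guards against pid == ppid cycles
  const stack = [...(childrenOf.get(rootPid) ?? [])];
  while (stack.length > 0) {
    const p = stack.pop()!;
    if (seen.has(p.pid)) continue;
    seen.add(p.pid);
    if (pattern.test(p.args) && (best === undefined || p.pid > best)) best = p.pid;
    stack.push(...(childrenOf.get(p.pid) ?? []));
  }
  return best;
}
```

Usage: `findAdapterPid(parsePs(psOutput), serverPid, /codex-acp/)`. If it returns `undefined`, failing loudly is safer than retrying with a broader pattern, because a wrong `kill -9` invalidates the run.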

## Validate against Nexus, not local SQLite

Per project memory (`feedback_e2e_use_nexus`): production reads/writes go to
Nexus. Point the server at the running Nexus stack (env/keys) rather than the
SQLite fallback, and confirm the `AgentTask` you poll is the Nexus-backed one.

## Done = green + observed

Per `feedback_no_workarounds` / `feedback_no_e2e_shortcuts`: the task is only
"E2E validated" once this script runs and prints its `PASS:` line on a real
stack, with the `SessionLost` condition + sessionId change actually observed.
Until then this is a prepared harness, not a passing test.