From e874ce1ebc4f35825714f7a39c07717e1e92dfc2 Mon Sep 17 00:00:00 2001
From: Kostadis <kostadis@gmail.com>
Date: Sat, 27 Jun 2026 15:20:24 -0700
Subject: [PATCH 1/3] Add CampaignGenerator constitution v1.1.0

Sister doctrine to the mneme constitution, governing CG's LLM rendering
pipeline. Nine principles, each naming the anti-pattern it kills:

I.    Disk is Truth, the Model is a Draft    (Optimistic Lies)
II.   The Human Checkpoint is Non-Negotiable (Error Compounding)
III.  Retrieval and Render are Separated     (Renderer Scope Decisions)
IV.   Verbatim is Sacred                     (Hallucinated Dialogue)
V.    One Seam per Boundary                  (Fragmented Integration)
VI.   CLI is the Engine, UI is a Face        (Split-Brain)
VII.  Extract Once, Synthesize Deliberately  (Depth Regression)
VIII. State is Discoverable                  (Opacity / Tribal State)
IX.   The UI Mechanizes; Claude Converses    (The Walled Garden)

Plus Architecture is Destiny (token/precision economics), Authority &
the Human Checkpoint (Spec Kit plans are drafts), and Governance
(I & II outrank all; semver amendments).
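
For illustration, a guard in the spirit of Principle III can walk each
function's AST and flag any body that calls both a retrieval name and a
render name. This is a hypothetical sketch, not the contents of
tests/test_retrieve_render_isolation.py. The names mixed_functions and
_called_names are illustrative. It matches bare call names only, so an
aliased import such as `from campaignlib import call_api as c` slips past.
Because ast.walk descends into nested functions, an enclosing function is
reported alongside a nested offender.

```python
import ast

RETRIEVE_CALLS = {"retrieve", "search_hierarchical", "rpg_search"}
RENDER_CALLS = {"stream_api", "call_api"}


def _called_names(func: ast.AST) -> set:
    """Collect call names in a node: foo() gives 'foo', mod.foo() gives 'foo'."""
    names = set()
    for node in ast.walk(func):
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name):
                names.add(node.func.id)
            elif isinstance(node.func, ast.Attribute):
                names.add(node.func.attr)
    return names


def mixed_functions(source: str) -> list:
    """Names of functions whose bodies call both a retrieval and a render function."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            called = _called_names(node)
            if called & RETRIEVE_CALLS and called & RENDER_CALLS:
                offenders.append(node.name)
    return offenders


SAMPLE = '''
def good(q):
    return retrieve(q)

def bad(q):
    ctx = retrieve(q)
    return campaignlib.stream_api(ctx)
'''

print(mixed_functions(SAMPLE))  # ['bad']
```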

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 .specify/memory/constitution.md | 122 ++++++++++++++++++++++++++++++++
 1 file changed, 122 insertions(+)
 create mode 100644 .specify/memory/constitution.md

diff --git a/.specify/memory/constitution.md b/.specify/memory/constitution.md
new file mode 100644
index 0000000..fd92d99
--- /dev/null
+++ b/.specify/memory/constitution.md
@@ -0,0 +1,122 @@
+# CampaignGenerator Constitution
+
+This is a **sister doctrine** to the [mneme constitution](https://github.com/kostadis/mneme/blob/main/.specify/memory/constitution.md). Both descend from the Kostadis architectural doctrine; both name the anti-pattern each principle kills. The division of labor between them:
+
+- **mneme** governs the *platform's state* — identity, databases, reconciliation, the DGX integration plane. Its enemy is corrupted or fragmented infrastructure state.
+- **This constitution** governs *CampaignGenerator's pipeline* — how an LLM is used to render trustworthy campaign artifacts. Its enemy is the precision failure that breaks the fourth wall at the table.
+
+Where the two overlap (Optimistic Lies, Split-Brain, Fragmented State), the anti-pattern names are shared deliberately. CampaignGenerator is one actor in a larger flow that also includes Zoom, gm-assist, MemPalace, the Anthropic API, and a set of Claude skills that live outside this repo. This constitution binds general principles to CG via concrete clauses; a principle without a clause that names a file, a test, or a workspace path is aspiration, not law.
+
+## Core Principles
+
+### I. Disk is Truth, the Model is a Draft
+
+Markdown and YAML files on disk are the single source of truth. Every database in the system — the MemPalace palace, the vector DB behind it, `quote_ledger.db`, the rpg-library index — is an *index over* or *cache of* that truth, never the truth itself. A database may be deleted and rebuilt from disk; disk may never be rebuilt from a database.
+
+LLM output is a **draft** until a human has reviewed it. Generated text is not fact, not canon, and not input to the next step until a human has read it and let it through. The rough extraction pass is the ceiling of what the model can do unaided, not the floor.
+
+*Kills: Optimistic Lies* — treating a confident-looking generated artifact as established fact.
+
+### II. The Human Checkpoint is Non-Negotiable
+
+LLMs render; humans decide. Scope (what belongs where), ordering (what came before what), and attribution (who said or did what) are **precision decisions** and they require a human checkpoint. No LLM output may feed another LLM call across a precision boundary without a human gate in between.
+
+Before any LLM call is added, state what decision it removes from the human. If the answer is "a precision decision, fed automatically downstream," a human checkpoint is mandatory before the next call. If the answer is "none — the human reviews and corrects before it feeds anything," the call is safe.
+
+*Kills: Error Compounding* — one call's silent 10% error inherited and amplified by the next.
+
+### III. Retrieval and Render are Separated
+
+A function retrieves or it renders — never both. This is enforced by `tests/test_retrieve_render_isolation.py`, which fails the build if any function body mixes a retrieval call (`retrieve`, `search_hierarchical`, `rpg_search`) with a render call (`stream_api`, `call_api`). Do not bypass the test; fix the structure.
+
+Render pipelines (`prep.py`, `sd_narrate.py`, `planning.py`) refuse to run unless a human has approved `docs/dossier_proposal.md`. The choke point is `proposal_loader.py:require_approved_proposal`. Deciding *what content is in scope* is the human's; turning approved scope into prose is the model's.
+
+*Kills: the Renderer Making Scope Decisions* — letting the prose pass also decide what's in the world.
+
+### IV. Verbatim is Sacred
+
+Quotes and transcript records are reproduced exactly, never paraphrased and never invented. The Zoom VTT is the only record of "what was said" at the table; gm-assist is the authoritative record of "what happened in what order." Neither may be embellished by a model that can see past its boundary.
+
+The cost of violating this is not a bad diff — it is a player at the table asking why an NPC said something it never said, or why an action that should have rippled through the world quietly disappeared. A precision failure here breaks the fourth wall. That is the most expensive failure the system can produce.
+
+*Kills: Hallucinated Dialogue* — fabricated or "improved" verbatim content.
+
+### V. One Seam per Boundary
+
+Every external dependency is reached through exactly one file, and that file is reached one direction:
+
+- Anthropic API → `campaignlib.py` (the only module that imports `anthropic`; `make_client` / `stream_api` / `call_api` are the surface, and they already retry)
+- MemPalace → `mempalace_client.py`
+- DGX / local LLM per-model behavior → `dgxlib`
+- CampaignGenerator capability exposed *outward* to other Claude sessions → `mcp_server.py`
+
+When you need to change how CG talks to X, there must be exactly one file to open. New integration code that scatters `import anthropic` or talks to MemPalace outside its client is a constitutional violation, not a style nit.
+
+*Kills: Fragmented Integration* — the same boundary crossed from five places that drift apart.
+
+### VI. CLI is the Engine, UI is a Face
+
+Every capability is a CLI tool first. The FastAPI server never reimplements pipeline logic; it shells out to CLI scripts via `server/subprocess_runner.py` and streams their output as Server-Sent Events. Fixing a bug in a script fixes it in the UI; exposing a flag in the UI means passing it through the corresponding `_build_*_cmd()` in the router, never reimplementing the behavior there.
+
+*Kills: Split-Brain* — CLI and UI growing two divergent implementations of the same operation.
+
+### VII. Extract Once, Synthesize Deliberately
+
+The grounding-doc generators follow one shape: chunk the input, extract per chunk, cache the extractions on disk, then synthesize one document from the pile (`run_extract_pipeline` + `run_synthesize_pipeline` in `campaignlib.py`). Re-runs reuse cached extractions.
+
+Do not collapse passes that each need depth. The killed chapter-extract consolidation is the cautionary tale: merging three extract passes into one per-chapter pass regressed all three grounding docs, because breadth in one pass came at the cost of depth in each. Prefer more, narrower passes over one wide pass that does each job worse.
+
+*Kills: Depth Regression* — premature consolidation that trades per-job depth for fewer calls.
+
+### VIII. State is Discoverable
+
+The campaign workspace is self-describing. Which pipeline stage a session is in, what artifacts exist, what is still pending — all of it is discoverable from disk (the `summaries/{session}/` layout, the presence or absence of each stage's output file), not held in the operator's memory or in a skill's head. A question the system surfaces ("this scene has no approved quotes yet") matters as much as an answer it gives.
+
+When the flow falls back to a skill or a manual step, that seam should be *visible* — an artifact on disk or a state the UI can represent — not tribal knowledge about which command to run next.
+
+*Kills: Opacity / Tribal State* — the system's real status living only in the operator's head.
+
+### IX. The UI Mechanizes; Claude Converses
+
+UI workflows exist to make the *mechanical* parts of a pipeline easier — to walk a multi-step process one step at a time, run each step, and show what came out. They do **not** replace the Claude chat interface, and they are not the place where the thinking happens. The judgment between steps — reviewing a draft, deciding scope, correcting an attribution, choosing what to promote — happens in a Claude conversation or at the CLI. The UI's job is to remove the friction of *remembering and invoking* the steps in order, never to absorb the work that happens between them.
+
+The expectation is explicit: between any two UI steps, the operator may drop to the CLI or to a Claude chat to do the real work, and lose nothing by doing so. A UI step that cannot be performed equivalently at the CLI is a step that has stolen judgment from the human.
+
+Files are the interchange. Every step reads files and writes files; the file on disk is how information passes between the UI, the CLI, and the chat, and how all three stay consistent. The UI must never hold pipeline state that exists only in the browser — if a step produced something, it produced a file, and that file is equally visible to the CLI and to Claude. (This is Principles I, VI, and VIII applied to the UI surface: the file is the truth, the CLI is the engine, and the state is discoverable — so the human is never trapped inside the UI.)
+
+The ensemble grounding-doc workflow is the canonical shape: the UI may step you Stage 1 → 2 → 3, but the `--list` scope review, the `aliases.json` edit, and the `diff`-before-promote happen at the CLI or in chat, and every stage hands off through a file (`merged.json`, `state_dossiers/*.md`, `*_draft.md`). The UI mechanizes the sequence; it does not synthesize the campaign.
+
+*Kills: The Walled Garden* — a UI that swallows the whole workflow, hides the files, and locks the human out of the conversation and the CLI.
+
+## Architecture is Destiny
+
+Bad architectural choices are liabilities, and in this system the currency is twofold: **token spend** and **precision failures at the table**.
+
+- **Token spend** is standing cost. Every LLM call must justify itself; the ensemble/Spark path exists precisely so that *extraction* can be made ~free locally and the API is spent only on *synthesis*. Caching (the scene-extract system-prefix cache, the enhance-summary cached prefix, the Batch API at 50% off) is not an optimization to add later — it is how the architecture stays affordable.
+- **Precision failures** are the catastrophic cost. A token wasted is recoverable; a fabricated quote that reaches the table is not. This is why Principles I–IV exist and why they outrank convenience. The human checkpoint is not friction the architecture should engineer away — it is the load-bearing wall.
+
+Every new database, daemon, cache, or LLM call is a recurring tax. Justify the tax against the truth on disk and the human gate, or do not add it.
+
+## Authority & the Human Checkpoint
+
+Humans author structure, identity, and schema. The LLM — including Spec Kit itself — renders within that boundary; it never decides it.
+
+- Spec Kit `/speckit-*` plans, specs, and tasks are **drafts**. They are reviewed against this constitution before they feed implementation.
+- A generated spec that decides scope, ordering, or attribution autonomously is exactly the precision-decision-without-a-checkpoint that Principle II forbids — catch it at review.
+- Good pattern: LLM extracts → human reviews and imposes structure → LLM renders inside that structure. Bad pattern: LLM extracts → LLM structures → LLM renders. The second compounds errors silently and is prohibited here.
+
+## Governance
+
+This constitution supersedes conflicting specs, plans, and tasks. A conflict requires written justification or an amendment — not a silent override.
+
+- **Principle precedence:** I (Disk is Truth) and II (The Human Checkpoint) outrank all other principles. When a convenience, a performance gain, or a cleaner abstraction collides with truth-on-disk or the human gate, truth and the gate win.
+- Every spec and plan is tested, by name, against all nine principles before implementation begins.
+- Amendments require a stated rationale, a version bump, and a check that dependent templates and docs stay in sync.
+- Semantic versioning of this document:
+  - **MAJOR** — a principle removed or redefined in a backward-incompatible way.
+  - **MINOR** — a new principle or materially expanded section.
+  - **PATCH** — clarifications, wording, non-semantic refinements.
+
+Runtime development guidance lives in `CLAUDE.md` (this repo) and `~/.claude/CLAUDE.md` (global). Where those and this constitution agree, this is the canonical statement; where they drift, amend one to match the other.
+
+**Version**: 1.1.0 | **Ratified**: 2026-06-27 | **Last Amended**: 2026-06-27

From 460ebfe8c4fefb20c8e5d9eec9ce12b583da99e2 Mon Sep 17 00:00:00 2001
From: Kostadis <kostadis@gmail.com>
Date: Sat, 27 Jun 2026 23:24:03 -0700
Subject: [PATCH 2/3] feat(ensemble): UI workflow + OpenRouter backend +
 explicit chapter selection
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Turn the ensemble grounding-doc CLI workflow (docs/cli/ensemble_workflow.md)
into a stepped UI page (Setup → Extract → Bundle → Synthesize), add OpenRouter
as a per-stage LLM backend through the single campaignlib seam, and make chapter
selection explicit. The existing Anthropic /grounding path is untouched.

Built via the Spec Kit flow (specs/001-ensemble-workflow-ui: spec/plan/research/
tasks/contracts/quickstart).

Backend / seam (Constitution V):
- campaignlib/api: _OpenRouterClient branch in make_client(backend="openrouter");
  reasoning-off mapping; _require_nonempty guards stream_api/call_api against
  empty model output. OpenRouter is constructed only inside campaignlib/api.
- Uniform add_backend_args / client_from_args; --backend/--endpoint added to the
  four synthesis scripts. Default backend=anthropic is byte-identical to before.
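
The empty-output guard checks a call's text before anything downstream
sees it. The sketch below is a minimal, hypothetical version.
require_nonempty and EmptyModelOutput are illustrative stand-ins, not the
_require_nonempty helper in campaignlib/api. Treating whitespace-only
output as empty is an assumption, and "example/model" is a placeholder
model id.

```python
from typing import Optional


class EmptyModelOutput(RuntimeError):
    """Raised when a backend returns no usable text for a request."""


def require_nonempty(text: Optional[str], *, backend: str, model: str) -> str:
    """Return text unchanged, or raise if the model produced nothing usable.

    Whitespace-only output is treated as empty; otherwise a blank draft could
    be written to disk and look like a successful render.
    """
    if text is None or not text.strip():
        raise EmptyModelOutput(f"{backend}:{model} returned empty output")
    return text


print(require_nonempty("The gate is sealed.", backend="openrouter", model="example/model"))
```

Raising, rather than passing an empty string along, keeps a failed call
from reaching disk as a draft that a human might mistake for real output.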

UI (Constitution VI/IX):
- New /ensemble route + nav entry; EnsembleWorkflow shell with disk-derived
  status; Setup, Extract, Bundle (scope + alias gates), Synthesize (diff-before-
  promote). server/routers/ensemble.py shells out to the CLI and exposes
  disk-derived status; it issues no retrieval/render calls.

Chapter picker + Constitution Principle X (operator-elevated):
- "Selection is Explicit; There is No Silent 'All'" added to the constitution
  (v1.1.0 -> 1.2.0, MINOR). New ChapterPicker.vue (Resolve glob, Select all /
  none / only, natural sort, extracted/pending badges).
- ensemble_batch.py --chapters is now nargs="+" (unions globs/paths); the engine
  gains the capability, the UI mechanizes it.
- chapters_selected stores the literal chosen set; an empty selection is refused
  (no glob fallback): GET /run/extract rejects it and the Run button is disabled.
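
The --chapters change can be illustrated with a small hypothetical sketch.
ensemble_batch.py's own parsing is not shown in this patch, so
resolve_chapters, natural_key and the prog name below are illustrative. The
sketch shows the described CLI-side behavior: nargs="+", a union of globs
and literal paths, natural sort, and refusal rather than a silent "all"
when the selection comes out empty.

```python
import argparse
import glob
import re
from pathlib import Path


def natural_key(path: str) -> tuple:
    """Sort key so 'ch2.md' precedes 'ch10.md'; digit runs compare as integers."""
    parts = re.split(r"(\d+)", Path(path).name)
    # re.split with a capture group alternates text (even index) and digits (odd index).
    return ([int(p) if i % 2 else p.lower() for i, p in enumerate(parts)], path)


def resolve_chapters(patterns: list) -> list:
    """Union every glob/path argument, de-duplicate, and natural-sort the result.

    Exits instead of falling back to "every chapter" when nothing matches.
    """
    selected = set()
    for pattern in patterns:
        matches = glob.glob(pattern)
        # A literal path containing glob metacharacters (e.g. '[') may not match itself.
        if not matches and Path(pattern).is_file():
            matches = [pattern]
        selected.update(matches)
    if not selected:
        raise SystemExit("no chapters selected; refusing to run on an implicit 'all'")
    return sorted(selected, key=natural_key)


parser = argparse.ArgumentParser(prog="ensemble_batch_sketch")
parser.add_argument("--chapters", nargs="+", required=True,
                    help="one or more chapter globs or paths; their union is processed")
```

Suppose ch1.md, ch2.md, ch10.md and appendix.md sit in one directory. Then
--chapters 'ch*.md' ch2.md resolves to ch1.md, ch2.md, ch10.md in that
order. A pattern that matches nothing exits rather than selecting
everything.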

Tests: new suites (openrouter seam, ensemble status/gates/chapters, batch nargs).
Full suite: 940 passed; isolation guard green; frontend builds clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 .specify/extensions.yml                       |  23 +
 .specify/extensions/.registry                 |  19 +
 .specify/extensions/agent-context/README.md   |  66 ++
 .../agent-context/agent-context-config.yml    |   5 +
 .../commands/speckit.agent-context.update.md  |  27 +
 .../extensions/agent-context/extension.yml    |  34 +
 .../scripts/bash/update-agent-context.sh      | 337 ++++++++++
 .../powershell/update-agent-context.ps1       | 417 ++++++++++++
 .specify/feature.json                         |   3 +
 .specify/init-options.json                    |   9 +
 .specify/integration.json                     |  15 +
 .specify/integrations/claude.manifest.json    |  17 +
 .specify/integrations/speckit.manifest.json   |  17 +
 .specify/memory/constitution.md               |  16 +-
 .specify/scripts/bash/check-prerequisites.sh  | 189 ++++++
 .specify/scripts/bash/common.sh               | 619 ++++++++++++++++++
 .specify/scripts/bash/create-new-feature.sh   | 299 +++++++++
 .specify/scripts/bash/setup-plan.sh           |  84 +++
 .specify/scripts/bash/setup-tasks.sh          |  91 +++
 .specify/templates/checklist-template.md      |  40 ++
 .specify/templates/constitution-template.md   |  50 ++
 .specify/templates/plan-template.md           | 113 ++++
 .specify/templates/spec-template.md           | 131 ++++
 .specify/templates/tasks-template.md          | 252 +++++++
 .specify/workflows/speckit/workflow.yml       |  77 +++
 .specify/workflows/workflow-registry.json     |  13 +
 CLAUDE.md                                     |  15 +
 campaign_state.py                             |   7 +-
 campaignlib/__init__.py                       |   7 +-
 campaignlib/api/backends.py                   |  72 ++
 campaignlib/api/client.py                     |  69 +-
 docs/cli/ensemble_workflow.md                 |   8 +
 ensemble_batch.py                             |  15 +-
 frontend/src/components/layout/AppSidebar.vue |   6 +
 frontend/src/router.ts                        |  27 +
 frontend/src/views/EnsembleWorkflow.vue       |  75 +++
 frontend/src/views/ensemble/ChapterPicker.vue | 154 +++++
 .../src/views/ensemble/EnsembleBundle.vue     | 156 +++++
 .../src/views/ensemble/EnsembleExtract.vue    |  82 +++
 frontend/src/views/ensemble/EnsembleSetup.vue | 112 ++++
 .../src/views/ensemble/EnsembleSynthesize.vue |  99 +++
 frontend/src/views/ensemble/useEnsembleRun.ts |  85 +++
 party.py                                      |   7 +-
 planning.py                                   |   7 +-
 server/config_models.py                       |  39 ++
 server/main.py                                |   3 +-
 server/routers/ensemble.py                    | 432 ++++++++++++
 .../checklists/requirements.md                |  44 ++
 .../001-ensemble-workflow-ui/contracts/api.md |  83 +++
 .../001-ensemble-workflow-ui/contracts/cli.md |  69 ++
 specs/001-ensemble-workflow-ui/data-model.md  | 125 ++++
 specs/001-ensemble-workflow-ui/plan.md        | 126 ++++
 specs/001-ensemble-workflow-ui/quickstart.md  | 104 +++
 specs/001-ensemble-workflow-ui/research.md    | 105 +++
 specs/001-ensemble-workflow-ui/spec.md        | 161 +++++
 specs/001-ensemble-workflow-ui/tasks.md       | 286 ++++++++
 synthesise_world_state.py                     |  10 +-
 tests/test_ensemble_batch_chapters.py         |  16 +
 tests/test_ensemble_chapters.py               |  81 +++
 tests/test_ensemble_gates.py                  |  62 ++
 tests/test_ensemble_status.py                 |  34 +
 tests/test_openrouter_seam.py                 | 129 ++++
 62 files changed, 5851 insertions(+), 24 deletions(-)
 create mode 100644 .specify/extensions.yml
 create mode 100644 .specify/extensions/.registry
 create mode 100644 .specify/extensions/agent-context/README.md
 create mode 100644 .specify/extensions/agent-context/agent-context-config.yml
 create mode 100644 .specify/extensions/agent-context/commands/speckit.agent-context.update.md
 create mode 100644 .specify/extensions/agent-context/extension.yml
 create mode 100755 .specify/extensions/agent-context/scripts/bash/update-agent-context.sh
 create mode 100644 .specify/extensions/agent-context/scripts/powershell/update-agent-context.ps1
 create mode 100644 .specify/feature.json
 create mode 100644 .specify/init-options.json
 create mode 100644 .specify/integration.json
 create mode 100644 .specify/integrations/claude.manifest.json
 create mode 100644 .specify/integrations/speckit.manifest.json
 create mode 100755 .specify/scripts/bash/check-prerequisites.sh
 create mode 100755 .specify/scripts/bash/common.sh
 create mode 100755 .specify/scripts/bash/create-new-feature.sh
 create mode 100755 .specify/scripts/bash/setup-plan.sh
 create mode 100755 .specify/scripts/bash/setup-tasks.sh
 create mode 100644 .specify/templates/checklist-template.md
 create mode 100644 .specify/templates/constitution-template.md
 create mode 100644 .specify/templates/plan-template.md
 create mode 100644 .specify/templates/spec-template.md
 create mode 100644 .specify/templates/tasks-template.md
 create mode 100644 .specify/workflows/speckit/workflow.yml
 create mode 100644 .specify/workflows/workflow-registry.json
 create mode 100644 frontend/src/views/EnsembleWorkflow.vue
 create mode 100644 frontend/src/views/ensemble/ChapterPicker.vue
 create mode 100644 frontend/src/views/ensemble/EnsembleBundle.vue
 create mode 100644 frontend/src/views/ensemble/EnsembleExtract.vue
 create mode 100644 frontend/src/views/ensemble/EnsembleSetup.vue
 create mode 100644 frontend/src/views/ensemble/EnsembleSynthesize.vue
 create mode 100644 frontend/src/views/ensemble/useEnsembleRun.ts
 create mode 100644 server/routers/ensemble.py
 create mode 100644 specs/001-ensemble-workflow-ui/checklists/requirements.md
 create mode 100644 specs/001-ensemble-workflow-ui/contracts/api.md
 create mode 100644 specs/001-ensemble-workflow-ui/contracts/cli.md
 create mode 100644 specs/001-ensemble-workflow-ui/data-model.md
 create mode 100644 specs/001-ensemble-workflow-ui/plan.md
 create mode 100644 specs/001-ensemble-workflow-ui/quickstart.md
 create mode 100644 specs/001-ensemble-workflow-ui/research.md
 create mode 100644 specs/001-ensemble-workflow-ui/spec.md
 create mode 100644 specs/001-ensemble-workflow-ui/tasks.md
 create mode 100644 tests/test_ensemble_batch_chapters.py
 create mode 100644 tests/test_ensemble_chapters.py
 create mode 100644 tests/test_ensemble_gates.py
 create mode 100644 tests/test_ensemble_status.py
 create mode 100644 tests/test_openrouter_seam.py

diff --git a/.specify/extensions.yml b/.specify/extensions.yml
new file mode 100644
index 0000000..5415714
--- /dev/null
+++ b/.specify/extensions.yml
@@ -0,0 +1,23 @@
+installed:
+- agent-context
+settings:
+  auto_execute_hooks: true
+hooks:
+  after_specify:
+  - extension: agent-context
+    command: speckit.agent-context.update
+    enabled: true
+    optional: true
+    priority: 10
+    prompt: Execute speckit.agent-context.update?
+    description: Refresh agent context after specification
+    condition: null
+  after_plan:
+  - extension: agent-context
+    command: speckit.agent-context.update
+    enabled: true
+    optional: true
+    priority: 10
+    prompt: Execute speckit.agent-context.update?
+    description: Refresh agent context after planning
+    condition: null
diff --git a/.specify/extensions/.registry b/.specify/extensions/.registry
new file mode 100644
index 0000000..db05440
--- /dev/null
+++ b/.specify/extensions/.registry
@@ -0,0 +1,19 @@
+{
+  "schema_version": "1.0",
+  "extensions": {
+    "agent-context": {
+      "version": "1.0.0",
+      "source": "local",
+      "manifest_hash": "sha256:9a1dc02d2d0139bb03860392ecacef79183be2c442feda2f9ccaa4e5907b1e47",
+      "enabled": true,
+      "priority": 10,
+      "registered_commands": {
+        "claude": [
+          "speckit.agent-context.update"
+        ]
+      },
+      "registered_skills": [],
+      "installed_at": "2026-06-27T21:48:08.109321+00:00"
+    }
+  }
+}
\ No newline at end of file
diff --git a/.specify/extensions/agent-context/README.md b/.specify/extensions/agent-context/README.md
new file mode 100644
index 0000000..091e2b4
--- /dev/null
+++ b/.specify/extensions/agent-context/README.md
@@ -0,0 +1,66 @@
+# Coding Agent Context Extension
+
+This bundled extension manages the **coding agent context/instruction file** (e.g. `CLAUDE.md`, `.github/copilot-instructions.md`, `AGENTS.md`, `GEMINI.md`, …) for the active integration.
+
+It owns the lifecycle of the managed section delimited by the configurable start/end markers (defaults: `<!-- SPECKIT START -->` / `<!-- SPECKIT END -->`).
+
+## Why an extension?
+
+Not every Spec Kit user wants Spec Kit to write into the coding agent's context file. Extracting this behavior into a dedicated extension lets users:
+
+- **Opt out** entirely with `specify extension disable agent-context` — Spec Kit will then never create or modify the agent context file.
+- **Customize the markers** by editing `.specify/extensions/agent-context/agent-context-config.yml` — both the Python layer and the bundled scripts honor the same `context_markers` value.
+- **Synchronize multiple agent anchors** by setting `context_files` when a project intentionally uses more than one coding agent context file, such as `AGENTS.md` and `CLAUDE.md`.
+- **Refresh on demand** with `/speckit.agent-context.update`, or automatically through the hooks declared in `extension.yml` (`after_specify`, `after_plan`).
+
+## Commands
+
+| Command | Description |
+|---------|-------------|
+| `speckit.agent-context.update` | Refresh the managed section in the agent context file with the current plan path. |
+
+## Configuration
+
+All configuration flows through the extension's own config file at
+`.specify/extensions/agent-context/agent-context-config.yml`:
+
+```yaml
+# Path to the coding agent context file managed by this extension
+context_file: CLAUDE.md
+
+# Optional list of coding agent context files to manage together.
+# When non-empty, this takes precedence over context_file.
+context_files:
+  - AGENTS.md
+  - CLAUDE.md
+
+# Delimiters for the managed Spec Kit section
+context_markers:
+  start: "<!-- SPECKIT START -->"
+  end: "<!-- SPECKIT END -->"
+```
+
+- `context_file` — the project-relative path to the coding agent context file, written by `specify init` and `specify integration install`.
+- `context_files` — optional project-relative paths to multiple coding agent context files. When non-empty, the list takes precedence over `context_file`. Absolute paths, backslash separators, and `..` path segments are rejected.
+- `context_markers.start` / `.end` — the delimiters around the managed section. Edit these to use custom markers.
+
+## Requirements
+
+The bundled update scripts require **Python 3** with **PyYAML** for YAML/upsert processing (PowerShell can also use `ConvertFrom-Yaml` when available).
+
+PyYAML ships with the `specify` CLI and is normally available via the same `python3` interpreter. If a hook reports *"PyYAML is required … not available in the current Python environment"*, it means the system `python3` differs from the one used to install Spec Kit. To resolve, run:
+
+```bash
+pip install pyyaml
+# or target the specific interpreter Spec Kit uses:
+/path/to/speckit-python -m pip install pyyaml
+```
+
+## Disable
+
+```bash
+specify extension disable agent-context
+```
+
+When disabled, Spec Kit skips context file creation, updates, and removal (the gates are inside `upsert_context_section()` and `remove_context_section()`).
+Disabled projects also ignore stale `context_files` values during command rendering so disabling the extension remains a complete opt-out.
diff --git a/.specify/extensions/agent-context/agent-context-config.yml b/.specify/extensions/agent-context/agent-context-config.yml
new file mode 100644
index 0000000..d55ff7c
--- /dev/null
+++ b/.specify/extensions/agent-context/agent-context-config.yml
@@ -0,0 +1,5 @@
+context_file: CLAUDE.md
+context_files: []
+context_markers:
+  start: <!-- SPECKIT START -->
+  end: <!-- SPECKIT END -->
diff --git a/.specify/extensions/agent-context/commands/speckit.agent-context.update.md b/.specify/extensions/agent-context/commands/speckit.agent-context.update.md
new file mode 100644
index 0000000..a654eb5
--- /dev/null
+++ b/.specify/extensions/agent-context/commands/speckit.agent-context.update.md
@@ -0,0 +1,27 @@
+---
+description: "Refresh the managed Spec Kit section in coding agent context file(s)"
+---
+
+# Update Coding Agent Context
+
+Refresh the managed Spec Kit section inside the active coding agent's context/instruction file (e.g. `CLAUDE.md`, `.github/copilot-instructions.md`, `AGENTS.md`).
+
+## Behavior
+
+The script reads the agent-context extension config at
+`.specify/extensions/agent-context/agent-context-config.yml` to discover:
+
+- `context_file` — the path of the coding agent context file to manage.
+- `context_files` — optional project-relative paths for multiple coding agent context files. When non-empty, the script updates each listed file and the list takes precedence over `context_file`.
+- `context_markers.start` / `.end` — the delimiters surrounding the managed section. Defaults to `<!-- SPECKIT START -->` and `<!-- SPECKIT END -->` when the field is missing.
+
+It then creates, replaces, or appends the managed block so that the section points at the most recent plan path when one can be discovered (`specs/<feature>/plan.md`).
+
+If both `context_files` and `context_file` are empty, the command reports nothing to do and exits successfully. Context file paths must stay project-relative; absolute paths, Windows drive paths, backslash separators, and `..` path segments are rejected.
+
+## Execution
+
+- **Bash**: `.specify/extensions/agent-context/scripts/bash/update-agent-context.sh [plan_path]`
+- **PowerShell**: `.specify/extensions/agent-context/scripts/powershell/update-agent-context.ps1 [plan_path]`
+
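+When `plan_path` is omitted, the script derives it from `.specify/feature.json` (written by `/speckit-specify`), falling back to the most recently modified `specs/*/plan.md` when that file is absent or its plan does not exist yet.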
diff --git a/.specify/extensions/agent-context/extension.yml b/.specify/extensions/agent-context/extension.yml
new file mode 100644
index 0000000..191069e
--- /dev/null
+++ b/.specify/extensions/agent-context/extension.yml
@@ -0,0 +1,34 @@
+schema_version: "1.0"
+
+extension:
+  id: agent-context
+  name: "Coding Agent Context"
+  version: "1.0.0"
+  description: "Manages coding agent context/instruction files (e.g., CLAUDE.md, copilot-instructions.md) with project-specific plan references and configurable markers"
+  author: spec-kit-core
+  repository: https://github.com/github/spec-kit
+  license: MIT
+
+requires:
+  speckit_version: ">=0.2.0"
+
+provides:
+  commands:
+    - name: speckit.agent-context.update
+      file: commands/speckit.agent-context.update.md
+      description: "Refresh the managed Spec Kit section in the coding agent context file"
+
+hooks:
+  after_specify:
+    command: speckit.agent-context.update
+    optional: true
+    description: "Refresh agent context after specification"
+  after_plan:
+    command: speckit.agent-context.update
+    optional: true
+    description: "Refresh agent context after planning"
+
+tags:
+  - "agent"
+  - "context"
+  - "core"
diff --git a/.specify/extensions/agent-context/scripts/bash/update-agent-context.sh b/.specify/extensions/agent-context/scripts/bash/update-agent-context.sh
new file mode 100755
index 0000000..64e1bae
--- /dev/null
+++ b/.specify/extensions/agent-context/scripts/bash/update-agent-context.sh
@@ -0,0 +1,337 @@
+#!/usr/bin/env bash
+# update-agent-context.sh
+#
+# Refresh the managed Spec Kit section in the coding agent's context file(s)
+# (e.g. CLAUDE.md, .github/copilot-instructions.md, AGENTS.md).
+#
+# Reads `context_files` or `context_file`, plus `context_markers.{start,end}`, from the
+# agent-context extension config:
+#   .specify/extensions/agent-context/agent-context-config.yml
+#
+# Usage: update-agent-context.sh [plan_path]
+#
+# When `plan_path` is omitted, the script derives it from `.specify/feature.json`
+# (written by /speckit-specify). Falls back to the most recently modified
+# `specs/*/plan.md` only when feature.json is absent or its plan does not exist yet.
+
+set -euo pipefail
+
+PROJECT_ROOT="$(pwd)"
+EXT_CONFIG="$PROJECT_ROOT/.specify/extensions/agent-context/agent-context-config.yml"
+DEFAULT_START="<!-- SPECKIT START -->"
+DEFAULT_END="<!-- SPECKIT END -->"
+
+if [[ ! -f "$EXT_CONFIG" ]]; then
+  echo "agent-context: $EXT_CONFIG not found; nothing to do." >&2
+  exit 0
+fi
+
+# Locate a Python 3 interpreter with PyYAML available.
+_python=""
+_python_candidates=()
+[[ -n "${SPECKIT_PYTHON:-}" ]] && _python_candidates+=("$SPECKIT_PYTHON")
+_python_candidates+=("python3" "python")
+for _candidate in "${_python_candidates[@]}"; do
+  if command -v "$_candidate" >/dev/null 2>&1 \
+    && "$_candidate" - <<'PY' >/dev/null 2>&1
+import sys
+try:
+    import yaml  # noqa: F401
+except ImportError:
+    sys.exit(1)
+sys.exit(0 if sys.version_info[0] == 3 else 1)
+PY
+  then
+    _python="$_candidate"
+    break
+  fi
+done
+unset _candidate _python_candidates
+
+if [[ -z "$_python" ]]; then
+  echo "agent-context: Python 3 with PyYAML not found on PATH; skipping update." >&2
+  echo "  To resolve: pip install pyyaml (or install it into the environment used by python3)." >&2
+  exit 0
+fi
+_case_insensitive_context_files=0
+case "$(uname -s 2>/dev/null || true)" in
+  MINGW*|MSYS*|CYGWIN*) _case_insensitive_context_files=1 ;;
+esac
+
+# Parse extension config once; emit context files as JSON, followed by marker strings.
+if ! _raw_opts="$("$_python" - "$EXT_CONFIG" "$_case_insensitive_context_files" <<'PY'
+import json
+import sys
+try:
+    import yaml
+except ImportError:
+    print(
+        "agent-context: PyYAML is required to parse extension config but is not available "
+        "in the current Python environment.\n"
+        "  To resolve: pip install pyyaml (or install it into the environment used by python3).\n"
+        "  Context file will not be updated until PyYAML is importable.",
+        file=sys.stderr,
+    )
+    sys.exit(2)
+try:
+    with open(sys.argv[1], "r", encoding="utf-8") as fh:
+        data = yaml.safe_load(fh)
+except Exception as exc:
+    print(
+        f"agent-context: unable to parse {sys.argv[1]} ({exc}); cannot update context.",
+        file=sys.stderr,
+    )
+    sys.exit(2)
+if not isinstance(data, dict):
+    data = {}
+def get_str(obj, *keys):
+    node = obj
+    for k in keys:
+        if isinstance(node, dict) and k in node:
+            node = node[k]
+        else:
+            return ""
+    return node if isinstance(node, str) else ""
+context_files = []
+seen_context_files = set()
+case_insensitive = sys.argv[2] == "1" or sys.platform.startswith(("win32", "cygwin"))
+raw_files = data.get("context_files")
+if isinstance(raw_files, list):
+    for value in raw_files:
+        if not isinstance(value, str):
+            continue
+        candidate = value.strip()
+        if not candidate:
+            continue
+        key = candidate.casefold() if case_insensitive else candidate
+        if key in seen_context_files:
+            continue
+        context_files.append(candidate)
+        seen_context_files.add(key)
+if not context_files:
+    raw_file = get_str(data, "context_file")
+    candidate = raw_file.strip()
+    if candidate:
+        context_files.append(candidate)
+print(json.dumps(context_files))
+print(get_str(data, "context_markers", "start"))
+print(get_str(data, "context_markers", "end"))
+PY
+)"; then
+  echo "agent-context: skipping update (see above for details)." >&2
+  exit 0
+fi
+
+_opts_lines=()
+while IFS= read -r _line || [[ -n "$_line" ]]; do
+  _opts_lines+=("$_line")
+done < <(printf '%s\n' "$_raw_opts")
+if (( ${#_opts_lines[@]} < 1 )) || [[ -z "${_opts_lines[0]}" ]]; then
+  echo "agent-context: malformed config parser output; expected a context_files line first; skipping update." >&2
+  exit 0
+fi
+CONTEXT_FILES_JSON="${_opts_lines[0]}"
+MARKER_START="${_opts_lines[1]:-}"  # $(...) strips trailing newlines, so unset markers may be absent
+MARKER_END="${_opts_lines[2]:-}"
+
+if ! _context_files_raw="$("$_python" - "$CONTEXT_FILES_JSON" <<'PY'
+import json
+import sys
+try:
+    data = json.loads(sys.argv[1])
+except Exception:
+    data = []
+if not isinstance(data, list):
+    data = []
+for value in data:
+    if isinstance(value, str) and value:
+        print(value)
+PY
+)"; then
+  echo "agent-context: malformed context_files parser output; skipping update." >&2
+  exit 0
+fi
+
+CONTEXT_FILES=()
+while IFS= read -r _line || [[ -n "$_line" ]]; do
+  [[ -n "$_line" ]] && CONTEXT_FILES+=("$_line")
+done < <(printf '%s\n' "$_context_files_raw")
+
+if (( ${#CONTEXT_FILES[@]} == 0 )); then
+  echo "agent-context: context_files/context_file not set in extension config; nothing to do." >&2
+  exit 0
+fi
+
+for CONTEXT_FILE in "${CONTEXT_FILES[@]}"; do
+  # Reject absolute or drive-qualified paths, backslash separators, '..' segments, and paths resolving outside the project root
+  if [[ "$CONTEXT_FILE" == /* ]] || [[ "$CONTEXT_FILE" =~ ^[A-Za-z]: ]]; then
+    echo "agent-context: context files must be project-relative paths; got '$CONTEXT_FILE'." >&2
+    exit 1
+  fi
+  if [[ "$CONTEXT_FILE" == *\\* ]]; then
+    echo "agent-context: context files must not contain backslash separators; got '$CONTEXT_FILE'." >&2
+    exit 1
+  fi
+  IFS='/' read -ra _cf_parts <<< "$CONTEXT_FILE"
+  for _seg in "${_cf_parts[@]}"; do
+    if [[ "$_seg" == ".." ]]; then
+      echo "agent-context: context files must not contain '..' path segments; got '$CONTEXT_FILE'." >&2
+      exit 1
+    fi
+  done
+  if ! "$_python" - "$PROJECT_ROOT" "$CONTEXT_FILE" <<'PY'
+import sys
+from pathlib import Path
+
+root = Path(sys.argv[1]).resolve()
+target = (root / sys.argv[2]).resolve(strict=False)
+try:
+    target.relative_to(root)
+except ValueError:
+    sys.exit(1)
+PY
+  then
+    echo "agent-context: context file path resolves outside the project root; got '$CONTEXT_FILE'." >&2
+    exit 1
+  fi
+done
+unset _cf_parts _seg
+
+[[ -z "$MARKER_START" ]] && MARKER_START="$DEFAULT_START"
+[[ -z "$MARKER_END"   ]] && MARKER_END="$DEFAULT_END"
+
+PLAN_PATH="${1:-}"
+if [[ -z "$PLAN_PATH" ]]; then
+  # Prefer .specify/feature.json (written by /speckit-specify) over mtime heuristic.
+  _feature_json="$PROJECT_ROOT/.specify/feature.json"
+  if [[ -f "$_feature_json" ]]; then
+    _feature_dir="$("$_python" - "$_feature_json" <<'PY'
+import sys, json
+try:
+    with open(sys.argv[1], encoding="utf-8") as fh:
+        d = json.load(fh)
+    val = d.get("feature_directory", "")
+    print(val if isinstance(val, str) else "")
+except Exception:
+    print("")
+PY
+)"
+    # Normalize backslashes (written by PS on Windows) to forward slashes before path ops.
+    _feature_dir="$(printf '%s' "$_feature_dir" | tr '\\' '/')"
+    _feature_dir="${_feature_dir%/}"
+    if [[ -n "$_feature_dir" ]]; then
+      # feature_directory may be relative or absolute (absolute paths outside PROJECT_ROOT
+      # are preserved as-is by _persist_feature_json in common.sh).
+      # Also match drive-qualified paths (C:/...) written by PowerShell on Windows.
+      if [[ "$_feature_dir" == /* ]] || [[ "$_feature_dir" =~ ^[A-Za-z]:/ ]]; then
+        _candidate="$_feature_dir/plan.md"
+      else
+        _candidate="$PROJECT_ROOT/$_feature_dir/plan.md"
+      fi
+      if [[ -f "$_candidate" ]]; then
+        # Resolve symlinks before comparing so paths like /var/… vs /private/var/…
+        # (macOS) are treated as equivalent. Mirrors the mtime-fallback approach.
+        PLAN_PATH="$("$_python" - "$PROJECT_ROOT" "$_candidate" <<'PY'
+import sys
+from pathlib import Path
+root = Path(sys.argv[1]).resolve()
+cand = Path(sys.argv[2]).resolve()
+try:
+    print(cand.relative_to(root).as_posix())
+except ValueError:
+    # Outside project root: emit the resolved path in POSIX form.
+    # as_posix() converts backslashes correctly on native Windows Python.
+    print(cand.as_posix())
+PY
+)"
+      fi
+    fi
+  fi
+
+  # Fall back to mtime only when feature.json is absent or its plan does not exist yet.
+  # Python emits a project-relative POSIX path directly to avoid bash prefix-strip
+  # issues with backslash paths on Windows (Git bash / MSYS2).
+  if [[ -z "$PLAN_PATH" ]]; then
+    _plan_rel="$("$_python" - "$PROJECT_ROOT" <<'PY'
+import sys
+from pathlib import Path
+root = Path(sys.argv[1]).resolve()
+specs = root / "specs"
+plans = sorted(
+    specs.glob("*/plan.md"),
+    key=lambda p: p.stat().st_mtime,
+    reverse=True,
+)
+if plans:
+    try:
+        print(plans[0].relative_to(root).as_posix())
+    except ValueError:
+        print("")
+else:
+    print("")
+PY
+)"
+    if [[ -n "$_plan_rel" ]]; then
+      PLAN_PATH="$_plan_rel"
+    fi
+  fi
+fi
+
+# Build the managed section
+TMP_SECTION="$(mktemp)"
+trap 'rm -f "$TMP_SECTION"' EXIT
+{
+  echo "$MARKER_START"
+  echo "For additional context about technologies to be used, project structure,"
+  echo "shell commands, and other important information, read the current plan"
+  if [[ -n "$PLAN_PATH" ]]; then
+    echo "at $PLAN_PATH"
+  fi
+  echo "$MARKER_END"
+} > "$TMP_SECTION"
+
+for CONTEXT_FILE in "${CONTEXT_FILES[@]}"; do
+  CTX_PATH="$PROJECT_ROOT/$CONTEXT_FILE"
+  mkdir -p "$(dirname "$CTX_PATH")"
+
+  "$_python" - "$CTX_PATH" "$MARKER_START" "$MARKER_END" "$TMP_SECTION" <<'PY'
+import sys, os
+ctx_path, start, end, section_path = sys.argv[1:5]
+with open(section_path, "r", encoding="utf-8") as fh:
+    section = fh.read().rstrip("\n") + "\n"
+
+if os.path.exists(ctx_path):
+    with open(ctx_path, "r", encoding="utf-8-sig") as fh:
+        content = fh.read()
+    s = content.find(start)
+    e = content.find(end, s if s != -1 else 0)
+    if s != -1 and e != -1 and e > s:
+        end_of_marker = e + len(end)
+        if end_of_marker < len(content) and content[end_of_marker] == "\r":
+            end_of_marker += 1
+        if end_of_marker < len(content) and content[end_of_marker] == "\n":
+            end_of_marker += 1
+        new_content = content[:s] + section + content[end_of_marker:]
+    elif s != -1:
+        new_content = content[:s] + section
+    elif e != -1:
+        end_of_marker = e + len(end)
+        if end_of_marker < len(content) and content[end_of_marker] == "\r":
+            end_of_marker += 1
+        if end_of_marker < len(content) and content[end_of_marker] == "\n":
+            end_of_marker += 1
+        new_content = section + content[end_of_marker:]
+    else:
+        if content and not content.endswith("\n"):
+            content += "\n"
+        new_content = (content + "\n" + section) if content else section
+else:
+    new_content = section
+
+new_content = new_content.replace("\r\n", "\n").replace("\r", "\n")
+with open(ctx_path, "wb") as fh:
+    fh.write(new_content.encode("utf-8"))
+PY
+
+  echo "agent-context: updated $CONTEXT_FILE"
+done
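Both scripts splice the managed section into each context file with the same rules. Restated as a pure function, those rules are easier to check: a minimal Python sketch, assuming the behavior of the embedded heredoc in `update-agent-context.sh`; the name `splice_managed_section` is hypothetical and is not part of Spec Kit.

```python
def splice_managed_section(content: str, section: str, start: str, end: str) -> str:
    """Replace the block from `start` through `end` in `content` with `section`."""
    section = section.rstrip("\n") + "\n"  # exactly one trailing newline
    s = content.find(start)
    e = content.find(end, s if s != -1 else 0)

    def past_end_marker(idx: int) -> int:
        # Step over the end marker and at most one following CRLF or LF.
        i = idx + len(end)
        if i < len(content) and content[i] == "\r":
            i += 1
        if i < len(content) and content[i] == "\n":
            i += 1
        return i

    if s != -1 and e != -1 and e > s:
        # Both markers present: replace the old block, markers included.
        new = content[:s] + section + content[past_end_marker(e):]
    elif s != -1:
        # Start marker only: everything from the start marker onward is replaced.
        new = content[:s] + section
    elif e != -1:
        # End marker only: everything up to and including the end marker is replaced.
        new = section + content[past_end_marker(e):]
    elif content:
        # No markers: append the section after one blank line.
        sep = "" if content.endswith("\n") else "\n"
        new = content + sep + "\n" + section
    else:
        new = section
    # Both scripts write LF-only output.
    return new.replace("\r\n", "\n").replace("\r", "\n")
```

Keeping the splice free of file I/O means the start-only, end-only, and CRLF cases can be exercised without touching a real `CLAUDE.md`.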
diff --git a/.specify/extensions/agent-context/scripts/powershell/update-agent-context.ps1 b/.specify/extensions/agent-context/scripts/powershell/update-agent-context.ps1
new file mode 100644
index 0000000..da9ff44
--- /dev/null
+++ b/.specify/extensions/agent-context/scripts/powershell/update-agent-context.ps1
@@ -0,0 +1,417 @@
+#!/usr/bin/env pwsh
+# update-agent-context.ps1
+#
+# Refresh the managed Spec Kit section in the coding agent's context file(s)
+# (e.g. CLAUDE.md, .github/copilot-instructions.md, AGENTS.md).
+#
+# Reads `context_files` or `context_file`, plus `context_markers.{start,end}`, from the
+# agent-context extension config:
+#   .specify/extensions/agent-context/agent-context-config.yml
+#
+# Usage: update-agent-context.ps1 [plan_path]
+#
+# When `plan_path` is omitted, the script derives it from `.specify/feature.json`
+# (written by /speckit-specify). Falls back to the most recently modified
+# `specs/*/plan.md` only when feature.json is absent or its plan does not exist yet.
+
+[CmdletBinding()]
+param(
+    [Parameter(Position = 0)]
+    [string]$PlanPath
+)
+
+function Get-ConfigValue {
+    param(
+        [AllowNull()][object]$Object,
+        [Parameter(Mandatory = $true)][string]$Key
+    )
+
+    if ($null -eq $Object) {
+        return $null
+    }
+    if ($Object -is [System.Collections.IDictionary]) {
+        return $Object[$Key]
+    }
+    $prop = $Object.PSObject.Properties[$Key]
+    if ($prop) {
+        return $prop.Value
+    }
+    return $null
+}
+
+function Test-ConfigObject {
+    param(
+        [AllowNull()][object]$Object
+    )
+
+    if ($null -eq $Object) {
+        return $false
+    }
+    if ($Object -is [System.Collections.IDictionary]) {
+        return $true
+    }
+    if ($Object -is [System.Management.Automation.PSCustomObject]) {
+        return $true
+    }
+    return $false
+}
+
+function Resolve-ContextPath {
+    param(
+        [Parameter(Mandatory = $true)][string]$Root,
+        [Parameter(Mandatory = $true)][string]$RelativePath
+    )
+
+    $rootFull = [System.IO.Path]::GetFullPath($Root)
+    $segments = $RelativePath -split '/'
+    $resolved = $rootFull
+
+    foreach ($segment in $segments) {
+        if ([string]::IsNullOrWhiteSpace($segment) -or $segment -eq '.') {
+            continue
+        }
+
+        $candidate = [System.IO.Path]::GetFullPath((Join-Path $resolved $segment))
+        if (Test-Path -LiteralPath $candidate) {
+            $item = Get-Item -LiteralPath $candidate -Force
+            if ($item.Attributes -band [System.IO.FileAttributes]::ReparsePoint) {
+                $target = $item.Target
+                if ($target -is [System.Array]) {
+                    $target = $target[0]
+                }
+                if ($target) {
+                    if ([System.IO.Path]::IsPathRooted($target)) {
+                        $candidate = [System.IO.Path]::GetFullPath($target)
+                    } else {
+                        $candidate = [System.IO.Path]::GetFullPath(
+                            (Join-Path (Split-Path -Parent $candidate) $target)
+                        )
+                    }
+                }
+            }
+        }
+        $resolved = $candidate
+    }
+
+    return $resolved
+}
+
+function Test-IsSubPath {
+    param(
+        [Parameter(Mandatory = $true)][string]$Root,
+        [Parameter(Mandatory = $true)][string]$Path
+    )
+
+    $comparison = if ([System.Environment]::OSVersion.Platform -eq [System.PlatformID]::Win32NT) {
+        [System.StringComparison]::OrdinalIgnoreCase
+    } else {
+        [System.StringComparison]::Ordinal
+    }
+    $rootFull = [System.IO.Path]::GetFullPath($Root).TrimEnd(
+        [System.IO.Path]::DirectorySeparatorChar,
+        [System.IO.Path]::AltDirectorySeparatorChar
+    )
+    $pathFull = [System.IO.Path]::GetFullPath($Path)
+    return $pathFull.Equals($rootFull, $comparison) -or
+        $pathFull.StartsWith($rootFull + [System.IO.Path]::DirectorySeparatorChar, $comparison)
+}
+
+$ErrorActionPreference = 'Stop'
+$DefaultStart = '<!-- SPECKIT START -->'
+$DefaultEnd   = '<!-- SPECKIT END -->'
+$ProjectRoot  = (Get-Location).Path
+$ExtConfig    = Join-Path $ProjectRoot '.specify/extensions/agent-context/agent-context-config.yml'
+
+if (-not (Test-Path -LiteralPath $ExtConfig)) {
+    Write-Warning "agent-context: $ExtConfig not found; nothing to do."
+    exit 0
+}
+
+$Options = $null
+if (Get-Command ConvertFrom-Yaml -ErrorAction SilentlyContinue) {
+    try {
+        $Options = Get-Content -LiteralPath $ExtConfig -Raw -Encoding UTF8 | ConvertFrom-Yaml -ErrorAction Stop
+    } catch {
+        # fall through to ConvertFrom-Json fallback
+    }
+}
+
+if ($null -eq $Options) {
+    # ConvertFrom-Yaml unavailable or failed; try ConvertFrom-Json (no external deps,
+    # works when the config file is valid JSON, which is a subset of YAML).
+    try {
+        $raw = Get-Content -LiteralPath $ExtConfig -Raw -Encoding UTF8
+        $Options = $raw | ConvertFrom-Json -ErrorAction Stop
+        if (-not (Test-ConfigObject -Object $Options)) { $Options = $null }
+    } catch {
+        $Options = $null
+    }
+}
+
+if ($null -eq $Options) {
+    # ConvertFrom-Yaml/Json unavailable or failed; fall back to Python+PyYAML.
+    $pythonCmd = $null
+    $pythonCandidates = @()
+    if ($env:SPECKIT_PYTHON) {
+        $pythonCandidates += $env:SPECKIT_PYTHON
+    }
+    $pythonCandidates += @('python3', 'python')
+    foreach ($candidate in $pythonCandidates) {
+        if (Get-Command $candidate -ErrorAction SilentlyContinue) {
+            # Verify it is Python 3 with PyYAML available.
+            $null = & $candidate -c "import sys; import yaml; sys.exit(0 if sys.version_info[0] == 3 else 1)" 2>$null
+            if ($LASTEXITCODE -eq 0) {
+                $pythonCmd = $candidate
+                break
+            }
+        }
+    }
+
+    if ($pythonCmd) {
+        $pyScript = $null
+        try {
+            $pyScript = [System.IO.Path]::GetTempFileName()
+            Set-Content -LiteralPath $pyScript -Encoding UTF8 -Value @'
+import json
+import sys
+try:
+    import yaml
+except ImportError:
+    print(
+        "agent-context: PyYAML is required to parse extension config; cannot update context.",
+        file=sys.stderr,
+    )
+    sys.exit(2)
+
+try:
+    with open(sys.argv[1], "r", encoding="utf-8") as fh:
+        data = yaml.safe_load(fh)
+except Exception as exc:
+    print(
+        f"agent-context: unable to parse {sys.argv[1]} ({exc}); cannot update context.",
+        file=sys.stderr,
+    )
+    sys.exit(2)
+
+if not isinstance(data, dict):
+    data = {}
+
+print(json.dumps(data))
+'@
+            $jsonOut = & $pythonCmd $pyScript $ExtConfig
+            if ($LASTEXITCODE -eq 0 -and $jsonOut) {
+                $Options = $jsonOut | ConvertFrom-Json -ErrorAction Stop
+            }
+        } catch {
+            $Options = $null
+        } finally {
+            if ($pyScript -and (Test-Path -LiteralPath $pyScript)) {
+                Remove-Item -LiteralPath $pyScript -Force -ErrorAction SilentlyContinue
+            }
+        }
+    }
+
+    if (-not $Options) {
+        Write-Warning "agent-context: unable to parse $ExtConfig; skipping update."
+        exit 0
+    }
+}
+
+if (-not (Test-ConfigObject -Object $Options)) {
+    Write-Warning "agent-context: $ExtConfig must contain a YAML mapping; skipping update."
+    exit 0
+}
+
+$ConfiguredContextFiles = Get-ConfigValue -Object $Options -Key 'context_files'
+$ContextFiles = @()
+if ($null -ne $ConfiguredContextFiles) {
+    foreach ($item in @($ConfiguredContextFiles)) {
+        if ($item -is [string] -and -not [string]::IsNullOrWhiteSpace($item)) {
+            $ContextFiles += $item.Trim()
+        }
+    }
+}
+if ($ContextFiles.Count -eq 0) {
+    $ContextFile = Get-ConfigValue -Object $Options -Key 'context_file'
+    if ($ContextFile -is [string] -and -not [string]::IsNullOrWhiteSpace($ContextFile)) {
+        $ContextFiles += $ContextFile.Trim()
+    }
+}
+$pathComparison = if ([System.Environment]::OSVersion.Platform -eq [System.PlatformID]::Win32NT) {
+    [System.StringComparer]::OrdinalIgnoreCase
+} else {
+    [System.StringComparer]::Ordinal
+}
+$seenContextFiles = [System.Collections.Generic.HashSet[string]]::new($pathComparison)
+$dedupedContextFiles = @()
+foreach ($ContextFile in $ContextFiles) {
+    if ($seenContextFiles.Add($ContextFile)) {
+        $dedupedContextFiles += $ContextFile
+    }
+}
+$ContextFiles = $dedupedContextFiles
+if ($ContextFiles.Count -eq 0) {
+    Write-Warning 'agent-context: context_files/context_file not set in extension config; nothing to do.'
+    exit 0
+}
+
+foreach ($ContextFile in $ContextFiles) {
+    # Reject absolute paths, drive-qualified paths, backslash separators, and '..' path segments in context files
+    if ($ContextFile -match '^[A-Za-z]:') {
+        Write-Warning "agent-context: context files must be project-relative paths; got '$ContextFile'."
+        exit 1
+    }
+    if ([System.IO.Path]::IsPathRooted($ContextFile)) {
+        Write-Warning "agent-context: context files must be project-relative paths; got '$ContextFile'."
+        exit 1
+    }
+    if ($ContextFile.Contains('\')) {
+        Write-Warning "agent-context: context files must not contain backslash separators; got '$ContextFile'."
+        exit 1
+    }
+    $cfSegments = $ContextFile -split '[/\\]'
+    if ($cfSegments -contains '..') {
+        Write-Warning "agent-context: context files must not contain '..' path segments; got '$ContextFile'."
+        exit 1
+    }
+    $resolvedTarget = Resolve-ContextPath -Root $ProjectRoot -RelativePath $ContextFile
+    if (-not (Test-IsSubPath -Root $ProjectRoot -Path $resolvedTarget)) {
+        Write-Warning "agent-context: context file path resolves outside the project root; got '$ContextFile'."
+        exit 1
+    }
+}
+
+$MarkerStart = $DefaultStart
+$MarkerEnd   = $DefaultEnd
+$cm = Get-ConfigValue -Object $Options -Key 'context_markers'
+if ($cm) {
+    $cmStart = Get-ConfigValue -Object $cm -Key 'start'
+    if ($cmStart -is [string] -and $cmStart) {
+        $MarkerStart = $cmStart
+    }
+    $cmEnd = Get-ConfigValue -Object $cm -Key 'end'
+    if ($cmEnd -is [string] -and $cmEnd) {
+        $MarkerEnd = $cmEnd
+    }
+}
+
+if (-not $PlanPath) {
+    # Prefer .specify/feature.json (written by /speckit-specify) over mtime heuristic.
+    $FeatureJson = Join-Path $ProjectRoot '.specify/feature.json'
+    if (Test-Path -LiteralPath $FeatureJson) {
+        try {
+            $fj = Get-Content -LiteralPath $FeatureJson -Raw -Encoding UTF8 | ConvertFrom-Json
+            $featureDir = $fj.feature_directory
+            if ($featureDir -isnot [string] -or -not $featureDir) {
+                $featureDir = $null
+            } else {
+                $featureDir = $featureDir.TrimEnd('\', '/')
+            }
+            if ($featureDir) {
+                # Join-Path on Unix does not let an absolute ChildPath override the parent path; check explicitly.
+                if ([System.IO.Path]::IsPathRooted($featureDir)) {
+                    $candidatePlan = Join-Path $featureDir 'plan.md'
+                } else {
+                    $candidatePlan = Join-Path (Join-Path $ProjectRoot $featureDir) 'plan.md'
+                }
+                if (Test-Path -LiteralPath $candidatePlan) {
+                    # Resolve ./ .. segments before relativizing (mirrors bash Path.resolve()).
+                    # GetFullPath is available in .NET Framework 4.x (PS 5.1 compatible).
+                    $resolvedPlan = [System.IO.Path]::GetFullPath($candidatePlan)
+                    $resolvedDir  = [System.IO.Path]::GetDirectoryName($resolvedPlan)
+                    $normRoot = $ProjectRoot.TrimEnd('\', '/') + [System.IO.Path]::DirectorySeparatorChar
+                    $normDir  = $resolvedDir.TrimEnd('\', '/') + [System.IO.Path]::DirectorySeparatorChar
+                    $cmp = if ([System.Environment]::OSVersion.Platform -eq [System.PlatformID]::Win32NT) { [System.StringComparison]::OrdinalIgnoreCase } else { [System.StringComparison]::Ordinal }
+                    if ($normDir.StartsWith($normRoot, $cmp)) {
+                        $relDir = $normDir.Substring($normRoot.Length).TrimEnd('\', '/')
+                        $PlanPath = if ($relDir) { $relDir.Replace('\', '/') + '/plan.md' } else { 'plan.md' }
+                    } else {
+                        $PlanPath = $resolvedPlan.Replace('\', '/')
+                    }
+                }
+            }
+        } catch {
+            # Non-fatal: fall through to mtime heuristic.
+        }
+    }
+
+    # Fall back to mtime only when feature.json is absent or its plan does not exist yet.
+    if (-not $PlanPath) {
+        try {
+            $specsDir = Join-Path $ProjectRoot 'specs'
+            $candidate = Get-ChildItem -Path $specsDir -Directory -ErrorAction SilentlyContinue |
+                ForEach-Object { Get-Item -LiteralPath (Join-Path $_.FullName 'plan.md') -ErrorAction SilentlyContinue } |
+                Where-Object { $_ } |
+                Sort-Object LastWriteTime -Descending |
+                Select-Object -First 1
+            if ($candidate) {
+                # GetRelativePath is .NET 5+ only; strip prefix manually for PS 5.1 compat.
+                # Use case-insensitive comparison on Windows only (matches common.ps1 pattern).
+                $fullPath = $candidate.FullName.Replace('\', '/')
+                $normRoot = $ProjectRoot.Replace('\', '/').TrimEnd('/') + '/'
+                $cmp = if ([System.Environment]::OSVersion.Platform -eq [System.PlatformID]::Win32NT) { [System.StringComparison]::OrdinalIgnoreCase } else { [System.StringComparison]::Ordinal }
+                if ($fullPath.StartsWith($normRoot, $cmp)) {
+                    $PlanPath = $fullPath.Substring($normRoot.Length)
+                } else {
+                    $PlanPath = $fullPath
+                }
+            }
+        } catch {
+            # Non-fatal: continue without a plan path.
+        }
+    }
+}
+
+$lines = @($MarkerStart,
+           'For additional context about technologies to be used, project structure,',
+           'shell commands, and other important information, read the current plan')
+if ($PlanPath) {
+    $lines += "at $PlanPath"
+}
+$lines += $MarkerEnd
+$Section = ($lines -join "`n") + "`n"
+
+foreach ($ContextFile in $ContextFiles) {
+    $CtxPath = Join-Path $ProjectRoot $ContextFile
+    $CtxDir  = Split-Path -Parent $CtxPath
+    if ($CtxDir -and -not (Test-Path -LiteralPath $CtxDir)) {
+        New-Item -ItemType Directory -Path $CtxDir -Force | Out-Null
+    }
+
+    if (Test-Path -LiteralPath $CtxPath) {
+        $rawBytes = [System.IO.File]::ReadAllBytes($CtxPath)
+        # Strip UTF-8 BOM if present
+        if ($rawBytes.Length -ge 3 -and $rawBytes[0] -eq 0xEF -and $rawBytes[1] -eq 0xBB -and $rawBytes[2] -eq 0xBF) {
+            $content = [System.Text.Encoding]::UTF8.GetString($rawBytes, 3, $rawBytes.Length - 3)
+        } else {
+            $content = [System.Text.Encoding]::UTF8.GetString($rawBytes)
+        }
+
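+        $s = $content.IndexOf($MarkerStart, [System.StringComparison]::Ordinal)
+        $e = if ($s -ge 0) { $content.IndexOf($MarkerEnd, $s, [System.StringComparison]::Ordinal) } else { $content.IndexOf($MarkerEnd, [System.StringComparison]::Ordinal) }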
+
+        if ($s -ge 0 -and $e -ge 0 -and $e -gt $s) {
+            $endOfMarker = $e + $MarkerEnd.Length
+            if ($endOfMarker -lt $content.Length -and $content[$endOfMarker] -eq "`r") { $endOfMarker++ }
+            if ($endOfMarker -lt $content.Length -and $content[$endOfMarker] -eq "`n") { $endOfMarker++ }
+            $newContent = $content.Substring(0, $s) + $Section + $content.Substring($endOfMarker)
+        } elseif ($s -ge 0) {
+            $newContent = $content.Substring(0, $s) + $Section
+        } elseif ($e -ge 0) {
+            $endOfMarker = $e + $MarkerEnd.Length
+            if ($endOfMarker -lt $content.Length -and $content[$endOfMarker] -eq "`r") { $endOfMarker++ }
+            if ($endOfMarker -lt $content.Length -and $content[$endOfMarker] -eq "`n") { $endOfMarker++ }
+            $newContent = $Section + $content.Substring($endOfMarker)
+        } else {
+            if ($content -and -not $content.EndsWith("`n", [System.StringComparison]::Ordinal)) { $content += "`n" }
+            if ($content) { $newContent = $content + "`n" + $Section } else { $newContent = $Section }
+        }
+    } else {
+        $newContent = $Section
+    }
+
+    $newContent = $newContent.Replace("`r`n", "`n").Replace("`r", "`n")
+    [System.IO.File]::WriteAllText($CtxPath, $newContent, (New-Object System.Text.UTF8Encoding($false)))
+
+    Write-Host "agent-context: updated $ContextFile"
+}
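Both scripts apply the same safety checks to every configured context file before writing anything. A minimal Python sketch of those checks, assuming a POSIX host; `check_context_file` and its reason strings are hypothetical. Like the bash script's embedded Python, it uses `Path.resolve()` so that a symlink pointing outside the project root is also rejected.

```python
import re
from pathlib import Path
from typing import Optional


def check_context_file(root: str, rel: str) -> Optional[str]:
    """Return a reason string if `rel` is unsafe as a context file path, else None."""
    if rel.startswith("/") or re.match(r"[A-Za-z]:", rel):
        return "must be a project-relative path"
    if "\\" in rel:
        return "must not contain backslash separators"
    if ".." in rel.split("/"):
        return "must not contain '..' path segments"
    root_path = Path(root).resolve()
    # resolve() follows symlinks, so a link that escapes the root is caught here.
    target = (root_path / rel).resolve()
    try:
        target.relative_to(root_path)
    except ValueError:
        return "resolves outside the project root"
    return None
```

The cheap lexical checks run first; only a path that passes them pays for the filesystem-aware containment test.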
diff --git a/.specify/feature.json b/.specify/feature.json
new file mode 100644
index 0000000..69a4651
--- /dev/null
+++ b/.specify/feature.json
@@ -0,0 +1,3 @@
+{
+  "feature_directory": "specs/001-ensemble-workflow-ui"
+}
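The `feature_directory` recorded here is what both scripts read first when no `plan_path` argument is given. A minimal Python sketch of that lookup order, assuming a POSIX host; `find_plan_path` is a hypothetical name. It returns `None` where the scripts would emit the managed section without an `at <plan>` line.

```python
import json
from pathlib import Path
from typing import Optional


def find_plan_path(root: str) -> Optional[str]:
    """Return the plan path the context scripts would reference, or None."""
    root_path = Path(root).resolve()
    feature_json = root_path / ".specify" / "feature.json"
    if feature_json.is_file():
        try:
            data = json.loads(feature_json.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            data = {}
        feature_dir = data.get("feature_directory", "") if isinstance(data, dict) else ""
        if isinstance(feature_dir, str):
            # Backslashes may come from the PowerShell side; normalize and drop a trailing slash.
            feature_dir = feature_dir.replace("\\", "/").rstrip("/")
            if feature_dir:
                # An absolute feature_dir replaces root_path in the join.
                plan = (root_path / feature_dir / "plan.md").resolve()
                if plan.is_file():
                    try:
                        return plan.relative_to(root_path).as_posix()
                    except ValueError:
                        return plan.as_posix()  # outside the project: keep it absolute
    # Fallback: the most recently modified specs/*/plan.md.
    plans = sorted(
        (root_path / "specs").glob("*/plan.md"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return plans[0].relative_to(root_path).as_posix() if plans else None
```

With the `feature.json` above, the sketch returns `specs/001-ensemble-workflow-ui/plan.md` once that plan exists. Until then, it falls back to the newest `specs/*/plan.md`.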
diff --git a/.specify/init-options.json b/.specify/init-options.json
new file mode 100644
index 0000000..6b2408d
--- /dev/null
+++ b/.specify/init-options.json
@@ -0,0 +1,9 @@
+{
+  "ai": "claude",
+  "ai_skills": true,
+  "feature_numbering": "sequential",
+  "here": true,
+  "integration": "claude",
+  "script": "sh",
+  "speckit_version": "0.11.10.dev0"
+}
\ No newline at end of file
diff --git a/.specify/integration.json b/.specify/integration.json
new file mode 100644
index 0000000..5e4bc53
--- /dev/null
+++ b/.specify/integration.json
@@ -0,0 +1,15 @@
+{
+  "version": "0.11.10.dev0",
+  "integration_state_schema": 1,
+  "installed_integrations": [
+    "claude"
+  ],
+  "integration_settings": {
+    "claude": {
+      "script": "sh",
+      "invoke_separator": "-"
+    }
+  },
+  "integration": "claude",
+  "default_integration": "claude"
+}
diff --git a/.specify/integrations/claude.manifest.json b/.specify/integrations/claude.manifest.json
new file mode 100644
index 0000000..b8decd1
--- /dev/null
+++ b/.specify/integrations/claude.manifest.json
@@ -0,0 +1,17 @@
+{
+  "integration": "claude",
+  "version": "0.11.10.dev0",
+  "installed_at": "2026-06-27T21:48:08.043755+00:00",
+  "files": {
+    ".claude/skills/speckit-analyze/SKILL.md": "fecd4bf113c3dda58c75d387473c0106fc2dfea97a27bb7c65af94f3f916c188",
+    ".claude/skills/speckit-clarify/SKILL.md": "c1c2098756ca407530cca11c5b608f517d769962215ddafa013951b81e3e19c5",
+    ".claude/skills/speckit-constitution/SKILL.md": "ee3972318415a05559c6bf281dcbd2e8deda944e595d64ab5474abeacf558697",
+    ".claude/skills/speckit-implement/SKILL.md": "823049e49aa983fe398d4bccf6c686ab6afe8f2cd3856e0380c3ef797d78d56d",
+    ".claude/skills/speckit-converge/SKILL.md": "04226b8443797337624983111546d5e5a48d9993a176c4e6d72a4099a0af50d4",
+    ".claude/skills/speckit-plan/SKILL.md": "53733c8a4f4fd01685759bb1c68e94c73da4ce90d549139e79e419dec6471510",
+    ".claude/skills/speckit-checklist/SKILL.md": "946c6bc808891436972a11a423f89f0fbd272a79809bb8fd1d29f481ebe02613",
+    ".claude/skills/speckit-specify/SKILL.md": "9324dd55d12d420cd581031419fa37eb94ef75ae0bdd53391dd4414bd9d45e02",
+    ".claude/skills/speckit-tasks/SKILL.md": "cb29fb8247a30aac751be83de88d0399221692589dd26327552ae6f193816fda",
+    ".claude/skills/speckit-taskstoissues/SKILL.md": "dfe23aaca349cd76e98505dafa9aae1ef4616a0c35a5c79122b9bd881e16b62f"
+  }
+}
diff --git a/.specify/integrations/speckit.manifest.json b/.specify/integrations/speckit.manifest.json
new file mode 100644
index 0000000..ab72d99
--- /dev/null
+++ b/.specify/integrations/speckit.manifest.json
@@ -0,0 +1,17 @@
+{
+  "integration": "speckit",
+  "version": "0.11.10.dev0",
+  "installed_at": "2026-06-27T21:48:08.066355+00:00",
+  "files": {
+    ".specify/scripts/bash/setup-plan.sh": "4eb12c5b00f5c66a7d01b56c90898d320dcef4425d9b96652d57156c84948eda",
+    ".specify/scripts/bash/check-prerequisites.sh": "afce0aa8db177320d83aa0b8e3619c06b865fd810781894e4a7a3f81664941ce",
+    ".specify/scripts/bash/common.sh": "af8a16f87b4f9084759c42ff9abf35c0b2a2025dffe58c298758ff86de2923b2",
+    ".specify/scripts/bash/setup-tasks.sh": "cf21ba2212b4dd5b435c5ea8527500cfd27768b86c0bbc7ebc3207759f118d27",
+    ".specify/scripts/bash/create-new-feature.sh": "9ba116b64f0328eb69bc1a195d209074ea38823a73a554160d69df34a74daa65",
+    ".specify/templates/constitution-template.md": "ce7549540fa45543cca797a150201d868e64495fdff39dc38246fb17bd4024b3",
+    ".specify/templates/tasks-template.md": "fc29a233f6f5a27ca31f1aa46b596af6500c627441c6e62b2bc4a1d721525842",
+    ".specify/templates/checklist-template.md": "c37695297e5d3153d64f82c21223509940b13932046c7961c42d1d669516130c",
+    ".specify/templates/plan-template.md": "cc7f7979cf8d8836ec26492785affd80791d3422a2b745062ec695be8c985ef7",
+    ".specify/templates/spec-template.md": "3945437fc35cd30a5b2bf7beea680337c3516826d3efa5a6b92c4a7eca1ba28e"
+  }
+}
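Principle X, added in the constitution change that follows, reduces to two rules: an empty selection refuses to run, and "Select all" produces an explicit list. A minimal Python sketch, assuming hypothetical names (`EmptySelectionError`, `select_all`, `chapters_to_run`); it is not CampaignGenerator's actual handler for `GET /api/ensemble/run/extract`.

```python
from typing import Iterable, List, Optional, Sequence


class EmptySelectionError(ValueError):
    """Raised when a batch run is requested with nothing selected."""


def select_all(available: Iterable[str]) -> List[str]:
    # "Select all" is an explicit act: it materializes every path as the chosen set.
    return list(available)


def chapters_to_run(selected: Optional[Sequence[str]]) -> List[str]:
    # Empty or absent means nothing is selected; never fall back to a glob.
    if not selected:
        raise EmptySelectionError("no chapters selected; refusing to run extract")
    return list(selected)
```

Because `select_all` returns a concrete list, the stored selection is the same shape whether the human clicked one chapter or all of them. There is no sentinel value that could later be read as "everything".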
diff --git a/.specify/memory/constitution.md b/.specify/memory/constitution.md
index fd92d99..02dd813 100644
--- a/.specify/memory/constitution.md
+++ b/.specify/memory/constitution.md
@@ -88,6 +88,16 @@ The ensemble grounding-doc workflow is the canonical shape: the UI may step you
 
 *Kills: The Walled Garden* — a UI that swallows the whole workflow, hides the files, and locks the human out of the conversation and the CLI.
 
+### X. Selection is Explicit; There is No Silent "All"
+
+A batch operation acts on the set the human explicitly chose — never on an implicit "everything" inferred from an empty or absent selection. **"Select all" is a deliberate act that materializes the full set as the chosen set; it is not the state the system falls into when the human chose nothing.** An empty selection means *nothing is selected*: the operation refuses to run and says so, rather than guessing that the human meant the whole corpus.
+
+Which inputs a token-spending pass touches is a **scope decision** (Principle II), and it is the human's — made explicitly, every time. A default that quietly expands to "all" removes that decision from the human exactly when it is most expensive to get wrong.
+
+Concrete clause: the ensemble chapter picker stores `ui.ensemble.chapters_selected` as the literal set of chosen chapters; "Select all" writes every resolved path; `GET /api/ensemble/run/extract` refuses an empty `chapters` list instead of falling back to the glob (`tests/test_ensemble_chapters.py`). The CLI engine is exempt only because a glob *typed at the CLI* is itself an explicit act; the UI must never manufacture that act on the human's behalf.
+
+*Kills: the Implicit Blast Radius* — a batch action that silently expands to "everything" because the set was never explicitly chosen.
+
 ## Architecture is Destiny
 
 Bad architectural choices are liabilities, and in this system the currency is twofold: **token spend** and **precision failures at the table**.
@@ -110,7 +120,7 @@ Humans author structure, identity, and schema. The LLM — including Spec Kit it
 This constitution supersedes conflicting specs, plans, and tasks. A conflict requires written justification or an amendment — not a silent override.
 
 - **Principle precedence:** I (Disk is Truth) and II (The Human Checkpoint) outrank all other principles. When a convenience, a performance gain, or a cleaner abstraction collides with truth-on-disk or the human gate, truth and the gate win.
-- Every spec and plan is tested, by name, against all nine principles before implementation begins.
+- Every spec and plan is tested, by name, against all ten principles before implementation begins.
 - Amendments require a stated rationale, a version bump, and a check that dependent templates and docs stay in sync.
 - Semantic versioning of this document:
   - **MAJOR** — a principle removed or redefined in a backward-incompatible way.
@@ -119,4 +129,6 @@ This constitution supersedes conflicting specs, plans, and tasks. A conflict req
 
 Runtime development guidance lives in `CLAUDE.md` (this repo) and `~/.claude/CLAUDE.md` (global). Where those and this constitution agree, this is the canonical statement; where they drift, amend one to match the other.
 
-**Version**: 1.1.0 | **Ratified**: 2026-06-27 | **Last Amended**: 2026-06-27
+**Version**: 1.2.0 | **Ratified**: 2026-06-27 | **Last Amended**: 2026-06-27
+
+> **1.2.0** (MINOR) — Added Principle X (*Selection is Explicit; There is No Silent "All"*), arising from the ensemble chapter picker: a batch pass acts only on an explicitly chosen set, and "Select all" must materialize that set rather than be an empty-means-everything default.
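Principle X has two parts. An empty selection refuses to run, and "Select all" writes out the full resolved set. Both can be sketched in bash. The names `select_all` and `run_extract` and the `*.md` chapter layout are hypothetical. The real picker stores `ui.ensemble.chapters_selected` and enforces the rule in `GET /api/ensemble/run/extract`, and this sketch does not reproduce either.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of Principle X in shell terms.

# "Select all" is a deliberate act: it writes out every resolved path,
# one per line, so the caller holds the full set as its explicit selection.
select_all() {
    local f
    for f in "$1"/*.md; do
        if [[ -f "$f" ]]; then
            printf '%s\n' "$f"
        fi
    done
}

# A batch pass acts only on the chapters it is handed. An empty selection
# is refused with an error instead of being widened to the whole corpus.
run_extract() {
    if (( $# == 0 )); then
        echo "ERROR: no chapters selected; pick chapters or use Select all" >&2
        return 2
    fi
    local chapter
    for chapter in "$@"; do
        echo "extract: $chapter"   # stand-in for the token-spending pass
    done
}
```

Typical use is `mapfile -t selection < <(select_all chapters)` followed by `run_extract "${selection[@]}"`. If the directory holds no `.md` files, the selection stays empty and the run is refused rather than defaulting to anything. `run_extract` never inspects the directory itself, so the only route to "everything" is the visible `select_all` call.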
diff --git a/.specify/scripts/bash/check-prerequisites.sh b/.specify/scripts/bash/check-prerequisites.sh
new file mode 100755
index 0000000..8377d8e
--- /dev/null
+++ b/.specify/scripts/bash/check-prerequisites.sh
@@ -0,0 +1,189 @@
+#!/usr/bin/env bash
+
+# Consolidated prerequisite checking script
+#
+# This script provides unified prerequisite checking for Spec-Driven Development workflow.
+# It replaces the functionality previously spread across multiple scripts.
+#
+# Usage: ./check-prerequisites.sh [OPTIONS]
+#
+# OPTIONS:
+#   --json              Output in JSON format
+#   --require-tasks     Require tasks.md to exist (for implementation phase)
+#   --include-tasks     Include tasks.md in AVAILABLE_DOCS list
+#   --paths-only        Only output path variables (no validation)
+#   --help, -h          Show help message
+#
+# OUTPUTS:
+#   JSON mode: {"FEATURE_DIR":"...", "AVAILABLE_DOCS":["..."]}
+#   Text mode: FEATURE_DIR:... \n AVAILABLE_DOCS: \n ✓/✗ file.md
+#   Paths only: REPO_ROOT: ... \n BRANCH: ... \n FEATURE_DIR: ... etc.
+
+set -e
+
+# Parse command line arguments
+JSON_MODE=false
+REQUIRE_TASKS=false
+INCLUDE_TASKS=false
+PATHS_ONLY=false
+
+for arg in "$@"; do
+    case "$arg" in
+        --json)
+            JSON_MODE=true
+            ;;
+        --require-tasks)
+            REQUIRE_TASKS=true
+            ;;
+        --include-tasks)
+            INCLUDE_TASKS=true
+            ;;
+        --paths-only)
+            PATHS_ONLY=true
+            ;;
+        --help|-h)
+            cat << 'EOF'
+Usage: check-prerequisites.sh [OPTIONS]
+
+Consolidated prerequisite checking for Spec-Driven Development workflow.
+
+OPTIONS:
+  --json              Output in JSON format
+  --require-tasks     Require tasks.md to exist (for implementation phase)
+  --include-tasks     Include tasks.md in AVAILABLE_DOCS list
+  --paths-only        Only output path variables (no prerequisite validation)
+  --help, -h          Show this help message
+
+EXAMPLES:
+  # Check task prerequisites (plan.md required)
+  ./check-prerequisites.sh --json
+  
+  # Check implementation prerequisites (plan.md + tasks.md required)
+  ./check-prerequisites.sh --json --require-tasks --include-tasks
+  
+  # Get feature paths only (no validation)
+  ./check-prerequisites.sh --paths-only
+  
+EOF
+            exit 0
+            ;;
+        *)
+            echo "ERROR: Unknown option '$arg'. Use --help for usage information." >&2
+            exit 1
+            ;;
+    esac
+done
+
+# Source common functions
+SCRIPT_DIR="$(CDPATH="" cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+source "$SCRIPT_DIR/common.sh"
+
+# Get feature paths
+_paths_output=$(get_feature_paths) || { echo "ERROR: Failed to resolve feature paths" >&2; exit 1; }
+eval "$_paths_output"
+unset _paths_output
+
+# If paths-only mode, output paths and exit (no validation)
+if $PATHS_ONLY; then
+    if $JSON_MODE; then
+        # Minimal JSON paths payload (no validation performed)
+        if has_jq; then
+            jq -cn \
+                --arg repo_root "$REPO_ROOT" \
+                --arg branch "$CURRENT_BRANCH" \
+                --arg feature_dir "$FEATURE_DIR" \
+                --arg feature_spec "$FEATURE_SPEC" \
+                --arg impl_plan "$IMPL_PLAN" \
+                --arg tasks "$TASKS" \
+                '{REPO_ROOT:$repo_root,BRANCH:$branch,FEATURE_DIR:$feature_dir,FEATURE_SPEC:$feature_spec,IMPL_PLAN:$impl_plan,TASKS:$tasks}'
+        else
+            printf '{"REPO_ROOT":"%s","BRANCH":"%s","FEATURE_DIR":"%s","FEATURE_SPEC":"%s","IMPL_PLAN":"%s","TASKS":"%s"}\n' \
+                "$(json_escape "$REPO_ROOT")" "$(json_escape "$CURRENT_BRANCH")" "$(json_escape "$FEATURE_DIR")" "$(json_escape "$FEATURE_SPEC")" "$(json_escape "$IMPL_PLAN")" "$(json_escape "$TASKS")"
+        fi
+    else
+        echo "REPO_ROOT: $REPO_ROOT"
+        echo "BRANCH: $CURRENT_BRANCH"
+        echo "FEATURE_DIR: $FEATURE_DIR"
+        echo "FEATURE_SPEC: $FEATURE_SPEC"
+        echo "IMPL_PLAN: $IMPL_PLAN"
+        echo "TASKS: $TASKS"
+    fi
+    exit 0
+fi
+
+# Validate required directories and files
+if [[ ! -d "$FEATURE_DIR" ]]; then
+    echo "ERROR: Feature directory not found: $FEATURE_DIR" >&2
+    echo "Run /speckit-specify first to create the feature structure." >&2
+    exit 1
+fi
+
+if [[ ! -f "$IMPL_PLAN" ]]; then
+    echo "ERROR: plan.md not found in $FEATURE_DIR" >&2
+    echo "Run /speckit-plan first to create the implementation plan." >&2
+    exit 1
+fi
+
+# Check for tasks.md if required
+if $REQUIRE_TASKS && [[ ! -f "$TASKS" ]]; then
+    echo "ERROR: tasks.md not found in $FEATURE_DIR" >&2
+    echo "Run /speckit-tasks first to create the task list." >&2
+    exit 1
+fi
+
+# Build list of available documents
+docs=()
+
+# Always check these optional docs
+[[ -f "$RESEARCH" ]] && docs+=("research.md")
+[[ -f "$DATA_MODEL" ]] && docs+=("data-model.md")
+
+# Check contracts directory (only if it exists and has files)
+if [[ -d "$CONTRACTS_DIR" ]] && [[ -n "$(ls -A "$CONTRACTS_DIR" 2>/dev/null)" ]]; then
+    docs+=("contracts/")
+fi
+
+[[ -f "$QUICKSTART" ]] && docs+=("quickstart.md")
+
+# Include tasks.md if requested and it exists
+if $INCLUDE_TASKS && [[ -f "$TASKS" ]]; then
+    docs+=("tasks.md")
+fi
+
+# Output results
+if $JSON_MODE; then
+    # Build JSON array of documents
+    if has_jq; then
+        if [[ ${#docs[@]} -eq 0 ]]; then
+            json_docs="[]"
+        else
+            json_docs=$(printf '%s\n' "${docs[@]}" | jq -R . | jq -s .)
+        fi
+        jq -cn \
+            --arg feature_dir "$FEATURE_DIR" \
+            --argjson docs "$json_docs" \
+            '{FEATURE_DIR:$feature_dir,AVAILABLE_DOCS:$docs}'
+    else
+        if [[ ${#docs[@]} -eq 0 ]]; then
+            json_docs="[]"
+        else
+            json_docs=$(for d in "${docs[@]}"; do printf '"%s",' "$(json_escape "$d")"; done)
+            json_docs="[${json_docs%,}]"
+        fi
+        printf '{"FEATURE_DIR":"%s","AVAILABLE_DOCS":%s}\n' "$(json_escape "$FEATURE_DIR")" "$json_docs"
+    fi
+else
+    # Text output
+    echo "FEATURE_DIR:$FEATURE_DIR"
+    echo "AVAILABLE_DOCS:"
+    
+    # Show status of each potential document
+    check_file "$RESEARCH" "research.md"
+    check_file "$DATA_MODEL" "data-model.md"
+    check_dir "$CONTRACTS_DIR" "contracts/"
+    check_file "$QUICKSTART" "quickstart.md"
+    
+    if $INCLUDE_TASKS; then
+        check_file "$TASKS" "tasks.md"
+    fi
+fi
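`check-prerequisites.sh` loads its paths with `eval "$_paths_output"`. That is safe only because `get_feature_paths` in `common.sh` emits every assignment through `printf '%q'`. Below is a minimal standalone demonstration of that round trip. `emit_paths` is a hypothetical stand-in for `get_feature_paths`, and `load_paths` mirrors the consumer side. `%q` output is bash-specific and is meant to be read back by bash, not by a POSIX `sh`.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for get_feature_paths: emit shell assignments whose
# values are quoted with printf %q, so eval treats them as data, not code.
emit_paths() {
    local feature_dir="$1"
    printf 'FEATURE_DIR=%q\n' "$feature_dir"
    printf 'IMPL_PLAN=%q\n' "$feature_dir/plan.md"
}

# Consumer side, following check-prerequisites.sh: capture the output,
# fail loudly if the producer failed, then eval it into global variables.
load_paths() {
    local _paths_output
    _paths_output=$(emit_paths "$1") || { echo "ERROR: Failed to resolve feature paths" >&2; return 1; }
    eval "$_paths_output"
}
```

Swapping `%q` for `%s` in `emit_paths` would let `eval` execute a directory name such as `specs/001 $(touch pwned)`. That is the injection the comment in `get_feature_paths` guards against.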
diff --git a/.specify/scripts/bash/common.sh b/.specify/scripts/bash/common.sh
new file mode 100755
index 0000000..70ab89b
--- /dev/null
+++ b/.specify/scripts/bash/common.sh
@@ -0,0 +1,619 @@
+#!/usr/bin/env bash
+# Common functions and variables for all scripts
+
+# Find repository root by searching upward for .specify directory
+# This is the primary marker for spec-kit projects
+find_specify_root() {
+    local dir="${1:-$(pwd)}"
+    # Normalize to absolute path to prevent infinite loop with relative paths
+    # Use -- to handle paths starting with - (e.g., -P, -L)
+    dir="$(cd -- "$dir" 2>/dev/null && pwd)" || return 1
+    local prev_dir=""
+    while true; do
+        if [ -d "$dir/.specify" ]; then
+            echo "$dir"
+            return 0
+        fi
+        # Stop if we've reached filesystem root or dirname stops changing
+        if [ "$dir" = "/" ] || [ "$dir" = "$prev_dir" ]; then
+            break
+        fi
+        prev_dir="$dir"
+        dir="$(dirname "$dir")"
+    done
+    return 1
+}
+
+# Resolve an explicit SPECIFY_INIT_DIR project override (the directory that
+# *contains* .specify/), for non-interactive / CI use — e.g. running a Spec Kit
+# command against a member project from a monorepo root without cd.
+#
+# Precondition: SPECIFY_INIT_DIR is non-empty. Echoes the validated absolute
+# project root, or prints an error and returns 1. Strict by design: the path
+# must exist and contain .specify/, with no silent fallback to cwd or the
+# script-location default (which would silently write to the wrong project).
+#
+# This is the single resolver: bundled extensions inherit it by sourcing core
+# (e.g. the git extension's create-new-feature-branch) rather than duplicating it.
+resolve_specify_init_dir() {
+    local init_root
+    # Normalize: relative paths resolve against $(pwd); a trailing slash collapses.
+    # CDPATH="" so a relative value cannot be resolved against the caller's CDPATH
+    # (which would also echo to stdout and corrupt the captured path).
+    if ! init_root="$(CDPATH="" cd -- "$SPECIFY_INIT_DIR" 2>/dev/null && pwd)"; then
+        echo "ERROR: SPECIFY_INIT_DIR does not point to an existing directory: $SPECIFY_INIT_DIR" >&2
+        return 1
+    fi
+    if [[ ! -d "$init_root/.specify" ]]; then
+        echo "ERROR: SPECIFY_INIT_DIR is not a Spec Kit project (no .specify/ directory): $init_root" >&2
+        return 1
+    fi
+    printf '%s\n' "$init_root"
+}
+
+# Get repository root, prioritizing .specify directory
+# This prevents using a parent repository when spec-kit is initialized in a subdirectory
+get_repo_root() {
+    # Explicit project override wins (see resolve_specify_init_dir).
+    if [[ -n "${SPECIFY_INIT_DIR:-}" ]]; then
+        resolve_specify_init_dir
+        return
+    fi
+
+    # First, look for .specify directory (spec-kit's own marker)
+    local specify_root
+    if specify_root=$(find_specify_root); then
+        echo "$specify_root"
+        return
+    fi
+
+    # Final fallback to script location
+    local script_dir="$(CDPATH="" cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+    (cd "$script_dir/../../.." && pwd)
+}
+
+# Get current feature name from explicit state only.
+# Returns the feature identifier or empty string if none is set.
+# Feature state is set by SPECIFY_FEATURE (from create-new-feature or
+# the git extension) or implicitly via .specify/feature.json.
+get_current_branch() {
+    if [[ -n "${SPECIFY_FEATURE:-}" ]]; then
+        echo "$SPECIFY_FEATURE"
+        return
+    fi
+
+    # No explicit feature set — caller must handle this via feature.json
+    # in get_feature_paths(). Return empty to signal "unknown".
+    echo ""
+}
+
+# Safely read .specify/feature.json's "feature_directory" value.
+# Prints the raw value (possibly relative) to stdout, or empty string if the file
+# is missing, unparseable, or does not contain the key. Always returns 0 so callers
+# under `set -e` cannot be aborted by parser failure.
+# Parser order mirrors the historical get_feature_paths behavior: jq -> python3 -> grep/sed.
+read_feature_json_feature_directory() {
+    local repo_root="$1"
+    local fj="$repo_root/.specify/feature.json"
+    [[ -f "$fj" ]] || { printf '%s' ''; return 0; }
+
+    local _fd=''
+    if command -v jq >/dev/null 2>&1; then
+        if ! _fd=$(jq -r '.feature_directory // empty' "$fj" 2>/dev/null); then
+            _fd=''
+        fi
+    elif command -v python3 >/dev/null 2>&1; then
+        # Use Python so pretty-printed/multi-line JSON still parses correctly.
+        if ! _fd=$(python3 -c "import json,sys; d=json.load(open(sys.argv[1])); v=d.get('feature_directory'); print(v if v else '')" "$fj" 2>/dev/null); then
+            _fd=''
+        fi
+    else
+        # Last-resort single-line grep/sed fallback. The `|| true` guards against
+        # grep returning 1 (no match) aborting under `set -e` / `pipefail`.
+        _fd=$( { grep -E '"feature_directory"[[:space:]]*:' "$fj" 2>/dev/null || true; } \
+            | head -n 1 \
+            | sed -E 's/^[^:]*:[[:space:]]*"([^"]*)".*$/\1/' )
+    fi
+
+    printf '%s' "$_fd"
+    return 0
+}
+
+# Persist a feature_directory value to .specify/feature.json.
+# Writes only when the file is missing or the value differs from what's stored.
+# Accepts the raw (possibly relative) path — callers should pass the original
+# user-supplied value, not the normalized absolute path.
+_persist_feature_json() {
+    local repo_root="$1"
+    local feature_dir_value="$2"
+    local fj="$repo_root/.specify/feature.json"
+
+    # Strip repo_root prefix if the value is absolute and under repo_root
+    if [[ "$feature_dir_value" == "$repo_root/"* ]]; then
+        feature_dir_value="${feature_dir_value#"$repo_root/"}"
+    fi
+
+    # Read current value (if any) and skip write when unchanged
+    local current_val
+    current_val=$(read_feature_json_feature_directory "$repo_root")
+    if [[ "$current_val" == "$feature_dir_value" ]]; then
+        return 0
+    fi
+
+    # Ensure .specify/ directory exists
+    mkdir -p "$repo_root/.specify"
+
+    # Write feature.json — prefer jq for safe JSON, fall back to printf
+    if command -v jq >/dev/null 2>&1; then
+        jq -cn --arg fd "$feature_dir_value" '{feature_directory:$fd}' > "$fj"
+    else
+        printf '{"feature_directory":"%s"}\n' "$(json_escape "$feature_dir_value")" > "$fj"
+    fi
+}
+
+get_feature_paths() {
+    # Split decl/assignment so a SPECIFY_INIT_DIR validation failure in
+    # get_repo_root propagates as a hard error instead of being masked by `local`.
+    local repo_root
+    repo_root=$(get_repo_root) || return 1
+    local current_branch
+    current_branch=$(get_current_branch)
+
+    # Resolve feature directory.  Priority:
+    #   1. SPECIFY_FEATURE_DIRECTORY env var (explicit override)
+    #   2. .specify/feature.json "feature_directory" key (persisted by specify command)
+    #   3. Error — no feature context available
+    local feature_dir
+    if [[ -n "${SPECIFY_FEATURE_DIRECTORY:-}" ]]; then
+        feature_dir="$SPECIFY_FEATURE_DIRECTORY"
+        # Normalize relative paths to absolute under repo root
+        [[ "$feature_dir" != /* ]] && feature_dir="$repo_root/$feature_dir"
+        # Persist to feature.json so future sessions without the env var still work
+        _persist_feature_json "$repo_root" "$SPECIFY_FEATURE_DIRECTORY"
+    elif [[ -f "$repo_root/.specify/feature.json" ]]; then
+        local _fd
+        _fd=$(read_feature_json_feature_directory "$repo_root")
+        if [[ -n "$_fd" ]]; then
+            feature_dir="$_fd"
+            # Normalize relative paths to absolute under repo root
+            [[ "$feature_dir" != /* ]] && feature_dir="$repo_root/$feature_dir"
+        else
+            echo "ERROR: Feature directory not found. Set SPECIFY_FEATURE_DIRECTORY or ensure .specify/feature.json contains feature_directory." >&2
+            return 1
+        fi
+    else
+        echo "ERROR: Feature directory not found. Set SPECIFY_FEATURE_DIRECTORY or run the specify command to create .specify/feature.json." >&2
+        return 1
+    fi
+
+    # Use printf '%q' to safely quote values, preventing shell injection
+    # via crafted branch names or paths containing special characters
+    printf 'REPO_ROOT=%q\n' "$repo_root"
+    printf 'CURRENT_BRANCH=%q\n' "$current_branch"
+    printf 'FEATURE_DIR=%q\n' "$feature_dir"
+    printf 'FEATURE_SPEC=%q\n' "$feature_dir/spec.md"
+    printf 'IMPL_PLAN=%q\n' "$feature_dir/plan.md"
+    printf 'TASKS=%q\n' "$feature_dir/tasks.md"
+    printf 'RESEARCH=%q\n' "$feature_dir/research.md"
+    printf 'DATA_MODEL=%q\n' "$feature_dir/data-model.md"
+    printf 'QUICKSTART=%q\n' "$feature_dir/quickstart.md"
+    printf 'CONTRACTS_DIR=%q\n' "$feature_dir/contracts"
+}
+
+# Check if jq is available for safe JSON construction
+has_jq() {
+    command -v jq >/dev/null 2>&1
+}
+
+get_invoke_separator() {
+    local repo_root="${1:-$(get_repo_root)}"
+    if [[ "${_SPECIFY_INVOKE_SEPARATOR_CACHE_REPO_ROOT:-}" == "$repo_root" && -n "${_SPECIFY_INVOKE_SEPARATOR_CACHE_VALUE:-}" ]]; then
+        printf '%s\n' "$_SPECIFY_INVOKE_SEPARATOR_CACHE_VALUE"
+        return 0
+    fi
+
+    local integration_json="$repo_root/.specify/integration.json"
+    local separator="."
+    local parsed_with_jq=0
+
+    if [[ -f "$integration_json" ]]; then
+        if command -v jq >/dev/null 2>&1; then
+            local jq_separator
+            if jq_separator=$(jq -r '(.default_integration // .integration // "") as $k | if $k == "" then "." else (.integration_settings[$k].invoke_separator // ".") end' "$integration_json" 2>/dev/null); then
+                parsed_with_jq=1
+                case "$jq_separator" in
+                    "."|"-") separator="$jq_separator" ;;
+                esac
+            fi
+        fi
+
+        if [[ "$parsed_with_jq" -eq 0 ]] && command -v python3 >/dev/null 2>&1; then
+            if separator=$(python3 - "$integration_json" <<'PY' 2>/dev/null
+import json
+import sys
+
+try:
+    with open(sys.argv[1], encoding="utf-8") as fh:
+        state = json.load(fh)
+    key = state.get("default_integration") or state.get("integration") or ""
+    settings = state.get("integration_settings")
+    separator = "."
+    if isinstance(key, str) and isinstance(settings, dict):
+        entry = settings.get(key)
+        if isinstance(entry, dict) and entry.get("invoke_separator") in {".", "-"}:
+            separator = entry["invoke_separator"]
+    print(separator)
+except Exception:
+    print(".")
+PY
+); then
+                case "$separator" in
+                    "."|"-") ;;
+                    *) separator="." ;;
+                esac
+            else
+                separator="."
+            fi
+        fi
+    fi
+
+    _SPECIFY_INVOKE_SEPARATOR_CACHE_REPO_ROOT="$repo_root"
+    _SPECIFY_INVOKE_SEPARATOR_CACHE_VALUE="$separator"
+    printf '%s\n' "$separator"
+}
+
+format_speckit_command() {
+    local command_name="$1"
+    local repo_root="${2:-$(get_repo_root)}"
+    local separator
+    if [[ "${_SPECIFY_INVOKE_SEPARATOR_CACHE_REPO_ROOT:-}" == "$repo_root" && -n "${_SPECIFY_INVOKE_SEPARATOR_CACHE_VALUE:-}" ]]; then
+        separator="$_SPECIFY_INVOKE_SEPARATOR_CACHE_VALUE"
+    else
+        separator=$(get_invoke_separator "$repo_root")
+        _SPECIFY_INVOKE_SEPARATOR_CACHE_REPO_ROOT="$repo_root"
+        _SPECIFY_INVOKE_SEPARATOR_CACHE_VALUE="$separator"
+    fi
+
+    command_name="${command_name#/}"
+    command_name="${command_name#speckit.}"
+    command_name="${command_name#speckit-}"
+    command_name="${command_name//./$separator}"
+
+    printf '/speckit%s%s\n' "$separator" "$command_name"
+}
+
+# Escape a string for safe embedding in a JSON value (fallback when jq is unavailable).
+# Handles backslash, double-quote, and JSON-required control character escapes (RFC 8259).
+json_escape() {
+    local s="$1"
+    s="${s//\\/\\\\}"
+    s="${s//\"/\\\"}"
+    s="${s//$'\n'/\\n}"
+    s="${s//$'\t'/\\t}"
+    s="${s//$'\r'/\\r}"
+    s="${s//$'\b'/\\b}"
+    s="${s//$'\f'/\\f}"
+    # Escape any remaining U+0001-U+001F control characters as \uXXXX.
+    # (U+0000/NUL cannot appear in bash strings and is excluded.)
+    # LC_ALL=C ensures ${#s} counts bytes and ${s:$i:1} yields single bytes,
+    # so multi-byte UTF-8 sequences (first byte >= 0xC0) pass through intact.
+    local LC_ALL=C
+    local i char code
+    for (( i=0; i<${#s}; i++ )); do
+        char="${s:$i:1}"
+        printf -v code '%d' "'$char" 2>/dev/null || code=256
+        if (( code >= 1 && code <= 31 )); then
+            printf '\\u%04x' "$code"
+        else
+            printf '%s' "$char"
+        fi
+    done
+}
+
+check_file() { [[ -f "$1" ]] && echo "  ✓ $2" || echo "  ✗ $2"; }
+check_dir() { [[ -d "$1" && -n $(ls -A "$1" 2>/dev/null) ]] && echo "  ✓ $2" || echo "  ✗ $2"; }
+
+# Resolve a template name to a file path using the priority stack:
+#   1. .specify/templates/overrides/
+#   2. .specify/presets/<preset-id>/templates/ (sorted by priority from .registry)
+#   3. .specify/extensions/<ext-id>/templates/
+#   4. .specify/templates/ (core)
+resolve_template() {
+    local template_name="$1"
+    local repo_root="$2"
+    local base="$repo_root/.specify/templates"
+
+    # Priority 1: Project overrides
+    local override="$base/overrides/${template_name}.md"
+    [ -f "$override" ] && echo "$override" && return 0
+
+    # Priority 2: Installed presets (sorted by priority from .registry)
+    local presets_dir="$repo_root/.specify/presets"
+    if [ -d "$presets_dir" ]; then
+        local registry_file="$presets_dir/.registry"
+        if [ -f "$registry_file" ] && command -v python3 >/dev/null 2>&1; then
+            # Read preset IDs sorted by priority (lower number = higher precedence).
+            # The python3 call is wrapped in an if-condition so that set -e does not
+            # abort the function when python3 exits non-zero (e.g. invalid JSON).
+            local sorted_presets=""
+            if sorted_presets=$(SPECKIT_REGISTRY="$registry_file" python3 -c "
+import json, sys, os
+try:
+    with open(os.environ['SPECKIT_REGISTRY']) as f:
+        data = json.load(f)
+    presets = data.get('presets', {})
+    for pid, meta in sorted(presets.items(), key=lambda x: x[1].get('priority', 10) if isinstance(x[1], dict) else 10):
+        if isinstance(meta, dict) and meta.get('enabled', True) is not False:
+            print(pid)
+except Exception:
+    sys.exit(1)
+" 2>/dev/null); then
+                if [ -n "$sorted_presets" ]; then
+                    # python3 succeeded and returned preset IDs — search in priority order
+                    while IFS= read -r preset_id; do
+                        local candidate="$presets_dir/$preset_id/templates/${template_name}.md"
+                        [ -f "$candidate" ] && echo "$candidate" && return 0
+                    done <<< "$sorted_presets"
+                fi
+                # python3 succeeded but registry has no presets — nothing to search
+            else
+                # python3 failed (missing, or registry parse error) — fall back to unordered directory scan
+                for preset in "$presets_dir"/*/; do
+                    [ -d "$preset" ] || continue
+                    local candidate="$preset/templates/${template_name}.md"
+                    [ -f "$candidate" ] && echo "$candidate" && return 0
+                done
+            fi
+        else
+            # Fallback: alphabetical directory order (no python3 available)
+            for preset in "$presets_dir"/*/; do
+                [ -d "$preset" ] || continue
+                local candidate="$preset/templates/${template_name}.md"
+                [ -f "$candidate" ] && echo "$candidate" && return 0
+            done
+        fi
+    fi
+
+    # Priority 3: Extension-provided templates
+    local ext_dir="$repo_root/.specify/extensions"
+    if [ -d "$ext_dir" ]; then
+        for ext in "$ext_dir"/*/; do
+            [ -d "$ext" ] || continue
+            # Skip hidden directories (e.g. .backup, .cache)
+            case "$(basename "$ext")" in .*) continue;; esac
+            local candidate="$ext/templates/${template_name}.md"
+            [ -f "$candidate" ] && echo "$candidate" && return 0
+        done
+    fi
+
+    # Priority 4: Core templates
+    local core="$base/${template_name}.md"
+    [ -f "$core" ] && echo "$core" && return 0
+
+    # Template not found in any location.
+    # Return 1 so callers can distinguish "not found" from "found".
+    # Callers running under set -e should use: TEMPLATE=$(resolve_template ...) || true
+    return 1
+}
+
+# Resolve a template name to composed content using composition strategies.
+# Reads strategy metadata from preset manifests and composes content
+# from multiple layers using prepend, append, or wrap strategies.
+#
+# Usage: CONTENT=$(resolve_template_content "template-name" "$REPO_ROOT")
+# Returns composed content string on stdout; exit code 1 if not found.
+resolve_template_content() {
+    local template_name="$1"
+    local repo_root="$2"
+    local base="$repo_root/.specify/templates"
+
+    # Collect all layers (highest priority first)
+    local -a layer_paths=()
+    local -a layer_strategies=()
+
+    # Priority 1: Project overrides (always "replace")
+    local override="$base/overrides/${template_name}.md"
+    if [ -f "$override" ]; then
+        layer_paths+=("$override")
+        layer_strategies+=("replace")
+    fi
+
+    # Priority 2: Installed presets (sorted by priority from .registry)
+    local presets_dir="$repo_root/.specify/presets"
+    if [ -d "$presets_dir" ]; then
+        local registry_file="$presets_dir/.registry"
+        local sorted_presets=""
+        if [ -f "$registry_file" ] && command -v python3 >/dev/null 2>&1; then
+            if sorted_presets=$(SPECKIT_REGISTRY="$registry_file" python3 -c "
+import json, sys, os
+try:
+    with open(os.environ['SPECKIT_REGISTRY']) as f:
+        data = json.load(f)
+    presets = data.get('presets', {})
+    for pid, meta in sorted(presets.items(), key=lambda x: x[1].get('priority', 10) if isinstance(x[1], dict) else 10):
+        if isinstance(meta, dict) and meta.get('enabled', True) is not False:
+            print(pid)
+except Exception:
+    sys.exit(1)
+" 2>/dev/null); then
+                if [ -n "$sorted_presets" ]; then
+                    local yaml_warned=false
+                    while IFS= read -r preset_id; do
+                        # Read strategy and file path from preset manifest
+                        local strategy="replace"
+                        local manifest_file=""
+                        local manifest="$presets_dir/$preset_id/preset.yml"
+                        if [ -f "$manifest" ] && command -v python3 >/dev/null 2>&1; then
+                            # Requires PyYAML; falls back to replace/convention if unavailable
+                            local result
+                            local py_stderr
+                            py_stderr=$(mktemp)
+                            result=$(SPECKIT_MANIFEST="$manifest" SPECKIT_TMPL="$template_name" python3 -c "
+import sys, os
+try:
+    import yaml
+except ImportError:
+    print('yaml_missing', file=sys.stderr)
+    print('replace\t')
+    sys.exit(0)
+try:
+    with open(os.environ['SPECKIT_MANIFEST']) as f:
+        data = yaml.safe_load(f)
+    for t in data.get('provides', {}).get('templates', []):
+        if t.get('name') == os.environ['SPECKIT_TMPL'] and t.get('type', 'template') == 'template':
+            print(t.get('strategy', 'replace') + '\t' + t.get('file', ''))
+            sys.exit(0)
+    print('replace\t')
+except Exception:
+    print('replace\t')
+" 2>"$py_stderr")
+                            local parse_status=$?
+                            if [ $parse_status -eq 0 ] && [ -n "$result" ]; then
+                                IFS=$'\t' read -r strategy manifest_file <<< "$result"
+                                strategy=$(printf '%s' "$strategy" | tr '[:upper:]' '[:lower:]')
+                            fi
+                            if [ "$yaml_warned" = false ] && grep -q 'yaml_missing' "$py_stderr" 2>/dev/null; then
+                                echo "Warning: PyYAML not available; composition strategies may be ignored" >&2
+                                yaml_warned=true
+                            fi
+                            rm -f "$py_stderr"
+                        fi
+                        # Try manifest file path first, then convention path
+                        local candidate=""
+                        if [ -n "$manifest_file" ]; then
+                            # Reject absolute paths and parent traversal
+                            case "$manifest_file" in
+                                /*|*../*|../*) manifest_file="" ;;
+                            esac
+                        fi
+                        if [ -n "$manifest_file" ]; then
+                            local mf="$presets_dir/$preset_id/$manifest_file"
+                            [ -f "$mf" ] && candidate="$mf"
+                        fi
+                        if [ -z "$candidate" ]; then
+                            local cf="$presets_dir/$preset_id/templates/${template_name}.md"
+                            [ -f "$cf" ] && candidate="$cf"
+                        fi
+                        if [ -n "$candidate" ]; then
+                            layer_paths+=("$candidate")
+                            layer_strategies+=("$strategy")
+                        fi
+                    done <<< "$sorted_presets"
+                fi
+            else
+                # python3 failed — fall back to unordered directory scan (replace only)
+                for preset in "$presets_dir"/*/; do
+                    [ -d "$preset" ] || continue
+                    local candidate="$preset/templates/${template_name}.md"
+                    if [ -f "$candidate" ]; then
+                        layer_paths+=("$candidate")
+                        layer_strategies+=("replace")
+                    fi
+                done
+            fi
+        else
+            # No python3 or registry — fall back to unordered directory scan (replace only)
+            for preset in "$presets_dir"/*/; do
+                [ -d "$preset" ] || continue
+                local candidate="$preset/templates/${template_name}.md"
+                if [ -f "$candidate" ]; then
+                    layer_paths+=("$candidate")
+                    layer_strategies+=("replace")
+                fi
+            done
+        fi
+    fi
+
+    # Priority 3: Extension-provided templates (always "replace")
+    local ext_dir="$repo_root/.specify/extensions"
+    if [ -d "$ext_dir" ]; then
+        for ext in "$ext_dir"/*/; do
+            [ -d "$ext" ] || continue
+            case "$(basename "$ext")" in .*) continue;; esac
+            local candidate="$ext/templates/${template_name}.md"
+            if [ -f "$candidate" ]; then
+                layer_paths+=("$candidate")
+                layer_strategies+=("replace")
+            fi
+        done
+    fi
+
+    # Priority 4: Core templates (always "replace")
+    local core="$base/${template_name}.md"
+    if [ -f "$core" ]; then
+        layer_paths+=("$core")
+        layer_strategies+=("replace")
+    fi
+
+    local count=${#layer_paths[@]}
+    [ "$count" -eq 0 ] && return 1
+
+    # Check if any layer uses a non-replace strategy
+    local has_composition=false
+    for s in "${layer_strategies[@]}"; do
+        [ "$s" != "replace" ] && has_composition=true && break
+    done
+
+    # If the top (highest-priority) layer is replace, it wins entirely —
+    # lower layers are irrelevant regardless of their strategies.
+    if [ "${layer_strategies[0]}" = "replace" ]; then
+        cat "${layer_paths[0]}"
+        return 0
+    fi
+
+    if [ "$has_composition" = false ]; then
+        cat "${layer_paths[0]}"
+        return 0
+    fi
+
+    # Find the effective base: scan from highest priority (index 0) downward
+    # to find the nearest replace layer. Only compose layers above that base.
+    local base_idx=-1
+    local i
+    for (( i=0; i<count; i++ )); do
+        if [ "${layer_strategies[$i]}" = "replace" ]; then
+            base_idx=$i
+            break
+        fi
+    done
+
+    if [ $base_idx -lt 0 ]; then
+        return 1  # no base layer found
+    fi
+
+    # Read the base content; compose layers above the base (higher priority)
+    local content
+    content=$(cat "${layer_paths[$base_idx]}"; printf x)
+    content="${content%x}"
+
+    for (( i=base_idx-1; i>=0; i-- )); do
+        local path="${layer_paths[$i]}"
+        local strat="${layer_strategies[$i]}"
+        local layer_content
+        # Preserve trailing newlines
+        layer_content=$(cat "$path"; printf x)
+        layer_content="${layer_content%x}"
+
+        case "$strat" in
+            replace) content="$layer_content" ;;
+            prepend) printf -v content '%s\n\n%s' "$layer_content" "$content" ;;
+            append)  printf -v content '%s\n\n%s' "$content" "$layer_content" ;;
+            wrap)
+                case "$layer_content" in
+                    *'{CORE_TEMPLATE}'*) ;;
+                    *) echo "Error: wrap strategy missing {CORE_TEMPLATE} placeholder" >&2; return 1 ;;
+                esac
+                local wrapped="" rest="$layer_content"
+                while [[ "$rest" == *'{CORE_TEMPLATE}'* ]]; do
+                    wrapped+="${rest%%\{CORE_TEMPLATE\}*}${content}"
+                    rest="${rest#*\{CORE_TEMPLATE\}}"
+                done
+                content="${wrapped}${rest}"
+                ;;
+            *) echo "Error: unknown strategy '$strat'" >&2; return 1 ;;
+        esac
+    done
+
+    printf '%s' "$content"
+    return 0
+}
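The composition loop above can be sketched as a standalone helper that applies one overlay layer to the content composed so far. This is a minimal sketch, not the shipped function: the name `compose_layer` and its argument order are hypothetical. It uses `printf -v` so that trailing newlines preserved by the `printf x` trick are not stripped again by command substitution. It also substitutes `{CORE_TEMPLATE}` left to right without rescanning inserted text, so a base template that itself mentions the placeholder cannot cause an endless loop.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compose_layer STRATEGY LAYER_TEXT CURRENT_CONTENT
# Prints CURRENT_CONTENT with one higher-priority layer applied to it.
compose_layer() {
    local strat="$1" layer="$2" content="$3" result rest
    case "$strat" in
        replace) result="$layer" ;;
        # printf -v assigns directly, so trailing newlines survive.
        prepend) printf -v result '%s\n\n%s' "$layer" "$content" ;;
        append)  printf -v result '%s\n\n%s' "$content" "$layer" ;;
        wrap)
            if [[ "$layer" != *'{CORE_TEMPLATE}'* ]]; then
                echo "Error: wrap strategy missing {CORE_TEMPLATE} placeholder" >&2
                return 1
            fi
            # Consume the layer left to right; inserted content is never
            # rescanned, so a placeholder inside $content is left alone.
            result="" rest="$layer"
            while [[ "$rest" == *'{CORE_TEMPLATE}'* ]]; do
                result+="${rest%%\{CORE_TEMPLATE\}*}${content}"
                rest="${rest#*\{CORE_TEMPLATE\}}"
            done
            result+="$rest"
            ;;
        *) echo "Error: unknown strategy '$strat'" >&2; return 1 ;;
    esac
    printf '%s' "$result"
}

# Example: wrap a core template with an organization banner.
# Prints two lines: the banner, then "# Core".
compose_layer wrap $'<!-- org banner -->\n{CORE_TEMPLATE}' '# Core'
echo
```

Applying layers one at a time from the base upward keeps each strategy independent of the others, which mirrors the `for (( i=base_idx-1; i>=0; i-- ))` loop in the script.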
diff --git a/.specify/scripts/bash/create-new-feature.sh b/.specify/scripts/bash/create-new-feature.sh
new file mode 100755
index 0000000..c960976
--- /dev/null
+++ b/.specify/scripts/bash/create-new-feature.sh
@@ -0,0 +1,299 @@
+#!/usr/bin/env bash
+
+set -e
+
+JSON_MODE=false
+DRY_RUN=false
+ALLOW_EXISTING=false
+SHORT_NAME=""
+BRANCH_NUMBER=""
+USE_TIMESTAMP=false
+ARGS=()
+i=1
+while [ $i -le $# ]; do
+    arg="${!i}"
+    case "$arg" in
+        --json)
+            JSON_MODE=true
+            ;;
+        --dry-run)
+            DRY_RUN=true
+            ;;
+        --allow-existing-branch)
+            ALLOW_EXISTING=true
+            ;;
+        --short-name)
+            if [ $((i + 1)) -gt $# ]; then
+                echo 'Error: --short-name requires a value' >&2
+                exit 1
+            fi
+            i=$((i + 1))
+            next_arg="${!i}"
+            # Check if the next argument is another option (starts with --)
+            if [[ "$next_arg" == --* ]]; then
+                echo 'Error: --short-name requires a value' >&2
+                exit 1
+            fi
+            SHORT_NAME="$next_arg"
+            ;;
+        --number)
+            if [ $((i + 1)) -gt $# ]; then
+                echo 'Error: --number requires a value' >&2
+                exit 1
+            fi
+            i=$((i + 1))
+            next_arg="${!i}"
+            if [[ "$next_arg" == --* ]]; then
+                echo 'Error: --number requires a value' >&2
+                exit 1
+            fi
+            BRANCH_NUMBER="$next_arg"
+            ;;
+        --timestamp)
+            USE_TIMESTAMP=true
+            ;;
+        --help|-h)
+            echo "Usage: $0 [--json] [--dry-run] [--allow-existing-branch] [--short-name <name>] [--number N] [--timestamp] <feature_description>"
+            echo ""
+            echo "Options:"
+            echo "  --json              Output in JSON format"
+            echo "  --dry-run           Compute feature name and paths without creating directories or files"
+            echo "  --allow-existing-branch  Reuse an existing feature directory if it already exists"
+            echo "  --short-name <name> Provide a custom short name (2-4 words) for the feature"
+            echo "  --number N          Specify branch number manually (overrides auto-detection)"
+            echo "  --timestamp         Use timestamp prefix (YYYYMMDD-HHMMSS) instead of sequential numbering"
+            echo "  --help, -h          Show this help message"
+            echo ""
+            echo "Examples:"
+            echo "  $0 'Add user authentication system' --short-name 'user-auth'"
+            echo "  $0 'Implement OAuth2 integration for API' --number 5"
+            echo "  $0 --timestamp --short-name 'user-auth' 'Add user authentication'"
+            exit 0
+            ;;
+        *)
+            ARGS+=("$arg")
+            ;;
+    esac
+    i=$((i + 1))
+done
+
+FEATURE_DESCRIPTION="${ARGS[*]}"
+if [ -z "$FEATURE_DESCRIPTION" ]; then
+    echo "Usage: $0 [--json] [--dry-run] [--allow-existing-branch] [--short-name <name>] [--number N] [--timestamp] <feature_description>" >&2
+    exit 1
+fi
+
+# Trim whitespace and validate description is not empty (e.g., user passed only whitespace)
+FEATURE_DESCRIPTION=$(echo "$FEATURE_DESCRIPTION" | sed -E 's/^[[:space:]]+|[[:space:]]+$//g')
+if [ -z "$FEATURE_DESCRIPTION" ]; then
+    echo "Error: Feature description cannot be empty or contain only whitespace" >&2
+    exit 1
+fi
+
+# Function to get highest number from specs directory
+get_highest_from_specs() {
+    local specs_dir="$1"
+    local highest=0
+    
+    if [ -d "$specs_dir" ]; then
+        for dir in "$specs_dir"/*; do
+            [ -d "$dir" ] || continue
+            dirname=$(basename "$dir")
+            # Match sequential prefixes (>=3 digits), but skip timestamp dirs.
+            if echo "$dirname" | grep -Eq '^[0-9]{3,}-' && ! echo "$dirname" | grep -Eq '^[0-9]{8}-[0-9]{6}-'; then
+                number=$(echo "$dirname" | grep -Eo '^[0-9]+')
+                number=$((10#$number))
+                if [ "$number" -gt "$highest" ]; then
+                    highest=$number
+                fi
+            fi
+        done
+    fi
+    
+    echo "$highest"
+}
+
+# Function to clean and format a branch name
+clean_branch_name() {
+    local name="$1"
+    echo "$name" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g' | sed 's/--*/-/g' | sed 's/^-//' | sed 's/-$//'
+}
+
+# Resolve repository root using common.sh functions which prioritize .specify
+SCRIPT_DIR="$(CDPATH="" cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+source "$SCRIPT_DIR/common.sh"
+
+REPO_ROOT=$(get_repo_root) || exit 1
+
+cd "$REPO_ROOT"
+
+SPECS_DIR="$REPO_ROOT/specs"
+if [ "$DRY_RUN" != true ]; then
+    mkdir -p "$SPECS_DIR"
+fi
+
+# Function to generate branch name with stop word filtering and length filtering
+generate_branch_name() {
+    local description="$1"
+    
+    # Common stop words to filter out
+    local stop_words="^(i|a|an|the|to|for|of|in|on|at|by|with|from|is|are|was|were|be|been|being|have|has|had|do|does|did|will|would|should|could|can|may|might|must|shall|this|that|these|those|my|your|our|their|want|need|add|get|set)$"
+    
+    # Convert to lowercase and split into words
+    local clean_name=$(echo "$description" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/ /g')
+    
+    # Filter words: remove stop words and words shorter than 3 chars (unless they're uppercase acronyms in original)
+    local meaningful_words=()
+    for word in $clean_name; do
+        # Skip empty words
+        [ -z "$word" ] && continue
+        
+        # Keep words that are NOT stop words AND (length >= 3 OR are potential acronyms)
+        if ! echo "$word" | grep -qiE "$stop_words"; then
+            if [ ${#word} -ge 3 ]; then
+                meaningful_words+=("$word")
+            elif echo "$description" | grep -q "\b$(echo "$word" | tr '[:lower:]' '[:upper:]')\b"; then
+                # Keep short words if they appear as uppercase in original (likely acronyms)
+                meaningful_words+=("$word")
+            fi
+        fi
+    done
+    
+    # If we have meaningful words, use first 3-4 of them
+    if [ ${#meaningful_words[@]} -gt 0 ]; then
+        local max_words=3
+        if [ ${#meaningful_words[@]} -eq 4 ]; then max_words=4; fi
+        
+        local result=""
+        local count=0
+        for word in "${meaningful_words[@]}"; do
+            if [ $count -ge $max_words ]; then break; fi
+            if [ -n "$result" ]; then result="$result-"; fi
+            result="$result$word"
+            count=$((count + 1))
+        done
+        echo "$result"
+    else
+        # Fallback to original logic if no meaningful words found
+        local cleaned=$(clean_branch_name "$description")
+        echo "$cleaned" | tr '-' '\n' | grep -v '^$' | head -3 | tr '\n' '-' | sed 's/-$//'
+    fi
+}
+
+# Generate branch name
+if [ -n "$SHORT_NAME" ]; then
+    # Use provided short name, just clean it up
+    BRANCH_SUFFIX=$(clean_branch_name "$SHORT_NAME")
+else
+    # Generate from description with smart filtering
+    BRANCH_SUFFIX=$(generate_branch_name "$FEATURE_DESCRIPTION")
+fi
+
+# Warn if --number and --timestamp are both specified
+if [ "$USE_TIMESTAMP" = true ] && [ -n "$BRANCH_NUMBER" ]; then
+    >&2 echo "[specify] Warning: --number is ignored when --timestamp is used"
+    BRANCH_NUMBER=""
+fi
+
+# Determine branch prefix
+if [ "$USE_TIMESTAMP" = true ]; then
+    FEATURE_NUM=$(date +%Y%m%d-%H%M%S)
+    BRANCH_NAME="${FEATURE_NUM}-${BRANCH_SUFFIX}"
+else
+    # Determine branch number from existing feature directories
+    if [ -z "$BRANCH_NUMBER" ]; then
+        HIGHEST=$(get_highest_from_specs "$SPECS_DIR")
+        BRANCH_NUMBER=$((HIGHEST + 1))
+    fi
+
+    # Force base-10 interpretation to prevent octal conversion (e.g., 010 → 8 in octal, but should be 10 in decimal)
+    FEATURE_NUM=$(printf "%03d" "$((10#$BRANCH_NUMBER))")
+    BRANCH_NAME="${FEATURE_NUM}-${BRANCH_SUFFIX}"
+fi
+
+# GitHub enforces a 244-byte limit on branch names
+# Validate and truncate if necessary
+MAX_BRANCH_LENGTH=244
+if [ ${#BRANCH_NAME} -gt $MAX_BRANCH_LENGTH ]; then
+    # Calculate how much we need to trim from suffix
+    # Account for prefix length: timestamp (15) + hyphen (1) = 16, or sequential (3) + hyphen (1) = 4
+    PREFIX_LENGTH=$(( ${#FEATURE_NUM} + 1 ))
+    MAX_SUFFIX_LENGTH=$((MAX_BRANCH_LENGTH - PREFIX_LENGTH))
+    
+    # Cut the suffix to the remaining length (this may split a word)
+    TRUNCATED_SUFFIX=$(echo "$BRANCH_SUFFIX" | cut -c1-$MAX_SUFFIX_LENGTH)
+    # Remove trailing hyphen if truncation created one
+    TRUNCATED_SUFFIX=$(echo "$TRUNCATED_SUFFIX" | sed 's/-$//')
+    
+    ORIGINAL_BRANCH_NAME="$BRANCH_NAME"
+    BRANCH_NAME="${FEATURE_NUM}-${TRUNCATED_SUFFIX}"
+    
+    >&2 echo "[specify] Warning: Branch name exceeded GitHub's 244-byte limit"
+    >&2 echo "[specify] Original: $ORIGINAL_BRANCH_NAME (${#ORIGINAL_BRANCH_NAME} bytes)"
+    >&2 echo "[specify] Truncated to: $BRANCH_NAME (${#BRANCH_NAME} bytes)"
+fi
+
+FEATURE_DIR="$SPECS_DIR/$BRANCH_NAME"
+SPEC_FILE="$FEATURE_DIR/spec.md"
+
+if [ "$DRY_RUN" != true ]; then
+    if [ -d "$FEATURE_DIR" ] && [ "$ALLOW_EXISTING" != true ]; then
+        if [ "$USE_TIMESTAMP" = true ]; then
+            >&2 echo "Error: Feature directory '$FEATURE_DIR' already exists. Rerun to get a new timestamp or use a different --short-name."
+        else
+            >&2 echo "Error: Feature directory '$FEATURE_DIR' already exists. Please use a different feature name or specify a different number with --number."
+        fi
+        exit 1
+    fi
+
+    mkdir -p "$FEATURE_DIR"
+
+    if [ ! -f "$SPEC_FILE" ]; then
+        TEMPLATE=$(resolve_template "spec-template" "$REPO_ROOT") || true
+        if [ -n "$TEMPLATE" ] && [ -f "$TEMPLATE" ]; then
+            cp "$TEMPLATE" "$SPEC_FILE"
+        else
+            echo "Warning: Spec template not found; created empty spec file" >&2
+            touch "$SPEC_FILE"
+        fi
+    fi
+
+    # Persist to .specify/feature.json so downstream commands can find the feature
+    _persist_feature_json "$REPO_ROOT" "$FEATURE_DIR"
+
+    # Inform the user how to set feature state in their own shell
+    printf '# To persist: export SPECIFY_FEATURE=%q\n' "$BRANCH_NAME" >&2
+    printf '#              export SPECIFY_FEATURE_DIRECTORY=%q\n' "$FEATURE_DIR" >&2
+fi
+
+if $JSON_MODE; then
+    if command -v jq >/dev/null 2>&1; then
+        if [ "$DRY_RUN" = true ]; then
+            jq -cn \
+                --arg branch_name "$BRANCH_NAME" \
+                --arg spec_file "$SPEC_FILE" \
+                --arg feature_num "$FEATURE_NUM" \
+                '{BRANCH_NAME:$branch_name,SPEC_FILE:$spec_file,FEATURE_NUM:$feature_num,DRY_RUN:true}'
+        else
+            jq -cn \
+                --arg branch_name "$BRANCH_NAME" \
+                --arg spec_file "$SPEC_FILE" \
+                --arg feature_num "$FEATURE_NUM" \
+                '{BRANCH_NAME:$branch_name,SPEC_FILE:$spec_file,FEATURE_NUM:$feature_num}'
+        fi
+    else
+        if [ "$DRY_RUN" = true ]; then
+            printf '{"BRANCH_NAME":"%s","SPEC_FILE":"%s","FEATURE_NUM":"%s","DRY_RUN":true}\n' "$(json_escape "$BRANCH_NAME")" "$(json_escape "$SPEC_FILE")" "$(json_escape "$FEATURE_NUM")"
+        else
+            printf '{"BRANCH_NAME":"%s","SPEC_FILE":"%s","FEATURE_NUM":"%s"}\n' "$(json_escape "$BRANCH_NAME")" "$(json_escape "$SPEC_FILE")" "$(json_escape "$FEATURE_NUM")"
+        fi
+    fi
+else
+    echo "BRANCH_NAME: $BRANCH_NAME"
+    echo "SPEC_FILE: $SPEC_FILE"
+    echo "FEATURE_NUM: $FEATURE_NUM"
+    if [ "$DRY_RUN" != true ]; then
+        printf '# To persist in your shell: export SPECIFY_FEATURE=%q\n' "$BRANCH_NAME"
+        printf '#                           export SPECIFY_FEATURE_DIRECTORY=%q\n' "$FEATURE_DIR"
+    fi
+fi
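The sequential numbering rule in `get_highest_from_specs`, combined with the `printf "%03d"` step, can be checked in isolation. This is a minimal sketch that assumes directory names are passed as arguments instead of being read from `specs/`. `next_feature_num` is a hypothetical name, and it uses bash regex matching in place of the script's `grep` calls.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the next zero-padded feature number given
# existing spec directory names as arguments.
next_feature_num() {
    local highest=0 name num
    for name in "$@"; do
        # Skip timestamp-prefixed names (YYYYMMDD-HHMMSS-...).
        [[ "$name" =~ ^[0-9]{8}-[0-9]{6}- ]] && continue
        # Accept only sequential prefixes of three or more digits.
        [[ "$name" =~ ^([0-9]{3,})- ]] || continue
        # Force base 10 so "010" is read as 10, not octal 8.
        num=$((10#${BASH_REMATCH[1]}))
        (( num > highest )) && highest=$num
    done
    printf '%03d\n' "$((highest + 1))"
}

next_feature_num 001-user-auth 009-billing 20260627-152024-hotfix notes
# prints 010
```

Skipping timestamp directories first matters: without that check, a name such as `20260627-152024-hotfix` would also match the three-or-more-digit rule and push the next number to 20260628.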
diff --git a/.specify/scripts/bash/setup-plan.sh b/.specify/scripts/bash/setup-plan.sh
new file mode 100755
index 0000000..cb67943
--- /dev/null
+++ b/.specify/scripts/bash/setup-plan.sh
@@ -0,0 +1,84 @@
+#!/usr/bin/env bash
+
+set -e
+
+# Parse command line arguments
+JSON_MODE=false
+ARGS=()
+
+for arg in "$@"; do
+    case "$arg" in
+        --json)
+            JSON_MODE=true
+            ;;
+        --help|-h)
+            echo "Usage: $0 [--json]"
+            echo "  --json    Output results in JSON format"
+            echo "  --help    Show this help message"
+            exit 0
+            ;;
+        *)
+            ARGS+=("$arg")
+            ;;
+    esac
+done
+
+# Get script directory and load common functions
+SCRIPT_DIR="$(CDPATH="" cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+source "$SCRIPT_DIR/common.sh"
+
+# Get all paths and variables from common functions
+_paths_output=$(get_feature_paths) || { echo "ERROR: Failed to resolve feature paths" >&2; exit 1; }
+eval "$_paths_output"
+unset _paths_output
+
+# Ensure the feature directory exists
+mkdir -p "$FEATURE_DIR"
+
+# Copy plan template if plan doesn't already exist
+if [[ -f "$IMPL_PLAN" ]]; then
+    if $JSON_MODE; then
+        echo "Plan already exists at $IMPL_PLAN, skipping template copy" >&2
+    else
+        echo "Plan already exists at $IMPL_PLAN, skipping template copy"
+    fi
+else
+    TEMPLATE=$(resolve_template "plan-template" "$REPO_ROOT") || true
+    if [[ -n "$TEMPLATE" ]] && [[ -f "$TEMPLATE" ]]; then
+        cp "$TEMPLATE" "$IMPL_PLAN"
+        if $JSON_MODE; then
+            echo "Copied plan template to $IMPL_PLAN" >&2
+        else
+            echo "Copied plan template to $IMPL_PLAN"
+        fi
+    else
+        if $JSON_MODE; then
+            echo "Warning: Plan template not found" >&2
+        else
+            echo "Warning: Plan template not found"
+        fi
+        # No template found: create an empty plan file for later steps to fill in
+        touch "$IMPL_PLAN"
+    fi
+fi
+
+# Output results
+if $JSON_MODE; then
+    if has_jq; then
+        jq -cn \
+            --arg feature_spec "$FEATURE_SPEC" \
+            --arg impl_plan "$IMPL_PLAN" \
+            --arg specs_dir "$FEATURE_DIR" \
+            --arg branch "$CURRENT_BRANCH" \
+            '{FEATURE_SPEC:$feature_spec,IMPL_PLAN:$impl_plan,SPECS_DIR:$specs_dir,BRANCH:$branch}'
+    else
+        printf '{"FEATURE_SPEC":"%s","IMPL_PLAN":"%s","SPECS_DIR":"%s","BRANCH":"%s"}\n' \
+            "$(json_escape "$FEATURE_SPEC")" "$(json_escape "$IMPL_PLAN")" "$(json_escape "$FEATURE_DIR")" "$(json_escape "$CURRENT_BRANCH")"
+    fi
+else
+    echo "FEATURE_SPEC: $FEATURE_SPEC"
+    echo "IMPL_PLAN: $IMPL_PLAN"
+    echo "SPECS_DIR: $FEATURE_DIR"
+    echo "BRANCH: $CURRENT_BRANCH"
+fi
+
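`setup-plan.sh`, like `setup-tasks.sh`, passes whatever `get_feature_paths` prints to `eval`, so that output has to be a series of shell-quoted assignments. `get_feature_paths` is defined in `common.sh` and is not part of this chunk. The producer below is a hypothetical stand-in (`emit_paths_sketch`) showing how `printf %q` keeps a path containing spaces intact through `eval`.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for get_feature_paths: print NAME=value lines,
# each value quoted with printf %q so eval restores it verbatim.
emit_paths_sketch() {
    local feature_dir="$1"
    printf 'FEATURE_DIR=%q\n' "$feature_dir"
    printf 'FEATURE_SPEC=%q\n' "$feature_dir/spec.md"
    printf 'IMPL_PLAN=%q\n' "$feature_dir/plan.md"
}

# Consumer side, following setup-plan.sh: capture, fail loudly, then eval.
_paths_output=$(emit_paths_sketch "/tmp/my specs/001-demo") || {
    echo "ERROR: Failed to resolve feature paths" >&2
    exit 1
}
eval "$_paths_output"
unset _paths_output

echo "$IMPL_PLAN"   # prints /tmp/my specs/001-demo/plan.md
```

Capturing the output first, instead of running `eval "$(get_feature_paths)"`, lets the script check the producer's exit status before anything is evaluated.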
diff --git a/.specify/scripts/bash/setup-tasks.sh b/.specify/scripts/bash/setup-tasks.sh
new file mode 100755
index 0000000..ae0d7bd
--- /dev/null
+++ b/.specify/scripts/bash/setup-tasks.sh
@@ -0,0 +1,91 @@
+#!/usr/bin/env bash
+
+set -e
+
+# Parse command line arguments
+JSON_MODE=false
+
+for arg in "$@"; do
+    case "$arg" in
+        --json) JSON_MODE=true ;;
+        --help|-h)
+            echo "Usage: $0 [--json]"
+            echo "  --json    Output results in JSON format"
+            echo "  --help    Show this help message"
+            exit 0
+            ;;
+        *) echo "ERROR: Unknown option '$arg'" >&2; exit 1 ;;
+    esac
+done
+
+# Source common functions
+SCRIPT_DIR="$(CDPATH="" cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+source "$SCRIPT_DIR/common.sh"
+
+# Get feature paths
+_paths_output=$(get_feature_paths) || { echo "ERROR: Failed to resolve feature paths" >&2; exit 1; }
+eval "$_paths_output"
+unset _paths_output
+
+# Validate required files
+if [[ ! -f "$IMPL_PLAN" ]]; then
+    echo "ERROR: plan.md not found in $FEATURE_DIR" >&2
+    echo "Run /speckit-plan first to create the implementation plan." >&2
+    exit 1
+fi
+
+if [[ ! -f "$FEATURE_SPEC" ]]; then
+    echo "ERROR: spec.md not found in $FEATURE_DIR" >&2
+    echo "Run /speckit-specify first to create the feature structure." >&2
+    exit 1
+fi
+
+# Build available docs list
+docs=()
+[[ -f "$RESEARCH" ]] && docs+=("research.md")
+[[ -f "$DATA_MODEL" ]] && docs+=("data-model.md")
+if [[ -d "$CONTRACTS_DIR" ]] && [[ -n "$(ls -A "$CONTRACTS_DIR" 2>/dev/null)" ]]; then
+    docs+=("contracts/")
+fi
+[[ -f "$QUICKSTART" ]] && docs+=("quickstart.md")
+
+# Resolve tasks template through override stack
+TASKS_TEMPLATE=$(resolve_template "tasks-template" "$REPO_ROOT") || true
+if [[ -z "$TASKS_TEMPLATE" ]] || [[ ! -f "$TASKS_TEMPLATE" ]]; then
+    echo "ERROR: Could not resolve required tasks-template from the template override stack for $REPO_ROOT" >&2
+    echo "Template 'tasks-template' was not found in any supported location (overrides, presets, extensions, or shared core). Add an override at .specify/templates/overrides/tasks-template.md, or run 'specify init' / reinstall shared infra to restore the core .specify/templates/tasks-template.md template." >&2
+    exit 1
+fi
+
+# Output results
+if $JSON_MODE; then
+    if has_jq; then
+        if [[ ${#docs[@]} -eq 0 ]]; then
+            json_docs="[]"
+        else
+            json_docs=$(printf '%s\n' "${docs[@]}" | jq -R . | jq -s .)
+        fi
+        jq -cn \
+            --arg feature_dir "$FEATURE_DIR" \
+            --argjson docs "$json_docs" \
+            --arg tasks_template "${TASKS_TEMPLATE:-}" \
+            '{FEATURE_DIR:$feature_dir,AVAILABLE_DOCS:$docs,TASKS_TEMPLATE:$tasks_template}'
+    else
+        if [[ ${#docs[@]} -eq 0 ]]; then
+            json_docs="[]"
+        else
+            json_docs=$(for d in "${docs[@]}"; do printf '"%s",' "$(json_escape "$d")"; done)
+            json_docs="[${json_docs%,}]"
+        fi
+        printf '{"FEATURE_DIR":"%s","AVAILABLE_DOCS":%s,"TASKS_TEMPLATE":"%s"}\n' \
+            "$(json_escape "$FEATURE_DIR")" "$json_docs" "$(json_escape "${TASKS_TEMPLATE:-}")"
+    fi
+else
+    echo "FEATURE_DIR: $FEATURE_DIR"
+    echo "TASKS_TEMPLATE: ${TASKS_TEMPLATE:-not found}"
+    echo "AVAILABLE_DOCS:"
+    check_file "$RESEARCH" "research.md"
+    check_file "$DATA_MODEL" "data-model.md"
+    check_dir "$CONTRACTS_DIR" "contracts/"
+    check_file "$QUICKSTART" "quickstart.md"
+fi
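When `jq` is unavailable, `setup-tasks.sh` builds the `AVAILABLE_DOCS` array by hand. It escapes each entry, wraps it in quotes, joins the results with commas, and then trims the trailing comma. The sketch below reproduces that pattern. The real `json_escape` lives in `common.sh` and is not shown here, so `json_escape_min` is a hypothetical, deliberately narrow stand-in. It escapes only backslashes and double quotes and assumes single-line input such as file names.

```bash
#!/usr/bin/env bash
# Hypothetical, narrow escaper: backslashes first, then double quotes.
# Assumes single-line input (sed works line by line).
json_escape_min() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build a JSON array of strings from the arguments without jq.
json_array() {
    local out="" item
    for item in "$@"; do
        out+="\"$(json_escape_min "$item")\","
    done
    # Trim the trailing comma; with no arguments this yields [].
    printf '[%s]' "${out%,}"
}

json_array research.md data-model.md contracts/
# prints ["research.md","data-model.md","contracts/"]
echo
```

Trimming with `${out%,}` removes the need for a separate empty-array branch, because an empty `out` already produces `[]`.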
diff --git a/.specify/templates/checklist-template.md b/.specify/templates/checklist-template.md
new file mode 100644
index 0000000..c4aa166
--- /dev/null
+++ b/.specify/templates/checklist-template.md
@@ -0,0 +1,40 @@
+# [CHECKLIST TYPE] Checklist: [FEATURE NAME]
+
+**Purpose**: [Brief description of what this checklist covers]
+**Created**: [DATE]
+**Feature**: [Link to spec.md or relevant documentation]
+
+**Note**: This checklist is generated by the `/speckit-checklist` command based on feature context and requirements.
+
+<!-- 
+  ============================================================================
+  IMPORTANT: The checklist items below are SAMPLE ITEMS for illustration only.
+  
+  The /speckit-checklist command MUST replace these with actual items based on:
+  - User's specific checklist request
+  - Feature requirements from spec.md
+  - Technical context from plan.md
+  - Implementation details from tasks.md
+  
+  DO NOT keep these sample items in the generated checklist file.
+  ============================================================================
+-->
+
+## [Category 1]
+
+- [ ] CHK001 First checklist item with clear action
+- [ ] CHK002 Second checklist item
+- [ ] CHK003 Third checklist item
+
+## [Category 2]
+
+- [ ] CHK004 Another category item
+- [ ] CHK005 Item with specific criteria
+- [ ] CHK006 Final item in this category
+
+## Notes
+
+- Check items off as completed: `[x]`
+- Add comments or findings inline
+- Link to relevant resources or documentation
+- Items are numbered sequentially for easy reference
diff --git a/.specify/templates/constitution-template.md b/.specify/templates/constitution-template.md
new file mode 100644
index 0000000..a4670ff
--- /dev/null
+++ b/.specify/templates/constitution-template.md
@@ -0,0 +1,50 @@
+# [PROJECT_NAME] Constitution
+<!-- Example: Spec Constitution, TaskFlow Constitution, etc. -->
+
+## Core Principles
+
+### [PRINCIPLE_1_NAME]
+<!-- Example: I. Library-First -->
+[PRINCIPLE_1_DESCRIPTION]
+<!-- Example: Every feature starts as a standalone library; Libraries must be self-contained, independently testable, documented; Clear purpose required - no organizational-only libraries -->
+
+### [PRINCIPLE_2_NAME]
+<!-- Example: II. CLI Interface -->
+[PRINCIPLE_2_DESCRIPTION]
+<!-- Example: Every library exposes functionality via CLI; Text in/out protocol: stdin/args → stdout, errors → stderr; Support JSON + human-readable formats -->
+
+### [PRINCIPLE_3_NAME]
+<!-- Example: III. Test-First (NON-NEGOTIABLE) -->
+[PRINCIPLE_3_DESCRIPTION]
+<!-- Example: TDD mandatory: Tests written → User approved → Tests fail → Then implement; Red-Green-Refactor cycle strictly enforced -->
+
+### [PRINCIPLE_4_NAME]
+<!-- Example: IV. Integration Testing -->
+[PRINCIPLE_4_DESCRIPTION]
+<!-- Example: Focus areas requiring integration tests: New library contract tests, Contract changes, Inter-service communication, Shared schemas -->
+
+### [PRINCIPLE_5_NAME]
+<!-- Example: V. Observability, VI. Versioning & Breaking Changes, VII. Simplicity -->
+[PRINCIPLE_5_DESCRIPTION]
+<!-- Example: Text I/O ensures debuggability; Structured logging required; Or: MAJOR.MINOR.BUILD format; Or: Start simple, YAGNI principles -->
+
+## [SECTION_2_NAME]
+<!-- Example: Additional Constraints, Security Requirements, Performance Standards, etc. -->
+
+[SECTION_2_CONTENT]
+<!-- Example: Technology stack requirements, compliance standards, deployment policies, etc. -->
+
+## [SECTION_3_NAME]
+<!-- Example: Development Workflow, Review Process, Quality Gates, etc. -->
+
+[SECTION_3_CONTENT]
+<!-- Example: Code review requirements, testing gates, deployment approval process, etc. -->
+
+## Governance
+<!-- Example: Constitution supersedes all other practices; Amendments require documentation, approval, migration plan -->
+
+[GOVERNANCE_RULES]
+<!-- Example: All PRs/reviews must verify compliance; Complexity must be justified; Use [GUIDANCE_FILE] for runtime development guidance -->
+
+**Version**: [CONSTITUTION_VERSION] | **Ratified**: [RATIFICATION_DATE] | **Last Amended**: [LAST_AMENDED_DATE]
+<!-- Example: Version: 2.1.1 | Ratified: 2025-06-13 | Last Amended: 2025-07-16 -->
diff --git a/.specify/templates/plan-template.md b/.specify/templates/plan-template.md
new file mode 100644
index 0000000..92b96c7
--- /dev/null
+++ b/.specify/templates/plan-template.md
@@ -0,0 +1,113 @@
+# Implementation Plan: [FEATURE]
+
+**Branch**: `[###-feature-name]` | **Date**: [DATE] | **Spec**: [link]
+
+**Input**: Feature specification from `/specs/[###-feature-name]/spec.md`
+
+**Note**: This template is filled in by the `/speckit-plan` command. See `.specify/templates/plan-template.md` for the execution workflow.
+
+## Summary
+
+[Extract from feature spec: primary requirement + technical approach from research]
+
+## Technical Context
+
+<!--
+  ACTION REQUIRED: Replace the content in this section with the technical details
+  for the project. The structure here is advisory and is meant to guide the
+  iteration process.
+-->
+
+**Language/Version**: [e.g., Python 3.11, Swift 5.9, Rust 1.75 or NEEDS CLARIFICATION]
+
+**Primary Dependencies**: [e.g., FastAPI, UIKit, LLVM or NEEDS CLARIFICATION]
+
+**Storage**: [if applicable, e.g., PostgreSQL, CoreData, files or N/A]
+
+**Testing**: [e.g., pytest, XCTest, cargo test or NEEDS CLARIFICATION]
+
+**Target Platform**: [e.g., Linux server, iOS 15+, WASM or NEEDS CLARIFICATION]
+
+**Project Type**: [e.g., library/cli/web-service/mobile-app/compiler/desktop-app or NEEDS CLARIFICATION]
+
+**Performance Goals**: [domain-specific, e.g., 1000 req/s, 10k lines/sec, 60 fps or NEEDS CLARIFICATION]
+
+**Constraints**: [domain-specific, e.g., <200ms p95, <100MB memory, offline-capable or NEEDS CLARIFICATION]
+
+**Scale/Scope**: [domain-specific, e.g., 10k users, 1M LOC, 50 screens or NEEDS CLARIFICATION]
+
+## Constitution Check
+
+*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
+
+[Gates determined based on constitution file]
+
+## Project Structure
+
+### Documentation (this feature)
+
+```text
+specs/[###-feature]/
+├── plan.md              # This file (/speckit-plan command output)
+├── research.md          # Phase 0 output (/speckit-plan command)
+├── data-model.md        # Phase 1 output (/speckit-plan command)
+├── quickstart.md        # Phase 1 output (/speckit-plan command)
+├── contracts/           # Phase 1 output (/speckit-plan command)
+└── tasks.md             # Phase 2 output (/speckit-tasks command - NOT created by /speckit-plan)
+```
+
+### Source Code (repository root)
+<!--
+  ACTION REQUIRED: Replace the placeholder tree below with the concrete layout
+  for this feature. Delete unused options and expand the chosen structure with
+  real paths (e.g., apps/admin, packages/something). The delivered plan must
+  not include Option labels.
+-->
+
+```text
+# [REMOVE IF UNUSED] Option 1: Single project (DEFAULT)
+src/
+├── models/
+├── services/
+├── cli/
+└── lib/
+
+tests/
+├── contract/
+├── integration/
+└── unit/
+
+# [REMOVE IF UNUSED] Option 2: Web application (when "frontend" + "backend" detected)
+backend/
+├── src/
+│   ├── models/
+│   ├── services/
+│   └── api/
+└── tests/
+
+frontend/
+├── src/
+│   ├── components/
+│   ├── pages/
+│   └── services/
+└── tests/
+
+# [REMOVE IF UNUSED] Option 3: Mobile + API (when "iOS/Android" detected)
+api/
+└── [same as backend above]
+
+ios/ or android/
+└── [platform-specific structure: feature modules, UI flows, platform tests]
+```
+
+**Structure Decision**: [Document the selected structure and reference the real
+directories captured above]
+
+## Complexity Tracking
+
+> **Fill ONLY if Constitution Check has violations that must be justified**
+
+| Violation | Why Needed | Simpler Alternative Rejected Because |
+|-----------|------------|-------------------------------------|
+| [e.g., 4th project] | [current need] | [why 3 projects insufficient] |
+| [e.g., Repository pattern] | [specific problem] | [why direct DB access insufficient] |
diff --git a/.specify/templates/spec-template.md b/.specify/templates/spec-template.md
new file mode 100644
index 0000000..ceb2877
--- /dev/null
+++ b/.specify/templates/spec-template.md
@@ -0,0 +1,131 @@
+# Feature Specification: [FEATURE NAME]
+
+**Feature Branch**: `[###-feature-name]`
+
+**Created**: [DATE]
+
+**Status**: Draft
+
+**Input**: User description: "$ARGUMENTS"
+
+## User Scenarios & Testing *(mandatory)*
+
+<!--
+  IMPORTANT: User stories should be PRIORITIZED as user journeys ordered by importance.
+  Each user story/journey must be INDEPENDENTLY TESTABLE - meaning if you implement just ONE of them,
+  you should still have a viable MVP (Minimum Viable Product) that delivers value.
+
+  Assign priorities (P1, P2, P3, etc.) to each story, where P1 is the most critical.
+  Think of each story as a standalone slice of functionality that can be:
+  - Developed independently
+  - Tested independently
+  - Deployed independently
+  - Demonstrated to users independently
+-->
+
+### User Story 1 - [Brief Title] (Priority: P1)
+
+[Describe this user journey in plain language]
+
+**Why this priority**: [Explain the value and why it has this priority level]
+
+**Independent Test**: [Describe how this can be tested independently - e.g., "Can be fully tested by [specific action] and delivers [specific value]"]
+
+**Acceptance Scenarios**:
+
+1. **Given** [initial state], **When** [action], **Then** [expected outcome]
+2. **Given** [initial state], **When** [action], **Then** [expected outcome]
+
+---
+
+### User Story 2 - [Brief Title] (Priority: P2)
+
+[Describe this user journey in plain language]
+
+**Why this priority**: [Explain the value and why it has this priority level]
+
+**Independent Test**: [Describe how this can be tested independently]
+
+**Acceptance Scenarios**:
+
+1. **Given** [initial state], **When** [action], **Then** [expected outcome]
+
+---
+
+### User Story 3 - [Brief Title] (Priority: P3)
+
+[Describe this user journey in plain language]
+
+**Why this priority**: [Explain the value and why it has this priority level]
+
+**Independent Test**: [Describe how this can be tested independently]
+
+**Acceptance Scenarios**:
+
+1. **Given** [initial state], **When** [action], **Then** [expected outcome]
+
+---
+
+[Add more user stories as needed, each with an assigned priority]
+
+### Edge Cases
+
+<!--
+  ACTION REQUIRED: The content in this section represents placeholders.
+  Fill them out with the right edge cases.
+-->
+
+- What happens when [boundary condition]?
+- How does the system handle [error scenario]?
+
+## Requirements *(mandatory)*
+
+<!--
+  ACTION REQUIRED: The content in this section represents placeholders.
+  Fill them out with the right functional requirements.
+-->
+
+### Functional Requirements
+
+- **FR-001**: System MUST [specific capability, e.g., "allow users to create accounts"]
+- **FR-002**: System MUST [specific capability, e.g., "validate email addresses"]
+- **FR-003**: Users MUST be able to [key interaction, e.g., "reset their password"]
+- **FR-004**: System MUST [data requirement, e.g., "persist user preferences"]
+- **FR-005**: System MUST [behavior, e.g., "log all security events"]
+
+*Example of marking unclear requirements:*
+
+- **FR-006**: System MUST authenticate users via [NEEDS CLARIFICATION: auth method not specified - email/password, SSO, OAuth?]
+- **FR-007**: System MUST retain user data for [NEEDS CLARIFICATION: retention period not specified]
+
+### Key Entities *(include if feature involves data)*
+
+- **[Entity 1]**: [What it represents, key attributes without implementation]
+- **[Entity 2]**: [What it represents, relationships to other entities]
+
+## Success Criteria *(mandatory)*
+
+<!--
+  ACTION REQUIRED: Define measurable success criteria.
+  These must be technology-agnostic and measurable.
+-->
+
+### Measurable Outcomes
+
+- **SC-001**: [Measurable metric, e.g., "Users can complete account creation in under 2 minutes"]
+- **SC-002**: [Measurable metric, e.g., "System handles 1000 concurrent users without degradation"]
+- **SC-003**: [User satisfaction metric, e.g., "90% of users successfully complete primary task on first attempt"]
+- **SC-004**: [Business metric, e.g., "Reduce support tickets related to [X] by 50%"]
+
+## Assumptions
+
+<!--
+  ACTION REQUIRED: The content in this section represents placeholders.
+  Fill them out with the right assumptions based on reasonable defaults
+  chosen when the feature description did not specify certain details.
+-->
+
+- [Assumption about target users, e.g., "Users have stable internet connectivity"]
+- [Assumption about scope boundaries, e.g., "Mobile support is out of scope for v1"]
+- [Assumption about data/environment, e.g., "Existing authentication system will be reused"]
+- [Dependency on existing system/service, e.g., "Requires access to the existing user profile API"]
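A filled-in spec should not move to planning while template markers remain. The sketch below assumes one requirement per line, as in the template above. It lists every unresolved `[NEEDS CLARIFICATION: ...]` marker together with the `FR-`/`SC-` ID on its line. `open_questions` is a hypothetical helper for illustration, not part of spec-kit.

```python
from __future__ import annotations

import re

# Hypothetical helper (not a spec-kit API) for auditing a filled-in spec.md.
# Marker form used by the template: [NEEDS CLARIFICATION: <question>]
CLARIFY_RE = re.compile(r"\[NEEDS CLARIFICATION:\s*([^\]]+)\]")
# Bold requirement / success-criterion IDs such as **FR-006** or **SC-002**
ID_RE = re.compile(r"\*\*((?:FR|SC)-\d{3})\*\*")


def open_questions(spec_text: str) -> list[tuple[str | None, str]]:
    """Return (requirement_id, question) for each unresolved clarification marker."""
    found = []
    for line in spec_text.splitlines():
        id_match = ID_RE.search(line)
        req_id = id_match.group(1) if id_match else None
        for marker in CLARIFY_RE.finditer(line):
            found.append((req_id, marker.group(1).strip()))
    return found


if __name__ == "__main__":
    spec = (
        "- **FR-001**: System MUST validate email addresses\n"
        "- **FR-007**: System MUST retain user data for "
        "[NEEDS CLARIFICATION: retention period not specified]\n"
    )
    for req_id, question in open_questions(spec):
        print(f"{req_id}: {question}")  # FR-007: retention period not specified
```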
diff --git a/.specify/templates/tasks-template.md b/.specify/templates/tasks-template.md
new file mode 100644
index 0000000..d46a1f1
--- /dev/null
+++ b/.specify/templates/tasks-template.md
@@ -0,0 +1,252 @@
+---
+
+description: "Task list template for feature implementation"
+---
+
+# Tasks: [FEATURE NAME]
+
+**Input**: Design documents from `/specs/[###-feature-name]/`
+
+**Prerequisites**: plan.md (required), spec.md (required for user stories), research.md, data-model.md, contracts/
+
+**Tests**: The examples below include test tasks. Tests are OPTIONAL - only include them if explicitly requested in the feature specification.
+
+**Organization**: Tasks are grouped by user story to enable independent implementation and testing of each story.
+
+## Format: `[ID] [P?] [Story] Description`
+
+- **[P]**: Can run in parallel (different files, no dependencies)
+- **[Story]**: Which user story this task belongs to (e.g., US1, US2, US3)
+- Include exact file paths in descriptions
+
+## Path Conventions
+
+- **Single project**: `src/`, `tests/` at repository root
+- **Web app**: `backend/src/`, `frontend/src/`
+- **Mobile**: `api/src/`, `ios/src/` or `android/src/`
+- Paths shown below assume a single project - adjust based on plan.md structure
+
+<!--
+  ============================================================================
+  IMPORTANT: The tasks below are SAMPLE TASKS for illustration purposes only.
+
+  The /speckit-tasks command MUST replace these with actual tasks based on:
+  - User stories from spec.md (with their priorities P1, P2, P3...)
+  - Feature requirements from plan.md
+  - Entities from data-model.md
+  - Endpoints from contracts/
+
+  Tasks MUST be organized by user story so each story can be:
+  - Implemented independently
+  - Tested independently
+  - Delivered as an MVP increment
+
+  DO NOT keep these sample tasks in the generated tasks.md file.
+  ============================================================================
+-->
+
+## Phase 1: Setup (Shared Infrastructure)
+
+**Purpose**: Project initialization and basic structure
+
+- [ ] T001 Create project structure per implementation plan
+- [ ] T002 Initialize [language] project with [framework] dependencies
+- [ ] T003 [P] Configure linting and formatting tools
+
+---
+
+## Phase 2: Foundational (Blocking Prerequisites)
+
+**Purpose**: Core infrastructure that MUST be complete before ANY user story can be implemented
+
+**⚠️ CRITICAL**: No user story work can begin until this phase is complete
+
+Examples of foundational tasks (adjust based on your project):
+
+- [ ] T004 Set up database schema and migrations framework
+- [ ] T005 [P] Implement authentication/authorization framework
+- [ ] T006 [P] Set up API routing and middleware structure
+- [ ] T007 Create base models/entities that all stories depend on
+- [ ] T008 Configure error handling and logging infrastructure
+- [ ] T009 Set up environment configuration management
+
+**Checkpoint**: Foundation ready - user story implementation can now begin in parallel
+
+---
+
+## Phase 3: User Story 1 - [Title] (Priority: P1) 🎯 MVP
+
+**Goal**: [Brief description of what this story delivers]
+
+**Independent Test**: [How to verify this story works on its own]
+
+### Tests for User Story 1 (OPTIONAL - only if tests requested) ⚠️
+
+> **NOTE: Write these tests FIRST and ensure they FAIL before implementation**
+
+- [ ] T010 [P] [US1] Contract test for [endpoint] in tests/contract/test_[name].py
+- [ ] T011 [P] [US1] Integration test for [user journey] in tests/integration/test_[name].py
+
+### Implementation for User Story 1
+
+- [ ] T012 [P] [US1] Create [Entity1] model in src/models/[entity1].py
+- [ ] T013 [P] [US1] Create [Entity2] model in src/models/[entity2].py
+- [ ] T014 [US1] Implement [Service] in src/services/[service].py (depends on T012, T013)
+- [ ] T015 [US1] Implement [endpoint/feature] in src/[location]/[file].py
+- [ ] T016 [US1] Add validation and error handling
+- [ ] T017 [US1] Add logging for user story 1 operations
+
+**Checkpoint**: At this point, User Story 1 should be fully functional and testable independently
+
+---
+
+## Phase 4: User Story 2 - [Title] (Priority: P2)
+
+**Goal**: [Brief description of what this story delivers]
+
+**Independent Test**: [How to verify this story works on its own]
+
+### Tests for User Story 2 (OPTIONAL - only if tests requested) ⚠️
+
+- [ ] T018 [P] [US2] Contract test for [endpoint] in tests/contract/test_[name].py
+- [ ] T019 [P] [US2] Integration test for [user journey] in tests/integration/test_[name].py
+
+### Implementation for User Story 2
+
+- [ ] T020 [P] [US2] Create [Entity] model in src/models/[entity].py
+- [ ] T021 [US2] Implement [Service] in src/services/[service].py
+- [ ] T022 [US2] Implement [endpoint/feature] in src/[location]/[file].py
+- [ ] T023 [US2] Integrate with User Story 1 components (if needed)
+
+**Checkpoint**: At this point, User Stories 1 AND 2 should both work independently
+
+---
+
+## Phase 5: User Story 3 - [Title] (Priority: P3)
+
+**Goal**: [Brief description of what this story delivers]
+
+**Independent Test**: [How to verify this story works on its own]
+
+### Tests for User Story 3 (OPTIONAL - only if tests requested) ⚠️
+
+- [ ] T024 [P] [US3] Contract test for [endpoint] in tests/contract/test_[name].py
+- [ ] T025 [P] [US3] Integration test for [user journey] in tests/integration/test_[name].py
+
+### Implementation for User Story 3
+
+- [ ] T026 [P] [US3] Create [Entity] model in src/models/[entity].py
+- [ ] T027 [US3] Implement [Service] in src/services/[service].py
+- [ ] T028 [US3] Implement [endpoint/feature] in src/[location]/[file].py
+
+**Checkpoint**: All user stories should now be independently functional
+
+---
+
+[Add more user story phases as needed, following the same pattern]
+
+---
+
+## Phase N: Polish & Cross-Cutting Concerns
+
+**Purpose**: Improvements that affect multiple user stories
+
+- [ ] TXXX [P] Documentation updates in docs/
+- [ ] TXXX Code cleanup and refactoring
+- [ ] TXXX Performance optimization across all stories
+- [ ] TXXX [P] Additional unit tests (if requested) in tests/unit/
+- [ ] TXXX Security hardening
+- [ ] TXXX Run quickstart.md validation
+
+---
+
+## Dependencies & Execution Order
+
+### Phase Dependencies
+
+- **Setup (Phase 1)**: No dependencies - can start immediately
+- **Foundational (Phase 2)**: Depends on Setup completion - BLOCKS all user stories
+- **User Stories (Phase 3+)**: All depend on Foundational phase completion
+  - User stories can then proceed in parallel (if staffed)
+  - Or sequentially in priority order (P1 → P2 → P3)
+- **Polish (Final Phase)**: Depends on all desired user stories being complete
+
+### User Story Dependencies
+
+- **User Story 1 (P1)**: Can start after Foundational (Phase 2) - No dependencies on other stories
+- **User Story 2 (P2)**: Can start after Foundational (Phase 2) - May integrate with US1 but should be independently testable
+- **User Story 3 (P3)**: Can start after Foundational (Phase 2) - May integrate with US1/US2 but should be independently testable
+
+### Within Each User Story
+
+- Tests (if included) MUST be written and FAIL before implementation
+- Models before services
+- Services before endpoints
+- Core implementation before integration
+- Story complete before moving to next priority
+
+### Parallel Opportunities
+
+- All Setup tasks marked [P] can run in parallel
+- All Foundational tasks marked [P] can run in parallel (within Phase 2)
+- Once Foundational phase completes, all user stories can start in parallel (if team capacity allows)
+- All tests for a user story marked [P] can run in parallel
+- Models within a story marked [P] can run in parallel
+- Different user stories can be worked on in parallel by different team members
+
+---
+
+## Parallel Example: User Story 1
+
+```bash
+# Launch all tests for User Story 1 together (if tests requested):
+Task: "Contract test for [endpoint] in tests/contract/test_[name].py"
+Task: "Integration test for [user journey] in tests/integration/test_[name].py"
+
+# Launch all models for User Story 1 together:
+Task: "Create [Entity1] model in src/models/[entity1].py"
+Task: "Create [Entity2] model in src/models/[entity2].py"
+```
+
+---
+
+## Implementation Strategy
+
+### MVP First (User Story 1 Only)
+
+1. Complete Phase 1: Setup
+2. Complete Phase 2: Foundational (CRITICAL - blocks all stories)
+3. Complete Phase 3: User Story 1
+4. **STOP and VALIDATE**: Test User Story 1 independently
+5. Deploy/demo if ready
+
+### Incremental Delivery
+
+1. Complete Setup + Foundational → Foundation ready
+2. Add User Story 1 → Test independently → Deploy/Demo (MVP!)
+3. Add User Story 2 → Test independently → Deploy/Demo
+4. Add User Story 3 → Test independently → Deploy/Demo
+5. Each story adds value without breaking previous stories
+
+### Parallel Team Strategy
+
+With multiple developers:
+
+1. Team completes Setup + Foundational together
+2. Once Foundational is done:
+   - Developer A: User Story 1
+   - Developer B: User Story 2
+   - Developer C: User Story 3
+3. Stories complete and integrate independently
+
+---
+
+## Notes
+
+- [P] tasks = different files, no dependencies
+- [Story] label maps task to specific user story for traceability
+- Each user story should be independently completable and testable
+- Verify tests fail before implementing
+- Commit after each task or logical group
+- Stop at any checkpoint to validate story independently
+- Avoid: vague tasks, same-file conflicts, cross-story dependencies that break independence
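The `[ID] [P?] [Story] Description` format above is regular enough to parse mechanically. The sketch below assumes each task is a single `- [ ]` checklist line, as in the samples. It picks out the open `[P]` tasks of one story, which are the candidates for parallel launch, such as the tests and models in the Parallel Example. It does not encode the tests-before-implementation ordering. `Task`, `parse_tasks`, and `parallel_batch` are hypothetical names, not spec-kit APIs.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical parser (not a spec-kit API) for tasks.md checklist lines:
# "- [ ] T010 [P] [US1] Description" -> checkbox, ID, optional [P], optional [USn], text
TASK_RE = re.compile(
    r"^- \[(?P<done>[ xX])\] (?P<id>T\d{3}|TXXX)"
    r"(?P<parallel> \[P\])?"
    r"(?: \[(?P<story>US\d+)\])?"
    r" (?P<desc>.+)$"
)


@dataclass
class Task:
    id: str
    done: bool
    parallel: bool
    story: str | None
    description: str


def parse_tasks(text: str) -> list[Task]:
    """Parse checklist lines in the tasks.md format; other lines are ignored."""
    tasks = []
    for line in text.splitlines():
        m = TASK_RE.match(line.strip())
        if m:
            tasks.append(Task(
                id=m["id"],
                done=m["done"] in "xX",
                parallel=m["parallel"] is not None,
                story=m["story"],
                description=m["desc"],
            ))
    return tasks


def parallel_batch(tasks: list[Task], story: str) -> list[str]:
    """IDs of open [P] tasks in one story: candidates to launch together."""
    return [t.id for t in tasks if t.story == story and t.parallel and not t.done]
```

Run against the Phase 3 sample tasks, `parallel_batch(parse_tasks(text), "US1")` returns `["T010", "T011", "T012", "T013"]`. T014 depends on T012 and T013 and carries no `[P]`.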
diff --git a/.specify/workflows/speckit/workflow.yml b/.specify/workflows/speckit/workflow.yml
new file mode 100644
index 0000000..f69efea
--- /dev/null
+++ b/.specify/workflows/speckit/workflow.yml
@@ -0,0 +1,77 @@
+schema_version: "1.0"
+workflow:
+  id: "speckit"
+  name: "Full SDD Cycle"
+  version: "1.0.0"
+  author: "GitHub"
+  description: "Runs specify → plan → tasks → implement with review gates"
+
+requires:
+  # 0.8.5 is the first release with engine-side resolution of the
+  # ``integration: "auto"`` default. Older versions would treat "auto"
+  # as a literal integration key and fail at dispatch.
+  speckit_version: ">=0.8.5"
+  integrations:
+    # The four commands below (specify, plan, tasks, implement) are core
+    # spec-kit commands provided by every integration. The list here is an
+    # advisory, non-exhaustive compatibility hint following the documented
+    # ``any: [...]`` schema -- it is NOT a closed set. The workflow runs
+    # against any integration the project was initialized with, including
+    # ones not listed below, as long as that integration provides the four
+    # core commands referenced in ``steps``.
+    any:
+      - "claude"
+      - "copilot"
+      - "gemini"
+      - "opencode"
+
+inputs:
+  spec:
+    type: string
+    required: true
+    prompt: "Describe what you want to build"
+  integration:
+    type: string
+    default: "auto"
+    prompt: "Integration to use (e.g. claude, copilot, gemini; 'auto' uses the project's initialized integration)"
+  scope:
+    type: string
+    default: "full"
+    enum: ["full", "backend-only", "frontend-only"]
+
+steps:
+  - id: specify
+    command: speckit.specify
+    integration: "{{ inputs.integration }}"
+    input:
+      args: "{{ inputs.spec }}"
+
+  - id: review-spec
+    type: gate
+    message: "Review the generated spec before planning."
+    options: [approve, reject]
+    on_reject: abort
+
+  - id: plan
+    command: speckit.plan
+    integration: "{{ inputs.integration }}"
+    input:
+      args: "{{ inputs.spec }}"
+
+  - id: review-plan
+    type: gate
+    message: "Review the plan before generating tasks."
+    options: [approve, reject]
+    on_reject: abort
+
+  - id: tasks
+    command: speckit.tasks
+    integration: "{{ inputs.integration }}"
+    input:
+      args: "{{ inputs.spec }}"
+
+  - id: implement
+    command: speckit.implement
+    integration: "{{ inputs.integration }}"
+    input:
+      args: "{{ inputs.spec }}"
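The `steps` list is a linear pipeline with two review gates. This patch does not show how the spec-kit engine executes it. The sketch below only illustrates the semantics the file declares, using two hypothetical callbacks: `dispatch` runs one command, and `ask_gate` returns the reviewer's choice. The sketch does three things:

- It renders the `{{ inputs.* }}` templates.
- It passes `integration: "auto"` through unchanged, because the engine resolves it, per the `speckit_version` comment.
- It stops at the first rejected gate, because both gates declare `on_reject: abort`.

No step references `inputs.scope`.

```python
from __future__ import annotations

import re
from typing import Callable

# Hypothetical runner illustrating the declared semantics; not the spec-kit engine.
# STEPS mirrors the `steps` block of workflow.yml (parsed YAML yields the same dicts).
STEPS = [
    {"id": "specify", "command": "speckit.specify",
     "integration": "{{ inputs.integration }}", "input": {"args": "{{ inputs.spec }}"}},
    {"id": "review-spec", "type": "gate", "message": "Review the generated spec before planning.",
     "options": ["approve", "reject"], "on_reject": "abort"},
    {"id": "plan", "command": "speckit.plan",
     "integration": "{{ inputs.integration }}", "input": {"args": "{{ inputs.spec }}"}},
    {"id": "review-plan", "type": "gate", "message": "Review the plan before generating tasks.",
     "options": ["approve", "reject"], "on_reject": "abort"},
    {"id": "tasks", "command": "speckit.tasks",
     "integration": "{{ inputs.integration }}", "input": {"args": "{{ inputs.spec }}"}},
    {"id": "implement", "command": "speckit.implement",
     "integration": "{{ inputs.integration }}", "input": {"args": "{{ inputs.spec }}"}},
]

TEMPLATE_RE = re.compile(r"\{\{\s*inputs\.(\w+)\s*\}\}")


def render(template: str, inputs: dict[str, str]) -> str:
    """Substitute {{ inputs.<name> }} placeholders."""
    return TEMPLATE_RE.sub(lambda m: inputs[m.group(1)], template)


def run_workflow(steps, inputs, dispatch: Callable[[str, str, str], None],
                 ask_gate: Callable[[str, list[str]], str]) -> list[str]:
    """Run steps in order and return the ids of the steps that completed."""
    completed = []
    for step in steps:
        if step.get("type") == "gate":
            choice = ask_gate(step["message"], step["options"])
            if choice == "reject" and step.get("on_reject") == "abort":
                break  # nothing after a rejected gate runs
        else:
            dispatch(step["command"],
                     render(step["integration"], inputs),
                     render(step["input"]["args"], inputs))
        completed.append(step["id"])
    return completed


if __name__ == "__main__":
    answers = iter(["approve", "reject"])
    done = run_workflow(
        STEPS,
        {"spec": "Add an /ensemble page", "integration": "auto", "scope": "full"},
        dispatch=lambda cmd, integ, args: print(cmd, integ, repr(args)),
        ask_gate=lambda message, options: next(answers),
    )
    print(done)  # ['specify', 'review-spec', 'plan']
```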
diff --git a/.specify/workflows/workflow-registry.json b/.specify/workflows/workflow-registry.json
new file mode 100644
index 0000000..2912343
--- /dev/null
+++ b/.specify/workflows/workflow-registry.json
@@ -0,0 +1,13 @@
+{
+  "schema_version": "1.0",
+  "workflows": {
+    "speckit": {
+      "name": "Full SDD Cycle",
+      "version": "1.0.0",
+      "description": "Runs specify \u2192 plan \u2192 tasks \u2192 implement with review gates",
+      "source": "bundled",
+      "installed_at": "2026-06-27T21:48:08.099604+00:00",
+      "updated_at": "2026-06-27T21:48:08.099611+00:00"
+    }
+  }
+}
\ No newline at end of file
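The registry stores the same description as `workflow.yml`, but it writes the `→` arrows as the JSON escape `\u2192`. This sketch assumes the registry was serialized with Python's `json` module, which escapes non-ASCII characters by default (`ensure_ascii=True`). The writer is not part of this patch. The missing trailing newline is consistent with that assumption, because `json.dump` does not append one. Any JSON parser decodes the escape back to the arrow:

```python
import json

# Excerpt of workflow-registry.json; a raw string keeps the \u2192 escape literal.
REGISTRY_JSON = r'''{
  "schema_version": "1.0",
  "workflows": {
    "speckit": {
      "name": "Full SDD Cycle",
      "description": "Runs specify \u2192 plan \u2192 tasks \u2192 implement with review gates",
      "source": "bundled"
    }
  }
}'''

registry = json.loads(REGISTRY_JSON)
description = registry["workflows"]["speckit"]["description"]
print(description)  # Runs specify → plan → tasks → implement with review gates

# Assumption illustrated: json.dumps escapes non-ASCII unless ensure_ascii=False.
print(json.dumps("→"))                      # "\u2192"
print(json.dumps("→", ensure_ascii=False))  # "→"
```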
diff --git a/CLAUDE.md b/CLAUDE.md
index a79498b..19e5637 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -142,3 +142,18 @@ cd frontend && npm install   # Vue 3 frontend
 ```
 
 `ANTHROPIC_API_KEY` must be set in the environment.
+
+For the OpenRouter backend (ensemble-workflow extraction and synthesis), run
+`pip install openai` and set `OPENROUTER_API_KEY` in the environment. OpenRouter
+is reached only through `campaignlib/api` (`make_client(backend="openrouter")`).
+Select it on a CLI with `--backend openrouter --model <openrouter-id>`, or via
+the `CG_BACKEND=openrouter` env var.
+
+<!-- SPECKIT START -->
+For additional context about technologies to be used, project structure,
+shell commands, and other important information, read the current plan:
+`specs/001-ensemble-workflow-ui/plan.md` (Ensemble Grounding-Doc Workflow UI —
+adds a stepped `/ensemble` UI page and OpenRouter as a per-stage LLM backend
+through the single `campaignlib` seam; leaves the existing `/grounding` Anthropic
+path unchanged).
+<!-- SPECKIT END -->
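You can check both OpenRouter prerequisites above before starting a long run. The sketch below uses only the standard library and mirrors the checks that `_OpenRouterClient.__init__` performs in `campaignlib/api/backends.py`. `openrouter_preflight` is a hypothetical helper, not part of `campaignlib`.

```python
from __future__ import annotations

import importlib.util
import os
from collections.abc import Mapping


# Hypothetical helper, not part of campaignlib.
def openrouter_preflight(env: Mapping[str, str] | None = None) -> list[str]:
    """Return a list of problems; an empty list means the backend can be constructed."""
    env = os.environ if env is None else env
    problems = []
    if importlib.util.find_spec("openai") is None:
        problems.append("openai is not installed (pip install openai)")
    if not env.get("OPENROUTER_API_KEY"):
        problems.append("OPENROUTER_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in openrouter_preflight():
        print("preflight:", problem)
```

`_OpenRouterClient` checks for the key before importing the SDK. As a result, a missing key raises `RuntimeError`, while a missing package exits via `sys.exit(1)`. The preflight reports both problems at once.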
diff --git a/campaign_state.py b/campaign_state.py
index 4188692..95ed1aa 100644
--- a/campaign_state.py
+++ b/campaign_state.py
@@ -66,7 +66,9 @@
 
 from campaignlib import (
     DEFAULT_MODEL,
+    add_backend_args,
     build_alias_normalizer,
+    client_from_args,
     format_npc_roster,
     load_agent_prompt,
     load_alias_map,
@@ -151,7 +153,8 @@ def main() -> None:
                              "canonical name before extract/synth, and a "
                              "'Known NPCs' roster seeds the system prompts.")
     parser.add_argument("--model", default=DEFAULT_MODEL,
-                        help="Claude model to use")
+                        help="Model id (Claude id, or an OpenRouter id for --backend openrouter)")
+    add_backend_args(parser)
     parser.add_argument("--dump-input", default=None, metavar="FILE",
                         help="Write the synthesis prompt to FILE (and FILE.system.md) "
                              "without making an API call — for use with `claude -p`.")
@@ -195,7 +198,7 @@ def main() -> None:
     if alias_map:
         print(f"Alias map: {len(alias_map)} NPC(s) from {args.dossier_dir}")
 
-    client = make_client()
+    client = client_from_args(args)
 
     if tracked_items:
         print(f"\n  Tracking {len(tracked_items)} item(s):")
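The new wiring turns the `--backend`, `--endpoint`, and `--model` flags into `make_client` keyword arguments. The sketch below restates that mapping from `add_backend_args` and `client_from_args` in `campaignlib/api/client.py`, so it runs without any SDK installed. Two names are illustrative only:

- `make_client_kwargs` is a hypothetical name. It returns the arguments instead of building a client.
- `DEFAULT_MODEL` here is a placeholder value, not the real constant.

```python
import argparse

# Hypothetical stand-in for campaignlib.DEFAULT_MODEL; the real value differs.
DEFAULT_MODEL = "claude-default-placeholder"


def add_backend_args(parser: argparse.ArgumentParser) -> None:
    """Same flags as campaignlib's add_backend_args (help text omitted)."""
    parser.add_argument("--backend", choices=["anthropic", "dgx", "openrouter"], default="anthropic")
    parser.add_argument("--endpoint", default=None, metavar="URL")


# Hypothetical restatement of client_from_args that returns make_client's kwargs.
def make_client_kwargs(args: argparse.Namespace) -> dict:
    backend = None if getattr(args, "backend", "anthropic") == "anthropic" else args.backend
    model_override = getattr(args, "model", None) if backend in ("dgx", "openrouter") else None
    return {"backend": backend, "endpoint": getattr(args, "endpoint", None),
            "model_override": model_override}


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", default=DEFAULT_MODEL)
    add_backend_args(parser)
    print(make_client_kwargs(parser.parse_args([])))
    # {'backend': None, 'endpoint': None, 'model_override': None}
    print(make_client_kwargs(parser.parse_args(
        ["--backend", "openrouter", "--model", "anthropic/claude-sonnet-4"])))
    # {'backend': 'openrouter', 'endpoint': None, 'model_override': 'anthropic/claude-sonnet-4'}
```

With no flags, `backend` is `None`, so `make_client` still honors `CG_BACKEND`/`DGX_ENDPOINT`. This is the backward-compatibility contract described in the docstring.

If you use `--backend openrouter` without `--model`, the default Claude id becomes the override. That id is presumably not a vendor-namespaced OpenRouter id, and it also shadows `OPENROUTER_MODEL`. Pass `--model` explicitly, as `CLAUDE.md` suggests.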
diff --git a/campaignlib/__init__.py b/campaignlib/__init__.py
index 181022f..e914beb 100644
--- a/campaignlib/__init__.py
+++ b/campaignlib/__init__.py
@@ -27,7 +27,10 @@
     assemble_docs,
 )
 from .util import copy_to_clipboard, save_log
-from .api.client import make_client, call_api, call_api_with_tools, stream_api
+from .api.client import (
+    make_client, call_api, call_api_with_tools, stream_api,
+    add_backend_args, client_from_args,
+)
 from .api.batch import (
     build_batch_request,
     submit_batch,
@@ -79,6 +82,8 @@
     "save_log",
     # api — client
     "make_client",
+    "add_backend_args",
+    "client_from_args",
     "call_api",
     "call_api_with_tools",
     "stream_api",
diff --git a/campaignlib/api/backends.py b/campaignlib/api/backends.py
index f5bd658..72061c2 100644
--- a/campaignlib/api/backends.py
+++ b/campaignlib/api/backends.py
@@ -6,9 +6,11 @@
 
 import json
 import os
+import sys
 
 
 DGX_DEFAULT_MODEL = "Qwen/Qwen2.5-14B-Instruct-AWQ"
+OPENROUTER_DEFAULT_BASE_URL = "https://openrouter.ai/api/v1"
 
 
 def _flatten_to_text(value) -> str:
@@ -197,6 +199,76 @@ def extra_body_for(self, resolved_model: str, thinking: bool | None) -> dict:
         return self._dgxlib.resolve_model_config(resolved_model, thinking=thinking).extra_body
 
 
+# ── OpenRouter backend ───────────────────────────────────────────────────────
+#
+# OpenRouter (https://openrouter.ai) is an OpenAI-wire-compatible gateway to many
+# model vendors. It is reached ONLY through this class — Constitution Principle V
+# (one seam per boundary). Unlike the DGX adapter it (a) uses a real API key from
+# OPENROUTER_API_KEY, (b) does NOT consult the dgxlib model registry (OpenRouter
+# ids are namespaced, e.g. "anthropic/claude-sonnet-4", and pass through verbatim),
+# and (c) maps a no-thinking request to OpenRouter's `reasoning` control so the
+# silently-empty-extraction trap (a reasoning model spending its whole budget on a
+# think trace) can be suppressed on this path too.
+
+
+class _OpenRouterMessages(_OpenAICompatMessages):
+    """Messages façade for OpenRouter — same wire calls as the DGX adapter, but
+    model ids pass through verbatim (no dgxlib registry, no claude→DGX substitution)."""
+
+    def _resolve_model(self, model: str) -> str:
+        # Honor an explicit override; otherwise send the caller's id straight
+        # through. OpenRouter ids are vendor-namespaced, so the DGX adapter's
+        # "claude-* → DGX default" substitution must NOT apply here.
+        return self._client.model_override or model
+
+
+class _OpenRouterClient:
+    """Anthropic-shaped façade over OpenRouter's OpenAI-compatible API.
+
+    Presents the same small slice of the anthropic SDK surface
+    (``.messages.create`` / ``.messages.stream``) that stream_api / call_api use,
+    reusing the OpenAI-compat stream/response machinery.
+    """
+
+    def __init__(self, model_override: str | None = None):
+        # Check config before importing the SDK so a missing key fails with a
+        # clear, deterministic error (no silent fallback to another backend).
+        api_key = os.environ.get("OPENROUTER_API_KEY")
+        if not api_key:
+            raise RuntimeError(
+                "OPENROUTER_API_KEY is not set. The openrouter backend requires a key; "
+                "export OPENROUTER_API_KEY in the environment."
+            )
+        try:
+            from openai import OpenAI
+        except ImportError:
+            print("Error: openai not installed. Run: pip install openai", file=sys.stderr)
+            sys.exit(1)
+        base_url = (os.environ.get("OPENROUTER_BASE_URL")
+                    or OPENROUTER_DEFAULT_BASE_URL).rstrip("/")
+        self.model_override = model_override or os.environ.get("OPENROUTER_MODEL")
+        import httpx
+        env_to = os.environ.get("OPENROUTER_READ_TIMEOUT")
+        read_timeout = float(env_to) if env_to else 600.0
+        timeout = httpx.Timeout(connect=10.0, read=read_timeout, write=30.0, pool=30.0)
+        self.oai = OpenAI(base_url=base_url, api_key=api_key, timeout=timeout)
+        self.messages = _OpenRouterMessages(self)
+
+    def extra_body_for(self, resolved_model: str, thinking: bool | None) -> dict:
+        """Per-call request extras. Maps no-thinking to OpenRouter's `reasoning`.
+
+        ``thinking`` is a per-call decision: ``None`` leaves OpenRouter's default
+        (but OPENROUTER_NO_THINKING / DGX_NO_THINKING force it off for parity with
+        the DGX extraction path); ``False`` disables reasoning; ``True`` leaves it on.
+        """
+        if thinking is None and (os.environ.get("OPENROUTER_NO_THINKING")
+                                 or os.environ.get("DGX_NO_THINKING")):
+            thinking = False
+        if thinking is False:
+            return {"reasoning": {"enabled": False}}
+        return {}
+
+
 # ── Claude Code (subscription) backend ──────────────────────────────────────
 #
 # Routes generation through the `claude` CLI in headless print mode (`claude -p`)
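`extra_body_for` decides, per call, whether to send OpenRouter's `reasoning` control. The decision table is easier to see as a pure function, so the sketch below restates it with the environment passed in explicitly. `reasoning_extra` is a hypothetical name. This patch does not show how the returned dict is attached to the request; that happens in the shared OpenAI-compat machinery.

```python
from __future__ import annotations

from collections.abc import Mapping


# Hypothetical pure-function restatement of _OpenRouterClient.extra_body_for
# (the unused resolved_model parameter is dropped).
def reasoning_extra(thinking: bool | None, env: Mapping[str, str]) -> dict:
    # Only an unspecified (None) request defers to the environment; any non-empty
    # value counts, so OPENROUTER_NO_THINKING=0 still disables reasoning.
    if thinking is None and (env.get("OPENROUTER_NO_THINKING") or env.get("DGX_NO_THINKING")):
        thinking = False
    if thinking is False:
        return {"reasoning": {"enabled": False}}
    return {}  # None or True: leave OpenRouter's default reasoning behaviour alone


if __name__ == "__main__":
    for thinking, env in [(None, {}), (None, {"DGX_NO_THINKING": "1"}),
                          (True, {"OPENROUTER_NO_THINKING": "1"}), (False, {})]:
        print(thinking, env, "->", reasoning_extra(thinking, env))
```

An explicit `thinking=True` overrides both environment variables. The environment variables apply only when the caller leaves `thinking` as `None`.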
diff --git a/campaignlib/api/client.py b/campaignlib/api/client.py
index 35ad684..d69131e 100644
--- a/campaignlib/api/client.py
+++ b/campaignlib/api/client.py
@@ -3,7 +3,29 @@
 import os
 import sys
 
-from .backends import _OpenAICompatClient, _ClaudeCodeClient
+from .backends import _OpenAICompatClient, _OpenRouterClient, _ClaudeCodeClient
+
+# Clients that accept the DGX-style `thinking` request extra (mapped per-backend
+# to the right knob: enable_thinking for vLLM, `reasoning` for OpenRouter). The
+# real Anthropic SDK would reject it, so it is only forwarded to these.
+_THINKING_EXTRA_CLIENTS = (_OpenAICompatClient, _OpenRouterClient)
+
+
+def _require_nonempty(text: str) -> str:
+    """Guard against a silently-empty model response (Constitution Principle I).
+
+    A reasoning model can spend its entire token budget on a thinking trace and
+    return empty content — which would otherwise be written to disk as a valid
+    (but empty) extraction/synthesis artifact. Fail loudly instead so the caller
+    aborts before persisting anything.
+    """
+    if text is None or not text.strip():
+        raise RuntimeError(
+            "model returned empty output (no content). On a reasoning model this "
+            "usually means the token budget was spent on a thinking trace — disable "
+            "thinking (DGX_NO_THINKING=1 / OPENROUTER_NO_THINKING=1) or raise max_tokens."
+        )
+    return text
 
 
 def make_client(endpoint: str | None = None, model_override: str | None = None,
@@ -30,6 +52,8 @@ def make_client(endpoint: str | None = None, model_override: str | None = None,
     backend = backend or os.environ.get("CG_BACKEND")
     if backend == "claude-code":
         return _ClaudeCodeClient(model_override=model_override)
+    if backend == "openrouter":
+        return _OpenRouterClient(model_override=model_override)
     endpoint = endpoint or os.environ.get("DGX_ENDPOINT")
     if endpoint:
         return _OpenAICompatClient(endpoint, model_override=model_override)
@@ -41,6 +65,37 @@ def make_client(endpoint: str | None = None, model_override: str | None = None,
     return anthropic.Anthropic()
 
 
+def add_backend_args(parser) -> None:
+    """Register the uniform --backend/--endpoint selection on a synthesis CLI.
+
+    Shared so every LLM-bearing script speaks the same backend vocabulary
+    (Constitution Principle V). Default is anthropic — see client_from_args for
+    the backward-compatibility contract.
+    """
+    parser.add_argument(
+        "--backend", choices=["anthropic", "dgx", "openrouter"], default="anthropic",
+        help="LLM backend (default: anthropic). 'dgx'/'openrouter' route through the "
+             "campaignlib seam; with no flag, behaviour is unchanged (Anthropic API).")
+    parser.add_argument(
+        "--endpoint", default=None, metavar="URL",
+        help="OpenAI-compatible endpoint for --backend dgx (OpenRouter uses its own base URL).")
+
+
+def client_from_args(args):
+    """Build a client from parsed --backend/--endpoint/--model args.
+
+    Backward-compatible: with the default ``--backend anthropic`` and no
+    ``--endpoint``, this resolves to ``make_client()`` exactly — env vars
+    (CG_BACKEND / DGX_ENDPOINT) still apply, so existing invocations are
+    byte-for-byte unchanged. For dgx/openrouter the chosen ``--model`` becomes the
+    seam's model override.
+    """
+    backend = None if getattr(args, "backend", "anthropic") == "anthropic" else args.backend
+    model_override = getattr(args, "model", None) if backend in ("dgx", "openrouter") else None
+    return make_client(backend=backend, endpoint=getattr(args, "endpoint", None),
+                       model_override=model_override)
+
+
 def _is_retryable(exc) -> bool:
     """Return True for transient API errors that are worth retrying."""
     try:
@@ -98,8 +153,8 @@ def call_api(client, system: str, content, model: str, max_tokens: int = 8096,
     """
     import time
     messages = [{"role": "user", "content": content}]
-    # `thinking` is a DGX-only knob; the real Anthropic SDK would reject it.
-    extra = {"thinking": thinking} if isinstance(client, _OpenAICompatClient) else {}
+    # `thinking` is a local/OpenRouter knob; the real Anthropic SDK would reject it.
+    extra = {"thinking": thinking} if isinstance(client, _THINKING_EXTRA_CLIENTS) else {}
     delays = [10, 20, 40]
     for attempt, delay in enumerate([-1] + delays):
         if delay >= 0:
@@ -114,7 +169,7 @@ def call_api(client, system: str, content, model: str, max_tokens: int = 8096,
                 messages=messages,
                 **extra,
             )
-            return response.content[0].text
+            return _require_nonempty(response.content[0].text)
         except Exception as e:
             if _is_retryable(e) and attempt < len(delays):
                 continue
@@ -186,8 +241,8 @@ def stream_api(client, system, user: str, model: str, max_tokens: int = 8096,
     else:
         system_arg = system
 
-    # `thinking` is a DGX-only knob; the real Anthropic SDK would reject it.
-    extra = {"thinking": thinking} if isinstance(client, _OpenAICompatClient) else {}
+    # `thinking` is a local/OpenRouter knob; the real Anthropic SDK would reject it.
+    extra = {"thinking": thinking} if isinstance(client, _THINKING_EXTRA_CLIENTS) else {}
     delays = [60, 120, 240]  # seconds to wait before each retry
     for attempt, delay in enumerate([-1] + delays):
         if delay >= 0:
@@ -209,7 +264,7 @@ def stream_api(client, system, user: str, model: str, max_tokens: int = 8096,
                     chunks.append(text)
             if not silent:
                 print()
-            return "".join(chunks)
+            return _require_nonempty("".join(chunks))
         except Exception as e:
             if _is_retryable(e) and attempt < len(delays):
                 continue
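After this change, `make_client` picks a backend in a fixed order. The sketch below restates only the branches visible in this patch and returns a label instead of a client. `resolve_backend` is a hypothetical name. The sketch assumes that the few `make_client` lines not shown, between the DGX branch and `return anthropic.Anthropic()`, do not add another branch.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


# Hypothetical restatement of make_client's visible branch order; returns a label.
def resolve_backend(backend: str | None = None, endpoint: str | None = None,
                    env: Mapping[str, str] | None = None) -> str:
    env = os.environ if env is None else env
    backend = backend or env.get("CG_BACKEND")      # explicit argument beats env var
    if backend == "claude-code":
        return "claude-code"                        # _ClaudeCodeClient
    if backend == "openrouter":
        return "openrouter"                         # _OpenRouterClient; endpoint ignored
    endpoint = endpoint or env.get("DGX_ENDPOINT")
    if endpoint:
        return "dgx"                                # _OpenAICompatClient
    return "anthropic"                              # anthropic.Anthropic()
```

Under that assumption, two consequences follow from this order:

- `--backend openrouter` wins even when `DGX_ENDPOINT` is exported.
- `backend="dgx"` has no branch of its own. With no `--endpoint` and no `DGX_ENDPOINT`, it falls through to the Anthropic client instead of failing.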
diff --git a/docs/cli/ensemble_workflow.md b/docs/cli/ensemble_workflow.md
index 0c55d33..12622fe 100644
--- a/docs/cli/ensemble_workflow.md
+++ b/docs/cli/ensemble_workflow.md
@@ -1,5 +1,13 @@
 # Ensemble extraction workflow
 
+> **Run this from the UI.** The Ensemble Workflow page (`/ensemble` in the web UI)
+> mechanizes this whole sequence — Setup → Extract → Bundle → Synthesize — with the
+> scope-review, alias-correction, and diff-before-promote checkpoints kept as gates
+> you satisfy in the CLI or a Claude chat. Each LLM-bearing stage is backend-selectable
+> (Anthropic / DGX-Spark / **OpenRouter**), chosen independently for extraction and
+> synthesis. The UI only invokes the same CLI commands documented below; nothing here
+> is bypassed. See `docs/web/web_ui.md` and `specs/001-ensemble-workflow-ui/`.
+
 End-to-end guide: from a set of chapter files to reviewed dossiers ready for synthesis into the four grounding docs (`world_state.md`, `campaign_state.md`, `party.md`, `planning.md`).
 
 The core insight is that **extraction is expensive and should happen once**. Running the Claude API inside each grounding-doc tool (the old path) re-extracts the same chapter text three or four times, spending 2.5–3.4M metered tokens per full refresh. The local ensemble approach extracts once on Spark hardware (~free), aggregates to per-entity dossiers, lets a human review scope, then calls the API only for the final synthesis per doc (~280K tokens total).
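The calculation below takes the token figures above at face value and counts only metered API tokens. It compares the ensemble path's cost with the old path's cost for one full refresh:

```math
\frac{280\,\text{K}}{2.5\,\text{M}} = 0.112,
\qquad
\frac{280\,\text{K}}{3.4\,\text{M}} \approx 0.082
```

The ensemble path therefore spends roughly 8 to 11 percent of the old path's metered tokens. That is about 89 to 92 percent less, or 9 to 12 times fewer.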
diff --git a/ensemble_batch.py b/ensemble_batch.py
index 694ce30..96990e2 100644
--- a/ensemble_batch.py
+++ b/ensemble_batch.py
@@ -32,8 +32,11 @@ def _build_parser():
         formatter_class=argparse.RawDescriptionHelpFormatter,
     )
     p.add_argument(
-        "--chapters", required=True, metavar="GLOB",
-        help="Glob for chapter files, e.g. 'docs/chapters/chapter_*.md'",
+        "--chapters", required=True, nargs="+", metavar="GLOB",
+        help="One or more globs or explicit chapter paths, e.g. "
+             "'docs/chapters/chapter_*.md' or a hand-picked subset "
+             "'docs/chapters/chapter_03.md docs/chapters/chapter_07.md'. "
+             "Matches are unioned, de-duplicated, and sorted.",
     )
     p.add_argument(
         "--per-chapter-dir", default="per_chapter", metavar="DIR",
@@ -133,9 +136,13 @@ def _build_ensemble_cmd(chapter: Path, workdir: Path, args) -> list[str]:
 def main():
     args = _build_parser().parse_args()
 
-    chapters = sorted(Path(p) for p in glob_module.glob(args.chapters))
+    matched: set[Path] = set()
+    for pattern in args.chapters:
+        for p in glob_module.glob(pattern):
+            matched.add(Path(p))
+    chapters = sorted(matched)
     if not chapters:
-        print(f"No chapter files matched: {args.chapters}", file=sys.stderr)
+        print(f"No chapter files matched: {' '.join(args.chapters)}", file=sys.stderr)
         sys.exit(1)
 
     per_chapter_dir = Path(args.per_chapter_dir)
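The new `--chapters` handling takes the union of several patterns. The standalone sketch below reproduces the loop from `main()` as a function, so its behaviour can be checked in isolation. `resolve_chapters` is a hypothetical name. Three behaviours follow from the loop:

- `pathlib` collapses a leading `./`, so `./docs/x.md` and `docs/x.md` de-duplicate to one entry.
- A literal path that does not exist matches nothing, without an error.
- Only an empty overall result is reported as an error.

```python
import glob
from pathlib import Path


# Hypothetical restatement of the --chapters resolution loop in ensemble_batch.main().
def resolve_chapters(patterns: list[str]) -> list[Path]:
    """Union the matches of every glob (or literal path), de-duplicated and sorted."""
    matched: set[Path] = set()
    for pattern in patterns:
        for p in glob.glob(pattern):
            matched.add(Path(p))  # Path() normalises "./a.md" and "a.md" to the same key
    return sorted(matched)
```

Usage example: `python ensemble_batch.py --chapters 'docs/chapters/chapter_0[1-3].md' docs/chapters/chapter_07.md`. Quoting the glob lets the script expand it. An unquoted glob also works, because the shell passes the expanded names as explicit paths.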
diff --git a/frontend/src/components/layout/AppSidebar.vue b/frontend/src/components/layout/AppSidebar.vue
index d186e63..b49f875 100644
--- a/frontend/src/components/layout/AppSidebar.vue
+++ b/frontend/src/components/layout/AppSidebar.vue
@@ -49,6 +49,12 @@ const navGroups: NavGroup[] = [
       { label: 'Planning Document', path: '/grounding/planning' },
     ],
   },
+  {
+    title: 'ENSEMBLE WORKFLOW',
+    items: [
+      { label: 'Ensemble Grounding Docs', path: '/ensemble/setup' },
+    ],
+  },
   {
     title: 'PREP',
     items: [
diff --git a/frontend/src/router.ts b/frontend/src/router.ts
index c994c16..cb7669a 100644
--- a/frontend/src/router.ts
+++ b/frontend/src/router.ts
@@ -64,6 +64,33 @@ const routes = [
       },
     ],
   },
+  {
+    path: '/ensemble',
+    component: () => import('./views/EnsembleWorkflow.vue'),
+    children: [
+      { path: '', redirect: '/ensemble/setup' },
+      {
+        path: 'setup',
+        name: 'ensemble-setup',
+        component: () => import('./views/ensemble/EnsembleSetup.vue'),
+      },
+      {
+        path: 'extract',
+        name: 'ensemble-extract',
+        component: () => import('./views/ensemble/EnsembleExtract.vue'),
+      },
+      {
+        path: 'bundle',
+        name: 'ensemble-bundle',
+        component: () => import('./views/ensemble/EnsembleBundle.vue'),
+      },
+      {
+        path: 'synthesize',
+        name: 'ensemble-synthesize',
+        component: () => import('./views/ensemble/EnsembleSynthesize.vue'),
+      },
+    ],
+  },
   {
     path: '/prep',
     component: () => import('./views/PrepTools.vue'),
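Several paths are hard-coded outside the router: the new sidebar entry and the wizard's step list in `EnsembleWorkflow.vue`. Each of them must exist as a child of the `/ensemble` route. A small consistency check can catch drift between them. This is a hypothetical helper, not something the patch adds. `RouteLike` is an assumed, simplified subset of a vue-router route record, and `flattenPaths` and `missingTargets` are illustrative names.

```typescript
// Simplified subset of a vue-router route record; only the fields this check needs.
interface RouteLike {
  path: string;
  redirect?: string;
  children?: RouteLike[];
}

// Child paths without a leading "/" are relative to their parent; "" is the parent itself.
function joinPath(parent: string, child: string): string {
  if (child.startsWith("/")) return child;
  if (child === "") return parent;
  return `${parent.replace(/\/$/, "")}/${child}`;
}

function flattenPaths(routes: RouteLike[], parent = ""): string[] {
  const out: string[] = [];
  for (const r of routes) {
    const full = joinPath(parent, r.path);
    if (!r.redirect) out.push(full); // redirect-only records are not destinations
    if (r.children) out.push(...flattenPaths(r.children, full));
  }
  return out;
}

// Returns every link path that no route record resolves to.
function missingTargets(linkPaths: string[], routes: RouteLike[]): string[] {
  const known = new Set(flattenPaths(routes));
  return linkPaths.filter((p) => !known.has(p));
}

// Mirrors the /ensemble block added to router.ts (components omitted).
const routes: RouteLike[] = [
  {
    path: "/ensemble",
    children: [
      { path: "", redirect: "/ensemble/setup" },
      { path: "setup" },
      { path: "extract" },
      { path: "bundle" },
      { path: "synthesize" },
    ],
  },
];

// The sidebar link plus the four wizard steps.
const links = ["/ensemble/setup", "/ensemble/extract", "/ensemble/bundle", "/ensemble/synthesize"];
console.log(missingTargets(links, routes));
```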
diff --git a/frontend/src/views/EnsembleWorkflow.vue b/frontend/src/views/EnsembleWorkflow.vue
new file mode 100644
index 0000000..923c829
--- /dev/null
+++ b/frontend/src/views/EnsembleWorkflow.vue
@@ -0,0 +1,74 @@
+<script setup lang="ts">
+import { ref, watch, onMounted } from 'vue'
+import { useRoute } from 'vue-router'
+import WizardShell from '../components/wizard/WizardShell.vue'
+import type { WizardStep } from '../components/wizard/WizardShell.vue'
+import { apiFetch } from '../api/client'
+
+const steps: WizardStep[] = [
+  { number: 1, label: 'Setup', path: '/ensemble/setup' },
+  { number: 2, label: 'Extract', path: '/ensemble/extract' },
+  { number: 3, label: 'Bundle', path: '/ensemble/bundle' },
+  { number: 4, label: 'Synthesize', path: '/ensemble/synthesize' },
+]
+
+// Disk-derived stage status (FR-002). Recomputed on mount and on navigation, so
+// it reflects work done from the CLI/chat and survives reload — never cached in
+// the browser.
+const stages = ref<any[]>([])
+const currentStage = ref('')
+const route = useRoute()
+
+async function loadStatus() {
+  try {
+    const s = await apiFetch('/api/ensemble/status')
+    stages.value = s.stages ?? []
+    currentStage.value = s.current_stage ?? ''
+  } catch {
+    stages.value = []
+  }
+}
+
+onMounted(loadStatus)
+watch(() => route.path, loadStatus)
+</script>
+
+<template>
+  <WizardShell :steps="steps">
+    <div class="ensemble-status">
+      <span class="status-label">PIPELINE STATE (from disk):</span>
+      <span
+        v-for="s in stages"
+        :key="s.id"
+        class="status-chip"
+        :class="{ complete: s.status === 'complete', current: s.id === currentStage }"
+      >{{ s.id }}: {{ s.status === 'complete' ? '✓' : '—' }}</span>
+      <button class="btn-neutral btn-sm refresh" @click="loadStatus">↻</button>
+    </div>
+    <router-view @changed="loadStatus" />
+  </WizardShell>
+</template>
+
+<style scoped>
+.ensemble-status {
+  display: flex;
+  align-items: center;
+  gap: 8px;
+  flex-wrap: wrap;
+  padding: 8px 14px;
+  background: var(--bg-mantle);
+  border-bottom: 1px solid var(--bg-surface0);
+  font-size: 11px;
+}
+.status-label { font-weight: 700; color: var(--text-muted); letter-spacing: .04em; }
+.status-chip {
+  font-family: var(--mono);
+  padding: 2px 8px;
+  border-radius: 10px;
+  background: var(--bg-surface0);
+  color: var(--text-muted);
+}
+.status-chip.complete { color: var(--green); }
+.status-chip.current { border: 1px solid var(--mauve); color: var(--text); }
+.refresh { margin-left: auto; }
+</style>
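`EnsembleWorkflow.vue` types the status payload as `any[]`. The sketch below narrows it instead. It assumes the response carries `stages` (each entry with `id` and `status`) and `current_stage`; that shape is inferred only from the fields the template reads. `parseStatus` and `isComplete` are hypothetical names.

```typescript
interface StageStatus {
  id: string;
  status: string;
}

interface EnsembleStatus {
  stages: StageStatus[];
  currentStage: string;
}

// Narrow an untyped /api/ensemble/status payload. Missing or malformed pieces
// degrade to empty values, in the spirit of the component's `?? []` fallbacks.
function parseStatus(raw: unknown): EnsembleStatus {
  const obj = (typeof raw === "object" && raw !== null ? raw : {}) as Record<string, unknown>;
  const stages: StageStatus[] = [];
  if (Array.isArray(obj.stages)) {
    for (const s of obj.stages) {
      if (typeof s?.id !== "string") continue; // an entry without an id cannot be keyed
      stages.push({ id: s.id, status: typeof s.status === "string" ? s.status : "unknown" });
    }
  }
  const currentStage = typeof obj.current_stage === "string" ? obj.current_stage : "";
  return { stages, currentStage };
}

const isComplete = (s: StageStatus): boolean => s.status === "complete";

const parsed = parseStatus({
  stages: [{ id: "extract", status: "complete" }, { id: "bundle" }, { status: "complete" }],
  current_stage: "bundle",
});
console.log(parsed.stages.map((s) => `${s.id}: ${isComplete(s) ? "done" : "pending"}`));
```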
diff --git a/frontend/src/views/ensemble/ChapterPicker.vue b/frontend/src/views/ensemble/ChapterPicker.vue
new file mode 100644
index 0000000..2cb864e
--- /dev/null
+++ b/frontend/src/views/ensemble/ChapterPicker.vue
@@ -0,0 +1,154 @@
+<script setup lang="ts">
+import { ref, computed, watch, onMounted } from 'vue'
+import { apiFetch } from '../../api/client'
+
+interface Chapter {
+  path: string
+  stem: string
+  size: number
+  extracted: boolean
+}
+
+const props = defineProps<{
+  glob: string
+  /** Explicitly selected chapter paths. Empty == nothing picked; extraction refuses to run. */
+  selected: string[]
+}>()
+const emit = defineEmits<{
+  'update:glob': [v: string]
+  'update:selected': [v: string[]]
+}>()
+
+const chapters = ref<Chapter[]>([])
+const loading = ref(false)
+const error = ref('')
+const sortDir = ref<'asc' | 'desc'>('asc')
+
+// Local editable copy of the glob so typing doesn't re-resolve on every keystroke.
+const globText = ref(props.glob)
+watch(() => props.glob, (v) => { if (v !== globText.value.trim()) { globText.value = v; resolve() } })
+
+const collator = new Intl.Collator(undefined, { numeric: true, sensitivity: 'base' })
+const sorted = computed(() => {
+  const c = [...chapters.value].sort((a, b) => collator.compare(a.stem, b.stem))
+  return sortDir.value === 'asc' ? c : c.reverse()
+})
+
+// The checked set IS the selection. Empty == no explicit pick, and extraction
+// refuses to run (warned below). There is no implicit "empty == all" fallback.
+const checked = computed<Set<string>>(() => new Set(props.selected))
+const selectedCount = computed(() => chapters.value.filter(c => checked.value.has(c.path)).length)
+
+async function resolve() {
+  const g = globText.value.trim()
+  if (g !== props.glob) emit('update:glob', g)
+  if (!g) { chapters.value = []; return }
+  loading.value = true
+  error.value = ''
+  try {
+    // Allow several whitespace/newline-separated globs in the one field.
+    const params = g.split(/\s+/).filter(Boolean)
+      .map(p => `glob=${encodeURIComponent(p)}`).join('&')
+    const r = await apiFetch(`/api/ensemble/chapters?${params}`)
+    chapters.value = r.chapters || []
+    // Drop any persisted selections that no longer resolve.
+    if (props.selected.length) {
+      const live = new Set(chapters.value.map(c => c.path))
+      const pruned = props.selected.filter(p => live.has(p))
+      if (pruned.length !== props.selected.length) emit('update:selected', pruned)
+    }
+  } catch (e: any) {
+    error.value = e?.message || 'failed to resolve glob'
+    chapters.value = []
+  } finally {
+    loading.value = false
+  }
+}
+
+function toggle(path: string) {
+  const next = new Set(checked.value)
+  if (next.has(path)) next.delete(path); else next.add(path)
+  emit('update:selected', [...next])
+}
+
+function selectAll() { emit('update:selected', chapters.value.map(c => c.path)) }
+function selectNone() { emit('update:selected', []) }
+function only(path: string) { emit('update:selected', [path]) }
+
+onMounted(resolve)
+</script>
+
+<template>
+  <div class="picker">
+    <label class="fld">
+      <span>Chapters glob <small>(space-separate for several; e.g. <code>docs/chapters/chapter_*.md</code>)</small></span>
+      <div class="globrow">
+        <input v-model="globText" type="text" @keyup.enter="resolve" />
+        <button class="btn-neutral btn-sm" :disabled="loading" @click="resolve">
+          {{ loading ? 'Resolving…' : 'Resolve' }}
+        </button>
+      </div>
+    </label>
+
+    <p v-if="error" class="err">{{ error }}</p>
+
+    <div v-if="chapters.length" class="toolbar">
+      <button class="btn-neutral btn-sm" @click="selectAll">Select all</button>
+      <button class="btn-neutral btn-sm" @click="selectNone">Select none</button>
+      <button class="btn-neutral btn-sm"
+              @click="sortDir = sortDir === 'asc' ? 'desc' : 'asc'">
+        Sort {{ sortDir === 'asc' ? '▲ A→Z' : '▼ Z→A' }}
+      </button>
+      <span class="count">{{ selectedCount }} / {{ chapters.length }} selected</span>
+    </div>
+
+    <ul v-if="chapters.length" class="list">
+      <li v-for="c in sorted" :key="c.path" :class="{ on: checked.has(c.path) }">
+        <label class="row">
+          <input type="checkbox" :checked="checked.has(c.path)" @change="toggle(c.path)" />
+          <span class="name">{{ c.stem }}</span>
+          <span v-if="c.extracted" class="badge done" title="merged.json exists on disk">extracted</span>
+          <span v-else class="badge pending">pending</span>
+        </label>
+        <button class="only" title="Select only this chapter" @click="only(c.path)">only</button>
+      </li>
+    </ul>
+    <p v-else-if="!loading && !error" class="hint">No chapters match this glob.</p>
+
+    <p v-if="!selected.length && chapters.length" class="hint warnnote">
+      Nothing selected — extraction will <strong>refuse to run</strong>. Tick the
+      chapters you want, or click <strong>Select all</strong> to choose every one.
+    </p>
+  </div>
+</template>
+
+<style scoped>
+.picker { margin-bottom: 12px; }
+.fld { display: block; font-size: 12px; }
+.fld > span { display: block; margin-bottom: 3px; color: var(--text-sub); }
+.fld small { color: var(--text-muted); font-weight: 400; }
+.globrow { display: flex; gap: 8px; }
+.globrow input {
+  flex: 1; max-width: 520px; font-size: 12px; padding: 5px 7px;
+  background: var(--bg-surface0); color: var(--text);
+  border: 1px solid var(--bg-surface1); border-radius: 4px; font-family: var(--mono);
+}
+.toolbar { display: flex; align-items: center; gap: 8px; margin: 8px 0 6px; }
+.count { font-size: 11px; color: var(--text-muted); margin-left: auto; }
+.list { list-style: none; margin: 0; padding: 0; max-height: 260px; overflow-y: auto;
+  border: 1px solid var(--bg-surface1); border-radius: 5px; }
+.list li { display: flex; align-items: center; padding: 3px 8px; border-bottom: 1px solid var(--bg-surface0); }
+.list li:last-child { border-bottom: none; }
+.list li.on { background: var(--bg-surface0); }
+.row { display: flex; align-items: center; gap: 8px; flex: 1; font-size: 12px; cursor: pointer; }
+.name { font-family: var(--mono); }
+.badge { font-size: 9px; border-radius: 8px; padding: 1px 7px; font-weight: 700; }
+.badge.done { background: var(--green); color: var(--bg-mantle); }
+.badge.pending { background: var(--bg-surface1); color: var(--text-muted); }
+.only { font-size: 10px; background: none; border: 1px solid var(--bg-surface1); color: var(--text-muted);
+  border-radius: 4px; padding: 1px 7px; cursor: pointer; }
+.only:hover { color: var(--text); border-color: var(--mauve); }
+.hint { font-size: 11px; color: var(--text-muted); margin: 6px 0 0; }
+.hint.warnnote { color: var(--peach); }
+.err { font-size: 12px; color: var(--red); }
+</style>
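Two pieces of `resolve()` carry most of the picker's logic. The sketch below pulls them out as standalone functions; the names are hypothetical. `URLSearchParams.append` produces the same repeated `glob` keys as the component's manual `encodeURIComponent` join. The decoded values are identical, although a few punctuation characters are escaped differently. The server is assumed to read the repeated keys as a list.

```typescript
// One repeated `glob` key per whitespace- or newline-separated pattern.
function buildGlobQuery(field: string): string {
  const params = new URLSearchParams();
  for (const pattern of field.split(/\s+/).filter(Boolean)) {
    params.append("glob", pattern);
  }
  return params.toString();
}

// Keep only persisted selections that still resolve, preserving their order.
function pruneSelection(selected: string[], resolved: string[]): string[] {
  const live = new Set(resolved);
  return selected.filter((p) => live.has(p));
}

const query = buildGlobQuery("  docs/chapters/chapter_*.md\n docs/extra/*.md ");
console.log(new URLSearchParams(query).getAll("glob"));
console.log(pruneSelection(["ch_01.md", "ch_02.md", "gone.md"], ["ch_02.md", "ch_01.md"]));
```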
diff --git a/frontend/src/views/ensemble/EnsembleBundle.vue b/frontend/src/views/ensemble/EnsembleBundle.vue
new file mode 100644
index 0000000..b32e731
--- /dev/null
+++ b/frontend/src/views/ensemble/EnsembleBundle.vue
@@ -0,0 +1,156 @@
+<script setup lang="ts">
+import { ref, onMounted } from 'vue'
+import { useConfigStore } from '../../stores/config'
+import { apiFetch, apiPut } from '../../api/client'
+import { useEnsembleRun, readEnsembleConfig, type EnsembleConfig } from './useEnsembleRun'
+import StreamOutput from '../../components/shared/StreamOutput.vue'
+
+const emit = defineEmits<{ changed: [] }>()
+const config = useConfigStore()
+const cfg = ref<EnsembleConfig>(readEnsembleConfig({}))
+const listRun = useEnsembleRun()
+const aggRun = useEnsembleRun()
+const threadsRun = useEnsembleRun()
+
+// Gate state — aggregation is blocked until the operator confirms they reviewed
+// scope + aliases (Principle II: no precision decision auto-fed downstream).
+const gateConfirmed = ref(false)
+const aliasContent = ref('')
+const aliasLoaded = ref(false)
+
+onMounted(async () => {
+  await config.load()
+  cfg.value = readEnsembleConfig(config.resolved)
+  await loadAliases()
+})
+
+function commonParams() {
+  return {
+    corpus: 'docs/ensemble/per_chapter/*/merged.json',
+    aliases: cfg.value.aliases_path,
+    known_names: cfg.value.known_names,
+    min_facts: 3,
+  }
+}
+
+function runList() {
+  listRun.run('/api/ensemble/run/bundle', { ...commonParams(), list: true })
+}
+
+async function loadAliases() {
+  if (!cfg.value.aliases_path) { aliasLoaded.value = false; return }
+  try {
+    const r = await apiFetch(`/api/ensemble/file?path=${encodeURIComponent(cfg.value.aliases_path)}`)
+    aliasContent.value = r.content ?? ''
+    aliasLoaded.value = true
+  } catch {
+    aliasContent.value = ''
+    aliasLoaded.value = false
+  }
+}
+
+async function saveAliases() {
+  await apiPut(`/api/ensemble/file?path=${encodeURIComponent(cfg.value.aliases_path)}`,
+              { content: aliasContent.value })
+}
+
+function runAggregate() {
+  if (!gateConfirmed.value) return
+  aggRun.run('/api/ensemble/run/bundle', {
+    ...commonParams(),
+    known_only: true,
+    out_dir: 'docs/ensemble/state_dossiers',
+    backend: cfg.value.extract.backend,
+    endpoint: cfg.value.extract.endpoint,
+    model: cfg.value.extract.model,
+  }, (rc) => { if (rc === 0) emit('changed') })
+}
+
+function runThreads() {
+  threadsRun.run('/api/ensemble/run/threads', {
+    corpus: 'docs/ensemble/per_chapter/*/merged.json',
+    aliases: cfg.value.aliases_path,
+    output: 'docs/ensemble/threads.md',
+  })
+}
+</script>
+
+<template>
+  <div class="step">
+    <h2>Stage 2 — Fact bundling</h2>
+
+    <!-- Gate 1: scope review (--list, no model) -->
+    <section class="gate">
+      <h3>① Scope review <span class="tag">human checkpoint</span></h3>
+      <p class="hint">
+        List the entity universe and the known/location split before spending model
+        time. No model call. Review which names are <code>[known]</code> vs
+        <code>[location]</code>-scoped — this is a precision decision; you may also
+        run <code>facts_to_state.py --list</code> at the CLI.
+      </p>
+      <button class="btn-neutral" :disabled="listRun.status.value === 'running'" @click="runList">
+        {{ listRun.status.value === 'running' ? 'Listing…' : 'Run scope list' }}
+      </button>
+      <StreamOutput v-if="listRun.output.value" :text="listRun.output.value" />
+    </section>
+
+    <!-- Gate 2: alias correction -->
+    <section class="gate">
+      <h3>② Alias correction <span class="tag">human checkpoint</span></h3>
+      <p class="hint" v-if="cfg.aliases_path">
+        Edit <code>{{ cfg.aliases_path }}</code> here, or in the CLI/chat and click
+        Reload — changes are reflected without re-running any LLM step.
+      </p>
+      <p class="hint warn" v-else>Set an aliases path on the Setup step to use this gate.</p>
+      <template v-if="cfg.aliases_path">
+        <textarea v-model="aliasContent" rows="6" class="alias-box"
+                  placeholder='{ "Canonical Name": ["variant1", "variant2"] }'></textarea>
+        <div class="controls">
+          <button class="btn-neutral btn-sm" @click="loadAliases">↻ Reload from disk</button>
+          <button class="btn-success btn-sm" @click="saveAliases">Save aliases</button>
+        </div>
+      </template>
+    </section>
+
+    <!-- Gate confirm + aggregate -->
+    <section class="gate">
+      <h3>③ Aggregate dossiers</h3>
+      <label class="confirm">
+        <input type="checkbox" v-model="gateConfirmed" />
+        I reviewed the scope list and corrected aliases.
+      </label>
+      <p class="hint">
+        Runs <code>facts_to_state.py --known-only</code> → <code>state_dossiers/*.md</code>.
+        Backend: <strong>{{ cfg.extract.backend }}</strong>. Resumable.
+      </p>
+      <div class="controls">
+        <button class="btn-success" :disabled="!gateConfirmed || aggRun.status.value === 'running'"
+                @click="runAggregate">
+          {{ aggRun.status.value === 'running' ? 'Aggregating…' : '▶ Aggregate' }}
+        </button>
+        <span v-if="aggRun.returnCode.value !== null"
+              :class="aggRun.returnCode.value === 0 ? 'ok' : 'err'">
+          {{ aggRun.returnCode.value === 0 ? 'Done' : `Exit ${aggRun.returnCode.value}` }}
+        </span>
+        <button class="btn-neutral btn-sm" @click="runThreads">Render threads.md</button>
+      </div>
+      <StreamOutput v-if="aggRun.output.value" :text="aggRun.output.value" />
+      <StreamOutput v-if="threadsRun.output.value" :text="threadsRun.output.value" />
+    </section>
+  </div>
+</template>
+
+<style scoped>
+.step { padding: 16px 20px; overflow-y: auto; }
+h2 { font-size: 16px; margin-bottom: 10px; }
+h3 { font-size: 13px; margin-bottom: 4px; }
+.gate { border: 1px solid var(--bg-surface1); border-radius: 6px; padding: 12px 14px; margin-bottom: 14px; }
+.tag { font-size: 9px; background: var(--peach); color: var(--bg-mantle); border-radius: 8px; padding: 1px 7px; margin-left: 6px; font-weight: 700; vertical-align: middle; }
+.hint { font-size: 12px; color: var(--text-muted); margin: 4px 0 8px; max-width: 64ch; }
+.hint.warn { color: var(--peach); }
+.alias-box { width: 100%; max-width: 640px; font-family: var(--mono); font-size: 12px; padding: 6px 8px; background: var(--bg-surface0); color: var(--text); border: 1px solid var(--bg-surface1); border-radius: 4px; }
+.controls { display: flex; align-items: center; gap: 10px; margin: 8px 0; }
+.confirm { display: flex; align-items: center; gap: 6px; font-size: 12px; margin-bottom: 6px; }
+.ok { color: var(--green); font-size: 12px; font-weight: 600; }
+.err { color: var(--red); font-size: 12px; font-weight: 600; }
+</style>
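`saveAliases()` writes the textarea straight to disk, so a typo in the JSON only surfaces when `facts_to_state.py` reads the file. The sketch below is a minimal client-side check. It assumes the file follows the `{ "Canonical Name": [variants] }` shape shown in the textarea placeholder. `parseAliases` and `buildLookup` are hypothetical helpers, and the case-insensitive matching is an assumption; `build_alias_normalizer` does not necessarily behave this way.

```typescript
type AliasMap = Record<string, string[]>;

// Validate the `{ "Canonical Name": ["variant1", "variant2"] }` shape before writing.
function parseAliases(text: string): AliasMap {
  const data: unknown = JSON.parse(text); // throws SyntaxError on malformed JSON
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("aliases must be a JSON object");
  }
  const out: AliasMap = {};
  for (const [canonical, variants] of Object.entries(data)) {
    if (!Array.isArray(variants) || !variants.every((v) => typeof v === "string")) {
      throw new Error(`variants for "${canonical}" must be an array of strings`);
    }
    out[canonical] = variants;
  }
  return out;
}

// Hypothetical lookup: every variant (and the canonical name itself) maps to
// the canonical name, compared case-insensitively.
function buildLookup(aliases: AliasMap): Map<string, string> {
  const lookup = new Map<string, string>();
  for (const [canonical, variants] of Object.entries(aliases)) {
    lookup.set(canonical.toLowerCase(), canonical);
    for (const v of variants) lookup.set(v.toLowerCase(), canonical);
  }
  return lookup;
}

// Unknown names pass through unchanged.
function normalize(lookup: Map<string, string>, name: string): string {
  return lookup.get(name.toLowerCase()) ?? name;
}

const lookup = buildLookup(parseAliases('{ "Lady Vess": ["Vess", "the Widow"] }'));
console.log(normalize(lookup, "the widow"));
console.log(normalize(lookup, "Captain Orl"));
```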
diff --git a/frontend/src/views/ensemble/EnsembleExtract.vue b/frontend/src/views/ensemble/EnsembleExtract.vue
new file mode 100644
index 0000000..b2ad420
--- /dev/null
+++ b/frontend/src/views/ensemble/EnsembleExtract.vue
@@ -0,0 +1,82 @@
+<script setup lang="ts">
+import { ref, onMounted, computed } from 'vue'
+import { useConfigStore } from '../../stores/config'
+import { useEnsembleRun, readEnsembleConfig, type EnsembleConfig } from './useEnsembleRun'
+import StreamOutput from '../../components/shared/StreamOutput.vue'
+import ChapterPicker from './ChapterPicker.vue'
+
+const emit = defineEmits<{ changed: [] }>()
+const config = useConfigStore()
+const cfg = ref<EnsembleConfig>(readEnsembleConfig({}))
+const { output, status, returnCode, run, clear } = useEnsembleRun()
+
+onMounted(async () => {
+  await config.load()
+  cfg.value = readEnsembleConfig(config.resolved)
+})
+
+const backendLabel = computed(() => cfg.value.extract.backend)
+// Principle X — what runs is exactly what was explicitly selected. No glob
+// fallback: an empty selection cannot start a run.
+const selectedCount = computed(() => cfg.value.chapters_selected.length)
+const canRun = computed(() => selectedCount.value > 0)
+
+async function persistChapters() {
+  await config.updateSection('ensemble', {
+    chapters_glob: cfg.value.chapters_glob,
+    chapters_selected: cfg.value.chapters_selected,
+  })
+}
+
+function start() {
+  if (!canRun.value) return
+  run('/api/ensemble/run/extract', {
+    chapters: cfg.value.chapters_selected,
+    backend: cfg.value.extract.backend,
+    endpoint: cfg.value.extract.endpoint,
+    model: cfg.value.extract.model,
+  }, (rc) => { if (rc === 0) emit('changed') })
+}
+</script>
+
+<template>
+  <div class="step">
+    <h2>Stage 1 — Extraction</h2>
+    <p class="hint">
+      Runs <code>ensemble_batch.py</code> over the chapters you pick below. Resumable:
+      chapters already extracted are skipped. Backend: <strong>{{ backendLabel }}</strong>
+      (change it on the Setup step). Writes
+      <code>docs/ensemble/per_chapter/*/merged.json</code>.
+    </p>
+
+    <ChapterPicker
+      v-model:glob="cfg.chapters_glob"
+      v-model:selected="cfg.chapters_selected"
+      @update:glob="persistChapters"
+      @update:selected="persistChapters" />
+
+    <div class="controls">
+      <button class="btn-success" :disabled="status === 'running' || !canRun" @click="start">
+        {{ status === 'running' ? 'Running…'
+           : canRun ? `▶ Run extraction (${selectedCount})` : '▶ Run extraction' }}
+      </button>
+      <span v-if="!canRun" class="need">Select at least one chapter to run.</span>
+      <span v-if="returnCode !== null" :class="returnCode === 0 ? 'ok' : 'err'">
+        {{ returnCode === 0 ? 'Done' : `Exit ${returnCode}` }}
+      </span>
+      <span style="flex:1"></span>
+      <button v-if="output" class="btn-neutral btn-sm" @click="clear">Clear</button>
+    </div>
+    <StreamOutput v-if="output" :text="output" />
+  </div>
+</template>
+
+<style scoped>
+.step { padding: 16px 20px; overflow-y: auto; display: flex; flex-direction: column; }
+h2 { font-size: 16px; margin-bottom: 6px; }
+.hint { font-size: 12px; color: var(--text-muted); margin-bottom: 12px; max-width: 64ch; }
+.controls { display: flex; align-items: center; gap: 10px; margin-bottom: 10px; }
+.ok { color: var(--green); font-size: 12px; font-weight: 600; }
+.err { color: var(--red); font-size: 12px; font-weight: 600; }
+.need { color: var(--peach); font-size: 12px; }
+</style>
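The Extract step enforces one rule: an empty selection cannot start a run. It also documents another: already-extracted chapters are skipped. The sketch below previews both before launching. It assumes the batch's skip rule lines up with the picker's `extracted` flag (merged.json on disk), which this patch does not show directly. `planExtraction` and its types are hypothetical.

```typescript
interface ChapterInfo {
  path: string;
  extracted: boolean; // merged.json already exists on disk
}

interface ExtractPlan {
  toRun: string[];
  skipped: string[];
}

// Returns null for an empty selection: there is no fallback to the full glob.
function planExtraction(selected: string[], chapters: ChapterInfo[]): ExtractPlan | null {
  if (selected.length === 0) return null;
  const byPath = new Map<string, ChapterInfo>();
  for (const c of chapters) byPath.set(c.path, c);
  const plan: ExtractPlan = { toRun: [], skipped: [] };
  for (const p of selected) {
    // Paths missing from the listing are treated as pending; the batch tool decides for real.
    if (byPath.get(p)?.extracted) plan.skipped.push(p);
    else plan.toRun.push(p);
  }
  return plan;
}

const listing: ChapterInfo[] = [
  { path: "docs/chapters/chapter_01.md", extracted: true },
  { path: "docs/chapters/chapter_02.md", extracted: false },
];
console.log(planExtraction([], listing));
console.log(planExtraction(["docs/chapters/chapter_01.md", "docs/chapters/chapter_02.md"], listing));
```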
diff --git a/frontend/src/views/ensemble/EnsembleSetup.vue b/frontend/src/views/ensemble/EnsembleSetup.vue
new file mode 100644
index 0000000..99009c5
--- /dev/null
+++ b/frontend/src/views/ensemble/EnsembleSetup.vue
@@ -0,0 +1,112 @@
+<script setup lang="ts">
+import { ref, onMounted } from 'vue'
+import { useConfigStore } from '../../stores/config'
+import { readEnsembleConfig, type EnsembleConfig } from './useEnsembleRun'
+import ChapterPicker from './ChapterPicker.vue'
+
+const config = useConfigStore()
+const cfg = ref<EnsembleConfig>(readEnsembleConfig({}))
+const knownNamesText = ref('')
+const saved = ref(false)
+
+onMounted(async () => {
+  await config.load()
+  cfg.value = readEnsembleConfig(config.resolved)
+  knownNamesText.value = cfg.value.known_names.join('\n')
+})
+
+async function save() {
+  cfg.value.known_names = knownNamesText.value.split('\n').map(s => s.trim()).filter(Boolean)
+  await config.updateSection('ensemble', {
+    campaign_dir: cfg.value.campaign_dir,
+    chapters_glob: cfg.value.chapters_glob,
+    chapters_selected: cfg.value.chapters_selected,
+    extract: cfg.value.extract,
+    synthesize: cfg.value.synthesize,
+    known_names: cfg.value.known_names,
+    aliases_path: cfg.value.aliases_path,
+  })
+  saved.value = true
+  setTimeout(() => (saved.value = false), 1500)
+}
+</script>
+
+<template>
+  <div class="step">
+    <h2>Ensemble Setup</h2>
+    <p class="hint">
+      Point at your inputs and pick a backend for each LLM-bearing stage. Extraction
+      and synthesis are chosen independently. Files on disk are the source of truth —
+      this only records your selections.
+    </p>
+
+    <div class="fld">
+      <span>Chapters</span>
+      <ChapterPicker
+        v-model:glob="cfg.chapters_glob"
+        v-model:selected="cfg.chapters_selected" />
+    </div>
+
+    <label class="fld">
+      <span>Known-names sources (one path per line — module inventory, <code>.dedup_state.json</code>)</span>
+      <textarea v-model="knownNamesText" rows="3"
+                placeholder="docs/background/module-inventory.md&#10;docs/npcs/.dedup_state.json"></textarea>
+    </label>
+
+    <label class="fld">
+      <span>Aliases file (the alias-correction gate edits this)</span>
+      <input v-model="cfg.aliases_path" type="text" placeholder="docs/ensemble/aliases.json" />
+    </label>
+
+    <div class="profiles">
+      <fieldset v-for="stage in (['extract','synthesize'] as const)" :key="stage">
+        <legend>{{ stage === 'extract' ? 'Extraction backend' : 'Synthesis backend' }}</legend>
+        <label class="fld">
+          <span>Backend</span>
+          <select v-model="cfg[stage].backend">
+            <option value="anthropic">Anthropic (Claude)</option>
+            <option value="dgx">DGX / Spark (local)</option>
+            <option value="openrouter">OpenRouter</option>
+          </select>
+        </label>
+        <label class="fld" v-if="cfg[stage].backend === 'dgx'">
+          <span>Endpoint</span>
+          <input v-model="cfg[stage].endpoint" type="text" placeholder="http://192.168.1.147:8001/v1" />
+        </label>
+        <label class="fld" v-if="cfg[stage].backend !== 'anthropic'">
+          <span>Model id</span>
+          <input v-model="cfg[stage].model" type="text"
+                 :placeholder="cfg[stage].backend === 'openrouter' ? 'anthropic/claude-sonnet-4' : 'Qwen/Qwen3-Next-80B-A3B-Instruct-FP8'" />
+        </label>
+        <p v-if="stage === 'synthesize' && cfg.synthesize.backend !== 'anthropic'" class="warn-note">
+          Synthesis assumes a model at least as capable as Sonnet; weak local models
+          underperform here (you'll get a warning at run time, not a block).
+        </p>
+      </fieldset>
+    </div>
+
+    <div class="actions">
+      <button class="btn-success" @click="save">Save selections</button>
+      <span v-if="saved" class="ok">Saved</span>
+    </div>
+  </div>
+</template>
+
+<style scoped>
+.step { padding: 16px 20px; overflow-y: auto; }
+h2 { font-size: 16px; margin-bottom: 6px; }
+.hint { font-size: 12px; color: var(--text-muted); margin-bottom: 14px; max-width: 60ch; }
+.fld { display: block; margin-bottom: 10px; font-size: 12px; }
+.fld > span { display: block; margin-bottom: 3px; color: var(--text-sub); }
+.fld input, .fld textarea, .fld select {
+  width: 100%; max-width: 560px; font-size: 12px; padding: 5px 7px;
+  background: var(--bg-surface0); color: var(--text);
+  border: 1px solid var(--bg-surface1); border-radius: 4px; font-family: var(--mono);
+}
+.profiles { display: flex; gap: 16px; flex-wrap: wrap; margin: 10px 0; }
+fieldset { border: 1px solid var(--bg-surface1); border-radius: 6px; padding: 10px 12px; min-width: 280px; }
+legend { font-size: 11px; font-weight: 700; color: var(--mauve); padding: 0 6px; }
+.warn-note { font-size: 11px; color: var(--peach); max-width: 40ch; margin-top: 4px; }
+.actions { margin-top: 12px; display: flex; align-items: center; gap: 10px; }
+.ok { color: var(--green); font-size: 12px; font-weight: 600; }
+</style>
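The `v-if` conditions in the backend fieldsets encode a small rule table. The sketch below writes that table as a pure function so the rules can be tested on their own. `visibleFields` and `parseKnownNames` are hypothetical names and are not part of the patch. The known-names parser repeats the normalization `save()` applies to the textarea.

```typescript
type Backend = "anthropic" | "dgx" | "openrouter";
type Stage = "extract" | "synthesize";

interface FieldVisibility {
  endpoint: boolean; // only the local DGX/Spark backend takes a URL
  model: boolean; // the template hides the model id for Anthropic
  capabilityWarning: boolean; // synthesis on a non-Anthropic backend warns, never blocks
}

function visibleFields(stage: Stage, backend: Backend): FieldVisibility {
  return {
    endpoint: backend === "dgx",
    model: backend !== "anthropic",
    capabilityWarning: stage === "synthesize" && backend !== "anthropic",
  };
}

// Same normalization save() applies: one path per line, trimmed, blanks dropped.
function parseKnownNames(text: string): string[] {
  return text.split("\n").map((s) => s.trim()).filter(Boolean);
}

console.log(visibleFields("synthesize", "openrouter"));
console.log(parseKnownNames("docs/a.md\r\n\n  docs/b.json  \n"));
```

Because `trim()` also strips `\r`, splitting on `\n` alone is enough for text pasted with Windows line endings.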
diff --git a/frontend/src/views/ensemble/EnsembleSynthesize.vue b/frontend/src/views/ensemble/EnsembleSynthesize.vue
new file mode 100644
index 0000000..9c3c3a3
--- /dev/null
+++ b/frontend/src/views/ensemble/EnsembleSynthesize.vue
@@ -0,0 +1,99 @@
+<script setup lang="ts">
+import { ref, onMounted, reactive } from 'vue'
+import { useConfigStore } from '../../stores/config'
+import { apiFetch, apiPost } from '../../api/client'
+import { useEnsembleRun, readEnsembleConfig, type EnsembleConfig } from './useEnsembleRun'
+import StreamOutput from '../../components/shared/StreamOutput.vue'
+
+const emit = defineEmits<{ changed: [] }>()
+const config = useConfigStore()
+const cfg = ref<EnsembleConfig>(readEnsembleConfig({}))
+const run = useEnsembleRun()
+
+const DOCS = [
+  { id: 'world_state', label: 'World State' },
+  { id: 'campaign_state', label: 'Campaign State' },
+  { id: 'party', label: 'Party' },
+  { id: 'planning', label: 'Planning' },
+] as const
+
+const selectedDoc = ref<typeof DOCS[number]['id']>('world_state')
+const diffs = reactive<Record<string, string>>({})
+
+onMounted(async () => {
+  await config.load()
+  cfg.value = readEnsembleConfig(config.resolved)
+})
+
+function synthesize() {
+  run.run('/api/ensemble/run/synthesize', {
+    doc: selectedDoc.value,
+    backend: cfg.value.synthesize.backend,
+    endpoint: cfg.value.synthesize.endpoint,
+    model: cfg.value.synthesize.model,
+  }, (rc) => { if (rc === 0) emit('changed') })
+}
+
+async function showDiff(doc: string) {
+  const r = await apiFetch(
+    `/api/ensemble/diff?draft=docs/${doc}_draft.md&live=docs/${doc}.md`)
+  diffs[doc] = r.diff || '(no differences — or live doc does not exist yet)'
+}
+
+async function promote(doc: string) {
+  if (!confirm(`Promote ${doc}_draft.md over the live docs/${doc}.md?`)) return
+  await apiPost('/api/ensemble/promote',
+                { draft: `docs/${doc}_draft.md`, live: `docs/${doc}.md` })
+  delete diffs[doc]
+  emit('changed')
+}
+</script>
+
+<template>
+  <div class="step">
+    <h2>Stage 3 — Synthesis &amp; promotion</h2>
+    <p class="hint">
+      Synthesis writes <code>*_draft.md</code> only — never a live doc. Backend:
+      <strong>{{ cfg.synthesize.backend }}</strong>. Review the diff, then promote by hand.
+    </p>
+
+    <div class="controls">
+      <select v-model="selectedDoc">
+        <option v-for="d in DOCS" :key="d.id" :value="d.id">{{ d.label }}</option>
+      </select>
+      <button class="btn-success" :disabled="run.status.value === 'running'" @click="synthesize">
+        {{ run.status.value === 'running' ? 'Synthesizing…' : '▶ Synthesize draft' }}
+      </button>
+      <span v-if="run.returnCode.value !== null"
+            :class="run.returnCode.value === 0 ? 'ok' : 'err'">
+        {{ run.returnCode.value === 0 ? 'Draft written' : `Exit ${run.returnCode.value}` }}
+      </span>
+    </div>
+    <StreamOutput v-if="run.output.value" :text="run.output.value" />
+
+    <h3>Diff &amp; promote <span class="tag">human checkpoint</span></h3>
+    <table class="promote-tbl">
+      <tr v-for="d in DOCS" :key="d.id">
+        <td>{{ d.label }}</td>
+        <td><button class="btn-neutral btn-sm" @click="showDiff(d.id)">Diff vs live</button></td>
+        <td><button class="btn-warn btn-sm" @click="promote(d.id)">Promote →</button></td>
+      </tr>
+    </table>
+    <pre v-for="(d, k) in diffs" :key="k" class="diff"><strong>{{ k }}</strong>
+{{ d }}</pre>
+  </div>
+</template>
+
+<style scoped>
+.step { padding: 16px 20px; overflow-y: auto; }
+h2 { font-size: 16px; margin-bottom: 6px; }
+h3 { font-size: 13px; margin: 16px 0 6px; }
+.tag { font-size: 9px; background: var(--peach); color: var(--bg-mantle); border-radius: 8px; padding: 1px 7px; margin-left: 6px; font-weight: 700; }
+.hint { font-size: 12px; color: var(--text-muted); margin-bottom: 12px; max-width: 64ch; }
+.controls { display: flex; align-items: center; gap: 10px; margin-bottom: 10px; }
+select { font-size: 12px; padding: 5px 7px; background: var(--bg-surface0); color: var(--text); border: 1px solid var(--bg-surface1); border-radius: 4px; }
+.ok { color: var(--green); font-size: 12px; font-weight: 600; }
+.err { color: var(--red); font-size: 12px; font-weight: 600; }
+.promote-tbl td { padding: 4px 10px 4px 0; font-size: 12px; }
+.diff { background: #141420; border: 1px solid var(--bg-surface0); border-radius: 4px; padding: 8px 10px; font-family: var(--mono); font-size: 11px; white-space: pre-wrap; max-height: 300px; overflow-y: auto; }
+</style>
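The Synthesize step hard-codes a naming convention: synthesis writes `docs/<id>_draft.md`, and promotion copies that file over `docs/<id>.md`. The sketch below keeps the pair in one place and encodes both paths in the diff query, where the component interpolates them raw. The helper names are hypothetical; the doc ids are the four in the component's `DOCS` list.

```typescript
const DOC_IDS = ["world_state", "campaign_state", "party", "planning"] as const;
type DocId = (typeof DOC_IDS)[number];

interface PromotionPair {
  draft: string; // what synthesis writes
  live: string; // what promotion overwrites
}

function promotionPair(doc: DocId, root = "docs"): PromotionPair {
  return { draft: `${root}/${doc}_draft.md`, live: `${root}/${doc}.md` };
}

function diffUrl(doc: DocId): string {
  const { draft, live } = promotionPair(doc);
  const params = new URLSearchParams({ draft, live });
  return `/api/ensemble/diff?${params.toString()}`;
}

console.log(promotionPair("party"));
console.log(diffUrl("planning"));
```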
diff --git a/frontend/src/views/ensemble/useEnsembleRun.ts b/frontend/src/views/ensemble/useEnsembleRun.ts
new file mode 100644
index 0000000..24e79df
--- /dev/null
+++ b/frontend/src/views/ensemble/useEnsembleRun.ts
@@ -0,0 +1,85 @@
+import { ref } from 'vue'
+import { connectSSE } from '../../api/sse'
+
+/** Run an ensemble stage over SSE. Unlike RunPanel this does NOT gate on
+ *  ANTHROPIC_API_KEY — the ensemble page supports OpenRouter/DGX backends that
+ *  don't need it. */
+export function useEnsembleRun() {
+  const output = ref('')
+  const status = ref<'idle' | 'running' | 'done' | 'error'>('idle')
+  const returnCode = ref<number | null>(null)
+
+  function buildUrl(endpoint: string, params: Record<string, any>): string {
+    const url = new URL(endpoint, window.location.origin)
+    for (const [k, v] of Object.entries(params)) {
+      if (v === '' || v === false || v === null || v === undefined) continue
+      if (Array.isArray(v)) {
+        for (const it of v) if (it) url.searchParams.append(k, String(it))
+      } else if (typeof v === 'boolean') {
+        url.searchParams.set(k, 'true')
+      } else {
+        url.searchParams.set(k, String(v))
+      }
+    }
+    return url.pathname + url.search
+  }
+
+  function run(endpoint: string, params: Record<string, any>, onDone?: (rc: number) => void) {
+    if (status.value === 'running') return
+    status.value = 'running'
+    output.value = ''
+    returnCode.value = null
+    connectSSE(buildUrl(endpoint, params), {
+      onData(t) { output.value += t },
+      onDone(rc) {
+        status.value = rc === 0 ? 'done' : 'error'
+        returnCode.value = rc
+        if (onDone) onDone(rc)
+      },
+      onError() { status.value = 'error' },
+    })
+  }
+
+  function clear() {
+    output.value = ''
+    status.value = 'idle'
+    returnCode.value = null
+  }
+
+  return { output, status, returnCode, run, clear }
+}
+
+export interface BackendProfile {
+  backend: 'anthropic' | 'dgx' | 'openrouter'
+  endpoint: string
+  model: string
+}
+
+export interface EnsembleConfig {
+  campaign_dir: string
+  chapters_glob: string
+  chapters_selected: string[]
+  extract: BackendProfile
+  synthesize: BackendProfile
+  known_names: string[]
+  aliases_path: string
+}
+
+/** Read ui.ensemble from the resolved config with safe defaults. */
+export function readEnsembleConfig(resolved: any): EnsembleConfig {
+  const e = resolved?.ui?.ensemble ?? {}
+  const prof = (p: any): BackendProfile => ({
+    backend: p?.backend ?? 'anthropic',
+    endpoint: p?.endpoint ?? '',
+    model: p?.model ?? '',
+  })
+  return {
+    campaign_dir: e.campaign_dir ?? '',
+    chapters_glob: e.chapters_glob ?? 'docs/chapters/chapter_*.md',
+    chapters_selected: Array.isArray(e.chapters_selected) ? e.chapters_selected : [],
+    extract: prof(e.extract),
+    synthesize: prof(e.synthesize),
+    known_names: Array.isArray(e.known_names) ? e.known_names : [],
+    aliases_path: e.aliases_path ?? '',
+  }
+}
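Two details of `buildUrl` matter on the server side. Each truthy array element becomes a repeated key, which is what the router's `Query(default=[])` list parameters (`chapters`, `known_names`, `npc`) collect. `false` and empty values are dropped, so the router's own defaults apply. The same rules in Python, handy for scripting `/api/ensemble` without the UI, might look like the sketch below. `build_query` is a hypothetical helper, not part of this patch. `urllib.parse.urlencode` may percent-encode some characters (such as `*` in a glob) differently from `URLSearchParams`, but the decoded key/value pairs are the same.

```python
from urllib.parse import urlencode


def build_query(params: dict) -> str:
    """Hypothetical Python mirror of useEnsembleRun.buildUrl's skip rules."""
    pairs: list[tuple[str, str]] = []
    for key, value in params.items():
        # Identity checks rather than `value in ("", False, None)`: in Python
        # 0 == False, but the TypeScript `v === false` test keeps a numeric 0.
        if value is None or value is False or value == "":
            continue
        if isinstance(value, (list, tuple)):
            # One repeated key per truthy element, collected by a list[str] Query.
            pairs.extend((key, str(item)) for item in value if item)
        elif value is True:
            pairs.append((key, "true"))
        else:
            pairs.append((key, str(value)))
    return urlencode(pairs)


print(build_query({
    "chapters": ["docs/chapters/chapter_01.md", ""],
    "no_speculative": False,
    "model": "",
    "chapter_parallel": 3,
    "window": 0,
}))
```

Note that `0` survives. The TypeScript test is `v === false`, and the identity check keeps Python from treating `0 == False` as empty, so `window=0` still reaches `run_recent_events`.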
diff --git a/party.py b/party.py
index cb68803..80af32c 100644
--- a/party.py
+++ b/party.py
@@ -56,7 +56,9 @@
 
 from campaignlib import (
     DEFAULT_MODEL,
+    add_backend_args,
     build_alias_normalizer,
+    client_from_args,
     format_npc_roster,
     load_agent_prompt,
     load_alias_map,
@@ -260,7 +262,8 @@ def main() -> None:
                              "canonical name before extract/synth, and a "
                              "'Known NPCs' roster seeds the system prompts.")
     parser.add_argument("--model", default=DEFAULT_MODEL,
-                        help="Claude model to use")
+                        help="Model id (Claude id, or an OpenRouter id for --backend openrouter)")
+    add_backend_args(parser)
     parser.add_argument("--dump-input", default=None, metavar="FILE",
                         help="Write the synthesis prompt to FILE (and FILE.system.md) "
                              "without making an API call — for use with `claude -p`.")
@@ -317,7 +320,7 @@ def main() -> None:
     if alias_map:
         print(f"Alias map: {len(alias_map)} NPC(s) from {args.dossier_dir}")
 
-    client = make_client()
+    client = client_from_args(args)
 
     # ── Extract pass ──────────────────────────────────────────────────────────
     if args.summaries and not args.synthesize_only:
diff --git a/planning.py b/planning.py
index b91915f..21c3490 100644
--- a/planning.py
+++ b/planning.py
@@ -58,7 +58,9 @@
 
 from campaignlib import (
     DEFAULT_MODEL,
+    add_backend_args,
     build_alias_normalizer,
+    client_from_args,
     format_npc_roster,
     load_agent_prompt,
     load_alias_map,
@@ -734,7 +736,8 @@ def main() -> None:
                              "(e.g. --since 11 when extract_011.md is the new chunk) to skip "
                              "historical chunks already rolled into dossiers.")
     parser.add_argument("--model", default=DEFAULT_MODEL,
-                        help="Claude model to use")
+                        help="Model id (Claude id, or an OpenRouter id for --backend openrouter)")
+    add_backend_args(parser)
     parser.add_argument("--campaign-dir", default=None,
                         help="Campaign workspace root (default: $CAMPAIGN_DIR "
                              "or the output file's parent, or CWD). Used to "
@@ -819,7 +822,7 @@ def main() -> None:
             print(f"Error: file not found: {f}", file=sys.stderr)
             sys.exit(1)
 
-    client = make_client()
+    client = client_from_args(args)
 
     # ── Build-dossiers mode ───────────────────────────────────────────────────
     if args.build_dossiers:
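`party.py` and `planning.py` now build their client through `add_backend_args` and `client_from_args`. Their bodies live in `campaignlib` and are not part of this patch. The flags the router passes to these scripts (`--backend` and `--endpoint` in `run_synthesize`) and the env it sets (`CG_BACKEND` via `_llm_env`) suggest a resolution order like the sketch below. Everything here is an assumption: the `_sketch` names, the flag-over-env precedence, and the DGX fallback URL (borrowed from `_llm_env`) are illustrative. The sketch only resolves a backend choice; it does not construct a real client.

```python
import argparse
import os
from dataclasses import dataclass
from typing import Mapping, Optional

BACKENDS = ("anthropic", "dgx", "openrouter")
DGX_DEFAULT = "http://localhost:8000"  # the same fallback _llm_env uses


@dataclass(frozen=True)
class BackendChoice:
    backend: str
    endpoint: Optional[str]


def add_backend_args_sketch(parser: argparse.ArgumentParser) -> None:
    """Hypothetical stand-in for campaignlib.add_backend_args."""
    parser.add_argument("--backend", choices=BACKENDS, default=None,
                        help="LLM backend (default: $CG_BACKEND, else anthropic)")
    parser.add_argument("--endpoint", default=None,
                        help="Server URL for --backend dgx")


def resolve_backend_sketch(args: argparse.Namespace,
                           env: Mapping[str, str] = os.environ) -> BackendChoice:
    """Assumed precedence: explicit flag, then CG_BACKEND, then anthropic."""
    backend = args.backend or env.get("CG_BACKEND") or "anthropic"
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend {backend!r}")
    endpoint = args.endpoint
    if backend == "dgx" and not endpoint:
        endpoint = env.get("DGX_ENDPOINT", DGX_DEFAULT)
    return BackendChoice(backend, endpoint)


parser = argparse.ArgumentParser()
add_backend_args_sketch(parser)
print(resolve_backend_sketch(parser.parse_args(["--backend", "dgx"]), env={}))
```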
diff --git a/server/config_models.py b/server/config_models.py
index 66ac5ed..3f96238 100644
--- a/server/config_models.py
+++ b/server/config_models.py
@@ -138,12 +138,51 @@ class ProfilesSection(BaseModel):
     active: OptStr = None
 
 
+class BackendProfile(BaseModel):
+    """A selectable execution target for one LLM-bearing ensemble stage.
+
+    The API key is NEVER stored here — it is read from the environment
+    (ANTHROPIC_API_KEY / OPENROUTER_API_KEY) at run time. `endpoint` is used
+    for the dgx backend; openrouter uses its own base URL.
+    """
+
+    model_config = ConfigDict(extra="allow")
+
+    backend: Literal["anthropic", "dgx", "openrouter"] = "anthropic"
+    endpoint: OptStr = None
+    model: OptStr = None
+
+
+class EnsembleSection(BaseModel):
+    """``ui.ensemble`` — the ensemble grounding-doc workflow page.
+
+    Per-stage backend choice (extract vs synthesize are independent) plus the
+    scope inputs (known-names sources, aliases file) the bundle stage and the
+    alias-correction gate consume. Files on disk remain the source of truth;
+    this only records the operator's selections.
+    """
+
+    model_config = ConfigDict(extra="allow")
+
+    campaign_dir: OptStr = None
+    chapters_glob: str = "docs/chapters/chapter_*.md"
+    # The explicit set of chapters chosen in the picker (relative paths).
+    # Principle X — there is no silent "all": empty means *nothing selected*
+    # and extraction refuses to run; "Select all" materializes every path here.
+    chapters_selected: list[str] = Field(default_factory=list)
+    extract: BackendProfile = Field(default_factory=BackendProfile)
+    synthesize: BackendProfile = Field(default_factory=BackendProfile)
+    known_names: list[str] = Field(default_factory=list)
+    aliases_path: OptStr = None
+
+
 class UISection(BaseModel):
     """All per-page state, one attribute per page or group of pages."""
 
     session_doc: SessionDocSection = Field(default_factory=SessionDocSection)
     vtt_summary: VttSummarySection = Field(default_factory=VttSummarySection)
     grounding: GroundingSection = Field(default_factory=GroundingSection)
+    ensemble: EnsembleSection = Field(default_factory=EnsembleSection)
     profiles: ProfilesSection = Field(default_factory=ProfilesSection)
     campaign_state: _LooseSection = Field(default_factory=_LooseSection)
     distill: _LooseSection = Field(default_factory=_LooseSection)
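Every field in `EnsembleSection` and `BackendProfile` has a default, so a partially written `ui.ensemble` block still validates. Each stage's profile stays independent, and an unknown backend is rejected by the `Literal`. The snippet below copies the two models from this hunk so it runs on its own. `OptStr` is defined here as a stand-in, on the assumption that the real alias is `Optional[str]`. It also assumes Pydantic v2 (`model_validate`, `model_extra`).

```python
from typing import Literal, Optional

from pydantic import BaseModel, ConfigDict, Field, ValidationError

OptStr = Optional[str]  # stand-in: assumes config_models' OptStr is Optional[str]


class BackendProfile(BaseModel):
    model_config = ConfigDict(extra="allow")

    backend: Literal["anthropic", "dgx", "openrouter"] = "anthropic"
    endpoint: OptStr = None
    model: OptStr = None


class EnsembleSection(BaseModel):
    model_config = ConfigDict(extra="allow")

    campaign_dir: OptStr = None
    chapters_glob: str = "docs/chapters/chapter_*.md"
    chapters_selected: list[str] = Field(default_factory=list)  # empty = nothing selected
    extract: BackendProfile = Field(default_factory=BackendProfile)
    synthesize: BackendProfile = Field(default_factory=BackendProfile)
    known_names: list[str] = Field(default_factory=list)
    aliases_path: OptStr = None


# Only the synthesize stage is configured; extract keeps its own default profile.
section = EnsembleSection.model_validate(
    {"synthesize": {"backend": "openrouter", "model": "anthropic/claude-sonnet-4"},
     "future_option": True}  # extra="allow" keeps unknown keys instead of failing
)
print(section.extract.backend, section.synthesize.backend, section.model_extra)

try:
    EnsembleSection.model_validate({"extract": {"backend": "ollama"}})
except ValidationError as exc:
    print("rejected at", exc.errors()[0]["loc"])
```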
diff --git a/server/main.py b/server/main.py
index 1b6b954..97b6a95 100644
--- a/server/main.py
+++ b/server/main.py
@@ -12,7 +12,7 @@
 from server.config import derive_campaign_paths, derive_session_paths
 from server.config_service import CampaignConfigService, ConfigError
 from server.routers import (
-    config_routes, connections, experimental, grounding, prep,
+    config_routes, connections, ensemble, experimental, grounding, prep,
     scene_editor, session_workflow, setup,
 )
 
@@ -31,6 +31,7 @@
 app.include_router(config_routes.router, prefix="/api/config", tags=["config"])
 app.include_router(session_workflow.router, prefix="/api/workflow", tags=["workflow"])
 app.include_router(grounding.router, prefix="/api/grounding", tags=["grounding"])
+app.include_router(ensemble.router, prefix="/api/ensemble", tags=["ensemble"])
 app.include_router(prep.router, prefix="/api/prep", tags=["prep"])
 app.include_router(setup.router, prefix="/api/setup", tags=["setup"])
 app.include_router(experimental.router, prefix="/api/experimental", tags=["experimental"])
diff --git a/server/routers/ensemble.py b/server/routers/ensemble.py
new file mode 100644
index 0000000..4837590
--- /dev/null
+++ b/server/routers/ensemble.py
@@ -0,0 +1,432 @@
+"""Ensemble grounding-doc workflow API routes.
+
+The UI mechanizes the ensemble pipeline (extract → bundle → synthesize → review);
+this router shells out to the CLI scripts via subprocess_runner and exposes
+disk-derived stage status. It contains NO pipeline logic and issues NO
+retrieval/render calls — the CLI is the engine (Constitution Principle VI), files
+on disk are the truth (Principle I), and OpenRouter is selected via env that the
+single campaignlib seam honors (Principle V).
+"""
+
+import difflib
+import glob
+import shutil
+from pathlib import Path
+
+from fastapi import APIRouter, HTTPException, Query, Request
+from fastapi.responses import JSONResponse, StreamingResponse
+
+from server.subprocess_runner import python_exe, stream_subprocess, sse_error_stream
+
+router = APIRouter()
+
+SCRIPT_DIR = Path(__file__).resolve().parent.parent.parent  # CampaignGenerator/
+
+# The four grounding docs the workflow targets. live = promote target; draft =
+# what synthesis writes. Nothing else may be promoted (FR-013).
+GROUNDING_DOCS = {
+    "world_state": ("docs/world_state.md", "docs/world_state_draft.md"),
+    "campaign_state": ("docs/campaign_state.md", "docs/campaign_state_draft.md"),
+    "party": ("docs/party.md", "docs/party_draft.md"),
+    "planning": ("docs/planning.md", "docs/planning_draft.md"),
+}
+
+# Models considered capable enough for synthesis (FR-014 / R6). Anything else
+# selected for the synthesize stage triggers a non-fatal warning.
+SYNTHESIS_CAPABLE = {
+    "claude-sonnet-4-6", "claude-sonnet-4-20250514",
+    "claude-opus-4-8", "claude-opus-4-6", "claude-opus-4-7",
+    "anthropic/claude-sonnet-4", "anthropic/claude-opus-4",
+    "openai/gpt-5", "google/gemini-2.5-pro",
+}
+
+
+# ── Command-building helpers (mirror grounding.py) ──────────────────────────
+
+def _cmd_opt(cmd: list[str], flag: str, value) -> None:
+    if value:
+        cmd += [flag, str(value)]
+
+
+def _cmd_multi(cmd: list[str], flag: str, values: list[str]) -> None:
+    for v in values or []:
+        if v and v.strip():
+            cmd += [flag, v.strip()]
+
+
+def _cmd_flag(cmd: list[str], flag: str, condition: bool) -> None:
+    if condition:
+        cmd.append(flag)
+
+
+def _resolve_ensemble_path(path: str) -> Path:
+    """Resolve a path and confine it to the campaign workspace (CWD).
+
+    Rejects traversal outside the workspace — the UI must not read/write
+    arbitrary disk locations.
+    """
+    if not path:
+        raise HTTPException(status_code=400, detail="path is required")
+    cwd = Path.cwd().resolve()
+    p = Path(path).expanduser()
+    if not p.is_absolute():
+        p = (cwd / p)
+    p = p.resolve()
+    if cwd != p and cwd not in p.parents:
+        raise HTTPException(status_code=400, detail="path escapes the campaign workspace")
+    return p
+
+
+def _is_live_doc(path: Path) -> bool:
+    cwd = Path.cwd().resolve()
+    live = {(cwd / live_rel).resolve() for live_rel, _ in GROUNDING_DOCS.values()}
+    return path.resolve() in live
+
+
+# ── LLM backend selection → subprocess env (Principle V) ────────────────────
+
+def _llm_env(backend: str, endpoint: str, model: str) -> dict[str, str]:
+    """Translate a per-stage backend choice into env that campaignlib.make_client
+    honors. The API key itself is inherited from the server env, never injected
+    from a query param.
+    """
+    if backend == "openrouter":
+        env = {"CG_BACKEND": "openrouter"}
+        if model:
+            env["OPENROUTER_MODEL"] = model
+        return env
+    if backend == "dgx":
+        env = {"DGX_ENDPOINT": endpoint or "http://localhost:8000"}
+        if model:
+            env["DGX_MODEL"] = model
+        return env
+    return {}  # anthropic: default path, no overrides
+
+
+# ── Per-stage in-flight lock (M4) ───────────────────────────────────────────
+# Single-operator, local-first: an in-process guard is enough to stop a
+# double-click or a second tab from launching two writers on the same workdir
+# (the orphaned-worker cache-corruption trap in ensemble_workflow.md).
+
+_RUNNING: set[str] = set()
+
+
+def _lock_key(stage: str) -> str:
+    return f"{Path.cwd().resolve()}::{stage}"
+
+
+def _run_locked(stage: str, cmd: list[str], env_extra: dict[str, str] | None = None,
+                prelude: str = "") -> StreamingResponse:
+    key = _lock_key(stage)
+    if key in _RUNNING:
+        return StreamingResponse(
+            sse_error_stream(f"stage '{stage}' is already running for this campaign — "
+                             f"wait for it to finish (avoids corrupting the workdir)."),
+            media_type="text/event-stream",
+            headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
+        )
+    _RUNNING.add(key)
+
+    def _release(_rc):
+        _RUNNING.discard(key)
+
+    async def _gen():
+        if prelude:
+            import json
+            yield f"data: {json.dumps(prelude)}\n\n"
+        async for chunk in stream_subprocess(cmd, cwd=str(Path.cwd()),
+                                             env_extra=env_extra or None,
+                                             on_complete=_release):
+            yield chunk
+
+    return StreamingResponse(
+        _gen(),
+        media_type="text/event-stream",
+        headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
+    )
+
+
+# ── Status (disk-derived, FR-002) ───────────────────────────────────────────
+
+@router.get("/status")
+def status(chapters: str = "docs/chapters/chapter_*.md"):
+    """Pipeline state computed entirely from artifacts on disk — no caching."""
+    cwd = Path.cwd()
+    per_chapter = sorted(glob.glob(str(cwd / "docs/ensemble/per_chapter/*/merged.json")))
+    dossiers = sorted(glob.glob(str(cwd / "docs/ensemble/state_dossiers/*.md")))
+    drafts = [name for name, (_, draft_rel) in GROUNDING_DOCS.items()
+              if (cwd / draft_rel).exists()]
+    promoted = [name for name, (live_rel, draft_rel) in GROUNDING_DOCS.items()
+                if (cwd / live_rel).exists() and (cwd / draft_rel).exists()
+                and (cwd / live_rel).stat().st_mtime >= (cwd / draft_rel).stat().st_mtime]
+
+    def st(done: bool) -> str:
+        return "complete" if done else "not_started"
+
+    stages = [
+        {"id": "extract", "status": st(bool(per_chapter)), "artifacts": len(per_chapter)},
+        {"id": "bundle", "status": st(bool(dossiers)), "artifacts": len(dossiers)},
+        {"id": "synthesize", "status": st(bool(drafts)), "drafts": drafts},
+        {"id": "review", "status": st(bool(promoted)), "promoted": promoted},
+    ]
+    current = next((s["id"] for s in stages if s["status"] != "complete"), "review")
+    return {"campaign_dir": str(cwd.resolve()), "stages": stages, "current_stage": current}
+
+
+# ── File listing / read / write (FR-004, FR-012, FR-017) ────────────────────
+
+@router.get("/files")
+def list_files(dir: str, pattern: str = "*.md"):
+    d = _resolve_ensemble_path(dir)
+    if not d.exists():
+        return {"dir": str(d), "exists": False, "files": []}
+    files = sorted(f.name for f in d.glob(pattern) if f.is_file())
+    return {"dir": str(d), "exists": True,
+            "files": [{"name": n, "size": (d / n).stat().st_size} for n in files]}
+
+
+@router.get("/chapters")
+def list_chapters(
+    glob: list[str] = Query(default=["docs/chapters/chapter_*.md"]),
+    per_chapter_dir: str = "docs/ensemble/per_chapter",
+):
+    """Resolve one or more chapter globs/paths to the concrete file list the
+    extraction stage would run over (FR: chapter selection). Each entry is
+    flagged `extracted` when its per-chapter merged.json already exists on disk
+    (Principle I — the picker reflects truth, not a cached selection)."""
+    cwd = Path.cwd().resolve()
+    pc_dir = (cwd / per_chapter_dir).resolve()
+    matched: dict[str, Path] = {}
+    for pattern in glob or []:
+        if not pattern or not pattern.strip():
+            continue
+        for hit in cwd.glob(pattern.strip()):
+            if not hit.is_file():
+                continue
+            r = hit.resolve()
+            if cwd not in r.parents:
+                continue  # confine to the workspace
+            matched[str(r.relative_to(cwd))] = r
+    out = []
+    for rel in sorted(matched):
+        p = matched[rel]
+        merged = pc_dir / p.stem / "merged.json"
+        out.append({"path": rel, "stem": p.stem, "size": p.stat().st_size,
+                    "extracted": merged.exists()})
+    return {"chapters": out, "count": len(out)}
+
+
+@router.get("/file")
+def read_file(path: str):
+    p = _resolve_ensemble_path(path)
+    if not p.exists() or not p.is_file():
+        return JSONResponse({"exists": False, "content": ""}, status_code=404)
+    return {"exists": True, "content": p.read_text(encoding="utf-8")}
+
+
+@router.put("/file")
+async def write_file(path: str, request: Request):
+    """Write an interchange file (e.g. aliases.json). Live grounding docs are
+    rejected — promotion is the only path to a live doc (FR-013)."""
+    p = _resolve_ensemble_path(path)
+    if _is_live_doc(p):
+        raise HTTPException(status_code=403,
+                            detail="refusing to write a live grounding doc; use /promote")
+    p.parent.mkdir(parents=True, exist_ok=True)
+    data = await request.json()
+    p.write_text(data.get("content", ""), encoding="utf-8")
+    return {"ok": True, "size": p.stat().st_size}
+
+
+# ── Diff + promote (US3 gate, FR-013, SC-005) ───────────────────────────────
+
+@router.get("/diff")
+def diff(draft: str, live: str):
+    """Unified diff draft vs live for the diff-before-promote gate. Read-only."""
+    dp = _resolve_ensemble_path(draft)
+    lp = _resolve_ensemble_path(live)
+    draft_text = dp.read_text(encoding="utf-8").splitlines(keepends=True) if dp.exists() else []
+    live_text = lp.read_text(encoding="utf-8").splitlines(keepends=True) if lp.exists() else []
+    ud = "".join(difflib.unified_diff(live_text, draft_text,
+                                      fromfile=str(lp), tofile=str(dp)))
+    return {"draft": str(dp), "live": str(lp), "diff": ud,
+            "draft_exists": dp.exists(), "live_exists": lp.exists()}
+
+
+@router.post("/promote")
+async def promote(request: Request):
+    """Copy a reviewed draft over its live grounding doc — the single explicit
+    live-doc writer (FR-013). Restricted to the four known grounding docs."""
+    body = await request.json()
+    draft = _resolve_ensemble_path(body.get("draft", ""))
+    live = _resolve_ensemble_path(body.get("live", ""))
+    if not _is_live_doc(live):
+        raise HTTPException(status_code=400,
+                            detail="promote target must be one of the four grounding docs")
+    if not draft.exists():
+        raise HTTPException(status_code=404, detail="draft does not exist")
+    shutil.copyfile(draft, live)
+    return {"ok": True, "live": str(live), "size": live.stat().st_size}
+
+
+# ── Stage runners (SSE) ─────────────────────────────────────────────────────
+
+@router.get("/run/extract")
+def run_extract(
+    chapters: list[str] = Query(default=[]),
+    per_chapter_dir: str = "docs/ensemble/per_chapter",
+    out: str = "docs/ensemble/merged.json",
+    plan: str = "",
+    endpoint: str = "",
+    model: str = "",
+    backend: str = "anthropic",
+    chapter_parallel: int = 3,
+    chunk_parallel: int = 4,
+    no_speculative: bool = False,
+):
+    # Principle X: no silent "all". An empty selection is refused, never
+    # expanded to the full glob — "Select all" must be an explicit choice the
+    # caller makes (the UI sends every resolved path; a CLI user types a glob).
+    picked = [c.strip() for c in (chapters or []) if c and c.strip()]
+    if not picked:
+        return StreamingResponse(
+            sse_error_stream("No chapters selected — pick chapters (or click "
+                             "'Select all') before running extraction."),
+            media_type="text/event-stream",
+            headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
+        )
+    cmd = [python_exe(), str(SCRIPT_DIR / "ensemble_batch.py"),
+           "--chapters", *picked,
+           "--per-chapter-dir", per_chapter_dir,
+           "--out", out]
+    _cmd_opt(cmd, "--plan", plan)
+    _cmd_opt(cmd, "--endpoints", endpoint)
+    _cmd_opt(cmd, "--model", model)
+    _cmd_opt(cmd, "--chapter-parallel", chapter_parallel)
+    _cmd_opt(cmd, "--chunk-parallel", chunk_parallel)
+    _cmd_flag(cmd, "--no-speculative", no_speculative)
+    return _run_locked("extract", cmd, env_extra=_llm_env(backend, endpoint, model))
+
+
+@router.get("/run/bundle")
+def run_bundle(
+    corpus: str = "docs/ensemble/per_chapter/*/merged.json",
+    aliases: str = "",
+    known_names: list[str] = Query(default=[]),
+    min_facts: int = 3,
+    known_only: bool = False,
+    out_dir: str = "docs/ensemble/state_dossiers",
+    list: bool = False,
+    endpoint: str = "",
+    model: str = "",
+    backend: str = "anthropic",
+    entity_parallel: int = 0,
+):
+    cmd = [python_exe(), str(SCRIPT_DIR / "facts_to_state.py"), "--corpus", corpus]
+    _cmd_opt(cmd, "--aliases", aliases)
+    _cmd_multi(cmd, "--known-names", known_names)
+    _cmd_opt(cmd, "--min-facts", min_facts)
+    if list:
+        cmd.append("--list")
+    else:
+        _cmd_opt(cmd, "--out-dir", out_dir)
+        _cmd_flag(cmd, "--known-only", known_only)
+        _cmd_opt(cmd, "--endpoints", endpoint)
+        _cmd_opt(cmd, "--model", model)
+        _cmd_opt(cmd, "--entity-parallel", entity_parallel)
+    # --list does no model work, so it gets no backend env and takes its own lock key (it never collides with a real bundle run).
+    if list:
+        return _run_locked("bundle-list", cmd)
+    return _run_locked("bundle", cmd, env_extra=_llm_env(backend, endpoint, model))
+
+
+@router.get("/run/recent-events")
+def run_recent_events(
+    corpus: str = "docs/ensemble/per_chapter/*/merged.json",
+    output: str = "docs/recent_events.md",
+    window: int = 0,
+):
+    cmd = [python_exe(), str(SCRIPT_DIR / "build_recent_events.py"),
+           "--corpus", corpus, "--output", output, "--window", str(window)]
+    return _run_locked("recent-events", cmd)
+
+
+@router.get("/run/threads")
+def run_threads(
+    corpus: str = "docs/ensemble/per_chapter/*/merged.json",
+    aliases: str = "",
+    output: str = "docs/ensemble/threads.md",
+    min_facts: int = 2,
+):
+    """(M1) Deterministic threads-track render — the chronological-spine input
+    fed to synthesis. No model call."""
+    cmd = [python_exe(), str(SCRIPT_DIR / "facts_to_state.py"),
+           "--corpus", corpus, "--types", "thread",
+           "--min-facts", str(min_facts), "--render-only", output]
+    _cmd_opt(cmd, "--aliases", aliases)
+    return _run_locked("threads", cmd)
+
+
+@router.get("/run/synthesize")
+def run_synthesize(
+    doc: str,
+    output: str = "",
+    backend: str = "anthropic",
+    endpoint: str = "",
+    model: str = "",
+    # world_state
+    dossiers: str = "docs/ensemble/merged_dossiers/*.md",
+    dossier_min_facts: int = 10,
+    party: str = "",
+    threads: str = "",
+    backstories: list[str] = Query(default=[]),
+    # campaign_state / party (staging)
+    extract_dir: str = "",
+    synthesize_only: bool = True,
+    # planning
+    npc: list[str] = Query(default=[]),
+    arc_scores: list[str] = Query(default=[]),
+    context: list[str] = Query(default=[]),
+):
+    if doc not in GROUNDING_DOCS:
+        raise HTTPException(status_code=400, detail=f"unknown doc '{doc}'")
+    out = output or GROUNDING_DOCS[doc][1]  # default to the draft path
+    # FR-013: never let synthesis target a live grounding doc.
+    if _is_live_doc(_resolve_ensemble_path(out)):
+        raise HTTPException(status_code=400,
+                            detail="synthesis output must be a draft, not a live doc")
+
+    if doc == "world_state":
+        cmd = [python_exe(), str(SCRIPT_DIR / "synthesise_world_state.py"),
+               "--dossiers", dossiers, "--dossier-min-facts", str(dossier_min_facts),
+               "--output", out]
+        _cmd_opt(cmd, "--party", party)
+        _cmd_opt(cmd, "--threads", threads)
+        _cmd_multi(cmd, "--backstories", backstories)
+    elif doc == "campaign_state":
+        cmd = [python_exe(), str(SCRIPT_DIR / "campaign_state.py"), "--output", out]
+        _cmd_flag(cmd, "--synthesize-only", synthesize_only)
+        _cmd_opt(cmd, "--extract-dir", extract_dir)
+    elif doc == "party":
+        cmd = [python_exe(), str(SCRIPT_DIR / "party.py"), "--output", out]
+        _cmd_flag(cmd, "--synthesize-only", synthesize_only)
+        _cmd_opt(cmd, "--extract-dir", extract_dir)
+    else:  # planning
+        cmd = [python_exe(), str(SCRIPT_DIR / "planning.py"), "--output", out]
+        _cmd_multi(cmd, "--npc", npc)
+        _cmd_multi(cmd, "--arc-scores", arc_scores)
+        _cmd_multi(cmd, "--context", context)
+
+    _cmd_opt(cmd, "--model", model)
+    if backend != "anthropic":
+        cmd += ["--backend", backend]
+        _cmd_opt(cmd, "--endpoint", endpoint)
+
+    # FR-014 / R6: warn (don't block) on a sub-Sonnet synthesis model.
+    prelude = ""
+    if model and model not in SYNTHESIS_CAPABLE:
+        prelude = (f"⚠️  '{model}' is not on the synthesis-capable list — synthesis "
+                   f"assumes a model at least as capable as Sonnet; output quality may "
+                   f"degrade. Proceeding anyway.\n\n")
+    return _run_locked(f"synthesize-{doc}", cmd,
+                       env_extra=_llm_env(backend, endpoint, model), prelude=prelude)
diff --git a/specs/001-ensemble-workflow-ui/checklists/requirements.md b/specs/001-ensemble-workflow-ui/checklists/requirements.md
new file mode 100644
index 0000000..d8480f2
--- /dev/null
+++ b/specs/001-ensemble-workflow-ui/checklists/requirements.md
@@ -0,0 +1,44 @@
+# Specification Quality Checklist: Ensemble Grounding-Doc Workflow UI
+
+**Purpose**: Validate specification completeness and quality before proceeding to planning
+**Created**: 2026-06-27
+**Feature**: [spec.md](../spec.md)
+
+## Content Quality
+
+- [x] No implementation details (languages, frameworks, APIs)
+- [x] Focused on user value and business needs
+- [x] Written for non-technical stakeholders
+- [x] All mandatory sections completed
+
+## Requirement Completeness
+
+- [x] No [NEEDS CLARIFICATION] markers remain
+- [x] Requirements are testable and unambiguous
+- [x] Success criteria are measurable
+- [x] Success criteria are technology-agnostic (no implementation details)
+- [x] All acceptance scenarios are defined
+- [x] Edge cases are identified
+- [x] Scope is clearly bounded
+- [x] Dependencies and assumptions identified
+
+## Feature Readiness
+
+- [x] All functional requirements have clear acceptance criteria
+- [x] User scenarios cover primary flows
+- [x] Feature meets measurable outcomes defined in Success Criteria
+- [x] No implementation details leak into specification
+
+## Notes
+
+- Two scope decisions were resolved with the user before drafting: (1) OpenRouter is
+  selectable per-stage across both LLM-bearing stages (extraction and synthesis); (2) the
+  ensemble workflow gets a new, separate UI surface and the existing Grounding Docs page is
+  left unchanged. Both are recorded in the Assumptions section.
+- The spec intentionally names "DGX/Spark", "Anthropic/Claude", and "OpenRouter" as backend
+  *choices* (product-level options the operator sees), not as implementation prescriptions.
+  These are user-facing selections, consistent with the feature's premise.
+- Constitutional alignment was kept front-of-mind: Principle IX (UI mechanizes; Claude
+  converses), Principle II (human checkpoints non-negotiable), Principle VI (CLI is the
+  engine; FR-016), and Principle I/VIII (files are truth; state discoverable; FR-002, FR-017).
+- Items marked incomplete require spec updates before `/speckit-clarify` or `/speckit-plan`.
diff --git a/specs/001-ensemble-workflow-ui/contracts/api.md b/specs/001-ensemble-workflow-ui/contracts/api.md
new file mode 100644
index 0000000..2f2f3de
--- /dev/null
+++ b/specs/001-ensemble-workflow-ui/contracts/api.md
@@ -0,0 +1,83 @@
+# API Contract: `/api/ensemble`
+
+New FastAPI router `server/routers/ensemble.py`, mounted at `/api/ensemble`, registered in `server/main.py` alongside the existing routers. It mirrors `server/routers/grounding.py`: stage runners return SSE streams from `stream_subprocess()`; status/file endpoints return JSON. **The router builds CLI commands and shells out — it contains no pipeline logic and issues no retrieval/render calls** (Principles VI, III).
+
+All run endpoints accept a per-stage backend selection: `backend` ∈ {`anthropic`, `dgx`, `openrouter`}, plus optional `endpoint` and `model`. These map to CLI flags (see `cli.md`). The `OPENROUTER_API_KEY` / `ANTHROPIC_API_KEY` are injected into the subprocess via `env_extra`, never passed as query params.
+
+---
+
+## Stage runners (SSE)
+
+### `GET /api/ensemble/run/extract`
+Runs `ensemble_batch.py` over the explicitly selected chapters.
+
+Query params: `chapters[]` (explicit chapter paths; an empty selection is refused rather than expanded to the full glob), `per_chapter_dir`, `out`, `plan`, `endpoint`/`endpoints[]`, `model`, `backend`, `chapter_parallel`, `chunk_parallel`, `embed_endpoint`, `embed_model`, `embed_threshold`, `unit_timeout`, `no_speculative` (bool).
+
+Response: `text/event-stream` — `data:` chunks of stdout/stderr; terminal `event: done` with `{"returncode": N}`.
+
+Behavior: resumable (chapters with existing `merged.json` are skipped by the CLI). On a backend/endpoint failure, the stream surfaces the error and ends with non-zero `returncode` (FR-009); prior chapters' outputs persist.
+
+### `GET /api/ensemble/run/bundle`
+Runs `facts_to_state.py` (aggregation). Supports `--list` mode (no model call) for the scope-review gate.
+
+Query params: `corpus` (glob), `aliases`, `known_names[]`, `min_facts`, `known_only` (bool), `out_dir`, `list` (bool → `--list`), `types[]`, `render_only`, `endpoint`/`endpoints[]`, `model`, `backend`, `entity_parallel`.
+
+Response: SSE as above. When `list=true`, the stream is the entity/scope table only.
+
+### `GET /api/ensemble/run/recent-events`  *(deterministic, no model)*
+Runs `build_recent_events.py`. Query params: `corpus`, `output`, `window`. SSE.
+
+### `GET /api/ensemble/run/synthesize`
+Runs one of the four synthesis scripts depending on `doc`.
+
+Query params: `doc` ∈ {`world_state`, `campaign_state`, `party`, `planning`} (selects the script), the doc-specific inputs (e.g. `dossiers`, `dossier_min_facts`, `threads`, `party`, `npc[]`, `arc_scores[]`, `context[]`, `extract_dir`, `synthesize_only`), `output` (must be a `*_draft.md` path), `backend`, `endpoint`, `model`.
+
+Response: SSE.
+
+Behavior:
+- `output` MUST resolve to a draft path; the router rejects (HTTP 400) an `output` that targets a live grounding doc (`docs/<name>.md`) to enforce FR-013.
+- If `doc` ∈ {`campaign_state`, `party`} and `backend` resolves to the subscription `claude-code` path, the router disables agent tools so output goes to stdout (the documented `claude -p` clobber gotcha) — but the default synthesis path here is direct API/OpenRouter, so this is an edge guard.
+- If the synthesis `model`/`backend` is below the capability bar, the response includes a non-fatal warning line in the stream (FR-014).
+
+---
+
+## Status & file endpoints (JSON)
+
+### `GET /api/ensemble/status?campaign_dir=…&chapters=…`
+Returns disk-derived pipeline state (R4, FR-002). No model call, no caching.
+
+```json
+{
+  "campaign_dir": "/abs/path",
+  "stages": [
+    {"id": "extract",    "status": "complete",    "artifacts": 45},
+    {"id": "bundle",     "status": "not_started", "artifacts": 0},
+    {"id": "synthesize", "status": "not_started", "drafts": []},
+    {"id": "review",     "status": "not_started", "promoted": []}
+  ],
+  "current_stage": "bundle"
+}
+```
+
+Completion predicates: `extract` ⇔ `per_chapter/*/merged.json` exist; `bundle` ⇔ `state_dossiers/*.md` exist; `synthesize` ⇔ `*_draft.md` exist; `review` ⇔ operator-promoted (best-effort: the live doc's mtime is at least the draft's).
+
+### `GET /api/ensemble/files?dir=…&pattern=…`
+Lists artifacts in an ensemble subdir (dossiers, drafts, per_chapter outputs) for review. Mirrors `grounding.py:/extracts`. Returns `{dir, exists, files:[{name,size}]}`.
+
+### `GET /api/ensemble/file?path=…`  /  `PUT /api/ensemble/file?path=…`
+Read / write a single interchange file (e.g. `aliases.json`, a draft) so the operator can preview and the alias-correction gate is satisfiable from the UI *or* the CLI/chat (FR-012). Write is path-validated and confined to the campaign workspace. **PUT to a live grounding doc is rejected** (promotion is a deliberate, separate action).
+
+### `GET /api/ensemble/diff?draft=…&live=…`
+Returns a unified diff between a `*_draft.md` and its live counterpart for the diff-before-promote gate. Read-only; never writes.
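+
+The diff needs nothing beyond the standard library. A minimal sketch, assuming the endpoint returns the text of a unified diff from the live doc to the draft (so `+` lines are what promotion would add). `draft_vs_live_diff` is a hypothetical helper name, and a missing live doc is treated as empty.
+
+```python
+import difflib
+from pathlib import Path
+
+
+def draft_vs_live_diff(draft: str, live: str) -> str:
+    """Return a unified diff live -> draft. Reads both files; never writes."""
+    draft_p, live_p = Path(draft), Path(live)
+    old = live_p.read_text(encoding="utf-8").splitlines(keepends=True) if live_p.exists() else []
+    new = draft_p.read_text(encoding="utf-8").splitlines(keepends=True)
+    return "".join(difflib.unified_diff(old, new, fromfile=str(live_p), tofile=str(draft_p)))
+```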
+
+### `POST /api/ensemble/promote`
+Body `{draft, live}`. Copies a reviewed draft over the live doc — the single explicit promotion action (FR-013, SC-005). The router refuses any `live` outside the four known grounding docs.
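+
+A sketch of the promotion guard, assuming paths arrive relative to the campaign root and that a draft may only be promoted onto its own live counterpart (`docs/<name>_draft.md` to `docs/<name>.md`). That pairing check is an assumption beyond the stated rule, and `promote`/`GROUNDING_DOCS` are hypothetical names.
+
+```python
+import shutil
+from pathlib import Path
+
+GROUNDING_DOCS = ("world_state", "campaign_state", "party", "planning")
+
+
+def promote(campaign_dir: str, draft: str, live: str) -> Path:
+    """Copy a reviewed draft over its live doc; refuse anything else."""
+    root = Path(campaign_dir).resolve()
+    live_p, draft_p = (root / live).resolve(), (root / draft).resolve()
+    allowed = {(root / "docs" / f"{n}.md").resolve(): n for n in GROUNDING_DOCS}
+    if live_p not in allowed:
+        raise ValueError(f"refusing to promote onto non-grounding path: {live}")
+    if draft_p != (root / "docs" / f"{allowed[live_p]}_draft.md").resolve():
+        raise ValueError(f"{draft} is not the draft for {live}")
+    shutil.copyfile(draft_p, live_p)  # the single explicit write to a live doc
+    return live_p
+```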
+
+---
+
+## Cross-cutting contract rules
+
+1. Every run endpoint records the backend+model into the produced artifact's provenance (FR-008) — implemented in the CLI, surfaced here.
+2. No endpoint stores pipeline state server-side; status is always recomputed from disk (FR-017).
+3. Secrets travel only via `env_extra` to the subprocess, never as query params or in logs.
+4. The router never imports `anthropic`/`openai` and never calls `stream_api`/`call_api`/`retrieve` — it only spawns CLI processes (Principles III, V, VI).
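+
+Rule 3 can be illustrated with a blocking stand-in for `subprocess_runner`, whose real signature is not shown here. In this hypothetical `run_cli`, secrets reach the child only through its environment, a command line containing a secret value is refused, and only `argv` is logged. The real runner streams output over SSE rather than collecting it.
+
+```python
+import os
+import subprocess
+import sys
+
+
+def run_cli(argv: list[str], env_extra: dict[str, str] | None = None) -> subprocess.CompletedProcess:
+    """Spawn a CLI stage with secrets in the child's environment only."""
+    secrets = [v for v in (env_extra or {}).values() if v]
+    if any(s in arg for s in secrets for arg in argv):
+        raise ValueError("secret value found in argv; pass it via env_extra instead")
+    env = {**os.environ, **(env_extra or {})}
+    # Log the command line only; env values (which may hold API keys) are never printed.
+    print("spawn:", " ".join(argv), file=sys.stderr)
+    return subprocess.run(argv, env=env, capture_output=True, text=True, check=False)
+```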
diff --git a/specs/001-ensemble-workflow-ui/contracts/cli.md b/specs/001-ensemble-workflow-ui/contracts/cli.md
new file mode 100644
index 0000000..00dc77e
--- /dev/null
+++ b/specs/001-ensemble-workflow-ui/contracts/cli.md
@@ -0,0 +1,69 @@
+# CLI Contract: backend selection across LLM-bearing scripts
+
+The CLI is the engine (Principle VI); the UI only sets these flags. This contract defines the **uniform backend-selection vocabulary** added so every LLM stage can target DGX, Anthropic, or OpenRouter — and the **seam change** that makes OpenRouter reachable from the one boundary (Principle V).
+
+---
+
+## Seam: `campaignlib/api`
+
+### `make_client(endpoint=None, model_override=None, backend=None)` — MODIFY
+Add an OpenRouter branch, preserving existing precedence (`backend`/`$CG_BACKEND` first, then `endpoint`/`$DGX_ENDPOINT`, then Anthropic default):
+
+```python
+def make_client(endpoint=None, model_override=None, backend=None):
+    backend = backend or os.environ.get("CG_BACKEND")
+    if backend == "claude-code":
+        return _ClaudeCodeClient(...)                                 # existing
+    if backend == "openrouter":
+        return _OpenRouterClient(model_override=model_override)       # NEW
+    endpoint = endpoint or os.environ.get("DGX_ENDPOINT")
+    if endpoint:
+        return _OpenAICompatClient(endpoint, model_override)          # existing
+    return anthropic.Anthropic()                                      # existing default
+```
+
+### `_OpenRouterClient` (in `campaignlib/api/backends.py`) — NEW
+- Reuses the `openai` SDK: `OpenAI(base_url="https://openrouter.ai/api/v1", api_key=os.environ["OPENROUTER_API_KEY"], timeout=…)`.
+- Base URL overridable via `OPENROUTER_BASE_URL`.
+- Model id passed through verbatim (no dgxlib registry lookup — that is the difference from `_OpenAICompatClient`).
+- Exposes the same Anthropic-shaped `.messages.create(...)` façade the other clients expose, so `stream_api`/`call_api` work unchanged.
+- Missing `OPENROUTER_API_KEY` → a clear, immediate error (no silent fallback), consistent with the seam's "the choice is explicit" docstring.
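+
+The key and base-URL rules above can be isolated in a small resolver that the client constructor calls first. A sketch, assuming a hypothetical helper `openrouter_settings` living beside `_OpenRouterClient` in `backends.py`; it only resolves configuration and neither imports nor calls the `openai` SDK.
+
+```python
+import os
+
+OPENROUTER_DEFAULT_BASE_URL = "https://openrouter.ai/api/v1"
+
+
+def openrouter_settings(env=None) -> dict:
+    """Resolve base URL and API key; fail loudly rather than fall back to another backend."""
+    env = os.environ if env is None else env
+    key = env.get("OPENROUTER_API_KEY")
+    if not key:
+        raise RuntimeError("OPENROUTER_API_KEY is not set; refusing to fall back silently")
+    base_url = env.get("OPENROUTER_BASE_URL") or OPENROUTER_DEFAULT_BASE_URL
+    return {"base_url": base_url, "api_key": key}
+```
+
+The result would then feed the SDK constructor, e.g. `OpenAI(**openrouter_settings(), timeout=…)`. Keeping resolution separate lets the missing-key error be tested without network access.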
+
+**Contract test** (`tests/test_openrouter_seam.py`): `make_client(backend="openrouter")` returns the OpenRouter client; no module outside `campaignlib/api` imports `openai`/`anthropic` for OpenRouter; missing key raises.
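+
+The "no out-of-seam imports" clause of that test can be enforced statically with the standard `ast` module. A sketch, assuming a blanket rule that no file outside `campaignlib/api` imports `openai` or `anthropic` (an existing codebase may need an allow-list, e.g. for `tests/`). `out_of_seam_imports` is a hypothetical name, and hidden directories such as `.venv` are skipped.
+
+```python
+import ast
+from pathlib import Path
+
+LLM_SDKS = {"openai", "anthropic"}
+
+
+def out_of_seam_imports(repo_root: str, seam: str = "campaignlib/api") -> list[str]:
+    """Return 'relpath:line module' for each LLM SDK import found outside the seam package."""
+    root = Path(repo_root)
+    seam_dir = (root / seam).resolve()
+    hits = []
+    for py in sorted(root.rglob("*.py")):
+        rel = py.relative_to(root)
+        if any(part.startswith(".") for part in rel.parts) or seam_dir in py.resolve().parents:
+            continue  # skip hidden dirs (e.g. .venv) and the seam itself
+        tree = ast.parse(py.read_text(encoding="utf-8"), filename=str(py))
+        for node in ast.walk(tree):
+            if isinstance(node, ast.Import):
+                names = [alias.name for alias in node.names]
+            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
+                names = [node.module]
+            else:
+                continue
+            hits.extend(f"{rel}:{node.lineno} {n}" for n in names if n.split(".")[0] in LLM_SDKS)
+    return hits
+```
+
+A pytest case would then reduce to `assert out_of_seam_imports(repo_root) == []`.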
+
+---
+
+## Synthesis scripts — ADD flags
+
+`synthesise_world_state.py`, `campaign_state.py`, `party.py`, `planning.py` each gain:
+
+| Flag | Values | Effect |
+|---|---|---|
+| `--backend` | `anthropic` (default) \| `dgx` \| `openrouter` | Passed to `make_client(backend=…)`. Omitted ⇒ `anthropic` ⇒ **identical to today** (FR-015, SC-006). |
+| `--endpoint` | URL | Passed to `make_client(endpoint=…)` (for `dgx`; OpenRouter uses its default base). |
+| `--model` | id | Already present; for `openrouter`, an OpenRouter model id. |
+
+These scripts currently call `make_client()` with no args; the change threads the parsed args into that single call. No other behavior changes.
+
+**Backward-compatibility invariant**: with none of the new flags supplied, the constructed command and the resulting output are unchanged from the current Anthropic path. This is the regression guard behind SC-006.
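+
+One way to thread the flags into the single `make_client(...)` call while keeping the invariant literal is to pass no keyword arguments at all for the default. A sketch under these assumptions: `add_backend_flags` and `make_client_kwargs` are hypothetical helpers, each script already defines `--model`, and `dgx` is expressed through `endpoint`, since the seam pseudocode selects DGX by endpoint rather than by a `dgx` backend value.
+
+```python
+import argparse
+import os
+
+
+def add_backend_flags(parser: argparse.ArgumentParser) -> None:
+    """Add --backend/--endpoint; the synthesis scripts already define --model."""
+    parser.add_argument("--backend", choices=("anthropic", "dgx", "openrouter"), default="anthropic")
+    parser.add_argument("--endpoint", default=None, help="DGX/Spark endpoint URL (dgx only)")
+
+
+def make_client_kwargs(args: argparse.Namespace) -> dict:
+    """Translate parsed flags into keyword arguments for the single make_client(...) call."""
+    if args.backend == "anthropic":
+        return {}  # exactly today's make_client() call: the backward-compatibility invariant
+    if args.backend == "dgx":
+        endpoint = args.endpoint or os.environ.get("DGX_ENDPOINT")
+        if not endpoint:
+            raise SystemExit("--backend dgx needs --endpoint or $DGX_ENDPOINT")
+        return {"endpoint": endpoint, "model_override": args.model}
+    return {"backend": "openrouter", "model_override": args.model}
+```
+
+With `--backend anthropic` (explicit or defaulted) the call stays `make_client()`, so `$CG_BACKEND` and `$DGX_ENDPOINT` are honored exactly as today. The `dgx` branch fails fast when no endpoint is available instead of silently falling through to Anthropic.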
+
+---
+
+## Extraction / aggregation scripts — NO new flags needed
+
+`ensemble.py`, `ensemble_batch.py`, `ensemble_extract.py`, `facts_to_state.py` already accept `--endpoints`/`--dgx-endpoint`/`--model`. To target OpenRouter:
+- set `CG_BACKEND=openrouter` (env, injected by the server) **or** rely on the seam recognizing the OpenRouter selection, and
+- pass the OpenRouter `--model` id.
+
+`facts_to_state.py` already calls `make_client(endpoint=…, model_override=…)`; once the seam honors `openrouter`, no script edit is required there. (If a per-stage `--backend` flag is desired on these for symmetry, it is additive and optional.)
+
+---
+
+## Provenance (FR-008)
+
+Each LLM-bearing script records the backend+model it used into its output artifact (frontmatter or trailing comment), so a mixed-backend run is auditable. This is the same place each script already stamps `n_facts`/model metadata.
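+
+A sketch of one possible stamp format: a trailing HTML comment, which Markdown renderers hide. The comment layout, `stamp_provenance`, and `read_provenance` are assumptions rather than the scripts' existing metadata format, and model ids are assumed to contain no whitespace.
+
+```python
+import re
+from datetime import datetime, timezone
+
+_STAMP = re.compile(r"<!-- provenance: backend=(\S+) model=(\S+) at=(\S+) -->\s*$")
+
+
+def stamp_provenance(markdown: str, backend: str, model: str) -> str:
+    """Append a trailing comment recording which backend+model produced this artifact."""
+    at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
+    return markdown.rstrip("\n") + f"\n\n<!-- provenance: backend={backend} model={model} at={at} -->\n"
+
+
+def read_provenance(markdown: str) -> tuple[str, str] | None:
+    """Return (backend, model) from a trailing stamp, or None if the artifact has none."""
+    m = _STAMP.search(markdown)
+    return (m.group(1), m.group(2)) if m else None
+```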
+
+---
+
+## Invariants enforced by this contract
+
+- One seam: OpenRouter is constructed only inside `campaignlib/api` (Principle V).
+- CLI-first: every backend choice is expressible and runnable from the terminal without the UI (Principle VI, FR-016).
+- Safe default: absent flags ⇒ today's Anthropic behavior (FR-015).
+- Explicit failure: a missing key or unreachable endpoint errors loudly, never silently degrades (FR-009).
diff --git a/specs/001-ensemble-workflow-ui/data-model.md b/specs/001-ensemble-workflow-ui/data-model.md
new file mode 100644
index 0000000..41c5988
--- /dev/null
+++ b/specs/001-ensemble-workflow-ui/data-model.md
@@ -0,0 +1,125 @@
+# Phase 1 Data Model: Ensemble Grounding-Doc Workflow UI
+
+This feature's "data" is almost entirely **files on disk** (Principle I) plus a small amount of **UI configuration state**. There is no new database. The entities below describe the conceptual model the UI presents and the on-disk artifacts that back it.
+
+---
+
+## 1. Backend Profile (config + runtime selection)
+
+Represents how an LLM-bearing stage executes. Selectable per stage at run time (FR-006, FR-018).
+
+| Field | Type | Notes |
+|---|---|---|
+| `backend` | enum `anthropic` \| `dgx` \| `openrouter` | Which seam branch `make_client` takes. Default `anthropic`. |
+| `endpoint` | string \| null | For `dgx`: the Spark `--endpoints` URL(s). For `openrouter`: defaults to `https://openrouter.ai/api/v1` (rarely overridden). Null for `anthropic`. |
+| `model` | string | Model id. Claude id for `anthropic`; Spark model id for `dgx`; OpenRouter id (e.g. `anthropic/claude-sonnet-4`) for `openrouter`. Free-text. |
+| `api_key_source` | derived | `ANTHROPIC_API_KEY` (anthropic), none (dgx), `OPENROUTER_API_KEY` (openrouter). Never stored in tracked config. |
+
+**Validation rules**:
+- `backend == "openrouter"` requires `OPENROUTER_API_KEY` to be present in the environment; absence surfaces as an explicit error (FR-009), not a silent fallback.
+- `backend == "dgx"` requires a reachable `endpoint`; unreachable surfaces as a fast, explicit error (edge case: local hardware unreachable).
+- A synthesis-stage profile whose `model` is not on the synthesis-capable allow-list raises a **warning, not an error** (FR-014, R6).
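+
+The three validation rules can be sketched as a single check that raises on hard errors and returns warnings. Assumptions: `BackendProfile` mirrors the table above as a dataclass, `SYNTHESIS_CAPABLE` is a placeholder allow-list (the real list from R6 is not given here), and the `dgx` rule checks only that an endpoint is present; reachability would need a separate network probe.
+
+```python
+import os
+from dataclasses import dataclass
+
+# Placeholder allow-list for illustration only; the real synthesis-capable list is not specified here.
+SYNTHESIS_CAPABLE = {"anthropic/claude-sonnet-4"}
+
+
+@dataclass
+class BackendProfile:
+    backend: str = "anthropic"
+    endpoint: str | None = None
+    model: str = ""
+
+
+def validate_profile(p: BackendProfile, stage: str, env=None) -> list[str]:
+    """Raise on hard errors (FR-009); return non-fatal warnings (FR-014)."""
+    env = os.environ if env is None else env
+    if p.backend == "openrouter" and not env.get("OPENROUTER_API_KEY"):
+        raise ValueError("backend 'openrouter' requires OPENROUTER_API_KEY")
+    if p.backend == "dgx" and not p.endpoint:
+        raise ValueError("backend 'dgx' requires an endpoint")
+    warnings = []
+    if stage == "synthesize" and p.model not in SYNTHESIS_CAPABLE:
+        warnings.append(f"model {p.model!r} is below the assumed synthesis capability bar")
+    return warnings
+```
+
+This matches the quickstart's Validation 8: the same weak model warns on `synthesize` but not on `extract`.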
+
+**Persistence**: backend/endpoint/model selections persist in `ui_state.yaml` under `ui.ensemble` (per-stage). The key (secret) is environment-only.
+
+---
+
+## 2. Pipeline State (derived, not stored)
+
+The campaign's position in the workflow. **Computed from disk on every read** (FR-002, FR-017) — never cached in the browser or written as a manifest (R4).
+
+| Field | Type | Derivation |
+|---|---|---|
+| `campaign_dir` | path | From the active config (`runtime.session_dir` / campaign root). |
+| `stages` | list of Stage | One per pipeline stage (below), each with a computed status. |
+| `current_stage` | derived | First stage that is not `complete`. |
+
+There are **no state transitions stored** — the state is a pure function of which artifacts exist. "Transition" happens implicitly when a stage's artifacts appear on disk.
+
+---
+
+## 3. Stage
+
+One step in the ordered workflow. Status is derived from artifact presence (R4).
+
+| Field | Type | Notes |
+|---|---|---|
+| `id` | enum | `extract` \| `bundle` \| `synthesize` \| `review` |
+| `label` | string | Human label for the UI. |
+| `status` | enum `not_started` \| `complete` (\| `running` transient) | Derived from artifacts; `running` is an in-flight UI state only. |
+| `backend_profile` | Backend Profile \| null | Null for non-LLM stages (e.g. the `review` gate, the deterministic threads/recent-events renders). |
+| `artifacts` | list of Artifact | What this stage reads and writes. |
+| `gate` | Checkpoint \| null | A blocking human checkpoint attached to this stage, if any. |
+
+**Stage → artifact / gate map** (the concrete pipeline):
+
+| Stage | Backend? | Reads | Writes | Completion predicate | Gate |
+|---|---|---|---|---|---|
+| `extract` | yes (extract) | `docs/chapters/chapter_*.md` | `docs/ensemble/per_chapter/<stem>/merged.json`, root `merged.json` | per-chapter `merged.json` exist for the glob | — |
+| `bundle` | yes (extract) | `merged.json`, `aliases.json`, `--known-names` | `docs/ensemble/state_dossiers/*.md`, `merged_dossiers/*.md` | dossier files exist | **scope review** (`--list`), **alias correction** |
+| `synthesize` | yes (synthesis) | `merged_dossiers/*.md`, `threads.md`, `recent_events.md` | `docs/{world_state,campaign_state,party,planning}_draft.md` | `*_draft.md` exist | — |
+| `review` | no | `*_draft.md`, live docs | (promotion writes live docs, human-initiated) | live docs updated by operator | **diff-before-promote** |
+
+---
+
+## 4. Checkpoint / Gate
+
+A human-judgment point that blocks automatic advancement (FR-010, FR-011, Principle II). The UI represents it; the *decision* happens in Claude/CLI (Principle IX).
+
+| Field | Type | Notes |
+|---|---|---|
+| `id` | enum | `scope_review` \| `alias_correction` \| `diff_promote` |
+| `stage_id` | enum | The stage it gates. |
+| `satisfied` | bool (operator-confirmed) | The UI does not auto-satisfy; the operator confirms after doing the work. |
+| `handoff` | description | What to do in Claude/CLI (e.g. "run `--list`, review scope", "edit `aliases.json`", "`diff` draft vs live, then promote"). |
+| `interchange_files` | list of path | The files the operator edits/reviews (e.g. `aliases.json`, `*_draft.md`) — the contract between UI, CLI, and chat (FR-012, FR-017). |
+
+**Rule**: a gate is never bypassed by the pipeline; `synthesize` must not consume `bundle` output until `scope_review`/`alias_correction` are operator-confirmed (Principle II — no LLM output feeds another across a precision boundary without a human gate).
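+
+A sketch of the blocking rule, assuming gate confirmations are recorded only by an explicit operator action. `REQUIRED_GATES`, `GateState`, and `can_run` are hypothetical. Keying the `diff_promote` gate on a `promote` action is an assumption, and confirmations are held in memory purely for illustration (where they persist is not specified here).
+
+```python
+from dataclasses import dataclass, field
+
+# Hypothetical map: which operator-confirmed gates must be satisfied before an action runs.
+REQUIRED_GATES = {
+    "synthesize": ("scope_review", "alias_correction"),
+    "promote": ("diff_promote",),
+}
+
+
+@dataclass
+class GateState:
+    satisfied: set[str] = field(default_factory=set)
+
+    def confirm(self, gate_id: str) -> None:
+        """Record an operator confirmation; the pipeline itself never calls this."""
+        self.satisfied.add(gate_id)
+
+
+def can_run(action: str, gates: GateState) -> tuple[bool, list[str]]:
+    """Return (allowed, missing_gates); actions with no entry are ungated."""
+    missing = [g for g in REQUIRED_GATES.get(action, ()) if g not in gates.satisfied]
+    return (not missing, missing)
+```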
+
+---
+
+## 5. Artifact
+
+A file produced or consumed by a stage — the unit of interchange (FR-004, FR-017).
+
+| Field | Type | Notes |
+|---|---|---|
+| `path` | path | Absolute or campaign-relative; the source of truth. |
+| `kind` | enum | `chapter` \| `facts` \| `dossier` \| `threads` \| `recent_events` \| `draft` \| `live_doc` \| `aliases` \| `known_names`. |
+| `produced_by` | stage id \| null | Which stage wrote it (null for human-authored inputs). |
+| `backend_used` | string \| null | For LLM-produced artifacts: the backend+model recorded with the output (FR-008). |
+| `exists` | bool | Drives stage status. |
+
+**Provenance rule (FR-008)**: every LLM-produced artifact records which backend and model produced it (e.g. a frontmatter/comment line). This is how a mixed run (extract on OpenRouter, synthesize on Anthropic) stays auditable.
+
+---
+
+## 6. Grounding Document (draft / live)
+
+The four targets, with a hard draft/live distinction (Principle I, FR-013).
+
+| Field | Type | Notes |
+|---|---|---|
+| `name` | enum | `world_state` \| `campaign_state` \| `party` \| `planning`. |
+| `draft_path` | path | `docs/<name>_draft.md` — what synthesis writes. |
+| `live_path` | path | `docs/<name>.md` — only the operator promotes to here. |
+
+**Rule**: the workflow writes drafts only; the UI never auto-overwrites a live doc; promotion is an explicit operator action (SC-005).
+
+---
+
+## Config schema addition (`server/config_models.py`)
+
+A new `EnsembleSection` added to `UISection`, registered in `UI_SECTION_NAMES`:
+
+```
+ui.ensemble:
+  campaign_dir: str
+  chapters_glob: str               # default docs/chapters/chapter_*.md
+  extract:    { backend, endpoint, model }   # Backend Profile
+  synthesize: { backend, endpoint, model }   # Backend Profile (independent of extract)
+  known_names: [str]
+  aliases_path: str
+```
+
+No secret fields. Mirrors existing `SessionDocSection`'s `backend`/`dgx_endpoint`/`dgx_model` precedent (`config_models.py`).
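+
+A sketch of the section's shape, using standard-library dataclasses as a stand-in for whatever modeling library `config_models.py` actually uses. `StageBackend` and `from_dict` are hypothetical. Because the profile has no secret fields, a stray `api_key` key in the YAML raises `TypeError` instead of being silently stored.
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class StageBackend:
+    backend: str = "anthropic"
+    endpoint: str | None = None
+    model: str | None = None
+
+
+@dataclass
+class EnsembleSection:
+    campaign_dir: str = ""
+    chapters_glob: str = "docs/chapters/chapter_*.md"
+    extract: StageBackend = field(default_factory=StageBackend)
+    synthesize: StageBackend = field(default_factory=StageBackend)  # independent of extract
+    known_names: list[str] = field(default_factory=list)
+    aliases_path: str = ""
+
+    @classmethod
+    def from_dict(cls, data: dict) -> "EnsembleSection":
+        """Build from the parsed ui.ensemble mapping; unknown keys (e.g. api_key) raise TypeError."""
+        data = dict(data)
+        for stage in ("extract", "synthesize"):
+            data[stage] = StageBackend(**(data.get(stage) or {}))
+        return cls(**data)
+```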
diff --git a/specs/001-ensemble-workflow-ui/plan.md b/specs/001-ensemble-workflow-ui/plan.md
new file mode 100644
index 0000000..75738c9
--- /dev/null
+++ b/specs/001-ensemble-workflow-ui/plan.md
@@ -0,0 +1,126 @@
+# Implementation Plan: Ensemble Grounding-Doc Workflow UI
+
+**Branch**: `001-ensemble-workflow-ui` | **Date**: 2026-06-27 | **Spec**: [spec.md](./spec.md)
+
+**Input**: Feature specification from `specs/001-ensemble-workflow-ui/spec.md`
+
+## Summary
+
+Add a dedicated, stepped UI surface that mechanizes the ensemble grounding-doc pipeline (extraction → fact bundling → synthesis → review/promote), deriving stage status from files on disk and streaming each mechanical step's output, while preserving the human-judgment checkpoints (scope review, alias correction, diff-before-promote) as handoffs to a Claude conversation or the CLI. Make each LLM-bearing stage backend-selectable, **adding OpenRouter** alongside the existing local-hardware (DGX/Spark) and Anthropic (Claude) options, independently per stage.
+
+Technical approach, in one line per layer:
+
+- **Seam (`campaignlib/api`)**: add an OpenRouter branch to `make_client` so OpenRouter is reached through the *one* LLM seam (Principle V) — a real API key from the environment and OpenRouter model ids, not the dgxlib registry.
+- **CLI (engine)**: plumb a uniform `--backend` / `--endpoint` / `--model` selection into the four synthesis scripts (`synthesise_world_state.py`, `campaign_state.py`, `party.py`, `planning.py`) so synthesis can target DGX/Anthropic/OpenRouter — the extraction scripts already accept `--endpoints`/`--model` and only need the seam change.
+- **Server (face)**: a new `server/routers/ensemble.py` (mounted `/api/ensemble`) that shells out to those CLI scripts via `subprocess_runner` and exposes disk-derived stage status — never reimplementing pipeline logic (Principle VI).
+- **Frontend (face)**: a new `/ensemble` stepped page built on the existing `WizardShell` + `connectSSE` patterns, leaving the existing `/grounding` page untouched.
+
+## Technical Context
+
+**Language/Version**: Python 3.11+ (backend + CLI); TypeScript 5 / Vue 3 (frontend).
+
+**Primary Dependencies**: FastAPI + uvicorn (server); `anthropic` SDK and `openai` SDK (both already present — `openai` powers the DGX path today); `dgxlib` (local model registry); Vue 3 + Pinia + Vue Router; PyYAML. OpenRouter is reached via the existing `openai` SDK pointed at `https://openrouter.ai/api/v1`.
+
+**Storage**: Files on disk are the source of truth (Principle I) — chapter files, `docs/ensemble/per_chapter/*/merged.json`, `docs/ensemble/state_dossiers/*.md`, `merged_dossiers/*.md`, `*_draft.md`, live grounding docs. UI state in `ui_state.yaml` (`ui.ensemble` section); machine-local secrets/config in `.campaigngenerator.local.yaml` (gitignored) or environment.
+
+**Testing**: `pytest` (`tests/`), including the CI guard `tests/test_retrieve_render_isolation.py`. Frontend: existing Vite/Vue toolchain (no test mandate added here).
+
+**Target Platform**: Single-operator local workstation (WSL2 on Windows 11), local-first, intermittent network tolerated.
+
+**Project Type**: Web application (FastAPI backend + Vue 3 frontend) layered over a CLI engine.
+
+**Performance Goals**: Extraction is a long-running job (tens of minutes), so the UI streams progress over SSE and relies on the CLI's per-item resumability rather than expecting fast responses. Synthesis token cost stays bounded (~280K metered tokens for a full Phandalin-scale refresh) by keeping extraction off the metered API.
+
+**Constraints**: One seam per external boundary (Principle V) — OpenRouter must route through `campaignlib`. CLI-first (Principle VI) — every UI step is a CLI invocation. Human checkpoints are blocking (Principle II). No browser-only pipeline state (Principles I/VIII). Drafts only; never auto-overwrite live docs (Principle I).
+
+**Scale/Scope**: One GM; campaigns up to ~45 chapters / ~1900 entities / ~860 known names (Phandalin is the reference scale).
+
+## Constitution Check
+
+*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
+
+The CampaignGenerator constitution (v1.2.0) has ten principles. This feature is, by the constitution's own words, the **canonical shape** for Principle IX, so alignment is load-bearing, not incidental. (Principle X — *Selection is Explicit* — was added during this feature, arising from its chapter picker; see the post-implement amendment in `tasks.md`.)
+
+| Principle | Gate for this feature | Verdict |
+|---|---|---|
+| I. Disk is Truth, Model is Draft | Stage status derived from files; synthesis writes `*_draft.md` only; promotion is a manual file act (FR-002, FR-013, FR-017). | ✅ PASS |
+| II. Human Checkpoint Non-Negotiable | Scope/alias/promote gates block auto-advance; UI never feeds one LLM stage's unreviewed output into the next across a precision boundary (FR-010, FR-011). | ✅ PASS |
+| III. Retrieval/Render Separated | New router only shells out; it issues neither retrieval (`retrieve`/`rpg_search`) nor render (`stream_api`/`call_api`) calls. `test_retrieve_render_isolation.py` stays green. | ✅ PASS (no new mixing) |
+| IV. Verbatim is Sacred | Extraction preserves `source_quote`; no step paraphrases transcripts. No new verbatim surface introduced. | ✅ PASS |
+| V. One Seam per Boundary | **The pivotal gate.** OpenRouter is a *new external dependency* and MUST be reached only through `campaignlib`'s `make_client`. No `import openai`/OpenRouter calls added in routers or scripts outside the seam. | ✅ PASS *by design* (see Research) |
+| VI. CLI is Engine, UI is Face | Backend selection is a CLI flag first; the router builds commands and streams via `subprocess_runner`, reimplementing nothing (FR-016). | ✅ PASS |
+| VII. Extract Once, Synthesize Deliberately | The pipeline *is* this shape; the plan adds no pass-collapsing. Extraction stays local/cheap; synthesis stays deliberate. | ✅ PASS |
+| VIII. State is Discoverable | The ensemble page reads campaign state from disk; what is done/pending is visible, not tribal (FR-002). | ✅ PASS |
+| IX. UI Mechanizes; Claude Converses | The whole feature: UI steps the sequence; judgment between steps happens in Claude/CLI; files are the interchange; the human is never trapped in the UI (FR-012, FR-016, FR-017). | ✅ PASS |
+| X. Selection is Explicit; No Silent "All" | Chapter picker stores the literal chosen set; extraction refuses an empty selection; "Select all" materializes every path. The CLI glob is exempt (explicit at the CLI). | ✅ PASS |
+
+**Authority & Human Checkpoint clause**: This plan is a draft reviewed against the constitution; it imposes no autonomous precision decision. The one risk surface — OpenRouter as a second LLM vendor — is contained to the single seam, which is exactly what Principle V demands.
+
+**Result**: No violations. Complexity Tracking left empty.
+
+## Project Structure
+
+### Documentation (this feature)
+
+```text
+specs/001-ensemble-workflow-ui/
+├── plan.md              # This file (/speckit-plan command output)
+├── spec.md              # Feature specification (/speckit-specify)
+├── research.md          # Phase 0 output (/speckit-plan)
+├── data-model.md        # Phase 1 output (/speckit-plan)
+├── quickstart.md        # Phase 1 output (/speckit-plan)
+├── contracts/           # Phase 1 output (/speckit-plan)
+│   ├── api.md           #   HTTP endpoints for /api/ensemble
+│   └── cli.md           #   CLI backend-selection flag contract
+└── checklists/
+    └── requirements.md  # Spec quality checklist (/speckit-specify)
+```
+
+### Source Code (repository root)
+
+```text
+# ── Seam: the one LLM boundary (Principle V) ──
+campaignlib/
+└── api/
+    ├── client.py        # MODIFY: make_client() gains an "openrouter" backend branch
+    └── backends.py      # MODIFY: OpenRouter client (OpenAI SDK + real api_key, no dgxlib registry)
+
+# ── CLI engine (Principle VI): backend selection plumbed into synthesis scripts ──
+synthesise_world_state.py   # MODIFY: add --backend/--endpoint, pass to make_client()
+campaign_state.py           # MODIFY: same (synthesize path)
+party.py                    # MODIFY: same (synthesize path)
+planning.py                 # MODIFY: same (synthesize path)
+# ensemble.py / ensemble_batch.py / ensemble_extract.py / facts_to_state.py
+#   already accept --endpoints/--model → reach OpenRouter once the seam supports it
+
+# ── Server (face): new router, mirrors grounding.py ──
+server/
+├── main.py              # MODIFY: include_router(ensemble.router, prefix="/api/ensemble")
+├── config_models.py     # MODIFY: add EnsembleSection + backend-profile fields to UIState
+├── config.py            # MODIFY (maybe): OpenRouter model id suggestions for the picker
+└── routers/
+    └── ensemble.py      # NEW: stage runners (SSE) + disk-derived stage-status endpoints
+
+# ── Frontend (face): new stepped page, /grounding untouched ──
+frontend/src/
+├── router.ts            # MODIFY: add /ensemble route tree
+├── views/
+│   ├── EnsembleWorkflow.vue       # NEW: WizardShell host (mirrors SessionWorkflow.vue)
+│   └── ensemble/                  # NEW: one component per stage
+│       ├── EnsembleSetup.vue      #   paths + per-stage backend selection
+│       ├── EnsembleExtract.vue    #   Stage 1 run + status
+│       ├── EnsembleBundle.vue     #   Stage 2 run + scope-review gate
+│       └── EnsembleSynthesize.vue #   Stage 3 run + diff/promote gate
+└── stores/
+    └── config.ts        # REUSE: ui.ensemble section via updateSection()
+
+# ── Tests ──
+tests/
+└── test_openrouter_seam.py        # NEW: make_client("openrouter") routing + no out-of-seam imports
+```
+
+**Structure Decision**: Web-application layout already in place (`server/` + `frontend/` over root-level CLI scripts). This feature is purely additive at every layer (one new seam branch, four script flag additions, one new router, one new frontend page tree) and leaves the existing `/grounding` surface untouched (FR-015).
+
+## Complexity Tracking
+
+> No constitution violations. No entries required.
diff --git a/specs/001-ensemble-workflow-ui/quickstart.md b/specs/001-ensemble-workflow-ui/quickstart.md
new file mode 100644
index 0000000..d7baaa6
--- /dev/null
+++ b/specs/001-ensemble-workflow-ui/quickstart.md
@@ -0,0 +1,104 @@
+# Quickstart / Validation Guide: Ensemble Grounding-Doc Workflow UI
+
+This guide proves the feature end-to-end. It assumes a campaign workspace with chapter files already prepared (the upstream spelling/known-names pass is out of scope here — see `docs/cli/ensemble_workflow.md`). Details of flags and endpoints live in `contracts/cli.md` and `contracts/api.md`; the data model is in `data-model.md`.
+
+## Prerequisites
+
+- A campaign workspace with `docs/chapters/chapter_*.md`.
+- `ANTHROPIC_API_KEY` set (for the Anthropic synthesis path / regression check).
+- `OPENROUTER_API_KEY` set (for the OpenRouter path).
+- For the DGX path: at least one reachable Spark endpoint (`/spark-status`); optional if validating only Anthropic + OpenRouter.
+- Server + frontend running via `./startup`.
+
+---
+
+## Validation 1 — Seam: OpenRouter routes through `make_client` (Principle V)
+
+```bash
+python -m pytest tests/test_openrouter_seam.py -q
+```
+
+**Expected**: `make_client(backend="openrouter")` returns the OpenRouter client; a missing `OPENROUTER_API_KEY` raises a clear error; no module outside `campaignlib/api` imports the OpenRouter client. (Maps to FR-007, FR-018; R1.)
+
+## Validation 2 — Regression: existing Anthropic path unchanged (FR-015, SC-006)
+
+```bash
+# Old per-tool path and the synthesis scripts with NO new flags must be byte-identical.
+python -m pytest tests/                       # full suite incl. test_retrieve_render_isolation.py
+# Spot check: synthesise_world_state.py with no --backend builds the same command/output as before.
+```
+
+**Expected**: full suite green; the isolation guard passes (router added no retrieval/render mixing); default synthesis still hits Anthropic.
+
+## Validation 3 — Stage status is disk-derived (FR-002, FR-017)
+
+```bash
+curl -s --get "http://localhost:8000/api/ensemble/status" \
+  --data-urlencode "campaign_dir=$PWD" \
+  --data-urlencode 'chapters=docs/chapters/chapter_*.md' | python -m json.tool
+```
+
+**Expected**: with no prior run, `extract` is `current_stage`. After files appear under `docs/ensemble/per_chapter/*/merged.json` (Validation 5), the same call — with no server restart — reports `extract: complete`. Confirms no browser/server-cached state.
+
+## Validation 4 — Walk the pipeline from the UI, no CLI typing (US1, SC-001, SC-002)
+
+1. Open the app → navigate to **Ensemble Workflow** (`/ensemble`). Confirm it is a distinct page from **Grounding Docs** (`/grounding`), which is unchanged (US4).
+2. **Setup** step: set chapter glob and pick a backend for *extract* and (independently) for *synthesize* (US2).
+3. **Extract** step: Run → watch SSE progress stream → on completion, the page lists per-chapter artifacts.
+4. Reload the page → Extract shows **complete** (disk-derived).
+
+**Expected**: an operator who has not read the workflow doc reaches the synthesis stage without typing a command (SC-002).
+
+## Validation 5 — Per-stage backend mix, incl. OpenRouter with local box down (US2, SC-003, SC-008)
+
+```bash
+# Simulate local hardware unreachable, then drive extraction via OpenRouter from the UI's Extract step.
+# (Equivalent CLI the UI runs — proves CLI-first, FR-016:)
+CG_BACKEND=openrouter python ensemble_batch.py \
+  --chapters 'docs/chapters/chapter_*.md' --per-chapter-dir docs/ensemble/per_chapter \
+  --out docs/ensemble/merged.json --model anthropic/claude-sonnet-4
+```
+
+Then synthesize on a *different* backend from the UI's Synthesize step (e.g. Anthropic):
+
+```bash
+python synthesise_world_state.py --backend anthropic \
+  --dossiers 'docs/ensemble/merged_dossiers/*.md' --dossier-min-facts 10 \
+  --output docs/world_state_draft.md
+```
+
+**Expected**: extraction completes against OpenRouter with the local box down (SC-003); each artifact records the backend that produced it (FR-008, SC-008); a full refresh is achievable with mixed backends.
+
+## Validation 6 — Human checkpoints block auto-advance (US3, Principle II)
+
+1. After Extract, the UI presents the **scope-review** gate (`bundle --list`) and does **not** auto-run aggregation.
+2. Edit `docs/ensemble/aliases.json` from the CLI/chat → return to the UI → the alias-correction gate reflects the edited file **without** re-running any LLM step (FR-012).
+3. Proceed to Synthesize → reach the **diff-before-promote** gate.
+
+**Expected**: aggregation never consumes extraction output until the operator confirms scope/alias (Principle II); the gate's interchange files are visible to CLI and chat alike.
+
+## Validation 7 — Drafts only; promotion is explicit (FR-013, SC-005)
+
+```bash
+# Synthesis writes a draft, never the live doc.
+ls docs/world_state_draft.md          # exists after synthesize
+git status docs/world_state.md        # live doc UNCHANGED by synthesis
+# Promotion is the single explicit action:
+curl -s -X POST http://localhost:8000/api/ensemble/promote \
+  -H 'Content-Type: application/json' \
+  -d '{"draft":"docs/world_state_draft.md","live":"docs/world_state.md"}'
+```
+
+**Expected**: the synthesis step never modifies a live grounding doc; only the explicit promote action does. A `PUT /api/ensemble/file` targeting a live doc is rejected. Zero automatic live-doc overwrites across all runs (SC-005).
+
+## Validation 8 — Sub-Sonnet synthesis warning (FR-014, R6)
+
+Pick a known-weak model (e.g. a small open model id) for the **synthesize** stage and run.
+
+**Expected**: the stream includes a non-fatal warning that the model is below the assumed synthesis capability; the run still proceeds (warn, not block). Extraction with the same weak model produces no such warning.
+
+---
+
+## Done-when
+
+- Validations 1–8 pass.
+- `/grounding` behaves identically to before (US4, SC-006).
+- A full grounding-doc refresh is completable entirely from `/ensemble` (SC-001), including with the local box unreachable by selecting OpenRouter (SC-003).
diff --git a/specs/001-ensemble-workflow-ui/research.md b/specs/001-ensemble-workflow-ui/research.md
new file mode 100644
index 0000000..5969e65
--- /dev/null
+++ b/specs/001-ensemble-workflow-ui/research.md
@@ -0,0 +1,105 @@
+# Phase 0 Research: Ensemble Grounding-Doc Workflow UI
+
+All decisions below resolve the design unknowns implied by the spec and the Technical Context. The dominant constraint throughout is **Principle V (One Seam per Boundary)**: OpenRouter is a new external dependency and may be reached from exactly one place.
+
+---
+
+## R1 — How OpenRouter plugs into the existing LLM seam
+
+**Decision**: Add an `"openrouter"` backend branch to `make_client()` in `campaignlib/api/client.py`, backed by an OpenRouter-aware client in `campaignlib/api/backends.py`. OpenRouter is OpenAI-wire-compatible, so the client reuses the `openai` SDK pointed at `https://openrouter.ai/api/v1`, but with two differences from the existing `_OpenAICompatClient`:
+1. A **real API key** from `OPENROUTER_API_KEY` (the DGX client uses `api_key="not-needed"`).
+2. **Model resolution does not go through the dgxlib registry** — OpenRouter model ids (e.g. `anthropic/claude-sonnet-4`, `meta-llama/llama-3.1-70b-instruct`) are passed through verbatim, and per-call request extras (timeouts, thinking) use sensible defaults instead of `dgxlib.resolve_model_config`.
+
+**Rationale**:
+- `campaignlib/api/backends.py:_OpenAICompatClient` (lines 150–185) hard-imports `dgxlib` and calls `resolve_model_config(self.model_override)`. dgxlib only knows Spark-served models, so OpenRouter ids would fail registry lookup. A separate branch keeps the DGX path unchanged while still living **inside the one seam** Principle V mandates.
+- `make_client(endpoint, model_override, backend)` already has a `backend` parameter and a `$CG_BACKEND` env hook (`client.py:30–35`), and already precedes the endpoint/Anthropic branches. Adding `if backend == "openrouter":` is the minimal, idiomatic extension.
+- Routing through `make_client` means `stream_api`/`call_api` (which already branch on client type for `thinking`/cache extras) and their retry logic are inherited for free.
+
+**Alternatives considered**:
+- *Reuse `_OpenAICompatClient` by passing `endpoint=https://openrouter.ai/api/v1`*: rejected — it would still call `dgxlib.resolve_model_config` on OpenRouter ids and use `api_key="not-needed"`. Bending it to OpenRouter would entangle the DGX path with vendor-specific behavior.
+- *Add a new top-level module / `import openai` in the synthesis scripts*: rejected outright — a direct Constitution Principle V violation (a second place that crosses the LLM boundary).
+- *Use the `anthropic` SDK against OpenRouter's Anthropic-compat shim*: rejected — OpenRouter's first-class surface is the OpenAI wire format already used here; the `openai` SDK is already a dependency.
+
+---
+
+## R2 — Where the OpenRouter credential and model list live
+
+**Decision**: The API key comes from the `OPENROUTER_API_KEY` environment variable, mirroring how `ANTHROPIC_API_KEY` is handled today (CLAUDE.md: "`ANTHROPIC_API_KEY` must be set in the environment"). The server passes it through to subprocesses via `subprocess_runner`'s existing `env_extra` mechanism — it is never written to a tracked file. A small, editable list of suggested OpenRouter model ids is surfaced for the picker (alongside the existing `server/config.py:MODELS` Claude list and the DGX model id), but the operator may type any id.
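+
+Below is one way the key could reach a stage subprocess without touching disk. The sketch assumes `env_extra` is a plain mapping overlaid on the server's environment. `build_child_env`, `missing_credential`, and `CREDENTIAL_VARS` are hypothetical helpers, not `subprocess_runner`'s API. The missing-credential check lets a stage fail fast with a named variable rather than hang (FR-009).
+
+```python
+import os
+from typing import Dict, Mapping, Optional
+
+# Hypothetical mapping from backend to the env var holding its credential.
+CREDENTIAL_VARS: Dict[str, Optional[str]] = {
+    "anthropic": "ANTHROPIC_API_KEY",
+    "openrouter": "OPENROUTER_API_KEY",
+    "dgx": None,  # the local Spark endpoint needs no key
+}
+
+
+def build_child_env(env_extra: Optional[Mapping[str, str]] = None,
+                    base: Mapping[str, str] = os.environ) -> Dict[str, str]:
+    # Copy the server's environment, then overlay per-run extras.
+    # Nothing is written to config.yaml or ui_state.yaml.
+    env = dict(base)
+    env.update(env_extra or {})
+    return env
+
+
+def missing_credential(backend: str, env: Mapping[str, str]) -> Optional[str]:
+    # Return the name of the unset variable, or None if the backend is ready.
+    var = CREDENTIAL_VARS[backend]
+    if var is None or env.get(var):
+        return None
+    return var
+```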
+
+**Rationale**: Secrets stay out of `config.yaml`/`ui_state.yaml` (both tracked). `.campaigngenerator.local.yaml` (gitignored) is an acceptable fallback for a machine-local key, but environment-variable parity with Anthropic is the least surprising. The model id is free-text because OpenRouter's catalog changes faster than any hard-coded list.
+
+**Alternatives considered**:
+- *Store the key in `ui_state.yaml`*: rejected — it is tracked; secrets must not be committed.
+- *Fetch OpenRouter's live model catalog for the picker*: rejected for v1, since it adds a network dependency at UI load (a problem on an intermittent connection such as Bear Valley's) for marginal benefit; a static suggestion list plus free-text covers it.
+
+---
+
+## R3 — Backend selection surface across CLI stages
+
+**Decision**: Introduce a uniform selection convention across the LLM-bearing scripts:
+- **Synthesis scripts** (`synthesise_world_state.py`, `campaign_state.py`, `party.py`, `planning.py`) gain `--backend {anthropic,dgx,openrouter}` plus the already-conventional `--endpoint`/`--model`, and pass them into `make_client(...)`. They currently call `make_client()` with no args (Anthropic-only); the default stays `anthropic`, so existing invocations are byte-for-byte unchanged (FR-015).
+- **Extraction/aggregation scripts** (`ensemble.py`, `ensemble_batch.py`, `ensemble_extract.py`, `facts_to_state.py`) already accept `--endpoints`/`--dgx-endpoint`/`--model`; selecting OpenRouter for them is achieved by pointing the endpoint at OpenRouter and relying on the R1 seam branch (driven by `--backend openrouter` or `CG_BACKEND=openrouter`, which `make_client` already reads).
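+
+Each synthesis script could register the shared flag vocabulary once and reuse it. This `argparse` sketch is illustrative only; `add_backend_args` and `parse` are hypothetical names. The defaults mirror the decision above, so a bare invocation still selects `anthropic`.
+
+```python
+import argparse
+from typing import List, Optional
+
+
+def add_backend_args(parser: argparse.ArgumentParser) -> None:
+    # Shared vocabulary; the "anthropic" default keeps old invocations unchanged.
+    parser.add_argument("--backend", choices=["anthropic", "dgx", "openrouter"],
+                        default="anthropic", help="LLM backend for this stage")
+    parser.add_argument("--endpoint", default=None,
+                        help="endpoint URL override (e.g. a DGX/Spark server)")
+    parser.add_argument("--model", default=None,
+                        help="model id; OpenRouter ids are passed through verbatim")
+
+
+def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description="synthesis script (sketch)")
+    add_backend_args(parser)
+    return parser.parse_args(argv)
+```
+
+The parsed values would then pass straight into the seam as `make_client(endpoint=args.endpoint, model_override=args.model, backend=args.backend)`. This assumes `make_client` accepts them as keyword arguments, per the signature quoted in R1.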
+
+**Rationale**: Honors Principle VI — the backend choice is a CLI capability first; the UI merely sets the flag. A single `--backend` vocabulary across scripts keeps the router's command-building uniform and the contract testable.
+
+**Alternatives considered**:
+- *Only support OpenRouter via env vars, no flags*: rejected — env-only selection is invisible state and harder to test per stage; the spec requires per-stage, run-time selection (FR-006, FR-018).
+- *A single global backend setting for the whole run*: rejected — the clarified scope is **per-stage** choice (extract on one backend, synthesize on another).
+
+---
+
+## R4 — Stage-status discovery from disk
+
+**Decision**: The router exposes read-only status endpoints that infer each stage's completion from artifact presence, reusing the pattern already in `grounding.py` (`/extracts`, `/extracts/{filename}`). Specifically: extraction complete ⇔ `docs/ensemble/per_chapter/*/merged.json` files exist for the chapter glob; bundling complete ⇔ `docs/ensemble/state_dossiers/*.md` (and `merged_dossiers/*.md`) files exist; synthesis complete ⇔ the relevant `*_draft.md` files exist. No status is stored server-side or in the browser (Principles I/VIII, FR-002, FR-017).
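+
+These predicates translate directly into glob checks. The minimal `pathlib` sketch below assumes drafts are named `<doc>_draft.md` inside a caller-supplied `drafts_dir`, because R4 does not pin down their location. `stage_status` and `_exists` are hypothetical names. The function reads the filesystem on every call, so the next request picks up any CLI run from another terminal.
+
+```python
+from pathlib import Path
+from typing import Dict, Union
+
+DOCS = ("world_state", "campaign_state", "party", "planning")
+
+
+def _exists(root: Path, pattern: str) -> bool:
+    # True if at least one regular file under root matches the glob pattern.
+    return root.is_dir() and any(p.is_file() for p in root.glob(pattern))
+
+
+def stage_status(ensemble_dir: Path, drafts_dir: Path) -> Dict[str, Union[bool, Dict[str, bool]]]:
+    """Derive stage completion from artifacts on disk; nothing is cached."""
+    return {
+        "extract": _exists(ensemble_dir, "per_chapter/*/merged.json"),
+        "bundle": _exists(ensemble_dir, "state_dossiers/*.md")
+        and _exists(ensemble_dir, "merged_dossiers/*.md"),
+        # Per-doc, because synthesis runs once per grounding document.
+        "synthesize": {doc: (drafts_dir / f"{doc}_draft.md").is_file() for doc in DOCS},
+    }
+```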
+
+**Rationale**: `facts_to_state.py` and `ensemble_batch.py` are already resumable by checking for these exact files, so "does the file exist?" is the same predicate the CLI uses — the UI and CLI cannot disagree. Reusing `grounding.py`'s file-listing endpoints minimizes new surface.
+
+**Alternatives considered**:
+- *A status manifest file the router writes*: rejected — introduces a second source of truth that can drift from the actual artifacts; the artifacts already are the state.
+
+---
+
+## R5 — Long-running extraction over SSE
+
+**Decision**: Run each stage as a streamed subprocess via the existing `stream_subprocess()` (SSE `data:`/`event: done`), exactly as `grounding.py`/`session_workflow.py` do. Resumability comes from the CLI's existing per-chapter / per-entity skip-if-exists behavior; an interrupted run is restarted by re-invoking the same stage, which skips completed items. The workflow doc's `tmux` guidance remains the recommended path for *very* long unattended runs; the UI targets attended runs and surfaces progress live.
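+
+Resumability lives in the CLI, so the router's streaming job can be stateless. It relays each output line as an SSE `data:` event and finishes with an `event: done`. The minimal sketch below uses only `subprocess`. `stream_stage` is a hypothetical stand-in for the existing `stream_subprocess()`, and carrying the exit code in the `done` event's data is an assumption.
+
+```python
+import subprocess
+from typing import Iterator, List
+
+
+def stream_stage(cmd: List[str]) -> Iterator[str]:
+    """Yield SSE frames for a stage subprocess (hypothetical helper)."""
+    proc = subprocess.Popen(
+        cmd,
+        stdout=subprocess.PIPE,
+        stderr=subprocess.STDOUT,  # interleave stderr so failures show up live
+        text=True,
+    )
+    if proc.stdout is None:
+        raise RuntimeError("subprocess stdout was not captured")
+    with proc.stdout:
+        for line in proc.stdout:
+            text = line.rstrip("\r\n")
+            # One SSE event per output line; the blank line ends the event.
+            yield f"data: {text}\n\n"
+    returncode = proc.wait()
+    # Named terminal event so the page knows the stage finished, and how.
+    yield f"event: done\ndata: {returncode}\n\n"
+```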
+
+**Rationale**: No new long-job infrastructure is needed — the CLI is already resumable and the SSE plumbing already exists. This keeps the UI a thin face (Principle VI).
+
+**Alternatives considered**:
+- *A background job queue / persistent worker*: rejected for v1 — over-engineered for a single local operator; adds a daemon (a recurring tax the constitution warns against) for a workflow that is already resumable on disk.
+
+---
+
+## R6 — Synthesis-capability warning
+
+**Decision**: The UI warns (does not block) when a backend/model chosen for the **synthesis** stage is below the assumed capability bar (a model at least as capable as Sonnet). The signal is heuristic: a curated "synthesis-capable" allow-list (the Claude `MODELS` and a small set of frontier OpenRouter ids) versus everything else (local 3B/80B open models, which the workflow doc records as unable to synthesize). Extraction has no such warning — weak open models are expected and fine there.
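+
+The warn-not-block rule reduces to an allow-list lookup scoped to the synthesis stage. The sketch below is hypothetical; `capability_warning` is not an existing function. The caller supplies the allow-list, which in practice would combine `server/config.py:MODELS` with the curated OpenRouter ids.
+
+```python
+from typing import Iterable, Optional
+
+
+def capability_warning(stage: str, model: str, capable: Iterable[str]) -> Optional[str]:
+    """Return a warning for an underpowered synthesis model, else None.
+
+    Never raises: the operator may deliberately calibrate a weak model.
+    """
+    if stage != "synthesize":
+        return None  # weak open models are expected and fine for extraction
+    if model in set(capable):
+        return None
+    return (
+        f"{model!r} is not on the synthesis-capable list; the workflow assumes "
+        "a model at least as capable as Sonnet for synthesis."
+    )
+```
+
+Returning a message instead of raising leaves the decision with the operator. The page can display the text without disabling the stage.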
+
+**Rationale**: Encodes the user's explicit statement that the workflow "assumes a model at least as powerful as Sonnet," and the doc's calibration finding that `Qwen3-Next-80B` "cannot handle synthesis." A warning, not a block, respects operator agency (it is their experiment to run).
+
+**Alternatives considered**:
+- *Hard block on sub-Sonnet synthesis*: rejected — contradicts the local-hardware exploration goal; the operator may deliberately want to calibrate a weak model on synthesis.
+
+---
+
+## R7 — Keeping the existing Anthropic workflow untouched
+
+**Decision**: The new ensemble page is a separate route tree (`/ensemble`) and a separate router (`/api/ensemble`); `GroundingDocs.vue` and `grounding.py` are not modified. The synthesis scripts default `--backend anthropic`, so the old `/grounding` invocations produce identical commands and identical results (FR-015, SC-006).
+
+**Rationale**: The user requires the old path preserved "until I decide to retire it." Physical separation at both router and view layers is the simplest guarantee against regression.
+
+**Alternatives considered**:
+- *Add an ensemble mode/tab inside `GroundingDocs.vue`*: rejected per the clarification (a new separate page was chosen), and because co-locating raises the risk of touching the old path.
+
+---
+
+## Summary of decisions
+
+| # | Decision | Primary principle upheld |
+|---|----------|--------------------------|
+| R1 | OpenRouter branch inside `make_client`/`backends.py` | V (one seam) |
+| R2 | `OPENROUTER_API_KEY` env var; free-text model id | I (no secrets on tracked disk) |
+| R3 | Uniform `--backend`/`--endpoint`/`--model` on synthesis scripts; default `anthropic` | VI (CLI first) |
+| R4 | Disk-derived stage status, reuse `grounding.py` pattern | I/VIII (disk is truth, discoverable) |
+| R5 | SSE subprocess streaming + CLI resumability; no new daemon | VI; "no recurring tax" |
+| R6 | Warn (not block) on sub-Sonnet synthesis backend | II/IX (human decides) |
+| R7 | Separate `/ensemble` route + router; old path defaults unchanged | (regression guard) |
diff --git a/specs/001-ensemble-workflow-ui/spec.md b/specs/001-ensemble-workflow-ui/spec.md
new file mode 100644
index 0000000..033851b
--- /dev/null
+++ b/specs/001-ensemble-workflow-ui/spec.md
@@ -0,0 +1,161 @@
+# Feature Specification: Ensemble Grounding-Doc Workflow UI
+
+**Feature Branch**: `001-ensemble-workflow-ui`
+
+**Created**: 2026-06-27
+
+**Status**: Draft
+
+**Input**: User description: "I want you to transform the docs/cli/ensemble_workflow.md into a feature that uses the UI to simplify the workflow management. Between steps of the UI, the user will interact with claude. The current feature is designed to only work against dgx and claude, I would like to have the ability to use openrouter as well. the feature should not replace the current workflow that uses anthropic. That feature assumes a model that is at least as powerful as sonnet and can be kept around until I decide to retire it."
+
+## Overview
+
+The ensemble grounding-doc workflow (`docs/cli/ensemble_workflow.md`) turns a campaign's chapter files into the four grounding documents (`world_state.md`, `campaign_state.md`, `party.md`, `planning.md`). It does this in stages: extract atomic facts cheaply on local hardware, bundle them into per-entity dossiers, let a human review scope, then spend metered tokens only on the final synthesis. Today the whole thing is a sequence of long, flag-heavy command-line invocations that the operator must remember and run in the right order, interleaved with manual review steps.
+
+This feature gives the operator a **UI surface that mechanizes the sequence** — it shows where the campaign is in the pipeline, runs each mechanical step on request, and surfaces the files each step produces — while preserving the judgment steps (scope review, alias correction, diff-before-promote) as handoffs to a Claude conversation or the CLI. It also makes each LLM-bearing stage **backend-selectable**, adding OpenRouter alongside the existing local-hardware (DGX/Spark) and Anthropic (Claude) options.
+
+The existing per-tool grounding-doc workflow on the current Grounding Docs page is **not** changed by this feature. It remains available, unmodified, until the operator chooses to retire it.
+
+## User Scenarios & Testing *(mandatory)*
+
+### User Story 1 - Walk the ensemble pipeline from a single UI surface (Priority: P1)
+
+The operator opens a dedicated ensemble-workflow page for the current campaign. The page shows the pipeline as an ordered set of stages (extract → bundle → synthesize → review/promote), reflects which stages have already produced output (discovered from files on disk), and lets the operator run the mechanical step for the current stage and watch its output stream. After a step finishes, the operator can see the files it produced and move to the next stage.
+
+**Why this priority**: This is the core value — replacing "remember the right command with the right flags, in the right order" with a guided, stateful surface. It delivers value even before OpenRouter exists, using only the backends the workflow supports today.
+
+**Independent Test**: With a campaign that already has chapter files, an operator who has never seen the CLI can run the extraction step, see per-chapter outputs appear, run the bundling step, and reach the synthesis stage — entirely from the page, without typing a command. The page correctly shows, on reload, which stages are already complete.
+
+**Acceptance Scenarios**:
+
+1. **Given** a campaign workspace with chapter files and no prior ensemble run, **When** the operator opens the ensemble page, **Then** the page shows the extraction stage as the next actionable step and later stages as not-yet-started.
+2. **Given** a completed extraction (per-chapter outputs exist on disk), **When** the operator reloads the page, **Then** the extraction stage is shown as complete and the bundling stage is shown as the next actionable step.
+3. **Given** the operator runs a stage, **When** the underlying step emits progress, **Then** the page streams that progress live and, on completion, lists the artifacts the step wrote.
+4. **Given** a stage whose outputs already exist, **When** the operator re-runs it, **Then** already-completed work is skipped (the run is resumable) and the page makes clear nothing was needlessly recomputed.
+
+---
+
+### User Story 2 - Choose the backend per stage, including OpenRouter (Priority: P2)
+
+For each LLM-bearing stage — extraction/aggregation and synthesis — the operator chooses which backend runs it: local hardware (DGX/Spark), Anthropic (Claude), or OpenRouter. The choices are independent: the operator can extract on one backend and synthesize on another. OpenRouter is a new option added without removing the existing two.
+
+**Why this priority**: It removes the workflow's hard dependency on having both a reachable local box and a Claude path. From a remote location with no local hardware, the operator can still run extraction (on OpenRouter); for synthesis, the operator can pick whichever frontier model they prefer. It builds on the stepped UI from US1.
+
+**Independent Test**: With the local box unreachable, an operator can select OpenRouter for extraction, run it successfully, then select Claude for synthesis and complete a grounding-doc refresh — all from the page.
+
+**Acceptance Scenarios**:
+
+1. **Given** the ensemble page, **When** the operator views a stage that uses an LLM, **Then** they can choose among local hardware, Anthropic, and OpenRouter as the backend for that stage.
+2. **Given** OpenRouter is selected for a stage, **When** the operator runs that stage, **Then** the step executes against OpenRouter and the page reports which backend and model produced the output.
+3. **Given** the operator extracts on OpenRouter and synthesizes on Anthropic, **When** the full pipeline completes, **Then** each stage's artifacts record the backend that produced them.
+4. **Given** a backend is unreachable or misconfigured, **When** the operator runs a stage against it, **Then** the page surfaces a clear failure (not a silent hang) and the operator can retry with a different backend without losing prior-stage output.
+
+---
+
+### User Story 3 - Drop to Claude or the CLI for the judgment between steps (Priority: P2)
+
+Between mechanical steps, the pipeline has human-judgment checkpoints: reviewing the entity scope list before aggregation, correcting name aliases, and diffing a draft against the live doc before promoting it. The UI represents these as explicit gates that point the operator to do the work in a Claude conversation or at the CLI. Because every step reads and writes files, the operator can leave the UI, make the change (e.g. edit an alias map, correct a draft, promote a reviewed draft), and return to a UI that reflects the new file state — losing nothing.
+
+**Why this priority**: This is the constitutional spine of the feature (the UI mechanizes; Claude converses). Without it the UI would either skip the precision decisions or try to absorb them — both of which break the workflow's correctness guarantees. It is P2 because US1 is usable for the mechanical steps before the gates are formalized, but the feature is not trustworthy without it.
+
+**Independent Test**: At the scope-review gate, the operator opens the entity list, makes a scope/alias correction outside the UI, and the UI — without re-running any LLM step — reflects the corrected scope before the operator proceeds to aggregation. At the promote gate, a draft is never written to a live grounding doc by the UI itself.
+
+**Acceptance Scenarios**:
+
+1. **Given** extraction is complete, **When** the operator reaches the scope-review gate, **Then** the UI presents the entity/scope list for review and does not proceed to aggregation until the operator confirms.
+2. **Given** the operator edits an alias map or scope input outside the UI, **When** they return, **Then** the UI reflects the updated files without having re-run any LLM step.
+3. **Given** a synthesized draft exists, **When** the operator reaches the promote gate, **Then** the UI offers to compare the draft against the live document but never overwrites a live grounding document automatically.
+4. **Given** any stage, **When** the operator inspects what that stage did, **Then** every input and output is a file on disk that is equally visible from the CLI and a Claude conversation.
+
+---
+
+### User Story 4 - Keep the existing Anthropic workflow available (Priority: P3)
+
+The operator who prefers the current per-tool grounding-doc path (each tool re-extracting from the chapter bible, synthesized by a Claude model at least as capable as Sonnet) continues to use it exactly as before. The new ensemble page is additive.
+
+**Why this priority**: It is a guardrail rather than new capability, but it must hold: the user explicitly wants the old path preserved until they decide to retire it.
+
+**Independent Test**: After this feature ships, an operator runs the existing Grounding Docs page exactly as before and gets the same behavior; nothing about that path changed.
+
+**Acceptance Scenarios**:
+
+1. **Given** the existing Grounding Docs page, **When** the operator uses it after this feature ships, **Then** its behavior is unchanged.
+2. **Given** the new ensemble page, **When** the operator navigates the app, **Then** the two workflows are clearly distinct surfaces and neither is a prerequisite for the other.
+
+---
+
+### Edge Cases
+
+- **Local hardware unreachable** (intermittent network at a remote location): selecting the local backend for a stage must fail fast with a clear message, not hang silently; the operator can switch that stage to OpenRouter or Anthropic and proceed.
+- **Backend produces empty output** (e.g. a reasoning model that emits only its thinking trace and no result): the stage must be reported as failed/empty, not silently recorded as complete.
+- **Underpowered synthesis model**: synthesis requires a model capable of prioritizing and organizing across many dossiers. When a backend/model that cannot do this is chosen for synthesis, the operator should be warned (the workflow assumes a model at least as capable as Sonnet for synthesis).
+- **Re-running a completed stage**: completed per-item work is skipped (resumable); the operator is not forced to recompute an expensive stage to make a small downstream change.
+- **Operator skips a judgment gate**: the pipeline does not auto-advance past a human checkpoint; scope, alias, and promote decisions remain blocking.
+- **Concurrent/duplicate runs of the same stage**: launching a stage that is already running must not corrupt shared working files.
+- **A draft is promoted, then re-synthesized**: promotion is a manual, file-level act; a fresh draft never silently clobbers the live doc.
+- **Mid-run backend interruption**: a long extraction interrupted partway can be resumed from its cached per-item progress rather than restarted from zero.
+
+## Requirements *(mandatory)*
+
+### Functional Requirements
+
+- **FR-001**: The system MUST provide a dedicated UI surface, separate from the existing Grounding Docs page, that presents the ensemble grounding-doc workflow as an ordered sequence of stages (extraction, fact bundling/aggregation, synthesis of each grounding doc, review/promotion).
+- **FR-002**: The system MUST derive and display each stage's completion status from the artifacts present on disk for the current campaign, so the displayed state survives a page reload and reflects work done outside the UI.
+- **FR-003**: The system MUST let the operator run the mechanical step for a stage from the UI and MUST stream that step's progress and output to the page in real time.
+- **FR-004**: The system MUST list the artifacts (files) a stage produced after it completes, and those artifacts MUST be the same files the CLI and a Claude conversation can read.
+- **FR-005**: Re-running a stage MUST reuse already-completed work where the underlying step supports resumption, and MUST NOT silently recompute completed items.
+- **FR-006**: For each LLM-bearing stage (extraction/aggregation and synthesis), the system MUST let the operator choose the backend independently from: local hardware (DGX/Spark), Anthropic (Claude), and OpenRouter.
+- **FR-007**: The system MUST support OpenRouter as a backend for both the extraction/aggregation stage and the synthesis stage.
+- **FR-008**: The system MUST record, with each stage's output, which backend and model produced it.
+- **FR-009**: The system MUST surface backend failures (unreachable endpoint, auth/config error, empty result) as explicit, actionable errors and MUST allow retrying a failed stage with a different backend without discarding prior-stage output.
+- **FR-010**: The system MUST represent the workflow's human-judgment checkpoints — scope/entity review before aggregation, name-alias correction, and diff-before-promote — as explicit gates that block automatic advancement of the pipeline.
+- **FR-011**: The system MUST NOT perform a precision decision (scope, ordering, attribution) on the operator's behalf, and MUST NOT feed one LLM stage's unreviewed output into the next across a checkpoint without operator confirmation.
+- **FR-012**: The system MUST allow the operator to perform any checkpoint's judgment work in a Claude conversation or at the CLI and then continue in the UI, with the UI reflecting the resulting file changes without re-running an LLM step.
+- **FR-013**: The system MUST write synthesis results to draft artifacts only, and MUST NOT automatically overwrite a live grounding document; promotion of a draft to a live document is an explicit, operator-initiated act.
+- **FR-014**: The system MUST warn the operator when a backend/model selected for the synthesis stage is below the capability the workflow assumes (a model at least as capable as Sonnet), since underpowered synthesis silently degrades the result.
+- **FR-015**: The system MUST leave the existing per-tool Anthropic grounding-doc workflow (the current Grounding Docs page) functionally unchanged and independently usable.
+- **FR-016**: Every step in the new workflow MUST be expressible and runnable equivalently from the CLI; the UI MUST NOT be the only way to perform any step.
+- **FR-017**: The system MUST NOT hold pipeline state that exists only in the browser; if a step produced something, it produced a file that is the source of truth for that state.
+- **FR-018**: OpenRouter backend configuration (credentials/endpoint/model selection) MUST be supplied through the system's existing configuration mechanism, not hard-coded, and MUST be selectable per stage at run time.
+
+### Key Entities *(include if data involved)*
+
+- **Pipeline state**: the current campaign's position in the ensemble workflow, derived entirely from which stage artifacts exist on disk; not stored in the browser.
+- **Stage**: one step in the ordered workflow (extraction, bundling/aggregation, per-doc synthesis, review/promotion), with a completion status, the artifacts it produces, and — for LLM-bearing stages — a selected backend.
+- **Backend profile**: a selectable execution target for an LLM-bearing stage — local hardware (DGX/Spark), Anthropic (Claude), or OpenRouter — including the model used and any reachability/config it needs.
+- **Checkpoint / gate**: a human-judgment point between stages (scope review, alias correction, diff-before-promote) that blocks automatic advancement and is satisfied via Claude/CLI.
+- **Artifact**: a file on disk produced or consumed by a stage (per-chapter facts, merged facts, per-entity dossiers, draft grounding docs, live grounding docs); the unit of interchange between UI, CLI, and Claude.
+- **Grounding document (draft / live)**: the four target docs (`world_state`, `campaign_state`, `party`, `planning`); the workflow writes drafts and the operator promotes them to live docs.
+
+## Success Criteria *(mandatory)*
+
+### Measurable Outcomes
+
+- **SC-001**: An operator can run a full grounding-doc refresh through the ensemble page — from chapter files to four reviewed drafts — without typing a single command-line invocation.
+- **SC-002**: An operator who has not memorized the workflow can identify the correct next step and run it without consulting `docs/cli/ensemble_workflow.md`, in their first session with the page.
+- **SC-003**: With local hardware unreachable, an operator can still complete a full grounding-doc refresh by selecting OpenRouter (and/or Anthropic) for the LLM-bearing stages.
+- **SC-004**: The metered-token cost of a full refresh through the UI is no higher than the same refresh, with the same backend choices, run from the CLI today (i.e. extraction spends no frontier-model tokens when a local or OpenRouter open-model backend is chosen; only synthesis spends frontier tokens).
+- **SC-005**: No live grounding document is ever modified by the workflow without an explicit operator promotion action — measured as zero automatic overwrites of live docs across all runs.
+- **SC-006**: The existing per-tool Anthropic workflow produces identical results before and after this feature ships (no regression).
+- **SC-007**: After any stage runs, 100% of its inputs and outputs are files visible from the CLI; no pipeline state is recoverable only from the browser.
+- **SC-008**: For every LLM-bearing stage, the operator can independently select among at least three backends (local, Anthropic, OpenRouter), and the produced artifact records which one was used.
+
+## Assumptions
+
+- **Per-stage backend choice across both LLM stages** (from clarification): OpenRouter is selectable independently for extraction/aggregation and for synthesis; the operator may mix backends across stages (e.g. extract on OpenRouter, synthesize on Anthropic).
+- **New separate UI surface** (from clarification): the ensemble workflow lives on its own page/section; the existing Grounding Docs page is left in place and unchanged.
+- **Single operator, local-first**: the UI serves one GM on their own workstation; multi-user concurrency and access control are out of scope.
+- **Campaign workspace already exists**: the operator runs the page from within a campaign workspace that has chapter files (or the documented inputs); creating the workspace and preparing chapters is out of scope for this feature.
+- **Spelling/known-names/alias preparation remains a documented prerequisite**: this feature mechanizes the pipeline stages and their gates; it does not replace the upstream proper-noun consistency pass, which the operator performs as today.
+- **Synthesis assumes a capable model**: the synthesis stage assumes a model at least as capable as Sonnet; weaker open models may be fine for extraction/aggregation but are expected to underperform on synthesis, and the UI warns rather than blocks.
+- **Long-running stages**: extraction can take tens of minutes; the UI is expected to handle a long-running step (progress, resumability) rather than assume sub-second responses.
+- **Existing configuration mechanism is reused**: backend endpoints, models, and credentials (including OpenRouter) are provided through the project's existing configuration files/UI rather than a new bespoke store.
+- **Files are the contract**: all interchange between the UI, the CLI, and Claude conversations happens through files on disk; the UI never becomes the sole holder of workflow state.
+
+## Out of Scope
+
+- Replacing, modifying, or retiring the existing per-tool Anthropic grounding-doc workflow.
+- Running the synthesis stages on local/open models as the *primary* path (the "all-Spark synthesis" and "per-section fan-out" ideas in the workflow doc remain future exploration, not part of this feature).
+- Automating the human-judgment checkpoints (scope, alias, promotion) — these are deliberately preserved as human decisions.
+- Multi-user, remote-hosted, or access-controlled deployment of the UI.
+- Creating campaign workspaces, preparing chapter files, or running the upstream spelling/known-names preparation passes.
diff --git a/specs/001-ensemble-workflow-ui/tasks.md b/specs/001-ensemble-workflow-ui/tasks.md
new file mode 100644
index 0000000..a5f32d1
--- /dev/null
+++ b/specs/001-ensemble-workflow-ui/tasks.md
@@ -0,0 +1,286 @@
+---
+
+description: "Task list for Ensemble Grounding-Doc Workflow UI"
+---
+
+# Tasks: Ensemble Grounding-Doc Workflow UI
+
+**Input**: Design documents from `specs/001-ensemble-workflow-ui/`
+
+**Prerequisites**: plan.md, spec.md, research.md, data-model.md, contracts/api.md, contracts/cli.md, quickstart.md
+
+**Tests**: Targeted tests are included where the contracts specify behavior (the OpenRouter seam contract test, gate/promote guards, the Anthropic-path regression). This is not full TDD — it matches the constitution's "tested by name" expectation and the CI isolation guard.
+
+**Organization**: Tasks are grouped by user story. This feature is an **extension of the existing Vue app + FastAPI server** (same `./startup`, same nav) — not a new application. The existing `/grounding` (Anthropic per-tool) path is left untouched.
+
+## Format: `[ID] [P?] [Story] Description`
+
+- **[P]**: Can run in parallel (different files, no dependencies on incomplete tasks)
+- **[Story]**: US1 / US2 / US3 / US4
+
+## Path Conventions
+
+Web app over a CLI engine. Backend: `server/`, root-level CLI scripts, `campaignlib/`. Frontend: `frontend/src/`. Tests: `tests/`.
+
+---
+
+## Phase 1: Setup (Shared Infrastructure)
+
+**Purpose**: Minimal scaffolding for an additive feature in a mature codebase.
+
+- [X] T001 Document the `OPENROUTER_API_KEY` env var and confirm the `openai` SDK is importable, updating the Dependencies section of `CLAUDE.md` (parity with the existing `ANTHROPIC_API_KEY` note)
+- [X] T002 [P] Create the frontend stage-component directory `frontend/src/views/ensemble/` and an empty backend router stub `server/routers/ensemble.py` (module + `router = APIRouter()`, mirroring the header of `server/routers/grounding.py`)
+
+---
+
+## Phase 2: Foundational (Blocking Prerequisites)
+
+**Purpose**: The shared kernel every user story builds on — config schema, router mount, frontend route + shell + nav.
+
+**⚠️ CRITICAL**: No user story work can begin until this phase is complete.
+
+- [X] T003 Add an `EnsembleSection` Pydantic model (campaign_dir, chapters_glob, per-stage `extract`/`synthesize` backend profiles, known_names, aliases_path — per data-model.md §"Config schema addition") to `server/config_models.py` and register `"ensemble"` in `UI_SECTION_NAMES`
+- [X] T004 Mount the ensemble router at `/api/ensemble` via `app.include_router(...)` in `server/main.py` (alongside the existing routers; do not modify any existing registration)
+- [X] T005 Add the `/ensemble` route tree to `frontend/src/router.ts` and create `frontend/src/views/EnsembleWorkflow.vue` as a `WizardShell` host with the stage steps (Setup → Extract → Bundle → Synthesize → Review), mirroring `frontend/src/views/SessionWorkflow.vue`
+- [X] T006 [P] Add an "Ensemble Workflow" entry to the app's primary navigation in `frontend/src/App.vue`, placed beside the existing "Grounding Docs" link (which stays unchanged)
+- [X] T007 [P] Add shared router helpers to `server/routers/ensemble.py`: an `_sse_response()` and `_cmd_opt/_cmd_multi/_cmd_flag` set (copy the pattern from `server/routers/grounding.py`) and a `_resolve_ensemble_path()` that confines paths to the campaign workspace
+
+**Checkpoint**: The new page is reachable and empty; the router is mounted; config persists. User stories can begin.
+
+---
+
+## Phase 3: User Story 1 - Walk the ensemble pipeline from a single UI surface (Priority: P1) 🎯 MVP
+
+**Goal**: Step the operator through extract → bundle → synthesize → review from one page, with stage status derived from disk and each step's output streamed. Works with the **existing** backends (DGX/Anthropic); OpenRouter arrives in US2.
+
+**Independent Test**: With a campaign that has chapter files, run extraction from the page, see per-chapter artifacts appear, run bundling, reach synthesis — without typing a command; reload and confirm completed stages are still shown complete.
+
+### Tests for User Story 1
+
+- [X] T008 [P] [US1] Integration test: `GET /api/ensemble/status` reports `extract` as current with no run, then `extract: complete` once `docs/ensemble/per_chapter/*/merged.json` exist — in `tests/test_ensemble_status.py` (quickstart Validation 3)
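+
+A sketch of the disk-derived status that T008 pins down. It assumes the extract predicate is "at least one non-empty `docs/ensemble/per_chapter/*/merged.json`" (skipping zero-byte files follows T019a's rule that an empty `merged.json` never counts as complete) and that the current stage is the first one not yet complete. The predicates for `bundle`, `synthesize` and `review` are not given here, so the caller supplies them; `count_extract_artifacts` and `status_from_disk` are hypothetical helpers.
+
+```python
+from pathlib import Path
+from typing import Callable, Dict
+
+STAGE_ORDER = ("extract", "bundle", "synthesize", "review")
+
+
+def count_extract_artifacts(workspace: Path) -> int:
+    """Extract predicate: non-empty docs/ensemble/per_chapter/*/merged.json files."""
+    root = workspace / "docs" / "ensemble" / "per_chapter"
+    if not root.is_dir():  # no per_chapter directory yet means nothing extracted
+        return 0
+    return sum(1 for p in root.glob("*/merged.json") if p.stat().st_size > 0)
+
+
+def status_from_disk(workspace: Path,
+                     predicates: Dict[str, Callable[[Path], int]]) -> dict:
+    stages = []
+    current = None
+    for stage_id in STAGE_ORDER:
+        count = predicates[stage_id](workspace)  # re-read disk on every call, no cache
+        status = "complete" if count > 0 else "not_started"
+        if current is None and status != "complete":
+            current = stage_id  # the first unfinished stage is "current"
+        stages.append({"id": stage_id, "status": status, "artifacts": count})
+    return {"current_stage": current, "stages": stages}
+```
+
+Because nothing is cached, a page reload, or a file written by a CLI run, changes the answer on the very next request.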
+
+### Implementation for User Story 1
+
+- [X] T009 [US1] Implement disk-derived `GET /api/ensemble/status` (completion predicates per contracts/api.md §Status; no caching) in `server/routers/ensemble.py`
+- [X] T010 [US1] Implement `GET /api/ensemble/files` and `GET /api/ensemble/file` (list/read artifacts, mirror `grounding.py:/extracts`) in `server/routers/ensemble.py`
+- [X] T010a [US1] **(M4)** Implement a per-campaign, per-stage in-flight lock helper in `server/routers/ensemble.py` (lock file or in-process registry keyed by campaign+stage). ALL `/run/*` endpoints (T011–T013a) MUST acquire it on launch and return HTTP 409 "stage already running" if it is held, which prevents concurrent writers from corrupting the `per_chapter/` cache (the `ensemble_workflow.md` orphaned-worker trap). The lock is released when the stream completes.
+- [X] T011 [US1] Implement stage runner `GET /api/ensemble/run/extract` (builds `ensemble_batch.py`, SSE via `stream_subprocess`, resumable; acquires the T010a lock) in `server/routers/ensemble.py`
+- [X] T012 [US1] Implement stage runner `GET /api/ensemble/run/bundle` (builds `facts_to_state.py`, including `list=true` → `--list` no-model mode) in `server/routers/ensemble.py`
+- [X] T013 [US1] Implement stage runners `GET /api/ensemble/run/recent-events` (`build_recent_events.py`) and `GET /api/ensemble/run/synthesize` (dispatch on `doc` to the four synthesis scripts; reject an `output` that targets a live doc) in `server/routers/ensemble.py`
+- [X] T013a [US1] **(M1)** Implement stage runner `GET /api/ensemble/run/threads` (builds `facts_to_state.py --types thread --render-only`, deterministic/no-model, writes `docs/ensemble/threads.md`) in `server/routers/ensemble.py`, symmetric with `/run/recent-events`. This is the chronological-spine input fed to `/run/synthesize --threads` (contracts/api.md, data-model.md §Stage). Surface it in `EnsembleBundle.vue` (T016).
+- [X] T014 [P] [US1] Build `frontend/src/views/ensemble/EnsembleSetup.vue` — campaign dir + chapter glob inputs **plus known-names (multi-path) and aliases-path inputs (M2)**, all persisted via `config.updateSection('ensemble', …)`. The bundle endpoint (T012) and the US3 alias gate (T036) read `known_names`/`aliases_path` from this config.
+- [X] T015 [P] [US1] Build `frontend/src/views/ensemble/EnsembleExtract.vue` — run `/run/extract` via `connectSSE`/`RunPanel`, stream progress, list produced artifacts, reflect status
+- [X] T016 [P] [US1] Build `frontend/src/views/ensemble/EnsembleBundle.vue` — run `/run/bundle` (and the `--list` scope view), stream output, list dossiers
+- [X] T017 [P] [US1] Build `frontend/src/views/ensemble/EnsembleSynthesize.vue` — run `/run/synthesize` per doc, write `*_draft.md`, list drafts
+- [X] T018 [US1] Wire the `WizardShell` steps in `EnsembleWorkflow.vue` to `GET /api/ensemble/status` so stage completion (disk-derived) drives step state and survives reload
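+
+One way to realize the T010a lock is an in-process registry keyed by `(campaign, stage)`, sketched below under the assumption that the server runs as a single process. `InFlightRegistry` and `StageAlreadyRunning` are hypothetical names, and the router would map the exception to HTTP 409. A lock file would be needed instead if several worker processes serve `/api/ensemble`.
+
+```python
+import threading
+from contextlib import contextmanager
+from typing import Iterator, Set, Tuple
+
+
+class StageAlreadyRunning(RuntimeError):
+    """The router would translate this into HTTP 409 "stage already running"."""
+
+
+class InFlightRegistry:
+    """In-process registry of running stages, keyed by (campaign, stage)."""
+
+    def __init__(self) -> None:
+        self._guard = threading.Lock()
+        self._held: Set[Tuple[str, str]] = set()
+
+    @contextmanager
+    def hold(self, campaign: str, stage: str) -> Iterator[None]:
+        key = (campaign, stage)
+        with self._guard:  # check-and-set must be atomic across request threads
+            if key in self._held:
+                raise StageAlreadyRunning(f"{stage} already running for {campaign}")
+            self._held.add(key)
+        try:
+            yield
+        finally:
+            # Release even if the stream raises or the client disconnects.
+            with self._guard:
+                self._held.discard(key)
+```
+
+For an SSE endpoint the `with registry.hold(...)` block has to wrap the streaming generator rather than just the endpoint function; otherwise the lock is released before the subprocess finishes.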
+
+**Checkpoint**: A full extract→bundle→synthesize→draft walk is doable from the page using DGX/Anthropic. MVP complete.
+
+---
+
+## Phase 4: User Story 2 - Choose the backend per stage, including OpenRouter (Priority: P2)
+
+**Goal**: Make extraction/aggregation and synthesis backend-selectable independently among DGX, Anthropic, and **OpenRouter** — OpenRouter reached only through the single `campaignlib` seam (Principle V).
+
+**Independent Test**: With the local box unreachable, select OpenRouter for extraction, run it, then select Anthropic for synthesis and complete a refresh; each artifact records the backend used.
+
+### Tests for User Story 2
+
+- [X] T019 [P] [US2] Contract test `tests/test_openrouter_seam.py`: `make_client(backend="openrouter")` returns the OpenRouter client; missing `OPENROUTER_API_KEY` raises; no module outside `campaignlib/api` constructs it (contracts/cli.md §Seam)
+- [X] T019a [P] [US2] **(M5)** Integration test `tests/test_backend_retry_resume.py`: fail a stage partway on backend A, retry on backend B, and assert (1) prior-stage artifacts remain intact, (2) the failed stage resumes (skip-if-exists) rather than restarts, (3) no empty/partial `merged.json` counts as complete (locks SC-003; also exercises the M3 guard from T027a)
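+
+The selection contracts these tests exercise can be sketched as plain functions. `client_kwargs_from_args` reproduces what `test_client_from_args_*` assert (the default `anthropic` backend passes `None` for everything so `CG_BACKEND` still applies), and `choose_backend` orders the branches as T021 states. Both names are hypothetical; how the real `make_client()` detects `claude-code` and DGX is assumed here, and the missing-key check lives in the client constructor, so it is left out.
+
+```python
+import argparse
+import os
+from typing import Optional
+
+
+def client_kwargs_from_args(ns: argparse.Namespace) -> dict:
+    """Translate --backend/--endpoint/--model into make_client() keyword arguments."""
+    if ns.backend == "anthropic":
+        # Backward compatible: pass nothing so env-based selection still applies.
+        return {"backend": None, "endpoint": None, "model_override": None}
+    # Assumption: every non-default backend forwards the model id verbatim.
+    return {"backend": ns.backend, "endpoint": ns.endpoint, "model_override": ns.model}
+
+
+def choose_backend(backend: Optional[str] = None, endpoint: Optional[str] = None) -> str:
+    """Precedence per T021: claude-code -> openrouter -> dgx endpoint -> Anthropic."""
+    chosen = backend or os.environ.get("CG_BACKEND")  # explicit argument wins
+    if chosen == "claude-code":
+        return "claude-code"
+    if chosen == "openrouter":
+        return "openrouter"
+    if chosen == "dgx" or endpoint:  # assumption: any endpoint implies DGX
+        return "dgx"
+    return "anthropic"
+```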
+
+### Implementation for User Story 2
+
+- [X] T020 [US2] Implement `_OpenRouterClient` in `campaignlib/api/backends.py` (OpenAI SDK at `https://openrouter.ai/api/v1`, real `OPENROUTER_API_KEY`, model id passed verbatim — no dgxlib lookup, `OPENROUTER_BASE_URL` override, Anthropic-shaped `.messages` façade). **(M3 prevention)** Honors a no-thinking request extra (per-call and via `DGX_NO_THINKING`/equivalent env) so extraction can suppress reasoning traces — the dgxlib `thinking_default: false` safety net does not apply on this path.
+- [X] T021 [US2] Add the `backend == "openrouter"` branch to `make_client()` in `campaignlib/api/client.py` (precedence: claude-code → openrouter → dgx endpoint → Anthropic default) — depends on T020
+- [X] T022 [P] [US2] Add `--backend {anthropic,dgx,openrouter}` + `--endpoint` flags to `synthesise_world_state.py` and thread them into its `make_client(...)` call (default `anthropic` ⇒ unchanged)
+- [X] T023 [P] [US2] Add the same `--backend`/`--endpoint` flags to `campaign_state.py`, threaded into `make_client(...)`
+- [X] T024 [P] [US2] Add the same `--backend`/`--endpoint` flags to `party.py`, threaded into `make_client(...)`
+- [X] T025 [P] [US2] Add the same `--backend`/`--endpoint` flags to `planning.py`, threaded into `make_client(...)`
+- [X] T026 [US2] Verify the extraction/aggregation scripts reach OpenRouter via `CG_BACKEND=openrouter` + an OpenRouter `--model` (no script edit expected for `ensemble_batch.py`/`facts_to_state.py`); add a `--backend` pass-through only if needed for symmetry
+- [ ] T027 [US2] Stamp backend+model provenance into LLM-produced outputs (synthesis drafts and `facts_to_state.py` dossiers) where each script already records metadata (FR-008) — sequential, touches the synthesis scripts + `facts_to_state.py`
+- [X] T027a [US2] **(M3 detection)** Add an empty-output guard in the seam (`campaignlib/api`: treat empty/whitespace `content` from any backend as an error, not a result) and ensure the extraction/aggregation/synthesis scripts fail loudly (non-zero exit) and write NO empty/partial artifact when output is empty — so a silently-empty run never flips disk-derived status (FR-002) to "complete" (spec edge case; FR-009). Covered by T019a.
+- [X] T028 [US2] Add `backend`/`endpoint`/`model` query params to all `/api/ensemble/run/*` endpoints and inject `ANTHROPIC_API_KEY`/`OPENROUTER_API_KEY` via `stream_subprocess` `env_extra` (never as query params) in `server/routers/ensemble.py`
+- [X] T029 [US2] Add a synthesis-capability allow-list to `server/config.py` and surface a non-fatal warning in `/api/ensemble/run/synthesize` when a sub-Sonnet model is chosen for synthesis (FR-014, R6) in `server/routers/ensemble.py`
+- [X] T030 [P] [US2] Add per-stage backend selectors (extract + synthesize, independent) to `frontend/src/views/ensemble/EnsembleSetup.vue`, persist to `ui.ensemble`, and display the recorded backend on produced artifacts
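+
+A sketch of the M3 prevention and detection pair (T020, T027a), shaped after `tests/test_openrouter_seam.py`: empty or whitespace output raises, `thinking=False` maps to `{"reasoning": {"enabled": False}}`, and `DGX_NO_THINKING` supplies the default when no per-call flag is given. The function names, the accepted truthy spellings of `DGX_NO_THINKING`, and the temp-file-then-rename write are assumptions for illustration.
+
+```python
+import os
+from pathlib import Path
+from typing import Optional
+
+_TRUTHY = {"1", "true", "yes"}  # assumed spellings of an enabled DGX_NO_THINKING
+
+
+def require_nonempty(text: Optional[str]) -> str:
+    """Detection: empty or whitespace-only output is an error, never a result."""
+    if text is None or not text.strip():
+        raise RuntimeError("backend returned empty output")
+    return text
+
+
+def reasoning_extra_body(thinking: Optional[bool]) -> dict:
+    """Prevention: map a no-thinking request onto OpenRouter's reasoning control."""
+    if thinking is None:  # no per-call choice, so fall back to the environment
+        thinking = os.environ.get("DGX_NO_THINKING", "").strip().lower() not in _TRUTHY
+    return {} if thinking else {"reasoning": {"enabled": False}}
+
+
+def write_artifact(path: Path, text: Optional[str]) -> None:
+    """Write only validated output, via a temp file and a rename."""
+    body = require_nonempty(text)  # raises before anything touches disk
+    path.parent.mkdir(parents=True, exist_ok=True)
+    tmp = path.with_name(path.name + ".tmp")
+    tmp.write_text(body, encoding="utf-8")
+    os.replace(tmp, path)  # same directory, so the rename is atomic on POSIX
+```
+
+Validating before the first byte is written is what keeps a silently-empty run from leaving a `merged.json` that the status endpoint would count as complete.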
+
+**Checkpoint**: Each LLM stage runs on any of the three backends, mixable; OpenRouter lives only in the seam.
+
+---
+
+## Phase 5: User Story 3 - Drop to Claude or the CLI for the judgment between steps (Priority: P2)
+
+**Goal**: Represent the human-judgment checkpoints (scope review, alias correction, diff-before-promote) as blocking gates satisfied in Claude/CLI; files are the interchange; the UI never auto-advances past a precision boundary and never auto-overwrites a live doc.
+
+**Independent Test**: At the scope gate, an alias edit made outside the UI is reflected on return without re-running any LLM step; at the promote gate, a draft reaches a live doc only via the explicit promote action.
+
+### Tests for User Story 3
+
+- [X] T031 [P] [US3] Integration test: `/api/ensemble/run/synthesize` rejects an `output` pointing at a live grounding doc; `PUT /api/ensemble/file` to a live doc is rejected; `POST /api/ensemble/promote` is the only writer of live docs — in `tests/test_ensemble_gates.py` (quickstart Validation 6/7)
+
+### Implementation for User Story 3
+
+- [X] T032 [US3] Implement `PUT /api/ensemble/file` (path-validated, confined to workspace, **rejects live grounding docs**) in `server/routers/ensemble.py`
+- [X] T033 [US3] Implement `GET /api/ensemble/diff` (unified diff draft vs live, read-only) in `server/routers/ensemble.py`
+- [X] T034 [US3] Implement `POST /api/ensemble/promote` (copy reviewed draft → live; restricted to the four known grounding docs) in `server/routers/ensemble.py`
+- [X] T035 [P] [US3] Add the scope-review gate to `frontend/src/views/ensemble/EnsembleBundle.vue` — show the `--list` output, block advancement to aggregation until the operator confirms
+- [X] T036 [US3] Add the alias-correction gate to `frontend/src/views/ensemble/EnsembleBundle.vue` — edit `aliases.json` via the file endpoints (or hand off to CLI/chat) and reflect external edits without re-running an LLM step — same file as T035, sequential
+- [X] T037 [P] [US3] Add the diff-before-promote gate to `frontend/src/views/ensemble/EnsembleSynthesize.vue` — render the `/diff`, expose an explicit **Promote** button calling `/promote`, never auto-write
+- [X] T038 [US3] Reflect gate confirmation state in `EnsembleWorkflow.vue` so the wizard cannot skip an unsatisfied gate
+
+**Checkpoint**: Aggregation never consumes extraction output until scope/alias are confirmed; promotion is always explicit.
+
+---
+
+## Phase 6: User Story 4 - Keep the existing Anthropic workflow available (Priority: P3)
+
+**Goal**: Guarantee the existing per-tool Anthropic grounding-doc path (the `/grounding` page) is unchanged and independently usable.
+
+**Independent Test**: After this feature ships, the `/grounding` page behaves identically and the synthesis scripts with no new flags produce the same commands/output.
+
+### Tests for User Story 4
+
+- [X] T039 [P] [US4] Regression test: each synthesis script invoked with **no** `--backend`/`--endpoint` constructs the same `make_client()` (Anthropic) path and output as before — in `tests/test_synthesis_backend_default.py` (SC-006)
+
+### Implementation for User Story 4
+
+- [X] T040 [US4] Confirm `tests/test_retrieve_render_isolation.py` passes with the new router (the router must contain no retrieval/render calls) and run the full `pytest tests/` suite
+- [X] T041 [US4] Verify by inspection that `server/routers/grounding.py` and `frontend/src/views/GroundingDocs.vue` (and its nested views) are untouched by this feature; record the diff scope
+
+**Checkpoint**: New workflow and old workflow coexist; no regression.
+
+---
+
+## Phase 7: Polish & Cross-Cutting Concerns
+
+- [ ] T042 [P] Run all 8 validations in `quickstart.md` end-to-end and record results
+- [X] T043 [P] Update `docs/web/web_ui.md` to document the new Ensemble Workflow page and add a "run this from the UI" pointer near the top of `docs/cli/ensemble_workflow.md`
+- [X] T044 Consistency/cleanup pass on `server/routers/ensemble.py` (helper reuse, error messages match the fast-fail contract in FR-009)
+
+---
+
+## Dependencies & Execution Order
+
+### Phase Dependencies
+
+- **Setup (Phase 1)**: no dependencies.
+- **Foundational (Phase 2)**: depends on Setup; **blocks all user stories**.
+- **US1 (Phase 3)**: depends on Foundational. The MVP.
+- **US2 (Phase 4)**: depends on Foundational. Builds on US1's run endpoints (adds backend params) and Setup page (adds selectors), but the seam/CLI work (T019–T027) is independent of US1 and can proceed in parallel with Phase 3.
+- **US3 (Phase 5)**: depends on Foundational and on US1's bundle/synthesize endpoints + step components (it adds gates to them).
+- **US4 (Phase 6)**: depends on US2 (the `--backend` defaults it asserts) and on the new router existing; otherwise independent.
+- **Polish (Phase 7)**: after the desired stories are complete.
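+
+The phase graph above can be checked mechanically. This sketch feeds it to Python's `graphlib.TopologicalSorter` (3.9+) and groups phases into waves that may proceed concurrently. Treating Polish as depending on all four stories is an assumption, since the text only says "the desired stories".
+
+```python
+from graphlib import TopologicalSorter
+
+# phase -> phases it depends on, transcribed from "Phase Dependencies".
+PHASE_DEPS = {
+    "setup": set(),
+    "foundational": {"setup"},
+    "us1": {"foundational"},
+    "us2": {"foundational"},
+    "us3": {"foundational", "us1"},
+    "us4": {"foundational", "us2"},  # "the new router existing" comes from Phase 2
+    "polish": {"us1", "us2", "us3", "us4"},  # assumed: all stories are "desired"
+}
+
+
+def phase_waves(deps):
+    """Group phases into waves; everything in one wave may run in parallel."""
+    sorter = TopologicalSorter(deps)
+    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
+    waves = []
+    while sorter.is_active():
+        ready = sorted(sorter.get_ready())
+        waves.append(ready)
+        sorter.done(*ready)
+    return waves
+```
+
+The third wave, `us1` alongside `us2`, is the parallel track that the Implementation Strategy recommends for the OpenRouter seam.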
+
+### Story-level notes
+
+- **US2's seam + CLI tasks (T019–T027)** touch `campaignlib/` and root scripts — fully independent of the US1 UI and can be built first or in parallel.
+- **US3** extends `EnsembleBundle.vue` / `EnsembleSynthesize.vue` created in US1, so it follows US1 for those files.
+
+### Within `server/routers/ensemble.py`
+
+Tasks T007, T009–T013, **T010a, T013a**, T028, T029, T032–T034 all edit this one file, so they are **sequential** with respect to each other, even across stories (T007's `[P]` marker applies only within Phase 2, where it is the sole router edit). Plan to serialize router edits. T010a (the in-flight lock) must land before or together with the `/run/*` endpoints, since they acquire it.
+
+---
+
+## Parallel Opportunities
+
+- **Setup**: T002 [P].
+- **Foundational**: T006, T007 [P] (different files: `App.vue`, `ensemble.py`).
+- **US1 frontend**: T014, T015, T016, T017 [P] (four distinct `.vue` files). T008 [P] (test).
+- **US2 CLI**: T022, T023, T024, T025 [P] (four distinct scripts); T019, T019a [P] (tests); T030 [P] (frontend).
+- **US3**: T031 [P] (test); T035 and T037 [P] (different `.vue` files); T036 follows T035 (same file).
+- **US4**: T039 [P] (test).
+- **Polish**: T042, T043 [P].
+
+### Parallel example — US1 frontend
+
+```bash
+# After the run/status endpoints exist, build the four step components together:
+Task: "Build EnsembleSetup.vue"      # T014
+Task: "Build EnsembleExtract.vue"    # T015
+Task: "Build EnsembleBundle.vue"     # T016
+Task: "Build EnsembleSynthesize.vue" # T017
+```
+
+### Parallel example — US2 CLI flags
+
+```bash
+# Independent scripts, same flag addition:
+Task: "Add --backend/--endpoint to synthesise_world_state.py"  # T022
+Task: "Add --backend/--endpoint to campaign_state.py"          # T023
+Task: "Add --backend/--endpoint to party.py"                   # T024
+Task: "Add --backend/--endpoint to planning.py"                # T025
+```
+
+---
+
+## Implementation Strategy
+
+### MVP First (User Story 1 only)
+
+1. Phase 1 Setup → Phase 2 Foundational.
+2. Phase 3 US1.
+3. **STOP and VALIDATE**: walk extract→bundle→synthesize→draft from the page on DGX/Anthropic (quickstart Validations 3–4). Demo.
+
+### Incremental Delivery
+
+1. Setup + Foundational → page reachable.
+2. + US1 → walk the pipeline (MVP).
+3. + US2 → OpenRouter and per-stage backends (the headline ask; ship the seam test first).
+4. + US3 → blocking gates and explicit promotion.
+5. + US4 → regression guard locks the old path.
+
+### Recommended early track
+
+Because US2's seam (T019–T021) is the riskiest and most novel surface (a new LLM vendor through the one seam) and is UI-independent, build and test it **in parallel with US1** even though it is P2. Doing so de-risks the headline requirement without blocking the MVP.
+
+---
+
+## Notes
+
+- `[P]` = different files, no incomplete dependency. All `server/routers/ensemble.py` edits are mutually sequential.
+- Every backend choice is a CLI flag first (Principle VI); the UI only sets it.
+- Drafts only; `POST /promote` is the sole live-doc writer (Principle I).
+- Gates block auto-advance (Principle II); files are the interchange (Principle IX).
+- Commit after each task or logical group; stop at any checkpoint to validate a story independently.
+
+---
+
+## Remediation Log (from `/speckit-analyze`)
+
+MEDIUM findings resolved into tasks (suffixed IDs avoid renumbering):
+
+| Finding | Decision | Task(s) |
+|---|---|---|
+| M1 — `threads.md` had no producer | B: dedicated endpoint | T013a |
+| M2 — Setup UI lacked known-names/aliases inputs | A: add to Setup | T014 (expanded) |
+| M3 — empty-output trap on OpenRouter path | A: prevention + detection | T020 (prevention), T027a (detection) |
+| M4 — no concurrent-run guard | A: server-side lock | T010a |
+| M5 — backend-retry-without-loss untested | A: integration test | T019a |
+
+LOW findings (A1, L1–L4, C1) were accepted as-is; see the analysis report. C1 (the pre-existing `ensemble_merge.py` embedding client outside the seam) is explicitly **not** extended by this feature — OpenRouter chat goes only through `campaignlib/api`.
+
+---
+
+## Post-implement enhancement — chapter picker (operator request)
+
+The single chapters-glob text field was too blunt: the operator needs to **select
+all / select one / pick a subset / sort** the chapters before extraction, not just
+type a glob. Resolved additively, CLI-first:
+
+| Layer | Change |
+|---|---|
+| Engine | `ensemble_batch.py --chapters` now `nargs="+"` — unions one or more globs/paths, de-dupes, sorts. Single-glob callers unchanged (Principle VI: the engine gains the capability). |
+| API | `GET /api/ensemble/chapters?glob=…` resolves globs → sorted file list with a disk-derived `extracted` flag (Principle I); the `chapters` parameter of `GET /run/extract` is now a list (select all = every resolved path, subset = the picked paths). |
+| Config | `EnsembleSection.chapters_selected: list[str]` — the explicit chosen set; empty == nothing selected. No secrets. |
+| UI | New `ChapterPicker.vue` (glob + Resolve, Select all / Select none / "only", natural sort ▲▼, per-chapter `extracted`/`pending` badge) wired into both Setup and Extract. |
+| Tests | `test_ensemble_chapters.py` (resolution, multi-glob union/dedupe, empty, workspace-confinement, **empty-selection refusal**), `test_ensemble_batch_chapters.py` (nargs contract). +7 passing, zero regressions. |
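+
+A sketch of the union, de-dupe and natural-sort behavior the table describes, assuming each group is the already-expanded match list of one glob. `natural_key` and `union_chapters` are hypothetical; the table does not say whether the engine sorts naturally or lexically, so natural order here follows the UI picker.
+
+```python
+import re
+from typing import Iterable, List
+
+
+def natural_key(path: str) -> list:
+    """Sort key treating digit runs as numbers, so chapter_2 precedes chapter_10."""
+    # re.split with a capturing group alternates text, digits, text, ...
+    # so odd positions are always digit runs and compare as ints.
+    parts = re.split(r"(\d+)", path)
+    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]
+
+
+def union_chapters(*groups: Iterable[str]) -> List[str]:
+    """Union several glob results, drop duplicates, return them naturally sorted."""
+    unique = dict.fromkeys(path for group in groups for path in group)  # first-seen order
+    return sorted(unique, key=natural_key)
+```
+
+Natural order only differs from lexical order for unpadded numbers: lexically, `chapter_10.md` sorts before `chapter_2.md`.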
+
+### Constitution amendment — Principle X (operator-elevated)
+
+The operator ruled, as a matter of UX design, that **"there is no 'select all' that isn't explicit."** This was elevated to the constitution as **Principle X — Selection is Explicit; There is No Silent "All"** (v1.1.0 → **1.2.0**, MINOR). The chapter picker is now its concrete clause:
+
+- `chapters_selected == []` means *nothing selected* — it no longer falls back to the glob.
+- `GET /api/ensemble/run/extract` **refuses** an empty selection (SSE error, returncode 1) instead of expanding to "all"; the Run button is disabled until ≥1 chapter is picked.
+- "Select all" **materializes** every resolved path into `chapters_selected` — it is a deliberate act, not a default.
+- The CLI engine (`ensemble_batch.py`) is exempt: a glob typed at the CLI is itself explicit. The UI must never manufacture that act for the human.
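+
+The three UI-side rules reduce to a small preflight, sketched here with hypothetical names. The SSE payload shape (a `{"line": ...}` frame followed by `{"returncode": 1}`) is an assumption; `test_extract_refuses_empty_selection` only requires the text "No chapters selected" and `"returncode": 1` to appear in the stream.
+
+```python
+import json
+from typing import Iterable, List, Optional, Tuple
+
+
+def normalize_selection(chapters: Optional[Iterable[str]]) -> List[str]:
+    """Missing, [] and [""] all mean "nothing selected"; there is no glob fallback."""
+    return [c for c in (chapters or []) if c and c.strip()]
+
+
+def sse_frame(payload: dict) -> str:
+    return "data: " + json.dumps(payload) + "\n\n"
+
+
+def extract_preflight(
+    chapters: Optional[Iterable[str]],
+) -> Tuple[Optional[List[str]], List[str]]:
+    """Return (selection, []) to proceed, or (None, refusal frames) to refuse."""
+    selected = normalize_selection(chapters)
+    if not selected:
+        return None, [sse_frame({"line": "No chapters selected; pick at least one."}),
+                      sse_frame({"returncode": 1})]
+    return selected, []
+
+
+def select_all(resolved: Iterable[str]) -> List[str]:
+    """'Select all' materializes every resolved path; it is never a sentinel."""
+    return list(resolved)
+```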
diff --git a/synthesise_world_state.py b/synthesise_world_state.py
index 20d6783..f680c5b 100644
--- a/synthesise_world_state.py
+++ b/synthesise_world_state.py
@@ -68,6 +68,8 @@
 
 from campaignlib import (
     DEFAULT_MODEL,
+    add_backend_args,
+    client_from_args,
     load_agent_prompt,
     make_client,
     stream_api,
@@ -334,8 +336,10 @@ def main() -> None:
                              "input for grounding (default: on). --no-quotes for "
                              "a clean baseline comparison against the extracts.")
     parser.add_argument("--model", default=DEFAULT_MODEL,
-                        help=f"Claude model id (default: {DEFAULT_MODEL}). "
-                             f"Use claude-opus-4-7 for highest-quality synthesis.")
+                        help=f"Model id (default: {DEFAULT_MODEL}). "
+                             f"Use claude-opus-4-7 for highest-quality synthesis; "
+                             f"an OpenRouter id (e.g. anthropic/claude-sonnet-4) for --backend openrouter.")
+    add_backend_args(parser)
     parser.add_argument("--max-tokens", type=int, default=16000,
                         help="max_tokens for the synthesis call (default: 16000).")
     parser.add_argument("--dump-input", default=None, metavar="FILE",
@@ -470,7 +474,7 @@ def main() -> None:
     print(f"[Input: {len(user_prompt):,} chars]")
     print("=" * 60)
 
-    client = make_client()
+    client = client_from_args(args)
     world_state = stream_api(
         client,
         system_prompt,
diff --git a/tests/test_ensemble_batch_chapters.py b/tests/test_ensemble_batch_chapters.py
new file mode 100644
index 0000000..73b127b
--- /dev/null
+++ b/tests/test_ensemble_batch_chapters.py
@@ -0,0 +1,16 @@
+"""ensemble_batch.py --chapters accepts one or more globs/paths (the engine
+contract behind the UI chapter picker's select-all / select-one / subset)."""
+
+import ensemble_batch
+
+
+def test_chapters_accepts_multiple_globs():
+    p = ensemble_batch._build_parser()
+    args = p.parse_args(["--chapters", "docs/a_*.md", "docs/b_03.md", "--out", "x.json"])
+    assert args.chapters == ["docs/a_*.md", "docs/b_03.md"]
+
+
+def test_chapters_single_value_still_works():
+    p = ensemble_batch._build_parser()
+    args = p.parse_args(["--chapters", "docs/chapters/chapter_*.md", "--out", "x.json"])
+    assert args.chapters == ["docs/chapters/chapter_*.md"]
diff --git a/tests/test_ensemble_chapters.py b/tests/test_ensemble_chapters.py
new file mode 100644
index 0000000..bcd7895
--- /dev/null
+++ b/tests/test_ensemble_chapters.py
@@ -0,0 +1,81 @@
+"""Tests for the chapter picker: /api/ensemble/chapters resolution + the
+multi-chapter extract contract (select all / select one / subset)."""
+
+import json
+
+from fastapi.testclient import TestClient
+
+from server.main import app
+
+client = TestClient(app)
+
+
+def _make_chapters(tmp_path):
+    d = tmp_path / "docs/chapters"
+    d.mkdir(parents=True)
+    for n in ("01", "02", "10"):
+        (d / f"chapter_{n}.md").write_text(f"# chapter {n}")
+
+
+def test_chapters_resolves_glob_with_extracted_flag(tmp_path, monkeypatch):
+    monkeypatch.chdir(tmp_path)
+    _make_chapters(tmp_path)
+    # chapter_02 already has a merged.json → must be flagged extracted.
+    pc = tmp_path / "docs/ensemble/per_chapter/chapter_02"
+    pc.mkdir(parents=True)
+    (pc / "merged.json").write_text(json.dumps({"facts": []}))
+
+    body = client.get("/api/ensemble/chapters",
+                      params={"glob": "docs/chapters/chapter_*.md"}).json()
+    assert body["count"] == 3
+    by_stem = {c["stem"]: c for c in body["chapters"]}
+    assert set(by_stem) == {"chapter_01", "chapter_02", "chapter_10"}
+    assert by_stem["chapter_02"]["extracted"] is True
+    assert by_stem["chapter_01"]["extracted"] is False
+    # Paths are workspace-relative.
+    assert by_stem["chapter_01"]["path"] == "docs/chapters/chapter_01.md"
+
+
+def test_chapters_unions_multiple_globs_and_dedupes(tmp_path, monkeypatch):
+    monkeypatch.chdir(tmp_path)
+    _make_chapters(tmp_path)
+    body = client.get(
+        "/api/ensemble/chapters",
+        params=[("glob", "docs/chapters/chapter_01.md"),
+                ("glob", "docs/chapters/chapter_0*.md")],  # overlaps chapter_01
+    ).json()
+    stems = sorted(c["stem"] for c in body["chapters"])
+    assert stems == ["chapter_01", "chapter_02"]  # 01 not duplicated
+
+
+def test_chapters_empty_when_nothing_matches(tmp_path, monkeypatch):
+    monkeypatch.chdir(tmp_path)
+    _make_chapters(tmp_path)
+    body = client.get("/api/ensemble/chapters",
+                      params={"glob": "docs/chapters/nope_*.md"}).json()
+    assert body == {"chapters": [], "count": 0}
+
+
+def test_chapters_confined_to_workspace(tmp_path, monkeypatch):
+    monkeypatch.chdir(tmp_path)
+    _make_chapters(tmp_path)
+    # An escaping glob resolves to nothing inside the workspace, never leaks.
+    body = client.get("/api/ensemble/chapters",
+                      params={"glob": "../*.md"}).json()
+    assert body["count"] == 0
+
+
+# ── Principle X: no silent "all" ─────────────────────────────────────────────
+
+def test_extract_refuses_empty_selection(tmp_path, monkeypatch):
+    """An empty selection must be refused, never expanded to the full glob."""
+    monkeypatch.chdir(tmp_path)
+    _make_chapters(tmp_path)
+    # No chapters param at all → must refuse with a clear message, not run.
+    r = client.get("/api/ensemble/run/extract")
+    assert r.status_code == 200  # SSE channel opens, but carries a refusal
+    assert "No chapters selected" in r.text
+    assert '"returncode": 1' in r.text
+    # An explicitly empty list is refused identically (no glob fallback).
+    r2 = client.get("/api/ensemble/run/extract", params={"chapters": ""})
+    assert "No chapters selected" in r2.text
diff --git a/tests/test_ensemble_gates.py b/tests/test_ensemble_gates.py
new file mode 100644
index 0000000..73ee78e
--- /dev/null
+++ b/tests/test_ensemble_gates.py
@@ -0,0 +1,62 @@
+"""Gate guards: drafts-only synthesis, no live-doc writes, promote is the sole
+live-doc writer (FR-013, SC-005, spec US3)."""
+
+from fastapi.testclient import TestClient
+
+from server.main import app
+
+client = TestClient(app)
+
+
+def test_synthesize_rejects_live_doc_output(tmp_path, monkeypatch):
+    monkeypatch.chdir(tmp_path)
+    r = client.get("/api/ensemble/run/synthesize",
+                   params={"doc": "world_state", "output": "docs/world_state.md"})
+    assert r.status_code == 400
+    assert "draft" in r.json()["detail"]
+
+
+def test_put_file_rejects_live_doc(tmp_path, monkeypatch):
+    monkeypatch.chdir(tmp_path)
+    (tmp_path / "docs").mkdir()
+    r = client.put("/api/ensemble/file", params={"path": "docs/world_state.md"},
+                   json={"content": "clobbered"})
+    assert r.status_code == 403
+
+
+def test_put_file_allows_aliases(tmp_path, monkeypatch):
+    monkeypatch.chdir(tmp_path)
+    r = client.put("/api/ensemble/file",
+                   params={"path": "docs/ensemble/aliases.json"},
+                   json={"content": "{}"})
+    assert r.status_code == 200
+    assert (tmp_path / "docs/ensemble/aliases.json").read_text() == "{}"
+
+
+def test_promote_is_sole_live_writer(tmp_path, monkeypatch):
+    monkeypatch.chdir(tmp_path)
+    (tmp_path / "docs").mkdir()
+    draft = tmp_path / "docs/world_state_draft.md"
+    draft.write_text("promoted body")
+    live = tmp_path / "docs/world_state.md"
+    assert not live.exists()
+
+    r = client.post("/api/ensemble/promote",
+                    json={"draft": "docs/world_state_draft.md", "live": "docs/world_state.md"})
+    assert r.status_code == 200
+    assert live.read_text() == "promoted body"
+
+
+def test_promote_rejects_non_grounding_target(tmp_path, monkeypatch):
+    monkeypatch.chdir(tmp_path)
+    (tmp_path / "docs").mkdir()
+    (tmp_path / "docs/world_state_draft.md").write_text("x")
+    r = client.post("/api/ensemble/promote",
+                    json={"draft": "docs/world_state_draft.md", "live": "docs/notes.md"})
+    assert r.status_code == 400
+
+
+def test_path_traversal_rejected(tmp_path, monkeypatch):
+    monkeypatch.chdir(tmp_path)
+    r = client.get("/api/ensemble/file", params={"path": "../../etc/passwd"})
+    assert r.status_code == 400
diff --git a/tests/test_ensemble_status.py b/tests/test_ensemble_status.py
new file mode 100644
index 0000000..30e812b
--- /dev/null
+++ b/tests/test_ensemble_status.py
@@ -0,0 +1,34 @@
+"""Integration tests for /api/ensemble/status — disk-derived stage state (FR-002)."""
+
+import json
+
+from fastapi.testclient import TestClient
+
+from server.main import app
+
+client = TestClient(app)
+
+
+def test_status_extract_current_then_complete(tmp_path, monkeypatch):
+    monkeypatch.chdir(tmp_path)
+    (tmp_path / "docs/chapters").mkdir(parents=True)
+    (tmp_path / "docs/chapters/chapter_01.md").write_text("# ch1")
+
+    # No run yet → extract is the current stage.
+    r = client.get("/api/ensemble/status")
+    assert r.status_code == 200
+    body = r.json()
+    assert body["current_stage"] == "extract"
+    assert {s["id"] for s in body["stages"]} == {"extract", "bundle", "synthesize", "review"}
+    assert next(s for s in body["stages"] if s["id"] == "extract")["status"] == "not_started"
+
+    # Extraction artifacts appear on disk → status flips to complete with no caching.
+    pc = tmp_path / "docs/ensemble/per_chapter/chapter_01"
+    pc.mkdir(parents=True)
+    (pc / "merged.json").write_text(json.dumps({"facts": []}))
+
+    body2 = client.get("/api/ensemble/status").json()
+    extract = next(s for s in body2["stages"] if s["id"] == "extract")
+    assert extract["status"] == "complete"
+    assert extract["artifacts"] == 1
+    assert body2["current_stage"] == "bundle"
diff --git a/tests/test_openrouter_seam.py b/tests/test_openrouter_seam.py
new file mode 100644
index 0000000..434d482
--- /dev/null
+++ b/tests/test_openrouter_seam.py
@@ -0,0 +1,129 @@
+"""Contract tests for the OpenRouter backend seam (spec 001-ensemble-workflow-ui).
+
+Enforces Constitution Principle V (one seam per boundary): OpenRouter is reached
+ONLY through campaignlib.api, selection is uniform across scripts, a missing key
+fails loudly, and an empty model response is never silently accepted (M3).
+"""
+
+import argparse
+from pathlib import Path
+
+import pytest
+
+import campaignlib
+from campaignlib.api import client as client_mod
+from campaignlib.api import backends as backends_mod
+
+
+REPO_ROOT = Path(__file__).resolve().parent.parent
+
+
+# ── Routing: make_client dispatches to the OpenRouter client ────────────────
+
+def test_make_client_routes_openrouter(monkeypatch):
+    """backend='openrouter' must construct _OpenRouterClient (not Anthropic/DGX)."""
+    sentinel = object()
+    captured = {}
+
+    def fake_ctor(model_override=None):
+        captured["model_override"] = model_override
+        return sentinel
+
+    monkeypatch.setattr(client_mod, "_OpenRouterClient", fake_ctor)
+    out = client_mod.make_client(backend="openrouter", model_override="anthropic/claude-sonnet-4")
+    assert out is sentinel
+    assert captured["model_override"] == "anthropic/claude-sonnet-4"
+
+
+def test_cg_backend_env_selects_openrouter(monkeypatch):
+    """CG_BACKEND=openrouter selects the branch with no explicit arg."""
+    monkeypatch.setattr(client_mod, "_OpenRouterClient", lambda model_override=None: "OR")
+    monkeypatch.setenv("CG_BACKEND", "openrouter")
+    assert client_mod.make_client() == "OR"
+
+
+# ── Missing key fails loudly (no silent fallback) ───────────────────────────
+
+def test_missing_key_raises(monkeypatch):
+    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
+    with pytest.raises(RuntimeError, match="OPENROUTER_API_KEY"):
+        backends_mod._OpenRouterClient()
+
+
+# ── No-thinking mapping (M3 prevention) ─────────────────────────────────────
+
+def test_extra_body_no_thinking_mapping(monkeypatch):
+    """thinking=False maps to OpenRouter's reasoning-disable control."""
+    monkeypatch.setenv("OPENROUTER_API_KEY", "x")
+    # Build only the method under test without constructing the SDK client.
+    inst = backends_mod._OpenRouterClient.__new__(backends_mod._OpenRouterClient)
+    assert inst.extra_body_for("m", thinking=False) == {"reasoning": {"enabled": False}}
+    assert inst.extra_body_for("m", thinking=True) == {}
+    monkeypatch.setenv("DGX_NO_THINKING", "1")
+    assert inst.extra_body_for("m", thinking=None) == {"reasoning": {"enabled": False}}
+
+
+# ── Empty-output guard (M3 detection) ───────────────────────────────────────
+
+@pytest.mark.parametrize("bad", [None, "", "   ", "\n\t "])
+def test_require_nonempty_raises(bad):
+    with pytest.raises(RuntimeError, match="empty output"):
+        client_mod._require_nonempty(bad)
+
+
+def test_require_nonempty_passes_through():
+    assert client_mod._require_nonempty("real text") == "real text"
+
+
+# ── Uniform backend-selection vocabulary + backward compatibility ───────────
+
+def test_add_backend_args_defaults_anthropic():
+    p = argparse.ArgumentParser()
+    p.add_argument("--model", default="claude-sonnet-4-6")
+    campaignlib.add_backend_args(p)
+    ns = p.parse_args([])
+    assert ns.backend == "anthropic"
+    assert ns.endpoint is None
+
+
+def test_client_from_args_anthropic_is_backward_compatible(monkeypatch):
+    """Default backend must call make_client(None, None, None) so env still applies."""
+    seen = {}
+    monkeypatch.setattr(client_mod, "make_client",
+                        lambda backend=None, endpoint=None, model_override=None:
+                        seen.update(backend=backend, endpoint=endpoint, model_override=model_override))
+    ns = argparse.Namespace(backend="anthropic", endpoint=None, model="claude-sonnet-4-6")
+    client_mod.client_from_args(ns)
+    assert seen == {"backend": None, "endpoint": None, "model_override": None}
+
+
+def test_client_from_args_openrouter_passes_model(monkeypatch):
+    seen = {}
+    monkeypatch.setattr(client_mod, "make_client",
+                        lambda backend=None, endpoint=None, model_override=None:
+                        seen.update(backend=backend, endpoint=endpoint, model_override=model_override))
+    ns = argparse.Namespace(backend="openrouter", endpoint=None, model="anthropic/claude-sonnet-4")
+    client_mod.client_from_args(ns)
+    assert seen == {"backend": "openrouter", "endpoint": None,
+                    "model_override": "anthropic/claude-sonnet-4"}
+
+
+# ── Principle V: OpenRouter constructed only inside campaignlib/api ──────────
+
+def test_no_out_of_seam_openrouter_construction():
+    """No module outside campaignlib/api may hard-wire OpenRouter's base URL or
+    construct the client directly — selection goes through make_client / env."""
+    offenders = []
+    seam = (REPO_ROOT / "campaignlib" / "api").resolve()
+    for py in REPO_ROOT.rglob("*.py"):
+        rp = py.resolve()
+        if seam in rp.parents or rp.parent == seam:
+            continue
+        if "/tests/" in str(rp) or rp.name.startswith("test_"):
+            continue
+        if ".specify" in rp.parts or "node_modules" in rp.parts:
+            continue
+        text = py.read_text(encoding="utf-8", errors="ignore")
+        if "openrouter.ai" in text or "_OpenRouterClient(" in text:
+            offenders.append(str(rp.relative_to(REPO_ROOT)))
+    assert not offenders, f"OpenRouter referenced outside the seam: {offenders}"

From f2a634455f5742990997f5aaa6d1e6d5564cda98 Mon Sep 17 00:00:00 2001
From: Kostadis <kostadis@gmail.com>
Date: Sun, 28 Jun 2026 19:13:33 -0700
Subject: [PATCH 3/3] =?UTF-8?q?feat(ensemble):=20run=20observability=20?=
 =?UTF-8?q?=E2=80=94=20copyable=20command,=20live=20stream,=20abort=20+=20?=
 =?UTF-8?q?durable=20record?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Implements spec 002-ensemble-run-observability (T001–T032; T033 = manual QA pending):

Engine / shared seam
- atomic_write_text / atomic_write_json in campaignlib.util (FR-014): temp-then-rename
  so a SIGKILL never leaves a truncated merged.json or dossier at a resume-trusted path
- subprocess_runner: classify_result(), extended _save_run_log with result field (T003/T004)
- subprocess_runner: start_new_session=True + SIGTERM→wait(4 s)→SIGKILL process-group
  teardown on every exit path — normal, explicit abort, and disconnect (T020/T021)
- subprocess_runner: emits `event: command` as first SSE event carrying the secret-free
  invocation string; `done` payload includes aborted flag on signal exit (T006/T022)
- ensemble_merge + facts_to_state: atomic cache writes (T018/T019)
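The SIGTERM→grace→SIGKILL group teardown above can be illustrated with a small synchronous sketch that uses only the standard library. It is not the patch's asyncio implementation in server/subprocess_runner.py. `teardown_group` and `_killpg_quiet` are hypothetical names. The sketch assumes POSIX and a child launched with `start_new_session=True`, so the child's pid is also the process-group id that its children and grandchildren inherit.

```python
import os
import signal
import subprocess

GRACE_SECONDS = 4.0  # grace window from the commit message (FR-008)


def _killpg_quiet(pgid: int, sig: int) -> None:
    try:
        os.killpg(pgid, sig)
    except (ProcessLookupError, PermissionError):
        pass  # the group is already gone


def teardown_group(proc: subprocess.Popen, grace: float = GRACE_SECONDS) -> int:
    """Hypothetical helper: SIGTERM the whole group, then SIGKILL after `grace`.

    Assumes proc was started with start_new_session=True, so proc.pid is also
    the process-group id shared by its descendants.
    """
    pgid = proc.pid
    _killpg_quiet(pgid, signal.SIGTERM)
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        _killpg_quiet(pgid, signal.SIGKILL)  # escalate: the leader ignored SIGTERM
        return proc.wait()
```

A leader that obeys SIGTERM returns `-signal.SIGTERM`. One that ignores it is force-killed after `grace` seconds and returns `-signal.SIGKILL`. Signalling the group instead of `proc.pid` alone is what reaches grandchildren such as per-chapter extraction workers.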

Frontend
- sse.ts: onCommand callback; onerror no longer closes the EventSource itself, so the
  useEnsembleRun handler closes it while running (no auto-restart) and transitions to
  aborted: a network drop is an implicit abort (T007/T025, I1)
- useEnsembleRun: command state, aborted status, abort() method (T008/T024/T025)
- RunCommandBar.vue: monospace copyable command box (T009)
- EnsembleExtract/Bundle/Synthesize: RunCommandBar wired; Abort button while running;
  aborted/connection-lost labels; success vs failure color distinction (T010/T014/T015/T026)

Tests (tests/test_subprocess_abort.py)
- secret-safety + explicit-selection-faithfulness (T011/T012)
- process-group kill on explicit abort and disconnect, child + grandchild (T027)
- grace→force timing; aborted record written (T028)
- atomic-write integrity under SIGKILL; lock released after abort (T029)
- non-ensemble SSE route regression — group-killed on disconnect, no orphan (T031)
- success and failure run records verified (T017)
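The SIGKILL integrity check (T029) can be approximated as shown below. This is a hedged sketch, not the code in tests/test_subprocess_abort.py. The inline writer script, the ~1 MB payload and the kill delays are assumptions, and SIGKILL requires POSIX. The child publishes successive versions of a JSON file using the same temp-file-then-`os.replace` pattern as `atomic_write_text`. The parent kills it at an arbitrary moment and checks that the destination is either absent or one complete version.

```python
import json
import signal
import subprocess
import sys
import time
from pathlib import Path

BLOB_SIZE = 1_000_000  # large payload so a kill is likely to land mid-write

WRITER = """
import json, os, sys
from pathlib import Path
dest, size = Path(sys.argv[1]), int(sys.argv[2])
tmp = dest.with_suffix(dest.suffix + ".tmp")
i = 0
while True:
    payload = json.dumps({"version": i, "blob": "x" * size}) + "\\n"
    tmp.write_text(payload, encoding="utf-8")
    os.replace(tmp, dest)
    i += 1
"""


def kill_mid_write(dest: Path, delay: float) -> None:
    """Start the writer, let it run for `delay` seconds, then SIGKILL it."""
    proc = subprocess.Popen([sys.executable, "-c", WRITER, str(dest), str(BLOB_SIZE)])
    time.sleep(delay)
    proc.send_signal(signal.SIGKILL)
    proc.wait()


def destination_is_whole(dest: Path) -> bool:
    """True if dest is absent (nothing published yet) or one complete version."""
    if not dest.exists():
        return True
    try:
        data = json.loads(dest.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False
    return len(data["blob"]) == BLOB_SIZE
```

If the writer's two publishing lines were replaced by a direct `dest.write_text(payload)`, the check would be expected to fail some of the time. That is the regression such a test guards against.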

Docs: web_ui.md + ensemble_workflow.md updated with abort/reconnect/per-run-log notes (T030)
Spec: specs/002-ensemble-run-observability/ — full artifact set committed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 .specify/feature.json                         |   2 +-
 CLAUDE.md                                     |  13 +-
 campaignlib/__init__.py                       |   4 +-
 campaignlib/util.py                           |  34 ++
 docs/cli/ensemble_workflow.md                 |  11 +
 docs/web/web_ui.md                            |  10 +
 ensemble_merge.py                             |   4 +-
 facts_to_state.py                             |   3 +-
 frontend/src/api/sse.ts                       |  16 +-
 .../src/components/shared/RunCommandBar.vue   |  77 ++++
 .../src/views/ensemble/EnsembleBundle.vue     |  40 +-
 .../src/views/ensemble/EnsembleExtract.vue    |  17 +-
 .../src/views/ensemble/EnsembleSynthesize.vue |  21 +-
 frontend/src/views/ensemble/useEnsembleRun.ts |  49 +-
 server/routers/ensemble.py                    |   3 +
 server/subprocess_runner.py                   | 191 +++++---
 .../checklists/requirements.md                |  36 ++
 .../contracts/run-stream.md                   |  64 +++
 .../data-model.md                             |  58 +++
 specs/002-ensemble-run-observability/plan.md  | 104 +++++
 .../quickstart.md                             |  83 ++++
 .../research.md                               |  69 +++
 specs/002-ensemble-run-observability/spec.md  | 143 ++++++
 specs/002-ensemble-run-observability/tasks.md | 208 +++++++++
 tests/test_subprocess_abort.py                | 427 ++++++++++++++++++
 25 files changed, 1604 insertions(+), 83 deletions(-)
 create mode 100644 frontend/src/components/shared/RunCommandBar.vue
 create mode 100644 specs/002-ensemble-run-observability/checklists/requirements.md
 create mode 100644 specs/002-ensemble-run-observability/contracts/run-stream.md
 create mode 100644 specs/002-ensemble-run-observability/data-model.md
 create mode 100644 specs/002-ensemble-run-observability/plan.md
 create mode 100644 specs/002-ensemble-run-observability/quickstart.md
 create mode 100644 specs/002-ensemble-run-observability/research.md
 create mode 100644 specs/002-ensemble-run-observability/spec.md
 create mode 100644 specs/002-ensemble-run-observability/tasks.md
 create mode 100644 tests/test_subprocess_abort.py

diff --git a/.specify/feature.json b/.specify/feature.json
index 69a4651..84dd9c2 100644
--- a/.specify/feature.json
+++ b/.specify/feature.json
@@ -1,3 +1,3 @@
 {
-  "feature_directory": "specs/001-ensemble-workflow-ui"
+  "feature_directory": "specs/002-ensemble-run-observability"
 }
diff --git a/CLAUDE.md b/CLAUDE.md
index 19e5637..7402cdb 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -152,8 +152,13 @@ a CLI with `--backend openrouter --model <openrouter-id>`, or via the
 <!-- SPECKIT START -->
 For additional context about technologies to be used, project structure,
 shell commands, and other important information, read the current plan:
-`specs/001-ensemble-workflow-ui/plan.md` (Ensemble Grounding-Doc Workflow UI —
-adds a stepped `/ensemble` UI page and OpenRouter as a per-stage LLM backend
-through the single `campaignlib` seam; leaves the existing `/grounding` Anthropic
-path unchanged).
+`specs/002-ensemble-run-observability/plan.md` (Ensemble Run Observability —
+makes an ensemble-stage run observable and controllable from the `/ensemble` UI:
+a copyable, secret-free reproducible command; live streamed output; an
+unambiguous succeeded/failed/aborted result plus a durable on-disk run record;
+and abort = graceful→force process-group kill, where a lost connection is an
+implicit abort. Engine correctness (process-group kill in the shared
+`server/subprocess_runner.py` seam, atomic per-unit cache writes in
+`ensemble_merge.py`/`facts_to_state.py`) stays in the CLI/seam layer, not the
+router. Predecessor: `specs/001-ensemble-workflow-ui/plan.md`).
 <!-- SPECKIT END -->
diff --git a/campaignlib/__init__.py b/campaignlib/__init__.py
index e914beb..fd90a37 100644
--- a/campaignlib/__init__.py
+++ b/campaignlib/__init__.py
@@ -26,7 +26,7 @@
     load_agent_prompt,
     assemble_docs,
 )
-from .util import copy_to_clipboard, save_log
+from .util import copy_to_clipboard, save_log, atomic_write_text, atomic_write_json
 from .api.client import (
     make_client, call_api, call_api_with_tools, stream_api,
     add_backend_args, client_from_args,
@@ -80,6 +80,8 @@
     # util
     "copy_to_clipboard",
     "save_log",
+    "atomic_write_text",
+    "atomic_write_json",
     # api — client
     "make_client",
     "add_backend_args",
diff --git a/campaignlib/util.py b/campaignlib/util.py
index dd6402a..42fdce2 100644
--- a/campaignlib/util.py
+++ b/campaignlib/util.py
@@ -1,7 +1,41 @@
 """Clipboard and timestamped-log helpers."""
 
+import json
+import os
 from datetime import datetime
 from pathlib import Path
+from typing import Any
+
+
+def atomic_write_text(path: Path | str, text: str, encoding: str = "utf-8") -> None:
+    """Write text to path atomically (FR-014: no partial file at the trusted path).
+
+    Writes to a temp file in the same directory as `path`, then renames via
+    os.replace — a POSIX atomic rename on the same filesystem. A SIGKILL during
+    write leaves at most a discardable .tmp file; the destination is always either
+    the complete new content or the previous version, never a partial write.
+    """
+    path = Path(path)
+    path.parent.mkdir(parents=True, exist_ok=True)
+    tmp = path.with_suffix(path.suffix + ".tmp")
+    try:
+        tmp.write_text(text, encoding=encoding)
+        os.replace(tmp, path)
+    except BaseException:
+        try:
+            tmp.unlink(missing_ok=True)
+        except OSError:
+            pass
+        raise
+
+
+def atomic_write_json(path: Path | str, obj: Any, indent: int = 2) -> None:
+    """Write obj as JSON to path atomically (FR-014).
+
+    Serialises as json.dumps(obj, indent=indent) + "\\n" to match the existing
+    ensemble_merge.py output format exactly, then delegates to atomic_write_text.
+    """
+    atomic_write_text(path, json.dumps(obj, indent=indent) + "\n")
 
 
 def copy_to_clipboard(text: str) -> None:
diff --git a/docs/cli/ensemble_workflow.md b/docs/cli/ensemble_workflow.md
index 12622fe..bb9c4b5 100644
--- a/docs/cli/ensemble_workflow.md
+++ b/docs/cli/ensemble_workflow.md
@@ -968,6 +968,17 @@ Old per-tool API path (distill / planning / party / campaign_state each re-extra
 
 ---
 
+## Observability & abort
+
+All ensemble stages (extraction, bundling, synthesis) support:
+
+- **Copyable command**: Every UI-launched run emits the exact, secret-free invocation as the first SSE event. Paste it into a terminal in the campaign workspace to reproduce the run. No API key appears in the command — keys are inherited from the server environment, never on the command line.
+- **Live streaming**: Output lines appear incrementally as each chapter/step completes. The UI shows a "Running…" state while the run is active.
+- **Abort**: The Abort button (UI) closes the EventSource connection. The server observes the disconnect and group-kills the entire worker tree (SIGTERM → SIGKILL after ~4 s) — no orphaned `ensemble_extract` or `facts_to_state` subprocesses keep spending tokens.
+- **Disconnect = implicit abort**: Closing the browser tab or dropping the network mid-run is treated identically to clicking Abort. The UI does NOT auto-reconnect during a running stage (that would silently restart the run). The server kills the process group when the connection drops.
+- **Durable record**: Every run writes `<campaign>/logs/<timestamp>_<script>.md` with the command, full output, outcome (succeeded / failed / aborted), and duration. Recoverable after the browser is closed.
+- **Cache integrity**: Per-chapter `merged.json` and per-entity `state_dossiers/*.md` are written atomically (temp-file + rename). A force-kill during write never leaves a truncated file at the resume-trusted path — the next run either finds a complete artifact (skip) or none (re-run), never a partial one.
+
 ## See also
 
 - [`ensemble_extraction.md`](ensemble_extraction.md) — single-file extraction, merge options, `--samples` and `--dry-run` patterns
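Because `merged.json` and the dossiers only ever appear through an atomic rename, the resume rule in the Cache integrity bullet above reduces to an existence check. The minimal sketch below assumes a hypothetical layout with one directory per chapter containing `merged.json`. `chapters_to_run` is an illustrative name, not a function in this patch.

```python
from pathlib import Path


def chapters_to_run(chapter_dirs: list[Path]) -> list[Path]:
    """Hypothetical resume filter: keep chapters whose merged.json is not published.

    With atomic publication, existence implies completeness, so no parse or size
    check is needed. A leftover merged.json.tmp from a force-killed run is not a
    published artifact and is simply overwritten by the next write.
    """
    return [d for d in chapter_dirs if not (d / "merged.json").is_file()]
```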
diff --git a/docs/web/web_ui.md b/docs/web/web_ui.md
index 9a0c90d..c097ac1 100644
--- a/docs/web/web_ui.md
+++ b/docs/web/web_ui.md
@@ -43,6 +43,16 @@ Users specify **campaign directory** + **session directory** on the Session Conf
 
 **Grounding Docs**: Campaign State, World State, Party Document, Planning Document
 
+**Ensemble** (four-stage extraction + synthesis workflow):
+- **Stage 1 — Extraction**: Runs `ensemble_batch.py` over the selected chapters.
+  - **Command bar**: Shows the exact, secret-free, copyable invocation emitted by the server. Paste it into a workspace terminal to reproduce the run. No API key ever appears in the command.
+  - **Running state**: Button shows "Running…" while the stage is active; live output streams incrementally.
+  - **Abort**: An Abort button appears while running. Clicking it closes the stream — the server group-kills the entire worker tree (including per-chapter `ensemble_extract` subprocesses) via SIGTERM → SIGKILL. A "connection lost" note appears if the browser drops the network mid-run (treated as an implicit abort; no unobserved token spending).
+  - **Finished states**: "Done" (exit 0), "Exit N" (failure), or "Aborted" — each visually distinct.
+  - **Durable record**: Every run writes `<campaign>/logs/<timestamp>_ensemble_batch.md` with the command, full output, result, and duration. Recoverable after closing the browser.
+- **Stage 2 — Fact Bundling**: Human-gated scope review + alias correction + aggregate step. The scope-list and aggregate sub-runs each show a running state, an Abort button, and success/failure/aborted labels; the threads.md render shows a running state only.
+- **Stage 3 — Synthesis & Promotion**: Synthesizes draft grounding docs; human reviews diff before promoting.
+
 **Prep**: Session Prep, NPC Table, Query Summaries, Connection Graph
 
 **Setup**: D&D Sheet, Make Tracking
diff --git a/ensemble_merge.py b/ensemble_merge.py
index 262fba6..12930f1 100644
--- a/ensemble_merge.py
+++ b/ensemble_merge.py
@@ -43,6 +43,8 @@
 from difflib import SequenceMatcher
 from pathlib import Path
 
+import campaignlib
+
 
 def _norm_subject(s: str) -> str:
     return re.sub(r"[^a-z0-9]+", "", s.lower())
@@ -360,7 +362,7 @@ def _resolve(cli, key, default):
         f["n_samples"] = len(runs)
         f["passes"] = sorted({p.split("#")[0] for p in runs})
 
-    output_path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
+    campaignlib.atomic_write_json(output_path, merged)  # FR-014: atomic publish
 
     counts_by_lens: dict[str, int] = {}
     for key, facts in pass_outputs.items():
diff --git a/facts_to_state.py b/facts_to_state.py
index f2136ec..61e0c4f 100644
--- a/facts_to_state.py
+++ b/facts_to_state.py
@@ -44,6 +44,7 @@
 
 from campaignlib import (
     DEFAULT_MODEL,
+    atomic_write_text,
     load_agent_prompt,
     make_client,
     stream_api,
@@ -358,7 +359,7 @@ def write_dossier(out_dir: Path, b: Bundle, body: str) -> Path:
     fm = (f"---\nname: {b.display}\ntype: {b.type}\n"
           f"n_facts: {len(b.facts)}\nchapters: {lo}-{hi}\n---\n\n")
     dest = dossier_path(out_dir, b)
-    dest.write_text(fm + body.strip() + "\n", encoding="utf-8")
+    atomic_write_text(dest, fm + body.strip() + "\n")  # FR-014: atomic publish
     return dest
 
 
diff --git a/frontend/src/api/sse.ts b/frontend/src/api/sse.ts
index e4ad940..b38a52b 100644
--- a/frontend/src/api/sse.ts
+++ b/frontend/src/api/sse.ts
@@ -5,6 +5,15 @@
 export interface SSECallbacks {
   onData: (text: string) => void
   onDone: (returncode: number, error?: string) => void
+  /** Called with the secret-free, copyable invocation from the `command` event (US1). */
+  onCommand?: (cmd: string) => void
+  /**
+   * Called on connection error. connectSSE no longer closes the EventSource
+   * itself, so this callback owns teardown (I1). A browser EventSource retries
+   * a dropped connection automatically, which would re-issue the GET and
+   * silently restart the run; useEnsembleRun therefore closes it here and
+   * treats an error during 'running' as an implicit abort.
+   */
   onError: (err: Event) => void
 }
 
@@ -15,6 +24,12 @@ export function connectSSE(url: string, callbacks: SSECallbacks): EventSource {
     callbacks.onData(JSON.parse(e.data))
   }
 
+  es.addEventListener('command', (e) => {
+    if (callbacks.onCommand) {
+      callbacks.onCommand(JSON.parse((e as MessageEvent).data))
+    }
+  })
+
   es.addEventListener('done', (e) => {
     es.close()
     const data = JSON.parse((e as MessageEvent).data)
@@ -22,7 +37,6 @@ export function connectSSE(url: string, callbacks: SSECallbacks): EventSource {
   })
 
   es.onerror = (e) => {
-    es.close()
     callbacks.onError(e)
   }
 
diff --git a/frontend/src/components/shared/RunCommandBar.vue b/frontend/src/components/shared/RunCommandBar.vue
new file mode 100644
index 0000000..f085c8c
--- /dev/null
+++ b/frontend/src/components/shared/RunCommandBar.vue
@@ -0,0 +1,77 @@
+<script setup lang="ts">
+import { ref } from 'vue'
+
+const props = defineProps<{ command: string }>()
+
+const copied = ref(false)
+
+function copy() {
+  if (!props.command) return
+  navigator.clipboard.writeText(props.command).then(() => {
+    copied.value = true
+    setTimeout(() => { copied.value = false }, 1500)
+  })
+}
+</script>
+
+<template>
+  <div v-if="command" class="command-bar">
+    <span class="label">Command</span>
+    <pre class="command-text">{{ command }}</pre>
+    <button class="copy-btn" :class="{ copied }" @click="copy" title="Copy to clipboard">
+      {{ copied ? 'Copied!' : 'Copy' }}
+    </button>
+  </div>
+  <div v-else class="command-bar command-bar--empty">
+    <span class="label">Command</span>
+    <span class="placeholder">Run a stage to see the exact command.</span>
+  </div>
+</template>
+
+<style scoped>
+.command-bar {
+  display: flex;
+  align-items: flex-start;
+  gap: 8px;
+  background: var(--bg-surface0);
+  border: 1px solid var(--bg-surface1);
+  border-radius: 4px;
+  padding: 8px 10px;
+  margin-bottom: 10px;
+  font-size: 11px;
+}
+.command-bar--empty { opacity: 0.55; }
+.label {
+  color: var(--text-muted);
+  font-size: 10px;
+  font-weight: 700;
+  text-transform: uppercase;
+  letter-spacing: 0.05em;
+  white-space: nowrap;
+  padding-top: 1px;
+  min-width: 58px;
+}
+.command-text {
+  flex: 1;
+  font-family: var(--mono);
+  font-size: 11px;
+  color: var(--text);
+  margin: 0;
+  white-space: pre-wrap;
+  word-break: break-all;
+}
+.placeholder { flex: 1; color: var(--text-muted); font-style: italic; }
+.copy-btn {
+  flex-shrink: 0;
+  font-size: 10px;
+  padding: 2px 8px;
+  border-radius: 3px;
+  border: 1px solid var(--bg-surface1);
+  background: var(--bg-mantle);
+  color: var(--text-muted);
+  cursor: pointer;
+  transition: background 0.1s, color 0.1s;
+}
+.copy-btn:hover { background: var(--bg-surface1); color: var(--text); }
+.copy-btn.copied { background: var(--green); color: var(--bg-mantle); border-color: var(--green); }
+</style>
diff --git a/frontend/src/views/ensemble/EnsembleBundle.vue b/frontend/src/views/ensemble/EnsembleBundle.vue
index b32e731..ce6a0d2 100644
--- a/frontend/src/views/ensemble/EnsembleBundle.vue
+++ b/frontend/src/views/ensemble/EnsembleBundle.vue
@@ -12,6 +12,19 @@ const listRun = useEnsembleRun()
 const aggRun = useEnsembleRun()
 const threadsRun = useEnsembleRun()
 
+function statusLabel(s: string, rc: number | null): string {
+  if (s === 'done') return 'Done'
+  if (s === 'error') return `Exit ${rc}`
+  if (s === 'aborted') return 'Aborted'
+  return ''
+}
+function statusClass(s: string): string {
+  if (s === 'done') return 'ok'
+  if (s === 'error') return 'err'
+  if (s === 'aborted') return 'aborted'
+  return ''
+}
+
 // Gate state — aggregation is blocked until the operator confirms they reviewed
 // scope + aliases (Principle II: no precision decision auto-fed downstream).
 const gateConfirmed = ref(false)
@@ -88,9 +101,16 @@ function runThreads() {
         <code>[location]</code>-scoped — this is a precision decision; you may also
         run <code>facts_to_state.py --list</code> at the CLI.
       </p>
-      <button class="btn-neutral" :disabled="listRun.status.value === 'running'" @click="runList">
-        {{ listRun.status.value === 'running' ? 'Listing…' : 'Run scope list' }}
-      </button>
+      <div class="controls">
+        <button class="btn-neutral" :disabled="listRun.status.value === 'running'" @click="runList">
+          {{ listRun.status.value === 'running' ? 'Listing…' : 'Run scope list' }}
+        </button>
+        <button v-if="listRun.status.value === 'running'" class="btn-warn btn-sm" @click="listRun.abort()">Abort</button>
+        <span v-if="statusLabel(listRun.status.value, listRun.returnCode.value)"
+              :class="statusClass(listRun.status.value)">
+          {{ statusLabel(listRun.status.value, listRun.returnCode.value) }}
+        </span>
+      </div>
       <StreamOutput v-if="listRun.output.value" :text="listRun.output.value" />
     </section>
 
@@ -128,11 +148,16 @@ function runThreads() {
                 @click="runAggregate">
           {{ aggRun.status.value === 'running' ? 'Aggregating…' : '▶ Aggregate' }}
         </button>
-        <span v-if="aggRun.returnCode.value !== null"
-              :class="aggRun.returnCode.value === 0 ? 'ok' : 'err'">
-          {{ aggRun.returnCode.value === 0 ? 'Done' : `Exit ${aggRun.returnCode.value}` }}
+        <button v-if="aggRun.status.value === 'running'" class="btn-warn btn-sm" @click="aggRun.abort()">Abort</button>
+        <span v-if="statusLabel(aggRun.status.value, aggRun.returnCode.value)"
+              :class="statusClass(aggRun.status.value)">
+          {{ statusLabel(aggRun.status.value, aggRun.returnCode.value) }}
         </span>
-        <button class="btn-neutral btn-sm" @click="runThreads">Render threads.md</button>
+        <button class="btn-neutral btn-sm"
+                :disabled="threadsRun.status.value === 'running'"
+                @click="runThreads">
+          {{ threadsRun.status.value === 'running' ? 'Rendering…' : 'Render threads.md' }}
+        </button>
       </div>
       <StreamOutput v-if="aggRun.output.value" :text="aggRun.output.value" />
       <StreamOutput v-if="threadsRun.output.value" :text="threadsRun.output.value" />
@@ -153,4 +178,5 @@ h3 { font-size: 13px; margin-bottom: 4px; }
 .confirm { display: flex; align-items: center; gap: 6px; font-size: 12px; margin-bottom: 6px; }
 .ok { color: var(--green); font-size: 12px; font-weight: 600; }
 .err { color: var(--red); font-size: 12px; font-weight: 600; }
+.aborted { color: var(--peach); font-size: 12px; font-weight: 600; }
 </style>
diff --git a/frontend/src/views/ensemble/EnsembleExtract.vue b/frontend/src/views/ensemble/EnsembleExtract.vue
index b2ad420..e0711d0 100644
--- a/frontend/src/views/ensemble/EnsembleExtract.vue
+++ b/frontend/src/views/ensemble/EnsembleExtract.vue
@@ -3,12 +3,13 @@ import { ref, onMounted, computed } from 'vue'
 import { useConfigStore } from '../../stores/config'
 import { useEnsembleRun, readEnsembleConfig, type EnsembleConfig } from './useEnsembleRun'
 import StreamOutput from '../../components/shared/StreamOutput.vue'
+import RunCommandBar from '../../components/shared/RunCommandBar.vue'
 import ChapterPicker from './ChapterPicker.vue'
 
 const emit = defineEmits<{ changed: [] }>()
 const config = useConfigStore()
 const cfg = ref<EnsembleConfig>(readEnsembleConfig({}))
-const { output, status, returnCode, run, clear } = useEnsembleRun()
+const { output, status, returnCode, command, run, abort, clear } = useEnsembleRun()
 
 onMounted(async () => {
   await config.load()
@@ -55,17 +56,20 @@ function start() {
       @update:glob="persistChapters"
       @update:selected="persistChapters" />
 
+    <RunCommandBar :command="command" />
+
     <div class="controls">
       <button class="btn-success" :disabled="status === 'running' || !canRun" @click="start">
         {{ status === 'running' ? 'Running…'
            : canRun ? `▶ Run extraction (${selectedCount})` : '▶ Run extraction' }}
       </button>
-      <span v-if="!canRun" class="need">Select at least one chapter to run.</span>
-      <span v-if="returnCode !== null" :class="returnCode === 0 ? 'ok' : 'err'">
-        {{ returnCode === 0 ? 'Done' : `Exit ${returnCode}` }}
-      </span>
+      <button v-if="status === 'running'" class="btn-warn btn-sm" @click="abort">Abort</button>
+      <span v-if="!canRun && status !== 'running'" class="need">Select at least one chapter to run.</span>
+      <span v-if="status === 'done'" class="ok">Done</span>
+      <span v-else-if="status === 'error'" class="err">Exit {{ returnCode }}</span>
+      <span v-else-if="status === 'aborted'" class="aborted">Aborted</span>
       <span style="flex:1"></span>
-      <button v-if="output" class="btn-neutral btn-sm" @click="clear">Clear</button>
+      <button v-if="output && status !== 'running'" class="btn-neutral btn-sm" @click="clear">Clear</button>
     </div>
     <StreamOutput v-if="output" :text="output" />
   </div>
@@ -78,5 +82,6 @@ h2 { font-size: 16px; margin-bottom: 6px; }
 .controls { display: flex; align-items: center; gap: 10px; margin-bottom: 10px; }
 .ok { color: var(--green); font-size: 12px; font-weight: 600; }
 .err { color: var(--red); font-size: 12px; font-weight: 600; }
+.aborted { color: var(--peach); font-size: 12px; font-weight: 600; }
 .need { color: var(--peach); font-size: 12px; }
 </style>
diff --git a/frontend/src/views/ensemble/EnsembleSynthesize.vue b/frontend/src/views/ensemble/EnsembleSynthesize.vue
index 9c3c3a3..891b2bb 100644
--- a/frontend/src/views/ensemble/EnsembleSynthesize.vue
+++ b/frontend/src/views/ensemble/EnsembleSynthesize.vue
@@ -10,6 +10,19 @@ const config = useConfigStore()
 const cfg = ref<EnsembleConfig>(readEnsembleConfig({}))
 const run = useEnsembleRun()
 
+function statusLabel(s: string, rc: number | null): string {
+  if (s === 'done') return 'Draft written'
+  if (s === 'error') return `Exit ${rc}`
+  if (s === 'aborted') return 'Aborted'
+  return ''
+}
+function statusClass(s: string): string {
+  if (s === 'done') return 'ok'
+  if (s === 'error') return 'err'
+  if (s === 'aborted') return 'aborted'
+  return ''
+}
+
 const DOCS = [
   { id: 'world_state', label: 'World State' },
   { id: 'campaign_state', label: 'Campaign State' },
@@ -64,9 +77,10 @@ async function promote(doc: string) {
       <button class="btn-success" :disabled="run.status.value === 'running'" @click="synthesize">
         {{ run.status.value === 'running' ? 'Synthesizing…' : '▶ Synthesize draft' }}
       </button>
-      <span v-if="run.returnCode.value !== null"
-            :class="run.returnCode.value === 0 ? 'ok' : 'err'">
-        {{ run.returnCode.value === 0 ? 'Draft written' : `Exit ${run.returnCode.value}` }}
+      <button v-if="run.status.value === 'running'" class="btn-warn btn-sm" @click="run.abort()">Abort</button>
+      <span v-if="statusLabel(run.status.value, run.returnCode.value)"
+            :class="statusClass(run.status.value)">
+        {{ statusLabel(run.status.value, run.returnCode.value) }}
       </span>
     </div>
     <StreamOutput v-if="run.output.value" :text="run.output.value" />
@@ -94,6 +108,7 @@ h3 { font-size: 13px; margin: 16px 0 6px; }
 select { font-size: 12px; padding: 5px 7px; background: var(--bg-surface0); color: var(--text); border: 1px solid var(--bg-surface1); border-radius: 4px; }
 .ok { color: var(--green); font-size: 12px; font-weight: 600; }
 .err { color: var(--red); font-size: 12px; font-weight: 600; }
+.aborted { color: var(--peach); font-size: 12px; font-weight: 600; }
 .promote-tbl td { padding: 4px 10px 4px 0; font-size: 12px; }
 .diff { background: #141420; border: 1px solid var(--bg-surface0); border-radius: 4px; padding: 8px 10px; font-family: var(--mono); font-size: 11px; white-space: pre-wrap; max-height: 300px; overflow-y: auto; }
 </style>
diff --git a/frontend/src/views/ensemble/useEnsembleRun.ts b/frontend/src/views/ensemble/useEnsembleRun.ts
index 24e79df..77a3c14 100644
--- a/frontend/src/views/ensemble/useEnsembleRun.ts
+++ b/frontend/src/views/ensemble/useEnsembleRun.ts
@@ -6,8 +6,13 @@ import { connectSSE } from '../../api/sse'
  *  don't need it. */
 export function useEnsembleRun() {
   const output = ref('')
-  const status = ref<'idle' | 'running' | 'done' | 'error'>('idle')
+  const status = ref<'idle' | 'running' | 'done' | 'error' | 'aborted'>('idle')
   const returnCode = ref<number | null>(null)
+  /** Secret-free, copyable invocation from the server's `command` SSE event (US1). */
+  const command = ref<string>('')
+
+  // Private EventSource handle — kept so abort() can close it.
+  let _es: EventSource | null = null
 
   function buildUrl(endpoint: string, params: Record<string, any>): string {
     const url = new URL(endpoint, window.location.origin)
@@ -29,24 +34,58 @@ export function useEnsembleRun() {
     status.value = 'running'
     output.value = ''
     returnCode.value = null
-    connectSSE(buildUrl(endpoint, params), {
+    command.value = ''
+    _es = connectSSE(buildUrl(endpoint, params), {
+      onCommand(cmd) { command.value = cmd },
       onData(t) { output.value += t },
-      onDone(rc) {
+      onDone(rc, error) {
+        _es = null
         status.value = rc === 0 ? 'done' : 'error'
         returnCode.value = rc
+        // Surface precondition refusals (FR-011): done.error carries the message.
+        if (error && !output.value.includes(error)) {
+          output.value += `\nError: ${error}\n`
+        }
         if (onDone) onDone(rc)
       },
-      onError() { status.value = 'error' },
+      onError(_e) {
+        // I1: onerror during 'running' = network drop / disconnect.
+        // Close the EventSource explicitly — prevents automatic reconnect which
+        // would re-issue the GET and silently restart the run (metered calls!).
+        // Treat as implicit abort; the server group-kills the process tree.
+        if (status.value === 'running') {
+          _es?.close()
+          _es = null
+          status.value = 'aborted'
+          output.value += '\n[connection lost — run stopped]\n'
+        } else {
+          // Defensive only: run() sets 'running' before connecting, so an initial connection failure also takes the branch above.
+          _es?.close()
+          _es = null
+          status.value = 'error'
+        }
+      },
     })
   }
 
+  /** Close the EventSource and mark status as aborted (valid only from 'running').
+   *  The server observes the connection drop and group-kills the worker tree.
+   */
+  function abort() {
+    if (status.value !== 'running') return
+    _es?.close()
+    _es = null
+    status.value = 'aborted'
+  }
+
   function clear() {
     output.value = ''
     status.value = 'idle'
     returnCode.value = null
+    command.value = ''
   }
 
-  return { output, status, returnCode, run, clear }
+  return { output, status, returnCode, command, run, abort, clear }
 }
 
 export interface BackendProfile {
diff --git a/server/routers/ensemble.py b/server/routers/ensemble.py
index 4837590..7c609cb 100644
--- a/server/routers/ensemble.py
+++ b/server/routers/ensemble.py
@@ -128,6 +128,9 @@ def _run_locked(stage: str, cmd: list[str], env_extra: dict[str, str] | None = N
     _RUNNING.add(key)
 
     def _release(_rc):
+        # T023: stream_subprocess calls on_complete from its finally block on
+        # every exit path (normal, explicit abort, or disconnect). The lock is
+        # therefore always released — no run can get stuck "running" after abort.
         _RUNNING.discard(key)
 
     async def _gen():
diff --git a/server/subprocess_runner.py b/server/subprocess_runner.py
index 177f830..7c04cfa 100644
--- a/server/subprocess_runner.py
+++ b/server/subprocess_runner.py
@@ -1,14 +1,48 @@
-"""Async subprocess runner with SSE streaming output."""
+"""Async subprocess runner with SSE streaming output.
+
+Shared seam used by ALL SSE routes (ensemble, grounding, prep, session_workflow,
+scene_editor, …). The termination behaviour added here (T020–T021: start_new_session
+plus group-kill on disconnect) is intentionally global: no route should leak a runaway
+subprocess when the client disconnects. Non-ensemble routes' request/response shapes
+are unchanged; they additionally gain disconnect-driven cleanup for free.
+See plan.md "Constraints / Shared-seam blast radius (I2)" and tests/test_subprocess_abort.py
+for regression coverage.
+"""
 
 import asyncio
 import json
 import os
+import signal
 import sys
 import time
 from collections.abc import AsyncGenerator, Callable
 from datetime import datetime
 from pathlib import Path
 
+GRACE_SECONDS = 4.0  # SIGTERM grace window before SIGKILL (FR-008)
+
+
+def classify_result(returncode: int | None) -> str:
+    """Map a subprocess returncode to a run outcome string (R5, data-model.md).
+
+    - ``None`` or negative (signal) → ``"aborted"``
+    - ``0``                         → ``"succeeded"``
+    - positive non-zero             → ``"failed"``
+    """
+    if returncode is None or returncode < 0:
+        return "aborted"
+    if returncode == 0:
+        return "succeeded"
+    return "failed"
+
+
+def _killpg_safe(pgid: int, sig: int) -> None:
+    """Send sig to process group pgid, ignoring ProcessLookupError and PermissionError."""
+    try:
+        os.killpg(pgid, sig)
+    except (ProcessLookupError, PermissionError):
+        pass
+
 
 def _log_stem(cmd: list[str]) -> str:
     """Derive a filename stem from the script being run."""
@@ -19,12 +53,13 @@ def _log_stem(cmd: list[str]) -> str:
 
 
 def _save_run_log(cmd: list[str], cwd: str | None, output: str,
-                  returncode: int | None, duration: float) -> None:
-    """Persist the run to `logs/` so it survives the SSE buffer.
+                  returncode: int | None, result: str, duration: float) -> None:
+    """Persist the run to `logs/` so it survives the SSE buffer (FR-007, SC-006).
 
-    Mirrors the format of `campaignlib.save_log` — markdown sections, one
-    file per run with a timestamped filename. Failures here are silent;
-    logging is best-effort and must not break the running subprocess.
+    Mirrors the format of `campaignlib.save_log` — markdown sections, one file
+    per run with a timestamped filename. Failures here are silent; logging is
+    best-effort and must not break the running subprocess. Runs on every exit
+    path (normal, abort, disconnect) via the finally in stream_subprocess.
     """
     try:
         log_dir = Path(cwd or os.getcwd()) / "logs"
@@ -36,6 +71,7 @@ def _save_run_log(cmd: list[str], cwd: str | None, output: str,
             f"# Subprocess run — {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n\n"
             f"## Command\n\n```\n{cmd_block}\n```\n\n"
             f"## Result\n\n"
+            f"- result: `{result}`\n"
             f"- returncode: `{returncode}`\n"
             f"- duration: `{duration:.2f}s`\n"
             f"- cwd: `{cwd or os.getcwd()}`\n\n"
@@ -55,23 +91,33 @@ async def stream_subprocess(
 ) -> AsyncGenerator[str, None]:
     """Run a subprocess and yield Server-Sent Events as output arrives.
 
-    Yields SSE-formatted strings:
-      - ``data: "text chunk"\\n\\n`` for stdout/stderr output
-      - ``event: done\\ndata: {"returncode": N}\\n\\n`` when the process exits
+    Yields SSE-formatted strings (in order):
+      - ``event: command\\ndata: "<invocation>"\\n\\n`` — distinct named event
+        carrying the secret-free, copyable invocation (US1, FR-001/002/003)
+      - ``data: "$ <invocation>\\n\\n"\\n\\n`` — legacy inline chunk (back-compat)
+      - ``data: "text chunk"\\n\\n`` — stdout/stderr as produced (US2, FR-004)
+      - ``event: done\\ndata: {...}\\n\\n`` — terminal event (US3, FR-006)
 
     `env_extra` is merged on top of the inherited environment after
-    ``PYTHONUNBUFFERED``. Used to inject per-route LLM backend env
-    (``DGX_ENDPOINT`` / ``DGX_MODEL``) without leaking it into routes that
-    must stay on the default Anthropic path.
-
-    `on_complete`, if provided, fires once with the returncode after
-    ``proc.wait()`` returns but before the SSE ``done`` event is sent.
-    Exceptions are swallowed so a faulty hook can never break the stream.
-    Used by the editor routes to append a row to ``activity.jsonl``.
-
-    On exit, writes a per-run log file under `<cwd>/logs/` capturing the
-    command line, returncode, duration, and full output so failed runs can
-    be reproduced after the browser session is closed.
+    ``PYTHONUNBUFFERED``. Secrets (API keys) come only from the inherited server
+    env, never ``cmd`` or ``env_extra`` (both echoed), so cmd_display is secret-safe.
+
+    `on_complete`, if provided, fires once with the returncode (or None on
+    abort) from the finally block — always fires on every exit path so that
+    callers (e.g. ensemble.py's _RUNNING lock release) are never orphaned.
+
+    On exit (normal, explicit-abort, or disconnect), writes a per-run log
+    under ``<cwd>/logs/`` capturing the command, full output, result, and
+    duration — survives browser close (FR-007, SC-006).
+
+    Termination (US4, R1):
+    Subprocess is launched in its own session (``start_new_session=True``) so
+    the whole worker tree is signalable as a group. When the client disconnects
+    (or calls es.close()), Starlette cancels the response task via anyio's
+    cancel scope, which propagates as CancelledError into this generator. The
+    finally block sends SIGTERM to the process group and schedules SIGKILL via
+    loop.call_later instead of awaiting: inside the cancelled anyio scope, any
+    await would re-raise CancelledError immediately.
     """
     env = {**os.environ, "PYTHONUNBUFFERED": "1"}
     if env_extra:
@@ -80,44 +126,83 @@ async def stream_subprocess(
     env_prefix = " \\\n  ".join(f"{k}={v}" for k, v in (env_extra or {}).items())
     cmd_parts = ([env_prefix] if env_prefix else []) + list(cmd)
     cmd_display = " \\\n  ".join(cmd_parts)
-    yield f"data: {json.dumps(f'$ {cmd_display}\\n\\n')}\n\n"
 
-    proc = await asyncio.create_subprocess_exec(
-        *cmd,
-        stdout=asyncio.subprocess.PIPE,
-        stderr=asyncio.subprocess.STDOUT,
-        cwd=cwd,
-        env=env,
-    )
-
-    assert proc.stdout is not None
+    # proc is initialised here so the finally can reference it even if aclose()
+    # is called before the subprocess starts (e.g. during the command yields).
+    proc: asyncio.subprocess.Process | None = None
     buf = ""
     captured: list[str] = []
     started = time.monotonic()
-    while True:
-        chunk = await proc.stdout.read(64)
-        if not chunk:
-            break
-        buf += chunk.decode("utf-8", errors="replace")
-        if len(buf) >= 20 or "\n" in buf:
+
+    try:
+        # US1: distinct named event for copyable command (FR-001/002/003) — FIRST
+        # These yields are inside the try so that aclose() before subprocess start
+        # still triggers the finally (on_complete / log write).
+        yield f"event: command\ndata: {json.dumps(cmd_display)}\n\n"
+        # Back-compat inline chunk (clients ignoring the command event still see it)
+        yield "data: " + json.dumps(f"$ {cmd_display}\n\n") + "\n\n"
+
+        proc = await asyncio.create_subprocess_exec(
+            *cmd,
+            stdout=asyncio.subprocess.PIPE,
+            stderr=asyncio.subprocess.STDOUT,
+            cwd=cwd,
+            env=env,
+            start_new_session=True,  # own process group → killpg kills child workers (R1)
+        )
+
+        assert proc.stdout is not None
+        while True:
+            chunk = await proc.stdout.read(64)
+            if not chunk:
+                break
+            buf += chunk.decode("utf-8", errors="replace")
+            if len(buf) >= 20 or "\n" in buf:
+                captured.append(buf)
+                yield f"data: {json.dumps(buf)}\n\n"
+                buf = ""
+
+        if buf:
             captured.append(buf)
             yield f"data: {json.dumps(buf)}\n\n"
-            buf = ""
-
-    if buf:
-        captured.append(buf)
-        yield f"data: {json.dumps(buf)}\n\n"
-
-    await proc.wait()
-    _save_run_log(cmd, cwd, "".join(captured), proc.returncode,
-                  time.monotonic() - started)
-    if on_complete is not None:
-        try:
-            on_complete(proc.returncode)
-        except Exception:
-            # Activity recording is opportunistic — never break the stream.
-            pass
-    yield f"event: done\ndata: {json.dumps({'returncode': proc.returncode})}\n\n"
+
+        await proc.wait()
+
+    finally:
+        # Fires on: normal exit, explicit abort (es.close()), browser disconnect.
+        # Guard on proc/returncode to avoid signaling an already-exited process
+        # or one that was never started (aclose before proc was created).
+        if proc is not None and proc.returncode is None:
+            try:
+                pgid = os.getpgid(proc.pid)
+                _killpg_safe(pgid, signal.SIGTERM)
+                # SIGKILL via call_later — do NOT await here. Starlette delivers
+                # disconnect as anyio cancel-scope cancellation, which makes any
+                # await in this finally re-raise CancelledError immediately.
+                # call_later schedules from the event loop after finally exits,
+                # guaranteeing bounded stop within GRACE_SECONDS (FR-008).
+                loop = asyncio.get_running_loop()
+                loop.call_later(GRACE_SECONDS, _killpg_safe, pgid, signal.SIGKILL)
+            except ProcessLookupError:
+                pass  # already exited between the returncode check and getpgid
+
+        returncode = proc.returncode if proc is not None else None
+        result = classify_result(returncode)
+        _save_run_log(cmd, cwd, "".join(captured), returncode, result,
+                      time.monotonic() - started)
+        if on_complete is not None:
+            try:
+                on_complete(returncode)
+            except Exception:
+                pass  # activity recording is opportunistic — never break the stream
+
+    # Only reached on normal completion (abort/disconnect exits via exception propagation)
+    if proc is not None:
+        result = classify_result(proc.returncode)
+        done_payload: dict[str, object] = {"returncode": proc.returncode}
+        if result == "aborted":
+            done_payload["aborted"] = True
+        yield f"event: done\ndata: {json.dumps(done_payload)}\n\n"
 
 
 async def sse_error_stream(message: str, returncode: int = 1) -> AsyncGenerator[str, None]:
diff --git a/specs/002-ensemble-run-observability/checklists/requirements.md b/specs/002-ensemble-run-observability/checklists/requirements.md
new file mode 100644
index 0000000..58ebf74
--- /dev/null
+++ b/specs/002-ensemble-run-observability/checklists/requirements.md
@@ -0,0 +1,36 @@
+# Specification Quality Checklist: Ensemble Run Observability
+
+**Purpose**: Validate specification completeness and quality before proceeding to planning
+**Created**: 2026-06-28
+**Feature**: [spec.md](../spec.md)
+
+## Content Quality
+
+- [x] No implementation details (languages, frameworks, APIs)
+- [x] Focused on user value and business needs
+- [x] Written for non-technical stakeholders
+- [x] All mandatory sections completed
+
+## Requirement Completeness
+
+- [x] No [NEEDS CLARIFICATION] markers remain
+- [x] Requirements are testable and unambiguous
+- [x] Success criteria are measurable
+- [x] Success criteria are technology-agnostic (no implementation details)
+- [x] All acceptance scenarios are defined
+- [x] Edge cases are identified
+- [x] Scope is clearly bounded
+- [x] Dependencies and assumptions identified
+
+## Feature Readiness
+
+- [x] All functional requirements have clear acceptance criteria
+- [x] User scenarios cover primary flows
+- [x] Feature meets measurable outcomes defined in Success Criteria
+- [x] No implementation details leak into specification
+
+## Notes
+
+- Items marked incomplete require spec updates before `/speckit-clarify` or `/speckit-plan`
+- All items pass. Four observability needs from the request map 1:1 to user stories US1–US4.
+- The single scope ambiguity in the request — "an extraction" vs. all command-running ensemble stages — is resolved by an informed guess (capability applies to every command-running stage) and documented in Assumptions, so no [NEEDS CLARIFICATION] marker was needed. Reconsider during `/speckit-clarify` if the operator wants extraction-only.
diff --git a/specs/002-ensemble-run-observability/contracts/run-stream.md b/specs/002-ensemble-run-observability/contracts/run-stream.md
new file mode 100644
index 0000000..c6b3fa0
--- /dev/null
+++ b/specs/002-ensemble-run-observability/contracts/run-stream.md
@@ -0,0 +1,64 @@
+# Contract: Ensemble Run Stream + Abort
+
+Applies to all streaming run endpoints under `GET /api/ensemble/run/*`
+(`extract`, `bundle`, `synthesize`, `threads`, `recent-events`, `bundle-list`).
+Transport: Server-Sent Events (`text/event-stream`) over an `EventSource`.
+
+## Request
+
+Unchanged from today: `GET /api/ensemble/run/<stage>?<params>`. Params carry the
+explicit input selection and backend/model (Principle X / FR-012). No request body.
+
+## Response: SSE event stream
+
+Events are emitted in this order. **Bold** = new or changed by this feature.
+
+| # | Event | `data` payload | Meaning |
+|---|---|---|---|
+| 1 | **`command`** | JSON string: the secret-free, copyable invocation (env prefix + `python … --flags`) | Emitted once, first. The reproducible command (US1, FR-001/002/003). |
+| 2 | `data` (default) | JSON string: an output chunk | Live stdout/stderr as produced (US2, FR-004). May repeat many times. A precondition failure emits a single readable `data` line here (FR-011). |
+| 3 | `done` | JSON `{ "returncode": N, "error"?: "...", **"aborted"?: true** }` | Terminal. `returncode==0` → success; `>0` → failure; **`aborted:true` or `returncode<0` → aborted** (FR-006/009, SC-004). |
+
+Notes:
+- The legacy inline `$ <cmd>` first **`data`** chunk MAY be retained for backward
+  compatibility, but the authoritative copyable command is the **`command`** event.
+- `done.error` carries the human-readable reason for a precondition refusal
+  (e.g. "No chapters selected …"), surfaced to the operator verbatim (FR-011).
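+
+The Python sketch below shows how a client might consume this stream. It splits a raw
+`text/event-stream` body into events and applies the `done` rules from the table.
+`parse_sse` and `classify_done` are hypothetical helpers, not part of the codebase;
+the real client is the browser `EventSource`. The sketch assumes every `data` payload
+is JSON, as the runner emits it, so no payload contains a raw blank line.
+
+```python
+import json
+from dataclasses import dataclass
+
+
+@dataclass
+class SseEvent:
+    event: str    # "message" when the frame has no `event:` field (SSE default)
+    data: object  # JSON-decoded `data` payload
+
+
+def parse_sse(body: str) -> list[SseEvent]:
+    """Split an SSE body into events; each frame ends with a blank line."""
+    events: list[SseEvent] = []
+    for frame in body.split("\n\n"):
+        name, data_lines = "message", []
+        for line in frame.split("\n"):
+            field, _, value = line.partition(":")
+            if value.startswith(" "):
+                value = value[1:]  # the SSE format strips one leading space
+            if field == "event":
+                name = value
+            elif field == "data":
+                data_lines.append(value)
+        if data_lines:
+            events.append(SseEvent(name, json.loads("\n".join(data_lines))))
+    return events
+
+
+def classify_done(payload: dict) -> str:
+    """Map a `done` payload to an outcome: aborted first, then 0 vs. non-zero."""
+    rc = payload.get("returncode")
+    if payload.get("aborted") or (rc is not None and rc < 0):
+        return "aborted"
+    return "succeeded" if rc == 0 else "failed"
+```
+
+Under these rules, a precondition refusal (`returncode` 1 plus an `error` field)
+classifies as `failed`.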
+
+## Abort (FR-008) and disconnect (FR-013)
+
+There is **no separate abort endpoint**. Abort is performed by the client
+**closing the stream connection**:
+
+1. Frontend `abort()` calls `EventSource.close()` and sets local `status = aborted`.
+2. The server observes the dropped connection as a cancellation of the streaming
+   generator.
+3. In the generator's `finally`, the server terminates the run's **process group**:
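+   `SIGTERM` now, then `SIGKILL` after the grace window (~3–5 s; `GRACE_SECONDS = 4.0` in the runner) if anything is still alive (FR-008). The `SIGKILL` is scheduled with `loop.call_later` rather than awaited.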
+4. The same `finally` releases the per-stage `_RUNNING` lock and writes the durable
+   run record with `result: aborted` (FR-007, FR-009).
+
+A lost tab / navigation / network drop is identical to step 2 onward — it is an
+**implicit abort** (FR-013). The operator never has an unobserved run still burning
+tokens.
+
+### Termination guarantees
+
+- **Process-group kill**: child workers (e.g. `ensemble_batch.py`'s per-chapter
+  `ensemble_extract` subprocesses) are launched in the run's session/process group
+  and die with it. No orphaned token-spending workers.
+- **Bounded stop**: force-kill after the grace window guarantees exit within a few
+  seconds (SC-005).
+- **Cache integrity**: any in-flight cache unit is published atomically
+  (temp + `os.replace`), so an abort/force-kill never leaves a partial file the
+  resume check trusts (FR-014). Completed units survive; the interrupted unit is
+  recomputed on re-run (FR-010).
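+
+A synchronous, POSIX-only Python sketch can illustrate the process-group kill and the
+bounded stop. It is not the runner's code: `stream_subprocess` is async and schedules
+the `SIGKILL` with `loop.call_later` instead of blocking. `terminate_group` and `spawn`
+are hypothetical names. Like the runner, the sketch sends `SIGKILL` to the whole group,
+so a worker that ignored `SIGTERM` cannot outlive its leader.
+
+```python
+import os
+import signal
+import subprocess
+import sys
+
+
+def _signal_group(pgid: int, sig: int) -> None:
+    try:
+        os.killpg(pgid, sig)
+    except (ProcessLookupError, PermissionError):
+        pass  # the group has already exited
+
+
+def terminate_group(proc: subprocess.Popen, grace: float) -> int:
+    """SIGTERM the child's process group, then SIGKILL it after at most `grace` seconds."""
+    pgid = proc.pid  # start_new_session=True makes the child its own group leader
+    _signal_group(pgid, signal.SIGTERM)
+    try:
+        proc.wait(timeout=grace)  # returns early if the leader exits gracefully
+    except subprocess.TimeoutExpired:
+        pass
+    _signal_group(pgid, signal.SIGKILL)  # no-op once every group member is gone
+    return proc.wait()
+
+
+def spawn(code: str) -> subprocess.Popen:
+    """Start a Python child in its own session; return once it prints a ready line."""
+    proc = subprocess.Popen(
+        [sys.executable, "-c", code],
+        stdout=subprocess.PIPE,
+        text=True,
+        start_new_session=True,
+    )
+    assert proc.stdout is not None
+    proc.stdout.readline()  # by now the child has installed its signal handlers
+    return proc
+```
+
+A child that honours `SIGTERM` exits with returncode `-15` well inside the grace
+window. A child that installs `signal.signal(signal.SIGTERM, signal.SIG_IGN)` is
+force-killed and reports `-9`. `classify_result` maps both to `aborted`.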
+
+## Backward compatibility
+
+- Non-ensemble run routes and the `/grounding` path keep their request/response
+  shapes. The termination and `command`-event changes live in the **shared**
+  `subprocess_runner`, so those routes also gain disconnect-driven cleanup and the
+  extra `command` event; nothing else about their streams changes.
+- A client that ignores the `command` event still receives identical `data`/`done`
+  events.
diff --git a/specs/002-ensemble-run-observability/data-model.md b/specs/002-ensemble-run-observability/data-model.md
new file mode 100644
index 0000000..24e636b
--- /dev/null
+++ b/specs/002-ensemble-run-observability/data-model.md
@@ -0,0 +1,58 @@
+# Phase 1 Data Model: Ensemble Run Observability
+
+This feature adds no database and no persistent schema beyond a file. The "data" is (a) the **Run record** persisted to disk and (b) the **SSE stream protocol** the run emits. Both are described here.
+
+## Entity: Run record (persisted)
+
+One file per run, written under `<campaign>/logs/<YYYY-MM-DD_HHMMSS>_<script>.md` by `subprocess_runner._save_run_log`. It is the durable, CLI/Claude-visible truth of "what happened" (Principle VIII).
+
+| Field | Type | Source | Notes |
+|---|---|---|---|
+| `command` | string (multiline) | the launched `cmd` + non-secret env prefix | Secret-free (R4). The reproducible invocation. |
+| `result` | enum: `succeeded` \| `failed` \| `aborted` | derived from returncode | `0` → succeeded; positive non-zero → failed; negative (signal) or `null` (process not reaped when the run was torn down) → aborted (R5). |
+| `returncode` | int \| null | `proc.returncode` | Raw exit code (incl. negative signal codes). |
+| `duration` | float (seconds) | `time.monotonic()` delta | Wall time from launch to exit/abort. |
+| `cwd` | string (path) | the `cwd` argument, else `os.getcwd()` | The campaign workspace the run targeted. |
+| `output` | string (multiline) | full captured stdout/stderr | Not truncated (FR-007, R6). |
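+| `timestamp` | datetime | filename + body header | Taken from `datetime.now()` when `_save_run_log` writes the record, i.e. when the run reached its terminal state, not when it launched. |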
+
+**Lifecycle**: created exactly once, when the run reaches a terminal state (natural exit, explicit abort, or disconnect-abort). There is no update-in-place; a re-run writes a new timestamped file.
+
+**Validation / invariants**:
+- MUST NOT contain any API key or secret value, in `command` or `output` (SC-002). This holds only while secrets reach the subprocess solely through the inherited environment (never `cmd` or `env_extra`, both of which are echoed) and no script prints them.
+- MUST be written on *every* terminal path, including abort (enforced via the `finally` in `stream_subprocess`).
+- `result` MUST distinguish all three outcomes (SC-004).
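+
+Because the record is a plain file, the CLI or Claude can recover the last outcome
+without the browser. The Python sketch below is minimal and rests on two assumptions:
+the `- result:` bullet that `_save_run_log` writes, and the timestamp-prefixed
+filenames described above, which sort chronologically. `latest_run_result` is a
+hypothetical helper, not an existing API.
+
+```python
+import re
+from pathlib import Path
+
+# The "- result: `...`" bullet written by _save_run_log.
+_RESULT_RE = re.compile(r"^- result: `(succeeded|failed|aborted)`$", re.MULTILINE)
+
+
+def latest_run_result(logs_dir: Path, script: str) -> tuple[Path, str] | None:
+    """Return (record path, result) for the newest run of `script`, or None."""
+    # The YYYY-MM-DD_HHMMSS filename prefix makes lexicographic order chronological.
+    records = sorted(logs_dir.glob(f"*_{script}.md"))
+    for path in reversed(records):
+        match = _RESULT_RE.search(path.read_text(encoding="utf-8"))
+        if match:
+            return path, match.group(1)
+    return None  # no records, or only records written before `result` was added
+```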
+
+## Entity: Cache unit (existing, hardened)
+
+Not new, but its write contract is tightened by FR-014. A *cache unit* is any artifact whose mere existence the resume/skip logic trusts to mean "this work is done":
+
+| Cache unit | Trusted-by | Write site (to make atomic) |
+|---|---|---|
+| `docs/ensemble/per_chapter/<stem>/merged.json` | `ensemble_batch.py` `merged.exists()` skip check | the extract/merge worker that produces `workdir/merged.json` |
+| `docs/ensemble/state_dossiers/<type>_<slug>.md` | `facts_to_state.py` `dossier_path(...).exists()` | `facts_to_state.py:write_dossier` |
+
+**Invariant (new, FR-014)**: a cache unit MUST be published to its trusted path **atomically** (temp file in the same directory, then `os.replace`). At no instant may a partially-written file exist at the trusted path. An interrupted unit leaves at most a discardable temp file and is recomputed on re-run.
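+
+The temp-then-rename idiom can be sketched in Python, assuming a shared helper along
+the lines of the optional `campaignlib` one the plan mentions. The names
+`atomic_write_text` and `atomic_write_json` are hypothetical here. A `SIGKILL` cannot
+run the cleanup branch. That is why the invariant allows a leftover temp file: its
+`.tmp` suffix never matches a trusted path.
+
+```python
+import json
+import os
+import tempfile
+from pathlib import Path
+
+
+def atomic_write_text(path: Path, text: str, encoding: str = "utf-8") -> None:
+    """Publish `text` at `path`: readers see the old file or the new one, never a prefix."""
+    path.parent.mkdir(parents=True, exist_ok=True)
+    # Same directory as the target, because os.replace is only atomic within one filesystem.
+    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
+    try:
+        with os.fdopen(fd, "w", encoding=encoding) as handle:
+            handle.write(text)
+            handle.flush()
+            os.fsync(handle.fileno())  # bytes reach disk before the rename exposes them
+        os.replace(tmp_name, path)  # atomic swap onto the trusted path
+    except BaseException:
+        # Interrupted by an exception or KeyboardInterrupt: drop the temp file and
+        # leave `path` exactly as it was.
+        try:
+            os.unlink(tmp_name)
+        except FileNotFoundError:
+            pass
+        raise
+
+
+def atomic_write_json(path: Path, obj: object) -> None:
+    atomic_write_text(path, json.dumps(obj, indent=2, ensure_ascii=False) + "\n")
+```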
+
+## Entity: Run stream (transient, SSE)
+
+The live protocol between a `/api/ensemble/run/*` endpoint and the browser. See `contracts/run-stream.md` for the wire format. Logical states observed by the frontend `useEnsembleRun`:
+
+```
+idle ──run()──▶ running ──┬── done(rc=0) ────────▶ done
+                          ├── done(rc>0) ────────▶ error
+                          ├── done(aborted|rc<0) ▶ aborted
+                          └── abort()/disconnect ▶ aborted   (client closes stream;
+                                                              server group-kills in finally)
+```
+
+| Field (frontend state) | Type | Notes |
+|---|---|---|
+| `command` | string | populated from the `command` SSE event (R4); copyable. |
+| `output` | string | accumulated `data` chunks. |
+| `status` | `idle`\|`running`\|`done`\|`error`\|`aborted` | adds `aborted` to today's set. |
+| `returnCode` | int \| null | from the `done` event when present. |
+
+**Transitions of note**:
+- `abort()` is only valid from `running`; it closes the `EventSource` and sets `status=aborted` without waiting for a `done` event (the server may already be gone).
+- A precondition failure (empty selection, etc.) arrives as a `data` line + a non-zero `done` with an `error` field (existing `sse_error_stream`), landing in `error` with a readable reason (FR-011) — never silent.
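+
+The transitions above fit in a few lines of code. The real frontend state is
+TypeScript (`useEnsembleRun`); this Python sketch only mirrors its rules for
+reference, and the function names are hypothetical. `on_connection_error` follows the
+frontend's `onError` handler: a drop while running is an implicit abort, and an error
+before the run starts (e.g. the initial connection fails) lands in `error`.
+
+```python
+from enum import Enum
+
+
+class Status(str, Enum):
+    IDLE = "idle"
+    RUNNING = "running"
+    DONE = "done"
+    ERROR = "error"
+    ABORTED = "aborted"
+
+
+def on_done(status: Status, payload: dict) -> Status:
+    """Terminal `done` event; ignored unless a run is in flight."""
+    if status is not Status.RUNNING:
+        return status
+    rc = payload.get("returncode")
+    if payload.get("aborted") or (rc is not None and rc < 0):
+        return Status.ABORTED
+    return Status.DONE if rc == 0 else Status.ERROR
+
+
+def on_abort(status: Status) -> Status:
+    """abort(): valid only from running, and does not wait for a `done` event."""
+    return Status.ABORTED if status is Status.RUNNING else status
+
+
+def on_connection_error(status: Status) -> Status:
+    """Stream error: implicit abort mid-run, plain error otherwise."""
+    return Status.ABORTED if status is Status.RUNNING else Status.ERROR
+```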
diff --git a/specs/002-ensemble-run-observability/plan.md b/specs/002-ensemble-run-observability/plan.md
new file mode 100644
index 0000000..e0107fa
--- /dev/null
+++ b/specs/002-ensemble-run-observability/plan.md
@@ -0,0 +1,104 @@
+# Implementation Plan: Ensemble Run Observability
+
+**Branch**: `002-ensemble-run-observability` | **Date**: 2026-06-28 | **Spec**: [spec.md](./spec.md)
+
+**Input**: Feature specification from `/specs/002-ensemble-run-observability/spec.md`
+
+## Summary
+
+Make an ensemble-stage run a first-class, observable, controllable thing in the `/ensemble` UI. The operator must (1) see the **exact, copyable, reproducible command** that ran (secrets omitted), (2) **watch its output stream live**, (3) get an **unambiguous finished/failed/aborted result** plus a durable on-disk record, and (4) **abort** a run — where abort is graceful-then-force, kills the whole worker process group, and a lost connection counts as an implicit abort.
+
+Technical approach: the streaming/observability plumbing already exists in `server/subprocess_runner.py` and the `/api/ensemble/run/*` routes (command is echoed, stdout streamed, a per-run log written). The gaps are (a) **abort/disconnect termination** — `stream_subprocess` never watches for client disconnect and never terminates the child, and child *worker* processes (e.g. `ensemble_batch.py`'s `ThreadPoolExecutor` → `ensemble_extract.py`) are not in a killable group; (b) **a distinct copyable command surface** rather than an inline `$ …` line; (c) an explicit **aborted** state distinct from failure in the UI and the persisted record; and (d) **atomic per-unit cache writes** so a force-kill can never leave a truncated file the resume check trusts. All engine-side correctness work (process-group kill, atomic writes) lands in the CLI/seam layer; the router stays logic-free.
+
+## Technical Context
+
+**Language/Version**: Python 3.11+ (FastAPI backend, CLI engine); TypeScript / Vue 3 (frontend)
+
+**Primary Dependencies**: FastAPI + Starlette `StreamingResponse` (SSE), `asyncio` subprocess, `os` process-group signals; Vue 3 + Pinia + Vue Router; browser `EventSource`. No new third-party dependency.
+
+**Storage**: Files on disk only — per-run logs under `<campaign>/logs/`, cache artifacts under `docs/ensemble/` (`per_chapter/*/merged.json`, `state_dossiers/*.md`). No database.
+
+**Testing**: `pytest` (`tests/`, alongside existing `test_ensemble_*.py`); new `tests/test_subprocess_abort.py` for termination/atomicity. Frontend: manual via `quickstart.md` (no FE test harness in repo).
+
+**Target Platform**: Linux / WSL2, single local operator, no auth. Server is `uvicorn` behind the `startup` script.
+
+**Project Type**: Web (FastAPI backend + Vue frontend) over a CLI engine — the established CG shape (Constitution Principle VI).
+
+**Performance Goals**: Abort bounded by the SIGTERM grace window plus force-kill (~3–5 s, SC-005). Live output visible within a few seconds of being produced (SC-003). No new long-lived process or daemon.
+
+**Constraints**: Must not change what each stage *computes* (spec Assumption). Must not break the existing `/grounding` (Anthropic) path or non-ensemble run routes — their request/response shapes are unchanged; they additionally gain disconnect-driven cleanup for free via the shared seam (I2 / Shared-seam blast radius). No secrets in the command, live output, or persisted record (SC-002). One run at a time per stage (existing `_RUNNING` lock).
+
+**Scale/Scope**: One operator, one campaign at a time; a handful of stages; one in-flight run per stage. The ensemble run endpoints already exist (`contracts/run-stream.md` lists six); this feature touches the shared runner and the frontend run composable, so it covers all of them uniformly (spec "Scope across stages").
+
+## Constitution Check
+
+*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
+
+| Principle | Verdict | Notes |
+|---|---|---|
+| I. Disk is Truth, Model is Draft | **PASS / reinforces** | The durable run record and atomic cache writes make disk the trustworthy truth; the index (resume check) can never be corrupted by a partial write. |
+| II. Human Checkpoint | **PASS (N/A)** | No new LLM call; no precision decision is automated. Abort *adds* operator control. |
+| III. Retrieval/Render Separated | **PASS** | No `retrieve`/`stream_api`/`call_api` added to any router or runner function. `tests/test_retrieve_render_isolation.py` stays green. |
+| IV. Verbatim is Sacred | **PASS (N/A)** | No transcript/quote handling touched. |
+| V. One Seam per Boundary | **PASS** | Process control lives in the one subprocess seam (`subprocess_runner.py`); LLM backend selection still flows through the existing `_llm_env` → `campaignlib` path. No new `import anthropic`. |
+| VI. CLI is Engine, UI is a Face | **PASS / reinforces** | Engine correctness (process-group kill, atomic per-unit writes) lands in the CLI scripts + the shared runner, **not** reimplemented in the router or browser. The copyable command is literally the CLI invocation. |
+| VII. Extract Once, Synthesize Deliberately | **PASS (N/A)** | Pass structure unchanged. |
+| VIII. State is Discoverable | **PASS / reinforces** | A run's command, output, and outcome (incl. aborted) become a discoverable file under `logs/`, not browser-only state. This is the principle's exact intent. |
+| IX. UI Mechanizes; Claude Converses | **PASS / reinforces** | The reproducible, copyable command is the escape hatch that lets the operator drop to the CLI and lose nothing — directly the anti-"walled garden" guarantee. |
+| X. Selection is Explicit | **PASS** | The displayed command/record reflects the explicitly selected inputs (FR-012); the existing empty-selection refusal in `run_extract` is preserved. |
+
+**Gate result: PASS.** No violations; several principles are actively reinforced. No entries in Complexity Tracking.
+
+## Project Structure
+
+### Documentation (this feature)
+
+```text
+specs/002-ensemble-run-observability/
+├── plan.md              # This file
+├── research.md          # Phase 0 — termination model, transport, atomicity, secret-safety
+├── data-model.md        # Phase 1 — Run record + SSE event protocol entities
+├── quickstart.md        # Phase 1 — runnable validation scenarios (US1–US4)
+├── contracts/
+│   └── run-stream.md     # SSE stream + abort contract for /api/ensemble/run/*
+└── tasks.md             # Phase 2 (/speckit-tasks — NOT created here)
+```
+
+### Source Code (repository root)
+
+```text
+server/
+├── subprocess_runner.py        # CHANGED: disconnect-aware termination; process-group
+│                               #          launch; graceful→force kill; command event;
+│                               #          aborted outcome in log + done event
+└── routers/
+    └── ensemble.py             # CHANGED: thread the run id / command event through
+                                #          _run_locked; release lock on abort path
+
+ensemble_batch.py               # CHANGED (FR-014): atomic per-chapter merged.json write
+                                #          in the extract/merge worker path
+facts_to_state.py               # CHANGED (FR-014): atomic write_dossier (temp + os.replace)
+campaignlib.py                  # MAYBE: a shared atomic_write_text/atomic_write_json helper
+                                #          (single home for the temp-then-rename idiom)
+
+frontend/src/
+├── api/sse.ts                  # CHANGED: surface a `command` event; expose abort (close)
+├── views/ensemble/
+│   ├── useEnsembleRun.ts        # CHANGED: track command + 'aborted' status; abort()
+│   ├── EnsembleExtract.vue      # CHANGED: copyable command box + Abort button
+│   ├── EnsembleBundle.vue       # CHANGED: same shared run controls
+│   └── EnsembleSynthesize.vue   # CHANGED: same shared run controls
+└── components/shared/
+    ├── StreamOutput.vue         # reused as-is (scrollable long output, Edge Case)
+    └── RunCommandBar.vue        # NEW (optional): command + status + abort, shared by stages
+
+tests/
+└── test_subprocess_abort.py    # NEW: group-kill, grace→force timing, atomic-write,
+                                #      lock-release-on-abort, no-secret-in-record
+```
+
+**Structure Decision**: Web-over-CLI (the existing CG layout). Correctness changes are concentrated in the **one subprocess seam** (`subprocess_runner.py`) and the **CLI engine** (`ensemble_batch.py`, `facts_to_state.py`, optional `campaignlib.py` helper); the router and frontend only carry the new command/abort/aborted signals through. No new top-level modules or services.
+
+## Complexity Tracking
+
+> No constitution violations — section intentionally empty.
diff --git a/specs/002-ensemble-run-observability/quickstart.md b/specs/002-ensemble-run-observability/quickstart.md
new file mode 100644
index 0000000..d0a6dc7
--- /dev/null
+++ b/specs/002-ensemble-run-observability/quickstart.md
@@ -0,0 +1,83 @@
+# Quickstart / Validation: Ensemble Run Observability
+
+Runnable scenarios that prove the feature works end-to-end. Each maps to a user
+story and its success criteria. Implementation details live in `tasks.md`; this is
+the validation guide.
+
+## Prerequisites
+
+- A campaign workspace with chapter files (e.g. `docs/chapters/chapter_*.md`).
+- The web UI running: from the campaign workspace, `./startup` (builds the frontend
+  and starts the FastAPI server), then open the `/ensemble` page.
+- For a metered backend test (optional): `OPENROUTER_API_KEY` set in the **server**
+  environment (never passed through the UI).
+- Contracts: see [`contracts/run-stream.md`](./contracts/run-stream.md). Data model:
+  see [`data-model.md`](./data-model.md).
+
+## Scenario A — See and reuse the exact command (US1 / FR-001–003, SC-001/002)
+
+1. On the `/ensemble` page, select one or more chapters and click **Run extraction**.
+2. **Expect**: a dedicated, copyable **command box** appears showing the full
+   invocation (env prefix + `python …/ensemble_batch.py --chapters … --model …`).
+3. Copy it, open a terminal in the campaign workspace, paste, and run.
+4. **Expect**: it runs the same operation with no hand-editing (SC-001).
+5. Inspect the command box and the live output for any API key.
+   **Expect**: none present (SC-002). With an OpenRouter backend, the command shows
+   `CG_BACKEND=openrouter OPENROUTER_MODEL=…` but **no** `OPENROUTER_API_KEY`.
+
+## Scenario B — Watch progress live (US2 / FR-004/005, SC-003)
+
+1. Run the extraction over several chapters.
+2. **Expect**: output lines appear **while the run is still going** (chapter-by-chapter
+   `[extract+merge] <stem>` / `[skip] <stem>` lines), not only at the end (SC-003).
+3. **Expect**: the page clearly shows a "running" state (button shows `Running…`).
+
+## Scenario C — Know it finished + durable record (US3 / FR-006/007/011, SC-004/006)
+
+1. Let an extraction run to completion.
+2. **Expect**: an unambiguous **success** state (e.g. "Done", exit 0) plus the final
+   output (SC-004).
+3. Force a failure (e.g. point a stage at a missing input) and run.
+   **Expect**: an unambiguous **failure** state distinct from success, with the error
+   output shown.
+4. Run a stage with **no chapters selected**.
+   **Expect**: a readable refusal ("No chapters selected …"), not a blank/generic
+   error (FR-011).
+5. Look under `<campaign>/logs/` for the newest `*_ensemble_batch.md` file (or the
+   file for whichever script ran). **Expect**: it contains the command, full output,
+   returncode, and duration, recoverable after closing the browser (SC-006).
+
+## Scenario D — Abort a run (US4 / FR-008/009/010/013, SC-005)
+
+1. Start an extraction over **many** chapters (long enough to interrupt).
+2. After a chapter or two completes, click **Abort**.
+3. **Expect**: the run stops within a few seconds; the page shows an **aborted** state
+   distinct from success and failure; output captured so far stays visible (SC-005,
+   FR-009).
+4. Verify no orphaned workers: `pgrep -af ensemble_extract` (or `ps`) shows **nothing**
+   still running for this campaign (FR-013 / process-group kill).
+5. Re-run the same extraction.
+   **Expect**: already-completed chapters are **skipped** (`[skip] <stem>`), only the
+   interrupted/remaining chapters run (FR-010).
+6. Inspect the per-chapter dir of the chapter that was mid-flight when you aborted.
+   **Expect**: either a complete `merged.json` or none — never a truncated one
+   (FR-014). The newest `logs/` entry records `result: aborted` (R5).
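+
+Step 6 can be checked across every chapter at once by parse-checking each cached
+`merged.json`. This is a hypothetical helper, not part of the codebase. This guide
+does not name the per-chapter directory, so the sketch assumes you pass whichever
+root contains those directories. As R3 notes, a parse check is only a sanity check;
+the atomic write is the real guarantee.
+
+```python
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+
+def find_untrusted_merged(root: Path) -> list[Path]:
+    """Return every merged.json under `root` that exists but does not parse as JSON.
+
+    A truncated file from a non-atomic write normally fails json.loads, so an
+    empty result means each chapter's cache is either complete or absent.
+    """
+    bad: list[Path] = []
+    for path in sorted(root.rglob("merged.json")):
+        try:
+            json.loads(path.read_text(encoding="utf-8"))
+        except (json.JSONDecodeError, UnicodeDecodeError):
+            bad.append(path)
+    return bad
+```
+
+After an abort, an empty list is the expected result; any returned path is a partial
+file that the resume check would wrongly trust.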
+
+## Scenario E — Disconnect = implicit abort (FR-013)
+
+1. Start a long extraction.
+2. **Close the browser tab** (or navigate away) mid-run.
+3. From a terminal: `pgrep -af ensemble_batch` and `pgrep -af ensemble_extract`.
+   **Expect**: within a few seconds, **nothing** for this campaign keeps running — the
+   server treated the disconnect as an abort and group-killed the tree. No metered
+   run continues unobserved.
+
+## Regression checks (must stay green)
+
+- `python -m pytest tests/` — including existing `tests/test_ensemble_*.py` and
+  `tests/test_retrieve_render_isolation.py` (the router/runner add no retrieval/render
+  mixing).
+- The `/grounding` (Anthropic) page still runs and streams unchanged.
+- New: `tests/test_subprocess_abort.py` — group-kill, grace→force timing, atomic
+  cache write (no truncated file on kill), lock release on abort, no secret in the
+  persisted record.
diff --git a/specs/002-ensemble-run-observability/research.md b/specs/002-ensemble-run-observability/research.md
new file mode 100644
index 0000000..76bfe9f
--- /dev/null
+++ b/specs/002-ensemble-run-observability/research.md
@@ -0,0 +1,69 @@
+# Phase 0 Research: Ensemble Run Observability
+
+All Technical Context unknowns are resolved below. Each item: Decision / Rationale / Alternatives considered.
+
+## R1 — How does an abort (or disconnect) actually terminate the run, including child workers?
+
+**Decision**: Launch every run subprocess in its **own session/process group** (`asyncio.create_subprocess_exec(..., start_new_session=True)`). On abort, signal the **whole group**: `os.killpg(pgid, SIGTERM)`, wait up to a grace window (~3–5 s) for exit, then `os.killpg(pgid, SIGKILL)` if still alive. Detect both explicit abort and client disconnect at the *same* place: the streaming async generator in `stream_subprocess`. When the client connection drops, Starlette/uvicorn cancels the response task, raising `asyncio.CancelledError`/`GeneratorExit` inside the generator; a `try/finally` around the read loop runs the group-kill, the log write, and the lock release on every exit path.
+
+**Rationale**: `ensemble_batch.py` fans out child workers (`ThreadPoolExecutor` → `subprocess.run(ensemble_extract…)`). Killing only the top process orphans those workers, which keep running and keep spending tokens — exactly the FR-013 harm. Process-group kill is the only reliable way to stop the whole tree. Doing termination in the generator's `finally` unifies explicit abort and disconnect into one mechanism (matches the clarification: disconnect = implicit abort = same graceful-then-force path).
+
+**Alternatives considered**:
+- *Kill only the parent PID* — rejected: orphans the extract workers (token + correctness leak).
+- *Poll `await request.is_disconnected()` in a side task* — workable but redundant once the generator already receives cancellation; adds a second code path. Kept the single `finally` path.
+- *`proc.terminate()` then `proc.kill()` on the parent only* — rejected for the same orphan reason; only a group-kill reaches the workers.
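+
+The grace-then-force sequence from the Decision can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual `stream_subprocess` code. `terminate_group` is a hypothetical name. The sketch assumes the child was started with `start_new_session=True`, so its PID is also its process-group ID, and it uses 4 s as one point inside the ~3–5 s window.
+
+```python
+import asyncio
+import os
+import signal
+
+
+async def terminate_group(proc: asyncio.subprocess.Process, grace: float = 4.0) -> int:
+    """Stop `proc` and everything in its process group: SIGTERM, wait, then SIGKILL.
+
+    Assumes `proc` was created with start_new_session=True, so proc.pid is also
+    the process-group ID. Returns the leader's returncode, which is negative
+    (-signal number) when a signal ended it.
+    """
+    pgid = proc.pid
+    if proc.returncode is None:
+        try:
+            os.killpg(pgid, signal.SIGTERM)  # polite stop for the leader and all workers
+        except ProcessLookupError:
+            pass  # the whole group is already gone
+        try:
+            await asyncio.wait_for(proc.wait(), timeout=grace)
+        except asyncio.TimeoutError:
+            pass  # leader ignored SIGTERM; escalate below
+    # Force step: SIGKILL whatever is still in the group (a stubborn leader, or
+    # workers that outlived it). killpg raises ProcessLookupError if the group is empty.
+    try:
+        os.killpg(pgid, signal.SIGKILL)
+    except ProcessLookupError:
+        pass
+    return await proc.wait()
+```
+
+One design choice is worth calling out: the final `SIGKILL` goes to the group even when the leader exited within the grace window. That catches extract workers that outlived their parent, and it is a no-op when nothing is left.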
+
+## R2 — Transport for explicit abort, given `EventSource` is GET-only
+
+**Decision**: **Explicit abort = the client closes the stream connection** (`EventSource.close()` in `useEnsembleRun.abort()`), which the server observes as a disconnect and handles via the R1 `finally` termination. The UI sets its own status to `aborted` locally (it initiated the close, so it knows). No separate abort endpoint and no server-side run-id registry are required.
+
+**Rationale**: Minimal surface, one termination mechanism, and it's identical to the disconnect path the spec already mandates (FR-013) — so explicit abort and "closed the tab" are guaranteed to behave the same. Keeps the existing `EventSource` transport and the `_RUNNING` per-stage lock unchanged (the lock is released in the same `finally`).
+
+**Alternatives considered**:
+- *Separate `POST /run/abort` + in-process `{run_id: proc}` registry* — more moving parts (id generation, id propagation to the client via an early SSE event, registry lifecycle, races between abort and natural completion). Rejected as unnecessary for a single-operator local server when closing the connection already terminates the run. (Revisit only if multi-client or programmatic abort is ever needed.)
+- *Switch to `fetch` + `ReadableStream` + `AbortController`* — gives an explicit client-side abort handle, but means replacing the shared `connectSSE`/`EventSource` helper. Deferred: the close-connection approach achieves the same result without the rewrite.
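+
+A plain async generator shows why closing the stream is enough to stop the run. In this sketch, calling `aclose()` stands in for the server tearing down the response after the client disconnects. Per R1, that teardown surfaces as `GeneratorExit` or `asyncio.CancelledError` inside the generator. `stream_lines` and the `cleanup` list are hypothetical stand-ins for the real group-kill, log write, and lock release. The outcome is assigned before the `try`, so the `finally` can always read it.
+
+```python
+from __future__ import annotations
+
+import asyncio
+from typing import AsyncIterator
+
+
+async def stream_lines(source: AsyncIterator[str], cleanup: list[str]) -> AsyncIterator[str]:
+    """Relay output lines; record how the stream ended on every exit path."""
+    outcome = "finished"  # hoisted so the finally block can always read it
+    try:
+        async for line in source:
+            yield line
+    except (GeneratorExit, asyncio.CancelledError):
+        outcome = "aborted"  # client went away (explicit Abort or closed tab)
+        raise
+    finally:
+        # Real code would group-kill the child, write the run log, release the lock.
+        cleanup.append(outcome)
+```
+
+Because explicit Abort and a closed tab both arrive as the same generator shutdown, there is exactly one termination path to test.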
+
+## R3 — Are per-unit cache writes atomic today? (FR-014)
+
+**Decision**: No — make them atomic with a **temp-file-then-`os.replace`** idiom at every *cache-trust* write site, via one shared helper (`campaignlib.atomic_write_text` / `atomic_write_json`). Concrete sites:
+- The per-chapter `merged.json` written by the extract/merge worker (`ensemble_batch.py` treats the chapter as done if `merged.exists()` at line ~162, and skips it).
+- `facts_to_state.py:write_dossier` (line ~361), trusted by `dossier_path(...).exists()` at the resume check (line ~493).
+
+`os.replace` is atomic on the same filesystem (POSIX rename), so a force-kill can leave at most a discardable temp file, never a half-written file at the trusted path. Write temp in the **same directory** as the destination to guarantee same-filesystem rename.
+
+**Rationale**: The resume/skip logic trusts *existence* of the destination file, not its integrity. A non-atomic `write_text` interrupted by SIGKILL yields a truncated-but-present file that the next run treats as complete → silent corruption downstream (a Principle I/IV precision failure). Atomic publish makes the integrity guarantee structural rather than timing-dependent (the clarified Q3 = Option A).
+
+**Alternatives considered**:
+- *Validate each cached file on resume (parse-check)* — rejected as the primary fix: it's per-format, easy to under-implement, and still races. Atomic publish is simpler and format-agnostic. (A cheap `json.loads` sanity check may be added opportunistically but is not the guarantee.)
+- *Write a `.done` sentinel beside each output* — extra files, extra bookkeeping; `os.replace` achieves the same with the real artifact.
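+
+Here is a minimal sketch of the shared helper. The names `atomic_write_text` / `atomic_write_json` come from the Decision above, but this body is an assumption, not the committed `campaignlib` code. It creates the temp file with `tempfile.mkstemp` in the destination's own directory, then publishes it with `os.replace`.
+
+```python
+from __future__ import annotations
+
+import json
+import os
+import tempfile
+from pathlib import Path
+from typing import Any
+
+
+def atomic_write_text(path: str | os.PathLike[str], text: str, encoding: str = "utf-8") -> None:
+    """Write `text` to `path` so readers see the old file or the new one, never a partial one.
+
+    FR-014: a kill mid-write can leave only a discardable temp file beside the
+    destination, never a truncated file at the trusted path.
+    """
+    dest = Path(path)
+    # Same directory as the destination => same filesystem => os.replace is an atomic rename.
+    fd, tmp = tempfile.mkstemp(dir=dest.parent, prefix=f".{dest.name}.", suffix=".tmp")
+    try:
+        with os.fdopen(fd, "w", encoding=encoding) as fh:
+            fh.write(text)
+            fh.flush()
+            os.fsync(fh.fileno())  # durability across power loss; not required for kill-safety
+        os.replace(tmp, dest)  # atomic publish
+    except BaseException:
+        # Clean up on ordinary errors; a SIGKILL never reaches this and leaves the temp file.
+        try:
+            os.unlink(tmp)
+        except FileNotFoundError:
+            pass
+        raise
+
+
+def atomic_write_json(path: str | os.PathLike[str], obj: Any, **dump_kwargs: Any) -> None:
+    """Serialize first, so an encoding error never touches the destination."""
+    atomic_write_text(path, json.dumps(obj, **dump_kwargs))
+```
+
+The temp name (`.merged.json.<random>.tmp`) never matches the path that the resume check tests, so a leftover temp file is simply ignored. Note that `mkstemp` creates files with mode `0o600`, and the published file keeps that mode.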
+
+## R4 — Reproducible command form & secret-safety (FR-002, FR-003, SC-002)
+
+**Decision**: Reuse the existing command echo, but emit it as a **distinct SSE `command` event** (not just an inline `$ …` data chunk) so the UI can render a dedicated copy-to-clipboard box. The command string keeps the existing form: any non-secret env prefix (`CG_BACKEND=…`, `DGX_ENDPOINT=…`, `OPENROUTER_MODEL=…`, `DGX_MODEL=…`) followed by the full `python … script.py --flags`. **Secrets are already absent**: `_llm_env` never injects API keys — `OPENROUTER_API_KEY` / `ANTHROPIC_API_KEY` are inherited from the server's environment, so they never appear on the command line. The persisted log (`_save_run_log`) records the same secret-free command.
+
+**Rationale**: "Reproducible" per the spec means runnable by an operator whose own environment supplies the credentials (spec Assumption). The env prefix shown is exactly what an operator pastes in front of the command in their workspace shell; the API key comes from their environment, just as it does for the server. A distinct event (vs. parsing the first `$ ` line out of the output) keeps the copyable command robust and unambiguous.
+
+**Alternatives considered**:
+- *Keep the inline `$ …` line only* — works but forces the UI to string-parse output to find the command; brittle and not cleanly copyable. Rejected.
+- *Bare `python` vs. the resolved interpreter path* — keep `python_exe()` (already used) so the copied command invokes the same interpreter that actually ran; this is acceptable because the operator runs it in the same workspace/venv.
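+
+The secret-free command form can be sketched with an allow-list of displayable environment variables. Only explicitly named keys are shown, so every unlisted variable, including any API key, is left out by construction. `format_command` and `DISPLAY_ENV_KEYS` are hypothetical names. The four keys are the non-secret ones named in the Decision above, and the argv in the usage line is illustrative.
+
+```python
+import shlex
+from typing import Mapping, Sequence
+
+# Hypothetical allow-list: the non-secret variables R4 says may appear in the prefix.
+DISPLAY_ENV_KEYS = ("CG_BACKEND", "DGX_ENDPOINT", "OPENROUTER_MODEL", "DGX_MODEL")
+
+
+def format_command(env: Mapping[str, str], argv: Sequence[str]) -> str:
+    """Render `KEY=value ... prog args` as one shell-pasteable line.
+
+    Only allow-listed keys are emitted, so secrets such as OPENROUTER_API_KEY
+    can never leak into the displayed or persisted command.
+    """
+    prefix = [f"{key}={shlex.quote(env[key])}" for key in DISPLAY_ENV_KEYS if env.get(key)]
+    return " ".join(prefix + [shlex.join(argv)])
+
+
+env = {"CG_BACKEND": "openrouter", "OPENROUTER_MODEL": "vendor/model-name", "OPENROUTER_API_KEY": "sk-not-shown"}
+print(format_command(env, ["python", "ensemble_batch.py", "--chapters", "chapter_01"]))
+# prints: CG_BACKEND=openrouter OPENROUTER_MODEL=vendor/model-name python ensemble_batch.py --chapters chapter_01
+```
+
+Quoting every value and argument with `shlex` keeps the line pasteable even if a model name or path contains spaces or shell metacharacters.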
+
+## R5 — Distinguishing aborted from failed in the result (FR-006, FR-009)
+
+**Decision**: A run terminated by group-kill exits with a **negative returncode** (e.g. `-15` SIGTERM, `-9` SIGKILL). The runner classifies this exit as `aborted` (not `failed`), records `result: aborted` in the persisted log, and — when the connection is still open at abort time (rare) — emits a `done` event carrying an `aborted` flag. On the common path (operator clicked Abort → connection closed), the **frontend** owns the `aborted` status because it initiated the close; the persisted log still records `aborted` from the negative returncode. Failure = the process exited on its own with a non-zero positive code; success = exit 0.
+
+**Rationale**: Three outcomes must be distinguishable (SC-004). Signal-based negative returncodes cleanly separate "we stopped it" from "it failed." Persisting `aborted` keeps the on-disk record honest even though the UI may have already shown `aborted` from the close.
+
+**Alternatives considered**:
+- *Treat any non-zero as failure* — rejected: collapses aborted into failed, violating SC-004.
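+
+The three-way rule reduces to a small pure function. This sketch follows the signature and mapping that task T003 specifies for `classify_result`; the body itself is illustrative.
+
+```python
+from typing import Optional
+
+
+def classify_result(returncode: Optional[int]) -> str:
+    """Map a subprocess returncode to the persisted run result (R5).
+
+    0 -> "succeeded"; a positive code -> "failed" (the process exited on its own);
+    None or a negative code -> "aborted" (not yet reaped, or ended by a signal such
+    as -15 SIGTERM / -9 SIGKILL).
+    """
+    if returncode == 0:
+        return "succeeded"
+    if returncode is None or returncode < 0:
+        return "aborted"
+    return "failed"
+```
+
+One caveat: the negative-code convention holds only because the Python script is the direct child. If the script were ever wrapped in a shell that catches the signal and exits normally, the code would be positive (shells conventionally report a signal death as 128 + N), and this rule would report `failed`.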
+
+## R6 — Long-output readability & persistence (Edge Case, FR-007)
+
+**Decision**: Keep streaming into the existing `StreamOutput.vue` `<pre>` (already `overflow-y:auto`, scrollable, `white-space:pre-wrap`). The full output is captured server-side and written to `<campaign>/logs/<ts>_<script>.md` by `_save_run_log` regardless of how the run ended (now also on the abort `finally` path). No truncation of the persisted record.
+
+**Rationale**: The component already satisfies the "remain readable / scrollable" requirement; the only change is ensuring the log write happens on the abort path too (folded into R1's `finally`).
+
+**Alternatives considered**: virtualized log viewer / ring buffer — unnecessary at single-operator scale; deferred.
+
+## Cross-cutting note — what does NOT change
+
+No stage's computation changes (spec Assumption). No new LLM call, no retrieval/render mixing, no new external dependency, no DB. The `/grounding` (Anthropic) path and non-ensemble run routes keep their request/response shapes unchanged. The only change they see is the disconnect-driven cleanup in the *shared* runner; everything else lives in the ensemble-specific frontend/engine files.
diff --git a/specs/002-ensemble-run-observability/spec.md b/specs/002-ensemble-run-observability/spec.md
new file mode 100644
index 0000000..04dc9bf
--- /dev/null
+++ b/specs/002-ensemble-run-observability/spec.md
@@ -0,0 +1,143 @@
+# Feature Specification: Ensemble Run Observability
+
+**Feature Branch**: `002-ensemble-run-observability`
+
+**Created**: 2026-06-28
+
+**Status**: Draft
+
+**Input**: User description: "when running an extraction, through the ensemble, as a user I need to observe what happened. I want to see the actual command that was run, if I have the actual command, I can run it later, if I need to. I want to see if the command is progressing - the output of the command as it runs. I want to know that the command finished and see the output of the command as it finished. And I want to be able to abort the command"
+
+## Overview
+
+When the operator runs a stage of the ensemble grounding-doc workflow from the UI, the page kicks off a long-running command and shows its output. Today that experience is thin: the operator cannot reliably tell *which* command ran (so they cannot reproduce it later at the CLI), cannot stop a run once it has started, and has only an informal sense of when a run has truly finished versus stalled.
+
+This feature makes an ensemble run **observable and controllable** as a first-class thing. For any ensemble stage the operator launches from the UI, they can: see the exact command that was run in a form they can copy and re-run themselves; watch the command's output appear live as it progresses; see a clear, unambiguous signal when the command finishes (success or failure) together with its final output; and abort a running command before it completes.
+
+This serves the project's standing commitment that the UI only *mechanizes* the sequence and never traps the human inside it: a copyable, reproducible command is precisely the escape hatch that lets the operator drop to the CLI and lose nothing, and a persisted run record keeps the truth of "what happened" on disk rather than only in a browser tab.
+
+## Clarifications
+
+### Session 2026-06-28
+
+- Q: When the operator aborts a running command, how should it be terminated? → A: Graceful stop signal first; force-kill if the process has not exited within a short grace period (~3–5s).
+- Q: If the operator closes the tab / navigates away / loses connection mid-run, what happens to the command? → A: Treat disconnect as an implicit abort — stop the run using the same graceful-then-force termination. No unobserved metered runs.
+- Q: How is in-flight unit integrity guaranteed when a force-kill interrupts a write? → A: Atomic per-unit publish (write-temp-then-rename) — a force-kill leaves no partial file the resume check trusts; the unit is recomputed on re-run.
+
+## User Scenarios & Testing *(mandatory)*
+
+### User Story 1 - See the exact command that ran, and reuse it (Priority: P1)
+
+When the operator launches an ensemble stage from the UI, the page shows the exact command that was executed — the full invocation, including which inputs and which backend/model it used — in a form the operator can read and copy. If the operator later needs to re-run that step themselves (to debug, to tweak a flag, or to run it outside the UI), they can take that command, paste it into a terminal in the campaign workspace, and reproduce the same run.
+
+**Why this priority**: This is the load-bearing escape hatch. The whole point of the UI is to mechanize the sequence without stealing the human's ability to step down to the CLI. If the operator cannot recover the actual command, the UI has become a walled garden. It is also the cheapest, most foundational slice — it delivers value the moment a single stage can be launched.
+
+**Independent Test**: Launch any ensemble stage from the UI. Confirm the displayed command names the same inputs and backend the operator chose, can be copied, and — when pasted into a terminal in the campaign workspace — runs the same operation and produces equivalent output. Confirm no secret values (e.g. API keys) appear in the displayed command.
+
+**Acceptance Scenarios**:
+
+1. **Given** the operator launches a stage with a specific set of inputs and a chosen backend, **When** the run starts, **Then** the page displays the full command line reflecting exactly those inputs and that backend.
+2. **Given** a displayed command, **When** the operator copies it and runs it in a terminal opened in the campaign workspace, **Then** it executes the same operation without manual editing to make it runnable.
+3. **Given** a run that used a backend requiring a secret credential, **When** the command is displayed, **Then** the secret value is not shown, while the command remains reproducible by an operator who has that credential in their own environment.
+
+---
+
+### User Story 2 - Watch the command progress as it runs (Priority: P1)
+
+While a stage is running, the operator sees the command's output appear incrementally, as it is produced, rather than waiting for the whole run to finish. This lets the operator tell that the run is alive and making progress (e.g. moving from chapter to chapter), and notice early if something is going wrong.
+
+**Why this priority**: Ensemble stages are long-running. Without live progress the operator cannot distinguish "working" from "hung," which makes every run anxiety-inducing and pushes people back to the CLI. Live output is core to the "observe what happened" need.
+
+**Independent Test**: Launch a stage that produces output over time. Confirm output lines appear in the page while the command is still running (not only at the end), within a couple of seconds of being produced, and that the page indicates a run is in progress.
+
+**Acceptance Scenarios**:
+
+1. **Given** a running stage that emits output over time, **When** the command produces a line, **Then** that line appears in the page shortly afterward, while the command is still running.
+2. **Given** a running stage, **When** the operator looks at the page, **Then** the page clearly indicates that a run is currently in progress.
+3. **Given** a stage that processes multiple inputs in sequence, **When** it advances from one input to the next, **Then** the operator can see that progression in the streamed output.
+
+---
+
+### User Story 3 - Know the command finished, and see its final output (Priority: P1)
+
+When the command completes, the page gives the operator an unambiguous signal that it has finished and whether it succeeded or failed, alongside the command's full output as it ended. The operator does not have to guess whether more output is still coming. The complete record of the run — the command, its full output, and its result — survives after the run, so the operator can review or reproduce it later even after closing the browser.
+
+**Why this priority**: "Did this finish, and did it work?" is the question every run ends on. A run whose completion is ambiguous, or whose output vanishes when the tab closes, fails the basic observability need. Persisting the record on disk keeps "what happened" as truth on disk rather than ephemeral browser state.
+
+**Independent Test**: Run a stage to completion (both a successful run and a failing one). Confirm the page shows a clear finished state distinguishing success from failure, shows the final output, and that the full run record (command + output + result) can still be found after the browser is closed.
+
+**Acceptance Scenarios**:
+
+1. **Given** a stage that completes successfully, **When** the command exits, **Then** the page shows an unambiguous "finished successfully" state along with the final output.
+2. **Given** a stage that fails, **When** the command exits with a failure, **Then** the page shows an unambiguous failure state distinct from success, along with the output that led to the failure.
+3. **Given** a completed run, **When** the operator returns later (including after closing and reopening the browser), **Then** the command, its full output, and its result are still recoverable.
+4. **Given** a run that cannot even start because a precondition is not met, **When** the operator launches it, **Then** the page shows a readable reason for the failure rather than a generic or silent error.
+
+---
+
+### User Story 4 - Abort a running command (Priority: P2)
+
+While a stage is running, the operator can abort it. Aborting stops the underlying command promptly and the page reflects that the run was aborted (distinct from finished-success and finished-failure). Work that the run had already completed and written to disk is preserved, so a later re-run can resume rather than start over.
+
+**Why this priority**: Long, metered, or mistaken runs need a stop button — the operator who realizes they picked the wrong inputs or the wrong backend should not have to wait it out or kill a server. It is P2 because the observe-and-reproduce slices (US1–US3) already deliver standalone value without it, but it closes the loop on real control.
+
+**Independent Test**: Launch a long-running stage, then abort it. Confirm the underlying command stops within a few seconds, the page shows an "aborted" state, and any work the run had already finished and written to disk is still present (a subsequent re-run skips that completed work).
+
+**Acceptance Scenarios**:
+
+1. **Given** a running stage, **When** the operator aborts it, **Then** the underlying command stops promptly and the page shows an aborted state distinct from success and failure.
+2. **Given** a stage that had already completed and persisted part of its work when it was aborted, **When** the operator re-runs the stage, **Then** the already-completed work is reused rather than recomputed.
+3. **Given** an aborted run, **When** the operator looks at the page, **Then** the output captured up to the abort point remains visible and the run record reflects that it was aborted.
+
+---
+
+### Edge Cases
+
+- **Browser closed mid-run**: If the operator closes the tab, navigates away, or loses connection while a command is running, the run is treated as an implicit abort (stopped via the same graceful-then-force termination), so no metered run keeps burning cost unobserved. The run record (command, output captured, aborted result) must still be recoverable afterward; the operator must not be left unable to tell whether the command kept running or stopped.
+- **Very long output**: A run that emits a large volume of output must remain readable (scrollable) in the page and must not lose earlier output from the persisted record.
+- **Failure to start**: A run blocked by an unmet precondition (e.g. nothing selected to act on) must surface a readable reason, not a blank or generic error.
+- **Abort during the final input of a batch**: Aborting just as an item is being written must not corrupt the partially-written output. Each unit's output is published to its trusted cache location atomically (written to a temp location, then atomically moved), so a force-kill can leave at most a discardable temp artifact — never a partial file at the path the resume check treats as complete. The interrupted unit is recomputed on the next run.
+- **Secret-bearing commands**: When a backend needs a credential, the displayed/reproducible command must omit the secret value while remaining runnable by an operator whose environment supplies it.
+
+## Requirements *(mandatory)*
+
+### Functional Requirements
+
+- **FR-001**: For every ensemble stage launched from the UI, the system MUST display the exact command that was executed, including the inputs acted on and the backend/model used.
+- **FR-002**: The displayed command MUST be copyable and, when run in a terminal opened in the campaign workspace, MUST reproduce the same operation without requiring the operator to hand-edit it to make it runnable.
+- **FR-003**: The system MUST NOT display secret credential values (e.g. API keys) as part of the command, while keeping the command reproducible for an operator whose own environment supplies those credentials.
+- **FR-004**: While a command is running, the system MUST stream its output to the operator incrementally as it is produced, not only after completion.
+- **FR-005**: While a command is running, the system MUST clearly indicate that a run is in progress.
+- **FR-006**: When a command finishes, the system MUST present an unambiguous finished state that distinguishes success from failure, together with the command's final output.
+- **FR-007**: The system MUST persist a durable record of each run — the command, its full output, and its result — that remains recoverable after the run ends and after the browser is closed.
+- **FR-008**: The operator MUST be able to abort a running command. Abort MUST first request a graceful stop and, if the command has not exited within a short grace period (~3–5 seconds), MUST force-kill it so the stop is bounded.
+- **FR-009**: After an abort, the system MUST show an aborted state distinct from both success and failure, and MUST retain the output captured up to the abort point.
+- **FR-010**: An abort MUST preserve work the run had already completed and written to disk, such that a subsequent re-run reuses that completed work rather than recomputing it.
+- **FR-011**: When a run cannot start because a precondition is unmet, the system MUST surface a readable reason rather than a silent or generic failure.
+- **FR-012**: The run record and displayed command MUST reflect the operator's explicit input selection for the stage, never an implicitly expanded set (consistent with explicit-selection rules for token-spending passes).
+- **FR-013**: When the operator's connection to a running command is lost (tab closed, navigation away, or network drop), the system MUST treat it as an implicit abort and stop the underlying command using the same graceful-then-force termination as an explicit abort, so no metered run continues unobserved.
+- **FR-014**: Each unit of resumable work MUST be published to its trusted cache location atomically (e.g. written to a temporary location then atomically moved), so that an abort or force-kill cannot leave a partial output that a subsequent re-run would treat as completed work. An interrupted unit MUST be recomputed on re-run.
+
+### Key Entities *(include if feature involves data)*
+
+- **Run record**: A durable account of one ensemble-stage execution. Attributes: the exact command line (with secrets omitted), the chosen backend/model, the inputs acted on, the captured output, the final result (succeeded / failed / aborted), and timing. Lives on disk so it outlives the browser session and can be reviewed or reproduced from the CLI.
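+
+As an illustration only, the Run record could be modeled as a frozen dataclass with a Markdown rendering for the `logs/` file. The class, field names, and layout are assumptions; the actual record is whatever `_save_run_log` writes.
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+RESULTS = ("succeeded", "failed", "aborted")
+
+
+@dataclass(frozen=True)
+class RunRecord:
+    command: str  # exact command line, secrets omitted
+    backend: str
+    model: str
+    inputs: tuple[str, ...]  # the operator's explicit selection only (FR-012)
+    output: str  # full captured output; never truncated
+    result: str  # one of RESULTS
+    returncode: int | None
+    duration_s: float
+
+    def __post_init__(self) -> None:
+        if self.result not in RESULTS:
+            raise ValueError(f"unknown result {self.result!r}; expected one of {RESULTS}")
+
+    def to_markdown(self) -> str:
+        """Render a human- and CLI-readable log (hypothetical layout)."""
+        header = [
+            f"# Run ({self.result})",
+            "",
+            f"- command: `{self.command}`",
+            f"- backend/model: {self.backend} / {self.model}",
+            f"- inputs: {', '.join(self.inputs)}",
+            f"- result: {self.result}",
+            f"- returncode: {self.returncode}",
+            f"- duration: {self.duration_s:.1f}s",
+            "",
+        ]
+        # Tilde fence, so backtick fences inside the captured output do not close it.
+        return "\n".join(header + ["~~~", self.output, "~~~"]) + "\n"
+```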
+
+## Success Criteria *(mandatory)*
+
+### Measurable Outcomes
+
+- **SC-001**: For 100% of UI-launched ensemble runs, an operator can copy the displayed command and reproduce the run from a terminal in the campaign workspace with no manual edits to make it runnable.
+- **SC-002**: No run ever displays a secret credential value in its command, in the live output, or in its persisted record.
+- **SC-003**: Output produced by a running command becomes visible to the operator within a few seconds of being produced, while the command is still running.
+- **SC-004**: At the end of every run the operator can correctly tell, without ambiguity, whether it succeeded, failed, or was aborted.
+- **SC-005**: After aborting, the underlying command stops within a few seconds, and a subsequent re-run resumes from the already-completed work rather than starting over.
+- **SC-006**: The full record of any run (command, output, result) remains recoverable after the browser is closed.
+
+## Assumptions
+
+- **Scope across stages**: The four observability needs (see the command, watch progress, see completion, abort) are generic to any ensemble stage that runs a command. Extraction is the driving example named in the request; this spec treats the capability as applying uniformly to every command-running ensemble stage, not extraction alone.
+- **One run at a time per stage**: A given ensemble stage runs one command at a time from the UI; launching is disabled while that stage's run is in progress. Coordinating multiple simultaneous runs across stages is out of scope for this feature.
+- **Reproducible command form**: "Reproducible" means runnable from a terminal opened in the campaign workspace by an operator whose environment already provides the necessary credentials; the command may include the non-secret environment context needed to reproduce it (e.g. which backend/endpoint/model), but secret values are omitted.
+- **Abort semantics**: Abort issues a graceful stop signal first and force-kills only if the command has not exited within a short grace period (~3–5s). The stop is therefore bounded, while a command that exits cleanly still gets the chance to finish writing its current unit atomically. Abort does not attempt a graceful drain of all remaining in-flight work. Because ensemble extraction is resumable (completed inputs are cached on disk), preserving already-finished work and discarding the in-flight item is the intended behavior.
+- **Persistence location**: The durable run record uses the campaign workspace's existing per-run log convention (a file under the workspace `logs/` directory) so it is equally visible to the UI, the CLI, and a Claude conversation.
+- **No change to the underlying engine's behavior**: This feature is about observing and controlling runs, not changing what each ensemble stage computes; the stages themselves and their outputs are unchanged.
diff --git a/specs/002-ensemble-run-observability/tasks.md b/specs/002-ensemble-run-observability/tasks.md
new file mode 100644
index 0000000..3dff5e9
--- /dev/null
+++ b/specs/002-ensemble-run-observability/tasks.md
@@ -0,0 +1,208 @@
+---
+
+description: "Task list for Ensemble Run Observability"
+---
+
+# Tasks: Ensemble Run Observability
+
+**Input**: Design documents from `/specs/002-ensemble-run-observability/`
+
+**Prerequisites**: plan.md ✓, spec.md ✓, research.md ✓, data-model.md ✓, contracts/run-stream.md ✓, quickstart.md ✓
+
+**Tests**: Targeted tests ARE included for the high-risk engine changes (process-group kill, grace→force timing, atomic cache writes, secret-safety, explicit-selection) — the plan/quickstart call for `tests/test_subprocess_abort.py` and these are correctness-critical (Constitution I/IV/X). The already-working stream/UI surfaces are covered by verification tasks + quickstart, not new unit tests.
+
+> **Revision note**: This revision folds in `/speckit-analyze` findings — **I1** (EventSource reconnect must not restart a metered run), **I2** (the shared-seam disconnect-kill is a deliberate global change — reconcile + regression-test a non-ensemble route), **U1/U2** (hoist captured state in the `finally`; validate the grace window on the *disconnect* path), and **C1** (explicit-selection coverage for FR-012).
+
+## Format: `[ID] [P?] [Story] Description`
+
+- **[P]**: Can run in parallel (different files, no dependency on an incomplete task)
+- **[Story]**: US1 / US2 / US3 / US4
+- Exact file paths are included in each task.
+
+## Path notes
+
+Web-over-CLI layout (plan.md "Structure Decision"). Shared seam: `server/subprocess_runner.py`. Engine: `ensemble_merge.py`, `facts_to_state.py`, `campaignlib.py`. Router: `server/routers/ensemble.py`. Frontend: `frontend/src/api/sse.ts`, `frontend/src/views/ensemble/*`.
+
+⚠️ `server/subprocess_runner.py` and `frontend/src/views/ensemble/useEnsembleRun.ts` (and `frontend/src/api/sse.ts`) are each edited by multiple stories — those edits are **sequential** (never marked [P] against each other), even across phases.
+
+⚠️ **Shared-seam blast radius (I2)**: `stream_subprocess` is used by `grounding.py`, `prep.py`, `session_workflow.py`, `scene_editor.py`, etc. The termination changes (T019–T021) change disconnect behavior for **every** SSE route, not just ensemble. This is intended (no route should leak a runaway process), and T031 regression-tests one non-ensemble route to prove it's safe.
+
+---
+
+## Phase 1: Setup (Shared Infrastructure)
+
+**Purpose**: Shared utilities and the test scaffold that later phases build on.
+
+- [X] T001 [P] Add `atomic_write_text(path, text)` and `atomic_write_json(path, obj)` to `campaignlib.py` — write to a temp file in the **same directory** as the destination, then `os.replace` onto the destination (atomic same-filesystem rename). Docstring states the FR-014 guarantee (no partial file ever at the trusted path).
+- [X] T002 [P] Create test scaffold `tests/test_subprocess_abort.py` with pytest fixtures: a helper to drive `server.subprocess_runner.stream_subprocess` against a short-lived child script, plus a fixture spawning a long-running child that itself spawns a grandchild (to prove process-group kill on both explicit-abort and disconnect paths).
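+
+A minimal sketch of the T001 helpers follows. It assumes a stdlib-only implementation: `tempfile.mkstemp` in the destination directory, then `fsync`, then `os.replace`. The real `campaignlib` versions may differ in encoding, permissions, or fsync policy.
+
+```python
+import json
+import os
+import tempfile
+from pathlib import Path
+
+
+def atomic_write_text(path, text: str, encoding: str = "utf-8") -> None:
+    """Publish `text` at `path`; readers see the old file or the new one, never a mix."""
+    path = Path(path)
+    # Same directory as the destination: os.replace is only atomic within one filesystem.
+    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
+    try:
+        with os.fdopen(fd, "w", encoding=encoding) as fh:
+            fh.write(text)
+            fh.flush()
+            os.fsync(fh.fileno())  # data is durable before the rename publishes it
+        os.replace(tmp_name, path)  # the single atomic step
+    except BaseException:
+        # Any catchable failure removes the temp file; `path` is never touched.
+        try:
+            os.unlink(tmp_name)
+        except FileNotFoundError:
+            pass
+        raise
+
+
+def atomic_write_json(path, obj) -> None:
+    """Serialize first, so a `json.dumps` error never reaches the filesystem."""
+    atomic_write_text(path, json.dumps(obj, indent=2, ensure_ascii=False))
+```
+
+A SIGKILL cannot be caught, so it may strand a `.tmp` file next to the destination. It can never leave a truncated file at the resume-trusted path.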
+
+---
+
+## Phase 2: Foundational (Blocking Prerequisites)
+
+**Purpose**: Run-record result semantics (shared by US3 + US4) and an explicit decision on the shared-seam blast radius (I2) before any termination code is written.
+
+**⚠️ CRITICAL**: US3 and US4 depend on T003/T004; T005 governs the termination tasks T020–T022. US1/US2 do not depend on this phase.
+
+- [X] T003 In `server/subprocess_runner.py`, add a pure helper `classify_result(returncode) -> str` → `"succeeded"` (rc == 0), `"failed"` (rc > 0), `"aborted"` (rc is None or rc < 0, i.e. signal). Reference R5 in the docstring.
+- [X] T004 In `server/subprocess_runner.py`, extend `_save_run_log` to record a `result` line (from `classify_result`) alongside returncode/duration/cwd, so the persisted Run record (data-model.md) distinguishes succeeded/failed/aborted.
+- [X] T005 **(I2 reconciliation)** In the `server/subprocess_runner.py` module docstring, document that disconnect-driven group-kill (added in T020–T021) applies to **all** SSE routes intentionally (no route may leak a runaway/orphaned process). Also update `plan.md` "Constraints" so that the "non-ensemble routes untouched" wording reads "non-ensemble routes' request/response shapes unchanged; they additionally gain disconnect-driven cleanup." (Doc-only; pairs with the T031 regression test.)
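+
+The T003 rule can be sketched as code. The signature matches T003; the body is an assumption. Negative return codes follow the `subprocess`/`asyncio` convention that `-N` means "terminated by signal N".
+
+```python
+from typing import Optional
+
+
+def classify_result(returncode: Optional[int]) -> str:
+    """Map a child's returncode to the Run record's `result` field."""
+    if returncode is None or returncode < 0:
+        # None: the child was never reaped; -N: it was killed by signal N.
+        return "aborted"
+    return "succeeded" if returncode == 0 else "failed"
+```
+
+One consequence: a child that traps SIGTERM and then exits with status 0 is recorded as `succeeded`, and one that exits with a positive status is recorded as `failed`, not `aborted`. The explicit `aborted: true` flag from T022 covers the live stream, but the persisted record relies on the returncode alone.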
+
+**Checkpoint**: Run record carries an unambiguous outcome; the global-behavior decision is recorded. Build the stories.
+
+---
+
+## Phase 3: User Story 1 — See the exact command, and reuse it (Priority: P1) 🎯 MVP
+
+**Goal**: A dedicated, copyable, secret-free command surfaces for every UI-launched run, reflects the explicitly selected inputs, and reproduces the run when pasted into a workspace terminal.
+
+**Independent Test**: Launch a stage; a copy-able command box shows the full invocation reflecting the selected inputs/backend; pasting it in a workspace terminal runs the same op; no API key appears anywhere; the command never shows an expanded glob when the selection was explicit (quickstart Scenario A).
+
+- [X] T006 [US1] In `server/subprocess_runner.py` `stream_subprocess`, emit a distinct SSE event `event: command\ndata: <json-string>` as the FIRST event, carrying the secret-free invocation (existing `cmd_display` = non-secret env prefix + `python … --flags`). Keep the legacy inline `$ …` data chunk for back-compat (contracts/run-stream.md §Response).
+- [X] T007 [P] [US1] In `frontend/src/api/sse.ts`, add an optional `onCommand(cmd: string)` callback to `SSECallbacks` and an `es.addEventListener('command', …)` handler that parses the JSON string and invokes it.
+- [X] T008 [US1] In `frontend/src/views/ensemble/useEnsembleRun.ts`, add reactive `command` state, reset it in `run()`/`clear()`, and populate it from `onCommand`. (Sequential with later US4 edits to this file.)
+- [X] T009 [P] [US1] Create `frontend/src/components/shared/RunCommandBar.vue` — renders the `command` string in a monospace box with a copy-to-clipboard button (and a placeholder when empty).
+- [X] T010 [US1] Wire `RunCommandBar` into `frontend/src/views/ensemble/EnsembleExtract.vue` above `StreamOutput`, bound to the composable's `command`.
+- [X] T011 [P] [US1] In `tests/test_subprocess_abort.py`, add a secret-safety + reproducibility test: run a stage with an OpenRouter/DGX-style `env_extra`; assert the `command` event and the persisted record contain the backend/model but NO `*_API_KEY` value (SC-002, FR-003) and reflect the passed inputs (FR-001).
+- [X] T012 [P] [US1] **(C1 — FR-012)** In `tests/test_subprocess_abort.py` (or extend T011), assert the displayed command + run record contain **exactly the explicitly-passed chapter selection** and never a wildcard/expanded glob; and reference the existing empty-selection refusal in `tests/test_ensemble_chapters.py` as the companion guarantee that an empty selection is refused, never expanded (FR-012, Principle X).
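+
+The following sketch shows one way to build the T006 `command` event. Only `env_extra` is rendered, secret-looking keys are dropped, and the payload is JSON-encoded so the string stays on one `data:` line. The names `command_display`, `command_event`, and the `SECRET_SUFFIXES` naming rule are hypothetical; the real code calls its string `cmd_display`.
+
+```python
+import json
+import shlex
+
+# Hypothetical convention: env keys with these suffixes are treated as secrets.
+SECRET_SUFFIXES = ("_API_KEY", "_TOKEN", "_SECRET")
+
+
+def command_display(cmd, env_extra=None) -> str:
+    """Render a copy-pasteable, secret-free `K=V ... python script --flags` line."""
+    env_extra = env_extra or {}
+    safe_env = [
+        f"{k}={shlex.quote(v)}"
+        for k, v in env_extra.items()
+        if not k.upper().endswith(SECRET_SUFFIXES)
+    ]
+    return " ".join(safe_env + [shlex.join(cmd)])
+
+
+def command_event(display: str) -> str:
+    """Format the FIRST SSE event; JSON keeps any newline inside one data: line."""
+    return f"event: command\ndata: {json.dumps(display)}\n\n"
+```
+
+Inherited environment variables, where real API keys normally live, never reach the display at all. The suffix filter is therefore a second line of defense for keys passed explicitly through `env_extra`.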
+
+**Checkpoint**: US1 fully functional — copyable, reproducible, secret-free, explicit-selection-faithful command. Shippable MVP.
+
+---
+
+## Phase 4: User Story 2 — Watch the command progress as it runs (Priority: P1)
+
+**Goal**: Output streams incrementally with a clear in-progress indicator, on every stage.
+
+**Independent Test**: Run a multi-chapter stage; lines appear while running (not only at the end); the page shows a running state (quickstart Scenario B).
+
+- [X] T013 [US2] Verify incremental streaming in `server/subprocess_runner.py` (chunk flush on `>=20` bytes or newline) still delivers per-chapter lines live after the T006 `command` event; adjust only if the new first event regressed flush timing.
+- [X] T014 [P] [US2] In `frontend/src/views/ensemble/EnsembleBundle.vue` and `EnsembleSynthesize.vue`, ensure a visible "running" affordance (button → `Running…` / spinner) driven by the shared composable `status === 'running'`; add it where missing so US2 holds for every stage (spec "Scope across stages").
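+
+The T013 flush rule can be checked in isolation. This sketch replays a byte stream through the rule "flush on newline, or once 20 bytes are buffered". `flush_chunks` is a hypothetical name, and the sketch models only the policy, not the runner's actual async read loop.
+
+```python
+def flush_chunks(data: bytes, threshold: int = 20) -> list[bytes]:
+    """Split `data` into the chunks a 'newline or >= threshold bytes' flusher emits."""
+    chunks: list[bytes] = []
+    buf = bytearray()
+    for byte in data:  # iterating over bytes yields ints
+        buf.append(byte)
+        if byte == ord("\n") or len(buf) >= threshold:
+            chunks.append(bytes(buf))
+            buf.clear()
+    if buf:
+        chunks.append(bytes(buf))  # EOF: flush whatever remains
+    return chunks
+```
+
+The byte threshold is what keeps a long, newline-free line (such as a progress indicator) visible while it is still being written.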
+
+**Checkpoint**: Every stage shows live progress + running state.
+
+---
+
+## Phase 5: User Story 3 — Know it finished, see final output + durable record (Priority: P1)
+
+**Goal**: Unambiguous success/failure finished state, full final output, readable precondition errors, and a persisted record recoverable after the browser closes.
+
+**Independent Test**: Run to success and to failure; states are distinct; nothing-selected gives a readable refusal; `logs/` holds the record after closing the browser (quickstart Scenario C).
+
+- [X] T015 [US3] In `frontend/src/views/ensemble/EnsembleBundle.vue` and `EnsembleSynthesize.vue`, ensure the finished state distinguishes success (`returnCode === 0`) from failure (`> 0`) with distinct labels/colors, matching `EnsembleExtract.vue` (FR-006, SC-004).
+- [X] T016 [US3] Verify precondition refusals (empty selection via existing `sse_error_stream`) render their `done.error` text to the operator in the stage views, not a generic "Stream error" (FR-011); fix the `onDone(error)` rendering path in `useEnsembleRun.ts`/views if the message is dropped.
+- [X] T017 [P] [US3] In `tests/test_subprocess_abort.py`, assert `_save_run_log` writes a recoverable record (command + full output + `result` + returncode + duration) for a success exit AND a non-zero failure exit, with `result` = succeeded / failed respectively (FR-007, SC-006; uses T003/T004).
+
+**Checkpoint**: Finished/failed outcomes unambiguous; durable record verified.
+
+---
+
+## Phase 6: User Story 4 — Abort a running command (Priority: P2)
+
+**Goal**: Operator (or a lost connection) can stop a run; termination is graceful→force, kills the whole worker group, preserves completed work, never corrupts the resume cache, and a network drop never silently restarts the run.
+
+**Independent Test**: Start a long extraction, click Abort → stops within seconds, "aborted" shown, no orphaned `ensemble_extract` workers, re-run skips completed chapters, mid-flight `merged.json` is complete-or-absent; closing the tab or dropping the network stops the run and does NOT restart it (quickstart Scenarios D & E).
+
+### Engine: atomic cache writes (FR-014) — independent, parallelizable
+
+- [X] T018 [P] [US4] In `ensemble_merge.py` (~line 363), replace `output_path.write_text(json.dumps(...))` for the per-chapter `merged.json` with `campaignlib.atomic_write_json(output_path, merged)` so a force-kill never leaves a truncated `merged.json` at the resume-trusted path (FR-014).
+- [X] T019 [P] [US4] In `facts_to_state.py` `write_dossier` (~line 361), replace `dest.write_text(...)` with an atomic write via `campaignlib.atomic_write_text(dest, …)` (the dossier path is trusted by the resume `exists()` check) (FR-014).
+
+### Seam: termination (FR-008, FR-013) — all in subprocess_runner.py, sequential
+
+- [X] T020 [US4] In `server/subprocess_runner.py` `stream_subprocess`, launch the child with `start_new_session=True` (own session/process group) so the whole worker tree is signalable (R1).
+- [X] T021 [US4] **(incorporates U1/U2)** In `server/subprocess_runner.py`, **hoist `captured`, `started`, and `proc` above** the read loop, then wrap the loop in `try/except (asyncio.CancelledError, GeneratorExit)/finally`. In `finally`, terminate the process **group**: `os.killpg(os.getpgid(proc.pid), SIGTERM)` → `await asyncio.wait_for(proc.wait(), GRACE_SECONDS)` (add `GRACE_SECONDS = 4.0`, ~3–5 s per FR-008) → `os.killpg(..., SIGKILL)` on `TimeoutError`. The finally MUST run `_save_run_log(... captured ...)` and `on_complete(returncode)` on **every** exit path (normal, explicit-abort, disconnect) so that the `_RUNNING` lock is always released and the record (including `aborted`) is always written. Note in a comment that the grace `await` must survive async-generator teardown (it runs during `aclose()`), so do not `yield` inside the finally.
+- [X] T022 [US4] In `server/subprocess_runner.py`, when termination was abort/disconnect-initiated, include `"aborted": true` in the `done` event payload (best-effort if the connection is still open) and rely on `classify_result` (negative rc) for the persisted record (R5, contracts §done).
+- [X] T023 [US4] In `server/routers/ensemble.py`, confirm `_run_locked`'s `_release` (discards the `_RUNNING` key) is driven by the `on_complete` now fired in the T021 `finally`, so an abort/disconnect releases the per-stage lock; add a regression comment. No new endpoint (abort = connection close, contracts §Abort).
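+
+Here is a sketch of the T020/T021 graceful-then-force sequence. It assumes the child was launched with `start_new_session=True` and is POSIX only. `terminate_group` is a hypothetical helper, and `GRACE_SECONDS = 4.0` mirrors the value proposed in T021.
+
+```python
+import asyncio
+import os
+import signal
+
+GRACE_SECONDS = 4.0  # assumption: inside the ~3-5 s window of FR-008
+
+
+async def terminate_group(proc: asyncio.subprocess.Process,
+                          grace: float = GRACE_SECONDS) -> int:
+    """SIGTERM the child's process group, then SIGKILL it if `grace` elapses."""
+    # start_new_session=True makes the child a session and group leader, so its
+    # pid is the pgid and killpg reaches grandchildren in the same group.
+    pgid = proc.pid
+    try:
+        os.killpg(pgid, signal.SIGTERM)
+    except ProcessLookupError:
+        pass  # the whole group is already gone
+    try:
+        return await asyncio.wait_for(proc.wait(), grace)
+    except asyncio.TimeoutError:
+        try:
+            os.killpg(pgid, signal.SIGKILL)
+        except ProcessLookupError:
+            pass
+        return await proc.wait()
+```
+
+Waiting on `proc` tracks only the group leader. If the leader exits on SIGTERM while a grandchild ignores it, this sketch returns without escalating. A stricter version would send a final best-effort `killpg(pgid, SIGKILL)` after the wait.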
+
+### Frontend: abort control, aborted state, and reconnect safety (I1)
+
+- [X] T024 [US4] In `frontend/src/views/ensemble/useEnsembleRun.ts`, add `'aborted'` to the `status` union, keep the `EventSource` handle returned by `connectSSE`, and add `abort()` that calls `es.close()` and sets `status = 'aborted'` (valid only from `'running'`). (Sequential with T008.)
+- [X] T025 [US4] **(I1 — reconnect must not restart a metered run)** In `frontend/src/api/sse.ts` and `useEnsembleRun.ts`, treat an `onerror` while `status === 'running'` as **terminal**: call `es.close()` (preventing EventSource's automatic reconnect, which would re-issue the GET and start the run again) and set `status = 'aborted'` with a "connection lost — run stopped" note. A network drop thus behaves identically to an explicit abort (FR-013) and never silently restarts the run.
+- [X] T026 [US4] In `frontend/src/views/ensemble/EnsembleExtract.vue` (and `EnsembleBundle.vue`, `EnsembleSynthesize.vue`), add an **Abort** button shown while `status === 'running'`, wired to `abort()`, plus an `aborted` status label distinct from Done/Error (FR-009), and surface the "connection lost" note from T025.
+
+### Tests for US4
+
+- [X] T027 [P] [US4] In `tests/test_subprocess_abort.py`: process-group kill test — start the long child-with-grandchild fixture, then (a) cancel the generator to simulate **explicit abort** and (b) drop the connection to simulate **disconnect**; in BOTH cases assert child AND grandchild PIDs are gone within grace+ε (FR-008, FR-013, R1; covers U2's disconnect path).
+- [X] T028 [P] [US4] In `tests/test_subprocess_abort.py`: grace→force timing test — a child that ignores SIGTERM is SIGKILLed within ~GRACE seconds and the run record records `result: aborted` (FR-008, SC-005, R5).
+- [X] T029 [P] [US4] In `tests/test_subprocess_abort.py`: atomicity + lock test — kill a writer mid-`atomic_write_json`/`atomic_write_text` and assert the destination is either absent or a complete valid file, never truncated (FR-014); and assert `_RUNNING` is released after an aborted run (FR-010 resumability precondition).
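+
+T029's "kill a writer mid-write" can be staged without touching `campaignlib`. A throwaway child writes half a JSON document to a same-directory temp file, signals readiness, and is SIGKILLed before its `os.replace`. The inline `WRITER` script and the `kill_mid_write` helper are hypothetical test scaffolding.
+
+```python
+import signal
+import subprocess
+import sys
+from pathlib import Path
+
+# Child: write a partial document to a temp file, then stall before os.replace.
+WRITER = r'''
+import os, sys, tempfile, time
+dest = sys.argv[1]
+fd, tmp = tempfile.mkstemp(dir=os.path.dirname(dest), suffix=".tmp")
+with os.fdopen(fd, "w") as fh:
+    fh.write('{"partial": ')
+print("wrote-temp", flush=True)
+time.sleep(60)
+os.replace(tmp, dest)
+'''
+
+
+def kill_mid_write(dest: Path) -> None:
+    """SIGKILL a writer after its temp file exists but before it is published."""
+    proc = subprocess.Popen([sys.executable, "-c", WRITER, str(dest)],
+                            stdout=subprocess.PIPE, text=True)
+    proc.stdout.readline()  # blocks until the temp file holds the partial write
+    proc.send_signal(signal.SIGKILL)
+    proc.wait()
+    proc.stdout.close()
+```
+
+After the kill, the destination is either absent (fresh write) or byte-for-byte the previous version (overwrite). The stranded `.tmp` file is the only debris, and it is never at the trusted path.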
+
+**Checkpoint**: Abort + disconnect stop the whole tree within seconds, never restart it, completed work survives, cache never corrupts.
+
+---
+
+## Phase 7: Polish & Cross-Cutting Concerns
+
+- [X] T030 [P] Update `docs/web/web_ui.md` (ensemble page: copyable command, abort button, aborted + "connection lost" states) and add an "Observability & abort" note to `docs/cli/ensemble_workflow.md` (disconnect = implicit abort; reconnect does not restart; per-run logs under `logs/`).
+- [X] T031 **(I2 regression)** Add a regression test (e.g. in `tests/test_subprocess_abort.py` or a sibling) that a **non-ensemble** SSE route's run (a `grounding.py`-style invocation through `stream_subprocess`) is also group-killed on disconnect and leaves no orphan — proving the shared-seam change is safe app-wide, not just for ensemble.
+- [X] T032 Run `python -m pytest tests/` — confirm green, especially `tests/test_retrieve_render_isolation.py` (no retrieval/render mixing introduced) and existing `tests/test_ensemble_*.py` (no regression).
+- [ ] T033 Execute `quickstart.md` Scenarios A–E manually against a real campaign workspace; confirm SC-001…SC-006 (Scenario E now also asserts no auto-restart after the tab is closed / network dropped).
+
+---
+
+## Dependencies & Execution Order
+
+### Phase dependencies
+
+- **Setup (Phase 1)**: no deps. T001 unblocks T018/T019; T002 unblocks all test tasks.
+- **Foundational (Phase 2)**: T003 → T004; T005 is doc-only. Blocks US3 result assertions (T017) and US4 record/aborted classification (T021/T022). Does NOT block US1/US2.
+- **US1 (Phase 3)**, **US2 (Phase 4)**, **US3 (Phase 5)**, **US4 (Phase 6)**: each depends only on Setup (+ Foundational for US3/US4). US1/US2 can start right after Setup.
+- **Polish (Phase 7)**: after the stories you intend to ship; T031 depends on T020–T021.
+
+### Story independence
+
+- **US1**: T006 (seam) → T007/T008/T010 (frontend) → T011/T012 (tests); T009 [P]. Independently shippable MVP.
+- **US2**: verification (T013/T014); independent of US1.
+- **US3**: depends on Foundational (T003/T004) for the record's `result`; otherwise independent.
+- **US4**: atomic-writes (T018/T019) independent + parallel; termination (T020→T021→T022→T023) sequential in `subprocess_runner.py`; frontend (T024→T025→T026) sequential with T008.
+
+### Critical sequential chains
+
+- `subprocess_runner.py`: T003/T004/T005 → T006 → T020 → T021 → T022 (same file; one editor at a time).
+- `useEnsembleRun.ts`: T008 → T024 → T025 (same file).
+- `sse.ts`: T007 → T025 (same file).
+
+### Parallel opportunities
+
+- T001 ∥ T002 (Setup).
+- T018 ∥ T019 (different engine files) — and both ∥ the seam/frontend US4 work.
+- T011 ∥ T012, and T027 ∥ T028 ∥ T029 (independent test functions; coordinate edits to the shared test file or write sequentially).
+- Across stories: once Setup is done, US1 and US2 can proceed in parallel with the US4 atomic-write tasks (different files).
+
+---
+
+## Parallel Example: User Story 4 engine vs frontend
+
+```bash
+# Engine atomic-write hardening (independent files):
+Task: "T018 atomic per-chapter merged.json write in ensemble_merge.py"
+Task: "T019 atomic write_dossier in facts_to_state.py"
+
+# Meanwhile, frontend abort + reconnect-safety (sequential within their files):
+Task: "T024 abort()/aborted status in useEnsembleRun.ts"
+Task: "T025 reconnect-as-abort in sse.ts + useEnsembleRun.ts"
+Task: "T026 Abort button + connection-lost note in EnsembleExtract/Bundle/Synthesize"
+# NOTE: termination tasks T020–T023 are sequential in subprocess_runner.py — not parallel.
+```
+
+---
+
+## Implementation Strategy
+
+### MVP first (User Story 1 only)
+
+1. Phase 1 Setup (T001–T002).
+2. Phase 3 US1 (T006–T012) — copyable, reproducible, secret-free, explicit-selection-faithful command.
+3. **STOP & VALIDATE** quickstart Scenario A. Ship: the escape-hatch (Principle IX) is delivered.
+
+### Incremental delivery
+
+1. Setup → US1 (MVP) → US2 (live progress, all stages) → US3 (durable record + Foundational) → **US4 (abort + reconnect safety)** — the largest, highest-risk slice last, fully test-covered.
+2. Each story is independently testable per its quickstart scenario.
+
+### Notes
+
+- [P] = different files, no incomplete-task dependency.
+- The two genuinely new capabilities are US4's abort/disconnect termination (with reconnect safety, I1) and FR-014 atomic writes; US1–US3 mostly harden existing plumbing — sequence accordingly and don't over-invest in US2.
+- The termination change is app-wide (shared seam); T005 records the decision and T031 proves it safe for non-ensemble routes (I2).
+- Commit after each task or logical group; keep `tests/test_retrieve_render_isolation.py` green throughout.
diff --git a/tests/test_subprocess_abort.py b/tests/test_subprocess_abort.py
new file mode 100644
index 0000000..150dac6
--- /dev/null
+++ b/tests/test_subprocess_abort.py
@@ -0,0 +1,427 @@
+"""Tests for stream_subprocess: group-kill, grace→force, atomicity, secret-safety.
+
+Tests are added per-phase by task ID (T011, T012, T017, T027–T029, T031).
+Fixtures live here and are shared across all test functions.
+"""
+
+import asyncio
+import json
+import os
+import sys
+from pathlib import Path
+
+import pytest
+
+# Repo root on sys.path so server.* imports work from any working directory.
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
+
+
+# ---------------------------------------------------------------------------
+# Fixtures
+# ---------------------------------------------------------------------------
+
+@pytest.fixture
+def tmp_workspace(tmp_path: Path) -> Path:
+    """Minimal campaign workspace directory (cwd for subprocesses)."""
+    return tmp_path
+
+
+@pytest.fixture
+def long_running_script(tmp_path: Path) -> Path:
+    """Script that loops and also spawns a grandchild subprocess.
+
+    Used to verify that process-group kill terminates the full tree (child +
+    grandchild), not just the direct child. The grandchild also receives this
+    script's path as an argument, so `pgrep -f <script path>` matches it too.
+    """
+    script = tmp_path / "long_runner.py"
+    script.write_text(
+        "import subprocess, sys, time\n"
+        "grandchild = subprocess.Popen(\n"
+        "    [sys.executable, '-c',\n"
+        "     'import time\\nwhile True: time.sleep(0.05)', __file__])\n"
+        "try:\n"
+        "    while True:\n"
+        "        print('tick', flush=True)\n"
+        "        time.sleep(0.05)\n"
+        "finally:\n"
+        "    grandchild.wait()\n",
+        encoding="utf-8",
+    )
+    return script
+
+
+@pytest.fixture
+def sigterm_ignorer_script(tmp_path: Path) -> Path:
+    """Script that catches and ignores SIGTERM; only SIGKILL stops it.
+
+    Used to verify grace→force escalation: SIGTERM is sent, then after the
+    grace window SIGKILL fires and the process actually exits.
+    """
+    script = tmp_path / "sigterm_ignorer.py"
+    script.write_text(
+        "import signal, time\n"
+        "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
+        "while True:\n"
+        "    print('alive', flush=True)\n"
+        "    time.sleep(0.05)\n",
+        encoding="utf-8",
+    )
+    return script
+
+
+# ---------------------------------------------------------------------------
+# SSE drain helpers
+# ---------------------------------------------------------------------------
+
+async def _collect(gen) -> list[str]:
+    """Drain an async generator and return all yielded SSE strings."""
+    events: list[str] = []
+    async for chunk in gen:
+        events.append(chunk)
+    return events
+
+
+async def _drain_n_then_close(gen, n: int) -> list[str]:
+    """Collect n events then call aclose() — simulates explicit abort."""
+    events: list[str] = []
+    async for chunk in gen:
+        events.append(chunk)
+        if len(events) >= n:
+            await gen.aclose()
+            break
+    return events
+
+
+def _parse_command_event(events: list[str]) -> str | None:
+    """Return the payload of the first 'event: command' SSE event, or None."""
+    for chunk in events:
+        if chunk.startswith("event: command\n"):
+            for line in chunk.splitlines():
+                if line.startswith("data: "):
+                    return json.loads(line[6:])
+    return None
+
+
+def _parse_done_event(events: list[str]) -> dict | None:
+    """Return the parsed data of the 'event: done' SSE event, or None."""
+    for chunk in events:
+        if chunk.startswith("event: done\n"):
+            for line in chunk.splitlines():
+                if line.startswith("data: "):
+                    return json.loads(line[6:])
+    return None
+
+
+# ---------------------------------------------------------------------------
+# T011: Secret-safety + reproducibility (SC-002, FR-001, FR-003)
+# ---------------------------------------------------------------------------
+
+def test_command_event_has_no_api_key(tmp_workspace: Path,
+                                      monkeypatch: pytest.MonkeyPatch) -> None:
+    """Neither the command event nor the run record may contain an API key (SC-002)."""
+    fake_key = "sk-FAKESECRET12345"
+    cmd = [sys.executable, "-c", "import sys; print('hello'); sys.exit(0)"]
+    env_extra = {"CG_BACKEND": "openrouter", "OPENROUTER_MODEL": "test/model"}
+
+    # Put a fake key in the server environment so the subprocess inherits it,
+    # as a real key would be. monkeypatch restores os.environ afterwards.
+    monkeypatch.setenv("OPENROUTER_API_KEY", fake_key)
+    from server.subprocess_runner import stream_subprocess
+    events = asyncio.run(_collect(stream_subprocess(cmd, cwd=str(tmp_workspace),
+                                                    env_extra=env_extra)))
+
+    cmd_text = _parse_command_event(events)
+    assert cmd_text is not None, "command event must be emitted"
+    assert fake_key not in cmd_text, "API key must not appear in command event"
+    # Non-secret env vars DO appear (the operator needs them to reproduce).
+    assert "openrouter" in cmd_text
+    assert "test/model" in cmd_text
+    # The persisted run record must be equally secret-free (FR-003).
+    logs = list((tmp_workspace / "logs").glob("*.md"))
+    assert logs, "run record must be persisted"
+    assert all(fake_key not in p.read_text() for p in logs), "API key leaked into run record"
+
+
+def test_command_event_reflects_explicit_inputs(tmp_workspace: Path) -> None:
+    """The command event reflects exactly the passed arguments (FR-001)."""
+    chapters = ["docs/chapters/chapter_01.md", "docs/chapters/chapter_03.md"]
+    cmd = [sys.executable, "-c", "print('ok')"] + [
+        arg for ch in chapters for arg in ("--chapters", ch)
+    ]
+    from server.subprocess_runner import stream_subprocess
+    events = asyncio.run(_collect(stream_subprocess(cmd, cwd=str(tmp_workspace))))
+
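+    cmd_text = _parse_command_event(events)
+    assert cmd_text is not None
+    for ch in chapters:
+        assert ch in cmd_text, f"explicit chapter {ch!r} must appear in command"
+    # FR-012: the explicit selection is shown as passed, never as a glob.
+    assert "*" not in cmd_text, "explicit selection must not be shown as a wildcard"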
+
+
+# ---------------------------------------------------------------------------
+# T012: Explicit-selection faithfulness (C1, FR-012, Principle X)
+# ---------------------------------------------------------------------------
+
+def test_empty_selection_is_refused_not_expanded() -> None:
+    """Companion to the empty-selection tests in test_ensemble_chapters.py.
+
+    This references rather than re-tests those guards so that the two files
+    jointly prove FR-012 / Principle X: an empty chapter selection is refused
+    before any command is built or emitted, never silently expanded to a glob.
+    """
+    import importlib.util
+    # The guard lives in the ensemble router and chapter-picker tests; confirm
+    # the modules are importable so we can trust those tests are in scope.
+    spec_mod = importlib.util.find_spec("server.routers.ensemble")
+    assert spec_mod is not None, "ensemble router must be importable"
+
+    # The run-stream contract also states that FR-012 is enforced via
+    # sse_error_stream, not a wildcard fallback. Verify sse_error_stream exists.
+    from server.subprocess_runner import sse_error_stream
+    assert callable(sse_error_stream)
+
+
+# ---------------------------------------------------------------------------
+# T017: Durable run record — success and failure (FR-007, SC-006)
+# ---------------------------------------------------------------------------
+
+def test_save_run_log_success(tmp_workspace: Path) -> None:
+    """Success run writes a log with result=succeeded, returncode=0, full output."""
+    cmd = [sys.executable, "-c", "print('hello world')"]
+    from server.subprocess_runner import stream_subprocess
+    asyncio.run(_collect(stream_subprocess(cmd, cwd=str(tmp_workspace))))
+
+    logs = list((tmp_workspace / "logs").glob("*.md"))
+    assert len(logs) == 1, "exactly one log file per run"
+    body = logs[0].read_text()
+    assert "result: `succeeded`" in body
+    assert "returncode: `0`" in body
+    assert "hello world" in body
+    assert "duration" in body
+
+
+def test_save_run_log_failure(tmp_workspace: Path) -> None:
+    """Failure run writes a log with result=failed and a positive returncode."""
+    cmd = [sys.executable, "-c", "import sys; print('fail output'); sys.exit(42)"]
+    from server.subprocess_runner import stream_subprocess
+    asyncio.run(_collect(stream_subprocess(cmd, cwd=str(tmp_workspace))))
+
+    logs = list((tmp_workspace / "logs").glob("*.md"))
+    assert len(logs) == 1
+    body = logs[0].read_text()
+    assert "result: `failed`" in body
+    assert "returncode: `42`" in body
+    assert "fail output" in body
+
+
+def test_save_run_log_abort(tmp_workspace: Path, long_running_script: Path) -> None:
+    """Aborted run writes a log with result=aborted (rc is None or negative)."""
+    from server.subprocess_runner import stream_subprocess
+
+    async def _abort_after_one():
+        gen = stream_subprocess([sys.executable, str(long_running_script)],
+                                cwd=str(tmp_workspace))
+        count = 0
+        async for _ in gen:
+            count += 1
+            if count >= 3:
+                await gen.aclose()
+                break
+
+    asyncio.run(_abort_after_one())
+
+    logs = list((tmp_workspace / "logs").glob("*.md"))
+    assert len(logs) == 1
+    body = logs[0].read_text()
+    assert "result: `aborted`" in body
+
+
+# ---------------------------------------------------------------------------
+# T027: Process-group kill — explicit abort and disconnect (FR-008, FR-013)
+# ---------------------------------------------------------------------------
+
+def test_process_group_killed_on_abort(tmp_workspace: Path,
+                                        long_running_script: Path) -> None:
+    """On abort (aclose), child AND grandchild PIDs are gone within a short window."""
+    import subprocess as _sp
+    import time
+    from server.subprocess_runner import stream_subprocess, GRACE_SECONDS
+
+    async def _run_and_abort():
+        gen = stream_subprocess([sys.executable, str(long_running_script)],
+                                cwd=str(tmp_workspace))
+        count = 0
+        async for _ in gen:
+            count += 1
+            if count >= 3:
+                await gen.aclose()
+                break
+        # Keep event loop alive so any call_later callbacks can fire.
+        await asyncio.sleep(0.5)
+
+    asyncio.run(_run_and_abort())
+
+    # Poll until nothing matches the full (per-test) script path, allowing
+    # GRACE_SECONDS plus a small margin for the SIGKILL escalation.
+    pattern = str(long_running_script)
+    deadline = time.monotonic() + GRACE_SECONDS + 2.0
+    r = _sp.run(["pgrep", "-f", pattern], capture_output=True)
+    while r.returncode == 0 and time.monotonic() < deadline:
+        time.sleep(0.1)
+        r = _sp.run(["pgrep", "-f", pattern], capture_output=True)
+    assert r.returncode != 0, (
+        f"Orphaned processes still running after abort + grace window:\n"
+        f"{r.stdout.decode()}"
+    )
+
+
+# ---------------------------------------------------------------------------
+# T028: Grace→force timing — SIGTERM-ignoring child is SIGKILLed (FR-008, SC-005)
+# ---------------------------------------------------------------------------
+
+def test_sigterm_ignorer_killed_within_grace(tmp_workspace: Path,
+                                              sigterm_ignorer_script: Path) -> None:
+    """A child that ignores SIGTERM must be SIGKILLed within GRACE_SECONDS."""
+    import time
+    from server.subprocess_runner import stream_subprocess, GRACE_SECONDS
+
+    async def _run_abort_and_wait():
+        gen = stream_subprocess([sys.executable, str(sigterm_ignorer_script)],
+                                cwd=str(tmp_workspace))
+        count = 0
+        async for _ in gen:
+            count += 1
+            if count >= 3:
+                await gen.aclose()
+                break
+        # Keep the event loop alive so call_later(GRACE_SECONDS, SIGKILL) can fire.
+        # (asyncio.run() closes the loop immediately when the coro returns, which
+        # would discard any pending call_later callbacks before they fire.)
+        await asyncio.sleep(GRACE_SECONDS + 1.0)
+
+    t0 = time.monotonic()
+    asyncio.run(_run_abort_and_wait())
+    elapsed = time.monotonic() - t0
+
+    # The run log should show aborted
+    logs = list((tmp_workspace / "logs").glob("*.md"))
+    assert len(logs) == 1
+    body = logs[0].read_text()
+    assert "result: `aborted`" in body
+
+    # And no process should still be running
+    import subprocess as _sp
+    r = _sp.run(["pgrep", "-f", sigterm_ignorer_script.name], capture_output=True)
+    assert r.returncode != 0, (
+        f"SIGTERM-ignoring process still alive {elapsed:.1f}s after abort:\n"
+        f"{r.stdout.decode()}"
+    )
+
+
+# ---------------------------------------------------------------------------
+# T029: Atomicity + lock release (FR-014, FR-010)
+# ---------------------------------------------------------------------------
+
+def test_atomic_write_json_no_truncation(tmp_workspace: Path) -> None:
+    """atomic_write_json publishes a complete, valid JSON file at the destination."""
+    from campaignlib import atomic_write_json
+
+    dest = tmp_workspace / "merged.json"
+    data = {"facts": list(range(100)), "text": "x" * 10_000}
+
+    # Normal write succeeds and is valid JSON.
+    atomic_write_json(dest, data)
+    assert dest.exists()
+    loaded = json.loads(dest.read_text())
+    assert loaded == data
+
+
+def test_atomic_write_text_no_truncation(tmp_workspace: Path) -> None:
+    """atomic_write_text publishes the complete text at the destination path."""
+    from campaignlib import atomic_write_text
+
+    dest = tmp_workspace / "dossier.md"
+    content = "# Dossier\n" + "x" * 50_000
+
+    atomic_write_text(dest, content)
+    assert dest.exists()
+    assert dest.read_text() == content
+
+
+def test_lock_released_after_abort(tmp_workspace: Path, long_running_script: Path) -> None:
+    """_RUNNING lock must be released after an aborted run (FR-010 resumability)."""
+    from server.routers.ensemble import _RUNNING
+    from server.subprocess_runner import stream_subprocess
+
+    stage = "test_abort_lock"
+    key = f"{tmp_workspace.resolve()}::{stage}"
+
+    completions: list[object] = []
+
+    def _on_complete(rc):
+        # Mirrors _run_locked's _release: record the call, then free the lock.
+        completions.append(rc)
+        _RUNNING.discard(key)
+
+    _RUNNING.add(key)
+
+    async def _run_and_abort():
+        gen = stream_subprocess(
+            [sys.executable, str(long_running_script)],
+            cwd=str(tmp_workspace),
+            on_complete=_on_complete,
+        )
+        count = 0
+        async for _ in gen:
+            count += 1
+            if count >= 2:
+                await gen.aclose()
+                break
+
+    asyncio.run(_run_and_abort())
+    assert len(completions) == 1, "on_complete must fire exactly once"
+    assert key not in _RUNNING, "_RUNNING key must be released after abort"
+
+
+# ---------------------------------------------------------------------------
+# T031: Non-ensemble SSE route regression (I2 shared-seam blast radius)
+# ---------------------------------------------------------------------------
+
+def test_non_ensemble_route_killed_on_abort(tmp_workspace: Path,
+                                             long_running_script: Path) -> None:
+    """A grounding-style stream_subprocess call is also group-killed on abort.
+
+    Proves the shared-seam change (start_new_session + finally kill) is safe
+    app-wide, not just for ensemble routes (I2 regression).
+    """
+    import subprocess as _sp
+    import time
+    from server.subprocess_runner import stream_subprocess, GRACE_SECONDS
+
+    # Simulate a non-ensemble route (grounding.py pattern): call
+    # stream_subprocess directly with no ensemble-specific env.
+    async def _grounding_style_run():
+        gen = stream_subprocess(
+            [sys.executable, str(long_running_script)],
+            cwd=str(tmp_workspace),
+        )
+        count = 0
+        async for _ in gen:
+            count += 1
+            if count >= 3:
+                await gen.aclose()
+                break
+        await asyncio.sleep(0.5)  # keep loop alive for call_later
+
+    asyncio.run(_grounding_style_run())
+
+    # Poll up to the grace window (plus margin) instead of a fixed sleep.
+    pattern = str(long_running_script)
+    deadline = time.monotonic() + GRACE_SECONDS + 2.0
+    r = _sp.run(["pgrep", "-f", pattern], capture_output=True)
+    while r.returncode == 0 and time.monotonic() < deadline:
+        time.sleep(0.1)
+        r = _sp.run(["pgrep", "-f", pattern], capture_output=True)
+    assert r.returncode != 0, (
+        f"Non-ensemble subprocess still orphaned after disconnect:\n"
+        f"{r.stdout.decode()}"
+    )