From e874ce1ebc4f35825714f7a39c07717e1e92dfc2 Mon Sep 17 00:00:00 2001
From: Kostadis
Date: Sat, 27 Jun 2026 15:20:24 -0700
Subject: [PATCH 1/3] Add CampaignGenerator constitution v1.1.0
Sister doctrine to the mneme constitution, governing CG's LLM rendering
pipeline. Nine principles, each naming the anti-pattern it kills:
I. Disk is Truth, the Model is a Draft (Optimistic Lies)
II. The Human Checkpoint is Non-Negotiable (Error Compounding)
III. Retrieval and Render are Separated (Renderer Scope Decisions)
IV. Verbatim is Sacred (Hallucinated Dialogue)
V. One Seam per Boundary (Fragmented Integration)
VI. CLI is the Engine, UI is a Face (Split-Brain)
VII. Extract Once, Synthesize Deliberately (Depth Regression)
VIII. State is Discoverable (Opacity / Tribal State)
IX. The UI Mechanizes; Claude Converses (The Walled Garden)
Plus Architecture is Destiny (token/precision economics), Authority &
the Human Checkpoint (Spec Kit plans are drafts), and Governance
(I & II outrank all; semver amendments).
Co-Authored-By: Claude Opus 4.8 (1M context)
---
.specify/memory/constitution.md | 122 ++++++++++++++++++++++++++++++++
1 file changed, 122 insertions(+)
create mode 100644 .specify/memory/constitution.md
diff --git a/.specify/memory/constitution.md b/.specify/memory/constitution.md
new file mode 100644
index 0000000..fd92d99
--- /dev/null
+++ b/.specify/memory/constitution.md
@@ -0,0 +1,122 @@
+# CampaignGenerator Constitution
+
+This is a **sister doctrine** to the [mneme constitution](https://github.com/kostadis/mneme/blob/main/.specify/memory/constitution.md). Both descend from the Kostadis architectural doctrine; both name the anti-pattern each principle kills. The division of labor between them:
+
+- **mneme** governs the *platform's state* — identity, databases, reconciliation, the DGX integration plane. Its enemy is corrupted or fragmented infrastructure state.
+- **This constitution** governs *CampaignGenerator's pipeline* — how an LLM is used to render trustworthy campaign artifacts. Its enemy is the precision failure that breaks the fourth wall at the table.
+
+Where the two overlap (Optimistic Lies, Split-Brain, Fragmented State), the anti-pattern names are shared deliberately. CampaignGenerator is one actor in a larger flow that also includes Zoom, gm-assist, MemPalace, the Anthropic API, and a set of Claude skills that live outside this repo. This constitution binds general principles to CG via concrete clauses; a principle without a clause that names a file, a test, or a workspace path is aspiration, not law.
+
+## Core Principles
+
+### I. Disk is Truth, the Model is a Draft
+
+Markdown and YAML files on disk are the single source of truth. Every database in the system — the MemPalace palace, the vector DB behind it, `quote_ledger.db`, the rpg-library index — is an *index over* or *cache of* that truth, never the truth itself. A database may be deleted and rebuilt from disk; disk may never be rebuilt from a database.
+
+LLM output is a **draft** until a human has reviewed it. Generated text is not fact, not canon, and not input to the next step until a human has read it and let it through. The rough extraction pass is the ceiling of what the model can do unaided, not the floor.
+
+*Kills: Optimistic Lies* — treating a confident-looking generated artifact as established fact.
+
+### II. The Human Checkpoint is Non-Negotiable
+
+LLMs render; humans decide. Scope (what belongs where), ordering (what came before what), and attribution (who said or did what) are **precision decisions** and they require a human checkpoint. No LLM output may feed another LLM call across a precision boundary without a human gate in between.
+
+Before any LLM call is added, state what decision it removes from the human. If the answer is "a precision decision, fed automatically downstream," a human checkpoint is mandatory before the next call. If the answer is "none — the human reviews and corrects before it feeds anything," the call is safe.
+
+*Kills: Error Compounding* — one call's silent 10% error inherited and amplified by the next.
+
+### III. Retrieval and Render are Separated
+
+A function retrieves or it renders — never both. This is enforced by `tests/test_retrieve_render_isolation.py`, which fails the build if any function body mixes a retrieval call (`retrieve`, `search_hierarchical`, `rpg_search`) with a render call (`stream_api`, `call_api`). Do not bypass the test; fix the structure.
+
+Render pipelines (`prep.py`, `sd_narrate.py`, `planning.py`) refuse to run unless a human has approved `docs/dossier_proposal.md`. The choke point is `proposal_loader.py:require_approved_proposal`. Deciding *what content is in scope* is the human's; turning approved scope into prose is the model's.
+
+*Kills: the Renderer Making Scope Decisions* — letting the prose pass also decide what's in the world.
+
+### IV. Verbatim is Sacred
+
+Quotes and transcript records are reproduced exactly, never paraphrased and never invented. The Zoom VTT is the only record of "what was said" at the table; gm-assist is the authoritative record of "what happened in what order." Neither may be embellished by a model that can see past its boundary.
+
+The cost of violating this is not a bad diff — it is a player at the table asking why an NPC said something it never said, or why an action that should have rippled through the world quietly disappeared. A precision failure here breaks the fourth wall. That is the most expensive failure the system can produce.
+
+*Kills: Hallucinated Dialogue* — fabricated or "improved" verbatim content.
+
+### V. One Seam per Boundary
+
+Every external dependency is reached through exactly one file, and that file is reached in one direction:
+
+- Anthropic API → `campaignlib.py` (the only module that imports `anthropic`; `make_client` / `stream_api` / `call_api` are the surface, and they already retry)
+- MemPalace → `mempalace_client.py`
+- DGX / local LLM per-model behavior → `dgxlib`
+- CampaignGenerator capability exposed *outward* to other Claude sessions → `mcp_server.py`
+
+When you need to change how CG talks to X, there must be exactly one file to open. New integration code that scatters `import anthropic` or talks to MemPalace outside its client is a constitutional violation, not a style nit.
+
+*Kills: Fragmented Integration* — the same boundary crossed from five places that drift apart.
+
+### VI. CLI is the Engine, UI is a Face
+
+Every capability is a CLI tool first. The FastAPI server never reimplements pipeline logic — it shells out to CLI scripts via `server/subprocess_runner.py` and streams their output as Server-Sent Events. Fixing a bug in a script fixes it in the UI; exposing a flag means adding it to the corresponding `_build_*_cmd()` in the router, never reimplementing the behavior in the router.
+
+*Kills: Split-Brain* — CLI and UI growing two divergent implementations of the same operation.
+
+### VII. Extract Once, Synthesize Deliberately
+
+The grounding-doc generators follow one shape: chunk the input, extract per chunk, cache the extractions on disk, then synthesize one document from the pile (`run_extract_pipeline` + `run_synthesize_pipeline` in `campaignlib.py`). Re-runs reuse cached extractions.
+
+Do not collapse passes that each need depth. The killed chapter-extract consolidation is the cautionary tale: merging three extract passes into one per-chapter pass regressed all three grounding docs, because breadth in one pass came at the cost of depth in each. Prefer more, narrower passes over one wide pass that does each job worse.
+
+*Kills: Depth Regression* — premature consolidation that trades per-job depth for fewer calls.
+
+### VIII. State is Discoverable
+
+The campaign workspace is self-describing. Which pipeline stage a session is in, what artifacts exist, what is still pending — all of it is discoverable from disk (the `summaries/{session}/` layout, the presence or absence of each stage's output file), not held in the operator's memory or in a skill's head. A question the system surfaces ("this scene has no approved quotes yet") matters as much as an answer it gives.
+
+When the flow falls back to a skill or a manual step, that seam should be *visible* — an artifact on disk or a state the UI can represent — not tribal knowledge about which command to run next.
+
+*Kills: Opacity / Tribal State* — the system's real status living only in the operator's head.
+
+### IX. The UI Mechanizes; Claude Converses
+
+UI workflows exist to make the *mechanical* parts of a pipeline easier — to walk a multi-step process one step at a time, run each step, and show what came out. They do **not** replace the Claude chat interface, and they are not the place where the thinking happens. The judgment between steps — reviewing a draft, deciding scope, correcting an attribution, choosing what to promote — happens in a Claude conversation or at the CLI. The UI's job is to remove the friction of *remembering and invoking* the steps in order, never to absorb the work that happens between them.
+
+The expectation is explicit: between any two UI steps, the operator may drop to the CLI or to a Claude chat to do the real work, and lose nothing by doing so. A UI step that cannot be performed equivalently at the CLI is a step that has stolen judgment from the human.
+
+Files are the interchange. Every step reads files and writes files; the file on disk is how information passes between the UI, the CLI, and the chat, and how all three stay consistent. The UI must never hold pipeline state that exists only in the browser — if a step produced something, it produced a file, and that file is equally visible to the CLI and to Claude. (This is Principles I, VI, and VIII applied to the UI surface: the file is the truth, the CLI is the engine, and the state is discoverable — so the human is never trapped inside the UI.)
+
+The ensemble grounding-doc workflow is the canonical shape: the UI may step you through Stage 1 → 2 → 3, but the `--list` scope review, the `aliases.json` edit, and the `diff`-before-promote happen at the CLI or in chat, and every stage hands off through a file (`merged.json`, `state_dossiers/*.md`, `*_draft.md`). The UI mechanizes the sequence; it does not synthesize the campaign.
+
+*Kills: The Walled Garden* — a UI that swallows the whole workflow, hides the files, and locks the human out of the conversation and the CLI.
+
+## Architecture is Destiny
+
+Bad architectural choices are liabilities, and in this system the currency is twofold: **token spend** and **precision failures at the table**.
+
+- **Token spend** is standing cost. Every LLM call must justify itself; the ensemble/Spark path exists precisely so that *extraction* can be made ~free locally and the API is spent only on *synthesis*. Caching (the scene-extract system-prefix cache, the enhance-summary cached prefix, the Batch API at 50% off) is not an optimization to add later — it is how the architecture stays affordable.
+- **Precision failures** are the catastrophic cost. A token wasted is recoverable; a fabricated quote that reaches the table is not. This is why Principles I–IV exist and why they outrank convenience. The human checkpoint is not friction the architecture should engineer away — it is the load-bearing wall.
+
+Every new database, daemon, cache, or LLM call is a recurring tax. Justify the tax against the truth on disk and the human gate, or do not add it.
+
+## Authority & the Human Checkpoint
+
+Humans author structure, identity, and schema. The LLM — including Spec Kit itself — renders within that boundary; it never decides it.
+
+- Spec Kit `/speckit-*` plans, specs, and tasks are **drafts**. They are reviewed against this constitution before they feed implementation.
+- A generated spec that decides scope, ordering, or attribution autonomously is exactly the precision-decision-without-a-checkpoint that Principle II forbids — catch it at review.
+- Good pattern: LLM extracts → human reviews and imposes structure → LLM renders inside that structure. Bad pattern: LLM extracts → LLM structures → LLM renders. The second compounds errors silently and is prohibited here.
+
+## Governance
+
+This constitution supersedes conflicting specs, plans, and tasks. A conflict requires written justification or an amendment — not a silent override.
+
+- **Principle precedence:** I (Disk is Truth) and II (The Human Checkpoint) outrank all other principles. When a convenience, a performance gain, or a cleaner abstraction collides with truth-on-disk or the human gate, truth and the gate win.
+- Every spec and plan is tested, by name, against all nine principles before implementation begins.
+- Amendments require a stated rationale, a version bump, and a check that dependent templates and docs stay in sync.
+- Semantic versioning of this document:
+ - **MAJOR** — a principle removed or redefined in a backward-incompatible way.
+ - **MINOR** — a new principle or materially expanded section.
+ - **PATCH** — clarifications, wording, non-semantic refinements.
+
+Runtime development guidance lives in `CLAUDE.md` (this repo) and `~/.claude/CLAUDE.md` (global). Where those and this constitution agree, this is the canonical statement; where they drift, amend one to match the other.
+
+**Version**: 1.1.0 | **Ratified**: 2026-06-27 | **Last Amended**: 2026-06-27
From 460ebfe8c4fefb20c8e5d9eec9ce12b583da99e2 Mon Sep 17 00:00:00 2001
From: Kostadis
Date: Sat, 27 Jun 2026 23:24:03 -0700
Subject: [PATCH 2/3] feat(ensemble): UI workflow + OpenRouter backend +
explicit chapter selection
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Turn the ensemble grounding-doc CLI workflow (docs/cli/ensemble_workflow.md)
into a stepped UI page (Setup → Extract → Bundle → Synthesize), add OpenRouter
as a per-stage LLM backend through the single campaignlib seam, and make chapter
selection explicit. The existing Anthropic /grounding path is untouched.
Built via the Spec Kit flow (specs/001-ensemble-workflow-ui: spec/plan/research/
tasks/contracts/quickstart).
Backend / seam (Constitution V):
- campaignlib/api: _OpenRouterClient branch in make_client(backend="openrouter");
reasoning-off mapping; _require_nonempty guards stream_api/call_api against
empty model output. OpenRouter is constructed only inside campaignlib/api.
- Uniform add_backend_args / client_from_args; --backend/--endpoint added to the
four synthesis scripts. Default backend=anthropic is byte-identical to before.
UI (Constitution VI/IX):
- New /ensemble route + nav entry; EnsembleWorkflow shell with disk-derived
status; Setup, Extract, Bundle (scope + alias gates), Synthesize (diff-before-
promote). server/routers/ensemble.py shells out to the CLI and exposes
disk-derived status; it issues no retrieval/render calls.
Chapter picker + Constitution Principle X (operator-elevated):
- "Selection is Explicit; There is No Silent 'All'" added to the constitution
(v1.1.0 -> 1.2.0, MINOR). New ChapterPicker.vue (Resolve glob, Select all /
none / only, natural sort, extracted/pending badges).
- ensemble_batch.py --chapters is now nargs="+" (unions globs/paths); the engine
gains the capability, the UI mechanizes it.
- chapters_selected stores the literal chosen set; an empty selection is refused
(no glob fallback) by GET /run/extract and the disabled Run button.
Tests: +new suites (openrouter seam, ensemble status/gates/chapters, batch nargs).
Full suite 940 passed; isolation guard green; frontend builds clean.
Co-Authored-By: Claude Opus 4.8 (1M context)
---
.specify/extensions.yml | 23 +
.specify/extensions/.registry | 19 +
.specify/extensions/agent-context/README.md | 66 ++
.../agent-context/agent-context-config.yml | 5 +
.../commands/speckit.agent-context.update.md | 27 +
.../extensions/agent-context/extension.yml | 34 +
.../scripts/bash/update-agent-context.sh | 337 ++++++++++
.../powershell/update-agent-context.ps1 | 417 ++++++++++++
.specify/feature.json | 3 +
.specify/init-options.json | 9 +
.specify/integration.json | 15 +
.specify/integrations/claude.manifest.json | 17 +
.specify/integrations/speckit.manifest.json | 17 +
.specify/memory/constitution.md | 16 +-
.specify/scripts/bash/check-prerequisites.sh | 189 ++++++
.specify/scripts/bash/common.sh | 619 ++++++++++++++++++
.specify/scripts/bash/create-new-feature.sh | 299 +++++++++
.specify/scripts/bash/setup-plan.sh | 84 +++
.specify/scripts/bash/setup-tasks.sh | 91 +++
.specify/templates/checklist-template.md | 40 ++
.specify/templates/constitution-template.md | 50 ++
.specify/templates/plan-template.md | 113 ++++
.specify/templates/spec-template.md | 131 ++++
.specify/templates/tasks-template.md | 252 +++++++
.specify/workflows/speckit/workflow.yml | 77 +++
.specify/workflows/workflow-registry.json | 13 +
CLAUDE.md | 15 +
campaign_state.py | 7 +-
campaignlib/__init__.py | 7 +-
campaignlib/api/backends.py | 72 ++
campaignlib/api/client.py | 69 +-
docs/cli/ensemble_workflow.md | 8 +
ensemble_batch.py | 15 +-
frontend/src/components/layout/AppSidebar.vue | 6 +
frontend/src/router.ts | 27 +
frontend/src/views/EnsembleWorkflow.vue | 75 +++
frontend/src/views/ensemble/ChapterPicker.vue | 154 +++++
.../src/views/ensemble/EnsembleBundle.vue | 156 +++++
.../src/views/ensemble/EnsembleExtract.vue | 82 +++
frontend/src/views/ensemble/EnsembleSetup.vue | 112 ++++
.../src/views/ensemble/EnsembleSynthesize.vue | 99 +++
frontend/src/views/ensemble/useEnsembleRun.ts | 85 +++
party.py | 7 +-
planning.py | 7 +-
server/config_models.py | 39 ++
server/main.py | 3 +-
server/routers/ensemble.py | 432 ++++++++++++
.../checklists/requirements.md | 44 ++
.../001-ensemble-workflow-ui/contracts/api.md | 83 +++
.../001-ensemble-workflow-ui/contracts/cli.md | 69 ++
specs/001-ensemble-workflow-ui/data-model.md | 125 ++++
specs/001-ensemble-workflow-ui/plan.md | 126 ++++
specs/001-ensemble-workflow-ui/quickstart.md | 104 +++
specs/001-ensemble-workflow-ui/research.md | 105 +++
specs/001-ensemble-workflow-ui/spec.md | 161 +++++
specs/001-ensemble-workflow-ui/tasks.md | 286 ++++++++
synthesise_world_state.py | 10 +-
tests/test_ensemble_batch_chapters.py | 16 +
tests/test_ensemble_chapters.py | 81 +++
tests/test_ensemble_gates.py | 62 ++
tests/test_ensemble_status.py | 34 +
tests/test_openrouter_seam.py | 129 ++++
62 files changed, 5851 insertions(+), 24 deletions(-)
create mode 100644 .specify/extensions.yml
create mode 100644 .specify/extensions/.registry
create mode 100644 .specify/extensions/agent-context/README.md
create mode 100644 .specify/extensions/agent-context/agent-context-config.yml
create mode 100644 .specify/extensions/agent-context/commands/speckit.agent-context.update.md
create mode 100644 .specify/extensions/agent-context/extension.yml
create mode 100755 .specify/extensions/agent-context/scripts/bash/update-agent-context.sh
create mode 100644 .specify/extensions/agent-context/scripts/powershell/update-agent-context.ps1
create mode 100644 .specify/feature.json
create mode 100644 .specify/init-options.json
create mode 100644 .specify/integration.json
create mode 100644 .specify/integrations/claude.manifest.json
create mode 100644 .specify/integrations/speckit.manifest.json
create mode 100755 .specify/scripts/bash/check-prerequisites.sh
create mode 100755 .specify/scripts/bash/common.sh
create mode 100755 .specify/scripts/bash/create-new-feature.sh
create mode 100755 .specify/scripts/bash/setup-plan.sh
create mode 100755 .specify/scripts/bash/setup-tasks.sh
create mode 100644 .specify/templates/checklist-template.md
create mode 100644 .specify/templates/constitution-template.md
create mode 100644 .specify/templates/plan-template.md
create mode 100644 .specify/templates/spec-template.md
create mode 100644 .specify/templates/tasks-template.md
create mode 100644 .specify/workflows/speckit/workflow.yml
create mode 100644 .specify/workflows/workflow-registry.json
create mode 100644 frontend/src/views/EnsembleWorkflow.vue
create mode 100644 frontend/src/views/ensemble/ChapterPicker.vue
create mode 100644 frontend/src/views/ensemble/EnsembleBundle.vue
create mode 100644 frontend/src/views/ensemble/EnsembleExtract.vue
create mode 100644 frontend/src/views/ensemble/EnsembleSetup.vue
create mode 100644 frontend/src/views/ensemble/EnsembleSynthesize.vue
create mode 100644 frontend/src/views/ensemble/useEnsembleRun.ts
create mode 100644 server/routers/ensemble.py
create mode 100644 specs/001-ensemble-workflow-ui/checklists/requirements.md
create mode 100644 specs/001-ensemble-workflow-ui/contracts/api.md
create mode 100644 specs/001-ensemble-workflow-ui/contracts/cli.md
create mode 100644 specs/001-ensemble-workflow-ui/data-model.md
create mode 100644 specs/001-ensemble-workflow-ui/plan.md
create mode 100644 specs/001-ensemble-workflow-ui/quickstart.md
create mode 100644 specs/001-ensemble-workflow-ui/research.md
create mode 100644 specs/001-ensemble-workflow-ui/spec.md
create mode 100644 specs/001-ensemble-workflow-ui/tasks.md
create mode 100644 tests/test_ensemble_batch_chapters.py
create mode 100644 tests/test_ensemble_chapters.py
create mode 100644 tests/test_ensemble_gates.py
create mode 100644 tests/test_ensemble_status.py
create mode 100644 tests/test_openrouter_seam.py
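Reviewer note on chapter selection: the commit message says `ensemble_batch.py --chapters` is now `nargs="+"`, that it unions globs and paths, and that an empty selection is refused with no glob fallback. The sketch below illustrates that behavior under stated assumptions; it is not the code in `ensemble_batch.py`. `build_parser` and `resolve_chapters` are hypothetical names, and raising `ValueError` on an empty selection is an illustrative choice.

```python
import argparse
import glob


def build_parser():
    """Hypothetical parser: --chapters accepts one or more globs or paths."""
    parser = argparse.ArgumentParser(description="chapter selection sketch")
    parser.add_argument(
        "--chapters",
        nargs="+",
        default=[],
        help="one or more chapter globs or literal paths",
    )
    return parser


def resolve_chapters(patterns):
    """Union every glob and path into one de-duplicated list.

    Order is first-seen: matches of the first pattern come first, and a
    file matched again by a later pattern is not repeated. An empty
    result is refused rather than silently widened to "all".
    """
    selected = []
    seen = set()
    for pattern in patterns:
        # glob.glob returns [path] for an existing literal path.
        for match in sorted(glob.glob(pattern)):
            if match not in seen:
                seen.add(match)
                selected.append(match)
    if not selected:
        raise ValueError("no chapters selected; refusing to fall back to all")
    return selected
```

The explicit failure on an empty set is the point of the sketch: the caller has to name what it wants, which mirrors the "no silent all" rule described for Principle X.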
diff --git a/.specify/extensions.yml b/.specify/extensions.yml
new file mode 100644
index 0000000..5415714
--- /dev/null
+++ b/.specify/extensions.yml
@@ -0,0 +1,23 @@
+installed:
+- agent-context
+settings:
+ auto_execute_hooks: true
+hooks:
+ after_specify:
+ - extension: agent-context
+ command: speckit.agent-context.update
+ enabled: true
+ optional: true
+ priority: 10
+ prompt: Execute speckit.agent-context.update?
+ description: Refresh agent context after specification
+ condition: null
+ after_plan:
+ - extension: agent-context
+ command: speckit.agent-context.update
+ enabled: true
+ optional: true
+ priority: 10
+ prompt: Execute speckit.agent-context.update?
+ description: Refresh agent context after planning
+ condition: null
diff --git a/.specify/extensions/.registry b/.specify/extensions/.registry
new file mode 100644
index 0000000..db05440
--- /dev/null
+++ b/.specify/extensions/.registry
@@ -0,0 +1,19 @@
+{
+ "schema_version": "1.0",
+ "extensions": {
+ "agent-context": {
+ "version": "1.0.0",
+ "source": "local",
+ "manifest_hash": "sha256:9a1dc02d2d0139bb03860392ecacef79183be2c442feda2f9ccaa4e5907b1e47",
+ "enabled": true,
+ "priority": 10,
+ "registered_commands": {
+ "claude": [
+ "speckit.agent-context.update"
+ ]
+ },
+ "registered_skills": [],
+ "installed_at": "2026-06-27T21:48:08.109321+00:00"
+ }
+ }
+}
\ No newline at end of file
diff --git a/.specify/extensions/agent-context/README.md b/.specify/extensions/agent-context/README.md
new file mode 100644
index 0000000..091e2b4
--- /dev/null
+++ b/.specify/extensions/agent-context/README.md
@@ -0,0 +1,66 @@
+# Coding Agent Context Extension
+
+This bundled extension manages the **coding agent context/instruction file** (e.g. `CLAUDE.md`, `.github/copilot-instructions.md`, `AGENTS.md`, `GEMINI.md`, …) for the active integration.
+
+It owns the lifecycle of the managed section delimited by the configurable start/end markers (defaults: `` / ``).
+
+## Why an extension?
+
+Not every Spec Kit user wants Spec Kit to write into the coding agent's context file. Extracting this behavior into a dedicated extension lets users:
+
+- **Opt out** entirely with `specify extension disable agent-context` — Spec Kit will then never create or modify the agent context file.
+- **Customize the markers** by editing `.specify/extensions/agent-context/agent-context-config.yml` — both the Python layer and the bundled scripts honor the same `context_markers` value.
+- **Synchronize multiple agent anchors** by setting `context_files` when a project intentionally uses more than one coding agent context file, such as `AGENTS.md` and `CLAUDE.md`.
+- **Refresh on demand** with `/speckit.agent-context.update`, or automatically through the hooks declared in `extension.yml` (`after_specify`, `after_plan`).
+
+## Commands
+
+| Command | Description |
+|---------|-------------|
+| `speckit.agent-context.update` | Refresh the managed section in the agent context file with the current plan path. |
+
+## Configuration
+
+All configuration flows through the extension's own config file at
+`.specify/extensions/agent-context/agent-context-config.yml`:
+
+```yaml
+# Path to the coding agent context file managed by this extension
+context_file: CLAUDE.md
+
+# Optional list of coding agent context files to manage together.
+# When non-empty, this takes precedence over context_file.
+context_files:
+ - AGENTS.md
+ - CLAUDE.md
+
+# Delimiters for the managed Spec Kit section
+context_markers:
+ start: ""
+ end: ""
+```
+
+- `context_file` — the project-relative path to the coding agent context file, written by `specify init` and `specify integration install`.
+- `context_files` — optional project-relative paths to multiple coding agent context files. When non-empty, the list takes precedence over `context_file`. Absolute paths, backslash separators, and `..` path segments are rejected.
+- `context_markers.start` / `.end` — the delimiters around the managed section. Edit these to use custom markers.
+
+## Requirements
+
+The bundled update scripts require **Python 3** with **PyYAML** for YAML/upsert processing (PowerShell can also use `ConvertFrom-Yaml` when available).
+
+PyYAML ships with the `specify` CLI and is normally available via the same `python3` interpreter. If a hook reports *"PyYAML is required … not available in the current Python environment"*, it means the system `python3` differs from the one used to install Spec Kit. To resolve, run:
+
+```bash
+pip install pyyaml
+# or target the specific interpreter Spec Kit uses:
+/path/to/speckit-python -m pip install pyyaml
+```
+
+## Disable
+
+```bash
+specify extension disable agent-context
+```
+
+When disabled, Spec Kit skips context file creation, updates, and removal (the gates are inside `upsert_context_section()` and `remove_context_section()`).
+Disabled projects also ignore stale `context_files` values during command rendering so disabling the extension remains a complete opt-out.
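Reviewer note on the managed section: the README above describes a block delimited by configurable start and end markers that the extension creates, replaces, or appends. The following is a minimal sketch of that upsert idea, not Spec Kit's `upsert_context_section()`. The function name `upsert_managed_section` and the marker strings used in the usage note are hypothetical placeholders.

```python
def upsert_managed_section(text, start, end, body):
    """Replace the block between start and end markers, or append one.

    Assumes both markers are non-empty and that at most one managed
    block exists in the file.
    """
    if not start or not end:
        raise ValueError("both markers must be non-empty")

    block = f"{start}\n{body}\n{end}"
    start_idx = text.find(start)
    end_idx = text.find(end, start_idx + len(start)) if start_idx != -1 else -1

    if start_idx != -1 and end_idx != -1:
        # Existing block: swap it in place, keep everything around it.
        return text[:start_idx] + block + text[end_idx + len(end):]

    if not text:
        # Empty file: the block becomes the whole content.
        return block + "\n"

    # No block yet: append after a blank line.
    return text.rstrip("\n") + "\n\n" + block + "\n"
```

With placeholder markers such as `[[managed:start]]` and `[[managed:end]]`, calling the function twice with different bodies leaves a single block holding the second body, so a refresh is idempotent with respect to the rest of the file.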
diff --git a/.specify/extensions/agent-context/agent-context-config.yml b/.specify/extensions/agent-context/agent-context-config.yml
new file mode 100644
index 0000000..d55ff7c
--- /dev/null
+++ b/.specify/extensions/agent-context/agent-context-config.yml
@@ -0,0 +1,5 @@
+context_file: CLAUDE.md
+context_files: []
+context_markers:
+ start:
+ end:
diff --git a/.specify/extensions/agent-context/commands/speckit.agent-context.update.md b/.specify/extensions/agent-context/commands/speckit.agent-context.update.md
new file mode 100644
index 0000000..a654eb5
--- /dev/null
+++ b/.specify/extensions/agent-context/commands/speckit.agent-context.update.md
@@ -0,0 +1,27 @@
+---
+description: "Refresh the managed Spec Kit section in coding agent context file(s)"
+---
+
+# Update Coding Agent Context
+
+Refresh the managed Spec Kit section inside the active coding agent's context/instruction file (e.g. `CLAUDE.md`, `.github/copilot-instructions.md`, `AGENTS.md`).
+
+## Behavior
+
+The script reads the agent-context extension config at
+`.specify/extensions/agent-context/agent-context-config.yml` to discover:
+
+- `context_file` — the path of the coding agent context file to manage.
+- `context_files` — optional project-relative paths for multiple coding agent context files. When non-empty, the script updates each listed file and the list takes precedence over `context_file`.
+- `context_markers.start` / `.end` — the delimiters surrounding the managed section. Defaults to `` and `` when the field is missing.
+
+It then creates, replaces, or appends the managed block so that the section points at the most recent plan path when one can be discovered (`specs//plan.md`).
+
+If both `context_files` and `context_file` are empty, the command reports that there is nothing to do and exits successfully. Context file paths must stay project-relative; absolute paths, Windows drive paths, backslash separators, and `..` path segments are rejected.
+
+## Execution
+
+- **Bash**: `.specify/extensions/agent-context/scripts/bash/update-agent-context.sh [plan_path]`
+- **PowerShell**: `.specify/extensions/agent-context/scripts/powershell/update-agent-context.ps1 [plan_path]`
+
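+When `plan_path` is omitted, the script derives it from `.specify/feature.json` and falls back to the most recently modified `specs/*/plan.md` only when `feature.json` is absent or its plan does not exist yet.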
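Reviewer note on path validation: the command doc above says context file paths must stay project-relative and that absolute paths, Windows drive paths, backslash separators, and `..` segments are rejected. The sketch below shows one way to express those four rules. The actual validation lives in Spec Kit and is not shown in this patch, so `is_project_relative` is a hypothetical helper.

```python
import re

# Matches a Windows drive prefix such as "C:" at the start of a path.
_DRIVE_PREFIX = re.compile(r"^[A-Za-z]:")


def is_project_relative(path):
    """Return True only for paths that stay inside the project root."""
    if not path or not path.strip():
        return False  # empty or whitespace-only
    if "\\" in path:
        return False  # backslash separators
    if path.startswith("/"):
        return False  # POSIX absolute path
    if _DRIVE_PREFIX.match(path):
        return False  # Windows drive path
    if ".." in path.split("/"):
        return False  # parent-directory segment
    return True
```

The `..` check is done per segment rather than as a substring test, so a file name that merely contains two dots, for example `notes..md`, is still accepted.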
diff --git a/.specify/extensions/agent-context/extension.yml b/.specify/extensions/agent-context/extension.yml
new file mode 100644
index 0000000..191069e
--- /dev/null
+++ b/.specify/extensions/agent-context/extension.yml
@@ -0,0 +1,34 @@
+schema_version: "1.0"
+
+extension:
+ id: agent-context
+ name: "Coding Agent Context"
+ version: "1.0.0"
+ description: "Manages coding agent context/instruction files (e.g., CLAUDE.md, copilot-instructions.md) with project-specific plan references and configurable markers"
+ author: spec-kit-core
+ repository: https://github.com/github/spec-kit
+ license: MIT
+
+requires:
+ speckit_version: ">=0.2.0"
+
+provides:
+ commands:
+ - name: speckit.agent-context.update
+ file: commands/speckit.agent-context.update.md
+ description: "Refresh the managed Spec Kit section in the coding agent context file"
+
+hooks:
+ after_specify:
+ command: speckit.agent-context.update
+ optional: true
+ description: "Refresh agent context after specification"
+ after_plan:
+ command: speckit.agent-context.update
+ optional: true
+ description: "Refresh agent context after planning"
+
+tags:
+ - "agent"
+ - "context"
+ - "core"
diff --git a/.specify/extensions/agent-context/scripts/bash/update-agent-context.sh b/.specify/extensions/agent-context/scripts/bash/update-agent-context.sh
new file mode 100755
index 0000000..64e1bae
--- /dev/null
+++ b/.specify/extensions/agent-context/scripts/bash/update-agent-context.sh
@@ -0,0 +1,337 @@
+#!/usr/bin/env bash
+# update-agent-context.sh
+#
+# Refresh the managed Spec Kit section in the coding agent's context file(s)
+# (e.g. CLAUDE.md, .github/copilot-instructions.md, AGENTS.md).
+#
+# Reads `context_files` or `context_file`, plus `context_markers.{start,end}`, from the
+# agent-context extension config:
+# .specify/extensions/agent-context/agent-context-config.yml
+#
+# Usage: update-agent-context.sh [plan_path]
+#
+# When `plan_path` is omitted, the script derives it from `.specify/feature.json`
+# (written by /speckit-specify). Falls back to the most recently modified
+# `specs/*/plan.md` only when feature.json is absent or its plan does not exist yet.
+
+set -euo pipefail
+
+PROJECT_ROOT="$(pwd)"
+EXT_CONFIG="$PROJECT_ROOT/.specify/extensions/agent-context/agent-context-config.yml"
+DEFAULT_START=""
+DEFAULT_END=""
+
+if [[ ! -f "$EXT_CONFIG" ]]; then
+ echo "agent-context: $EXT_CONFIG not found; nothing to do." >&2
+ exit 0
+fi
+
+# Locate a Python 3 interpreter with PyYAML available.
+_python=""
+_python_candidates=()
+[[ -n "${SPECKIT_PYTHON:-}" ]] && _python_candidates+=("$SPECKIT_PYTHON")
+_python_candidates+=("python3" "python")
+for _candidate in "${_python_candidates[@]}"; do
+ if command -v "$_candidate" >/dev/null 2>&1 \
+ && "$_candidate" - <<'PY' >/dev/null 2>&1
+import sys
+try:
+ import yaml # noqa: F401
+except ImportError:
+ sys.exit(1)
+sys.exit(0 if sys.version_info[0] == 3 else 1)
+PY
+ then
+ _python="$_candidate"
+ break
+ fi
+done
+unset _candidate _python_candidates
+
+if [[ -z "$_python" ]]; then
+ echo "agent-context: Python 3 with PyYAML not found on PATH; skipping update." >&2
+ echo " To resolve: pip install pyyaml (or install it into the environment used by python3)." >&2
+ exit 0
+fi
+_case_insensitive_context_files=0
+case "$(uname -s 2>/dev/null || true)" in
+ MINGW*|MSYS*|CYGWIN*) _case_insensitive_context_files=1 ;;
+esac
+
+# Parse extension config once; emit context files as JSON, followed by marker strings.
+if ! _raw_opts="$("$_python" - "$EXT_CONFIG" "$_case_insensitive_context_files" <<'PY'
+import json
+import sys
+try:
+ import yaml
+except ImportError:
+ print(
+ "agent-context: PyYAML is required to parse extension config but is not available "
+ "in the current Python environment.\n"
+ " To resolve: pip install pyyaml (or install it into the environment used by python3).\n"
+ " Context file will not be updated until PyYAML is importable.",
+ file=sys.stderr,
+ )
+ sys.exit(2)
+try:
+ with open(sys.argv[1], "r", encoding="utf-8") as fh:
+ data = yaml.safe_load(fh)
+except Exception as exc:
+ print(
+ f"agent-context: unable to parse {sys.argv[1]} ({exc}); cannot update context.",
+ file=sys.stderr,
+ )
+ sys.exit(2)
+if not isinstance(data, dict):
+ data = {}
+def get_str(obj, *keys):
+ node = obj
+ for k in keys:
+ if isinstance(node, dict) and k in node:
+ node = node[k]
+ else:
+ return ""
+ return node if isinstance(node, str) else ""
+context_files = []
+seen_context_files = set()
+case_insensitive = sys.argv[2] == "1" or sys.platform.startswith(("win32", "cygwin"))
+raw_files = data.get("context_files")
+if isinstance(raw_files, list):
+ for value in raw_files:
+ if not isinstance(value, str):
+ continue
+ candidate = value.strip()
+ if not candidate:
+ continue
+ key = candidate.casefold() if case_insensitive else candidate
+ if key in seen_context_files:
+ continue
+ context_files.append(candidate)
+ seen_context_files.add(key)
+if not context_files:
+ raw_file = get_str(data, "context_file")
+ candidate = raw_file.strip()
+ if candidate:
+ context_files.append(candidate)
+print(json.dumps(context_files))
+print(get_str(data, "context_markers", "start"))
+print(get_str(data, "context_markers", "end"))
+PY
+)"; then
+ echo "agent-context: skipping update (see above for details)." >&2
+ exit 0
+fi
+
+_opts_lines=()
+while IFS= read -r _line || [[ -n "$_line" ]]; do
+ _opts_lines+=("$_line")
+done < <(printf '%s\n' "$_raw_opts")
+if (( ${#_opts_lines[@]} < 3 )); then
+ echo "agent-context: malformed config parser output; expected 3 lines (context_files, marker_start, marker_end), got ${#_opts_lines[@]}; skipping update." >&2
+ exit 0
+fi
+CONTEXT_FILES_JSON="${_opts_lines[0]}"
+MARKER_START="${_opts_lines[1]}"
+MARKER_END="${_opts_lines[2]}"
+
+if ! _context_files_raw="$("$_python" - "$CONTEXT_FILES_JSON" <<'PY'
+import json
+import sys
+try:
+ data = json.loads(sys.argv[1])
+except Exception:
+ data = []
+if not isinstance(data, list):
+ data = []
+for value in data:
+ if isinstance(value, str) and value:
+ print(value)
+PY
+)"; then
+ echo "agent-context: malformed context_files parser output; skipping update." >&2
+ exit 0
+fi
+
+CONTEXT_FILES=()
+while IFS= read -r _line || [[ -n "$_line" ]]; do
+ [[ -n "$_line" ]] && CONTEXT_FILES+=("$_line")
+done < <(printf '%s\n' "$_context_files_raw")
+
+if (( ${#CONTEXT_FILES[@]} == 0 )); then
+ echo "agent-context: context_files/context_file not set in extension config; nothing to do." >&2
+ exit 0
+fi
+
+for CONTEXT_FILE in "${CONTEXT_FILES[@]}"; do
+ # Reject absolute and drive-qualified paths, backslash separators, and '..' path segments in context files.
+ if [[ "$CONTEXT_FILE" == /* ]] || [[ "$CONTEXT_FILE" =~ ^[A-Za-z]: ]]; then
+ echo "agent-context: context files must be project-relative paths; got '$CONTEXT_FILE'." >&2
+ exit 1
+ fi
+ if [[ "$CONTEXT_FILE" == *\\* ]]; then
+ echo "agent-context: context files must not contain backslash separators; got '$CONTEXT_FILE'." >&2
+ exit 1
+ fi
+ IFS='/' read -ra _cf_parts <<< "$CONTEXT_FILE"
+ for _seg in "${_cf_parts[@]}"; do
+ if [[ "$_seg" == ".." ]]; then
+ echo "agent-context: context files must not contain '..' path segments; got '$CONTEXT_FILE'." >&2
+ exit 1
+ fi
+ done
+ if ! "$_python" - "$PROJECT_ROOT" "$CONTEXT_FILE" <<'PY'
+import sys
+from pathlib import Path
+
+root = Path(sys.argv[1]).resolve()
+target = (root / sys.argv[2]).resolve(strict=False)
+try:
+ target.relative_to(root)
+except ValueError:
+ sys.exit(1)
+PY
+ then
+ echo "agent-context: context file path resolves outside the project root; got '$CONTEXT_FILE'." >&2
+ exit 1
+ fi
+done
+unset _cf_parts _seg
+
+[[ -z "$MARKER_START" ]] && MARKER_START="$DEFAULT_START"
+[[ -z "$MARKER_END" ]] && MARKER_END="$DEFAULT_END"
+
+PLAN_PATH="${1:-}"
+if [[ -z "$PLAN_PATH" ]]; then
+ # Prefer .specify/feature.json (written by /speckit-specify) over mtime heuristic.
+ _feature_json="$PROJECT_ROOT/.specify/feature.json"
+ if [[ -f "$_feature_json" ]]; then
+ _feature_dir="$("$_python" - "$_feature_json" <<'PY'
+import sys, json
+try:
+ with open(sys.argv[1], encoding="utf-8") as fh:
+ d = json.load(fh)
+ val = d.get("feature_directory", "")
+ print(val if isinstance(val, str) else "")
+except Exception:
+ print("")
+PY
+)"
+ # Normalize backslashes (written by PS on Windows) to forward slashes before path ops.
+ _feature_dir="$(printf '%s' "$_feature_dir" | tr '\\' '/')"
+ _feature_dir="${_feature_dir%/}"
+ if [[ -n "$_feature_dir" ]]; then
+ # feature_directory may be relative or absolute (absolute paths outside PROJECT_ROOT
+ # are preserved as-is by _persist_feature_json in common.sh).
+ # Also match drive-qualified paths (C:/...) written by PowerShell on Windows.
+ if [[ "$_feature_dir" == /* ]] || [[ "$_feature_dir" =~ ^[A-Za-z]:/ ]]; then
+ _candidate="$_feature_dir/plan.md"
+ else
+ _candidate="$PROJECT_ROOT/$_feature_dir/plan.md"
+ fi
+ if [[ -f "$_candidate" ]]; then
+ # Resolve symlinks before comparing so paths like /var/… vs /private/var/…
+ # (macOS) are treated as equivalent. Mirrors the mtime-fallback approach.
+ PLAN_PATH="$("$_python" - "$PROJECT_ROOT" "$_candidate" <<'PY'
+import sys
+from pathlib import Path
+root = Path(sys.argv[1]).resolve()
+cand = Path(sys.argv[2]).resolve()
+try:
+ print(cand.relative_to(root).as_posix())
+except ValueError:
+ # Outside project root: emit the resolved path in POSIX form.
+ # as_posix() converts backslashes correctly on native Windows Python.
+ print(cand.as_posix())
+PY
+)"
+ fi
+ fi
+ fi
+
+ # Fall back to mtime only when feature.json is absent or its plan does not exist yet.
+ # Python emits a project-relative POSIX path directly to avoid bash prefix-strip
+ # issues with backslash paths on Windows (Git bash / MSYS2).
+ if [[ -z "$PLAN_PATH" ]]; then
+ _plan_rel="$("$_python" - "$PROJECT_ROOT" <<'PY'
+import sys
+from pathlib import Path
+root = Path(sys.argv[1]).resolve()
+specs = root / "specs"
+plans = sorted(
+ specs.glob("*/plan.md"),
+ key=lambda p: p.stat().st_mtime,
+ reverse=True,
+)
+if plans:
+ try:
+ print(plans[0].relative_to(root).as_posix())
+ except ValueError:
+ print("")
+else:
+ print("")
+PY
+)"
+ if [[ -n "$_plan_rel" ]]; then
+ PLAN_PATH="$_plan_rel"
+ fi
+ fi
+fi
+
+# Build the managed section
+TMP_SECTION="$(mktemp)"
+trap 'rm -f "$TMP_SECTION"' EXIT
+{
+ echo "$MARKER_START"
+ echo "For additional context about technologies to be used, project structure,"
+ echo "shell commands, and other important information, read the current plan"
+ if [[ -n "$PLAN_PATH" ]]; then
+ echo "at $PLAN_PATH"
+ fi
+ echo "$MARKER_END"
+} > "$TMP_SECTION"
+
+for CONTEXT_FILE in "${CONTEXT_FILES[@]}"; do
+ CTX_PATH="$PROJECT_ROOT/$CONTEXT_FILE"
+ mkdir -p "$(dirname "$CTX_PATH")"
+
+ "$_python" - "$CTX_PATH" "$MARKER_START" "$MARKER_END" "$TMP_SECTION" <<'PY'
+import sys, os
+ctx_path, start, end, section_path = sys.argv[1:5]
+with open(section_path, "r", encoding="utf-8") as fh:
+ section = fh.read().rstrip("\n") + "\n"
+
+if os.path.exists(ctx_path):
+ with open(ctx_path, "r", encoding="utf-8-sig") as fh:
+ content = fh.read()
+ s = content.find(start)
+ e = content.find(end, s if s != -1 else 0)
+ if s != -1 and e != -1 and e > s:
+ end_of_marker = e + len(end)
+ if end_of_marker < len(content) and content[end_of_marker] == "\r":
+ end_of_marker += 1
+ if end_of_marker < len(content) and content[end_of_marker] == "\n":
+ end_of_marker += 1
+ new_content = content[:s] + section + content[end_of_marker:]
+ elif s != -1:
+ new_content = content[:s] + section
+ elif e != -1:
+ end_of_marker = e + len(end)
+ if end_of_marker < len(content) and content[end_of_marker] == "\r":
+ end_of_marker += 1
+ if end_of_marker < len(content) and content[end_of_marker] == "\n":
+ end_of_marker += 1
+ new_content = section + content[end_of_marker:]
+ else:
+ if content and not content.endswith("\n"):
+ content += "\n"
+ new_content = (content + "\n" + section) if content else section
+else:
+ new_content = section
+
+new_content = new_content.replace("\r\n", "\n").replace("\r", "\n")
+with open(ctx_path, "wb") as fh:
+ fh.write(new_content.encode("utf-8"))
+PY
+
+ echo "agent-context: updated $CONTEXT_FILE"
+done
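
Reviewer note: the marker-replacement step embedded in the bash script above can be exercised on its own. The following is a minimal sketch rather than the script's code: `upsert_section` and `_skip_newline` are hypothetical helper names, the marker strings are placeholders chosen for illustration (and assumed to be non-empty), and only the four branches visible in the embedded Python are mirrored.

```python
def _skip_newline(content: str, pos: int) -> int:
    """Advance past one CR and/or LF directly after the end marker."""
    if pos < len(content) and content[pos] == "\r":
        pos += 1
    if pos < len(content) and content[pos] == "\n":
        pos += 1
    return pos


def upsert_section(content: str, start: str, end: str, section: str) -> str:
    """Replace or append the managed section (hypothetical helper; markers assumed non-empty)."""
    s = content.find(start)
    e = content.find(end, s if s != -1 else 0)
    if s != -1 and e != -1 and e > s:
        # Both markers present: replace everything between them, inclusive.
        new = content[:s] + section + content[_skip_newline(content, e + len(end)):]
    elif s != -1:
        # Only the start marker: drop everything from it onward.
        new = content[:s] + section
    elif e != -1:
        # Only the end marker: drop everything up to and including it.
        new = section + content[_skip_newline(content, e + len(end)):]
    else:
        # No markers: append the section after a blank line.
        if content and not content.endswith("\n"):
            content += "\n"
        new = (content + "\n" + section) if content else section
    # Normalize line endings to LF, as the script does before writing.
    return new.replace("\r\n", "\n").replace("\r", "\n")


START, END = "[[managed:start]]", "[[managed:end]]"  # placeholder markers, not the real defaults
SECTION = f"{START}\nread the current plan\nat specs/demo/plan.md\n{END}\n"

appended = upsert_section("# Notes\n", START, END, SECTION)
replaced = upsert_section(f"intro\n{START}\nold\n{END}\noutro\n", START, END, SECTION)
```

Running the helper a second time on its own output leaves the text unchanged, which is the property that makes the refresh safe to repeat. The sketch assumes distinct, non-empty markers; an empty marker string would match at offset 0 and is outside what this example covers.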
diff --git a/.specify/extensions/agent-context/scripts/powershell/update-agent-context.ps1 b/.specify/extensions/agent-context/scripts/powershell/update-agent-context.ps1
new file mode 100644
index 0000000..da9ff44
--- /dev/null
+++ b/.specify/extensions/agent-context/scripts/powershell/update-agent-context.ps1
@@ -0,0 +1,417 @@
+#!/usr/bin/env pwsh
+# update-agent-context.ps1
+#
+# Refresh the managed Spec Kit section in the coding agent's context file(s)
+# (e.g. CLAUDE.md, .github/copilot-instructions.md, AGENTS.md).
+#
+# Reads `context_files` or `context_file`, plus `context_markers.{start,end}`, from the
+# agent-context extension config:
+# .specify/extensions/agent-context/agent-context-config.yml
+#
+# Usage: update-agent-context.ps1 [plan_path]
+#
+# When `plan_path` is omitted, the script derives it from `.specify/feature.json`
+# (written by /speckit-specify). Falls back to the most recently modified
+# `specs/*/plan.md` only when feature.json is absent or its plan does not exist yet.
+
+[CmdletBinding()]
+param(
+ [Parameter(Position = 0)]
+ [string]$PlanPath
+)
+
+function Get-ConfigValue {
+ param(
+ [AllowNull()][object]$Object,
+ [Parameter(Mandatory = $true)][string]$Key
+ )
+
+ if ($null -eq $Object) {
+ return $null
+ }
+ if ($Object -is [System.Collections.IDictionary]) {
+ return $Object[$Key]
+ }
+ $prop = $Object.PSObject.Properties[$Key]
+ if ($prop) {
+ return $prop.Value
+ }
+ return $null
+}
+
+function Test-ConfigObject {
+ param(
+ [AllowNull()][object]$Object
+ )
+
+ if ($null -eq $Object) {
+ return $false
+ }
+ if ($Object -is [System.Collections.IDictionary]) {
+ return $true
+ }
+ if ($Object -is [System.Management.Automation.PSCustomObject]) {
+ return $true
+ }
+ return $false
+}
+
+function Resolve-ContextPath {
+ param(
+ [Parameter(Mandatory = $true)][string]$Root,
+ [Parameter(Mandatory = $true)][string]$RelativePath
+ )
+
+ $rootFull = [System.IO.Path]::GetFullPath($Root)
+ $segments = $RelativePath -split '/'
+ $resolved = $rootFull
+
+ foreach ($segment in $segments) {
+ if ([string]::IsNullOrWhiteSpace($segment) -or $segment -eq '.') {
+ continue
+ }
+
+ $candidate = [System.IO.Path]::GetFullPath((Join-Path $resolved $segment))
+ if (Test-Path -LiteralPath $candidate) {
+ $item = Get-Item -LiteralPath $candidate -Force
+ if ($item.Attributes -band [System.IO.FileAttributes]::ReparsePoint) {
+ $target = $item.Target
+ if ($target -is [System.Array]) {
+ $target = $target[0]
+ }
+ if ($target) {
+ if ([System.IO.Path]::IsPathRooted($target)) {
+ $candidate = [System.IO.Path]::GetFullPath($target)
+ } else {
+ $candidate = [System.IO.Path]::GetFullPath(
+ (Join-Path (Split-Path -Parent $candidate) $target)
+ )
+ }
+ }
+ }
+ }
+ $resolved = $candidate
+ }
+
+ return $resolved
+}
+
+function Test-IsSubPath {
+ param(
+ [Parameter(Mandatory = $true)][string]$Root,
+ [Parameter(Mandatory = $true)][string]$Path
+ )
+
+ $comparison = if ([System.Environment]::OSVersion.Platform -eq [System.PlatformID]::Win32NT) {
+ [System.StringComparison]::OrdinalIgnoreCase
+ } else {
+ [System.StringComparison]::Ordinal
+ }
+ $rootFull = [System.IO.Path]::GetFullPath($Root).TrimEnd(
+ [System.IO.Path]::DirectorySeparatorChar,
+ [System.IO.Path]::AltDirectorySeparatorChar
+ )
+ $pathFull = [System.IO.Path]::GetFullPath($Path)
+ return $pathFull.Equals($rootFull, $comparison) -or
+ $pathFull.StartsWith($rootFull + [System.IO.Path]::DirectorySeparatorChar, $comparison)
+}
+
+$ErrorActionPreference = 'Stop'
+$DefaultStart = ''
+$DefaultEnd = ''
+$ProjectRoot = (Get-Location).Path
+$ExtConfig = Join-Path $ProjectRoot '.specify/extensions/agent-context/agent-context-config.yml'
+
+if (-not (Test-Path -LiteralPath $ExtConfig)) {
+ Write-Warning "agent-context: $ExtConfig not found; nothing to do."
+ exit 0
+}
+
+$Options = $null
+if (Get-Command ConvertFrom-Yaml -ErrorAction SilentlyContinue) {
+ try {
+ $Options = Get-Content -LiteralPath $ExtConfig -Raw -Encoding UTF8 | ConvertFrom-Yaml -ErrorAction Stop
+ } catch {
+ # fall through to ConvertFrom-Json fallback
+ }
+}
+
+if ($null -eq $Options) {
+ # ConvertFrom-Yaml unavailable or failed; try ConvertFrom-Json (no external deps,
+ # works when the config file is valid JSON, which is a subset of YAML).
+ try {
+ $raw = Get-Content -LiteralPath $ExtConfig -Raw -Encoding UTF8
+ $Options = $raw | ConvertFrom-Json -ErrorAction Stop
+ if (-not (Test-ConfigObject -Object $Options)) { $Options = $null }
+ } catch {
+ $Options = $null
+ }
+}
+
+if ($null -eq $Options) {
+ # ConvertFrom-Yaml/Json unavailable or failed; fall back to Python+PyYAML.
+ $pythonCmd = $null
+ $pythonCandidates = @()
+ if ($env:SPECKIT_PYTHON) {
+ $pythonCandidates += $env:SPECKIT_PYTHON
+ }
+ $pythonCandidates += @('python3', 'python')
+ foreach ($candidate in $pythonCandidates) {
+ if (Get-Command $candidate -ErrorAction SilentlyContinue) {
+ # Verify it is Python 3 with PyYAML available.
+ $null = & $candidate -c "import sys; import yaml; sys.exit(0 if sys.version_info[0] == 3 else 1)" 2>$null
+ if ($LASTEXITCODE -eq 0) {
+ $pythonCmd = $candidate
+ break
+ }
+ }
+ }
+
+ if ($pythonCmd) {
+ $pyScript = $null
+ try {
+ $pyScript = [System.IO.Path]::GetTempFileName()
+ Set-Content -LiteralPath $pyScript -Encoding UTF8 -Value @'
+import json
+import sys
+try:
+ import yaml
+except ImportError:
+ print(
+ "agent-context: PyYAML is required to parse extension config; cannot update context.",
+ file=sys.stderr,
+ )
+ sys.exit(2)
+
+try:
+ with open(sys.argv[1], "r", encoding="utf-8") as fh:
+ data = yaml.safe_load(fh)
+except Exception as exc:
+ print(
+ f"agent-context: unable to parse {sys.argv[1]} ({exc}); cannot update context.",
+ file=sys.stderr,
+ )
+ sys.exit(2)
+
+if not isinstance(data, dict):
+ data = {}
+
+print(json.dumps(data))
+'@
+ $jsonOut = & $pythonCmd $pyScript $ExtConfig
+ if ($LASTEXITCODE -eq 0 -and $jsonOut) {
+ $Options = $jsonOut | ConvertFrom-Json -ErrorAction Stop
+ }
+ } catch {
+ $Options = $null
+ } finally {
+ if ($pyScript -and (Test-Path -LiteralPath $pyScript)) {
+ Remove-Item -LiteralPath $pyScript -Force -ErrorAction SilentlyContinue
+ }
+ }
+ }
+
+ if (-not $Options) {
+ Write-Warning "agent-context: unable to parse $ExtConfig; skipping update."
+ exit 0
+ }
+}
+
+if (-not (Test-ConfigObject -Object $Options)) {
+ Write-Warning "agent-context: $ExtConfig must contain a YAML mapping; skipping update."
+ exit 0
+}
+
+$ConfiguredContextFiles = Get-ConfigValue -Object $Options -Key 'context_files'
+$ContextFiles = @()
+if ($null -ne $ConfiguredContextFiles) {
+ foreach ($item in @($ConfiguredContextFiles)) {
+ if ($item -is [string] -and -not [string]::IsNullOrWhiteSpace($item)) {
+ $ContextFiles += $item.Trim()
+ }
+ }
+}
+if ($ContextFiles.Count -eq 0) {
+ $ContextFile = Get-ConfigValue -Object $Options -Key 'context_file'
+ if ($ContextFile -is [string] -and -not [string]::IsNullOrWhiteSpace($ContextFile)) {
+ $ContextFiles += $ContextFile.Trim()
+ }
+}
+$pathComparison = if ([System.Environment]::OSVersion.Platform -eq [System.PlatformID]::Win32NT) {
+ [System.StringComparer]::OrdinalIgnoreCase
+} else {
+ [System.StringComparer]::Ordinal
+}
+$seenContextFiles = [System.Collections.Generic.HashSet[string]]::new($pathComparison)
+$dedupedContextFiles = @()
+foreach ($ContextFile in $ContextFiles) {
+ if ($seenContextFiles.Add($ContextFile)) {
+ $dedupedContextFiles += $ContextFile
+ }
+}
+$ContextFiles = $dedupedContextFiles
+if ($ContextFiles.Count -eq 0) {
+ Write-Warning 'agent-context: context_files/context_file not set in extension config; nothing to do.'
+ exit 0
+}
+
+foreach ($ContextFile in $ContextFiles) {
+ # Reject absolute paths, drive-qualified paths, backslash separators, and '..' path segments in context files
+ if ($ContextFile -match '^[A-Za-z]:') {
+ Write-Warning "agent-context: context files must be project-relative paths; got '$ContextFile'."
+ exit 1
+ }
+ if ([System.IO.Path]::IsPathRooted($ContextFile)) {
+ Write-Warning "agent-context: context files must be project-relative paths; got '$ContextFile'."
+ exit 1
+ }
+ if ($ContextFile.Contains('\')) {
+ Write-Warning "agent-context: context files must not contain backslash separators; got '$ContextFile'."
+ exit 1
+ }
+ $cfSegments = $ContextFile -split '[/\\]'
+ if ($cfSegments -contains '..') {
+ Write-Warning "agent-context: context files must not contain '..' path segments; got '$ContextFile'."
+ exit 1
+ }
+ $resolvedTarget = Resolve-ContextPath -Root $ProjectRoot -RelativePath $ContextFile
+ if (-not (Test-IsSubPath -Root $ProjectRoot -Path $resolvedTarget)) {
+ Write-Warning "agent-context: context file path resolves outside the project root; got '$ContextFile'."
+ exit 1
+ }
+}
+
+$MarkerStart = $DefaultStart
+$MarkerEnd = $DefaultEnd
+$cm = Get-ConfigValue -Object $Options -Key 'context_markers'
+if ($cm) {
+ $cmStart = Get-ConfigValue -Object $cm -Key 'start'
+ if ($cmStart -is [string] -and $cmStart) {
+ $MarkerStart = $cmStart
+ }
+ $cmEnd = Get-ConfigValue -Object $cm -Key 'end'
+ if ($cmEnd -is [string] -and $cmEnd) {
+ $MarkerEnd = $cmEnd
+ }
+}
+
+if (-not $PlanPath) {
+ # Prefer .specify/feature.json (written by /speckit-specify) over mtime heuristic.
+ $FeatureJson = Join-Path $ProjectRoot '.specify/feature.json'
+ if (Test-Path -LiteralPath $FeatureJson) {
+ try {
+ $fj = Get-Content -LiteralPath $FeatureJson -Raw -Encoding UTF8 | ConvertFrom-Json
+ $featureDir = $fj.feature_directory
+ if ($featureDir -isnot [string] -or -not $featureDir) {
+ $featureDir = $null
+ } else {
+ $featureDir = $featureDir.TrimEnd('\', '/')
+ }
+ if ($featureDir) {
+ # On Unix, Join-Path does not let an absolute ChildPath override the parent path; check explicitly.
+ if ([System.IO.Path]::IsPathRooted($featureDir)) {
+ $candidatePlan = Join-Path $featureDir 'plan.md'
+ } else {
+ $candidatePlan = Join-Path (Join-Path $ProjectRoot $featureDir) 'plan.md'
+ }
+ if (Test-Path -LiteralPath $candidatePlan) {
+ # Resolve '.' and '..' segments before relativizing (mirrors Path.resolve() in the bash script's embedded Python).
+ # GetFullPath is available in .NET Framework 4.x (PS 5.1 compatible).
+ $resolvedPlan = [System.IO.Path]::GetFullPath($candidatePlan)
+ $resolvedDir = [System.IO.Path]::GetDirectoryName($resolvedPlan)
+ $normRoot = $ProjectRoot.TrimEnd('\', '/') + [System.IO.Path]::DirectorySeparatorChar
+ $normDir = $resolvedDir.TrimEnd('\', '/') + [System.IO.Path]::DirectorySeparatorChar
+ $cmp = if ([System.Environment]::OSVersion.Platform -eq [System.PlatformID]::Win32NT) { [System.StringComparison]::OrdinalIgnoreCase } else { [System.StringComparison]::Ordinal }
+ if ($normDir.StartsWith($normRoot, $cmp)) {
+ $relDir = $normDir.Substring($normRoot.Length).TrimEnd('\', '/')
+ $PlanPath = if ($relDir) { $relDir.Replace('\', '/') + '/plan.md' } else { 'plan.md' }
+ } else {
+ $PlanPath = $resolvedPlan.Replace('\', '/')
+ }
+ }
+ }
+ } catch {
+ # Non-fatal: fall through to mtime heuristic.
+ }
+ }
+
+ # Fall back to mtime only when feature.json is absent or its plan does not exist yet.
+ if (-not $PlanPath) {
+ try {
+ $specsDir = Join-Path $ProjectRoot 'specs'
+ $candidate = Get-ChildItem -Path $specsDir -Directory -ErrorAction SilentlyContinue |
+ ForEach-Object { Get-Item -LiteralPath (Join-Path $_.FullName 'plan.md') -ErrorAction SilentlyContinue } |
+ Where-Object { $_ } |
+ Sort-Object LastWriteTime -Descending |
+ Select-Object -First 1
+ if ($candidate) {
+ # Path.GetRelativePath is not available on .NET Framework (Windows PowerShell 5.1); strip the prefix manually.
+ # Use case-insensitive comparison on Windows only (matches common.ps1 pattern).
+ $fullPath = $candidate.FullName.Replace('\', '/')
+ $normRoot = $ProjectRoot.Replace('\', '/').TrimEnd('/') + '/'
+ $cmp = if ([System.Environment]::OSVersion.Platform -eq [System.PlatformID]::Win32NT) { [System.StringComparison]::OrdinalIgnoreCase } else { [System.StringComparison]::Ordinal }
+ if ($fullPath.StartsWith($normRoot, $cmp)) {
+ $PlanPath = $fullPath.Substring($normRoot.Length)
+ } else {
+ $PlanPath = $fullPath
+ }
+ }
+ } catch {
+ # Non-fatal: continue without a plan path.
+ }
+ }
+}
+
+$lines = @($MarkerStart,
+ 'For additional context about technologies to be used, project structure,',
+ 'shell commands, and other important information, read the current plan')
+if ($PlanPath) {
+ $lines += "at $PlanPath"
+}
+$lines += $MarkerEnd
+$Section = ($lines -join "`n") + "`n"
+
+foreach ($ContextFile in $ContextFiles) {
+ $CtxPath = Join-Path $ProjectRoot $ContextFile
+ $CtxDir = Split-Path -Parent $CtxPath
+ if ($CtxDir -and -not (Test-Path -LiteralPath $CtxDir)) {
+ New-Item -ItemType Directory -Path $CtxDir -Force | Out-Null
+ }
+
+ if (Test-Path -LiteralPath $CtxPath) {
+ $rawBytes = [System.IO.File]::ReadAllBytes($CtxPath)
+ # Strip UTF-8 BOM if present
+ if ($rawBytes.Length -ge 3 -and $rawBytes[0] -eq 0xEF -and $rawBytes[1] -eq 0xBB -and $rawBytes[2] -eq 0xBF) {
+ $content = [System.Text.Encoding]::UTF8.GetString($rawBytes, 3, $rawBytes.Length - 3)
+ } else {
+ $content = [System.Text.Encoding]::UTF8.GetString($rawBytes)
+ }
+
+ $s = $content.IndexOf($MarkerStart)
+ $e = if ($s -ge 0) { $content.IndexOf($MarkerEnd, $s) } else { $content.IndexOf($MarkerEnd) }
+
+ if ($s -ge 0 -and $e -ge 0 -and $e -gt $s) {
+ $endOfMarker = $e + $MarkerEnd.Length
+ if ($endOfMarker -lt $content.Length -and $content[$endOfMarker] -eq "`r") { $endOfMarker++ }
+ if ($endOfMarker -lt $content.Length -and $content[$endOfMarker] -eq "`n") { $endOfMarker++ }
+ $newContent = $content.Substring(0, $s) + $Section + $content.Substring($endOfMarker)
+ } elseif ($s -ge 0) {
+ $newContent = $content.Substring(0, $s) + $Section
+ } elseif ($e -ge 0) {
+ $endOfMarker = $e + $MarkerEnd.Length
+ if ($endOfMarker -lt $content.Length -and $content[$endOfMarker] -eq "`r") { $endOfMarker++ }
+ if ($endOfMarker -lt $content.Length -and $content[$endOfMarker] -eq "`n") { $endOfMarker++ }
+ $newContent = $Section + $content.Substring($endOfMarker)
+ } else {
+ if ($content -and -not $content.EndsWith("`n")) { $content += "`n" }
+ if ($content) { $newContent = $content + "`n" + $Section } else { $newContent = $Section }
+ }
+ } else {
+ $newContent = $Section
+ }
+
+ $newContent = $newContent.Replace("`r`n", "`n").Replace("`r", "`n")
+ [System.IO.File]::WriteAllText($CtxPath, $newContent, (New-Object System.Text.UTF8Encoding($false)))
+
+ Write-Host "agent-context: updated $ContextFile"
+}
diff --git a/.specify/feature.json b/.specify/feature.json
new file mode 100644
index 0000000..69a4651
--- /dev/null
+++ b/.specify/feature.json
@@ -0,0 +1,3 @@
+{
+ "feature_directory": "specs/001-ensemble-workflow-ui"
+}
diff --git a/.specify/init-options.json b/.specify/init-options.json
new file mode 100644
index 0000000..6b2408d
--- /dev/null
+++ b/.specify/init-options.json
@@ -0,0 +1,9 @@
+{
+ "ai": "claude",
+ "ai_skills": true,
+ "feature_numbering": "sequential",
+ "here": true,
+ "integration": "claude",
+ "script": "sh",
+ "speckit_version": "0.11.10.dev0"
+}
\ No newline at end of file
diff --git a/.specify/integration.json b/.specify/integration.json
new file mode 100644
index 0000000..5e4bc53
--- /dev/null
+++ b/.specify/integration.json
@@ -0,0 +1,15 @@
+{
+ "version": "0.11.10.dev0",
+ "integration_state_schema": 1,
+ "installed_integrations": [
+ "claude"
+ ],
+ "integration_settings": {
+ "claude": {
+ "script": "sh",
+ "invoke_separator": "-"
+ }
+ },
+ "integration": "claude",
+ "default_integration": "claude"
+}
diff --git a/.specify/integrations/claude.manifest.json b/.specify/integrations/claude.manifest.json
new file mode 100644
index 0000000..b8decd1
--- /dev/null
+++ b/.specify/integrations/claude.manifest.json
@@ -0,0 +1,17 @@
+{
+ "integration": "claude",
+ "version": "0.11.10.dev0",
+ "installed_at": "2026-06-27T21:48:08.043755+00:00",
+ "files": {
+ ".claude/skills/speckit-analyze/SKILL.md": "fecd4bf113c3dda58c75d387473c0106fc2dfea97a27bb7c65af94f3f916c188",
+ ".claude/skills/speckit-clarify/SKILL.md": "c1c2098756ca407530cca11c5b608f517d769962215ddafa013951b81e3e19c5",
+ ".claude/skills/speckit-constitution/SKILL.md": "ee3972318415a05559c6bf281dcbd2e8deda944e595d64ab5474abeacf558697",
+ ".claude/skills/speckit-implement/SKILL.md": "823049e49aa983fe398d4bccf6c686ab6afe8f2cd3856e0380c3ef797d78d56d",
+ ".claude/skills/speckit-converge/SKILL.md": "04226b8443797337624983111546d5e5a48d9993a176c4e6d72a4099a0af50d4",
+ ".claude/skills/speckit-plan/SKILL.md": "53733c8a4f4fd01685759bb1c68e94c73da4ce90d549139e79e419dec6471510",
+ ".claude/skills/speckit-checklist/SKILL.md": "946c6bc808891436972a11a423f89f0fbd272a79809bb8fd1d29f481ebe02613",
+ ".claude/skills/speckit-specify/SKILL.md": "9324dd55d12d420cd581031419fa37eb94ef75ae0bdd53391dd4414bd9d45e02",
+ ".claude/skills/speckit-tasks/SKILL.md": "cb29fb8247a30aac751be83de88d0399221692589dd26327552ae6f193816fda",
+ ".claude/skills/speckit-taskstoissues/SKILL.md": "dfe23aaca349cd76e98505dafa9aae1ef4616a0c35a5c79122b9bd881e16b62f"
+ }
+}
diff --git a/.specify/integrations/speckit.manifest.json b/.specify/integrations/speckit.manifest.json
new file mode 100644
index 0000000..ab72d99
--- /dev/null
+++ b/.specify/integrations/speckit.manifest.json
@@ -0,0 +1,17 @@
+{
+ "integration": "speckit",
+ "version": "0.11.10.dev0",
+ "installed_at": "2026-06-27T21:48:08.066355+00:00",
+ "files": {
+ ".specify/scripts/bash/setup-plan.sh": "4eb12c5b00f5c66a7d01b56c90898d320dcef4425d9b96652d57156c84948eda",
+ ".specify/scripts/bash/check-prerequisites.sh": "afce0aa8db177320d83aa0b8e3619c06b865fd810781894e4a7a3f81664941ce",
+ ".specify/scripts/bash/common.sh": "af8a16f87b4f9084759c42ff9abf35c0b2a2025dffe58c298758ff86de2923b2",
+ ".specify/scripts/bash/setup-tasks.sh": "cf21ba2212b4dd5b435c5ea8527500cfd27768b86c0bbc7ebc3207759f118d27",
+ ".specify/scripts/bash/create-new-feature.sh": "9ba116b64f0328eb69bc1a195d209074ea38823a73a554160d69df34a74daa65",
+ ".specify/templates/constitution-template.md": "ce7549540fa45543cca797a150201d868e64495fdff39dc38246fb17bd4024b3",
+ ".specify/templates/tasks-template.md": "fc29a233f6f5a27ca31f1aa46b596af6500c627441c6e62b2bc4a1d721525842",
+ ".specify/templates/checklist-template.md": "c37695297e5d3153d64f82c21223509940b13932046c7961c42d1d669516130c",
+ ".specify/templates/plan-template.md": "cc7f7979cf8d8836ec26492785affd80791d3422a2b745062ec695be8c985ef7",
+ ".specify/templates/spec-template.md": "3945437fc35cd30a5b2bf7beea680337c3516826d3efa5a6b92c4a7eca1ba28e"
+ }
+}
diff --git a/.specify/memory/constitution.md b/.specify/memory/constitution.md
index fd92d99..02dd813 100644
--- a/.specify/memory/constitution.md
+++ b/.specify/memory/constitution.md
@@ -88,6 +88,16 @@ The ensemble grounding-doc workflow is the canonical shape: the UI may step you
*Kills: The Walled Garden* — a UI that swallows the whole workflow, hides the files, and locks the human out of the conversation and the CLI.
+### X. Selection is Explicit; There is No Silent "All"
+
+A batch operation acts on the set the human explicitly chose — never on an implicit "everything" inferred from an empty or absent selection. **"Select all" is a deliberate act that materializes the full set as the chosen set; it is not the state the system falls into when the human chose nothing.** An empty selection means *nothing is selected*: the operation refuses to run and says so, rather than guessing that the human meant the whole corpus.
+
+Which inputs a token-spending pass touches is a **scope decision** (Principle II), and it is the human's — made explicitly, every time. A default that quietly expands to "all" removes that decision from the human exactly when it is most expensive to get wrong.
+
+Concrete clause: the ensemble chapter picker stores `ui.ensemble.chapters_selected` as the literal set of chosen chapters; "Select all" writes every resolved path; `GET /api/ensemble/run/extract` refuses an empty `chapters` list instead of falling back to the glob (`tests/test_ensemble_chapters.py`). The CLI engine is exempt only because a glob *typed at the CLI* is itself an explicit act; the UI must never manufacture that act on the human's behalf.
+
+*Kills: the Implicit Blast Radius* — a batch action that silently expands to "everything" because the set was never explicitly chosen.
+
## Architecture is Destiny
Bad architectural choices are liabilities, and in this system the currency is twofold: **token spend** and **precision failures at the table**.
@@ -110,7 +120,7 @@ Humans author structure, identity, and schema. The LLM — including Spec Kit it
This constitution supersedes conflicting specs, plans, and tasks. A conflict requires written justification or an amendment — not a silent override.
- **Principle precedence:** I (Disk is Truth) and II (The Human Checkpoint) outrank all other principles. When a convenience, a performance gain, or a cleaner abstraction collides with truth-on-disk or the human gate, truth and the gate win.
-- Every spec and plan is tested, by name, against all nine principles before implementation begins.
+- Every spec and plan is tested, by name, against all ten principles before implementation begins.
- Amendments require a stated rationale, a version bump, and a check that dependent templates and docs stay in sync.
- Semantic versioning of this document:
- **MAJOR** — a principle removed or redefined in a backward-incompatible way.
@@ -119,4 +129,6 @@ This constitution supersedes conflicting specs, plans, and tasks. A conflict req
Runtime development guidance lives in `CLAUDE.md` (this repo) and `~/.claude/CLAUDE.md` (global). Where those and this constitution agree, this is the canonical statement; where they drift, amend one to match the other.
-**Version**: 1.1.0 | **Ratified**: 2026-06-27 | **Last Amended**: 2026-06-27
+**Version**: 1.2.0 | **Ratified**: 2026-06-27 | **Last Amended**: 2026-06-27
+
+> **1.2.0** (MINOR) — Added Principle X (*Selection is Explicit; There is No Silent "All"*), arising from the ensemble chapter picker: a batch pass acts only on an explicitly chosen set, and "Select all" must materialize that set rather than be an empty-means-everything default.
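
Reviewer note: the concrete clause of Principle X can be pictured as a guard at the request boundary. This is a minimal sketch under stated assumptions, not the project's implementation: `require_explicit_selection`, `select_all`, `EmptySelectionError`, and the example paths are hypothetical names invented for illustration, and nothing here is taken from `tests/test_ensemble_chapters.py`.

```python
from typing import List, Optional, Sequence


class EmptySelectionError(ValueError):
    """Raised when a batch pass is requested with nothing selected (hypothetical)."""


def require_explicit_selection(selected: Optional[Sequence[str]]) -> List[str]:
    """Return the chosen chapters; refuse an empty or absent selection."""
    if not selected:
        # An empty selection means "nothing", never "everything".
        raise EmptySelectionError(
            "no chapters selected; refusing to run on the whole corpus"
        )
    return list(selected)


def select_all(available: Sequence[str]) -> List[str]:
    """'Select all' materializes every resolved path as the chosen set."""
    return list(available)


# Hypothetical resolved chapter paths.
available = ["chapters/01.md", "chapters/02.md", "chapters/03.md"]

# The full set runs only because it was written out explicitly.
chosen = require_explicit_selection(select_all(available))

try:
    require_explicit_selection([])
    refused = False
except EmptySelectionError:
    refused = True
```

The design point is that the full set and the empty set take different code paths: "all" exists only as a list someone asked for, so the guard never needs to know what the corpus contains.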
diff --git a/.specify/scripts/bash/check-prerequisites.sh b/.specify/scripts/bash/check-prerequisites.sh
new file mode 100755
index 0000000..8377d8e
--- /dev/null
+++ b/.specify/scripts/bash/check-prerequisites.sh
@@ -0,0 +1,189 @@
+#!/usr/bin/env bash
+
+# Consolidated prerequisite checking script
+#
+# This script provides unified prerequisite checking for Spec-Driven Development workflow.
+# It replaces the functionality previously spread across multiple scripts.
+#
+# Usage: ./check-prerequisites.sh [OPTIONS]
+#
+# OPTIONS:
+# --json Output in JSON format
+# --require-tasks Require tasks.md to exist (for implementation phase)
+# --include-tasks Include tasks.md in AVAILABLE_DOCS list
+# --paths-only Only output path variables (no validation)
+# --help, -h Show help message
+#
+# OUTPUTS:
+# JSON mode: {"FEATURE_DIR":"...", "AVAILABLE_DOCS":["..."]}
+# Text mode: FEATURE_DIR:... \n AVAILABLE_DOCS: \n ✓/✗ file.md
+# Paths only: REPO_ROOT: ... \n BRANCH: ... \n FEATURE_DIR: ... etc.
+
+set -e
+
+# Parse command line arguments
+JSON_MODE=false
+REQUIRE_TASKS=false
+INCLUDE_TASKS=false
+PATHS_ONLY=false
+
+for arg in "$@"; do
+ case "$arg" in
+ --json)
+ JSON_MODE=true
+ ;;
+ --require-tasks)
+ REQUIRE_TASKS=true
+ ;;
+ --include-tasks)
+ INCLUDE_TASKS=true
+ ;;
+ --paths-only)
+ PATHS_ONLY=true
+ ;;
+ --help|-h)
+ cat << 'EOF'
+Usage: check-prerequisites.sh [OPTIONS]
+
+Consolidated prerequisite checking for Spec-Driven Development workflow.
+
+OPTIONS:
+ --json Output in JSON format
+ --require-tasks Require tasks.md to exist (for implementation phase)
+ --include-tasks Include tasks.md in AVAILABLE_DOCS list
+ --paths-only Only output path variables (no prerequisite validation)
+ --help, -h Show this help message
+
+EXAMPLES:
+ # Check task prerequisites (plan.md required)
+ ./check-prerequisites.sh --json
+
+ # Check implementation prerequisites (plan.md + tasks.md required)
+ ./check-prerequisites.sh --json --require-tasks --include-tasks
+
+ # Get feature paths only (no validation)
+ ./check-prerequisites.sh --paths-only
+
+EOF
+ exit 0
+ ;;
+ *)
+ echo "ERROR: Unknown option '$arg'. Use --help for usage information." >&2
+ exit 1
+ ;;
+ esac
+done
+
+# Source common functions
+SCRIPT_DIR="$(CDPATH="" cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+source "$SCRIPT_DIR/common.sh"
+
+# Get feature paths
+_paths_output=$(get_feature_paths) || { echo "ERROR: Failed to resolve feature paths" >&2; exit 1; }
+eval "$_paths_output"
+unset _paths_output
+
+# If paths-only mode, output paths and exit (no validation)
+if $PATHS_ONLY; then
+ if $JSON_MODE; then
+ # Minimal JSON paths payload (no validation performed)
+ if has_jq; then
+ jq -cn \
+ --arg repo_root "$REPO_ROOT" \
+ --arg branch "$CURRENT_BRANCH" \
+ --arg feature_dir "$FEATURE_DIR" \
+ --arg feature_spec "$FEATURE_SPEC" \
+ --arg impl_plan "$IMPL_PLAN" \
+ --arg tasks "$TASKS" \
+ '{REPO_ROOT:$repo_root,BRANCH:$branch,FEATURE_DIR:$feature_dir,FEATURE_SPEC:$feature_spec,IMPL_PLAN:$impl_plan,TASKS:$tasks}'
+ else
+ printf '{"REPO_ROOT":"%s","BRANCH":"%s","FEATURE_DIR":"%s","FEATURE_SPEC":"%s","IMPL_PLAN":"%s","TASKS":"%s"}\n' \
+ "$(json_escape "$REPO_ROOT")" "$(json_escape "$CURRENT_BRANCH")" "$(json_escape "$FEATURE_DIR")" "$(json_escape "$FEATURE_SPEC")" "$(json_escape "$IMPL_PLAN")" "$(json_escape "$TASKS")"
+ fi
+ else
+ echo "REPO_ROOT: $REPO_ROOT"
+ echo "BRANCH: $CURRENT_BRANCH"
+ echo "FEATURE_DIR: $FEATURE_DIR"
+ echo "FEATURE_SPEC: $FEATURE_SPEC"
+ echo "IMPL_PLAN: $IMPL_PLAN"
+ echo "TASKS: $TASKS"
+ fi
+ exit 0
+fi
+
+# Validate required directories and files
+if [[ ! -d "$FEATURE_DIR" ]]; then
+ echo "ERROR: Feature directory not found: $FEATURE_DIR" >&2
+ echo "Run /speckit-specify first to create the feature structure." >&2
+ exit 1
+fi
+
+if [[ ! -f "$IMPL_PLAN" ]]; then
+ echo "ERROR: plan.md not found in $FEATURE_DIR" >&2
+ echo "Run /speckit-plan first to create the implementation plan." >&2
+ exit 1
+fi
+
+# Check for tasks.md if required
+if $REQUIRE_TASKS && [[ ! -f "$TASKS" ]]; then
+ echo "ERROR: tasks.md not found in $FEATURE_DIR" >&2
+ echo "Run /speckit-tasks first to create the task list." >&2
+ exit 1
+fi
+
+# Build list of available documents
+docs=()
+
+# Always check these optional docs
+[[ -f "$RESEARCH" ]] && docs+=("research.md")
+[[ -f "$DATA_MODEL" ]] && docs+=("data-model.md")
+
+# Check contracts directory (only if it exists and has files)
+if [[ -d "$CONTRACTS_DIR" ]] && [[ -n "$(ls -A "$CONTRACTS_DIR" 2>/dev/null)" ]]; then
+ docs+=("contracts/")
+fi
+
+[[ -f "$QUICKSTART" ]] && docs+=("quickstart.md")
+
+# Include tasks.md if requested and it exists
+if $INCLUDE_TASKS && [[ -f "$TASKS" ]]; then
+ docs+=("tasks.md")
+fi
+
+# Output results
+if $JSON_MODE; then
+ # Build JSON array of documents
+ if has_jq; then
+ if [[ ${#docs[@]} -eq 0 ]]; then
+ json_docs="[]"
+ else
+ json_docs=$(printf '%s\n' "${docs[@]}" | jq -R . | jq -s .)
+ fi
+ jq -cn \
+ --arg feature_dir "$FEATURE_DIR" \
+ --argjson docs "$json_docs" \
+ '{FEATURE_DIR:$feature_dir,AVAILABLE_DOCS:$docs}'
+ else
+ if [[ ${#docs[@]} -eq 0 ]]; then
+ json_docs="[]"
+ else
+ json_docs=$(for d in "${docs[@]}"; do printf '"%s",' "$(json_escape "$d")"; done)
+ json_docs="[${json_docs%,}]"
+ fi
+ printf '{"FEATURE_DIR":"%s","AVAILABLE_DOCS":%s}\n' "$(json_escape "$FEATURE_DIR")" "$json_docs"
+ fi
+else
+ # Text output
+ echo "FEATURE_DIR:$FEATURE_DIR"
+ echo "AVAILABLE_DOCS:"
+
+ # Show status of each potential document
+ check_file "$RESEARCH" "research.md"
+ check_file "$DATA_MODEL" "data-model.md"
+ check_dir "$CONTRACTS_DIR" "contracts/"
+ check_file "$QUICKSTART" "quickstart.md"
+
+ if $INCLUDE_TASKS; then
+ check_file "$TASKS" "tasks.md"
+ fi
+fi
diff --git a/.specify/scripts/bash/common.sh b/.specify/scripts/bash/common.sh
new file mode 100755
index 0000000..70ab89b
--- /dev/null
+++ b/.specify/scripts/bash/common.sh
@@ -0,0 +1,619 @@
+#!/usr/bin/env bash
+# Common functions and variables for all scripts
+
+# Find repository root by searching upward for .specify directory
+# This is the primary marker for spec-kit projects
+find_specify_root() {
+ local dir="${1:-$(pwd)}"
+ # Normalize to absolute path to prevent infinite loop with relative paths
+ # Use -- to handle paths starting with - (e.g., -P, -L)
+ dir="$(cd -- "$dir" 2>/dev/null && pwd)" || return 1
+ local prev_dir=""
+ while true; do
+ if [ -d "$dir/.specify" ]; then
+ echo "$dir"
+ return 0
+ fi
+ # Stop if we've reached filesystem root or dirname stops changing
+ if [ "$dir" = "/" ] || [ "$dir" = "$prev_dir" ]; then
+ break
+ fi
+ prev_dir="$dir"
+ dir="$(dirname "$dir")"
+ done
+ return 1
+}
+
+# Resolve an explicit SPECIFY_INIT_DIR project override (the directory that
+# *contains* .specify/), for non-interactive / CI use — e.g. running a Spec Kit
+# command against a member project from a monorepo root without cd.
+#
+# Precondition: SPECIFY_INIT_DIR is non-empty. Echoes the validated absolute
+# project root, or prints an error and returns 1. Strict by design: the path
+# must exist and contain .specify/, with no silent fallback to cwd or the
+# script-location default (which would silently write to the wrong project).
+#
+# This is the single resolver: bundled extensions inherit it by sourcing core
+# (e.g. the git extension's create-new-feature-branch) rather than duplicating it.
+resolve_specify_init_dir() {
+ local init_root
+ # Normalize: relative paths resolve against $(pwd); a trailing slash collapses.
+ # CDPATH="" so a relative value cannot be resolved against the caller's CDPATH
+ # (which would also echo to stdout and corrupt the captured path).
+ if ! init_root="$(CDPATH="" cd -- "$SPECIFY_INIT_DIR" 2>/dev/null && pwd)"; then
+ echo "ERROR: SPECIFY_INIT_DIR does not point to an existing directory: $SPECIFY_INIT_DIR" >&2
+ return 1
+ fi
+ if [[ ! -d "$init_root/.specify" ]]; then
+ echo "ERROR: SPECIFY_INIT_DIR is not a Spec Kit project (no .specify/ directory): $init_root" >&2
+ return 1
+ fi
+ printf '%s\n' "$init_root"
+}
+
+# Get repository root, prioritizing .specify directory
+# This prevents using a parent repository when spec-kit is initialized in a subdirectory
+get_repo_root() {
+ # Explicit project override wins (see resolve_specify_init_dir).
+ if [[ -n "${SPECIFY_INIT_DIR:-}" ]]; then
+ resolve_specify_init_dir
+ return
+ fi
+
+ # First, look for .specify directory (spec-kit's own marker)
+ local specify_root
+ if specify_root=$(find_specify_root); then
+ echo "$specify_root"
+ return
+ fi
+
+ # Final fallback to script location
+ local script_dir="$(CDPATH="" cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+ (cd "$script_dir/../../.." && pwd)
+}
+
+# Get current feature name from explicit state only.
+# Returns the feature identifier or empty string if none is set.
+# Feature state is set by SPECIFY_FEATURE (from create-new-feature or
+# the git extension) or implicitly via .specify/feature.json.
+get_current_branch() {
+ if [[ -n "${SPECIFY_FEATURE:-}" ]]; then
+ echo "$SPECIFY_FEATURE"
+ return
+ fi
+
+ # No explicit feature set — caller must handle this via feature.json
+ # in get_feature_paths(). Return empty to signal "unknown".
+ echo ""
+}
+
+# Safely read .specify/feature.json's "feature_directory" value.
+# Prints the raw value (possibly relative) to stdout, or empty string if the file
+# is missing, unparseable, or does not contain the key. Always returns 0 so callers
+# under `set -e` cannot be aborted by parser failure.
+# Parser order mirrors the historical get_feature_paths behavior: jq -> python3 -> grep/sed.
+read_feature_json_feature_directory() {
+ local repo_root="$1"
+ local fj="$repo_root/.specify/feature.json"
+ [[ -f "$fj" ]] || { printf '%s' ''; return 0; }
+
+ local _fd=''
+ if command -v jq >/dev/null 2>&1; then
+ if ! _fd=$(jq -r '.feature_directory // empty' "$fj" 2>/dev/null); then
+ _fd=''
+ fi
+ elif command -v python3 >/dev/null 2>&1; then
+ # Use Python so pretty-printed/multi-line JSON still parses correctly.
+ if ! _fd=$(python3 -c "import json,sys; d=json.load(open(sys.argv[1])); v=d.get('feature_directory'); print(v if v else '')" "$fj" 2>/dev/null); then
+ _fd=''
+ fi
+ else
+ # Last-resort single-line grep/sed fallback. The `|| true` guards against
+ # grep returning 1 (no match) aborting under `set -e` / `pipefail`.
+ _fd=$( { grep -E '"feature_directory"[[:space:]]*:' "$fj" 2>/dev/null || true; } \
+ | head -n 1 \
+ | sed -E 's/^[^:]*:[[:space:]]*"([^"]*)".*$/\1/' )
+ fi
+
+ printf '%s' "$_fd"
+ return 0
+}
+
+# Persist a feature_directory value to .specify/feature.json.
+# Writes only when the file is missing or the value differs from what's stored.
+# Accepts the raw (possibly relative) path — callers should pass the original
+# user-supplied value, not the normalized absolute path.
+_persist_feature_json() {
+ local repo_root="$1"
+ local feature_dir_value="$2"
+ local fj="$repo_root/.specify/feature.json"
+
+ # Strip repo_root prefix if the value is absolute and under repo_root
+ if [[ "$feature_dir_value" == "$repo_root/"* ]]; then
+ feature_dir_value="${feature_dir_value#"$repo_root/"}"
+ fi
+
+ # Read current value (if any) and skip write when unchanged
+ local current_val
+ current_val=$(read_feature_json_feature_directory "$repo_root")
+ if [[ "$current_val" == "$feature_dir_value" ]]; then
+ return 0
+ fi
+
+ # Ensure .specify/ directory exists
+ mkdir -p "$repo_root/.specify"
+
+ # Write feature.json — prefer jq for safe JSON, fall back to printf
+ if command -v jq >/dev/null 2>&1; then
+ jq -cn --arg fd "$feature_dir_value" '{feature_directory:$fd}' > "$fj"
+ else
+ printf '{"feature_directory":"%s"}\n' "$(json_escape "$feature_dir_value")" > "$fj"
+ fi
+}
+
+get_feature_paths() {
+ # Split decl/assignment so a SPECIFY_INIT_DIR validation failure in
+ # get_repo_root propagates as a hard error instead of being masked by `local`.
+ local repo_root
+ repo_root=$(get_repo_root) || return 1
+ local current_branch
+ current_branch=$(get_current_branch)
+
+ # Resolve feature directory. Priority:
+ # 1. SPECIFY_FEATURE_DIRECTORY env var (explicit override)
+ # 2. .specify/feature.json "feature_directory" key (persisted by specify command)
+ # 3. Error — no feature context available
+ local feature_dir
+ if [[ -n "${SPECIFY_FEATURE_DIRECTORY:-}" ]]; then
+ feature_dir="$SPECIFY_FEATURE_DIRECTORY"
+ # Normalize relative paths to absolute under repo root
+ [[ "$feature_dir" != /* ]] && feature_dir="$repo_root/$feature_dir"
+ # Persist to feature.json so future sessions without the env var still work
+ _persist_feature_json "$repo_root" "$SPECIFY_FEATURE_DIRECTORY"
+ elif [[ -f "$repo_root/.specify/feature.json" ]]; then
+ local _fd
+ _fd=$(read_feature_json_feature_directory "$repo_root")
+ if [[ -n "$_fd" ]]; then
+ feature_dir="$_fd"
+ # Normalize relative paths to absolute under repo root
+ [[ "$feature_dir" != /* ]] && feature_dir="$repo_root/$feature_dir"
+ else
+ echo "ERROR: Feature directory not found. Set SPECIFY_FEATURE_DIRECTORY or ensure .specify/feature.json contains feature_directory." >&2
+ return 1
+ fi
+ else
+ echo "ERROR: Feature directory not found. Set SPECIFY_FEATURE_DIRECTORY or run the specify command to create .specify/feature.json." >&2
+ return 1
+ fi
+
+ # Use printf '%q' to safely quote values, preventing shell injection
+ # via crafted branch names or paths containing special characters
+ printf 'REPO_ROOT=%q\n' "$repo_root"
+ printf 'CURRENT_BRANCH=%q\n' "$current_branch"
+ printf 'FEATURE_DIR=%q\n' "$feature_dir"
+ printf 'FEATURE_SPEC=%q\n' "$feature_dir/spec.md"
+ printf 'IMPL_PLAN=%q\n' "$feature_dir/plan.md"
+ printf 'TASKS=%q\n' "$feature_dir/tasks.md"
+ printf 'RESEARCH=%q\n' "$feature_dir/research.md"
+ printf 'DATA_MODEL=%q\n' "$feature_dir/data-model.md"
+ printf 'QUICKSTART=%q\n' "$feature_dir/quickstart.md"
+ printf 'CONTRACTS_DIR=%q\n' "$feature_dir/contracts"
+}
+
+# Check if jq is available for safe JSON construction
+has_jq() {
+ command -v jq >/dev/null 2>&1
+}
+
+get_invoke_separator() {
+ local repo_root="${1:-$(get_repo_root)}"
+ if [[ "${_SPECIFY_INVOKE_SEPARATOR_CACHE_REPO_ROOT:-}" == "$repo_root" && -n "${_SPECIFY_INVOKE_SEPARATOR_CACHE_VALUE:-}" ]]; then
+ printf '%s\n' "$_SPECIFY_INVOKE_SEPARATOR_CACHE_VALUE"
+ return 0
+ fi
+
+ local integration_json="$repo_root/.specify/integration.json"
+ local separator="."
+ local parsed_with_jq=0
+
+ if [[ -f "$integration_json" ]]; then
+ if command -v jq >/dev/null 2>&1; then
+ local jq_separator
+ if jq_separator=$(jq -r '(.default_integration // .integration // "") as $k | if $k == "" then "." else (.integration_settings[$k].invoke_separator // ".") end' "$integration_json" 2>/dev/null); then
+ parsed_with_jq=1
+ case "$jq_separator" in
+ "."|"-") separator="$jq_separator" ;;
+ esac
+ fi
+ fi
+
+ if [[ "$parsed_with_jq" -eq 0 ]] && command -v python3 >/dev/null 2>&1; then
+ if separator=$(python3 - "$integration_json" <<'PY' 2>/dev/null
+import json
+import sys
+
+try:
+ with open(sys.argv[1], encoding="utf-8") as fh:
+ state = json.load(fh)
+ key = state.get("default_integration") or state.get("integration") or ""
+ settings = state.get("integration_settings")
+ separator = "."
+ if isinstance(key, str) and isinstance(settings, dict):
+ entry = settings.get(key)
+ if isinstance(entry, dict) and entry.get("invoke_separator") in {".", "-"}:
+ separator = entry["invoke_separator"]
+ print(separator)
+except Exception:
+ print(".")
+PY
+); then
+ case "$separator" in
+ "."|"-") ;;
+ *) separator="." ;;
+ esac
+ else
+ separator="."
+ fi
+ fi
+ fi
+
+ _SPECIFY_INVOKE_SEPARATOR_CACHE_REPO_ROOT="$repo_root"
+ _SPECIFY_INVOKE_SEPARATOR_CACHE_VALUE="$separator"
+ printf '%s\n' "$separator"
+}
+
+format_speckit_command() {
+ local command_name="$1"
+ local repo_root="${2:-$(get_repo_root)}"
+ local separator
+ if [[ "${_SPECIFY_INVOKE_SEPARATOR_CACHE_REPO_ROOT:-}" == "$repo_root" && -n "${_SPECIFY_INVOKE_SEPARATOR_CACHE_VALUE:-}" ]]; then
+ separator="$_SPECIFY_INVOKE_SEPARATOR_CACHE_VALUE"
+ else
+ separator=$(get_invoke_separator "$repo_root")
+ _SPECIFY_INVOKE_SEPARATOR_CACHE_REPO_ROOT="$repo_root"
+ _SPECIFY_INVOKE_SEPARATOR_CACHE_VALUE="$separator"
+ fi
+
+ command_name="${command_name#/}"
+ command_name="${command_name#speckit.}"
+ command_name="${command_name#speckit-}"
+ command_name="${command_name//./$separator}"
+
+ printf '/speckit%s%s\n' "$separator" "$command_name"
+}
+
+# Escape a string for safe embedding in a JSON value (fallback when jq is unavailable).
+# Handles backslash, double-quote, and JSON-required control character escapes (RFC 8259).
+json_escape() {
+ local s="$1"
+ s="${s//\\/\\\\}"
+ s="${s//\"/\\\"}"
+ s="${s//$'\n'/\\n}"
+ s="${s//$'\t'/\\t}"
+ s="${s//$'\r'/\\r}"
+ s="${s//$'\b'/\\b}"
+ s="${s//$'\f'/\\f}"
+ # Escape any remaining U+0001-U+001F control characters as \uXXXX.
+ # (U+0000/NUL cannot appear in bash strings and is excluded.)
+ # LC_ALL=C ensures ${#s} counts bytes and ${s:$i:1} yields single bytes,
+ # so multi-byte UTF-8 sequences (first byte >= 0xC0) pass through intact.
+ local LC_ALL=C
+ local i char code
+ for (( i=0; i<${#s}; i++ )); do
+ char="${s:$i:1}"
+ printf -v code '%d' "'$char" 2>/dev/null || code=256
+ if (( code >= 1 && code <= 31 )); then
+ printf '\\u%04x' "$code"
+ else
+ printf '%s' "$char"
+ fi
+ done
+}
+
+check_file() { [[ -f "$1" ]] && echo " ✓ $2" || echo " ✗ $2"; }
+check_dir() { [[ -d "$1" && -n $(ls -A "$1" 2>/dev/null) ]] && echo " ✓ $2" || echo " ✗ $2"; }
+
+# Resolve a template name to a file path using the priority stack:
+# 1. .specify/templates/overrides/
+# 2. .specify/presets/<preset-id>/templates/ (sorted by priority from .registry)
+# 3. .specify/extensions/<ext-id>/templates/
+# 4. .specify/templates/ (core)
+resolve_template() {
+ local template_name="$1"
+ local repo_root="$2"
+ local base="$repo_root/.specify/templates"
+
+ # Priority 1: Project overrides
+ local override="$base/overrides/${template_name}.md"
+ [ -f "$override" ] && echo "$override" && return 0
+
+ # Priority 2: Installed presets (sorted by priority from .registry)
+ local presets_dir="$repo_root/.specify/presets"
+ if [ -d "$presets_dir" ]; then
+ local registry_file="$presets_dir/.registry"
+ if [ -f "$registry_file" ] && command -v python3 >/dev/null 2>&1; then
+ # Read preset IDs sorted by priority (lower number = higher precedence).
+ # The python3 call is wrapped in an if-condition so that set -e does not
+ # abort the function when python3 exits non-zero (e.g. invalid JSON).
+ local sorted_presets=""
+ if sorted_presets=$(SPECKIT_REGISTRY="$registry_file" python3 -c "
+import json, sys, os
+try:
+ with open(os.environ['SPECKIT_REGISTRY']) as f:
+ data = json.load(f)
+ presets = data.get('presets', {})
+ for pid, meta in sorted(presets.items(), key=lambda x: x[1].get('priority', 10) if isinstance(x[1], dict) else 10):
+ if isinstance(meta, dict) and meta.get('enabled', True) is not False:
+ print(pid)
+except Exception:
+ sys.exit(1)
+" 2>/dev/null); then
+ if [ -n "$sorted_presets" ]; then
+ # python3 succeeded and returned preset IDs — search in priority order
+ while IFS= read -r preset_id; do
+ local candidate="$presets_dir/$preset_id/templates/${template_name}.md"
+ [ -f "$candidate" ] && echo "$candidate" && return 0
+ done <<< "$sorted_presets"
+ fi
+ # python3 succeeded but registry has no presets — nothing to search
+ else
+ # python3 failed (missing, or registry parse error) — fall back to unordered directory scan
+ for preset in "$presets_dir"/*/; do
+ [ -d "$preset" ] || continue
+ local candidate="$preset/templates/${template_name}.md"
+ [ -f "$candidate" ] && echo "$candidate" && return 0
+ done
+ fi
+ else
+ # Fallback: alphabetical directory order (no python3 available)
+ for preset in "$presets_dir"/*/; do
+ [ -d "$preset" ] || continue
+ local candidate="$preset/templates/${template_name}.md"
+ [ -f "$candidate" ] && echo "$candidate" && return 0
+ done
+ fi
+ fi
+
+ # Priority 3: Extension-provided templates
+ local ext_dir="$repo_root/.specify/extensions"
+ if [ -d "$ext_dir" ]; then
+ for ext in "$ext_dir"/*/; do
+ [ -d "$ext" ] || continue
+ # Skip hidden directories (e.g. .backup, .cache)
+ case "$(basename "$ext")" in .*) continue;; esac
+ local candidate="$ext/templates/${template_name}.md"
+ [ -f "$candidate" ] && echo "$candidate" && return 0
+ done
+ fi
+
+ # Priority 4: Core templates
+ local core="$base/${template_name}.md"
+ [ -f "$core" ] && echo "$core" && return 0
+
+ # Template not found in any location.
+ # Return 1 so callers can distinguish "not found" from "found".
+ # Callers running under set -e should use: TEMPLATE=$(resolve_template ...) || true
+ return 1
+}
+
+# Resolve a template name to composed content using composition strategies.
+# Reads strategy metadata from preset manifests and composes content
+# from multiple layers using prepend, append, or wrap strategies.
+#
+# Usage: CONTENT=$(resolve_template_content "template-name" "$REPO_ROOT")
+# Returns composed content string on stdout; exit code 1 if not found.
+resolve_template_content() {
+ local template_name="$1"
+ local repo_root="$2"
+ local base="$repo_root/.specify/templates"
+
+ # Collect all layers (highest priority first)
+ local -a layer_paths=()
+ local -a layer_strategies=()
+
+ # Priority 1: Project overrides (always "replace")
+ local override="$base/overrides/${template_name}.md"
+ if [ -f "$override" ]; then
+ layer_paths+=("$override")
+ layer_strategies+=("replace")
+ fi
+
+ # Priority 2: Installed presets (sorted by priority from .registry)
+ local presets_dir="$repo_root/.specify/presets"
+ if [ -d "$presets_dir" ]; then
+ local registry_file="$presets_dir/.registry"
+ local sorted_presets=""
+ if [ -f "$registry_file" ] && command -v python3 >/dev/null 2>&1; then
+ if sorted_presets=$(SPECKIT_REGISTRY="$registry_file" python3 -c "
+import json, sys, os
+try:
+ with open(os.environ['SPECKIT_REGISTRY']) as f:
+ data = json.load(f)
+ presets = data.get('presets', {})
+ for pid, meta in sorted(presets.items(), key=lambda x: x[1].get('priority', 10) if isinstance(x[1], dict) else 10):
+ if isinstance(meta, dict) and meta.get('enabled', True) is not False:
+ print(pid)
+except Exception:
+ sys.exit(1)
+" 2>/dev/null); then
+ if [ -n "$sorted_presets" ]; then
+ local yaml_warned=false
+ while IFS= read -r preset_id; do
+ # Read strategy and file path from preset manifest
+ local strategy="replace"
+ local manifest_file=""
+ local manifest="$presets_dir/$preset_id/preset.yml"
+ if [ -f "$manifest" ] && command -v python3 >/dev/null 2>&1; then
+ # Requires PyYAML; falls back to replace/convention if unavailable
+ local result
+ local py_stderr
+ py_stderr=$(mktemp)
+ result=$(SPECKIT_MANIFEST="$manifest" SPECKIT_TMPL="$template_name" python3 -c "
+import sys, os
+try:
+ import yaml
+except ImportError:
+ print('yaml_missing', file=sys.stderr)
+ print('replace\t')
+ sys.exit(0)
+try:
+ with open(os.environ['SPECKIT_MANIFEST']) as f:
+ data = yaml.safe_load(f)
+ for t in data.get('provides', {}).get('templates', []):
+ if t.get('name') == os.environ['SPECKIT_TMPL'] and t.get('type', 'template') == 'template':
+ print(t.get('strategy', 'replace') + '\t' + t.get('file', ''))
+ sys.exit(0)
+ print('replace\t')
+except Exception:
+ print('replace\t')
+" 2>"$py_stderr")
+ local parse_status=$?
+ if [ $parse_status -eq 0 ] && [ -n "$result" ]; then
+ IFS=$'\t' read -r strategy manifest_file <<< "$result"
+ strategy=$(printf '%s' "$strategy" | tr '[:upper:]' '[:lower:]')
+ fi
+ if [ "$yaml_warned" = false ] && grep -q 'yaml_missing' "$py_stderr" 2>/dev/null; then
+ echo "Warning: PyYAML not available; composition strategies may be ignored" >&2
+ yaml_warned=true
+ fi
+ rm -f "$py_stderr"
+ fi
+ # Try manifest file path first, then convention path
+ local candidate=""
+ if [ -n "$manifest_file" ]; then
+ # Reject absolute paths and parent traversal
+ case "$manifest_file" in
+ /*|*../*|../*) manifest_file="" ;;
+ esac
+ fi
+ if [ -n "$manifest_file" ]; then
+ local mf="$presets_dir/$preset_id/$manifest_file"
+ [ -f "$mf" ] && candidate="$mf"
+ fi
+ if [ -z "$candidate" ]; then
+ local cf="$presets_dir/$preset_id/templates/${template_name}.md"
+ [ -f "$cf" ] && candidate="$cf"
+ fi
+ if [ -n "$candidate" ]; then
+ layer_paths+=("$candidate")
+ layer_strategies+=("$strategy")
+ fi
+ done <<< "$sorted_presets"
+ fi
+ else
+ # python3 failed — fall back to unordered directory scan (replace only)
+ for preset in "$presets_dir"/*/; do
+ [ -d "$preset" ] || continue
+ local candidate="$preset/templates/${template_name}.md"
+ if [ -f "$candidate" ]; then
+ layer_paths+=("$candidate")
+ layer_strategies+=("replace")
+ fi
+ done
+ fi
+ else
+ # No python3 or registry — fall back to unordered directory scan (replace only)
+ for preset in "$presets_dir"/*/; do
+ [ -d "$preset" ] || continue
+ local candidate="$preset/templates/${template_name}.md"
+ if [ -f "$candidate" ]; then
+ layer_paths+=("$candidate")
+ layer_strategies+=("replace")
+ fi
+ done
+ fi
+ fi
+
+ # Priority 3: Extension-provided templates (always "replace")
+ local ext_dir="$repo_root/.specify/extensions"
+ if [ -d "$ext_dir" ]; then
+ for ext in "$ext_dir"/*/; do
+ [ -d "$ext" ] || continue
+ case "$(basename "$ext")" in .*) continue;; esac
+ local candidate="$ext/templates/${template_name}.md"
+ if [ -f "$candidate" ]; then
+ layer_paths+=("$candidate")
+ layer_strategies+=("replace")
+ fi
+ done
+ fi
+
+ # Priority 4: Core templates (always "replace")
+ local core="$base/${template_name}.md"
+ if [ -f "$core" ]; then
+ layer_paths+=("$core")
+ layer_strategies+=("replace")
+ fi
+
+ local count=${#layer_paths[@]}
+ [ "$count" -eq 0 ] && return 1
+
+ # Check if any layer uses a non-replace strategy
+ local has_composition=false
+ for s in "${layer_strategies[@]}"; do
+ [ "$s" != "replace" ] && has_composition=true && break
+ done
+
+ # If the top (highest-priority) layer is replace, it wins entirely —
+ # lower layers are irrelevant regardless of their strategies.
+ if [ "${layer_strategies[0]}" = "replace" ]; then
+ cat "${layer_paths[0]}"
+ return 0
+ fi
+
+ if [ "$has_composition" = false ]; then
+ cat "${layer_paths[0]}"
+ return 0
+ fi
+
+ # Find the effective base: scan from highest priority (index 0) downward
+ # to find the nearest replace layer. Only compose layers above that base.
+ local base_idx=-1
+ local i
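+ for (( i=0; i<count; i++ )); do
+ if [ "${layer_strategies[$i]}" = "replace" ]; then
+ base_idx=$i
+ break
+ fi
+ done
+
+ # No replace layer means there is no base to compose onto
+ if [ "$base_idx" -lt 0 ]; then
+ return 1
+ fi
+
+ # Read the base content, preserving trailing newlines
+ local content
+ content=$(cat "${layer_paths[$base_idx]}"; printf x)
+ content="${content%x}"
+
+ # Compose the layers above the base, from nearest the base up to highest priority
+ for (( i=base_idx-1; i>=0; i-- )); do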
+ local path="${layer_paths[$i]}"
+ local strat="${layer_strategies[$i]}"
+ local layer_content
+ # Preserve trailing newlines
+ layer_content=$(cat "$path"; printf x)
+ layer_content="${layer_content%x}"
+
+ case "$strat" in
+ replace) content="$layer_content" ;;
+ prepend) content="$(printf '%s\n\n%s' "$layer_content" "$content")" ;;
+ append) content="$(printf '%s\n\n%s' "$content" "$layer_content")" ;;
+ wrap)
+ case "$layer_content" in
+ *'{CORE_TEMPLATE}'*) ;;
+ *) echo "Error: wrap strategy missing {CORE_TEMPLATE} placeholder" >&2; return 1 ;;
+ esac
+ while [[ "$layer_content" == *'{CORE_TEMPLATE}'* ]]; do
+ local before="${layer_content%%\{CORE_TEMPLATE\}*}"
+ local after="${layer_content#*\{CORE_TEMPLATE\}}"
+ layer_content="${before}${content}${after}"
+ done
+ content="$layer_content"
+ ;;
+ *) echo "Error: unknown strategy '$strat'" >&2; return 1 ;;
+ esac
+ done
+
+ printf '%s' "$content"
+ return 0
+}
diff --git a/.specify/scripts/bash/create-new-feature.sh b/.specify/scripts/bash/create-new-feature.sh
new file mode 100755
index 0000000..c960976
--- /dev/null
+++ b/.specify/scripts/bash/create-new-feature.sh
@@ -0,0 +1,299 @@
+#!/usr/bin/env bash
+
+set -e
+
+JSON_MODE=false
+DRY_RUN=false
+ALLOW_EXISTING=false
+SHORT_NAME=""
+BRANCH_NUMBER=""
+USE_TIMESTAMP=false
+ARGS=()
+i=1
+while [ $i -le $# ]; do
+ arg="${!i}"
+ case "$arg" in
+ --json)
+ JSON_MODE=true
+ ;;
+ --dry-run)
+ DRY_RUN=true
+ ;;
+ --allow-existing-branch)
+ ALLOW_EXISTING=true
+ ;;
+ --short-name)
+ if [ $((i + 1)) -gt $# ]; then
+ echo 'Error: --short-name requires a value' >&2
+ exit 1
+ fi
+ i=$((i + 1))
+ next_arg="${!i}"
+ # Check if the next argument is another option (starts with --)
+ if [[ "$next_arg" == --* ]]; then
+ echo 'Error: --short-name requires a value' >&2
+ exit 1
+ fi
+ SHORT_NAME="$next_arg"
+ ;;
+ --number)
+ if [ $((i + 1)) -gt $# ]; then
+ echo 'Error: --number requires a value' >&2
+ exit 1
+ fi
+ i=$((i + 1))
+ next_arg="${!i}"
+ if [[ "$next_arg" == --* ]]; then
+ echo 'Error: --number requires a value' >&2
+ exit 1
+ fi
+ BRANCH_NUMBER="$next_arg"
+ ;;
+ --timestamp)
+ USE_TIMESTAMP=true
+ ;;
+ --help|-h)
+ echo "Usage: $0 [--json] [--dry-run] [--allow-existing-branch] [--short-name <name>] [--number N] [--timestamp] <feature_description>"
+ echo ""
+ echo "Options:"
+ echo " --json Output in JSON format"
+ echo " --dry-run Compute feature name and paths without creating directories or files"
+ echo " --allow-existing-branch Reuse an existing feature directory if it already exists"
+ echo " --short-name <name> Provide a custom short name (2-4 words) for the feature"
+ echo " --number N Specify branch number manually (overrides auto-detection)"
+ echo " --timestamp Use timestamp prefix (YYYYMMDD-HHMMSS) instead of sequential numbering"
+ echo " --help, -h Show this help message"
+ echo ""
+ echo "Examples:"
+ echo " $0 'Add user authentication system' --short-name 'user-auth'"
+ echo " $0 'Implement OAuth2 integration for API' --number 5"
+ echo " $0 --timestamp --short-name 'user-auth' 'Add user authentication'"
+ exit 0
+ ;;
+ *)
+ ARGS+=("$arg")
+ ;;
+ esac
+ i=$((i + 1))
+done
+
+FEATURE_DESCRIPTION="${ARGS[*]}"
+if [ -z "$FEATURE_DESCRIPTION" ]; then
+ echo "Usage: $0 [--json] [--dry-run] [--allow-existing-branch] [--short-name <name>] [--number N] [--timestamp] <feature_description>" >&2
+ exit 1
+fi
+
+# Trim whitespace and validate description is not empty (e.g., user passed only whitespace)
+FEATURE_DESCRIPTION=$(echo "$FEATURE_DESCRIPTION" | sed -E 's/^[[:space:]]+|[[:space:]]+$//g')
+if [ -z "$FEATURE_DESCRIPTION" ]; then
+ echo "Error: Feature description cannot be empty or contain only whitespace" >&2
+ exit 1
+fi
+
+# Function to get highest number from specs directory
+get_highest_from_specs() {
+ local specs_dir="$1"
+ local highest=0
+
+ if [ -d "$specs_dir" ]; then
+ for dir in "$specs_dir"/*; do
+ [ -d "$dir" ] || continue
+ dirname=$(basename "$dir")
+ # Match sequential prefixes (>=3 digits), but skip timestamp dirs.
+ if echo "$dirname" | grep -Eq '^[0-9]{3,}-' && ! echo "$dirname" | grep -Eq '^[0-9]{8}-[0-9]{6}-'; then
+ number=$(echo "$dirname" | grep -Eo '^[0-9]+')
+ number=$((10#$number))
+ if [ "$number" -gt "$highest" ]; then
+ highest=$number
+ fi
+ fi
+ done
+ fi
+
+ echo "$highest"
+}
+
+# Function to clean and format a branch name
+clean_branch_name() {
+ local name="$1"
+ echo "$name" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g' | sed 's/-\+/-/g' | sed 's/^-//' | sed 's/-$//'
+}
+
+# Resolve repository root using common.sh functions which prioritize .specify
+SCRIPT_DIR="$(CDPATH="" cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+source "$SCRIPT_DIR/common.sh"
+
+REPO_ROOT=$(get_repo_root) || exit 1
+
+cd "$REPO_ROOT"
+
+SPECS_DIR="$REPO_ROOT/specs"
+if [ "$DRY_RUN" != true ]; then
+ mkdir -p "$SPECS_DIR"
+fi
+
+# Function to generate branch name with stop word filtering and length filtering
+generate_branch_name() {
+ local description="$1"
+
+ # Common stop words to filter out
+ local stop_words="^(i|a|an|the|to|for|of|in|on|at|by|with|from|is|are|was|were|be|been|being|have|has|had|do|does|did|will|would|should|could|can|may|might|must|shall|this|that|these|those|my|your|our|their|want|need|add|get|set)$"
+
+ # Convert to lowercase and split into words
+ local clean_name=$(echo "$description" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/ /g')
+
+ # Filter words: remove stop words and words shorter than 3 chars (unless they're uppercase acronyms in original)
+ local meaningful_words=()
+ for word in $clean_name; do
+ # Skip empty words
+ [ -z "$word" ] && continue
+
+ # Keep words that are NOT stop words AND (length >= 3 OR are potential acronyms)
+ if ! echo "$word" | grep -qiE "$stop_words"; then
+ if [ ${#word} -ge 3 ]; then
+ meaningful_words+=("$word")
+ elif echo "$description" | grep -q "\b${word^^}\b"; then
+ # Keep short words if they appear as uppercase in original (likely acronyms)
+ meaningful_words+=("$word")
+ fi
+ fi
+ done
+
+ # If we have meaningful words, use first 3-4 of them
+ if [ ${#meaningful_words[@]} -gt 0 ]; then
+ local max_words=3
+ if [ ${#meaningful_words[@]} -eq 4 ]; then max_words=4; fi
+
+ local result=""
+ local count=0
+ for word in "${meaningful_words[@]}"; do
+ if [ $count -ge $max_words ]; then break; fi
+ if [ -n "$result" ]; then result="$result-"; fi
+ result="$result$word"
+ count=$((count + 1))
+ done
+ echo "$result"
+ else
+ # Fallback to original logic if no meaningful words found
+ local cleaned=$(clean_branch_name "$description")
+ echo "$cleaned" | tr '-' '\n' | grep -v '^$' | head -3 | tr '\n' '-' | sed 's/-$//'
+ fi
+}
+
+# Generate branch name
+if [ -n "$SHORT_NAME" ]; then
+ # Use provided short name, just clean it up
+ BRANCH_SUFFIX=$(clean_branch_name "$SHORT_NAME")
+else
+ # Generate from description with smart filtering
+ BRANCH_SUFFIX=$(generate_branch_name "$FEATURE_DESCRIPTION")
+fi
+
+# Warn if --number and --timestamp are both specified
+if [ "$USE_TIMESTAMP" = true ] && [ -n "$BRANCH_NUMBER" ]; then
+ >&2 echo "[specify] Warning: --number is ignored when --timestamp is used"
+ BRANCH_NUMBER=""
+fi
+
+# Determine branch prefix
+if [ "$USE_TIMESTAMP" = true ]; then
+ FEATURE_NUM=$(date +%Y%m%d-%H%M%S)
+ BRANCH_NAME="${FEATURE_NUM}-${BRANCH_SUFFIX}"
+else
+ # Determine branch number from existing feature directories
+ if [ -z "$BRANCH_NUMBER" ]; then
+ HIGHEST=$(get_highest_from_specs "$SPECS_DIR")
+ BRANCH_NUMBER=$((HIGHEST + 1))
+ fi
+
+ # Force base-10 interpretation to prevent octal conversion (e.g., 010 → 8 in octal, but should be 10 in decimal)
+ FEATURE_NUM=$(printf "%03d" "$((10#$BRANCH_NUMBER))")
+ BRANCH_NAME="${FEATURE_NUM}-${BRANCH_SUFFIX}"
+fi
+
+# GitHub enforces a 244-byte limit on branch names
+# Validate and truncate if necessary
+MAX_BRANCH_LENGTH=244
+if [ ${#BRANCH_NAME} -gt $MAX_BRANCH_LENGTH ]; then
+ # Calculate how much we need to trim from suffix
+ # Account for prefix length: timestamp (15) + hyphen (1) = 16, or sequential (3) + hyphen (1) = 4
+ PREFIX_LENGTH=$(( ${#FEATURE_NUM} + 1 ))
+ MAX_SUFFIX_LENGTH=$((MAX_BRANCH_LENGTH - PREFIX_LENGTH))
+
+ # Truncate suffix to the maximum length (plain character cut; no word-boundary search)
+ TRUNCATED_SUFFIX=$(echo "$BRANCH_SUFFIX" | cut -c1-$MAX_SUFFIX_LENGTH)
+ # Remove trailing hyphen if truncation created one
+ TRUNCATED_SUFFIX=$(echo "$TRUNCATED_SUFFIX" | sed 's/-$//')
+
+ ORIGINAL_BRANCH_NAME="$BRANCH_NAME"
+ BRANCH_NAME="${FEATURE_NUM}-${TRUNCATED_SUFFIX}"
+
+ >&2 echo "[specify] Warning: Branch name exceeded GitHub's 244-byte limit"
+ >&2 echo "[specify] Original: $ORIGINAL_BRANCH_NAME (${#ORIGINAL_BRANCH_NAME} bytes)"
+ >&2 echo "[specify] Truncated to: $BRANCH_NAME (${#BRANCH_NAME} bytes)"
+fi
+
+FEATURE_DIR="$SPECS_DIR/$BRANCH_NAME"
+SPEC_FILE="$FEATURE_DIR/spec.md"
+
+if [ "$DRY_RUN" != true ]; then
+ if [ -d "$FEATURE_DIR" ] && [ "$ALLOW_EXISTING" != true ]; then
+ if [ "$USE_TIMESTAMP" = true ]; then
+ >&2 echo "Error: Feature directory '$FEATURE_DIR' already exists. Rerun to get a new timestamp or use a different --short-name."
+ else
+ >&2 echo "Error: Feature directory '$FEATURE_DIR' already exists. Please use a different feature name or specify a different number with --number."
+ fi
+ exit 1
+ fi
+
+ mkdir -p "$FEATURE_DIR"
+
+ if [ ! -f "$SPEC_FILE" ]; then
+ TEMPLATE=$(resolve_template "spec-template" "$REPO_ROOT") || true
+ if [ -n "$TEMPLATE" ] && [ -f "$TEMPLATE" ]; then
+ cp "$TEMPLATE" "$SPEC_FILE"
+ else
+ echo "Warning: Spec template not found; created empty spec file" >&2
+ touch "$SPEC_FILE"
+ fi
+ fi
+
+ # Persist to .specify/feature.json so downstream commands can find the feature
+ _persist_feature_json "$REPO_ROOT" "$FEATURE_DIR"
+
+ # Inform the user how to set feature state in their own shell
+ printf '# To persist: export SPECIFY_FEATURE=%q\n' "$BRANCH_NAME" >&2
+ printf '# export SPECIFY_FEATURE_DIRECTORY=%q\n' "$FEATURE_DIR" >&2
+fi
+
+if $JSON_MODE; then
+ if command -v jq >/dev/null 2>&1; then
+ if [ "$DRY_RUN" = true ]; then
+ jq -cn \
+ --arg branch_name "$BRANCH_NAME" \
+ --arg spec_file "$SPEC_FILE" \
+ --arg feature_num "$FEATURE_NUM" \
+ '{BRANCH_NAME:$branch_name,SPEC_FILE:$spec_file,FEATURE_NUM:$feature_num,DRY_RUN:true}'
+ else
+ jq -cn \
+ --arg branch_name "$BRANCH_NAME" \
+ --arg spec_file "$SPEC_FILE" \
+ --arg feature_num "$FEATURE_NUM" \
+ '{BRANCH_NAME:$branch_name,SPEC_FILE:$spec_file,FEATURE_NUM:$feature_num}'
+ fi
+ else
+ if [ "$DRY_RUN" = true ]; then
+ printf '{"BRANCH_NAME":"%s","SPEC_FILE":"%s","FEATURE_NUM":"%s","DRY_RUN":true}\n' "$(json_escape "$BRANCH_NAME")" "$(json_escape "$SPEC_FILE")" "$(json_escape "$FEATURE_NUM")"
+ else
+ printf '{"BRANCH_NAME":"%s","SPEC_FILE":"%s","FEATURE_NUM":"%s"}\n' "$(json_escape "$BRANCH_NAME")" "$(json_escape "$SPEC_FILE")" "$(json_escape "$FEATURE_NUM")"
+ fi
+ fi
+else
+ echo "BRANCH_NAME: $BRANCH_NAME"
+ echo "SPEC_FILE: $SPEC_FILE"
+ echo "FEATURE_NUM: $FEATURE_NUM"
+ if [ "$DRY_RUN" != true ]; then
+ printf '# To persist in your shell: export SPECIFY_FEATURE=%q\n' "$BRANCH_NAME"
+ printf '# export SPECIFY_FEATURE_DIRECTORY=%q\n' "$FEATURE_DIR"
+ fi
+fi
diff --git a/.specify/scripts/bash/setup-plan.sh b/.specify/scripts/bash/setup-plan.sh
new file mode 100755
index 0000000..cb67943
--- /dev/null
+++ b/.specify/scripts/bash/setup-plan.sh
@@ -0,0 +1,84 @@
+#!/usr/bin/env bash
+
+set -e
+
+# Parse command line arguments
+JSON_MODE=false
+ARGS=()
+
+for arg in "$@"; do
+ case "$arg" in
+ --json)
+ JSON_MODE=true
+ ;;
+ --help|-h)
+ echo "Usage: $0 [--json]"
+ echo " --json Output results in JSON format"
+ echo " --help Show this help message"
+ exit 0
+ ;;
+ *)
+ ARGS+=("$arg")
+ ;;
+ esac
+done
+
+# Get script directory and load common functions
+SCRIPT_DIR="$(CDPATH="" cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+source "$SCRIPT_DIR/common.sh"
+
+# Get all paths and variables from common functions
+_paths_output=$(get_feature_paths) || { echo "ERROR: Failed to resolve feature paths" >&2; exit 1; }
+eval "$_paths_output"
+unset _paths_output
+
+# Ensure the feature directory exists
+mkdir -p "$FEATURE_DIR"
+
+# Copy plan template if plan doesn't already exist
+if [[ -f "$IMPL_PLAN" ]]; then
+ if $JSON_MODE; then
+ echo "Plan already exists at $IMPL_PLAN, skipping template copy" >&2
+ else
+ echo "Plan already exists at $IMPL_PLAN, skipping template copy"
+ fi
+else
+ TEMPLATE=$(resolve_template "plan-template" "$REPO_ROOT") || true
+ if [[ -n "$TEMPLATE" ]] && [[ -f "$TEMPLATE" ]]; then
+ cp "$TEMPLATE" "$IMPL_PLAN"
+ if $JSON_MODE; then
+ echo "Copied plan template to $IMPL_PLAN" >&2
+ else
+ echo "Copied plan template to $IMPL_PLAN"
+ fi
+ else
+ if $JSON_MODE; then
+ echo "Warning: Plan template not found" >&2
+ else
+ echo "Warning: Plan template not found"
+ fi
+ # Create a basic plan file if template doesn't exist
+ touch "$IMPL_PLAN"
+ fi
+fi
+
+# Output results
+if $JSON_MODE; then
+ if has_jq; then
+ jq -cn \
+ --arg feature_spec "$FEATURE_SPEC" \
+ --arg impl_plan "$IMPL_PLAN" \
+ --arg specs_dir "$FEATURE_DIR" \
+ --arg branch "$CURRENT_BRANCH" \
+ '{FEATURE_SPEC:$feature_spec,IMPL_PLAN:$impl_plan,SPECS_DIR:$specs_dir,BRANCH:$branch}'
+ else
+ printf '{"FEATURE_SPEC":"%s","IMPL_PLAN":"%s","SPECS_DIR":"%s","BRANCH":"%s"}\n' \
+ "$(json_escape "$FEATURE_SPEC")" "$(json_escape "$IMPL_PLAN")" "$(json_escape "$FEATURE_DIR")" "$(json_escape "$CURRENT_BRANCH")"
+ fi
+else
+ echo "FEATURE_SPEC: $FEATURE_SPEC"
+ echo "IMPL_PLAN: $IMPL_PLAN"
+ echo "SPECS_DIR: $FEATURE_DIR"
+ echo "BRANCH: $CURRENT_BRANCH"
+fi
+
diff --git a/.specify/scripts/bash/setup-tasks.sh b/.specify/scripts/bash/setup-tasks.sh
new file mode 100755
index 0000000..ae0d7bd
--- /dev/null
+++ b/.specify/scripts/bash/setup-tasks.sh
@@ -0,0 +1,91 @@
+#!/usr/bin/env bash
+
+set -e
+
+# Parse command line arguments
+JSON_MODE=false
+
+for arg in "$@"; do
+ case "$arg" in
+ --json) JSON_MODE=true ;;
+ --help|-h)
+ echo "Usage: $0 [--json]"
+ echo " --json Output results in JSON format"
+ echo " --help Show this help message"
+ exit 0
+ ;;
+ *) echo "ERROR: Unknown option '$arg'" >&2; exit 1 ;;
+ esac
+done
+
+# Source common functions
+SCRIPT_DIR="$(CDPATH="" cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+source "$SCRIPT_DIR/common.sh"
+
+# Get feature paths
+_paths_output=$(get_feature_paths) || { echo "ERROR: Failed to resolve feature paths" >&2; exit 1; }
+eval "$_paths_output"
+unset _paths_output
+
+# Validate required files
+if [[ ! -f "$IMPL_PLAN" ]]; then
+ echo "ERROR: plan.md not found in $FEATURE_DIR" >&2
+ echo "Run /speckit-plan first to create the implementation plan." >&2
+ exit 1
+fi
+
+if [[ ! -f "$FEATURE_SPEC" ]]; then
+ echo "ERROR: spec.md not found in $FEATURE_DIR" >&2
+ echo "Run /speckit-specify first to create the feature structure." >&2
+ exit 1
+fi
+
+# Build available docs list
+docs=()
+[[ -f "$RESEARCH" ]] && docs+=("research.md")
+[[ -f "$DATA_MODEL" ]] && docs+=("data-model.md")
+if [[ -d "$CONTRACTS_DIR" ]] && [[ -n "$(ls -A "$CONTRACTS_DIR" 2>/dev/null)" ]]; then
+ docs+=("contracts/")
+fi
+[[ -f "$QUICKSTART" ]] && docs+=("quickstart.md")
+
+# Resolve tasks template through override stack
+TASKS_TEMPLATE=$(resolve_template "tasks-template" "$REPO_ROOT") || true
+if [[ -z "$TASKS_TEMPLATE" ]] || [[ ! -f "$TASKS_TEMPLATE" ]]; then
+ echo "ERROR: Could not resolve required tasks-template from the template override stack for $REPO_ROOT" >&2
+ echo "Template 'tasks-template' was not found in any supported location (overrides, presets, extensions, or shared core). Add an override at .specify/templates/overrides/tasks-template.md, or run 'specify init' / reinstall shared infra to restore the core .specify/templates/tasks-template.md template." >&2
+ exit 1
+fi
+
+# Output results
+if $JSON_MODE; then
+ if has_jq; then
+ if [[ ${#docs[@]} -eq 0 ]]; then
+ json_docs="[]"
+ else
+ json_docs=$(printf '%s\n' "${docs[@]}" | jq -R . | jq -s .)
+ fi
+ jq -cn \
+ --arg feature_dir "$FEATURE_DIR" \
+ --argjson docs "$json_docs" \
+ --arg tasks_template "${TASKS_TEMPLATE:-}" \
+ '{FEATURE_DIR:$feature_dir,AVAILABLE_DOCS:$docs,TASKS_TEMPLATE:$tasks_template}'
+ else
+ if [[ ${#docs[@]} -eq 0 ]]; then
+ json_docs="[]"
+ else
+ json_docs=$(for d in "${docs[@]}"; do printf '"%s",' "$(json_escape "$d")"; done)
+ json_docs="[${json_docs%,}]"
+ fi
+ printf '{"FEATURE_DIR":"%s","AVAILABLE_DOCS":%s,"TASKS_TEMPLATE":"%s"}\n' \
+ "$(json_escape "$FEATURE_DIR")" "$json_docs" "$(json_escape "${TASKS_TEMPLATE:-}")"
+ fi
+else
+ echo "FEATURE_DIR: $FEATURE_DIR"
+ echo "TASKS_TEMPLATE: ${TASKS_TEMPLATE:-not found}"
+ echo "AVAILABLE_DOCS:"
+ check_file "$RESEARCH" "research.md"
+ check_file "$DATA_MODEL" "data-model.md"
+ check_dir "$CONTRACTS_DIR" "contracts/"
+ check_file "$QUICKSTART" "quickstart.md"
+fi
diff --git a/.specify/templates/checklist-template.md b/.specify/templates/checklist-template.md
new file mode 100644
index 0000000..c4aa166
--- /dev/null
+++ b/.specify/templates/checklist-template.md
@@ -0,0 +1,40 @@
+# [CHECKLIST TYPE] Checklist: [FEATURE NAME]
+
+**Purpose**: [Brief description of what this checklist covers]
+**Created**: [DATE]
+**Feature**: [Link to spec.md or relevant documentation]
+
+**Note**: This checklist is generated by the `/speckit-checklist` command based on feature context and requirements.
+
+
+
+## [Category 1]
+
+- [ ] CHK001 First checklist item with clear action
+- [ ] CHK002 Second checklist item
+- [ ] CHK003 Third checklist item
+
+## [Category 2]
+
+- [ ] CHK004 Another category item
+- [ ] CHK005 Item with specific criteria
+- [ ] CHK006 Final item in this category
+
+## Notes
+
+- Check items off as completed: `[x]`
+- Add comments or findings inline
+- Link to relevant resources or documentation
+- Items are numbered sequentially for easy reference
diff --git a/.specify/templates/constitution-template.md b/.specify/templates/constitution-template.md
new file mode 100644
index 0000000..a4670ff
--- /dev/null
+++ b/.specify/templates/constitution-template.md
@@ -0,0 +1,50 @@
+# [PROJECT_NAME] Constitution
+
+
+## Core Principles
+
+### [PRINCIPLE_1_NAME]
+
+[PRINCIPLE_1_DESCRIPTION]
+
+
+### [PRINCIPLE_2_NAME]
+
+[PRINCIPLE_2_DESCRIPTION]
+
+
+### [PRINCIPLE_3_NAME]
+
+[PRINCIPLE_3_DESCRIPTION]
+
+
+### [PRINCIPLE_4_NAME]
+
+[PRINCIPLE_4_DESCRIPTION]
+
+
+### [PRINCIPLE_5_NAME]
+
+[PRINCIPLE_5_DESCRIPTION]
+
+
+## [SECTION_2_NAME]
+
+
+[SECTION_2_CONTENT]
+
+
+## [SECTION_3_NAME]
+
+
+[SECTION_3_CONTENT]
+
+
+## Governance
+
+
+[GOVERNANCE_RULES]
+
+
+**Version**: [CONSTITUTION_VERSION] | **Ratified**: [RATIFICATION_DATE] | **Last Amended**: [LAST_AMENDED_DATE]
+
diff --git a/.specify/templates/plan-template.md b/.specify/templates/plan-template.md
new file mode 100644
index 0000000..92b96c7
--- /dev/null
+++ b/.specify/templates/plan-template.md
@@ -0,0 +1,113 @@
+# Implementation Plan: [FEATURE]
+
+**Branch**: `[###-feature-name]` | **Date**: [DATE] | **Spec**: [link]
+
+**Input**: Feature specification from `/specs/[###-feature-name]/spec.md`
+
+**Note**: This template is filled in by the `/speckit-plan` command. See `.specify/templates/plan-template.md` for the execution workflow.
+
+## Summary
+
+[Extract from feature spec: primary requirement + technical approach from research]
+
+## Technical Context
+
+
+
+**Language/Version**: [e.g., Python 3.11, Swift 5.9, Rust 1.75 or NEEDS CLARIFICATION]
+
+**Primary Dependencies**: [e.g., FastAPI, UIKit, LLVM or NEEDS CLARIFICATION]
+
+**Storage**: [if applicable, e.g., PostgreSQL, CoreData, files or N/A]
+
+**Testing**: [e.g., pytest, XCTest, cargo test or NEEDS CLARIFICATION]
+
+**Target Platform**: [e.g., Linux server, iOS 15+, WASM or NEEDS CLARIFICATION]
+
+**Project Type**: [e.g., library/cli/web-service/mobile-app/compiler/desktop-app or NEEDS CLARIFICATION]
+
+**Performance Goals**: [domain-specific, e.g., 1000 req/s, 10k lines/sec, 60 fps or NEEDS CLARIFICATION]
+
+**Constraints**: [domain-specific, e.g., <200ms p95, <100MB memory, offline-capable or NEEDS CLARIFICATION]
+
+**Scale/Scope**: [domain-specific, e.g., 10k users, 1M LOC, 50 screens or NEEDS CLARIFICATION]
+
+## Constitution Check
+
+*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
+
+[Gates determined based on constitution file]
+
+## Project Structure
+
+### Documentation (this feature)
+
+```text
+specs/[###-feature]/
+├── plan.md # This file (/speckit-plan command output)
+├── research.md # Phase 0 output (/speckit-plan command)
+├── data-model.md # Phase 1 output (/speckit-plan command)
+├── quickstart.md # Phase 1 output (/speckit-plan command)
+├── contracts/ # Phase 1 output (/speckit-plan command)
+└── tasks.md # Phase 2 output (/speckit-tasks command - NOT created by /speckit-plan)
+```
+
+### Source Code (repository root)
+
+
+```text
+# [REMOVE IF UNUSED] Option 1: Single project (DEFAULT)
+src/
+├── models/
+├── services/
+├── cli/
+└── lib/
+
+tests/
+├── contract/
+├── integration/
+└── unit/
+
+# [REMOVE IF UNUSED] Option 2: Web application (when "frontend" + "backend" detected)
+backend/
+├── src/
+│   ├── models/
+│   ├── services/
+│   └── api/
+└── tests/
+
+frontend/
+├── src/
+│   ├── components/
+│   ├── pages/
+│   └── services/
+└── tests/
+
+# [REMOVE IF UNUSED] Option 3: Mobile + API (when "iOS/Android" detected)
+api/
+└── [same as backend above]
+
+ios/ or android/
+└── [platform-specific structure: feature modules, UI flows, platform tests]
+```
+
+**Structure Decision**: [Document the selected structure and reference the real
+directories captured above]
+
+## Complexity Tracking
+
+> **Fill ONLY if Constitution Check has violations that must be justified**
+
+| Violation | Why Needed | Simpler Alternative Rejected Because |
+|-----------|------------|-------------------------------------|
+| [e.g., 4th project] | [current need] | [why 3 projects insufficient] |
+| [e.g., Repository pattern] | [specific problem] | [why direct DB access insufficient] |
diff --git a/.specify/templates/spec-template.md b/.specify/templates/spec-template.md
new file mode 100644
index 0000000..ceb2877
--- /dev/null
+++ b/.specify/templates/spec-template.md
@@ -0,0 +1,131 @@
+# Feature Specification: [FEATURE NAME]
+
+**Feature Branch**: `[###-feature-name]`
+
+**Created**: [DATE]
+
+**Status**: Draft
+
+**Input**: User description: "$ARGUMENTS"
+
+## User Scenarios & Testing *(mandatory)*
+
+
+
+### User Story 1 - [Brief Title] (Priority: P1)
+
+[Describe this user journey in plain language]
+
+**Why this priority**: [Explain the value and why it has this priority level]
+
+**Independent Test**: [Describe how this can be tested independently - e.g., "Can be fully tested by [specific action] and delivers [specific value]"]
+
+**Acceptance Scenarios**:
+
+1. **Given** [initial state], **When** [action], **Then** [expected outcome]
+2. **Given** [initial state], **When** [action], **Then** [expected outcome]
+
+---
+
+### User Story 2 - [Brief Title] (Priority: P2)
+
+[Describe this user journey in plain language]
+
+**Why this priority**: [Explain the value and why it has this priority level]
+
+**Independent Test**: [Describe how this can be tested independently]
+
+**Acceptance Scenarios**:
+
+1. **Given** [initial state], **When** [action], **Then** [expected outcome]
+
+---
+
+### User Story 3 - [Brief Title] (Priority: P3)
+
+[Describe this user journey in plain language]
+
+**Why this priority**: [Explain the value and why it has this priority level]
+
+**Independent Test**: [Describe how this can be tested independently]
+
+**Acceptance Scenarios**:
+
+1. **Given** [initial state], **When** [action], **Then** [expected outcome]
+
+---
+
+[Add more user stories as needed, each with an assigned priority]
+
+### Edge Cases
+
+
+
+- What happens when [boundary condition]?
+- How does system handle [error scenario]?
+
+## Requirements *(mandatory)*
+
+
+
+### Functional Requirements
+
+- **FR-001**: System MUST [specific capability, e.g., "allow users to create accounts"]
+- **FR-002**: System MUST [specific capability, e.g., "validate email addresses"]
+- **FR-003**: Users MUST be able to [key interaction, e.g., "reset their password"]
+- **FR-004**: System MUST [data requirement, e.g., "persist user preferences"]
+- **FR-005**: System MUST [behavior, e.g., "log all security events"]
+
+*Example of marking unclear requirements:*
+
+- **FR-006**: System MUST authenticate users via [NEEDS CLARIFICATION: auth method not specified - email/password, SSO, OAuth?]
+- **FR-007**: System MUST retain user data for [NEEDS CLARIFICATION: retention period not specified]
+
+### Key Entities *(include if feature involves data)*
+
+- **[Entity 1]**: [What it represents, key attributes without implementation]
+- **[Entity 2]**: [What it represents, relationships to other entities]
+
+## Success Criteria *(mandatory)*
+
+
+
+### Measurable Outcomes
+
+- **SC-001**: [Measurable metric, e.g., "Users can complete account creation in under 2 minutes"]
+- **SC-002**: [Measurable metric, e.g., "System handles 1000 concurrent users without degradation"]
+- **SC-003**: [User satisfaction metric, e.g., "90% of users successfully complete primary task on first attempt"]
+- **SC-004**: [Business metric, e.g., "Reduce support tickets related to [X] by 50%"]
+
+## Assumptions
+
+
+
+- [Assumption about target users, e.g., "Users have stable internet connectivity"]
+- [Assumption about scope boundaries, e.g., "Mobile support is out of scope for v1"]
+- [Assumption about data/environment, e.g., "Existing authentication system will be reused"]
+- [Dependency on existing system/service, e.g., "Requires access to the existing user profile API"]
diff --git a/.specify/templates/tasks-template.md b/.specify/templates/tasks-template.md
new file mode 100644
index 0000000..d46a1f1
--- /dev/null
+++ b/.specify/templates/tasks-template.md
@@ -0,0 +1,252 @@
+---
+
+description: "Task list template for feature implementation"
+---
+
+# Tasks: [FEATURE NAME]
+
+**Input**: Design documents from `/specs/[###-feature-name]/`
+
+**Prerequisites**: plan.md (required), spec.md (required for user stories), research.md, data-model.md, contracts/
+
+**Tests**: The examples below include test tasks. Tests are OPTIONAL - only include them if explicitly requested in the feature specification.
+
+**Organization**: Tasks are grouped by user story to enable independent implementation and testing of each story.
+
+## Format: `[ID] [P?] [Story] Description`
+
+- **[P]**: Can run in parallel (different files, no dependencies)
+- **[Story]**: Which user story this task belongs to (e.g., US1, US2, US3)
+- Include exact file paths in descriptions
+
+## Path Conventions
+
+- **Single project**: `src/`, `tests/` at repository root
+- **Web app**: `backend/src/`, `frontend/src/`
+- **Mobile**: `api/src/`, `ios/src/` or `android/src/`
+- Paths shown below assume single project - adjust based on plan.md structure
+
+
+
+## Phase 1: Setup (Shared Infrastructure)
+
+**Purpose**: Project initialization and basic structure
+
+- [ ] T001 Create project structure per implementation plan
+- [ ] T002 Initialize [language] project with [framework] dependencies
+- [ ] T003 [P] Configure linting and formatting tools
+
+---
+
+## Phase 2: Foundational (Blocking Prerequisites)
+
+**Purpose**: Core infrastructure that MUST be complete before ANY user story can be implemented
+
+**⚠️ CRITICAL**: No user story work can begin until this phase is complete
+
+Examples of foundational tasks (adjust based on your project):
+
+- [ ] T004 Setup database schema and migrations framework
+- [ ] T005 [P] Implement authentication/authorization framework
+- [ ] T006 [P] Setup API routing and middleware structure
+- [ ] T007 Create base models/entities that all stories depend on
+- [ ] T008 Configure error handling and logging infrastructure
+- [ ] T009 Setup environment configuration management
+
+**Checkpoint**: Foundation ready - user story implementation can now begin in parallel
+
+---
+
+## Phase 3: User Story 1 - [Title] (Priority: P1) 🎯 MVP
+
+**Goal**: [Brief description of what this story delivers]
+
+**Independent Test**: [How to verify this story works on its own]
+
+### Tests for User Story 1 (OPTIONAL - only if tests requested) ⚠️
+
+> **NOTE: Write these tests FIRST, ensure they FAIL before implementation**
+
+- [ ] T010 [P] [US1] Contract test for [endpoint] in tests/contract/test_[name].py
+- [ ] T011 [P] [US1] Integration test for [user journey] in tests/integration/test_[name].py
+
+### Implementation for User Story 1
+
+- [ ] T012 [P] [US1] Create [Entity1] model in src/models/[entity1].py
+- [ ] T013 [P] [US1] Create [Entity2] model in src/models/[entity2].py
+- [ ] T014 [US1] Implement [Service] in src/services/[service].py (depends on T012, T013)
+- [ ] T015 [US1] Implement [endpoint/feature] in src/[location]/[file].py
+- [ ] T016 [US1] Add validation and error handling
+- [ ] T017 [US1] Add logging for user story 1 operations
+
+**Checkpoint**: At this point, User Story 1 should be fully functional and testable independently
+
+---
+
+## Phase 4: User Story 2 - [Title] (Priority: P2)
+
+**Goal**: [Brief description of what this story delivers]
+
+**Independent Test**: [How to verify this story works on its own]
+
+### Tests for User Story 2 (OPTIONAL - only if tests requested) ⚠️
+
+- [ ] T018 [P] [US2] Contract test for [endpoint] in tests/contract/test_[name].py
+- [ ] T019 [P] [US2] Integration test for [user journey] in tests/integration/test_[name].py
+
+### Implementation for User Story 2
+
+- [ ] T020 [P] [US2] Create [Entity] model in src/models/[entity].py
+- [ ] T021 [US2] Implement [Service] in src/services/[service].py
+- [ ] T022 [US2] Implement [endpoint/feature] in src/[location]/[file].py
+- [ ] T023 [US2] Integrate with User Story 1 components (if needed)
+
+**Checkpoint**: At this point, User Stories 1 AND 2 should both work independently
+
+---
+
+## Phase 5: User Story 3 - [Title] (Priority: P3)
+
+**Goal**: [Brief description of what this story delivers]
+
+**Independent Test**: [How to verify this story works on its own]
+
+### Tests for User Story 3 (OPTIONAL - only if tests requested) ⚠️
+
+- [ ] T024 [P] [US3] Contract test for [endpoint] in tests/contract/test_[name].py
+- [ ] T025 [P] [US3] Integration test for [user journey] in tests/integration/test_[name].py
+
+### Implementation for User Story 3
+
+- [ ] T026 [P] [US3] Create [Entity] model in src/models/[entity].py
+- [ ] T027 [US3] Implement [Service] in src/services/[service].py
+- [ ] T028 [US3] Implement [endpoint/feature] in src/[location]/[file].py
+
+**Checkpoint**: All user stories should now be independently functional
+
+---
+
+[Add more user story phases as needed, following the same pattern]
+
+---
+
+## Phase N: Polish & Cross-Cutting Concerns
+
+**Purpose**: Improvements that affect multiple user stories
+
+- [ ] TXXX [P] Documentation updates in docs/
+- [ ] TXXX Code cleanup and refactoring
+- [ ] TXXX Performance optimization across all stories
+- [ ] TXXX [P] Additional unit tests (if requested) in tests/unit/
+- [ ] TXXX Security hardening
+- [ ] TXXX Run quickstart.md validation
+
+---
+
+## Dependencies & Execution Order
+
+### Phase Dependencies
+
+- **Setup (Phase 1)**: No dependencies - can start immediately
+- **Foundational (Phase 2)**: Depends on Setup completion - BLOCKS all user stories
+- **User Stories (Phase 3+)**: All depend on Foundational phase completion
+  - User stories can then proceed in parallel (if staffed)
+  - Or sequentially in priority order (P1 → P2 → P3)
+- **Polish (Final Phase)**: Depends on all desired user stories being complete
+
+### User Story Dependencies
+
+- **User Story 1 (P1)**: Can start after Foundational (Phase 2) - No dependencies on other stories
+- **User Story 2 (P2)**: Can start after Foundational (Phase 2) - May integrate with US1 but should be independently testable
+- **User Story 3 (P3)**: Can start after Foundational (Phase 2) - May integrate with US1/US2 but should be independently testable
+
+### Within Each User Story
+
+- Tests (if included) MUST be written and FAIL before implementation
+- Models before services
+- Services before endpoints
+- Core implementation before integration
+- Story complete before moving to next priority
+
+### Parallel Opportunities
+
+- All Setup tasks marked [P] can run in parallel
+- All Foundational tasks marked [P] can run in parallel (within Phase 2)
+- Once Foundational phase completes, all user stories can start in parallel (if team capacity allows)
+- All tests for a user story marked [P] can run in parallel
+- Models within a story marked [P] can run in parallel
+- Different user stories can be worked on in parallel by different team members
+
+---
+
+## Parallel Example: User Story 1
+
+```bash
+# Launch all tests for User Story 1 together (if tests requested):
+Task: "Contract test for [endpoint] in tests/contract/test_[name].py"
+Task: "Integration test for [user journey] in tests/integration/test_[name].py"
+
+# Launch all models for User Story 1 together:
+Task: "Create [Entity1] model in src/models/[entity1].py"
+Task: "Create [Entity2] model in src/models/[entity2].py"
+```
+
+---
+
+## Implementation Strategy
+
+### MVP First (User Story 1 Only)
+
+1. Complete Phase 1: Setup
+2. Complete Phase 2: Foundational (CRITICAL - blocks all stories)
+3. Complete Phase 3: User Story 1
+4. **STOP and VALIDATE**: Test User Story 1 independently
+5. Deploy/demo if ready
+
+### Incremental Delivery
+
+1. Complete Setup + Foundational → Foundation ready
+2. Add User Story 1 → Test independently → Deploy/Demo (MVP!)
+3. Add User Story 2 → Test independently → Deploy/Demo
+4. Add User Story 3 → Test independently → Deploy/Demo
+5. Each story adds value without breaking previous stories
+
+### Parallel Team Strategy
+
+With multiple developers:
+
+1. Team completes Setup + Foundational together
+2. Once Foundational is done:
+   - Developer A: User Story 1
+   - Developer B: User Story 2
+   - Developer C: User Story 3
+3. Stories complete and integrate independently
+
+---
+
+## Notes
+
+- [P] tasks = different files, no dependencies
+- [Story] label maps task to specific user story for traceability
+- Each user story should be independently completable and testable
+- Verify tests fail before implementing
+- Commit after each task or logical group
+- Stop at any checkpoint to validate story independently
+- Avoid: vague tasks, same file conflicts, cross-story dependencies that break independence
diff --git a/.specify/workflows/speckit/workflow.yml b/.specify/workflows/speckit/workflow.yml
new file mode 100644
index 0000000..f69efea
--- /dev/null
+++ b/.specify/workflows/speckit/workflow.yml
@@ -0,0 +1,77 @@
+schema_version: "1.0"
+workflow:
+  id: "speckit"
+  name: "Full SDD Cycle"
+  version: "1.0.0"
+  author: "GitHub"
+  description: "Runs specify → plan → tasks → implement with review gates"
+
+requires:
+  # 0.8.5 is the first release with engine-side resolution of the
+  # ``integration: "auto"`` default. Older versions would treat "auto"
+  # as a literal integration key and fail at dispatch.
+  speckit_version: ">=0.8.5"
+  integrations:
+    # The four commands below (specify, plan, tasks, implement) are core
+    # spec-kit commands provided by every integration. The list here is an
+    # advisory, non-exhaustive compatibility hint following the documented
+    # ``any: [...]`` schema -- it is NOT a closed set. The workflow runs
+    # against any integration the project was initialized with, including
+    # ones not listed below, as long as that integration provides the four
+    # core commands referenced in ``steps``.
+    any:
+      - "claude"
+      - "copilot"
+      - "gemini"
+      - "opencode"
+
+inputs:
+  spec:
+    type: string
+    required: true
+    prompt: "Describe what you want to build"
+  integration:
+    type: string
+    default: "auto"
+    prompt: "Integration to use (e.g. claude, copilot, gemini; 'auto' uses the project's initialized integration)"
+  scope:
+    type: string
+    default: "full"
+    enum: ["full", "backend-only", "frontend-only"]
+
+steps:
+  - id: specify
+    command: speckit.specify
+    integration: "{{ inputs.integration }}"
+    input:
+      args: "{{ inputs.spec }}"
+
+  - id: review-spec
+    type: gate
+    message: "Review the generated spec before planning."
+    options: [approve, reject]
+    on_reject: abort
+
+  - id: plan
+    command: speckit.plan
+    integration: "{{ inputs.integration }}"
+    input:
+      args: "{{ inputs.spec }}"
+
+  - id: review-plan
+    type: gate
+    message: "Review the plan before generating tasks."
+    options: [approve, reject]
+    on_reject: abort
+
+  - id: tasks
+    command: speckit.tasks
+    integration: "{{ inputs.integration }}"
+    input:
+      args: "{{ inputs.spec }}"
+
+  - id: implement
+    command: speckit.implement
+    integration: "{{ inputs.integration }}"
+    input:
+      args: "{{ inputs.spec }}"
diff --git a/.specify/workflows/workflow-registry.json b/.specify/workflows/workflow-registry.json
new file mode 100644
index 0000000..2912343
--- /dev/null
+++ b/.specify/workflows/workflow-registry.json
@@ -0,0 +1,13 @@
+{
+ "schema_version": "1.0",
+ "workflows": {
+ "speckit": {
+ "name": "Full SDD Cycle",
+ "version": "1.0.0",
+ "description": "Runs specify \u2192 plan \u2192 tasks \u2192 implement with review gates",
+ "source": "bundled",
+ "installed_at": "2026-06-27T21:48:08.099604+00:00",
+ "updated_at": "2026-06-27T21:48:08.099611+00:00"
+ }
+ }
+}
\ No newline at end of file
diff --git a/CLAUDE.md b/CLAUDE.md
index a79498b..19e5637 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -142,3 +142,18 @@ cd frontend && npm install # Vue 3 frontend
```
`ANTHROPIC_API_KEY` must be set in the environment.
+
+For the OpenRouter backend (ensemble workflow synthesis/extraction), `pip install
+openai` and set `OPENROUTER_API_KEY` in the environment. OpenRouter is reached
+only through `campaignlib/api` (`make_client(backend="openrouter")`); select it on
+a CLI with `--backend openrouter --model `, or via the
+`CG_BACKEND=openrouter` env var.
+
+
+For additional context about technologies to be used, project structure,
+shell commands, and other important information, read the current plan:
+`specs/001-ensemble-workflow-ui/plan.md` (Ensemble Grounding-Doc Workflow UI —
+adds a stepped `/ensemble` UI page and OpenRouter as a per-stage LLM backend
+through the single `campaignlib` seam; leaves the existing `/grounding` Anthropic
+path unchanged).
+
diff --git a/campaign_state.py b/campaign_state.py
index 4188692..95ed1aa 100644
--- a/campaign_state.py
+++ b/campaign_state.py
@@ -66,7 +66,9 @@
from campaignlib import (
DEFAULT_MODEL,
+ add_backend_args,
build_alias_normalizer,
+ client_from_args,
format_npc_roster,
load_agent_prompt,
load_alias_map,
@@ -151,7 +153,8 @@ def main() -> None:
"canonical name before extract/synth, and a "
"'Known NPCs' roster seeds the system prompts.")
parser.add_argument("--model", default=DEFAULT_MODEL,
- help="Claude model to use")
+ help="Model id (Claude id, or an OpenRouter id for --backend openrouter)")
+ add_backend_args(parser)
parser.add_argument("--dump-input", default=None, metavar="FILE",
help="Write the synthesis prompt to FILE (and FILE.system.md) "
"without making an API call — for use with `claude -p`.")
@@ -195,7 +198,7 @@ def main() -> None:
if alias_map:
print(f"Alias map: {len(alias_map)} NPC(s) from {args.dossier_dir}")
- client = make_client()
+ client = client_from_args(args)
if tracked_items:
print(f"\n Tracking {len(tracked_items)} item(s):")
diff --git a/campaignlib/__init__.py b/campaignlib/__init__.py
index 181022f..e914beb 100644
--- a/campaignlib/__init__.py
+++ b/campaignlib/__init__.py
@@ -27,7 +27,10 @@
assemble_docs,
)
from .util import copy_to_clipboard, save_log
-from .api.client import make_client, call_api, call_api_with_tools, stream_api
+from .api.client import (
+ make_client, call_api, call_api_with_tools, stream_api,
+ add_backend_args, client_from_args,
+)
from .api.batch import (
build_batch_request,
submit_batch,
@@ -79,6 +82,8 @@
"save_log",
# api — client
"make_client",
+ "add_backend_args",
+ "client_from_args",
"call_api",
"call_api_with_tools",
"stream_api",
diff --git a/campaignlib/api/backends.py b/campaignlib/api/backends.py
index f5bd658..72061c2 100644
--- a/campaignlib/api/backends.py
+++ b/campaignlib/api/backends.py
@@ -6,9 +6,11 @@
import json
import os
+import sys
DGX_DEFAULT_MODEL = "Qwen/Qwen2.5-14B-Instruct-AWQ"
+OPENROUTER_DEFAULT_BASE_URL = "https://openrouter.ai/api/v1"
def _flatten_to_text(value) -> str:
@@ -197,6 +199,76 @@ def extra_body_for(self, resolved_model: str, thinking: bool | None) -> dict:
return self._dgxlib.resolve_model_config(resolved_model, thinking=thinking).extra_body
+# ── OpenRouter backend ───────────────────────────────────────────────────────
+#
+# OpenRouter (https://openrouter.ai) is an OpenAI-wire-compatible gateway to many
+# model vendors. It is reached ONLY through this class — Constitution Principle V
+# (one seam per boundary). Unlike the DGX adapter it (a) uses a real API key from
+# OPENROUTER_API_KEY, (b) does NOT consult the dgxlib model registry (OpenRouter
+# ids are namespaced, e.g. "anthropic/claude-sonnet-4", and pass through verbatim),
+# and (c) maps a no-thinking request to OpenRouter's `reasoning` control so the
+# silently-empty-extraction trap (a reasoning model spending its whole budget on a
+# think trace) can be suppressed on this path too.
+
+
+class _OpenRouterMessages(_OpenAICompatMessages):
+ """Messages façade for OpenRouter — same wire calls as the DGX adapter, but
+ model ids pass through verbatim (no dgxlib registry, no claude→DGX substitution)."""
+
+ def _resolve_model(self, model: str) -> str:
+ # Honor an explicit override; otherwise send the caller's id straight
+ # through. OpenRouter ids are vendor-namespaced, so the DGX adapter's
+ # "claude-* → DGX default" substitution must NOT apply here.
+ return self._client.model_override or model
+
+
+class _OpenRouterClient:
+ """Anthropic-shaped façade over OpenRouter's OpenAI-compatible API.
+
+ Presents the same small slice of the anthropic SDK surface
+ (``.messages.create`` / ``.messages.stream``) that stream_api / call_api use,
+ reusing the OpenAI-compat stream/response machinery.
+ """
+
+ def __init__(self, model_override: str | None = None):
+ # Check config before importing the SDK so a missing key fails with a
+ # clear, deterministic error (no silent fallback to another backend).
+ api_key = os.environ.get("OPENROUTER_API_KEY")
+ if not api_key:
+ raise RuntimeError(
+ "OPENROUTER_API_KEY is not set. The openrouter backend requires a key; "
+ "export OPENROUTER_API_KEY in the environment."
+ )
+ try:
+ from openai import OpenAI
+ except ImportError:
+ print("Error: openai not installed. Run: pip install openai", file=sys.stderr)
+ sys.exit(1)
+ base_url = (os.environ.get("OPENROUTER_BASE_URL")
+ or OPENROUTER_DEFAULT_BASE_URL).rstrip("/")
+ self.model_override = model_override or os.environ.get("OPENROUTER_MODEL")
+ import httpx
+ env_to = os.environ.get("OPENROUTER_READ_TIMEOUT")
+ read_timeout = float(env_to) if env_to else 600.0
+ timeout = httpx.Timeout(connect=10.0, read=read_timeout, write=30.0, pool=30.0)
+ self.oai = OpenAI(base_url=base_url, api_key=api_key, timeout=timeout)
+ self.messages = _OpenRouterMessages(self)
+
+ def extra_body_for(self, resolved_model: str, thinking: bool | None) -> dict:
+ """Per-call request extras. Maps no-thinking to OpenRouter's `reasoning`.
+
+ ``thinking`` is a per-call decision: ``None`` leaves OpenRouter's default
+ (but OPENROUTER_NO_THINKING / DGX_NO_THINKING force it off for parity with
+ the DGX extraction path); ``False`` disables reasoning; ``True`` leaves it on.
+ """
+ if thinking is None and (os.environ.get("OPENROUTER_NO_THINKING")
+ or os.environ.get("DGX_NO_THINKING")):
+ thinking = False
+ if thinking is False:
+ return {"reasoning": {"enabled": False}}
+ return {}
+
+
# ── Claude Code (subscription) backend ──────────────────────────────────────
#
# Routes generation through the `claude` CLI in headless print mode (`claude -p`)
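The thinking-to-`reasoning` mapping in `_OpenRouterClient.extra_body_for` above can be sketched in isolation. This is a minimal illustration, not the patch's code: `reasoning_extra_body` is a hypothetical standalone function, and the `env` parameter is an assumption added here so the behaviour can be exercised without touching the real process environment.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def reasoning_extra_body(thinking: bool | None,
                         env: Mapping[str, str] | None = None) -> dict:
    """Hypothetical standalone mirror of the extra_body_for logic in the patch."""
    env = os.environ if env is None else env
    # None means "no per-call preference"; either env switch may then force it off.
    if thinking is None and (env.get("OPENROUTER_NO_THINKING")
                             or env.get("DGX_NO_THINKING")):
        thinking = False
    if thinking is False:
        # Request extra asking OpenRouter to disable reasoning for this call.
        return {"reasoning": {"enabled": False}}
    # True, or None with no env switch set: leave OpenRouter's default alone.
    return {}


default_case = reasoning_extra_body(None, env={})
forced_off = reasoning_extra_body(None, env={"OPENROUTER_NO_THINKING": "1"})
explicit_off = reasoning_extra_body(False, env={})
explicit_on = reasoning_extra_body(True, env={"DGX_NO_THINKING": "1"})
```

Note that an explicit `True` is not overridden by the env switches; only the "no preference" case (`None`) is.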
diff --git a/campaignlib/api/client.py b/campaignlib/api/client.py
index 35ad684..d69131e 100644
--- a/campaignlib/api/client.py
+++ b/campaignlib/api/client.py
@@ -3,7 +3,29 @@
import os
import sys
-from .backends import _OpenAICompatClient, _ClaudeCodeClient
+from .backends import _OpenAICompatClient, _OpenRouterClient, _ClaudeCodeClient
+
+# Clients that accept the DGX-style `thinking` request extra (mapped per-backend
+# to the right knob: enable_thinking for vLLM, `reasoning` for OpenRouter). The
+# real Anthropic SDK would reject it, so it is only forwarded to these.
+_THINKING_EXTRA_CLIENTS = (_OpenAICompatClient, _OpenRouterClient)
+
+
+def _require_nonempty(text: str) -> str:
+ """Guard against a silently-empty model response (Constitution Principle I).
+
+ A reasoning model can spend its entire token budget on a thinking trace and
+ return empty content — which would otherwise be written to disk as a valid
+ (but empty) extraction/synthesis artifact. Fail loudly instead so the caller
+ aborts before persisting anything.
+ """
+ if text is None or not text.strip():
+ raise RuntimeError(
+ "model returned empty output (no content). On a reasoning model this "
+ "usually means the token budget was spent on a thinking trace — disable "
+ "thinking (DGX_NO_THINKING=1 / OPENROUTER_NO_THINKING=1) or raise max_tokens."
+ )
+ return text
def make_client(endpoint: str | None = None, model_override: str | None = None,
@@ -30,6 +52,8 @@ def make_client(endpoint: str | None = None, model_override: str | None = None,
backend = backend or os.environ.get("CG_BACKEND")
if backend == "claude-code":
return _ClaudeCodeClient(model_override=model_override)
+ if backend == "openrouter":
+ return _OpenRouterClient(model_override=model_override)
endpoint = endpoint or os.environ.get("DGX_ENDPOINT")
if endpoint:
return _OpenAICompatClient(endpoint, model_override=model_override)
@@ -41,6 +65,37 @@ def make_client(endpoint: str | None = None, model_override: str | None = None,
return anthropic.Anthropic()
+def add_backend_args(parser) -> None:
+ """Register the uniform --backend/--endpoint selection on a synthesis CLI.
+
+ Shared so every LLM-bearing script speaks the same backend vocabulary
+ (Constitution Principle V). Default is anthropic — see client_from_args for
+ the backward-compatibility contract.
+ """
+ parser.add_argument(
+ "--backend", choices=["anthropic", "dgx", "openrouter"], default="anthropic",
+ help="LLM backend (default: anthropic). 'dgx'/'openrouter' route through the "
+ "campaignlib seam; with no flag, behaviour is unchanged (Anthropic API).")
+ parser.add_argument(
+ "--endpoint", default=None, metavar="URL",
+ help="OpenAI-compatible endpoint for --backend dgx (OpenRouter uses its own base URL).")
+
+
+def client_from_args(args):
+ """Build a client from parsed --backend/--endpoint/--model args.
+
+ Backward-compatible: with the default ``--backend anthropic`` and no
+ ``--endpoint``, this resolves to ``make_client()`` exactly — env vars
+ (CG_BACKEND / DGX_ENDPOINT) still apply, so existing invocations are
+ byte-for-byte unchanged. For dgx/openrouter the chosen ``--model`` becomes the
+ seam's model override.
+ """
+ backend = None if getattr(args, "backend", "anthropic") == "anthropic" else args.backend
+ model_override = getattr(args, "model", None) if backend in ("dgx", "openrouter") else None
+ return make_client(backend=backend, endpoint=getattr(args, "endpoint", None),
+ model_override=model_override)
+
+
def _is_retryable(exc) -> bool:
"""Return True for transient API errors that are worth retrying."""
try:
@@ -98,8 +153,8 @@ def call_api(client, system: str, content, model: str, max_tokens: int = 8096,
"""
import time
messages = [{"role": "user", "content": content}]
- # `thinking` is a DGX-only knob; the real Anthropic SDK would reject it.
- extra = {"thinking": thinking} if isinstance(client, _OpenAICompatClient) else {}
+ # `thinking` is a local/OpenRouter knob; the real Anthropic SDK would reject it.
+ extra = {"thinking": thinking} if isinstance(client, _THINKING_EXTRA_CLIENTS) else {}
delays = [10, 20, 40]
for attempt, delay in enumerate([-1] + delays):
if delay >= 0:
@@ -114,7 +169,7 @@ def call_api(client, system: str, content, model: str, max_tokens: int = 8096,
messages=messages,
**extra,
)
- return response.content[0].text
+ return _require_nonempty(response.content[0].text)
except Exception as e:
if _is_retryable(e) and attempt < len(delays):
continue
@@ -186,8 +241,8 @@ def stream_api(client, system, user: str, model: str, max_tokens: int = 8096,
else:
system_arg = system
- # `thinking` is a DGX-only knob; the real Anthropic SDK would reject it.
- extra = {"thinking": thinking} if isinstance(client, _OpenAICompatClient) else {}
+ # `thinking` is a local/OpenRouter knob; the real Anthropic SDK would reject it.
+ extra = {"thinking": thinking} if isinstance(client, _THINKING_EXTRA_CLIENTS) else {}
delays = [60, 120, 240] # seconds to wait before each retry
for attempt, delay in enumerate([-1] + delays):
if delay >= 0:
@@ -209,7 +264,7 @@ def stream_api(client, system, user: str, model: str, max_tokens: int = 8096,
chunks.append(text)
if not silent:
print()
- return "".join(chunks)
+ return _require_nonempty("".join(chunks))
except Exception as e:
if _is_retryable(e) and attempt < len(delays):
continue
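The backward-compatibility contract described in `client_from_args` above can be sketched as follows. This is an illustration under stated assumptions: `make_client_stub` is a hypothetical stand-in that only records its arguments (the real `make_client` builds a client), and `"claude-default"` is a placeholder for `DEFAULT_MODEL`, whose actual value is not shown in this patch.

```python
from __future__ import annotations

import argparse


def make_client_stub(backend=None, endpoint=None, model_override=None) -> dict:
    """Hypothetical stand-in for make_client: records what it was asked for."""
    return {"backend": backend, "endpoint": endpoint, "model_override": model_override}


def resolve(argv: list[str]) -> dict:
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", default="claude-default")  # placeholder default
    parser.add_argument("--backend", choices=["anthropic", "dgx", "openrouter"],
                        default="anthropic")
    parser.add_argument("--endpoint", default=None)
    args = parser.parse_args(argv)
    # "anthropic" collapses to None so the env-driven defaults still apply.
    backend = None if args.backend == "anthropic" else args.backend
    # --model only becomes an override on the non-Anthropic backends.
    model_override = args.model if backend in ("dgx", "openrouter") else None
    return make_client_stub(backend=backend, endpoint=args.endpoint,
                            model_override=model_override)


no_flags = resolve([])
openrouter = resolve(["--backend", "openrouter", "--model", "anthropic/claude-sonnet-4"])
```

With no flags every argument reaches the seam as `None`, which is why existing invocations behave as before; with `--backend openrouter` the chosen model id is passed through as the override.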
diff --git a/docs/cli/ensemble_workflow.md b/docs/cli/ensemble_workflow.md
index 0c55d33..12622fe 100644
--- a/docs/cli/ensemble_workflow.md
+++ b/docs/cli/ensemble_workflow.md
@@ -1,5 +1,13 @@
# Ensemble extraction workflow
+> **Run this from the UI.** The Ensemble Workflow page (`/ensemble` in the web UI)
+> mechanizes this whole sequence — Setup → Extract → Bundle → Synthesize — with the
+> scope-review, alias-correction, and diff-before-promote checkpoints kept as gates
+> you satisfy in the CLI or a Claude chat. Each LLM-bearing stage is backend-selectable
+> (Anthropic / DGX-Spark / **OpenRouter**), chosen independently for extraction and
+> synthesis. The UI only invokes the same CLI commands documented below; nothing here
+> is bypassed. See `docs/web/web_ui.md` and `specs/001-ensemble-workflow-ui/`.
+
End-to-end guide: from a set of chapter files to reviewed dossiers ready for synthesis into the four grounding docs (`world_state.md`, `campaign_state.md`, `party.md`, `planning.md`).
The core insight is that **extraction is expensive and should happen once**. Running the Claude API inside each grounding-doc tool (the old path) re-extracts the same chapter text three or four times, spending 2.5–3.4M metered tokens per full refresh. The local ensemble approach extracts once on Spark hardware (~free), aggregates to per-entity dossiers, lets a human review scope, then calls the API only for the final synthesis per doc (~280K tokens total).
diff --git a/ensemble_batch.py b/ensemble_batch.py
index 694ce30..96990e2 100644
--- a/ensemble_batch.py
+++ b/ensemble_batch.py
@@ -32,8 +32,11 @@ def _build_parser():
formatter_class=argparse.RawDescriptionHelpFormatter,
)
p.add_argument(
- "--chapters", required=True, metavar="GLOB",
- help="Glob for chapter files, e.g. 'docs/chapters/chapter_*.md'",
+ "--chapters", required=True, nargs="+", metavar="GLOB",
+ help="One or more globs or explicit chapter paths, e.g. "
+ "'docs/chapters/chapter_*.md' or a hand-picked subset "
+ "'docs/chapters/chapter_03.md docs/chapters/chapter_07.md'. "
+ "Matches are unioned, de-duplicated, and sorted.",
)
p.add_argument(
"--per-chapter-dir", default="per_chapter", metavar="DIR",
@@ -133,9 +136,13 @@ def _build_ensemble_cmd(chapter: Path, workdir: Path, args) -> list[str]:
def main():
args = _build_parser().parse_args()
- chapters = sorted(Path(p) for p in glob_module.glob(args.chapters))
+ matched: set[Path] = set()
+ for pattern in args.chapters:
+ for p in glob_module.glob(pattern):
+ matched.add(Path(p))
+ chapters = sorted(matched)
if not chapters:
- print(f"No chapter files matched: {args.chapters}", file=sys.stderr)
+ print(f"No chapter files matched: {' '.join(args.chapters)}", file=sys.stderr)
sys.exit(1)
per_chapter_dir = Path(args.per_chapter_dir)
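The union, de-duplicate, and sort behaviour of the new multi-value `--chapters` option can be sketched on its own. This is a minimal illustration: `resolve_chapters` is a hypothetical helper name, and the chapter files are created in a temporary directory purely for the demonstration.

```python
from __future__ import annotations

import glob
import tempfile
from pathlib import Path


def resolve_chapters(patterns: list[str]) -> list[Path]:
    """Union every pattern's matches, drop duplicates, and return them sorted."""
    matched: set[Path] = set()
    for pattern in patterns:
        for hit in glob.glob(pattern):
            matched.add(Path(hit))
    return sorted(matched)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for n in ("01", "03", "07"):
        (root / f"chapter_{n}.md").write_text("x", encoding="utf-8")
    picked = resolve_chapters([
        str(root / "chapter_0[13].md"),  # glob matching chapters 01 and 03
        str(root / "chapter_03.md"),     # explicit path, already matched above
        str(root / "chapter_99.md"),     # no such file: contributes nothing
    ])
    names = [p.name for p in picked]
    print(names)  # ['chapter_01.md', 'chapter_03.md']
```

An explicit path that does not exist simply matches nothing, so the "No chapter files matched" error fires only when every pattern comes up empty.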
diff --git a/frontend/src/components/layout/AppSidebar.vue b/frontend/src/components/layout/AppSidebar.vue
index d186e63..b49f875 100644
--- a/frontend/src/components/layout/AppSidebar.vue
+++ b/frontend/src/components/layout/AppSidebar.vue
@@ -49,6 +49,12 @@ const navGroups: NavGroup[] = [
{ label: 'Planning Document', path: '/grounding/planning' },
],
},
+ {
+ title: 'ENSEMBLE WORKFLOW',
+ items: [
+ { label: 'Ensemble Grounding Docs', path: '/ensemble/setup' },
+ ],
+ },
{
title: 'PREP',
items: [
diff --git a/frontend/src/router.ts b/frontend/src/router.ts
index c994c16..cb7669a 100644
--- a/frontend/src/router.ts
+++ b/frontend/src/router.ts
@@ -64,6 +64,33 @@ const routes = [
},
],
},
+ {
+ path: '/ensemble',
+ component: () => import('./views/EnsembleWorkflow.vue'),
+ children: [
+ { path: '', redirect: '/ensemble/setup' },
+ {
+ path: 'setup',
+ name: 'ensemble-setup',
+ component: () => import('./views/ensemble/EnsembleSetup.vue'),
+ },
+ {
+ path: 'extract',
+ name: 'ensemble-extract',
+ component: () => import('./views/ensemble/EnsembleExtract.vue'),
+ },
+ {
+ path: 'bundle',
+ name: 'ensemble-bundle',
+ component: () => import('./views/ensemble/EnsembleBundle.vue'),
+ },
+ {
+ path: 'synthesize',
+ name: 'ensemble-synthesize',
+ component: () => import('./views/ensemble/EnsembleSynthesize.vue'),
+ },
+ ],
+ },
{
path: '/prep',
component: () => import('./views/PrepTools.vue'),
diff --git a/frontend/src/views/EnsembleWorkflow.vue b/frontend/src/views/EnsembleWorkflow.vue
new file mode 100644
index 0000000..923c829
--- /dev/null
+++ b/frontend/src/views/EnsembleWorkflow.vue
@@ -0,0 +1,75 @@
+
+
+
+
+
+ List the entity universe and the known/location split before spending model
+ time. No model call. Review which names are [known] vs
+ [location]-scoped — this is a precision decision; you may also
+ run facts_to_state.py --list at the CLI.
+
+
+
+
+
+
+
+
② Alias correction human checkpoint
+
+ Edit {{ cfg.aliases_path }} here, or in the CLI/chat and click
+ Reload — changes are reflected without re-running any LLM step.
+
+
Set an aliases path on the Setup step to use this gate.
+ Runs ensemble_batch.py over the chapters you pick below. Resumable:
+ chapters already extracted are skipped. Backend: {{ backendLabel }}
+ (change it on the Setup step). Writes
+ docs/ensemble/per_chapter/*/merged.json.
+
+
+
+
+
+
+ Select at least one chapter to run.
+
+ {{ returnCode === 0 ? 'Done' : `Exit ${returnCode}` }}
+
+
+
+
+ Point at your inputs and pick a backend for each LLM-bearing stage. Extraction
+ and synthesis are chosen independently. Files on disk are the source of truth —
+ this only records your selections.
+
+
+
+
diff --git a/frontend/src/views/ensemble/useEnsembleRun.ts b/frontend/src/views/ensemble/useEnsembleRun.ts
new file mode 100644
index 0000000..24e79df
--- /dev/null
+++ b/frontend/src/views/ensemble/useEnsembleRun.ts
@@ -0,0 +1,85 @@
+import { ref } from 'vue'
+import { connectSSE } from '../../api/sse'
+
+/** Run an ensemble stage over SSE. Unlike RunPanel this does NOT gate on
+ * ANTHROPIC_API_KEY — the ensemble page supports OpenRouter/DGX backends that
+ * don't need it. */
+export function useEnsembleRun() {
+ const output = ref('')
+ const status = ref<'idle' | 'running' | 'done' | 'error'>('idle')
+ const returnCode = ref<number | null>(null)
+
+ function buildUrl(endpoint: string, params: Record<string, unknown>): string {
+ const url = new URL(endpoint, window.location.origin)
+ for (const [k, v] of Object.entries(params)) {
+ if (v === '' || v === false || v === null || v === undefined) continue
+ if (Array.isArray(v)) {
+ for (const it of v) if (it) url.searchParams.append(k, String(it))
+ } else if (typeof v === 'boolean') {
+ url.searchParams.set(k, 'true')
+ } else {
+ url.searchParams.set(k, String(v))
+ }
+ }
+ return url.pathname + url.search
+ }
+
+ function run(endpoint: string, params: Record<string, unknown>, onDone?: (rc: number) => void) {
+ if (status.value === 'running') return
+ status.value = 'running'
+ output.value = ''
+ returnCode.value = null
+ connectSSE(buildUrl(endpoint, params), {
+ onData(t) { output.value += t },
+ onDone(rc) {
+ status.value = rc === 0 ? 'done' : 'error'
+ returnCode.value = rc
+ if (onDone) onDone(rc)
+ },
+ onError() { status.value = 'error' },
+ })
+ }
+
+ function clear() {
+ output.value = ''
+ status.value = 'idle'
+ returnCode.value = null
+ }
+
+ return { output, status, returnCode, run, clear }
+}
+
+export interface BackendProfile {
+ backend: 'anthropic' | 'dgx' | 'openrouter'
+ endpoint: string
+ model: string
+}
+
+export interface EnsembleConfig {
+ campaign_dir: string
+ chapters_glob: string
+ chapters_selected: string[]
+ extract: BackendProfile
+ synthesize: BackendProfile
+ known_names: string[]
+ aliases_path: string
+}
+
+/** Read ui.ensemble from the resolved config with safe defaults. */
+export function readEnsembleConfig(resolved: any): EnsembleConfig {
+ const e = resolved?.ui?.ensemble ?? {}
+ const prof = (p: any): BackendProfile => ({
+ backend: p?.backend ?? 'anthropic',
+ endpoint: p?.endpoint ?? '',
+ model: p?.model ?? '',
+ })
+ return {
+ campaign_dir: e.campaign_dir ?? '',
+ chapters_glob: e.chapters_glob ?? 'docs/chapters/chapter_*.md',
+ chapters_selected: Array.isArray(e.chapters_selected) ? e.chapters_selected : [],
+ extract: prof(e.extract),
+ synthesize: prof(e.synthesize),
+ known_names: Array.isArray(e.known_names) ? e.known_names : [],
+ aliases_path: e.aliases_path ?? '',
+ }
+}
diff --git a/party.py b/party.py
index cb68803..80af32c 100644
--- a/party.py
+++ b/party.py
@@ -56,7 +56,9 @@
from campaignlib import (
DEFAULT_MODEL,
+ add_backend_args,
build_alias_normalizer,
+ client_from_args,
format_npc_roster,
load_agent_prompt,
load_alias_map,
@@ -260,7 +262,8 @@ def main() -> None:
"canonical name before extract/synth, and a "
"'Known NPCs' roster seeds the system prompts.")
parser.add_argument("--model", default=DEFAULT_MODEL,
- help="Claude model to use")
+ help="Model id (Claude id, or an OpenRouter id for --backend openrouter)")
+ add_backend_args(parser)
parser.add_argument("--dump-input", default=None, metavar="FILE",
help="Write the synthesis prompt to FILE (and FILE.system.md) "
"without making an API call — for use with `claude -p`.")
@@ -317,7 +320,7 @@ def main() -> None:
if alias_map:
print(f"Alias map: {len(alias_map)} NPC(s) from {args.dossier_dir}")
- client = make_client()
+ client = client_from_args(args)
# ── Extract pass ──────────────────────────────────────────────────────────
if args.summaries and not args.synthesize_only:
diff --git a/planning.py b/planning.py
index b91915f..21c3490 100644
--- a/planning.py
+++ b/planning.py
@@ -58,7 +58,9 @@
from campaignlib import (
DEFAULT_MODEL,
+ add_backend_args,
build_alias_normalizer,
+ client_from_args,
format_npc_roster,
load_agent_prompt,
load_alias_map,
@@ -734,7 +736,8 @@ def main() -> None:
"(e.g. --since 11 when extract_011.md is the new chunk) to skip "
"historical chunks already rolled into dossiers.")
parser.add_argument("--model", default=DEFAULT_MODEL,
- help="Claude model to use")
+ help="Model id (Claude id, or an OpenRouter id for --backend openrouter)")
+ add_backend_args(parser)
parser.add_argument("--campaign-dir", default=None,
help="Campaign workspace root (default: $CAMPAIGN_DIR "
"or the output file's parent, or CWD). Used to "
@@ -819,7 +822,7 @@ def main() -> None:
print(f"Error: file not found: {f}", file=sys.stderr)
sys.exit(1)
- client = make_client()
+ client = client_from_args(args)
# ── Build-dossiers mode ───────────────────────────────────────────────────
if args.build_dossiers:
diff --git a/server/config_models.py b/server/config_models.py
index 66ac5ed..3f96238 100644
--- a/server/config_models.py
+++ b/server/config_models.py
@@ -138,12 +138,51 @@ class ProfilesSection(BaseModel):
active: OptStr = None
+class BackendProfile(BaseModel):
+ """A selectable execution target for one LLM-bearing ensemble stage.
+
+ The API key is NEVER stored here — it is read from the environment
+ (ANTHROPIC_API_KEY / OPENROUTER_API_KEY) at run time. `endpoint` is used
+ for the dgx backend; openrouter uses its own base URL.
+ """
+
+ model_config = ConfigDict(extra="allow")
+
+ backend: Literal["anthropic", "dgx", "openrouter"] = "anthropic"
+ endpoint: OptStr = None
+ model: OptStr = None
+
+
+class EnsembleSection(BaseModel):
+ """``ui.ensemble`` — the ensemble grounding-doc workflow page.
+
+ Per-stage backend choice (extract vs synthesize are independent) plus the
+ scope inputs (known-names sources, aliases file) the bundle stage and the
+ alias-correction gate consume. Files on disk remain the source of truth;
+ this only records the operator's selections.
+ """
+
+ model_config = ConfigDict(extra="allow")
+
+ campaign_dir: OptStr = None
+ chapters_glob: str = "docs/chapters/chapter_*.md"
+ # The explicit set of chapters chosen in the picker (relative paths).
+ # Principle X — there is no silent "all": empty means *nothing selected*
+ # and extraction refuses to run; "Select all" materializes every path here.
+ chapters_selected: list[str] = Field(default_factory=list)
+ extract: BackendProfile = Field(default_factory=BackendProfile)
+ synthesize: BackendProfile = Field(default_factory=BackendProfile)
+ known_names: list[str] = Field(default_factory=list)
+ aliases_path: OptStr = None
+
+
class UISection(BaseModel):
"""All per-page state, one attribute per page or group of pages."""
session_doc: SessionDocSection = Field(default_factory=SessionDocSection)
vtt_summary: VttSummarySection = Field(default_factory=VttSummarySection)
grounding: GroundingSection = Field(default_factory=GroundingSection)
+ ensemble: EnsembleSection = Field(default_factory=EnsembleSection)
profiles: ProfilesSection = Field(default_factory=ProfilesSection)
campaign_state: _LooseSection = Field(default_factory=_LooseSection)
distill: _LooseSection = Field(default_factory=_LooseSection)
diff --git a/server/main.py b/server/main.py
index 1b6b954..97b6a95 100644
--- a/server/main.py
+++ b/server/main.py
@@ -12,7 +12,7 @@
from server.config import derive_campaign_paths, derive_session_paths
from server.config_service import CampaignConfigService, ConfigError
from server.routers import (
- config_routes, connections, experimental, grounding, prep,
+ config_routes, connections, ensemble, experimental, grounding, prep,
scene_editor, session_workflow, setup,
)
@@ -31,6 +31,7 @@
app.include_router(config_routes.router, prefix="/api/config", tags=["config"])
app.include_router(session_workflow.router, prefix="/api/workflow", tags=["workflow"])
app.include_router(grounding.router, prefix="/api/grounding", tags=["grounding"])
+app.include_router(ensemble.router, prefix="/api/ensemble", tags=["ensemble"])
app.include_router(prep.router, prefix="/api/prep", tags=["prep"])
app.include_router(setup.router, prefix="/api/setup", tags=["setup"])
app.include_router(experimental.router, prefix="/api/experimental", tags=["experimental"])
diff --git a/server/routers/ensemble.py b/server/routers/ensemble.py
new file mode 100644
index 0000000..4837590
--- /dev/null
+++ b/server/routers/ensemble.py
@@ -0,0 +1,432 @@
+"""Ensemble grounding-doc workflow API routes.
+
+The UI mechanizes the ensemble pipeline (extract → bundle → synthesize → review);
+this router shells out to the CLI scripts via subprocess_runner and exposes
+disk-derived stage status. It contains NO pipeline logic and issues NO
+retrieval/render calls — the CLI is the engine (Constitution Principle VI), files
+on disk are the truth (Principle I), and OpenRouter is selected via env that the
+single campaignlib seam honors (Principle V).
+"""
+
+import difflib
+import glob
+import shutil
+from pathlib import Path
+
+from fastapi import APIRouter, HTTPException, Query, Request
+from fastapi.responses import JSONResponse, StreamingResponse
+
+from server.subprocess_runner import python_exe, stream_subprocess, sse_error_stream
+
+router = APIRouter()
+
+SCRIPT_DIR = Path(__file__).resolve().parent.parent.parent # CampaignGenerator/
+
+# The four grounding docs the workflow targets. live = promote target; draft =
+# what synthesis writes. Nothing else may be promoted (FR-013).
+GROUNDING_DOCS = {
+ "world_state": ("docs/world_state.md", "docs/world_state_draft.md"),
+ "campaign_state": ("docs/campaign_state.md", "docs/campaign_state_draft.md"),
+ "party": ("docs/party.md", "docs/party_draft.md"),
+ "planning": ("docs/planning.md", "docs/planning_draft.md"),
+}
+
+# Models considered capable enough for synthesis (FR-014 / R6). Anything else
+# selected for the synthesize stage triggers a non-fatal warning.
+SYNTHESIS_CAPABLE = {
+ "claude-sonnet-4-6", "claude-sonnet-4-20250514",
+ "claude-opus-4-8", "claude-opus-4-6", "claude-opus-4-7",
+ "anthropic/claude-sonnet-4", "anthropic/claude-opus-4",
+ "openai/gpt-5", "google/gemini-2.5-pro",
+}
+
+
+# ── Command-building helpers (mirror grounding.py) ──────────────────────────
+
+def _cmd_opt(cmd: list[str], flag: str, value) -> None:
+ if value:
+ cmd += [flag, str(value)]
+
+
+def _cmd_multi(cmd: list[str], flag: str, values: list[str]) -> None:
+ for v in values or []:
+ if v and v.strip():
+ cmd += [flag, v.strip()]
+
+
+def _cmd_flag(cmd: list[str], flag: str, condition: bool) -> None:
+ if condition:
+ cmd.append(flag)
+
+
+def _resolve_ensemble_path(path: str) -> Path:
+ """Resolve a path and confine it to the campaign workspace (CWD).
+
+ Rejects traversal outside the workspace — the UI must not read/write
+ arbitrary disk locations.
+ """
+ if not path:
+ raise HTTPException(status_code=400, detail="path is required")
+ cwd = Path.cwd().resolve()
+ p = Path(path).expanduser()
+ if not p.is_absolute():
+ p = (cwd / p)
+ p = p.resolve()
+ if cwd != p and cwd not in p.parents:
+ raise HTTPException(status_code=400, detail="path escapes the campaign workspace")
+ return p
+
+
+def _is_live_doc(path: Path) -> bool:
+ cwd = Path.cwd().resolve()
+ live = {(cwd / live_rel).resolve() for live_rel, _ in GROUNDING_DOCS.values()}
+ return path.resolve() in live
+
+
+# ── LLM backend selection → subprocess env (Principle V) ────────────────────
+
+def _llm_env(backend: str, endpoint: str, model: str) -> dict[str, str]:
+ """Translate a per-stage backend choice into env that campaignlib.make_client
+ honors. The API key itself is inherited from the server env, never injected
+ from a query param.
+ """
+ if backend == "openrouter":
+ env = {"CG_BACKEND": "openrouter"}
+ if model:
+ env["OPENROUTER_MODEL"] = model
+ return env
+ if backend == "dgx":
+ env = {"DGX_ENDPOINT": endpoint or "http://localhost:8000"}
+ if model:
+ env["DGX_MODEL"] = model
+ return env
+ return {} # anthropic: default path, no overrides
+
+
+# ── Per-stage in-flight lock (M4) ───────────────────────────────────────────
+# Single-operator, local-first: an in-process guard is enough to stop a
+# double-click or a second tab from launching two writers on the same workdir
+# (the orphaned-worker cache-corruption trap in ensemble_workflow.md).
+
+_RUNNING: set[str] = set()
+
+
+def _lock_key(stage: str) -> str:
+ return f"{Path.cwd().resolve()}::{stage}"
+
+
+def _run_locked(stage: str, cmd: list[str], env_extra: dict[str, str] | None = None,
+ prelude: str = "") -> StreamingResponse:
+ key = _lock_key(stage)
+ if key in _RUNNING:
+ return StreamingResponse(
+ sse_error_stream(f"stage '{stage}' is already running for this campaign — "
+ f"wait for it to finish (avoids corrupting the workdir)."),
+ media_type="text/event-stream",
+ headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
+ )
+ _RUNNING.add(key)
+
+ def _release(_rc):
+ _RUNNING.discard(key)
+
+ async def _gen():
+ if prelude:
+ import json
+ yield f"data: {json.dumps(prelude)}\n\n"
+ async for chunk in stream_subprocess(cmd, cwd=str(Path.cwd()),
+ env_extra=env_extra or None,
+ on_complete=_release):
+ yield chunk
+
+ return StreamingResponse(
+ _gen(),
+ media_type="text/event-stream",
+ headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
+ )
+
+
+# ── Status (disk-derived, FR-002) ───────────────────────────────────────────
+
+@router.get("/status")
+def status(chapters: str = "docs/chapters/chapter_*.md"):
+ """Pipeline state computed entirely from artifacts on disk — no caching."""
+ cwd = Path.cwd()
+ per_chapter = sorted(glob.glob(str(cwd / "docs/ensemble/per_chapter/*/merged.json")))
+ dossiers = sorted(glob.glob(str(cwd / "docs/ensemble/state_dossiers/*.md")))
+ drafts = [name for name, (_, draft_rel) in GROUNDING_DOCS.items()
+ if (cwd / draft_rel).exists()]
+ promoted = [name for name, (live_rel, draft_rel) in GROUNDING_DOCS.items()
+ if (cwd / live_rel).exists() and (cwd / draft_rel).exists()
+ and (cwd / live_rel).stat().st_mtime >= (cwd / draft_rel).stat().st_mtime]
+
+ def st(done: bool) -> str:
+ return "complete" if done else "not_started"
+
+ stages = [
+ {"id": "extract", "status": st(bool(per_chapter)), "artifacts": len(per_chapter)},
+ {"id": "bundle", "status": st(bool(dossiers)), "artifacts": len(dossiers)},
+ {"id": "synthesize", "status": st(bool(drafts)), "drafts": drafts},
+ {"id": "review", "status": st(bool(promoted)), "promoted": promoted},
+ ]
+ current = next((s["id"] for s in stages if s["status"] != "complete"), "review")
+ return {"campaign_dir": str(cwd.resolve()), "stages": stages, "current_stage": current}
+
+
+# ── File listing / read / write (FR-004, FR-012, FR-017) ────────────────────
+
+@router.get("/files")
+def list_files(dir: str, pattern: str = "*.md"):
+ d = _resolve_ensemble_path(dir)
+ if not d.exists():
+ return {"dir": str(d), "exists": False, "files": []}
+ files = sorted(f.name for f in d.glob(pattern) if f.is_file())
+ return {"dir": str(d), "exists": True,
+ "files": [{"name": n, "size": (d / n).stat().st_size} for n in files]}
+
+
+@router.get("/chapters")
+def list_chapters(
+ glob: list[str] = Query(default=["docs/chapters/chapter_*.md"]),
+ per_chapter_dir: str = "docs/ensemble/per_chapter",
+):
+ """Resolve one or more chapter globs/paths to the concrete file list the
+ extraction stage would run over (FR: chapter selection). Each entry is
+ flagged `extracted` when its per-chapter merged.json already exists on disk
+ (Principle I — the picker reflects truth, not a cached selection)."""
+ cwd = Path.cwd().resolve()
+ pc_dir = (cwd / per_chapter_dir).resolve()
+ matched: dict[str, Path] = {}
+ for pattern in glob or []:
+ if not pattern or not pattern.strip():
+ continue
+ for hit in cwd.glob(pattern.strip()):
+ if not hit.is_file():
+ continue
+ r = hit.resolve()
+ if cwd not in r.parents:
+ continue # confine to the workspace
+ matched[str(r.relative_to(cwd))] = r
+ out = []
+ for rel in sorted(matched):
+ p = matched[rel]
+ merged = pc_dir / p.stem / "merged.json"
+ out.append({"path": rel, "stem": p.stem, "size": p.stat().st_size,
+ "extracted": merged.exists()})
+ return {"chapters": out, "count": len(out)}
+
+
+@router.get("/file")
+def read_file(path: str):
+ p = _resolve_ensemble_path(path)
+ if not p.exists() or not p.is_file():
+ return JSONResponse({"exists": False, "content": ""}, status_code=404)
+ return {"exists": True, "content": p.read_text(encoding="utf-8")}
+
+
+@router.put("/file")
+async def write_file(path: str, request: Request):
+ """Write an interchange file (e.g. aliases.json). Live grounding docs are
+ rejected — promotion is the only path to a live doc (FR-013)."""
+ p = _resolve_ensemble_path(path)
+ if _is_live_doc(p):
+ raise HTTPException(status_code=403,
+ detail="refusing to write a live grounding doc; use /promote")
+ p.parent.mkdir(parents=True, exist_ok=True)
+ data = await request.json()
+ p.write_text(data.get("content", ""), encoding="utf-8")
+ return {"ok": True, "size": p.stat().st_size}
+
+
+# ── Diff + promote (US3 gate, FR-013, SC-005) ───────────────────────────────
+
+@router.get("/diff")
+def diff(draft: str, live: str):
+ """Unified diff draft vs live for the diff-before-promote gate. Read-only."""
+ dp = _resolve_ensemble_path(draft)
+ lp = _resolve_ensemble_path(live)
+ draft_text = dp.read_text(encoding="utf-8").splitlines(keepends=True) if dp.exists() else []
+ live_text = lp.read_text(encoding="utf-8").splitlines(keepends=True) if lp.exists() else []
+ ud = "".join(difflib.unified_diff(live_text, draft_text,
+ fromfile=str(lp), tofile=str(dp)))
+ return {"draft": str(dp), "live": str(lp), "diff": ud,
+ "draft_exists": dp.exists(), "live_exists": lp.exists()}
+
+
+@router.post("/promote")
+async def promote(request: Request):
+ """Copy a reviewed draft over its live grounding doc — the single explicit
+ live-doc writer (FR-013). Restricted to the four known grounding docs."""
+ body = await request.json()
+ draft = _resolve_ensemble_path(body.get("draft", ""))
+ live = _resolve_ensemble_path(body.get("live", ""))
+ if not _is_live_doc(live):
+ raise HTTPException(status_code=400,
+ detail="promote target must be one of the four grounding docs")
+ if not draft.exists():
+ raise HTTPException(status_code=404, detail="draft does not exist")
+ shutil.copyfile(draft, live)
+ return {"ok": True, "live": str(live), "size": live.stat().st_size}
+
+
+# ── Stage runners (SSE) ─────────────────────────────────────────────────────
+
+@router.get("/run/extract")
+def run_extract(
+ chapters: list[str] = Query(default=[]),
+ per_chapter_dir: str = "docs/ensemble/per_chapter",
+ out: str = "docs/ensemble/merged.json",
+ plan: str = "",
+ endpoint: str = "",
+ model: str = "",
+ backend: str = "anthropic",
+ chapter_parallel: int = 3,
+ chunk_parallel: int = 4,
+ no_speculative: bool = False,
+):
+ # Principle X: no silent "all". An empty selection is refused, never
+ # expanded to the full glob — "Select all" must be an explicit choice the
+ # caller makes (the UI sends every resolved path; a CLI user types a glob).
+ picked = [c.strip() for c in (chapters or []) if c and c.strip()]
+ if not picked:
+ return StreamingResponse(
+ sse_error_stream("No chapters selected — pick chapters (or click "
+ "'Select all') before running extraction."),
+ media_type="text/event-stream",
+ headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
+ )
+ cmd = [python_exe(), str(SCRIPT_DIR / "ensemble_batch.py"),
+ "--chapters", *picked,
+ "--per-chapter-dir", per_chapter_dir,
+ "--out", out]
+ _cmd_opt(cmd, "--plan", plan)
+ _cmd_opt(cmd, "--endpoints", endpoint)
+ _cmd_opt(cmd, "--model", model)
+ _cmd_opt(cmd, "--chapter-parallel", chapter_parallel)
+ _cmd_opt(cmd, "--chunk-parallel", chunk_parallel)
+ _cmd_flag(cmd, "--no-speculative", no_speculative)
+ return _run_locked("extract", cmd, env_extra=_llm_env(backend, endpoint, model))
+
+
+@router.get("/run/bundle")
+def run_bundle(
+ corpus: str = "docs/ensemble/per_chapter/*/merged.json",
+ aliases: str = "",
+ known_names: list[str] = Query(default=[]),
+ min_facts: int = 3,
+ known_only: bool = False,
+ out_dir: str = "docs/ensemble/state_dossiers",
+ list: bool = False,
+ endpoint: str = "",
+ model: str = "",
+ backend: str = "anthropic",
+ entity_parallel: int = 0,
+):
+ cmd = [python_exe(), str(SCRIPT_DIR / "facts_to_state.py"), "--corpus", corpus]
+ _cmd_opt(cmd, "--aliases", aliases)
+ _cmd_multi(cmd, "--known-names", known_names)
+ _cmd_opt(cmd, "--min-facts", min_facts)
+ if list:
+ cmd.append("--list")
+ else:
+ _cmd_opt(cmd, "--out-dir", out_dir)
+ _cmd_flag(cmd, "--known-only", known_only)
+ _cmd_opt(cmd, "--endpoints", endpoint)
+ _cmd_opt(cmd, "--model", model)
+ _cmd_opt(cmd, "--entity-parallel", entity_parallel)
+ # --list does no model work, so it needs no backend env (it still runs through _run_locked).
+ if list:
+ return _run_locked("bundle-list", cmd)
+ return _run_locked("bundle", cmd, env_extra=_llm_env(backend, endpoint, model))
+
+
+@router.get("/run/recent-events")
+def run_recent_events(
+ corpus: str = "docs/ensemble/per_chapter/*/merged.json",
+ output: str = "docs/recent_events.md",
+ window: int = 0,
+):
+ cmd = [python_exe(), str(SCRIPT_DIR / "build_recent_events.py"),
+ "--corpus", corpus, "--output", output, "--window", str(window)]
+ return _run_locked("recent-events", cmd)
+
+
+@router.get("/run/threads")
+def run_threads(
+ corpus: str = "docs/ensemble/per_chapter/*/merged.json",
+ aliases: str = "",
+ output: str = "docs/ensemble/threads.md",
+ min_facts: int = 2,
+):
+ """(M1) Deterministic threads-track render — the chronological-spine input
+ fed to synthesis. No model call."""
+ cmd = [python_exe(), str(SCRIPT_DIR / "facts_to_state.py"),
+ "--corpus", corpus, "--types", "thread",
+ "--min-facts", str(min_facts), "--render-only", output]
+ _cmd_opt(cmd, "--aliases", aliases)
+ return _run_locked("threads", cmd)
+
+
+@router.get("/run/synthesize")
+def run_synthesize(
+ doc: str,
+ output: str = "",
+ backend: str = "anthropic",
+ endpoint: str = "",
+ model: str = "",
+ # world_state
+ dossiers: str = "docs/ensemble/merged_dossiers/*.md",
+ dossier_min_facts: int = 10,
+ party: str = "",
+ threads: str = "",
+ backstories: list[str] = Query(default=[]),
+ # campaign_state / party (staging)
+ extract_dir: str = "",
+ synthesize_only: bool = True,
+ # planning
+ npc: list[str] = Query(default=[]),
+ arc_scores: list[str] = Query(default=[]),
+ context: list[str] = Query(default=[]),
+):
+ if doc not in GROUNDING_DOCS:
+ raise HTTPException(status_code=400, detail=f"unknown doc '{doc}'")
+ out = output or GROUNDING_DOCS[doc][1] # default to the draft path
+ # FR-013: never let synthesis target a live grounding doc.
+ if _is_live_doc(_resolve_ensemble_path(out)):
+ raise HTTPException(status_code=400,
+ detail="synthesis output must be a draft, not a live doc")
+
+ if doc == "world_state":
+ cmd = [python_exe(), str(SCRIPT_DIR / "synthesise_world_state.py"),
+ "--dossiers", dossiers, "--dossier-min-facts", str(dossier_min_facts),
+ "--output", out]
+ _cmd_opt(cmd, "--party", party)
+ _cmd_opt(cmd, "--threads", threads)
+ _cmd_multi(cmd, "--backstories", backstories)
+ elif doc == "campaign_state":
+ cmd = [python_exe(), str(SCRIPT_DIR / "campaign_state.py"), "--output", out]
+ _cmd_flag(cmd, "--synthesize-only", synthesize_only)
+ _cmd_opt(cmd, "--extract-dir", extract_dir)
+ elif doc == "party":
+ cmd = [python_exe(), str(SCRIPT_DIR / "party.py"), "--output", out]
+ _cmd_flag(cmd, "--synthesize-only", synthesize_only)
+ _cmd_opt(cmd, "--extract-dir", extract_dir)
+ else: # planning
+ cmd = [python_exe(), str(SCRIPT_DIR / "planning.py"), "--output", out]
+ _cmd_multi(cmd, "--npc", npc)
+ _cmd_multi(cmd, "--arc-scores", arc_scores)
+ _cmd_multi(cmd, "--context", context)
+
+ _cmd_opt(cmd, "--model", model)
+ if backend != "anthropic":
+ cmd += ["--backend", backend]
+ _cmd_opt(cmd, "--endpoint", endpoint)
+
+ # FR-014 / R6: warn (don't block) on a sub-Sonnet synthesis model.
+ prelude = ""
+ if model and model not in SYNTHESIS_CAPABLE:
+ prelude = (f"⚠️ '{model}' is not on the synthesis-capable list — synthesis "
+ f"assumes a model at least as capable as Sonnet; output quality may "
+ f"degrade. Proceeding anyway.\n\n")
+ return _run_locked(f"synthesize-{doc}", cmd,
+ env_extra=_llm_env(backend, endpoint, model), prelude=prelude)
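A note on the `/diff` endpoint above: `diff()` passes the live text first and the draft second, so in the resulting unified diff the `-` lines are what promotion would remove from the live doc and the `+` lines are what it would add. The following is a minimal, standalone sketch of that orientation using only the standard library `difflib`. The helper name `gate_diff`, the file names, and the sample text are illustrative assumptions, not part of the router.

```python
import difflib


def gate_diff(live_text: str, draft_text: str,
              live_name: str = "docs/world_state.md",          # illustrative name
              draft_name: str = "docs/world_state_draft.md") -> str:
    """Return a unified diff oriented live -> draft, like the router's diff()."""
    live_lines = live_text.splitlines(keepends=True)
    draft_lines = draft_text.splitlines(keepends=True)
    # First sequence is the "from" side (live), second is the "to" side (draft).
    return "".join(difflib.unified_diff(live_lines, draft_lines,
                                        fromfile=live_name, tofile=draft_name))


# Made-up sample content: one line changes between live and draft.
ud = gate_diff("The bridge is intact.\n", "The bridge has collapsed.\n")
print(ud)
```

An empty diff string means the draft and the live doc are identical, so promotion would be a no-op.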
diff --git a/specs/001-ensemble-workflow-ui/checklists/requirements.md b/specs/001-ensemble-workflow-ui/checklists/requirements.md
new file mode 100644
index 0000000..d8480f2
--- /dev/null
+++ b/specs/001-ensemble-workflow-ui/checklists/requirements.md
@@ -0,0 +1,44 @@
+# Specification Quality Checklist: Ensemble Grounding-Doc Workflow UI
+
+**Purpose**: Validate specification completeness and quality before proceeding to planning
+**Created**: 2026-06-27
+**Feature**: [spec.md](../spec.md)
+
+## Content Quality
+
+- [x] No implementation details (languages, frameworks, APIs)
+- [x] Focused on user value and business needs
+- [x] Written for non-technical stakeholders
+- [x] All mandatory sections completed
+
+## Requirement Completeness
+
+- [x] No [NEEDS CLARIFICATION] markers remain
+- [x] Requirements are testable and unambiguous
+- [x] Success criteria are measurable
+- [x] Success criteria are technology-agnostic (no implementation details)
+- [x] All acceptance scenarios are defined
+- [x] Edge cases are identified
+- [x] Scope is clearly bounded
+- [x] Dependencies and assumptions identified
+
+## Feature Readiness
+
+- [x] All functional requirements have clear acceptance criteria
+- [x] User scenarios cover primary flows
+- [x] Feature meets measurable outcomes defined in Success Criteria
+- [x] No implementation details leak into specification
+
+## Notes
+
+- Two scope decisions were resolved with the user before drafting: (1) OpenRouter is
+ selectable per-stage across both LLM-bearing stages (extraction and synthesis); (2) the
+ ensemble workflow gets a new, separate UI surface and the existing Grounding Docs page is
+ left unchanged. Both are recorded in the Assumptions section.
+- The spec intentionally names "DGX/Spark", "Anthropic/Claude", and "OpenRouter" as backend
+ *choices* (product-level options the operator sees), not as implementation prescriptions.
+ These are user-facing selections, consistent with the feature's premise.
+- Constitutional alignment was kept front-of-mind: Principle IX (UI mechanizes; Claude
+ converses), Principle II (human checkpoints non-negotiable), Principle VI (CLI is the
+ engine; FR-016), and Principles I/VIII (files are truth; state discoverable; FR-002, FR-017).
+- Items marked incomplete require spec updates before `/speckit-clarify` or `/speckit-plan`.
diff --git a/specs/001-ensemble-workflow-ui/contracts/api.md b/specs/001-ensemble-workflow-ui/contracts/api.md
new file mode 100644
index 0000000..2f2f3de
--- /dev/null
+++ b/specs/001-ensemble-workflow-ui/contracts/api.md
@@ -0,0 +1,83 @@
+# API Contract: `/api/ensemble`
+
+New FastAPI router `server/routers/ensemble.py`, mounted at `/api/ensemble`, registered in `server/main.py` alongside the existing routers. It mirrors `server/routers/grounding.py`: stage runners return SSE streams from `stream_subprocess()`; status/file endpoints return JSON. **The router builds CLI commands and shells out — it contains no pipeline logic and issues no retrieval/render calls** (Principles VI, III).
+
+All run endpoints accept a per-stage backend selection: `backend` ∈ {`anthropic`, `dgx`, `openrouter`}, plus optional `endpoint` and `model`. These map to CLI flags (see `cli.md`). The `OPENROUTER_API_KEY` / `ANTHROPIC_API_KEY` are injected into the subprocess via `env_extra`, never passed as query params.
+
+---
+
+## Stage runners (SSE)
+
+### `GET /api/ensemble/run/extract`
+Runs `ensemble_batch.py` over the chapter glob.
+
+Query params: `chapters` (glob), `per_chapter_dir`, `out`, `plan`, `endpoint`/`endpoints[]`, `model`, `backend`, `chapter_parallel`, `chunk_parallel`, `embed_endpoint`, `embed_model`, `embed_threshold`, `unit_timeout`, `no_speculative` (bool).
+
+Response: `text/event-stream` — `data:` chunks of stdout/stderr; terminal `event: done` with `{"returncode": N}`.
+
+Behavior: resumable (chapters with existing `merged.json` are skipped by the CLI). On a backend/endpoint failure, the stream surfaces the error and ends with non-zero `returncode` (FR-009); prior chapters' outputs persist.
+
+### `GET /api/ensemble/run/bundle`
+Runs `facts_to_state.py` (aggregation). Supports `--list` mode (no model call) for the scope-review gate.
+
+Query params: `corpus` (glob), `aliases`, `known_names[]`, `min_facts`, `known_only` (bool), `out_dir`, `list` (bool → `--list`), `types[]`, `render_only`, `endpoint`/`endpoints[]`, `model`, `backend`, `entity_parallel`.
+
+Response: SSE as above. When `list=true`, the stream is the entity/scope table only.
+
+### `GET /api/ensemble/run/recent-events` *(deterministic, no model)*
+Runs `build_recent_events.py`. Query params: `corpus`, `output`, `window`. SSE.
+
+### `GET /api/ensemble/run/synthesize`
+Runs one of the four synthesis scripts depending on `doc`.
+
+Query params: `doc` ∈ {`world_state`, `campaign_state`, `party`, `planning`} (selects the script), the doc-specific inputs (e.g. `dossiers`, `dossier_min_facts`, `threads`, `party`, `npc[]`, `arc_scores[]`, `context[]`, `extract_dir`, `synthesize_only`), `output` (must be a `*_draft.md` path), `backend`, `endpoint`, `model`.
+
+Response: SSE.
+
+Behavior:
+- `output` MUST resolve to a draft path; the router rejects (HTTP 400) an `output` that targets a live grounding doc (`docs/.md`) to enforce FR-013.
+- If `doc` ∈ {`campaign_state`, `party`} and `backend` resolves to the subscription `claude-code` path, the router disables agent tools so output goes to stdout (the documented `claude -p` clobber gotcha) — but the default synthesis path here is direct API/OpenRouter, so this is an edge guard.
+- If the synthesis `model`/`backend` is below the capability bar, the response includes a non-fatal warning line in the stream (FR-014).
+
+---
+
+## Status & file endpoints (JSON)
+
+### `GET /api/ensemble/status?campaign_dir=…&chapters=…`
+Returns disk-derived pipeline state (R4, FR-002). No model call, no caching.
+
+```json
+{
+ "campaign_dir": "/abs/path",
+ "stages": [
+ {"id": "extract", "status": "complete", "artifacts": 45},
+ {"id": "bundle", "status": "not_started", "artifacts": 0},
+ {"id": "synthesize", "status": "not_started", "drafts": []},
+ {"id": "review", "status": "not_started"}
+ ],
+ "current_stage": "bundle"
+}
+```
+
+Completion predicates: `extract` ⇔ `per_chapter/*/merged.json` exist; `bundle` ⇔ `state_dossiers/*.md` exist; `synthesize` ⇔ `*_draft.md` exist; `review` ⇔ operator-promoted (best-effort: live doc newer than draft).
+
+### `GET /api/ensemble/files?dir=…&pattern=…`
+Lists artifacts in an ensemble subdir (dossiers, drafts, per_chapter outputs) for review. Mirrors `grounding.py:/extracts`. Returns `{dir, exists, files:[{name,size}]}`.
+
+### `GET /api/ensemble/file?path=…` / `PUT /api/ensemble/file?path=…`
+Read / write a single interchange file (e.g. `aliases.json`, a draft) so the operator can preview and the alias-correction gate is satisfiable from the UI *or* the CLI/chat (FR-012). Write is path-validated and confined to the campaign workspace. **PUT to a live grounding doc is rejected** (promotion is a deliberate, separate action).
+
+### `GET /api/ensemble/diff?draft=…&live=…`
+Returns a unified diff between a `*_draft.md` and its live counterpart for the diff-before-promote gate. Read-only; never writes.
+
+### `POST /api/ensemble/promote`
+Body `{draft, live}`. Copies a reviewed draft over the live doc — the single explicit promotion action (FR-013, SC-005). The router refuses any `live` outside the four known grounding docs.
+
+---
+
+## Cross-cutting contract rules
+
+1. Every run endpoint records the backend+model into the produced artifact's provenance (FR-008) — implemented in the CLI, surfaced here.
+2. No endpoint stores pipeline state server-side; status is always recomputed from disk (FR-017).
+3. Secrets travel only via `env_extra` to the subprocess, never as query params or in logs.
+4. The router never imports `anthropic`/`openai` and never calls `stream_api`/`call_api`/`retrieve` — it only spawns CLI processes (Principles III, V, VI).
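The completion predicates in the status contract above can be read as a pure function of the campaign directory. The sketch below is one way that derivation might look, not the router's actual implementation. The function name `derive_status` and the `STAGE_GLOBS` table are assumptions, the glob paths are taken from the defaults quoted in this patch, and the best-effort `review` predicate (live doc newer than draft) is deliberately left out.

```python
import tempfile
from pathlib import Path

# Assumed mapping of stage id -> glob whose matches mean "complete".
STAGE_GLOBS = {
    "extract": "docs/ensemble/per_chapter/*/merged.json",
    "bundle": "docs/ensemble/state_dossiers/*.md",
    "synthesize": "docs/*_draft.md",
}


def derive_status(campaign_dir: Path) -> dict:
    """Recompute stage status from disk on every call; nothing is cached."""
    stages = []
    for stage_id, pattern in STAGE_GLOBS.items():
        count = sum(1 for p in campaign_dir.glob(pattern) if p.is_file())
        stages.append({"id": stage_id,
                       "status": "complete" if count else "not_started",
                       "artifacts": count})
    # The current stage is the first one that is not complete, else "review".
    current = next((s["id"] for s in stages if s["status"] != "complete"), "review")
    return {"campaign_dir": str(campaign_dir.resolve()),
            "stages": stages, "current_stage": current}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    empty = derive_status(root)                      # nothing on disk yet
    chapter = root / "docs/ensemble/per_chapter/chapter_01"
    chapter.mkdir(parents=True)
    (chapter / "merged.json").write_text("{}", encoding="utf-8")
    after_extract = derive_status(root)              # one extracted chapter

print(empty["current_stage"], after_extract["current_stage"])
```

Because the result depends only on which files exist, deleting an artifact by hand moves the reported stage backwards with no other bookkeeping, which is the behavior FR-017 asks for.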
diff --git a/specs/001-ensemble-workflow-ui/contracts/cli.md b/specs/001-ensemble-workflow-ui/contracts/cli.md
new file mode 100644
index 0000000..00dc77e
--- /dev/null
+++ b/specs/001-ensemble-workflow-ui/contracts/cli.md
@@ -0,0 +1,69 @@
+# CLI Contract: backend selection across LLM-bearing scripts
+
+The CLI is the engine (Principle VI); the UI only sets these flags. This contract defines the **uniform backend-selection vocabulary** added so every LLM stage can target DGX, Anthropic, or OpenRouter — and the **seam change** that makes OpenRouter reachable from the one boundary (Principle V).
+
+---
+
+## Seam: `campaignlib/api`
+
+### `make_client(endpoint=None, model_override=None, backend=None)` — MODIFY
+Add an OpenRouter branch, preserving existing precedence (`backend`/`$CG_BACKEND` first, then `endpoint`/`$DGX_ENDPOINT`, then Anthropic default):
+
+```
+backend = backend or os.environ.get("CG_BACKEND")
+if backend == "claude-code": return _ClaudeCodeClient(...) # existing
+if backend == "openrouter": return _OpenRouterClient(model_override=model_override) # NEW
+endpoint = endpoint or os.environ.get("DGX_ENDPOINT")
+if endpoint: return _OpenAICompatClient(endpoint, model_override) # existing
+return anthropic.Anthropic() # existing default
+```
+
+### `_OpenRouterClient` (in `campaignlib/api/backends.py`) — NEW
+- Reuses the `openai` SDK: `OpenAI(base_url="https://openrouter.ai/api/v1", api_key=os.environ["OPENROUTER_API_KEY"], timeout=…)`.
+- Base URL overridable via `OPENROUTER_BASE_URL`.
+- Model id passed through verbatim (no dgxlib registry lookup — that is the difference from `_OpenAICompatClient`).
+- Exposes the same Anthropic-shaped `.messages.create(...)` façade the other clients expose, so `stream_api`/`call_api` work unchanged.
+- Missing `OPENROUTER_API_KEY` → a clear, immediate error (no silent fallback), consistent with the seam's "the choice is explicit" docstring.
+
+**Contract test** (`tests/test_openrouter_seam.py`): `make_client(backend="openrouter")` returns the OpenRouter client; no module outside `campaignlib/api` imports `openai`/`anthropic` for OpenRouter; missing key raises.
+
+---
+
+## Synthesis scripts — ADD flags
+
+`synthesise_world_state.py`, `campaign_state.py`, `party.py`, `planning.py` each gain:
+
+| Flag | Values | Effect |
+|---|---|---|
+| `--backend` | `anthropic` (default) \| `dgx` \| `openrouter` | Passed to `make_client(backend=…)`. Omitted ⇒ `anthropic` ⇒ **identical to today** (FR-015, SC-006). |
+| `--endpoint` | URL | Passed to `make_client(endpoint=…)` (for `dgx`; OpenRouter uses its default base). |
+| `--model` | id | Already present; for `openrouter`, an OpenRouter model id. |
+
+These scripts currently call `make_client()` with no args; the change threads the parsed args into that single call. No other behavior changes.
+
+**Backward-compatibility invariant**: with none of the new flags supplied, the constructed command and the resulting output are unchanged from the current Anthropic path. This is the regression guard behind SC-006.
+
+---
+
+## Extraction / aggregation scripts — NO new flags needed
+
+`ensemble.py`, `ensemble_batch.py`, `ensemble_extract.py`, `facts_to_state.py` already accept `--endpoints`/`--dgx-endpoint`/`--model`. To target OpenRouter:
+- set `CG_BACKEND=openrouter` (env, injected by the server) **or** rely on the seam recognizing the OpenRouter selection, and
+- pass the OpenRouter `--model` id.
+
+`facts_to_state.py` already calls `make_client(endpoint=…, model_override=…)`; once the seam honors `openrouter`, no script edit is required there. (If a per-stage `--backend` flag is desired on these for symmetry, it is additive and optional.)
+
+---
+
+## Provenance (FR-008)
+
+Each LLM-bearing script records the backend+model it used into its output artifact (frontmatter or trailing comment), so a mixed-backend run is auditable. This is the same place each script already stamps `n_facts`/model metadata.
+
+---
+
+## Invariants enforced by this contract
+
+- One seam: OpenRouter is constructed only inside `campaignlib/api` (Principle V).
+- CLI-first: every backend choice is expressible and runnable from the terminal without the UI (Principle VI, FR-016).
+- Safe default: absent flags ⇒ today's Anthropic behavior (FR-015).
+- Explicit failure: a missing key or unreachable endpoint errors loudly, never silently degrades (FR-009).
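The precedence described for `make_client` above (explicit `backend` or `$CG_BACKEND` first, then `endpoint` or `$DGX_ENDPOINT`, then the Anthropic default) can be exercised without any SDK installed. The sketch below mirrors only the branch order and the missing-key rule. It returns a label instead of constructing a client, and the function name `resolve_client_kind`, the `env` parameter, and the returned labels are assumptions made for illustration.

```python
import os


def resolve_client_kind(endpoint=None, backend=None, env=None) -> str:
    """Return which seam branch would be taken, following the contract's order."""
    env = os.environ if env is None else env
    backend = backend or env.get("CG_BACKEND")
    if backend == "claude-code":
        return "claude-code"
    if backend == "openrouter":
        # Explicit failure: no silent fallback when the key is absent.
        if not env.get("OPENROUTER_API_KEY"):
            raise RuntimeError(
                "backend 'openrouter' selected but OPENROUTER_API_KEY is not set")
        return "openrouter"
    endpoint = endpoint or env.get("DGX_ENDPOINT")
    if endpoint:
        return "openai-compat"
    return "anthropic"


# Hypothetical environments, passed explicitly so the demo is deterministic.
default_kind = resolve_client_kind(env={})
dgx_kind = resolve_client_kind(endpoint="http://localhost:8000/v1", env={})
openrouter_kind = resolve_client_kind(
    endpoint="http://localhost:8000/v1", backend="openrouter",
    env={"OPENROUTER_API_KEY": "sk-example"})
print(default_kind, dgx_kind, openrouter_kind)
```

Note that an explicit `openrouter` selection wins even when an endpoint is also supplied, which is what "preserving existing precedence" implies for the new branch.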
diff --git a/specs/001-ensemble-workflow-ui/data-model.md b/specs/001-ensemble-workflow-ui/data-model.md
new file mode 100644
index 0000000..41c5988
--- /dev/null
+++ b/specs/001-ensemble-workflow-ui/data-model.md
@@ -0,0 +1,125 @@
+# Phase 1 Data Model: Ensemble Grounding-Doc Workflow UI
+
+This feature's "data" is almost entirely **files on disk** (Principle I) plus a small amount of **UI configuration state**. There is no new database. The entities below describe the conceptual model the UI presents and the on-disk artifacts that back it.
+
+---
+
+## 1. Backend Profile (config + runtime selection)
+
+Represents how an LLM-bearing stage executes. Selectable per stage at run time (FR-006, FR-018).
+
+| Field | Type | Notes |
+|---|---|---|
+| `backend` | enum `anthropic` \| `dgx` \| `openrouter` | Which seam branch `make_client` takes. Default `anthropic`. |
+| `endpoint` | string \| null | For `dgx`: the Spark `--endpoints` URL(s). For `openrouter`: defaults to `https://openrouter.ai/api/v1` (rarely overridden). Null for `anthropic`. |
+| `model` | string | Model id. Claude id for `anthropic`; Spark model id for `dgx`; OpenRouter id (e.g. `anthropic/claude-sonnet-4`) for `openrouter`. Free-text. |
+| `api_key_source` | derived | `ANTHROPIC_API_KEY` (anthropic), none (dgx), `OPENROUTER_API_KEY` (openrouter). Never stored in tracked config. |
+
+**Validation rules**:
+- `backend == "openrouter"` requires `OPENROUTER_API_KEY` to be present in the environment; absence surfaces as an explicit error (FR-009), not a silent fallback.
+- `backend == "dgx"` requires a reachable `endpoint`; unreachable surfaces as a fast, explicit error (edge case: local hardware unreachable).
+- A synthesis-stage profile whose `model` is not on the synthesis-capable allow-list raises a **warning, not an error** (FR-014, R6).
+
+**Persistence**: backend/endpoint/model selections persist in `ui_state.yaml` under `ui.ensemble` (per-stage). The key (secret) is environment-only.
+
+---
+
+## 2. Pipeline State (derived, not stored)
+
+The campaign's position in the workflow. **Computed from disk on every read** (FR-002, FR-017) — never cached in the browser or written as a manifest (R4).
+
+| Field | Type | Derivation |
+|---|---|---|
+| `campaign_dir` | path | From the active config (`runtime.session_dir` / campaign root). |
+| `stages` | list of Stage | One per pipeline stage (below), each with a computed status. |
+| `current_stage` | derived | First stage that is not `complete`. |
+
+There are **no state transitions stored** — the state is a pure function of which artifacts exist. "Transition" happens implicitly when a stage's artifacts appear on disk.
+
+---
+
+## 3. Stage
+
+One step in the ordered workflow. Status is derived from artifact presence (R4).
+
+| Field | Type | Notes |
+|---|---|---|
+| `id` | enum | `extract` \| `bundle` \| `synthesize` \| `review` |
+| `label` | string | Human label for the UI. |
+| `status` | enum `not_started` \| `complete` (\| `running` transient) | Derived from artifacts; `running` is an in-flight UI state only. |
+| `backend_profile` | Backend Profile \| null | Null for non-LLM stages (e.g. the `review` gate, the deterministic threads/recent-events renders). |
+| `artifacts` | list of Artifact | What this stage reads and writes. |
+| `gate` | Checkpoint \| null | A blocking human checkpoint attached to this stage, if any. |
+
+**Stage → artifact / gate map** (the concrete pipeline):
+
+| Stage | Backend? | Reads | Writes | Completion predicate | Gate |
+|---|---|---|---|---|---|
+| `extract` | yes (extract) | `docs/chapters/chapter_*.md` | `docs/ensemble/per_chapter//merged.json`, root `merged.json` | per-chapter `merged.json` exist for the glob | — |
+| `bundle` | yes (extract) | `merged.json`, `aliases.json`, `--known-names` | `docs/ensemble/state_dossiers/*.md`, `merged_dossiers/*.md` | dossier files exist | **scope review** (`--list`), **alias correction** |
+| `synthesize` | yes (synthesis) | `merged_dossiers/*.md`, `threads.md`, `recent_events.md` | `docs/{world_state,campaign_state,party,planning}_draft.md` | `*_draft.md` exist | — |
+| `review` | no | `*_draft.md`, live docs | (promotion writes live docs, human-initiated) | live docs updated by operator | **diff-before-promote** |
+
+---
+
+## 4. Checkpoint / Gate
+
+A human-judgment point that blocks automatic advancement (FR-010, FR-011, Principle II). The UI represents it; the *decision* happens in Claude/CLI (Principle IX).
+
+| Field | Type | Notes |
+|---|---|---|
+| `id` | enum | `scope_review` \| `alias_correction` \| `diff_promote` |
+| `stage_id` | enum | The stage it gates. |
+| `satisfied` | bool (operator-confirmed) | The UI does not auto-satisfy; the operator confirms after doing the work. |
+| `handoff` | description | What to do in Claude/CLI (e.g. "run `--list`, review scope", "edit `aliases.json`", "`diff` draft vs live, then promote"). |
+| `interchange_files` | list of path | The files the operator edits/reviews (e.g. `aliases.json`, `*_draft.md`) — the contract between UI, CLI, and chat (FR-012, FR-017). |
+
+**Rule**: a gate is never bypassed by the pipeline; `synthesize` must not consume `bundle` output until `scope_review`/`alias_correction` are operator-confirmed (Principle II — no LLM output feeds another across a precision boundary without a human gate).
+
+---
+
+## 5. Artifact
+
+A file produced or consumed by a stage — the unit of interchange (FR-004, FR-017).
+
+| Field | Type | Notes |
+|---|---|---|
+| `path` | path | Absolute or campaign-relative; the source of truth. |
+| `kind` | enum | `chapter` \| `facts` \| `dossier` \| `threads` \| `recent_events` \| `draft` \| `live_doc` \| `aliases` \| `known_names`. |
+| `produced_by` | stage id \| null | Which stage wrote it (null for human-authored inputs). |
+| `backend_used` | string \| null | For LLM-produced artifacts: the backend+model recorded with the output (FR-008). |
+| `exists` | bool | Drives stage status. |
+
+**Provenance rule (FR-008)**: every LLM-produced artifact records which backend and model produced it (e.g. a frontmatter/comment line). This is how a mixed run (extract on OpenRouter, synthesize on Anthropic) stays auditable.
+
+---
+
+## 6. Grounding Document (draft / live)
+
+The four targets, with a hard draft/live distinction (Principle I, FR-013).
+
+| Field | Type | Notes |
+|---|---|---|
+| `name` | enum | `world_state` \| `campaign_state` \| `party` \| `planning`. |
+| `draft_path` | path | `docs/_draft.md` — what synthesis writes. |
+| `live_path` | path | `docs/.md` — only the operator promotes to here. |
+
+**Rule**: the workflow writes drafts only; the UI never auto-overwrites a live doc; promotion is an explicit operator action (SC-005).
+
+---
+
+## Config schema addition (`server/config_models.py`)
+
+A new `EnsembleSection` added to `UISection`, registered in `UI_SECTION_NAMES`:
+
+```
+ui.ensemble:
+ campaign_dir: str
+ chapters_glob: str # default docs/chapters/chapter_*.md
+ extract: { backend, endpoint, model } # Backend Profile
+ synthesize: { backend, endpoint, model } # Backend Profile (independent of extract)
+ known_names: [str]
+ aliases_path: str
+```
+
+No secret fields. Mirrors existing `SessionDocSection`'s `backend`/`dgx_endpoint`/`dgx_model` precedent (`config_models.py`).
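The Backend Profile validation rules in section 1 above separate hard errors (missing key, missing endpoint) from the soft synthesis-capability warning. A minimal sketch of that split follows, using a plain dataclass rather than whatever model class `config_models.py` actually uses. `BackendProfile`, `check_profile`, and the model ids are hypothetical, and the `dgx` rule here only checks that an endpoint is configured, not that it is reachable.

```python
from dataclasses import dataclass
from typing import Optional

BACKENDS = ("anthropic", "dgx", "openrouter")


@dataclass
class BackendProfile:
    backend: str = "anthropic"
    endpoint: Optional[str] = None
    model: str = ""


def check_profile(profile, stage, env, synthesis_capable):
    """Return (errors, warnings); errors block the run, warnings do not."""
    errors, warnings = [], []
    if profile.backend not in BACKENDS:
        errors.append(f"unknown backend '{profile.backend}'")
    if profile.backend == "openrouter" and not env.get("OPENROUTER_API_KEY"):
        errors.append("openrouter selected but OPENROUTER_API_KEY is not set")
    if profile.backend == "dgx" and not profile.endpoint:
        errors.append("dgx selected but no endpoint is configured")
    # FR-014: a model off the allow-list warns during synthesis, never blocks.
    if stage == "synthesize" and profile.model and profile.model not in synthesis_capable:
        warnings.append(f"'{profile.model}' is not on the synthesis-capable list")
    return errors, warnings


capable = {"capable-model"}  # hypothetical allow-list
errors, warnings = check_profile(
    BackendProfile("openrouter", None, "small-model"),
    "synthesize", {"OPENROUTER_API_KEY": "sk-example"}, capable)
print(errors, warnings)
```

Keeping the two lists separate lets a caller refuse to start on any error while still streaming warnings to the operator as a prelude.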
diff --git a/specs/001-ensemble-workflow-ui/plan.md b/specs/001-ensemble-workflow-ui/plan.md
new file mode 100644
index 0000000..75738c9
--- /dev/null
+++ b/specs/001-ensemble-workflow-ui/plan.md
@@ -0,0 +1,126 @@
+# Implementation Plan: Ensemble Grounding-Doc Workflow UI
+
+**Branch**: `001-ensemble-workflow-ui` | **Date**: 2026-06-27 | **Spec**: [spec.md](./spec.md)
+
+**Input**: Feature specification from `specs/001-ensemble-workflow-ui/spec.md`
+
+## Summary
+
+Add a dedicated, stepped UI surface that mechanizes the ensemble grounding-doc pipeline (extraction → fact bundling → synthesis → review/promote), deriving stage status from files on disk and streaming each mechanical step's output, while preserving the human-judgment checkpoints (scope review, alias correction, diff-before-promote) as handoffs to a Claude conversation or the CLI. Make each LLM-bearing stage backend-selectable, **adding OpenRouter** alongside the existing local-hardware (DGX/Spark) and Anthropic (Claude) options, independently per stage.
+
+Technical approach, in one line per layer:
+
+- **Seam (`campaignlib/api`)**: add an OpenRouter branch to `make_client` so OpenRouter is reached through the *one* LLM seam (Principle V) — a real API key from the environment and OpenRouter model ids, not the dgxlib registry.
+- **CLI (engine)**: plumb a uniform `--backend` / `--endpoint` / `--model` selection into the four synthesis scripts (`synthesise_world_state.py`, `campaign_state.py`, `party.py`, `planning.py`) so synthesis can target DGX/Anthropic/OpenRouter — the extraction scripts already accept `--endpoints`/`--model` and only need the seam change.
+- **Server (face)**: a new `server/routers/ensemble.py` (mounted `/api/ensemble`) that shells out to those CLI scripts via `subprocess_runner` and exposes disk-derived stage status — never reimplementing pipeline logic (Principle VI).
+- **Frontend (face)**: a new `/ensemble` stepped page built on the existing `WizardShell` + `connectSSE` patterns, leaving the existing `/grounding` page untouched.
+
+## Technical Context
+
+**Language/Version**: Python 3.11+ (backend + CLI); TypeScript 5 / Vue 3 (frontend).
+
+**Primary Dependencies**: FastAPI + uvicorn (server); `anthropic` SDK and `openai` SDK (both already present — `openai` powers the DGX path today); `dgxlib` (local model registry); Vue 3 + Pinia + Vue Router; PyYAML. OpenRouter is reached via the existing `openai` SDK pointed at `https://openrouter.ai/api/v1`.
+
+**Storage**: Files on disk are the source of truth (Principle I) — chapter files, `docs/ensemble/per_chapter/*/merged.json`, `docs/ensemble/state_dossiers/*.md`, `merged_dossiers/*.md`, `*_draft.md`, live grounding docs. UI state in `ui_state.yaml` (`ui.ensemble` section); machine-local secrets/config in `.campaigngenerator.local.yaml` (gitignored) or environment.
+
+**Testing**: `pytest` (`tests/`), including the CI guard `tests/test_retrieve_render_isolation.py`. Frontend: existing Vite/Vue toolchain (no test mandate added here).
+
+**Target Platform**: Single-operator local workstation (WSL2 on Windows 11), local-first, intermittent network tolerated.
+
+**Project Type**: Web application (FastAPI backend + Vue 3 frontend) layered over a CLI engine.
+
+**Performance Goals**: Extraction is a long-running job (tens of minutes), so the UI streams progress over SSE and relies on the CLI's per-item resumability rather than expecting fast responses. Synthesis token cost stays bounded (~280K metered tokens for a full Phandalin-scale refresh) by keeping extraction off the metered API.
+
+**Constraints**: One seam per external boundary (Principle V) — OpenRouter must route through `campaignlib`. CLI-first (Principle VI) — every UI step is a CLI invocation. Human checkpoints are blocking (Principle II). No browser-only pipeline state (Principles I/VIII). Drafts only; never auto-overwrite live docs (Principle I).
+
+**Scale/Scope**: One GM; campaigns up to ~45 chapters / ~1900 entities / ~860 known names (Phandalin is the reference scale).
+
+## Constitution Check
+
+*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
+
+The CampaignGenerator constitution (v1.2.0) has ten principles. This feature is, by the constitution's own words, the **canonical shape** for Principle IX, so alignment is load-bearing, not incidental. (Principle X — *Selection is Explicit* — was added during this feature, arising from its chapter picker; see the post-implement amendment in `tasks.md`.)
+
+| Principle | Gate for this feature | Verdict |
+|---|---|---|
+| I. Disk is Truth, Model is Draft | Stage status derived from files; synthesis writes `*_draft.md` only; promotion is a manual file act (FR-002, FR-013, FR-017). | ✅ PASS |
+| II. Human Checkpoint Non-Negotiable | Scope/alias/promote gates block auto-advance; UI never feeds one LLM stage's unreviewed output into the next across a precision boundary (FR-010, FR-011). | ✅ PASS |
+| III. Retrieval/Render Separated | New router only shells out; it issues neither retrieval (`retrieve`/`rpg_search`) nor render (`stream_api`/`call_api`) calls. `test_retrieve_render_isolation.py` stays green. | ✅ PASS (no new mixing) |
+| IV. Verbatim is Sacred | Extraction preserves `source_quote`; no step paraphrases transcripts. No new verbatim surface introduced. | ✅ PASS |
+| V. One Seam per Boundary | **The pivotal gate.** OpenRouter is a *new external dependency* and MUST be reached only through `campaignlib`'s `make_client`. No `import openai`/OpenRouter calls added in routers or scripts outside the seam. | ✅ PASS *by design* (see Research) |
+| VI. CLI is Engine, UI is Face | Backend selection is a CLI flag first; the router builds commands and streams via `subprocess_runner`, reimplementing nothing (FR-016). | ✅ PASS |
+| VII. Extract Once, Synthesize Deliberately | The pipeline *is* this shape; the plan adds no pass-collapsing. Extraction stays local/cheap; synthesis stays deliberate. | ✅ PASS |
+| VIII. State is Discoverable | The ensemble page reads campaign state from disk; what is done/pending is visible, not tribal (FR-002). | ✅ PASS |
+| IX. UI Mechanizes; Claude Converses | The whole feature: UI steps the sequence; judgment between steps happens in Claude/CLI; files are the interchange; the human is never trapped in the UI (FR-012, FR-016, FR-017). | ✅ PASS |
+| X. Selection is Explicit; No Silent "All" | Chapter picker stores the literal chosen set; extraction refuses an empty selection; "Select all" materializes every path. The CLI glob is exempt (explicit at the CLI). | ✅ PASS |
+
+**Authority & Human Checkpoint clause**: This plan is a draft reviewed against the constitution; it imposes no autonomous precision decision. The one risk surface — OpenRouter as a second LLM vendor — is contained to the single seam, which is exactly what Principle V demands.
+
+**Result**: No violations. Complexity Tracking left empty.
+
+## Project Structure
+
+### Documentation (this feature)
+
+```text
+specs/001-ensemble-workflow-ui/
+├── plan.md # This file (/speckit-plan command output)
+├── spec.md # Feature specification (/speckit-specify)
+├── research.md # Phase 0 output (/speckit-plan)
+├── data-model.md # Phase 1 output (/speckit-plan)
+├── quickstart.md # Phase 1 output (/speckit-plan)
+├── contracts/ # Phase 1 output (/speckit-plan)
+│ ├── api.md # HTTP endpoints for /api/ensemble
+│ └── cli.md # CLI backend-selection flag contract
+└── checklists/
+ └── requirements.md # Spec quality checklist (/speckit-specify)
+```
+
+### Source Code (repository root)
+
+```text
+# ── Seam: the one LLM boundary (Principle V) ──
+campaignlib/
+└── api/
+ ├── client.py # MODIFY: make_client() gains an "openrouter" backend branch
+ └── backends.py # MODIFY: OpenRouter client (OpenAI SDK + real api_key, no dgxlib registry)
+
+# ── CLI engine (Principle VI): backend selection plumbed into synthesis scripts ──
+synthesise_world_state.py # MODIFY: add --backend/--endpoint, pass to make_client()
+campaign_state.py # MODIFY: same (synthesize path)
+party.py # MODIFY: same (synthesize path)
+planning.py # MODIFY: same (synthesize path)
+# ensemble.py / ensemble_batch.py / ensemble_extract.py / facts_to_state.py
+# already accept --endpoints/--model → reach OpenRouter once the seam supports it
+
+# ── Server (face): new router, mirrors grounding.py ──
+server/
+├── main.py # MODIFY: include_router(ensemble.router, prefix="/api/ensemble")
+├── config_models.py # MODIFY: add EnsembleSection + backend-profile fields to UIState
+├── config.py # MODIFY (maybe): OpenRouter model id suggestions for the picker
+└── routers/
+ └── ensemble.py # NEW: stage runners (SSE) + disk-derived stage-status endpoints
+
+# ── Frontend (face): new stepped page, /grounding untouched ──
+frontend/src/
+├── router.ts # MODIFY: add /ensemble route tree
+├── views/
+│ ├── EnsembleWorkflow.vue # NEW: WizardShell host (mirrors SessionWorkflow.vue)
+│ └── ensemble/ # NEW: one component per stage
+│ ├── EnsembleSetup.vue # paths + per-stage backend selection
+│ ├── EnsembleExtract.vue # Stage 1 run + status
+│ ├── EnsembleBundle.vue # Stage 2 run + scope-review gate
+│ └── EnsembleSynthesize.vue # Stage 3 run + diff/promote gate
+└── stores/
+ └── config.ts # REUSE: ui.ensemble section via updateSection()
+
+# ── Tests ──
+tests/
+└── test_openrouter_seam.py # NEW: make_client("openrouter") routing + no out-of-seam imports
+```
+
+**Structure Decision**: Web-application layout already in place (`server/` + `frontend/` over root-level CLI scripts). This feature is purely additive at every layer (one new seam branch, four script flag additions, one new router, one new frontend page tree) and leaves the existing `/grounding` surface untouched (FR-015).
+
+## Complexity Tracking
+
+> No constitution violations. No entries required.
diff --git a/specs/001-ensemble-workflow-ui/quickstart.md b/specs/001-ensemble-workflow-ui/quickstart.md
new file mode 100644
index 0000000..d7baaa6
--- /dev/null
+++ b/specs/001-ensemble-workflow-ui/quickstart.md
@@ -0,0 +1,104 @@
+# Quickstart / Validation Guide: Ensemble Grounding-Doc Workflow UI
+
+This guide proves the feature end-to-end. It assumes a campaign workspace with chapter files already prepared (the upstream spelling/known-names pass is out of scope here — see `docs/cli/ensemble_workflow.md`). Details of flags and endpoints live in `contracts/cli.md` and `contracts/api.md`; the data model is in `data-model.md`.
+
+## Prerequisites
+
+- A campaign workspace with `docs/chapters/chapter_*.md`.
+- `ANTHROPIC_API_KEY` set (for the Anthropic synthesis path / regression check).
+- `OPENROUTER_API_KEY` set (for the OpenRouter path).
+- For the DGX path: at least one reachable Spark endpoint (`/spark-status`); optional if validating only Anthropic + OpenRouter.
+- Server + frontend running via `./startup`.
+
+---
+
+## Validation 1 — Seam: OpenRouter routes through `make_client` (Principle V)
+
+```bash
+python -m pytest tests/test_openrouter_seam.py -q
+```
+
+**Expected**: `make_client(backend="openrouter")` returns the OpenRouter client; a missing `OPENROUTER_API_KEY` raises a clear error; no module outside `campaignlib/api` imports the OpenRouter client. (Maps to FR-007, FR-018; R1.)
+
+## Validation 2 — Regression: existing Anthropic path unchanged (FR-015, SC-006)
+
+```bash
+# The old per-tool path and the synthesis scripts, run with NO new flags, must produce byte-identical commands and output.
+python -m pytest tests/ # full suite incl. test_retrieve_render_isolation.py
+# Spot check: synthesise_world_state.py with no --backend builds the same command/output as before.
+```
+
+**Expected**: full suite green; the isolation guard passes (router added no retrieval/render mixing); default synthesis still hits Anthropic.
+
+## Validation 3 — Stage status is disk-derived (FR-002, FR-017)
+
+```bash
+curl -s --get "http://localhost:8000/api/ensemble/status" --data-urlencode "campaign_dir=$PWD" --data-urlencode "chapters=docs/chapters/chapter_*.md" | python -m json.tool
+```
+
+**Expected**: with no prior run, `extract` is `current_stage`. After files appear under `docs/ensemble/per_chapter/*/merged.json` (Validation 5), the same call — with no server restart — reports `extract: complete`. Confirms no browser/server-cached state.
+
+## Validation 4 — Walk the pipeline from the UI, no CLI typing (US1, SC-001, SC-002)
+
+1. Open the app → navigate to **Ensemble Workflow** (`/ensemble`). Confirm it is a distinct page from **Grounding Docs** (`/grounding`), which is unchanged (US4).
+2. **Setup** step: set chapter glob and pick a backend for *extract* and (independently) for *synthesize* (US2).
+3. **Extract** step: Run → watch SSE progress stream → on completion, the page lists per-chapter artifacts.
+4. Reload the page → Extract shows **complete** (disk-derived).
+
+**Expected**: an operator who has not read the workflow doc reaches the synthesis stage without typing a command (SC-002).
+
+## Validation 5 — Per-stage backend mix, incl. OpenRouter with local box down (US2, SC-003, SC-008)
+
+```bash
+# Simulate local hardware unreachable, then drive extraction via OpenRouter from the UI's Extract step.
+# (Equivalent CLI the UI runs — proves CLI-first, FR-016:)
+CG_BACKEND=openrouter python ensemble_batch.py \
+ --chapters 'docs/chapters/chapter_*.md' --per-chapter-dir docs/ensemble/per_chapter \
+ --out docs/ensemble/merged.json --model anthropic/claude-sonnet-4
+```
+
+Then synthesize on a *different* backend from the UI's Synthesize step (e.g. Anthropic):
+
+```bash
+python synthesise_world_state.py --backend anthropic \
+ --dossiers 'docs/ensemble/merged_dossiers/*.md' --dossier-min-facts 10 \
+ --output docs/world_state_draft.md
+```
+
+**Expected**: extraction completes against OpenRouter with the local box down (SC-003); each artifact records the backend that produced it (FR-008, SC-008); a full refresh is achievable with mixed backends.
+
+## Validation 6 — Human checkpoints block auto-advance (US3, Principle II)
+
+1. After Extract, the UI presents the **scope-review** gate (`bundle --list`) and does **not** auto-run aggregation.
+2. Edit `docs/ensemble/aliases.json` from the CLI/chat → return to the UI → the alias-correction gate reflects the edited file **without** re-running any LLM step (FR-012).
+3. Proceed to Synthesize → reach the **diff-before-promote** gate.
+
+**Expected**: aggregation never consumes extraction output until the operator confirms scope/alias (Principle II); the gate's interchange files are visible to CLI and chat alike.
+
+## Validation 7 — Drafts only; promotion is explicit (FR-013, SC-005)
+
+```bash
+# Synthesis writes a draft, never the live doc.
+ls docs/world_state_draft.md # exists after synthesize
+git status docs/world_state.md # live doc UNCHANGED by synthesis
+# Promotion is the single explicit action:
+curl -s -X POST http://localhost:8000/api/ensemble/promote \
+ -H 'Content-Type: application/json' \
+ -d '{"draft":"docs/world_state_draft.md","live":"docs/world_state.md"}'
+```
+
+**Expected**: the synthesis step never modifies a live grounding doc; only the explicit promote action does. A `PUT /api/ensemble/file` targeting a live doc is rejected. Zero automatic live-doc overwrites across all runs (SC-005).
+
+## Validation 8 — Sub-Sonnet synthesis warning (FR-014, R6)
+
+Pick a known-weak model (e.g. a small open model id) for the **synthesize** stage and run.
+
+**Expected**: the stream includes a non-fatal warning that the model is below the assumed synthesis capability; the run still proceeds (warn, not block). Extraction with the same weak model produces no such warning.
+
+---
+
+## Done-when
+
+- Validations 1–8 pass.
+- `/grounding` behaves identically to before (US4, SC-006).
+- A full grounding-doc refresh is completable entirely from `/ensemble` (SC-001), including with the local box unreachable by selecting OpenRouter (SC-003).
diff --git a/specs/001-ensemble-workflow-ui/research.md b/specs/001-ensemble-workflow-ui/research.md
new file mode 100644
index 0000000..5969e65
--- /dev/null
+++ b/specs/001-ensemble-workflow-ui/research.md
@@ -0,0 +1,105 @@
+# Phase 0 Research: Ensemble Grounding-Doc Workflow UI
+
+All decisions below resolve the design unknowns implied by the spec and the Technical Context. The dominant constraint throughout is **Principle V (One Seam per Boundary)**: OpenRouter is a new external dependency and may be reached from exactly one place.
+
+---
+
+## R1 — How OpenRouter plugs into the existing LLM seam
+
+**Decision**: Add an `"openrouter"` backend branch to `make_client()` in `campaignlib/api/client.py`, backed by an OpenRouter-aware client in `campaignlib/api/backends.py`. OpenRouter is OpenAI-wire-compatible, so the client reuses the `openai` SDK pointed at `https://openrouter.ai/api/v1`, but with two differences from the existing `_OpenAICompatClient`:
+1. A **real API key** from `OPENROUTER_API_KEY` (the DGX client uses `api_key="not-needed"`).
+2. **Model resolution does not go through the dgxlib registry** — OpenRouter model ids (e.g. `anthropic/claude-sonnet-4`, `meta-llama/llama-3.1-70b-instruct`) are passed through verbatim, and per-call request extras (timeouts, thinking) use sensible defaults instead of `dgxlib.resolve_model_config`.
+
+**Rationale**:
+- `campaignlib/api/backends.py:_OpenAICompatClient` (lines 150–185) hard-imports `dgxlib` and calls `resolve_model_config(self.model_override)`. dgxlib only knows Spark-served models, so OpenRouter ids would fail registry lookup. A separate branch keeps the DGX path unchanged while still living **inside the one seam** Principle V mandates.
+- `make_client(endpoint, model_override, backend)` already has a `backend` parameter and a `$CG_BACKEND` env hook (`client.py:30–35`), and that backend check already precedes the endpoint/Anthropic branches. Adding `if backend == "openrouter":` is the minimal, idiomatic extension.
+- Routing through `make_client` means `stream_api`/`call_api` (which already branch on client type for `thinking`/cache extras) and their retry logic are inherited for free.
+
+**Alternatives considered**:
+- *Reuse `_OpenAICompatClient` by passing `endpoint=https://openrouter.ai/api/v1`*: rejected — it would still call `dgxlib.resolve_model_config` on OpenRouter ids and use `api_key="not-needed"`. Bending it to OpenRouter would entangle the DGX path with vendor-specific behavior.
+- *Add a new top-level module / `import openai` in the synthesis scripts*: rejected outright — a direct Constitution Principle V violation (a second place that crosses the LLM boundary).
+- *Use the `anthropic` SDK against OpenRouter's Anthropic-compat shim*: rejected — OpenRouter's first-class surface is the OpenAI wire format already used here; the `openai` SDK is already a dependency.
+
+---
+
+## R2 — Where the OpenRouter credential and model list live
+
+**Decision**: The API key comes from the `OPENROUTER_API_KEY` environment variable, mirroring how `ANTHROPIC_API_KEY` is handled today (CLAUDE.md: "`ANTHROPIC_API_KEY` must be set in the environment"). The server passes it through to subprocesses via `subprocess_runner`'s existing `env_extra` mechanism — it is never written to a tracked file. A small, editable list of suggested OpenRouter model ids is surfaced for the picker (alongside the existing `server/config.py:MODELS` Claude list and the DGX model id), but the operator may type any id.
+
+**Rationale**: Secrets stay out of `config.yaml`/`ui_state.yaml` (both tracked). `.campaigngenerator.local.yaml` (gitignored) is an acceptable fallback for a machine-local key, but environment-variable parity with Anthropic is the least surprising choice. The model id is free-text because OpenRouter's catalog changes faster than any hard-coded list.
+
+**Alternatives considered**:
+- *Store the key in `ui_state.yaml`*: rejected — it is tracked; secrets must not be committed.
+- *Fetch OpenRouter's live model catalog for the picker*: rejected for v1 — adds a network dependency at UI load (bad in Bear Valley) for marginal benefit; a static suggestion list plus free-text covers it.
+
+---
+
+## R3 — Backend selection surface across CLI stages
+
+**Decision**: Introduce a uniform selection convention across the LLM-bearing scripts:
+- **Synthesis scripts** (`synthesise_world_state.py`, `campaign_state.py`, `party.py`, `planning.py`) gain `--backend {anthropic,dgx,openrouter}` plus the already-conventional `--endpoint`/`--model`, and pass them into `make_client(...)`. They currently call `make_client()` with no args (Anthropic-only); default stays `anthropic` so existing invocations are byte-for-byte unchanged (FR-015).
+- **Extraction/aggregation scripts** (`ensemble.py`, `ensemble_batch.py`, `ensemble_extract.py`, `facts_to_state.py`) already accept `--endpoints`/`--dgx-endpoint`/`--model`; selecting OpenRouter for them is achieved by pointing the endpoint at OpenRouter and relying on the R1 seam branch (driven by `--backend openrouter` or `CG_BACKEND=openrouter`, which `make_client` already reads).
+
+**Rationale**: Honors Principle VI — the backend choice is a CLI capability first; the UI merely sets the flag. A single `--backend` vocabulary across scripts keeps the router's command-building uniform and the contract testable.
+
+**Alternatives considered**:
+- *Only support OpenRouter via env vars, no flags*: rejected — env-only selection is invisible state and harder to test per stage; the spec requires per-stage, run-time selection (FR-006, FR-018).
+- *A single global backend setting for the whole run*: rejected — the clarified scope is **per-stage** choice (extract on one backend, synthesize on another).
+
+---
+
+## R4 — Stage-status discovery from disk
+
+**Decision**: The router exposes read-only status endpoints that infer each stage's completion from artifact presence, reusing the pattern already in `grounding.py` (`/extracts`, `/extracts/{filename}`). Specifically: extraction complete ⇔ `docs/ensemble/per_chapter/*/merged.json` exist for the chapter glob; bundling complete ⇔ `docs/ensemble/state_dossiers/*.md` (and `merged_dossiers/*.md`) exist; synthesis complete ⇔ the relevant `*_draft.md` exist. No status is stored server-side or in the browser (Principles I/VIII, FR-002, FR-017).
+
+**Rationale**: `facts_to_state.py` and `ensemble_batch.py` are already resumable by checking for these exact files, so "does the file exist?" is the same predicate the CLI uses, and the UI and CLI cannot disagree. Reusing `grounding.py`'s file-listing pattern minimizes new surface.
+
+**Alternatives considered**:
+- *A status manifest file the router writes*: rejected — introduces a second source of truth that can drift from the actual artifacts; the artifacts already are the state.
+
+---
+
+## R5 — Long-running extraction over SSE
+
+**Decision**: Run each stage as a streamed subprocess via the existing `stream_subprocess()` (SSE `data:`/`event: done`), exactly as `grounding.py`/`session_workflow.py` do. Resumability comes from the CLI's existing per-chapter / per-entity skip-if-exists behavior; an interrupted run is restarted by re-invoking the same stage, which skips completed items. The workflow doc's `tmux` guidance remains the recommended path for *very* long unattended runs; the UI targets attended runs and surfaces progress live.
+
+**Rationale**: No new long-job infrastructure is needed — the CLI is already resumable and the SSE plumbing already exists. This keeps the UI a thin face (Principle VI).
+
+**Alternatives considered**:
+- *A background job queue / persistent worker*: rejected for v1 — over-engineered for a single local operator; adds a daemon (a recurring tax the constitution warns against) for a workflow that is already resumable on disk.
+
+---
+
+## R6 — Synthesis-capability warning
+
+**Decision**: The UI warns (does not block) when a backend/model chosen for the **synthesis** stage is below the assumed capability bar (a model at least as capable as Sonnet). The signal is heuristic: a curated "synthesis-capable" allow-list (the Claude `MODELS` and a small set of frontier OpenRouter ids) versus everything else (local 3B/80B open models, which the workflow doc records as unable to synthesize). Extraction has no such warning — weak open models are expected and fine there.
+
+**Rationale**: This encodes the user's explicit statement that the workflow "assumes a model at least as powerful as Sonnet," and the workflow doc's calibration finding that `Qwen3-Next-80B` "cannot handle synthesis." A warning, not a block, respects operator agency (it is their experiment to run).
+
+**Alternatives considered**:
+- *Hard block on sub-Sonnet synthesis*: rejected — contradicts the local-hardware exploration goal; the operator may deliberately want to calibrate a weak model on synthesis.
+
+---
+
+## R7 — Keeping the existing Anthropic workflow untouched
+
+**Decision**: The new ensemble page is a separate route tree (`/ensemble`) and a separate router (`/api/ensemble`); `GroundingDocs.vue` and `grounding.py` are not modified. The synthesis scripts default to `--backend anthropic`, so the old `/grounding` invocations produce identical commands and identical results (FR-015, SC-006).
+
+**Rationale**: The user requires the old path preserved "until I decide to retire it." Physical separation at both router and view layers is the simplest guarantee against regression.
+
+**Alternatives considered**:
+- *Add an ensemble mode/tab inside `GroundingDocs.vue`*: rejected per the clarification (a new separate page was chosen), and because co-locating raises the risk of touching the old path.
+
+---
+
+## Summary of decisions
+
+| # | Decision | Primary principle upheld |
+|---|----------|--------------------------|
+| R1 | OpenRouter branch inside `make_client`/`backends.py` | V (one seam) |
+| R2 | `OPENROUTER_API_KEY` env var; free-text model id | I (no secrets on tracked disk) |
+| R3 | Uniform `--backend`/`--endpoint`/`--model` on synthesis scripts; default `anthropic` | VI (CLI first) |
+| R4 | Disk-derived stage status, reuse `grounding.py` pattern | I/VIII (disk is truth, discoverable) |
+| R5 | SSE subprocess streaming + CLI resumability; no new daemon | VI; "no recurring tax" |
+| R6 | Warn (not block) on sub-Sonnet synthesis backend | II/IX (human decides) |
+| R7 | Separate `/ensemble` route + router; old path defaults unchanged | (regression guard) |
diff --git a/specs/001-ensemble-workflow-ui/spec.md b/specs/001-ensemble-workflow-ui/spec.md
new file mode 100644
index 0000000..033851b
--- /dev/null
+++ b/specs/001-ensemble-workflow-ui/spec.md
@@ -0,0 +1,161 @@
+# Feature Specification: Ensemble Grounding-Doc Workflow UI
+
+**Feature Branch**: `001-ensemble-workflow-ui`
+
+**Created**: 2026-06-27
+
+**Status**: Draft
+
+**Input**: User description: "I want you to transform the docs/cli/ensemble_workflow.md into a feature that uses the UI to simplify the workflow management. Between steps of the UI, the user will interact with claude. The current feature is designed to only work against dgx and claude, I would like to have the ability to use openrouter as well. the feature should not replace the current workflow that uses anthropic. That feature assumes a model that is at least as powerful as sonnet and can be kept around until I decide to retire it."
+
+## Overview
+
+The ensemble grounding-doc workflow (`docs/cli/ensemble_workflow.md`) turns a campaign's chapter files into the four grounding documents (`world_state.md`, `campaign_state.md`, `party.md`, `planning.md`). It does this in stages: extract atomic facts cheaply on local hardware, bundle them into per-entity dossiers, let a human review scope, then spend metered tokens only on the final synthesis. Today the whole thing is a sequence of long, flag-heavy command-line invocations that the operator must remember and run in the right order, interleaved with manual review steps.
+
+This feature gives the operator a **UI surface that mechanizes the sequence** — it shows where the campaign is in the pipeline, runs each mechanical step on request, and surfaces the files each step produces — while preserving the judgment steps (scope review, alias correction, diff-before-promote) as handoffs to a Claude conversation or the CLI. It also makes each LLM-bearing stage **backend-selectable**, adding OpenRouter alongside the existing local-hardware (DGX/Spark) and Anthropic (Claude) options.
+
+The existing per-tool grounding-doc workflow on the current Grounding Docs page is **not** changed by this feature. It remains available, unmodified, until the operator chooses to retire it.
+
+## User Scenarios & Testing *(mandatory)*
+
+### User Story 1 - Walk the ensemble pipeline from a single UI surface (Priority: P1)
+
+The operator opens a dedicated ensemble-workflow page for the current campaign. The page shows the pipeline as an ordered set of stages (extract → bundle → synthesize → review/promote), reflects which stages have already produced output (discovered from files on disk), and lets the operator run the mechanical step for the current stage and watch its output stream. After a step finishes, the operator can see the files it produced and move to the next stage.
+
+**Why this priority**: This is the core value — replacing "remember the right command with the right flags, in the right order" with a guided, stateful surface. It delivers value even before OpenRouter exists, using only the backends the workflow supports today.
+
+**Independent Test**: With a campaign that already has chapter files, an operator who has never seen the CLI can run the extraction step, see per-chapter outputs appear, run the bundling step, and reach the synthesis stage — entirely from the page, without typing a command. The page correctly shows, on reload, which stages are already complete.
+
+**Acceptance Scenarios**:
+
+1. **Given** a campaign workspace with chapter files and no prior ensemble run, **When** the operator opens the ensemble page, **Then** the page shows the extraction stage as the next actionable step and later stages as not-yet-started.
+2. **Given** a completed extraction (per-chapter outputs exist on disk), **When** the operator reloads the page, **Then** the extraction stage is shown as complete and the bundling stage is shown as the next actionable step.
+3. **Given** the operator runs a stage, **When** the underlying step emits progress, **Then** the page streams that progress live and, on completion, lists the artifacts the step wrote.
+4. **Given** a stage whose outputs already exist, **When** the operator re-runs it, **Then** already-completed work is skipped (the run is resumable) and the page makes clear nothing was needlessly recomputed.
+
+---
+
+### User Story 2 - Choose the backend per stage, including OpenRouter (Priority: P2)
+
+For each LLM-bearing stage — extraction/aggregation and synthesis — the operator chooses which backend runs it: local hardware (DGX/Spark), Anthropic (Claude), or OpenRouter. The choices are independent: the operator can extract on one backend and synthesize on another. OpenRouter is a new option added without removing the existing two.
+
+**Why this priority**: It removes the workflow's hard dependency on having both a reachable local box and a Claude path. From a remote location with no local hardware, the operator can still run extraction (on OpenRouter); for synthesis, the operator can pick whichever frontier model they prefer. It builds on the stepped UI from US1.
+
+**Independent Test**: With the local box unreachable, an operator can select OpenRouter for extraction, run it successfully, then select Claude for synthesis and complete a grounding-doc refresh — all from the page.
+
+**Acceptance Scenarios**:
+
+1. **Given** the ensemble page, **When** the operator views a stage that uses an LLM, **Then** they can choose among local hardware, Anthropic, and OpenRouter as the backend for that stage.
+2. **Given** OpenRouter is selected for a stage, **When** the operator runs that stage, **Then** the step executes against OpenRouter and the page reports which backend and model produced the output.
+3. **Given** the operator extracts on OpenRouter and synthesizes on Anthropic, **When** the full pipeline completes, **Then** each stage's artifacts record the backend that produced them.
+4. **Given** a backend is unreachable or misconfigured, **When** the operator runs a stage against it, **Then** the page surfaces a clear failure (not a silent hang) and the operator can retry with a different backend without losing prior-stage output.
+
+---
+
+### User Story 3 - Drop to Claude or the CLI for the judgment between steps (Priority: P2)
+
+Between mechanical steps, the pipeline has human-judgment checkpoints: reviewing the entity scope list before aggregation, correcting name aliases, and diffing a draft against the live doc before promoting it. The UI represents these as explicit gates that point the operator to do the work in a Claude conversation or at the CLI. Because every step reads and writes files, the operator can leave the UI, make the change (e.g. edit an alias map, correct a draft, promote a reviewed draft), and return to a UI that reflects the new file state — losing nothing.
+
+**Why this priority**: This is the constitutional spine of the feature (the UI mechanizes; Claude converses). Without it the UI would either skip the precision decisions or try to absorb them — both of which break the workflow's correctness guarantees. It is P2 because US1 is usable for the mechanical steps before the gates are formalized, but the feature is not trustworthy without it.
+
+**Independent Test**: At the scope-review gate, the operator opens the entity list, makes a scope/alias correction outside the UI, and the UI — without re-running any LLM step — reflects the corrected scope before the operator proceeds to aggregation. At the promote gate, a draft is never written to a live grounding doc by the UI itself.
+
+**Acceptance Scenarios**:
+
+1. **Given** extraction is complete, **When** the operator reaches the scope-review gate, **Then** the UI presents the entity/scope list for review and does not proceed to aggregation until the operator confirms.
+2. **Given** the operator edits an alias map or scope input outside the UI, **When** they return, **Then** the UI reflects the updated files without having re-run any LLM step.
+3. **Given** a synthesized draft exists, **When** the operator reaches the promote gate, **Then** the UI offers to compare the draft against the live document but never overwrites a live grounding document automatically.
+4. **Given** any stage, **When** the operator inspects what that stage did, **Then** every input and output is a file on disk that is equally visible from the CLI and a Claude conversation.
+
+---
+
+### User Story 4 - Keep the existing Anthropic workflow available (Priority: P3)
+
+The operator who prefers the current per-tool grounding-doc path (each tool re-extracting from the chapter bible, synthesized by a Claude model at least as capable as Sonnet) continues to use it exactly as before. The new ensemble page is additive.
+
+**Why this priority**: It is a guardrail rather than new capability, but it must hold: the user explicitly wants the old path preserved until they decide to retire it.
+
+**Independent Test**: After this feature ships, an operator runs the existing Grounding Docs page exactly as before and gets the same behavior; nothing about that path changed.
+
+**Acceptance Scenarios**:
+
+1. **Given** the existing Grounding Docs page, **When** the operator uses it after this feature ships, **Then** its behavior is unchanged.
+2. **Given** the new ensemble page, **When** the operator navigates the app, **Then** the two workflows are clearly distinct surfaces and neither is a prerequisite for the other.
+
+---
+
+### Edge Cases
+
+- **Local hardware unreachable** (intermittent network at a remote location): selecting the local backend for a stage must fail fast with a clear message, not hang silently; the operator can switch that stage to OpenRouter or Anthropic and proceed.
+- **Backend produces empty output** (e.g. a reasoning model that emits only its thinking trace and no result): the stage must be reported as failed/empty, not silently recorded as complete.
+- **Underpowered synthesis model**: synthesis requires a model capable of prioritizing and organizing across many dossiers. When a backend/model that cannot do this is chosen for synthesis, the operator should be warned (the workflow assumes a model at least as capable as Sonnet for synthesis).
+- **Re-running a completed stage**: completed per-item work is skipped (resumable); the operator is not forced to recompute an expensive stage to make a small downstream change.
+- **Operator skips a judgment gate**: the pipeline does not auto-advance past a human checkpoint; scope, alias, and promote decisions remain blocking.
+- **Concurrent/duplicate runs of the same stage**: launching a stage that is already running must not corrupt shared working files.
+- **A draft is promoted, then re-synthesized**: promotion is a manual, file-level act; a fresh draft never silently clobbers the live doc.
+- **Mid-run backend interruption**: a long extraction interrupted partway can be resumed from its cached per-item progress rather than restarted from zero.
+
+## Requirements *(mandatory)*
+
+### Functional Requirements
+
+- **FR-001**: The system MUST provide a dedicated UI surface, separate from the existing Grounding Docs page, that presents the ensemble grounding-doc workflow as an ordered sequence of stages (extraction, fact bundling/aggregation, synthesis of each grounding doc, review/promotion).
+- **FR-002**: The system MUST derive and display each stage's completion status from the artifacts present on disk for the current campaign, so the displayed state survives a page reload and reflects work done outside the UI.
+- **FR-003**: The system MUST let the operator run the mechanical step for a stage from the UI and MUST stream that step's progress and output to the page in real time.
+- **FR-004**: The system MUST list the artifacts (files) a stage produced after it completes, and those artifacts MUST be the same files the CLI and a Claude conversation can read.
+- **FR-005**: Re-running a stage MUST reuse already-completed work where the underlying step supports resumption, and MUST NOT silently recompute completed items.
+- **FR-006**: For each LLM-bearing stage (extraction/aggregation and synthesis), the system MUST let the operator choose the backend independently from: local hardware (DGX/Spark), Anthropic (Claude), and OpenRouter.
+- **FR-007**: The system MUST support OpenRouter as a backend for both the extraction/aggregation stage and the synthesis stage.
+- **FR-008**: The system MUST record, with each stage's output, which backend and model produced it.
+- **FR-009**: The system MUST surface backend failures (unreachable endpoint, auth/config error, empty result) as explicit, actionable errors and MUST allow retrying a failed stage with a different backend without discarding prior-stage output.
+- **FR-010**: The system MUST represent the workflow's human-judgment checkpoints — scope/entity review before aggregation, name-alias correction, and diff-before-promote — as explicit gates that block automatic advancement of the pipeline.
+- **FR-011**: The system MUST NOT perform a precision decision (scope, ordering, attribution) on the operator's behalf, and MUST NOT feed one LLM stage's unreviewed output into the next across a checkpoint without operator confirmation.
+- **FR-012**: The system MUST allow the operator to perform any checkpoint's judgment work in a Claude conversation or at the CLI and then continue in the UI, with the UI reflecting the resulting file changes without re-running an LLM step.
+- **FR-013**: The system MUST write synthesis results to draft artifacts only, and MUST NOT automatically overwrite a live grounding document; promotion of a draft to a live document is an explicit, operator-initiated act.
+- **FR-014**: The system MUST warn the operator when a backend/model selected for the synthesis stage is below the capability the workflow assumes (a model at least as capable as Sonnet), since underpowered synthesis silently degrades the result.
+- **FR-015**: The system MUST leave the existing per-tool Anthropic grounding-doc workflow (the current Grounding Docs page) functionally unchanged and independently usable.
+- **FR-016**: Every step in the new workflow MUST be expressible and runnable equivalently from the CLI; the UI MUST NOT be the only way to perform any step.
+- **FR-017**: The system MUST NOT hold pipeline state that exists only in the browser; if a step produced something, it produced a file that is the source of truth for that state.
+- **FR-018**: OpenRouter backend configuration (credentials/endpoint/model selection) MUST be supplied through the system's existing configuration mechanism, not hard-coded, and MUST be selectable per stage at run time.
+
+### Key Entities *(include if data involved)*
+
+- **Pipeline state**: the current campaign's position in the ensemble workflow, derived entirely from which stage artifacts exist on disk; not stored in the browser.
+- **Stage**: one step in the ordered workflow (extraction, bundling/aggregation, per-doc synthesis, review/promotion), with a completion status, the artifacts it produces, and — for LLM-bearing stages — a selected backend.
+- **Backend profile**: a selectable execution target for an LLM-bearing stage — local hardware (DGX/Spark), Anthropic (Claude), or OpenRouter — including the model used and any reachability/config it needs.
+- **Checkpoint / gate**: a human-judgment point between stages (scope review, alias correction, diff-before-promote) that blocks automatic advancement and is satisfied via Claude/CLI.
+- **Artifact**: a file on disk produced or consumed by a stage (per-chapter facts, merged facts, per-entity dossiers, draft grounding docs, live grounding docs); the unit of interchange between UI, CLI, and Claude.
+- **Grounding document (draft / live)**: the four target docs (`world_state`, `campaign_state`, `party`, `planning`); the workflow writes drafts and the operator promotes them to live docs.
+
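The "derived entirely from which stage artifacts exist on disk" rule above can be sketched as a pure function over the campaign workspace. This is an illustrative sketch, not the shipped `GET /api/ensemble/status` handler: `stage_status`, the `dossiers/` directory, the `*_draft.md` location, and the "at least one chapter" completion rule are assumptions made for the example. Only the `per_chapter/*/merged.json` pattern and the four doc names come from this patch.

```python
from pathlib import Path

# The four grounding docs named in the spec.
GROUNDING_DOCS = ("world_state", "campaign_state", "party", "planning")


def _has_content(path: Path) -> bool:
    """A file counts only if it exists and is non-empty."""
    return path.is_file() and path.stat().st_size > 0


def stage_status(workspace: Path) -> dict[str, bool]:
    """Derive per-stage completion purely from files on disk (no cache)."""
    ens = workspace / "docs" / "ensemble"
    merged = list(ens.glob("per_chapter/*/merged.json"))
    return {
        # Extraction: at least one chapter has a non-empty merged.json (assumed rule).
        "extract": any(_has_content(p) for p in merged),
        # Bundling: a hypothetical dossier directory holds at least one file.
        "bundle": any(_has_content(p) for p in ens.glob("dossiers/*.md")),
        # Synthesis: every grounding doc has a draft (hypothetical location).
        "synthesize": all(
            _has_content(ens / f"{name}_draft.md") for name in GROUNDING_DOCS
        ),
    }
```

Because nothing is cached, reloading the page or editing files from the CLI changes the answer on the next call, and an empty `merged.json` never counts as complete.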
+## Success Criteria *(mandatory)*
+
+### Measurable Outcomes
+
+- **SC-001**: An operator can run a full grounding-doc refresh through the ensemble page — from chapter files to four reviewed drafts — without typing a single command-line invocation.
+- **SC-002**: An operator who has not memorized the workflow can identify the correct next step and run it without consulting `docs/cli/ensemble_workflow.md`, in their first session with the page.
+- **SC-003**: With local hardware unreachable, an operator can still complete a full grounding-doc refresh by selecting OpenRouter (and/or Anthropic) for the LLM-bearing stages.
+- **SC-004**: The metered-token cost of a full refresh through the UI is no higher than the same refresh run from the CLI today (i.e. extraction stays off the metered API when a local or OpenRouter open-model backend is chosen; only synthesis spends frontier tokens).
+- **SC-005**: No live grounding document is ever modified by the workflow without an explicit operator promotion action — measured as zero automatic overwrites of live docs across all runs.
+- **SC-006**: The existing per-tool Anthropic workflow produces identical results before and after this feature ships (no regression).
+- **SC-007**: After any stage runs, 100% of its inputs and outputs are files visible from the CLI; no pipeline state is recoverable only from the browser.
+- **SC-008**: For every LLM-bearing stage, the operator can independently select among at least three backends (local, Anthropic, OpenRouter), and the produced artifact records which one was used.
+
+## Assumptions
+
+- **Per-stage backend choice across both LLM stages** (from clarification): OpenRouter is selectable independently for extraction/aggregation and for synthesis; the operator may mix backends across stages (e.g. extract on OpenRouter, synthesize on Anthropic).
+- **New separate UI surface** (from clarification): the ensemble workflow lives on its own page/section; the existing Grounding Docs page is left in place and unchanged.
+- **Single operator, local-first**: the UI serves one GM on their own workstation; multi-user concurrency and access control are out of scope.
+- **Campaign workspace already exists**: the operator runs the page from within a campaign workspace that has chapter files (or the documented inputs); creating the workspace and preparing chapters is out of scope for this feature.
+- **Spelling/known-names/alias preparation remains a documented prerequisite**: this feature mechanizes the pipeline stages and their gates; it does not replace the upstream proper-noun consistency pass, which the operator performs as today.
+- **Synthesis assumes a capable model**: the synthesis stage assumes a model at least as capable as Sonnet; weaker open models may be fine for extraction/aggregation but are expected to underperform on synthesis, and the UI warns rather than blocks.
+- **Long-running stages**: extraction can take tens of minutes; the UI is expected to handle a long-running step (progress, resumability) rather than assume sub-second responses.
+- **Existing configuration mechanism is reused**: backend endpoints, models, and credentials (including OpenRouter) are provided through the project's existing configuration files/UI rather than a new bespoke store.
+- **Files are the contract**: all interchange between the UI, the CLI, and Claude conversations happens through files on disk; the UI never becomes the sole holder of workflow state.
+
+## Out of Scope
+
+- Replacing, modifying, or retiring the existing per-tool Anthropic grounding-doc workflow.
+- Running the synthesis stages on local/open models as the *primary* path (the "all-Spark synthesis" and "per-section fan-out" ideas in the workflow doc remain future exploration, not part of this feature).
+- Automating the human-judgment checkpoints (scope, alias, promotion) — these are deliberately preserved as human decisions.
+- Multi-user, remote-hosted, or access-controlled deployment of the UI.
+- Creating campaign workspaces, preparing chapter files, or running the upstream spelling/known-names preparation passes.
diff --git a/specs/001-ensemble-workflow-ui/tasks.md b/specs/001-ensemble-workflow-ui/tasks.md
new file mode 100644
index 0000000..a5f32d1
--- /dev/null
+++ b/specs/001-ensemble-workflow-ui/tasks.md
@@ -0,0 +1,286 @@
+---
+
+description: "Task list for Ensemble Grounding-Doc Workflow UI"
+---
+
+# Tasks: Ensemble Grounding-Doc Workflow UI
+
+**Input**: Design documents from `specs/001-ensemble-workflow-ui/`
+
+**Prerequisites**: plan.md, spec.md, research.md, data-model.md, contracts/api.md, contracts/cli.md, quickstart.md
+
+**Tests**: Targeted tests are included where the contracts specify behavior (the OpenRouter seam contract test, gate/promote guards, the Anthropic-path regression). This is not full TDD — it matches the constitution's "tested by name" expectation and the CI isolation guard.
+
+**Organization**: Tasks are grouped by user story. This feature is an **extension of the existing Vue app + FastAPI server** (same `./startup`, same nav) — not a new application. The existing `/grounding` (Anthropic per-tool) path is left untouched.
+
+## Format: `[ID] [P?] [Story] Description`
+
+- **[P]**: Can run in parallel (different files, no dependencies on incomplete tasks)
+- **[Story]**: US1 / US2 / US3 / US4
+
+## Path Conventions
+
+Web app over a CLI engine. Backend: `server/`, root-level CLI scripts, `campaignlib/`. Frontend: `frontend/src/`. Tests: `tests/`.
+
+---
+
+## Phase 1: Setup (Shared Infrastructure)
+
+**Purpose**: Minimal scaffolding for an additive feature in a mature codebase.
+
+- [X] T001 Document the `OPENROUTER_API_KEY` env var and confirm the `openai` SDK is importable, updating the Dependencies section of `CLAUDE.md` (parity with the existing `ANTHROPIC_API_KEY` note)
+- [X] T002 [P] Create the frontend stage-component directory `frontend/src/views/ensemble/` and an empty backend router stub `server/routers/ensemble.py` (module + `router = APIRouter()`, mirroring the header of `server/routers/grounding.py`)
+
+---
+
+## Phase 2: Foundational (Blocking Prerequisites)
+
+**Purpose**: The shared kernel every user story builds on — config schema, router mount, frontend route + shell + nav.
+
+**⚠️ CRITICAL**: No user story work can begin until this phase is complete.
+
+- [X] T003 Add an `EnsembleSection` Pydantic model (campaign_dir, chapters_glob, per-stage `extract`/`synthesize` backend profiles, known_names, aliases_path — per data-model.md §"Config schema addition") to `server/config_models.py` and register `"ensemble"` in `UI_SECTION_NAMES`
+- [X] T004 Mount the ensemble router at `/api/ensemble` via `app.include_router(...)` in `server/main.py` (alongside the existing routers; do not modify any existing registration)
+- [X] T005 Add the `/ensemble` route tree to `frontend/src/router.ts` and create `frontend/src/views/EnsembleWorkflow.vue` as a `WizardShell` host with the stage steps (Setup → Extract → Bundle → Synthesize → Review), mirroring `frontend/src/views/SessionWorkflow.vue`
+- [X] T006 [P] Add an "Ensemble Workflow" entry to the app's primary navigation in `frontend/src/App.vue`, placed beside the existing "Grounding Docs" link (which stays unchanged)
+- [X] T007 [P] Add shared router helpers to `server/routers/ensemble.py`: an `_sse_response()` and `_cmd_opt/_cmd_multi/_cmd_flag` set (copy the pattern from `server/routers/grounding.py`) and a `_resolve_ensemble_path()` that confines paths to the campaign workspace
+
+**Checkpoint**: The new page is reachable and empty; the router is mounted; config persists. User stories can begin.
+
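T007 describes `_resolve_ensemble_path()` only as confining paths to the campaign workspace. One common way to do that, shown here as a hedged sketch under the hypothetical name `resolve_in_workspace` (the real helper's signature is not shown in this patch), is to resolve the candidate path first and check containment afterwards:

```python
from pathlib import Path


def resolve_in_workspace(workspace: Path, user_path: str) -> Path:
    """Resolve user_path under workspace, refusing anything that escapes it."""
    root = workspace.resolve()
    candidate = (root / user_path).resolve()
    # resolve() collapses ".." and follows symlinks before the containment check.
    # Path.is_relative_to requires Python 3.9+.
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes campaign workspace: {user_path}")
    return candidate
```

Checking after resolution matters: a string-prefix test on the raw input would accept `docs/../../outside.md`, while an absolute input such as `/etc/passwd` replaces the root entirely when joined and is rejected by the same check.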
+---
+
+## Phase 3: User Story 1 - Walk the ensemble pipeline from a single UI surface (Priority: P1) 🎯 MVP
+
+**Goal**: Step the operator through extract → bundle → synthesize → review from one page, with stage status derived from disk and each step's output streamed. Works with the **existing** backends (DGX/Anthropic); OpenRouter arrives in US2.
+
+**Independent Test**: With a campaign that has chapter files, run extraction from the page, see per-chapter artifacts appear, run bundling, reach synthesis — without typing a command; reload and confirm completed stages are still shown complete.
+
+### Tests for User Story 1
+
+- [X] T008 [P] [US1] Integration test: `GET /api/ensemble/status` reports `extract` as current with no run, then `extract: complete` once `docs/ensemble/per_chapter/*/merged.json` exist — in `tests/test_ensemble_status.py` (quickstart Validation 3)
+
+### Implementation for User Story 1
+
+- [X] T009 [US1] Implement disk-derived `GET /api/ensemble/status` (completion predicates per contracts/api.md §Status; no caching) in `server/routers/ensemble.py`
+- [X] T010 [US1] Implement `GET /api/ensemble/files` and `GET /api/ensemble/file` (list/read artifacts, mirror `grounding.py:/extracts`) in `server/routers/ensemble.py`
+- [X] T010a [US1] **(M4)** Implement a per-campaign, per-stage in-flight lock helper in `server/routers/ensemble.py` (lock file or in-process registry keyed by campaign+stage). All `/run/*` endpoints (T011–T013a) MUST acquire it on launch and return HTTP 409 "stage already running" if it is already held, so that concurrent writers cannot corrupt the `per_chapter/` cache (the `ensemble_workflow.md` orphaned-worker trap). The lock is released when the stream completes.
+- [X] T011 [US1] Implement stage runner `GET /api/ensemble/run/extract` (builds `ensemble_batch.py`, SSE via `stream_subprocess`, resumable; acquires the T010a lock) in `server/routers/ensemble.py`
+- [X] T012 [US1] Implement stage runner `GET /api/ensemble/run/bundle` (builds `facts_to_state.py`, including `list=true` → `--list` no-model mode) in `server/routers/ensemble.py`
+- [X] T013 [US1] Implement stage runners `GET /api/ensemble/run/recent-events` (`build_recent_events.py`) and `GET /api/ensemble/run/synthesize` (dispatch on `doc` to the four synthesis scripts; reject `output` that targets a live doc) in `server/routers/ensemble.py`
+- [X] T013a [US1] **(M1)** Implement stage runner `GET /api/ensemble/run/threads` (builds `facts_to_state.py --types thread --render-only`, deterministic/no-model, writes `docs/ensemble/threads.md`) in `server/routers/ensemble.py`, symmetric with `/run/recent-events`. This is the chronological-spine input fed to `/run/synthesize --threads` (contracts/api.md, data-model.md §Stage). Surface it in `EnsembleBundle.vue` (T016).
+- [X] T014 [P] [US1] Build `frontend/src/views/ensemble/EnsembleSetup.vue` — campaign dir + chapter glob inputs **plus known-names (multi-path) and aliases-path inputs (M2)**, all persisted via `config.updateSection('ensemble', …)`. The bundle endpoint (T012) and the US3 alias gate (T036) read `known_names`/`aliases_path` from this config.
+- [X] T015 [P] [US1] Build `frontend/src/views/ensemble/EnsembleExtract.vue` — run `/run/extract` via `connectSSE`/`RunPanel`, stream progress, list produced artifacts, reflect status
+- [X] T016 [P] [US1] Build `frontend/src/views/ensemble/EnsembleBundle.vue` — run `/run/bundle` (and the `--list` scope view), stream output, list dossiers
+- [X] T017 [P] [US1] Build `frontend/src/views/ensemble/EnsembleSynthesize.vue` — run `/run/synthesize` per doc, write `*_draft.md`, list drafts
+- [X] T018 [US1] Wire the `WizardShell` steps in `EnsembleWorkflow.vue` to `GET /api/ensemble/status` so stage completion (disk-derived) drives step state and survives reload
+
+**Checkpoint**: A full extract→bundle→synthesize→draft walk is doable from the page using DGX/Anthropic. MVP complete.
+
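A lock file is one of the two options T010a names. A minimal sketch, assuming a hypothetical `lock_dir`, a campaign key that is a plain name, and hypothetical helper names (`acquire_stage_lock`, `release_stage_lock`, `StageBusy`); the real helper in `server/routers/ensemble.py` may differ:

```python
import os
from pathlib import Path


class StageBusy(RuntimeError):
    """Raised when the stage already has a live run (would map to HTTP 409)."""


def acquire_stage_lock(lock_dir: Path, campaign: str, stage: str) -> Path:
    """Atomically create a lock file; fail if one already exists."""
    lock_dir.mkdir(parents=True, exist_ok=True)
    lock_path = lock_dir / f"{campaign}.{stage}.lock"
    try:
        # O_CREAT | O_EXCL makes creation atomic: exactly one caller wins.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise StageBusy(f"stage already running: {stage}") from None
    with os.fdopen(fd, "w") as fh:
        fh.write(str(os.getpid()))  # record the owner for debugging
    return lock_path


def release_stage_lock(lock_path: Path) -> None:
    """Remove the lock; tolerate a lock that is already gone."""
    lock_path.unlink(missing_ok=True)
```

A `/run/*` handler would translate `StageBusy` into HTTP 409 and call `release_stage_lock` in a `finally` block once the stream ends. This sketch does not detect stale locks left behind by a crashed process.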
+---
+
+## Phase 4: User Story 2 - Choose the backend per stage, including OpenRouter (Priority: P2)
+
+**Goal**: Make extraction/aggregation and synthesis backend-selectable independently among DGX, Anthropic, and **OpenRouter** — OpenRouter reached only through the single `campaignlib` seam (Principle V).
+
+**Independent Test**: With the local box unreachable, select OpenRouter for extraction, run it, then select Anthropic for synthesis and complete a refresh; each artifact records the backend used.
+
+### Tests for User Story 2
+
+- [X] T019 [P] [US2] Contract test `tests/test_openrouter_seam.py`: `make_client(backend="openrouter")` returns the OpenRouter client; missing `OPENROUTER_API_KEY` raises; no module outside `campaignlib/api` constructs it (contracts/cli.md §Seam)
+- [X] T019a [P] [US2] **(M5)** Integration test `tests/test_backend_retry_resume.py`: fail a stage partway on backend A, retry on backend B, and assert (1) prior-stage artifacts intact, (2) the failed stage resumes (skip-if-exists) rather than restarts, (3) no empty/partial `merged.json` counts as complete (locks SC-003; also exercises the M3 guard from T027a)
+
+### Implementation for User Story 2
+
+- [X] T020 [US2] Implement `_OpenRouterClient` in `campaignlib/api/backends.py` (OpenAI SDK at `https://openrouter.ai/api/v1`, real `OPENROUTER_API_KEY`, model id passed verbatim — no dgxlib lookup, `OPENROUTER_BASE_URL` override, Anthropic-shaped `.messages` façade). **(M3 prevention)** Honors a no-thinking request extra (per-call and via `DGX_NO_THINKING`/equivalent env) so extraction can suppress reasoning traces — the dgxlib `thinking_default: false` safety net does not apply on this path.
+- [X] T021 [US2] Add the `backend == "openrouter"` branch to `make_client()` in `campaignlib/api/client.py` (precedence: claude-code → openrouter → dgx endpoint → Anthropic default) — depends on T020
+- [X] T022 [P] [US2] Add `--backend {anthropic,dgx,openrouter}` + `--endpoint` flags to `synthesise_world_state.py` and thread them into its `make_client(...)` call (default `anthropic` ⇒ unchanged)
+- [X] T023 [P] [US2] Add the same `--backend`/`--endpoint` flags to `campaign_state.py`, threaded into `make_client(...)`
+- [X] T024 [P] [US2] Add the same `--backend`/`--endpoint` flags to `party.py`, threaded into `make_client(...)`
+- [X] T025 [P] [US2] Add the same `--backend`/`--endpoint` flags to `planning.py`, threaded into `make_client(...)`
+- [X] T026 [US2] Verify the extraction/aggregation scripts reach OpenRouter via `CG_BACKEND=openrouter` + an OpenRouter `--model` (no script edit expected for `ensemble_batch.py`/`facts_to_state.py`); add a `--backend` pass-through only if needed for symmetry
+- [ ] T027 [US2] Stamp backend+model provenance into LLM-produced outputs (synthesis drafts and `facts_to_state.py` dossiers) where each script already records metadata (FR-008) — sequential, touches the synthesis scripts + `facts_to_state.py`
+- [X] T027a [US2] **(M3 detection)** Add an empty-output guard in the seam (`campaignlib/api`: treat empty/whitespace `content` from any backend as an error, not a result) and ensure the extraction/aggregation/synthesis scripts fail loudly (non-zero exit) and write NO empty/partial artifact when output is empty — so a silently-empty run never flips disk-derived status (FR-002) to "complete" (spec edge case; FR-009). Covered by T019a.
+- [X] T028 [US2] Add `backend`/`endpoint`/`model` query params to all `/api/ensemble/run/*` endpoints and inject `ANTHROPIC_API_KEY`/`OPENROUTER_API_KEY` via `stream_subprocess` `env_extra` (never as query params) in `server/routers/ensemble.py`
+- [X] T029 [US2] Add a synthesis-capability allow-list to `server/config.py` and surface a non-fatal warning in `/api/ensemble/run/synthesize` when a sub-Sonnet model is chosen for synthesis (FR-014, R6) in `server/routers/ensemble.py`
+- [X] T030 [P] [US2] Add per-stage backend selectors (extract + synthesize, independent) to `frontend/src/views/ensemble/EnsembleSetup.vue`, persist to `ui.ensemble`, and display the recorded backend on produced artifacts
+
+**Checkpoint**: Each LLM stage runs on any of the three backends, mixable; OpenRouter lives only in the seam.
+
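The M3 detection rule in T027a (empty output is an error, and no empty or partial artifact is written) might look like the sketch below. `require_content`, `write_artifact`, and `EmptyOutputError` are hypothetical names for illustration, not the `campaignlib/api` implementation:

```python
import os
import tempfile
from pathlib import Path


class EmptyOutputError(RuntimeError):
    """The backend returned nothing usable."""


def require_content(text, backend: str) -> str:
    """Treat empty or whitespace-only model output as an error, not a result."""
    if text is None or not text.strip():
        raise EmptyOutputError(f"{backend} returned empty output")
    return text


def write_artifact(path: Path, text, backend: str) -> None:
    """Validate first, then write via a temp file so no partial artifact lands."""
    body = require_content(text, backend)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(body)
        os.replace(tmp, path)  # atomic rename on the same filesystem
    except BaseException:
        # Clean up the temp file if writing or renaming failed.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Validating before touching the filesystem, then renaming a temp file into place, means a failed run leaves no `merged.json` behind, so disk-derived status cannot flip to complete.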
+---
+
+## Phase 5: User Story 3 - Drop to Claude or the CLI for the judgment between steps (Priority: P2)
+
+**Goal**: Represent the human-judgment checkpoints (scope review, alias correction, diff-before-promote) as blocking gates satisfied in Claude/CLI; files are the interchange; the UI never auto-advances past a precision boundary and never auto-overwrites a live doc.
+
+**Independent Test**: At the scope gate, an alias edit made outside the UI is reflected on return without re-running any LLM step; at the promote gate, a draft reaches a live doc only via the explicit promote action.
+
+### Tests for User Story 3
+
+- [X] T031 [P] [US3] Integration test: `/api/ensemble/run/synthesize` rejects an `output` pointing at a live grounding doc; `PUT /api/ensemble/file` to a live doc is rejected; `POST /api/ensemble/promote` is the only writer of live docs — in `tests/test_ensemble_gates.py` (quickstart Validation 6/7)
+
+### Implementation for User Story 3
+
+- [X] T032 [US3] Implement `PUT /api/ensemble/file` (path-validated, confined to workspace, **rejects live grounding docs**) in `server/routers/ensemble.py`
+- [X] T033 [US3] Implement `GET /api/ensemble/diff` (unified diff draft vs live, read-only) in `server/routers/ensemble.py`
+- [X] T034 [US3] Implement `POST /api/ensemble/promote` (copy reviewed draft → live; restricted to the four known grounding docs) in `server/routers/ensemble.py`
+- [X] T035 [P] [US3] Add the scope-review gate to `frontend/src/views/ensemble/EnsembleBundle.vue` — show the `--list` output, block advancement to aggregation until the operator confirms
+- [X] T036 [US3] Add the alias-correction gate to `frontend/src/views/ensemble/EnsembleBundle.vue` — edit `aliases.json` via the file endpoints (or hand off to CLI/chat) and reflect external edits without re-running an LLM step — same file as T035, sequential
+- [X] T037 [P] [US3] Add the diff-before-promote gate to `frontend/src/views/ensemble/EnsembleSynthesize.vue` — render the `/diff`, expose an explicit **Promote** button calling `/promote`, never auto-write
+- [X] T038 [US3] Reflect gate confirmation state in `EnsembleWorkflow.vue` so the wizard cannot skip an unsatisfied gate
+
+**Checkpoint**: Aggregation never consumes extraction output until scope/alias are confirmed; promotion is always explicit.
+
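The gate rules in this phase (the diff is read-only, only the four known docs can be promoted, and promotion is a plain file copy) can be sketched as follows. The `<doc>_draft.md` / `<doc>.md` naming within one directory and the function names are assumptions for the example, not the actual `/diff` and `/promote` handlers:

```python
import difflib
import shutil
from pathlib import Path

# The four grounding docs named in the spec; only these may be promoted.
LIVE_DOCS = {"world_state", "campaign_state", "party", "planning"}


def _lines(path: Path) -> list[str]:
    """Read a file as lines; an absent file reads as empty."""
    if not path.exists():
        return []
    return path.read_text(encoding="utf-8").splitlines(keepends=True)


def draft_diff(draft: Path, live: Path) -> str:
    """Read-only unified diff from the live doc to the draft."""
    return "".join(
        difflib.unified_diff(
            _lines(live), _lines(draft), fromfile=live.name, tofile=draft.name
        )
    )


def promote(docs_dir: Path, doc: str) -> Path:
    """Copy <doc>_draft.md over <doc>.md; the only writer of live docs here."""
    if doc not in LIVE_DOCS:
        raise ValueError(f"not a promotable grounding doc: {doc}")
    draft = docs_dir / f"{doc}_draft.md"
    if not draft.is_file():
        raise FileNotFoundError(draft)
    live = docs_dir / f"{doc}.md"
    shutil.copyfile(draft, live)
    return live
```

Keying `promote` on a doc name from a fixed set, rather than on an arbitrary path, is what keeps it from becoming a general-purpose file writer.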
+---
+
+## Phase 6: User Story 4 - Keep the existing Anthropic workflow available (Priority: P3)
+
+**Goal**: Guarantee the existing per-tool Anthropic grounding-doc path (the `/grounding` page) is unchanged and independently usable.
+
+**Independent Test**: After this feature ships, the `/grounding` page behaves identically and the synthesis scripts with no new flags produce the same commands/output.
+
+### Tests for User Story 4
+
+- [X] T039 [P] [US4] Regression test: each synthesis script invoked with **no** `--backend`/`--endpoint` constructs the same `make_client()` (Anthropic) path and output as before — in `tests/test_synthesis_backend_default.py` (SC-006)
+
+### Implementation for User Story 4
+
+- [X] T040 [US4] Confirm `tests/test_retrieve_render_isolation.py` passes with the new router (the router must contain no retrieval/render calls) and run the full `pytest tests/` suite
+- [X] T041 [US4] Verify by inspection that `server/routers/grounding.py` and `frontend/src/views/GroundingDocs.vue` (and its nested views) are untouched by this feature; record the diff scope
+
+**Checkpoint**: New workflow and old workflow coexist; no regression.
+
+---
+
+## Phase 7: Polish & Cross-Cutting Concerns
+
+- [ ] T042 [P] Run all 8 validations in `quickstart.md` end-to-end and record results
+- [X] T043 [P] Update `docs/web/web_ui.md` to document the new Ensemble Workflow page and add a "run this from the UI" pointer near the top of `docs/cli/ensemble_workflow.md`
+- [X] T044 Consistency/cleanup pass on `server/routers/ensemble.py` (helper reuse, error messages match the fast-fail contract in FR-009)
+
+---
+
+## Dependencies & Execution Order
+
+### Phase Dependencies
+
+- **Setup (Phase 1)**: no dependencies.
+- **Foundational (Phase 2)**: depends on Setup; **blocks all user stories**.
+- **US1 (Phase 3)**: depends on Foundational. The MVP.
+- **US2 (Phase 4)**: depends on Foundational. T028–T030 build on US1's run endpoints (adding backend params) and Setup page (adding selectors), but the seam/CLI work (T019–T027) is independent of US1 and can proceed in parallel with Phase 3.
+- **US3 (Phase 5)**: depends on Foundational and on US1's bundle/synthesize endpoints + step components (it adds gates to them).
+- **US4 (Phase 6)**: depends on US2 (the `--backend` defaults it asserts) and on the new router existing; otherwise independent.
+- **Polish (Phase 7)**: after the desired stories are complete.
+
+### Story-level notes
+
+- **US2's seam + CLI tasks (T019–T027)** touch `campaignlib/` and root scripts — fully independent of the US1 UI and can be built first or in parallel.
+- **US3** extends `EnsembleBundle.vue` / `EnsembleSynthesize.vue` created in US1, so it follows US1 for those files.
+
+### Within `server/routers/ensemble.py`
+
+Tasks T007, T009–T013, **T010a, T013a**, T028, T029, T032–T034, and T044 all edit this one file → they are **sequential** with respect to each other, even across stories. T007 carries `[P]` only because it can run alongside T006 (a different file); it is still sequential with the other router edits. Plan to serialize router edits. Note that T010a (the in-flight lock) must land before or with the `/run/*` endpoints, since they acquire it.
+
+---
+
+## Parallel Opportunities
+
+- **Setup**: T002 [P].
+- **Foundational**: T006, T007 [P] (different files: `App.vue`, `ensemble.py`).
+- **US1 frontend**: T014, T015, T016, T017 [P] (four distinct `.vue` files). T008 [P] (test).
+- **US2 CLI**: T022, T023, T024, T025 [P] (four distinct scripts); T019, T019a [P] (tests); T030 [P] (frontend).
+- **US3**: T031 [P] (test); T035 and T037 [P] (different `.vue` files); T036 follows T035 (same file).
+- **US4**: T039 [P] (test).
+- **Polish**: T042, T043 [P].
+
+### Parallel example — US1 frontend
+
+```bash
+# After the run/status endpoints exist, build the four step components together:
+Task: "Build EnsembleSetup.vue" # T014
+Task: "Build EnsembleExtract.vue" # T015
+Task: "Build EnsembleBundle.vue" # T016
+Task: "Build EnsembleSynthesize.vue" # T017
+```
+
+### Parallel example — US2 CLI flags
+
+```bash
+# Independent scripts, same flag addition:
+Task: "Add --backend/--endpoint to synthesise_world_state.py" # T022
+Task: "Add --backend/--endpoint to campaign_state.py" # T023
+Task: "Add --backend/--endpoint to party.py" # T024
+Task: "Add --backend/--endpoint to planning.py" # T025
+```
+
+---
+
+## Implementation Strategy
+
+### MVP First (User Story 1 only)
+
+1. Phase 1 Setup → Phase 2 Foundational.
+2. Phase 3 US1.
+3. **STOP and VALIDATE**: walk extract→bundle→synthesize→draft from the page on DGX/Anthropic (quickstart Validations 3–4). Demo.
+
+### Incremental Delivery
+
+1. Setup + Foundational → page reachable.
+2. + US1 → walk the pipeline (MVP).
+3. + US2 → OpenRouter and per-stage backends (the headline ask; ship the seam test first).
+4. + US3 → blocking gates and explicit promotion.
+5. + US4 → regression guard locks the old path.
+
+### Recommended early track
+
+Because US2's seam (T019–T021) is the riskiest and most novel surface (a new LLM vendor through the one seam) and is UI-independent, build and test it **in parallel with US1** even though it is P2. Doing so de-risks the headline requirement without blocking the MVP.
+
+---
+
+## Notes
+
+- `[P]` = different files, no incomplete dependency. All `server/routers/ensemble.py` edits are mutually sequential.
+- Every backend choice is a CLI flag first (Principle VI); the UI only sets it.
+- Drafts only; `POST /promote` is the sole live-doc writer (Principle I).
+- Gates block auto-advance (Principle II); files are the interchange (Principle IX).
+- Commit after each task or logical group; stop at any checkpoint to validate a story independently.
+
+---
+
+## Remediation Log (from `/speckit-analyze`)
+
+MEDIUM findings resolved into tasks (suffixed IDs avoid renumbering):
+
+| Finding | Decision | Task(s) |
+|---|---|---|
+| M1 — `threads.md` had no producer | B: dedicated endpoint | T013a |
+| M2 — Setup UI lacked known-names/aliases inputs | A: add to Setup | T014 (expanded) |
+| M3 — empty-output trap on OpenRouter path | A: prevention + detection | T020 (prevention), T027a (detection) |
+| M4 — no concurrent-run guard | A: server-side lock | T010a |
+| M5 — backend-retry-without-loss untested | A: integration test | T019a |
+
+LOW findings (A1, L1–L4, C1) were accepted as-is; see the analysis report. C1 (the pre-existing `ensemble_merge.py` embedding client outside the seam) is explicitly **not** extended by this feature — OpenRouter chat goes only through `campaignlib/api`.
+
+---
+
+## Post-implement enhancement — chapter picker (operator request)
+
+The single chapters-glob text field was too blunt: the operator needs to **select
+all / select one / pick a subset / sort** the chapters before extraction, not just
+type a glob. Resolved additively, CLI-first:
+
+| Layer | Change |
+|---|---|
+| Engine | `ensemble_batch.py --chapters` now `nargs="+"` — unions one or more globs/paths, de-dupes, sorts. Single-glob callers unchanged (Principle VI: the engine gains the capability). |
+| API | `GET /api/ensemble/chapters?glob=…` resolves globs → sorted file list with a disk-derived `extracted` flag (Principle I); `GET /run/extract` `chapters` is now a list (select-all = the glob, subset = the picked paths). |
+| Config | `EnsembleSection.chapters_selected: list[str]` — the explicit chosen set; empty == nothing selected. No secrets. |
+| UI | New `ChapterPicker.vue` (glob + Resolve, Select all / Select none / "only", natural sort ▲▼, per-chapter `extracted`/`pending` badge) wired into both Setup and Extract. |
+| Tests | `test_ensemble_chapters.py` (resolution, multi-glob union/dedupe, empty, workspace-confinement, **empty-selection refusal**), `test_ensemble_batch_chapters.py` (nargs contract). +7 passing, zero regressions. |
+
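The Engine row above (union of globs, de-dupe, sort, `nargs="+"`) can be sketched with the standard library. The helper names and the natural-sort key are assumptions: the patch does not show `ensemble_batch.py`'s actual resolution code or say which sort order the engine uses.

```python
import argparse
import glob
import re


def natural_key(path: str) -> list:
    """Sort key that orders 'a_2' before 'a_10' by comparing digit runs as ints."""
    # re.split with a capture group alternates text and digit tokens, so
    # positions always hold the same type when two keys are compared.
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", path)]


def resolve_chapters(patterns: list[str]) -> list[str]:
    """Union one or more globs/paths, drop duplicates, return a sorted list."""
    found: set[str] = set()
    for pattern in patterns:
        found.update(glob.glob(pattern))
    return sorted(found, key=natural_key)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # nargs="+" requires at least one value, so a bare --chapters is rejected.
    parser.add_argument("--chapters", nargs="+", required=True)
    return parser
```

A literal path that does not exist matches nothing and is silently dropped in this sketch; the real engine may prefer to report it.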
+### Constitution amendment — Principle X (operator-elevated)
+
+The operator ruled, as a matter of UX design, that **"there is no 'select all' that isn't explicit."** This was elevated to the constitution as **Principle X — Selection is Explicit; There is No Silent "All"** (v1.1.0 → **1.2.0**, MINOR). The chapter picker is now its concrete clause:
+
+- `chapters_selected == []` means *nothing selected* — it no longer falls back to the glob.
+- `GET /api/ensemble/run/extract` **refuses** an empty selection (SSE error, returncode 1) instead of expanding to "all"; the Run button is disabled until ≥1 chapter is picked.
+- "Select all" **materializes** every resolved path into `chapters_selected` — it is a deliberate act, not a default.
+- The CLI engine (`ensemble_batch.py`) is exempt: a glob typed at the CLI is itself explicit. The UI must never manufacture that act for the human.
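
The Principle X clause above can be sketched as follows. This is a minimal illustration under stated assumptions, not the router's actual code: `resolve_selection`, `select_all`, and `EmptySelectionError` are hypothetical names; only the message text "No chapters selected" comes from the tests in this patch.

```python
class EmptySelectionError(ValueError):
    """Raised when a run is requested with nothing picked (hypothetical name)."""


def resolve_selection(chapters_selected: list[str]) -> list[str]:
    """Return the explicit selection, refusing an empty one.

    There is deliberately no glob parameter: an empty list cannot be expanded
    to "all" because the function has nothing to expand it from.
    """
    picked = [c for c in chapters_selected if c.strip()]
    if not picked:
        raise EmptySelectionError("No chapters selected")
    return picked


def select_all(resolved_paths: list[str]) -> list[str]:
    """'Select all' copies every resolved path into the selection explicitly."""
    return list(resolved_paths)
```

The design point is that "all" only ever exists as a materialized list the operator asked for; the refusal path and the select-all path never share a fallback.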
diff --git a/synthesise_world_state.py b/synthesise_world_state.py
index 20d6783..f680c5b 100644
--- a/synthesise_world_state.py
+++ b/synthesise_world_state.py
@@ -68,6 +68,8 @@
from campaignlib import (
DEFAULT_MODEL,
+ add_backend_args,
+ client_from_args,
load_agent_prompt,
make_client,
stream_api,
@@ -334,8 +336,10 @@ def main() -> None:
"input for grounding (default: on). --no-quotes for "
"a clean baseline comparison against the extracts.")
parser.add_argument("--model", default=DEFAULT_MODEL,
- help=f"Claude model id (default: {DEFAULT_MODEL}). "
- f"Use claude-opus-4-7 for highest-quality synthesis.")
+ help=f"Model id (default: {DEFAULT_MODEL}). "
+ f"Use claude-opus-4-7 for highest-quality synthesis; "
+ f"an OpenRouter id (e.g. anthropic/claude-sonnet-4) for --backend openrouter.")
+ add_backend_args(parser)
parser.add_argument("--max-tokens", type=int, default=16000,
help="max_tokens for the synthesis call (default: 16000).")
parser.add_argument("--dump-input", default=None, metavar="FILE",
@@ -470,7 +474,7 @@ def main() -> None:
print(f"[Input: {len(user_prompt):,} chars]")
print("=" * 60)
- client = make_client()
+ client = client_from_args(args)
world_state = stream_api(
client,
system_prompt,
diff --git a/tests/test_ensemble_batch_chapters.py b/tests/test_ensemble_batch_chapters.py
new file mode 100644
index 0000000..73b127b
--- /dev/null
+++ b/tests/test_ensemble_batch_chapters.py
@@ -0,0 +1,16 @@
+"""ensemble_batch.py --chapters accepts one or more globs/paths (the engine
+contract behind the UI chapter picker's select-all / select-one / subset)."""
+
+import ensemble_batch
+
+
+def test_chapters_accepts_multiple_globs():
+    p = ensemble_batch._build_parser()
+    args = p.parse_args(["--chapters", "docs/a_*.md", "docs/b_03.md", "--out", "x.json"])
+    assert args.chapters == ["docs/a_*.md", "docs/b_03.md"]
+
+
+def test_chapters_single_value_still_works():
+    p = ensemble_batch._build_parser()
+    args = p.parse_args(["--chapters", "docs/chapters/chapter_*.md", "--out", "x.json"])
+    assert args.chapters == ["docs/chapters/chapter_*.md"]
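
The `nargs="+"` contract tested above, together with the union / de-dupe / sort behaviour described for the engine, might look roughly like this. It is a sketch only: `build_parser` and `resolve_chapters` are hypothetical stand-ins for whatever `ensemble_batch.py` does internally, and plain lexicographic `sorted` is an assumption about the ordering.

```python
import argparse
import glob


def build_parser() -> argparse.ArgumentParser:
    """Parser where --chapters takes one or more globs or literal paths."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--chapters", nargs="+", required=True)
    parser.add_argument("--out", required=True)
    return parser


def resolve_chapters(patterns: list[str]) -> list[str]:
    """Union every pattern's matches, drop duplicates, return them sorted."""
    seen: set[str] = set()
    for pattern in patterns:
        # glob.glob returns the literal path itself when it exists,
        # so globs and explicit paths go through the same call.
        seen.update(glob.glob(pattern))
    return sorted(seen)
```

A single glob still parses to a one-element list, which is why existing single-glob callers keep working unchanged.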
diff --git a/tests/test_ensemble_chapters.py b/tests/test_ensemble_chapters.py
new file mode 100644
index 0000000..bcd7895
--- /dev/null
+++ b/tests/test_ensemble_chapters.py
@@ -0,0 +1,81 @@
+"""Tests for the chapter picker: /api/ensemble/chapters resolution + the
+multi-chapter extract contract (select all / select one / subset)."""
+
+import json
+
+from fastapi.testclient import TestClient
+
+from server.main import app
+
+client = TestClient(app)
+
+
+def _make_chapters(tmp_path):
+ d = tmp_path / "docs/chapters"
+ d.mkdir(parents=True)
+ for n in ("01", "02", "10"):
+ (d / f"chapter_{n}.md").write_text(f"# chapter {n}")
+
+
+def test_chapters_resolves_glob_with_extracted_flag(tmp_path, monkeypatch):
+ monkeypatch.chdir(tmp_path)
+ _make_chapters(tmp_path)
+ # chapter_02 already has a merged.json → must be flagged extracted.
+ pc = tmp_path / "docs/ensemble/per_chapter/chapter_02"
+ pc.mkdir(parents=True)
+ (pc / "merged.json").write_text(json.dumps({"facts": []}))
+
+ body = client.get("/api/ensemble/chapters",
+ params={"glob": "docs/chapters/chapter_*.md"}).json()
+ assert body["count"] == 3
+ by_stem = {c["stem"]: c for c in body["chapters"]}
+ assert set(by_stem) == {"chapter_01", "chapter_02", "chapter_10"}
+ assert by_stem["chapter_02"]["extracted"] is True
+ assert by_stem["chapter_01"]["extracted"] is False
+ # Paths are workspace-relative.
+ assert by_stem["chapter_01"]["path"] == "docs/chapters/chapter_01.md"
+
+
+def test_chapters_unions_multiple_globs_and_dedupes(tmp_path, monkeypatch):
+ monkeypatch.chdir(tmp_path)
+ _make_chapters(tmp_path)
+ body = client.get(
+ "/api/ensemble/chapters",
+ params=[("glob", "docs/chapters/chapter_01.md"),
+ ("glob", "docs/chapters/chapter_0*.md")], # overlaps chapter_01
+ ).json()
+ stems = sorted(c["stem"] for c in body["chapters"])
+ assert stems == ["chapter_01", "chapter_02"] # 01 not duplicated
+
+
+def test_chapters_empty_when_nothing_matches(tmp_path, monkeypatch):
+ monkeypatch.chdir(tmp_path)
+ _make_chapters(tmp_path)
+ body = client.get("/api/ensemble/chapters",
+ params={"glob": "docs/chapters/nope_*.md"}).json()
+ assert body == {"chapters": [], "count": 0}
+
+
+def test_chapters_confined_to_workspace(tmp_path, monkeypatch):
+ monkeypatch.chdir(tmp_path)
+ _make_chapters(tmp_path)
+ # An escaping glob resolves to nothing inside the workspace, never leaks.
+ body = client.get("/api/ensemble/chapters",
+ params={"glob": "../*.md"}).json()
+ assert body["count"] == 0
+
+
+# ── Principle X: no silent "all" ─────────────────────────────────────────────
+
+def test_extract_refuses_empty_selection(tmp_path, monkeypatch):
+ """An empty selection must be refused, never expanded to the full glob."""
+ monkeypatch.chdir(tmp_path)
+ _make_chapters(tmp_path)
+ # No chapters param at all → must refuse with a clear message, not run.
+ r = client.get("/api/ensemble/run/extract")
+ assert r.status_code == 200 # SSE channel opens, but carries a refusal
+ assert "No chapters selected" in r.text
+ assert '"returncode": 1' in r.text
+ # An explicitly empty list is refused identically (no glob fallback).
+ r2 = client.get("/api/ensemble/run/extract", params={"chapters": ""})
+ assert "No chapters selected" in r2.text
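
For orientation, the resolution behaviour these tests pin down (workspace-relative paths, de-duplication, a disk-derived `extracted` flag, and confinement) could be implemented along these lines. This is a hedged sketch, not the server's code: `resolve_chapters` is a hypothetical helper, and the `docs/ensemble/per_chapter/<stem>/merged.json` layout is taken from the fixtures above.

```python
from pathlib import Path


def resolve_chapters(workspace: Path, globs: list[str]) -> dict:
    """Resolve globs inside workspace into a sorted, de-duplicated chapter list."""
    root = workspace.resolve()
    found: dict[str, Path] = {}
    for pattern in globs:
        # Absolute patterns and ".." segments could escape the workspace,
        # so they are skipped rather than expanded.
        if Path(pattern).is_absolute() or ".." in Path(pattern).parts:
            continue
        for hit in root.glob(pattern):
            real = hit.resolve()
            if real.is_file() and real.is_relative_to(root):
                # Keyed by relative path, so overlapping globs de-dupe for free.
                found[real.relative_to(root).as_posix()] = real
    chapters = []
    for rel in sorted(found):
        stem = found[rel].stem
        merged = root / "docs/ensemble/per_chapter" / stem / "merged.json"
        # The flag is read from disk on every call; nothing is cached.
        chapters.append({"stem": stem, "path": rel, "extracted": merged.is_file()})
    return {"chapters": chapters, "count": len(chapters)}
```

Deriving `extracted` from the presence of `merged.json` on each request is what keeps the flag consistent with Principle I: the answer cannot drift from what is actually on disk.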
diff --git a/tests/test_ensemble_gates.py b/tests/test_ensemble_gates.py
new file mode 100644
index 0000000..73ee78e
--- /dev/null
+++ b/tests/test_ensemble_gates.py
@@ -0,0 +1,62 @@
+"""Gate guards: drafts-only synthesis, no live-doc writes, promote is the sole
+live-doc writer (FR-013, SC-005, spec US3)."""
+
+from fastapi.testclient import TestClient
+
+from server.main import app
+
+client = TestClient(app)
+
+
+def test_synthesize_rejects_live_doc_output(tmp_path, monkeypatch):
+ monkeypatch.chdir(tmp_path)
+ r = client.get("/api/ensemble/run/synthesize",
+ params={"doc": "world_state", "output": "docs/world_state.md"})
+ assert r.status_code == 400
+ assert "draft" in r.json()["detail"]
+
+
+def test_put_file_rejects_live_doc(tmp_path, monkeypatch):
+ monkeypatch.chdir(tmp_path)
+ (tmp_path / "docs").mkdir()
+ r = client.put("/api/ensemble/file", params={"path": "docs/world_state.md"},
+ json={"content": "clobbered"})
+ assert r.status_code == 403
+
+
+def test_put_file_allows_aliases(tmp_path, monkeypatch):
+ monkeypatch.chdir(tmp_path)
+ r = client.put("/api/ensemble/file",
+ params={"path": "docs/ensemble/aliases.json"},
+ json={"content": "{}"})
+ assert r.status_code == 200
+ assert (tmp_path / "docs/ensemble/aliases.json").read_text() == "{}"
+
+
+def test_promote_is_sole_live_writer(tmp_path, monkeypatch):
+ monkeypatch.chdir(tmp_path)
+ (tmp_path / "docs").mkdir()
+ draft = tmp_path / "docs/world_state_draft.md"
+ draft.write_text("promoted body")
+ live = tmp_path / "docs/world_state.md"
+ assert not live.exists()
+
+ r = client.post("/api/ensemble/promote",
+ json={"draft": "docs/world_state_draft.md", "live": "docs/world_state.md"})
+ assert r.status_code == 200
+ assert live.read_text() == "promoted body"
+
+
+def test_promote_rejects_non_grounding_target(tmp_path, monkeypatch):
+ monkeypatch.chdir(tmp_path)
+ (tmp_path / "docs").mkdir()
+ (tmp_path / "docs/world_state_draft.md").write_text("x")
+ r = client.post("/api/ensemble/promote",
+ json={"draft": "docs/world_state_draft.md", "live": "docs/notes.md"})
+ assert r.status_code == 400
+
+
+def test_path_traversal_rejected(tmp_path, monkeypatch):
+ monkeypatch.chdir(tmp_path)
+ r = client.get("/api/ensemble/file", params={"path": "../../etc/passwd"})
+ assert r.status_code == 400
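
The gates exercised above reduce to two rules: every path is confined to the workspace, and only the promote step may write a live doc. A minimal sketch, assuming hypothetical helper names (`confine`, `write_file`, `promote`) and a `LIVE_DOCS` set in which only `docs/world_state.md` is confirmed by these tests:

```python
from pathlib import Path

# Assumption: the set of live grounding docs. Only docs/world_state.md is
# confirmed by the tests; a real deployment would list the others too.
LIVE_DOCS = {"docs/world_state.md"}


def confine(workspace: Path, rel: str) -> Path:
    """Resolve rel inside workspace, refusing anything that escapes it."""
    root = workspace.resolve()
    target = (root / rel).resolve()
    if not target.is_relative_to(root):
        raise ValueError(f"path escapes workspace: {rel}")
    return target


def _is_live(workspace: Path, rel: str) -> bool:
    """Compare the normalized relative path, so 'docs/../docs/x' cannot sneak by."""
    root = workspace.resolve()
    return confine(workspace, rel).relative_to(root).as_posix() in LIVE_DOCS


def write_file(workspace: Path, rel: str, content: str) -> None:
    """General-purpose writer: may touch drafts and config, never a live doc."""
    if _is_live(workspace, rel):
        raise PermissionError(f"{rel} is a live doc; use promote()")
    target = confine(workspace, rel)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")


def promote(workspace: Path, draft: str, live: str) -> Path:
    """The only function allowed to write a live doc, and only from a draft."""
    if not _is_live(workspace, live):
        raise ValueError(f"{live} is not a grounding doc")
    dst = confine(workspace, live)
    dst.write_text(confine(workspace, draft).read_text(encoding="utf-8"), encoding="utf-8")
    return dst
```

Keeping the live-doc check on the normalized path, rather than the raw string, is the detail that makes the "sole writer" guarantee hold against path tricks.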
diff --git a/tests/test_ensemble_status.py b/tests/test_ensemble_status.py
new file mode 100644
index 0000000..30e812b
--- /dev/null
+++ b/tests/test_ensemble_status.py
@@ -0,0 +1,34 @@
+"""Integration tests for /api/ensemble/status — disk-derived stage state (FR-002)."""
+
+import json
+
+from fastapi.testclient import TestClient
+
+from server.main import app
+
+client = TestClient(app)
+
+
+def test_status_extract_current_then_complete(tmp_path, monkeypatch):
+    monkeypatch.chdir(tmp_path)
+    (tmp_path / "docs/chapters").mkdir(parents=True)
+    (tmp_path / "docs/chapters/chapter_01.md").write_text("# ch1")
+
+    # No run yet → extract is the current stage.
+    r = client.get("/api/ensemble/status")
+    assert r.status_code == 200
+    body = r.json()
+    assert body["current_stage"] == "extract"
+    assert {s["id"] for s in body["stages"]} == {"extract", "bundle", "synthesize", "review"}
+    assert next(s for s in body["stages"] if s["id"] == "extract")["status"] == "not_started"
+
+    # Extraction artifacts appear on disk → status flips to complete with no caching.
+    pc = tmp_path / "docs/ensemble/per_chapter/chapter_01"
+    pc.mkdir(parents=True)
+    (pc / "merged.json").write_text(json.dumps({"facts": []}))
+
+    body2 = client.get("/api/ensemble/status").json()
+    extract = next(s for s in body2["stages"] if s["id"] == "extract")
+    assert extract["status"] == "complete"
+    assert extract["artifacts"] == 1
+    assert body2["current_stage"] == "bundle"
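
The "disk-derived, no caching" property this test checks can be illustrated with a small helper. This is a sketch under assumptions: `extract_status` is a hypothetical name, and treating any non-zero artifact count as `complete` is a simplification that matches this one-chapter fixture, not necessarily the endpoint's real rule for partial runs.

```python
from pathlib import Path


def extract_status(workspace: Path) -> dict:
    """Derive the extract stage's status from files on disk at call time."""
    per_chapter = workspace / "docs/ensemble/per_chapter"
    artifacts = 0
    if per_chapter.is_dir():
        # One merged.json per chapter directory counts as one artifact.
        artifacts = sum(1 for _ in per_chapter.glob("*/merged.json"))
    status = "complete" if artifacts else "not_started"
    return {"id": "extract", "status": status, "artifacts": artifacts}
```

Because nothing is remembered between calls, a file that appears or disappears on disk changes the very next answer.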
diff --git a/tests/test_openrouter_seam.py b/tests/test_openrouter_seam.py
new file mode 100644
index 0000000..434d482
--- /dev/null
+++ b/tests/test_openrouter_seam.py
@@ -0,0 +1,129 @@
+"""Contract tests for the OpenRouter backend seam (spec 001-ensemble-workflow-ui).
+
+Enforces Constitution Principle V (one seam per boundary): OpenRouter is reached
+ONLY through campaignlib.api, selection is uniform across scripts, a missing key
+fails loudly, and an empty model response is never silently accepted (M3).
+"""
+
+import argparse
+from pathlib import Path
+
+import pytest
+
+import campaignlib
+from campaignlib.api import client as client_mod
+from campaignlib.api import backends as backends_mod
+
+
+REPO_ROOT = Path(__file__).resolve().parent.parent
+
+
+# ── Routing: make_client dispatches to the OpenRouter client ────────────────
+
+def test_make_client_routes_openrouter(monkeypatch):
+ """backend='openrouter' must construct _OpenRouterClient (not Anthropic/DGX)."""
+ sentinel = object()
+ captured = {}
+
+ def fake_ctor(model_override=None):
+ captured["model_override"] = model_override
+ return sentinel
+
+ monkeypatch.setattr(client_mod, "_OpenRouterClient", fake_ctor)
+ out = client_mod.make_client(backend="openrouter", model_override="anthropic/claude-sonnet-4")
+ assert out is sentinel
+ assert captured["model_override"] == "anthropic/claude-sonnet-4"
+
+
+def test_cg_backend_env_selects_openrouter(monkeypatch):
+ """CG_BACKEND=openrouter selects the branch with no explicit arg."""
+ monkeypatch.setattr(client_mod, "_OpenRouterClient", lambda model_override=None: "OR")
+ monkeypatch.setenv("CG_BACKEND", "openrouter")
+ assert client_mod.make_client() == "OR"
+
+
+# ── Missing key fails loudly (no silent fallback) ───────────────────────────
+
+def test_missing_key_raises(monkeypatch):
+ monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
+ with pytest.raises(RuntimeError, match="OPENROUTER_API_KEY"):
+ backends_mod._OpenRouterClient()
+
+
+# ── No-thinking mapping (M3 prevention) ─────────────────────────────────────
+
+def test_extra_body_no_thinking_mapping(monkeypatch):
+ """thinking=False maps to OpenRouter's reasoning-disable control."""
+ monkeypatch.setenv("OPENROUTER_API_KEY", "x")
+ # Build only the method under test without constructing the SDK client.
+ inst = backends_mod._OpenRouterClient.__new__(backends_mod._OpenRouterClient)
+ assert inst.extra_body_for("m", thinking=False) == {"reasoning": {"enabled": False}}
+ assert inst.extra_body_for("m", thinking=True) == {}
+ monkeypatch.setenv("DGX_NO_THINKING", "1")
+ assert inst.extra_body_for("m", thinking=None) == {"reasoning": {"enabled": False}}
+
+
+# ── Empty-output guard (M3 detection) ───────────────────────────────────────
+
+@pytest.mark.parametrize("bad", [None, "", " ", "\n\t "])
+def test_require_nonempty_raises(bad):
+ with pytest.raises(RuntimeError, match="empty output"):
+ client_mod._require_nonempty(bad)
+
+
+def test_require_nonempty_passes_through():
+ assert client_mod._require_nonempty("real text") == "real text"
+
+
+# ── Uniform backend-selection vocabulary + backward compatibility ───────────
+
+def test_add_backend_args_defaults_anthropic():
+ p = argparse.ArgumentParser()
+ p.add_argument("--model", default="claude-sonnet-4-6")
+ campaignlib.add_backend_args(p)
+ ns = p.parse_args([])
+ assert ns.backend == "anthropic"
+ assert ns.endpoint is None
+
+
+def test_client_from_args_anthropic_is_backward_compatible(monkeypatch):
+ """Default backend must call make_client(None, None, None) so env still applies."""
+ seen = {}
+ monkeypatch.setattr(client_mod, "make_client",
+ lambda backend=None, endpoint=None, model_override=None:
+ seen.update(backend=backend, endpoint=endpoint, model_override=model_override))
+ ns = argparse.Namespace(backend="anthropic", endpoint=None, model="claude-sonnet-4-6")
+ client_mod.client_from_args(ns)
+ assert seen == {"backend": None, "endpoint": None, "model_override": None}
+
+
+def test_client_from_args_openrouter_passes_model(monkeypatch):
+ seen = {}
+ monkeypatch.setattr(client_mod, "make_client",
+ lambda backend=None, endpoint=None, model_override=None:
+ seen.update(backend=backend, endpoint=endpoint, model_override=model_override))
+ ns = argparse.Namespace(backend="openrouter", endpoint=None, model="anthropic/claude-sonnet-4")
+ client_mod.client_from_args(ns)
+ assert seen == {"backend": "openrouter", "endpoint": None,
+ "model_override": "anthropic/claude-sonnet-4"}
+
+
+# ── Principle V: OpenRouter constructed only inside campaignlib/api ──────────
+
+def test_no_out_of_seam_openrouter_construction():
+ """No module outside campaignlib/api may hard-wire OpenRouter's base URL or
+ construct the client directly — selection goes through make_client / env."""
+ offenders = []
+ seam = (REPO_ROOT / "campaignlib" / "api").resolve()
+ for py in REPO_ROOT.rglob("*.py"):
+ rp = py.resolve()
+ if seam in rp.parents or rp.parent == seam:
+ continue
+ if "/tests/" in str(rp) or rp.name.startswith("test_"):
+ continue
+ if ".specify" in rp.parts or "node_modules" in rp.parts:
+ continue
+ text = py.read_text(encoding="utf-8", errors="ignore")
+ if "openrouter.ai" in text or "_OpenRouterClient(" in text:
+ offenders.append(str(rp.relative_to(REPO_ROOT)))
+ assert not offenders, f"OpenRouter referenced outside the seam: {offenders}"
From f2a634455f5742990997f5aaa6d1e6d5564cda98 Mon Sep 17 00:00:00 2001
From: Kostadis
Date: Sun, 28 Jun 2026 19:13:33 -0700
Subject: [PATCH 3/3] =?UTF-8?q?feat(ensemble):=20run=20observability=20?=
=?UTF-8?q?=E2=80=94=20copyable=20command,=20live=20stream,=20abort=20+=20?=
=?UTF-8?q?durable=20record?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Implements spec 002-ensemble-run-observability (T001–T032; T033 = manual QA pending):

Engine / shared seam
- atomic_write_text / atomic_write_json in campaignlib.util (FR-014): temp-then-rename
  so a SIGKILL never leaves a truncated merged.json or dossier at a resume-trusted path
- subprocess_runner: classify_result(), extended _save_run_log with result field (T003/T004)
- subprocess_runner: start_new_session=True + SIGTERM→wait(4 s)→SIGKILL process-group
  teardown on every exit path — normal, explicit abort, and disconnect (T020/T021)
- subprocess_runner: emits `event: command` as first SSE event carrying the secret-free
  invocation string; `done` payload includes aborted flag on signal exit (T006/T022)
- ensemble_merge + facts_to_state: atomic cache writes (T018/T019)

Frontend
- sse.ts: onCommand callback; onerror while running closes EventSource (no auto-restart)
  and transitions to aborted — a network drop is an implicit abort (T007/T025, I1)
- useEnsembleRun: command state, aborted status, abort() method (T008/T024/T025)
- RunCommandBar.vue: monospace copyable command box (T009)
- EnsembleExtract/Bundle/Synthesize: RunCommandBar wired; Abort button while running;
  aborted/connection-lost labels; success vs failure color distinction (T010/T014/T015/T026)

Tests (tests/test_subprocess_abort.py)
- secret-safety + explicit-selection-faithfulness (T011/T012)
- process-group kill on explicit abort and disconnect, child + grandchild (T027)
- grace→force timing; aborted record written (T028)
- atomic-write integrity under SIGKILL; lock released after abort (T029)
- non-ensemble SSE route regression — group-killed on disconnect, no orphan (T031)
- success and failure run records verified (T017)

Docs: web_ui.md + ensemble_workflow.md updated with abort/reconnect/per-run-log notes (T030)

Spec: specs/002-ensemble-run-observability/ — full artifact set committed

Co-Authored-By: Claude Sonnet 4.6
---
.specify/feature.json | 2 +-
CLAUDE.md | 13 +-
campaignlib/__init__.py | 4 +-
campaignlib/util.py | 34 ++
docs/cli/ensemble_workflow.md | 11 +
docs/web/web_ui.md | 10 +
ensemble_merge.py | 4 +-
facts_to_state.py | 3 +-
frontend/src/api/sse.ts | 16 +-
.../src/components/shared/RunCommandBar.vue | 77 ++++
.../src/views/ensemble/EnsembleBundle.vue | 40 +-
.../src/views/ensemble/EnsembleExtract.vue | 17 +-
.../src/views/ensemble/EnsembleSynthesize.vue | 21 +-
frontend/src/views/ensemble/useEnsembleRun.ts | 49 +-
server/routers/ensemble.py | 3 +
server/subprocess_runner.py | 191 +++++---
.../checklists/requirements.md | 36 ++
.../contracts/run-stream.md | 64 +++
.../data-model.md | 58 +++
specs/002-ensemble-run-observability/plan.md | 104 +++++
.../quickstart.md | 83 ++++
.../research.md | 69 +++
specs/002-ensemble-run-observability/spec.md | 143 ++++++
specs/002-ensemble-run-observability/tasks.md | 208 +++++++++
tests/test_subprocess_abort.py | 427 ++++++++++++++++++
25 files changed, 1604 insertions(+), 83 deletions(-)
create mode 100644 frontend/src/components/shared/RunCommandBar.vue
create mode 100644 specs/002-ensemble-run-observability/checklists/requirements.md
create mode 100644 specs/002-ensemble-run-observability/contracts/run-stream.md
create mode 100644 specs/002-ensemble-run-observability/data-model.md
create mode 100644 specs/002-ensemble-run-observability/plan.md
create mode 100644 specs/002-ensemble-run-observability/quickstart.md
create mode 100644 specs/002-ensemble-run-observability/research.md
create mode 100644 specs/002-ensemble-run-observability/spec.md
create mode 100644 specs/002-ensemble-run-observability/tasks.md
create mode 100644 tests/test_subprocess_abort.py
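
The "SIGTERM, wait, then SIGKILL" process-group teardown named in the commit message can be sketched as below. This is an illustration under assumptions, not the code in `server/subprocess_runner.py`: it uses the synchronous `subprocess.Popen` rather than asyncio, `terminate_group` is a hypothetical name, it is POSIX-only, and it assumes the child was started with `start_new_session=True` so that its process-group id equals its pid.

```python
import os
import signal
import subprocess
import sys

GRACE_SECONDS = 4.0  # grace window between SIGTERM and SIGKILL


def terminate_group(proc: subprocess.Popen, grace: float = GRACE_SECONDS) -> int:
    """Stop proc and every descendant in its process group; return the exit code."""
    pgid = proc.pid  # valid only because the child leads its own session/group
    for sig in (signal.SIGTERM, signal.SIGKILL):
        try:
            os.killpg(pgid, sig)
        except (ProcessLookupError, PermissionError):
            break  # group already gone (or not ours): nothing left to signal
        try:
            return proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            continue  # still alive after the grace window: escalate
    return proc.wait()


if __name__ == "__main__":
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        start_new_session=True,
    )
    print("exit code:", terminate_group(child, grace=2.0))
```

Signalling the group rather than the single pid is what reaches grandchildren; a negative return code indicates death by signal, which the patch's `classify_result` maps to `aborted`.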
diff --git a/.specify/feature.json b/.specify/feature.json
index 69a4651..84dd9c2 100644
--- a/.specify/feature.json
+++ b/.specify/feature.json
@@ -1,3 +1,3 @@
{
- "feature_directory": "specs/001-ensemble-workflow-ui"
+ "feature_directory": "specs/002-ensemble-run-observability"
}
diff --git a/CLAUDE.md b/CLAUDE.md
index 19e5637..7402cdb 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -152,8 +152,13 @@ a CLI with `--backend openrouter --model `, or via the
For additional context about technologies to be used, project structure,
shell commands, and other important information, read the current plan:
-`specs/001-ensemble-workflow-ui/plan.md` (Ensemble Grounding-Doc Workflow UI —
-adds a stepped `/ensemble` UI page and OpenRouter as a per-stage LLM backend
-through the single `campaignlib` seam; leaves the existing `/grounding` Anthropic
-path unchanged).
+`specs/002-ensemble-run-observability/plan.md` (Ensemble Run Observability —
+makes an ensemble-stage run observable and controllable from the `/ensemble` UI:
+a copyable, secret-free reproducible command; live streamed output; an
+unambiguous succeeded/failed/aborted result plus a durable on-disk run record;
+and abort = graceful→force process-group kill, where a lost connection is an
+implicit abort. Engine correctness — process-group kill in the shared
+`server/subprocess_runner.py` seam, atomic per-unit cache writes in
+`ensemble_merge.py`/`facts_to_state.py` — stays in the CLI/seam layer, not the
+router. Predecessor: `specs/001-ensemble-workflow-ui/plan.md`.).
diff --git a/campaignlib/__init__.py b/campaignlib/__init__.py
index e914beb..fd90a37 100644
--- a/campaignlib/__init__.py
+++ b/campaignlib/__init__.py
@@ -26,7 +26,7 @@
load_agent_prompt,
assemble_docs,
)
-from .util import copy_to_clipboard, save_log
+from .util import copy_to_clipboard, save_log, atomic_write_text, atomic_write_json
from .api.client import (
make_client, call_api, call_api_with_tools, stream_api,
add_backend_args, client_from_args,
@@ -80,6 +80,8 @@
# util
"copy_to_clipboard",
"save_log",
+ "atomic_write_text",
+ "atomic_write_json",
# api — client
"make_client",
"add_backend_args",
diff --git a/campaignlib/util.py b/campaignlib/util.py
index dd6402a..42fdce2 100644
--- a/campaignlib/util.py
+++ b/campaignlib/util.py
@@ -1,7 +1,41 @@
"""Clipboard and timestamped-log helpers."""
+import json
+import os
from datetime import datetime
from pathlib import Path
+from typing import Any
+
+
+def atomic_write_text(path: Path | str, text: str, encoding: str = "utf-8") -> None:
+    """Write text to path atomically (FR-014: no partial file at the trusted path).
+
+    Writes to a temp file in the same directory as `path`, then renames via
+    os.replace — a POSIX atomic rename on the same filesystem. A SIGKILL during
+    write leaves at most a discardable .tmp file; the destination is always either
+    the complete new content or the previous version, never a partial write.
+    """
+    path = Path(path)
+    path.parent.mkdir(parents=True, exist_ok=True)
+    tmp = path.with_suffix(path.suffix + ".tmp")
+    try:
+        tmp.write_text(text, encoding=encoding)
+        os.replace(tmp, path)
+    except BaseException:
+        try:
+            tmp.unlink(missing_ok=True)
+        except OSError:
+            pass
+        raise
+
+
+def atomic_write_json(path: Path | str, obj: Any, indent: int = 2) -> None:
+    """Write obj as JSON to path atomically (FR-014).
+
+    Serialises as json.dumps(obj, indent=indent) + "\\n" to match the existing
+    ensemble_merge.py output format exactly, then delegates to atomic_write_text.
+    """
+    atomic_write_text(path, json.dumps(obj, indent=indent) + "\n")
def copy_to_clipboard(text: str) -> None:
diff --git a/docs/cli/ensemble_workflow.md b/docs/cli/ensemble_workflow.md
index 12622fe..bb9c4b5 100644
--- a/docs/cli/ensemble_workflow.md
+++ b/docs/cli/ensemble_workflow.md
@@ -968,6 +968,17 @@ Old per-tool API path (distill / planning / party / campaign_state each re-extra
---
+## Observability & abort
+
+All ensemble stages (extraction, bundling, synthesis) support:
+
+- **Copyable command**: Every UI-launched run emits the exact, secret-free invocation as the first SSE event. Paste it into a terminal in the campaign workspace to reproduce the run. No API key appears in the command — keys are inherited from the server environment, never on the command line.
+- **Live streaming**: Output lines appear incrementally as each chapter/step completes. The UI shows a "Running…" state while the run is active.
+- **Abort**: The Abort button (UI) closes the EventSource connection. The server observes the disconnect and group-kills the entire worker tree (SIGTERM → SIGKILL after ~4 s) — no orphaned `ensemble_extract` or `facts_to_state` subprocesses keep spending tokens.
+- **Disconnect = implicit abort**: Closing the browser tab or dropping the network mid-run is treated identically to clicking Abort. The UI does NOT auto-reconnect during a running stage (that would silently restart the run). The server kills the process group when the connection drops.
+- **Durable record**: Every run writes `/logs/_
+
+
+
+ Command
+
{{ command }}
+
+
+
+ Command
+ Run a stage to see the exact command.
+
+
+
+
diff --git a/frontend/src/views/ensemble/EnsembleBundle.vue b/frontend/src/views/ensemble/EnsembleBundle.vue
index b32e731..ce6a0d2 100644
--- a/frontend/src/views/ensemble/EnsembleBundle.vue
+++ b/frontend/src/views/ensemble/EnsembleBundle.vue
@@ -12,6 +12,19 @@ const listRun = useEnsembleRun()
const aggRun = useEnsembleRun()
const threadsRun = useEnsembleRun()
+function statusLabel(s: string, rc: number | null): string {
+ if (s === 'done') return 'Done'
+ if (s === 'error') return `Exit ${rc}`
+ if (s === 'aborted') return 'Aborted'
+ return ''
+}
+function statusClass(s: string): string {
+ if (s === 'done') return 'ok'
+ if (s === 'error') return 'err'
+ if (s === 'aborted') return 'aborted'
+ return ''
+}
+
// Gate state — aggregation is blocked until the operator confirms they reviewed
// scope + aliases (Principle II: no precision decision auto-fed downstream).
const gateConfirmed = ref(false)
@@ -88,9 +101,16 @@ function runThreads() {
[location]-scoped — this is a precision decision; you may also
run facts_to_state.py --list at the CLI.
- Select at least one chapter to run.
-
- {{ returnCode === 0 ? 'Done' : `Exit ${returnCode}` }}
-
+
+ Select at least one chapter to run.
+ Done
+ Exit {{ returnCode }}
+ Aborted
-
+
@@ -78,5 +82,6 @@ h2 { font-size: 16px; margin-bottom: 6px; }
.controls { display: flex; align-items: center; gap: 10px; margin-bottom: 10px; }
.ok { color: var(--green); font-size: 12px; font-weight: 600; }
.err { color: var(--red); font-size: 12px; font-weight: 600; }
+.aborted { color: var(--peach); font-size: 12px; font-weight: 600; }
.need { color: var(--peach); font-size: 12px; }
diff --git a/frontend/src/views/ensemble/EnsembleSynthesize.vue b/frontend/src/views/ensemble/EnsembleSynthesize.vue
index 9c3c3a3..891b2bb 100644
--- a/frontend/src/views/ensemble/EnsembleSynthesize.vue
+++ b/frontend/src/views/ensemble/EnsembleSynthesize.vue
@@ -10,6 +10,19 @@ const config = useConfigStore()
const cfg = ref(readEnsembleConfig({}))
const run = useEnsembleRun()
+function statusLabel(s: string, rc: number | null): string {
+ if (s === 'done') return 'Draft written'
+ if (s === 'error') return `Exit ${rc}`
+ if (s === 'aborted') return 'Aborted'
+ return ''
+}
+function statusClass(s: string): string {
+ if (s === 'done') return 'ok'
+ if (s === 'error') return 'err'
+ if (s === 'aborted') return 'aborted'
+ return ''
+}
+
const DOCS = [
{ id: 'world_state', label: 'World State' },
{ id: 'campaign_state', label: 'Campaign State' },
@@ -64,9 +77,10 @@ async function promote(doc: string) {
-
- {{ run.returnCode.value === 0 ? 'Draft written' : `Exit ${run.returnCode.value}` }}
+
+
+ {{ statusLabel(run.status.value, run.returnCode.value) }}
@@ -94,6 +108,7 @@ h3 { font-size: 13px; margin: 16px 0 6px; }
select { font-size: 12px; padding: 5px 7px; background: var(--bg-surface0); color: var(--text); border: 1px solid var(--bg-surface1); border-radius: 4px; }
.ok { color: var(--green); font-size: 12px; font-weight: 600; }
.err { color: var(--red); font-size: 12px; font-weight: 600; }
+.aborted { color: var(--peach); font-size: 12px; font-weight: 600; }
.promote-tbl td { padding: 4px 10px 4px 0; font-size: 12px; }
.diff { background: #141420; border: 1px solid var(--bg-surface0); border-radius: 4px; padding: 8px 10px; font-family: var(--mono); font-size: 11px; white-space: pre-wrap; max-height: 300px; overflow-y: auto; }
diff --git a/frontend/src/views/ensemble/useEnsembleRun.ts b/frontend/src/views/ensemble/useEnsembleRun.ts
index 24e79df..77a3c14 100644
--- a/frontend/src/views/ensemble/useEnsembleRun.ts
+++ b/frontend/src/views/ensemble/useEnsembleRun.ts
@@ -6,8 +6,13 @@ import { connectSSE } from '../../api/sse'
* don't need it. */
export function useEnsembleRun() {
const output = ref('')
- const status = ref<'idle' | 'running' | 'done' | 'error'>('idle')
+ const status = ref<'idle' | 'running' | 'done' | 'error' | 'aborted'>('idle')
const returnCode = ref(null)
+ /** Secret-free, copyable invocation from the server's `command` SSE event (US1). */
+ const command = ref('')
+
+ // Private EventSource handle — kept so abort() can close it.
+ let _es: EventSource | null = null
function buildUrl(endpoint: string, params: Record): string {
const url = new URL(endpoint, window.location.origin)
@@ -29,24 +34,58 @@ export function useEnsembleRun() {
status.value = 'running'
output.value = ''
returnCode.value = null
- connectSSE(buildUrl(endpoint, params), {
+ command.value = ''
+ _es = connectSSE(buildUrl(endpoint, params), {
+ onCommand(cmd) { command.value = cmd },
onData(t) { output.value += t },
- onDone(rc) {
+ onDone(rc, error) {
+ _es = null
status.value = rc === 0 ? 'done' : 'error'
returnCode.value = rc
+ // Surface precondition refusals (FR-011): done.error carries the message.
+ if (error && !output.value.includes(error)) {
+ output.value += `\nError: ${error}\n`
+ }
if (onDone) onDone(rc)
},
- onError() { status.value = 'error' },
+ onError(_e) {
+ // I1: onerror during 'running' = network drop / disconnect.
+ // Close the EventSource explicitly — prevents automatic reconnect which
+ // would re-issue the GET and silently restart the run (metered calls!).
+ // Treat as implicit abort; the server group-kills the process tree.
+ if (status.value === 'running') {
+ _es?.close()
+ _es = null
+ status.value = 'aborted'
+ output.value += '\n[connection lost — run stopped]\n'
+ } else {
+ // Not running (e.g. initial connection failure) — just error out.
+ _es?.close()
+ _es = null
+ status.value = 'error'
+ }
+ },
})
}
+ /** Close the EventSource and mark status as aborted (valid only from 'running').
+ * The server observes the connection drop and group-kills the worker tree.
+ */
+ function abort() {
+ if (status.value !== 'running') return
+ _es?.close()
+ _es = null
+ status.value = 'aborted'
+ }
+
function clear() {
output.value = ''
status.value = 'idle'
returnCode.value = null
+ command.value = ''
}
- return { output, status, returnCode, run, clear }
+ return { output, status, returnCode, command, run, abort, clear }
}
export interface BackendProfile {
diff --git a/server/routers/ensemble.py b/server/routers/ensemble.py
index 4837590..7c609cb 100644
--- a/server/routers/ensemble.py
+++ b/server/routers/ensemble.py
@@ -128,6 +128,9 @@ def _run_locked(stage: str, cmd: list[str], env_extra: dict[str, str] | None = N
_RUNNING.add(key)
def _release(_rc):
+ # T023: stream_subprocess calls on_complete from its finally block on
+ # every exit path (normal, explicit abort, or disconnect). The lock is
+ # therefore always released — no run can get stuck "running" after abort.
_RUNNING.discard(key)
async def _gen():
diff --git a/server/subprocess_runner.py b/server/subprocess_runner.py
index 177f830..7c04cfa 100644
--- a/server/subprocess_runner.py
+++ b/server/subprocess_runner.py
@@ -1,14 +1,48 @@
-"""Async subprocess runner with SSE streaming output."""
+"""Async subprocess runner with SSE streaming output.
+
+Shared seam used by ALL SSE routes (ensemble, grounding, prep, session_workflow,
+scene_editor, …). The termination behaviour added here (T020–T021: start_new_session
++ group-kill on disconnect) is intentionally global — no route should leak a runaway
+subprocess when the client disconnects. Non-ensemble routes' request/response shapes
+are unchanged; they additionally gain disconnect-driven cleanup for free.
+See plan.md "Constraints / Shared-seam blast radius (I2)" and tests/test_subprocess_abort.py
+for regression coverage.
+"""
import asyncio
import json
import os
+import signal
import sys
import time
from collections.abc import AsyncGenerator, Callable
from datetime import datetime
from pathlib import Path
+GRACE_SECONDS = 4.0 # SIGTERM grace window before SIGKILL (FR-008)
+
+
+def classify_result(returncode: int | None) -> str:
+ """Map a subprocess returncode to a run outcome string (R5, data-model.md).
+
+ - ``None`` or negative (signal) → ``"aborted"``
+ - ``0`` → ``"succeeded"``
+ - positive non-zero → ``"failed"``
+ """
+ if returncode is None or returncode < 0:
+ return "aborted"
+ if returncode == 0:
+ return "succeeded"
+ return "failed"
+
+
+def _killpg_safe(pgid: int, sig: int) -> None:
+ """Send sig to process group pgid, silently ignoring ProcessLookupError and PermissionError."""
+ try:
+ os.killpg(pgid, sig)
+ except (ProcessLookupError, PermissionError):
+ pass
+
def _log_stem(cmd: list[str]) -> str:
"""Derive a filename stem from the script being run."""
@@ -19,12 +53,13 @@ def _log_stem(cmd: list[str]) -> str:
def _save_run_log(cmd: list[str], cwd: str | None, output: str,
- returncode: int | None, duration: float) -> None:
- """Persist the run to `logs/` so it survives the SSE buffer.
+ returncode: int | None, result: str, duration: float) -> None:
+ """Persist the run to `logs/` so it survives the SSE buffer (FR-007, SC-006).
- Mirrors the format of `campaignlib.save_log` — markdown sections, one
- file per run with a timestamped filename. Failures here are silent;
- logging is best-effort and must not break the running subprocess.
+ Mirrors the format of `campaignlib.save_log` — markdown sections, one file
+ per run with a timestamped filename. Failures here are silent; logging is
+ best-effort and must not break the running subprocess. Runs on every exit
+ path (normal, abort, disconnect) via the finally in stream_subprocess.
"""
try:
log_dir = Path(cwd or os.getcwd()) / "logs"
@@ -36,6 +71,7 @@ def _save_run_log(cmd: list[str], cwd: str | None, output: str,
f"# Subprocess run — {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n\n"
f"## Command\n\n```\n{cmd_block}\n```\n\n"
f"## Result\n\n"
+ f"- result: `{result}`\n"
f"- returncode: `{returncode}`\n"
f"- duration: `{duration:.2f}s`\n"
f"- cwd: `{cwd or os.getcwd()}`\n\n"
@@ -55,23 +91,33 @@ async def stream_subprocess(
) -> AsyncGenerator[str, None]:
"""Run a subprocess and yield Server-Sent Events as output arrives.
- Yields SSE-formatted strings:
- - ``data: "text chunk"\\n\\n`` for stdout/stderr output
- - ``event: done\\ndata: {"returncode": N}\\n\\n`` when the process exits
+ Yields SSE-formatted strings (in order):
+ - ``event: command\\ndata: ""\\n\\n`` — distinct named event
+ carrying the secret-free, copyable invocation (US1, FR-001/002/003)
+ - ``data: "$ \\n\\n"\\n\\n`` — legacy inline chunk (back-compat)
+ - ``data: "text chunk"\\n\\n`` — stdout/stderr as produced (US2, FR-004)
+ - ``event: done\\ndata: {...}\\n\\n`` — terminal event (US3, FR-006)
`env_extra` is merged on top of the inherited environment after
- ``PYTHONUNBUFFERED``. Used to inject per-route LLM backend env
- (``DGX_ENDPOINT`` / ``DGX_MODEL``) without leaking it into routes that
- must stay on the default Anthropic path.
-
- `on_complete`, if provided, fires once with the returncode after
- ``proc.wait()`` returns but before the SSE ``done`` event is sent.
- Exceptions are swallowed so a faulty hook can never break the stream.
- Used by the editor routes to append a row to ``activity.jsonl``.
-
- On exit, writes a per-run log file under `/logs/` capturing the
- command line, returncode, duration, and full output so failed runs can
- be reproduced after the browser session is closed.
+ ``PYTHONUNBUFFERED``. Secrets (API keys) are inherited from the server environment,
+ never passed on the command line. cmd_display echoes env_extra, so keep secrets out of it.
+
+ `on_complete`, if provided, fires once with the returncode (or None on
+ abort) from the finally block — always fires on every exit path so that
+ callers (e.g. ensemble.py's _RUNNING lock release) are never orphaned.
+
+ On exit (normal, explicit-abort, or disconnect), writes a per-run log
+ under ``/logs/`` capturing the command, full output, result, and
+ duration — survives browser close (FR-007, SC-006).
+
+ Termination (US4, R1):
+ Subprocess is launched in its own session (``start_new_session=True``) so
+ the whole worker tree is signalable as a group. When the client disconnects
+ (or calls es.close()), Starlette cancels the response task via anyio's
+ cancel scope, which propagates as CancelledError into this generator. The
+ finally block sends SIGTERM to the process group and schedules SIGKILL via
+ loop.call_later instead of awaiting, because any await inside Starlette's cancelled
+ anyio scope would immediately re-raise CancelledError.
"""
env = {**os.environ, "PYTHONUNBUFFERED": "1"}
if env_extra:
@@ -80,44 +126,83 @@ async def stream_subprocess(
env_prefix = " \\\n ".join(f"{k}={v}" for k, v in (env_extra or {}).items())
cmd_parts = ([env_prefix] if env_prefix else []) + list(cmd)
cmd_display = " \\\n ".join(cmd_parts)
- yield f"data: {json.dumps(f'$ {cmd_display}\\n\\n')}\n\n"
- proc = await asyncio.create_subprocess_exec(
- *cmd,
- stdout=asyncio.subprocess.PIPE,
- stderr=asyncio.subprocess.STDOUT,
- cwd=cwd,
- env=env,
- )
-
- assert proc.stdout is not None
+ # proc is initialised here so the finally can reference it even if aclose()
+ # is called before the subprocess starts (e.g. during the command yields).
+ proc: asyncio.subprocess.Process | None = None
buf = ""
captured: list[str] = []
started = time.monotonic()
- while True:
- chunk = await proc.stdout.read(64)
- if not chunk:
- break
- buf += chunk.decode("utf-8", errors="replace")
- if len(buf) >= 20 or "\n" in buf:
+
+ try:
+ # US1: distinct named event for copyable command (FR-001/002/003) — FIRST
+ # These yields are inside the try so that aclose() before subprocess start
+ # still triggers the finally (on_complete / log write).
+ yield f"event: command\ndata: {json.dumps(cmd_display)}\n\n"
+ # Back-compat inline chunk (clients ignoring the command event still see it)
+ yield f"data: {json.dumps(f'$ {cmd_display}\\n\\n')}\n\n"
+
+ proc = await asyncio.create_subprocess_exec(
+ *cmd,
+ stdout=asyncio.subprocess.PIPE,
+ stderr=asyncio.subprocess.STDOUT,
+ cwd=cwd,
+ env=env,
+ start_new_session=True, # own process group → killpg kills child workers (R1)
+ )
+
+ assert proc.stdout is not None
+ while True:
+ chunk = await proc.stdout.read(64)
+ if not chunk:
+ break
+ buf += chunk.decode("utf-8", errors="replace")
+ if len(buf) >= 20 or "\n" in buf:
+ captured.append(buf)
+ yield f"data: {json.dumps(buf)}\n\n"
+ buf = ""
+
+ if buf:
captured.append(buf)
yield f"data: {json.dumps(buf)}\n\n"
- buf = ""
-
- if buf:
- captured.append(buf)
- yield f"data: {json.dumps(buf)}\n\n"
-
- await proc.wait()
- _save_run_log(cmd, cwd, "".join(captured), proc.returncode,
- time.monotonic() - started)
- if on_complete is not None:
- try:
- on_complete(proc.returncode)
- except Exception:
- # Activity recording is opportunistic — never break the stream.
- pass
- yield f"event: done\ndata: {json.dumps({'returncode': proc.returncode})}\n\n"
+
+ await proc.wait()
+
+ finally:
+ # Fires on: normal exit, explicit abort (es.close()), browser disconnect.
+ # Guard on proc/returncode to avoid signaling an already-exited process
+ # or one that was never started (aclose before proc was created).
+ if proc is not None and proc.returncode is None:
+ try:
+ pgid = os.getpgid(proc.pid)
+ _killpg_safe(pgid, signal.SIGTERM)
+ # SIGKILL via call_later — do NOT await here. Starlette delivers
+ # disconnect as anyio cancel-scope cancellation, which makes any
+ # await in this finally re-raise CancelledError immediately.
+ # call_later schedules from the event loop after finally exits,
+ # guaranteeing bounded stop within GRACE_SECONDS (FR-008).
+ loop = asyncio.get_running_loop()
+ loop.call_later(GRACE_SECONDS, _killpg_safe, pgid, signal.SIGKILL)
+ except ProcessLookupError:
+ pass # already exited between the returncode check and getpgid
+
+ returncode = proc.returncode if proc is not None else None
+ result = classify_result(returncode)
+ _save_run_log(cmd, cwd, "".join(captured), returncode, result,
+ time.monotonic() - started)
+ if on_complete is not None:
+ try:
+ on_complete(returncode)
+ except Exception:
+ pass # activity recording is opportunistic — never break the stream
+
+ # Only reached on normal completion (abort/disconnect exits via exception propagation)
+ if proc is not None:
+ result = classify_result(proc.returncode)
+ done_payload: dict[str, object] = {"returncode": proc.returncode}
+ if result == "aborted":
+ done_payload["aborted"] = True
+ yield f"event: done\ndata: {json.dumps(done_payload)}\n\n"
async def sse_error_stream(message: str, returncode: int = 1) -> AsyncGenerator[str, None]:
diff --git a/specs/002-ensemble-run-observability/checklists/requirements.md b/specs/002-ensemble-run-observability/checklists/requirements.md
new file mode 100644
index 0000000..58ebf74
--- /dev/null
+++ b/specs/002-ensemble-run-observability/checklists/requirements.md
@@ -0,0 +1,36 @@
+# Specification Quality Checklist: Ensemble Run Observability
+
+**Purpose**: Validate specification completeness and quality before proceeding to planning
+**Created**: 2026-06-28
+**Feature**: [spec.md](../spec.md)
+
+## Content Quality
+
+- [x] No implementation details (languages, frameworks, APIs)
+- [x] Focused on user value and business needs
+- [x] Written for non-technical stakeholders
+- [x] All mandatory sections completed
+
+## Requirement Completeness
+
+- [x] No [NEEDS CLARIFICATION] markers remain
+- [x] Requirements are testable and unambiguous
+- [x] Success criteria are measurable
+- [x] Success criteria are technology-agnostic (no implementation details)
+- [x] All acceptance scenarios are defined
+- [x] Edge cases are identified
+- [x] Scope is clearly bounded
+- [x] Dependencies and assumptions identified
+
+## Feature Readiness
+
+- [x] All functional requirements have clear acceptance criteria
+- [x] User scenarios cover primary flows
+- [x] Feature meets measurable outcomes defined in Success Criteria
+- [x] No implementation details leak into specification
+
+## Notes
+
+- Items marked incomplete require spec updates before `/speckit-clarify` or `/speckit-plan`
+- All items pass. Four observability needs from the request map 1:1 to user stories US1–US4.
+- The single scope ambiguity in the request — "an extraction" vs. all command-running ensemble stages — is resolved by an informed guess (capability applies to every command-running stage) and documented in Assumptions, so no [NEEDS CLARIFICATION] marker was needed. Reconsider during `/speckit-clarify` if the operator wants extraction-only.
diff --git a/specs/002-ensemble-run-observability/contracts/run-stream.md b/specs/002-ensemble-run-observability/contracts/run-stream.md
new file mode 100644
index 0000000..c6b3fa0
--- /dev/null
+++ b/specs/002-ensemble-run-observability/contracts/run-stream.md
@@ -0,0 +1,64 @@
+# Contract: Ensemble Run Stream + Abort
+
+Applies to all streaming run endpoints under `GET /api/ensemble/run/*`
+(`extract`, `bundle`, `synthesize`, `threads`, `recent-events`, `bundle-list`).
+Transport: Server-Sent Events (`text/event-stream`) over an `EventSource`.
+
+## Request
+
+Unchanged from today: `GET /api/ensemble/run/?`. Params carry the
+explicit input selection and backend/model (Principle X / FR-012). No request body.
+
+## Response: SSE event stream
+
+Events are emitted in this order. **Bold** = new or changed by this feature.
+
+| # | Event | `data` payload | Meaning |
+|---|---|---|---|
+| 1 | **`command`** | JSON string: the secret-free, copyable invocation (env prefix + `python … --flags`) | Emitted once, first. The reproducible command (US1, FR-001/002/003). |
+| 2 | `data` (default) | JSON string: an output chunk | Live stdout/stderr as produced (US2, FR-004). May repeat many times. A precondition failure emits a single readable `data` line here (FR-011). |
+| 3 | `done` | JSON `{ "returncode": N, "error"?: "...", **"aborted"?: true** }` | Terminal. `returncode==0` → success; `>0` → failure; **`aborted:true` or `returncode<0` → aborted** (FR-006/009, SC-004). |
+
+Notes:
+- The legacy inline `$ ` first **`data`** chunk MAY be retained for backward
+ compatibility, but the authoritative copyable command is the **`command`** event.
+- `done.error` carries the human-readable reason for a precondition refusal
+ (e.g. "No chapters selected …"), surfaced to the operator verbatim (FR-011).
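
To illustrate the event table above, here is a minimal sketch of how a non-browser client might decode one SSE frame and apply the `done` rules. The helper names `parse_sse_frame` and `outcome_from_done` are hypothetical and do not appear in this patch; the sketch assumes each frame carries a single JSON payload, as the table describes, and that frames without an `event:` field use the SSE default event name `message`.

```python
import json


def parse_sse_frame(frame: str) -> tuple[str, object]:
    """Split one SSE frame into (event_name, decoded JSON payload).

    Hypothetical helper: assumes the frame's data lines form one JSON value.
    """
    event = "message"  # SSE default when the frame has no `event:` field
    data_lines: list[str] = []
    for line in frame.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips exactly one leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
    return event, json.loads("\n".join(data_lines))


def outcome_from_done(payload: dict) -> str:
    """Apply the contract's rule for the terminal `done` event."""
    returncode = payload.get("returncode")
    if payload.get("aborted") or returncode is None or returncode < 0:
        return "aborted"
    return "succeeded" if returncode == 0 else "failed"
```

A client would feed each blank-line-delimited frame to `parse_sse_frame`, keep the `command` payload for copy/paste, append default-event payloads to its output pane, and stop once `done` arrives.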
+
+## Abort (FR-008) and disconnect (FR-013)
+
+There is **no separate abort endpoint**. Abort is performed by the client
+**closing the stream connection**:
+
+1. Frontend `abort()` calls `EventSource.close()` and sets local `status = aborted`.
+2. The server observes the dropped connection as a cancellation of the streaming
+ generator.
+3. In the generator's `finally`, the server terminates the run's **process group**:
+ `SIGTERM` → wait grace window (~3–5 s) → `SIGKILL` if still alive (FR-008).
+4. The same `finally` releases the per-stage `_RUNNING` lock and writes the durable
+ run record with `result: aborted` (FR-007, FR-009).
+
+A lost tab / navigation / network drop is identical to step 2 onward — it is an
+**implicit abort** (FR-013). The operator never has an unobserved run still burning
+tokens.
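
The SIGTERM, grace window, SIGKILL sequence in step 3 can be sketched in isolation. This is a synchronous, POSIX-only illustration rather than the patch's implementation: it blocks while waiting, whereas `stream_subprocess` schedules the SIGKILL with `loop.call_later` because it cannot await inside a cancelled scope. The name `terminate_group` is hypothetical, and the child is assumed to have been started with `start_new_session=True` so that its process group contains only the run's own workers.

```python
import os
import signal
import subprocess


def _killpg_quietly(pgid: int, sig: int) -> None:
    """Signal a process group, ignoring the case where it is already gone."""
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass


def terminate_group(proc: subprocess.Popen, grace_seconds: float = 4.0) -> int:
    """SIGTERM the child's process group, then SIGKILL it if the grace window expires.

    Hypothetical helper; assumes `proc` was launched with start_new_session=True.
    Returns the child's returncode (negative when it died from a signal).
    """
    if proc.poll() is not None:
        return proc.returncode  # already exited; nothing to signal
    try:
        pgid = os.getpgid(proc.pid)
    except ProcessLookupError:
        return proc.wait()
    _killpg_quietly(pgid, signal.SIGTERM)
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        _killpg_quietly(pgid, signal.SIGKILL)  # bounded stop
        return proc.wait()
```

Signalling the group rather than the single PID is what prevents per-chapter workers from outliving the parent script.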
+
+### Termination guarantees
+
+- **Process-group kill**: child workers (e.g. `ensemble_batch.py`'s per-chapter
+ `ensemble_extract` subprocesses) are launched in the run's session/process group
+ and die with it. No orphaned token-spending workers.
+- **Bounded stop**: force-kill after the grace window guarantees exit within a few
+ seconds (SC-005).
+- **Cache integrity**: any in-flight cache unit is published atomically
+ (temp + `os.replace`), so an abort/force-kill never leaves a partial file the
+ resume check trusts (FR-014). Completed units survive; the interrupted unit is
+ recomputed on re-run (FR-010).
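
The cache-integrity guarantee relies on the temp-file-plus-`os.replace` pattern. A minimal sketch of that pattern follows, under the assumption that a cache unit is a single text file; `publish_atomically` is a hypothetical name and is not taken from the pipeline's code.

```python
import os
import tempfile
from pathlib import Path


def publish_atomically(target: Path, payload: str) -> None:
    """Write payload to a temp file beside target, then rename it into place.

    Hypothetical helper. Readers see either the old file or the complete new
    one, never a partial write, because os.replace is atomic when source and
    destination are on the same filesystem.
    """
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, target)
    except BaseException:
        # Drop the temp file so an interrupted write leaves no debris behind.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

If the process is force-killed before `os.replace` runs, the cleanup never executes and a stray `.tmp` file may remain, but `target` itself is untouched, so a resume check that only trusts the final filename still recomputes the interrupted unit.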
+
+## Backward compatibility
+
+- Non-ensemble run routes and the `/grounding` path need no route-level changes. The
+ termination/`command`-event changes live in the **shared** `subprocess_runner`,
+ so other SSE routes also emit the `command` event and gain disconnect-driven cleanup;
+ their request shapes and existing `data`/`done` events do not otherwise change.
+- A client that ignores the `command` event still receives identical `data`/`done`
+ events.
diff --git a/specs/002-ensemble-run-observability/data-model.md b/specs/002-ensemble-run-observability/data-model.md
new file mode 100644
index 0000000..24e636b
--- /dev/null
+++ b/specs/002-ensemble-run-observability/data-model.md
@@ -0,0 +1,58 @@
+# Phase 1 Data Model: Ensemble Run Observability
+
+This feature adds no database and no persistent schema beyond a file. The "data" is (a) the **Run record** persisted to disk and (b) the **SSE stream protocol** the run emits. Both are described here.
+
+## Entity: Run record (persisted)
+
+One file per run, written under `/logs/_