From 981232adbec2da4aaab46f8092403393f6565d1b Mon Sep 17 00:00:00 2001
From: Ashmit Biswas <ashmitbbiswas@gmail.com>
Date: Sat, 2 May 2026 21:47:21 +0530
Subject: [PATCH 1/2] docs: integration test plan + first end-to-end run
 findings
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two new docs:

- docs/test-plan.md — static, re-runnable manual integration test plan
  for M0–M5 against the canopy-test workspace. Sections per milestone
  with concrete steps, expected output, status checkboxes, plus a
  composite end-to-end scenario.

- docs/test-findings.md — record of the first pass against canopy 0.5.0.
  Captures 9 numbered findings (F-0 through F-9; there is no F-8), a
  generalized form of F-2, and the per-section status. Includes
  severity, repro, and suggested fix for each.

Headline: the MCP/agent-facing surface is healthy across M0–M5; the
CLI/human-facing surface has 2 important bugs (F-6, F-7) and 4 small
ones (F-1, F-2, F-5, F-9). All M2 augments + M4 historian flows pass
end-to-end. M3 bot-tracking is blocked behind setting up CodeRabbit on
a test PR (deferred external setup).

The asymmetry was invisible in the unit suite because every tested
code path went through the action layer's MCP shape — the CLI
rendering bugs only surface when you actually type the commands.
This is what motivated the test plan in the first place.

Action items at the bottom of test-findings.md:
- F-7 (P0) — alias resolver provider-aware (CLI dead for non-Linear)
- F-6 (P1) — CLI render Issue.to_dict() to match MCP shape
- F-1/F-2 (P1) — workspace-tolerant --check, non-zero exit on errors
- F-5 (P2) — add cmd_issues for parity with MCP
- F-3, F-4 — backlog

No code changes here — pure docs. The fixes for F-1/F-2/F-6/F-7 are
follow-up PRs based on this run's findings.
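
For reviewers of the F-7 follow-up, a rough sketch of the direction
described in test-findings.md (provider first, feature-name lookup as
fallback). Everything here is illustrative: the class names, the
regexes, and the `feature_issue_ids` mapping are assumptions, not
canopy's current code, and the GitHub sketch does not check that
`owner/repo` matches the configured repo.

```python
import re
from typing import Dict, Optional, Protocol


class AliasParser(Protocol):
    """Hypothetical slice of the provider interface: alias parsing only."""

    def parse_alias(self, s: str) -> Optional[str]: ...


class LinearAliases:
    # Linear ids look like <TEAM>-<N>, e.g. SIN-5.
    _PATTERN = re.compile(r"^([A-Za-z][A-Za-z0-9]*)-(\d+)$")

    def parse_alias(self, s: str) -> Optional[str]:
        m = self._PATTERN.match(s.strip())
        return f"{m.group(1).upper()}-{m.group(2)}" if m else None


class GitHubIssuesAliases:
    # Accepts a bare number, #N, owner/repo#N, or a full issue URL.
    _PATTERNS = [
        re.compile(r"^#?(\d+)$"),
        re.compile(r"^[\w.-]+/[\w.-]+#(\d+)$"),
        re.compile(r"^https://github\.com/[\w.-]+/[\w.-]+/issues/(\d+)/?$"),
    ]

    def parse_alias(self, s: str) -> Optional[str]:
        for pattern in self._PATTERNS:
            m = pattern.match(s.strip())
            if m:
                return m.group(1)
        return None


def resolve_issue_id(
    alias: str, provider: AliasParser, feature_issue_ids: Dict[str, str]
) -> Optional[str]:
    """Ask the active provider first, then fall back to feature names."""
    canonical = provider.parse_alias(alias)
    if canonical is not None:
        return canonical
    # Assumed shape: feature-lane name mapped to its linked issue id.
    return feature_issue_ids.get(alias)
```

With this shape, the three invocations that fail today (`5`, `#5`,
`owner/repo#5`) would all resolve to the same id under the GitHub
parser, while Linear keeps accepting `SIN-5` and feature names.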
---
 docs/test-findings.md | 167 ++++++++++++++++++++++++++++++++++
 docs/test-plan.md     | 207 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 374 insertions(+)
 create mode 100644 docs/test-findings.md
 create mode 100644 docs/test-plan.md

diff --git a/docs/test-findings.md b/docs/test-findings.md
new file mode 100644
index 0000000..28aeba6
--- /dev/null
+++ b/docs/test-findings.md
@@ -0,0 +1,167 @@
+# Test Run — 2026-05-02 (canopy 0.5.0 against canopy-test)
+
+First end-to-end pass against [`docs/test-plan.md`](test-plan.md). Workspace: `~/projects/canopy-test` (test-api + test-ui, GitHub-backed, Linear MCP wired, 8 features in `features.json`).
+
+**Environment:** canopy 0.5.0 (editable install in `~/projects/canopy/.venv/`), Python 3.14, gh authenticated, Linear MCP token cached at `~/.canopy/mcp-tokens/linear.tokens.json` (last refreshed 2026-04-27).
+
+**Status legend:** ✅ pass · ⚠️ pass with finding · ❌ fail · ⏭️ skipped/blocked · 🟨 in-progress
+
+---
+
+## Section results
+
+| Section | Result | Notes |
+|---|---|---|
+| §0 Preconditions (5) | ✅ 5/5 | Found pre-existing `__version__` drift (`0.1.0` though M0–M5 shipped) — fixed pre-test in [PR #16](https://github.com/ashmitb95/canopy/pull/16). |
+| §1 Doctor (5) | ⚠️ 5/5 with 2 findings | Doctor surfaces 8 real workspace issues (all `auto_fixable: true`); 2 minor CLI bugs noted (F-1, F-2). |
+| §2a Linear provider | ⚠️ partial | CLI `canopy issue SIN-5` works but exposes raw Linear state ("Todo") instead of canonical ("todo") (F-6). MCP `issue_get` is correct. Also F-5 (no plural CLI) and F-4 (headless OAuth). |
+| §2b GitHub Issues provider | ❌ CLI broken | MCP `issue_get` works perfectly with provider-swapped config. CLI `canopy issue 5` / `#5` / `owner/repo#5` all fail with `unknown_alias` — alias resolver is Linear-only. **F-7 is the headline bug.** |
+| §3 Augments (6) | ✅ 6/6 with F-9 | Workspace `preflight_cmd`, per-repo override (per-repo wins), failing-augment graceful, augment-canopy skill installs. F-9: `--check` only reports default skill. |
+| §4 Bot tracking | ⏭️ blocked | Needs CodeRabbit set up on `ashmitb95/canopy-test-api` PRs — external setup. Throwaway issues #5/#6 used + closed. |
+| §5 Historian (11) | ✅ 9/11; 2 blocked | switch ↔ memory round-trip works; decision dedup works; pause + render work; `.gitignore` auto-written; compact noop + drop both work. §5.7 (commit --address auto-mirror) + §5.8 (review_comments auto-mirror) blocked behind §4. |
+
+---
+
+## Findings
+
+### F-0: `__version__` drift (FIXED pre-test)
+
+`src/canopy/__init__.py` was stuck at `"0.1.0"` despite M0–M5 shipping. The doctor's `cli_stale` / `mcp_stale` checks compare against this constant — they were silently a no-op for ~6 months of work.
+
+- **Fix:** [PR #16](https://github.com/ashmitb95/canopy/pull/16) — bumped to `0.5.0`, added `CHANGELOG.md`, added a CLAUDE.md guard.
+- **Lesson:** version bump should happen in the same PR as the milestone it represents. Going forward, the CLAUDE.md note covers it.
+
+### F-1: `canopy setup-agent --check` requires a workspace
+
+Running `canopy setup-agent --check --json` from a directory without `canopy.toml` errors with "No canopy.toml found." But the skill check is global (`~/.claude/skills/`) and doesn't need a workspace; only the MCP-config check does.
+
+- **Repro:** `cd / && canopy setup-agent --check` → "Error: No canopy.toml found in current directory or any parent."
+- **Expected:** skill section reports normally; MCP section reports "no workspace; skipped" with the same JSON shape.
+- **Severity:** low — workaround is to run from inside a workspace.
+- **Fix:** in `cmd_setup_agent` (cli/main.py), make the workspace lookup tolerant; if it fails, render the skill section anyway and stub MCP.
+
+^For F1, we should just show an error and that canopy needs a workspace that is a non-git directory
+
+
+### F-2: error path returns exit code 0
+
+When `setup-agent --check` fails because no canopy.toml exists, `echo $?` is `0`. Should be non-zero so shell scripts catch it.
+
+- **Repro:** `canopy setup-agent --check; echo $?` outside a workspace → `Error:` printed, `0` exit.
+- **Severity:** medium — breaks shell-script integration; CI scripts that wrap canopy will silently miss errors.
+- **Fix:** all error-print branches in `cli/main.py` should `sys.exit(1)` after printing.
+
+### F-3: stale `canopy-mcp` processes accumulate
+
+`ps aux | grep canopy-mcp` shows 8+ stale processes from earlier in the week. Each one is a hung MCP server from a previous session that didn't get reaped.
+
+- **Severity:** low — they're idle (memory ≤96 KB each); not actively harming.
+- **Likely cause:** when an agent / IDE disconnects from MCP without a clean shutdown handshake, the stdio server hangs waiting for stdin.
+- **Fix candidate:** doctor could add a `mcp_orphans` check that lists processes whose parent died, with `--clean` to reap them. Or the MCP server could exit on EOF rather than blocking.
+
+### F-4: Linear MCP from a headless Python invocation hangs
+
+Running `python -c "from canopy.mcp.server import issue_list_my_issues; issue_list_my_issues()"` from a script never returned. Likely the OAuth flow attempts a browser open + waits for the redirect, with no terminal.
+
+- **Severity:** low for users (the MCP client is meant to be invoked through Claude Code / canopy CLI, both of which have stdio); medium for testing (we can't headlessly assert the MCP path works against live Linear).
+- **Workaround:** for tests, exercise the `LinearProvider` class directly with mocked `call_tool` (already done in the unit suite).
+
+### F-6: CLI `canopy issue` exposes raw provider state, MCP returns canonical
+
+`linear_get_issue` in `actions/reads.py` (the legacy wrapper backing `cmd_issue`) intentionally exposes `raw.state` for "backward compatibility." Concretely:
+
+| Surface | `state` for SIN-5 |
+|---|---|
+| `canopy issue SIN-5 --json` | `"Todo"` (raw Linear) |
+| `mcp__canopy__issue_get(alias="SIN-5")` | `"todo"` (canonical M5 mapping) |
+
+Same workspace, same issue, two different responses depending on which surface you call. Also: CLI shape is `{alias, issue_id, title, state, url, description, raw}`; MCP shape is `{id, identifier, title, description, state, url, assignee, labels, priority, raw}`. Different fields entirely.
+
+- **Severity:** medium — back-compat reasoning is dated (no current callers actually depend on raw state); the inconsistency is a footgun for anyone scripting against the CLI vs the agent talking via MCP.
+- **Fix:** retire the legacy shape; have `cmd_issue` render `Issue.to_dict()` directly (matching the MCP tool). Update `docs/commands.md` accordingly.
+
+### F-7: alias resolver is Linear-only — `canopy issue` broken for GitHub Issues provider
+
+With `[issue_provider] name = "github_issues"` set in canopy.toml and a real GH issue (#5) on the configured repo, **none of these CLI invocations work:**
+
+```
+canopy issue 5                                  → BlockerError unknown_alias
+canopy issue '#5'                               → BlockerError unknown_alias
+canopy issue 'ashmitb95/canopy-test-api#5'      → BlockerError unknown_alias
+```
+
+The MCP equivalent (`mcp__canopy__issue_get(alias="5")`) returns the issue correctly — the `GitHubIssuesProvider` itself works. The bug is in `actions/aliases.py:resolve_linear_id`, which is hardcoded to look for Linear-shaped IDs (`SIN-N`) or feature-lane names. It doesn't know that for `github_issues`, a bare number is the canonical id form.
+
+This is a **major M5 integration gap**: M5 added the Provider Protocol + registry, but the alias resolution layer above it is still Linear-shaped. The CLI surface for any non-Linear provider is dead.
+
+- **Severity:** high — `canopy issue` is the primary user surface for the issue provider abstraction; it doesn't work for the second backend M5 was supposed to ship.
+- **Workaround:** call MCP tool directly (works in agent contexts; not for CLI users).
+- **Fix:** rewrite `resolve_linear_id` (rename to `resolve_issue_id`) to consult the active provider for what shapes it accepts. GitHub Issues: bare number, `#N`, `owner/repo#N`, full URL. Linear: `<TEAM>-<N>`, feature names. Provider can expose a `parse_alias(s) -> str | None` method (returns canonical id if accepted, else None); resolver tries provider first, falls back to feature-name lookup.
+- **Adjacent:** Phil's branch has `actions/issue_resolver.py` with auto-detect logic (`SIN-N` → Linear, `owner/repo#N` → GitHub, etc.) — could be ported when his PR rebases onto M5.
+
+### F-2 generalized: BlockerError JSON output → exit 0
+
+Same root cause as F-2 but worth re-stating: when `canopy issue 5` returns a BlockerError as JSON, the exit code is still 0. Any shell script wrapping `canopy issue` and checking `$?` will silently miss the failure. This applies to all CLI commands that emit BlockerError JSON.
+
+- **Severity:** medium — script integration footgun.
+- **Fix:** in `cli/main.py`, every code path that prints BlockerError JSON should `sys.exit(1)` after.
+
+### F-9: `setup-agent --check` only reports the default skill
+
+`canopy setup-agent --check --json` returns `{skill: {...}, mcp: {...}}` — but `skill` is hardcoded to `using-canopy`. After installing `augment-canopy` via `--skill augment-canopy`, the `--check` output still only reports `using-canopy`'s state.
+
+- **Severity:** low — install side works correctly; `--check` is just incomplete reporting.
+- **Fix:** `check_status` should iterate `available_skills()` and return `skills: [...]` parallel to the install-side report.
+
+### F-5: no `canopy issues` (plural) CLI command
+
+The test plan's §2.1 assumed `canopy issues` lists the user's open issues. It doesn't exist — only `canopy issue <alias>` (singular, fetches one). The MCP tool `issue_list_my_issues` exists, but there's no CLI mirror.
+
+Phil's `extension-rewrite` branch added `canopy issues --json` for exactly this reason (his extension calls it via subprocess for the issue picker).
+
+- **Severity:** medium — gap between MCP + CLI surface; the test plan + the agent skill both implicitly assume the CLI form exists.
+- **Fix candidates:** (a) add `cmd_issues` in `cli/main.py` calling `issue_list_my_issues`; (b) wait for Phil's PR which already has it.
+
+### Real workspace issues caught by doctor (validation pass for M1)
+
+Doctor reported 8 issues in `~/projects/canopy-test`, all `auto_fixable: true`:
+
+| Code | Severity | What |
+|---|---|---|
+| `heads_stale` × 2 | warn | `heads.json` out of sync for test-api + test-ui (post-checkout hook didn't fire after manual git ops) |
+| `worktree_missing` × 4 | error | `features.json` references worktree paths that don't exist on disk (deleted manually): `demo-parallel/test-{api,ui}`, `sin-5-search/test-ui`, `sin-7-empty-state/test-ui` |
+| `preflight_stale` | info | preflight result for `doc-1001-paired` test-api is stale |
+| `vsix_duplicates` | info | 4 canopy vsix install dirs found in `~/.vscode/extensions/` |
+
+This is the recovery scenario M1 was built for. Detection works end-to-end against a real workspace. **Did not** run `--fix` yet (4 of these would recreate worktrees on disk; defer until intentional cleanup).
+
+---
+
+## Action items from this run
+
+- [ ] **F-7 fix (P0)** — make alias resolver provider-aware. The CLI for any non-Linear provider is currently dead. ~half-day. Could compose with Phil's `issue_resolver.py` if his PR lands first.
+- [ ] **F-6 fix (P1)** — make `cmd_issue` render `Issue.to_dict()` directly so CLI + MCP agree. Drop the legacy raw-state shape. Update docs/commands.md. ~30 min.
+- [ ] **F-1 / F-2 + F-2-generalized fix (P1)** — `cli/main.py` cleanup: workspace-tolerant `--check`, non-zero exit on every error path. ~30 min.
+- [ ] **F-5 fix (P2)** — add `cmd_issues` for parity with the MCP `issue_list_my_issues`, OR defer to Phil's PR (which has it).
+- [ ] **F-3 backlog** — doctor `mcp_orphans` check + reaper.
+- [ ] **F-4 backlog** — Linear-headless smoke test using cached tokens; document the OAuth-required-in-tty constraint in docs/mcp.md.
+- [ ] **`canopy doctor --fix`** — on a follow-up session, intentionally clean canopy-test's real drift (heads.json + missing worktrees + vsix duplicates) to validate the repair side end-to-end.
+
+## Test-data cleanup
+
+- ✅ Throwaway issues #5 + #6 on `ashmitb95/canopy-test-api` closed at end of run.
+- ✅ canopy-test workspace canopy.toml restored to original; README.md edits in test-api/test-ui reset.
+- 🟡 Memory file `~/projects/canopy-test/.canopy/memory/sin-7-empty-state.{md,jsonl}` left in place — it's gitignored per M4's auto-write, harmless. Cleanup instruction: `rm -rf ~/projects/canopy-test/.canopy/memory/` if a fully fresh state is wanted.
+
+## Headline takeaway
+
+**The MCP/agent-facing surface is healthy across M0–M5; the CLI/human-facing surface has 2 important bugs (F-6, F-7) and 4 small ones (F-1, F-2, F-5, F-9).** None are catastrophic — agents using the MCP tools get correct, canonical responses. But human users hitting the CLI directly get raw provider strings, broken alias resolution for non-Linear providers, and exit codes that lie about success. The asymmetry was invisible in the unit suite because every tested code path went through the action layer's MCP shape — the CLI rendering bugs only surface when you actually type the commands.
+
+---
+
+## How to interpret this doc
+
+- Findings labeled `F-N` are bugs / gaps surfaced by this test run. Each has severity + suggested fix.
+- The "Action items from this run" section above is the to-do list for the next session.
+- Pass-with-finding (⚠️) means the surface works but reveals a quality issue worth noting.
+- This file is the *test run record*; the static plan to re-run is at [test-plan.md](test-plan.md).
diff --git a/docs/test-plan.md b/docs/test-plan.md
new file mode 100644
index 0000000..11e0c23
--- /dev/null
+++ b/docs/test-plan.md
@@ -0,0 +1,207 @@
+# Canopy — Manual Integration Test Plan
+
+**Workspace under test:** `~/projects/canopy-test` (2 repos: `canopy-test-api`, `canopy-test-ui`; both backed by GitHub remotes; MCP wired for canopy + Linear).
+
+**Purpose:** validate every shipped milestone (M0–M5) end-to-end against a real workspace — the unit suite proves modules; this proves the integrated product. Walk through once after each milestone; re-run before each release.
+
+**Format:** every check has *steps* (concrete commands), *expected* (what passes), and a *status* slot (`[ ]` → `[✓]`/`[✗]`). Skip with `[~]` and a one-line reason.
+
+---
+
+## 0. Pre-conditions
+
+Run these before anything else. If any of them fail, stop and fix the install before testing milestones.
+
+| # | Check | Steps | Expected |
+|---|---|---|---|
+| 0.1 | Canopy CLI on PATH | `canopy --version` | Prints a version (e.g. `0.5.0`); no `command not found` |
+| 0.2 | MCP entry registered | `cat ~/projects/canopy-test/.mcp.json` | Has a `canopy` server with `CANOPY_ROOT=/Users/ashmit/projects/canopy-test` |
+| 0.3 | `gh` authenticated | `gh auth status` | "Logged in to github.com" |
+| 0.4 | Linear MCP available | `cat ~/projects/canopy-test/.canopy/mcps.json` | Linear entry present |
+| 0.5 | Workspace parses | `cd ~/projects/canopy-test && canopy state --json \| head -5` | Returns JSON, not an exception |
+
+Status: `[ ]` `[ ]` `[ ]` `[ ]` `[ ]`
+
+---
+
+## 1. M1 — `canopy doctor` (16 categories + version handshake)
+
+| # | Check | Steps | Expected |
+|---|---|---|---|
+| 1.1 | Doctor runs clean | `cd ~/projects/canopy-test && canopy doctor --json` | `summary.errors == 0` (or only known/expected ones). Categories cover state, install, mcp, skill, vsix. |
+| 1.2 | Detects state drift | `mv ~/projects/canopy-test/.canopy/state/heads.json /tmp/heads-bak.json && canopy doctor` | Reports `heads_missing` (or similar). Restore: `mv /tmp/heads-bak.json ~/projects/canopy-test/.canopy/state/heads.json`. |
+| 1.3 | Auto-fix recovers | repeat 1.2 then `canopy doctor --fix` | The missing-state issue gets `auto_fixable: true` and is repaired. |
+| 1.4 | Version handshake | `canopy --version` and `python -c "from canopy.mcp.server import version; print(version())"` | Both report the same `cli_version` / `mcp_version` / `schema_version`. |
+| 1.5 | Skill install report | `canopy setup-agent --check --json` | Skill `installed: true`, `is_canopy_skill: true`, `up_to_date: true`. |
+
+Status: `[ ]` `[ ]` `[ ]` `[ ]` `[ ]`
+
+---
+
+## 2. M5 — Issue providers (Linear + GitHub Issues)
+
+### 2a. Linear backend (default — current canopy-test config)
+
+| # | Check | Steps | Expected |
+|---|---|---|---|
+| 2.1 | List my Linear issues | `cd ~/projects/canopy-test && canopy issues --json` (MCP variant: `issue_list_my_issues`) | Returns ≥1 issue if any are assigned to you; else `[]`. Each has `id`, `identifier` (e.g. `SIN-7`), `title`, `state`. |
+| 2.2 | Fetch a known Linear issue | `canopy issue SIN-5 --json` (MCP: `issue_get(alias="SIN-5")`) | Returns `{identifier: "SIN-5", title, state, url, ...}`. State maps to canonical (`todo` / `in_progress` / `done`). |
+| 2.3 | Backward-compat alias | `mcp__canopy__linear_get_issue(alias="SIN-5")` | Same response as 2.2; deprecation note in logs. |
+| 2.4 | Per-feature alias resolves | `canopy state SIN-7 --json \| head -20` | Resolves `SIN-7` → `sin-7-empty-state`; returns its state machine entry. |
+
+Status: `[ ]` `[ ]` `[ ]` `[ ]`
+
+### 2b. GitHub Issues backend (one-off swap)
+
+| # | Check | Steps | Expected |
+|---|---|---|---|
+| 2.5 | Switch provider | Edit `canopy-test/canopy.toml`, add `[issue_provider]\nname = "github_issues"\n\n[issue_provider.github_issues]\nrepo = "ashmitb95/canopy-test-api"`. Then `canopy issues --json`. | Returns issues from the GitHub repo (or `[]` if none open). Provider switch with no canopy restart. |
+| 2.6 | Fetch GitHub issue | Create or pick a GitHub issue: `gh issue create --repo ashmitb95/canopy-test-api --title "test" --body ""`. Then `canopy issue <num> --json`. | Returns the issue normalized to the same `Issue` shape (no Linear-specific fields leaked). |
+| 2.7 | Restore Linear | Remove the `[issue_provider]` block from canopy.toml. `canopy issues --json` falls back to Linear. | No errors; Linear issues again. |
+
+Status: `[ ]` `[ ]` `[ ]`
+
+---
+
+## 3. M2 — Augments (per-workspace customization)
+
+| # | Check | Steps | Expected |
+|---|---|---|---|
+| 3.1 | Empty augments → default preflight | `canopy preflight --json` (in canopy-test root) | Existing pre-commit auto-detection runs; result has `applied_augment: false`. |
+| 3.2 | Workspace `preflight_cmd` runs | Edit canopy.toml, add `[augments]\npreflight_cmd = "echo OK && exit 0"`. `canopy preflight --json`. | Output includes `applied_augment: true`, `passed: true`, `command: "echo OK && exit 0"`. |
+| 3.3 | Per-repo override | Add `augments = { preflight_cmd = "echo TEST-API && exit 0" }` to the `test-api` `[[repos]]` block. `canopy preflight --json`. | `test-api` runs the override; `test-ui` uses workspace default. |
+| 3.4 | Augment skill installs | `canopy setup-agent --skill augment-canopy --check` then `canopy setup-agent --skill augment-canopy` | Reports installed at `~/.claude/skills/augment-canopy/SKILL.md`. |
+| 3.5 | Bad command surfaces in result | Set `preflight_cmd = "exit 1"`. Run preflight. | `passed: false`, `applied_augment: true`. No crash. |
+| 3.6 | Cleanup | Remove the `[augments]` block + per-repo augments | preflight returns to auto-detect (`applied_augment: false`). |
+
+Status: `[ ]` `[ ]` `[ ]` `[ ]` `[ ]` `[ ]`
+
+---
+
+## 4. M3 — Bot-comment tracking
+
+**Setup (~10 min, one-time):** install CodeRabbit (or similar bot) on `canopy-test-api`. Open a small PR with a deliberate code-quality issue (unused import, magic number). Wait for the bot to comment. Note the comment ID from the GitHub URL.
+
+| # | Check | Steps | Expected |
+|---|---|---|---|
+| 4.1 | Comment id surfaces | `canopy review <feature> --comments-only --json` | Each comment has an `id` field (M3 added; should be a non-zero integer). |
+| 4.2 | Bot vs human split | `canopy state <feature-with-bot-pr> --json` | `summary.actionable_bot_count >= 1`, `summary.actionable_human_count == 0` (assuming no human reviewers). |
+| 4.3 | New state surfaces | Same as 4.2 | `state == "awaiting_bot_resolution"`; `next_actions[0].action == "address_bot_comments"`. |
+| 4.4 | `bot-status` rollup | `canopy bot-status --feature <f> --json` | Returns `{repos: {test-api: {pr_number, total: ≥1, resolved: 0, unresolved: ≥1, threads: [...]}}, all_resolved: false}`. Each thread has `id`, `author`, `path`, `body_preview`. |
+| 4.5 | `--unresolved-only` filter | `canopy bot-status --feature <f> --unresolved-only --json` | Only unresolved threads listed. |
+| 4.6 | `commit --address` (numeric id) | Make a small fix in the repo. `canopy commit --address <comment-id> -m "rename"` | Per-repo result `ok` for the matching repo. Top-level `addressed: {comment_id, repo, sha, recorded: true, ...}`. Commit message in git includes `Addresses bot comment: "<title>" (<url>)`. |
+| 4.7 | Resolution persisted | `cat ~/projects/canopy-test/.canopy/state/bot_resolutions.json` | Has `{<id>: {feature, repo, commit_sha, ...}}`. |
+| 4.8 | `commit --address` (URL form) | Same as 4.6 but pass full GitHub URL as the address | Same behavior; URL parsed to numeric id. |
+| 4.9 | Resolved subtracts from count | `canopy state <feature> --json` after 4.6 | `actionable_bot_count` decreased by 1. State drops out of `awaiting_bot_resolution` if it was the last one. |
+| 4.10 | Augment-narrowed bots | Add `[augments] review_bots = ["coderabbit"]` to canopy.toml. Re-run `bot-status`. | Same coverage if author is CodeRabbit; non-matching bot accounts (e.g. `dependabot`) drop into the human bucket. |
+| 4.11 | Unknown id rejected | `canopy commit --address 999999 -m "x"` (id not in PR) | Errors with `BlockerError(code='not_a_bot_comment')`; no commit fires. |
+| 4.12 | Approved + bot threads | If a reviewer approves the PR while bot comments remain: `canopy state` | State stays `approved`, `next_actions[0]` is `merge`, `next_actions[1]` is `address_bot_comments` (secondary CTA). |
+
+Status: `[ ]` × 12
+
+---
+
+## 5. M4 — Historian (cross-session memory)
+
+| # | Check | Steps | Expected |
+|---|---|---|---|
+| 5.1 | Empty memory on switch | `canopy switch sin-7-empty-state --json` (assuming no historian entries yet) | Response includes `memory: ""`. |
+| 5.2 | Record a decision | Via MCP: `mcp__canopy__historian_decide(feature="sin-7-empty-state", decisions=[{"title": "use empty-state SVG from design system", "rationale": "matches existing 404 page"}])` | Returns `{action: "recorded", title: ...}`. File created at `~/projects/canopy-test/.canopy/memory/sin-7-empty-state.jsonl` + `.md`. |
+| 5.3 | Decision dedup | Same call again | Returns `{action: "deduped"}`. Only one entry in the JSONL. |
+| 5.4 | Pause | `mcp__canopy__historian_pause(feature="sin-7-empty-state", reason="blocked on design-system copy")` | Recorded; appears in Sessions section of the rendered .md. |
+| 5.5 | Memory included on switch | `canopy switch sin-6-cache-stats` then `canopy switch sin-7-empty-state` | Second switch's response `memory` field contains the markdown with the decision + pause. |
+| 5.6 | CLI inspection | `canopy historian show sin-7-empty-state` | Prints the rendered markdown — header, all 3 sections (resolutions/PR/sessions), placeholders for empty sections. |
+| 5.7 | Auto-mirror from `commit --address` | After running 4.6, `canopy historian show <feature>` | "Resolutions log" section has the resolved comment (✓ glyph, sha, gist). |
+| 5.8 | Auto-mirror from `review_comments` | After running 4.1, `canopy historian show <feature>` | Sessions section has `read comment <id>` entries; if classifier marked threads, also `classifier marked N thread(s) likely-resolved`. |
+| 5.9 | Compact (within limit) | `canopy historian compact sin-7-empty-state --keep-sessions 5` | `action: "noop"` (only 1 session so far). |
+| 5.10 | Compact (forces drop) | First create multiple sessions. Setting `CANOPY_SESSION_ID=s-1` on `canopy historian show ...` won't help; instead, call `historian_decide` via MCP from several sessions, add several sessions to the JSONL by hand, or wait until natural sessions accumulate. Then `canopy historian compact <f> --keep-sessions 2`. | Drops oldest session entries; preserves resolutions log + PR context. |
+| 5.11 | `.gitignore` written | `cat ~/projects/canopy-test/.canopy/memory/.gitignore` | Contains `*` and `!.gitignore` so memory files don't get committed. |
+
+Status: `[ ]` × 11
+
+---
+
+## 6. End-to-end scenario (composite)
+
+One realistic feature lifecycle that exercises every shipped milestone in sequence. Plan ~30 minutes.
+
+```bash
+# Fresh start: pick a feature that doesn't exist yet
+cd ~/projects/canopy-test
+
+# 1. Verify clean install (M1)
+canopy doctor
+
+# 2. Configure: add augments + (optionally) GitHub Issues (M2 + M5)
+# Edit canopy.toml:
+#   [augments]
+#   preflight_cmd = "echo OK && exit 0"
+#   review_bots = ["coderabbit"]
+
+# 3. Pick up a Linear issue → switch (M5 + canonical-slot model)
+canopy switch SIN-8   # promotes sin-8-stale-count to canonical
+# Response: includes memory: "" on first switch (M4)
+
+# 4. Make a change in test-api
+echo "# stale count fix" >> canopy-test-api/src/example.py
+
+# 5. Preflight runs the augment (M2)
+canopy preflight   # → "applied_augment: true"
+
+# 6. Commit (Wave 2.3)
+canopy commit -m "fix: stale count edge"
+
+# 7. Record a decision (M4)
+# Via MCP: historian_decide(feature="sin-8-stale-count",
+#   decisions=[{"title": "compute stale count from cache TTL",
+#               "rationale": "avoids extra DB call on hot path"}])
+
+# 8. Push (Wave 2.3)
+canopy push --set-upstream
+
+# 9. Open PR via gh; wait for CodeRabbit
+gh pr create --repo ashmitb95/canopy-test-api --title "..." --body ""
+
+# 10. After bot comments arrive: state machine surfaces awaiting_bot_resolution (M3)
+canopy state SIN-8
+
+# 11. Address bot comment (M3 + M4 mirror)
+canopy commit --address <id> -m "rename per coderabbit"
+
+# 12. Switch away then back — verify memory carries the narrative (M4)
+canopy switch sin-7-empty-state
+canopy switch SIN-8
+# Response.memory shows: decision, the resolved comment, the recorded session
+
+# 13. Doctor still clean (M1)
+canopy doctor
+```
+
+**Pass criteria:** every step completes without unexpected errors; the state machine transitions match the 9-state diagram in [concepts.md](concepts.md); `historian show SIN-8` at the end has a non-trivial Resolutions log + Sessions narrative.
+
+---
+
+## 7. Known unverifiable / deferred
+
+These are intentionally not testable in v1 — note them but skip:
+
+| Capability | Why skipped now | Lands when |
+|---|---|---|
+| Auto-capture of generic Bash/Edit events into historian | PostToolUse hook (autopilot) deferred | Autopilot hook bundle ships |
+| Stop-hook tail-parse of `<historian-decisions>` | Stop hook (autopilot) deferred | Autopilot hook bundle ships |
+| LLM compaction in `historian_compact` | Mechanical-only in v1 by design | Future LLM pass; storage shape forward-compatible |
+| `canopy ship` end-to-end (commit + push + PR) | M8 not shipped | After Phil's `pr_target` + M8 |
+| Per-repo PR target | M8 / Phil's branch | After Phil's PR |
+| Local-package symlinking on switch | Phil's branch | After Phil's PR |
+| Extension dashboard (action drawer) | M11 + Phil's extension rewrite | After both land |
+| Sidebar single-tree | M7 / Phil's extension rewrite | After Phil's rewrite |
+
+---
+
+## 8. After running this plan
+
+1. **Record results** — fill in checkboxes in this file and commit (or paste the diff in a session note).
+2. **Triage failures** — each `[✗]` becomes either a bug fix (file an issue), a docs gap (clarify in the relevant SKILL.md / concepts.md), or a known limitation (move into §7).
+3. **Repeat per release** — re-run §0–§5 before each version bump; full §6 e2e at major milestones.
+
+The first full pass is the high-value one — it turns "we shipped a lot of unit-tested code" into "we shipped a working product."

From 6260efccd3ae5fca0ee7c0a4a3923460915cd6c7 Mon Sep 17 00:00:00 2001
From: Ashmit Biswas <ashmitbbiswas@gmail.com>
Date: Sat, 2 May 2026 21:49:55 +0530
Subject: [PATCH 2/2] =?UTF-8?q?docs(test-findings):=20revise=20F-1=20?=
 =?UTF-8?q?=E2=80=94=20fail=20loud=20with=20workspace=20explainer?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per user decision on the test-findings.md review: don't gracefully
degrade `setup-agent --check` for the no-workspace case. Instead,
fail with a clear error that explains canopy's mental model:

> Canopy needs to be run from a canopy workspace — a non-git directory
> that contains canopy.toml plus the participating repos as subdirs.
> Run `canopy init` in such a directory to create one.

The fix should centralize this rendering so every workspace-scoped
command prints the same helpful message instead of the terse
"No canopy.toml found" each one currently emits.

Bundles cleanly with F-2 + F-2-generalized — same PR adds non-zero
exit codes to every error path.
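
A minimal sketch of what the centralized helper could look like. It
is not the implementation: the function names, the upward search, and
printing to stderr are assumptions, and the message wording is adapted
from the quote above.

```python
import sys
from pathlib import Path
from typing import Optional

# Assumed wording, adapted from the workspace explainer in this commit.
NO_WORKSPACE_MESSAGE = (
    "Error: Canopy needs to be run from a canopy workspace: a non-git\n"
    "directory that contains canopy.toml plus the participating repos as\n"
    "subdirs. Run `canopy init` in such a directory to create one."
)


def find_workspace_root(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` with canopy.toml."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / "canopy.toml").is_file():
            return candidate
    return None


def require_workspace(start: Path) -> Path:
    """Return the workspace root, or print the explainer and exit 1."""
    root = find_workspace_root(start)
    if root is None:
        print(NO_WORKSPACE_MESSAGE, file=sys.stderr)
        sys.exit(1)  # non-zero so wrapping shell scripts see the failure
    return root
```

Each workspace-scoped command would call the hypothetical
`require_workspace(Path.cwd())` once at the top, so the message and
the exit code stay identical across commands.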
---
 docs/test-findings.md | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/docs/test-findings.md b/docs/test-findings.md
index 28aeba6..551f6b3 100644
--- a/docs/test-findings.md
+++ b/docs/test-findings.md
@@ -31,17 +31,15 @@ First end-to-end pass against [`docs/test-plan.md`](test-plan.md). Workspace: `~
 - **Fix:** [PR #16](https://github.com/ashmitb95/canopy/pull/16) — bumped to `0.5.0`, added `CHANGELOG.md`, added a CLAUDE.md guard.
 - **Lesson:** version bump should happen in the same PR as the milestone it represents. Going forward, the CLAUDE.md note covers it.
 
-### F-1: `canopy setup-agent --check` requires a workspace
+### F-1: "no canopy.toml found" error is unhelpful
 
-Running `canopy setup-agent --check --json` from a directory without `canopy.toml` errors with "No canopy.toml found." But the skill check is global (`~/.claude/skills/`) and doesn't need a workspace; only the MCP-config check does.
-
-- **Repro:** `cd / && canopy setup-agent --check` → "Error: No canopy.toml found in current directory or any parent."
-- **Expected:** skill section reports normally; MCP section reports "no workspace; skipped" with the same JSON shape.
-- **Severity:** low — workaround is to run from inside a workspace.
-- **Fix:** in `cmd_setup_agent` (cli/main.py), make the workspace lookup tolerant; if it fails, render the skill section anyway and stub MCP.
-
-^For F1, we should just show an error and that canopy needs a workspace that is a non-git directory
+Running any workspace-scoped command (e.g. `canopy setup-agent --check`, `canopy state`, `canopy preflight`) from outside a workspace prints `Error: No canopy.toml found in current directory or any parent.` That's technically true, but doesn't tell a new user *what* a workspace is or *why* canopy can't proceed.
 
+- **Repro:** `cd / && canopy setup-agent --check` → terse "No canopy.toml found" error.
+- **Decision (per user):** **don't gracefully degrade** (e.g. partial setup-agent reports without MCP). Fail loud and clear with an error message that explains canopy's mental model:
+  > Canopy needs to be run from a **canopy workspace** — a non-git directory that contains `canopy.toml` plus the participating repos as subdirectories. Run `canopy init` in such a directory to create one.
+- **Severity:** low individually; medium for new-user friction (this is the first error a fresh install hits).
+- **Fix:** centralize the "no canopy.toml" error rendering in one place (`cli/render.py` or a small helper in `cli/main.py`) so every command that depends on a workspace prints the same, helpful message — and exits non-zero (see F-2).
 
 ### F-2: error path returns exit code 0
 
@@ -141,7 +139,7 @@ This is the recovery scenario M1 was built for. Detection works end-to-end again
 
 - [ ] **F-7 fix (P0)** — make alias resolver provider-aware. The CLI for any non-Linear provider is currently dead. ~half-day. Could compose with Phil's `issue_resolver.py` if his PR lands first.
 - [ ] **F-6 fix (P1)** — make `cmd_issue` render `Issue.to_dict()` directly so CLI + MCP agree. Drop the legacy raw-state shape. Update docs/commands.md. ~30 min.
-- [ ] **F-1 / F-2 + F-2-generalized fix (P1)** — `cli/main.py` cleanup: workspace-tolerant `--check`, non-zero exit on every error path. ~30 min.
+- [ ] **F-1 + F-2 + F-2-generalized fix (P1)** — centralize the "no canopy.toml" error in one helper that prints the workspace-explainer message and exits non-zero. Apply to every workspace-scoped command. Plus: every CLI path that emits a BlockerError JSON should `sys.exit(1)`. ~30 min.
 - [ ] **F-5 fix (P2)** — add `cmd_issues` for parity with the MCP `issue_list_my_issues`, OR defer to Phil's PR (which has it).
 - [ ] **F-3 backlog** — doctor `mcp_orphans` check + reaper.
 - [ ] **F-4 backlog** — Linear-headless smoke test using cached tokens; document the OAuth-required-in-tty constraint in docs/mcp.md.