From 981232adbec2da4aaab46f8092403393f6565d1b Mon Sep 17 00:00:00 2001 From: Ashmit Biswas Date: Sat, 2 May 2026 21:47:21 +0530 Subject: [PATCH 1/2] docs: integration test plan + first end-to-end run findings MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two new docs: - docs/test-plan.md — static, re-runnable manual integration test plan for M0–M5 against the canopy-test workspace. Sections per milestone with concrete steps, expected output, status checkboxes, plus a composite end-to-end scenario. - docs/test-findings.md — record of the first pass against canopy 0.5.0. Captures 9 distinct findings (F-0 through F-9) plus the per-section status. Includes severity, repro, and suggested fix for each. Headline: the MCP/agent-facing surface is healthy across M0–M5; the CLI/human-facing surface has 2 important bugs (F-6, F-7) and 4 small ones (F-1, F-2, F-5, F-9). All M2 augments + M4 historian flows pass end-to-end. M3 bot-tracking is blocked behind setting up CodeRabbit on a test PR (deferred external setup). The asymmetry was invisible in the unit suite because every tested code path went through the action layer's MCP shape — the CLI rendering bugs only surface when you actually type the commands. This is what motivated the test plan in the first place. Action items at the bottom of test-findings.md: - F-7 (P0) — alias resolver provider-aware (CLI dead for non-Linear) - F-6 (P1) — CLI render Issue.to_dict() to match MCP shape - F-1/F-2 (P1) — workspace-tolerant --check, non-zero exit on errors - F-5 (P2) — add cmd_issues for parity with MCP - F-3, F-4 — backlog No code changes here — pure docs. The fixes for F-1/F-2/F-6/F-7 are follow-up PRs based on this run's findings. 
--- docs/test-findings.md | 167 ++++++++++++++++++++++++++++++++++ docs/test-plan.md | 207 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 374 insertions(+) create mode 100644 docs/test-findings.md create mode 100644 docs/test-plan.md diff --git a/docs/test-findings.md b/docs/test-findings.md new file mode 100644 index 0000000..28aeba6 --- /dev/null +++ b/docs/test-findings.md @@ -0,0 +1,167 @@ +# Test Run — 2026-05-02 (canopy 0.5.0 against canopy-test) + +First end-to-end pass against [`docs/test-plan.md`](test-plan.md). Workspace: `~/projects/canopy-test` (test-api + test-ui, GitHub-backed, Linear MCP wired, 8 features in `features.json`). + +**Environment:** canopy 0.5.0 (editable install in `~/projects/canopy/.venv/`), Python 3.14, gh authenticated, Linear MCP token cached at `~/.canopy/mcp-tokens/linear.tokens.json` (last refreshed 2026-04-27). + +**Status legend:** ✅ pass · ⚠️ pass with finding · ❌ fail · ⏭️ skipped/blocked · 🟨 in-progress + +--- + +## Section results + +| Section | Result | Notes | +|---|---|---| +| §0 Preconditions (5) | ✅ 5/5 | Found pre-existing `__version__` drift (`0.1.0` though M0–M5 shipped) — fixed pre-test in [PR #16](https://github.com/ashmitb95/canopy/pull/16). | +| §1 Doctor (5) | ⚠️ 5/5 with 2 findings | Doctor surfaces 8 real workspace issues (all `auto_fixable: true`); 2 minor CLI bugs noted (F-1, F-2). | +| §2a Linear provider | ⚠️ partial | CLI `canopy issue SIN-5` works but exposes raw Linear state ("Todo") instead of canonical ("todo"). MCP `issue_get` correct. F-5 (no plural CLI). F-4 (headless OAuth). | +| §2b GitHub Issues provider | ❌ CLI broken | MCP `issue_get` works perfectly with provider-swapped config. CLI `canopy issue 5` / `#5` / `owner/repo#5` all fail with `unknown_alias` — alias resolver is Linear-only. **F-7 is the headline bug.** | +| §3 Augments (6) | ✅ 6/6 with F-9 | Workspace `preflight_cmd`, per-repo override (per-repo wins), failing-augment graceful, augment-canopy skill installs. 
F-9: `--check` only reports default skill. | +| §4 Bot tracking | ⏭️ blocked | Needs CodeRabbit set up on `ashmitb95/canopy-test-api` PRs — external setup. Throwaway issues #5/#6 used + closed. | +| §5 Historian (11) | ✅ 9/11; 2 blocked | switch ↔ memory round-trip works; decision dedup works; pause + render work; `.gitignore` auto-written; compact noop + drop both work. §5.7 (commit --address auto-mirror) + §5.8 (review_comments auto-mirror) blocked behind §4. | + +--- + +## Findings + +### F-0: `__version__` drift (FIXED pre-test) + +`src/canopy/__init__.py` was stuck at `"0.1.0"` despite M0–M5 shipping. The doctor's `cli_stale` / `mcp_stale` checks compare against this constant — they were silently a no-op for ~6 months of work. + +- **Fix:** [PR #16](https://github.com/ashmitb95/canopy/pull/16) — bumped to `0.5.0`, added `CHANGELOG.md`, added a CLAUDE.md guard. +- **Lesson:** version bump should happen in the same PR as the milestone it represents. Going forward, the CLAUDE.md note covers it. + +### F-1: `canopy setup-agent --check` requires a workspace + +Running `canopy setup-agent --check --json` from a directory without `canopy.toml` errors with "No canopy.toml found." But the skill check is global (`~/.claude/skills/`) and doesn't need a workspace; only the MCP-config check does. + +- **Repro:** `cd / && canopy setup-agent --check` → "Error: No canopy.toml found in current directory or any parent." +- **Expected:** skill section reports normally; MCP section reports "no workspace; skipped" with the same JSON shape. +- **Severity:** low — workaround is to run from inside a workspace. +- **Fix:** in `cmd_setup_agent` (cli/main.py), make the workspace lookup tolerant; if it fails, render the skill section anyway and stub MCP. + +^For F1, we should just show an error and that canopy needs a workspace that is a non-git directory + + +### F-2: error path returns exit code 0 + +When `setup-agent --check` fails because no canopy.toml exists, `echo $?` is `0`. 
Should be non-zero so shell scripts catch it. + +- **Repro:** `canopy setup-agent --check; echo $?` outside a workspace → `Error:` printed, `0` exit. +- **Severity:** medium — breaks shell-script integration; CI scripts that wrap canopy will silently miss errors. +- **Fix:** all error-print branches in `cli/main.py` should `sys.exit(1)` after printing. + +### F-3: stale `canopy-mcp` processes accumulate + +`ps aux | grep canopy-mcp` shows 8+ stale processes from earlier in the week. Each one is a hung MCP server from a previous session that didn't get reaped. + +- **Severity:** low — they're idle (memory ≤96 KB each); not actively harmful. +- **Likely cause:** when an agent / IDE disconnects from MCP without a clean shutdown handshake, the stdio server hangs waiting for stdin. +- **Fix candidate:** doctor could add a `mcp_orphans` check that lists processes whose parent died, with `--clean` to reap them. Or the MCP server could exit on EOF rather than blocking. + +### F-4: Linear MCP from a headless Python invocation hangs + +Running `python -c "from canopy.mcp.server import issue_list_my_issues; issue_list_my_issues()"` from a script never returned. Likely the OAuth flow attempts a browser open + waits for the redirect, with no terminal. + +- **Severity:** low for users (the MCP client is meant to be invoked through Claude Code / canopy CLI, both of which have stdio); medium for testing (we can't headlessly assert the MCP path works against live Linear). +- **Workaround:** for tests, exercise the `LinearProvider` class directly with mocked `call_tool` (already done in the unit suite). + +### F-6: CLI `canopy issue` exposes raw provider state, MCP returns canonical + +`linear_get_issue` in `actions/reads.py` (the legacy wrapper backing `cmd_issue`) intentionally exposes `raw.state` for "backward compatibility." 
Concretely: + +| Surface | `state` for SIN-5 | +|---|---| +| `canopy issue SIN-5 --json` | `"Todo"` (raw Linear) | +| `mcp__canopy__issue_get(alias="SIN-5")` | `"todo"` (canonical M5 mapping) | + +Same workspace, same issue, two different responses depending on which surface you call. Also: CLI shape is `{alias, issue_id, title, state, url, description, raw}`; MCP shape is `{id, identifier, title, description, state, url, assignee, labels, priority, raw}`. Different fields entirely. + +- **Severity:** medium — back-compat reasoning is dated (no current callers actually depend on raw state); the inconsistency is a footgun for anyone scripting against the CLI vs the agent talking via MCP. +- **Fix:** retire the legacy shape; have `cmd_issue` render `Issue.to_dict()` directly (matching the MCP tool). Update `docs/commands.md` accordingly. + +### F-7: alias resolver is Linear-only — `canopy issue` broken for GitHub Issues provider + +With `[issue_provider] name = "github_issues"` set in canopy.toml and a real GH issue (#5) on the configured repo, **none of these CLI invocations work:** + +``` +canopy issue 5 → BlockerError unknown_alias +canopy issue '#5' → BlockerError unknown_alias +canopy issue 'ashmitb95/canopy-test-api#5' → BlockerError unknown_alias +``` + +The MCP equivalent (`mcp__canopy__issue_get(alias="5")`) returns the issue correctly — the `GitHubIssuesProvider` itself works. The bug is in `actions/aliases.py:resolve_linear_id`, which is hardcoded to look for Linear-shaped IDs (`SIN-N`) or feature-lane names. It doesn't know that for `github_issues`, a bare number is the canonical id form. + +This is a **major M5 integration gap**: M5 added the Provider Protocol + registry, but the alias resolution layer above it is still Linear-shaped. The CLI surface for any non-Linear provider is dead. + +- **Severity:** high — `canopy issue` is the primary user surface for the issue provider abstraction; it doesn't work for the second backend M5 was supposed to ship. 
+- **Workaround:** call MCP tool directly (works in agent contexts; not for CLI users). +- **Fix:** rewrite `resolve_linear_id` (rename to `resolve_issue_id`) to consult the active provider for what shapes it accepts. GitHub Issues: bare number, `#N`, `owner/repo#N`, full URL. Linear: `-`, feature names. Provider can expose a `parse_alias(s) -> str | None` method (returns canonical id if accepted, else None); resolver tries provider first, falls back to feature-name lookup. +- **Adjacent:** Phil's branch has `actions/issue_resolver.py` with auto-detect logic (`SIN-N` → Linear, `owner/repo#N` → GitHub, etc.) — could be ported when his PR rebases onto M5. + +### F-2 generalized: BlockerError JSON output → exit 0 + +Same root cause as F-2 but worth re-stating: when `canopy issue 5` returns a BlockerError as JSON, the exit code is still 0. Any shell script wrapping `canopy issue` and checking `$?` will silently miss the failure. This applies to all CLI commands that emit BlockerError JSON. + +- **Severity:** medium — script integration footgun. +- **Fix:** in `cli/main.py`, every code path that prints BlockerError JSON should `sys.exit(1)` after. + +### F-9: `setup-agent --check` only reports the default skill + +`canopy setup-agent --check --json` returns `{skill: {...}, mcp: {...}}` — but `skill` is hardcoded to `using-canopy`. After installing `augment-canopy` via `--skill augment-canopy`, the `--check` output still only reports `using-canopy`'s state. + +- **Severity:** low — install side works correctly; `--check` is just incomplete reporting. +- **Fix:** `check_status` should iterate `available_skills()` and return `skills: [...]` parallel to the install-side report. + +### F-5: no `canopy issues` (plural) CLI command + +The test plan's §2.1 assumed `canopy issues` lists the user's open issues. It doesn't exist — only `canopy issue ` (singular, fetches one). The MCP tool `issue_list_my_issues` exists, but there's no CLI mirror. 
+ +Phil's `extension-rewrite` branch added `canopy issues --json` for exactly this reason (his extension calls it via subprocess for the issue picker). + +- **Severity:** medium — gap between MCP + CLI surface; the test plan + the agent skill both implicitly assume the CLI form exists. +- **Fix candidates:** (a) add `cmd_issues` in `cli/main.py` calling `issue_list_my_issues`; (b) wait for Phil's PR which already has it. + +### Real workspace issues caught by doctor (validation pass for M1) + +Doctor reported 8 issues in `~/projects/canopy-test`, all `auto_fixable: true`: + +| Code | Severity | What | +|---|---|---| +| `heads_stale` × 2 | warn | `heads.json` out of sync for test-api + test-ui (post-checkout hook didn't fire after manual git ops) | +| `worktree_missing` × 4 | error | `features.json` references worktree paths that don't exist on disk (deleted manually): `demo-parallel/test-{api,ui}`, `sin-5-search/test-ui`, `sin-7-empty-state/test-ui` | +| `preflight_stale` | info | preflight result for `doc-1001-paired` test-api is stale | +| `vsix_duplicates` | info | 4 canopy vsix install dirs found in `~/.vscode/extensions/` | + +This is the recovery scenario M1 was built for. Detection works end-to-end against a real workspace. **Did not** run `--fix` yet (4 of these would recreate worktrees on disk; defer until intentional cleanup). + +--- + +## Action items from this run + +- [ ] **F-7 fix (P0)** — make alias resolver provider-aware. The CLI for any non-Linear provider is currently dead. ~half-day. Could compose with Phil's `issue_resolver.py` if his PR lands first. +- [ ] **F-6 fix (P1)** — make `cmd_issue` render `Issue.to_dict()` directly so CLI + MCP agree. Drop the legacy raw-state shape. Update docs/commands.md. ~30 min. +- [ ] **F-1 / F-2 + F-2-generalized fix (P1)** — `cli/main.py` cleanup: workspace-tolerant `--check`, non-zero exit on every error path. ~30 min. 
+- [ ] **F-5 fix (P2)** — add `cmd_issues` for parity with the MCP `issue_list_my_issues`, OR defer to Phil's PR (which has it). +- [ ] **F-3 backlog** — doctor `mcp_orphans` check + reaper. +- [ ] **F-4 backlog** — Linear-headless smoke test using cached tokens; document the OAuth-required-in-tty constraint in docs/mcp.md. +- [ ] **`canopy doctor --fix`** — on a follow-up session, intentionally clean canopy-test's real drift (heads.json + missing worktrees + vsix duplicates) to validate the repair side end-to-end. + +## Test-data cleanup + +- ✅ Throwaway issues #5 + #6 on `ashmitb95/canopy-test-api` closed at end of run. +- ✅ canopy-test workspace canopy.toml restored to original; README.md edits in test-api/test-ui reset. +- 🟡 Memory file `~/projects/canopy-test/.canopy/memory/sin-7-empty-state.{md,jsonl}` left in place — it's gitignored per M4's auto-write, harmless. Cleanup instruction: `rm -rf ~/projects/canopy-test/.canopy/memory/` if a fully fresh state is wanted. + +## Headline takeaway + +**The MCP/agent-facing surface is healthy across M0–M5; the CLI/human-facing surface has 2 important bugs (F-6, F-7) and 4 small ones (F-1, F-2, F-5, F-9).** None are catastrophic — agents using the MCP tools get correct, canonical responses. But human users hitting the CLI directly get raw provider strings, broken alias resolution for non-Linear providers, and exit codes that lie about success. The asymmetry was invisible in the unit suite because every tested code path went through the action layer's MCP shape — the CLI rendering bugs only surface when you actually type the commands. + +--- + +## How to interpret this doc + +- Findings labeled `F-N` are bugs / gaps surfaced by this test run. Each has severity + suggested fix. +- The "Action items" at the bottom are the to-do list for the next session. +- Pass-with-finding (⚠️) means the surface works but reveals a quality issue worth noting. 
+- This file is the *test run record*; the static plan to re-run is at [test-plan.md](test-plan.md). diff --git a/docs/test-plan.md b/docs/test-plan.md new file mode 100644 index 0000000..11e0c23 --- /dev/null +++ b/docs/test-plan.md @@ -0,0 +1,207 @@ +# Canopy — Manual Integration Test Plan + +**Workspace under test:** `~/projects/canopy-test` (2 repos: `canopy-test-api`, `canopy-test-ui`; both backed by GitHub remotes; MCP wired for canopy + Linear). + +**Purpose:** validate every shipped milestone (M0–M5) end-to-end against a real workspace — the unit suite proves modules; this proves the integrated product. Walk through once after each milestone; re-run before each release. + +**Format:** every check has *steps* (concrete commands), *expected* (what passes), and a *status* slot (`[ ]` → `[✓]`/`[✗]`). Skip with `[~]` and a one-line reason. + +--- + +## 0. Pre-conditions + +Run these before anything else. Fail any of these → stop and fix install before testing milestones. + +| # | Check | Steps | Expected | +|---|---|---|---| +| 0.1 | Canopy CLI on PATH | `canopy --version` | Prints a version (e.g. `0.5.0`); no `command not found` | +| 0.2 | MCP entry registered | `cat ~/projects/canopy-test/.mcp.json` | Has a `canopy` server with `CANOPY_ROOT=/Users/ashmit/projects/canopy-test` | +| 0.3 | `gh` authenticated | `gh auth status` | "Logged in to github.com" | +| 0.4 | Linear MCP available | `cat ~/projects/canopy-test/.canopy/mcps.json` | Linear entry present | +| 0.5 | Workspace parses | `cd ~/projects/canopy-test && canopy state --json \| head -5` | Returns JSON, not an exception | + +Status: `[ ]` `[ ]` `[ ]` `[ ]` `[ ]` + +--- + +## 1. M1 — `canopy doctor` (16 categories + version handshake) + +| # | Check | Steps | Expected | +|---|---|---|---| +| 1.1 | Doctor runs clean | `cd ~/projects/canopy-test && canopy doctor --json` | `summary.errors == 0` (or only known/expected ones). Categories cover state, install, mcp, skill, vsix. 
| +| 1.2 | Detects state drift | `mv ~/projects/canopy-test/.canopy/state/heads.json /tmp/heads-bak.json && canopy doctor` | Reports `heads_missing` (or similar). Restore: `mv /tmp/heads-bak.json ~/projects/canopy-test/.canopy/state/heads.json`. | +| 1.3 | Auto-fix recovers | repeat 1.2 then `canopy doctor --fix` | The missing-state issue gets `auto_fixable: true` and is repaired. | +| 1.4 | Version handshake | `canopy --version` and `python -c "from canopy.mcp.server import version; print(version())"` | Both report the same `cli_version` / `mcp_version` / `schema_version`. | +| 1.5 | Skill install report | `canopy setup-agent --check --json` | Skill `installed: true`, `is_canopy_skill: true`, `up_to_date: true`. | + +Status: `[ ]` `[ ]` `[ ]` `[ ]` `[ ]` + +--- + +## 2. M5 — Issue providers (Linear + GitHub Issues) + +### 2a. Linear backend (default — current canopy-test config) + +| # | Check | Steps | Expected | +|---|---|---|---| +| 2.1 | List my Linear issues | `cd ~/projects/canopy-test && canopy issues --json` (MCP variant: `issue_list_my_issues`) | Returns ≥1 issue if any are assigned to you; else `[]`. Each has `id`, `identifier` (e.g. `SIN-7`), `title`, `state`. | +| 2.2 | Fetch a known Linear issue | `canopy issue SIN-5 --json` (MCP: `issue_get(alias="SIN-5")`) | Returns `{identifier: "SIN-5", title, state, url, ...}`. State maps to canonical (`todo` / `in_progress` / `done`). | +| 2.3 | Backward-compat alias | `mcp__canopy__linear_get_issue(alias="SIN-5")` | Same response as 2.2; deprecation note in logs. | +| 2.4 | Per-feature alias resolves | `canopy state SIN-7 --json \| head -20` | Resolves `SIN-7` → `sin-7-empty-state`; returns its state machine entry. | + +Status: `[ ]` `[ ]` `[ ]` `[ ]` + +### 2b. 
GitHub Issues backend (one-off swap) + +| # | Check | Steps | Expected | +|---|---|---|---| +| 2.5 | Switch provider | Edit `canopy-test/canopy.toml`, add `[issue_provider]\nname = "github_issues"\n\n[issue_provider.github_issues]\nrepo = "ashmitb95/canopy-test-api"`. Then `canopy issues --json`. | Returns issues from the GitHub repo (or `[]` if none open). Provider switch with no canopy restart. | +| 2.6 | Fetch GitHub issue | Create or pick a GitHub issue: `gh issue create --repo ashmitb95/canopy-test-api --title "test" --body ""`. Then `canopy issue --json`. | Returns the issue normalized to the same `Issue` shape (no Linear-specific fields leaked). | +| 2.7 | Restore Linear | Remove the `[issue_provider]` block from canopy.toml. `canopy issues --json` falls back to Linear. | No errors; Linear issues again. | + +Status: `[ ]` `[ ]` `[ ]` + +--- + +## 3. M2 — Augments (per-workspace customization) + +| # | Check | Steps | Expected | +|---|---|---|---| +| 3.1 | Empty augments → default preflight | `canopy preflight --json` (in canopy-test root) | Existing pre-commit auto-detection runs; result has `applied_augment: false`. | +| 3.2 | Workspace `preflight_cmd` runs | Edit canopy.toml, add `[augments]\npreflight_cmd = "echo OK && exit 0"`. `canopy preflight --json`. | Output includes `applied_augment: true`, `passed: true`, `command: "echo OK && exit 0"`. | +| 3.3 | Per-repo override | Add `augments = { preflight_cmd = "echo TEST-API && exit 0" }` to the `test-api` `[[repos]]` block. `canopy preflight --json`. | `test-api` runs the override; `test-ui` uses workspace default. | +| 3.4 | Augment skill installs | `canopy setup-agent --skill augment-canopy --check` then `canopy setup-agent --skill augment-canopy` | Reports installed at `~/.claude/skills/augment-canopy/SKILL.md`. | +| 3.5 | Bad command surfaces in result | Set `preflight_cmd = "exit 1"`. Run preflight. | `passed: false`, `applied_augment: true`. No crash. 
| +| 3.6 | Cleanup | Remove the `[augments]` block + per-repo augments | preflight returns to auto-detect (`applied_augment: false`). | + +Status: `[ ]` `[ ]` `[ ]` `[ ]` `[ ]` `[ ]` + +--- + +## 4. M3 — Bot-comment tracking + +**Setup (~10 min, one-time):** install CodeRabbit (or similar bot) on `canopy-test-api`. Open a small PR with a deliberate code-quality issue (unused import, magic number). Wait for the bot to comment. Note the comment ID from the GitHub URL. + +| # | Check | Steps | Expected | +|---|---|---|---| +| 4.1 | Comment id surfaces | `canopy review --comments-only --json` | Each comment has an `id` field (M3 added; should be a non-zero integer). | +| 4.2 | Bot vs human split | `canopy state --json` | `summary.actionable_bot_count >= 1`, `summary.actionable_human_count == 0` (assuming no human reviewers). | +| 4.3 | New state surfaces | Same as 4.2 | `state == "awaiting_bot_resolution"`; `next_actions[0].action == "address_bot_comments"`. | +| 4.4 | `bot-status` rollup | `canopy bot-status --feature --json` | Returns `{repos: {test-api: {pr_number, total: ≥1, resolved: 0, unresolved: ≥1, threads: [...]}}, all_resolved: false}`. Each thread has `id`, `author`, `path`, `body_preview`. | +| 4.5 | `--unresolved-only` filter | `canopy bot-status --feature --unresolved-only --json` | Only unresolved threads listed. | +| 4.6 | `commit --address` (numeric id) | Make a small fix in the repo. `canopy commit --address -m "rename"` | Per-repo result `ok` for the matching repo. Top-level `addressed: {comment_id, repo, sha, recorded: true, ...}`. Commit message in git includes `Addresses bot comment: "" (<url>)`. | +| 4.7 | Resolution persisted | `cat ~/projects/canopy-test/.canopy/state/bot_resolutions.json` | Has `{<id>: {feature, repo, commit_sha, ...}}`. | +| 4.8 | `commit --address` (URL form) | Same as 4.6 but pass full GitHub URL as the address | Same behavior; URL parsed to numeric id. 
| +| 4.9 | Resolved subtracts from count | `canopy state <feature> --json` after 4.6 | `actionable_bot_count` decreased by 1. State drops out of `awaiting_bot_resolution` if it was the last one. | +| 4.10 | Augment-narrowed bots | Add `[augments] review_bots = ["coderabbit"]` to canopy.toml. Re-run `bot-status`. | Same coverage if author is CodeRabbit; non-matching bot accounts (e.g. `dependabot`) drop into the human bucket. | +| 4.11 | Unknown id rejected | `canopy commit --address 999999 -m "x"` (id not in PR) | Errors with `BlockerError(code='not_a_bot_comment')`; no commit fires. | +| 4.12 | Approved + bot threads | If a reviewer approves the PR while bot comments remain: `canopy state` | State stays `approved`, `next_actions[0]` is `merge`, `next_actions[1]` is `address_bot_comments` (secondary CTA). | + +Status: `[ ]` × 12 + +--- + +## 5. M4 — Historian (cross-session memory) + +| # | Check | Steps | Expected | +|---|---|---|---| +| 5.1 | Empty memory on switch | `canopy switch sin-7-empty-state --json` (assuming no historian entries yet) | Response includes `memory: ""`. | +| 5.2 | Record a decision | Via MCP: `mcp__canopy__historian_decide(feature="sin-7-empty-state", decisions=[{"title": "use empty-state SVG from design system", "rationale": "matches existing 404 page"}])` | Returns `{action: "recorded", title: ...}`. File created at `~/projects/canopy-test/.canopy/memory/sin-7-empty-state.jsonl` + `.md`. | +| 5.3 | Decision dedup | Same call again | Returns `{action: "deduped"}`. Only one entry in the JSONL. | +| 5.4 | Pause | `mcp__canopy__historian_pause(feature="sin-7-empty-state", reason="blocked on design-system copy")` | Recorded; appears in Sessions section of the rendered .md. | +| 5.5 | Memory included on switch | `canopy switch sin-6-cache-stats` then `canopy switch sin-7-empty-state` | Second switch's response `memory` field contains the markdown with the decision + pause. 
| +| 5.6 | CLI inspection | `canopy historian show sin-7-empty-state` | Prints the rendered markdown — header, all 3 sections (resolutions/PR/sessions), placeholders for empty sections. | +| 5.7 | Auto-mirror from `commit --address` | After running 4.6, `canopy historian show <feature>` | "Resolutions log" section has the resolved comment (✓ glyph, sha, gist). | +| 5.8 | Auto-mirror from `review_comments` | After running 4.1, `canopy historian show <feature>` | Sessions section has `read comment <id>` entries; if classifier marked threads, also `classifier marked N thread(s) likely-resolved`. | +| 5.9 | Compact (within limit) | `canopy historian compact sin-7-empty-state --keep-sessions 5` | `action: "noop"` (only 1 session so far). | +| 5.10 | Compact (forces drop) | Force multiple sessions: `CANOPY_SESSION_ID=s-1 canopy historian show ...` won't help; instead, in MCP: call `historian_decide` with several sessions in JSONL by hand or wait until natural sessions accumulate. Then `canopy historian compact <f> --keep-sessions 2`. | Drops oldest session entries; preserves resolutions log + PR context. | +| 5.11 | `.gitignore` written | `cat ~/projects/canopy-test/.canopy/memory/.gitignore` | Contains `*` and `!.gitignore` so memory files don't get committed. | + +Status: `[ ]` × 11 + +--- + +## 6. End-to-end scenario (composite) + +One realistic feature lifecycle that exercises every shipped milestone in sequence. Plan ~30 minutes. + +```bash +# Fresh start: pick a feature that doesn't exist yet +cd ~/projects/canopy-test + +# 1. Verify clean install (M1) +canopy doctor + +# 2. Configure: add augments + (optionally) GitHub Issues (M2 + M5) +# Edit canopy.toml: +# [augments] +# preflight_cmd = "echo OK && exit 0" +# review_bots = ["coderabbit"] + +# 3. Pick up a Linear issue → switch (M5 + canonical-slot model) +canopy switch SIN-8 # promotes sin-8-stale-count to canonical +# Response: includes memory: "" on first switch (M4) + +# 4. 
Make a change in test-api +echo "# stale count fix" >> canopy-test-api/src/example.py + +# 5. Preflight runs the augment (M2) +canopy preflight # → "applied_augment: true" + +# 6. Commit (Wave 2.3) +canopy commit -m "fix: stale count edge" + +# 7. Record a decision (M4) +# Via MCP: historian_decide(feature="sin-8-stale-count", +# decisions=[{"title": "compute stale count from cache TTL", +# "rationale": "avoids extra DB call on hot path"}]) + +# 8. Push (Wave 2.3) +canopy push --set-upstream + +# 9. Open PR via gh; wait for CodeRabbit +gh pr create --repo ashmitb95/canopy-test-api --title "..." --body "" + +# 10. After bot comments arrive: state machine surfaces awaiting_bot_resolution (M3) +canopy state SIN-8 + +# 11. Address bot comment (M3 + M4 mirror) +canopy commit --address <id> -m "rename per coderabbit" + +# 12. Switch away then back — verify memory carries the narrative (M4) +canopy switch sin-7-empty-state +canopy switch SIN-8 +# Response.memory shows: decision, the resolved comment, the recorded session + +# 13. Doctor still clean (M1) +canopy doctor +``` + +**Pass criteria:** every step completes without unexpected errors; the state machine transitions match the 9-state diagram in [concepts.md](concepts.md); `historian show SIN-8` at the end has a non-trivial Resolutions log + Sessions narrative. + +--- + +## 7. 
Known unverifiable / deferred + +These are intentionally not testable in v1 — note them but skip: + +| Capability | Why skipped now | Lands when | +|---|---|---| +| Auto-capture of generic Bash/Edit events into historian | PostToolUse hook (autopilot) deferred | Autopilot hook bundle ships | +| Stop-hook tail-parse of `<historian-decisions>` | Stop hook (autopilot) deferred | Autopilot hook bundle ships | +| LLM compaction in `historian_compact` | Mechanical-only in v1 by design | Future LLM pass; storage shape forward-compatible | +| `canopy ship` end-to-end (commit + push + PR) | M8 not shipped | After Phil's `pr_target` + M8 | +| Per-repo PR target | M8 / Phil's branch | After Phil's PR | +| Local-package symlinking on switch | Phil's branch | After Phil's PR | +| Extension dashboard (action drawer) | M11 + Phil's extension rewrite | After both land | +| Sidebar single-tree | M7 / Phil's extension rewrite | After Phil's rewrite | + +--- + +## 8. After running this plan + +1. **Record results** — fill in checkboxes in this file and commit (or paste the diff in a session note). +2. **Triage failures** — each `[✗]` becomes either a bug fix (file an issue), a docs gap (clarify in the relevant SKILL.md / concepts.md), or a known limitation (move into §7). +3. **Repeat per release** — re-run §0–§5 before each version bump; full §6 e2e at major milestones. + +The first full pass is the high-value one — it turns "we shipped a lot of unit-tested code" into "we shipped a working product." 
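Reviewer note on finding F-7 in docs/test-findings.md: the suggested fix has each provider expose `parse_alias(s) -> str | None`, with the resolver trying the provider first and falling back to feature-name lookup. The following is only a minimal Python sketch of that idea. The regexes, the function names `parse_github_alias` and `parse_linear_alias`, the `features` mapping, and the choice to upper-case Linear ids are all assumptions made for illustration, not canopy's actual code.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical patterns; the real providers may accept more or fewer forms.
_GH_URL = re.compile(r"^https://github\.com/([\w.-]+/[\w.-]+)/issues/(\d+)/?$")
_GH_QUALIFIED = re.compile(r"^([\w.-]+/[\w.-]+)#(\d+)$")
_GH_SHORT = re.compile(r"^#?(\d+)$")
_LINEAR_ID = re.compile(r"^[A-Za-z][A-Za-z0-9]*-\d+$")


def parse_github_alias(alias: str, repo: str) -> Optional[str]:
    """Return the bare issue number if `alias` names an issue in `repo`, else None."""
    alias = alias.strip()
    for pattern in (_GH_URL, _GH_QUALIFIED):
        match = pattern.match(alias)
        if match:
            # Reject ids that point at a different repository.
            same_repo = match.group(1).lower() == repo.lower()
            return match.group(2) if same_repo else None
    match = _GH_SHORT.match(alias)
    return match.group(1) if match else None


def parse_linear_alias(alias: str) -> Optional[str]:
    """Return an upper-cased Linear-shaped id (e.g. SIN-5), else None."""
    alias = alias.strip()
    # Upper-casing is an assumption about the canonical form.
    return alias.upper() if _LINEAR_ID.match(alias) else None


def resolve_issue_id(
    alias: str,
    parse_alias: Callable[[str], Optional[str]],
    features: Dict[str, str],
) -> Optional[str]:
    """Ask the active provider first, then fall back to feature-name lookup."""
    canonical = parse_alias(alias)
    if canonical is not None:
        return canonical
    # `features` maps a feature-lane name to its linked issue id (hypothetical shape).
    return features.get(alias)
```

With a resolver shaped like this, the three invocations that fail in F-7 (`5`, `#5`, `ashmitb95/canopy-test-api#5`) would all reduce to the same canonical id before the provider is called, and a Linear workspace would keep resolving feature names through the fallback.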
From 6260efccd3ae5fca0ee7c0a4a3923460915cd6c7 Mon Sep 17 00:00:00 2001
From: Ashmit Biswas <ashmitbbiswas@gmail.com>
Date: Sat, 2 May 2026 21:49:55 +0530
Subject: [PATCH 2/2] =?UTF-8?q?docs(test-findings):=20revise=20F-1=20?=
 =?UTF-8?q?=E2=80=94=20fail=20loud=20with=20workspace=20explainer?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per user decision on the test-findings.md review: don't gracefully
degrade `setup-agent --check` for the no-workspace case. Instead, fail
with a clear error that explains canopy's mental model:

> Canopy needs to be run from a canopy workspace — a non-git directory
> that contains canopy.toml plus the participating repos as subdirs.
> Run `canopy init` in such a directory to create one.

The fix should centralize this rendering so every workspace-scoped
command prints the same helpful message instead of the terse "No
canopy.toml found" each one currently emits.

Bundles cleanly with F-2 + F-2-generalized — same PR adds non-zero exit
codes to every error path.
---
 docs/test-findings.md | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/docs/test-findings.md b/docs/test-findings.md
index 28aeba6..551f6b3 100644
--- a/docs/test-findings.md
+++ b/docs/test-findings.md
@@ -31,17 +31,15 @@ First end-to-end pass against [`docs/test-plan.md`](test-plan.md). Workspace: `~
 - **Fix:** [PR #16](https://github.com/ashmitb95/canopy/pull/16) — bumped to `0.5.0`, added `CHANGELOG.md`, added a CLAUDE.md guard.
 - **Lesson:** version bump should happen in the same PR as the milestone it represents. Going forward, the CLAUDE.md note covers it.
 
-### F-1: `canopy setup-agent --check` requires a workspace
+### F-1: "no canopy.toml found" error is unhelpful
 
-Running `canopy setup-agent --check --json` from a directory without `canopy.toml` errors with "No canopy.toml found." But the skill check is global (`~/.claude/skills/`) and doesn't need a workspace; only the MCP-config check does.
-
-- **Repro:** `cd / && canopy setup-agent --check` → "Error: No canopy.toml found in current directory or any parent."
-- **Expected:** skill section reports normally; MCP section reports "no workspace; skipped" with the same JSON shape.
-- **Severity:** low — workaround is to run from inside a workspace.
-- **Fix:** in `cmd_setup_agent` (cli/main.py), make the workspace lookup tolerant; if it fails, render the skill section anyway and stub MCP.
-
-^For F1, we should just show an error and that canopy needs a workspace that is a non-git directory
+Running any workspace-scoped command (e.g. `canopy setup-agent --check`, `canopy state`, `canopy preflight`) from outside a workspace prints `Error: No canopy.toml found in current directory or any parent.` That's technically true, but doesn't tell a new user *what* a workspace is or *why* canopy can't proceed.
 
+- **Repro:** `cd / && canopy setup-agent --check` → terse "No canopy.toml found" error.
+- **Decision (per user):** **don't gracefully degrade** (e.g. partial setup-agent reports without MCP). Fail loud and clear with an error message that explains canopy's mental model:
+  > Canopy needs to be run from a **canopy workspace** — a non-git directory that contains `canopy.toml` plus the participating repos as subdirectories. Run `canopy init` in such a directory to create one.
+- **Severity:** low individually; medium for new-user friction (this is the first error a fresh install hits).
+- **Fix:** centralize the "no canopy.toml" error rendering in one place (`cli/render.py` or a small helper in `cli/main.py`) so every command that depends on a workspace prints the same, helpful message — and exits non-zero (see F-2).
 
 ### F-2: error path returns exit code 0
 
@@ -141,7 +139,7 @@ This is the recovery scenario M1 was built for. Detection works end-to-end again
 
 - [ ] **F-7 fix (P0)** — make alias resolver provider-aware. The CLI for any non-Linear provider is currently dead. ~half-day. Could compose with Phil's `issue_resolver.py` if his PR lands first.
 - [ ] **F-6 fix (P1)** — make `cmd_issue` render `Issue.to_dict()` directly so CLI + MCP agree. Drop the legacy raw-state shape. Update docs/commands.md. ~30 min.
-- [ ] **F-1 / F-2 + F-2-generalized fix (P1)** — `cli/main.py` cleanup: workspace-tolerant `--check`, non-zero exit on every error path. ~30 min.
+- [ ] **F-1 + F-2 + F-2-generalized fix (P1)** — centralize the "no canopy.toml" error in one helper that prints the workspace-explainer message and exits non-zero. Apply to every workspace-scoped command. Plus: every CLI path that emits a BlockerError JSON should `sys.exit(1)`. ~30 min.
 - [ ] **F-5 fix (P2)** — add `cmd_issues` for parity with the MCP `issue_list_my_issues`, OR defer to Phil's PR (which has it).
 - [ ] **F-3 backlog** — doctor `mcp_orphans` check + reaper.
 - [ ] **F-4 backlog** — Linear-headless smoke test using cached tokens; document the OAuth-required-in-tty constraint in docs/mcp.md.
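Reviewer note on the combined F-1 + F-2 action item: the patch asks for one helper that prints the workspace explainer and exits non-zero, plus a non-zero exit wherever BlockerError JSON is emitted. A rough Python sketch of what that could look like is below. The names `find_workspace_root`, `require_workspace`, and `emit_blocker`, the JSON keys, and the choice of stderr for the human-readable message are assumptions for illustration; the message text is adapted from the wording in this patch, and nothing here is taken from canopy's `cli/main.py`.

```python
import json
import sys
from pathlib import Path
from typing import Optional

# Adapted from the explainer quoted in the F-1 decision.
WORKSPACE_HELP = (
    "Canopy needs to be run from a canopy workspace: a non-git directory that "
    "contains canopy.toml plus the participating repos as subdirectories. "
    "Run `canopy init` in such a directory to create one."
)


def find_workspace_root(start: Path) -> Optional[Path]:
    """Walk up from `start` and return the first directory holding canopy.toml."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / "canopy.toml").is_file():
            return candidate
    return None


def require_workspace(start: Path) -> Path:
    """Return the workspace root, or print the explainer and exit with status 1."""
    root = find_workspace_root(start)
    if root is None:
        # stderr is an assumption; the point is the single shared message.
        print(f"Error: {WORKSPACE_HELP}", file=sys.stderr)
        sys.exit(1)
    return root


def emit_blocker(code: str, message: str) -> None:
    """Print a blocker as JSON, then exit non-zero so `$?` reflects the failure."""
    # The key names are hypothetical; the real BlockerError shape may differ.
    print(json.dumps({"error": code, "message": message}))
    sys.exit(1)
```

Every workspace-scoped command would call `require_workspace(Path.cwd())` as its first step, so the repro in F-2 (`canopy setup-agent --check; echo $?` outside a workspace) would report `1` rather than `0`.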