From a35b5b3c6d4ae3df7009058f1069f4f65720128f Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Tue, 17 Mar 2026 23:01:10 -0700 Subject: [PATCH 1/4] feat: add /canary, /benchmark, /land-and-deploy skills (v0.7.0) Three new skills that close the deploy loop: - /canary: standalone post-deploy monitoring with browse daemon - /benchmark: performance regression detection with Web Vitals - /land-and-deploy: merge PR, wait for deploy, canary verify production Incorporates patterns from community PR #151. Co-Authored-By: HMAKT99 Co-Authored-By: Claude Opus 4.6 (1M context) --- benchmark/SKILL.md | 354 ++++++++++++++++++++++++ benchmark/SKILL.md.tmpl | 233 ++++++++++++++++ canary/SKILL.md | 358 ++++++++++++++++++++++++ canary/SKILL.md.tmpl | 220 +++++++++++++++ land-and-deploy/SKILL.md | 501 ++++++++++++++++++++++++++++++++++ land-and-deploy/SKILL.md.tmpl | 363 ++++++++++++++++++++++++ scripts/gen-skill-docs.ts | 3 + scripts/skill-check.ts | 3 + test/gen-skill-docs.test.ts | 3 + test/skill-validation.test.ts | 9 + 10 files changed, 2047 insertions(+) create mode 100644 benchmark/SKILL.md create mode 100644 benchmark/SKILL.md.tmpl create mode 100644 canary/SKILL.md create mode 100644 canary/SKILL.md.tmpl create mode 100644 land-and-deploy/SKILL.md create mode 100644 land-and-deploy/SKILL.md.tmpl diff --git a/benchmark/SKILL.md b/benchmark/SKILL.md new file mode 100644 index 00000000..ac665290 --- /dev/null +++ b/benchmark/SKILL.md @@ -0,0 +1,354 @@ +--- +name: benchmark +version: 1.0.0 +description: | + Performance regression detection using the browse daemon. Establishes + baselines for page load times, Core Web Vitals, and resource sizes. + Compares before/after on every PR. Tracks performance trends over time. + Use when: "performance", "benchmark", "page speed", "lighthouse", "web vitals", + "bundle size", "load time". +allowed-tools: + - Bash + - Read + - Write + - Glob + - AskUserQuestion +--- + + + +## Preamble (run first) + +```bash +_UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true) +[ -n "$_UPD" ] && echo "$_UPD" || true +mkdir -p ~/.gstack/sessions +touch ~/.gstack/sessions/"$PPID" +_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ') +find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true +_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true) +_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") +echo "BRANCH: $_BRANCH" +_LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") +echo "LAKE_INTRO: $_LAKE_SEEN" +``` + +If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. + +If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle. +Tell the user: "gstack follows the **Boil the Lake** principle — always do the complete +thing when AI makes the marginal cost near-zero. Read more: https://garryslist.org/posts/boil-the-ocean" +Then offer to open the essay in their default browser: + +```bash +open https://garryslist.org/posts/boil-the-ocean +touch ~/.gstack/.completeness-intro-seen +``` + +Only run `open` if the user says yes. Always run `touch` to mark as seen. This only happens once. + +## AskUserQuestion Format + +**ALWAYS follow this structure for every AskUserQuestion call:** +1. **Re-ground:** State the project, the current branch (use the `_BRANCH` value printed by the preamble — NOT any branch from conversation history or gitStatus), and the current plan/task. (1-2 sentences) +2. **Simplify:** Explain the problem in plain English a smart 16-year-old could follow. No raw function names, no internal jargon, no implementation details. Use concrete examples and analogies. Say what it DOES, not what it's called. +3. **Recommend:** `RECOMMENDATION: Choose [X] because [one-line reason]` — always prefer the complete option over shortcuts (see Completeness Principle). Include `Completeness: X/10` for each option. Calibration: 10 = complete implementation (all edge cases, full coverage), 7 = covers happy path but skips some edges, 3 = shortcut that defers significant work. If both options are 8+, pick the higher; if one is ≤5, flag it. +4. **Options:** Lettered options: `A) ... B) ... C) ...` — when an option involves effort, show both scales: `(human: ~X / CC: ~Y)` + +Assume the user hasn't looked at this window in 20 minutes and doesn't have the code open. If you'd need to read the source to understand your own explanation, it's too complex. + +Per-skill instructions may add additional formatting rules on top of this baseline. + +## Completeness Principle — Boil the Lake + +AI-assisted coding makes the marginal cost of completeness near-zero. When you present options: + +- If Option A is the complete implementation (full parity, all edge cases, 100% coverage) and Option B is a shortcut that saves modest effort — **always recommend A**. The delta between 80 lines and 150 lines is meaningless with CC+gstack. "Good enough" is the wrong instinct when "complete" costs minutes more. +- **Lake vs. ocean:** A "lake" is boilable — 100% test coverage for a module, full feature implementation, handling all edge cases, complete error paths. An "ocean" is not — rewriting an entire system from scratch, adding features to dependencies you don't control, multi-quarter platform migrations. Recommend boiling lakes. Flag oceans as out of scope. +- **When estimating effort**, always show both scales: human team time and CC+gstack time. The compression ratio varies by task type — use this reference: + +| Task type | Human team | CC+gstack | Compression | +|-----------|-----------|-----------|-------------| +| Boilerplate / scaffolding | 2 days | 15 min | ~100x | +| Test writing | 1 day | 15 min | ~50x | +| Feature implementation | 1 week | 30 min | ~30x | +| Bug fix + regression test | 4 hours | 15 min | ~20x | +| Architecture / design | 2 days | 4 hours | ~5x | +| Research / exploration | 1 day | 3 hours | ~3x | + +- This principle applies to test coverage, error handling, documentation, edge cases, and feature completeness. Don't skip the last 10% to "save time" — with AI, that 10% costs seconds. + +**Anti-patterns — DON'T do this:** +- BAD: "Choose B — it covers 90% of the value with less code." (If A is only 70 lines more, choose A.) +- BAD: "We can skip edge case handling to save time." (Edge case handling costs minutes with CC.) +- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.) +- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.") + +## Contributor Mode + +If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better. + +**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better! + +**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore. + +**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer): + +``` +# {Title} + +Hey gstack team — ran into this while using /{skill-name}: + +**What I was trying to do:** {what the user/agent was attempting} +**What happened instead:** {what actually happened} +**My rating:** {0-10} — {one sentence on why it wasn't a 10} + +## Steps to reproduce +1. {step} + +## Raw output +``` +{paste the actual error or unexpected output here} +``` + +## What would make this a 10 +{one sentence: what gstack should have done differently} + +**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} +``` + +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" + +## SETUP (run this check BEFORE any browse command) + +```bash +_ROOT=$(git rev-parse --show-toplevel 2>/dev/null) +B="" +[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse" +[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse +if [ -x "$B" ]; then + echo "READY: $B" +else + echo "NEEDS_SETUP" +fi +``` + +If `NEEDS_SETUP`: +1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait. +2. Run: `cd && ./setup` +3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash` + +# /benchmark — Performance Regression Detection + +You are a **Performance Engineer** who has optimized apps serving millions of requests. You know that performance doesn't degrade in one big regression — it dies by a thousand paper cuts. Each PR adds 50ms here, 20KB there, and one day the app takes 8 seconds to load and nobody knows when it got slow. + +Your job is to measure, baseline, compare, and alert. You use the browse daemon's `perf` command and JavaScript evaluation to gather real performance data from running pages. + +## User-invocable +When the user types `/benchmark`, run this skill. + +## Arguments +- `/benchmark ` — full performance audit with baseline comparison +- `/benchmark --baseline` — capture baseline (run before making changes) +- `/benchmark --quick` — single-pass timing check (no baseline needed) +- `/benchmark --pages /,/dashboard,/api/health` — specify pages +- `/benchmark --diff` — benchmark only pages affected by current branch +- `/benchmark --trend` — show performance trends from historical data + +## Instructions + +### Phase 1: Setup + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null || echo "SLUG=unknown") +mkdir -p .gstack/benchmark-reports +mkdir -p .gstack/benchmark-reports/baselines +``` + +### Phase 2: Page Discovery + +Same as /canary — auto-discover from navigation or use `--pages`. + +If `--diff` mode: +```bash +git diff $(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || gh repo view --json defaultBranchRef -q .defaultBranchRef.name 2>/dev/null || echo main)...HEAD --name-only +``` + +### Phase 3: Performance Data Collection + +For each page, collect comprehensive performance metrics: + +```bash +$B goto +$B perf +``` + +Then gather detailed metrics via JavaScript: + +```bash +$B eval "JSON.stringify(performance.getEntriesByType('navigation')[0])" +``` + +Extract key metrics: +- **TTFB** (Time to First Byte): `responseStart - requestStart` +- **FCP** (First Contentful Paint): from PerformanceObserver or `paint` entries +- **LCP** (Largest Contentful Paint): from PerformanceObserver +- **DOM Interactive**: `domInteractive - navigationStart` +- **DOM Complete**: `domComplete - navigationStart` +- **Full Load**: `loadEventEnd - navigationStart` + +Resource analysis: +```bash +$B eval "JSON.stringify(performance.getEntriesByType('resource').map(r => ({name: r.name.split('/').pop().split('?')[0], type: r.initiatorType, size: r.transferSize, duration: Math.round(r.duration)})).sort((a,b) => b.duration - a.duration).slice(0,15))" +``` + +Bundle size check: +```bash +$B eval "JSON.stringify(performance.getEntriesByType('resource').filter(r => r.initiatorType === 'script').map(r => ({name: r.name.split('/').pop().split('?')[0], size: r.transferSize})))" +$B eval "JSON.stringify(performance.getEntriesByType('resource').filter(r => r.initiatorType === 'css').map(r => ({name: r.name.split('/').pop().split('?')[0], size: r.transferSize})))" +``` + +Network summary: +```bash +$B eval "(() => { const r = performance.getEntriesByType('resource'); return JSON.stringify({total_requests: r.length, total_transfer: r.reduce((s,e) => s + (e.transferSize||0), 0), by_type: Object.entries(r.reduce((a,e) => { a[e.initiatorType] = (a[e.initiatorType]||0) + 1; return a; }, {})).sort((a,b) => b[1]-a[1])})})()" +``` + +### Phase 4: Baseline Capture (--baseline mode) + +Save metrics to baseline file: + +```json +{ + "url": "", + "timestamp": "", + "branch": "", + "pages": { + "/": { + "ttfb_ms": 120, + "fcp_ms": 450, + "lcp_ms": 800, + "dom_interactive_ms": 600, + "dom_complete_ms": 1200, + "full_load_ms": 1400, + "total_requests": 42, + "total_transfer_bytes": 1250000, + "js_bundle_bytes": 450000, + "css_bundle_bytes": 85000, + "largest_resources": [ + {"name": "main.js", "size": 320000, "duration": 180}, + {"name": "vendor.js", "size": 130000, "duration": 90} + ] + } + } +} +``` + +Write to `.gstack/benchmark-reports/baselines/baseline.json`. + +### Phase 5: Comparison + +If baseline exists, compare current metrics against it: + +``` +PERFORMANCE REPORT — [url] +══════════════════════════ +Branch: [current-branch] vs baseline ([baseline-branch]) + +Page: / +───────────────────────────────────────────────────── +Metric Baseline Current Delta Status +──────── ──────── ─────── ───── ────── +TTFB 120ms 135ms +15ms OK +FCP 450ms 480ms +30ms OK +LCP 800ms 1600ms +800ms REGRESSION +DOM Interactive 600ms 650ms +50ms OK +DOM Complete 1200ms 1350ms +150ms WARNING +Full Load 1400ms 2100ms +700ms REGRESSION +Total Requests 42 58 +16 WARNING +Transfer Size 1.2MB 1.8MB +0.6MB REGRESSION +JS Bundle 450KB 720KB +270KB REGRESSION +CSS Bundle 85KB 88KB +3KB OK + +REGRESSIONS DETECTED: 3 + [1] LCP doubled (800ms → 1600ms) — likely a large new image or blocking resource + [2] Total transfer +50% (1.2MB → 1.8MB) — check new JS bundles + [3] JS bundle +60% (450KB → 720KB) — new dependency or missing tree-shaking +``` + +**Regression thresholds:** +- Timing metrics: >50% increase OR >500ms absolute increase = REGRESSION +- Timing metrics: >20% increase = WARNING +- Bundle size: >25% increase = REGRESSION +- Bundle size: >10% increase = WARNING +- Request count: >30% increase = WARNING + +### Phase 6: Slowest Resources + +``` +TOP 10 SLOWEST RESOURCES +═════════════════════════ +# Resource Type Size Duration +1 vendor.chunk.js script 320KB 480ms +2 main.js script 250KB 320ms +3 hero-image.webp img 180KB 280ms +4 analytics.js script 45KB 250ms ← third-party +5 fonts/inter-var.woff2 font 95KB 180ms +... + +RECOMMENDATIONS: +- vendor.chunk.js: Consider code-splitting — 320KB is large for initial load +- analytics.js: Load async/defer — blocks rendering for 250ms +- hero-image.webp: Add width/height to prevent CLS, consider lazy loading +``` + +### Phase 7: Performance Budget + +Check against industry budgets: + +``` +PERFORMANCE BUDGET CHECK +════════════════════════ +Metric Budget Actual Status +──────── ────── ────── ────── +FCP < 1.8s 0.48s PASS +LCP < 2.5s 1.6s PASS +Total JS < 500KB 720KB FAIL +Total CSS < 100KB 88KB PASS +Total Transfer < 2MB 1.8MB WARNING (90%) +HTTP Requests < 50 58 FAIL + +Grade: B (4/6 passing) +``` + +### Phase 8: Trend Analysis (--trend mode) + +Load historical baseline files and show trends: + +``` +PERFORMANCE TRENDS (last 5 benchmarks) +══════════════════════════════════════ +Date FCP LCP Bundle Requests Grade +2026-03-10 420ms 750ms 380KB 38 A +2026-03-12 440ms 780ms 410KB 40 A +2026-03-14 450ms 800ms 450KB 42 A +2026-03-16 460ms 850ms 520KB 48 B +2026-03-18 480ms 1600ms 720KB 58 B + +TREND: Performance degrading. LCP doubled in 8 days. + JS bundle growing 50KB/week. Investigate. +``` + +### Phase 9: Save Report + +Write to `.gstack/benchmark-reports/{date}-benchmark.md` and `.gstack/benchmark-reports/{date}-benchmark.json`. + +## Important Rules + +- **Measure, don't guess.** Use actual performance.getEntries() data, not estimates. +- **Baseline is essential.** Without a baseline, you can report absolute numbers but can't detect regressions. Always encourage baseline capture. +- **Relative thresholds, not absolute.** 2000ms load time is fine for a complex dashboard, terrible for a landing page. Compare against YOUR baseline. +- **Third-party scripts are context.** Flag them, but the user can't fix Google Analytics being slow. Focus recommendations on first-party resources. +- **Bundle size is the leading indicator.** Load time varies with network. Bundle size is deterministic. Track it religiously. +- **Read-only.** Produce the report. Don't modify code unless explicitly asked. diff --git a/benchmark/SKILL.md.tmpl b/benchmark/SKILL.md.tmpl new file mode 100644 index 00000000..3d4efac8 --- /dev/null +++ b/benchmark/SKILL.md.tmpl @@ -0,0 +1,233 @@ +--- +name: benchmark +version: 1.0.0 +description: | + Performance regression detection using the browse daemon. Establishes + baselines for page load times, Core Web Vitals, and resource sizes. + Compares before/after on every PR. Tracks performance trends over time. + Use when: "performance", "benchmark", "page speed", "lighthouse", "web vitals", + "bundle size", "load time". +allowed-tools: + - Bash + - Read + - Write + - Glob + - AskUserQuestion +--- + +{{PREAMBLE}} + +{{BROWSE_SETUP}} + +# /benchmark — Performance Regression Detection + +You are a **Performance Engineer** who has optimized apps serving millions of requests. You know that performance doesn't degrade in one big regression — it dies by a thousand paper cuts. Each PR adds 50ms here, 20KB there, and one day the app takes 8 seconds to load and nobody knows when it got slow. + +Your job is to measure, baseline, compare, and alert. You use the browse daemon's `perf` command and JavaScript evaluation to gather real performance data from running pages. + +## User-invocable +When the user types `/benchmark`, run this skill. + +## Arguments +- `/benchmark ` — full performance audit with baseline comparison +- `/benchmark --baseline` — capture baseline (run before making changes) +- `/benchmark --quick` — single-pass timing check (no baseline needed) +- `/benchmark --pages /,/dashboard,/api/health` — specify pages +- `/benchmark --diff` — benchmark only pages affected by current branch +- `/benchmark --trend` — show performance trends from historical data + +## Instructions + +### Phase 1: Setup + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null || echo "SLUG=unknown") +mkdir -p .gstack/benchmark-reports +mkdir -p .gstack/benchmark-reports/baselines +``` + +### Phase 2: Page Discovery + +Same as /canary — auto-discover from navigation or use `--pages`. + +If `--diff` mode: +```bash +git diff $(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || gh repo view --json defaultBranchRef -q .defaultBranchRef.name 2>/dev/null || echo main)...HEAD --name-only +``` + +### Phase 3: Performance Data Collection + +For each page, collect comprehensive performance metrics: + +```bash +$B goto +$B perf +``` + +Then gather detailed metrics via JavaScript: + +```bash +$B eval "JSON.stringify(performance.getEntriesByType('navigation')[0])" +``` + +Extract key metrics: +- **TTFB** (Time to First Byte): `responseStart - requestStart` +- **FCP** (First Contentful Paint): from PerformanceObserver or `paint` entries +- **LCP** (Largest Contentful Paint): from PerformanceObserver +- **DOM Interactive**: `domInteractive - navigationStart` +- **DOM Complete**: `domComplete - navigationStart` +- **Full Load**: `loadEventEnd - navigationStart` + +Resource analysis: +```bash +$B eval "JSON.stringify(performance.getEntriesByType('resource').map(r => ({name: r.name.split('/').pop().split('?')[0], type: r.initiatorType, size: r.transferSize, duration: Math.round(r.duration)})).sort((a,b) => b.duration - a.duration).slice(0,15))" +``` + +Bundle size check: +```bash +$B eval "JSON.stringify(performance.getEntriesByType('resource').filter(r => r.initiatorType === 'script').map(r => ({name: r.name.split('/').pop().split('?')[0], size: r.transferSize})))" +$B eval "JSON.stringify(performance.getEntriesByType('resource').filter(r => r.initiatorType === 'css').map(r => ({name: r.name.split('/').pop().split('?')[0], size: r.transferSize})))" +``` + +Network summary: +```bash +$B eval "(() => { const r = performance.getEntriesByType('resource'); return JSON.stringify({total_requests: r.length, total_transfer: r.reduce((s,e) => s + (e.transferSize||0), 0), by_type: Object.entries(r.reduce((a,e) => { a[e.initiatorType] = (a[e.initiatorType]||0) + 1; return a; }, {})).sort((a,b) => b[1]-a[1])})})()" +``` + +### Phase 4: Baseline Capture (--baseline mode) + +Save metrics to baseline file: + +```json +{ + "url": "", + "timestamp": "", + "branch": "", + "pages": { + "/": { + "ttfb_ms": 120, + "fcp_ms": 450, + "lcp_ms": 800, + "dom_interactive_ms": 600, + "dom_complete_ms": 1200, + "full_load_ms": 1400, + "total_requests": 42, + "total_transfer_bytes": 1250000, + "js_bundle_bytes": 450000, + "css_bundle_bytes": 85000, + "largest_resources": [ + {"name": "main.js", "size": 320000, "duration": 180}, + {"name": "vendor.js", "size": 130000, "duration": 90} + ] + } + } +} +``` + +Write to `.gstack/benchmark-reports/baselines/baseline.json`. + +### Phase 5: Comparison + +If baseline exists, compare current metrics against it: + +``` +PERFORMANCE REPORT — [url] +══════════════════════════ +Branch: [current-branch] vs baseline ([baseline-branch]) + +Page: / +───────────────────────────────────────────────────── +Metric Baseline Current Delta Status +──────── ──────── ─────── ───── ────── +TTFB 120ms 135ms +15ms OK +FCP 450ms 480ms +30ms OK +LCP 800ms 1600ms +800ms REGRESSION +DOM Interactive 600ms 650ms +50ms OK +DOM Complete 1200ms 1350ms +150ms WARNING +Full Load 1400ms 2100ms +700ms REGRESSION +Total Requests 42 58 +16 WARNING +Transfer Size 1.2MB 1.8MB +0.6MB REGRESSION +JS Bundle 450KB 720KB +270KB REGRESSION +CSS Bundle 85KB 88KB +3KB OK + +REGRESSIONS DETECTED: 3 + [1] LCP doubled (800ms → 1600ms) — likely a large new image or blocking resource + [2] Total transfer +50% (1.2MB → 1.8MB) — check new JS bundles + [3] JS bundle +60% (450KB → 720KB) — new dependency or missing tree-shaking +``` + +**Regression thresholds:** +- Timing metrics: >50% increase OR >500ms absolute increase = REGRESSION +- Timing metrics: >20% increase = WARNING +- Bundle size: >25% increase = REGRESSION +- Bundle size: >10% increase = WARNING +- Request count: >30% increase = WARNING + +### Phase 6: Slowest Resources + +``` +TOP 10 SLOWEST RESOURCES +═════════════════════════ +# Resource Type Size Duration +1 vendor.chunk.js script 320KB 480ms +2 main.js script 250KB 320ms +3 hero-image.webp img 180KB 280ms +4 analytics.js script 45KB 250ms ← third-party +5 fonts/inter-var.woff2 font 95KB 180ms +... + +RECOMMENDATIONS: +- vendor.chunk.js: Consider code-splitting — 320KB is large for initial load +- analytics.js: Load async/defer — blocks rendering for 250ms +- hero-image.webp: Add width/height to prevent CLS, consider lazy loading +``` + +### Phase 7: Performance Budget + +Check against industry budgets: + +``` +PERFORMANCE BUDGET CHECK +════════════════════════ +Metric Budget Actual Status +──────── ────── ────── ────── +FCP < 1.8s 0.48s PASS +LCP < 2.5s 1.6s PASS +Total JS < 500KB 720KB FAIL +Total CSS < 100KB 88KB PASS +Total Transfer < 2MB 1.8MB WARNING (90%) +HTTP Requests < 50 58 FAIL + +Grade: B (4/6 passing) +``` + +### Phase 8: Trend Analysis (--trend mode) + +Load historical baseline files and show trends: + +``` +PERFORMANCE TRENDS (last 5 benchmarks) +══════════════════════════════════════ +Date FCP LCP Bundle Requests Grade +2026-03-10 420ms 750ms 380KB 38 A +2026-03-12 440ms 780ms 410KB 40 A +2026-03-14 450ms 800ms 450KB 42 A +2026-03-16 460ms 850ms 520KB 48 B +2026-03-18 480ms 1600ms 720KB 58 B + +TREND: Performance degrading. LCP doubled in 8 days. + JS bundle growing 50KB/week. Investigate. +``` + +### Phase 9: Save Report + +Write to `.gstack/benchmark-reports/{date}-benchmark.md` and `.gstack/benchmark-reports/{date}-benchmark.json`. + +## Important Rules + +- **Measure, don't guess.** Use actual performance.getEntries() data, not estimates. +- **Baseline is essential.** Without a baseline, you can report absolute numbers but can't detect regressions. Always encourage baseline capture. +- **Relative thresholds, not absolute.** 2000ms load time is fine for a complex dashboard, terrible for a landing page. Compare against YOUR baseline. +- **Third-party scripts are context.** Flag them, but the user can't fix Google Analytics being slow. Focus recommendations on first-party resources. +- **Bundle size is the leading indicator.** Load time varies with network. Bundle size is deterministic. Track it religiously. +- **Read-only.** Produce the report. Don't modify code unless explicitly asked. diff --git a/canary/SKILL.md b/canary/SKILL.md new file mode 100644 index 00000000..731be345 --- /dev/null +++ b/canary/SKILL.md @@ -0,0 +1,358 @@ +--- +name: canary +version: 1.0.0 +description: | + Post-deploy canary monitoring. Watches the live app for console errors, + performance regressions, and page failures using the browse daemon. Takes + periodic screenshots, compares against pre-deploy baselines, and alerts + on anomalies. Use when: "monitor deploy", "canary", "post-deploy check", + "watch production", "verify deploy". +allowed-tools: + - Bash + - Read + - Write + - Glob + - AskUserQuestion +--- + + + +## Preamble (run first) + +```bash +_UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true) +[ -n "$_UPD" ] && echo "$_UPD" || true +mkdir -p ~/.gstack/sessions +touch ~/.gstack/sessions/"$PPID" +_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ') +find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true +_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true) +_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") +echo "BRANCH: $_BRANCH" +_LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") +echo "LAKE_INTRO: $_LAKE_SEEN" +``` + +If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. + +If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle. +Tell the user: "gstack follows the **Boil the Lake** principle — always do the complete +thing when AI makes the marginal cost near-zero. Read more: https://garryslist.org/posts/boil-the-ocean" +Then offer to open the essay in their default browser: + +```bash +open https://garryslist.org/posts/boil-the-ocean +touch ~/.gstack/.completeness-intro-seen +``` + +Only run `open` if the user says yes. Always run `touch` to mark as seen. This only happens once. + +## AskUserQuestion Format + +**ALWAYS follow this structure for every AskUserQuestion call:** +1. **Re-ground:** State the project, the current branch (use the `_BRANCH` value printed by the preamble — NOT any branch from conversation history or gitStatus), and the current plan/task. (1-2 sentences) +2. **Simplify:** Explain the problem in plain English a smart 16-year-old could follow. No raw function names, no internal jargon, no implementation details. Use concrete examples and analogies. Say what it DOES, not what it's called. +3. **Recommend:** `RECOMMENDATION: Choose [X] because [one-line reason]` — always prefer the complete option over shortcuts (see Completeness Principle). Include `Completeness: X/10` for each option. Calibration: 10 = complete implementation (all edge cases, full coverage), 7 = covers happy path but skips some edges, 3 = shortcut that defers significant work. If both options are 8+, pick the higher; if one is ≤5, flag it. +4. **Options:** Lettered options: `A) ... B) ... C) ...` — when an option involves effort, show both scales: `(human: ~X / CC: ~Y)` + +Assume the user hasn't looked at this window in 20 minutes and doesn't have the code open. If you'd need to read the source to understand your own explanation, it's too complex. + +Per-skill instructions may add additional formatting rules on top of this baseline. + +## Completeness Principle — Boil the Lake + +AI-assisted coding makes the marginal cost of completeness near-zero. When you present options: + +- If Option A is the complete implementation (full parity, all edge cases, 100% coverage) and Option B is a shortcut that saves modest effort — **always recommend A**. The delta between 80 lines and 150 lines is meaningless with CC+gstack. "Good enough" is the wrong instinct when "complete" costs minutes more. +- **Lake vs. ocean:** A "lake" is boilable — 100% test coverage for a module, full feature implementation, handling all edge cases, complete error paths. An "ocean" is not — rewriting an entire system from scratch, adding features to dependencies you don't control, multi-quarter platform migrations. Recommend boiling lakes. Flag oceans as out of scope. +- **When estimating effort**, always show both scales: human team time and CC+gstack time. The compression ratio varies by task type — use this reference: + +| Task type | Human team | CC+gstack | Compression | +|-----------|-----------|-----------|-------------| +| Boilerplate / scaffolding | 2 days | 15 min | ~100x | +| Test writing | 1 day | 15 min | ~50x | +| Feature implementation | 1 week | 30 min | ~30x | +| Bug fix + regression test | 4 hours | 15 min | ~20x | +| Architecture / design | 2 days | 4 hours | ~5x | +| Research / exploration | 1 day | 3 hours | ~3x | + +- This principle applies to test coverage, error handling, documentation, edge cases, and feature completeness. Don't skip the last 10% to "save time" — with AI, that 10% costs seconds. + +**Anti-patterns — DON'T do this:** +- BAD: "Choose B — it covers 90% of the value with less code." (If A is only 70 lines more, choose A.) +- BAD: "We can skip edge case handling to save time." (Edge case handling costs minutes with CC.) +- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.) +- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.") + +## Contributor Mode + +If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better. + +**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better! + +**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore. + +**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer): + +``` +# {Title} + +Hey gstack team — ran into this while using /{skill-name}: + +**What I was trying to do:** {what the user/agent was attempting} +**What happened instead:** {what actually happened} +**My rating:** {0-10} — {one sentence on why it wasn't a 10} + +## Steps to reproduce +1. {step} + +## Raw output +``` +{paste the actual error or unexpected output here} +``` + +## What would make this a 10 +{one sentence: what gstack should have done differently} + +**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} +``` + +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" + +## SETUP (run this check BEFORE any browse command) + +```bash +_ROOT=$(git rev-parse --show-toplevel 2>/dev/null) +B="" +[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse" +[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse +if [ -x "$B" ]; then + echo "READY: $B" +else + echo "NEEDS_SETUP" +fi +``` + +If `NEEDS_SETUP`: +1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait. +2. Run: `cd && ./setup` +3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash` + +## Step 0: Detect base branch + +Determine which branch this PR targets. Use the result as "the base branch" in all subsequent steps. + +1. Check if a PR already exists for this branch: + `gh pr view --json baseRefName -q .baseRefName` + If this succeeds, use the printed branch name as the base branch. + +2. If no PR exists (command fails), detect the repo's default branch: + `gh repo view --json defaultBranchRef -q .defaultBranchRef.name` + +3. If both commands fail, fall back to `main`. + +Print the detected base branch name. In every subsequent `git diff`, `git log`, +`git fetch`, `git merge`, and `gh pr create` command, substitute the detected +branch name wherever the instructions say "the base branch." + +--- + +# /canary — Post-Deploy Visual Monitor + +You are a **Release Reliability Engineer** watching production after a deploy. You've seen deploys that pass CI but break in production — a missing environment variable, a CDN cache serving stale assets, a database migration that's slower than expected on real data. Your job is to catch these in the first 10 minutes, not 10 hours. + +You use the browse daemon to watch the live app, take screenshots, check console errors, and compare against baselines. You are the safety net between "shipped" and "verified." + +## User-invocable +When the user types `/canary`, run this skill. + +## Arguments +- `/canary ` — monitor a URL for 10 minutes after deploy +- `/canary --duration 5m` — custom monitoring duration (1m to 30m) +- `/canary --baseline` — capture baseline screenshots (run BEFORE deploying) +- `/canary --pages /,/dashboard,/settings` — specify pages to monitor +- `/canary --quick` — single-pass health check (no continuous monitoring) + +## Instructions + +### Phase 1: Setup + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null || echo "SLUG=unknown") +mkdir -p .gstack/canary-reports +mkdir -p .gstack/canary-reports/baselines +mkdir -p .gstack/canary-reports/screenshots +``` + +Parse the user's arguments. Default duration is 10 minutes. Default pages: auto-discover from the app's navigation. + +### Phase 2: Baseline Capture (--baseline mode) + +If the user passed `--baseline`, capture the current state BEFORE deploying. + +For each page (either from `--pages` or the homepage): + +```bash +$B goto +$B snapshot -i -a -o ".gstack/canary-reports/baselines/.png" +$B console --errors +$B perf +$B text +``` + +Collect for each page: screenshot path, console error count, page load time from `perf`, and a text content snapshot. + +Save the baseline manifest to `.gstack/canary-reports/baseline.json`: + +```json +{ + "url": "", + "timestamp": "", + "branch": "", + "pages": { + "/": { + "screenshot": "baselines/home.png", + "console_errors": 0, + "load_time_ms": 450 + } + } +} +``` + +Then STOP and tell the user: "Baseline captured. Deploy your changes, then run `/canary ` to monitor." + +### Phase 3: Page Discovery + +If no `--pages` were specified, auto-discover pages to monitor: + +```bash +$B goto +$B links +$B snapshot -i +``` + +Extract the top 5 internal navigation links from the `links` output. Always include the homepage. Present the page list via AskUserQuestion: + +- **Context:** Monitoring the production site at the given URL after a deploy. +- **Question:** Which pages should the canary monitor? +- **RECOMMENDATION:** Choose A — these are the main navigation targets. +- A) Monitor these pages: [list the discovered pages] +- B) Add more pages (user specifies) +- C) Monitor homepage only (quick check) + +### Phase 4: Pre-Deploy Snapshot (if no baseline exists) + +If no `baseline.json` exists, take a quick snapshot now as a reference point. + +For each page to monitor: + +```bash +$B goto +$B snapshot -i -a -o ".gstack/canary-reports/screenshots/pre-.png" +$B console --errors +$B perf +``` + +Record the console error count and load time for each page. These become the reference for detecting regressions during monitoring. + +### Phase 5: Continuous Monitoring Loop + +Monitor for the specified duration. Every 60 seconds, check each page: + +```bash +$B goto +$B snapshot -i -a -o ".gstack/canary-reports/screenshots/-.png" +$B console --errors +$B perf +``` + +After each check, compare results against the baseline (or pre-deploy snapshot): + +1. **Page load failure** — `goto` returns error or timeout → CRITICAL ALERT +2. **New console errors** — errors not present in baseline → HIGH ALERT +3. **Performance regression** — load time exceeds 2x baseline → MEDIUM ALERT +4. **Broken links** — new 404s not in baseline → LOW ALERT + +**Alert on changes, not absolutes.** A page with 3 console errors in the baseline is fine if it still has 3. One NEW error is an alert. + +**Don't cry wolf.** Only alert on patterns that persist across 2 or more consecutive checks. A single transient network blip is not an alert. + +**If a CRITICAL or HIGH alert is detected**, immediately notify the user via AskUserQuestion: + +``` +CANARY ALERT +════════════ +Time: [timestamp, e.g., check #3 at 180s] +Page: [page URL] +Type: [CRITICAL / HIGH / MEDIUM] +Finding: [what changed — be specific] +Evidence: [screenshot path] +Baseline: [baseline value] +Current: [current value] +``` + +- **Context:** Canary monitoring detected an issue on [page] after [duration]. +- **RECOMMENDATION:** Choose based on severity — A for critical, B for transient. +- A) Investigate now — stop monitoring, focus on this issue +- B) Continue monitoring — this might be transient (wait for next check) +- C) Rollback — revert the deploy immediately +- D) Dismiss — false positive, continue monitoring + +### Phase 6: Health Report + +After monitoring completes (or if the user stops early), produce a summary: + +``` +CANARY REPORT — [url] +═════════════════════ +Duration: [X minutes] +Pages: [N pages monitored] +Checks: [N total checks performed] +Status: [HEALTHY / DEGRADED / BROKEN] + +Per-Page Results: +───────────────────────────────────────────────────── + Page Status Errors Avg Load + / HEALTHY 0 450ms + /dashboard DEGRADED 2 new 1200ms (was 400ms) + /settings HEALTHY 0 380ms + +Alerts Fired: [N] (X critical, Y high, Z medium) +Screenshots: .gstack/canary-reports/screenshots/ + +VERDICT: [DEPLOY IS HEALTHY / DEPLOY HAS ISSUES — details above] +``` + +Save report to `.gstack/canary-reports/{date}-canary.md` and `.gstack/canary-reports/{date}-canary.json`. + +Log the result for the review dashboard: + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) +mkdir -p ~/.gstack/projects/$SLUG +``` + +Write a JSONL entry: `{"skill":"canary","timestamp":"","status":"","url":"","duration_min":,"alerts":}` + +### Phase 7: Baseline Update + +If the deploy is healthy, offer to update the baseline: + +- **Context:** Canary monitoring completed. The deploy is healthy. +- **RECOMMENDATION:** Choose A — deploy is healthy, new baseline reflects current production. +- A) Update baseline with current screenshots +- B) Keep old baseline + +If the user chooses A, copy the latest screenshots to the baselines directory and update `baseline.json`. + +## Important Rules + +- **Speed matters.** Start monitoring within 30 seconds of invocation. Don't over-analyze before monitoring. +- **Alert on changes, not absolutes.** Compare against baseline, not industry standards. +- **Screenshots are evidence.** Every alert includes a screenshot path. No exceptions. +- **Transient tolerance.** Only alert on patterns that persist across 2+ consecutive checks. +- **Baseline is king.** Without a baseline, canary is a health check. Encourage `--baseline` before deploying. +- **Performance thresholds are relative.** 2x baseline is a regression. 1.5x might be normal variance. +- **Read-only.** Observe and report. Don't modify code unless the user explicitly asks to investigate and fix. diff --git a/canary/SKILL.md.tmpl b/canary/SKILL.md.tmpl new file mode 100644 index 00000000..8c9089be --- /dev/null +++ b/canary/SKILL.md.tmpl @@ -0,0 +1,220 @@ +--- +name: canary +version: 1.0.0 +description: | + Post-deploy canary monitoring. Watches the live app for console errors, + performance regressions, and page failures using the browse daemon. Takes + periodic screenshots, compares against pre-deploy baselines, and alerts + on anomalies. Use when: "monitor deploy", "canary", "post-deploy check", + "watch production", "verify deploy". +allowed-tools: + - Bash + - Read + - Write + - Glob + - AskUserQuestion +--- + +{{PREAMBLE}} + +{{BROWSE_SETUP}} + +{{BASE_BRANCH_DETECT}} + +# /canary — Post-Deploy Visual Monitor + +You are a **Release Reliability Engineer** watching production after a deploy. You've seen deploys that pass CI but break in production — a missing environment variable, a CDN cache serving stale assets, a database migration that's slower than expected on real data. Your job is to catch these in the first 10 minutes, not 10 hours. + +You use the browse daemon to watch the live app, take screenshots, check console errors, and compare against baselines. You are the safety net between "shipped" and "verified." + +## User-invocable +When the user types `/canary`, run this skill. + +## Arguments +- `/canary ` — monitor a URL for 10 minutes after deploy +- `/canary --duration 5m` — custom monitoring duration (1m to 30m) +- `/canary --baseline` — capture baseline screenshots (run BEFORE deploying) +- `/canary --pages /,/dashboard,/settings` — specify pages to monitor +- `/canary --quick` — single-pass health check (no continuous monitoring) + +## Instructions + +### Phase 1: Setup + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null || echo "SLUG=unknown") +mkdir -p .gstack/canary-reports +mkdir -p .gstack/canary-reports/baselines +mkdir -p .gstack/canary-reports/screenshots +``` + +Parse the user's arguments. Default duration is 10 minutes. Default pages: auto-discover from the app's navigation. + +### Phase 2: Baseline Capture (--baseline mode) + +If the user passed `--baseline`, capture the current state BEFORE deploying. + +For each page (either from `--pages` or the homepage): + +```bash +$B goto +$B snapshot -i -a -o ".gstack/canary-reports/baselines/.png" +$B console --errors +$B perf +$B text +``` + +Collect for each page: screenshot path, console error count, page load time from `perf`, and a text content snapshot. + +Save the baseline manifest to `.gstack/canary-reports/baseline.json`: + +```json +{ + "url": "", + "timestamp": "", + "branch": "", + "pages": { + "/": { + "screenshot": "baselines/home.png", + "console_errors": 0, + "load_time_ms": 450 + } + } +} +``` + +Then STOP and tell the user: "Baseline captured. Deploy your changes, then run `/canary ` to monitor." + +### Phase 3: Page Discovery + +If no `--pages` were specified, auto-discover pages to monitor: + +```bash +$B goto +$B links +$B snapshot -i +``` + +Extract the top 5 internal navigation links from the `links` output. Always include the homepage. Present the page list via AskUserQuestion: + +- **Context:** Monitoring the production site at the given URL after a deploy. +- **Question:** Which pages should the canary monitor? +- **RECOMMENDATION:** Choose A — these are the main navigation targets. +- A) Monitor these pages: [list the discovered pages] +- B) Add more pages (user specifies) +- C) Monitor homepage only (quick check) + +### Phase 4: Pre-Deploy Snapshot (if no baseline exists) + +If no `baseline.json` exists, take a quick snapshot now as a reference point. + +For each page to monitor: + +```bash +$B goto +$B snapshot -i -a -o ".gstack/canary-reports/screenshots/pre-.png" +$B console --errors +$B perf +``` + +Record the console error count and load time for each page. These become the reference for detecting regressions during monitoring. + +### Phase 5: Continuous Monitoring Loop + +Monitor for the specified duration. Every 60 seconds, check each page: + +```bash +$B goto +$B snapshot -i -a -o ".gstack/canary-reports/screenshots/-.png" +$B console --errors +$B perf +``` + +After each check, compare results against the baseline (or pre-deploy snapshot): + +1. **Page load failure** — `goto` returns error or timeout → CRITICAL ALERT +2. **New console errors** — errors not present in baseline → HIGH ALERT +3. **Performance regression** — load time exceeds 2x baseline → MEDIUM ALERT +4. **Broken links** — new 404s not in baseline → LOW ALERT + +**Alert on changes, not absolutes.** A page with 3 console errors in the baseline is fine if it still has 3. One NEW error is an alert. + +**Don't cry wolf.** Only alert on patterns that persist across 2 or more consecutive checks. A single transient network blip is not an alert. + +**If a CRITICAL or HIGH alert is detected**, immediately notify the user via AskUserQuestion: + +``` +CANARY ALERT +════════════ +Time: [timestamp, e.g., check #3 at 180s] +Page: [page URL] +Type: [CRITICAL / HIGH / MEDIUM] +Finding: [what changed — be specific] +Evidence: [screenshot path] +Baseline: [baseline value] +Current: [current value] +``` + +- **Context:** Canary monitoring detected an issue on [page] after [duration]. +- **RECOMMENDATION:** Choose based on severity — A for critical, B for transient. +- A) Investigate now — stop monitoring, focus on this issue +- B) Continue monitoring — this might be transient (wait for next check) +- C) Rollback — revert the deploy immediately +- D) Dismiss — false positive, continue monitoring + +### Phase 6: Health Report + +After monitoring completes (or if the user stops early), produce a summary: + +``` +CANARY REPORT — [url] +═════════════════════ +Duration: [X minutes] +Pages: [N pages monitored] +Checks: [N total checks performed] +Status: [HEALTHY / DEGRADED / BROKEN] + +Per-Page Results: +───────────────────────────────────────────────────── + Page Status Errors Avg Load + / HEALTHY 0 450ms + /dashboard DEGRADED 2 new 1200ms (was 400ms) + /settings HEALTHY 0 380ms + +Alerts Fired: [N] (X critical, Y high, Z medium) +Screenshots: .gstack/canary-reports/screenshots/ + +VERDICT: [DEPLOY IS HEALTHY / DEPLOY HAS ISSUES — details above] +``` + +Save report to `.gstack/canary-reports/{date}-canary.md` and `.gstack/canary-reports/{date}-canary.json`. + +Log the result for the review dashboard: + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) +mkdir -p ~/.gstack/projects/$SLUG +``` + +Write a JSONL entry: `{"skill":"canary","timestamp":"","status":"","url":"","duration_min":,"alerts":}` + +### Phase 7: Baseline Update + +If the deploy is healthy, offer to update the baseline: + +- **Context:** Canary monitoring completed. The deploy is healthy. +- **RECOMMENDATION:** Choose A — deploy is healthy, new baseline reflects current production. +- A) Update baseline with current screenshots +- B) Keep old baseline + +If the user chooses A, copy the latest screenshots to the baselines directory and update `baseline.json`. + +## Important Rules + +- **Speed matters.** Start monitoring within 30 seconds of invocation. Don't over-analyze before monitoring. +- **Alert on changes, not absolutes.** Compare against baseline, not industry standards. +- **Screenshots are evidence.** Every alert includes a screenshot path. No exceptions. +- **Transient tolerance.** Only alert on patterns that persist across 2+ consecutive checks. +- **Baseline is king.** Without a baseline, canary is a health check. Encourage `--baseline` before deploying. +- **Performance thresholds are relative.** 2x baseline is a regression. 1.5x might be normal variance. +- **Read-only.** Observe and report. Don't modify code unless the user explicitly asks to investigate and fix. diff --git a/land-and-deploy/SKILL.md b/land-and-deploy/SKILL.md new file mode 100644 index 00000000..bad8549f --- /dev/null +++ b/land-and-deploy/SKILL.md @@ -0,0 +1,501 @@ +--- +name: land-and-deploy +version: 1.0.0 +description: | + Land and deploy workflow. Merges the PR, waits for CI and deploy, + verifies production health via canary checks. Takes over after /ship + creates the PR. Use when: "merge", "land", "deploy", "merge and verify", + "land it", "ship it to production". +allowed-tools: + - Bash + - Read + - Write + - Glob + - AskUserQuestion +--- + + + +## Preamble (run first) + +```bash +_UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true) +[ -n "$_UPD" ] && echo "$_UPD" || true +mkdir -p ~/.gstack/sessions +touch ~/.gstack/sessions/"$PPID" +_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ') +find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true +_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true) +_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") +echo "BRANCH: $_BRANCH" +_LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") +echo "LAKE_INTRO: $_LAKE_SEEN" +``` + +If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. + +If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle. +Tell the user: "gstack follows the **Boil the Lake** principle — always do the complete +thing when AI makes the marginal cost near-zero. Read more: https://garryslist.org/posts/boil-the-ocean" +Then offer to open the essay in their default browser: + +```bash +open https://garryslist.org/posts/boil-the-ocean +touch ~/.gstack/.completeness-intro-seen +``` + +Only run `open` if the user says yes. Always run `touch` to mark as seen. This only happens once. + +## AskUserQuestion Format + +**ALWAYS follow this structure for every AskUserQuestion call:** +1. **Re-ground:** State the project, the current branch (use the `_BRANCH` value printed by the preamble — NOT any branch from conversation history or gitStatus), and the current plan/task. (1-2 sentences) +2. **Simplify:** Explain the problem in plain English a smart 16-year-old could follow. No raw function names, no internal jargon, no implementation details. Use concrete examples and analogies. Say what it DOES, not what it's called. +3. **Recommend:** `RECOMMENDATION: Choose [X] because [one-line reason]` — always prefer the complete option over shortcuts (see Completeness Principle). Include `Completeness: X/10` for each option. Calibration: 10 = complete implementation (all edge cases, full coverage), 7 = covers happy path but skips some edges, 3 = shortcut that defers significant work. If both options are 8+, pick the higher; if one is ≤5, flag it. +4. **Options:** Lettered options: `A) ... B) ... C) ...` — when an option involves effort, show both scales: `(human: ~X / CC: ~Y)` + +Assume the user hasn't looked at this window in 20 minutes and doesn't have the code open. If you'd need to read the source to understand your own explanation, it's too complex. + +Per-skill instructions may add additional formatting rules on top of this baseline. + +## Completeness Principle — Boil the Lake + +AI-assisted coding makes the marginal cost of completeness near-zero. When you present options: + +- If Option A is the complete implementation (full parity, all edge cases, 100% coverage) and Option B is a shortcut that saves modest effort — **always recommend A**. The delta between 80 lines and 150 lines is meaningless with CC+gstack. "Good enough" is the wrong instinct when "complete" costs minutes more. +- **Lake vs. ocean:** A "lake" is boilable — 100% test coverage for a module, full feature implementation, handling all edge cases, complete error paths. An "ocean" is not — rewriting an entire system from scratch, adding features to dependencies you don't control, multi-quarter platform migrations. Recommend boiling lakes. Flag oceans as out of scope. +- **When estimating effort**, always show both scales: human team time and CC+gstack time. The compression ratio varies by task type — use this reference: + +| Task type | Human team | CC+gstack | Compression | +|-----------|-----------|-----------|-------------| +| Boilerplate / scaffolding | 2 days | 15 min | ~100x | +| Test writing | 1 day | 15 min | ~50x | +| Feature implementation | 1 week | 30 min | ~30x | +| Bug fix + regression test | 4 hours | 15 min | ~20x | +| Architecture / design | 2 days | 4 hours | ~5x | +| Research / exploration | 1 day | 3 hours | ~3x | + +- This principle applies to test coverage, error handling, documentation, edge cases, and feature completeness. Don't skip the last 10% to "save time" — with AI, that 10% costs seconds. + +**Anti-patterns — DON'T do this:** +- BAD: "Choose B — it covers 90% of the value with less code." (If A is only 70 lines more, choose A.) +- BAD: "We can skip edge case handling to save time." (Edge case handling costs minutes with CC.) +- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.) +- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.") + +## Contributor Mode + +If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better. + +**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better! + +**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore. + +**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs. + +**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer): + +``` +# {Title} + +Hey gstack team — ran into this while using /{skill-name}: + +**What I was trying to do:** {what the user/agent was attempting} +**What happened instead:** {what actually happened} +**My rating:** {0-10} — {one sentence on why it wasn't a 10} + +## Steps to reproduce +1. {step} + +## Raw output +``` +{paste the actual error or unexpected output here} +``` + +## What would make this a 10 +{one sentence: what gstack should have done differently} + +**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill} +``` + +Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}" + +## SETUP (run this check BEFORE any browse command) + +```bash +_ROOT=$(git rev-parse --show-toplevel 2>/dev/null) +B="" +[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse" +[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse +if [ -x "$B" ]; then + echo "READY: $B" +else + echo "NEEDS_SETUP" +fi +``` + +If `NEEDS_SETUP`: +1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait. +2. Run: `cd && ./setup` +3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash` + +## Step 0: Detect base branch + +Determine which branch this PR targets. Use the result as "the base branch" in all subsequent steps. + +1. Check if a PR already exists for this branch: + `gh pr view --json baseRefName -q .baseRefName` + If this succeeds, use the printed branch name as the base branch. + +2. If no PR exists (command fails), detect the repo's default branch: + `gh repo view --json defaultBranchRef -q .defaultBranchRef.name` + +3. If both commands fail, fall back to `main`. + +Print the detected base branch name. In every subsequent `git diff`, `git log`, +`git fetch`, `git merge`, and `gh pr create` command, substitute the detected +branch name wherever the instructions say "the base branch." + +--- + +# /land-and-deploy — Merge, Deploy, Verify + +You are a **Release Engineer** who has deployed to production thousands of times. You know the two worst feelings in software: the merge that breaks prod, and the merge that sits in queue for 45 minutes while you stare at the screen. Your job is to handle both gracefully — merge efficiently, wait intelligently, verify thoroughly, and give the user a clear verdict. + +This skill picks up where `/ship` left off. `/ship` creates the PR. You merge it, wait for deploy, and verify production. + +## User-invocable +When the user types `/land-and-deploy`, run this skill. + +## Arguments +- `/land-and-deploy` — auto-detect PR from current branch, no post-deploy URL +- `/land-and-deploy ` — auto-detect PR, verify deploy at this URL +- `/land-and-deploy #123` — specific PR number +- `/land-and-deploy #123 ` — specific PR + verification URL + +## Non-interactive philosophy (like /ship) + +This is a **non-interactive, fully automated** workflow. Do NOT ask for confirmation at any step except the ones listed below. The user said `/land-and-deploy` which means DO IT. + +**Only stop for:** +- GitHub CLI not authenticated +- No PR found for this branch +- CI failures or merge conflicts +- Permission denied on merge +- Deploy workflow failure (offer revert) +- Production health issues detected by canary (offer revert) + +**Never stop for:** +- Choosing merge method (auto-detect from repo settings) +- Confirming the merge +- Timeout warnings (warn and continue gracefully) + +--- + +## Step 1: Pre-flight + +1. Check GitHub CLI authentication: +```bash +gh auth status +``` +If not authenticated, **STOP**: "GitHub CLI is not authenticated. Run `gh auth login` first." + +2. Parse arguments. If the user specified `#NNN`, use that PR number. If a URL was provided, save it for canary verification in Step 7. + +3. If no PR number specified, detect from current branch: +```bash +gh pr view --json number,state,title,url,mergeStateStatus,mergeable,baseRefName,headRefName +``` + +4. Validate the PR state: + - If no PR exists: **STOP.** "No PR found for this branch. Run `/ship` first to create one." + - If `state` is `MERGED`: "PR is already merged. Nothing to do." + - If `state` is `CLOSED`: "PR is closed (not merged). Reopen it first." + - If `state` is `OPEN`: continue. + +--- + +## Step 2: Pre-merge checks + +Check CI status and merge readiness: + +```bash +gh pr checks --json name,state,status,conclusion +``` + +Parse the output: +1. If any required checks are **FAILING**: **STOP.** Show the failing checks. +2. If required checks are **PENDING**: proceed to Step 3. +3. If all checks pass (or no required checks): skip Step 3, go to Step 4. + +Also check for merge conflicts: +```bash +gh pr view --json mergeable -q .mergeable +``` +If `CONFLICTING`: **STOP.** "PR has merge conflicts. Resolve them and push before landing." + +--- + +## Step 3: Wait for CI (if pending) + +If required checks are still pending, wait for them to complete. Use a timeout of 15 minutes: + +```bash +gh pr checks --watch --fail-fast +``` + +Record the CI wait time for the deploy report. + +If CI passes within the timeout: continue to Step 4. +If CI fails: **STOP.** Show failures. +If timeout (15 min): **STOP.** "CI has been running for 15 minutes. Investigate manually." + +--- + +## Step 4: Merge the PR + +Record the start timestamp for timing data. + +Try auto-merge first (respects repo merge settings and merge queues): + +```bash +gh pr merge --auto --delete-branch +``` + +If `--auto` is not available (repo doesn't have auto-merge enabled), merge directly: + +```bash +gh pr merge --squash --delete-branch +``` + +If the merge fails with a permission error: **STOP.** "You don't have merge permissions on this repo. Ask a maintainer to merge." + +If merge queue is active, `gh pr merge --auto` will enqueue. Poll for the PR to actually merge: + +```bash +gh pr view --json state -q .state +``` + +Poll every 30 seconds, up to 30 minutes. Show a progress message every 2 minutes: "Waiting for merge queue... (Xm elapsed)" + +If the PR state changes to `MERGED`: capture the merge commit SHA and continue. +If the PR is removed from the queue (state goes back to `OPEN`): **STOP.** "PR was removed from the merge queue." +If timeout (30 min): **STOP.** "Merge queue has been processing for 30 minutes. Check the queue manually." + +Record merge timestamp and duration. + +--- + +## Step 5: Deploy strategy detection + +Determine what kind of project this is and how to verify the deploy. + +Run `gstack-diff-scope` to classify the changes: + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-diff-scope $(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || echo main) 2>/dev/null) +echo "FRONTEND=$SCOPE_FRONTEND BACKEND=$SCOPE_BACKEND DOCS=$SCOPE_DOCS CONFIG=$SCOPE_CONFIG" +``` + +**Decision tree (evaluate in order):** + +1. If the user provided a production URL as an argument: use it for canary verification. Also check for deploy workflows. + +2. Check for GitHub Actions deploy workflows: +```bash +gh run list --branch --limit 5 --json name,status,conclusion,headSha,workflowName +``` +Look for workflow names containing "deploy", "release", "production", "staging", or "cd". If found: poll the deploy workflow in Step 6, then run canary. + +3. If SCOPE_DOCS is the only scope that's true (no frontend, no backend, no config): skip verification entirely. Output: "PR merged. Documentation-only change — no deploy verification needed." Go to Step 9. + +4. If no deploy workflows detected and no URL provided: use AskUserQuestion once: + - **Context:** PR merged successfully. No deploy workflow or production URL detected. + - **RECOMMENDATION:** Choose B if this is a library/CLI tool. Choose A if this is a web app. + - A) Provide a production URL to verify + - B) Skip verification — this project doesn't have a web deploy + +--- + +## Step 6: Wait for deploy (if applicable) + +Find the GitHub Actions workflow run triggered by the merge commit: + +```bash +gh run list --branch --limit 10 --json databaseId,headSha,status,conclusion,name,workflowName +``` + +Match by the merge commit SHA (captured in Step 4). If multiple matching workflows, prefer the one whose name matches the deploy workflow detected in Step 5. + +Poll every 30 seconds: +```bash +gh run view --json status,conclusion +``` + +Record deploy start time. Show progress every 2 minutes: "Deploy in progress... (Xm elapsed)" + +If deploy succeeds (`conclusion` is `success`): record deploy duration, continue to Step 7. + +If deploy fails (`conclusion` is `failure`): use AskUserQuestion: +- **Context:** Deploy workflow failed after merging PR. +- **RECOMMENDATION:** Choose A to investigate before reverting. +- A) Investigate the deploy logs +- B) Create a revert commit on the base branch +- C) Continue anyway — the deploy failure might be unrelated + +If timeout (20 min): warn "Deploy has been running for 20 minutes" and ask whether to continue waiting or skip verification. + +--- + +## Step 7: Canary verification (conditional depth) + +Use the diff-scope classification from Step 5 to determine canary depth: + +| Diff Scope | Canary Depth | +|------------|-------------| +| SCOPE_DOCS only | Already skipped in Step 5 | +| SCOPE_CONFIG only | Smoke: `$B goto` + verify 200 status | +| SCOPE_BACKEND only | Console errors + perf check | +| SCOPE_FRONTEND (any) | Full: console + perf + screenshot | +| Mixed scopes | Full canary | + +**Full canary sequence:** + +```bash +$B goto +``` + +Check that the page loaded successfully (200, not an error page). + +```bash +$B console --errors +``` + +Check for critical console errors: lines containing `Error`, `Uncaught`, `Failed to load`, `TypeError`, `ReferenceError`. Ignore warnings. + +```bash +$B perf +``` + +Check that page load time is under 10 seconds. + +```bash +$B text +``` + +Verify the page has content (not blank, not a generic error page). + +```bash +$B snapshot -i -a -o ".gstack/deploy-reports/post-deploy.png" +``` + +Take an annotated screenshot as evidence. + +**Health assessment:** +- Page loads successfully with 200 status → PASS +- No critical console errors → PASS +- Page has real content (not blank or error screen) → PASS +- Loads in under 10 seconds → PASS + +If all pass: mark as HEALTHY, continue to Step 9. + +If any fail: show the evidence (screenshot path, console errors, perf numbers). Use AskUserQuestion: +- **Context:** Post-deploy canary detected issues on the production site. +- **RECOMMENDATION:** Choose based on severity — B for critical (site down), A for minor (console errors). +- A) Expected (deploy in progress, cache clearing) — mark as healthy +- B) Broken — create a revert commit +- C) Investigate further (open the site, look at logs) + +--- + +## Step 8: Revert (if needed) + +If the user chose to revert at any point: + +```bash +git fetch origin +git checkout +git revert --no-edit +git push origin +``` + +If the revert has conflicts: warn "Revert has conflicts — manual resolution needed. The merge commit SHA is ``. You can run `git revert ` manually." + +If the base branch has push protections: warn "Branch protections may prevent direct push — create a revert PR instead: `gh pr create --title 'revert: '`" + +After a successful revert, note the revert commit SHA and continue to Step 9 with status REVERTED. + +--- + +## Step 9: Deploy report + +Create the deploy report directory: + +```bash +mkdir -p .gstack/deploy-reports +``` + +Produce and display the ASCII summary: + +``` +LAND & DEPLOY REPORT +═════════════════════ +PR: # +Branch: <head-branch> → <base-branch> +Merged: <timestamp> (<merge method>) +Merge SHA: <sha> + +Timing: + CI wait: <duration> + Queue: <duration or "direct merge"> + Deploy: <duration or "no workflow detected"> + Canary: <duration or "skipped"> + Total: <end-to-end duration> + +CI: <PASSED / SKIPPED> +Deploy: <PASSED / FAILED / NO WORKFLOW> +Verification: <HEALTHY / DEGRADED / SKIPPED / REVERTED> + Scope: <FRONTEND / BACKEND / CONFIG / DOCS / MIXED> + Console: <N errors or "clean"> + Load time: <Xs> + Screenshot: <path or "none"> + +VERDICT: <DEPLOYED AND VERIFIED / DEPLOYED (UNVERIFIED) / REVERTED> +``` + +Save report to `.gstack/deploy-reports/{date}-pr{number}-deploy.md`. + +Log to the review dashboard: + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) +mkdir -p ~/.gstack/projects/$SLUG +``` + +Write a JSONL entry with timing data: +```json +{"skill":"land-and-deploy","timestamp":"<ISO>","status":"<SUCCESS/REVERTED>","pr":<number>,"merge_sha":"<sha>","deploy_status":"<HEALTHY/DEGRADED/SKIPPED>","ci_wait_s":<N>,"queue_s":<N>,"deploy_s":<N>,"canary_s":<N>,"total_s":<N>} +``` + +--- + +## Step 10: Suggest follow-ups + +After the deploy report, suggest relevant follow-ups: + +- If a production URL was verified: "Run `/canary <url> --duration 10m` for extended monitoring." +- If performance data was collected: "Run `/benchmark <url>` for a deep performance audit." +- "Run `/document-release` to update project documentation." + +--- + +## Important Rules + +- **Never force push.** Use `gh pr merge` which is safe. +- **Never skip CI.** If checks are failing, stop. +- **Auto-detect everything.** PR number, merge method, deploy strategy, project type. Only ask when information genuinely can't be inferred. +- **Poll with backoff.** Don't hammer GitHub API. 30-second intervals for CI/deploy, with reasonable timeouts. +- **Revert is always an option.** At every failure point, offer revert as an escape hatch. +- **Single-pass verification, not continuous monitoring.** `/land-and-deploy` checks once. `/canary` does the extended monitoring loop. +- **Clean up.** Delete the feature branch after merge (via `--delete-branch`). +- **The goal is: user says `/land-and-deploy`, next thing they see is the deploy report.** diff --git a/land-and-deploy/SKILL.md.tmpl b/land-and-deploy/SKILL.md.tmpl new file mode 100644 index 00000000..b7b43177 --- /dev/null +++ b/land-and-deploy/SKILL.md.tmpl @@ -0,0 +1,363 @@ +--- +name: land-and-deploy +version: 1.0.0 +description: | + Land and deploy workflow. Merges the PR, waits for CI and deploy, + verifies production health via canary checks. Takes over after /ship + creates the PR. Use when: "merge", "land", "deploy", "merge and verify", + "land it", "ship it to production". +allowed-tools: + - Bash + - Read + - Write + - Glob + - AskUserQuestion +--- + +{{PREAMBLE}} + +{{BROWSE_SETUP}} + +{{BASE_BRANCH_DETECT}} + +# /land-and-deploy — Merge, Deploy, Verify + +You are a **Release Engineer** who has deployed to production thousands of times. You know the two worst feelings in software: the merge that breaks prod, and the merge that sits in queue for 45 minutes while you stare at the screen. Your job is to handle both gracefully — merge efficiently, wait intelligently, verify thoroughly, and give the user a clear verdict. + +This skill picks up where `/ship` left off. `/ship` creates the PR. You merge it, wait for deploy, and verify production. + +## User-invocable +When the user types `/land-and-deploy`, run this skill. + +## Arguments +- `/land-and-deploy` — auto-detect PR from current branch, no post-deploy URL +- `/land-and-deploy <url>` — auto-detect PR, verify deploy at this URL +- `/land-and-deploy #123` — specific PR number +- `/land-and-deploy #123 <url>` — specific PR + verification URL + +## Non-interactive philosophy (like /ship) + +This is a **non-interactive, fully automated** workflow. Do NOT ask for confirmation at any step except the ones listed below. The user said `/land-and-deploy` which means DO IT. + +**Only stop for:** +- GitHub CLI not authenticated +- No PR found for this branch +- CI failures or merge conflicts +- Permission denied on merge +- Deploy workflow failure (offer revert) +- Production health issues detected by canary (offer revert) + +**Never stop for:** +- Choosing merge method (auto-detect from repo settings) +- Confirming the merge +- Timeout warnings (warn and continue gracefully) + +--- + +## Step 1: Pre-flight + +1. Check GitHub CLI authentication: +```bash +gh auth status +``` +If not authenticated, **STOP**: "GitHub CLI is not authenticated. Run `gh auth login` first." + +2. Parse arguments. If the user specified `#NNN`, use that PR number. If a URL was provided, save it for canary verification in Step 7. + +3. If no PR number specified, detect from current branch: +```bash +gh pr view --json number,state,title,url,mergeStateStatus,mergeable,baseRefName,headRefName +``` + +4. Validate the PR state: + - If no PR exists: **STOP.** "No PR found for this branch. Run `/ship` first to create one." + - If `state` is `MERGED`: "PR is already merged. Nothing to do." + - If `state` is `CLOSED`: "PR is closed (not merged). Reopen it first." + - If `state` is `OPEN`: continue. + +--- + +## Step 2: Pre-merge checks + +Check CI status and merge readiness: + +```bash +gh pr checks --json name,state,status,conclusion +``` + +Parse the output: +1. If any required checks are **FAILING**: **STOP.** Show the failing checks. +2. If required checks are **PENDING**: proceed to Step 3. +3. If all checks pass (or no required checks): skip Step 3, go to Step 4. + +Also check for merge conflicts: +```bash +gh pr view --json mergeable -q .mergeable +``` +If `CONFLICTING`: **STOP.** "PR has merge conflicts. Resolve them and push before landing." + +--- + +## Step 3: Wait for CI (if pending) + +If required checks are still pending, wait for them to complete. Use a timeout of 15 minutes: + +```bash +gh pr checks --watch --fail-fast +``` + +Record the CI wait time for the deploy report. + +If CI passes within the timeout: continue to Step 4. +If CI fails: **STOP.** Show failures. +If timeout (15 min): **STOP.** "CI has been running for 15 minutes. Investigate manually." + +--- + +## Step 4: Merge the PR + +Record the start timestamp for timing data. + +Try auto-merge first (respects repo merge settings and merge queues): + +```bash +gh pr merge --auto --delete-branch +``` + +If `--auto` is not available (repo doesn't have auto-merge enabled), merge directly: + +```bash +gh pr merge --squash --delete-branch +``` + +If the merge fails with a permission error: **STOP.** "You don't have merge permissions on this repo. Ask a maintainer to merge." + +If merge queue is active, `gh pr merge --auto` will enqueue. Poll for the PR to actually merge: + +```bash +gh pr view --json state -q .state +``` + +Poll every 30 seconds, up to 30 minutes. Show a progress message every 2 minutes: "Waiting for merge queue... (Xm elapsed)" + +If the PR state changes to `MERGED`: capture the merge commit SHA and continue. +If the PR is removed from the queue (state goes back to `OPEN`): **STOP.** "PR was removed from the merge queue." +If timeout (30 min): **STOP.** "Merge queue has been processing for 30 minutes. Check the queue manually." + +Record merge timestamp and duration. + +--- + +## Step 5: Deploy strategy detection + +Determine what kind of project this is and how to verify the deploy. + +Run `gstack-diff-scope` to classify the changes: + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-diff-scope $(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || echo main) 2>/dev/null) +echo "FRONTEND=$SCOPE_FRONTEND BACKEND=$SCOPE_BACKEND DOCS=$SCOPE_DOCS CONFIG=$SCOPE_CONFIG" +``` + +**Decision tree (evaluate in order):** + +1. If the user provided a production URL as an argument: use it for canary verification. Also check for deploy workflows. + +2. Check for GitHub Actions deploy workflows: +```bash +gh run list --branch <base> --limit 5 --json name,status,conclusion,headSha,workflowName +``` +Look for workflow names containing "deploy", "release", "production", "staging", or "cd". If found: poll the deploy workflow in Step 6, then run canary. + +3. If SCOPE_DOCS is the only scope that's true (no frontend, no backend, no config): skip verification entirely. Output: "PR merged. Documentation-only change — no deploy verification needed." Go to Step 9. + +4. If no deploy workflows detected and no URL provided: use AskUserQuestion once: + - **Context:** PR merged successfully. No deploy workflow or production URL detected. + - **RECOMMENDATION:** Choose B if this is a library/CLI tool. Choose A if this is a web app. + - A) Provide a production URL to verify + - B) Skip verification — this project doesn't have a web deploy + +--- + +## Step 6: Wait for deploy (if applicable) + +Find the GitHub Actions workflow run triggered by the merge commit: + +```bash +gh run list --branch <base> --limit 10 --json databaseId,headSha,status,conclusion,name,workflowName +``` + +Match by the merge commit SHA (captured in Step 4). If multiple matching workflows, prefer the one whose name matches the deploy workflow detected in Step 5. + +Poll every 30 seconds: +```bash +gh run view <run-id> --json status,conclusion +``` + +Record deploy start time. Show progress every 2 minutes: "Deploy in progress... (Xm elapsed)" + +If deploy succeeds (`conclusion` is `success`): record deploy duration, continue to Step 7. + +If deploy fails (`conclusion` is `failure`): use AskUserQuestion: +- **Context:** Deploy workflow failed after merging PR. +- **RECOMMENDATION:** Choose A to investigate before reverting. +- A) Investigate the deploy logs +- B) Create a revert commit on the base branch +- C) Continue anyway — the deploy failure might be unrelated + +If timeout (20 min): warn "Deploy has been running for 20 minutes" and ask whether to continue waiting or skip verification. + +--- + +## Step 7: Canary verification (conditional depth) + +Use the diff-scope classification from Step 5 to determine canary depth: + +| Diff Scope | Canary Depth | +|------------|-------------| +| SCOPE_DOCS only | Already skipped in Step 5 | +| SCOPE_CONFIG only | Smoke: `$B goto` + verify 200 status | +| SCOPE_BACKEND only | Console errors + perf check | +| SCOPE_FRONTEND (any) | Full: console + perf + screenshot | +| Mixed scopes | Full canary | + +**Full canary sequence:** + +```bash +$B goto <url> +``` + +Check that the page loaded successfully (200, not an error page). + +```bash +$B console --errors +``` + +Check for critical console errors: lines containing `Error`, `Uncaught`, `Failed to load`, `TypeError`, `ReferenceError`. Ignore warnings. + +```bash +$B perf +``` + +Check that page load time is under 10 seconds. + +```bash +$B text +``` + +Verify the page has content (not blank, not a generic error page). + +```bash +$B snapshot -i -a -o ".gstack/deploy-reports/post-deploy.png" +``` + +Take an annotated screenshot as evidence. + +**Health assessment:** +- Page loads successfully with 200 status → PASS +- No critical console errors → PASS +- Page has real content (not blank or error screen) → PASS +- Loads in under 10 seconds → PASS + +If all pass: mark as HEALTHY, continue to Step 9. + +If any fail: show the evidence (screenshot path, console errors, perf numbers). Use AskUserQuestion: +- **Context:** Post-deploy canary detected issues on the production site. +- **RECOMMENDATION:** Choose based on severity — B for critical (site down), A for minor (console errors). +- A) Expected (deploy in progress, cache clearing) — mark as healthy +- B) Broken — create a revert commit +- C) Investigate further (open the site, look at logs) + +--- + +## Step 8: Revert (if needed) + +If the user chose to revert at any point: + +```bash +git fetch origin <base> +git checkout <base> +git revert <merge-commit-sha> --no-edit +git push origin <base> +``` + +If the revert has conflicts: warn "Revert has conflicts — manual resolution needed. The merge commit SHA is `<sha>`. You can run `git revert <sha>` manually." + +If the base branch has push protections: warn "Branch protections may prevent direct push — create a revert PR instead: `gh pr create --title 'revert: <original PR title>'`" + +After a successful revert, note the revert commit SHA and continue to Step 9 with status REVERTED. + +--- + +## Step 9: Deploy report + +Create the deploy report directory: + +```bash +mkdir -p .gstack/deploy-reports +``` + +Produce and display the ASCII summary: + +``` +LAND & DEPLOY REPORT +═════════════════════ +PR: #<number> — <title> +Branch: <head-branch> → <base-branch> +Merged: <timestamp> (<merge method>) +Merge SHA: <sha> + +Timing: + CI wait: <duration> + Queue: <duration or "direct merge"> + Deploy: <duration or "no workflow detected"> + Canary: <duration or "skipped"> + Total: <end-to-end duration> + +CI: <PASSED / SKIPPED> +Deploy: <PASSED / FAILED / NO WORKFLOW> +Verification: <HEALTHY / DEGRADED / SKIPPED / REVERTED> + Scope: <FRONTEND / BACKEND / CONFIG / DOCS / MIXED> + Console: <N errors or "clean"> + Load time: <Xs> + Screenshot: <path or "none"> + +VERDICT: <DEPLOYED AND VERIFIED / DEPLOYED (UNVERIFIED) / REVERTED> +``` + +Save report to `.gstack/deploy-reports/{date}-pr{number}-deploy.md`. + +Log to the review dashboard: + +```bash +eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) +mkdir -p ~/.gstack/projects/$SLUG +``` + +Write a JSONL entry with timing data: +```json +{"skill":"land-and-deploy","timestamp":"<ISO>","status":"<SUCCESS/REVERTED>","pr":<number>,"merge_sha":"<sha>","deploy_status":"<HEALTHY/DEGRADED/SKIPPED>","ci_wait_s":<N>,"queue_s":<N>,"deploy_s":<N>,"canary_s":<N>,"total_s":<N>} +``` + +--- + +## Step 10: Suggest follow-ups + +After the deploy report, suggest relevant follow-ups: + +- If a production URL was verified: "Run `/canary <url> --duration 10m` for extended monitoring." +- If performance data was collected: "Run `/benchmark <url>` for a deep performance audit." +- "Run `/document-release` to update project documentation." + +--- + +## Important Rules + +- **Never force push.** Use `gh pr merge` which is safe. +- **Never skip CI.** If checks are failing, stop. +- **Auto-detect everything.** PR number, merge method, deploy strategy, project type. Only ask when information genuinely can't be inferred. +- **Poll with backoff.** Don't hammer GitHub API. 30-second intervals for CI/deploy, with reasonable timeouts. +- **Revert is always an option.** At every failure point, offer revert as an escape hatch. +- **Single-pass verification, not continuous monitoring.** `/land-and-deploy` checks once. `/canary` does the extended monitoring loop. +- **Clean up.** Delete the feature branch after merge (via `--delete-branch`). +- **The goal is: user says `/land-and-deploy`, next thing they see is the deploy report.** diff --git a/scripts/gen-skill-docs.ts b/scripts/gen-skill-docs.ts index 687143c0..05ff9eaf 100644 --- a/scripts/gen-skill-docs.ts +++ b/scripts/gen-skill-docs.ts @@ -1155,6 +1155,9 @@ function findTemplates(): string[] { path.join(ROOT, 'design-review', 'SKILL.md.tmpl'), path.join(ROOT, 'design-consultation', 'SKILL.md.tmpl'), path.join(ROOT, 'document-release', 'SKILL.md.tmpl'), + path.join(ROOT, 'canary', 'SKILL.md.tmpl'), + path.join(ROOT, 'benchmark', 'SKILL.md.tmpl'), + path.join(ROOT, 'land-and-deploy', 'SKILL.md.tmpl'), ]; for (const p of candidates) { if (fs.existsSync(p)) templates.push(p); diff --git a/scripts/skill-check.ts b/scripts/skill-check.ts index 3be0245c..c638df96 100644 --- a/scripts/skill-check.ts +++ b/scripts/skill-check.ts @@ -31,6 +31,9 @@ const SKILL_FILES = [ 'design-review/SKILL.md', 'gstack-upgrade/SKILL.md', 'document-release/SKILL.md', + 'canary/SKILL.md', + 'benchmark/SKILL.md', + 'land-and-deploy/SKILL.md', ].filter(f => fs.existsSync(path.join(ROOT, f))); let hasErrors = false; diff --git a/test/gen-skill-docs.test.ts b/test/gen-skill-docs.test.ts index 9dfd1a1c..5a17ae34 100644 --- a/test/gen-skill-docs.test.ts +++ b/test/gen-skill-docs.test.ts @@ -72,6 +72,9 @@ describe('gen-skill-docs', () => { { dir: 'plan-design-review', name: 'plan-design-review' }, { dir: 'design-review', name: 'design-review' }, { dir: 'design-consultation', name: 'design-consultation' }, + { dir: 'canary', name: 'canary' }, + { dir: 'benchmark', name: 'benchmark' }, + { dir: 'land-and-deploy', name: 'land-and-deploy' }, ]; test('every skill has a SKILL.md.tmpl template', () => { diff --git a/test/skill-validation.test.ts b/test/skill-validation.test.ts index bd0e205b..f0a44166 100644 --- a/test/skill-validation.test.ts +++ b/test/skill-validation.test.ts @@ -222,6 +222,9 @@ describe('Update check preamble', () => { 'design-review/SKILL.md', 'design-consultation/SKILL.md', 'document-release/SKILL.md', + 'canary/SKILL.md', + 'benchmark/SKILL.md', + 'land-and-deploy/SKILL.md', ]; for (const skill of skillsWithUpdateCheck) { @@ -532,6 +535,9 @@ describe('v0.4.1 preamble features', () => { 'design-review/SKILL.md', 'design-consultation/SKILL.md', 'document-release/SKILL.md', + 'canary/SKILL.md', + 'benchmark/SKILL.md', + 'land-and-deploy/SKILL.md', ]; for (const skill of skillsWithPreamble) { @@ -563,6 +569,9 @@ describe('Contributor mode preamble structure', () => { 'design-review/SKILL.md', 'design-consultation/SKILL.md', 'document-release/SKILL.md', + 'canary/SKILL.md', + 'benchmark/SKILL.md', + 'land-and-deploy/SKILL.md', ]; for (const skill of skillsWithPreamble) { From 58907e7f880b4e595bfd887bd445f250d27d41a5 Mon Sep 17 00:00:00 2001 From: Garry Tan <garrytan@gmail.com> Date: Tue, 17 Mar 2026 23:01:36 -0700 Subject: [PATCH 2/4] feat: add Performance & Bundle Impact category to review checklist New Pass 2 (INFORMATIONAL) category catching heavy dependencies (moment.js, lodash full), missing lazy loading, synchronous scripts, CSS @import blocking, fetch waterfalls, and tree-shaking breaks. Both /review and /ship automatically pick this up via checklist.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --- review/SKILL.md | 2 +- review/SKILL.md.tmpl | 2 +- review/checklist.md | 20 +++++++++++++++++++- 3 files changed, 21 insertions(+), 3 deletions(-) diff --git a/review/SKILL.md b/review/SKILL.md index 3a14a9d3..06e8eb11 100644 --- a/review/SKILL.md +++ b/review/SKILL.md @@ -188,7 +188,7 @@ Run `git diff origin/<base>` to get the full diff. This includes both committed Apply the checklist against the diff in two passes: 1. **Pass 1 (CRITICAL):** SQL & Data Safety, Race Conditions & Concurrency, LLM Output Trust Boundary, Enum & Value Completeness -2. **Pass 2 (INFORMATIONAL):** Conditional Side Effects, Magic Numbers & String Coupling, Dead Code & Consistency, LLM Prompt Issues, Test Gaps, View/Frontend +2. **Pass 2 (INFORMATIONAL):** Conditional Side Effects, Magic Numbers & String Coupling, Dead Code & Consistency, LLM Prompt Issues, Test Gaps, View/Frontend, Performance & Bundle Impact **Enum & Value Completeness requires reading code OUTSIDE the diff.** When the diff introduces a new enum value, status, tier, or type constant, use Grep to find all files that reference sibling values, then Read those files to check if the new value is handled. This is the one category where within-diff review is insufficient. diff --git a/review/SKILL.md.tmpl b/review/SKILL.md.tmpl index c1d3fae6..2f21c37a 100644 --- a/review/SKILL.md.tmpl +++ b/review/SKILL.md.tmpl @@ -67,7 +67,7 @@ Run `git diff origin/<base>` to get the full diff. This includes both committed Apply the checklist against the diff in two passes: 1. **Pass 1 (CRITICAL):** SQL & Data Safety, Race Conditions & Concurrency, LLM Output Trust Boundary, Enum & Value Completeness -2. **Pass 2 (INFORMATIONAL):** Conditional Side Effects, Magic Numbers & String Coupling, Dead Code & Consistency, LLM Prompt Issues, Test Gaps, View/Frontend +2. **Pass 2 (INFORMATIONAL):** Conditional Side Effects, Magic Numbers & String Coupling, Dead Code & Consistency, LLM Prompt Issues, Test Gaps, View/Frontend, Performance & Bundle Impact **Enum & Value Completeness requires reading code OUTSIDE the diff.** When the diff introduces a new enum value, status, tier, or type constant, use Grep to find all files that reference sibling values, then Read those files to check if the new value is handled. This is the one category where within-diff review is insufficient. diff --git a/review/checklist.md b/review/checklist.md index 282c9944..2ec6b3c7 100644 --- a/review/checklist.md +++ b/review/checklist.md @@ -108,6 +108,23 @@ To do this: use Grep to find all references to the sibling values (e.g., grep fo - O(n*m) lookups in views (`Array#find` in a loop instead of `index_by` hash) - Ruby-side `.select{}` filtering on DB results that could be a `WHERE` clause (unless intentionally avoiding leading-wildcard `LIKE`) +#### Performance & Bundle Impact +- New `dependencies` entries in package.json that are known-heavy: moment.js (→ date-fns, 330KB→22KB), lodash full (→ lodash-es or per-function imports), jquery, core-js full polyfill +- Significant lockfile growth (many new transitive dependencies from a single addition) +- Images added without `loading="lazy"` or explicit width/height attributes (causes layout shift / CLS) +- Large static assets committed to repo (>500KB per file) +- Synchronous `<script>` tags without async/defer +- CSS `@import` in stylesheets (blocks parallel loading — use bundler imports instead) +- `useEffect` with fetch that depends on another fetch result (request waterfall — combine or parallelize) +- Named → default import switches on tree-shakeable libraries (breaks tree-shaking) +- New `require()` calls in ESM codebases + +**DO NOT flag:** +- devDependencies additions (don't affect production bundle) +- Dynamic `import()` calls (code splitting — these are good) +- Small utility additions (<5KB gzipped) +- Server-side-only dependencies + --- ## Severity Classification @@ -123,7 +140,8 @@ CRITICAL (highest severity): INFORMATIONAL (lower severity): ├─ Crypto & Entropy ├─ Time Window Safety ├─ Type Coercion at Boundaries - └─ View/Frontend + ├─ View/Frontend + └─ Performance & Bundle Impact All findings are actioned via Fix-First Review. Severity determines presentation order and classification of AUTO-FIX vs ASK — critical From 3c3be2dc35435fc657c544537d11617add64c33c Mon Sep 17 00:00:00 2001 From: Garry Tan <garrytan@gmail.com> Date: Wed, 18 Mar 2026 06:05:07 -0700 Subject: [PATCH 3/4] feat: add {{DEPLOY_BOOTSTRAP}} resolver + deployed row in dashboard - New generateDeployBootstrap() resolver auto-detects deploy platform (Vercel, Netlify, Fly.io, GH Actions, etc.), production URL, and merge method. Persists to CLAUDE.md like test bootstrap. - Review Readiness Dashboard now shows a "Deployed" row from /land-and-deploy JSONL entries (informational, never gates shipping). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --- land-and-deploy/SKILL.md | 91 ++++++++++++++++++++++++++++++++- land-and-deploy/SKILL.md.tmpl | 6 ++- plan-ceo-review/SKILL.md | 4 +- plan-design-review/SKILL.md | 4 +- plan-eng-review/SKILL.md | 4 +- scripts/gen-skill-docs.ts | 94 ++++++++++++++++++++++++++++++++++- ship/SKILL.md | 4 +- 7 files changed, 200 insertions(+), 7 deletions(-) diff --git a/land-and-deploy/SKILL.md b/land-and-deploy/SKILL.md index bad8549f..d447cd5d 100644 --- a/land-and-deploy/SKILL.md +++ b/land-and-deploy/SKILL.md @@ -290,7 +290,96 @@ Record merge timestamp and duration. Determine what kind of project this is and how to verify the deploy. -Run `gstack-diff-scope` to classify the changes: +First, run the deploy configuration bootstrap to detect or read persisted deploy settings: + +## Deploy Configuration Bootstrap + +**Detect existing deploy configuration in CLAUDE.md:** + +```bash +grep -q "## Deploy Configuration" CLAUDE.md 2>/dev/null && echo "DEPLOY_CONFIG_EXISTS" || echo "NO_DEPLOY_CONFIG" +``` + +**If DEPLOY_CONFIG_EXISTS:** Read the Deploy Configuration section from CLAUDE.md. Use the detected platform, production URL, deploy workflow name, and merge method in subsequent steps. **Skip the rest of bootstrap.** + +**If NO_DEPLOY_CONFIG — auto-detect:** + +### D1. Detect deploy platform + +```bash +# Check for platform config files +[ -f vercel.json ] || [ -d .vercel ] && echo "PLATFORM:vercel" +[ -f netlify.toml ] || [ -d netlify ] && echo "PLATFORM:netlify" +[ -f fly.toml ] && echo "PLATFORM:fly" +[ -f render.yaml ] && echo "PLATFORM:render" +[ -f Procfile ] && echo "PLATFORM:heroku" +[ -f railway.json ] || [ -f railway.toml ] && echo "PLATFORM:railway" +[ -f Dockerfile ] || [ -f docker-compose.yml ] && echo "PLATFORM:docker" +# Check for GitHub Actions deploy workflows +for f in .github/workflows/*.yml .github/workflows/*.yaml; do + [ -f "$f" ] && grep -qiE "deploy|release|production|staging|cd" "$f" 2>/dev/null && echo "DEPLOY_WORKFLOW:$f" +done +# Check project type +[ -f package.json ] && grep -q '"bin"' package.json 2>/dev/null && echo "PROJECT_TYPE:cli" +[ -f Cargo.toml ] && grep -q '\[\[bin\]\]' Cargo.toml 2>/dev/null && echo "PROJECT_TYPE:cli" +[ -f setup.py ] || [ -f pyproject.toml ] && grep -qiE "console_scripts|entry_points" setup.py pyproject.toml 2>/dev/null && echo "PROJECT_TYPE:cli" +ls *.gemspec 2>/dev/null && echo "PROJECT_TYPE:library" +``` + +### D2. Detect production URL + +```bash +# Check package.json homepage +[ -f package.json ] && grep -o '"homepage":\s*"[^"]*"' package.json 2>/dev/null +# Check for common URL patterns in config +[ -f vercel.json ] && cat vercel.json 2>/dev/null +[ -f netlify.toml ] && grep -i "url\|domain" netlify.toml 2>/dev/null +[ -f fly.toml ] && grep "app" fly.toml 2>/dev/null +``` + +### D3. Detect merge method + +```bash +gh api repos/{owner}/{repo} --jq '{squash: .allow_squash_merge, merge: .allow_merge_commit, rebase: .allow_rebase_merge}' 2>/dev/null || echo "MERGE_DETECT_FAILED" +``` + +Default preference order: squash (cleanest history) > merge > rebase. + +### D4. Classify and present + +Based on the detection results, determine the deploy configuration: + +1. **If PLATFORM detected:** Note the platform and any associated URL. +2. **If DEPLOY_WORKFLOW detected:** Note the workflow file path and name. +3. **If PROJECT_TYPE is "cli" or "library":** Note that this project likely doesn't have a web deploy. Post-merge verification is not applicable. +4. **If no platform, no workflow, and not a CLI/library:** Use AskUserQuestion: + - **Context:** Setting up deploy configuration for /land-and-deploy. + - **Question:** No deploy platform detected. What does your deploy look like? + - **RECOMMENDATION:** Choose the option that matches your setup. + - A) We deploy via GitHub Actions (specify workflow name) + - B) We deploy via Vercel / Netlify / Fly.io / other platform (specify URL) + - C) We deploy manually or via custom scripts + - D) This project doesn't deploy (library, CLI tool) + +### D5. Persist to CLAUDE.md + +If CLAUDE.md exists, append. If it doesn't exist, create it. + +Add a section: +```markdown +## Deploy Configuration (auto-detected by gstack) +- Platform: {platform or "none detected"} +- Production URL: {url or "not detected — provide via /land-and-deploy <url>"} +- Deploy workflow: {workflow file or "none"} +- Merge method: {squash/merge/rebase} +- Project type: {web app / CLI / library} +``` + +Tell the user: "Deploy configuration saved to CLAUDE.md. Future /land-and-deploy runs will use these settings automatically. Edit the section manually to update." + +--- + +Then run `gstack-diff-scope` to classify the changes: ```bash eval $(~/.claude/skills/gstack/bin/gstack-diff-scope $(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || echo main) 2>/dev/null) diff --git a/land-and-deploy/SKILL.md.tmpl b/land-and-deploy/SKILL.md.tmpl index b7b43177..4439b280 100644 --- a/land-and-deploy/SKILL.md.tmpl +++ b/land-and-deploy/SKILL.md.tmpl @@ -152,7 +152,11 @@ Record merge timestamp and duration. Determine what kind of project this is and how to verify the deploy. -Run `gstack-diff-scope` to classify the changes: +First, run the deploy configuration bootstrap to detect or read persisted deploy settings: + +{{DEPLOY_BOOTSTRAP}} + +Then run `gstack-diff-scope` to classify the changes: ```bash eval $(~/.claude/skills/gstack/bin/gstack-diff-scope $(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || echo main) 2>/dev/null) diff --git a/plan-ceo-review/SKILL.md b/plan-ceo-review/SKILL.md index ce799fe1..4ebbaa57 100644 --- a/plan-ceo-review/SKILL.md +++ b/plan-ceo-review/SKILL.md @@ -735,7 +735,7 @@ echo "---CONFIG---" ~/.claude/skills/gstack/bin/gstack-config get skip_eng_review 2>/dev/null || echo "false" ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, land-and-deploy). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For Deployed, show the most recent `land-and-deploy` entry with status mapped: SUCCESS→HEALTHY, REVERTED→REVERTED, other→ISSUES. Display: ``` +====================================================================+ @@ -746,6 +746,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl | Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES | | CEO Review | 0 | — | — | no | | Design Review | 0 | — | — | no | +| Deployed | 0 | — | — | no | +--------------------------------------------------------------------+ | VERDICT: CLEARED — Eng Review passed | +====================================================================+ @@ -755,6 +756,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl - **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting). - **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup. - **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes. +- **Deployed (informational):** Shows whether the most recent PR on this branch was successfully deployed and verified via \`/land-and-deploy\`. Status: HEALTHY, REVERTED, or ISSUES. Never gates shipping. **Verdict logic:** - **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`) diff --git a/plan-design-review/SKILL.md b/plan-design-review/SKILL.md index 507952c4..2d3021c1 100644 --- a/plan-design-review/SKILL.md +++ b/plan-design-review/SKILL.md @@ -410,7 +410,7 @@ echo "---CONFIG---" ~/.claude/skills/gstack/bin/gstack-config get skip_eng_review 2>/dev/null || echo "false" ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, land-and-deploy). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For Deployed, show the most recent `land-and-deploy` entry with status mapped: SUCCESS→HEALTHY, REVERTED→REVERTED, other→ISSUES. Display: ``` +====================================================================+ @@ -421,6 +421,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl | Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES | | CEO Review | 0 | — | — | no | | Design Review | 0 | — | — | no | +| Deployed | 0 | — | — | no | +--------------------------------------------------------------------+ | VERDICT: CLEARED — Eng Review passed | +====================================================================+ @@ -430,6 +431,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl - **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting). - **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup. - **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes. +- **Deployed (informational):** Shows whether the most recent PR on this branch was successfully deployed and verified via \`/land-and-deploy\`. Status: HEALTHY, REVERTED, or ISSUES. Never gates shipping. **Verdict logic:** - **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`) diff --git a/plan-eng-review/SKILL.md b/plan-eng-review/SKILL.md index 48fe7230..d17a7e61 100644 --- a/plan-eng-review/SKILL.md +++ b/plan-eng-review/SKILL.md @@ -348,7 +348,7 @@ echo "---CONFIG---" ~/.claude/skills/gstack/bin/gstack-config get skip_eng_review 2>/dev/null || echo "false" ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, land-and-deploy). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For Deployed, show the most recent `land-and-deploy` entry with status mapped: SUCCESS→HEALTHY, REVERTED→REVERTED, other→ISSUES. Display: ``` +====================================================================+ @@ -359,6 +359,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl | Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES | | CEO Review | 0 | — | — | no | | Design Review | 0 | — | — | no | +| Deployed | 0 | — | — | no | +--------------------------------------------------------------------+ | VERDICT: CLEARED — Eng Review passed | +====================================================================+ @@ -368,6 +369,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl - **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting). - **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup. - **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes. +- **Deployed (informational):** Shows whether the most recent PR on this branch was successfully deployed and verified via \`/land-and-deploy\`. Status: HEALTHY, REVERTED, or ISSUES. Never gates shipping. **Verdict logic:** - **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`) diff --git a/scripts/gen-skill-docs.ts b/scripts/gen-skill-docs.ts index 05ff9eaf..3913fb3a 100644 --- a/scripts/gen-skill-docs.ts +++ b/scripts/gen-skill-docs.ts @@ -904,7 +904,7 @@ echo "---CONFIG---" ~/.claude/skills/gstack/bin/gstack-config get skip_eng_review 2>/dev/null || echo "false" \`\`\` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between \`plan-design-review\` (full visual audit) and \`design-review-lite\` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, land-and-deploy). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between \`plan-design-review\` (full visual audit) and \`design-review-lite\` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For Deployed, show the most recent \`land-and-deploy\` entry with status mapped: SUCCESS→HEALTHY, REVERTED→REVERTED, other→ISSUES. Display: \`\`\` +====================================================================+ @@ -915,6 +915,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl | Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES | | CEO Review | 0 | — | — | no | | Design Review | 0 | — | — | no | +| Deployed | 0 | — | — | no | +--------------------------------------------------------------------+ | VERDICT: CLEARED — Eng Review passed | +====================================================================+ @@ -924,6 +925,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl - **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \\\`gstack-config set skip_eng_review true\\\` (the "don't bother me" setting). - **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup. - **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes. +- **Deployed (informational):** Shows whether the most recent PR on this branch was successfully deployed and verified via \\\`/land-and-deploy\\\`. Status: HEALTHY, REVERTED, or ISSUES. Never gates shipping. **Verdict logic:** - **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \\\`skip_eng_review\\\` is \\\`true\\\`) @@ -1087,6 +1089,95 @@ Only commit if there are changes. Stage all bootstrap files (config, test direct ---`; } +function generateDeployBootstrap(): string { + return `## Deploy Configuration Bootstrap + +**Detect existing deploy configuration in CLAUDE.md:** + +\`\`\`bash +grep -q "## Deploy Configuration" CLAUDE.md 2>/dev/null && echo "DEPLOY_CONFIG_EXISTS" || echo "NO_DEPLOY_CONFIG" +\`\`\` + +**If DEPLOY_CONFIG_EXISTS:** Read the Deploy Configuration section from CLAUDE.md. Use the detected platform, production URL, deploy workflow name, and merge method in subsequent steps. **Skip the rest of bootstrap.** + +**If NO_DEPLOY_CONFIG — auto-detect:** + +### D1. Detect deploy platform + +\`\`\`bash +# Check for platform config files +[ -f vercel.json ] || [ -d .vercel ] && echo "PLATFORM:vercel" +[ -f netlify.toml ] || [ -d netlify ] && echo "PLATFORM:netlify" +[ -f fly.toml ] && echo "PLATFORM:fly" +[ -f render.yaml ] && echo "PLATFORM:render" +[ -f Procfile ] && echo "PLATFORM:heroku" +[ -f railway.json ] || [ -f railway.toml ] && echo "PLATFORM:railway" +[ -f Dockerfile ] || [ -f docker-compose.yml ] && echo "PLATFORM:docker" +# Check for GitHub Actions deploy workflows +for f in .github/workflows/*.yml .github/workflows/*.yaml; do + [ -f "$f" ] && grep -qiE "deploy|release|production|staging|cd" "$f" 2>/dev/null && echo "DEPLOY_WORKFLOW:$f" +done +# Check project type +[ -f package.json ] && grep -q '"bin"' package.json 2>/dev/null && echo "PROJECT_TYPE:cli" +[ -f Cargo.toml ] && grep -q '\\[\\[bin\\]\\]' Cargo.toml 2>/dev/null && echo "PROJECT_TYPE:cli" +[ -f setup.py ] || [ -f pyproject.toml ] && grep -qiE "console_scripts|entry_points" setup.py pyproject.toml 2>/dev/null && echo "PROJECT_TYPE:cli" +ls *.gemspec 2>/dev/null && echo "PROJECT_TYPE:library" +\`\`\` + +### D2. Detect production URL + +\`\`\`bash +# Check package.json homepage +[ -f package.json ] && grep -o '"homepage":\\s*"[^"]*"' package.json 2>/dev/null +# Check for common URL patterns in config +[ -f vercel.json ] && cat vercel.json 2>/dev/null +[ -f netlify.toml ] && grep -i "url\\|domain" netlify.toml 2>/dev/null +[ -f fly.toml ] && grep "app" fly.toml 2>/dev/null +\`\`\` + +### D3. Detect merge method + +\`\`\`bash +gh api repos/{owner}/{repo} --jq '{squash: .allow_squash_merge, merge: .allow_merge_commit, rebase: .allow_rebase_merge}' 2>/dev/null || echo "MERGE_DETECT_FAILED" +\`\`\` + +Default preference order: squash (cleanest history) > merge > rebase. + +### D4. Classify and present + +Based on the detection results, determine the deploy configuration: + +1. **If PLATFORM detected:** Note the platform and any associated URL. +2. **If DEPLOY_WORKFLOW detected:** Note the workflow file path and name. +3. **If PROJECT_TYPE is "cli" or "library":** Note that this project likely doesn't have a web deploy. Post-merge verification is not applicable. +4. **If no platform, no workflow, and not a CLI/library:** Use AskUserQuestion: + - **Context:** Setting up deploy configuration for /land-and-deploy. + - **Question:** No deploy platform detected. What does your deploy look like? + - **RECOMMENDATION:** Choose the option that matches your setup. + - A) We deploy via GitHub Actions (specify workflow name) + - B) We deploy via Vercel / Netlify / Fly.io / other platform (specify URL) + - C) We deploy manually or via custom scripts + - D) This project doesn't deploy (library, CLI tool) + +### D5. Persist to CLAUDE.md + +If CLAUDE.md exists, append. If it doesn't exist, create it. + +Add a section: +\`\`\`markdown +## Deploy Configuration (auto-detected by gstack) +- Platform: {platform or "none detected"} +- Production URL: {url or "not detected — provide via /land-and-deploy <url>"} +- Deploy workflow: {workflow file or "none"} +- Merge method: {squash/merge/rebase} +- Project type: {web app / CLI / library} +\`\`\` + +Tell the user: "Deploy configuration saved to CLAUDE.md. Future /land-and-deploy runs will use these settings automatically. Edit the section manually to update." + +---`; +} + const RESOLVERS: Record<string, () => string> = { COMMAND_REFERENCE: generateCommandReference, SNAPSHOT_FLAGS: generateSnapshotFlags, @@ -1098,6 +1189,7 @@ const RESOLVERS: Record<string, () => string> = { DESIGN_REVIEW_LITE: generateDesignReviewLite, REVIEW_DASHBOARD: generateReviewDashboard, TEST_BOOTSTRAP: generateTestBootstrap, + DEPLOY_BOOTSTRAP: generateDeployBootstrap, }; // ─── Template Processing ──────────────────────────────────── diff --git a/ship/SKILL.md b/ship/SKILL.md index 875845dc..ffa940dc 100644 --- a/ship/SKILL.md +++ b/ship/SKILL.md @@ -186,7 +186,7 @@ echo "---CONFIG---" ~/.claude/skills/gstack/bin/gstack-config get skip_eng_review 2>/dev/null || echo "false" ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite, land-and-deploy). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For Deployed, show the most recent `land-and-deploy` entry with status mapped: SUCCESS→HEALTHY, REVERTED→REVERTED, other→ISSUES. Display: ``` +====================================================================+ @@ -197,6 +197,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl | Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES | | CEO Review | 0 | — | — | no | | Design Review | 0 | — | — | no | +| Deployed | 0 | — | — | no | +--------------------------------------------------------------------+ | VERDICT: CLEARED — Eng Review passed | +====================================================================+ @@ -206,6 +207,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl - **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \`gstack-config set skip_eng_review true\` (the "don't bother me" setting). - **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup. - **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes. +- **Deployed (informational):** Shows whether the most recent PR on this branch was successfully deployed and verified via \`/land-and-deploy\`. Status: HEALTHY, REVERTED, or ISSUES. Never gates shipping. **Verdict logic:** - **CLEARED**: Eng Review has >= 1 entry within 7 days with status "clean" (or \`skip_eng_review\` is \`true\`) From c46ada15946ba1a037859ed464e139761e57b18c Mon Sep 17 00:00:00 2001 From: Garry Tan <garrytan@gmail.com> Date: Wed, 18 Mar 2026 06:07:11 -0700 Subject: [PATCH 4/4] chore: mark 3 TODOs completed, bump v0.7.0, update CHANGELOG MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Superseded by /land-and-deploy: - /merge skill — review-gated PR merge - Deploy-verify skill - Post-deploy verification (ship + browse) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --- CHANGELOG.md | 16 ++++++++++++++++ TODOS.md | 36 ++++++------------------------------ VERSION | 2 +- 3 files changed, 23 insertions(+), 31 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 49d9183d..78c32a4d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,21 @@ # Changelog +## [0.7.0.0] - 2026-03-18 + +### Added + +- **`/land-and-deploy` — the missing piece after `/ship`.** Merges your PR, waits for CI and deploy workflows, then runs a canary health check on production. Auto-detects merge queues, deploy platforms (Vercel, Netlify, Fly.io, GitHub Actions), and production URLs. Offers one-click revert if something breaks. Timing data (CI wait, queue, deploy, canary) logged for retrospectives. The workflow is now: `/review` → `/ship` → `/land-and-deploy`. +- **`/canary` — standalone post-deploy monitoring.** Watches production for 10 minutes after a deploy using the browse daemon. Takes periodic screenshots, detects new console errors, flags performance regressions (>2x baseline). Run `--baseline` before deploying to capture the "before" state. Alerts on *changes*, not absolutes — 3 pre-existing errors is fine, 1 new error is an alert. +- **`/benchmark` — performance regression detection.** Collects real Web Vitals (TTFB, FCP, LCP), bundle sizes, and request counts via `performance.getEntries()` through the browse daemon. Compares against saved baselines with configurable thresholds. Trend analysis across historical runs. Performance budgets with letter grades. +- **Performance & Bundle Impact review category.** `/review` and `/ship` now catch heavy dependency additions (moment.js → date-fns), missing lazy loading, synchronous scripts, CSS @import blocking, fetch waterfalls, and tree-shaking breaks. Added to `review/checklist.md` as an INFORMATIONAL category. +- **Deploy bootstrap auto-detection.** First time you run `/land-and-deploy`, it scans your repo for deploy platforms, production URLs, and merge method preferences. Saves results to CLAUDE.md so future runs skip detection. Same pattern as test bootstrap. +- **"Deployed" row in Review Readiness Dashboard.** After `/land-and-deploy` runs, the dashboard shows deploy status (HEALTHY/REVERTED/ISSUES) alongside Eng, CEO, and Design review status. + +### For contributors + +- Incorporated canary monitoring and benchmark patterns from community PR #151 (HMAKT99). +- 3 new skills registered across gen-skill-docs.ts, skill-check.ts, skill-validation, and gen-skill-docs tests. + ## [0.6.4.0] - 2026-03-17 ### Added diff --git a/TODOS.md b/TODOS.md index 5f771de7..76146de7 100644 --- a/TODOS.md +++ b/TODOS.md @@ -161,17 +161,6 @@ **Priority:** P2 **Depends on:** None -### Post-deploy verification (ship + browse) - -**What:** After push, browse staging/preview URL, screenshot key pages, check console for JS errors, compare staging vs prod via snapshot diff. Include verification screenshots in PR body. STOP if critical errors found. - -**Why:** Catch deployment-time regressions (JS errors, broken layouts) before merge. - -**Context:** Requires S3 upload infrastructure for PR screenshots. Pairs with visual PR annotations. - -**Effort:** L -**Priority:** P2 -**Depends on:** /setup-gstack-upload, visual PR annotations ### Visual verification with screenshots in PR body @@ -332,14 +321,6 @@ **Priority:** P3 **Depends on:** Video recording -### Deploy-verify skill - -**What:** Lightweight post-deploy smoke test: hit key URLs, verify 200s, screenshot critical pages, console error check, compare against baseline snapshots. Pass/fail with evidence. - -**Why:** Fast post-deploy confidence check, separate from full QA. - -**Effort:** M -**Priority:** P2 ### GitHub Actions eval upload @@ -456,17 +437,6 @@ Shipped as `/design-consultation` on garrytan/design branch. Renamed from `/setu **Priority:** P3 **Depends on:** gstack-diff-scope (shipped) -### /merge skill — review-gated PR merge - -**What:** Create a `/merge` skill that merges an approved PR, but first checks the Review Readiness Dashboard and runs `/review` (Fix-First) if code review hasn't been done. Separates "ship" (create PR) from "merge" (land it). - -**Why:** Currently `/review` runs inside `/ship` Step 3.5 but isn't tracked as a gate. A `/merge` skill ensures code review always happens before landing, and enables workflows where someone else reviews the PR first. - -**Context:** `/ship` creates the PR. `/merge` would: check dashboard → run `/review` if needed → `gh pr merge`. This is where code review tracking belongs — at merge time, not at plan time. - -**Effort:** M -**Priority:** P2 -**Depends on:** Ship Confidence Dashboard (shipped) ## Completeness @@ -484,6 +454,12 @@ Shipped as `/design-consultation` on garrytan/design branch. Renamed from `/setu ## Completed +### Deploy pipeline (v0.7.0) +- /merge skill — review-gated PR merge → superseded by /land-and-deploy +- Deploy-verify skill → superseded by /land-and-deploy canary verification +- Post-deploy verification (ship + browse) → superseded by /land-and-deploy +**Completed:** v0.7.0 + ### Phase 1: Foundations (v0.2.0) - Rename to gstack - Restructure to monorepo layout diff --git a/VERSION b/VERSION index 31d34d20..7b86566b 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.6.4.0 +0.7.0.0