Pantheon is a BMAD Method plugin that wraps every feature story in a structured, multi-agent pipeline — the same way a well-run engineering team operates. It works with Claude Code (best experience — native parallel agents and swarm support), OpenCode, GitHub Copilot, and Codex CLI — with specialized agents that build, review, triage, fix, and learn in parallel. The result: production-grade code, not "works on my machine" code.
AI coding assistants generate code fast, but speed without structure leaves gaps — missing validations, shallow error handling, tests that don't exercise real behavior. BMAD workflows address this, but they're only as good as the story behind them, and each story takes numerous commands and rounds of back-and-forth to complete. When you have dozens of epics and hundreds of stories, orchestrating agents through the right steps, in the right sequence, for the right stories, with proper quality gates becomes tedious and time-consuming to manage by hand.
Pantheon automates all of it — gap analysis, multi-perspective review, test quality validation, security scanning, and learning from past mistakes — across entire epics in a single command.
Every story runs through a 9-phase pipeline with named specialist agents — the Greek Pantheon:
```
PREPARE   Load story, score and load relevant playbooks
    |
FORGE     Pygmalion creates domain-specialist reviewers on the fly
    |
BUILD     Metis (or a routed specialist) implements — code only, no tests
    |
TEST      Aletheia writes adversarial tests independently (bug loop: max 3 rounds)
    |
VERIFY    Cerberus (security gate), Argus (inspector), Nemesis (tests),
    |     Hestia (architecture) review in parallel
    |
ASSESS    Themis triages findings — real bug or style nit?
    |
REFINE    Builder fixes MUST_FIX issues in its own context (no re-explaining)
    |
COMMIT    Charon handles git operations with user scope selection
    |
REFLECT   Hermes extracts learnings, updates playbooks for next time
```
Each agent has a clear role boundary. Builders build. Testers test (separately). Reviewers review. The arbiter triages. No "do everything at once" chaos — the structure is what makes the output reliable.
The batch-stories workflow analyzes dependencies between stories, organizes them into parallel waves, and spawns concurrent workers — each running the full 9-phase pipeline independently.
```
Wave 1: Stories 6-1, 6-3 (no dependencies — run in parallel)
Wave 2: Stories 6-2, 6-4 (depend on Wave 1)
Wave 3: Stories 6-5, 6-6 (depend on Wave 2)
```
Hand it an epic. Walk away. Come back to production-ready code with 80%+ test coverage (configurable), multi-perspective reviews, and zero unresolved MUST_FIX issues across every story.
Experimental: Swarm mode can optionally use Claude Code's Agent Teams feature (`CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`), which may change without notice. Without it, swarm mode still works using standard Task tool subagent coordination.
Pantheon is designed from the ground up to work with Claude Code's multi-agent capabilities. In swarm mode, it spawns Heracles workers — each one an independent agent running the full story pipeline. Workers coordinate through shared task lists, claim stories automatically, and commit in parallel using a lock file protocol.
Hygeia, the Quality Gate Coordinator, serializes expensive checks (type-check, build, test suite) across workers. When three workers all need tsc --noEmit, they queue up and Hygeia runs it once against the current filesystem — then serves the same fresh result to all waiting workers. No caching, no stale results — just serialized execution with batch notification, keeping your machine responsive while agents build in parallel.
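The serialize-and-share pattern can be sketched in a few lines. This is an illustrative Python model, not Pantheon's actual implementation: the first worker to request a gate runs it, and every worker that queues up behind it receives that same fresh result.

```python
import subprocess
import threading

class GateCoordinator:
    """Sketch of Hygeia's serialize-and-share pattern (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight: dict[str, threading.Event] = {}
        self._results: dict[str, int] = {}

    def run_gate(self, command: str) -> int:
        with self._lock:
            pending = self._in_flight.get(command)
            if pending is None:
                # First requester: mark the gate in flight and run it ourselves.
                pending = threading.Event()
                self._in_flight[command] = pending
                is_runner = True
            else:
                is_runner = False
        if is_runner:
            result = subprocess.run(command, shell=True).returncode
            with self._lock:
                self._results[command] = result
                del self._in_flight[command]
            pending.set()  # wake every worker waiting on this gate
            return result
        # Later requesters: wait for the in-flight run, then share its result.
        pending.wait()
        return self._results[command]
```

Because the in-flight marker is cleared after each run, a worker arriving later triggers a fresh execution — matching the "no caching, no stale results" behavior described above.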
Most AI coding tools are stateless. Every conversation starts from zero. Pantheon learns.
The playbook system creates a compound learning loop:
- Story 1 runs → reviewers find 37 issues → 5 patterns extracted → playbooks updated
- Story 2 loads those playbooks → avoids 7 of those issues before writing a line of code
- By Epic 8, issues decline from 40+/story to under 10
Playbooks are scored for relevance (domain overlap, file patterns, historical hit rate) and loaded under a token budget. High-performing playbooks get loaded first. Low-performers get deprioritized. A compaction protocol keeps playbooks dense with value (3-10KB) rather than bloated with repetition.
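A minimal sketch of budget-constrained loading — the field names and score weights here are hypothetical, not Pantheon's actual schema:

```python
def select_playbooks(playbooks: list[dict], token_budget: int) -> list[str]:
    """Greedy selection: score each playbook, load top scorers under budget."""
    def score(pb: dict) -> float:
        return (0.4 * pb["domain_overlap"]       # covers this story's domain?
                + 0.3 * pb["file_pattern_match"]  # applies to the changed files?
                + 0.3 * pb["hit_rate"])           # prevented issues before?

    loaded, remaining = [], token_budget
    for pb in sorted(playbooks, key=score, reverse=True):
        if pb["tokens"] <= remaining:
            loaded.append(pb["name"])
            remaining -= pb["tokens"]
    return loaded
```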
This is operational knowledge extracted from real code reviews and fed forward into real implementations — on your specific codebase, with your specific patterns.
Cerberus is an independent security gate — not a regular reviewer. Its BLOCK findings stop the pipeline. No agent, no orchestrator, and no triage process can override a BLOCK.
Every review runs through three tiers:
- Deterministic secrets scanner — 11 regex patterns (AWS keys, GitHub/Slack/Stripe tokens, JWTs, private keys, connection strings) via a portable shell script. No LLM guessing — regex catches `AKIA[0-9A-Z]{16}` every time.
- Enterprise MCP policies (when available) — connects to a security MCP server for live policies, ADRs, severity thresholds, and automated scanning tools. Security team updates policies centrally; every project picks them up immediately.
- Bundled policy fallback — 10 policy files (OWASP Top 10, 7 ADRs, severity config) and 2 review playbooks ship with Pantheon. When no MCP server is configured, Cerberus uses these automatically — same enterprise-grade review, just not centrally managed.
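The deterministic tier boils down to plain pattern matching. In this sketch, the AWS access key pattern is the one quoted above; the other two entries are illustrative stand-ins, not the scanner's exact rules:

```python
import re

# AWS pattern from the text above; the other two are illustrative stand-ins.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "github_token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan_line(line: str) -> list[str]:
    """Return the names of every secret pattern the line matches."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(line)]
```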
No more "looks good to me" or vague "consider adding error handling." Every reviewer must provide file:line citations for every finding. Every task verification must cite the exact code that satisfies it. If you can't point to the line, it doesn't count.
A copy change doesn't deserve the same pipeline as a payment integration. Pantheon's 6-tier complexity engine automatically selects the right review depth:
| Tier | Review Mode | When |
|---|---|---|
| Trivial | Inline checks | Static content, config |
| Micro-Light | Consolidated (4-in-1) | Simple components, basic CRUD |
| Standard | Consolidated (4-in-1) | API integration, forms |
| Complex | Parallel reviewers | Auth, migrations, database |
| Critical | Maximum scrutiny | Encryption, PII, credentials |
80% of stories use consolidated review (saving ~25K tokens each). The remaining 20% get full parallel scrutiny where it matters.
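The tier-to-mode routing from the table above, expressed as a lookup (a sketch — the real engine first scores many signals to assign the tier itself):

```python
def review_mode(tier: str) -> str:
    """Map a complexity tier to its review mode, per the table above."""
    modes = {
        "trivial": "inline",            # static content, config
        "micro-light": "consolidated",  # simple components, basic CRUD
        "standard": "consolidated",     # API integration, forms
        "complex": "parallel",          # auth, migrations, database
        "critical": "maximum",          # encryption, PII, credentials
    }
    return modes[tier]
```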
All commands are invoked as slash commands. On Claude Code, type them directly. On GitHub Copilot, prefix with `@workspace`. On Codex CLI, load the corresponding instruction file first.
Run the full 9-phase pipeline on one story. Builder selection is automatic — React stories get the frontend specialist, API stories get the TypeScript specialist, database work gets the Prisma specialist.
```shell
# Implement a specific story
/story-pipeline story_key=17-1

# Implement with explicit builder override
/story-pipeline story_key=17-1 builder=helios
```

What happens (9 phases):
- PREPARE — Loads story, scores playbooks for relevance, loads top matches
- FORGE — Pygmalion analyzes the domain and forges specialist reviewers if needed
- BUILD — Routed builder implements production code
- TEST — Aletheia writes adversarial tests independently (bug loop: max 3 rounds)
- VERIFY — Parallel reviewers (Argus, Nemesis, Cerberus, Hestia, + conditionals) examine the work
- ASSESS — Themis triages findings into MUST_FIX / SHOULD_FIX / STYLE
- REFINE — Builder fixes all MUST_FIX issues
- COMMIT — Charon handles git commit with scope selection
- REFLECT — Hermes extracts learnings into playbooks
Output artifacts (in `_bmad-output/sprint-artifacts/completions/`):
- `17-1-metis.json` — Builder completion report
- `17-1-argus.json` — Inspector verification with file:line evidence
- `17-1-nemesis.json` — Test quality analysis
- `17-1-cerberus.json` — Security scan results
- `17-1-hestia.json` — Architecture review
- `17-1-themis.json` — Triage decisions
- `17-1-mnemosyne.json` — Reflection / playbook updates
Process all stories in one or more epics with dependency-aware wave parallelism. Validates stories, scores complexity, builds a dependency DAG, and executes in waves.
```shell
# Implement all stories in epic 17 (sequential)
/batch-stories epic=17

# Parallel swarm mode (spawns Heracles workers)
/batch-stories epic=17 mode=parallel

# Multiple epics in one run
/batch-stories Epics 17-23

# Specific stories only
/batch-stories stories="17-1,17-3,17-5"

# Resume a failed batch (skips completed stories)
/batch-stories epic=17 resume=true
```

What happens:
- Loads `sprint-status.yaml`, filters to target epic(s)
- Validates all story files exist and parse correctly
- Scores complexity for each story (determines review depth)
- Analyzes inter-story dependencies, builds wave ordering
- Executes stories wave-by-wave (sequential or parallel)
- Generates session report with per-story metrics
Example wave output:
```
Wave 1: 17-1 (DB schema), 17-3 (shared types)       → parallel, no deps
Wave 2: 17-2 (API endpoints), 17-4 (auth middleware) → depends on Wave 1
Wave 3: 17-5 (UI components)                         → depends on Wave 2
```
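The wave planning above amounts to a level-order topological sort over the dependency DAG — a sketch, assuming each story key maps to the set of stories it depends on:

```python
def compute_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group stories into waves: a story joins a wave once all of its
    dependencies are in earlier waves, so stories within a wave can run
    in parallel."""
    waves, done = [], set()
    pending = dict(deps)
    while pending:
        wave = sorted(k for k, d in pending.items() if d <= done)
        if not wave:
            raise ValueError("dependency cycle detected")
        waves.append(wave)
        done.update(wave)
        for k in wave:
            del pending[k]
    return waves
```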
Go from a feature idea to implemented stories without manual BMAD workflow orchestration. Automates the entire planning chain (PRD, architecture, epics, sprint-status, stories) then hands off to batch-stories.
```shell
# From a plan file
/quick-feature plan=docs/feature-plan.md

# From inline description
/quick-feature "Add user authentication with OAuth2 and JWT tokens"

# With StackShift brownfield onboarding (auto-detected)
/quick-feature plan=docs/feature-plan.md
```

User interaction points (only 2):
- CLARIFY — Targeted multiple-choice questions (4-12 based on plan detail)
- POST-EPICS — Epic selection + build mode (sequential/parallel)
Everything else runs autonomously.
Lighter than quick-feature — assumes a BMAD document trail already exists and adds to it. Three modes handle different entry points:
```shell
# Pre-build: turn a plan into stories before implementing
/plan-to-story plan=docs/new-feature.md

# Post-build: retroactively document already-built work
/plan-to-story plan=docs/what-we-built.md mode=post-build

# Sweep: find undocumented work in recent commits
/plan-to-story mode=sweep
```

| Mode | Input | Use Case |
|---|---|---|
| pre-build | A plan | Turn plan into stories before building |
| post-build | A plan | Retroactively document already-built work |
| sweep | Git history | Find undocumented work in recent commits |
Run deep code review and hardening on existing implementations. Loops until clean — SCOPE, REVIEW, ASSESS, FIX, VERIFY, REPORT. Run repeatedly with different focuses to progressively harden code.
```shell
# General review sweep of an epic
/batch-review epic=17

# Security audit
/batch-review epic=17 focus="security vulnerabilities"

# Accessibility compliance
/batch-review epic=17 focus="accessibility, WCAG AA"

# Performance optimization
/batch-review path="src/api" focus="N+1 queries, performance bottlenecks"

# UX consistency
/batch-review epic=17 focus="styling, UX, button placement consistency"

# Error handling patterns
/batch-review path="src/services" focus="error handling consistency"

# Review across multiple epics
/batch-review Epics 17-23

# Specific stories
/batch-review stories="17-1,17-3"
```

Hardening loop:
```
SCOPE  → Identify files and focus area
REVIEW → Multi-perspective analysis (Cerberus, Argus, Nemesis, Hestia)
ASSESS → Themis triages findings
FIX    → Builder addresses MUST_FIX items
VERIFY → Re-review to confirm fixes
REPORT → Summary with metrics
  ↑______↓  (loops until clean or max iterations reached)
```
Output artifacts (in `_bmad-output/sprint-artifacts/hardening/`):
- `{scope}-review.json` — Raw review findings
- `{scope}-triage.json` — Triage decisions
- `{scope}-fixes.json` — Fix log
- `{scope}-report.md` — Human-readable summary
- `{scope}-history.json` — Iteration history
Harmonia ensures every page feels like it belongs to the same system. Two modes:
```shell
# Bootstrap: extract patterns from existing app, create Design Language Reference
/ux-audit mode=bootstrap

# Audit: compare pages against established DLR
/ux-audit

# Targeted audit after major UI changes
/ux-audit path="src/components/checkout"

# Story-scoped (automatic in pipeline for frontend stories)
/ux-audit story_key=17-3
```

Bootstrap mode (no DLR exists):
- Scans UI components, pages, and styles
- Extracts interaction patterns, visual language, layout conventions
- Produces a Design Language Reference document
Audit mode (DLR exists):
- Compares each page/component against the DLR
- Reports inconsistencies across 6 areas: interaction patterns, visual language, layout, feedback/state, navigation, content/voice
- Findings classified: MUST_FIX (breaks mental model) / SHOULD_FIX (friction) / CODE_HEALTH (systemic) / STYLE (trivial)
Reverse gap analysis: scans your codebase for components, endpoints, models, and services that have no corresponding story. The opposite of "is the story implemented?" — this asks "is the code documented?"
```shell
# Scan everything against all stories
/detect-ghost-features

# Scope to a specific epic
/detect-ghost-features epic=17

# Scan and auto-generate backfill story proposals
/detect-ghost-features create_backfill=true
```

What it scans for:
- React components without story coverage
- API endpoints not tracked in any story
- Database tables/models with no documentation
- Services and utilities that appeared without stories
Severity levels:
- Critical — APIs, auth, payment (undocumented = high risk)
- High — Components, DB tables, services
- Medium — Utilities, helpers
- Low — Config files, constants
Interactive story generation with systematic codebase scanning. Every checkbox reflects reality — files are verified to exist, stubs are detected, test coverage is checked.
```shell
# Create a new story with verified gap analysis
/create-story-with-gap-analysis epic=17 story="Add user profile page"

# Regenerate an existing story with fresh verification
/create-story-with-gap-analysis story_key=17-3
```

Verification status per task:
- `[x]` — File exists, real implementation, tests exist
- `[~]` — File exists but is a stub/TODO or missing tests
- `[ ]` — File does not exist
Validate story checkbox claims against actual codebase reality. Finds false positives (checked but not done) and false negatives (done but unchecked).
```shell
# Verify a single story
/gap-analysis story_key=17-1

# Verify and auto-update checkboxes to match reality
/gap-analysis story_key=17-1 auto_update=true

# Strict mode (stubs count as incomplete)
/gap-analysis story_key=17-1 strict=true
```

Clears all checkboxes and re-verifies each item against the actual codebase. Detects over-reported completion and identifies real gaps. Optionally fills gaps.
```shell
# Revalidate a story (report only)
/revalidate-story story_key=17-1

# Revalidate and fill gaps (implement missing items)
/revalidate-story story_key=17-1 fill_gaps=true

# Revalidate with a cap on how many gaps to fill
/revalidate-story story_key=17-1 fill_gaps=true max_gaps=5
```

Give it your epics, architecture, and team composition. It builds a dependency DAG across every story, maps stories to architecture domains, and computes optimal parallel work streams.
```shell
# Plan for a 4-person team
/plan-execution team_size=4

# Greenfield project planning
/plan-execution team_size=3 project_type=greenfield

# Mid-project rebalancing (reads sprint-status.yaml to filter completed work)
/plan-execution team_size=4 rebalance=true
```

Output includes:
- Execution phases — Foundation, Fan-out, Steady State, Convergence
- Per-developer work streams — stories grouped by domain, balanced by effort
- Coordination checkpoints — explicit handoff points between developers
- Risk zones — files touched by multiple developers, with mitigation strategies
- Mermaid dependency graph — visual DAG with color coding and critical path
Ingests all build artifacts from a completed epic (narrative logs, review findings, progress metrics, reflections), performs cross-story pattern analysis, and produces actionable outputs.
```shell
# Auto-detect completed epic from sprint-status
/epic-retrospective

# Retrospect a specific epic
/epic-retrospective epic=2
```

Phases:
- GATHER — Discover epic, collect all completion artifacts
- ANALYZE + SYNTHESIZE — Clio identifies patterns across stories, generates outputs
- PRESENT — Single checkpoint: review findings, approve changes
Outputs:
| Output | Applied? |
|---|---|
| Retrospective document | Always saved |
| Playbook update proposals | User approves |
| CLAUDE.md patch proposals | User approves (very high bar) |
| Pantheon process suggestions | Never auto-applied — take to source repo |
Scans all story files for unchecked tasks, autonomously executes remaining work, reviews quality, and updates artifacts. Designed to run at scale across 100+ stories.
```shell
# Scan and close stories across the project
/story-closer

# Target a specific epic
/story-closer epic=17
```

Triage rules:
- 0 unchecked tasks → skip (already done)
- ≤30% unchecked → story-closer handles it (lightweight flow)
- >30% unchecked → routes to full `/story-pipeline`
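The triage rules above can be sketched as a small routing function (illustrative; the signature is hypothetical):

```python
def route_story(total_tasks: int, unchecked: int) -> str:
    """Sketch of story-closer's triage rules."""
    if unchecked == 0:
        return "skip"              # already done
    if unchecked / total_tasks <= 0.30:
        return "story-closer"      # lightweight closing flow
    return "story-pipeline"        # too much left: full 9-phase pipeline
```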
Harvests issues from `tracked-issues.json` (populated by `/batch-review` and `/batch-stories`), clusters them by root cause, and generates BMAD story files.
```shell
# Harvest and process all tracked issues
/tech-debt-burndown

# Filter to a specific type
/tech-debt-burndown type=security

# Filter to a specific epic's issues
/tech-debt-burndown epic=17
```

Phases:
- HARVEST — Collect issues from local index or GitHub Issues
- ANALYZE — Root-cause clustering, deduplication, effort estimation
- PROPOSE — Interactive: approve, edit, skip, or merge proposals
- CREATE — Generate BMAD story files, mark source issues as addressed
One-time migration utility for repos with existing playbooks. Converts legacy format to v1 standardized format, bootstraps the index, and backfills learnings from historical pipeline artifacts.
```shell
# Run migration (safe to re-run — idempotent)
/playbook-migration

# Dry run to preview changes
/playbook-migration dry_run=true
```

| Agent | Specialty | Triggers |
|---|---|---|
| Metis | General purpose | Fallback |
| Helios | React / Next.js | *.tsx, "component", "UI" |
| Hephaestus | TypeScript API | api/**/*.ts, "endpoint" |
| Athena | Database / Prisma | prisma/**, "migration" |
| Atlas | Infrastructure | *.tf, "deploy", "CI/CD" |
| Pythia | Python | *.py, "FastAPI", "Django" |
| Gopher | Go | *.go, "goroutine" |
| Agent | Focus | Included |
|---|---|---|
| Cerberus | Independent security gate (BLOCK/WARN severity) | Always — runs secrets scanner + policy review |
| Hestia | Architecture | Always |
| Argus | Task verification (file:line evidence) | Always |
| Nemesis | Test quality (meaningful assertions, not just coverage) | Always |
| Apollo | Logic / Performance | Backend stories |
| Arete | Code quality | Complex+ stories |
| Iris | Accessibility | Frontend stories |
| Agent | Role |
|---|---|
| Pygmalion | Forges domain-specialist reviewers per story. Uses Jaccard similarity against a specialist registry — REUSE (>=0.5), EVOLVE (0.3-0.49), or FORGE_NEW (<0.3). Each forged specialist gets a Greek mythology name. |
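Pygmalion's decision reduces to Jaccard similarity over tag sets — a sketch assuming stories and registered specialists are described by capability tags:

```python
def forge_decision(story_tags: set[str], specialist_tags: set[str]) -> str:
    """Route per Pygmalion's thresholds: REUSE (>=0.5), EVOLVE (0.3-0.49),
    FORGE_NEW (<0.3)."""
    union = story_tags | specialist_tags
    similarity = len(story_tags & specialist_tags) / len(union) if union else 0.0
    if similarity >= 0.5:
        return "REUSE"
    if similarity >= 0.3:
        return "EVOLVE"
    return "FORGE_NEW"
```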
| Agent | Role |
|---|---|
| Themis | Triages findings — MUST_FIX / SHOULD_FIX / STYLE. Quick Fix Rule: if fixable in < 2 minutes, it's always MUST_FIX. |
| Aletheia | Adversarial test writer — writes tests independently from builder |
| Charon | Self-governed git operations — commit, PR, scope selection |
| Mnemosyne | Reflection + playbook management — extracts learnings, updates/creates playbooks |
| Hermes | Session reporter — generates comprehensive batch completion summaries |
| Clio | Epic retrospective analyst — cross-story pattern analysis, produces actionable outputs |
| Harmonia | UX design audit — bootstraps Design Language Reference or audits against it |
| Hygeia | Coordinates quality gates across parallel swarm workers |
1. Clone this repo somewhere on your machine:

   ```shell
   git clone git@github.com:jschulte/pantheon.git ~/git/pantheon
   ```

2. In your target project, run the BMAD installer:

   ```shell
   npx bmad-method install
   ```

3. When the installer asks if you have any custom local workflows or agents, point it to the `src` folder in this repo: `~/git/pantheon/src`

That's it. The installer will wire Pantheon's agents and workflows into your project alongside the rest of BMAD.
Pantheon agents and workflows are defined as `.agent.yaml` and `workflow.yaml` files. BMAD's IDE manager auto-generates platform-specific launchers (Claude Code skills, Copilot skills, OpenCode agents, etc.) from these canonical definitions.
In your project's `_bmad/pantheon/config.yaml` (ships with good defaults, modify as needed):

```yaml
pantheon:
  coverage_threshold: 80          # Minimum test coverage %
  require_code_citations: true    # file:line evidence required
  enable_playbooks: true          # Compound learning system
  bootstrap_mode: true            # Auto-init playbooks from codebase
  enable_batch_processing: true
  parallel_config:
    max_concurrent: 3             # Stories per wave
    smart_ordering: true          # Auto-detect dependencies
  use_consolidated_review: "auto" # Complexity-based routing

  # External tracker integration (optional)
  tracker:
    provider: none                # "rally", "github", or "none" (auto-detected at runtime)
```

For a 10-story epic:
| Traditional | Pantheon | |
|---|---|---|
| Time | ~70 developer-days | ~16 hours |
| Test coverage | 40-60% | 85%+ |
| Review perspectives | 1 (maybe) | 4-6 per story |
| Security scan | Sometimes | Every story |
| Knowledge captured | Tribal, lossy | Playbooks, persistent |
| Consistency | Varies by reviewer | Same rigor every time |
Playbooks are structured knowledge files that capture patterns, gotchas, and anti-patterns learned from real code reviews on your codebase.
Before building, the pipeline scores playbooks for relevance:
- Domain overlap (does the playbook cover this story's domain?)
- File pattern match (does it apply to the files being changed?)
- Historical hit rate (did it actually prevent issues last time?)
After building, the reflection agent:
- Extracts new patterns from the review cycle
- Merges overlapping entries with existing playbooks
- Replaces stale entries with updated guidance
- Compacts to stay within 3-10KB per playbook
The result: Each playbook has structured metadata tracking which stories contributed to it, how many times it's been loaded, and its effectiveness rate. The more stories you run, the fewer issues your builder produces.
```shell
# 1. Describe what you want to build
/quick-feature "Add user authentication with OAuth2, JWT tokens, and role-based access control"

# 2. Answer 4-12 clarifying questions
# 3. Select which epics to build and mode (sequential/parallel)
# 4. Walk away — Pantheon handles the rest
```

```shell
# 1. Create stories from your plan, integrated with existing BMAD trail
/plan-to-story plan=docs/new-feature-plan.md

# 2. Implement the stories
/batch-stories epic=18

# 3. Harden with focused reviews
/batch-review epic=18 focus="security"
/batch-review epic=18 focus="accessibility"
```

```shell
# 1. Find undocumented code
/detect-ghost-features create_backfill=true

# 2. Run multi-pass hardening
/batch-review epic=17 focus="security vulnerabilities"
/batch-review epic=17 focus="N+1 queries, performance"
/batch-review epic=17 focus="error handling consistency"

# 3. Convert accumulated issues into stories
/tech-debt-burndown
```

```shell
# 1. Find and close nearly-done stories
/story-closer

# 2. Revalidate completion claims
/revalidate-story story_key=17-1

# 3. Verify what's really done vs what checkboxes say
/gap-analysis story_key=17-2 auto_update=true
```

```shell
# 1. Run automated retrospective on the completed epic
/epic-retrospective epic=17

# 2. Review and approve playbook updates, CLAUDE.md patches
# 3. Take Pantheon process suggestions back to the source repo
```

```
pantheon/
├── src/
│   ├── module.yaml                 # Module definition
│   ├── config.yaml                 # Default configuration
│   ├── agent-routing.yaml          # Builder/reviewer routing rules
│   ├── agents/
│   │   ├── builders/               # Domain-specific builder personas
│   │   ├── reviewers/              # Specialist reviewer personas (incl. Cerberus security gate)
│   │   ├── validators/             # Verification agents
│   │   └── support/                # Triage, reflection, commit (Charon), coordination
│   ├── skills/                     # Platform-portable skill definitions (SKILL.md)
│   ├── schemas/                    # JSON schemas for agent artifacts
│   ├── tools/
│   │   └── scan-secrets.sh         # Deterministic secrets scanner (11 regex patterns)
│   ├── workflows/
│   │   ├── story-pipeline/         # Core 9-phase implementation
│   │   │   └── data/security/      # Bundled security policies + playbooks
│   │   ├── batch-stories/          # Epic-level batch orchestration
│   │   ├── batch-review/           # Hardening workflow
│   │   ├── quick-feature/          # Plan-to-build pipeline
│   │   ├── plan-to-story/          # Lightweight BMAD trail integration
│   │   ├── plan-execution/         # Team execution planning
│   │   ├── detect-ghost-features/  # Reverse gap analysis
│   │   ├── gap-analysis/           # Story verification against codebase
│   │   ├── create-story-with-gap-analysis/ # Verified story generation
│   │   ├── revalidate-story/       # Fresh re-verification
│   │   ├── story-closer/           # Close nearly-complete stories at scale
│   │   ├── tech-debt-burndown/     # Issue-to-story conversion
│   │   ├── ux-audit/               # Design consistency (Harmonia)
│   │   ├── rally-sync/             # External tracker sync
│   │   ├── playbook-migration/     # Legacy playbook upgrade
│   │   └── multi-agent-review/     # Parallel review coordination
├── scripts/
│   ├── validate-all-stories.sh     # Pre-batch story validation
│   └── sanitize-story.sh           # Story file sanitization
└── docs/
    ├── specialist-registry/        # Forged specialist personas
    ├── adrs/                       # Architecture Decision Records
    ├── TROUBLESHOOTING.md          # Common issues and solutions
    ├── PLATFORM-MIGRATION.md       # Cross-platform migration guide
    └── PHASE-FLOWCHART.md          # Pipeline flow visualization
```
| Command | Purpose | Example |
|---|---|---|
| `/story-pipeline` | Implement one story (9-phase) | `/story-pipeline story_key=17-1` |
| `/batch-stories` | Implement entire epic(s) | `/batch-stories epic=17 mode=parallel` |
| `/quick-feature` | Plan-to-build in one command | `/quick-feature "Add OAuth2 auth"` |
| `/plan-to-story` | Add work to existing BMAD trail | `/plan-to-story plan=docs/plan.md` |
| `/batch-review` | Deep multi-perspective review | `/batch-review epic=17 focus="security"` |
| `/ux-audit` | Design consistency audit | `/ux-audit mode=bootstrap` |
| `/detect-ghost-features` | Find undocumented code | `/detect-ghost-features epic=17` |
| `/create-story-with-gap-analysis` | Generate verified stories | `/create-story-with-gap-analysis epic=17` |
| `/gap-analysis` | Verify story vs codebase | `/gap-analysis story_key=17-1` |
| `/revalidate-story` | Fresh re-verification | `/revalidate-story story_key=17-1` |
| `/plan-execution` | Plan team work streams | `/plan-execution team_size=4` |
| `/epic-retrospective` | Automated epic retrospective | `/epic-retrospective epic=1` |
| `/story-closer` | Close nearly-done stories | `/story-closer epic=17` |
| `/tech-debt-burndown` | Issues → stories | `/tech-debt-burndown` |
| `/rally-sync` | Sync with external tracker | `/rally-sync epic=17` |
| `/playbook-migration` | Upgrade legacy playbooks | `/playbook-migration dry_run=true` |
- Node.js 18+
- Git
- Claude Code (primary) or another supported AI coding platform
- BMAD Method v6.0.0+ (for story format and module system)
Note: Workflow and agent files reference `@patterns/` paths (e.g., `@patterns/tdd.md`, `@patterns/verification.md`). These are resolved by the BMAD Method installer from the parent framework's shared patterns library and are not included in this repository. If you see unresolved `@patterns/` references, ensure BMAD Method v6.0.0+ is installed in your project.
Pantheon uses two-tier versioning:
- Module version (`package.json`): `1.2.0` — tracks npm releases and module packaging. This is the version to cite when reporting issues or checking compatibility.
- Workflow versions (`workflow.yaml`): track feature evolution of individual workflows independently (e.g., story-pipeline `7.4.0`, batch-stories `4.0.0`). These are internal and change more frequently than the module version.
See CHANGELOG.md for detailed release history.
- Claude Code is the primary platform. OpenCode has partial support (sequential only), GitHub Copilot has simplified support, and Codex CLI support is experimental. Full multi-agent verification requires Claude Code.
- No programmatic enforcement of agent constraints. Agent safety rules (e.g., "never force push") are Markdown instructions, not git hooks. LLMs may occasionally ignore instructions under context pressure.
- No integration tests. The test suite validates structural integrity (cross-references, schemas, naming) but cannot verify that agents follow pipeline instructions correctly at runtime.
- Token cost is not tracked. Pipeline runs do not report total tokens consumed. For rough estimates: a standard-complexity story consumes ~100-150K tokens; a critical-complexity story may consume ~300-500K tokens.
- Story file size matters. Stories under 3KB typically lack sufficient context for quality implementation. Stories over 50KB may cause context window issues. The sweet spot is 6-20KB per story file.
- Playbook system requires multiple stories to show value. The learning loop needs 3-5 stories before playbooks meaningfully reduce review findings.
- Concurrency control uses filesystem locks. Swarm mode's `mkdir`-based locking works for local execution but is not suitable for multi-machine or CI/CD scenarios.
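The lock protocol relies on directory creation being atomic on a local filesystem — the first worker to create the lock directory wins, and the rest retry. A Python sketch (names, retry counts, and delays are hypothetical):

```python
import os
import time

def with_commit_lock(lock_dir: str, action, retries: int = 50, delay: float = 0.1):
    """Sketch of a mkdir-based lock: acquire by creating a directory
    atomically, run the action, release by removing the directory."""
    for _ in range(retries):
        try:
            os.mkdir(lock_dir)        # atomic acquire on a local filesystem
        except FileExistsError:
            time.sleep(delay)         # another worker holds the lock; retry
            continue
        try:
            return action()
        finally:
            os.rmdir(lock_dir)        # release
    raise TimeoutError(f"could not acquire {lock_dir}")
```

As the limitation above notes, this only guarantees mutual exclusion on one machine: `mkdir` atomicity does not extend across machines or most networked filesystems.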
Author: Jonah Schulte







