v13.1.2 | The outer loop goes once, the inner loop goes many times.
A Claude Code agent orchestration framework built around three pillars: a flexible development platform, scientific personality research, and streamlined integrations.
A multi-agent system with customizable BikeLane workflows for structured software development:
- 11 Coordinated Agents - SM, TEA, Dev, Reviewer, Architect, PM, Tech Writer, UX Designer, DevOps, Orchestrator, BA
- 11 BikeLane Workflows - TDD, BDD, Trivial, 2-Party TDD, TDD-Team, BDD-Team, Patch, Agent-Docs, Architecture, Release, Git Cleanup
- 38 Slash Commands - Entry points for agent activation and workflows
- 25 Skills - Reusable knowledge domains (testing, code-review, jira, settings, mermaid, etc.)
- Prime Context System - Tiered context injection assembles agent definition, persona, session state, and sidecar memory
- Automatic Handoffs - Context-aware agent transitions via subagent delegation
- Agent Sidecars - Persistent learning files where agents record patterns, gotchas, and decisions across stories
- Frame TUI - Textual-based terminal dashboard running alongside Claude Code CLI
A scientific study of how strong personalities affect AI agent behavior:
- OCEAN Profiling - Big Five personality scores for every character
- TRAIL Framework - Categorizing errors (reasoning, planning, execution) and correlating with personality
- Benchmarking System - /solo, /benchmark-control, /benchmark for statistical evaluation
- JobFair - Discovering which characters excel at roles beyond their native specialization
The 45 persona themes (Discworld, Star Trek, Breaking Bad, Alice in Wonderland, etc.) are instruments of inquiry, not decoration. Early findings show character expertise often trumps abstract personality scores.
- Frame - Dashboard panel viewer for CLI-first developers — terminal TUI alongside Claude Code
- Jira Integration - Bidirectional sync, epic auto-creation, sprint velocity
- Sprint Management - Story tracking with current-sprint.yaml
- Codebase Analysis - Hotspots, complexity, dead code, dependencies, code markers, and health score via pf debug
Three paths depending on who you are:
Someone on your team already ran /pf-setup. You just need the CLI and to clone.
# 1. Install the CLI (pick one)
pipx install "git+https://github.com/slabgorb/pennyfarthing.git"
# or: uv tool install "pennyfarthing-scripts @ git+https://github.com/slabgorb/pennyfarthing.git"
# 2. Clone the project
git clone git@github.com:your-org/your-project.git && cd your-project
# 3. Start Claude Code — bootstrap runs automatically on first session
claude

That's it. The project's committed bootstrap.sh hook detects the first session, runs pf init, and sets everything up. You'll see agents, themes, and workflows immediately.
If pf isn't installed when you start Claude Code, the bootstrap will attempt to install it via uv, pipx, or pip automatically.
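The fallback described above can be pictured as a first-match search over installers. This is a minimal sketch, not the actual bootstrap.sh logic: the function name `pick_installer` is hypothetical, and treating the order "uv, pipx, pip" as a preference order is an assumption drawn only from the sentence above.

```python
import shutil

# Assumed preference order, taken from the wording above (not from bootstrap.sh).
INSTALLERS = ("uv", "pipx", "pip")


def pick_installer(which=shutil.which):
    """Return the first installer found on PATH, or None if none is available.

    `which` is injectable so the lookup can be exercised without touching PATH.
    """
    for name in INSTALLERS:
        if which(name):
            return name
    return None
```

Calling `pick_installer()` with no arguments checks the real PATH; a script could then build the matching install command from the ones listed in the steps above.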
You're bringing Pennyfarthing into a repo for the first time.
# 1. Authenticate with GitHub (required — private repo)
gh auth login
# 2. Install the CLI (pick one)
pipx install "git+https://github.com/slabgorb/pennyfarthing.git"
# or: uv tool install "pennyfarthing-scripts @ git+https://github.com/slabgorb/pennyfarthing.git"
# or: curl -fsSL https://raw.githubusercontent.com/slabgorb/pennyfarthing/main/pennyfarthing-dist/scripts/install.sh | bash
# 3. Initialize your project
cd your-project
pf init
# 4. Verify
pf doctor
# 5. Start Claude Code and run interactive setup
claude
/pf-setup # walks through repo discovery, CLAUDE.md, theme selection, Jira, etc.
# 6. Start working
/pf-work

pf init creates the .pennyfarthing/ and .claude/ directories. /pf-setup configures them interactively: repo topology, project context, theme, and optional integrations. After setup, teammates can follow Path A.
You're contributing to the framework using the orchestrator repo.
# 1. Clone the orchestrator (includes pennyfarthing/ as inlined subrepo)
git clone git@github.com:slabgorb/orc-penny.git && cd orc-penny
# 2. Setup — clones pennyfarthing/, installs deps, builds, installs pf CLI
just setup
# 3. Launch Claude Code with OTEL telemetry
just claude
# 4. Optional: interactive walkthrough
/guided-tour

Prerequisites: Python 3.11+, Node 18+, pnpm 9+, just, Claude Code CLI, Git SSH access to slabgorb.
The orchestrator has two git repos — orc-penny/ (sprint files, sessions, docs, trunk-based on main) and pennyfarthing/ (framework source, gitflow on develop). The .pennyfarthing/ runtime directory symlinks to pennyfarthing/pennyfarthing-dist/ so changes are live immediately.
Full walkthrough: See Getting Started for detailed installation, setup, and first work session guide.
Pennyfarthing works in any terminal. Optional dashboards add real-time visibility into agent activity.
| I want to... | Mode | Command |
|---|---|---|
| Just use agents in my terminal | CLI only | claude (no dashboard needed) |
| Stay fully in the terminal | Frame TUI | just tui + just claude |
| One command, everything | Frame all-in-one | pf frame start |
See the full Frame Guide for setup, panels, and OTEL telemetry.
Frame provides 15 dashboard panels showing real-time agent activity. All panels are draggable, floatable, and splittable:
| Panel | Purpose |
|---|---|
| Sprint | Current sprint stories and progress |
| Progress | At-a-glance story dashboard |
| BikeLane | Workflow phase state and navigation |
| AC | Acceptance criteria checklist with progress |
| Changed | Files modified during the session |
| Diffs | Git diff viewer for current changes |
| Git | Branch management and status |
| Todo | Task list tracking |
| Audit Log | Timestamped tool use history |
| Workflow | Workflow navigation and status |
| Hotspots | Codebase health — dead code, complexity |
| Settings | Permission mode, relay mode, bell mode |
| Debug | Prime context inspection with token counts |
| Background | Background job monitoring |
Frame is powered by a Python FastAPI/uvicorn server that serves API endpoints, WebSocket channels, and the OTLP telemetry receiver:
graph TB
subgraph "Frame"
BR["Python FastAPI server"]
end
BR --> WH["Frame<br/>(uvicorn)"]
BR -- "writes" --> BP[".frame-port"]
WH --> API["/api/* endpoints"]
WH --> WS["/ws/* channels"]
WH --> OTLP["/v1/* OTLP receiver"]
Frame renders tool use as human-readable summaries instead of raw JSON. Consecutive identical tool calls are stacked, and results are collapsible.
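Stacking consecutive identical tool calls amounts to run-length grouping. The sketch below illustrates the idea with `itertools.groupby`; the function name `stack_calls` and the use of plain strings as call summaries are assumptions for illustration, not Frame's actual rendering code.

```python
from itertools import groupby


def stack_calls(calls):
    """Collapse runs of consecutive identical calls into (call, count) pairs.

    Only adjacent duplicates are merged; a repeated call that appears later,
    after a different call, starts a new stack.
    """
    return [(call, sum(1 for _ in run)) for call, run in groupby(calls)]
```

For example, two adjacent "Read a.py" entries become one stacked entry with a count of 2, while a later "Read a.py" after an edit stays separate.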
Each persona character has a unique portrait displayed in the conversation stream, making multi-agent workflows visually distinct.
| Mode | Description |
|---|---|
| Permission Mode | plan / manual / accept — controls how much Claude can do without approval |
| Relay Mode | Automatic agent handoffs — detects CYCLIST:HANDOFF markers and runs the next agent |
| Bell Mode | Queue messages while Claude works — injected at next tool execution via hooks |
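Bell Mode's behavior (queue now, inject at the next tool execution) can be sketched as a small FIFO queue. The class name `BellQueue` and its methods are hypothetical; how the real hooks deliver the drained messages to Claude is not covered here.

```python
from collections import deque


class BellQueue:
    """Hold user messages until the next tool execution drains them."""

    def __init__(self):
        self._pending = deque()

    def ring(self, message):
        """Queue a message while the agent is busy."""
        self._pending.append(message)

    def drain(self):
        """Return queued messages in arrival order and empty the queue.

        In this sketch, a tool-execution hook (assumed) would call this.
        """
        messages = list(self._pending)
        self._pending.clear()
        return messages
```

Draining returns messages in the order they were queued, and a second drain with nothing new queued returns an empty list.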
Prime assembles the full agent context at activation: agent definition, persona character, behavior guide, sprint state, active session, and sidecar memory. This is injected via --append-system-prompt so agents behave identically regardless of display mode.
Prime uses tiered injection to manage token overhead:
| Tier | Tokens | When |
|---|---|---|
| Full | ~4000 | New session or new agent |
| Refresh | ~600 | Same agent, stale context |
| Handoff | ~700 | Agent-to-agent transition |
| Minimal | ~200 | Deep in same agent session |
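The tier table can be read as a small decision function. The following is a sketch under stated assumptions: the `Activation` fields, the `choose_tier` name, and the precedence between conditions (new session first, then handoff, then agent change, then staleness) are all hypothetical; only the tier names and approximate token counts come from the table.

```python
from dataclasses import dataclass

# Approximate token budgets copied from the table above.
TIER_TOKENS = {"full": 4000, "refresh": 600, "handoff": 700, "minimal": 200}


@dataclass
class Activation:
    """Hypothetical description of the moment an agent is activated."""
    new_session: bool = False
    agent_changed: bool = False
    via_handoff: bool = False
    context_stale: bool = False


def choose_tier(a):
    """Pick an injection tier; the precedence order here is an assumption."""
    if a.new_session:
        return "full"
    if a.via_handoff:
        return "handoff"
    if a.agent_changed:
        return "full"
    if a.context_stale:
        return "refresh"
    return "minimal"
```

With these assumptions, `TIER_TOKENS[choose_tier(Activation())]` gives roughly 200 tokens for a call deep in the same agent session.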
Sidecars are persistent learning files where agents record what they discover during story work. Each agent maintains three files in .pennyfarthing/sidecars/:
- {agent}-patterns.md - Strategies and patterns that worked
- {agent}-gotchas.md - Mistakes and edge cases to avoid
- {agent}-decisions.md - Architecture decisions and rationale
Agents write to sidecars before every handoff. Prime loads them on activation, so agents build on previous experience instead of rediscovering the same issues.
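To make the sidecar layout concrete, here is a minimal sketch of reading and appending to the three per-agent files. The helper names (`sidecar_paths`, `load_sidecars`, `append_note`) and the bullet-per-note format are assumptions for illustration; only the directory and the `{agent}-{kind}.md` naming come from the description above.

```python
from pathlib import Path

SIDECAR_KINDS = ("patterns", "gotchas", "decisions")
DEFAULT_ROOT = Path(".pennyfarthing/sidecars")


def sidecar_paths(agent, root=DEFAULT_ROOT):
    """Map each sidecar kind to its {agent}-{kind}.md path."""
    return {kind: Path(root) / f"{agent}-{kind}.md" for kind in SIDECAR_KINDS}


def load_sidecars(agent, root=DEFAULT_ROOT):
    """Read whichever sidecar files exist; missing files become empty strings."""
    return {
        kind: path.read_text(encoding="utf-8") if path.exists() else ""
        for kind, path in sidecar_paths(agent, root).items()
    }


def append_note(agent, kind, note, root=DEFAULT_ROOT):
    """Append one note as a Markdown bullet (format assumed, not specified)."""
    path = sidecar_paths(agent, root)[kind]
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(f"- {note}\n")
```

Appending rather than rewriting keeps earlier notes intact, which matches the idea of agents building on previous experience.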
BikeLane is the umbrella workflow system supporting two types:
| Type | Description | Examples |
|---|---|---|
| Phased | Agent-driven with automatic handoffs | tdd, bdd, trivial, agent-docs |
| Stepped | Progressive disclosure with user gates | architecture, release, git-cleanup |
| Agent | Role | Phase |
|---|---|---|
| SM | Scrum Master | Story selection, session setup, completion |
| TEA | Test Engineer | Write failing tests (RED) |
| Dev | Developer | Make tests pass (GREEN) |
| Reviewer | Code Reviewer | Quality validation, approve/reject |
Use /workflow list to see all workflows. Use /workflow start <name> to begin any stepped workflow.
Gates are conditional checks on phase transitions. When an agent finishes a phase, the gate evaluates whether the transition should proceed:
| Gate | Purpose |
|---|---|
| tests-pass | Verify all tests pass before review |
| tests-fail | Verify tests are RED before implementation |
| approval | Verify reviewer has approved |
| confidence-sm | Check if user instruction is unambiguous |
Gates are defined in pennyfarthing-dist/gates/ and referenced via gate.file in workflow YAML.
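Conceptually, a gate is a predicate over the outcome of a phase. The sketch below models three of the gates as plain functions; it is not how the gate files in pennyfarthing-dist/gates/ are written, and `PhaseResult`, `GATES`, and `may_transition` are hypothetical names. The confidence-sm gate is left out because it depends on judging the user's instruction rather than on a simple flag.

```python
from dataclasses import dataclass


@dataclass
class PhaseResult:
    """Hypothetical summary of what a phase produced."""
    tests_passed: int = 0
    tests_failed: int = 0
    reviewer_approved: bool = False


# Gate names come from the table above; the predicates are assumptions.
GATES = {
    "tests-pass": lambda r: r.tests_failed == 0 and r.tests_passed > 0,
    "tests-fail": lambda r: r.tests_failed > 0,
    "approval": lambda r: r.reviewer_approved,
}


def may_transition(gate, result):
    """Return True if the named gate allows the phase transition."""
    check = GATES.get(gate)
    if check is None:
        raise KeyError(f"unknown gate: {gate}")
    return check(result)
```

In a TDD flow, tests-fail would guard the TEA to Dev handoff (tests must be RED) and tests-pass would guard the Dev to Reviewer handoff.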
Tandem workflows pair a background observer with the primary agent. The backseat watches the primary agent's work and injects observations:
- TDD-Tandem — Architect watches TEA, TEA watches Dev, PM watches Reviewer
- BDD-Tandem — Adds UX Designer watching Dev, Architect watching UX
For active questions (not passive observation), agents use the Consultation Protocol — synchronous Sonnet-powered request/response between agents.
Pennyfarthing measures how personality affects agent performance with two complementary benchmark systems.
Tests one role in isolation against a rubric, to discover which characters excel at which job.
# Run a single agent on a scenario
/solo theme:agent --scenario cache-invalidation
# Create a control baseline (10 runs)
/benchmark-control reviewer --scenario order-service
# Compare persona vs control with statistics
/benchmark breaking-bad reviewer --scenario order-service

Replays the entire TEA → Dev → Reviewer pipeline against real code, scored on ground truth: findings that external reviewers flagged on PRs the pipeline had already approved. Nothing is synthetic: every finding is a real defect the pipeline shipped and a human later caught.
# Replay one pipeline with the control theme (no persona)
pf benchmark replay run scenarios/dpgd-116.yaml --model sonnet --n 1
# Replay with a persona theme — 4 runs, then 3-judge majority vote
pf benchmark replay run scenarios/dpgd-116.yaml --theme firefly --n 4
pf benchmark replay judge scenarios/dpgd-116.yaml --target-judges 3
# Detection heatmap across themes
pf benchmark replay compare scenarios/dpgd-116.yaml

See the Peloton guide for scenario authoring and methodology.
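The 3-judge majority vote used in the replay commands can be illustrated with a strict-majority rule. This is a sketch only: the function name `majority_caught` and the boolean representation of each judge's verdict are assumptions, and the real judge output format is not shown in this document.

```python
def majority_caught(votes):
    """Return True when more than half of the judges say a finding was caught.

    `votes` is a sequence of booleans, one per judge. With an even number of
    judges, a tie counts as not caught.
    """
    return sum(votes) * 2 > len(votes)
```

Using an odd number of judges, as in `--target-judges 3`, avoids ties under this rule.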
Interactive D3 charts of the pipeline-replay results, published via GitHub Pages so they open rendered in the browser — no build, no clone:
| Chart | What it shows |
|---|---|
| Score vs Consistency | Each theme's mean weighted catch rate vs run-to-run consistency, with quadrants at the control baseline. Color by OCEAN trait. |
| Finding Hit Rate | Heatmap of how reliably each ground-truth finding is caught, per theme. |
| Phase Attribution | Which phase — TEA, Dev, or Reviewer — actually catches the defects. |
The pages are static and share a data snapshot (docs/benchmarks/benchmark-data.js) extracted from the full dashboard that pf benchmark viz generates. Source lives in docs/benchmarks/.
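As a rough illustration of the two axes in the Score vs Consistency chart, the sketch below computes a weighted catch rate per run and then summarizes a set of runs. The exact definitions used by pf benchmark viz are not given here, so everything in this block is an assumption: the per-finding weights, the function names, and in particular the use of population standard deviation as a spread measure (lower spread meaning more consistent).

```python
from statistics import mean, pstdev


def weighted_catch_rate(caught, weights):
    """Share of total finding weight that a single run caught.

    `caught` is a set of finding ids; `weights` maps finding id to weight.
    """
    total = sum(weights.values())
    return sum(w for fid, w in weights.items() if fid in caught) / total


def summarize(runs, weights):
    """Return (mean catch rate, run-to-run spread) across several runs."""
    rates = [weighted_catch_rate(run, weights) for run in runs]
    return mean(rates), pstdev(rates)
```

Each theme would contribute one (mean, spread) pair, which could then be compared with the pair computed for the control runs.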
Key findings:
- Persona themes move detection rates by less than ±10% vs control — the ceiling is set by agent definitions and prompts, not character voice.
- The TEA phase is the most impactful: a finding caught by a failing test is caught reliably; findings that depend on the Reviewer noticing them are caught less consistently.
- Security / CWE-class issues are well caught; build-config and self-authored test-quality issues are nearly invisible.
- Multivariate OCEAN patterns predict better than individual traits — the "Stoic Analyst" profile (Low O + High C + Low E + Low N) excels at code review.
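
A multivariate profile such as the "Stoic Analyst" is a conjunction of conditions on several traits at once. The sketch below shows the shape of such a check; the 0-to-1 score scale and the `LOW`/`HIGH` cut-offs are hypothetical, since the scoring scale used by the persona files is not stated here. Agreeableness is left unconstrained because the profile above does not mention it.

```python
# Hypothetical cut-offs on an assumed 0..1 scale.
LOW, HIGH = 0.4, 0.6


def is_stoic_analyst(ocean):
    """Check Low O + High C + Low E + Low N on a dict keyed by trait letter."""
    return (
        ocean["O"] <= LOW
        and ocean["C"] >= HIGH
        and ocean["E"] <= LOW
        and ocean["N"] <= LOW
    )
```

A single-trait rule would only look at one of these keys, which is the contrast the finding draws.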
See Benchmarking Documentation for methodology.
| Command | Description |
|---|---|
| pf init | Initialize Pennyfarthing in a project |
| pf doctor | Check installation health |
| pf doctor --fix | Auto-fix common issues |
| pf validate | Run all validators |
| pf theme list | Show available themes |
| pf theme set <name> | Change active theme |
| pf package list | Show installable theme plugins |
| pf frame start | Launch Frame dashboard |
| pf sprint status | Current sprint overview |
| pf workflow list | Show all workflows |
| pf debug hotspots analyze | Git change frequency analysis |
| pf debug deadcode stale | Find files with no recent commits |
| pf debug healthscore analyze | Composite codebase health score |
| pf handoff marker <agent> | Generate handoff marker |
| Guide | Description |
|---|---|
| BikeLane | Workflow engine — phased, stepped, procedural |
| Frame | Standalone panel viewer for CLI-first development |
| Gates | Workflow phase transition gates |
| Handoff CLI | Phase transitions and marker generation |
| Hooks | Hook system configuration and reference |
| Prime | Agent activation and context loading |
| Bell Mode | Message queue injection |
| Relay Mode | Automatic agent handoffs |
| Reflector | Agent-to-UI marker protocol |
| TirePump | Context clearing system |
| Tandem Protocol | Background observer pairing |
| Output Styles | Configurable response modes |
| Brownfield Tools | Codebase analysis CLI tools |
| Peloton Testing | Pipeline replay benchmarks from real PR reviews |
| Benchmarks | Persona evaluation system (JobFair) |
All 45 themes are bundled with pf init — no separate packages required. Themes span sci-fi, prestige TV, literature, mythology, comedy, history, and more:
the-expanse, star-trek-tng, breaking-bad, discworld, fifth-element, succession, the-wire, mad-men, shakespeare, jane-austen, dune, game-of-thrones, the-office, monty-python, greek-mythology, blade-runner, doctor-who, harry-potter, foundation, ted-lasso, alice-in-wonderland, firefly, and more.
All themes include OCEAN (Big Five) personality profiles. See Personas for personality analysis.
pf theme set the-expanse

Or configure directly in .pennyfarthing/config.local.yaml:

theme: the-expanse

After initialization:
your-project/
├── .pennyfarthing/
│ ├── agents/ # Agent behavior definitions
│ ├── guides/ # Component documentation
│ ├── gates/ # Workflow transition gates
│ ├── output-styles/ # Response format definitions
│ ├── personas/ # Character and theme files
│ ├── scripts/ # Runtime scripts
│ ├── templates/ # Project templates
│ ├── workflows/ # BikeLane workflow definitions
│ ├── sidecars/ # Agent learning files (local, writable)
│ ├── config.local.yaml # Theme, output style, modes
│ └── repos.yaml # Multi-repo topology
├── .claude/
│ ├── commands/ # Slash commands for Claude Code discovery
│ └── skills/ # Skills for Claude Code discovery
├── sprint/
│ ├── current-sprint.yaml # Active sprint
│ └── archive/ # Completed sessions
└── .session/
└── {story-id}-session.md # Active work session
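
A quick way to sanity-check this layout is to list which expected paths are absent. The sketch below is not how pf doctor works; `missing_paths` is a hypothetical helper, and treating this particular subset of the tree as required is an assumption.

```python
from pathlib import Path

# Paths copied from the tree above; which ones are mandatory is an assumption.
EXPECTED = (
    ".pennyfarthing/agents",
    ".pennyfarthing/workflows",
    ".pennyfarthing/sidecars",
    ".pennyfarthing/config.local.yaml",
    ".claude/commands",
    ".claude/skills",
    "sprint/current-sprint.yaml",
)


def missing_paths(project_root):
    """Return the expected relative paths that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

An empty result means every listed path is present; for an authoritative check, pf doctor remains the documented command.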
- Python-first architecture (ADR-0034) - Python owns the runtime: CLI, Frame server (FastAPI/uvicorn), hooks, benchmarks. TypeScript/React is GUI-only
- Frame TUI - Textual-based terminal dashboard running alongside Claude Code CLI via pf frame start
- Frame rewrite - Python FastAPI server replaces Node.js, serving API endpoints, WebSocket channels, and OTLP telemetry
- Spec-check and spec-reconcile phases - Architect validates implementation alignment before review, reconciles deviations after
- RepoFieldSpec registry - Typed metadata for repos.yaml fields, enabling TUI editing of project topology
- Saddle mode - Background observer agent summon via pf saddle summon
- Demo pipeline - pf demo generate builds presentation artifacts from sprint work
- Pipeline replay benchmarks - Full TDD pipeline testing against real PR review findings via pf benchmark replay
- OTEL telemetry - Traces, logs, and spans via Frame WebSocket channels
- v12.7 - Judge versioning, pipeline replay framework, theme YAML schema, kitchen-sink workflow
- v12.6 - Consumer E2E test suite, gold standard calibration, difficulty profiles
- v12.0 - Python-first installation, monorepo consolidation, workflow gates, handoff CLI, output styles
- v10.x - Frame Dockview, repos topology, tandem protocol, codebase health dashboard
- v9.x - Theme expansion, release workflow, shadcn/ui migration, prime context, bell/relay modes
- v8.x - BikeLane workflows, scientific benchmarking, JobFair, agent sidecars
See CHANGELOG.md for full details.
Copyright 2025-2026 Keith Avery. Licensed under Apache-2.0.

