Pennyfarthing

v13.1.2 | The outer loop goes once, the inner loop goes many times.

A Claude Code agent orchestration framework built around three pillars: a flexible development platform, scientific personality research, and streamlined integrations.

What is Pennyfarthing?

1. Development Platform

A multi-agent system with customizable BikeLane workflows for structured software development:

11 Coordinated Agents - SM, TEA, Dev, Reviewer, Architect, PM, Tech Writer, UX Designer, DevOps, Orchestrator, BA
11 BikeLane Workflows - TDD, BDD, Trivial, 2-Party TDD, TDD-Team, BDD-Team, Patch, Agent-Docs, Architecture, Release, Git Cleanup
38 Slash Commands - Entry points for agent activation and workflows
25 Skills - Reusable knowledge domains (testing, code-review, jira, settings, mermaid, etc.)
Prime Context System - Tiered context injection assembles agent definition, persona, session state, and sidecar memory
Automatic Handoffs - Context-aware agent transitions via subagent delegation
Agent Sidecars - Persistent learning files where agents record patterns, gotchas, and decisions across stories
Frame TUI - Textual-based terminal dashboard running alongside Claude Code CLI

2. Personality Research

A scientific study of how strong personalities affect AI agent behavior:

OCEAN Profiling - Big Five personality scores for every character
TRAIL Framework - Categorizing errors (reasoning, planning, execution) and correlating with personality
Benchmarking System - /solo, /benchmark-control, /benchmark for statistical evaluation
JobFair - Discovering which characters excel at roles beyond their native specialization

The 45 persona themes (Discworld, Star Trek, Breaking Bad, Alice in Wonderland, etc.) are instruments of inquiry, not decoration. Early findings show character expertise often trumps abstract personality scores.

3. Integration & Tooling

Frame - Dashboard panel viewer for CLI-first developers — terminal TUI alongside Claude Code
Jira Integration - Bidirectional sync, epic auto-creation, sprint velocity
Sprint Management - Story tracking with current-sprint.yaml
Codebase Analysis - Hotspots, complexity, dead code, dependencies, code markers, and health score via pf debug

Quick Start

Three paths depending on who you are:

Path A: Join a project that already uses Pennyfarthing

Someone on your team already ran /pf-setup. You just need to install the CLI and clone the repo.

# 1. Install the CLI (pick one)
pipx install "git+https://github.com/slabgorb/pennyfarthing.git"
# or: uv tool install "pennyfarthing-scripts @ git+https://github.com/slabgorb/pennyfarthing.git"

# 2. Clone the project
git clone git@github.com:your-org/your-project.git && cd your-project

# 3. Start Claude Code — bootstrap runs automatically on first session
claude

That's it. The project's committed bootstrap.sh hook detects the first session, runs pf init, and sets everything up. You'll see agents, themes, and workflows immediately.

If pf isn't installed when you start Claude Code, the bootstrap will attempt to install it via uv, pipx, or pip automatically.
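The fallback order described above (uv, then pipx, then pip) can be sketched as follows. This is an illustration rather than the contents of bootstrap.sh: the function name `pick_installer` is hypothetical, the uv and pipx commands mirror the install lines in this README, and the pip command line is an assumption.

```python
import shutil
from typing import Callable, List, Optional

REPO = "git+https://github.com/slabgorb/pennyfarthing.git"

# Candidate installers in the order the README names them: uv, pipx, pip.
# The pip command is an assumption; the other two mirror the Quick Start.
CANDIDATES = [
    ("uv", ["uv", "tool", "install", f"pennyfarthing-scripts @ {REPO}"]),
    ("pipx", ["pipx", "install", REPO]),
    ("pip", ["pip", "install", REPO]),
]


def pick_installer(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[List[str]]:
    """Return the command for the first installer found on PATH, or None."""
    for tool, command in CANDIDATES:
        if which(tool) is not None:
            return command
    return None
```

Passing `which` as a parameter keeps the lookup testable without touching the real PATH. A caller would hand the returned list to `subprocess.run`.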

Path B: Add Pennyfarthing to your own project

You're bringing Pennyfarthing into a repo for the first time.

# 1. Authenticate with GitHub (required — private repo)
gh auth login

# 2. Install the CLI (pick one)
pipx install "git+https://github.com/slabgorb/pennyfarthing.git"
# or: uv tool install "pennyfarthing-scripts @ git+https://github.com/slabgorb/pennyfarthing.git"
# or: curl -fsSL https://raw.githubusercontent.com/slabgorb/pennyfarthing/main/pennyfarthing-dist/scripts/install.sh | bash

# 3. Initialize your project
cd your-project
pf init

# 4. Verify
pf doctor

# 5. Start Claude Code and run interactive setup
claude
/pf-setup    # walks through repo discovery, CLAUDE.md, theme selection, Jira, etc.

# 6. Start working
/pf-work

pf init creates the .pennyfarthing/ and .claude/ directories. /pf-setup configures them interactively — repo topology, project context, theme, and optional integrations. After setup, teammates can follow Path A.

Path C: Develop Pennyfarthing itself (dogfooding)

You're contributing to the framework using the orchestrator repo.

# 1. Clone the orchestrator (includes pennyfarthing/ as inlined subrepo)
git clone git@github.com:slabgorb/orc-penny.git && cd orc-penny

# 2. Setup — clones pennyfarthing/, installs deps, builds, installs pf CLI
just setup

# 3. Launch Claude Code with OTEL telemetry
just claude

# 4. Optional: interactive walkthrough
/guided-tour

Prerequisites: Python 3.11+, Node 18+, pnpm 9+, just, Claude Code CLI, Git SSH access to slabgorb.

The orchestrator has two git repos — orc-penny/ (sprint files, sessions, docs, trunk-based on main) and pennyfarthing/ (framework source, gitflow on develop). The .pennyfarthing/ runtime directory symlinks to pennyfarthing/pennyfarthing-dist/ so changes are live immediately.

Full walkthrough: See Getting Started for a detailed guide to installation, setup, and your first work session.

Display Modes

Pennyfarthing works in any terminal. Optional dashboards add real-time visibility into agent activity.

I want to...	Mode	Command
Just use agents in my terminal	CLI only	`claude` (no dashboard needed)
Stay fully in the terminal	Frame TUI	`just tui` + `just claude`
One command, everything	Frame all-in-one	`pf frame start`

See the full Frame Guide for setup, panels, and OTEL telemetry.

Visual Dashboards

Frame provides 15 dashboard panels showing real-time agent activity:

Panels

All panels are draggable, floatable, and splittable:

Panel	Purpose
Sprint	Current sprint stories and progress
Progress	At-a-glance story dashboard
BikeLane	Workflow phase state and navigation
AC	Acceptance criteria checklist with progress
Changed	Files modified during the session
Diffs	Git diff viewer for current changes
Git	Branch management and status
Todo	Task list tracking
Audit Log	Timestamped tool use history
Workflow	Workflow navigation and status
Hotspots	Codebase health — dead code, complexity
Settings	Permission mode, relay mode, bell mode
Debug	Prime context inspection with token counts
Background	Background job monitoring

Architecture

Frame is powered by a Python FastAPI/uvicorn server that serves API endpoints, WebSocket channels, and the OTLP telemetry receiver:

graph TB
    subgraph "Frame"
        BR["Python FastAPI server"]
    end

    BR --> WH["Frame<br/>(uvicorn)"]

    BR -- "writes" --> BP[".frame-port"]

    WH --> API["/api/* endpoints"]
    WH --> WS["/ws/* channels"]
    WH --> OTLP["/v1/* OTLP receiver"]
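The diagram shows the server writing a `.frame-port` file. A client could use that file to locate the running server, roughly as sketched below. The file's location (project root) and contents (a bare integer port) are assumptions, and both function names are hypothetical. Only the three route prefixes come from the diagram.

```python
from pathlib import Path
from typing import Dict, Optional


def read_frame_port(project_root: Path) -> Optional[int]:
    """Return the port recorded in .frame-port, or None if absent or invalid.

    Assumption: the file holds a bare integer such as "8765".
    """
    port_file = project_root / ".frame-port"
    try:
        port = int(port_file.read_text(encoding="utf-8").strip())
    except (FileNotFoundError, ValueError):
        return None
    return port if 0 < port < 65536 else None


def endpoint_urls(port: int, host: str = "127.0.0.1") -> Dict[str, str]:
    """Build base URLs for the three route groups shown in the diagram."""
    return {
        "api": f"http://{host}:{port}/api/",
        "ws": f"ws://{host}:{port}/ws/",
        "otlp": f"http://{host}:{port}/v1/",
    }
```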

Tool Visualization

Frame renders tool use as human-readable summaries instead of raw JSON. Consecutive identical tool calls are stacked, and results are collapsible.

Agent Portraits

Each persona character has a unique portrait displayed in the conversation stream, making multi-agent workflows visually distinct.

Workflow Modes

Mode	Description
Permission Mode	`plan` / `manual` / `accept` — controls how much Claude can do without approval
Relay Mode	Automatic agent handoffs — detects `CYCLIST:HANDOFF` markers and runs the next agent
Bell Mode	Queue messages while Claude works — injected at next tool execution via hooks
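Bell Mode's queue-then-inject behavior can be pictured with a small in-memory queue. This is a minimal sketch under stated assumptions: the class name `BellQueue` and its methods are hypothetical, and how Pennyfarthing's hooks actually store and inject messages is not described in this README.

```python
from collections import deque
from typing import Deque, List


class BellQueue:
    """Holds user messages until the next tool execution drains them."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()

    def ring(self, message: str) -> None:
        """Queue a message while the agent is busy."""
        self._pending.append(message)

    def drain(self) -> List[str]:
        """Return all queued messages in arrival order and empty the queue.

        A hook would call this just before a tool runs and inject the result.
        """
        messages = list(self._pending)
        self._pending.clear()
        return messages
```

Draining everything at once means a burst of messages reaches the agent together at the next tool call, in the order they were sent.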

Prime Context System

Prime assembles the full agent context at activation: agent definition, persona character, behavior guide, sprint state, active session, and sidecar memory. This is injected via --append-system-prompt so agents behave identically regardless of display mode.

Prime uses tiered injection to manage token overhead:

Tier	Tokens	When
Full	~4000	New session or new agent
Refresh	~600	Same agent, stale context
Handoff	~700	Agent-to-agent transition
Minimal	~200	Deep in same agent session
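One way to read the tier table is as a small decision function. The sketch below is an interpretation, not Prime's actual logic: the function name, its parameters, and the order of the checks are all assumptions. Only the tier names and approximate token budgets come from the table.

```python
from typing import Optional

# Approximate token budgets, copied from the tier table.
TIER_TOKENS = {"full": 4000, "refresh": 600, "handoff": 700, "minimal": 200}


def select_tier(
    agent: str,
    previous_agent: Optional[str],
    via_handoff: bool = False,
    context_stale: bool = False,
) -> str:
    """Pick an injection tier for the agent being activated.

    previous_agent is None for a new session. Parameter names and the
    decision order are assumptions made for this sketch.
    """
    if previous_agent is None:
        return "full"  # new session
    if agent != previous_agent:
        # An agent-to-agent transition gets the handoff tier; any other
        # switch to a different agent is treated as a fresh activation.
        return "handoff" if via_handoff else "full"
    return "refresh" if context_stale else "minimal"
```

For example, `select_tier("dev", "tea", via_handoff=True)` selects the handoff tier, which costs roughly 700 tokens instead of the roughly 4000 of a full injection.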

Agent Sidecars

Sidecars are persistent learning files where agents record what they discover during story work. Each agent maintains three files in .pennyfarthing/sidecars/:

{agent}-patterns.md — Strategies and patterns that worked
{agent}-gotchas.md — Mistakes and edge cases to avoid
{agent}-decisions.md — Architecture decisions and rationale

Agents write to sidecars before every handoff. Prime loads them on activation, so agents build on previous experience instead of rediscovering the same issues.
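The naming scheme above maps directly onto paths. The following sketch shows how the three files for one agent could be located, appended to, and loaded. The helper names and the one-bullet-per-note format are assumptions; only the directory and file name patterns come from this README.

```python
from pathlib import Path
from typing import Dict

SIDECAR_KINDS = ("patterns", "gotchas", "decisions")


def sidecar_paths(project_root: Path, agent: str) -> Dict[str, Path]:
    """Map each sidecar kind to .pennyfarthing/sidecars/{agent}-{kind}.md."""
    base = project_root / ".pennyfarthing" / "sidecars"
    return {kind: base / f"{agent}-{kind}.md" for kind in SIDECAR_KINDS}


def record(project_root: Path, agent: str, kind: str, note: str) -> Path:
    """Append one note as a bullet (the bullet format is an assumption)."""
    path = sidecar_paths(project_root, agent)[kind]
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f"- {note}\n")
    return path


def load_sidecars(project_root: Path, agent: str) -> Dict[str, str]:
    """Read all three sidecars, using an empty string for missing files."""
    return {
        kind: path.read_text(encoding="utf-8") if path.exists() else ""
        for kind, path in sidecar_paths(project_root, agent).items()
    }
```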

BikeLane Workflows

BikeLane is the umbrella workflow system supporting two types:

Type	Description	Examples
Phased	Agent-driven with automatic handoffs	tdd, bdd, trivial, agent-docs
Stepped	Progressive disclosure with user gates	architecture, release, git-cleanup

Example: TDD Workflow (Phased)

Agent	Role	Phase
SM	Scrum Master	Story selection, session setup, completion
TEA	Test Engineer	Write failing tests (RED)
Dev	Developer	Make tests pass (GREEN)
Reviewer	Code Reviewer	Quality validation, approve/reject

Use /workflow list to see all workflows. Use /workflow start <name> to begin any stepped workflow.

Workflow Gates

Gates are conditional checks on phase transitions. When an agent finishes a phase, the gate evaluates whether the transition should proceed:

Gate	Purpose
`tests-pass`	Verify all tests pass before review
`tests-fail`	Verify tests are RED before implementation
`approval`	Verify reviewer has approved
`confidence-sm`	Check if user instruction is unambiguous

Gates are defined in pennyfarthing-dist/gates/ and referenced via gate.file in workflow YAML.
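Conceptually, a gate is a predicate over the current story state. The sketch below models three of the gates from the table that way. It does not reflect the real gate file format: the state keys (`failing_tests`, `total_tests`, `review_verdict`) and the function name are hypothetical.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]

# Gate names come from the table; the state keys are hypothetical.
GATES: Dict[str, Callable[[State], bool]] = {
    # RED: at least one test must be failing before implementation starts.
    "tests-fail": lambda s: s.get("failing_tests", 0) > 0,
    # GREEN: tests exist and none of them fail.
    "tests-pass": lambda s: s.get("total_tests", 0) > 0
    and s.get("failing_tests", 0) == 0,
    "approval": lambda s: s.get("review_verdict") == "approved",
}


def may_transition(gate: str, state: State) -> bool:
    """Return True when the named gate allows the phase transition."""
    check = GATES.get(gate)
    if check is None:
        raise KeyError(f"unknown gate: {gate}")
    return check(state)
```

Requiring `total_tests > 0` for `tests-pass` is a design choice in this sketch: an empty test suite should not count as passing.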

Tandem Mode

Tandem workflows pair a background observer with the primary agent. The observer (the "backseat") watches the primary agent's work and injects observations:

TDD-Tandem — Architect watches TEA, TEA watches Dev, PM watches Reviewer
BDD-Tandem — Adds UX Designer watching Dev, Architect watching UX

For active questions (not passive observation), agents use the Consultation Protocol — synchronous Sonnet-powered request/response between agents.

Benchmarking & Personality Research

Pennyfarthing measures how personality affects agent performance with two complementary benchmark systems.

JobFair — single-agent evaluation

Tests one role in isolation against a rubric, to discover which characters excel at which job.

# Run a single agent on a scenario
/solo theme:agent --scenario cache-invalidation

# Create a control baseline (10 runs)
/benchmark-control reviewer --scenario order-service

# Compare persona vs control with statistics
/benchmark breaking-bad reviewer --scenario order-service

Peloton — full-pipeline replay

Replays the entire TEA → Dev → Reviewer pipeline against real code, scored on ground truth: findings that external reviewers flagged on PRs the pipeline had already approved. Nothing is synthetic — every finding is a real defect the pipeline shipped and a human later caught.

# Replay one pipeline with the control theme (no persona)
pf benchmark replay run scenarios/dpgd-116.yaml --model sonnet --n 1

# Replay with a persona theme — 4 runs, then 3-judge majority vote
pf benchmark replay run scenarios/dpgd-116.yaml --theme firefly --n 4
pf benchmark replay judge scenarios/dpgd-116.yaml --target-judges 3

# Detection heatmap across themes
pf benchmark replay compare scenarios/dpgd-116.yaml

See the Peloton guide for scenario authoring and methodology.
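The 3-judge majority vote mentioned above reduces to a strict-majority check per finding. A minimal sketch, assuming each judge returns a boolean "caught" verdict (the function name and the boolean representation are assumptions, not the `pf benchmark replay judge` implementation):

```python
from typing import Sequence


def majority_verdict(votes: Sequence[bool]) -> bool:
    """True if strictly more than half of the judges say the finding was caught."""
    if not votes:
        raise ValueError("at least one judge vote is required")
    return sum(1 for vote in votes if vote) * 2 > len(votes)
```

With an odd number of judges, such as the three used above, a tie cannot occur. With an even number, this sketch treats a tie as "not caught".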

Benchmark dashboard

Interactive D3 charts of the pipeline-replay results, published via GitHub Pages so they open rendered in the browser — no build, no clone:

Chart	What it shows
Score vs Consistency	Each theme's mean weighted catch rate vs run-to-run consistency, with quadrants at the control baseline. Color by OCEAN trait.
Finding Hit Rate	Heatmap of how reliably each ground-truth finding is caught, per theme.
Phase Attribution	Which phase — TEA, Dev, or Reviewer — actually catches the defects.

The pages are static and share a data snapshot (docs/benchmarks/benchmark-data.js) extracted from the full dashboard that pf benchmark viz generates. Source lives in docs/benchmarks/.
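To make the two axes of the Score vs Consistency chart concrete, here is one plausible way to compute them. This README does not define the dashboard's exact formulas, so treat everything here as an assumption: the per-finding weights, the use of the population standard deviation as the spread measure, and the function names.

```python
from statistics import mean, pstdev
from typing import Dict, List, Set, Tuple


def weighted_catch_rate(caught: Set[str], weights: Dict[str, float]) -> float:
    """Share of total finding weight that one run caught (0.0 to 1.0)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w for finding, w in weights.items() if finding in caught) / total


def score_and_spread(
    runs: List[Set[str]], weights: Dict[str, float]
) -> Tuple[float, float]:
    """Return (mean catch rate, run-to-run spread); lower spread is steadier."""
    rates = [weighted_catch_rate(run, weights) for run in runs]
    return mean(rates), pstdev(rates)
```

For example, with weights `{"a": 2, "b": 1, "c": 1}` and two runs that caught `{"a"}` and `{"a", "b"}`, the per-run rates are 0.5 and 0.75.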

Key findings:

Persona themes move detection rates by less than ±10% vs control — the ceiling is set by agent definitions and prompts, not character voice.
The TEA phase is the most impactful: a finding caught by a failing test is caught reliably; findings that depend on the Reviewer noticing them are caught less consistently.
Security / CWE-class issues are well caught; build-config and self-authored test-quality issues are nearly invisible.
Multivariate OCEAN patterns predict better than individual traits — the "Stoic Analyst" profile (Low O + High C + Low E + Low N) excels at code review.
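The "Stoic Analyst" finding is a statement about a combination of traits, which can be expressed as a single predicate. In this sketch the 0-to-1 scale and the low/high cut-offs are assumptions, since this README does not define them. The function name is hypothetical.

```python
from typing import Mapping

# Assumed cut-offs on a 0-1 scale; the README does not define "low" or "high".
LOW, HIGH = 0.4, 0.6


def is_stoic_analyst(ocean: Mapping[str, float]) -> bool:
    """Low O + High C + Low E + Low N; Agreeableness is left unconstrained."""
    return (
        ocean["O"] <= LOW
        and ocean["C"] >= HIGH
        and ocean["E"] <= LOW
        and ocean["N"] <= LOW
    )
```

Checking all four traits together, rather than scoring each one separately, mirrors the point that multivariate patterns predict better than individual traits.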

See Benchmarking Documentation for methodology.

CLI Commands

Command	Description
`pf init`	Initialize Pennyfarthing in a project
`pf doctor`	Check installation health
`pf doctor --fix`	Auto-fix common issues
`pf validate`	Run all validators
`pf theme list`	Show available themes
`pf theme set <name>`	Change active theme
`pf package list`	Show installable theme plugins
`pf frame start`	Launch Frame dashboard
`pf sprint status`	Current sprint overview
`pf workflow list`	Show all workflows
`pf debug hotspots analyze`	Git change frequency analysis
`pf debug deadcode stale`	Find files with no recent commits
`pf debug healthscore analyze`	Composite codebase health score
`pf handoff marker <agent>`	Generate handoff marker

Documentation

Guides (in `pennyfarthing-dist/guides/`)

Guide	Description
BikeLane	Workflow engine — phased, stepped, procedural
Frame	Standalone panel viewer for CLI-first development
Gates	Workflow phase transition gates
Handoff CLI	Phase transitions and marker generation
Hooks	Hook system configuration and reference
Prime	Agent activation and context loading
Bell Mode	Message queue injection
Relay Mode	Automatic agent handoffs
Reflector	Agent-to-UI marker protocol
TirePump	Context clearing system
Tandem Protocol	Background observer pairing
Output Styles	Configurable response modes
Brownfield Tools	Codebase analysis CLI tools
Peloton Testing	Pipeline replay benchmarks from real PR reviews
Benchmarks	Persona evaluation system (JobFair)

Available Themes (45)

All 45 themes are bundled with pf init — no separate packages required. Themes span sci-fi, prestige TV, literature, mythology, comedy, history, and more:

the-expanse, star-trek-tng, breaking-bad, discworld, fifth-element, succession, the-wire, mad-men, shakespeare, jane-austen, dune, game-of-thrones, the-office, monty-python, greek-mythology, blade-runner, doctor-who, harry-potter, foundation, ted-lasso, alice-in-wonderland, firefly, and more.

All themes include OCEAN (Big Five) personality profiles. See Personas for personality analysis.

Setting a Theme

pf theme set the-expanse

Or configure directly in .pennyfarthing/config.local.yaml:

theme: the-expanse

Directory Structure

After initialization:

your-project/
├── .pennyfarthing/
│   ├── agents/               # Agent behavior definitions
│   ├── guides/               # Component documentation
│   ├── gates/                # Workflow transition gates
│   ├── output-styles/        # Response format definitions
│   ├── personas/             # Character and theme files
│   ├── scripts/              # Runtime scripts
│   ├── templates/            # Project templates
│   ├── workflows/            # BikeLane workflow definitions
│   ├── sidecars/             # Agent learning files (local, writable)
│   ├── config.local.yaml     # Theme, output style, modes
│   └── repos.yaml            # Multi-repo topology
├── .claude/
│   ├── commands/             # Slash commands for Claude Code discovery
│   └── skills/               # Skills for Claude Code discovery
├── sprint/
│   ├── current-sprint.yaml   # Active sprint
│   └── archive/              # Completed sessions
└── .session/
    └── {story-id}-session.md # Active work session
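A quick script can confirm that a project matches this layout. The sketch below is not how `pf doctor` works. It checks a subset of the tree shown above, and which entries count as mandatory is an assumption.

```python
from pathlib import Path
from typing import List

# A subset of the layout shown above; treating these as required is an assumption.
EXPECTED = [
    ".pennyfarthing/agents",
    ".pennyfarthing/workflows",
    ".pennyfarthing/sidecars",
    ".claude/commands",
    ".claude/skills",
]


def missing_paths(project_root: Path) -> List[str]:
    """Return the expected relative paths that do not exist under the root."""
    return [rel for rel in EXPECTED if not (project_root / rel).exists()]
```

An empty result means every listed directory is present. For an authoritative check, run `pf doctor`.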

What's New in v13.0.0

Python-first architecture (ADR-0034) — Python owns the runtime: CLI, Frame server (FastAPI/uvicorn), hooks, benchmarks. TypeScript/React is GUI-only
Frame TUI — Textual-based terminal dashboard running alongside Claude Code CLI via pf frame start
Frame rewrite — Python FastAPI server replaces Node.js, serving API endpoints, WebSocket channels, and OTLP telemetry
Spec-check and spec-reconcile phases — Architect validates implementation alignment before review, reconciles deviations after
RepoFieldSpec registry — Typed metadata for repos.yaml fields, enabling TUI editing of project topology
Saddle mode — Background observer agent summoned via pf saddle summon
Demo pipeline — pf demo generate builds presentation artifacts from sprint work
Pipeline replay benchmarks — Full TDD pipeline testing against real PR review findings via pf benchmark replay
OTEL telemetry — Traces, logs, and spans via Frame WebSocket channels

Previous Highlights

v12.7 - Judge versioning, pipeline replay framework, theme YAML schema, kitchen-sink workflow
v12.6 - Consumer E2E test suite, gold standard calibration, difficulty profiles
v12.0 - Python-first installation, monorepo consolidation, workflow gates, handoff CLI, output styles
v10.x - Frame Dockview, repos topology, tandem protocol, codebase health dashboard
v9.x - Theme expansion, release workflow, shadcn/ui migration, prime context, bell/relay modes
v8.x - BikeLane workflows, scientific benchmarking, JobFair, agent sidecars

See CHANGELOG.md for full details.
