Pennyfarthing

v13.1.2 | The outer loop goes once, the inner loop goes many times.


A Claude Code agent orchestration framework built around three pillars: a flexible development platform, scientific personality research, and streamlined integrations.


What is Pennyfarthing?

1. Development Platform

A multi-agent system with customizable BikeLane workflows for structured software development:

  • 11 Coordinated Agents - SM, TEA, Dev, Reviewer, Architect, PM, Tech Writer, UX Designer, DevOps, Orchestrator, BA
  • 11 BikeLane Workflows - TDD, BDD, Trivial, 2-Party TDD, TDD-Team, BDD-Team, Patch, Agent-Docs, Architecture, Release, Git Cleanup
  • 38 Slash Commands - Entry points for agent activation and workflows
  • 25 Skills - Reusable knowledge domains (testing, code-review, jira, settings, mermaid, etc.)
  • Prime Context System - Tiered context injection assembles agent definition, persona, session state, and sidecar memory
  • Automatic Handoffs - Context-aware agent transitions via subagent delegation
  • Agent Sidecars - Persistent learning files where agents record patterns, gotchas, and decisions across stories
  • Frame TUI - Textual-based terminal dashboard running alongside Claude Code CLI

2. Personality Research

A scientific study of how strong personalities affect AI agent behavior:

  • OCEAN Profiling - Big Five personality scores for every character
  • TRAIL Framework - Categorizing errors (reasoning, planning, execution) and correlating with personality
  • Benchmarking System - /solo, /benchmark-control, /benchmark for statistical evaluation
  • JobFair - Discovering which characters excel at roles beyond their native specialization

The 45 persona themes (Discworld, Star Trek, Breaking Bad, Alice in Wonderland, etc.) are instruments of inquiry, not decoration. Early findings show character expertise often trumps abstract personality scores.

3. Integration & Tooling

  • Frame - Dashboard panel viewer for CLI-first developers — terminal TUI alongside Claude Code
  • Jira Integration - Bidirectional sync, epic auto-creation, sprint velocity
  • Sprint Management - Story tracking with current-sprint.yaml
  • Codebase Analysis - Hotspots, complexity, dead code, dependencies, code markers, and health score via pf debug

Quick Start

Three paths depending on who you are:

Path A: Join a project that already uses Pennyfarthing

Someone on your team already ran /pf-setup. You just need to install the CLI and clone the project.

# 1. Install the CLI (pick one)
pipx install "git+https://github.com/slabgorb/pennyfarthing.git"
# or: uv tool install "pennyfarthing-scripts @ git+https://github.com/slabgorb/pennyfarthing.git"

# 2. Clone the project
git clone git@github.com:your-org/your-project.git && cd your-project

# 3. Start Claude Code — bootstrap runs automatically on first session
claude

That's it. The project's committed bootstrap.sh hook detects the first session, runs pf init, and sets everything up. You'll see agents, themes, and workflows immediately.

If pf isn't installed when you start Claude Code, the bootstrap will attempt to install it via uv, pipx, or pip automatically.

Path B: Add Pennyfarthing to your own project

You're bringing Pennyfarthing into a repo for the first time.

# 1. Authenticate with GitHub (required — private repo)
gh auth login

# 2. Install the CLI (pick one)
pipx install "git+https://github.com/slabgorb/pennyfarthing.git"
# or: uv tool install "pennyfarthing-scripts @ git+https://github.com/slabgorb/pennyfarthing.git"
# or: curl -fsSL https://raw.githubusercontent.com/slabgorb/pennyfarthing/main/pennyfarthing-dist/scripts/install.sh | bash

# 3. Initialize your project
cd your-project
pf init

# 4. Verify
pf doctor

# 5. Start Claude Code and run interactive setup
claude
/pf-setup    # walks through repo discovery, CLAUDE.md, theme selection, Jira, etc.

# 6. Start working
/pf-work

pf init creates the .pennyfarthing/ and .claude/ directories. /pf-setup configures them interactively — repo topology, project context, theme, and optional integrations. After setup, teammates can follow Path A.

Path C: Develop Pennyfarthing itself (dogfooding)

You're contributing to the framework using the orchestrator repo.

# 1. Clone the orchestrator (includes pennyfarthing/ as inlined subrepo)
git clone git@github.com:slabgorb/orc-penny.git && cd orc-penny

# 2. Setup — clones pennyfarthing/, installs deps, builds, installs pf CLI
just setup

# 3. Launch Claude Code with OTEL telemetry
just claude

# 4. Optional: interactive walkthrough
/guided-tour

Prerequisites: Python 3.11+, Node 18+, pnpm 9+, just, Claude Code CLI, Git SSH access to slabgorb.

The orchestrator has two git repos — orc-penny/ (sprint files, sessions, docs, trunk-based on main) and pennyfarthing/ (framework source, gitflow on develop). The .pennyfarthing/ runtime directory symlinks to pennyfarthing/pennyfarthing-dist/ so changes are live immediately.

Full walkthrough: See Getting Started for a detailed guide to installation, setup, and your first work session.

Display Modes

Pennyfarthing works in any terminal. Optional dashboards add real-time visibility into agent activity.

| I want to... | Mode | Command |
| --- | --- | --- |
| Just use agents in my terminal | CLI only | claude (no dashboard needed) |
| Stay fully in the terminal | Frame TUI | just tui + just claude |
| One command, everything | Frame all-in-one | pf frame start |

See the full Frame Guide for setup, panels, and OTEL telemetry.

Visual Dashboards

Frame provides 15 dashboard panels showing real-time agent activity:

Panels

All panels are draggable, floatable, and splittable:

| Panel | Purpose |
| --- | --- |
| Sprint | Current sprint stories and progress |
| Progress | At-a-glance story dashboard |
| BikeLane | Workflow phase state and navigation |
| AC | Acceptance criteria checklist with progress |
| Changed | Files modified during the session |
| Diffs | Git diff viewer for current changes |
| Git | Branch management and status |
| Todo | Task list tracking |
| Audit Log | Timestamped tool use history |
| Workflow | Workflow navigation and status |
| Hotspots | Codebase health: dead code, complexity |
| Settings | Permission mode, relay mode, bell mode |
| Debug | Prime context inspection with token counts |
| Background | Background job monitoring |

Architecture

The Frame dashboard is backed by the Frame server, a Python FastAPI/uvicorn process that serves API endpoints, WebSocket channels, and the OTLP telemetry receiver:

graph TB
    subgraph "Frame"
        BR["Python FastAPI server"]
    end

    BR --> WH["Frame<br/>(uvicorn)"]

    BR -- "writes" --> BP[".frame-port"]

    WH --> API["/api/* endpoints"]
    WH --> WS["/ws/* channels"]
    WH --> OTLP["/v1/* OTLP receiver"]

Tool Visualization

Frame renders tool use as human-readable summaries instead of raw JSON. Consecutive identical tool calls are stacked, and results are collapsible.
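
The stacking behavior can be illustrated with a minimal Python sketch. This is not Frame's actual rendering code; the `stack_tool_calls` helper and the `(tool, target)` tuple shape are assumptions made for illustration only.

```python
from itertools import groupby


def stack_tool_calls(calls):
    """Collapse each run of consecutive identical calls into a (call, count) pair."""
    return [(call, sum(1 for _ in run)) for call, run in groupby(calls)]


# Hypothetical tool-use stream: (tool name, target) tuples.
calls = [
    ("Read", "src/app.py"),
    ("Read", "src/app.py"),
    ("Bash", "pytest"),
    ("Read", "src/app.py"),
]

stacked = stack_tool_calls(calls)
for (tool, target), count in stacked:
    # Only show a multiplier when a call was repeated back to back.
    suffix = f" x{count}" if count > 1 else ""
    print(f"{tool} {target}{suffix}")
```

Only adjacent duplicates are merged, so the final `Read` stays separate from the first two because a different call sits between them.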

Agent Portraits

Each persona character has a unique portrait displayed in the conversation stream, making multi-agent workflows visually distinct.

Workflow Modes

| Mode | Description |
| --- | --- |
| Permission Mode | plan / manual / accept: controls how much Claude can do without approval |
| Relay Mode | Automatic agent handoffs: detects CYCLIST:HANDOFF markers and runs the next agent |
| Bell Mode | Queue messages while Claude works; they are injected at the next tool execution via hooks |

Prime Context System

Prime assembles the full agent context at activation: agent definition, persona character, behavior guide, sprint state, active session, and sidecar memory. This is injected via --append-system-prompt so agents behave identically regardless of display mode.

Prime uses tiered injection to manage token overhead:

| Tier | Tokens | When |
| --- | --- | --- |
| Full | ~4000 | New session or new agent |
| Refresh | ~600 | Same agent, stale context |
| Handoff | ~700 | Agent-to-agent transition |
| Minimal | ~200 | Deep in same agent session |
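
One way to picture the tier choice is as a small decision function. The sketch below is an assumption-laden illustration, not Prime's implementation: the `Activation` fields, the `select_tier` name, and the order in which conditions are checked are all hypothetical. Only the tier names and approximate token budgets come from the table.

```python
from dataclasses import dataclass

# Approximate token budgets per tier, taken from the table above.
TIER_TOKENS = {"full": 4000, "refresh": 600, "handoff": 700, "minimal": 200}


@dataclass
class Activation:
    """Hypothetical description of the state at agent activation."""
    new_session: bool = False
    is_handoff: bool = False      # activation triggered by another agent handing off
    new_agent: bool = False       # agent not yet loaded in this session
    context_stale: bool = False


def select_tier(a: Activation) -> str:
    # Assumed precedence: session start, then handoff, then new agent, then staleness.
    if a.new_session:
        return "full"
    if a.is_handoff:
        return "handoff"
    if a.new_agent:
        return "full"
    if a.context_stale:
        return "refresh"
    return "minimal"


tier = select_tier(Activation(context_stale=True))
print(tier, TIER_TOKENS[tier])
```

Under these assumptions, a long session with one agent pays the ~200-token cost on most turns and the larger tiers only when something changes.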

Agent Sidecars

Sidecars are persistent learning files where agents record what they discover during story work. Each agent maintains three files in .pennyfarthing/sidecars/:

  • {agent}-patterns.md — Strategies and patterns that worked
  • {agent}-gotchas.md — Mistakes and edge cases to avoid
  • {agent}-decisions.md — Architecture decisions and rationale

Agents write to sidecars before every handoff. Prime loads them on activation, so agents build on previous experience instead of rediscovering the same issues.
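
The write-before-handoff, load-on-activation cycle might look roughly like the following Python sketch. The file naming follows the `{agent}-{kind}.md` pattern above. The `record` and `load_sidecars` helpers and the one-bullet-per-note entry format are assumptions for illustration, not Pennyfarthing's actual API or file format.

```python
import tempfile
from pathlib import Path

SIDECAR_KINDS = ("patterns", "gotchas", "decisions")


def sidecar_path(root, agent, kind):
    """Build .pennyfarthing/sidecars/{agent}-{kind}.md under the project root."""
    if kind not in SIDECAR_KINDS:
        raise ValueError(f"unknown sidecar kind: {kind}")
    return Path(root) / ".pennyfarthing" / "sidecars" / f"{agent}-{kind}.md"


def record(root, agent, kind, note):
    """Append one note as a bullet (assumed format) before a handoff."""
    path = sidecar_path(root, agent, kind)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(f"- {note}\n")


def load_sidecars(root, agent):
    """Read all three sidecar files; a missing file yields an empty string."""
    memory = {}
    for kind in SIDECAR_KINDS:
        path = sidecar_path(root, agent, kind)
        memory[kind] = path.read_text(encoding="utf-8") if path.exists() else ""
    return memory


with tempfile.TemporaryDirectory() as root:
    record(root, "dev", "gotchas", "Mock the clock in retry tests")
    memory = load_sidecars(root, "dev")
    print(memory["gotchas"])
```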

BikeLane Workflows

BikeLane is the umbrella workflow system supporting two types:

| Type | Description | Examples |
| --- | --- | --- |
| Phased | Agent-driven with automatic handoffs | tdd, bdd, trivial, agent-docs |
| Stepped | Progressive disclosure with user gates | architecture, release, git-cleanup |

Example: TDD Workflow (Phased)

| Agent | Role | Phase |
| --- | --- | --- |
| SM | Scrum Master | Story selection, session setup, completion |
| TEA | Test Engineer | Write failing tests (RED) |
| Dev | Developer | Make tests pass (GREEN) |
| Reviewer | Code Reviewer | Quality validation, approve/reject |

Use /workflow list to see all workflows. Use /workflow start <name> to begin any stepped workflow.

Workflow Gates

Gates are conditional checks on phase transitions. When an agent finishes a phase, the gate evaluates whether the transition should proceed:

| Gate | Purpose |
| --- | --- |
| tests-pass | Verify all tests pass before review |
| tests-fail | Verify tests are RED before implementation |
| approval | Verify reviewer has approved |
| confidence-sm | Check if user instruction is unambiguous |

Gates are defined in pennyfarthing-dist/gates/ and referenced via gate.file in workflow YAML.
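
Conceptually, a gate is a predicate over the outcome of the phase that just finished. The Python sketch below models three of the gates from the table as plain functions. It is a stand-in for illustration only: the real gates live as files in pennyfarthing-dist/gates/, and the `PhaseResult` type and `may_transition` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PhaseResult:
    """Hypothetical summary of what a phase produced."""
    tests_passed: int
    tests_failed: int
    reviewer_approved: bool = False


# Gate name -> predicate deciding whether the transition may proceed.
GATES = {
    "tests-pass": lambda r: r.tests_failed == 0 and r.tests_passed > 0,
    "tests-fail": lambda r: r.tests_failed > 0,
    "approval": lambda r: r.reviewer_approved,
}


def may_transition(gate: str, result: PhaseResult) -> bool:
    return GATES[gate](result)


red = PhaseResult(tests_passed=0, tests_failed=3)    # after TEA writes failing tests
green = PhaseResult(tests_passed=3, tests_failed=0)  # after Dev makes them pass

print(may_transition("tests-fail", red))    # RED confirmed, Dev may start
print(may_transition("tests-pass", green))  # GREEN confirmed, review may start
```

In a TDD flow this means the same test run that blocks review while tests are red is exactly what unlocks implementation.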

Tandem Mode

Tandem workflows pair a background observer (the "backseat") with the primary agent. The observer watches the primary agent's work and injects observations:

  • TDD-Tandem — Architect watches TEA, TEA watches Dev, PM watches Reviewer
  • BDD-Tandem — Adds UX Designer watching Dev, Architect watching UX

For active questions (not passive observation), agents use the Consultation Protocol — synchronous Sonnet-powered request/response between agents.

Benchmarking & Personality Research

Pennyfarthing measures how personality affects agent performance with two complementary benchmark systems.

JobFair — single-agent evaluation

Tests one role in isolation against a rubric, to discover which characters excel at which job.

# Run a single agent on a scenario
/solo theme:agent --scenario cache-invalidation

# Create a control baseline (10 runs)
/benchmark-control reviewer --scenario order-service

# Compare persona vs control with statistics
/benchmark breaking-bad reviewer --scenario order-service

Peloton — full-pipeline replay

Replays the entire TEA → Dev → Reviewer pipeline against real code, scored on ground truth: findings that external reviewers flagged on PRs the pipeline had already approved. Nothing is synthetic — every finding is a real defect the pipeline shipped and a human later caught.

# Replay one pipeline with the control theme (no persona)
pf benchmark replay run scenarios/dpgd-116.yaml --model sonnet --n 1

# Replay with a persona theme — 4 runs, then 3-judge majority vote
pf benchmark replay run scenarios/dpgd-116.yaml --theme firefly --n 4
pf benchmark replay judge scenarios/dpgd-116.yaml --target-judges 3

# Detection heatmap across themes
pf benchmark replay compare scenarios/dpgd-116.yaml

See the Peloton guide for scenario authoring and methodology.
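
The 3-judge majority vote mentioned in the replay commands can be sketched in a few lines of Python. This is an illustration of the voting idea only, not the code behind `pf benchmark replay judge`. The finding names, the boolean "caught" verdicts, and the rule that a tie yields no verdict are all assumptions.

```python
from collections import Counter


def majority_vote(verdicts):
    """Return the verdict given by more than half of the judges, else None."""
    if not verdicts:
        return None
    verdict, count = Counter(verdicts).most_common(1)[0]
    return verdict if count > len(verdicts) / 2 else None


def judge_findings(judgments):
    """Map each finding to the majority verdict across its judges."""
    return {finding: majority_vote(v) for finding, v in judgments.items()}


# Hypothetical verdicts: did the pipeline run catch this ground-truth finding?
judgments = {
    "missing-null-check": [True, True, False],
    "race-in-cache": [False, False, True],
}

results = judge_findings(judgments)
print(results)
```

An odd number of judges guarantees a strict majority for a two-way verdict, which is presumably why three is a convenient target.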

Benchmark dashboard

Interactive D3 charts of the pipeline-replay results, published via GitHub Pages so they open rendered in the browser — no build, no clone:


| Chart | What it shows |
| --- | --- |
| Score vs Consistency | Each theme's mean weighted catch rate vs run-to-run consistency, with quadrants at the control baseline. Color by OCEAN trait. |
| Finding Hit Rate | Heatmap of how reliably each ground-truth finding is caught, per theme. |
| Phase Attribution | Which phase (TEA, Dev, or Reviewer) actually catches the defects. |

The pages are static and share a data snapshot (docs/benchmarks/benchmark-data.js) extracted from the full dashboard that pf benchmark viz generates. Source lives in docs/benchmarks/.

Key findings:

  • Persona themes move detection rates by less than ±10% vs control — the ceiling is set by agent definitions and prompts, not character voice.
  • The TEA phase is the most impactful: a finding caught by a failing test is caught reliably; findings that depend on the Reviewer noticing them are caught less consistently.
  • Security / CWE-class issues are well caught; build-config and self-authored test-quality issues are nearly invisible.
  • Multivariate OCEAN patterns predict better than individual traits — the "Stoic Analyst" profile (Low O + High C + Low E + Low N) excels at code review.
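
To make the multivariate idea in the last finding concrete, here is a small Python sketch that tests a profile against the "Stoic Analyst" combination. The 0.0 to 1.0 score scale, the `LOW` and `HIGH` cut-offs, and the sample scores are hypothetical choices for illustration; the source does not state how Pennyfarthing scales or thresholds OCEAN traits.

```python
from dataclasses import dataclass

# Hypothetical cut-offs on an assumed 0.0-1.0 scale; not taken from Pennyfarthing.
LOW, HIGH = 0.4, 0.6


@dataclass
class Ocean:
    openness: float
    conscientiousness: float
    extraversion: float
    agreeableness: float
    neuroticism: float


def is_stoic_analyst(p: Ocean) -> bool:
    """Low O + High C + Low E + Low N; agreeableness is not part of the profile."""
    return (
        p.openness < LOW
        and p.conscientiousness > HIGH
        and p.extraversion < LOW
        and p.neuroticism < LOW
    )


# Made-up scores for two illustrative characters.
calm = Ocean(0.3, 0.9, 0.2, 0.5, 0.1)
anxious = Ocean(0.3, 0.9, 0.2, 0.5, 0.8)

print(is_stoic_analyst(calm), is_stoic_analyst(anxious))
```

The point of the combination is that no single trait decides the match: the second profile shares three of the four traits and still falls outside it.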

See Benchmarking Documentation for methodology.

CLI Commands

| Command | Description |
| --- | --- |
| pf init | Initialize Pennyfarthing in a project |
| pf doctor | Check installation health |
| pf doctor --fix | Auto-fix common issues |
| pf validate | Run all validators |
| pf theme list | Show available themes |
| pf theme set <name> | Change active theme |
| pf package list | Show installable theme plugins |
| pf frame start | Launch Frame dashboard |
| pf sprint status | Current sprint overview |
| pf workflow list | Show all workflows |
| pf debug hotspots analyze | Git change frequency analysis |
| pf debug deadcode stale | Find files with no recent commits |
| pf debug healthscore analyze | Composite codebase health score |
| pf handoff marker <agent> | Generate handoff marker |
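
As background on what "Git change frequency analysis" means, the Python sketch below counts how often each file appears in commit history. It is a generic illustration, not the implementation of `pf debug hotspots analyze`. It assumes input in the shape produced by `git log --name-only --pretty=format:` (file paths grouped per commit, separated by blank lines), and the `change_frequency` helper and sample paths are hypothetical.

```python
from collections import Counter


def change_frequency(name_only_log: str) -> Counter:
    """Count how many commits touched each file path."""
    paths = (line.strip() for line in name_only_log.splitlines())
    return Counter(p for p in paths if p)  # skip the blank lines between commits


# Sample text standing in for real `git log --name-only --pretty=format:` output.
sample = """src/api.py
src/models.py

src/api.py
README.md

src/api.py
"""

freq = change_frequency(sample)
top_file, top_count = freq.most_common(1)[0]
print(top_file, top_count)
```

Files that change most often are candidates for closer review, especially when they also score high on complexity.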

Documentation

Guides (in pennyfarthing-dist/guides/)

| Guide | Description |
| --- | --- |
| BikeLane | Workflow engine: phased, stepped, procedural |
| Frame | Standalone panel viewer for CLI-first development |
| Gates | Workflow phase transition gates |
| Handoff CLI | Phase transitions and marker generation |
| Hooks | Hook system configuration and reference |
| Prime | Agent activation and context loading |
| Bell Mode | Message queue injection |
| Relay Mode | Automatic agent handoffs |
| Reflector | Agent-to-UI marker protocol |
| TirePump | Context clearing system |
| Tandem Protocol | Background observer pairing |
| Output Styles | Configurable response modes |
| Brownfield Tools | Codebase analysis CLI tools |
| Peloton Testing | Pipeline replay benchmarks from real PR reviews |
| Benchmarks | Persona evaluation system (JobFair) |

Available Themes (45)

All 45 themes are bundled with pf init — no separate packages required. Themes span sci-fi, prestige TV, literature, mythology, comedy, history, and more:

the-expanse, star-trek-tng, breaking-bad, discworld, fifth-element, succession, the-wire, mad-men, shakespeare, jane-austen, dune, game-of-thrones, the-office, monty-python, greek-mythology, blade-runner, doctor-who, harry-potter, foundation, ted-lasso, alice-in-wonderland, firefly, and more.

All themes include OCEAN (Big Five) personality profiles. See Personas for personality analysis.

Setting a Theme

pf theme set the-expanse

Or configure directly in .pennyfarthing/config.local.yaml:

theme: the-expanse

Directory Structure

After initialization:

your-project/
├── .pennyfarthing/
│   ├── agents/               # Agent behavior definitions
│   ├── guides/               # Component documentation
│   ├── gates/                # Workflow transition gates
│   ├── output-styles/        # Response format definitions
│   ├── personas/             # Character and theme files
│   ├── scripts/              # Runtime scripts
│   ├── templates/            # Project templates
│   ├── workflows/            # BikeLane workflow definitions
│   ├── sidecars/             # Agent learning files (local, writable)
│   ├── config.local.yaml     # Theme, output style, modes
│   └── repos.yaml            # Multi-repo topology
├── .claude/
│   ├── commands/             # Slash commands for Claude Code discovery
│   └── skills/               # Skills for Claude Code discovery
├── sprint/
│   ├── current-sprint.yaml   # Active sprint
│   └── archive/              # Completed sessions
└── .session/
    └── {story-id}-session.md # Active work session

What's New in v13.0.0

  • Python-first architecture (ADR-0034) — Python owns the runtime: CLI, Frame server (FastAPI/uvicorn), hooks, benchmarks. TypeScript/React is GUI-only
  • Frame TUI — Textual-based terminal dashboard running alongside Claude Code CLI via pf frame start
  • Frame rewrite — Python FastAPI server replaces Node.js, serving API endpoints, WebSocket channels, and OTLP telemetry
  • Spec-check and spec-reconcile phases — Architect validates implementation alignment before review, reconciles deviations after
  • RepoFieldSpec registry — Typed metadata for repos.yaml fields, enabling TUI editing of project topology
  • Saddle mode - Summon a background observer agent via pf saddle summon
  • Demo pipeline - pf demo generate builds presentation artifacts from sprint work
  • Pipeline replay benchmarks — Full TDD pipeline testing against real PR review findings via pf benchmark replay
  • OTEL telemetry — Traces, logs, and spans via Frame WebSocket channels

Previous Highlights

  • v12.7 - Judge versioning, pipeline replay framework, theme YAML schema, kitchen-sink workflow
  • v12.6 - Consumer E2E test suite, gold standard calibration, difficulty profiles
  • v12.0 - Python-first installation, monorepo consolidation, workflow gates, handoff CLI, output styles
  • v10.x - Frame Dockview, repos topology, tandem protocol, codebase health dashboard
  • v9.x - Theme expansion, release workflow, shadcn/ui migration, prime context, bell/relay modes
  • v8.x - BikeLane workflows, scientific benchmarking, JobFair, agent sidecars

See CHANGELOG.md for full details.


License

Copyright 2025-2026 Keith Avery. Licensed under Apache-2.0.
