Agent-first CLI for writing human-readable E2E tests and evals for nested agent systems.
Write specs in Python. Run them against any agent framework. Get reports that make sense to engineers, product, and finance.
pip install -e ".[dev]"
# or with uv:
uv pip install -e ".[dev]"
# Initialize project config
butterflow init
# Detect agent frameworks in the current directory
butterflow detect
# List all available adapters (24 frameworks)
butterflow adapters list
# Discover flows from example specs
butterflow ingest examples/
# Dry run: validate specs without spending tokens
butterflow run examples/ --dry-run
# Run only happy-path flows
butterflow run examples/ --subset happy --dry-run
# Build a token-aware execution plan
butterflow plan examples/ --show-cache-clusters
# Generate Markdown docs from specs
butterflow docs examples/
# View a run report (after a real run)
butterflow report
butterflow report --slice financial
butterflow report --slice ux-testing
butterflow report --slice backend-dev
from butterflow import expect, flow

with flow("refund happy path", subset="happy") as f:
    f.intent("A valid invoice refund is routed to billing and completed.")
    f.input("I need a refund for invoice 123")
    f.expect(expect.agent("router").selects("billing"))
    f.expect(expect.tool("lookup_invoice").called_with(invoice_id="123"))
    f.expect(expect.tool("issue_refund").called())
    f.expect(expect.final_response().contains("refund has been issued"))
Specs are plain Python files. No test framework required. Butterflow owns discovery and execution.
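To make the "plain Python, no test framework" idea concrete, here is a toy sketch of how a context-manager spec object could record an intent, an input, and a list of expectations for later discovery. It is not Butterflow's implementation: `ToyFlow`, `toy_flow`, and `REGISTRY` are hypothetical names invented for this illustration, and expectations are stored as plain strings rather than matcher objects.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyFlow:
    """Hypothetical stand-in that only records what a spec declares."""
    name: str
    subset: Optional[str] = None
    intent_text: str = ""
    input_text: str = ""
    expectations: List[str] = field(default_factory=list)

    def intent(self, text: str) -> None:
        self.intent_text = text

    def input(self, text: str) -> None:
        self.input_text = text

    def expect(self, description: str) -> None:
        self.expectations.append(description)


# Hypothetical module-level registry that a runner could scan after import.
REGISTRY: List[ToyFlow] = []


@contextmanager
def toy_flow(name: str, subset: Optional[str] = None):
    f = ToyFlow(name=name, subset=subset)
    yield f
    # Reached only if the `with` body finished without raising.
    REGISTRY.append(f)


with toy_flow("refund happy path", subset="happy") as f:
    f.intent("A valid invoice refund is routed to billing and completed.")
    f.input("I need a refund for invoice 123")
    f.expect("router selects billing")
    f.expect("final response contains 'refund has been issued'")

# Selecting by subset, loosely analogous to a --subset filter.
happy = [flow.name for flow in REGISTRY if flow.subset == "happy"]
print(happy)
```

Because the spec only declares data, importing the file is enough for a runner to discover flows; execution against a real agent can happen later, or be skipped entirely in a dry run.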
butterflow adapters info langgraph
butterflow adapters install langgraph
# equivalent: pip install butterflow[langgraph]
- run: uv run butterflow plan examples/ --show-cache-clusters
- run: uv run butterflow run examples/ --subset happy
See CI Setup for token-aware scheduling and rate-limit configuration.
- Quickstart
- Spec Authoring Guide
- Adapter Authoring Guide
- Adapter Compatibility Table
- CI Setup
- Token Savings
uv run pytest
uv run ruff check .