d3banjan/butterflow

Butterflow

Agent-first CLI for writing human-readable E2E tests and evals for nested agent systems.

Write specs in Python. Run them against any agent framework. Get reports that make sense to engineers, product, and finance.

Install

pip install -e ".[dev]"
# or with uv:
uv pip install -e ".[dev]"

Demo

# Initialize project config
butterflow init

# Detect agent frameworks in the current directory
butterflow detect

# List all available adapters (24 frameworks)
butterflow adapters list

# Discover flows from example specs
butterflow ingest examples/

# Dry-run — validate specs without spending tokens
butterflow run examples/ --dry-run

# Run only happy-path flows
butterflow run examples/ --subset happy --dry-run

# Build a token-aware execution plan
butterflow plan examples/ --show-cache-clusters

# Generate Markdown docs from specs
butterflow docs examples/

# View a run report (after a real run)
butterflow report
butterflow report --slice financial
butterflow report --slice ux-testing
butterflow report --slice backend-dev

Write a spec

from butterflow import expect, flow

with flow("refund happy path", subset="happy") as f:
    f.intent("A valid invoice refund is routed to billing and completed.")
    f.input("I need a refund for invoice 123")

    f.expect(expect.agent("router").selects("billing"))
    f.expect(expect.tool("lookup_invoice").called_with(invoice_id="123"))
    f.expect(expect.tool("issue_refund").called())
    f.expect(expect.final_response().contains("refund has been issued"))

Specs are plain Python files. No test framework required. Butterflow owns discovery and execution.
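
To make "plain Python files, no test framework" concrete, here is a minimal sketch of how a tool could discover flows by executing a spec and collecting what it declares. This is not Butterflow's implementation: `FlowSpec`, `REGISTRY`, `discover`, the `"edge"` subset name, and the string expectations are hypothetical stand-ins used only for illustration.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


# Hypothetical stand-in for a declared flow; not Butterflow's internal type.
@dataclass
class FlowSpec:
    name: str
    subset: str
    intent_text: str = ""
    inputs: list = field(default_factory=list)
    expectations: list = field(default_factory=list)

    def intent(self, text):
        self.intent_text = text

    def input(self, message):
        self.inputs.append(message)

    def expect(self, expectation):
        self.expectations.append(expectation)


REGISTRY = []  # flows collected while a spec is executed


@contextmanager
def flow(name, subset="default"):
    spec = FlowSpec(name=name, subset=subset)
    yield spec
    # Reached only if the with-body finished without raising.
    REGISTRY.append(spec)


def discover(source, subset=None):
    """Execute spec source and return the flows it declared, optionally filtered."""
    REGISTRY.clear()
    exec(compile(source, "<spec>", "exec"), {"flow": flow})
    return [s for s in REGISTRY if subset is None or s.subset == subset]


# Toy spec text; a real spec would live in its own .py file.
SPEC_SOURCE = '''
with flow("refund happy path", subset="happy") as f:
    f.intent("A valid invoice refund is routed to billing and completed.")
    f.input("I need a refund for invoice 123")
    f.expect("router selects billing")

with flow("refund unknown invoice", subset="edge") as f:
    f.input("I need a refund for invoice 999")
'''

happy = discover(SPEC_SOURCE, subset="happy")
print([s.name for s in happy])
```

In this sketch a spec only declares data (intent, inputs, expectations), so listing, filtering by subset, and validating flows need no agent call, which is one plausible reason a dry run can avoid spending tokens.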

Install an adapter

butterflow adapters info langgraph
butterflow adapters install langgraph
# equivalent (quoted so shells such as zsh do not expand the brackets):
# pip install "butterflow[langgraph]"

CI

- run: uv run butterflow plan examples/ --show-cache-clusters
- run: uv run butterflow run examples/ --subset happy

See CI Setup for token-aware scheduling and rate-limit configuration.

Docs

Development

uv run pytest
uv run ruff check .
