PRD-driven, rigorously validated Rust code generation. A Claude Code skill plus a companion Rust binary that turn a Product Requirements Document into a working Rust project through an autonomous iterate-and-prove loop guarded by a 7-receipt release gate.
One-liner — clones the skill into a temp dir, symlinks it into
~/.claude/skills/autobuilder/, exits clean:
curl -fsSL https://raw.githubusercontent.com/j0yen/autobuilder/main/skill/install.sh | bash
Or the manual two-step:
git clone --depth 1 https://github.com/j0yen/autobuilder.git
./autobuilder/skill/install.sh
Claude Code picks up the skill on the next session start.
/autobuilder <PRD-path> invokes it.
When cargo is on your PATH, install.sh automatically builds and installs:
- The 4 companion CLIs (autobuilder-ac-counter, autobuilder-bincov-receipt, autobuilder-harness-portability-audit, autobuilder-proposal-aggregator)
- The pipeline companion binary (autobuilder)
Without cargo, the skill still works for Stages 1–2.
- Stages 1-2: bash, jq, git. Claude Code for the skill itself.
- Stages 3-5: cargo/rustc 1.85+, cargo-deny, cargo-nextest, optional cargo +nightly miri (only when --allow-unsafe).
Four standalone Rust CLIs shipped alongside the skill. Built and installed
automatically by install.sh when cargo is present; also installable
individually via cargo install --path tools/<name>.
| Binary | What it does |
|---|---|
| autobuilder-ac-counter | Counts acceptance criteria correctly across split-file (acceptance_*.rs), monolithic (acceptance.rs with ac\|new_ac\|ext families), and mock (tests/mocks/) layouts. Fixes the run-metrics.sh undercount. |
| autobuilder-bincov-receipt | Detects [[bin]] crates that ship a binary but have no tests/integration_cli.rs driving it via std::process::Command. Emits a bincov.v1 receipt; --strict exits 3. |
| autobuilder-harness-portability-audit | Scans harness scripts for Linux-only idioms (nproc, /proc/, flock, date -d, readlink -f, sed -i, stat -c) and reports macOS-equivalent suggestions. Draft-only; --strict exits 4. |
| autobuilder-proposal-aggregator | Clusters the proposals/*.json pile by target_file + lexical-Jaccard rationale similarity, ranks by distinct-crate recurrence, filters applied.log. Emits hardening-backlog.json. |
Source for each tool lives under tools/<name>/ in this repo.
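For readers unfamiliar with the lexical-Jaccard similarity that autobuilder-proposal-aggregator is described as using, here is a minimal sketch. It assumes a simple tokenization (lowercased alphanumeric words); the helper names `tokens` and `jaccard` are illustrative and are not taken from the tool's source, which may tokenize or threshold differently.

```rust
use std::collections::HashSet;

/// Split text into a set of lowercase alphanumeric tokens.
/// (Assumed tokenization; the real tool may differ.)
fn tokens(text: &str) -> HashSet<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Jaccard similarity |A ∩ B| / |A ∪ B| over the token sets of two texts.
/// Two texts with no tokens at all are treated as identical.
fn jaccard(a: &str, b: &str) -> f64 {
    let (ta, tb) = (tokens(a), tokens(b));
    let union = ta.union(&tb).count();
    if union == 0 {
        return 1.0;
    }
    ta.intersection(&tb).count() as f64 / union as f64
}

fn main() {
    // Two hypothetical proposal rationales.
    let r1 = "harness undercounts acceptance criteria";
    let r2 = "harness undercounts split acceptance files";
    // 3 shared tokens out of 6 distinct tokens.
    println!("{:.2}", jaccard(r1, r2));
}
```

With these two made-up rationales the sketch prints 0.50. A clustering pass would then group proposals that share a target_file and whose pairwise similarity exceeds some cutoff; the cutoff value is not stated in this README.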
.
├── autobuilder/ # Cargo workspace: the autobuilder companion binary
│ ├── src/ # one module per pipeline stage / receipt producer
│ └── crates/metric-harness/ # reusable metric-harness crate
├── tools/ # standalone Rust CLIs (built by install.sh)
│ ├── autobuilder-ac-counter/
│ ├── autobuilder-bincov-receipt/
│ ├── autobuilder-harness-portability-audit/
│ └── autobuilder-proposal-aggregator/
├── agent/ # canonical agent-state files (intent-card, owner-map, …)
│ ├── intent-card.json
│ ├── owner-map.json
│ ├── proof-lanes.toml
│ └── test-map.json
├── corpora/ # JSONL eval corpora consumed by metric-harness
├── scripts/run-metrics.sh # emits autobuilder.metrics.v1 for this repo
└── PLAN.md # full skill design
PRDs (the inputs this pipeline consumes) live in the private companion
repo joeyen-atscale/autobuilder-private, not here.
The ideas autobuilder synthesizes were lifted from three upstream
repositories: miolini/autoresearch-macos
(locked harness + a single unfakeable scalar metric), neverhuman/jankurai
(repository-local evidence receipts and an anti-pattern catalog), and
neverhuman/jeryu (N-of-N signed proof receipts on a risk gate). They were
previously vendored into this tree for reference but have been removed: the
relevant shapes are already translated into the skill files (provenance is noted
inline, e.g. "Lifted from jankurai/agent/JANKURAI_STANDARD.md"). Clone the
upstreams directly if you need the originals.
PRD ──► Stage 1: Intake & 5-Whys ──► intent-card.json
└─► Stage 2: Scaffold (cargo new + locked harness + lints)
└─► Stage 3: Iterate-and-Prove Loop (advance-or-revert)
└─► Stage 4: Risk Gate (7 receipts must agree)
└─► Stage 5: Postmortem + Self-Evolve
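The advance-or-revert idea in Stage 3 can be illustrated with a small sketch. It assumes the loop compares a single harness scalar between the current state and a candidate and keeps the candidate only on strict improvement; the `State`, `Outcome`, and `step` names are hypothetical, and the real loop runner may use a different acceptance rule.

```rust
/// Hypothetical snapshot of the project: a revision id plus the harness scalar.
#[derive(Debug, Clone, PartialEq)]
struct State {
    revision: String,
    metric: u32, // assumed: e.g. number of passing acceptance criteria
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Advanced,
    Reverted,
}

/// Keep the candidate only if the metric strictly improves; otherwise the
/// current state is left untouched (the "revert").
fn step(current: &mut State, candidate: State) -> Outcome {
    if candidate.metric > current.metric {
        *current = candidate;
        Outcome::Advanced
    } else {
        Outcome::Reverted
    }
}

fn main() {
    let mut state = State { revision: "r0".into(), metric: 3 };
    // Made-up candidate revisions and their measured metrics.
    let candidates = [("r1", 5), ("r2", 4), ("r3", 7)];
    for (rev, metric) in candidates {
        let outcome = step(&mut state, State { revision: rev.into(), metric });
        println!("{rev}: {outcome:?} (best = {} @ {})", state.metric, state.revision);
    }
}
```

In this run r1 and r3 advance while r2 is reverted, because its metric (4) does not beat the best seen so far (5).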
Stage 3 also runs scripts/run-mutants.sh (cargo-mutants telemetry, Phase 1)
when the crate has tests: it merges mutation_kill_rate and mutant counts into
metrics.json to catch tests that pass but cover only the implementation's
happy path. It is telemetry-only today (never blocks); a calibrated kill-rate
gate is a follow-on. See PRD autobuilder-mutation-testing.
The agent edits only src/. Everything else — Cargo.toml, clippy.toml,
deny.toml, tests/, scripts/run-metrics.sh — is read-only harness,
mirroring autoresearch's prepare.py/train.py separation. The skill ships
the BAD_RUST audit and risk-gate driver scripts in
~/.claude/skills/autobuilder/{rules/audit-checks.sh,scripts/risk-gate.sh}
rather than per-project, so they stay versioned in one place.
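The src/-only rule can be pictured as a path check. The sketch below is an assumption about how such a guard might look, not the skill's actual enforcement mechanism; `agent_may_edit` is a hypothetical name, and paths are assumed to be relative to the project root.

```rust
use std::path::{Component, Path};

/// Returns true when a project-relative path lies inside `src/`.
fn agent_may_edit(path: &Path) -> bool {
    // Tolerate a leading "./" so "./src/lib.rs" is treated like "src/lib.rs".
    let path = path.strip_prefix(".").unwrap_or(path);
    // Reject absolute paths and any ".." segment so a path cannot escape src/.
    if path.is_absolute() || path.components().any(|c| matches!(c, Component::ParentDir)) {
        return false;
    }
    // Component-wise comparison: "srcs/lib.rs" does not match "src".
    path.starts_with("src")
}

fn main() {
    for p in ["src/lib.rs", "Cargo.toml", "tests/acceptance.rs", "src/../deny.toml"] {
        println!("{p}: {}", agent_may_edit(Path::new(p)));
    }
}
```

Only src/lib.rs is reported as editable here; the harness files and the `..` escape attempt are rejected.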
Every receipt is a digest-bound JSON object under
target/autobuilder/receipts/. The gate only attests that all seven are
present, schema-valid, and bound to the current HEAD — each producer owns
its own work and its own digest.
| Receipt | Schema | Produced by |
|---|---|---|
| intake | autobuilder.intent_card.v1 | autobuilder intake |
| vti-plan | autobuilder.vti_plan_receipt.v1 | autobuilder vti-plan |
| proof-receipt | autobuilder.iteration_receipt.v1 | autobuilder loop |
| risk-gate | autobuilder.bad_rust_audit.v1 | (BAD_RUST audit) |
| reviewer-agent | autobuilder.reviewer_agent_receipt.v1 | autobuilder reviewer-agent |
| rollback-plan | autobuilder.rollback_plan_receipt.v1 | autobuilder rollback-plan |
| ci-checks | autobuilder.ci_checks_receipt.v1 | autobuilder ci-checks |
autobuilder gate aggregates them into release-receipt.json and exits
non-zero on block.
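The gate's contract (all seven receipts present, carrying the expected schema, and bound to the current HEAD) can be sketched as follows. This is a simplified illustration under stated assumptions: the `Receipt` struct and its field names are hypothetical, "schema-valid" is reduced to matching the schema identifier, and digest verification is omitted. The receipt names and schema ids are the ones listed in this section.

```rust
/// Minimal stand-in for a parsed receipt (field names are assumptions).
struct Receipt {
    name: &'static str,
    schema: &'static str,
    head: &'static str,
}

/// Receipt names and schema ids as listed in this section.
const REQUIRED: [(&str, &str); 7] = [
    ("intake", "autobuilder.intent_card.v1"),
    ("vti-plan", "autobuilder.vti_plan_receipt.v1"),
    ("proof-receipt", "autobuilder.iteration_receipt.v1"),
    ("risk-gate", "autobuilder.bad_rust_audit.v1"),
    ("reviewer-agent", "autobuilder.reviewer_agent_receipt.v1"),
    ("rollback-plan", "autobuilder.rollback_plan_receipt.v1"),
    ("ci-checks", "autobuilder.ci_checks_receipt.v1"),
];

/// Returns the list of blocking reasons; an empty list means the gate passes.
fn gate(receipts: &[Receipt], head: &str) -> Vec<String> {
    let mut blocks = Vec::new();
    for (name, schema) in REQUIRED.iter() {
        match receipts.iter().find(|r| r.name == *name) {
            None => blocks.push(format!("{name}: missing")),
            Some(r) if r.schema != *schema => {
                blocks.push(format!("{name}: unexpected schema {}", r.schema))
            }
            Some(r) if r.head != head => {
                blocks.push(format!("{name}: bound to {} instead of {head}", r.head))
            }
            Some(_) => {}
        }
    }
    blocks
}

fn main() {
    let head = "abc123"; // made-up commit id
    let mut receipts: Vec<Receipt> = REQUIRED
        .iter()
        .map(|&(name, schema)| Receipt { name, schema, head })
        .collect();
    println!("all present: {:?}", gate(&receipts, head));
    receipts.pop(); // simulate a missing ci-checks receipt
    println!("one missing: {:?}", gate(&receipts, head));
}
```

A real driver would map a non-empty result to a non-zero exit status, matching the block behaviour described above.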
A thin Rust 2024 / rustc 1.85 binary. Everything load-bearing — intent-card
validation, scaffold materialization, the experiment-loop runner, evidence
writing, the 7-receipt gate, postmortem aggregation, the gated self-evolution
diff — lives here so it does not rot in shell.
cd autobuilder
cargo build --release
The workspace pins rustc 1.85.0 (rust-toolchain.toml) and applies strict
clippy lints (unwrap_used, expect_used, panic, unreachable,
dbg_macro, unsafe_code — all deny).
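To show what code written under these lints tends to look like, here is a small sketch. The `parse_port` helper is hypothetical and unrelated to the autobuilder source. The lints are attached with an attribute only so the example is self-contained; the workspace presumably configures them centrally, and this README does not say where.

```rust
use std::num::ParseIntError;

/// Hypothetical helper: parse a port number without `unwrap`, `expect`,
/// or `panic!`, returning the error to the caller instead.
#[deny(clippy::unwrap_used, clippy::expect_used, clippy::panic, unsafe_code)]
fn parse_port(raw: &str) -> Result<u16, ParseIntError> {
    raw.trim().parse::<u16>()
}

fn main() {
    // The caller decides how to report a failure; nothing panics.
    match parse_port("8080") {
        Ok(port) => println!("port = {port}"),
        Err(err) => eprintln!("invalid port: {err}"),
    }
}
```

Under `cargo clippy`, replacing the body with `raw.trim().parse::<u16>().unwrap()` would be rejected by `clippy::unwrap_used`; plain `rustc` accepts the attribute but does not run the clippy lints itself.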
autobuilder intake # Stage 1: validate intent-card.json
autobuilder scaffold # Stage 2: materialize a project from templates/
autobuilder loop # Stage 3: iterate-and-prove
autobuilder metric-harness # run a project's harness, emit metrics.json
autobuilder vti-plan # Stage 4: route changed paths through proof-lanes.toml
autobuilder rollback-plan # Stage 4: verify HEAD~N..HEAD is git-revert-clean
autobuilder reviewer-agent # Stage 4: prepare/finalize the reviewer receipt
autobuilder ci-checks # Stage 4: confirm CI is green via `gh`
autobuilder gate # Stage 4: aggregate the 7 receipts → release receipt
autobuilder postmortem # Stage 5: aggregate run artifacts
autobuilder evolve # Stage 5: gated skill-self-diff
All subcommands are real. The bin has been bootstrapped through its own
gate (verdict=pass) and dogfooded against an external PRD
(mcp-tuner, 9 ACs green).
scripts/run-metrics.sh is the harness for this repo. Its unfakeable scalar
is stage4_receipt_producers_callable — how many of the Stage 4 receipt
producers respond on the freshly-built binary. Every acceptance criterion
maps 1:1 to a producer and exercises the producer's actual contract against
a tmp git fixture (writes rollback.md, routes a src/ change with confidence
1.0, blocks ci-checks when no GH run exists for HEAD, etc.) — not just
--help. Plus a build/test sanity AC and a digest-roundtrip AC.
./scripts/run-metrics.sh
cat target/autobuilder/metrics.json
autobuilder is also a Claude Code skill (see .claude/). Invoke it from
inside a Claude Code session with a PRD path:
/autobuilder --prd path/to/prd.md
…and the skill drives all five stages, leaving every receipt under
target/autobuilder/receipts/ for human review.
When a slice passes the gate and is ready to share, the convention is to
publish it as its own GitHub repo at github.com/j0yen/<slug> rather than
import it into a monorepo. See Stage 6 — Publish
in the skill doc for the per-slice steps. The wider ecosystem is indexed
in j0yen/wintermute's REPOS.md;
its bootstrap/install.sh clones each published slice on a fresh machine.
- v0.2.0 (2026-05-30): added autobuilder publish subcommand: codifies the Stage-6 publish pipeline (README/LICENSE generation, branch normalize, wm-publish repo create, wm-push, REPOS.md update) into a deterministic, idempotent, dry-run-capable command (PRD-autobuilder-publish, ACs 1–9 green).
MIT licensed. See LICENSE.