Evidence-first coding agent for Claude Code CLI. Port of burkeholland/anvil from GitHub Copilot CLI.
Your coding agent should prove its work. This one does.
/anvil verifies every change before you see it β builds, tests, lints, runs IDE diagnostics, then has other AI models (Claude, GPT, Gemini, Ollama) try to break it. Every check is INSERTed into a SQLite ledger; the final Evidence Bundle is a SELECT, not prose. If the INSERT didn't happen, the verification didn't happen.
- Adversarial review. Every change gets attacked by 1 (Medium task) or 3 (Large / π΄ task) different models in parallel. Claude runs as a Task subagent; GPT / Gemini / Ollama run via Bash scripts. They disagree with each other. That's the point.
- SQL verification ledger. Every build, test, lint, diagnostics, and reviewer verdict lives in
anvil_checks. The Evidence Bundle isSELECT ... ORDER BY phaseβ the agent can't fake it. - Baseline snapshots. Before touching anything, anvil records your project's state. After, it diffs. Regressions are caught before you see the code.
- Pushback. Anvil is a senior engineer, not an order taker. If your request introduces tech debt, duplicates existing code, or has a dangerous edge case, it tells you before writing a single line.
- Session memory. A PostToolUse hook tracks every file anvil edits. Next session, if you touch that file again, anvil recalls past failures and accounts for them.
- Git autopilot. Checks git state, stashes uncommitted work, creates a branch, commits the result with a clean rollback command.
- Context7 docs. Bundled MCP server fetches up-to-date library docs so anvil doesn't hallucinate APIs.
- Claude Code CLI (2.1.x or newer; plugin support).
- Python 3.9+ on your
PATHβ used for the ledger and external reviewer scripts (stdlib only, nopip install). - Git (obviously).
- Node.js (only needed for the bundled Context7 MCP server, launched via
npx). - Optional reviewer credentials (see Reviewer setup below). Anvil gracefully degrades if a reviewer isn't configured.
In any Claude Code session:
/plugin marketplace add allut/claude-anvil
/plugin install claude-anvil@allut-claude-anvil
git clone https://github.com/allut/claude-anvil.git
claude --plugin-dir ./claude-anvilInitialize the ledger once (it'll auto-init on first /anvil run, but this lets you verify the install):
python scripts/anvil-ledger.py init
# -> anvil ledger ready at ~/.claude-anvil/anvil.dbRun /anvil-setup from any Claude Code session. The wizard will:
- Ask which reviewers to enable (Claude / Gemini / Ollama / OpenAI-compatible).
- Collect API keys, validate each with a small live request, and write them to
~/.claude-anvil/config.json(chmod0600on POSIX). - Pick the Claude reviewer's model (
sonnet/haiku/opus). - Offer to
ollama pullif Ollama is enabled and the model isn't pulled yet. - Optionally create bare shortcuts so you can type
/anvilinstead of/claude-anvil:anvil(see below).
You can re-run /anvil-setup any time to reconfigure, or /anvil-setup reset to wipe and start over.
The wizard's final step offers to write ~/.claude/commands/anvil.md and ~/.claude/commands/anvil-setup.md, which expose the commands without the plugin namespace prefix. After accepting, you can use /anvil and /anvil-setup directly in any conversation.
Re-running /anvil-setup refreshes the shortcuts when the plugin updates. To create them manually:
python "${CLAUDE_PLUGIN_ROOT}/scripts/anvil-config.py" create-shortcuts "${CLAUDE_PLUGIN_ROOT}"| Reviewer | Cost | Get a key |
|---|---|---|
| Claude (Task subagent) | Covered by your Claude Code plan | Already installed β wizard picks the model (sonnet default, haiku faster/cheaper, opus deepest) |
| Ollama (local) | Free | Install from ollama.com; the wizard offers to ollama pull your chosen model |
| Gemini (Google AI Studio) | Free tier | aistudio.google.com/apikey |
| OpenAI-compatible | Paid for openai.com GPT-5; free via OpenRouter / Groq | The wizard offers OpenAI / OpenRouter / Groq presets, or a custom endpoint |
Default roster after wizard: Medium = first enabled reviewer; Large = up to 3 enabled reviewers (priority claude > gemini > openai > ollama).
Power users: any ANVIL_* env var still wins over config.json, so the legacy .env-based flow keeps working β see .env.example for the full list. The dispatcher (anvil-review.py) reads env first, then config.json, then defaults.
Provider quirks handled automatically:
- GPT-5 and o1/o3: the
temperaturefield is omitted (those models reject anything but1). - OpenRouter / Groq:
HTTP-RefererandX-Titleattribution headers are sent unconditionally. If a free-tier model rejectsresponse_format: {type: json_object}, the wizard's "JSON mode = off" toggle (orANVIL_OPENAI_JSON_MODE=off) makes anvil fall back to extracting JSON from the reply body.
A commit gate blocks git commit during an active /anvil session when the ledger is missing evidence in any of the three phases (baseline, after, review). Commits outside an anvil session are unaffected.
From any Claude Code session inside the project you want to edit:
/anvil fix the off-by-one in getNthItem(), plus a regression test
What happens next, on a Medium task:
- Boost β your request is rewritten into a precise spec (hidden unless the intent changed).
- Git hygiene β checks for uncommitted changes / being on
main; pushes back if anything looks risky. - Recall β queries the SQLite ledger for past anvil runs that touched the same files, so surprises from previous sessions get surfaced.
- Survey β greps the codebase for existing helpers you should extend instead of re-inventing.
- Baseline β runs the project's build/tests/diagnostics and INSERTs the results as
phase='baseline'. - Implement β edits the code.
- The Forge β reruns build/tests/diagnostics (as
phase='after'), then stages the diff and unleashes one reviewer (Claude by default). The reviewer's JSON verdict is INSERTed asphase='review', check_name='review-claude'. - Evidence Bundle β rendered from
SELECT * FROM anvil_checks WHERE task_id = .... Any regression, any reviewer finding, shows up. - Commit β auto-commits with a one-line rollback.
Large tasks (multi-file, auth/crypto/payments, π΄ files) additionally show a plan step you approve via AskUserQuestion, run three reviewers in parallel, and include an operational-readiness check (secrets, observability, graceful degradation).
Small tasks (typos, renames, config tweaks) skip the ledger β just Quick Verify + optional commit.
# All checks for a task
python scripts/anvil-ledger.py select-bundle fix-login-crash
# Has a file been touched by past anvil runs?
python scripts/anvil-ledger.py recall auth.ts
# Did anything break on this file in the past?
python scripts/anvil-ledger.py recall-issues auth.ts
# List learned facts (build commands, codebase conventions, etc.)
python scripts/anvil-ledger.py memory-listThe raw DB is at ~/.claude-anvil/anvil.db. Open it with any SQLite browser.
-
EPERM: operation not permitted, rename β¦ allut-claude-anvil β allut-claude-anvil.bak(Windows) β Claude Code backs up the old plugin directory before updating it. Windows blocks the rename if any process holds a handle to a file inside that directory (common culprits: Windows Defender real-time scanning, Windows Explorer, or Claude Code itself). Fix:- Exit Claude Code completely.
- Double-click
plugin/scripts/fix-windows-plugin.bat(or run it from any terminal β no PowerShell execution policy change required). It removes the stale plugin directories and prints a confirmation. - Reopen Claude Code and run
/plugin marketplace add allut/claude-anvilagain.
To prevent recurrence, add
%USERPROFILE%\.claude\pluginsto Windows Defender's exclusion list (Windows Security β Virus & threat protection β Manage settings β Exclusions). -
anvil-ledger.py: schema missingβ runpython scripts/anvil-ledger.py initfrom the plugin directory so the script can resolvesql/schema.sql. -
"Ollama unreachable" β is the daemon running?
curl http://localhost:11434/api/tags. -
Gemini 429 / quota exceeded β the default
gemini-2.5-flashgives 1,500 req/day free. If you're still hitting limits, re-run/anvil-setupand generate a new API key at https://aistudio.google.com/apikey, or switch to a different reviewer. -
Reviewer returned "concern: no parseable verdict" β the model ignored the JSON-only instruction. Happens occasionally with smaller/older models;
anvil-review.pyattempts to extract the first JSON object from the reply and falls back toconcernif that fails. -
PostToolUse hook isn't tracking edits β the hook only runs while an anvil session is open (
sessions.ended_at IS NULL). If/anvilcrashed partway through, runpython scripts/anvil-ledger.py end-session <id> "crashed"to close it.
The Setup Wizard stores API keys in the OS credential store when available β no plaintext lands in config.json:
| Platform | Storage |
|---|---|
| macOS | macOS Keychain (security CLI); config.json holds the marker "keychain" |
| Windows | DPAPI-encrypted Base64 inline in config.json; unreadable on any other machine or user account |
| Linux | libsecret via secret-tool; config.json holds the marker "keychain" |
| Fallback | Plaintext in config.json (chmod 0600) if no keychain backend is detected |
Run python scripts/anvil-config.py keychain-status to see which backend is active and where each key is stored.
The legacy .env file (if you use it) is also gitignored. Double-check your shell history and process listing regardless of storage tier.
Keys vs. diffs: API keys are sent only to their issuing service (Gemini key β Google, OpenAI key β OpenAI) as authentication credentials β they are not shared with other providers. What is sent to each enabled provider is the staged code diff for review; don't point GPT/Gemini at diffs you can't share with a third party.
The Anvil Loop prompt is adapted from burkeholland/anvil (MIT). The Copilot CLI primitives (ask_user, store_memory, session_store, Copilot multi-vendor subagents) were translated to their Claude Code equivalents.
MIT. See LICENSE.