Scan a codebase. Find issues. Generate fixes. Review them three times. Stop when domain expertise is required. Open a PR only when every gate passes.
ASIL is the open-source extract of a production autonomous coding pipeline. It is the first publicly-available system that pairs autonomous code generation with multi-gate review, cost control, and a domain-expertise boundary — so the loop knows when to keep working and when to stop and ask a human.
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ System C │ │ System B │ │ System A │
│ Cost Ctrl │◀──▶│ Thought │◀──▶│ Improvement │
│ │ │ Multiplier │ │ Loop │
└─────────────┘ └─────────────┘ └─────────────┘
token budget strategic scan → fix →
per-task caps thinking self-review →
kill switch synthesizer adversarial gate →
checkpoints papa agent domain guard →
open PR
Today's autonomous coding agents (Cursor's background agents, GitHub Copilot Workspace, Devin, etc.) have three structural problems:
- They guess when they hit domain-specific code. A pricing rule, a regulatory constraint, a contract clause, a vendor-specific quirk — the model invents a plausible-sounding answer because it has to keep going. The wrong answer ships.
- They have no built-in cost controls. A runaway loop or a fan-out that picks the wrong model can burn $200 of API spend before anyone notices.
- They don't review their own work. The same model that wrote the change confirms it's good. Self-review by the writer is theater.
ASIL fixes all three at the architecture level, not as an afterthought:
- A domain guard detects when the code being changed is in a region the maintainer has flagged as needing human input, and blocks the autonomous fix until a human triages it.
- A cost controller tracks token spend per task, per system, per day, with checkpoints and a hard kill switch.
- A three-persona self-review (code reviewer, security auditor, test engineer) plus a separate-LLM adversarial gate challenges the work before any PR is opened.
ASIL is composed of three orthogonal systems that you can adopt independently or wire together:
Token budget governor. The substrate every other system runs on.
BudgetManagerallocates per-task budgets sized by category and model tierTokenTrackerpersists actual spend to diskCostCheckpointis the runtime API: every LLM call passes through it, and it can refuse the call when budget is exhaustedKillSwitchenforces hard daily capsUsageReporterformats spend rollups for inbox / Slack deliverydecimal.jsfor every cost calculation — no float drift on currency
Strategic thinking layer. Use this when one model giving one answer isn't good enough.
- A router decides which specialized thinkers a request needs (architecture, planning, test strategy, security, API design, spec writing)
- Each thinker runs independently with its own system prompt and JSON output envelope
- A synthesizer merges their recommendations, surfaces conflicts, and computes a confidence score
- The papa agent (Opus-tier) resolves conflicts and produces a final handoff brief
- Every step accounts its tokens through System C
The grind. Runs unattended.
scan → cycle-detect → triage domain questions → for each task:
isolate (worktree)
execute (LLM-generated patch, applied via diff)
typecheck + tests
self-review × 3 personas (reviewer, security, test-engineer)
adversarial gate (different LLM provider)
domain guard
open PR
- Scanner picks up five categories: test failures, type errors, TODO resolution, dead code, coverage gaps
- Cycle detector prevents the loop from churning on the same file repeatedly
- Worktree isolation — every task runs in a disposable git worktree (auto-falls-back to
git cloneon filesystems that don't support worktrees) - Self-review runs three persona prompts scoped strictly to the diff
- Adversarial gate sends the diff to a different model (different provider, different family) to challenge the work
- Domain guard blocks the PR if the diff touches a
// DOMAIN_QUESTION:zone with no resolved answer
# 1. Install
git clone https://github.com/telivity-otaip/asil asil
cd asil
pnpm install
pnpm build
# 2. Configure environment
export ANTHROPIC_API_KEY=...
export OPENAI_API_KEY=...
export REPO_ROOT=$(pwd)/path/to/your/target/repo
# 3. Use the thought multiplier (System B) standalone
pnpm --filter asil-runners run:b "add rate limiting to the connect API endpoint"
# 4. Run the autonomous loop (System A)
pnpm --filter asil-runners auto grind --dry-run
pnpm --filter asil-runners auto grind --max-tasks 3See examples/quickstart.md for a five-minute integration walkthrough.
$ pnpm --filter asil-runners auto grind --max-tasks 5
🤖 ASIL — Autonomous Improvement Loop
Repo: /workspace/your-app
Budget: $20.00 / day, $0.40 / task
🔍 Scanning…
✓ test-failure 12 candidates
✓ type-error 3 candidates
✓ todo-resolution 27 candidates
✓ dead-code 7 candidates
✓ coverage-gap 3 candidates
→ 52 tasks queued (in 9.2s)
❓ 4 tasks block on unresolved DOMAIN_QUESTION markers.
Would you like to triage them now? [y/n] y
[1/4] packages/billing/src/rates.ts:124
"How is the late-payment grace period calculated?"
Proposals from Opus:
a) 5 calendar days from due date
b) 5 business days excluding bank holidays
c) Cannot be answered without the contract terms
Choose [a/b/c] or type a custom answer:
[…all four answered or deferred…]
🔧 Running 5 tasks in isolated worktrees…
✓ task-001 type-error packages/api/src/handlers/refund.ts
✓ typecheck (1.4s) ✓ tests (12.3s)
✓ reviewer pass
✓ security pass
✓ test-eng pass
✓ adversarial pass
✓ domain no DOMAIN_QUESTION zones touched
✓ PR opened: #1247
✗ task-003 todo-resolution packages/billing/src/audit.ts
✗ adversarial gate raised a blocker:
"The diff removes the audit-log entry but the
commit message says 'preserve audit trail'."
→ reverted, returned to queue with note for next pass
…
📊 Summary
Completed: 3 PRs opened ($0.84 spent)
Failed: 1 (adversarial gate)
Skipped: 1 (cycle detection — same file 3rd attempt)
┌────────────────────────────┐
│ System C (Cost) │
│ TokenTracker │
│ BudgetManager │
│ CostCheckpoint │
│ KillSwitch │
│ UsageReporter │
└────────────────────────────┘
▲ ▲
│ │
allocates budget │ │ records spend
checks │ │
│ │
┌──────────────────────────┴───┐ ┌──────┴──────────────────────────┐
│ System B (Thought) │ │ System A (Improvement) │
│ router → thinkers → │ │ scanner → executor → │
│ synthesizer → papa │ │ self-review → adversarial → │
│ → handoff brief │ │ domain guard → PR │
└──────────────────────────────┘ └─────────────────────────────────┘
Every LLM call in B and A flows through a CostCheckpoint. Each checkpoint is bound to a (taskId, systemId, agentId) triple so spend can be attributed in reports. When a budget is exhausted mid-task, the checkpoint refuses further calls and the task is rolled back cleanly.
| Knob | Where | Default | Purpose |
|---|---|---|---|
ANTHROPIC_API_KEY |
env | required | Primary LLM |
OPENAI_API_KEY |
env | required | Adversarial gate (different provider) |
REPO_ROOT |
env | required | Absolute path to the target repo |
ASIL_USAGE_DATA_DIR |
env | <REPO_ROOT>/.asil/usage-data |
Cost tracker state |
ASIL_QUEUE_PATH |
env | <REPO_ROOT>/.asil/usage-data/queue.json |
Task queue |
ASIL_SKILLS_PATH |
env | <REPO_ROOT>/.asil/skills |
Markdown skills the thinkers load |
ASIL_DOMAIN_ANSWERS_PATH |
env | <REPO_ROOT>/.asil/domain-answers.json |
Resolved domain answers |
papaModel / thinkerModel |
ThoughtMultiplierConfig |
opus / sonnet |
Model tiers per role |
executionModel / reviewModel |
ImprovementLoopConfig |
sonnet / sonnet |
Model tiers in System A |
maxTasksPerRun |
ImprovementLoopConfig |
CLI flag | Hard cap per auto grind |
maxAttempts |
ImprovementLoopConfig |
2 |
Per-task retry cap |
taskCooldownMs |
ImprovementLoopConfig |
5000 |
Throttle between tasks |
skipCategories |
ImprovementLoopConfig |
[] |
Disable scanner categories |
securityWeight |
ThoughtMultiplierConfig |
0.7 |
Conflict-resolution weight |
projectName |
ThoughtMultiplierConfig |
"Project" |
Header in generated brief |
projectDoNotChange |
ThoughtMultiplierConfig |
[] |
Project-specific "do not modify" entries |
dailyBudgetUSD |
BudgetManagerConfig |
20.00 |
Cost-controller daily cap |
ASIL is a TypeScript monorepo. Add the four packages to your workspace and wire them up via the LoopDeps and LLMCaller interfaces.
import { runLoop } from 'asil-improvement-loop';
import {
createAnthropicCaller,
createCodexCaller,
createGitOps,
createCostInfra,
createDiffApplier,
createFileFetcher,
} from 'asil-runners';
const llm = createAnthropicCaller(process.env.ANTHROPIC_API_KEY!);
const codex = createCodexCaller(process.env.OPENAI_API_KEY!);
const git = createGitOps(process.env.REPO_ROOT!);
const cost = createCostInfra(process.env.REPO_ROOT!);
const diff = createDiffApplier();
const files = createFileFetcher();
await runLoop({
llm,
codex,
git,
diff,
fileFetcher: files,
costInfra: cost,
config: {
executionModel: 'sonnet',
reviewModel: 'sonnet',
maxTasksPerRun: 5,
maxAttempts: 2,
taskCooldownMs: 5000,
markdownSkillsPath: '.asil/skills',
repoRoot: process.env.REPO_ROOT!,
queuePath: '.asil/usage-data/queue.json',
skipCategories: [],
codexConfig: { apiKey: 'OPENAI_API_KEY', model: 'gpt-4o' },
},
});LLMCaller is a tiny mock-friendly interface (call(systemPrompt, userPrompt, model) → Promise<{content, inputTokens, outputTokens}>). The shipped wirings cover Anthropic and OpenAI; swapping in a local model, a self-hosted relay, or a different provider is a one-file change.
- Domain guard. First system that explicitly detects domain-expertise boundaries via inline
// DOMAIN_QUESTION:markers and refuses to fabricate answers. Unanswered questions block the affected files until a human triages them. - Three-persona self-review. Code reviewer + security auditor + test engineer, each with its own scoped prompt, all run on the diff only. No prior-context pollution.
- Adversarial gate. A different LLM (different provider) reads the diff cold and tries to break it. Catches things the writer's family doesn't see.
- Cost controller is first-class. Budget allocation, per-call checkpoints, kill switch, daily caps, persisted state. Not a logging afterthought.
- Worktree isolation with FUSE fallback. Every task runs in a disposable
git worktreeclone. The main checkout is never touched. On filesystems whereworktree addfails (Google Drive, some FUSE mounts), the runner falls back togit cloneautomatically.
- Node ≥ 20, pnpm ≥ 9
- A target repo that uses pnpm (the executor runs
pnpm install,pnpm build,pnpm test,pnpm typecheckinside the worktree) - Anthropic API key (primary)
- OpenAI API key (adversarial gate)
Production: this codebase has been running unattended in a 80+ agent travel AI platform, opening PRs autonomously every day, with a measurable rate of human-meaningful improvements landing in main without any reviewer override. This is the open-source extract — domain-specific code and proprietary prompts removed, public-safe defaults set, configuration knobs documented.
MIT — see LICENSE.
Created by Dušan Milicevic (Telivity) — extracted from a production autonomous system managing an 80+ agent travel AI platform.