ASIL — Autonomous Software Improvement Loop

Scan a codebase. Find issues. Generate fixes. Review them three times. Stop when domain expertise is required. Open a PR only when every gate passes.

ASIL is the open-source extract of a production autonomous coding pipeline. It is the first publicly available system that pairs autonomous code generation with multi-gate review, cost control, and a domain-expertise boundary — so the loop knows when to keep working and when to stop and ask a human.

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  System C   │    │  System B   │    │  System A   │
│ Cost Ctrl   │◀──▶│  Thought    │◀──▶│ Improvement │
│             │    │ Multiplier  │    │    Loop     │
└─────────────┘    └─────────────┘    └─────────────┘
   token budget       strategic         scan → fix →
   per-task caps      thinking          self-review →
   kill switch        synthesizer       adversarial gate →
   checkpoints        papa agent        domain guard →
                                        open PR

Why this exists

Today's autonomous coding agents (Cursor's background agents, GitHub Copilot Workspace, Devin, etc.) have three structural problems:

  1. They guess when they hit domain-specific code. A pricing rule, a regulatory constraint, a contract clause, a vendor-specific quirk — the model invents a plausible-sounding answer because it has to keep going. The wrong answer ships.
  2. They have no built-in cost controls. A runaway loop or a fan-out that picks the wrong model can burn $200 of API spend before anyone notices.
  3. They don't review their own work. The same model that wrote the change confirms it's good. Self-review by the writer is theater.

ASIL fixes all three at the architecture level, not as an afterthought:

  • A domain guard detects when the code being changed is in a region the maintainer has flagged as needing human input, and blocks the autonomous fix until a human triages it.
  • A cost controller tracks token spend per task, per system, per day, with checkpoints and a hard kill switch.
  • A three-persona self-review (code reviewer, security auditor, test engineer) and a separate-LLM adversarial gate challenge the work before any PR is opened.

How it works

ASIL is composed of three orthogonal systems that you can adopt independently or wire together:

System C — Cost Controller

Token budget governor. The substrate every other system runs on.

  • BudgetManager allocates per-task budgets sized by category and model tier
  • TokenTracker persists actual spend to disk
  • CostCheckpoint is the runtime API: every LLM call passes through it, and it can refuse the call when budget is exhausted
  • KillSwitch enforces hard daily caps
  • UsageReporter formats spend rollups for inbox / Slack delivery
  • decimal.js for every cost calculation — no float drift on currency
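The checkpoint idea above can be sketched in a few lines. This is a hypothetical, simplified illustration — `MiniCheckpoint`, `authorize`, and `record` are names invented here, not the shipped `CostCheckpoint` API — but it shows the core contract: every call asks first, and the checkpoint refuses once the per-task budget is spent. (The real system uses decimal.js for currency math; this sketch uses binary-exact fractions to sidestep float drift.)

```typescript
// Hypothetical sketch of the checkpoint contract — not the shipped API.
type SpendRecord = { taskId: string; usd: number };

class MiniCheckpoint {
  private spent = 0;
  constructor(private readonly taskId: string, private readonly budgetUsd: number) {}

  // Returns true if the call may proceed; false means the budget is exhausted.
  authorize(estimatedUsd: number): boolean {
    return this.spent + estimatedUsd <= this.budgetUsd;
  }

  // Record actual spend after the call completes.
  record(actualUsd: number): SpendRecord {
    this.spent += actualUsd;
    return { taskId: this.taskId, usd: this.spent };
  }
}

const cp = new MiniCheckpoint("task-001", 0.40);
console.log(cp.authorize(0.25)); // true — within the $0.40 per-task cap
cp.record(0.25);
console.log(cp.authorize(0.25)); // false — would exceed the cap, call is refused
```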

System B — Thought Multiplier

Strategic thinking layer. Use this when one model giving one answer isn't good enough.

  • A router decides which specialized thinkers a request needs (architecture, planning, test strategy, security, API design, spec writing)
  • Each thinker runs independently with its own system prompt and JSON output envelope
  • A synthesizer merges their recommendations, surfaces conflicts, and computes a confidence score
  • The papa agent (Opus-tier) resolves conflicts and produces a final handoff brief
  • Every step accounts its tokens through System C
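The synthesis step can be sketched as follows. This is an illustrative toy, not the shipped synthesizer: the envelope fields, the naive keyword conflict rule, and the mean-of-confidences score are all assumptions made for the example.

```typescript
// Hypothetical thinker output envelope and a toy synthesizer.
type ThinkerOutput = {
  role: string;        // e.g. "architecture", "security"
  recommendation: string;
  confidence: number;  // 0..1, self-reported by the thinker
};

type Synthesis = {
  recommendations: string[];
  conflicts: string[];  // role pairs that disagree (toy rule below)
  confidence: number;   // mean of thinker confidences (simplification)
};

function synthesize(outputs: ThinkerOutput[]): Synthesis {
  const recommendations = outputs.map(o => `[${o.role}] ${o.recommendation}`);
  // Toy conflict rule: flag pairs where one thinker says "add" and another "remove".
  const conflicts: string[] = [];
  for (const a of outputs)
    for (const b of outputs)
      if (a.role < b.role &&
          a.recommendation.includes("add") && b.recommendation.includes("remove"))
        conflicts.push(`${a.role} vs ${b.role}`);
  const confidence =
    outputs.reduce((s, o) => s + o.confidence, 0) / Math.max(outputs.length, 1);
  return { recommendations, conflicts, confidence };
}
```

A real synthesizer would resolve conflicts semantically (the papa agent's job); the point here is only the shape of the merge: independent envelopes in, one brief with surfaced conflicts and a score out.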

System A — Improvement Loop

The grind. Runs unattended.

scan → cycle-detect → triage domain questions → for each task:
  isolate (worktree)
    execute (LLM-generated patch, applied via diff)
    typecheck + tests
    self-review × 3 personas (reviewer, security, test-engineer)
    adversarial gate (different LLM provider)
    domain guard
    open PR
  • Scanner picks up five categories: test failures, type errors, TODO resolution, dead code, coverage gaps
  • Cycle detector prevents the loop from churning on the same file repeatedly
  • Worktree isolation — every task runs in a disposable git worktree (automatically falls back to git clone on filesystems that don't support worktrees)
  • Self-review runs three persona prompts scoped strictly to the diff
  • Adversarial gate sends the diff to a different model (different provider, different family) to challenge the work
  • Domain guard blocks the PR if the diff touches a // DOMAIN_QUESTION: zone with no resolved answer
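The domain-guard check is conceptually a small diff scan. The sketch below is illustrative — `findUnresolvedQuestions` and the answers-map shape are assumptions, not the shipped implementation — but it shows the mechanism: any touched `// DOMAIN_QUESTION:` marker without a resolved answer on file blocks the PR.

```typescript
// Hypothetical domain-guard scan: question text -> resolved answer.
type DomainAnswers = Record<string, string>;

function findUnresolvedQuestions(diffLines: string[], answers: DomainAnswers): string[] {
  const unresolved: string[] = [];
  for (const line of diffLines) {
    const m = line.match(/\/\/\s*DOMAIN_QUESTION:\s*(.+)/);
    if (m && !(m[1].trim() in answers)) unresolved.push(m[1].trim());
  }
  return unresolved;
}

const diff = [
  "+  // DOMAIN_QUESTION: How is the late-payment grace period calculated?",
  "+  const graceDays = 5;",
];
// One unresolved question found, so the guard would block this PR
// until a human triages it.
console.log(findUnresolvedQuestions(diff, {}));
```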

Quick start

# 1. Install
git clone https://github.com/telivity-otaip/asil asil
cd asil
pnpm install
pnpm build

# 2. Configure environment
export ANTHROPIC_API_KEY=...
export OPENAI_API_KEY=...
export REPO_ROOT=$(pwd)/path/to/your/target/repo

# 3. Use the thought multiplier (System B) standalone
pnpm --filter asil-runners run:b "add rate limiting to the connect API endpoint"

# 4. Run the autonomous loop (System A)
pnpm --filter asil-runners auto grind --dry-run
pnpm --filter asil-runners auto grind --max-tasks 3

See examples/quickstart.md for a five-minute integration walkthrough.

What it looks like

$ pnpm --filter asil-runners auto grind --max-tasks 5

🤖 ASIL — Autonomous Improvement Loop
   Repo: /workspace/your-app
   Budget: $20.00 / day, $0.40 / task

🔍 Scanning…
   ✓ test-failure       12 candidates
   ✓ type-error          3 candidates
   ✓ todo-resolution    27 candidates
   ✓ dead-code           7 candidates
   ✓ coverage-gap        3 candidates
   → 52 tasks queued (in 9.2s)

❓ 4 tasks block on unresolved DOMAIN_QUESTION markers.
   Would you like to triage them now? [y/n] y

   [1/4] packages/billing/src/rates.ts:124
         "How is the late-payment grace period calculated?"
         Proposals from Opus:
           a) 5 calendar days from due date
           b) 5 business days excluding bank holidays
           c) Cannot be answered without the contract terms
         Choose [a/b/c] or type a custom answer:

[…all four answered or deferred…]

🔧 Running 5 tasks in isolated worktrees…
   ✓ task-001 type-error packages/api/src/handlers/refund.ts
     ✓ typecheck (1.4s)  ✓ tests (12.3s)
     ✓ reviewer    pass
     ✓ security    pass
     ✓ test-eng    pass
     ✓ adversarial pass
     ✓ domain      no DOMAIN_QUESTION zones touched
     ✓ PR opened: #1247

   ✗ task-003 todo-resolution packages/billing/src/audit.ts
     ✗ adversarial gate raised a blocker:
       "The diff removes the audit-log entry but the
        commit message says 'preserve audit trail'."
     → reverted, returned to queue with note for next pass

   …

📊 Summary
   Completed: 3 PRs opened  ($0.84 spent)
   Failed:    1 (adversarial gate)
   Skipped:   1 (cycle detection — same file 3rd attempt)

Architecture

                       ┌────────────────────────────┐
                       │       System C (Cost)      │
                       │  TokenTracker              │
                       │  BudgetManager             │
                       │  CostCheckpoint            │
                       │  KillSwitch                │
                       │  UsageReporter             │
                       └────────────────────────────┘
                              ▲             ▲
                              │             │
            allocates budget  │             │  records spend
            checks            │             │
                              │             │
   ┌──────────────────────────┴───┐  ┌──────┴──────────────────────────┐
   │     System B (Thought)       │  │     System A (Improvement)      │
   │  router → thinkers →         │  │  scanner → executor →           │
   │  synthesizer → papa          │  │  self-review → adversarial →    │
   │  → handoff brief             │  │  domain guard → PR              │
   └──────────────────────────────┘  └─────────────────────────────────┘

Every LLM call in B and A flows through a CostCheckpoint. Each checkpoint is bound to a (taskId, systemId, agentId) triple so spend can be attributed in reports. When a budget is exhausted mid-task, the checkpoint refuses further calls and the task is rolled back cleanly.
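The attribution triple can be pictured as a keyed ledger. This is a sketch under assumed names (`recordSpend`, `totalForSystem` are invented for illustration); the point is that keying spend by (taskId, systemId, agentId) lets reports roll up along any axis.

```typescript
// Hypothetical spend ledger keyed by the (taskId, systemId, agentId) triple.
type SpendKey = `${string}:${string}:${string}`;

const ledger = new Map<SpendKey, number>();

function recordSpend(taskId: string, systemId: string, agentId: string, usd: number) {
  const k: SpendKey = `${taskId}:${systemId}:${agentId}`;
  ledger.set(k, (ledger.get(k) ?? 0) + usd);
}

// Roll up spend by system, e.g. for a daily report.
function totalForSystem(systemId: string): number {
  let total = 0;
  ledger.forEach((usd, k) => {
    if (k.split(":")[1] === systemId) total += usd;
  });
  return total;
}

recordSpend("task-001", "system-a", "executor", 0.25);
console.log(totalForSystem("system-a")); // → 0.25
```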

Configuration

| Knob | Where | Default | Purpose |
|------|-------|---------|---------|
| `ANTHROPIC_API_KEY` | env | required | Primary LLM |
| `OPENAI_API_KEY` | env | required | Adversarial gate (different provider) |
| `REPO_ROOT` | env | required | Absolute path to the target repo |
| `ASIL_USAGE_DATA_DIR` | env | `<REPO_ROOT>/.asil/usage-data` | Cost tracker state |
| `ASIL_QUEUE_PATH` | env | `<REPO_ROOT>/.asil/usage-data/queue.json` | Task queue |
| `ASIL_SKILLS_PATH` | env | `<REPO_ROOT>/.asil/skills` | Markdown skills the thinkers load |
| `ASIL_DOMAIN_ANSWERS_PATH` | env | `<REPO_ROOT>/.asil/domain-answers.json` | Resolved domain answers |
| `papaModel` / `thinkerModel` | ThoughtMultiplierConfig | opus / sonnet | Model tiers per role |
| `executionModel` / `reviewModel` | ImprovementLoopConfig | sonnet / sonnet | Model tiers in System A |
| `maxTasksPerRun` | ImprovementLoopConfig | CLI flag | Hard cap per auto grind |
| `maxAttempts` | ImprovementLoopConfig | 2 | Per-task retry cap |
| `taskCooldownMs` | ImprovementLoopConfig | 5000 | Throttle between tasks |
| `skipCategories` | ImprovementLoopConfig | `[]` | Disable scanner categories |
| `securityWeight` | ThoughtMultiplierConfig | 0.7 | Conflict-resolution weight |
| `projectName` | ThoughtMultiplierConfig | "Project" | Header in generated brief |
| `projectDoNotChange` | ThoughtMultiplierConfig | `[]` | Project-specific "do not modify" entries |
| `dailyBudgetUSD` | BudgetManagerConfig | 20.00 | Cost-controller daily cap |

Integration

ASIL is a TypeScript monorepo. Add the four packages to your workspace and wire them up via the LoopDeps and LLMCaller interfaces.

import { runLoop } from 'asil-improvement-loop';
import {
  createAnthropicCaller,
  createCodexCaller,
  createGitOps,
  createCostInfra,
  createDiffApplier,
  createFileFetcher,
} from 'asil-runners';

const llm = createAnthropicCaller(process.env.ANTHROPIC_API_KEY!);
const codex = createCodexCaller(process.env.OPENAI_API_KEY!);
const git = createGitOps(process.env.REPO_ROOT!);
const cost = createCostInfra(process.env.REPO_ROOT!);
const diff = createDiffApplier();
const files = createFileFetcher();

await runLoop({
  llm,
  codex,
  git,
  diff,
  fileFetcher: files,
  costInfra: cost,
  config: {
    executionModel: 'sonnet',
    reviewModel: 'sonnet',
    maxTasksPerRun: 5,
    maxAttempts: 2,
    taskCooldownMs: 5000,
    markdownSkillsPath: '.asil/skills',
    repoRoot: process.env.REPO_ROOT!,
    queuePath: '.asil/usage-data/queue.json',
    skipCategories: [],
    codexConfig: { apiKey: 'OPENAI_API_KEY', model: 'gpt-4o' },
  },
});

LLMCaller is a tiny mock-friendly interface (call(systemPrompt, userPrompt, model) → Promise<{content, inputTokens, outputTokens}>). The shipped wirings cover Anthropic and OpenAI; swapping in a local model, a self-hosted relay, or a different provider is a one-file change.
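A mock caller matching that interface is a few lines. The interface shape is taken from the description above; the mock itself (`createMockCaller`, the 4-chars-per-token estimate) is illustrative, the kind of stub you would plug in for tests before wiring a real provider.

```typescript
// Illustrative mock of the LLMCaller interface described above.
type LLMResult = { content: string; inputTokens: number; outputTokens: number };
type LLMCaller = {
  call(systemPrompt: string, userPrompt: string, model: string): Promise<LLMResult>;
};

function createMockCaller(cannedReply: string): LLMCaller {
  return {
    async call(systemPrompt, userPrompt, _model) {
      // Rough token estimate (~4 chars/token) — enough for budget-accounting tests.
      const estimate = (s: string) => Math.ceil(s.length / 4);
      return {
        content: cannedReply,
        inputTokens: estimate(systemPrompt) + estimate(userPrompt),
        outputTokens: estimate(cannedReply),
      };
    },
  };
}
```

Because every shipped component talks to the model only through this interface, swapping the mock for a local model or a self-hosted relay is the same one-file change the README describes.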

Key innovations

  • Domain guard. First system that explicitly detects domain-expertise boundaries via inline // DOMAIN_QUESTION: markers and refuses to fabricate answers. Unanswered questions block the affected files until a human triages them.
  • Three-persona self-review. Code reviewer + security auditor + test engineer, each with its own scoped prompt, all run on the diff only. No prior-context pollution.
  • Adversarial gate. A different LLM (different provider) reads the diff cold and tries to break it. Catches things the writer's family doesn't see.
  • Cost controller is first-class. Budget allocation, per-call checkpoints, kill switch, daily caps, persisted state. Not a logging afterthought.
  • Worktree isolation with clone fallback. Every task runs in a disposable git worktree; the main checkout is never touched. On filesystems where worktree add fails (Google Drive, some FUSE mounts), the runner falls back to git clone automatically.

Requirements

  • Node ≥ 20, pnpm ≥ 9
  • A target repo that uses pnpm (the executor runs pnpm install, pnpm build, pnpm test, pnpm typecheck inside the worktree)
  • Anthropic API key (primary)
  • OpenAI API key (adversarial gate)

Status

Production: this codebase has been running unattended on an 80+ agent travel AI platform, opening PRs autonomously every day, with a measurable rate of human-meaningful improvements landing in main without reviewer override. This repository is the open-source extract: domain-specific code and proprietary prompts removed, public-safe defaults set, configuration knobs documented.

License

MIT — see LICENSE.

Credit

Created by Dušan Milicevic (Telivity) — extracted from a production autonomous system managing an 80+ agent travel AI platform.
