Agent OS

An autonomous-first software organization for supervised rollout: agents handle routine delivery loops, while humans stay in governance, review, and escalation paths.

You give it a backlog. It ships product.

Public proof — everything is auditable: Reliability dashboard · Case study · Live discussion

See it work - real task, end-to-end execution

Real execution: Issue #115 → agent dispatched → code written → tests pass → PR #122 merged → issue closed. The happy path can complete without manual coding, but new repos should still start in supervised mode.

Agent Performance - rolling 14 days

Success rate	Mean completion	Escalation rate	Tasks executed
69% (61/88)	0.1h	11% (10/88)	88

Current pool: Claude · Codex (Gemini and DeepSeek were retired from rotation after quality review). Metrics above are from the public reliability dashboard updated on 2026-04-21. Full reliability dashboard → · Multi-agent case study →
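The percentages in the table follow directly from the raw counts. A minimal sketch of that arithmetic, using only the numbers shown above; the helper name `rate_percent` is illustrative and not part of Agent OS:

```python
def rate_percent(count: int, total: int) -> int:
    """Return count/total as a whole-number percentage, rounded to nearest."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total)


tasks_executed = 88
print(rate_percent(61, tasks_executed))  # success rate -> 69
print(rate_percent(10, tasks_executed))  # escalation rate -> 11
```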

Reality Check

Agent OS is autonomous in the happy path, not "no-human-ever." Escalations are a first-class path after bounded retries.
Results shown in this repo are for this repo's workload. A fresh external repo usually starts lower and needs tuning.
Use automation_mode: dispatcher_only first for new repos, then graduate to full automation after reliability is stable.
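The "bounded retries, then escalate" path can be sketched as follows. This is a minimal illustration, not the queue engine's actual code: the function name and the default of three attempts are assumptions.

```python
from typing import Callable


def run_with_escalation(task: Callable[[], bool], max_attempts: int = 3) -> str:
    """Try a task a bounded number of times, then hand it to a human.

    Returns "done" on the first successful attempt, otherwise "escalated".
    The default of 3 attempts is an assumed value for illustration.
    """
    for _attempt in range(max_attempts):
        if task():
            return "done"
    return "escalated"


# A task that never succeeds is escalated instead of looping forever.
print(run_with_escalation(lambda: False))
```

The point of the bound is that failure has a defined exit: a human sees the escalation rather than a silent retry loop.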

Recommended Rollout

If you are evaluating Agent OS on a real product, do not jump straight to full autonomy.

1. Run the sandbox demo and verify the local toolchain works end to end.
2. Start your repo in dispatcher_only mode with manual PR review.
3. Give it 5 to 10 bounded issues with clear success criteria and good tests.
4. Measure operator time, escalation rate, and merged-PR quality before expanding scope.
5. Turn on the planner, groomer, and full cron loop only after the supervised pilot is stable.

The best adoption story is not "trust us blindly." It is "run a cheap, auditable pilot and promote the system only after it earns trust."

Good Initial Fits

backend endpoints, CLI commands, docs, tests, CI fixes, and contained refactors behind existing tests
repositories with one clear default branch, deterministic test commands, and explicit ownership
teams that want issue-to-PR automation first, not autonomous product strategy on day one

Poor First Fits

sweeping UX redesigns, weakly specified frontend work, or broad "make this better" tickets
repos with flaky CI, missing tests, hidden credentials, or ambiguous local setup
products expecting unattended operation before the first supervised pilot has passed

Goal

Make Agent OS the most credible autonomous software organization for technical founders and solo builders: a system that can reliably turn backlog input into useful shipped work, improve itself from operational evidence, and earn trust through visible results. Prioritize work that increases adoption, reliability, evidence quality, and operator confidence over work that only creates attention.

This README was written by an agent. The CI pipeline was built by an agent. The backlog groomer that generates improvement tickets was written by an agent dispatched from a ticket that was generated by the log analyzer. It's turtles all the way down.

Why Agent OS?

Most AI tools make individual developers faster. Agent OS asks a different question: what if developers can focus on high-leverage decisions while agents handle routine execution?

Not because humans aren't valuable — but because most engineering work is structured, bounded, and repetitive enough that a well-orchestrated team of AI agents can handle it autonomously. The hard part was never the coding. It was the coordination: task state, routing, context preservation, failure recovery, quality gates, and institutional memory.

Agent OS solves coordination so agents can do more of the routine delivery work reliably.

It's not a copilot. It's not a chatbot. It's an execution system you supervise.

The Loop

            GitHub Issue (Backlog)
                    │
            Status → Ready
                    │
            ┌───────▼────────┐
            │   Dispatcher   │  LLM-formats task, routes by repo + type
            └───────┬────────┘
                    │
            ┌───────▼────────┐
            │  Queue Engine  │  Worktree → Agent → Result → Retry/Escalate
            └───────┬────────┘
                    │
              Push branch, open PR (body: `Closes #N`)
                    │
            ┌───────▼────────┐
            │  PR Monitor    │  CI green → merge · Conflict → auto-rebase
            │  + e2e health  │  Wedged >4h on same blocker → terminal close,
            └───────┬────────┘  dispatcher re-spawns from clean main
                    │
              Issue auto-closed on merge, board → Done
                    │
     ┌──────────────┼──────────────────┬──────────────────┐
     ▼              ▼                  ▼                  ▼
 Log Analyzer  Backlog Groomer  Strategic Planner  Incident Scanner
 (weekly)      (hourly cadence)  (per sprint)      (every 6h)
 metrics       backlog hygiene   objectives        runtime signals →
 + failure     + new issues      + priorities      self-fix issues
     │              │                  │                  │
     └──────────────┴──────────────────┴──────────────────┘
                              │
                         back into the backlog

That last arrow is the point. Four separate improvement loops all file tickets about the system's own failures — from slow chronic issues (log analyzer, weekly) to acute runtime incidents (incident scanner, every 6h). The tickets enter the backlog. The agents fix them. The fixes get merged. Next cycle, the system is better. Indefinitely.

Recursive Self-Improvement

This is the part that makes Agent OS different from a task runner.

Every 6 hours: the incident scanner reads the last 24h of runtime signals (incidents, escalation notes, anomaly audit events), classifies recurring patterns via deterministic rules with an LLM fallback, and files self-fix issues labeled autonomous-fix. This closes the acute-incident loop: by the time an operator would otherwise be paged twice about the same bug, the fix issue is already in the backlog.
Every 5 minutes: pr_monitor manages PR health end to end. It auto-merges on green CI, auto-rebases on conflict, and, if a PR is wedged for more than 4h on the same blocker signature, terminal-closes it and deletes the branch so the dispatcher re-spawns from a clean main instead of looping forever on the same failure.
Every Monday: the log analyzer reads a week of execution metrics, synthesizes chronic failure patterns, and files fix tickets with evidence and reasoning.
Every hour: the backlog groomer runs per-repo cadence checks; each repo generates new issues on its own schedule (default 3.5 days, tunable). It also triages blocked issues and dedupes semantic near-duplicates.
Every sprint: the strategic planner evaluates business-outcome metrics, adjusts priorities, and selects the next sprint from the backlog.
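The wedged-PR rule from the pr_monitor item above (same blocker signature for more than 4 hours) could look roughly like this. It is a sketch under assumptions: the function name, the idea of comparing two signature strings, and the example signatures are all hypothetical; only the 4-hour threshold comes from the text.

```python
from datetime import datetime, timedelta

WEDGED_AFTER = timedelta(hours=4)  # threshold stated in the text above


def is_wedged(blocker_signature: str, previous_signature: str,
              blocked_since: datetime, now: datetime) -> bool:
    """True when the same blocker has persisted longer than the threshold."""
    same_blocker = blocker_signature == previous_signature
    return same_blocker and (now - blocked_since) > WEDGED_AFTER


start = datetime(2026, 4, 21, 8, 0)
later = start + timedelta(hours=5)
print(is_wedged("ci:test_auth", "ci:test_auth", start, later))   # True
print(is_wedged("ci:test_auth", "conflict:main", start, later))  # False
```

Keying on the signature rather than on elapsed time alone means a PR that keeps hitting new, different blockers is still treated as making progress.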

These generated issues are indistinguishable from human-written ones. They enter the same queue, get dispatched to the same agents, go through the same CI → merge pipeline. The system literally engineers itself.

Merged agent PRs now lead with Closes #N in the PR body so GitHub auto-closes the linked issue, eliminating a class of phantom escalations where the product fix shipped but the orchestration state lagged.
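A minimal sketch of the `Closes #N` convention, assuming hypothetical helper names (`build_pr_body`, `linked_issue`) that are not taken from the Agent OS codebase. It relies only on GitHub's documented closing keywords:

```python
import re
from typing import Optional

# Matches a body whose very first line starts with the closing keyword.
CLOSES_LINE = re.compile(r"^Closes #(\d+)\b", re.IGNORECASE)


def build_pr_body(issue_number: int, summary: str) -> str:
    """Lead with the closing keyword, then the human-readable summary."""
    return f"Closes #{issue_number}\n\n{summary}"


def linked_issue(pr_body: str) -> Optional[int]:
    """Return the issue number if the body leads with `Closes #N`."""
    match = CLOSES_LINE.match(pr_body)
    return int(match.group(1)) if match else None


body = build_pr_body(115, "Example summary text.")
print(linked_issue(body))  # 115
```

A check like `linked_issue` can run before a PR is opened, so a missing link is caught while the agent still has context rather than after merge.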

Get Started in 5 Minutes

Option A: Sandbox demo (2 minutes)

Zero config — creates a test issue, dispatches it to Claude, and shows the full loop:

git clone https://github.com/kai-linux/agent-os && cd agent-os
gh auth login          # only prerequisite besides claude CLI
./demo.sh              # or: make demo

Requirements: gh (authenticated), python3, claude CLI. Works on macOS and Linux.

Option B: Supervised pilot on your repo (5 minutes)

Run Agent OS against your own repo in the recommended adoption path: manual dispatch first, manual review first, then controlled expansion.

Step 1 — Clone and install

git clone https://github.com/kai-linux/agent-os && cd agent-os
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

Step 2 — Authenticate GitHub

gh auth login
gh auth refresh -s project            # needed for GitHub Projects board access

Step 3 — Configure

cp example.config.yaml config.yaml

Edit config.yaml — the minimum you need to set:

root_dir: "~/agent-os"
worktrees_dir: "/srv/worktrees"       # any writable path for agent worktrees
allowed_repos:
  - /path/to/your/repo                # local clone of the repo agents will work on
default_allow_push: true

Step 4 — Create your first task

Open an issue on your repo with a clear title and body containing:

## Goal
<what you want done>

## Success Criteria
- <measurable outcome>

## Constraints
- <any boundaries>

Then move the issue to Ready on your GitHub Projects board (or add a Status: Ready label).

For the first pilot, choose tasks that should take one agent 10 to 40 minutes and touch a narrow surface area.
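Before moving an issue to Ready, it can help to check that the body actually contains the three sections from the template. The following is a sketch only; `missing_sections` is a hypothetical helper, and Agent OS may validate issues differently or not at all.

```python
import re

REQUIRED_SECTIONS = ("Goal", "Success Criteria", "Constraints")


def missing_sections(issue_body: str) -> list[str]:
    """Return the required `## ` headings that are absent from an issue body."""
    found = {
        m.group(1).strip()
        for m in re.finditer(r"^## (.+)$", issue_body, re.MULTILINE)
    }
    return [name for name in REQUIRED_SECTIONS if name not in found]


draft = "## Goal\nAdd a /health endpoint\n\n## Success Criteria\n- returns 200\n"
print(missing_sections(draft))  # ['Constraints']
```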

Step 5 — Dispatch and watch

# Run the dispatcher once to pick up your Ready issue
python3 -m orchestrator.github_dispatcher

# Run the queue to execute the task
python3 -m orchestrator.queue

# Check the result
cat runtime/mailbox/*/result/.agent_result.md

The agent clones a worktree, writes code, runs tests, pushes a branch, and opens a PR.

Step 6 — View results

# See the PR the agent created
gh pr list --repo your-user/your-repo

# Review the first few PRs manually
gh pr view <PR_NUMBER> --repo your-user/your-repo

# Auto-merge only after the supervised pilot is stable
python3 -m orchestrator.pr_monitor

Optional: set up cron for full autonomy

Prefer bin/agentos init (Option C) — it installs the full cron block automatically. If you need to install manually, the current layout is:

# Auto-pull latest orchestrator code
* * * * * /path/to/agent-os/bin/run_autopull.sh >> runtime/logs/autopull.log 2>&1

# Dispatcher + queue + pr_monitor + telegram control
* * * * *   /path/to/agent-os/bin/run_dispatcher.sh       >> runtime/logs/dispatcher.log 2>&1
* * * * *   /path/to/agent-os/bin/run_queue.sh            >> runtime/logs/queue.log 2>&1
*/5 * * * * /path/to/agent-os/bin/run_pr_monitor.sh       >> runtime/logs/pr_monitor.log 2>&1
* * * * *   /path/to/agent-os/bin/run_telegram_control.sh >> runtime/logs/telegram_control.log 2>&1

# Self-improvement loops (per-repo cadence inside each)
0 * * * *  /path/to/agent-os/bin/run_backlog_groomer.sh    >> runtime/logs/backlog_groomer.log 2>&1
0 * * * *  /path/to/agent-os/bin/run_strategic_planner.sh  >> runtime/logs/strategic_planner.log 2>&1
15 */6 * * * /path/to/agent-os/bin/run_incident_scanner.sh >> runtime/logs/incident_scanner.log 2>&1

# Weekly scoring + log analysis (Monday 06:30 / 07:00)
30 6 * * 1 /path/to/agent-os/bin/run_agent_scorer.sh >> runtime/logs/agent_scorer.log 2>&1
0  7 * * 1 /path/to/agent-os/bin/run_log_analyzer.sh >> runtime/logs/log_analyzer.log 2>&1

# Daily digest + product inspection (08:00 / 06:00)
0 8 * * * /path/to/agent-os/bin/run_daily_digest.sh       >> runtime/logs/daily_digest.log 2>&1
0 6 * * * /path/to/agent-os/bin/run_product_inspector.sh  >> runtime/logs/product_inspector.log 2>&1

Every entrypoint sources bin/common_env.sh, which honors the bin/agentos off kill-switch, so cron entries stay installed but exit early when the orchestrator is paused.

Option C: Bootstrap From Scratch

If you do not already have a repo, project board, Telegram bot, or cron installed for Agent OS, run the guided bootstrap:

bin/agentos init

Eight interactive steps:

1. What are you building? Short intake: idea, kind (web/api/game/…), stack preference, success criteria for the first user.
2. GitHub repo. Create or adopt a repo. .gitignore is seeded with .agent_result.md (the agent→orchestrator handoff contract, which must never be committed) and other sensible defaults.
3. Charter and supporting docs. The architect agent proposes a stack, rationale, and the first 3–5 seed issues, then writes four markdown docs into the new repo: NORTH_STAR.md (first vertical slice), VISION.md (2–5 year end-state), STRATEGY.md (phased path from today to vision), PLANNING_PRINCIPLES.md (non-negotiable agent rules).
4. Telegram control plane. Pair a bot to this operator. Existing Telegram credentials in config.yaml are preserved, not overwritten.
5. Tuning cadence and thresholds. Interactively set sprint_cadence_days, groomer_cadence_days, max_parallel_workers, runtime cap, plan size, retries, and dependency-watcher cadence. Defaults come from the existing config.yaml when present; press enter to keep what you already tuned.
6. Write config.yaml. If config.yaml already exists, you are prompted to confirm merging the new project into it. Scalars such as the existing Telegram token, and existing project entries, are preserved via setdefault, so re-running init never clobbers earlier customizations. A backup of the pre-merge file is always written to config.yaml.bak.TIMESTAMP.
7. Cron setup. Installs or updates the orchestrator cron block; the block is left unchanged on re-run.
8. Done. The project URL, config path, and first-PR ETA are printed.
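The merge behavior described in the config.yaml step (existing values win, new project is added, a timestamped backup is written first) can be illustrated with plain dictionaries. This is a sketch under stated assumptions: the key names `projects` and `telegram_token`, the function names, and the timestamp format are all hypothetical and are not taken from the real config schema.

```python
import copy
from datetime import datetime


def merge_project(existing: dict, new_defaults: dict, project: dict) -> dict:
    """Merge a new project into a config without overwriting earlier values."""
    merged = copy.deepcopy(existing)
    for key, value in new_defaults.items():
        merged.setdefault(key, value)  # existing scalars are kept as-is
    projects = merged.setdefault("projects", [])  # hypothetical key name
    if project not in projects:
        projects.append(project)
    return merged


def backup_name(now: datetime) -> str:
    """Build a backup filename; the timestamp format here is an assumption."""
    return f"config.yaml.bak.{now:%Y%m%d%H%M%S}"


current = {"telegram_token": "already-set", "projects": [{"repo": "first"}]}
updated = merge_project(current, {"telegram_token": "ignored"}, {"repo": "second"})
print(updated["telegram_token"], len(updated["projects"]))
```

Because `setdefault` only writes a key that is missing, running the merge twice with the same inputs produces the same result, which is what makes re-running init safe.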

You can safely re-run bin/agentos init whenever you want to add another project: existing settings survive, the new repo joins the pool, and the cadence prompts default to your current config so hitting enter preserves your tuning.

Pause & Resume

One command stops or restarts the whole orchestrator — no crontab editing, no process hunting. Every cron entrypoint sources bin/common_env.sh, which bails out early when a kill-switch file exists.

bin/agentos off       # pause all dispatch, queue, PR-monitor, groomer, etc.
bin/agentos on        # resume
bin/agentos status    # show current state

When OFF, cron jobs still fire on schedule but exit immediately (exit 0, cron-silent). Interactive runs that need to bypass the switch can set AGENT_OS_IGNORE_DISABLED=1.

From Telegram. The existing bot doubles as a control tower — the same chat that receives escalations and digests accepts commands:

Command	Effect
`/on`	remove the global kill-switch — cron resumes on next tick
`/off`	engage the global kill-switch — pause the orchestrator
`/status`	report current ON/OFF state
`/repos`	list configured repos with mode + cadence + per-repo state
`/repo on <key>` / `/repo off <key>`	pause/resume a single repo without touching config
`/repo mode <key> full|dispatcher`	flip the parent project's `automation_mode`
`/repo cadence <key> <days>`	set sprint cadence (days, fractional allowed — e.g. `0.5`); drops any explicit groomer override so groomer auto-derives at half the sprint
`/jobs`	list cron entrypoints and their per-job state
`/job on <name>` / `/job off <name>`	pause/resume a single cron job (e.g. `pr_monitor`)
`/help`	list commands

A dedicated poller (bin/run_telegram_control.sh) runs every minute with AGENT_OS_IGNORE_DISABLED=1 so /on still reaches the orchestrator while it is paused. The telegram_control job itself is protected — /job off telegram_control is rejected so you can never lock yourself out.

State lives in flag files under runtime/state/:

disabled — global kill-switch
repo_disabled/<key> — per-repo skip
job_disabled/<name> — per-job skip
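The flag-file layout above lends itself to a very small gate. The real check lives in the shell script bin/common_env.sh; the Python below is only a sketch of the same idea. The function name is hypothetical, and applying AGENT_OS_IGNORE_DISABLED to the global flag only (not to per-repo or per-job flags) is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def should_run(state_dir: Path, job: str, repo_key: Optional[str] = None,
               env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether a cron entrypoint should proceed, based on flag files."""
    env = os.environ if env is None else env
    # Assumption: the bypass variable overrides only the global kill-switch.
    ignore_global = env.get("AGENT_OS_IGNORE_DISABLED") == "1"
    if (state_dir / "disabled").exists() and not ignore_global:
        return False
    if (state_dir / "job_disabled" / job).exists():
        return False
    if repo_key is not None and (state_dir / "repo_disabled" / repo_key).exists():
        return False
    return True


print(should_run(Path("runtime/state"), "pr_monitor"))
```

Because state is just the presence or absence of files, `ls runtime/state` is enough to see what is paused.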

/repo mode and /repo cadence are the only commands that touch config.yaml; both do surgical line edits that preserve comments.

Secret guard. A hooks/pre-commit hook blocks commits to config.yaml or objectives/*.yaml (except objectives/example.yaml) and rejects any staged diff containing a Telegram-bot-token shape. Enable once with git config core.hooksPath hooks.
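The two checks the secret guard performs can be sketched in Python as below. The actual hook is whatever ships in hooks/pre-commit; the function names here are hypothetical, and the token pattern (a numeric bot id, a colon, then 35 URL-safe characters) is an assumed approximation of the Telegram bot token shape.

```python
import re
from fnmatch import fnmatch

# Assumed token shape: 8-10 digits, a colon, then 35 URL-safe characters.
TOKEN_SHAPE = re.compile(r"\d{8,10}:[A-Za-z0-9_-]{35}")


def is_protected_path(path: str) -> bool:
    """True for config.yaml and objectives/*.yaml, except the example file."""
    if path == "objectives/example.yaml":
        return False
    return path == "config.yaml" or fnmatch(path, "objectives/*.yaml")


def contains_token_shape(diff_text: str) -> bool:
    """True if any substring of the diff looks like a bot token."""
    return TOKEN_SHAPE.search(diff_text) is not None


def commit_allowed(staged_paths: list[str], staged_diff: str) -> bool:
    """Reject protected paths and any diff containing a token-shaped string."""
    if any(is_protected_path(p) for p in staged_paths):
        return False
    return not contains_token_shape(staged_diff)


print(commit_allowed(["README.md"], "+ docs tweak"))
print(commit_allowed(["config.yaml"], "+ root_dir: x"))
```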

How It Works

Component	Role	Cadence
`github_dispatcher.py`	Triages backlog, assigns + formats tasks	Every minute
`queue.py`	Routes to best agent, retries, escalates	Per task
`pr_monitor.py`	CI gate, auto-merge, auto-rebase, e2e health terminal-close of wedged PRs	Every 5 min
`incident_scanner.py`	Turns runtime signals (incidents + escalations + audit anomalies) into self-fix issues via deterministic rules + LLM fallback	Every 6 hours
`backlog_groomer.py`	Backlog hygiene + per-repo task generation + blocker triage + dedup notifications	Every hour (per-repo cadence gate)
`strategic_planner.py`	Sprint planning from evidence + objectives	Every hour (per-sprint gate)
`work_verifier.py`	Pre-merge deterministic + LLM judge on every PR	Per PR
`log_analyzer.py`	Chronic failure analysis → fix tickets	Weekly (Mon 07:00)
`agent_scorer.py`	Execution + business-outcome scoring	Weekly (Mon 06:30)
`product_inspector.py`	Live product-health + adoption probes	Daily (06:00)
`daily_digest.py`	Operator digest to Telegram	Daily (08:00)
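Several components in the table run hourly from cron but only do real work when a per-repo or per-sprint gate opens. A minimal sketch of such a gate, assuming hypothetical function names; the 3.5-day default and the "groomer derives at half the sprint cadence" rule come from this README, everything else is illustrative:

```python
from datetime import datetime, timedelta
from typing import Optional


def cadence_due(last_run: Optional[datetime], now: datetime,
                cadence_days: float = 3.5) -> bool:
    """True when a repo has never run or its cadence interval has elapsed."""
    if last_run is None:
        return True
    return now - last_run >= timedelta(days=cadence_days)


def groomer_cadence_days(sprint_cadence_days: float,
                         override_days: Optional[float] = None) -> float:
    """Use an explicit override if set, otherwise half the sprint cadence."""
    if override_days is not None:
        return override_days
    return sprint_cadence_days / 2


now = datetime(2026, 4, 21, 12, 0)
print(cadence_due(now - timedelta(days=3), now))  # False
print(groomer_cadence_days(7.0))                  # 3.5
```

Running the cron entry hourly and gating inside the job keeps the crontab static while letting each repo tune its own schedule.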

Two agents are in the active pool, Claude and Codex, routed by task type with automatic fallback chains. Gemini and DeepSeek were retired from rotation after quality review; their adapter contracts remain in orchestrator/, so either can be re-enabled by updating agent_fallbacks in config.yaml.
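Routing by task type with a fallback chain can be pictured as an ordered lookup. The sketch below assumes that agent_fallbacks maps a task type to an ordered list of agent names with a `default` entry; that structure, the task-type names, and the function name are assumptions for illustration, not the documented config schema.

```python
from typing import AbstractSet, Mapping, Optional, Sequence


def pick_agent(task_type: str,
               agent_fallbacks: Mapping[str, Sequence[str]],
               unavailable: AbstractSet[str] = frozenset()) -> Optional[str]:
    """Return the first available agent in the chain for this task type."""
    chain = agent_fallbacks.get(task_type) or agent_fallbacks.get("default", [])
    for agent in chain:
        if agent not in unavailable:
            return agent
    return None  # nothing left to try; the caller would escalate


# Hypothetical chains: task-type keys and ordering are made up for the example.
fallbacks = {"implementation": ["codex", "claude"], "default": ["claude", "codex"]}
print(pick_agent("implementation", fallbacks, unavailable={"codex"}))  # claude
```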

The backlog is GitHub Issues. The sprint board is GitHub Projects. The standup is Telegram. The office is a $5/month VPS.

Key Design Choices

GitHub is the entire control plane — no second system
Markdown files, not message brokers — you can ls the queue
Isolated worktrees — agents never collide
One contract, many agents — .agent_result.md is the only interface
Memory that compounds — CODEBASE.md grows with every completed task

Capability Ladder

Level	What	Status
1	Reliable execution engine	Done
2	Strategic planning + retrospectives	Current
3	Evidence-driven planning (analytics, research, product inspection)	In progress
4	Closed-loop optimization (hypothesis → experiment → measurement)	Next
5+	Self-directed growth across repos and products	Future

Historical Case Study Snapshot

These are historical campaign snapshots from earlier runs, included for context. Use the reliability dashboard for current health.

Metric	Value
Issues closed	103
PRs merged	79
Commits	338 in 29 days (~12/day)
Overall success rate (campaign)	62% (90/146 tasks)

Reliability dashboard → · Case study → · GitHub Discussion

Documentation

Topic	Link
Deployment guide for solo builders	docs/deployment-guide.md
External repo pilot playbook	docs/external-repo-pilot.md
Fork guide: customize agent routing, dispatch, prompts	FORK_GUIDE.md
Local development setup (contributors + forkers)	docs/local-development.md

Get Involved

Try it — clone the repo and run ./demo.sh to see an agent ship code in minutes.

Contribute — check open issues or file one. PRs welcome.

Questions? — open a discussion or reach out via the repo.

If Agent OS is interesting to you, give it a star. It helps others find the project.
