Refresh README: e2e PR health, incident scanner, new init flow, retired agents by kai-linux · Pull Request #345 · kai-linux/agent-os

kai-linux · 2026-04-24T07:25:38Z

Summary

Sync the README with system state after the 2026-04-23 work (PRs #333–#344).

Changes

Agent pool — active pool is now Claude + Codex (Gemini and DeepSeek retired after quality review, per earlier operator decision)
"The Loop" diagram — shows the four improvement loops (log analyzer, groomer, planner, incident scanner) feeding the backlog, and names pr_monitor's e2e-health terminal-close step
"Recursive Self-Improvement" — lists the two new acute loops: incident scanner every 6h, pr_monitor e2e health every 5 min. Notes that merged agent PRs now auto-close their linked issue via Closes #N
"Option C: Bootstrap From Scratch" now walks through all 8 interactive init steps, including the new tuning-cadence prompt from #342 (Init: preserve existing config, prompt for cadence, write supporting docs) and the additional charter documents (NORTH_STAR / VISION / STRATEGY / PLANNING_PRINCIPLES). Also calls out that re-running init preserves an existing config.yaml.
"Optional: set up cron" — expanded to mirror the actual installed crontab layout, including run_incident_scanner.sh
"How It Works" table — adds incident_scanner.py, work_verifier.py, product_inspector.py, daily_digest.py; updates pr_monitor.py to mention e2e health
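Merged agent PRs auto-close their linked issue because GitHub treats a closing keyword such as `Closes #N` in the PR description as a link that closes the issue once the PR is merged into the default branch. The sketch below shows how a monitor might check or enforce that line. `closing_issue_refs` and `ensure_closes_line` are hypothetical helpers, not agent-os code, and the regex only approximates GitHub's matching for same-repo `#N` references.

```python
import re

# Closing keywords GitHub recognizes in a PR description (case-insensitive).
CLOSING_KEYWORDS = (
    "close", "closes", "closed",
    "fix", "fixes", "fixed",
    "resolve", "resolves", "resolved",
)

# Approximation: a keyword, whitespace, then a same-repo "#123" reference.
_CLOSING_RE = re.compile(
    r"\b(?:" + "|".join(CLOSING_KEYWORDS) + r")\s+#(\d+)\b",
    re.IGNORECASE,
)


def closing_issue_refs(pr_body: str) -> list[int]:
    """Return the issue numbers a PR body would auto-close, in first-seen order."""
    seen: list[int] = []
    for match in _CLOSING_RE.finditer(pr_body):
        number = int(match.group(1))
        if number not in seen:
            seen.append(number)
    return seen


def ensure_closes_line(pr_body: str, issue_number: int) -> str:
    """Append 'Closes #N' only when the body does not already close that issue."""
    if issue_number in closing_issue_refs(pr_body):
        return pr_body
    return pr_body.rstrip() + f"\n\nCloses #{issue_number}\n"
```

For example, `closing_issue_refs("Fixes #12 and closes #7; see #9")` returns `[12, 7]`: a bare `#9` mention links the issue but does not close it.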

…sues

Closes the manual-diagnosis → manual-fix loop that took hours of operator attention on 2026-04-23, when agents echoed the `.agent_result.md` prompt template prose as the `blocker_code` value and the escalation kept firing.

New `orchestrator/incident_scanner.py` runs on a fast cadence (suggested every 6h) and:

1. Ingests three signal streams from the last N hours (default 24):
   - `runtime/incidents/incidents.jsonl` (sev-classified alerts)
   - `runtime/mailbox/escalated/*.md` (blocked-task escalation notes)
   - `runtime/audit/audit.jsonl`, filtered to anomaly event types (`pr_e2e_terminal_close`, `work_verifier_override`, `stuck_pr_merge`)
2. Aggregates signals into stable signatures with example contexts.
3. Runs deterministic rule matchers first, so known-bad patterns don't need an LLM. Two rules land with this PR:
   - `template_echo`: detects `.agent_result.md` template placeholder text ("One line. Required when STATUS...", "- bullet", ...) flowing into error_patterns.
   - `repeated_terminal_close`: detects the same blocker signature hitting pr_monitor's e2e terminal close ≥3 times, meaning re-spawn isn't solving the class and a code fix is needed.
4. Falls back to a single LLM call (claude-sonnet-4-6) for remaining recurring signatures the rules didn't classify. The LLM's output is constrained to structured issue proposals.
5. Dedupes against its own recent-action log AND against existing open agent-os issues with the same title, then creates GitHub issues tagged `ready` / `prio:high` / `bot-generated` / `autonomous-fix`. The dispatcher/groomer pipeline picks them up like any other work, closing the loop.

The scanner never edits code, merges PRs, or changes branches. It only files issues; everything downstream is the existing pipeline's job.

Anchor test (`test_template_echo_incident_would_have_been_auto_detected`) replays the actual escalation that required manual intervention today and asserts the scanner would have filed the fix issue autonomously.
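Steps 2, 3 and 5 above (aggregate, rules first, dedupe) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `orchestrator/incident_scanner.py`: the `Signal` shape, the placeholder strings, the issue-title format, and `MIN_RECURRENCE` (what counts as "recurring" for the LLM fallback) are all hypothetical. Only the ≥3 terminal-close threshold and the two rule names come from the description.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder fragments from the .agent_result.md template prose.
TEMPLATE_PLACEHOLDERS = ("One line. Required when STATUS", "- bullet")
TERMINAL_CLOSE_THRESHOLD = 3  # the ">=3 times" threshold from the description
MIN_RECURRENCE = 2  # assumption: minimum group size sent to the LLM fallback


@dataclass(frozen=True)
class Signal:
    source: str      # hypothetical: "incidents", "escalated", or "audit"
    event_type: str  # e.g. "pr_e2e_terminal_close" for audit anomalies
    signature: str   # normalized blocker/error text used as the grouping key


@dataclass
class SignatureGroup:
    signature: str
    signals: list = field(default_factory=list)


def aggregate(signals):
    """Group signals by signature, keeping first-seen order."""
    groups = {}
    for s in signals:
        groups.setdefault(s.signature, SignatureGroup(s.signature)).signals.append(s)
    return list(groups.values())


def template_echo(group):
    """Rule: template placeholder prose leaked into the blocker signature."""
    if any(p in group.signature for p in TEMPLATE_PLACEHOLDERS):
        return f"Agents echo .agent_result.md template text: {group.signature[:60]}"
    return None


def repeated_terminal_close(group):
    """Rule: same signature hit pr_monitor's e2e terminal close too often."""
    closes = sum(1 for s in group.signals if s.event_type == "pr_e2e_terminal_close")
    if closes >= TERMINAL_CLOSE_THRESHOLD:
        # Count kept out of the title so dedupe-by-title stays stable over time.
        return f"Repeated e2e terminal close: {group.signature[:60]}"
    return None


RULES = (template_echo, repeated_terminal_close)


def classify(group):
    """Return the first rule-produced issue title, or None if no rule fires."""
    for rule in RULES:
        title = rule(group)
        if title is not None:
            return title
    return None


def propose_issues(signals, open_issue_titles, recent_action_titles):
    """Return (new rule-based issue titles, recurring groups left for the LLM)."""
    already_filed = set(open_issue_titles) | set(recent_action_titles)
    proposals, for_llm = [], []
    for group in aggregate(signals):
        title = classify(group)
        if title is None:
            if len(group.signals) >= MIN_RECURRENCE:
                for_llm.append(group)
        elif title not in already_filed:
            proposals.append(title)
            already_filed.add(title)
    return proposals, for_llm
```

Running the deterministic rules before any model call keeps known failure classes cheap and reproducible, which is what lets an anchor test replay a real escalation and assert on the result.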
Suggested crontab entry (every 6h at :15; crontab has no backslash line continuation, so keep it on one line): `15 */6 * * * /path/to/agent-os/bin/run_incident_scanner.sh >> /path/to/agent-os/runtime/logs/incident_scanner.log 2>&1`
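To make the schedule concrete, this small sketch expands the minute and hour fields of the suggested entry into daily fire times. It handles only the `*`, `*/step`, and plain-number forms used here, not full cron syntax (ranges, lists, names). It also illustrates why the scanner dedupes against its recent-action log: with the default 24h lookback and a 6h cadence, each signal falls inside roughly four consecutive scan windows. The helper names are hypothetical.

```python
def expand_field(spec: str, lo: int, hi: int) -> list[int]:
    """Expand one cron field; supports only '*', '*/step', and a plain number."""
    if spec == "*":
        return list(range(lo, hi + 1))
    if spec.startswith("*/"):
        return list(range(lo, hi + 1, int(spec[2:])))
    value = int(spec)
    if not lo <= value <= hi:
        raise ValueError(f"{value} outside {lo}-{hi}")
    return [value]


def daily_run_times(entry: str) -> list[str]:
    """List HH:MM fire times in one day from a crontab line's minute/hour fields."""
    minute, hour = entry.split()[:2]
    return [
        f"{h:02d}:{m:02d}"
        for h in expand_field(hour, 0, 23)
        for m in expand_field(minute, 0, 59)
    ]


def runs_covering_signal(lookback_hours: int, cadence_hours: int) -> int:
    """Approximate number of scans whose lookback window contains a given signal."""
    return lookback_hours // cadence_hours
```

For the entry above, `daily_run_times(...)` yields `00:15`, `06:15`, `12:15`, `18:15`, and `runs_covering_signal(24, 6)` is 4, so without dedupe the same signature could be filed up to four times.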

…ed agents

Sync the README with the current system state after the 2026-04-23 work:

- Agent pool updated: Gemini and DeepSeek retired after quality review; active pool is now Claude + Codex.
- "The Loop" diagram shows four improvement loops feeding back to the backlog (log analyzer, groomer, planner, incident scanner) and names pr_monitor's e2e health step.
- "Recursive Self-Improvement" lists the two new acute loops: incident scanner every 6h, pr_monitor e2e health every 5 min. Notes that merged agent PRs now auto-close their linked issue via `Closes #N`.
- "Option C: Bootstrap From Scratch" now walks through all 8 init steps, including the new tuning-cadence prompt and the additional charter documents (NORTH_STAR / VISION / STRATEGY / PLANNING_PRINCIPLES). Calls out that re-running init preserves an existing config.yaml.
- "Optional: set up cron" expanded to mirror the actual installed crontab (dispatcher/queue/pr_monitor/telegram control every minute; groomer/planner hourly; incident_scanner every 6h; weekly scorer+log_analyzer; daily digest+product_inspector).
- "How It Works" table adds incident_scanner.py, work_verifier.py, product_inspector.py, daily_digest.py and updates pr_monitor.py to mention e2e health.
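Re-running init preserves an existing config.yaml. One way to get that behavior is Python's exclusive-creation file mode `"x"`, which raises `FileExistsError` instead of truncating an existing file. This is a minimal sketch with a hypothetical `write_config_if_absent` helper, not the actual agent-os init code.

```python
from pathlib import Path


def write_config_if_absent(path: Path, default_yaml: str) -> bool:
    """Write default_yaml to path unless a config file already exists.

    Returns True if a new file was written, False if an existing one was kept.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # "x" = exclusive creation: fails if the file already exists, so an
        # operator's edited config.yaml is never overwritten by a re-run.
        with path.open("x", encoding="utf-8") as fh:
            fh.write(default_yaml)
    except FileExistsError:
        return False
    return True
```

Checking with `path.exists()` first would also work, but exclusive-create folds the check and the write into one call, so there is no window where a file appearing between the check and the write gets clobbered.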

kai-linux added 2 commits April 24, 2026 09:19

kai-linux merged commit 664dbfe into main Apr 24, 2026
2 checks passed

kai-linux merged 2 commits into main from docs/readme-refresh-2026-04-23

kai-linux commented Apr 24, 2026
