The hardening work is complete. The system is running with two active repos (OperationsCenter and ExternalRepo), a functional six-lane watcher (including the new spec-director), a repo-aware autonomy loop, and PR review automation. The current focus is on tuning the autonomy loop, building operator trust, and polishing the public-facing surface.
Run analyze-artifacts weekly, identify suppression patterns, tune per-family thresholds, document changes.
Tracked: #17
Status: in progress (docs/operator/tuning.md created; first tuning run pending)
Review emit/suppress rates for observation_coverage, test_visibility, and dependency_drift. Define promotion criteria for hotspot_concentration and todo_accumulation.
Tracked: #18
Status: open
Run the two-phase review loop against a real or controlled test PR. Verify audit trail, guardrail checklist, and escalation path.
Tracked: #19
Status: open (docs/operator/pr_review.md created; live validation pending)
README and demo.md updated to position the demo as a post-change ritual. Ensure demo stays runnable as config and thresholds change.
Tracked: #20
Status: done (demo.md updated with Autonomy-Cycle Ritual section; README updated)
Operator docs suite: tuning.md, pr_review.md, runtime.md (dry-run-first), README cross-links, backlog update.
Tracked: #21
Status: done (all operator docs created/updated)
Status: done (.github/workflows/ci.yml)
Status: done (TaskContractError in service.py)
Status: done (docs/demo.md)
Status: done (config/operations_center.example.yaml, .env.operations-center.example)
Status: done
Status: done
Status: done (src/operations_center/entrypoints/analyze/main.py)
Status: done (src/operations_center/entrypoints/autonomy_cycle/main.py)
Status: done (proposer/candidate_mapper.py)
Status: done (config/settings.py, adapters/workspace/bootstrap.py)
Status: done (.github/workflows/ci.yml, pyproject.toml)
Status: done (proposer filters by repo: label before evaluating idle state)
Status: done (round-robin by repo_key ensures fair multi-repo distribution)
Status: done (rate-limited tasks reset to Ready for AI automatically; budget not charged)
Status: done (ProposerGuardrailAdapter and DecisionEngineService accept usage_store param)
Status: done
TuningRegulatorService aggregates per-family metrics from retained decision and proposer artifacts and applies explicit recommendation rules (over-suppressed → loosen; noisy → tighten; healthy → keep). Recommendation-only by default; auto-apply mode (opt-in via --apply + OPERATIONS_CENTER_TUNING_AUTO_APPLY_ENABLED=1) writes conservative bounded changes to config/autonomy_tuning.json with full cooldown, quota, oscillation, and allowlist guardrails. DecisionEngineService reads tuning overrides at startup. Full audit trail retained in tools/report/operations_center/tuning/. 47 tests.
Status: done
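The recommendation rules above (over-suppressed → loosen; noisy → tighten; healthy → keep) can be sketched roughly as follows. This is a minimal illustration, not the TuningRegulatorService code: the `FamilyMetrics` shape, the `recommend` function, the rate definitions, and the threshold values are all assumptions. Only the two constant names appear in this backlog.

```python
from dataclasses import dataclass

# Constant names are taken from this backlog; the values are hypothetical.
OVER_SUPPRESSED_RATE = 0.8
NOISY_CREATE_RATE_CEILING = 0.6


@dataclass(frozen=True)
class FamilyMetrics:
    """Hypothetical per-family aggregate over retained artifacts."""
    family: str
    emitted: int     # candidates the decision engine emitted
    suppressed: int  # candidates the decision engine suppressed
    created: int     # emitted candidates that became board tasks


def recommend(m: FamilyMetrics) -> str:
    """Return 'loosen', 'tighten', or 'keep' for one family."""
    evaluated = m.emitted + m.suppressed
    if evaluated == 0:
        return "keep"  # no data, so recommend no change

    suppress_rate = m.suppressed / evaluated
    create_rate = m.created / m.emitted if m.emitted else 0.0

    if suppress_rate >= OVER_SUPPRESSED_RATE:
        return "loosen"   # over-suppressed
    if create_rate > NOISY_CREATE_RATE_CEILING:
        return "tighten"  # noisy
    return "keep"         # treated as healthy in this sketch
```

Checking the over-suppressed rule first is a design choice in this sketch: a family that is almost entirely suppressed has too few emitted candidates for its create rate to be meaningful.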
Adds a sixth watcher role, spec, which closes the direction gap in the reactive propose loop. TriggerDetector detects when to start a campaign (drop-file > Plane label > queue drain). BrainstormService calls the Anthropic API directly to produce a spec doc written to docs/specs/. CampaignBuilder converts the spec into a bounded set of Plane tasks across implement/test/improve phases. SpecComplianceService reviews each PR diff against the spec (structured JSON verdict) upstream of kodo self-review. RecoveryService handles stall detection, spec revision, and orderly self-cancel. Suppressor blocks conflicting heuristic proposals during an active campaign. New task kinds test_campaign and improve_campaign route to kodo --test / kodo --improve via ROLE_TASK_KINDS in worker/main.py.
Status: done
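The trigger precedence described above (drop-file > Plane label > queue drain) amounts to a first-match check in priority order. The sketch below illustrates that ordering only. The `Trigger` enum, the `Signals` fields, and `detect_trigger` are hypothetical names, and the real TriggerDetector may gather its signals differently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Trigger(Enum):
    DROP_FILE = "drop_file"
    PLANE_LABEL = "plane_label"
    QUEUE_DRAIN = "queue_drain"


@dataclass(frozen=True)
class Signals:
    """Hypothetical snapshot of the three trigger sources."""
    drop_file_present: bool       # an operator dropped a campaign request file
    campaign_label_present: bool  # a Plane item carries the campaign label
    ready_queue_depth: int        # tasks still waiting in the ready queue


def detect_trigger(s: Signals) -> Optional[Trigger]:
    """Return the highest-priority trigger that fired, or None."""
    if s.drop_file_present:
        return Trigger.DROP_FILE
    if s.campaign_label_present:
        return Trigger.PLANE_LABEL
    if s.ready_queue_depth == 0:
        return Trigger.QUEUE_DRAIN
    return None
```

With this ordering an explicit operator request always wins over an inferred one, so a drained queue starts a campaign only when nobody has asked for one directly.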
ExecutionArtifactCollector reads retained kodo_plane artifacts on every observer run and computes per-repo execution quality metrics (total_runs, no_op_count, executed_count, validation_failed_count). ExecutionHealthDeriver derives high_no_op_rate and persistent_validation_failures insights. ExecutionHealthRule converts them into execution_health_followup candidates. The family is in _DEFAULT_ALLOWED_FAMILIES so it fires automatically. No manual trigger needed.
Status: done
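As a rough illustration of how the four per-repo metrics could turn into the two insights, consider the sketch below. The metric and insight names come from this item. The thresholds, the minimum-runs guard, and the `derive_insights` function are assumptions, not the ExecutionHealthDeriver implementation.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the real values are not stated in this backlog.
MIN_RUNS = 5
NO_OP_RATE_THRESHOLD = 0.5
VALIDATION_FAILURE_THRESHOLD = 3


@dataclass(frozen=True)
class ExecutionMetrics:
    total_runs: int
    no_op_count: int
    executed_count: int
    validation_failed_count: int


def derive_insights(m: ExecutionMetrics) -> list[str]:
    """Map per-repo execution metrics to insight names."""
    insights: list[str] = []
    # Skip the rate check on tiny samples, where one no-op run dominates.
    if m.total_runs >= MIN_RUNS and m.no_op_count / m.total_runs >= NO_OP_RATE_THRESHOLD:
        insights.append("high_no_op_rate")
    if m.validation_failed_count >= VALIDATION_FAILURE_THRESHOLD:
        insights.append("persistent_validation_failures")
    return insights
```

Each returned insight would then be a candidate input for a rule such as ExecutionHealthRule, which emits an execution_health_followup candidate.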
Five profile constants (ruff_clean, ty_clean, tests_pass, ci_green, manual_review) defined in validation_profiles.py. All 12 families mapped via profile_for_family(). CandidateBuilder auto-assigns from family; rules may override explicitly. validation_profile, requires_human_approval, and evidence_schema_version appear in every created task's ## Provenance block. The cycle_<ts>.json report includes an emitted_candidates list with {family, validation_profile, confidence} for each emitted candidate.
Status: done
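The auto-assign-with-override behaviour can be pictured with a small lookup. The five profile names and `profile_for_family` come from this item. The family-to-profile pairs, the `manual_review` fallback, and the `assign_profile` helper are assumptions made for illustration, since the backlog does not list the actual mapping.

```python
from typing import Optional

RUFF_CLEAN = "ruff_clean"
TY_CLEAN = "ty_clean"
TESTS_PASS = "tests_pass"
CI_GREEN = "ci_green"
MANUAL_REVIEW = "manual_review"

# Hypothetical subset of the mapping; the real table covers all 12 families.
_FAMILY_PROFILES = {
    "lint_fix": RUFF_CLEAN,
    "type_fix": TY_CLEAN,
    "test_visibility": TESTS_PASS,
    "dependency_drift": CI_GREEN,
}


def profile_for_family(family: str) -> str:
    """Look up a family's default profile (fallback is an assumption)."""
    return _FAMILY_PROFILES.get(family, MANUAL_REVIEW)


def assign_profile(family: str, override: Optional[str] = None) -> str:
    """Use the rule's explicit override if given, else the family default."""
    return override if override is not None else profile_for_family(family)
```

For example, `assign_profile("lint_fix")` yields `ruff_clean`, while `assign_profile("lint_fix", override=MANUAL_REVIEW)` lets a rule demand human review for a specific candidate.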
EvidenceBundle Pydantic model (kind, count, distinct_file_count, delta, trend, top_codes, source, schema_version) synthesized by CandidateBuilder._synthesize_evidence_bundle() for lint_fix and type_fix; None for other families. distinct_file_count is a true count from the full violation/error output (not bounded by the top-N sample window).
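The distinction between a true distinct_file_count and the top-N sample window is easiest to see in code. The sketch below uses a stdlib dataclass as a stand-in for the Pydantic model so that it runs without dependencies. The field names come from this item. The field types, the `synthesize` helper, the trend labels, and the input shape (a list of `(path, code)` pairs) are assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvidenceBundle:
    """Dataclass stand-in for the Pydantic model; types are assumed."""
    kind: str
    count: int
    distinct_file_count: int
    delta: Optional[int]
    trend: Optional[str]
    top_codes: list[str]
    source: str
    schema_version: int


def synthesize(kind: str, source: str, violations: list[tuple[str, str]],
               previous_count: Optional[int] = None, top_n: int = 3) -> EvidenceBundle:
    """Build a bundle from the full (path, code) violation list."""
    code_counts = Counter(code for _, code in violations)
    delta = None if previous_count is None else len(violations) - previous_count
    if delta is None:
        trend = None
    elif delta > 0:
        trend = "worsening"
    elif delta < 0:
        trend = "improving"
    else:
        trend = "stable"
    return EvidenceBundle(
        kind=kind,
        count=len(violations),
        # Counted over every violation, not just the top-N sample.
        distinct_file_count=len({path for path, _ in violations}),
        delta=delta,
        trend=trend,
        top_codes=[code for code, _ in code_counts.most_common(top_n)],
        source=source,
        schema_version=1,  # hypothetical version number
    )
```

Here only `top_codes` is limited by `top_n`. The file count is computed from the complete list, so a narrow sample window cannot understate how widely the problem is spread.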
Items here are promoted to the board automatically by the backlog_promotion family (when enabled).
type: arch and type: redesign items are never auto-promoted — they require deliberate operator action.
Type: maintenance
After observing healthy emit/create rates for the four default families, promote hotspot and todo families to _DEFAULT_ALLOWED_FAMILIES. Requires documented promotion criteria and at least one clean dry-run showing useful candidates.
Type: maintenance
After the first few tune-autonomy runs accumulate retained artifacts, review recommendations against actual board outcomes and adjust the recommendation rule thresholds (OVER_SUPPRESSED_RATE, NOISY_CREATE_RATE_CEILING, HEALTHY_CREATE_RATE_FLOOR) if the defaults are too aggressive or too conservative for the observed repos.
Type: feature
Define "trusted" in terms of measurable execution health and tuning stability (low no-op rate, clean validation history, healthy create/emit ratio stable across N regulator runs). Add a trusted_repos or auto_execute_families config key so specific repos skip the dry-run gate automatically.
Type: feature
Allow repos to declare their own hourly/daily caps rather than sharing the global budget. Useful when ExternalRepo and OperationsCenter have different execution intensity.
Type: maintenance
Run analyze-artifacts in CI on the retained artifacts directory and upload the output as a build artifact. Makes the tuning loop auditable per-commit.