diff --git a/.recursive/strategy/2026-04-09.md b/.recursive/strategy/2026-04-09.md new file mode 100644 index 0000000..e880ac1 --- /dev/null +++ b/.recursive/strategy/2026-04-09.md @@ -0,0 +1,61 @@ +# Strategy Report -- Session #0139 + +**Date**: 2026-04-09 +**Period analyzed**: Sessions #0122-#0139 +**Queue**: 67 pending, 1 blocked urgent task (`#0277`) after the task edits in this branch +**Eval**: last merged 89/100 in session #0134; latest branch-local rerun in #0139 reached 78/100 but stayed unmerged +**Autonomy**: 85/100 | **Tests**: 1222 | **Vision**: 92% overall + +## Why the tracker is flat + +The visible tracker is flat for two separate reasons. + +1. The tracker file itself is stale. `.recursive/vision-tracker/TRACKER.md` still says "Last updated: 2026-04-06" and still shows Self-Maintaining at 68% with four 0% components. That means the dashboard view has not been recalculated to reflect the work done in sessions #0122-#0139. +2. The recent sessions mostly moved runtime hygiene, eval, and queue state instead of the four missing automation components that actually move Self-Maintaining: auto-release, auto-changelog, auto-tracker, and auto-CLAUDE.md. The visible 0% items are still untouched. + +The net effect is that session volume is high, but the work is concentrated in areas that do not advance the remaining tracker gaps. + +## What is working + +1. Non-build roles are now real instead of theoretical. Session #0122 delegated OVERSEE, #0123 delegated STRATEGIZE, and later sessions also used AUDIT and SECURITY (`.recursive/decisions/log.md:98-204`). That directly invalidates the older "never uses oversee" complaint. +2. Eval quality improved materially. Session #0132 reran Phractal at 84/100, session #0134 raised it to 89/100, and auto-clone validation succeeded along the way (`.recursive/commitments/log.md:128-141`). +3. Queue triage works when invoked. Session #0122 cut the queue from 77 to 69, and session #0128 cut 72 to 63 (`.recursive/decisions/log.md:98-138`). +4. The stop rule is protecting the repo from unsafe churn. Session #0139 stopped after two failed review cycles instead of grinding into a third repair loop (`.recursive/handoffs/LATEST.md:8-10, 23-31`). + +## What is failing + +1. The eval rerun path is blocked, not merely stale. Session #0139 produced PR #274, two branch-local eval reports, and then hit the stop rule because the PR failed review twice and was closed unmerged. The latest session index also shows the 65-minute #0139 run ending in failure (`.recursive/handoffs/LATEST.md:8-25`, `.recursive/sessions/index.md:102-106`). +2. The remaining tracker delta is concentrated in components that are not being attacked. `Auto-release`, `Auto-changelog update`, `Auto-tracker update`, and `Auto-CLAUDE.md update` are still at 0% in the tracker (`.recursive/vision-tracker/TRACKER.md:75-94`). +3. Queue churn still offsets gains. Session #0136 added 5 new security tasks and pushed the queue up by 4, and session #0137 added 3 more follow-up tasks even though all review cycles passed (`.recursive/commitments/log.md:148-156`). +4. The human-facing "flat tracker" symptom is real because the tracker file is not being rewritten on the same cadence as the sessions. The report history exists, but the tracker snapshot does not move with it (`.recursive/vision-tracker/TRACKER.md:3-5, 128-138`). + +## What is missing + +1. Auto-release, auto-changelog, auto-tracker, and auto-CLAUDE.md are still unstarted in the tracker (`.recursive/vision-tracker/TRACKER.md:90-94`). +2. Feedback ingestion is still 0% in the meta-prompt section (`.recursive/vision-tracker/TRACKER.md:98-114`). +3. There is no explicit "blocked eval vs stale eval" state. Right now the dashboard can say an eval is stale, but it does not stop the brain from treating a blocked eval path as if it were a measurement problem. +4. The queue has no enforced creation cap yet. The observed pattern is still "fixes create follow-ups faster than the queue is being retired" (`.recursive/tasks/0225.md:14-29`). + +## Cost and efficiency + +1. Session #0139 spent 65 minutes and $10.50 and ended blocked, which is a poor cost-to-progress ratio for a measurement session (`.recursive/sessions/index.md:106`). +2. The review cascade around #0277 consumed two review cycles, produced a merged eval artifact only in the unmerged branch-local sense, and still left the core blocker unresolved (`.recursive/handoffs/LATEST.md:8-25`). +3. Security and hardening sessions are still cost-effective when they close real defects, but they also create follow-up volume. Session #0136 is the clearest example: the security scan found 4 CONFIRMED and 4 THEORETICAL findings and created 5 tasks (`.recursive/commitments/log.md:182-186`). + +## Health assessment + +Operational health is mixed. + +The good news is that the stop rule and review process are functioning: unsafe loops are being halted, and the system is no longer pretending a blocked PR is ready. The bad news is that the system is still spending real session time on work that does not move the tracker snapshot or restore the eval loop. + +## Task assessment + +- `#0225`: valid, partially satisfied. Overseer has proven it can trim the queue, but the queue-trend / task-limit behavior is not yet durable. +- `#0226`: partially satisfied. The "never uses oversee/strategize/achieve" claim is no longer true, but the scheduling behavior is still ad hoc enough that it should not be treated as fully closed. +- `#0228`: should be superseded. The sessions_since_eval signal and rerun behavior now exist, but the remaining blocker is the nested-Claude eval path captured in `#0277`. + +## Ranked 2-3 session plan + +1. **Queue triage session**: run OVERSEE on runtime-state tasks first. Close or supersede stale follow-ups, especially the queue-hygiene family around `#0225`, and record the before/after queue count. This is the fastest way to make the next report show real movement instead of flat churn. +2. **Blocked eval fix session**: work the narrower successor to `#0277`. Keep the fix scoped to the eval path, make the temp runtime handling symlink-safe, sanitize the child subprocess environment, and add a real Claude Code fallback regression. This is the highest-leverage unblock. +3. **Measurement or autonomy session**: if the eval fix lands and there is a fresh code delta, rerun Phractal immediately. If not, spend the session on autonomy work so the planner can distinguish "stale eval with delta" from "blocked eval path with no delta" and stop recommending wasted reruns. diff --git a/.recursive/tasks/.next-id b/.recursive/tasks/.next-id index bbb81cf..e01062f 100644 --- a/.recursive/tasks/.next-id +++ b/.recursive/tasks/.next-id @@ -1 +1 @@ -279 +282 diff --git a/.recursive/tasks/0228.md b/.recursive/tasks/0228.md index 6f3f80d..014153b 100644 --- a/.recursive/tasks/0228.md +++ b/.recursive/tasks/0228.md @@ -1,10 +1,10 @@ --- -status: pending +status: done priority: normal target: created: 2026-04-08 source: github-issue-209 -completed: +completed: 2026-04-09 --- # Brain never re-runs eval after building nightshift @@ -25,3 +25,7 @@ Add a rule to brain.md: after every 3 build sessions, the next session MUST incl - Dashboard alerts when eval is 3+ sessions stale - Brain includes eval rerun in its delegation when the alert fires - Eval score trend is visible in the dashboard + +## Note + +Superseded by `#0242` for the signal/alert work and `#0277` for the remaining Claude Code eval-path blocker. diff --git a/.recursive/tasks/0279.md b/.recursive/tasks/0279.md new file mode 100644 index 0000000..511ba06 --- /dev/null +++ b/.recursive/tasks/0279.md @@ -0,0 +1,26 @@ +--- +status: pending +priority: normal +target: v0.0.9 +vision_section: self-maintaining +created: 2026-04-09 +source: strategy-report-0139 +completed: +--- + +# Triage the runtime-state queue and close superseded follow-ups + +## Problem + +The queue is still carrying runtime-state follow-ups that have already been functionally covered by later sessions or merged PRs. Session #0122 proved OVERSEE can reduce the queue, but the queue rebounds because stale follow-ups stay open and duplicate scope is not consolidated. + +## Fix + +Run an OVERSEE pass focused on `.recursive/tasks/` and close or supersede tasks whose acceptance criteria are already satisfied by sessions #0122-#0139. Keep the remaining open items narrow: one root cause, one owner, one next step. + +## Acceptance Criteria + +- [ ] Tasks fully covered by later sessions or PRs are marked `done` with a superseded note +- [ ] Remaining open tasks in the #0225-#0228 family have an explicit next action +- [ ] Queue before/after counts are recorded in the handoff +- [ ] No duplicate stale follow-ups remain open for the same root cause diff --git a/.recursive/tasks/0280.md b/.recursive/tasks/0280.md new file mode 100644 index 0000000..6579a8d --- /dev/null +++ b/.recursive/tasks/0280.md @@ -0,0 +1,27 @@ +--- +status: pending +priority: urgent +target: v0.0.9 +vision_section: self-maintaining +created: 2026-04-09 +source: strategy-report-0139 +completed: +--- + +# Unblock the Claude eval path with a narrower eval-runner fix + +## Problem + +Task `#0277` is blocked after two failed review cycles. The blocker is no longer "run another eval" but the nested-Claude execution path: the child subprocess needs symlink-safe temp handling, a sanitized environment, and a real end-to-end fallback regression before another scorable rerun is worth attempting. + +## Fix + +Scope the fix to the eval path only. Keep the shared CLI surface unchanged unless the fallback requires a clearly justified runner-level change. Make the child runtime safe, minimize inherited environment variables, and cover the real Claude Code fallback path with regression tests. + +## Acceptance Criteria + +- [ ] `nightshift test --agent claude --cycles 2 --cycle-minutes 5` completes from a Claude Code shell +- [ ] Temp runtime handling is symlink-safe and ownership-safe +- [ ] The child eval subprocess inherits a minimal, explicit environment +- [ ] Regression coverage proves the real fallback path works end-to-end +- [ ] The rerun yields a scorable report instead of stopping after two agent failures diff --git a/.recursive/tasks/0281.md b/.recursive/tasks/0281.md new file mode 100644 index 0000000..ef7ba75 --- /dev/null +++ b/.recursive/tasks/0281.md @@ -0,0 +1,26 @@ +--- +status: pending +priority: normal +target: v0.0.9 +vision_section: meta-prompt +created: 2026-04-09 +source: strategy-report-0139 +completed: +--- + +# Teach the planner to distinguish stale eval from blocked eval + +## Problem + +The dashboard can report that the last merged eval is stale, but the planner still lacks a first-class distinction between "stale with usable delta" and "blocked eval path with no delta." That gap risks wasting another session on a rerun that cannot measure anything. + +## Fix + +Add planner or signal logic that treats a blocked eval path as a different state from a stale eval. When there is no new `nightshift/` delta since the last merged eval, the next session should route to queue triage or autonomy work instead of automatically re-running Phractal. + +## Acceptance Criteria + +- [ ] Planner / dashboard state distinguishes `stale_eval_with_delta` from `eval_blocked_no_delta` +- [ ] The next session does not recommend a fresh eval rerun when there is no code delta and the eval path is blocked +- [ ] Queue triage or autonomy work is recommended instead in that state +- [ ] The planner still recommends a fresh eval rerun once a real code delta exists again