feat(self_improve): inactivity-guarded eps_accept relaxation #235

Draft
haraldschilly wants to merge 1 commit into master from claude/funny-hamilton-ZYyB9

@haraldschilly
Owner

Summary

Closes the documented loop-productivity bottleneck in planning/SELF_IMPROVEMENT_LOOP.md (§13 "Inactivity-guarded loop productivity"): the most recent unattended ledger shows 15 accepts in 326 decided iterations (4.6%); earlier windows produced 1 accept in 86 iterations (~1.2%). At those rates the Thompson sampler's Beta posteriors barely move off the prior — defeating the point of having an adaptive sampler.

The fix is a principled, temporary relaxation of eps_accept after long droughts, re-tightened on the next accept. Per-iteration ledger fields persist the effective threshold and the streak length so the rule is auditable — a reviewer can grep the ledger for any accept whose effective_eps_accept < eps_accept and inspect those entries separately.

Algorithm

Geometric decay with a floor:

effective_eps_accept(s) = max(eps_accept · factor^(s // after), min_eps_accept)

where s is the number of consecutive non-accept iterations before the current one. With the recommended unattended preset (after=10, factor=0.5, min=0.001, eps_accept=0.005):

streak    effective threshold
0–9       0.005
10–19     0.0025
20–29     0.00125
30+       0.001 (floor)

The decay resets to the full eps_accept on every accept, so the relaxation is genuinely temporary. Both skip-iterations and reject-iterations count toward the streak — the bandit only cares about observed accepts.
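The formula above can be sketched as a standalone function. This is a minimal illustration rather than the PR's actual LoopConfig.effective_eps_accept code: the function signature and the treatment of after=0 as "disabled" are assumptions drawn from the description, and the default arguments are the recommended unattended preset.

```python
def effective_eps_accept(streak, eps_accept=0.005, after=10, factor=0.5,
                         min_eps_accept=0.001):
    """Geometric decay with a floor (illustrative sketch, not the PR's code).

    streak is the number of consecutive non-accept iterations before the
    current one. after == 0 is assumed to disable the relaxation.
    """
    if after <= 0:
        return eps_accept
    steps = streak // after  # number of completed relaxation steps
    return max(eps_accept * factor ** steps, min_eps_accept)


# Print the threshold at the edges of each band in the table above.
for s in (0, 9, 10, 19, 20, 29, 30, 100):
    print(s, effective_eps_accept(s))
```

At streak 30 the unclamped value would be 0.005 · 0.5³ = 0.000625, which is below the floor, so the function returns 0.001 from there on.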

What's in the PR

  • panobbgo/self_improve.py: three new LoopConfig knobs (inactivity_relax_after / inactivity_relax_factor / inactivity_min_eps_accept) with full validation; new LoopConfig.effective_eps_accept(s) helper for the decay maths; _run_internal tracks iters_since_accept and computes the effective threshold per iteration; resets the counter on every accept.
  • LoopIterationRecord: two new fields (effective_eps_accept, iters_since_accept) persist the rule auditably. Both default to None on legacy records.
  • scripts/self_improve.py: matching --inactivity-relax-after / --inactivity-relax-factor / --inactivity-min-eps-accept CLI flags.
  • Tests (tests/test_self_improve.py): 15 new tests across two classes:
    • TestInactivityRelaxConfig — disabled-by-default, validation errors (negative after, out-of-range factor, negative / too-large floor), threshold maths (no-relax before threshold, geometric decay across steps, floor clamping).
    • TestInactivityRelaxIntegration — records carry effective threshold + streak; streak resets on accept; skip-iterations count toward streak; borderline +0.04 delta accepted by relaxed 0.025 gate and rejected again after the reset; disabled mode populates fields with constant eps_accept; ledger round-trip; legacy record construction.
  • Documentation: planning §13 entry + idea graduated to "shipped"; new guide_benchmarking.rst subsection with the decay maths and recommended unattended preset; guide.rst quick-nav entry; AGENTS.md run-the-loop bash example; TODO.md recent-improvements entry.
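The interplay of the config knobs, the streak counter, and the per-iteration record fields might look roughly like the sketch below. RelaxConfig and run_loop are hypothetical stand-ins for LoopConfig and _run_internal; the validation bounds are assumptions, and the real accept decision goes through statistical_accept rather than the plain comparison used here.

```python
from dataclasses import dataclass


@dataclass
class RelaxConfig:
    """Hypothetical, pared-down stand-in for LoopConfig."""
    eps_accept: float = 0.05
    inactivity_relax_after: int = 0          # 0 disables the relaxation
    inactivity_relax_factor: float = 0.5
    inactivity_min_eps_accept: float = 0.001

    def __post_init__(self):
        # Assumed validation ranges; the PR's exact bounds may differ.
        if self.inactivity_relax_after < 0:
            raise ValueError("inactivity_relax_after must be >= 0")
        if not 0.0 < self.inactivity_relax_factor <= 1.0:
            raise ValueError("inactivity_relax_factor must be in (0, 1]")
        if not 0.0 <= self.inactivity_min_eps_accept <= self.eps_accept:
            raise ValueError("inactivity_min_eps_accept must be in [0, eps_accept]")

    def effective_eps_accept(self, streak):
        if self.inactivity_relax_after == 0:
            return self.eps_accept
        steps = streak // self.inactivity_relax_after
        return max(self.eps_accept * self.inactivity_relax_factor ** steps,
                   self.inactivity_min_eps_accept)


def run_loop(deltas, cfg):
    """Return (iters_since_accept, effective_eps_accept, accepted) per iteration.

    A delta of None models a skipped iteration; skips and rejects both
    extend the streak, and any accept resets it to zero.
    """
    records = []
    iters_since_accept = 0
    for delta in deltas:
        threshold = cfg.effective_eps_accept(iters_since_accept)
        # Simplification: the real gate is a statistical test, not a bare >=.
        accepted = delta is not None and delta >= threshold
        records.append((iters_since_accept, threshold, accepted))
        iters_since_accept = 0 if accepted else iters_since_accept + 1
    return records


cfg = RelaxConfig(eps_accept=0.05, inactivity_relax_after=2)
for record in run_loop([0.0, None, 0.04, 0.04], cfg):
    print(record)
```

In this run the first +0.04 delta arrives after a streak of 2 and clears the relaxed 0.025 gate; the second arrives right after the reset and fails the full 0.05 gate, mirroring the borderline case listed for TestInactivityRelaxIntegration.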

Backwards compatibility

Strictly safe. The default (inactivity_relax_after=0) disables the feature, so every existing CLI invocation, ledger, and test behaves byte-identically to before. Legacy ledger records carry None for both new fields; existing reader code paths (the CLI summary, hold-out replays, aggregate_holdout_drift) never reference the new fields and continue to work unchanged.
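The legacy-record behaviour can be illustrated with optional dataclass fields. IterationRecord below is a hypothetical, simplified stand-in for LoopIterationRecord; the iteration and accepted fields are assumptions, and only effective_eps_accept and iters_since_accept come from this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IterationRecord:
    """Hypothetical, simplified stand-in for LoopIterationRecord."""
    iteration: int                                  # assumed field
    accepted: bool                                  # assumed field
    effective_eps_accept: Optional[float] = None    # new in this PR
    iters_since_accept: Optional[int] = None        # new in this PR


# A record written before this change has no values for the new fields,
# so constructing it from the stored keys leaves them at None.
legacy = IterationRecord(**{"iteration": 7, "accepted": False})

# A record written after the change carries both values.
current = IterationRecord(**{
    "iteration": 8,
    "accepted": True,
    "effective_eps_accept": 0.0025,
    "iters_since_accept": 12,
})

print(legacy)
print(current)
```

Because the new fields sit at the end with defaults, readers that ignore them keep working on old and new records alike.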

Honesty — why this doesn't quietly move the §11 bar

The success criteria pin eps_accept at a fixed level. The 2026-05-30 ship mitigates the "silent shift" risk in two ways:

  1. Floor at inactivity_min_eps_accept (default 0.001, matching the bootstrap CI's noise floor at typical quick-mode rep counts) — a relaxed accept still has to beat a baseline-grade signal.
  2. Per-iteration ledger fields — every iteration's effective_eps_accept and iters_since_accept are persisted; a reviewer can grep for accepts where the effective threshold was lower than the configured one and audit those separately.
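The audit described in point 2 could be approximated as follows. This sketch assumes the ledger can be read as one JSON object per line and that each record has an accepted flag; both are assumptions about the storage format, and the sample records are invented purely for illustration.

```python
import json

# Made-up sample ledger lines, for illustration only.
LEDGER_TEXT = """\
{"iteration": 1, "accepted": false, "effective_eps_accept": 0.005, "iters_since_accept": 0}
{"iteration": 12, "accepted": true, "effective_eps_accept": 0.0025, "iters_since_accept": 11}
{"iteration": 13, "accepted": true, "effective_eps_accept": 0.005, "iters_since_accept": 0}
{"iteration": 14, "accepted": true}
"""


def relaxed_accepts(lines, eps_accept):
    """Return accepted records whose effective threshold was below eps_accept.

    Legacy records without effective_eps_accept are never flagged, because
    they were decided under the configured threshold.
    """
    hits = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        effective = record.get("effective_eps_accept")
        if record.get("accepted") and effective is not None and effective < eps_accept:
            hits.append(record)
    return hits


flagged = relaxed_accepts(LEDGER_TEXT.splitlines(), eps_accept=0.005)
print([record["iteration"] for record in flagged])  # prints [12]
```

Only iteration 12 is flagged: iteration 13 was accepted at the full threshold, and iteration 14 is a legacy-style record with no threshold field.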

Test plan

  • uv run pytest tests/test_self_improve.py -q — 210 tests pass.
  • uv run pytest tests/ -q — full suite (1194 tests) passes (one flaky LinUCB test retried & passed on attempt 2; unrelated).
  • uv run pyright panobbgo/self_improve.py — 0 errors, 0 warnings.
  • uv run flake8 panobbgo --select=E9,F63,F7,F82 — clean.
  • uv run ruff format --check panobbgo/self_improve.py scripts/self_improve.py tests/test_self_improve.py — clean.
  • uv run sphinx-build -b doctest doc/source doc/build/doctest — 96 doctests pass.
  • --help smoke test confirms the three new CLI flags render with helpful descriptions.

Independence from open PRs

I checked the open PR list (#234 PSO random topology, #233 NL-SHADE-LBC, #232 L-BFGS-B multi-start) before picking this task — all three are heuristic / catalog additions, none touch the SelfImprover accept gate or the loop driver. No merge conflicts expected.

Follow-ups left

  • Inactivity-relax telemetry in the summary view: scripts/self_improve.py summary could surface the longest drought (max iters_since_accept) and the count of relaxed accepts (those with effective_eps_accept < eps_accept). Idea sketched at the bottom of planning/SELF_IMPROVEMENT_LOOP.md under "Next iteration ideas".
  • Bump harness mode for the cron — the other half of the original §13 idea (--standard over --quick on a self-hosted runner) is left explicitly open at its place in the planning doc.

https://claude.ai/code/session_013EsweKhtfRM72A9cjhVe5h


Generated by Claude Code

Close the documented loop-productivity bottleneck (1-5% accept rate
over hundreds of unattended iterations) by geometrically decaying the
accept threshold after every N consecutive non-accepts, floored at a
configurable minimum, re-tightened on the next accept.  Per-iteration
ledger fields persist the effective threshold and streak so the rule
is auditable.

* LoopConfig.inactivity_relax_after / inactivity_relax_factor /
  inactivity_min_eps_accept knobs (default 0 = disabled) plus
  LoopConfig.effective_eps_accept helper for the geometric-decay maths.
* LoopIterationRecord.effective_eps_accept and iters_since_accept
  fields (default None on legacy records) persist the threshold that
  statistical_accept actually saw and the streak that produced it.
* scripts/self_improve.py: --inactivity-relax-after,
  --inactivity-relax-factor, --inactivity-min-eps-accept CLI flags.
* 15 new tests across TestInactivityRelaxConfig and
  TestInactivityRelaxIntegration covering validation, threshold maths,
  end-to-end loop behaviour, ledger round-trip, and legacy record
  construction.
* Docs: planning/SELF_IMPROVEMENT_LOOP.md §13 entry + idea graduated;
  doc/source/guide_benchmarking.rst new subsection with the
  geometric-decay maths and recommended unattended preset;
  doc/source/guide.rst quick-nav entry; AGENTS.md run-the-loop
  example; TODO.md recent-improvements entry.

Backwards compatibility: strictly safe.  Defaults disable the feature
so every existing CLI invocation, ledger, and test passes byte-
identical to the prior behaviour.

https://claude.ai/code/session_013EsweKhtfRM72A9cjhVe5h