feat(self_improve): inactivity-guarded eps_accept relaxation #235
Draft
haraldschilly wants to merge 1 commit into
Close the documented loop-productivity bottleneck (1-5% accept rate over hundreds of unattended iterations) by geometrically decaying the accept threshold after every N consecutive non-accepts, floored at a configurable minimum, re-tightened on the next accept. Per-iteration ledger fields persist the effective threshold and streak so the rule is auditable.

* LoopConfig.inactivity_relax_after / inactivity_relax_factor / inactivity_min_eps_accept knobs (default 0 = disabled) plus LoopConfig.effective_eps_accept helper for the geometric-decay maths.
* LoopIterationRecord.effective_eps_accept and iters_since_accept fields (default None on legacy records) persist the threshold that statistical_accept actually saw and the streak that produced it.
* scripts/self_improve.py: --inactivity-relax-after, --inactivity-relax-factor, --inactivity-min-eps-accept CLI flags.
* 15 new tests across TestInactivityRelaxConfig and TestInactivityRelaxIntegration covering validation, threshold maths, end-to-end loop behaviour, ledger round-trip, and legacy record construction.
* Docs: planning/SELF_IMPROVEMENT_LOOP.md §13 entry + idea graduated; doc/source/guide_benchmarking.rst new subsection with the geometric-decay maths and recommended unattended preset; doc/source/guide.rst quick-nav entry; AGENTS.md run-the-loop example; TODO.md recent-improvements entry.

Backwards compatibility: strictly safe. Defaults disable the feature so every existing CLI invocation, ledger, and test passes byte-identical to the prior behaviour.

https://claude.ai/code/session_013EsweKhtfRM72A9cjhVe5h
Summary
Closes the documented loop-productivity bottleneck in planning/SELF_IMPROVEMENT_LOOP.md (§13 "Inactivity-guarded loop productivity"): the most recent unattended ledger shows 15 accepts in 326 decided iterations (4.6%); earlier windows produced 1 accept in 86 iterations (~1.2%). At those rates the Thompson sampler's Beta posteriors barely move off the prior, which defeats the point of having an adaptive sampler.

The fix is a principled, temporary relaxation of eps_accept after long droughts, re-tightened on the next accept. Per-iteration ledger fields persist the effective threshold and the streak length so the rule is auditable: a reviewer can grep the ledger for any accept whose effective_eps_accept < eps_accept and inspect those entries separately.

Algorithm
Geometric decay with a floor: after every inactivity_relax_after consecutive non-accepts the accept threshold is multiplied by inactivity_relax_factor, and it never drops below inactivity_min_eps_accept. The streak s that drives the decay is the number of consecutive non-accept iterations before the current one.

The recommended unattended preset is after=10, factor=0.5, min=0.001, eps_accept=0.005.

The decay resets to the full eps_accept on every accept, so the relaxation is genuinely temporary. Both skip-iterations and reject-iterations count toward the streak, because the bandit only cares about observed accepts.

What's in the PR
* panobbgo/self_improve.py: three new LoopConfig knobs (inactivity_relax_after / inactivity_relax_factor / inactivity_min_eps_accept) with full validation; new LoopConfig.effective_eps_accept(s) helper for the decay maths; _run_internal tracks iters_since_accept, computes the effective threshold per iteration, and resets the counter on every accept.
* LoopIterationRecord: two new fields (effective_eps_accept, iters_since_accept) persist the rule auditably. Both default to None on legacy records.
* scripts/self_improve.py: matching --inactivity-relax-after / --inactivity-relax-factor / --inactivity-min-eps-accept CLI flags.
* Tests (tests/test_self_improve.py): 15 new tests across two classes:
  * TestInactivityRelaxConfig: disabled-by-default, validation errors (negative after, out-of-range factor, negative / too-large floor), threshold maths (no-relax before threshold, geometric decay across steps, floor clamping).
  * TestInactivityRelaxIntegration: records carry effective threshold + streak; streak resets on accept; skip-iterations count toward streak; borderline +0.04 delta accepted by relaxed 0.025 gate and rejected again after the reset; disabled mode populates fields with constant eps_accept; ledger round-trip; legacy record construction.
* Docs: planning/SELF_IMPROVEMENT_LOOP.md §13 entry + idea graduated to "shipped"; new guide_benchmarking.rst subsection with the decay maths and recommended unattended preset; guide.rst quick-nav entry; AGENTS.md run-the-loop bash example; TODO.md recent-improvements entry.

Backwards compatibility
Strictly safe. Defaults (inactivity_relax_after=0) disable the feature so every existing CLI invocation, ledger, and test passes byte-identical to the prior behaviour. Legacy ledger records carry None for both new fields; existing reader code paths (the CLI summary, hold-out replays, aggregate_holdout_drift) never reference the new fields and continue to work unchanged.

Honesty: why this doesn't quietly move the §11 bar
The success criteria pin eps_accept at a fixed level. The 2026-05-30 ship mitigates the "silent shift" risk in two ways:

* The floor inactivity_min_eps_accept (default 0.001, matching the bootstrap CI's noise floor at typical quick-mode rep counts) means a relaxed accept still has to beat a baseline-grade signal.
* effective_eps_accept and iters_since_accept are persisted; a reviewer can grep for accepts where the effective threshold was lower than the configured one and audit those separately.

Test plan
* uv run pytest tests/test_self_improve.py -q - 210 tests pass.
* uv run pytest tests/ -q - full suite (1194 tests) passes (one flaky LinUCB test retried & passed on attempt 2; unrelated).
* uv run pyright panobbgo/self_improve.py - 0 errors, 0 warnings.
* uv run flake8 panobbgo --select=E9,F63,F7,F82 - clean.
* uv run ruff format --check panobbgo/self_improve.py scripts/self_improve.py tests/test_self_improve.py - clean.
* uv run sphinx-build -b doctest doc/source doc/build/doctest - 96 doctests pass.
* --help smoke test confirms the three new CLI flags render with helpful descriptions.

Independence from open PRs
I checked the open PR list (#234 PSO random topology, #233 NL-SHADE-LBC, #232 L-BFGS-B multi-start) before picking this task. All three are heuristic / catalog additions; none touch the SelfImprover accept gate or the loop driver. No merge conflicts expected.

Follow-ups left
* scripts/self_improve.py summary could surface the longest drought (max iters_since_accept) and the count of relaxed accepts (those with effective_eps_accept < eps_accept). Idea sketched at the bottom of planning/SELF_IMPROVEMENT_LOOP.md under "Next iteration ideas".
* … (--standard over --quick on a self-hosted runner) is left explicitly open at its place in the planning doc.

https://claude.ai/code/session_013EsweKhtfRM72A9cjhVe5h
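For reviewers who want to sanity-check the numbers, the decay rule from the Algorithm section and the "relaxed accept" audit can be sketched in a few lines of Python. This is an illustration only, not the code in panobbgo/self_improve.py: RelaxConfig, run_loop, relaxed_accepts, the dict-based records, the "accepted" key, and the plain delta > eps comparison standing in for statistical_accept are all hypothetical. The sketch also assumes the threshold steps down once per full block of inactivity_relax_after non-accepts, i.e. the exponent is s // inactivity_relax_after.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelaxConfig:
    """Hypothetical stand-in for eps_accept plus the three inactivity knobs."""

    eps_accept: float = 0.005
    inactivity_relax_after: int = 0  # 0 disables the relaxation
    inactivity_relax_factor: float = 0.5
    inactivity_min_eps_accept: float = 0.001

    def effective_eps_accept(self, s: int) -> float:
        """Threshold seen after s consecutive non-accepts (assumed step rule)."""
        if self.inactivity_relax_after <= 0:
            return self.eps_accept
        steps = s // self.inactivity_relax_after
        relaxed = self.eps_accept * self.inactivity_relax_factor**steps
        return max(self.inactivity_min_eps_accept, relaxed)


def run_loop(cfg: RelaxConfig, deltas: list[float]) -> list[dict]:
    """Toy loop: one record per iteration, streak reset on every accept."""
    records = []
    streak = 0  # consecutive non-accepts before the current iteration
    for i, delta in enumerate(deltas):
        eps = cfg.effective_eps_accept(streak)
        accepted = delta > eps  # placeholder for the real statistical gate
        records.append(
            {
                "iteration": i,
                "effective_eps_accept": eps,
                "iters_since_accept": streak,
                "accepted": accepted,
            }
        )
        streak = 0 if accepted else streak + 1
    return records


def relaxed_accepts(cfg: RelaxConfig, records: list[dict]) -> list[dict]:
    """Accepts that only passed because the threshold had been relaxed."""
    return [
        r
        for r in records
        if r["accepted"] and r["effective_eps_accept"] < cfg.eps_accept
    ]


# Borderline case from the test description: a +0.04 delta against eps_accept=0.05.
cfg = RelaxConfig(eps_accept=0.05, inactivity_relax_after=2)
records = run_loop(cfg, [0.0, 0.0, 0.04, 0.04])
for record in relaxed_accepts(cfg, records):
    print(record)
```

Under these assumptions the recommended preset (after=10, factor=0.5, min=0.001, eps_accept=0.005) gives 0.005 for streaks 0-9, 0.0025 for 10-19, 0.00125 for 20-29, and the 0.001 floor from streak 30 onward. In the toy run, the third iteration is accepted at the relaxed 0.025 gate and the fourth, after the reset, is rejected at the full 0.05.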
Generated by Claude Code