feat(self_improve): inactivity-guarded eps_accept relaxation by haraldschilly · Pull Request #235 · haraldschilly/panobbgo

haraldschilly · 2026-05-30T05:42:30Z

Summary

Closes the documented loop-productivity bottleneck in planning/SELF_IMPROVEMENT_LOOP.md (§13 "Inactivity-guarded loop productivity"): the most recent unattended ledger shows 15 accepts in 326 decided iterations (4.6%); earlier windows produced 1 accept in 86 iterations (~1.2%). At those rates the Thompson sampler's Beta posteriors barely move off the prior — defeating the point of having an adaptive sampler.

The fix is a principled, temporary relaxation of eps_accept after long droughts, re-tightened on the next accept. Per-iteration ledger fields persist the effective threshold and the streak length so the rule is auditable — a reviewer can grep the ledger for any accept whose effective_eps_accept < eps_accept and inspect those entries separately.

Algorithm

Geometric decay with a floor:

effective_eps_accept(s) = max(eps_accept · factor^(s // after), min_eps_accept)

where s is the number of consecutive non-accept iterations before the current one. With the recommended unattended preset (after=10, factor=0.5, min=0.001, eps_accept=0.005):

streak	effective threshold
0–9	0.005
10–19	0.0025
20–29	0.00125
30+	0.001 (floor)

The decay resets to the full eps_accept on every accept, so the relaxation is genuinely temporary. Both skip-iterations and reject-iterations count toward the streak — the bandit only cares about observed accepts.
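The decay rule and the table above can be reproduced with a few lines of Python. This is a minimal sketch and not the code from panobbgo/self_improve.py: the standalone function and its keyword names are illustrative, and treating after=0 as "relaxation disabled" is taken from the backwards-compatibility note in this description.

```python
def effective_eps_accept(streak, eps_accept=0.005, after=10, factor=0.5, min_eps=0.001):
    """Accept threshold after `streak` consecutive non-accept iterations.

    Illustrative stand-in for LoopConfig.effective_eps_accept; the defaults
    are the recommended unattended preset quoted above.
    """
    if after <= 0:
        # Assumption: after=0 means the feature is off and the threshold is constant.
        return eps_accept
    steps = streak // after  # number of completed relaxation steps
    return max(eps_accept * factor ** steps, min_eps)


# Print the threshold at the edges of each row of the table.
for s in (0, 9, 10, 19, 20, 29, 30, 100):
    print(f"streak={s:3d}  threshold={effective_eps_accept(s):.5f}")
```

With the preset, the unclamped value at streak 30 would be 0.000625, so the floor of 0.001 takes over from the third relaxation step onward.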

What's in the PR

panobbgo/self_improve.py: three new LoopConfig knobs (inactivity_relax_after / inactivity_relax_factor / inactivity_min_eps_accept) with full validation; new LoopConfig.effective_eps_accept(s) helper for the decay maths; _run_internal tracks iters_since_accept and computes the effective threshold per iteration; resets the counter on every accept.
LoopIterationRecord: two new fields (effective_eps_accept, iters_since_accept) persist the rule auditably. Both default to None on legacy records.
scripts/self_improve.py: matching --inactivity-relax-after / --inactivity-relax-factor / --inactivity-min-eps-accept CLI flags.
Tests (tests/test_self_improve.py): 15 new tests across two classes:
- TestInactivityRelaxConfig — disabled-by-default, validation errors (negative after, out-of-range factor, negative / too-large floor), threshold maths (no-relax before threshold, geometric decay across steps, floor clamping).
- TestInactivityRelaxIntegration — records carry effective threshold + streak; streak resets on accept; skip-iterations count toward streak; borderline +0.04 delta accepted by relaxed 0.025 gate and rejected again after the reset; disabled mode populates fields with constant eps_accept; ledger round-trip; legacy record construction.
Documentation: planning §13 entry + idea graduated to "shipped"; new guide_benchmarking.rst subsection with the decay maths and recommended unattended preset; guide.rst quick-nav entry; AGENTS.md run-the-loop bash example; TODO.md recent-improvements entry.
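The config validation and the per-iteration bookkeeping listed above might look roughly like the sketch below. It is a simplified model under stated assumptions, not the shipped implementation: RelaxConfigSketch and run_sketch are hypothetical names, the exact validation bounds are guesses consistent with the test descriptions, and a plain `delta >= threshold` comparison stands in for the real statistical accept gate. The scenario values (configured gate 0.05, relaxed gate 0.025, delta +0.04) mirror the borderline integration test; the configured 0.05 is inferred from the relaxed value and the factor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelaxConfigSketch:
    """Hypothetical stand-in for the three new LoopConfig knobs."""
    eps_accept: float = 0.005
    inactivity_relax_after: int = 0          # 0 disables the relaxation
    inactivity_relax_factor: float = 0.5
    inactivity_min_eps_accept: float = 0.001

    def __post_init__(self):
        # Assumed bounds; the real validation may differ in detail.
        if self.inactivity_relax_after < 0:
            raise ValueError("inactivity_relax_after must be >= 0")
        if not 0.0 < self.inactivity_relax_factor <= 1.0:
            raise ValueError("inactivity_relax_factor must be in (0, 1]")
        if not 0.0 <= self.inactivity_min_eps_accept <= self.eps_accept:
            raise ValueError("inactivity_min_eps_accept must be in [0, eps_accept]")

    def effective_eps_accept(self, streak):
        if self.inactivity_relax_after == 0:
            return self.eps_accept
        steps = streak // self.inactivity_relax_after
        relaxed = self.eps_accept * self.inactivity_relax_factor ** steps
        return max(relaxed, self.inactivity_min_eps_accept)


def run_sketch(config, deltas):
    """Replay a list of deltas; None models a skipped iteration."""
    records = []
    iters_since_accept = 0
    for delta in deltas:
        threshold = config.effective_eps_accept(iters_since_accept)
        # Simplified gate: the real loop uses a statistical accept test.
        accepted = delta is not None and delta >= threshold
        records.append({
            "effective_eps_accept": threshold,
            "iters_since_accept": iters_since_accept,
            "accepted": accepted,
        })
        # Skips and rejects both extend the streak; an accept resets it.
        iters_since_accept = 0 if accepted else iters_since_accept + 1
    return records


cfg = RelaxConfigSketch(eps_accept=0.05, inactivity_relax_after=2)
for rec in run_sketch(cfg, [0.04, None, 0.04, 0.04]):
    print(rec)
```

In this replay the +0.04 delta is rejected at the full 0.05 gate, accepted once the streak reaches 2 and the gate relaxes to 0.025, and rejected again after the reset.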

Backwards compatibility

Strictly safe. The defaults (inactivity_relax_after=0) disable the feature, so every existing CLI invocation, ledger, and test behaves byte-identically to before. Legacy ledger records carry None for both new fields; existing reader code paths (the CLI summary, hold-out replays, aggregate_holdout_drift) never reference the new fields and continue to work unchanged.

Honesty — why this doesn't quietly move the §11 bar

The success criteria pin eps_accept at a fixed level. This change (shipped 2026-05-30) mitigates the "silent shift" risk in two ways:

Floor at inactivity_min_eps_accept (default 0.001, matching the bootstrap CI's noise floor at typical quick-mode rep counts) — a relaxed accept still has to beat a baseline-grade signal.
Per-iteration ledger fields — every iteration's effective_eps_accept and iters_since_accept are persisted; a reviewer can grep for accepts where the effective threshold was lower than the configured one and audit those separately.
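The audit described in the second point could be scripted along these lines. This is only a sketch: it assumes the ledger can be read as one JSON object per line and that each record has a boolean `accepted` key, neither of which is confirmed here. Only effective_eps_accept and iters_since_accept are field names taken from this PR; the configured eps_accept is passed in by the caller.

```python
import json


def relaxed_accepts(lines, eps_accept):
    """Return accepted records whose effective threshold was below `eps_accept`.

    `lines` is any iterable of JSON strings (assumed ledger layout).
    Legacy records, where the new fields are None or absent, are skipped.
    """
    found = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        effective = record.get("effective_eps_accept")
        if effective is None:
            continue  # legacy record: no relaxation information to audit
        if record.get("accepted") and effective < eps_accept:
            found.append(record)
    return found


# Hypothetical sample ledger for illustration only.
sample = [
    '{"accepted": true, "effective_eps_accept": 0.005, "iters_since_accept": 3}',
    '{"accepted": false, "effective_eps_accept": 0.0025, "iters_since_accept": 12}',
    '{"accepted": true, "effective_eps_accept": 0.0025, "iters_since_accept": 13}',
    '{"accepted": true}',
]
for record in relaxed_accepts(sample, eps_accept=0.005):
    print(record)
```

Only the third sample record is reported: it is the one accept that passed under a relaxed gate.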

Test plan

uv run pytest tests/test_self_improve.py -q — 210 tests pass.
uv run pytest tests/ -q — full suite (1194 tests) passes (one flaky LinUCB test retried & passed on attempt 2; unrelated).
uv run pyright panobbgo/self_improve.py — 0 errors, 0 warnings.
uv run flake8 panobbgo --select=E9,F63,F7,F82 — clean.
uv run ruff format --check panobbgo/self_improve.py scripts/self_improve.py tests/test_self_improve.py — clean.
uv run sphinx-build -b doctest doc/source doc/build/doctest — 96 doctests pass.
--help smoke test confirms the three new CLI flags render with helpful descriptions.

Independence from open PRs

I checked the open PR list (#234 PSO random topology, #233 NL-SHADE-LBC, #232 L-BFGS-B multi-start) before picking this task — all three are heuristic / catalog additions, none touch the SelfImprover accept gate or the loop driver. No merge conflicts expected.

Follow-ups left

Inactivity-relax telemetry in summary view — scripts/self_improve.py summary could surface the longest drought (max iters_since_accept) and the count of relaxed accepts (those with effective_eps_accept < eps_accept). Idea sketched at the bottom of planning/SELF_IMPROVEMENT_LOOP.md under "Next iteration ideas".
Bump harness mode for the cron — the other half of the original §13 idea (--standard over --quick on a self-hosted runner) is left explicitly open at its place in the planning doc.

https://claude.ai/code/session_013EsweKhtfRM72A9cjhVe5h

Generated by Claude Code

Close the documented loop-productivity bottleneck (1-5% accept rate over hundreds of unattended iterations) by geometrically decaying the accept threshold after every N consecutive non-accepts, floored at a configurable minimum, re-tightened on the next accept. Per-iteration ledger fields persist the effective threshold and streak so the rule is auditable.

* LoopConfig.inactivity_relax_after / inactivity_relax_factor / inactivity_min_eps_accept knobs (default 0 = disabled) plus LoopConfig.effective_eps_accept helper for the geometric-decay maths.
* LoopIterationRecord.effective_eps_accept and iters_since_accept fields (default None on legacy records) persist the threshold that statistical_accept actually saw and the streak that produced it.
* scripts/self_improve.py: --inactivity-relax-after, --inactivity-relax-factor, --inactivity-min-eps-accept CLI flags.
* 15 new tests across TestInactivityRelaxConfig and TestInactivityRelaxIntegration covering validation, threshold maths, end-to-end loop behaviour, ledger round-trip, and legacy record construction.
* Docs: planning/SELF_IMPROVEMENT_LOOP.md §13 entry + idea graduated; doc/source/guide_benchmarking.rst new subsection with the geometric-decay maths and recommended unattended preset; doc/source/guide.rst quick-nav entry; AGENTS.md run-the-loop example; TODO.md recent-improvements entry.

Backwards compatibility: strictly safe. Defaults disable the feature so every existing CLI invocation, ledger, and test passes byte-identical to the prior behaviour.

https://claude.ai/code/session_013EsweKhtfRM72A9cjhVe5h

haraldschilly mentioned this pull request May 31, 2026

codify: Sobol.scramble=False in Rewarding_Diverse (first ledger-driven default change) #236

Draft

4 tasks

haraldschilly wants to merge 1 commit into master from claude/funny-hamilton-ZYyB9
