haraldschilly · haraldschilly · May 30, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -415,6 +415,20 @@ uv run python scripts/self_improve.py run --iterations 100 \
     --mode standard --holdout-base-seeds 1234,5678,9012 \
     --fail-on-overfit-ci --holdout-ci-confidence 0.95
 
+# Inactivity-guarded eps_accept relaxation (shipped 2026-05-30).  Break
+# out of long accept droughts by geometrically decaying eps_accept
+# after every N consecutive non-accepts, floored at min_eps_accept,
+# re-tightened on the next accept.  Each iteration ledger record
+# persists the effective threshold + streak so the rule is auditable.
+# Recommended for the unattended nightly cron where the documented
+# accept rate is 1–5%.
+uv run python scripts/self_improve.py run --iterations 100 \
+    --adaptive --adaptive-prime-from-ledger --structural \
+    --guard-interval 10 \
+    --inactivity-relax-after 10 \
+    --inactivity-relax-factor 0.5 \
+    --inactivity-min-eps-accept 0.001
+
 # Inspect the ledger
 uv run python scripts/self_improve.py summary
 ```

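The ledger audit that the new AGENTS.md comment alludes to could look roughly like the sketch below. It assumes the ledger is a JSONL file whose records expose `effective_eps_accept` as a top-level key. The field name comes from this change, but the on-disk layout and the helper name `relaxed_records` are assumptions and not part of the patch.

```python
import json
from typing import Iterable, Iterator


def relaxed_records(lines: Iterable[str], eps_accept: float) -> Iterator[dict]:
    """Yield ledger records that were judged against a relaxed threshold.

    `lines` is any iterable of JSONL lines (for example an open file);
    `eps_accept` is the configured, un-relaxed threshold of the run.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        record = json.loads(line)
        effective = record.get("effective_eps_accept")
        # Legacy records carry None (or lack the field entirely): skip them.
        if effective is not None and effective < eps_accept:
            yield record
```

Passing an open ledger file and the run's configured `eps_accept` yields only the iterations whose gate was relaxed, which a reviewer can then inspect separately.
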
diff --git a/TODO.md b/TODO.md
@@ -2,6 +2,65 @@
 
 ## Recent Improvements (continued)
 
+### Inactivity-guarded eps_accept relaxation — 2026-05-30
+- [x] **Three new `LoopConfig` knobs** in `panobbgo/self_improve.py`:
+      `inactivity_relax_after` (default `0` = disabled),
+      `inactivity_relax_factor` (default `0.5`), and
+      `inactivity_min_eps_accept` (default `0.001`).  When enabled, the
+      loop's accept gate decays the configured `eps_accept`
+      geometrically by `factor` for every additional `after`-block of
+      consecutive non-accepts, floored at `min_eps_accept`, re-tightened
+      on the next accept.  Closes the *Inactivity-guarded loop
+      productivity* follow-up in `planning/SELF_IMPROVEMENT_LOOP.md`.
+  - **Why it matters.** The most recent unattended ledger
+    (`planning/self_improve_summary.txt`) records 15 accepts in 326
+    decided iterations (4.6%); earlier windows produced 1 accept in 86
+    iterations (~1.2%).  At those rates the Thompson sampler's Beta
+    posteriors barely move off the prior — defeating the point of
+    adaptive sampling.  A geometric relaxation lets the loop accept
+    borderline improvements (delta between `min_eps_accept` and
+    `eps_accept`) that the paired-bootstrap CI still judges
+    statistically distinguishable from zero, which is the regime where
+    the historical fixed point-gate was discarding usable signal.
+- [x] **New `LoopConfig.effective_eps_accept(iters_since_accept)` helper**
+      returning `max(eps_accept · factor^(s // after), min_eps_accept)`
+      so the rule is callable directly from tests / callers without
+      reaching into the loop driver.
+- [x] **Two new `LoopIterationRecord` fields** — `effective_eps_accept`
+      and `iters_since_accept` — persist the threshold that
+      `statistical_accept` actually saw and the streak length consulted
+      to compute it.  Both default to `None` on legacy records so the
+      JSONL load path keeps working against historical ledgers.
+- [x] **CLI flags** on `scripts/self_improve.py run` —
+      `--inactivity-relax-after`, `--inactivity-relax-factor`, and
+      `--inactivity-min-eps-accept` — mirror the `LoopConfig` knobs
+      with the same defaults (`0`, `0.5`, `0.001`).
+- [x] **15 new tests in `tests/test_self_improve.py`** (total 210):
+      `TestInactivityRelaxConfig` covers validation (negative `after`,
+      out-of-range `factor`, negative / too-large floor) and the
+      threshold maths (no-relax before threshold, geometric decay,
+      floor clamping); `TestInactivityRelaxIntegration` covers
+      end-to-end loop behaviour (records carry effective threshold +
+      streak, streak resets on accept, skip-iterations count toward
+      streak, borderline +0.04 delta is accepted by relaxed 0.025 gate
+      and rejected again after reset, disabled mode populates fields
+      with constant `eps_accept`, ledger round-trip, legacy record
+      construction).
+- [x] **Backwards compatibility** — strictly safe.  Defaults disable
+      the feature; when `after = 0`, `effective_eps_accept` is a
+      constant equal to `eps_accept` and the loop passes the same
+      value to `statistical_accept` as before.  Composite baseline on
+      every default battery is byte-identical and existing ledgers
+      stay valid (the two new record fields default to `None` for
+      legacy records).
+- [x] **Documentation updated** — `planning/SELF_IMPROVEMENT_LOOP.md`
+      (§13 entry, follow-up promoted to "shipped" with the unshipped
+      half left open, new "Inactivity-relax telemetry in summary view"
+      follow-up), `doc/source/guide_benchmarking.rst` (new subsection
+      with the geometric-decay maths and recommended unattended
+      preset), `doc/source/guide.rst` (quick-nav entry), and
+      `AGENTS.md` (run-the-loop bash example).
+
 ### NL-SHADE-RSP adaptive DE (CEC 2021 winner) — 2026-05-25
 - [x] **New `NLSHADE_RSP` heuristic** in
       `panobbgo/heuristics/nl_shade_rsp.py`; closes the *NL-SHADE-RSP /

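The threshold rule that the TODO.md entry above attributes to `LoopConfig.effective_eps_accept` can be sketched as a standalone function. This is an illustration under stated assumptions, not the code in `panobbgo/self_improve.py`: the free-function form is hypothetical, and the example value `eps_accept = 0.05` is chosen only because the test description mentions a relaxed 0.025 gate at factor 0.5.

```python
def effective_eps_accept(
    eps_accept: float,
    iters_since_accept: int,
    after: int = 0,
    factor: float = 0.5,
    min_eps_accept: float = 0.001,
) -> float:
    """Return max(eps_accept * factor**(s // after), min_eps_accept).

    `after = 0` disables the rule and returns `eps_accept` unchanged,
    which also avoids dividing by zero in the block count.
    """
    if after <= 0:
        return eps_accept
    # Number of completed `after`-sized blocks of consecutive non-accepts.
    blocks = iters_since_accept // after
    return max(eps_accept * factor ** blocks, min_eps_accept)
```

With `eps_accept = 0.05`, `after = 10` and the default factor and floor, a streak of 0 to 9 leaves the gate at 0.05, a streak of 10 relaxes it to 0.025, a streak of 20 to 0.0125, and from a streak of 60 onward the floor of 0.001 applies because `0.05 * 0.5**6` is already below it.
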
diff --git a/doc/source/guide.rst b/doc/source/guide.rst
@@ -41,7 +41,7 @@ Quick Navigation
    * - **Interested in research?**
      - Explore :doc:`guide_research` for related work, theoretical properties, and future directions
    * - **Want to measure progress?**
-     - Read :doc:`guide_benchmarking` for the composite score, external baselines, parametrically randomised problems, the statistical acceptance rule (with paired and unpaired bootstrap sampling — paired is auto-selected for the rep-aligned randomized harness and shrinks the CI 3-10× over the historical independent-resample scheme), the autonomous self-improvement loop driver, the anti-cherry-pick guard, the hold-out validation set (single- and multi-seed, with bootstrap-CI aggregation across hold-out seeds), the adaptive Thompson-sampling mutation sampler with optional per-class structural bandit arms, the structural ``add_heuristic`` / ``drop_heuristic`` portfolio mutations, the categorical ``MutationRule`` kind (for discrete knobs like ``PSO.topology``, ``Sobol.scramble``, ``LSHADE.archive_factor``, ``LSHADE.F_schedule``, and ``NLSHADE_RSP.adaptive_archive``), the tri-topology PSO (``gbest`` / ``lbest`` / ``vonneumann`` — fully-connected, ring, and 4-connected 2-D toroidal grid per Kennedy & Mendes 2003 / Mendes 2004) candidate pool, the L-SHADE adaptive Differential Evolution heuristic (Tanabe-Fukunaga 2014) with two opt-in jSO refinements (the linearly-decreasing ``p_best`` schedule from iLSHADE / jSO Brest et al. 2016 / 2017 and the three-phase asymmetric F-cap from jSO Brest et al. 2017), the literature-faithful jSO heuristic itself (Brest, Maučec & Bošković 2017 — CEC-2017 winner — inheriting the L-SHADE F-cap machinery by construction), the NL-SHADE-RSP heuristic (Stanovov, Akhmedova & Semenkin 2021 — CEC-2021 winner — a jSO subclass adding non-linear population reduction, rank-based selective pressure, and a randomised adaptive archive), and the COBYQA derivative-free trust-region local optimizer (Ragonneau-Zhang 2023)
+     - Read :doc:`guide_benchmarking` for the composite score, external baselines, parametrically randomised problems, the statistical acceptance rule (with paired and unpaired bootstrap sampling — paired is auto-selected for the rep-aligned randomized harness and shrinks the CI 3-10× over the historical independent-resample scheme), the autonomous self-improvement loop driver, the anti-cherry-pick guard, the hold-out validation set (single- and multi-seed, with bootstrap-CI aggregation across hold-out seeds), the adaptive Thompson-sampling mutation sampler with optional per-class structural bandit arms, the structural ``add_heuristic`` / ``drop_heuristic`` portfolio mutations, the categorical ``MutationRule`` kind (for discrete knobs like ``PSO.topology``, ``Sobol.scramble``, ``LSHADE.archive_factor``, ``LSHADE.F_schedule``, and ``NLSHADE_RSP.adaptive_archive``), the tri-topology PSO (``gbest`` / ``lbest`` / ``vonneumann`` — fully-connected, ring, and 4-connected 2-D toroidal grid per Kennedy & Mendes 2003 / Mendes 2004) candidate pool, the L-SHADE adaptive Differential Evolution heuristic (Tanabe-Fukunaga 2014) with two opt-in jSO refinements (the linearly-decreasing ``p_best`` schedule from iLSHADE / jSO Brest et al. 2016 / 2017 and the three-phase asymmetric F-cap from jSO Brest et al. 2017), the literature-faithful jSO heuristic itself (Brest, Maučec & Bošković 2017 — CEC-2017 winner — inheriting the L-SHADE F-cap machinery by construction), the NL-SHADE-RSP heuristic (Stanovov, Akhmedova & Semenkin 2021 — CEC-2021 winner — a jSO subclass adding non-linear population reduction, rank-based selective pressure, and a randomised adaptive archive), the COBYQA derivative-free trust-region local optimizer (Ragonneau-Zhang 2023), and the inactivity-guarded ``eps_accept`` relaxation knob that breaks the loop out of long accept droughts by geometrically decaying the accept threshold after every ``inactivity_relax_after`` consecutive non-accepts (floored at ``inactivity_min_eps_accept``, re-tightened on the next accept, with per-iteration ledger fields for honesty)
 
 Guide Contents
 --------------

diff --git a/doc/source/guide_benchmarking.rst b/doc/source/guide_benchmarking.rst
@@ -619,6 +619,76 @@ Programmatic use:
    )
    iter_records, guard_records = SelfImprover(cfg).run_with_guard_records()
 
+Inactivity-guarded eps_accept relaxation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Long unattended ledgers tend to show very low accept rates: the cron
+that motivated this knob recorded *1 accept in 86 iterations* (~1.2%)
+in an earlier window and 15 in 326 decided iterations (4.6%) most
+recently.  At those rates the adaptive sampler's posterior stays close
+to its prior for most arms, defeating the point of bandit sampling.
+The mitigation is to temporarily lower the accept threshold after the
+loop has gone too long without an accept, then re-tighten on the next accept.
+
+Three knobs on :class:`~panobbgo.self_improve.LoopConfig` (mirrored as
+``--inactivity-relax-after`` / ``--inactivity-relax-factor`` /
+``--inactivity-min-eps-accept`` on the CLI):
+
+* :attr:`~panobbgo.self_improve.LoopConfig.inactivity_relax_after`
+  (default ``0`` = disabled).  Number of consecutive non-accept
+  iterations after which the relax rule starts to fire.  Both
+  *skip*-iterations (no applicable mutation) and *reject*-iterations
+  count toward the streak — the bandit cares about observed accepts,
+  not about how the loop reached "no accept".
+* :attr:`~panobbgo.self_improve.LoopConfig.inactivity_relax_factor`
+  (default ``0.5``).  Multiplicative factor applied to
+  ``eps_accept`` per relaxation step.  Each additional
+  ``inactivity_relax_after`` block of non-accepts multiplies the
+  threshold by ``factor`` again (halving it at the default), so after
+  ``k`` blocks it is ``eps_accept · factor^k``, subject to the floor.
+* :attr:`~panobbgo.self_improve.LoopConfig.inactivity_min_eps_accept`
+  (default ``0.001``).  Floor on the relaxed threshold so a relaxed
+  accept still beats a baseline-grade signal.  Picked to match the
+  bootstrap CI's noise floor at typical quick-mode rep counts.
+
+Behaviour:
+
+* Disabled (``inactivity_relax_after = 0``) ⇒
+  :meth:`~panobbgo.self_improve.LoopConfig.effective_eps_accept` is a
+  constant equal to ``eps_accept``, byte-identical to the historical
+  behaviour.
+* Streak length ``s`` ⇒ effective threshold is
+  ``max(eps_accept · factor^(s // after), min_eps_accept)``.
+* On every accept the streak resets to ``0`` and the next iteration
+  starts again at the full ``eps_accept`` — the relaxation is
+  genuinely temporary.
+
+Each iteration records both
+:attr:`~panobbgo.self_improve.LoopIterationRecord.effective_eps_accept`
+and :attr:`~panobbgo.self_improve.LoopIterationRecord.iters_since_accept`
+so an auditor can replay the relax rule deterministically.  Old
+records (written before the feature shipped) carry ``None`` for both
+fields and continue to load unchanged.
+
+Recommended unattended preset, mirroring the planning doc's §10
+"inactivity-guarded loop productivity" sketch:
+
+.. code-block:: bash
+
+   uv run python scripts/self_improve.py run --iterations 100 \
+       --adaptive --adaptive-prime-from-ledger --structural \
+       --guard-interval 10 \
+       --inactivity-relax-after 10 \
+       --inactivity-relax-factor 0.5 \
+       --inactivity-min-eps-accept 0.001
+
+The §11 success criteria pin ``eps_accept`` at a fixed level, so a
+chronic relaxation would silently shift the loop's "improvement" bar.
+The floor + per-iteration ledger field keep this honest: a reviewer
+can grep the ledger for any record whose
+``effective_eps_accept`` is below ``eps_accept`` and audit those
+accepts separately.
+
 Adaptive mutation sampler (§10)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~