14 changes: 14 additions & 0 deletions AGENTS.md
@@ -415,6 +415,20 @@ uv run python scripts/self_improve.py run --iterations 100 \
--mode standard --holdout-base-seeds 1234,5678,9012 \
--fail-on-overfit-ci --holdout-ci-confidence 0.95

# Inactivity-guarded eps_accept relaxation (shipped 2026-05-30). Break
# out of long accept droughts by geometrically decaying eps_accept
# after every N consecutive non-accepts, floored at min_eps_accept,
# re-tightened on the next accept. Each iteration ledger record
# persists the effective threshold + streak so the rule is auditable.
# Recommended for the unattended nightly cron where the documented
# accept rate is 1–5%.
uv run python scripts/self_improve.py run --iterations 100 \
--adaptive --adaptive-prime-from-ledger --structural \
--guard-interval 10 \
--inactivity-relax-after 10 \
--inactivity-relax-factor 0.5 \
--inactivity-min-eps-accept 0.001

# Inspect the ledger
uv run python scripts/self_improve.py summary
```
59 changes: 59 additions & 0 deletions TODO.md
@@ -2,6 +2,65 @@

## Recent Improvements (continued)

### Inactivity-guarded eps_accept relaxation — 2026-05-30
- [x] **Three new `LoopConfig` knobs** in `panobbgo/self_improve.py`:
`inactivity_relax_after` (default `0` = disabled),
`inactivity_relax_factor` (default `0.5`), and
`inactivity_min_eps_accept` (default `0.001`). When enabled, the
loop's accept gate decays the configured `eps_accept`
geometrically by `factor` for every additional `after`-block of
consecutive non-accepts, floored at `min_eps_accept`, re-tightened
on the next accept. Closes the *Inactivity-guarded loop
productivity* follow-up in `planning/SELF_IMPROVEMENT_LOOP.md`.
- **Why it matters.** The most recent unattended ledger
(`planning/self_improve_summary.txt`) records 15 accepts in 326
decided iterations (4.6%); earlier windows produced 1 accept in 86
iterations (~1.2%). At those rates the Thompson sampler's Beta
posteriors barely move off the prior — defeating the point of
adaptive sampling. A geometric relaxation lets the loop reach for
borderline improvements (delta between `min_eps_accept` and
`eps_accept`) that the paired-bootstrap CI rules in as
statistically distinguishable from zero — exactly the regime where
the historical point-gate was leaving signal on the floor.
- [x] **New `LoopConfig.effective_eps_accept(iters_since_accept)` helper**
returning `max(eps_accept · factor^(s // after), min_eps_accept)`
so the rule is callable directly from tests / callers without
reaching into the loop driver.
- [x] **Two new `LoopIterationRecord` fields** — `effective_eps_accept`
and `iters_since_accept` — persist the threshold that
`statistical_accept` actually saw and the streak length consulted
to compute it. Both default to `None` on legacy records so the
JSONL load path keeps working against historical ledgers.
- [x] **CLI flags** on `scripts/self_improve.py run` —
`--inactivity-relax-after`, `--inactivity-relax-factor`, and
`--inactivity-min-eps-accept` — mirror the `LoopConfig` knobs
with the same defaults (`0`, `0.5`, `0.001`).
- [x] **15 new tests in `tests/test_self_improve.py`** (total 210):
`TestInactivityRelaxConfig` covers validation (negative `after`,
out-of-range `factor`, negative / too-large floor) and the
threshold maths (no-relax before threshold, geometric decay,
floor clamping); `TestInactivityRelaxIntegration` covers
end-to-end loop behaviour (records carry effective threshold +
streak, streak resets on accept, skip-iterations count toward
streak, borderline +0.04 delta is accepted by relaxed 0.025 gate
and rejected again after reset, disabled mode populates fields
with constant `eps_accept`, ledger round-trip, legacy record
construction).
- [x] **Backwards compatibility** — strictly safe. Defaults disable
the feature; when `after = 0`, `effective_eps_accept` is a
constant equal to `eps_accept` and the loop passes the same
value to `statistical_accept` as before. Composite baseline on
every default battery is byte-identical and existing ledgers
stay valid (the two new record fields default to `None` for
legacy records).
- [x] **Documentation updated** — `planning/SELF_IMPROVEMENT_LOOP.md`
(§13 entry, follow-up promoted to "shipped" with the unshipped
half left open, new "Inactivity-relax telemetry in summary view"
follow-up), `doc/source/guide_benchmarking.rst` (new subsection
with the geometric-decay maths and recommended unattended
preset), `doc/source/guide.rst` (quick-nav entry), and
`AGENTS.md` (run-the-loop bash example).

### NL-SHADE-RSP adaptive DE (CEC 2021 winner) — 2026-05-25
- [x] **New `NLSHADE_RSP` heuristic** in
`panobbgo/heuristics/nl_shade_rsp.py`; closes the *NL-SHADE-RSP /
2 changes: 1 addition & 1 deletion doc/source/guide.rst
@@ -41,7 +41,7 @@ Quick Navigation
* - **Interested in research?**
- Explore :doc:`guide_research` for related work, theoretical properties, and future directions
* - **Want to measure progress?**
- Read :doc:`guide_benchmarking` for the composite score, external baselines, parametrically randomised problems, the statistical acceptance rule (with paired and unpaired bootstrap sampling — paired is auto-selected for the rep-aligned randomized harness and shrinks the CI 3-10× over the historical independent-resample scheme), the autonomous self-improvement loop driver, the anti-cherry-pick guard, the hold-out validation set (single- and multi-seed, with bootstrap-CI aggregation across hold-out seeds), the adaptive Thompson-sampling mutation sampler with optional per-class structural bandit arms, the structural ``add_heuristic`` / ``drop_heuristic`` portfolio mutations, the categorical ``MutationRule`` kind (for discrete knobs like ``PSO.topology``, ``Sobol.scramble``, ``LSHADE.archive_factor``, ``LSHADE.F_schedule``, and ``NLSHADE_RSP.adaptive_archive``), the tri-topology PSO (``gbest`` / ``lbest`` / ``vonneumann`` — fully-connected, ring, and 4-connected 2-D toroidal grid per Kennedy & Mendes 2003 / Mendes 2004) candidate pool, the L-SHADE adaptive Differential Evolution heuristic (Tanabe-Fukunaga 2014) with two opt-in jSO refinements (the linearly-decreasing ``p_best`` schedule from iLSHADE / jSO Brest et al. 2016 / 2017 and the three-phase asymmetric F-cap from jSO Brest et al. 2017), the literature-faithful jSO heuristic itself (Brest, Maučec & Bošković 2017 — CEC-2017 winner — inheriting the L-SHADE F-cap machinery by construction), the NL-SHADE-RSP heuristic (Stanovov, Akhmedova & Semenkin 2021 — CEC-2021 winner — a jSO subclass adding non-linear population reduction, rank-based selective pressure, and a randomised adaptive archive), and the COBYQA derivative-free trust-region local optimizer (Ragonneau-Zhang 2023)
- Read :doc:`guide_benchmarking` for the composite score, external baselines, parametrically randomised problems, the statistical acceptance rule (with paired and unpaired bootstrap sampling — paired is auto-selected for the rep-aligned randomized harness and shrinks the CI 3-10× over the historical independent-resample scheme), the autonomous self-improvement loop driver, the anti-cherry-pick guard, the hold-out validation set (single- and multi-seed, with bootstrap-CI aggregation across hold-out seeds), the adaptive Thompson-sampling mutation sampler with optional per-class structural bandit arms, the structural ``add_heuristic`` / ``drop_heuristic`` portfolio mutations, the categorical ``MutationRule`` kind (for discrete knobs like ``PSO.topology``, ``Sobol.scramble``, ``LSHADE.archive_factor``, ``LSHADE.F_schedule``, and ``NLSHADE_RSP.adaptive_archive``), the tri-topology PSO (``gbest`` / ``lbest`` / ``vonneumann`` — fully-connected, ring, and 4-connected 2-D toroidal grid per Kennedy & Mendes 2003 / Mendes 2004) candidate pool, the L-SHADE adaptive Differential Evolution heuristic (Tanabe-Fukunaga 2014) with two opt-in jSO refinements (the linearly-decreasing ``p_best`` schedule from iLSHADE / jSO Brest et al. 2016 / 2017 and the three-phase asymmetric F-cap from jSO Brest et al. 
2017), the literature-faithful jSO heuristic itself (Brest, Maučec & Bošković 2017 — CEC-2017 winner — inheriting the L-SHADE F-cap machinery by construction), the NL-SHADE-RSP heuristic (Stanovov, Akhmedova & Semenkin 2021 — CEC-2021 winner — a jSO subclass adding non-linear population reduction, rank-based selective pressure, and a randomised adaptive archive), the COBYQA derivative-free trust-region local optimizer (Ragonneau-Zhang 2023), and the inactivity-guarded ``eps_accept`` relaxation knob that breaks the loop out of long accept droughts by geometrically decaying the accept threshold after every ``inactivity_relax_after`` consecutive non-accepts (floored at ``inactivity_min_eps_accept``, re-tightened on the next accept, with per-iteration ledger fields for honesty)

Guide Contents
--------------
70 changes: 70 additions & 0 deletions doc/source/guide_benchmarking.rst
@@ -619,6 +619,76 @@ Programmatic use:
)
iter_records, guard_records = SelfImprover(cfg).run_with_guard_records()

Inactivity-guarded eps_accept relaxation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Long unattended ledgers tend to show very low accept rates (one window
of the cron that motivated this knob recorded *1 accept in 86
iterations*, roughly 1.2%). At that rate the adaptive sampler's
posterior stays close to its prior for most arms, which defeats the
point of bandit sampling. The mitigation is to lower the accept
threshold temporarily once the loop has gone too long without an
accept, then restore the full threshold on the next accept.

Three knobs on :class:`~panobbgo.self_improve.LoopConfig` (mirrored as
``--inactivity-relax-after`` / ``--inactivity-relax-factor`` /
``--inactivity-min-eps-accept`` on the CLI):

* :attr:`~panobbgo.self_improve.LoopConfig.inactivity_relax_after`
(default ``0`` = disabled). Number of consecutive non-accept
iterations after which the relax rule starts to fire. Both
*skip*-iterations (no applicable mutation) and *reject*-iterations
count toward the streak — the bandit cares about observed accepts,
not about how the loop reached "no accept".
* :attr:`~panobbgo.self_improve.LoopConfig.inactivity_relax_factor`
(default ``0.5``). Multiplicative factor applied to
``eps_accept`` per relaxation step. Each additional
``inactivity_relax_after`` block of non-accepts multiplies the
threshold by ``factor`` again (halving it at the default ``0.5``), so
after ``k`` blocks the effective threshold is
``eps_accept · factor^k``, subject to the floor below.
* :attr:`~panobbgo.self_improve.LoopConfig.inactivity_min_eps_accept`
(default ``0.001``). Floor on the relaxed threshold so a relaxed
accept still beats a baseline-grade signal. Picked to match the
bootstrap CI's noise floor at typical quick-mode rep counts.
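
As a rough worked example of how quickly the floor is reached: the smallest number of relaxation blocks ``k*`` at which the floor binds follows from the decay rule described in the bullets above. The value ``eps_accept = 0.05`` used here is an assumption for illustration (it is consistent with the "+0.04 delta accepted by relaxed 0.025 gate" test description, but the configured value in a given run may differ); the symbols ``ε``, ``f``, ``a`` and ``ε_min`` are shorthand for ``eps_accept``, ``inactivity_relax_factor``, ``inactivity_relax_after`` and ``inactivity_min_eps_accept``.

```latex
\[
  \varepsilon_{\text{eff}}(s)
    = \max\!\bigl(\varepsilon \cdot f^{\lfloor s / a \rfloor},\ \varepsilon_{\min}\bigr),
  \qquad
  k^{*} = \left\lceil \frac{\ln(\varepsilon_{\min} / \varepsilon)}{\ln f} \right\rceil
  \quad (0 < f < 1,\ \varepsilon_{\min} < \varepsilon)
\]
\[
  \varepsilon = 0.05,\quad f = 0.5,\quad \varepsilon_{\min} = 0.001
  \;\Rightarrow\;
  k^{*} = \left\lceil \frac{\ln 0.02}{\ln 0.5} \right\rceil
        = \lceil 5.64 \rceil = 6
\]
```

Checking the two neighbouring steps: ``0.05 · 0.5^5 ≈ 0.00156`` is still above the floor, while ``0.05 · 0.5^6 ≈ 0.00078`` is below it. Under these assumed values and ``inactivity_relax_after = 10``, the floor would first bind after a streak of 60 consecutive non-accepts.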

Behaviour:

* Disabled (``inactivity_relax_after = 0``) ⇒
:meth:`~panobbgo.self_improve.LoopConfig.effective_eps_accept` returns
a constant equal to ``eps_accept``, so the loop behaves byte-identically
to the historical behaviour.
* Streak length ``s`` ⇒ effective threshold is
``max(eps_accept · factor^(s // after), min_eps_accept)``.
* On every accept the streak resets to ``0`` and the next iteration
starts again at the full ``eps_accept`` — the relaxation is
genuinely temporary.
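
The threshold rule in the list above can be sketched as a standalone function. This is a minimal illustration, not the code in ``panobbgo/self_improve.py``: the free-function form and its parameter names are assumptions (the real helper is a ``LoopConfig`` method), and the validation that the actual config performs (negative ``after``, out-of-range ``factor``, negative or too-large floor) is omitted here.

```python
def effective_eps_accept(
    eps_accept: float,
    iters_since_accept: int,
    after: int = 0,
    factor: float = 0.5,
    min_eps_accept: float = 0.001,
) -> float:
    """Hypothetical stand-alone version of the relaxation rule.

    Returns max(eps_accept * factor ** (s // after), min_eps_accept),
    or eps_accept unchanged when the feature is disabled (after == 0).
    """
    if after <= 0:
        # Disabled: the threshold never moves.
        return eps_accept
    # Number of completed blocks of `after` consecutive non-accepts.
    blocks = iters_since_accept // after
    return max(eps_accept * factor ** blocks, min_eps_accept)


if __name__ == "__main__":
    # eps_accept = 0.05 is an assumed value, used only for illustration.
    for streak in (0, 9, 10, 25, 60, 1000):
        print(streak, effective_eps_accept(0.05, streak, after=10))
```

With integer division, a streak of 9 is still in block 0 and sees the full threshold; the first relaxation step applies at a streak of exactly ``after``.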

Each iteration records both
:attr:`~panobbgo.self_improve.LoopIterationRecord.effective_eps_accept`
and :attr:`~panobbgo.self_improve.LoopIterationRecord.iters_since_accept`
so an auditor can replay the relax rule deterministically. Old
records (written before the feature shipped) carry ``None`` for both
fields and continue to load unchanged.
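
To make the streak bookkeeping concrete, here is a small replay sketch. Everything in it is hypothetical scaffolding: ``IterationRecord`` is a stand-in for ``LoopIterationRecord`` with only the two new fields plus an assumed ``accepted`` flag, a ``None`` delta models a skip-iteration, and the accept test is a plain point comparison rather than the bootstrap-CI rule that ``statistical_accept`` applies. Whether the real gate uses ``>=`` or ``>`` is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IterationRecord:
    """Hypothetical stand-in for LoopIterationRecord."""

    delta: Optional[float]          # None models a skip-iteration
    accepted: bool
    effective_eps_accept: float     # threshold the gate actually saw
    iters_since_accept: int         # streak consulted to compute it


def replay(
    deltas: List[Optional[float]],
    eps_accept: float = 0.05,       # assumed value, for illustration only
    after: int = 2,
    factor: float = 0.5,
    floor: float = 0.001,
) -> List[IterationRecord]:
    records = []
    streak = 0
    for delta in deltas:
        blocks = streak // after if after > 0 else 0
        threshold = max(eps_accept * factor ** blocks, floor)
        # Simplified point gate; the real loop uses a bootstrap CI.
        accepted = delta is not None and delta >= threshold
        records.append(IterationRecord(delta, accepted, threshold, streak))
        # Skips and rejects both extend the streak; an accept resets it.
        streak = 0 if accepted else streak + 1
    return records


if __name__ == "__main__":
    for rec in replay([0.04, None, 0.04, 0.04]):
        print(rec)
```

In this toy run a delta of 0.04 is rejected at the full 0.05 threshold, accepted once two non-accepts have relaxed the gate to 0.025, and rejected again immediately afterwards because the accept reset the streak.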

Recommended unattended preset, mirroring the planning doc's §10
"inactivity-guarded loop productivity" sketch:

.. code-block:: bash

uv run python scripts/self_improve.py run --iterations 100 \
--adaptive --adaptive-prime-from-ledger --structural \
--guard-interval 10 \
--inactivity-relax-after 10 \
--inactivity-relax-factor 0.5 \
--inactivity-min-eps-accept 0.001

The §11 success criteria pin ``eps_accept`` at a fixed level, so a
chronic relaxation would silently shift the loop's "improvement" bar.
The floor and the per-iteration ledger fields keep this honest: a
reviewer can search the ledger for any record whose
``effective_eps_accept`` is below the configured ``eps_accept`` and
audit those accepts separately.
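
The audit described above could look roughly like the following sketch. It assumes the ledger is JSONL with one record per line and that the JSON keys match the record field names ``effective_eps_accept`` and ``iters_since_accept``; the ``accepted`` key is a hypothetical name for however the ledger marks an accepted iteration, and the function takes an iterable of lines because the ledger path is not stated here.

```python
import json
from typing import Dict, Iterable, List


def relaxed_accepts(lines: Iterable[str], eps_accept: float) -> List[Dict]:
    """Return accepted records that passed a relaxed (lowered) threshold."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        effective = record.get("effective_eps_accept")
        if effective is None:
            # Legacy record written before the feature shipped.
            continue
        # "accepted" is an assumed key name, not confirmed by the source.
        if record.get("accepted") and effective < eps_accept:
            hits.append(record)
    return hits


if __name__ == "__main__":
    sample = [
        '{"accepted": true, "effective_eps_accept": 0.05, "iters_since_accept": 3}',
        '{"accepted": true, "effective_eps_accept": 0.025, "iters_since_accept": 12}',
        '{"accepted": false, "effective_eps_accept": 0.0125, "iters_since_accept": 20}',
        '{"accepted": true}',
    ]
    for rec in relaxed_accepts(sample, eps_accept=0.05):
        print(rec)
```

Only the second sample record is reported: it was accepted under a threshold below the configured ``eps_accept``. Rejected records and legacy records without the new fields are skipped.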

Adaptive mutation sampler (§10)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
