codify: Sobol.scramble=False in Rewarding_Diverse (first ledger-driven default change) by haraldschilly · Pull Request #236 · haraldschilly/panobbgo

haraldschilly · 2026-05-31T05:22:32Z

Summary

First application of the planning/SELF_IMPROVEMENT_LOOP.md §12.3 step 2 codification rule: "if a self-improvement loop rule keeps winning, change the default."

panobbgo/harness.py _make_quick_strategies now ships Rewarding_Diverse with (Sobol, {"n": 16, "scramble": False}) instead of scramble=True.

Evidence

Three independent positive accepts in the self-improvement loop ledger (planning/done/self_improve_ledger_2026-05-31.jsonl iter 9 / 15 / 17 of the 2026-05 window):

iter=9   Δ=+0.0511  CI=[+0.0089, +0.0933]  worst=+0.0000
iter=15  Δ=+0.0217  CI=[+0.0056, +0.0433]  worst=+0.0000
iter=17  Δ=+0.0317  CI=[+0.0050, +0.0583]  worst=+0.0000

Every accept had its bootstrap-CI lower bound strictly above zero and zero per-pair regression — clean wins under the §6.2 statistical acceptance rule. All three accepts flipped True → False (the categorical rule always excludes the current value), so the data is consistent about the direction.
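Here is a minimal sketch of an acceptance check with the same shape as the §6.2 rule. It assumes per-pair score differences Δ (candidate minus baseline) on matched battery instances, a percentile bootstrap for the 95% CI, and "worst" defined as the smallest per-pair Δ. The function names and example deltas are hypothetical. This is not the self_improve.py implementation.

```python
import random
from statistics import mean


def bootstrap_ci(deltas, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of paired score differences."""
    rng = random.Random(seed)
    n = len(deltas)
    # Resample pairs with replacement and record each resample's mean.
    means = sorted(mean(rng.choices(deltas, k=n)) for _ in range(n_boot))
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


def accept(deltas, n_boot=2000, seed=0):
    """Hypothetical gate: CI lower bound > 0 and no pair got worse (min delta >= 0)."""
    lo, hi = bootstrap_ci(deltas, n_boot=n_boot, seed=seed)
    worst = min(deltas)
    return (lo > 0.0 and worst >= 0.0), (mean(deltas), lo, hi, worst)


# Illustrative per-pair deltas (made up, not ledger data).
ok, (delta, lo, hi, worst) = accept([0.0, 0.02, 0.05, 0.10, 0.0, 0.08])
print(ok, round(delta, 4), worst)  # → True 0.0417 0.0
```

A tied pair (Δ = 0) keeps worst at +0.0000, matching the ledger rows above. A single negative pair fails the gate no matter how large the mean gain is.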

Why `False` beats `True` at quick mode

At n=16 in the quick-mode 2-D battery, the unscrambled Sobol' sequence places its first 16 points on a (0, 4, 2)-net in base 2. Every elementary box of area 1/16 in the unit square holds exactly one point. More generally, the first 2^k points of the 2-D sequence form a (0, k, 2)-net. Owen scrambling keeps this net property, but it moves each point randomly within its cell, so coverage quality (discrepancy, fill distance) varies from draw to draw. The likely mechanism is that at small n this extra per-draw variance outweighs the benefit of breaking the axis-aligned structure of the unscrambled points. The downstream local heuristics (Random, Nearby, NelderMead) all start from those Sobol' points, so a more regular "first look" grid pays compounding returns.
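The net property can be checked directly. The sketch below builds the first 16 points of the unscrambled 2-D Sobol' sequence in pure Python, using integer coordinates in [0, 16):

- Dimension 1 is the van der Corput sequence.
- Dimension 2 uses the primitive polynomial x + 1.

It then applies a base-2 Owen (nested uniform) scramble truncated to 4 digits. Finally, it tests whether every elementary box of area 1/16 holds exactly one point. This illustrates those assumptions only. It is not the panobbgo Sobol heuristic, and the helper names are hypothetical.

```python
import random

BITS = 4  # 2**4 = 16 points, the quick-mode budget n=16


def sobol_2d(n_points, bits=BITS):
    """Unscrambled 2-D Sobol' points as integers in [0, 2**bits), natural ordering."""
    v1 = [1 << (bits - k) for k in range(1, bits + 1)]  # van der Corput
    v2, m = [], 1  # polynomial x + 1: m_k = (2 * m_{k-1}) XOR m_{k-1}, m_1 = 1
    for k in range(1, bits + 1):
        v2.append(m << (bits - k))
        m = (m << 1) ^ m
    pts = []
    for i in range(n_points):
        x = y = 0
        for k in range(bits):
            if (i >> k) & 1:  # XOR in direction number k for each set bit of i
                x ^= v1[k]
                y ^= v2[k]
        pts.append((x, y))
    return pts


def owen_scramble(pts, bits=BITS, seed=0):
    """Base-2 nested uniform scramble: each digit's flip depends on the digits above it."""
    rng = random.Random(seed)
    flips = {}  # (dim, depth, prefix of original digits) -> random flip bit

    def scramble(v, dim):
        out = 0
        for depth in range(bits):
            shift = bits - 1 - depth
            key = (dim, depth, v >> (shift + 1))
            if key not in flips:
                flips[key] = rng.getrandbits(1)
            out |= (((v >> shift) & 1) ^ flips[key]) << shift
        return out

    return [(scramble(x, 0), scramble(y, 1)) for x, y in pts]


def is_0m2_net(pts, bits=BITS):
    """True if every elementary box of area 2**-bits contains exactly one point."""
    for a in range(bits + 1):
        b = bits - a
        counts = {}
        for x, y in pts:
            cell = (x >> (bits - a), y >> (bits - b))
            counts[cell] = counts.get(cell, 0) + 1
        if len(counts) != 2 ** bits or any(c != 1 for c in counts.values()):
            return False
    return True


plain = sobol_2d(16)
scrambled = owen_scramble(plain, seed=1)
print(plain[:4])  # → [(0, 0), (8, 8), (4, 12), (12, 4)]
print(is_0m2_net(plain), is_0m2_net(scrambled))  # → True True
```

Both sets pass. The scramble only permutes points among and within elementary cells, so any difference between the two settings shows up in per-draw geometry, not in the stratification itself.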

The harness-randomized rotations / translations already provide per-rep instance diversity, so the per-rep "different points" benefit that motivates scrambling in the literature is largely supplied by the harness rather than the Sobol heuristic itself.
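This sketch shows how per-rep instance randomization makes a fixed point set behave differently across reps. It assumes a hypothetical wrapper f_rep(x) = f(R(x - t)), where a random rotation R and shift t are drawn from each rep's seed. The names and the shift range are illustrative, and the harness's actual transform may differ.

```python
import math
import random


def randomize_instance(f, seed, max_shift=0.1):
    """Hypothetical per-rep wrapper: f_rep(x) = f(R (x - t)) with random rotation R and shift t."""
    rng = random.Random(seed)
    theta = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(theta), math.sin(theta)
    tx, ty = rng.uniform(-max_shift, max_shift), rng.uniform(-max_shift, max_shift)

    def transform(x):
        # Shift, then rotate into the base function's frame.
        dx, dy = x[0] - tx, x[1] - ty
        return (c * dx - s * dy, s * dx + c * dy)

    def f_rep(x):
        return f(transform(x))

    return f_rep, transform


def sphere(p):
    return p[0] ** 2 + p[1] ** 2


f1, T1 = randomize_instance(sphere, seed=1)
f2, T2 = randomize_instance(sphere, seed=2)
p = (0.25, 0.75)  # the same Sobol' point lands at different places in each rep's frame
print(T1(p), T2(p))
```

Because R is orthogonal, distances between the Sobol' points are preserved. Only their placement relative to the optimum changes from rep to rep.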

What's preserved

Catalog rule unchanged. ("Sobol", "scramble", "categorical_choice") still applies (the predicate is "kwarg explicitly set", not "kwarg value is True"); the bandit can flip back to True if a future battery prefers Owen scrambling.
Sobol class default unchanged. Sobol.__init__ still defaults to scramble=True (the literature default); only the Rewarding_Diverse spec changes.
BayesOpt_Sobol (standard mode) unchanged. The quick-mode cron never exercises that strategy, so there is no ledger evidence for it yet; the conservative move is to leave it alone and let the bandit explore.
panobbgo/harness_ioh.py unchanged. Same reason — no IOH-specific ledger signal yet.
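This is a minimal sketch of the two categorical-rule properties described above. First, the predicate only checks that the kwarg is set, not its value. Second, a proposal always excludes the current value. The CHOICES table and the function names are hypothetical stand-ins for the catalog in self_improve.py.

```python
import random

# Rule key as named in this PR; the choices table is a hypothetical stand-in.
RULE = ("Sobol", "scramble", "categorical_choice")
CHOICES = {("Sobol", "scramble"): (True, False)}


def rule_applies(heuristic, kwargs, rule=RULE):
    """Predicate: the kwarg is explicitly set in the spec, whatever its value."""
    return heuristic == rule[0] and rule[1] in kwargs


def propose(heuristic, kwargs, rng):
    """Categorical proposal that always excludes the current value."""
    name = RULE[1]
    options = [v for v in CHOICES[(heuristic, name)] if v != kwargs[name]]
    return {**kwargs, name: rng.choice(options)}  # copy; the original spec is untouched


spec = ("Sobol", {"n": 16, "scramble": False})  # the codified Rewarding_Diverse spec
print(propose(*spec, random.Random(0)))  # → {'n': 16, 'scramble': True}
```

With only two choices, every proposal after codification is False → True. That is why the rule stays live and can still flip back.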

Why archive the ledger

The categorical rule's bandit arm key is ("Sobol", "scramble", "categorical_choice"), which does not distinguish proposal direction. After the codification, every future proposal on Rewarding_Diverse flips False → True. If the new bandit were primed from the un-archived ledger, its Beta posterior would carry stale "True → False is good" history into a "False → True ?" sampling regime. Archiving the ledger to planning/done/ per §12.3 step 5 lets the next nightly cron rebuild the posterior from a clean slate.
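The priming problem can be made concrete with a Beta-Bernoulli arm. Assume, hypothetically, that each ledger row carries the rule key and an accept flag, and that the arm starts from a uniform Beta(1, 1) prior. Count only the three accepts listed in the Evidence section. Under those assumptions, an un-archived arm would enter the False → True regime with a posterior mean of 0.8, even though no False → True proposal has ever been evaluated.

```python
from dataclasses import dataclass

KEY = ("Sobol", "scramble", "categorical_choice")


@dataclass
class BetaArm:
    """Hypothetical Beta-Bernoulli bandit arm; a success is an accepted proposal."""
    alpha: float = 1.0  # uniform Beta(1, 1) prior
    beta: float = 1.0

    def update(self, accepted: bool) -> None:
        if accepted:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def prime(rows, key=KEY):
    """Rebuild an arm from ledger rows; the key carries no proposal direction."""
    arm = BetaArm()
    for row in rows:
        if tuple(row["rule"]) == key:
            arm.update(row["accepted"])
    return arm


# The three True -> False accepts (iter 9 / 15 / 17), in a hypothetical row schema.
old_ledger = [{"rule": KEY, "accepted": True, "iter": i} for i in (9, 15, 17)]

stale = prime(old_ledger)  # un-archived: old-direction wins leak into the new regime
fresh = prime([])          # archived: clean slate
print(stale.mean, fresh.mean)  # → 0.8 0.5
```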

Backwards compatibility

Strictly safe at the heuristic level. The Sobol class default is unchanged. Existing tests that construct Sobol directly with explicit kwargs are unaffected. The historical composite-score baseline is expected to shift up by roughly the codified margin. That is the point of codification: per the §11 success criteria, this is exactly the kind of "sustained positive trend" the framework has been building toward.

Test plan

uv run pytest tests/test_heuristic_sobol.py tests/test_harness.py tests/test_self_improve.py -q — 258 tests pass.
uv run pytest tests/ -q --ignore=tests/test_harness_ioh.py -x — 1165 tests pass (1 skipped, 1 flaky retried & passed).
uv run sphinx-build -b doctest doc/source doc/build/doctest — 96 doctests pass.
uv run ruff check / ruff format --check / pyright on changed files — all clean.

Independence from open PRs

Checked the open PR list before picking this task:

feat(lbfgsb): multi-start L-BFGS-B gradient local optimizer + structural catalog #232 (LBFGSB multi-start) — touches heuristics/lbfgsb.py + structural catalog. No conflict.
feat(nl_shade_lbc): NL-SHADE-LBC adaptive DE (CEC 2022 winner) #233 (NL-SHADE-LBC) — touches heuristics/nl_shade_lbc.py + structural catalog. No conflict.
feat(pso): Random informer-graph topology (Mendes 2004 / Clerc 2007 / SPSO 2011) #234 (PSO random topology) — touches heuristics/pso.py + categorical rules. No conflict.
feat(self_improve): inactivity-guarded eps_accept relaxation #235 (Inactivity-guarded eps_accept) — touches self_improve.py accept gate. Light overlap in self_improve.py (one comment-only change in catalog rules), but no semantic conflict.

Documentation updated

planning/SELF_IMPROVEMENT_LOOP.md: new §13 entry "Codify Sobol.scramble=False in Rewarding_Diverse".
doc/source/guide_benchmarking.rst: categorical-rule section callout for the codified default.
doc/source/guide.rst: quick-nav mention of the codification.
doc/source/heuristics.rst: Sobol bullet refreshed.
panobbgo/heuristics/sobol.py: class docstring notes the empirical finding.
panobbgo/harness.py: _make_quick_strategies docstring cites the codification, with an inline comment on the spec.
panobbgo/self_improve.py: catalog rule comment refreshed.
TODO.md: entry at the top of "Recent Improvements (continued)".

Follow-ups left

Open as next-iteration ideas in planning/SELF_IMPROVEMENT_LOOP.md:

Replicate the codification logic for BayesOpt_Sobol / harness_ioh.py once the standard-mode loop or IOH track gathers analogous evidence.
Sobol.n codification: the ledger shows 7 accepts moving Sobol.n away from 16, but the new values (8 / 12 / 12 / 20 / 20 / 24 / 24) are split across both directions (3 down, 4 up). There is not enough signal yet to pick a single codification target; revisit after another ~50 iterations of cron data.
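A quick tally makes the "mixed directions" point concrete. This illustrative sketch counts the seven accepted Sobol.n values relative to the base of 16. It then applies an exact two-sided sign test, a check chosen here for illustration rather than part of the loop's acceptance rule. A 4-up / 3-down split is as balanced as seven outcomes can be, so the p-value is 1.0.

```python
from math import comb

BASE = 16
accepted_n = [8, 12, 12, 20, 20, 24, 24]  # new Sobol.n values from the ledger accepts


def sign_test_two_sided(k: int, n: int) -> float:
    """Exact two-sided binomial sign test under H0: up and down equally likely."""
    probs = [comb(n, i) / 2 ** n for i in range(n + 1)]
    p_obs = probs[k]
    # Sum the probabilities of all outcomes at least as extreme (no more likely) than k.
    return min(1.0, sum(p for p in probs if p <= p_obs + 1e-12))


up = sum(v > BASE for v in accepted_n)
down = sum(v < BASE for v in accepted_n)
print(up, down, sign_test_two_sided(up, up + down))  # → 4 3 1.0
```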

https://claude.ai/code/session_01Nn7zT5kguoDQgAKMaTCQqc

Generated by Claude Code

…r accepts

First application of planning/SELF_IMPROVEMENT_LOOP.md §12.3 step 2: when a self-improvement loop rule keeps winning, change the default.

Three independent positive accepts (iter=9 Δ=+0.0511, iter=15 Δ=+0.0217, iter=17 Δ=+0.0317), each with a bootstrap-CI lower bound strictly above zero and zero per-pair regression: clean wins under the §6.2 statistical acceptance rule. All three flipped True → False (the catalog rule always excludes the current value), so the data is consistent about the direction.

At n=16 in the quick-mode 2-D battery, the unscrambled Sobol' points already form a (0, 4, 2)-net. Owen scrambling keeps that stratification but moves the points within their cells, adding per-draw variance that the local heuristics then absorb.

The catalog rule (Sobol, scramble, categorical_choice) stays live so the bandit can flip back if a future battery prefers True. BayesOpt_Sobol (standard mode) and harness_ioh.py are unchanged: there is no ledger evidence on those strategies yet.

Archives the training ledger to planning/done/ per §12.3 step 5, so the next nightly cron primes the bandit from a clean slate without conflating the pre- and post-codification accept regimes.

haraldschilly wants to merge 1 commit into master from claude/funny-hamilton-ZROAT
