codify: Sobol.scramble=False in Rewarding_Diverse (first ledger-driven default change) #236
Draft
haraldschilly wants to merge 1 commit into
…r accepts

First application of planning/SELF_IMPROVEMENT_LOOP.md §12.3 step 2: when a self-improvement loop rule keeps winning, change the default.

Three independent positive accepts (iter=9 Δ=+0.0511, iter=15 Δ=+0.0217, iter=17 Δ=+0.0317), each with bootstrap-CI lower bound strictly above zero and zero per-pair regression: clean wins under the §6.2 statistical acceptance rule. All three flipped True → False (the catalog rule always excludes the current value), so the data is consistent about the direction.

At n=16 in the quick-mode 2-D battery the deterministic Sobol' grid is already optimally space-filling; Owen scrambling perturbs those grid points and adds variance the local heuristics then absorb.

The catalog rule (Sobol, scramble, categorical_choice) stays live so the bandit can flip back if a future battery prefers True. BayesOpt_Sobol (standard mode) and harness_ioh.py are unchanged; there is no ledger evidence on those strategies yet.

Archives the training ledger to planning/done/ per §12.3 step 5 so the next nightly cron primes the bandit from a clean slate without conflating the pre- and post-codification accept regimes.
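The motivation for archiving can be illustrated with a minimal Beta-Bernoulli sketch. It assumes the bandit keeps one Beta posterior per arm key and that the key carries no proposal direction. `BetaArm`, `prime`, and the ledger row fields are hypothetical names for illustration, not the API of panobbgo/self_improve.py, and the "three accepts, no rejects" ledger is a simplification of the real file.

```python
from dataclasses import dataclass

# Hypothetical arm key: (heuristic, kwarg, rule kind). It has no field for
# the proposal direction, so True->False and False->True share one arm.
KEY = ("Sobol", "scramble", "categorical_choice")


@dataclass
class BetaArm:
    """Beta-Bernoulli posterior over 'a proposal on this arm is accepted'."""

    alpha: float = 1.0  # prior pseudo-count of accepts (uniform prior)
    beta: float = 1.0   # prior pseudo-count of rejects

    def update(self, accepted: bool) -> None:
        if accepted:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def prime(ledger: list[dict]) -> dict[tuple, BetaArm]:
    """Rebuild every arm's posterior by replaying ledger rows in order."""
    arms: dict[tuple, BetaArm] = {}
    for row in ledger:
        arms.setdefault(tuple(row["arm"]), BetaArm()).update(row["accepted"])
    return arms


# Illustrative pre-codification ledger: three accepted True->False flips.
stale_ledger = [
    {"arm": KEY, "direction": "True->False", "accepted": True}
    for _ in range(3)
]

stale = prime(stale_ledger)[KEY]       # primed from the un-archived ledger
fresh = prime([]).get(KEY, BetaArm())  # primed after archiving

print(stale.mean)  # 0.8
print(fresh.mean)  # 0.5
```

In this sketch the stale posterior would start the post-codification regime believing that proposals on the arm succeed about 80% of the time, even though every new proposal now points in the opposite direction (False → True). Starting from the prior avoids that carry-over.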
Summary
First application of the planning doc §12.3 step 2 codification rule: "if a self-improvement loop rule keeps winning, change the default."
In panobbgo/harness.py, _make_quick_strategies now ships Rewarding_Diverse with (Sobol, {"n": 16, "scramble": False}) instead of scramble=True.

Evidence
Three independent positive accepts in the self-improvement loop ledger (planning/done/self_improve_ledger_2026-05-31.jsonl, iter 9 / 15 / 17 of the 2026-05 window):

- iter 9: Δ = +0.0511
- iter 15: Δ = +0.0217
- iter 17: Δ = +0.0317

Every accept had its bootstrap-CI lower bound strictly above zero and zero per-pair regression: clean wins under the §6.2 statistical acceptance rule. All three accepts flipped True → False (the categorical rule always excludes the current value), so the data is consistent about the direction.

Why False beats True at quick mode
At n=16 in the quick-mode 2-D battery the deterministic Sobol' sequence places its first 16 points at provably space-filling locations of the unit hypercube (the first 2^k points of the digital construction form exactly a low-discrepancy net). Owen scrambling preserves equidistribution in expectation but perturbs the specific positions; at small n the variance this introduces in coverage dominates the gain from breaking axis-aligned correlations. The downstream local heuristics (Random, Nearby, NelderMead) all start from those Sobol' points, so a more uniform "first looks" grid pays compound returns.

The harness-randomized rotations / translations already provide per-rep instance diversity, so the per-rep "different points" benefit that motivates scrambling in the literature is largely supplied by the harness rather than the Sobol heuristic itself.
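The space-filling claim for the unscrambled points can be checked with a small, dependency-free sketch. It builds the first two Sobol' dimensions from their standard direction numbers and counts how many cells of several 16-cell partitions of the unit square are occupied by the first 16 points. This is an illustration only: `sobol_points` and `occupied_cells` are hypothetical helpers, not panobbgo's Sobol heuristic, the points are generated in natural rather than Gray-code order (the first 2^k points are the same set either way), and scrambling is not modelled.

```python
BITS = 16  # fixed-point resolution of each coordinate


def direction_integers(dim: int) -> list[int]:
    """Direction numbers v_k = m_k / 2**k, stored as integers scaled by 2**BITS."""
    if dim == 0:
        m = [1] * BITS  # first dimension: van der Corput, m_k = 1
    elif dim == 1:
        m = [1]
        for _ in range(BITS - 1):
            m.append((2 * m[-1]) ^ m[-1])  # m_k = 2*m_{k-1} XOR m_{k-1}
    else:
        raise ValueError("this sketch covers two dimensions only")
    return [m_k << (BITS - k) for k, m_k in enumerate(m, start=1)]


def sobol_points(n: int) -> list[tuple[float, float]]:
    """First n unscrambled 2-D Sobol' points (n must not exceed 2**BITS)."""
    dirs = (direction_integers(0), direction_integers(1))
    points = []
    for i in range(n):
        coords = []
        for v in dirs:
            x = 0
            for k in range(BITS):
                if (i >> k) & 1:  # XOR in v_{k+1} for every set bit of i
                    x ^= v[k]
            coords.append(x / (1 << BITS))
        points.append((coords[0], coords[1]))
    return points


def occupied_cells(points, nx: int, ny: int) -> int:
    """Number of distinct cells of an nx-by-ny grid that contain a point."""
    return len({(int(x * nx), int(y * ny)) for x, y in points})


pts = sobol_points(16)
print(pts[:4])  # [(0.0, 0.0), (0.5, 0.5), (0.25, 0.75), (0.75, 0.25)]

# Every partition of the square into 16 equal dyadic boxes holds one point.
for nx, ny in [(16, 1), (8, 2), (4, 4), (2, 8), (1, 16)]:
    print((nx, ny), occupied_cells(pts, nx, ny))  # 16 each time
```

With 16 points and 16 cells, an occupied-cell count of 16 means exactly one point per cell, which is the net property the paragraph above relies on.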
What's preserved
- The catalog rule ("Sobol", "scramble", "categorical_choice") still applies (the predicate is "kwarg explicitly set", not "kwarg value is True"); the bandit can flip back to True if a future battery prefers Owen scrambling.
- Sobol class default unchanged. Sobol.__init__ still defaults to scramble=True (the literature default); only the Rewarding_Diverse spec changes.
- BayesOpt_Sobol (standard mode) unchanged. The quick-mode cron never exercises that strategy, so there is no ledger evidence for it yet; the conservative move is to leave it alone and let the bandit explore.
- panobbgo/harness_ioh.py unchanged. Same reason: no IOH-specific ledger signal yet.

Why archive the ledger
The categorical rule's bandit arm key is ("Sobol", "scramble", "categorical_choice"), which does not distinguish proposal direction. After the codification, every future proposal on Rewarding_Diverse flips False → True; if the new bandit were primed from the un-archived ledger, its Beta posterior would carry stale "True → False good" history into a "False → True ?" sampling regime. Archiving the ledger to planning/done/ per §12.3 step 5 lets the next nightly cron rebuild the posterior from a clean slate.

Backwards compatibility
Strictly safe at the heuristic level. The Sobol class default is unchanged. Existing tests that construct Sobol directly with explicit kwargs are unaffected. The historical composite score baseline shifts up by the codified margin; that is the point of codification (per §11 success criteria, this is exactly the kind of "sustained positive trend" the framework's been building toward).

Test plan
- uv run pytest tests/test_heuristic_sobol.py tests/test_harness.py tests/test_self_improve.py -q: 258 tests pass.
- uv run pytest tests/ -q --ignore=tests/test_harness_ioh.py -x: 1165 tests pass (1 skipped, 1 flaky retried & passed).
- uv run sphinx-build -b doctest doc/source doc/build/doctest: 96 doctests pass.
- uv run ruff check / ruff format --check / pyright on changed files: all clean.

Independence from open PRs
Checked the open PR list before picking this task:

- heuristics/lbfgsb.py + structural catalog. No conflict.
- heuristics/nl_shade_lbc.py + structural catalog. No conflict.
- heuristics/pso.py + categorical rules. No conflict.
- self_improve.py accept gate. Light overlap in self_improve.py (one comment-only change in catalog rules), but no semantic conflict.

Documentation updated
- planning/SELF_IMPROVEMENT_LOOP.md: new §13 entry "Codify Sobol.scramble=False in Rewarding_Diverse".
- doc/source/guide_benchmarking.rst: categorical-rule section callout for the codified default.
- doc/source/guide.rst: quick-nav mention of the codification.
- doc/source/heuristics.rst: Sobol bullet refreshed.
- panobbgo/heuristics/sobol.py: class docstring notes the empirical finding.
- panobbgo/harness.py: _make_quick_strategies docstring cites the codification, with an inline comment on the spec.
- panobbgo/self_improve.py: catalog rule comment refreshed.
- TODO.md: entry at the top of "Recent Improvements (continued)".

Follow-ups left
Open as next-iteration ideas in planning/SELF_IMPROVEMENT_LOOP.md:

- BayesOpt_Sobol / harness_ioh.py once the standard-mode loop or IOH track gathers analogous evidence.
- Sobol.n from 16, but the new values (8 / 12 / 12 / 20 / 20 / 24 / 24) are mixed across both directions. Not enough signal yet to pick a single codification target; revisit after another ~50 iterations of cron data.

https://claude.ai/code/session_01Nn7zT5kguoDQgAKMaTCQqc
Generated by Claude Code