codify: Sobol.scramble=False in Rewarding_Diverse (first ledger-driven default change) #236

Draft
haraldschilly wants to merge 1 commit into master from claude/funny-hamilton-ZROAT

@haraldschilly
Owner

Summary

First application of the planning doc §12.3 step 2 codification rule — "if a self-improvement loop rule keeps winning, change the default."

panobbgo/harness.py _make_quick_strategies now ships Rewarding_Diverse with (Sobol, {"n": 16, "scramble": False}) instead of scramble=True.

Evidence

Three independent positive accepts in the self-improvement loop ledger (planning/done/self_improve_ledger_2026-05-31.jsonl iter 9 / 15 / 17 of the 2026-05 window):

iter=9   Δ=+0.0511  CI=[+0.0089, +0.0933]  worst=+0.0000
iter=15  Δ=+0.0217  CI=[+0.0056, +0.0433]  worst=+0.0000
iter=17  Δ=+0.0317  CI=[+0.0050, +0.0583]  worst=+0.0000

Every accept had its bootstrap-CI lower bound strictly above zero and zero per-pair regression — clean wins under the §6.2 statistical acceptance rule. All three accepts flipped True → False (the categorical rule always excludes the current value), so the data is consistent about the direction.
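The acceptance check described above can be sketched roughly as follows. This is a minimal illustration, not the project's §6.2 implementation: the names `bootstrap_ci` and `accepts`, the percentile bootstrap, the 95% level, and the resample count are all assumptions, and the per-pair deltas in the usage lines are made up.

```python
import random
from statistics import mean


def bootstrap_ci(deltas, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the mean of paired score deltas (assumed method)."""
    rng = random.Random(seed)
    k = len(deltas)
    # Resample the pairs with replacement and record each resample's mean.
    means = sorted(mean(rng.choices(deltas, k=k)) for _ in range(n_boot))
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def accepts(deltas, **kwargs):
    """Accept only if the CI lower bound is > 0 and no single pair regressed."""
    lo, _hi = bootstrap_ci(deltas, **kwargs)
    worst = min(deltas)
    return lo > 0.0 and worst >= 0.0


# Hypothetical per-pair deltas (candidate score minus baseline score).
print(accepts([0.02, 0.05, 0.00, 0.08, 0.03, 0.06]))  # no pair regressed
print(accepts([0.05, 0.04, -0.01, 0.06]))             # one pair regressed
```

The second call is rejected by the per-pair check alone, whatever the confidence interval says, which mirrors the "worst=+0.0000" column in the ledger excerpt.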

Why False beats True at quick mode

At n=16 in the quick-mode 2-D battery, the deterministic Sobol' sequence places its first 16 points at provably space-filling locations of the unit square (the unscrambled digital construction yields a base-2 digital net whenever n = 2^k). Owen scrambling preserves equidistribution in expectation but perturbs the specific positions; at small n, the coverage variance this introduces dominates the gain from breaking axis-aligned correlations. The downstream local heuristics (Random, Nearby, NelderMead) all start from those Sobol' points, so a more uniform grid of "first looks" pays compounding returns.
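The space-filling claim for the unscrambled points can be checked directly. The sketch below is a pure-Python stand-in, not panobbgo's Sobol heuristic: `sobol_2d` and `is_dyadic_net` are hypothetical names, the second dimension uses the standard direction-number recurrence for the polynomial x + 1, and points are generated in natural rather than Gray-code order (the first 2^k points form the same set either way).

```python
def sobol_2d(n, bits=16):
    """First n points of the unscrambled 2-D Sobol' sequence (natural order)."""
    # Dimension 1: m_k = 1 for every k (van der Corput in base 2).
    m1 = [1] * bits
    # Dimension 2: m_1 = 1, m_k = 2*m_{k-1} XOR m_{k-1} (polynomial x + 1).
    m2 = [1]
    for _ in range(bits - 1):
        m2.append((2 * m2[-1]) ^ m2[-1])
    # Direction numbers stored as integers scaled by 2**bits.
    v1 = [m << (bits - k - 1) for k, m in enumerate(m1)]
    v2 = [m << (bits - k - 1) for k, m in enumerate(m2)]
    scale = float(1 << bits)
    points = []
    for i in range(n):
        x = y = 0
        for k in range(bits):
            if (i >> k) & 1:
                x ^= v1[k]
                y ^= v2[k]
        points.append((x / scale, y / scale))
    return points


def is_dyadic_net(points):
    """True if every dyadic box of area 1/n holds exactly one point (n = 2^m)."""
    n = len(points)
    m = n.bit_length() - 1
    for a in range(m + 1):
        b = m - a
        cells = {(int(x * 2**a), int(y * 2**b)) for x, y in points}
        if len(cells) != n:  # n cells exist, so n distinct cells means one each
            return False
    return True


pts = sobol_2d(16)
print(pts[:4])
print(is_dyadic_net(pts))
```

The check covers all five box shapes at n=16 (16×1, 8×2, 4×4, 2×8, 1×16 subdivisions), which is a stricter notion of coverage than a plain 4×4 grid.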

The harness-randomized rotations / translations already provide per-rep instance diversity, so the per-rep "different points" benefit that motivates scrambling in the literature is largely supplied by the harness rather than the Sobol heuristic itself.

What's preserved

  • Catalog rule unchanged. ("Sobol", "scramble", "categorical_choice") still applies (the predicate is "kwarg explicitly set", not "kwarg value is True"); the bandit can flip back to True if a future battery prefers Owen scrambling.
  • Sobol class default unchanged. Sobol.__init__ still defaults to scramble=True (the literature default); only the Rewarding_Diverse spec changes.
  • BayesOpt_Sobol (standard mode) unchanged. The quick-mode cron never exercises that strategy, so there is no ledger evidence for it yet; the conservative move is to leave it alone and let the bandit explore.
  • panobbgo/harness_ioh.py unchanged. Same reason — no IOH-specific ledger signal yet.
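To make the first bullet concrete, here is a small sketch of a "kwarg explicitly set" predicate and a categorical proposal that excludes the current value. The function names and the `[True, False]` choice list are assumptions for illustration; the real catalog code in panobbgo/self_improve.py may be structured differently.

```python
def rule_applies(spec_kwargs, rule):
    """Hypothetical predicate: the rule fires when the kwarg is explicitly set."""
    _heuristic, kwarg, _kind = rule
    return kwarg in spec_kwargs


def categorical_proposals(current, choices):
    """Candidate values for a categorical rule, always excluding the current one."""
    return [c for c in choices if c != current]


RULE = ("Sobol", "scramble", "categorical_choice")
new_spec = {"n": 16, "scramble": False}  # kwargs from the codified spec

# The rule still applies after codification because the kwarg is present.
print(rule_applies(new_spec, RULE))
# The only remaining proposal is the flip back to True.
print(categorical_proposals(new_spec["scramble"], [True, False]))
# A spec that omits the kwarg would not be touched under this predicate.
print(rule_applies({"n": 16}, RULE))
```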

Why archive the ledger

The categorical rule's bandit arm key is ("Sobol", "scramble", "categorical_choice"), which does not distinguish proposal direction. After the codification, every future proposal on Rewarding_Diverse flips False → True; if the new bandit were primed from the un-archived ledger, its Beta posterior would carry stale "True → False good" history into a "False → True ?" sampling regime. Archiving the ledger to planning/done/ per §12.3 step 5 lets the next nightly cron rebuild the posterior from a clean slate.
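The effect of priming from a stale ledger can be illustrated with a toy Beta posterior. Everything here is an assumption made for the sketch: the Beta(1, 1) prior, the row layout, and the `prime_posterior` helper are hypothetical, and only the three accepts named in this PR are replayed (any rejects in the real ledger are omitted).

```python
from collections import defaultdict

ARM = ("Sobol", "scramble", "categorical_choice")  # key carries no direction


def prime_posterior(ledger_rows, prior=(1.0, 1.0)):
    """Replay ledger rows into per-arm Beta(alpha, beta) counts."""
    posterior = defaultdict(lambda: list(prior))
    for row in ledger_rows:
        counts = posterior[row["arm"]]
        if row["accepted"]:
            counts[0] += 1.0  # success raises alpha
        else:
            counts[1] += 1.0  # failure raises beta
    return posterior


def posterior_mean(posterior, arm):
    alpha, beta = posterior[arm]
    return alpha / (alpha + beta)


# Pre-codification accepts: all three were True -> False flips.
stale_rows = [{"arm": ARM, "iter": i, "accepted": True} for i in (9, 15, 17)]

stale = prime_posterior(stale_rows)
clean = prime_posterior([])  # ledger archived, nothing to replay

print(posterior_mean(stale, ARM))  # optimism earned by True -> False flips
print(posterior_mean(clean, ARM))  # prior mean only
```

Under these assumptions the stale posterior would credit the arm with three successes even though every future pull now tests the opposite flip, which is the conflation the archive step avoids.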

Backwards compatibility

Strictly safe at the heuristic level. The Sobol class default is unchanged. Existing tests that construct Sobol directly with explicit kwargs are unaffected. The historical composite score baseline shifts up by the codified margin — that's the point of codification (per §11 success criteria, this is exactly the kind of "sustained positive trend" the framework's been building toward).

Test plan

  • uv run pytest tests/test_heuristic_sobol.py tests/test_harness.py tests/test_self_improve.py -q — 258 tests pass.
  • uv run pytest tests/ -q --ignore=tests/test_harness_ioh.py -x — 1165 tests pass (1 skipped, 1 flaky retried & passed).
  • uv run sphinx-build -b doctest doc/source doc/build/doctest — 96 doctests pass.
  • uv run ruff check / ruff format --check / pyright on changed files — all clean.

Independence from open PRs

Checked the open PR list before picking this task:

Documentation updated

  • planning/SELF_IMPROVEMENT_LOOP.md: new §13 entry "Codify Sobol.scramble=False in Rewarding_Diverse".
  • doc/source/guide_benchmarking.rst: categorical-rule section callout for the codified default.
  • doc/source/guide.rst: quick-nav mention of the codification.
  • doc/source/heuristics.rst: Sobol bullet refreshed.
  • panobbgo/heuristics/sobol.py: class docstring notes the empirical finding.
  • panobbgo/harness.py: _make_quick_strategies docstring cites the codification, with an inline comment on the spec.
  • panobbgo/self_improve.py: catalog rule comment refreshed.
  • TODO.md: entry at the top of "Recent Improvements (continued)".

Follow-ups left

Open as next-iteration ideas in planning/SELF_IMPROVEMENT_LOOP.md:

  • Replicate the codification logic for BayesOpt_Sobol / harness_ioh.py once the standard-mode loop or IOH track gathers analogous evidence.
  • Sobol.n codification — the ledger shows 7 accepts on Sobol.n from 16, but the new values (8 / 12 / 12 / 20 / 20 / 24 / 24) are mixed across both directions. Not enough signal yet to pick a single codification target; revisit after another ~50 iterations of cron data.
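The "mixed across both directions" observation in the Sobol.n item is simple arithmetic on the values quoted there; a quick tally (variable names are illustrative only):

```python
from collections import Counter

current_n = 16
accepted_values = [8, 12, 12, 20, 20, 24, 24]  # new values quoted above

# Classify each accepted proposal by whether it lowered or raised n.
directions = Counter(
    "down" if value < current_n else "up" for value in accepted_values
)
print(directions["down"], directions["up"])
```

With three accepts below 16 and four above, neither direction has a clear majority, which matches the decision to wait for more cron data.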

https://claude.ai/code/session_01Nn7zT5kguoDQgAKMaTCQqc


Generated by Claude Code

…r accepts

First application of planning/SELF_IMPROVEMENT_LOOP.md §12.3 step 2:
when a self-improvement loop rule keeps winning, change the default.

Three independent positive accepts (iter=9 Δ=+0.0511, iter=15 Δ=+0.0217,
iter=17 Δ=+0.0317), each with bootstrap-CI lower bound strictly above
zero and zero per-pair regression — clean wins under the §6.2
statistical acceptance rule.  All three flipped True → False (the
catalog rule always excludes the current value), so the data is
consistent about the direction.

At n=16 in the quick-mode 2-D battery the deterministic Sobol' grid is
already optimally space-filling; Owen scrambling perturbs those grid
points and adds variance the local heuristics then absorb.  The
catalog rule (Sobol, scramble, categorical_choice) stays live so the
bandit can flip back if a future battery prefers True.

BayesOpt_Sobol (standard mode) and harness_ioh.py are unchanged — no
ledger evidence on those strategies yet.

Archives the training ledger to planning/done/ per §12.3 step 5 so the
next nightly cron primes the bandit from a clean slate without
conflating the pre- and post-codification accept regimes.