Podracing II: Electric Bugaloo — 0.9625 BPB (3-seed mean, all sub-0.964)#753
Open
newjordan wants to merge 8 commits into openai:main from
Conversation
Multi-order backoff (orders 2-7) + entropy-adaptive alpha on the 11-layer/512-dim U-Net. All 3 seeds sub-1.0. GPTQ calibration runs inside the training phase.

Seeds: 42 = 0.9631, 2045 = 0.9620, 7 = 0.9624; mean = 0.9625

Credits: @deanbrr openai#659, @Asukabot0 openai#727, @signalrush openai#414

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
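The entropy-adaptive alpha described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: the sigmoid parameters (`center`, `scale`, `alpha_max`) and function names are assumptions, and only the general shape (uncertain model → more weight on the n-gram) is taken from the PR text.

```python
import numpy as np

def entropy_adaptive_alpha(p_neural, center=3.0, scale=1.0, alpha_max=0.5):
    """Map the model's predictive entropy to a mixing weight via a sigmoid.

    High entropy (uncertain model) -> larger alpha (trust the n-gram more).
    Parameter values are illustrative, not the PR's tuned settings.
    """
    eps = 1e-12
    entropy = -np.sum(p_neural * np.log2(p_neural + eps), axis=-1)
    return alpha_max / (1.0 + np.exp(-(entropy - center) / scale))

def mix(p_neural, p_ngram, alpha):
    """Linear target-probability mixing (the form a follow-up commit settles on)."""
    return (1.0 - alpha) * p_neural + alpha * p_ngram
```

A near-uniform byte distribution (entropy ~8 bits) would get an alpha close to `alpha_max`, while a sharply peaked one (entropy near 0) would get an alpha near zero.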
Force-pushed from f9f804a to ed062df
ZERO changes to model, training loop, optimizer, compile, or anything outside the eval function. The C-step is pure numpy on CPU.

Patch adds:
- 5 env vars (CUBRIC_CADENCE, COUNT_DECAY, BOOST/PRUNE/REWEIGHT)
- _cubric_c_step() function (numpy, CPU-only)
- Buffering + firing logic inside eval_val_sliding_hashed_ngram
- Training path is byte-identical to train_gpt.py

Usage: CUBRIC_CADENCE=4 to enable; CUBRIC_CADENCE=0 (default) = off.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
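The buffering-and-firing pattern described above might look something like this structural sketch. The function name, signature, and decay semantics are assumptions for illustration; the commit does not show the body of `_cubric_c_step`, so only the cadence gating (cadence 0 = untouched eval path) is taken from the text.

```python
import numpy as np

def maybe_fire_c_step(step, counts, buffer, cadence, decay=0.99):
    """Buffer per-eval count arrays and fire a CPU-side merge every `cadence` steps.

    cadence == 0 disables the feature entirely: counts pass through unchanged,
    matching the commit's claim that the default is byte-identical to the original.
    Hypothetical sketch; the real C-step logic is not shown in the PR.
    """
    if cadence == 0:
        return counts  # feature off: identical to the unpatched eval path
    buffer.append(counts)
    if step % cadence == 0 and buffer:
        merged = np.sum(buffer, axis=0) * decay  # decay-weighted merge (assumed)
        buffer.clear()
        return merged
    return counts
```

In the real script the cadence and decay would be read from the env vars listed above, e.g. `cadence = int(os.environ.get("CUBRIC_CADENCE", "0"))`.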
Tests order (8, 9), buckets (8M, 16M), min_count (1, 3), alpha range, and entropy sigmoid params. All eval-time; no training changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
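The discrete part of that grid can be enumerated with a simple Cartesian product. The dictionary keys are illustrative names, not the script's actual flags:

```python
from itertools import product

# Hypothetical sweep over the discrete grid listed above.
grid = {
    "order": (8, 9),
    "buckets": (8_000_000, 16_000_000),
    "min_count": (1, 3),
}
configs = [dict(zip(grid, vals)) for vals in product(*grid.values())]
# 2 x 2 x 2 = 8 eval-time configurations, no retraining needed for any of them
```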
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
No more copies. Cubric env vars + C-step function + eval wiring added directly to the production script. CUBRIC_CADENCE=0 (default) = off, identical to the original. The run script points to the real train_gpt.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
0.9625 mean BPB. Backoff 2-7 + entropy-adaptive alpha. Three identical copies for safety.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pure deletion: 166 lines of dead code removed, zero functional change. TTT eval was gated behind `if args.ttt_eval_enabled:`, which was always False. The function `eval_val_sliding_ttt` and all TTT parameter parsing are removed. N-gram backoff eval, GPTQ, and all scoring paths are unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SOTA untouched. Each test is a separate copy:
- train_gpt_baseline.py (clean SOTA copy, control)
- train_gpt_cadence4.py (SOTA + cubric C-step, cadence=4)
- train_gpt_cadence10.py (SOTA + cubric C-step, cadence=10)

Each has its own run script. HYPOTHESES.md documents everything.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
travispchen added a commit to travispchen/parameter-golf that referenced this pull request on Mar 25, 2026:
…ed mean) N-gram7 BPB: 0.9370 (±0.0003) across seeds 1337/42/2025. Sliding BPB: 1.1222 (±0.0003). Artifact: ~15.9 MB (within the 16 MB cap). Training: 600s on 8xH100.

Key innovation: order-adaptive entropy gating assigns a different entropy threshold per n-gram order. High-order matches (7-gram) are trusted at moderate model confidence; low-order matches (2-gram) are trusted only when the model is very uncertain. Built on PR openai#753 (Podracing II) with XSA extended to all 11 layers and entropy_center=3.0.

Co-Authored-By: Travis Chen <travispchen@gmail.com>
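The per-order gating idea reads as a lookup of an entropy threshold keyed by match length. A minimal sketch, assuming illustrative threshold values (the commit gives none) and a hypothetical function name:

```python
def order_adaptive_gate(order, model_entropy_bits, thresholds=None):
    """Decide whether an order-`order` n-gram match may fire.

    Longer matches (high order) fire even when the model is fairly confident
    (low entropy); short matches fire only when the model is very uncertain.
    Threshold values below are illustrative, not the commit's tuned numbers.
    """
    if thresholds is None:
        # minimum model entropy (bits) required before an order-k match is used
        thresholds = {2: 6.0, 3: 5.0, 4: 4.0, 5: 3.0, 6: 2.5, 7: 2.0}
    return model_entropy_bits >= thresholds[order]
```

Under these sample thresholds, a 7-gram match fires at 2.5 bits of model entropy while a 2-gram match needs at least 6 bits.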
ahmettrkck added a commit to ahmettrkck/parameter-golf that referenced this pull request on Mar 25, 2026:
Logistic-domain mixing was wrong for target-probability mixing. PR openai#753 uses linear mixing: p_mixed = (1-a)*p_neural + a*p_ngram. The CTW-inspired depth-adaptive alpha boost is kept.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
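The distinction can be made concrete. Below, `mix_linear` is the probability-domain form the commit adopts (given explicitly in the message); `mix_log_domain` is one plausible reading of the reverted "logistic domain" variant, a geometric (log-space) mix with renormalization, included only to show that the two disagree:

```python
import numpy as np

def mix_linear(p_neural, p_ngram, alpha):
    """Linear probability-domain mixing, as stated in PR #753."""
    return (1.0 - alpha) * p_neural + alpha * p_ngram

def mix_log_domain(p_neural, p_ngram, alpha, eps=1e-12):
    """Geometric (log-space) mixing with renormalization.

    An assumed reading of the reverted variant, shown for contrast only.
    """
    log_mix = (1.0 - alpha) * np.log(p_neural + eps) + alpha * np.log(p_ngram + eps)
    p = np.exp(log_mix)
    return p / p.sum(axis=-1, keepdims=True)
```

Both produce valid distributions, but they assign different probabilities whenever the two inputs disagree, which is why swapping one for the other changes BPB.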
Results
Progression
What Changed vs Podracing I
Two eval-time improvements, no training changes: multi-order backoff (orders 2-7) and entropy-adaptive alpha mixing.
Compliance
Credits
Reproduce
8xH100 SXM, 600s training + ~140s eval.