
Podracing II: Electric Bugaloo — 0.9625 BPB (3-seed mean, all sub-0.964)#753

Open
newjordan wants to merge 8 commits into openai:main from newjordan:submission/podracing-ii

Conversation


newjordan commented Mar 25, 2026

podracing

Results

| Seed | Sliding BPB | 7-gram Backoff BPB | Artifact |
|------|-------------|--------------------|----------|
| 42   | 1.1210      | 0.9631             | 15.59 MB |
| 2045 | 1.1196      | 0.9620             | 15.71 MB |
| 7    | 1.1202      | 0.9624             | 15.59 MB |
| Mean | 1.1203      | 0.9625             |          |

Progression

| PR         | Mean BPB       | Notes                             |
|------------|----------------|-----------------------------------|
| #190       |                | The Stinky Frost Recipe           |
| #390, #401 | 1.1295, 1.1243 | Sponge Bath TTT + EMA/SWA/QAT     |
| #445       | 1.1236         | Late Training Replay + GPTQ-lite  |
| #498, #499 | 1.1378         | The Frugendorff                   |
| #508, #578 | 1.1215         | GPTQ + Early QAT + Legal TTT      |
| #533, #577 | 1.1207         | GPTQ + Short TTT                  |
| #587       | 1.1208         | XSA + quantization tuning         |
| #656       | 1.1195         | Three Breadsticks                 |
| #706       | 1.0461         | Podracing I (fixed 5-gram)        |
| #753       | 0.9625         | Podracing II (backoff + adaptive) |

What Changed vs Podracing I

Two eval-time improvements, no training changes:

  1. Multi-order backoff (2-7): longest context first, cascade on miss
  2. Entropy-adaptive alpha: trust n-gram more when model is uncertain
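The two improvements above can be sketched together as follows. This is a minimal illustration, not the PR's actual implementation: the table structure, the `alpha_max`/`center`/`scale` parameters, and both function names are my own illustrative choices; only the longest-context-first cascade over orders 7..2 and the entropy-gated alpha come from the description above.

```python
import numpy as np

def adaptive_alpha(p_model, alpha_max=0.5, center=3.0, scale=1.0):
    # Entropy (bits) of the model's predictive distribution: high entropy
    # means the model is uncertain, so the n-gram gets more weight.
    h = -np.sum(p_model * np.log2(np.maximum(p_model, 1e-12)))
    # Sigmoid gate: alpha rises toward alpha_max as entropy exceeds `center`.
    return alpha_max / (1.0 + np.exp(-(h - center) / scale))

def backoff_lookup(context, tables, min_count=1):
    # Longest context first (order 7), cascading to shorter orders on miss.
    for order in range(7, 1, -1):
        key = tuple(context[-(order - 1):])
        entry = tables[order].get(key)
        if entry is not None and entry["count"] >= min_count:
            return entry["probs"], order
    return None, 0
```

On a hit, the n-gram distribution would then be mixed with the model's output as `p_mixed = (1 - a) * p_model + a * p_ngram`, with `a` from `adaptive_alpha`.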

Compliance

  • Score-first, backward-looking cache
  • Alpha from model entropy only — no target access
  • GPTQ calibration inside training phase
  • Training logs + submission.json included

Credits

Reproduce

```shell
SEED=2045 MLP_ACT=leaky_relu_sq MLP_LEAKY_SLOPE=0.5 XSA_LAST_N=4 BIGRAM_VOCAB_SIZE=1536 ROPE_DIMS=24 NGRAM_EVAL_ORDER=7 NGRAM_EVAL_ADAPTIVE=1 TTT_EVAL_ENABLED=0 torchrun --nproc_per_node=8 train_gpt.py
```

8xH100 SXM, 600s training + ~140s eval.

Multi-order backoff (2-7) + entropy-adaptive alpha on 11L/512d U-Net.
All 3 seeds sub-1.0. GPTQ calibration inside training phase.

Seeds: 42=0.9631, 2045=0.9620, 7=0.9624, mean=0.9625

Credits: @deanbrr openai#659, @Asukabot0 openai#727, @signalrush openai#414

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
newjordan force-pushed the submission/podracing-ii branch from f9f804a to ed062df on March 25, 2026 18:04
newjordan changed the title from "Podracing II: Electric Bugaloo — 0.9620 BPB (best seed), mean 0.9823" to "Podracing II: Electric Bugaloo — 0.9625 BPB (3-seed mean, all sub-0.964)" on Mar 25, 2026
Octavian and others added 7 commits March 25, 2026 13:24
ZERO changes to model, training loop, optimizer, compile, or anything
outside the eval function. The C-step is pure numpy on CPU.

Patch adds:
- 5 env vars (CUBRIC_CADENCE, COUNT_DECAY, BOOST/PRUNE/REWEIGHT)
- _cubric_c_step() function (numpy, CPU-only)
- Buffering + firing logic inside eval_val_sliding_hashed_ngram
- Training path is byte-identical to train_gpt.py

Usage: CUBRIC_CADENCE=4 to enable, CUBRIC_CADENCE=0 (default) = off

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests order (8,9), buckets (8M,16M), min_count (1,3), alpha range,
entropy sigmoid params. All eval-time, no training changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
No more copies. Cubric env vars + C-step function + eval wiring added
directly to the production script. CUBRIC_CADENCE=0 (default) = off,
identical to original. Run script points to real train_gpt.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
0.9625 mean BPB. Backoff 2-7 + entropy-adaptive alpha.
Three identical copies for safety.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pure deletion — 166 lines of dead code removed, zero functional change.
TTT eval was gated behind `if args.ttt_eval_enabled:` which was always False.
The function `eval_val_sliding_ttt` and all TTT parameter parsing removed.
N-gram backoff eval, GPTQ, and all scoring paths unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SOTA untouched. Each test is a separate copy:
- train_gpt_baseline.py (clean SOTA copy, control)
- train_gpt_cadence4.py (SOTA + cubric C-step, cadence=4)
- train_gpt_cadence10.py (SOTA + cubric C-step, cadence=10)

Each has its own run script. HYPOTHESES.md documents everything.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
travispchen added a commit to travispchen/parameter-golf that referenced this pull request Mar 25, 2026
…ed mean)

N-gram7 BPB: 0.9370 (±0.0003) across seeds 1337/42/2025
Sliding BPB: 1.1222 (±0.0003)
Artifact: ~15.9 MB (within 16MB cap)
Training: 600s on 8xH100

Key innovation: order-adaptive entropy gating assigns different
entropy thresholds per n-gram order. High-order matches (7-gram)
trusted at moderate model confidence; low-order matches (2-gram)
only trusted when model is very uncertain.

Built on PR openai#753 (Podracing II) with XSA extended to all 11 layers
and entropy_center=3.0.

Co-Authored-By: Travis Chen <travispchen@gmail.com>
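The order-adaptive gating described in that commit could look roughly like this. The per-order threshold table is purely illustrative (the commit only names `entropy_center=3.0`); the intent it encodes is the one stated above: a 7-gram hit is trusted at moderate model entropy, a 2-gram hit only at high entropy.

```python
import math

# Hypothetical per-order entropy centers (bits). Higher center = the model
# must be more uncertain before that order's n-gram prediction is trusted.
ORDER_CENTERS = {7: 2.0, 6: 2.5, 5: 3.0, 4: 3.5, 3: 4.0, 2: 4.5}

def gated_alpha(entropy_bits, order, alpha_max=0.5, scale=1.0):
    # Same sigmoid gate as a single-threshold scheme, but the center
    # shifts with the matched n-gram order.
    center = ORDER_CENTERS[order]
    return alpha_max / (1.0 + math.exp(-(entropy_bits - center) / scale))
```

At a fixed model entropy, this gives a 7-gram match a larger mixing weight than a 2-gram match, which is the stated design.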
ahmettrkck added a commit to ahmettrkck/parameter-golf that referenced this pull request Mar 25, 2026
Logistic domain mixing was wrong for target-probability mixing.
PR openai#753 uses linear: p_mixed = (1-a)*p_neural + a*p_ngram.
Keep CTW-inspired depth-adaptive alpha boost.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
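The linear mixing rule that commit switches to, in miniature (variable names are mine; the formula is the one quoted above from PR openai#753):

```python
import numpy as np

def mix_linear(p_neural, p_ngram, a):
    # Convex combination in probability space: stays a valid distribution
    # for any a in [0, 1], unlike mixing in the logit/log-odds domain.
    return (1.0 - a) * p_neural + a * p_ngram
```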