
Podracing II: Electric Bugaloo — 0.9625 BPB (3-seed mean, all sub-0.964)#753

Open
newjordan wants to merge 8 commits into openai:main from newjordan:submission/podracing-ii

Conversation


newjordan commented Mar 25, 2026

podracing

Results

| Seed | Sliding BPB | 7-gram Backoff BPB | Artifact |
|------|-------------|--------------------|----------|
| 42   | 1.1210      | 0.9631             | 15.59 MB |
| 2045 | 1.1196      | 0.9620             | 15.71 MB |
| 7    | 1.1202      | 0.9624             | 15.59 MB |
| Mean | 1.1203      | 0.9625             |          |

Progression

| PR         | Mean BPB       | Notes                             |
|------------|----------------|-----------------------------------|
| #190       |                | The Stinky Frost Recipe           |
| #390, #401 | 1.1295, 1.1243 | Sponge Bath TTT + EMA/SWA/QAT     |
| #445       | 1.1236         | Late Training Replay + GPTQ-lite  |
| #498, #499 | 1.1378         | The Frugendorff                   |
| #508, #578 | 1.1215         | GPTQ + Early QAT + Legal TTT      |
| #533, #577 | 1.1207         | GPTQ + Short TTT                  |
| #587       | 1.1208         | XSA + quantization tuning         |
| #656       | 1.1195         | Three Breadsticks                 |
| #706       | 1.0461         | Podracing I (fixed 5-gram)        |
| #753       | 0.9625         | Podracing II (backoff + adaptive) |

What Changed vs Podracing I

Two eval-time improvements, no training changes:

  1. Multi-order backoff (2-7): longest context first, cascade on miss
  2. Entropy-adaptive alpha: trust n-gram more when model is uncertain
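The two improvements above can be sketched together as follows. This is a minimal illustration, not the PR's actual implementation: the table structure, the `alpha_max`/`center`/`scale` parameters, and both function names are my own illustrative choices; only the longest-context-first cascade over orders 7..2 and the entropy-gated alpha come from the description above.

```python
import numpy as np

def adaptive_alpha(p_model, alpha_max=0.5, center=3.0, scale=1.0):
    # Entropy (bits) of the model's predictive distribution: high entropy
    # means the model is uncertain, so the n-gram gets more weight.
    h = -np.sum(p_model * np.log2(np.maximum(p_model, 1e-12)))
    # Sigmoid gate: alpha rises toward alpha_max as entropy exceeds `center`.
    return alpha_max / (1.0 + np.exp(-(h - center) / scale))

def backoff_lookup(context, tables, min_count=1):
    # Longest context first (order 7), cascading to shorter orders on miss.
    for order in range(7, 1, -1):
        key = tuple(context[-(order - 1):])
        entry = tables[order].get(key)
        if entry is not None and entry["count"] >= min_count:
            return entry["probs"], order
    return None, 0
```

On a hit, the n-gram distribution would then be mixed with the model's output as `p_mixed = (1 - a) * p_model + a * p_ngram`, with `a` from `adaptive_alpha`.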

Compliance

  • Score-first, backward-looking cache
  • Alpha from model entropy only — no target access
  • GPTQ calibration inside training phase
  • Training logs + submission.json included

Credits

Reproduce

```shell
SEED=2045 MLP_ACT=leaky_relu_sq MLP_LEAKY_SLOPE=0.5 XSA_LAST_N=4 BIGRAM_VOCAB_SIZE=1536 ROPE_DIMS=24 NGRAM_EVAL_ORDER=7 NGRAM_EVAL_ADAPTIVE=1 TTT_EVAL_ENABLED=0 torchrun --nproc_per_node=8 train_gpt.py
```

8xH100 SXM, 600s training + ~140s eval.

Multi-order backoff (2-7) + entropy-adaptive alpha on 11L/512d U-Net.
All 3 seeds sub-1.0. GPTQ calibration inside training phase.

Seeds: 42=0.9631, 2045=0.9620, 7=0.9624, mean=0.9625

Credits: @deanbrr openai#659, @Asukabot0 openai#727, @signalrush openai#414

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
newjordan force-pushed the submission/podracing-ii branch from f9f804a to ed062df on March 25, 2026 18:04
newjordan changed the title from "Podracing II: Electric Bugaloo — 0.9620 BPB (best seed), mean 0.9823" to "Podracing II: Electric Bugaloo — 0.9625 BPB (3-seed mean, all sub-0.964)" on Mar 25, 2026
Octavian and others added 7 commits March 25, 2026 13:24
ZERO changes to model, training loop, optimizer, compile, or anything
outside the eval function. The C-step is pure numpy on CPU.

Patch adds:
- 5 env vars (CUBRIC_CADENCE, COUNT_DECAY, BOOST/PRUNE/REWEIGHT)
- _cubric_c_step() function (numpy, CPU-only)
- Buffering + firing logic inside eval_val_sliding_hashed_ngram
- Training path is byte-identical to train_gpt.py

Usage: CUBRIC_CADENCE=4 to enable, CUBRIC_CADENCE=0 (default) = off

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests order (8,9), buckets (8M,16M), min_count (1,3), alpha range,
entropy sigmoid params. All eval-time, no training changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
No more copies. Cubric env vars + C-step function + eval wiring added
directly to the production script. CUBRIC_CADENCE=0 (default) = off,
identical to original. Run script points to real train_gpt.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
0.9625 mean BPB. Backoff 2-7 + entropy-adaptive alpha.
Three identical copies for safety.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pure deletion — 166 lines of dead code removed, zero functional change.
TTT eval was gated behind `if args.ttt_eval_enabled:` which was always False.
The function `eval_val_sliding_ttt` and all TTT parameter parsing removed.
N-gram backoff eval, GPTQ, and all scoring paths unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SOTA untouched. Each test is a separate copy:
- train_gpt_baseline.py (clean SOTA copy, control)
- train_gpt_cadence4.py (SOTA + cubric C-step, cadence=4)
- train_gpt_cadence10.py (SOTA + cubric C-step, cadence=10)

Each has its own run script. HYPOTHESES.md documents everything.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
travispchen added a commit to travispchen/parameter-golf that referenced this pull request Mar 25, 2026
…ed mean)

N-gram7 BPB: 0.9370 (±0.0003) across seeds 1337/42/2025
Sliding BPB: 1.1222 (±0.0003)
Artifact: ~15.9 MB (within 16MB cap)
Training: 600s on 8xH100

Key innovation: order-adaptive entropy gating assigns different
entropy thresholds per n-gram order. High-order matches (7-gram)
trusted at moderate model confidence; low-order matches (2-gram)
only trusted when model is very uncertain.

Built on PR openai#753 (Podracing II) with XSA extended to all 11 layers
and entropy_center=3.0.

Co-Authored-By: Travis Chen <travispchen@gmail.com>
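The order-adaptive gating described in that commit could look roughly like this. The per-order threshold table is purely illustrative (the commit only names `entropy_center=3.0`); the intent it encodes is the one stated above: a 7-gram hit is trusted at moderate model entropy, a 2-gram hit only at high entropy.

```python
import math

# Hypothetical per-order entropy centers (bits). Higher center = the model
# must be more uncertain before that order's n-gram prediction is trusted.
ORDER_CENTERS = {7: 2.0, 6: 2.5, 5: 3.0, 4: 3.5, 3: 4.0, 2: 4.5}

def gated_alpha(entropy_bits, order, alpha_max=0.5, scale=1.0):
    # Same sigmoid gate as a single-threshold scheme, but the center
    # shifts with the matched n-gram order.
    center = ORDER_CENTERS[order]
    return alpha_max / (1.0 + math.exp(-(entropy_bits - center) / scale))
```

At a fixed model entropy, this gives a 7-gram match a larger mixing weight than a 2-gram match, which is the stated design.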
ahmettrkck added a commit to ahmettrkck/parameter-golf that referenced this pull request Mar 25, 2026
Logistic domain mixing was wrong for target-probability mixing.
PR openai#753 uses linear: p_mixed = (1-a)*p_neural + a*p_ngram.
Keep CTW-inspired depth-adaptive alpha boost.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
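The linear mixing rule that commit switches to, in miniature (variable names are mine; the formula is the one quoted above from PR openai#753):

```python
import numpy as np

def mix_linear(p_neural, p_ngram, a):
    # Convex combination in probability space: stays a valid distribution
    # for any a in [0, 1], unlike mixing in the logit/log-odds domain.
    return (1.0 - a) * p_neural + a * p_ngram
```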