
Podracing: 1.0461 BPB (3-seed mean) — 5-gram eval + LeakyReLU²#706

Open
newjordan wants to merge 2 commits into openai:main from newjordan:submission/podracing

Conversation

@newjordan

Results

Seed   Sliding BPB   5-gram BPB   Artifact
1337   1.1190        1.0451       15.63 MB
42     1.1217        1.0471       15.59 MB
2045   1.1200        1.0460       15.64 MB
Mean   1.1202        1.0461
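For reference, bits per byte (BPB) is the summed negative log-likelihood of the eval text, converted from nats to bits and normalized by the byte count. A minimal sketch of the conversion (function name and numbers are illustrative, not taken from the repo):

```python
import math

def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed NLL in nats over an eval set to bits per byte."""
    return total_nll_nats / (total_bytes * math.log(2))

# Hypothetical numbers purely for illustration.
bpb = bits_per_byte(total_nll_nats=7.25e5, total_bytes=1_000_000)
```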

Progression

PR          Mean BPB         Notes
#190                         The Stinky Frost Recipe
#390, #401  1.1295, 1.1243   Sponge Bath TTT + EMA/SWA/QAT
#445        1.1236           Late Training Replay + EMA + GPTQ-lite
#498, #499  1.1478           The Frugendorff (recursive weight sharing)
#508, #578  1.1215           GPTQ + Early QAT + Legal TTT
#533, #577  1.1207           GPTQ + Short TTT
#587        1.1208           XSA + quantization tuning
#656        1.1195           Three Breadsticks (activation + eval)
This PR     1.0461           Podracing (5-gram eval interpolation)

5-gram Eval (score-first, legal)

Fixed-weight hashed n-gram interpolation during sliding window eval. Concept credited to @deanbrr (PR #659).

  • Cache built from already-scored tokens only (backward-looking)
  • Fixed alpha=0.20: always p_final = 0.80 * p_model + 0.20 * p_ngram
  • No safety gate, no target-aware selection, no min-NLL comparison
  • Hashed count-min sketch (4M buckets), min_count=2
  • Score-first legality: cache updated only AFTER segment scoring
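The bullets above can be sketched as a small standalone module. This is a hedged illustration, not the PR's actual implementation (which lives in train_gpt.py): all function names are hypothetical, and a single hashed count row stands in for the full count-min sketch.

```python
import hashlib

ORDER = 5            # n-gram order
BUCKETS = 4_194_304  # 4M hash buckets (NGRAM_EVAL_BUCKETS)
ALPHA = 0.20         # fixed interpolation weight (NGRAM_EVAL_ALPHA)
MIN_COUNT = 2        # contexts seen fewer times than this are ignored

context_counts = [0] * BUCKETS  # how often each hashed 4-token context was seen
hit_counts = {}                 # (context_bucket, next_token) -> count

def _bucket(tokens) -> int:
    """Hash a context tuple into one of BUCKETS slots."""
    h = hashlib.blake2b(str(tokens).encode("utf8"), digest_size=8)
    return int.from_bytes(h.digest(), "big") % BUCKETS

def ngram_prob(context, token):
    """P(token | last 4 tokens) from the cache, or None if too rare."""
    b = _bucket(tuple(context[-(ORDER - 1):]))
    c = context_counts[b]
    if c < MIN_COUNT:
        return None
    return hit_counts.get((b, token), 0) / c

def mix(p_model, p_ngram):
    """Fixed-weight interpolation: no safety gate, no min-NLL selection."""
    if p_ngram is None:
        return p_model
    return (1 - ALPHA) * p_model + ALPHA * p_ngram

def update_cache(tokens):
    """Called only AFTER a segment is scored (score-first legality)."""
    for i in range(ORDER - 1, len(tokens)):
        b = _bucket(tuple(tokens[i - ORDER + 1:i]))
        context_counts[b] += 1
        key = (b, tokens[i])
        hit_counts[key] = hit_counts.get(key, 0) + 1
```

Because `update_cache` runs only after a segment has been scored, the n-gram estimate for any position is built exclusively from already-scored (backward-looking) tokens.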

Architecture

11L/512d U-Net, 26.93M params. LeakyReLU² (slope 0.5), XSA last 4, BigramHash 1536. GPTQ int6+zstd, late QAT. TTT disabled.
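The exact LeakyReLU² definition is in the PR's train_gpt.py; one plausible reading, assumed here, is a LeakyReLU (slope 0.5 on the negative side, per MLP_LEAKY_SLOPE) followed by a sign-preserving square:

```python
def leaky_relu_sq(x: float, slope: float = 0.5) -> float:
    """Hypothetical LeakyReLU²: LeakyReLU, then square the magnitude
    while keeping the sign (so the function stays monotonic)."""
    y = x if x >= 0.0 else slope * x   # standard LeakyReLU
    return y * y if y >= 0.0 else -(y * y)
```

Squared activations of this family (cf. squared ReLU) sharpen the positive branch while the leak keeps gradient flow on the negative side.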

Reproduce

SEED=2045 MLP_ACT=leaky_relu_sq MLP_LEAKY_SLOPE=0.5 XSA_LAST_N=4 BIGRAM_VOCAB_SIZE=1536 ROPE_DIMS=24 NGRAM_EVAL_ORDER=5 NGRAM_EVAL_ALPHA=0.20 NGRAM_EVAL_MIN_COUNT=2 NGRAM_EVAL_BUCKETS=4194304 torchrun --nproc_per_node=8 train_gpt.py

8xH100 SXM, 600s training + ~190s eval. Training logs and submission.json included.

Octavian and others added 2 commits March 24, 2026 22:49
11L/512d U-Net + legal score-first 5-gram eval interpolation.
Inspired by @deanbrr's n-gram cache technique (PR openai#659).

3-seed results:
  seed 1337: 1.0451  (15.63MB)
  seed 42:   1.0471  (15.59MB)
  seed 2045: 1.0460  (15.64MB)
  mean:      1.0461

Run: SEED=2045 MLP_ACT=leaky_relu_sq MLP_LEAKY_SLOPE=0.5 \
     XSA_LAST_N=4 BIGRAM_VOCAB_SIZE=1536 ROPE_DIMS=24 \
     NGRAM_EVAL_ORDER=5 NGRAM_EVAL_ALPHA=0.20 \
     torchrun --nproc_per_node=8 train_gpt.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3-seed logs (1337, 42, 2045) + submission.json + README.
N-gram eval concept credited to @deanbrr (PR openai#659).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@newjordan
Author

Disappointing that I have to make a new PR for this, as opposed to a commit to my better-timed one.

@valerio-oai
Contributor

valerio-oai commented Mar 25, 2026

Hi @newjordan , thank you for the logs, but as I mentioned in my last comment on your PR, the run is still illegal on account of GPTQ calibration happening after training time (see the training logs logging 600s of training time and then GPTQ calibrating for 3.6s), meaning it is still accessing training data at eval time, which is disallowed.

@newjordan
Author

Crushed... sorry to waste your time. Back at it.

