Podracing: 1.0461 BPB (3-seed mean) — 5-gram eval + LeakyReLU²#706
Open
newjordan wants to merge 2 commits into openai:main from
Conversation
11L/512d U-Net + legal score-first 5-gram eval interpolation. Inspired by @deanbrr's n-gram cache technique (PR openai#659).

3-seed results:
- seed 1337: 1.0451 (15.63 MB)
- seed 42: 1.0471 (15.59 MB)
- seed 2045: 1.0460 (15.64 MB)
- mean: 1.0461

Run:

    SEED=2045 MLP_ACT=leaky_relu_sq MLP_LEAKY_SLOPE=0.5 \
    XSA_LAST_N=4 BIGRAM_VOCAB_SIZE=1536 ROPE_DIMS=24 \
    NGRAM_EVAL_ORDER=5 NGRAM_EVAL_ALPHA=0.20 \
    torchrun --nproc_per_node=8 train_gpt.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
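The `MLP_ACT=leaky_relu_sq` flag suggests a squared LeakyReLU activation. The PR does not show its exact definition; a plausible, sign-preserving reading (assuming slope 0.5 from `MLP_LEAKY_SLOPE`, and extending the familiar squared-ReLU pattern to the leaky variant) is:

```python
def leaky_relu_sq(x: float, slope: float = 0.5) -> float:
    """Squared LeakyReLU: y = leaky_relu(x), then y * |y|.

    Hypothetical sketch -- the PR does not show the actual definition.
    Multiplying by |y| rather than y keeps the negative branch negative,
    so the activation stays monotonic.
    """
    y = x if x > 0 else slope * x
    return y * abs(y)
```

With slope 0.5, an input of 2.0 maps to 4.0 and an input of -2.0 maps to -1.0.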
3-seed logs (1337, 42, 2045) + submission.json + README. N-gram eval concept credited to @deanbrr (PR openai#659). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author
Disappointing I have to make a new PR for this as opposed to a commit to my better-timed one.
Contributor
Hi @newjordan, thank you for the logs, but as I mentioned in my last comment on your PR, the run is still illegal: GPTQ calibration happens after training time (the training logs show 600s of training followed by GPTQ calibrating for 3.6s), meaning it still accesses training data at eval time, which is disallowed.
Author
crushed... sorry to waste your time. back at it.
Results
Progression
5-gram Eval (score-first, legal)
Fixed-weight hashed n-gram interpolation during sliding window eval. Concept credited to @deanbrr (PR #659).
p_final = 0.80 * p_model + 0.20 * p_ngram

Architecture
11L/512d U-Net, 26.93M params. LeakyReLU² (slope 0.5), XSA last 4, BigramHash 1536. GPTQ int6+zstd, late QAT. TTT disabled.
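The fixed-weight interpolation described above can be sketched as follows. This is a minimal illustration assuming a hashed count cache keyed by the previous 4 tokens (so a 5-gram total); the PR's actual hash function and cache layout are not shown:

```python
def hash_ctx(ctx, n_buckets=1 << 16):
    # Polynomial rolling hash over the context tuple.
    # Hypothetical scheme -- the PR's actual hashing is not shown.
    h = 0
    for t in ctx:
        h = (h * 1000003 + t) % n_buckets
    return h

def interpolated_probs(p_model, ctx, cache, alpha=0.20):
    """Mix model probabilities with hashed n-gram counts:
    p_final = (1 - alpha) * p_model + alpha * p_ngram.
    Falls back to the model alone when the context bucket is empty."""
    bucket = cache.get(hash_ctx(ctx), {})
    total = sum(bucket.values())
    if total == 0:
        return list(p_model)
    return [(1 - alpha) * pm + alpha * bucket.get(t, 0) / total
            for t, pm in enumerate(p_model)]
```

"Score-first" means each eval token is scored with the current cache before its own count is added, so the cache never contains the token being predicted and no training data is consulted.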
Reproduce
8xH100 SXM, 600s training + ~190s eval. Training logs and submission.json included.
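For reference, the headline BPB numbers follow the standard bits-per-byte conversion: total cross-entropy in nats over the eval set, rescaled by ln 2 and the byte count of the underlying text. A minimal sketch (variable names are illustrative, not from the PR):

```python
import math

def bits_per_byte(total_loss_nats: float, n_bytes: int) -> float:
    """Convert total eval cross-entropy (nats) to bits per byte:
    divide by ln(2) to get bits, then by the raw byte count."""
    return total_loss_nats / (math.log(2) * n_bytes)
```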