
Record: PR549 + MiLe decay + 8-bit Muon + 1.04x LR + Cache+Backout — val_bpb 1.1176 #703

Open

Gusanidas wants to merge 1 commit into openai:main from Gusanidas:alejandro/pr-clean

Conversation

@Gusanidas

Summary

Four orthogonal improvements on PR #549 (LeakyReLU² + Legal TTT + Parallel Muon):

  • MiLe loss — Entropy-weighted token loss ((1-exp(-entropy))^γ) with γ=1.1 decaying to 0 during warmdown. Focuses training on harder tokens early, then reverts to standard CE.
  • 8-bit Muon momentum — Blockwise symmetric int8 quantization (block_size=256) of Muon first-moment buffers. ~62% memory reduction, lossless.
  • 1.04x LR boost — All learning rates scaled by 1.04x.
  • Cache+Backout — After layer 7, cache hidden states. Layers 8-10 attention reads from cached (clean) context. Post-decoder: x = x - λ·x_cache where λ is a learned scalar (init 0.1).
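The MiLe weighting described above can be sketched as follows. This is an illustrative reconstruction from the formula in the summary, not the actual diff; the function name and the `gamma` schedule argument are assumptions.

```python
import torch
import torch.nn.functional as F

def mile_loss(logits, targets, gamma):
    """Entropy-weighted token loss: each token's CE is scaled by
    (1 - exp(-entropy))**gamma. With gamma == 0 (end of warmdown)
    every weight is 1 and this reduces to standard cross-entropy.
    Sketch only; names and details are not from the PR diff."""
    log_probs = F.log_softmax(logits, dim=-1)
    # Per-token predictive entropy, H = -sum p * log p
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)
    weight = (1.0 - torch.exp(-entropy)).clamp_min(1e-6) ** gamma
    ce = F.nll_loss(log_probs, targets, reduction="none")
    # Detach the weight so it only reweights, not backprops through entropy.
    return (weight.detach() * ce).mean()
```

Since the weight is in (0, 1) for γ > 0, confident (low-entropy) tokens are down-weighted early in training, and the loss smoothly becomes plain CE as γ decays to 0.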

Tested on 4xB200 with a 903s wall clock to match equivalent 8xH100 compute.
Not yet tested on 8xH100 SXM.
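For reference, blockwise symmetric int8 quantization of a momentum buffer (as in the 8-bit Muon bullet above) looks roughly like this. A minimal NumPy sketch assuming the buffer length is a multiple of `block_size`; function names are illustrative, not the PR's.

```python
import numpy as np

def quantize_int8_blockwise(m, block_size=256):
    """Symmetric int8 quantization per block of `block_size` elements.
    Each block stores one float32 scale = max|x| / 127, so the per-block
    storage is block_size int8 values plus one scale."""
    blocks = m.reshape(-1, block_size)
    scale = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0.0, 1.0, scale)  # avoid divide-by-zero on all-zero blocks
    q = np.clip(np.round(blocks / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_int8_blockwise(q, scale):
    """Inverse map back to float32 (quantization error <= scale/2 per element)."""
    return (q.astype(np.float32) * scale).reshape(-1)
```

Storing one int8 plus 4/256 bytes of scale per element in place of a full-precision buffer is where the quoted memory saving comes from.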

Test plan

  • Verify on 8xH100 SXM with 600s wall clock
  • Confirm 3-seed mean BPB
  • Check artifact size < 16MB
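The Cache+Backout mechanism can be sketched as a thin wrapper around the decoder stack. Everything here is a hypothetical reconstruction from the summary (the layer interface `layer(x, ctx)`, class name, and wiring are assumptions); only the cache point, the cached-context attention, and the `x - λ·x_cache` backout come from the PR text.

```python
import torch
from torch import nn

class CacheBackout(nn.Module):
    """Sketch: cache the hidden state after layer `cache_after`; later layers
    attend over the cached ("clean") context instead of the evolving state;
    after the stack, subtract backout_lambda * cache from the output."""
    def __init__(self, layers, cache_after=7, backout_init=0.1):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.cache_after = cache_after
        # Learned scalar lambda, initialized to 0.1 per the PR summary.
        self.backout_lambda = nn.Parameter(torch.tensor(backout_init))

    def forward(self, x):
        x_cache = None
        for i, layer in enumerate(self.layers):
            # Before the cache point the context is the current state;
            # after it, attention reads the frozen cached state.
            ctx = x_cache if x_cache is not None else x
            x = layer(x, ctx)
            if i == self.cache_after:
                x_cache = x
        return x - self.backout_lambda * x_cache
```

The backout term lets the model partially remove the mid-stack representation from the final residual stream, with λ learned rather than fixed.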

🤖 Generated with Claude Code

…val_bpb 1.1176

Four orthogonal improvements on PR openai#549 (LeakyReLU² + Legal TTT + Parallel Muon):

- MiLe loss: entropy-weighted token loss with γ=1.1 decaying to 0 during warmdown
- 8-bit Muon momentum: blockwise symmetric int8 quantization of momentum buffers
- 1.04x LR boost: all learning rates scaled by 1.04x
- Cache+Backout: cache layer 7 state, late attention reads cached context,
  subtract backout_lambda * cache from final output

Tested on 4xB200 with wall clock 903s to match equivalent 8xH100 compute.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
