10L Int5-MLP + BigramHash(4096) + SWA (1.1507 BPB) by Bortlesboat · Pull Request #694 · openai/parameter-golf

Bortlesboat · 2026-03-25T08:16:17Z

Non-record submission

val_bpb: 1.1507 (mean of 3 seeds, sliding window stride=64, post int5/int6+zstd quantization roundtrip)
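The sliding-window evaluation can be pictured with the short sketch below. It assumes a 1024-token context, which the PR does not state. It also assumes the usual rule: the first window scores every position, and each later window, advanced by stride=64, scores only positions not yet scored. Under that rule, every token after the first window is predicted with at least 1024 - 64 = 960 tokens of context. `sliding_windows` and `bits_per_byte` are hypothetical helper names, not the submission's code.

```python
import math


def sliding_windows(n_tokens, ctx=1024, stride=64):
    """Return (start, end, score_from) triples.

    Each window feeds tokens [start, end) to the model and scores only
    positions [score_from, end), so every position is scored exactly once.
    ctx=1024 is an assumed context length; stride=64 matches the PR.
    """
    windows = []
    start, scored_upto = 0, 0
    while scored_upto < n_tokens:
        end = min(start + ctx, n_tokens)
        windows.append((start, end, scored_upto))
        scored_upto = end
        start += stride
    return windows


def bits_per_byte(total_nll_nats, total_bytes):
    # Convert summed cross-entropy (nats) over the scored tokens into bits per byte.
    return total_nll_nats / (math.log(2) * total_bytes)


print(sliding_windows(1100))  # [(0, 1024, 0), (64, 1088, 1024), (128, 1100, 1088)]
```

A smaller stride gives more context per scored token but needs more forward passes, which is consistent with the eval taking ~258s.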

Seed	val_bpb	artifact_bytes
42	1.1508	15,620,994
1337	1.1499	15,290,882
2024	1.1514	15,327,813
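The summary numbers can be checked directly against the table. The reported 0.0006 spread matches the population standard deviation of the three seeds. Treating the 16MB cap as 16,000,000 decimal bytes is an assumption.

```python
import statistics

# (val_bpb, artifact_bytes) per seed, copied from the table above.
results = {
    42: (1.1508, 15_620_994),
    1337: (1.1499, 15_290_882),
    2024: (1.1514, 15_327_813),
}
SIZE_LIMIT = 16_000_000  # assumption: the 16MB cap counted as decimal bytes

bpbs = [bpb for bpb, _ in results.values()]
mean = statistics.fmean(bpbs)
spread = statistics.pstdev(bpbs)  # population std; statistics.stdev would give ~0.00075
headroom = min(SIZE_LIMIT - size for _, size in results.values())

print(f"val_bpb = {mean:.4f} +/- {spread:.4f}")  # val_bpb = 1.1507 +/- 0.0006
print(f"smallest size headroom: {headroom:,} bytes")  # smallest size headroom: 379,006 bytes
```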

Architecture

10 layers, d=512, GQA 8H/4KV, relu^2
BigramHash(4096, dim=128), SmearGate, U-Net skips
Mixed int5 MLP / int6 attention + zstd-22
SWA(frac=0.4), Muon WD=0.04, warmdown=3000
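The mixed int5/int6 scheme can be sketched as symmetric per-row quantization, where each weight row is stored as small integer codes plus one float scale. The clip ranges (±15 for int5, ±31 for int6) and the name-based rule that sends MLP matrices to int5 and attention to int6 are assumptions for illustration. The PR does not give the exact scale format, packing, or the zstd-22 step, so those are not shown.

```python
def quantize_row(row, bits):
    """Symmetric per-row quantization: integer codes plus one float scale."""
    qmax = 2 ** (bits - 1) - 1  # assumed range: +/-15 for int5, +/-31 for int6
    amax = max(abs(x) for x in row)
    scale = amax / qmax if amax > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(x / scale))) for x in row]
    return codes, scale


def dequantize_row(codes, scale):
    return [c * scale for c in codes]


def bits_for(tensor_name):
    # Hypothetical routing rule: MLP matrices get int5, everything else int6.
    return 5 if ".mlp." in tensor_name else 6


row = [0.5, -1.2, 0.03, 0.9, -0.7]
for name in ("blocks.0.mlp.fc.weight", "blocks.0.attn.c_q.weight"):
    bits = bits_for(name)
    codes, scale = quantize_row(row, bits)
    err = max(abs(a - b) for a, b in zip(row, dequantize_row(codes, scale)))
    print(name, bits, codes, f"max_err={err:.4f}")
```

The round-trip error per weight is at most half the row's scale, so int5 rows lose roughly twice as much precision as int6 rows. In exchange, they use fewer distinct code values, which compress better.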

Based on

thwu1's 10L Int5-MLP submission, with BigramHash reduced to 4096 (dim=128) to keep a reliable margin under the 16MB artifact limit across seeds.
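The size effect of shrinking BigramHash can be estimated from its parameter count. The sketch below hashes each (previous, current) token pair into one of 4096 buckets. Each bucket maps to a 128-dim embedding that is projected to d=512. The hash constant, the handling of position 0, and the presence of a projection layer are assumptions, not the submission's code.

```python
NUM_BUCKETS = 4096  # BigramHash(4096, ...)
BIGRAM_DIM = 128    # dim=128
MODEL_DIM = 512     # d=512


def bigram_bucket(prev_tok, cur_tok, num_buckets=NUM_BUCKETS):
    # Hypothetical hash of the (previous, current) token pair into a bucket index.
    return ((prev_tok * 1_000_003) ^ cur_tok) % num_buckets


def bigram_ids(tokens, num_buckets=NUM_BUCKETS):
    # Position 0 has no predecessor; pairing it with token id 0 is an assumption.
    prevs = [0] + tokens[:-1]
    return [bigram_bucket(p, c, num_buckets) for p, c in zip(prevs, tokens)]


def bigram_param_count(num_buckets, bigram_dim=BIGRAM_DIM, model_dim=MODEL_DIM):
    # Embedding table plus an assumed linear projection from bigram_dim to model_dim.
    return num_buckets * bigram_dim + bigram_dim * model_dim


print(bigram_ids([17, 402, 17, 402]))
print(bigram_param_count(4096))  # 589824
```

Under these assumptions, each halving of the bucket count removes 262,144 embedding parameters. That is the lever used here to buy size margin.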

Timing (8xH100 SXM)

Training: ~600s (6200 steps)
Eval: ~258s (sliding window stride=64)
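Combining the ~6200 training steps with warmdown=3000 and SWA(frac=0.4) gives a rough picture of when weights are averaged. The sketch makes three assumptions: a step-based linear warmdown over the final 3000 steps, snapshots taken every 50 steps (a hypothetical interval), and collection only once the LR multiplier drops below 0.4. Under those assumptions, SWA averages 23 snapshots from the last ~1200 steps. The real run may schedule warmdown by wallclock instead.

```python
TOTAL_STEPS = 6200  # from the timing section (approximate)
WARMDOWN = 3000     # warmdown=3000
SWA_FRAC = 0.4      # SWA(frac=0.4)
SWA_EVERY = 50      # hypothetical snapshot interval


def lr_multiplier(step, total=TOTAL_STEPS, warmdown=WARMDOWN):
    # Assumed: constant LR, then linear decay to 0 over the last `warmdown` steps.
    start = total - warmdown
    return 1.0 if step < start else (total - step) / warmdown


def swa_steps(total=TOTAL_STEPS, frac=SWA_FRAC, every=SWA_EVERY):
    # Steps at which a weight snapshot would be added to the running average.
    return [s for s in range(total) if lr_multiplier(s, total) < frac and s % every == 0]


def average(snapshots):
    # Uniform average of parameter vectors given as lists of floats.
    n = len(snapshots)
    return [sum(vals) / n for vals in zip(*snapshots)]


steps = swa_steps()
print(len(steps), steps[0], steps[-1])  # 23 5050 6150
```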

Explores stacking eval-time techniques (neural cache, LoRA TTT) and quantization-aware training (QAT) on top of the openai#1 recipe. QAT currently has an export mismatch bug that causes a high quantization penalty, so this is submitted as a non-record entry to document the approach for further iteration.
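One way to localize an export mismatch is to check that the quantizer used in the QAT forward pass and the one used at export produce identical dequantized weights. The sketch below compares a training-time fake quantizer against two hypothetical export paths. One reuses it exactly. The other uses a different clip range, 2**(bits-1) instead of 2**(bits-1) - 1. This only shows how such a check flags a mismatch; it does not claim this is the bug in the submission.

```python
def fake_quant(row, bits, qmax=None):
    """Quantize-dequantize one row with a symmetric per-row max-abs scale."""
    if qmax is None:
        qmax = 2 ** (bits - 1) - 1
    amax = max(abs(x) for x in row)
    scale = amax / qmax if amax > 0 else 1.0
    return [max(-qmax, min(qmax, round(x / scale))) * scale for x in row]


def export_roundtrip_consistent(row, bits):
    # Export path that reuses the training-time quantizer exactly.
    return fake_quant(row, bits)


def export_roundtrip_mismatched(row, bits):
    # Hypothetical buggy export: a different clip range changes every scale.
    return fake_quant(row, bits, qmax=2 ** (bits - 1))


def max_mismatch(row, bits, export_fn):
    # Largest absolute gap between weights seen in training and weights exported.
    trained = fake_quant(row, bits)
    exported = export_fn(row, bits)
    return max(abs(a - b) for a, b in zip(trained, exported))


row = [0.5, -1.2, 0.03, 0.9, -0.7]
print(max_mismatch(row, 5, export_roundtrip_consistent))  # 0.0
print(max_mismatch(row, 5, export_roundtrip_mismatched) > 0)  # True
```

Running this check per tensor before zstd compression would show which layers' exported weights diverge from what QAT trained against.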

Non-record submission. 10 layers, d=512, GQA 8H/4KV, mixed int5/int6 quantization + zstd-22. BigramHash(4096, dim=128), SmearGate, SWA(0.4). Mean of 3 seeds: 1.1507 +/- 0.0006 BPB. All artifacts under 16MB.

Bortlesboat added 2 commits March 20, 2026 23:10

345f145 records: 10L Int5-MLP + BigramHash(4096) + SWA (1.1507 BPB)


Bortlesboat wants to merge 2 commits into openai:main from Bortlesboat:submission/10L-int5-bigram4096