Non-record: BigramHash(4096) + Cosine EMA + LZMA-9 #681

Open

Alfaxad wants to merge 1 commit into openai:main from Alfaxad:submission/bigram4096-cosine-ema-lzma9
Conversation

@Alfaxad Alfaxad commented Mar 25, 2026

Summary

Results (1xH100)

Metric                    Value
Pre-quant val BPB         1.3628
Legal TTT BPB             1.4775
Artifact size             7.9 MB (49% of the 16 MB limit)
Steps (1xH100, 10 min)    940

Notes

  • 8xH100 evaluation is pending (budget constraint); on 8xH100 this configuration would run ~7500 steps
  • The artifact is very compact (7.9 MB), leaving room for model expansion
  • The cosine EMA schedule creates a quantization gap (1.37 → 2.21 BPB roundtrip) that TTT only partially recovers; a fixed EMA decay may be preferable
  • Developed using ShinkaEvolve (Sakana AI) evolutionary code optimization, with GPT-5.4 and Gemini 3 Pro as mutation operators
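The cosine EMA schedule mentioned above (decay ramping 0.99 → 0.999) can be sketched as follows. The submission's exact schedule isn't shown in this PR, so this is an assumed cosine interpolation between the two stated endpoints; the function name and signature are illustrative.

```python
import math

def cosine_ema_decay(step: int, total_steps: int,
                     start: float = 0.99, end: float = 0.999) -> float:
    """Cosine ramp of the EMA decay from `start` to `end` over training.

    Assumption: a half-cosine interpolation (0 at step 0, 1 at the final
    step) between the 0.99 -> 0.999 endpoints stated in the PR.
    """
    # Normalized training progress, clamped to [0, 1]
    t = min(max(step / max(total_steps, 1), 0.0), 1.0)
    # Half-cosine weight: 0 at t=0, 1 at t=1, slow at both ends
    w = 0.5 * (1.0 - math.cos(math.pi * t))
    return start + (end - start) * w
```

A slow start keeps the EMA responsive early in training, while the high final decay stabilizes the weights used for quantization, which is consistent with the quantization-gap observation in the notes.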

Test plan

  • Verified script runs on 1xH100 with torchrun
  • Training completes within 10 min wallclock
  • Artifact under 16MB (7.9MB)
  • Legal TTT produces valid BPB score
  • 8xH100 evaluation still pending (compute budget)

🤖 Generated with Claude Code

Non-record submission tested on 1xH100 (940 steps, val_bpb=1.4775).
Built on PR openai#549 stack with:
- BigramHash expanded 2048->4096
- Cosine EMA schedule (0.99->0.999)
- Earlier late QAT (threshold 0.15->0.10)
- LZMA preset 6->9

Artifact: 7.9MB (49% of 16MB limit).
Developed using ShinkaEvolve evolutionary code optimization.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
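The two remaining changes in the commit message, the 4096-bucket bigram hash and the LZMA preset bump, can be sketched as below. The hash mixing constant and function names are illustrative assumptions (the PR does not show the hash function); only the bucket count (2048 → 4096) and LZMA preset (6 → 9) come from the commit message. `lzma.compress(..., preset=9)` is the standard-library call.

```python
import lzma

NUM_BUCKETS = 4096  # expanded from 2048 in the base PR stack

def bigram_bucket(prev_token: int, token: int,
                  num_buckets: int = NUM_BUCKETS) -> int:
    """Hash a (previous, current) token bigram into a fixed-size table.

    Assumption: a simple multiplicative mix; the submission's actual
    hash is not shown in this PR.
    """
    h = (prev_token * 0x9E3779B1 + token) & 0xFFFFFFFF
    return h % num_buckets

def compress_artifact(raw: bytes) -> bytes:
    """Compress the serialized artifact with LZMA at preset 9 (maximum),
    matching the preset 6 -> 9 change in the commit message."""
    return lzma.compress(raw, preset=9)
```

Doubling the bucket count halves expected bigram collisions at the cost of a larger table, which LZMA-9 then compresses back down, consistent with the 7.9 MB artifact staying well under the 16 MB limit.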