Record: Depth Recurrence (layers 4 and 5 repeated): val_bpb 1.1182#686

Open
msisovic wants to merge 8 commits into openai:main from msisovic:submission/2026-03-25_RecurLayers_TTT

Conversation


@msisovic msisovic commented Mar 25, 2026

Summary

Building on PR #549, I explored two directions for improving val_bpb: width scaling (MODEL_DIM=576) and depth scaling (adding layers). Width scaling to dim=576 caused a regression. Depth scaling to 12 independent layers at dim=512 reached 1.1126 post-TTT, a significant improvement, so I pursued the depth direction.

This led me to depth recurrence: re-executing mid-network layers with independent learnable block scalars, getting the depth benefit without the parameter/size cost. Layers 4 and 5 are each executed twice in sequence (pattern: 0,1,2,3,4,5,4,5,6,7,8,9,10), producing 13 virtual layers from 11 physical. Only ~2K block scalar params are added. Dual recurrence recovers ~70% of the independent 12-layer gain while keeping the artifact well under budget at ~15.9MB.
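As a rough illustration of the idea (a hypothetical sketch, not the actual train_gpt.py code), the virtual layer schedule and per-occurrence block scalars could look like:

```python
# Sketch only: hypothetical reconstruction of depth recurrence with
# per-occurrence block scalars; names and structure are assumptions.

RECUR_LAYERS = [4, 5]   # mid-network layers to repeat
N_PHYSICAL = 11

# Build the virtual execution order: after the repeated span runs once,
# run it a second time -> 0,1,2,3,4,5,4,5,6,7,8,9,10 (13 virtual layers).
schedule = list(range(N_PHYSICAL))
insert_at = max(RECUR_LAYERS) + 1
schedule[insert_at:insert_at] = RECUR_LAYERS
assert schedule == [0, 1, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9, 10]

def forward(x, blocks, scalars):
    # blocks: the 11 physical layers (weights shared across repeats);
    # scalars: one learnable scale per *virtual* position, so the two
    # passes through layers 4 and 5 can be weighted independently.
    # Only the scalars are new parameters.
    for pos, layer_idx in enumerate(schedule):
        x = x + scalars[pos] * blocks[layer_idx](x)
    return x
```

The key point is that `blocks` holds only 11 physical layers while the loop executes 13 residual updates, so depth grows without duplicating layer weights in the artifact.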

I also confirmed that tied TTT (no weight untying for recurrent layers) performs equivalently to untied, and that the TTT gain (~0.0025 BPB) is consistent regardless of recurrence config. Everything else (TTT, int6 quantization, SWA, bigram embeddings, value embeddings, Muon optimizer) is inherited from #549.
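To make the tied-vs-untied distinction concrete, a toy sketch (hypothetical object model, not the repository's code):

```python
import copy

class Layer:
    """Stand-in for a transformer block; only object identity matters here."""
    def __init__(self):
        self.weights = [0.0]

physical = [Layer() for _ in range(11)]
schedule = [0, 1, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9, 10]

# Tied (TTT_UNTIE=0): both passes through layer 4 reuse the same object,
# so test-time-training updates to one occurrence affect the other.
tied = [physical[i] for i in schedule]
assert tied[4] is tied[6]   # virtual positions 4 and 6 share weights

# Untied (TTT_UNTIE=1): each repeat gets an independent copy for TTT,
# which the experiments above found performs no better than staying tied.
untied = [copy.deepcopy(physical[i]) for i in schedule]
assert untied[4] is not untied[6]
```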

| Config | Params | Artifact | Post-TTT val_bpb |
| --- | --- | --- | --- |
| PR #549 baseline (11L) | ~24M | ~19.5MB | 1.1194 |
| Full 12L (over budget) | ~29M | ~17.3MB | 1.1126 |
| Recur L5 (11→12 virtual) | ~27M | ~15.9MB | 1.1180 |
| Recur L4,5 (11→13 virtual) | ~27M | ~15.9MB | 1.1182 |

Reproducibility

| Seed | val_loss | val_bpb |
| --- | --- | --- |
| 1337 | 1.88749538 | 1.11788404 |
| 2025 | 1.88948575 | 1.11906285 |
| 2024 | 1.88811812 | 1.11825287 |
| Mean | 1.88836642 | 1.11839992 |
| Std | 0.00083132 | 0.00049235 |
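The reported mean and std of val_bpb can be reproduced with Python's statistics module (note the std matches the population, not sample, standard deviation):

```python
import statistics

# Per-seed val_bpb from the reproducibility table.
val_bpb = {1337: 1.11788404, 2025: 1.11906285, 2024: 1.11825287}

mean = statistics.mean(val_bpb.values())
std = statistics.pstdev(val_bpb.values())  # population std over the 3 runs

print(f"Mean {mean:.8f}  Std {std:.8f}")  # Mean 1.11839992  Std 0.00049235
```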

Run Commands

```shell
# Seed 1337 (default)
ITERATIONS=9000 RECUR_LAYERS=4,5 TTT_ENABLED=1 TTT_UNTIE=0 \
  torchrun --nproc_per_node=8 train_gpt.py

# Seed 2025
ITERATIONS=9000 RECUR_LAYERS=4,5 TTT_ENABLED=1 TTT_UNTIE=0 SEED=2025 \
  torchrun --nproc_per_node=8 train_gpt.py

# Seed 2024
ITERATIONS=9000 RECUR_LAYERS=4,5 TTT_ENABLED=1 TTT_UNTIE=0 SEED=2024 \
  torchrun --nproc_per_node=8 train_gpt.py
```

msisovic and others added 6 commits March 25, 2026 01:43
Seed 1337 complete (val_bpb=1.1179). Seeds 42 and 2024 need rerun after
GPU restart (stale CUDA contexts blocking clean runs).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@msisovic msisovic changed the title Submission/2026 03 25 recur layers ttt Record: Depth Recurrence (layers 4 and 5 repeated): val_bpb 1.1182 Mar 25, 2026
Previous run accidentally used 8000 iterations. Reran with 9000 to match
other seeds. Mean val_bpb: 1.1184 (was 1.1182), std: 0.00049 (was 0.00076).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@msisovic
Copy link
Author

Note: one of the three runs was accidentally run with ITERATIONS=8000 instead of 9000; it has since been rerun and fixed. No actual changes were made since the submission.
