Add MergedTop3_v3 clean 8xH100 record-track submission #698

Open
hesong0222-dev wants to merge 5 commits into openai:main from hesong0222-dev:riom-v3-clean-submit

Summary

This PR adds a clean track_10min_16mb submission folder:

  • records/track_10min_16mb/2026-03-25_MergedTop3_v3_clean_h100

The submission is a merged top-stack recipe built from the public leaderboard lineage:

  • 11 layers
  • XSA on the last 4 layers
  • EMA only
  • 3x MLP
  • SmearGate
  • BigramHash with 2048 buckets
  • mixed int6 quantization with zstd
  • sequence length 2048
  • Muon/AdamW weight decay 0.04
  • sliding-window eval with stride 64
  • Partial RoPE (ROPE_DIMS=16)
  • layerwise LN scaling
  • GPTQ-lite clip search
  • WARMDOWN_ITERS=3500
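
As an illustration of the "sliding-window eval with stride 64" item at sequence length 2048, here is a minimal sketch of how evaluation windows are commonly laid out so that every token is scored exactly once with as much left context as the window allows. The function name `sliding_windows` and its return shape are hypothetical; the actual loop in train_gpt.py is not shown in this PR and may differ.

```python
def sliding_windows(n_tokens, seq_len=2048, stride=64):
    """Return (begin, end, n_scored) tuples covering n_tokens.

    Hypothetical helper: each window spans tokens [begin, end), and only the
    last n_scored tokens of the window contribute to the loss, so no token
    is counted twice.
    """
    windows = []
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + seq_len, n_tokens)
        n_scored = end - prev_end  # tokens not covered by earlier windows
        windows.append((begin, end, n_scored))
        prev_end = end
        if end == n_tokens:
            break
    return windows


windows = sliding_windows(4096)
total_scored = sum(n for _, _, n in windows)
```

With this layout, the first window scores all of its tokens and each later window scores only its final `stride` tokens, which trades extra forward passes for longer context per scored token.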

This folder is the clean rerun package after removing recovery-only framing and adding strict runtime gates for a fresh uninterrupted 8x H100 execution.

Clean run result

Measured on 8x H100 with a fresh uninterrupted single-seed run:

  • step_stop=5347
  • train_time=580.213s
  • final_int6_roundtrip_exact val_loss=1.96565872
  • final_int6_roundtrip_exact val_bpb=1.16417381
  • eval_time=44.398s
  • bytes_model_int6_zstd=15,562,277
  • bytes_code=72,924
  • bytes_total=15,635,201
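
The reported figures can be sanity-checked with a few lines of arithmetic. The sketch below assumes that val_loss is a mean cross-entropy in nats per token and that val_bpb is bits per byte of the underlying text; neither definition is spelled out in this PR, so the implied bytes-per-token ratio is only indicative.

```python
import math

# Values copied from the clean run result above.
bytes_model_int6_zstd = 15_562_277
bytes_code = 72_924
val_loss = 1.96565872  # assumed: nats per token
val_bpb = 1.16417381   # assumed: bits per byte

# Byte accounting: compressed model plus code should equal the reported total.
bytes_total = bytes_model_int6_zstd + bytes_code

# Convert nats per token to bits per token, then infer the average number of
# bytes each token covers under the assumptions stated above.
bits_per_token = val_loss / math.log(2)
implied_bytes_per_token = bits_per_token / val_bpb

print(bytes_total)
print(round(implied_bytes_per_token, 3))
```

The sum matches the reported bytes_total=15,635,201, and the implied ratio comes out at roughly 2.4 bytes per token.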

This run stayed under all three required caps:

  • training time < 600s
  • evaluation time < 600s
  • artifact size < 16,000,000
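
The caps listed above can be expressed as a small gate that reports the remaining headroom. This is a sketch only: `check_caps` and the constant names are hypothetical and are not taken from the helper scripts in this submission.

```python
TRAIN_CAP_S = 600.0
EVAL_CAP_S = 600.0
ARTIFACT_CAP_BYTES = 16_000_000


def check_caps(train_time_s, eval_time_s, bytes_total):
    """Return headroom under each cap, raising if any cap is met or exceeded."""
    headroom = {
        "train_s": TRAIN_CAP_S - train_time_s,
        "eval_s": EVAL_CAP_S - eval_time_s,
        "artifact_bytes": ARTIFACT_CAP_BYTES - bytes_total,
    }
    violated = [name for name, left in headroom.items() if left <= 0]
    if violated:
        raise ValueError(f"caps violated: {violated}")
    return headroom


# Values from the clean run result in this PR.
headroom = check_caps(train_time_s=580.213, eval_time_s=44.398, bytes_total=15_635_201)
```

For the reported run this leaves about 19.8 s of training time and 364,799 bytes of artifact budget unused.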

What is included

  • train_gpt.py
  • README.md
  • submission.json
  • requirements.txt
  • helper scripts for remote bootstrap, strict preflight, one-shot run, and artifact collection

Verification

  • local python3 -m py_compile train_gpt.py passed
  • strict single-process and 8-process remote preflight passed before training
  • clean 8x H100 one-shot run completed
  • final exact metric and final byte accounting were captured from the successful run output and later cross-checked against the recovered remote log/artifacts
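
The local syntax check in the first item can also be run from Python through the standard-library `py_compile` module, which is what `python3 -m py_compile` invokes. The wrapper name `compiles_cleanly` below is hypothetical and only illustrates the check; it is not part of the submission's helper scripts.

```python
import py_compile


def compiles_cleanly(path: str) -> bool:
    """Return True if the file at `path` byte-compiles without a syntax error."""
    try:
        # doraise=True makes compile() raise instead of printing to stderr.
        py_compile.compile(path, doraise=True)
        return True
    except py_compile.PyCompileError:
        return False
```

Calling `compiles_cleanly("train_gpt.py")` from the submission folder would mirror the command-line check. Note that this only confirms the file parses; it does not import or run it.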

Limitations

  • This is a clean single-seed result, not a new-SOTA statistical claim
  • No multi-seed significance test or other multi-seed statistical evidence is included in this PR
