
Submit 1x A100 QAT Fix - 1.4078 BPB (Non-Record) [v3] #712

Closed
Shuvam-Banerji-Seal wants to merge 2 commits into openai:main from Shuvam-Banerji-Seal:submit-single-device-qat-v3


@Shuvam-Banerji-Seal

Single-device (A100) run that scales the hyperparameters down from their multi-device values so that the LR schedule behaves correctly.

  • Replaces torch.quantile with w.amax().clamp_min to avoid a 30x compiler performance penalty in Triton.
  • Addresses all previous review feedback on unused dependencies and imports.
  • Fixes the missing validation of bigram embeddings when size < 2.
  • The log message now correctly tracks the compressor variable string.
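
As a rough illustration of the first bullet, the sketch below contrasts a quantile-based quantization scale with an abs-max scale floored at a small epsilon. It is plain Python rather than the PR's torch code, and it does not reproduce the reported compile-time penalty. The names `absmax_scale`, `quantile_scale`, and `fake_quantize`, the signed 6-bit range, the 0.999 quantile, and the assumption that the scale comes from the largest absolute weight are all hypothetical choices made for the example.

```python
# Plain-Python illustration; none of these names come from the PR's train_gpt.py.
QMAX = 31    # assumed signed 6-bit grid [-31, 31]
EPS = 1e-8   # floor on the magnitude, playing the role of clamp_min


def absmax_scale(row, qmax=QMAX, eps=EPS):
    """Scale from the largest magnitude in the row: one max reduction, no sort."""
    return max(max(abs(v) for v in row), eps) / qmax


def quantile_scale(row, q=0.999, qmax=QMAX, eps=EPS):
    """Scale from the q-th quantile of |row| (linear interpolation, needs a sort)."""
    mags = sorted(abs(v) for v in row)
    pos = q * (len(mags) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(mags) - 1)
    val = mags[lo] + (mags[hi] - mags[lo]) * (pos - lo)
    return max(val, eps) / qmax


def fake_quantize(row, scale, qmax=QMAX):
    """Round to the integer grid, clip to [-qmax, qmax], map back to floats."""
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in row]


if __name__ == "__main__":
    weights = [0.1] * 99 + [10.0]          # one outlier
    print("abs-max scale :", absmax_scale(weights))
    print("quantile scale:", quantile_scale(weights))
```

With the abs-max scale no value is clipped, so the round-trip error per weight stays within half a quantization step. The quantile scale gives a finer grid but clips outliers. The epsilon floor keeps the scale positive for an all-zero row.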

Closes #707

Copilot AI review requested due to automatic review settings March 25, 2026 13:07

Copilot AI left a comment

Pull request overview

Adds a new non-record submission folder capturing a single-A100 QAT tuning run intended to avoid a torch.quantile compile-time slowdown and to fit a 10-minute wallclock training constraint.

Changes:

  • Adds a full train_gpt.py snapshot implementing QAT-in-CastedLinear, mixed int6/int8 export, and sliding-window eval.
  • Adds run artifacts (train.log) and metadata (submission.json) for the recorded result.
  • Adds a short README describing the motivation and reported metrics.
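
For readers unfamiliar with sliding-window evaluation and the BPB metric in the PR title, here is a minimal sketch of the general technique. It is not taken from the submitted train_gpt.py. The function names, the `(start, end, score_from)` layout, and the parameter names `seq_len` and `stride` are assumptions made for illustration.

```python
import math


def sliding_windows(n_tokens, seq_len, stride):
    """Return (start, end, score_from) triples covering n_tokens.

    Each window feeds tokens [start, end) to the model, but only the tokens in
    [score_from, end) count toward the loss, so every token is scored exactly
    once while later tokens still see up to seq_len tokens of context.
    """
    if not 0 < stride <= seq_len:
        raise ValueError("need 0 < stride <= seq_len")
    windows = []
    start = 0
    scored_upto = 0
    while scored_upto < n_tokens:
        end = min(start + seq_len, n_tokens)
        windows.append((start, end, scored_upto))
        scored_upto = end
        if end == n_tokens:
            break
        start += stride
    return windows


def bits_per_byte(total_nll_nats, total_bytes):
    """Convert a summed negative log-likelihood in nats to bits per byte."""
    return total_nll_nats / (math.log(2) * total_bytes)


if __name__ == "__main__":
    for window in sliding_windows(n_tokens=10, seq_len=4, stride=2):
        print(window)
```

A smaller stride gives each scored token more context at the cost of more forward passes. The stride used in the submitted run is not stated here.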

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| records/track_non_record_16mb/2026-03-23_Single_A100_QAT_FastFix/train_gpt.py | Training + export script for the submission (QAT, quant/export, eval). |
| records/track_non_record_16mb/2026-03-23_Single_A100_QAT_FastFix/train.log | Captured training/eval log for the run. |
| records/track_non_record_16mb/2026-03-23_Single_A100_QAT_FastFix/submission.json | Leaderboard metadata for the submission. |
| records/track_non_record_16mb/2026-03-23_Single_A100_QAT_FastFix/README.md | Submission overview and claimed results. |


…ix/train_gpt.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>