Submit 1x A100 QAT Fix - 1.4078 BPB (Non-Record) [v2]#707

Closed
Shuvam-Banerji-Seal wants to merge 1 commit into openai:main from Shuvam-Banerji-Seal:submit-single-device-qat-v2

Conversation


@Shuvam-Banerji-Seal Shuvam-Banerji-Seal commented Mar 25, 2026

Single-device (A100) run that tunes hyperparameters down from multi-device scales to ensure proper LR scheduling.

  • Swaps torch.quantile for w.amax().clamp_min to avoid a 30x compile-time performance penalty in Triton.
  • Fully addresses all previous review feedback on unused dependencies and imports.

Closes #527
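As a hedged illustration of the quantile-to-amax swap described above (the function names and the quantile level are hypothetical, not the PR's actual code):

```python
import torch

def clip_scale_quantile(w: torch.Tensor, q: float = 0.999) -> torch.Tensor:
    # Original-style approach (hypothetical): derive a per-tensor clipping
    # threshold from a high quantile of |w|. torch.quantile compiles poorly
    # under Triton, which is the slowdown the PR works around.
    return torch.quantile(w.abs().flatten().float(), q)

def clip_scale_amax(w: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Replacement sketch: absolute max of the weights, clamped away from
    # zero so a downstream quantization scale never divides by zero.
    return w.abs().amax().clamp_min(eps)

w = torch.randn(64, 64)
scale = clip_scale_amax(w)
```

The amax variant is a single reduction plus a clamp, both of which fuse cleanly in compiled kernels, at the cost of being more sensitive to outlier weights than a quantile-based threshold.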

Copilot AI review requested due to automatic review settings March 25, 2026 12:41
Contributor

Copilot AI left a comment


Pull request overview

Adds a new non-record track_non_record_16mb submission tuned for single-A100 QAT runs, focusing on correct LR scheduling at 1-GPU scale and avoiding a torch.quantile-related compile slowdown in the QAT path.

Changes:

  • Introduces a full train_gpt.py training/eval + int6/int8 mixed-quant export pipeline for the submission.
  • Adds run artifacts (train.log, submission.json, README.md) documenting results and configuration.
  • Updates QAT weight clipping inside CastedLinear to use amax(...).clamp_min(...) instead of torch.quantile in the hot path.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.

Reviewed files (all under records/track_non_record_16mb/2026-03-23_Single_A100_QAT_FastFix/):

  • train_gpt.py: Main training script implementing single-A100 hyperparams, QAT in CastedLinear, and export/roundtrip eval.
  • train.log: Captured training + final sliding-window eval output for the submission.
  • submission.json: Submission metadata (name, val_bpb, size, blurb, author/date).
  • README.md: Human-readable summary of the rationale and reported results.


Comment on lines +1225 to +1227
code_bytes = len(code.encode("utf-8"))
log0(f"Serialized model int6+{_COMPRESSOR}: {quant_file_bytes} bytes")
log0(f"Total submission size int8+zlib: {quant_file_bytes + code_bytes} bytes")

Copilot AI Mar 25, 2026


The log message hard-codes "int8+zlib" for total submission size, but this script supports multiple compressors (zstd/lz4/zlib) and the size being reported is for the chosen _COMPRESSOR. This makes logs (and any downstream parsing) misleading when COMPRESSOR != zlib; consider using _COMPRESSOR in the message (and/or aligning the "int6/int8" label with the actual quantization).

Comment on lines +637 to +643
def bigram_hash(self, tokens: Tensor) -> Tensor:
    t = tokens.to(torch.int32)
    mod = self.bigram_vocab_size - 1
    out = torch.empty_like(t)
    out[..., 0] = mod
    out[..., 1:] = torch.bitwise_xor(36313 * t[..., 1:], 27191 * t[..., :-1]) % mod
    return out.long()

Copilot AI Mar 25, 2026


bigram_hash uses mod = bigram_vocab_size - 1 and then does % mod; if BIGRAM_VOCAB_SIZE is set to 1 (or 0), this will divide/modulo by 0 and crash. Add an explicit validation that bigram_vocab_size >= 2 when enabling bigram embeddings, or handle these small values safely.
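The suggested guard can be sketched as follows (`BigramEmbed` is a hypothetical stand-in for the module that owns `bigram_hash`, not the PR's actual class):

```python
import torch
from torch import Tensor

class BigramEmbed:
    def __init__(self, bigram_vocab_size: int):
        # Suggested validation: mod = bigram_vocab_size - 1 is used as a
        # modulus below, so sizes 0 and 1 would divide/modulo by zero.
        if bigram_vocab_size < 2:
            raise ValueError(
                f"bigram_vocab_size must be >= 2, got {bigram_vocab_size}"
            )
        self.bigram_vocab_size = bigram_vocab_size

    def bigram_hash(self, tokens: Tensor) -> Tensor:
        # Body copied from the snippet above.
        t = tokens.to(torch.int32)
        mod = self.bigram_vocab_size - 1
        out = torch.empty_like(t)
        out[..., 0] = mod
        out[..., 1:] = torch.bitwise_xor(36313 * t[..., 1:], 27191 * t[..., :-1]) % mod
        return out.long()
```

Failing fast in the constructor keeps the hot `bigram_hash` path free of per-call checks while still surfacing the misconfiguration at model-build time.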
