
Add classical 4-gram non-record submission#701

Open
Muhtasham wants to merge 1 commit into openai:main from Muhtasham:muhtasham/classical-4gram-submit

Conversation

@Muhtasham

Summary

This PR adds a non-record classical submission under records/track_non_record_16mb/2026-03-25_classical_4gram_10m_eval.

The submission is fully non-neural:

  • discounted hashed 4-gram model with unigram/bigram backoff
  • artifact built from the official fineweb10B_sp1024 train export
  • exact evaluation on the official fineweb_val_* split
  • no training-data access during evaluation; final eval uses the saved artifact via --load-state only
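The core technique named above, an absolute-discounted hashed n-gram model with backoff to lower orders, can be sketched as follows. This is a minimal illustrative implementation, not the PR's actual `train_gpt.py`; the class name, hashing scheme, and table layout are assumptions, and only the order (4) and discount (0.75) come from the description.

```python
from collections import defaultdict
import hashlib

def ctx_hash(ctx, table_size=1 << 20):
    # Hash a context tuple into a fixed-size slot (illustrative table size).
    digest = hashlib.blake2b(repr(ctx).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % table_size

class DiscountedNgramLM:
    """Sketch of an absolute-discounted hashed n-gram LM with backoff."""

    def __init__(self, order=4, discount=0.75):
        self.order = order
        self.discount = discount
        # counts[n]: hashed context of length n -> {next token: count}
        self.counts = [defaultdict(lambda: defaultdict(int)) for _ in range(order)]

    def train(self, tokens):
        for i, tok in enumerate(tokens):
            for n in range(self.order):  # context lengths 0 .. order-1
                if i >= n:
                    self.counts[n][ctx_hash(tuple(tokens[i - n:i]))][tok] += 1

    def prob(self, ctx, tok, n=None):
        if n is None:
            n = min(len(ctx), self.order - 1)
        table = self.counts[n].get(ctx_hash(tuple(ctx[len(ctx) - n:])), {})
        total = sum(table.values())
        if n == 0:
            # Unigram base case: maximum-likelihood estimate.
            return table.get(tok, 0) / total if total else 0.0
        if total == 0:
            return self.prob(ctx, tok, n - 1)  # unseen context: back off fully
        d = self.discount
        discounted = max(table.get(tok, 0) - d, 0.0) / total
        # Probability mass freed by discounting is redistributed via the lower order.
        backoff_mass = d * len(table) / total
        return discounted + backoff_mass * self.prob(ctx, tok, n - 1)
```

Because the discounted mass is handed exactly to the backoff distribution, the conditional distribution stays normalized at every order.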

Exact Results

  • val_bpb: 1.91070694
  • validation tokens: 62,021,846
  • evaluation wallclock: 571.97s
  • model artifact bytes: 14,310,783
  • code bytes: 57,801
  • total bytes: 14,368,584
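The byte accounting above is internally consistent and under the cap, which a one-liner confirms:

```python
artifact_bytes = 14_310_783
code_bytes = 57_801
total_bytes = artifact_bytes + code_bytes

assert total_bytes == 14_368_584      # matches the reported total
assert artifact_bytes < 16_000_000    # under the decimal 16 MB artifact cap
```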

Included Files

  • README.md
  • submission.json
  • train.log
  • eval.log
  • train_gpt.py

Notes

This is a non-record submission, not a SOTA claim.

The final packaged run is the faster classical path:

  • first 10,000,000 train tokens used to build the artifact
  • --absolute-discount 0.75
  • --ngram-contexts 3
  • --mix-backoff-experts 0
  • no cache experts
  • no copy experts
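For concreteness, here is a hypothetical `argparse` stub wiring up the flags listed above. The flag names come from this PR description; the real `train_gpt.py` interface is not shown here and may differ in defaults, types, and help text.

```python
import argparse

# Illustrative stub only; the actual CLI in train_gpt.py may differ.
parser = argparse.ArgumentParser(description="classical n-gram eval (sketch)")
parser.add_argument("--absolute-discount", type=float, default=0.75)
parser.add_argument("--ngram-contexts", type=int, default=3)
parser.add_argument("--mix-backoff-experts", type=int, default=0)
parser.add_argument("--load-state",
                    help="path to saved artifact; final eval reads only this")

# The packaged run as described above:
args = parser.parse_args(
    ["--absolute-discount", "0.75", "--ngram-contexts", "3",
     "--mix-backoff-experts", "0"]
)
```

Note that `argparse` translates the hyphenated flags into underscored attributes, e.g. `args.absolute_discount`.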

I verified locally that:

  • the artifact is under the decimal 16,000,000 byte cap
  • the final full-validation run is performed on the official fineweb_val_* split
  • the final evaluation does not read training shards
  • the records folder compiles and runs from within the folder

