# Classical 4-gram Artifact

This is a non-record classical submission: a discounted hashed 4-gram model, exported as a compressed artifact and evaluated exactly on the full FineWeb validation split.

The model is fully non-neural:
- no transformer
- no embeddings to train
- no GPU dependence in the solver itself
- no training-data access during evaluation beyond the saved artifact

## Configuration

- Track: `non-record-16mb`
- Model: discounted hashed 4-gram with backoff to bigram and unigram
- Artifact build data: first `10,000,000` training tokens
- Artifact bytes: `14,310,783`
- Code bytes (`train_gpt.py`): `57,801`
- Total submission bytes: `14,368,584`
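The core estimator named above (discounted hashed 4-gram with backoff to bigram and unigram, absolute discount `0.75`) can be sketched as follows. This is a minimal illustration, not the actual `train_gpt.py` implementation: the class and method names, bucket count, and hashing scheme are all assumptions.

```python
from collections import defaultdict

D = 0.75                      # absolute discount (matches --absolute-discount 0.75)
NUM_BUCKETS = 1 << 22         # hash buckets bound the table size (illustrative value)

def bucket(context, token):
    # Hashed n-gram counts: collisions are accepted to cap memory.
    return hash((context, token)) % NUM_BUCKETS

class BackoffNgram:
    """Hypothetical sketch: absolute-discounted counts with backoff
    from 4-gram (context length 3) to bigram (1) to unigram (0)."""

    def __init__(self, orders=(3, 1, 0)):
        self.orders = orders
        self.counts = defaultdict(int)       # (order, bucket) -> count
        self.ctx_totals = defaultdict(int)   # (order, context hash) -> total count
        self.ctx_types = defaultdict(int)    # (order, context hash) -> distinct continuations

    def update(self, history, token):
        for n in self.orders:
            ctx = tuple(history[-n:]) if n else ()
            if self.counts[(n, bucket(ctx, token))] == 0:
                self.ctx_types[(n, hash(ctx))] += 1
            self.counts[(n, bucket(ctx, token))] += 1
            self.ctx_totals[(n, hash(ctx))] += 1

    def prob(self, history, token, vocab_size):
        p = 1.0 / vocab_size                 # base case: uniform over the vocab
        for n in reversed(self.orders):      # unigram -> bigram -> 4-gram
            ctx = tuple(history[-n:]) if n else ()
            total = self.ctx_totals[(n, hash(ctx))]
            if total == 0:
                continue                     # unseen context: keep lower-order estimate
            c = self.counts[(n, bucket(ctx, token))]
            types = self.ctx_types[(n, hash(ctx))]
            # Discounted ML estimate plus the reserved mass times the backoff estimate.
            p = max(c - D, 0.0) / total + (D * types / total) * p
        return p
```

With absolute discounting, each order reserves `D * types / total` of its probability mass and redistributes it according to the next-lower order, so the mixture stays a proper distribution (ignoring hash collisions).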

Command used to build the artifact:

```bash
./.venv/bin/python records/track_non_record_16mb/2026-03-25_classical_4gram_10m_eval/train_gpt.py \
--skip-validation 1 \
--save-state /tmp/state_ng4_10000k_comp.zlib \
--train-pattern 'data/datasets/fineweb10B_sp1024/fineweb_train_*.bin' \
--warmup-tokens 10000000 \
--cache-windows '' \
--copy-contexts '' \
--doc-copy-contexts '' \
--absolute-discount 0.75 \
--ngram-contexts 3 \
--mix-backoff-experts 0
```
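The `--save-state /tmp/state_ng4_10000k_comp.zlib` flag and the `saved_state_bytes=14310783` line in `train.log` imply a zlib-compressed serialized artifact. A minimal sketch of that round-trip, assuming a pickle-then-compress scheme (the function names and serialization format here are illustrative, not the actual `train_gpt.py` interface):

```python
import pickle
import zlib

def save_state(path, state, level=9):
    """Serialize the model state and write it zlib-compressed."""
    blob = zlib.compress(pickle.dumps(state), level)
    with open(path, "wb") as f:
        f.write(blob)
    return len(blob)  # the figure reported as saved_state_bytes

def load_state(path):
    """Inverse of save_state: decompress and deserialize."""
    with open(path, "rb") as f:
        return pickle.loads(zlib.decompress(f.read()))
```

The compressed size on disk is what counts against the 16 MB artifact budget, which is why a maximum compression level is attractive here.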

Command used for the final full-validation evaluation:

```bash
./.venv/bin/python records/track_non_record_16mb/2026-03-25_classical_4gram_10m_eval/train_gpt.py \
--max-tokens 0 \
--report-every 5000000 \
--load-state /tmp/state_ng4_10000k_comp.zlib \
--cache-windows '' \
--copy-contexts '' \
--doc-copy-contexts '' \
--absolute-discount 0.75 \
--ngram-contexts 3 \
--mix-backoff-experts 0
```
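For reference, a bits-per-byte figure like the `val_bpb=1.91070694` reported below is the total negative log2 probability of all predicted tokens divided by the byte length of the validation text (here `151,080,891` bytes). A minimal sketch:

```python
import math

def bits_per_byte(token_probs, total_bytes):
    """Total code length in bits over all predictions, per byte of raw text."""
    total_bits = sum(-math.log2(p) for p in token_probs)
    return total_bits / total_bytes
```

Normalizing by bytes rather than tokens makes the metric comparable across tokenizers.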

## Exact Metrics

- Full validation tokens loaded: `62,021,846`
- Predictions: `62,021,845`
- Full-validation `val_bpb`: `1.91070694`
- Full-validation wallclock: `571.97` seconds
- Validation bytes: `151,080,891`

Artifact build run:
- warmup predictions: `9,999,999`
- artifact build wallclock: `68.63` seconds

This result is far weaker than the best neural submissions, but it satisfies the mechanical submission constraints, verified locally:
- exact full-validation run
- artifact under `16,000,000` bytes
- single-file `train_gpt.py`
- full-validation runtime under `10` minutes on this machine
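The size constraints above reduce to simple arithmetic on the figures reported in this README:

```python
# Sanity checks using the exact byte counts and wallclock reported above.
ARTIFACT_BYTES = 14_310_783   # bytes_model_int8_zlib
CODE_BYTES = 57_801           # train_gpt.py
FULL_VAL_SECONDS = 571.97

assert ARTIFACT_BYTES < 16_000_000                 # artifact under the 16MB cap
assert ARTIFACT_BYTES + CODE_BYTES == 14_368_584   # total submission bytes
assert FULL_VAL_SECONDS < 600                      # full validation under 10 minutes
```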

## Included Files

- `train_gpt.py` — single-file classical solver
- `submission.json` — metadata for the run
- `train.log` — exact artifact-build stdout
- `eval.log` — exact full-validation stdout

## eval.log

Exact full-validation stdout:
step=5000000 bpb=2.005436 tok_per_s=87059 weights=[ngram_4=1.000]
step=10000000 bpb=1.997001 tok_per_s=101167 weights=[ngram_4=1.000]
step=15000000 bpb=1.977351 tok_per_s=98463 weights=[ngram_4=1.000]
step=20000000 bpb=1.962266 tok_per_s=100651 weights=[ngram_4=1.000]
step=25000000 bpb=1.948778 tok_per_s=102760 weights=[ngram_4=1.000]
step=30000000 bpb=1.943999 tok_per_s=104743 weights=[ngram_4=1.000]
step=35000000 bpb=1.939619 tok_per_s=105619 weights=[ngram_4=1.000]
step=40000000 bpb=1.932213 tok_per_s=106524 weights=[ngram_4=1.000]
step=45000000 bpb=1.923477 tok_per_s=106111 weights=[ngram_4=1.000]
step=50000000 bpb=1.922186 tok_per_s=107292 weights=[ngram_4=1.000]
step=55000000 bpb=1.918598 tok_per_s=108088 weights=[ngram_4=1.000]
step=60000000 bpb=1.912753 tok_per_s=108456 weights=[ngram_4=1.000]
loaded_state_bytes=14310783
loaded_warmup_predictions=9999999
tokens_loaded=62021846
predictions=62021845
total_bytes=151080891
val_bpb=1.91070694
elapsed_seconds=571.97
expert_weights:
ngram_4: weight=1.000000 avg_logloss_bits=4.654349

## submission.json

Metadata for the run:
{
"author": "muhtasham",
"github_id": "Muhtasham",
"name": "Classical 4-gram Artifact",
"blurb": "Non-record classical submission: discounted hashed 4-gram model built from 10M train tokens, exported as a 14.31MB compressed artifact, and evaluated exactly on the full FineWeb validation split. Full-val BPB is 1.91070694 with full-val wallclock 571.97s and total submission size 14,368,584 bytes.",
"date": "2026-03-25T00:00:00Z",
"track": "non-record-16mb",
"val_loss": null,
"val_bpb": 1.91070694,
"pre_quant_val_loss": null,
"pre_quant_val_bpb": null,
"step_stop": null,
"wallclock_seconds": 68.63,
"bytes_total": 14368584,
"bytes_model_int8_zlib": 14310783,
"bytes_code": 57801,
"gpu": "local CPU"
}

## train.log

Exact artifact-build stdout:
warmup_predictions=9999999
warmup_elapsed_seconds=44.01
saved_state_bytes=14310783
elapsed_seconds=68.63