@@ -0,0 +1,64 @@
# Record: Curriculum Learning + LeakyReLU(0.9)^2 + 7-gram Backoff (val_bpb=0.9633)

**val_bpb = 0.9633** (seed 42, additional seeds pending compute grant) | **15.56 MB** | 8xH100 SXM, 600s

## Approach

Built on PR #753 (Podracing II) with two additions:

### 1. Curriculum Learning (Shard Reordering)

Training shards are reordered by model perplexity, hardest shards first. Based on PR #650 by @abaybektursun, which demonstrated -0.003 BPB from shard ordering alone. No code change, just an environment variable.
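The reordering itself is trivial. A minimal sketch, assuming per-shard perplexities from a quick scoring pass (the function name and the example perplexities are hypothetical, not from PR #650):

```python
def shard_order_by_difficulty(ppl_by_shard):
    """Return a SHARD_ORDER string: shard ids sorted hardest-first
    (highest held-out perplexity first)."""
    order = sorted(ppl_by_shard, key=ppl_by_shard.get, reverse=True)
    return ",".join(str(i) for i in order)

# Hypothetical per-shard perplexities
print(shard_order_by_difficulty({0: 1.18, 1: 3.42, 2: 2.07}))  # → "1,2,0"
```

The resulting string is what `SHARD_ORDER` in `run.sh` carries; the data loader only has to consume shards in that order.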

### 2. LeakyReLU(0.9)^2 Slope Optimization

Following @MatoTeziTanka's controlled slope sweep in issue #140, the standard slope=0.5 was replaced with slope=0.9. The sweep showed monotonic improvement from 0.1 to 0.9, with slope=0.9 giving -0.013 BPB vs. slope=0.5 on the same stack.
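For concreteness, here is a literal scalar reading of the LeakyReLU(slope)² activation: LeakyReLU with the given negative-side slope, then an elementwise square (in PyTorch this would presumably be `F.leaky_relu(x, 0.9).square()`; how the sign of the negative branch is handled is an assumption here, as the square maps it to positive values):

```python
def leaky_relu_squared(x, slope=0.9):
    """LeakyReLU(slope)^2 on a scalar: negative inputs are scaled by
    `slope`, then the result is squared. Note the square makes the
    negative side positive as well."""
    y = x if x >= 0 else slope * x
    return y * y

print(leaky_relu_squared(2.0))   # → 4.0
print(leaky_relu_squared(-1.0))  # → 0.81 with slope=0.9
```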

## Results

| Metric | Value |
|--------|-------|
| Sliding window (stride=64) | 1.1216 |
| **Sliding + 7-gram backoff** | **0.9633** |
| Legal TTT (score-first, 3ep) | 1.1216 |
| Artifact | 15,560,351 bytes |
| Steps | 6,647 at 90.3ms/step |
| Training time | 600s |

## Architecture (from PR #753)

- 11L, 512d, GQA 8/4, MLP 3x
- LeakyReLU(0.9)^2 activation
- XSA on all 11 layers
- BigramHash, SmearGate, SWA, EMA
- Int6 QAT + GPTQ (within training budget, issue #677 compliant)
- 7-gram backoff eval cache (backward-looking, no weight updates)

## Eval-time Techniques

**7-gram backoff cache** (from PR #753): Multi-order n-gram model built from already-scored tokens. Linear interpolation with entropy-adaptive alpha. Fully backward-looking — each token scored before its statistics enter the cache.
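A toy sketch of the score-then-update invariant and the multi-order interpolation (the class, the fixed 0.5 interpolation weight, and the uniform base distribution are illustrative assumptions; PR #753's entropy-adaptive alpha is not reproduced here):

```python
import math
from collections import defaultdict

class BackoffCache:
    """Backward-looking n-gram cache: each token is scored from counts of
    previously scored tokens only, then its own statistics are inserted."""
    def __init__(self, max_order=7, vocab=256):
        self.max_order = max_order
        self.vocab = vocab
        # counts[k]: length-k context tuple -> {next_token: count}
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in range(max_order)]
        self.history = []

    def prob(self, token):
        # Interpolate from a uniform base up through longer contexts.
        p = 1.0 / self.vocab
        for k in range(1, self.max_order):
            if len(self.history) < k:
                continue
            ctx = tuple(self.history[-k:])
            if ctx not in self.counts[k]:
                continue
            dist = self.counts[k][ctx]
            p_k = dist.get(token, 0) / sum(dist.values())
            p = 0.5 * p + 0.5 * p_k  # fixed weight; PR uses adaptive alpha
        return max(p, 1e-12)

    def score_then_update(self, token):
        # Score strictly BEFORE this token's statistics enter the cache.
        bits = -math.log2(self.prob(token))
        for k in range(1, self.max_order):
            if len(self.history) >= k:
                self.counts[k][tuple(self.history[-k:])][token] += 1
        self.history.append(token)
        return bits
```

On a repetitive stream, per-token bits drop quickly as the cache fills, while every score remains a function of strictly earlier tokens.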

**Legal score-first TTT** (from PR #753): SGD with 3 epochs, freeze last 2 blocks. Every token scored under inference_mode before any weight update.
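The legality argument reduces to an ordering invariant: every chunk is scored with the weights as they stood before any gradient step touched that chunk. A framework-free sketch of that control flow (`score_fn` and `update_fn` are hypothetical stand-ins for the model's inference-mode scoring pass and the frozen-block SGD step):

```python
def score_first_ttt(chunks, score_fn, update_fn, epochs=3):
    """Legal test-time training: each chunk is scored BEFORE any
    weight update derived from it is applied."""
    losses = []
    for chunk in chunks:
        losses.append(score_fn(chunk))  # inference only, no weight updates
        for _ in range(epochs):         # then adapt on the scored chunk
            update_fn(chunk)
    return losses
```

In the real run, `score_fn` would wrap the forward pass in `torch.inference_mode()` and `update_fn` would step SGD on the unfrozen blocks only.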

## Reproduction

```bash
SEED=42 bash run.sh
```

Environment variables set in run.sh:
- `SHARD_ORDER=44,63,65,42,...` (curriculum learning)
- `MLP_LEAKY_SLOPE=0.9`
- `NGRAM_EVAL_ORDER=7`

## Acknowledgments

- @newjordan (PR #753, Podracing II base)
- @abaybektursun (PR #650, curriculum learning / shard reordering)
- @MatoTeziTanka (LeakyReLU slope sweep, issue #140)
- @Asukabot0 (PR #715/#727, n-gram backoff technique)

## Status

1 seed submitted. 2 additional seeds pending OpenAI compute grant ($1000 applied).
Previously PR #486 (formerly #2 on leaderboard, TrigramHash originator). $339 personal compute spent.
@@ -0,0 +1,22 @@
#!/bin/bash
set -euo pipefail
export PYTHONUNBUFFERED=1

SEED="${SEED:-42}"

# Production-ready: PR #753 base + curriculum learning
export SEED
export SHARD_ORDER="${SHARD_ORDER:-44,63,65,42,18,67,30,69,61,3,13,19,50,49,56,45,73,79,57,32,28,68,66,34,46,38,17,77,0,14,26,74,59,62,41,9,58,22,78,4,48,8,12,27,75,36,16,43,52,15,33,47,25,55,54,23,37,51,31,21,60,1,20,72,24,53,39,35,71,76,40,5,10,2,7,6,70,11,64,29}"
# N-gram backoff defaults from PR #753
export NGRAM_EVAL_ORDER="${NGRAM_EVAL_ORDER:-7}"
# LeakyReLU slope 0.9 > 0.5 (MatoTeziTanka sweep, -0.013 BPB)
export MLP_LEAKY_SLOPE="${MLP_LEAKY_SLOPE:-0.9}"

NGPU=$(nvidia-smi -L 2>/dev/null | wc -l)
echo "GPUs: $NGPU | Seed: $SEED | Ngram: $NGRAM_EVAL_ORDER | Shard order: ${SHARD_ORDER:+yes}"

if [ "$NGPU" -gt 1 ]; then
torchrun --standalone --nproc_per_node="$NGPU" train_gpt.py
else
python train_gpt.py
fi
@@ -0,0 +1,17 @@
{
"name": "Curriculum Learning + LeakyReLU(0.9)² + 7-gram Backoff",
"author": "ndokutovich",
"github_id": "ndokutovich",
"val_bpb": 0.9633,
"val_loss": 1.6265,
"bytes_total": 15560351,
"artifact_bytes": 15560351,
"training_time_seconds": 600,
"eval_time_seconds": 131,
"hardware": "8xH100 SXM",
"seed": 42,
"num_seeds": 1,
"date": "2026-03-25",
"blurb": "PR #753 base + curriculum learning (hardest-first shard reorder, PR #650) + LeakyReLU(0.9)² slope optimization (MatoTeziTanka sweep) + 7-gram backoff eval cache. 1 seed, 2 additional pending compute grant.",
"notes": "Previously PR #486 (formerly #2 on leaderboard, TrigramHash originator). $360 personal compute spent."
}