Record: Curriculum Learning + LeakyReLU(0.9)² + 7-gram Backoff (val_bpb=0.9633) #764

Open
ndokutovich wants to merge 1 commit into openai:main from ndokutovich:submission-v7-curriculum-ngram

@ndokutovich
Summary

val_bpb = 0.9633 (seed 42, additional seeds pending compute grant) | 15.56 MB | 8xH100 SXM, 600s

Built on PR #753 (Podracing II) with two novel additions:

1. Curriculum Learning (Shard Reordering)

Training shards are reordered by model perplexity, hardest shards first. Based on PR #650 (-0.003 BPB). No code change is required; the ordering is controlled by an environment variable only.
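As a rough illustration of the reordering idea (not the code from this PR or PR #650), the sketch below ranks shards by perplexity computed from a per-shard mean cross-entropy. The function name, the shard names, and the loss values are hypothetical; how the resulting order is passed to the training script is not shown here.

```python
import math


def order_shards_hardest_first(shard_losses):
    """Return shard names sorted from highest to lowest perplexity.

    shard_losses maps a shard name to its mean per-token cross-entropy
    in nats (hypothetical input; the PR does not say how it is measured).
    """
    perplexity = {name: math.exp(loss) for name, loss in shard_losses.items()}
    # Sort by descending perplexity; break ties by name for determinism.
    return sorted(perplexity, key=lambda name: (-perplexity[name], name))


# Made-up losses for three shards, purely for illustration.
example = {"shard_000": 1.0, "shard_001": 2.0, "shard_002": 1.5}
print(order_shards_hardest_first(example))
```

Because exp is monotonic, sorting by mean loss directly would give the same order; perplexity is used here only to match the wording above.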

2. LeakyReLU(0.9)² Slope Optimization

Following @MatoTeziTanka's controlled sweep (issue #140): a negative slope of 0.9 gives -0.013 BPB relative to the standard 0.5. This is a one-parameter change.
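For readers unfamiliar with the notation, here is a minimal scalar sketch, assuming LeakyReLU(0.9)² means "apply LeakyReLU with negative slope 0.9, then square the result". That reading is an assumption; the actual model code presumably operates on tensors rather than Python floats.

```python
def leaky_relu_squared(x, negative_slope=0.9):
    """Square of LeakyReLU(x): x**2 for x >= 0, (negative_slope * x)**2 otherwise."""
    y = x if x >= 0 else negative_slope * x
    return y * y


# Compare the two slopes from the sweep on a negative input.
print(leaky_relu_squared(-1.0, negative_slope=0.5))
print(leaky_relu_squared(-1.0, negative_slope=0.9))
```

Under this reading the output is never negative, and the slope only changes how strongly negative pre-activations contribute: with slope 0.9 the negative branch is scaled by 0.81 instead of 0.25.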

Results

| Eval Method | BPB |
|---|---|
| Sliding window (stride=64) | 1.1216 |
| Sliding + 7-gram backoff | 0.9633 |
| Legal TTT (score-first, 3ep) | 1.1216 |
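The PR text does not describe how the 7-gram backoff is built or how it is combined with the neural model's predictions, so the following is only a generic sketch of n-gram backoff: count next-token frequencies for contexts of length 0 to 6 and predict from the longest context that has been seen. The class name and the update-after-predict usage are assumptions, not the submission's implementation.

```python
from collections import Counter, defaultdict


class BackoffNgram:
    """Toy n-gram predictor that backs off to shorter contexts."""

    def __init__(self, max_order=7):
        self.max_order = max_order
        # counts[k] maps a context of k tokens to a Counter of next tokens.
        self.counts = [defaultdict(Counter) for _ in range(max_order)]

    def update(self, history, token):
        """Record `token` as the continuation of every suffix of `history`."""
        for k in range(min(self.max_order - 1, len(history)) + 1):
            context = tuple(history[len(history) - k:])
            self.counts[k][context][token] += 1

    def predict(self, history):
        """Return next-token probabilities from the longest seen context."""
        for k in range(min(self.max_order - 1, len(history)), -1, -1):
            context = tuple(history[len(history) - k:])
            seen = self.counts[k].get(context)
            if seen:
                total = sum(seen.values())
                return {tok: n / total for tok, n in seen.items()}
        return {}  # nothing observed yet


model = BackoffNgram(max_order=3)
tokens = [1, 2, 3, 1, 2, 3, 1, 2]
for i, tok in enumerate(tokens):
    # A real evaluator would score `tok` with model.predict(tokens[:i])
    # before updating, so no token influences its own prediction.
    model.update(tokens[:i], tok)

print(model.predict([1, 2]))
print(model.predict([9, 9]))
```

In the second call no context containing token 9 has been seen, so the predictor falls back to the empty context, which is the plain unigram distribution.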

Artifact: 15,560,351 bytes (< 16MB)
Steps: 6,647 at 90.3ms/step
GPTQ calibration within training budget (issue #677 compliant)
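The figures above can be cross-checked with a few lines of arithmetic. This sketch assumes "16MB" means 16,000,000 bytes (the limit could instead be defined in binary units) and treats 90.3 ms/step as the rounded average it appears to be, so the wall-clock product is only approximate.

```python
ARTIFACT_BYTES = 15_560_351
LIMIT_BYTES = 16_000_000  # assumption: decimal megabytes

STEPS = 6_647
MS_PER_STEP = 90.3  # rounded figure from the summary

SLIDING_BPB = 1.1216
NGRAM_BPB = 0.9633

headroom_bytes = LIMIT_BYTES - ARTIFACT_BYTES
train_seconds = STEPS * MS_PER_STEP / 1000.0
bpb_gain = round(SLIDING_BPB - NGRAM_BPB, 4)

print(f"artifact headroom: {headroom_bytes} bytes")
print(f"approx. training time: {train_seconds:.1f} s")
print(f"BPB change from 7-gram backoff: -{bpb_gain}")
```

The step count times the rounded step time comes to roughly 600 s, consistent with the 600s budget in the summary, and the n-gram backoff accounts for the whole gap between 1.1216 and 0.9633.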

Reproduction

SEED=42 bash run.sh

Acknowledgments

@newjordan (PR #753), @abaybektursun (PR #650), @MatoTeziTanka (slope sweep), @Asukabot0 (n-gram backoff)

Status

1 seed submitted. 2 additional seeds pending OpenAI compute grant.
Previously PR #486 (formerly #2 on leaderboard, TrigramHash originator). $339 personal compute spent.

Test plan

  • 1 seed (42) validated on 8xH100 SXM
  • Seed 1337 (pending compute)
  • Seed 2024 (pending compute)
