This repository contains the code for running the analyses in "The Supervision Horizon: An Exploration of the Mechanics of On-Policy Self-Distillation". There are three core analyses:
- Computing core metrics across the sequence
- Measuring the gradient norm from training on the first vs last 25% of the sequence
- Measuring the within-problem cosine similarity of gradients from training on the first vs last 25% of the sequence
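
As a rough illustration of the quantities behind the second and third analyses, the sketch below splits a sequence into its first and last 25% of token positions, then compares two gradient vectors by L2 norm and cosine similarity. It is a toy stand-in under stated assumptions, not the repository's implementation: `quarter_spans`, `l2_norm`, and `cosine_similarity` are hypothetical helpers, and the gradient values are made-up numbers rather than real model gradients.

```python
import math


def quarter_spans(seq_len):
    """Return index ranges for the first and last 25% of a sequence.

    Hypothetical helper: rounds down, but always keeps at least one token.
    """
    q = max(1, seq_len // 4)
    return range(0, q), range(seq_len - q, seq_len)


def l2_norm(vec):
    """Euclidean norm of a flattened gradient vector."""
    return math.sqrt(sum(x * x for x in vec))


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    denom = l2_norm(a) * l2_norm(b)
    if denom == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / denom


# Token positions that would be supervised in each condition.
first_span, last_span = quarter_spans(16)

# Made-up flattened gradients standing in for the gradients obtained by
# training on each span; real ones would come from a backward pass.
grad_first = [3.0, 4.0, 0.0]
grad_last = [0.0, 4.0, 3.0]

norm_first = l2_norm(grad_first)
norm_last = l2_norm(grad_last)
similarity = cosine_similarity(grad_first, grad_last)
```

In a real run the two vectors might be, for example, gradients from different rollouts of the same problem (one possible reading of "within-problem"); the analysis modules themselves define the exact procedure.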

This project uses uv. Collect the dependencies with:

    uv sync

To run each experiment, do:

    uv run -m analysis.experiment_1_core_metrics
    uv run -m analysis.experiment_2_gradient_norm
    uv run -m analysis.experiment_3_gradient_similarity

Each experiment has the following arguments you can adjust:
- model-name
- num-problems
- num-rollouts
- temperature
- max-new-tokens
- top-k
- output-dir
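
The sketch below shows one way these arguments could be declared and parsed. It assumes they are exposed as standard `--flag` options through `argparse`, which this README does not confirm. The argument types are guesses based on the names, no defaults are set because the real ones are not stated here, and the example values are placeholders.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the argument names listed above."""
    parser = argparse.ArgumentParser(
        description="Sketch of the shared experiment arguments (assumed layout)"
    )
    # Types are assumptions inferred from the argument names.
    parser.add_argument("--model-name", type=str)
    parser.add_argument("--num-problems", type=int)
    parser.add_argument("--num-rollouts", type=int)
    parser.add_argument("--temperature", type=float)
    parser.add_argument("--max-new-tokens", type=int)
    parser.add_argument("--top-k", type=int)
    parser.add_argument("--output-dir", type=str)
    return parser


# Placeholder values, parsed from an explicit list instead of sys.argv.
args = build_parser().parse_args(
    ["--num-problems", "8", "--temperature", "0.7", "--output-dir", "results"]
)
```

With `argparse`, hyphens in option names become underscores on the parsed namespace, so `--num-problems` is read back as `args.num_problems`. Check each experiment module for the actual flag spellings and defaults.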