NAIL — Noise-robust Aggregation for Imitation Learning

The repo has two experiment stacks:

| Directory | What it runs |
| --- | --- |
| `gsm/` | LoRA distillation on GSM8K/TinyGSM with Gemma student/expert models. |
| `modadd/` | Modular-addition experiments with a small transformer trained from scratch. |

Setup

Install once from the repo root:

uv sync --locked
source .venv/bin/activate

This creates `.venv/` from `uv.lock` and installs the dependencies for both experiment stacks, including PyTorch with CUDA 12.8, vLLM, Transformers, PEFT, Hydra, and W&B.
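After activating the environment, a quick import check can confirm that the packages listed above resolved. The sketch below is not part of the repo; `check_imports` is a hypothetical helper, and the module names (`torch`, `vllm`, `transformers`, `peft`, `hydra`, `wandb`) are assumed to be the usual import names for those packages.

```shell
# Hypothetical helper: report whether each dependency is importable
# from the active Python environment. Nothing is imported or executed;
# importlib.util.find_spec only looks the module up.
check_imports() {
  for mod in torch vllm transformers peft hydra wandb; do
    if python -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec('$mod') else 1)" 2>/dev/null; then
      echo "ok       $mod"
    else
      echo "MISSING  $mod"
    fi
  done
}

check_imports
```

Any line marked `MISSING` suggests the environment is not activated or `uv sync --locked` did not complete.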

Then choose a stack:

cd gsm      # real-model GSM8K/TinyGSM experiments
# or
cd modadd   # modular-addition experiments

See gsm/README.md and modadd/README.md for the full commands.

Method Map

GSM commands below are run from inside gsm/.

| Method | GSM command | Modadd command |
| --- | --- | --- |
| LogLossBC | `bash scripts/train.sh configs/offline_bc.yaml` | `python -m nanogpt.run experiment=modadd_noisy_bc` |
| NAIL-F | `bash scripts/train.sh configs/nail_f.yaml` | `python -m nanogpt.run experiment=modadd_nail` |
| NAIL-R | `bash scripts/train.sh configs/nail_r.yaml` | `python -m nanogpt.run experiment=modadd_nail_reverse_mc_fixed` |
| NAIL-Mixed | `bash scripts/train.sh configs/nail_mixed.yaml` | `python -m nanogpt.run experiment=modadd_nail task.loss=mixed task.kl_beta=<beta>` |
| OPD-F | `bash scripts/train.sh configs/opd_f.yaml` | `python -m nanogpt.run experiment=modadd_opd_forward` |
| OPD-R | `bash scripts/train.sh configs/opd_r.yaml` | `python -m nanogpt.run experiment=modadd_opd` |
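To run every GSM method in sequence, the six commands in the GSM column can be wrapped in a loop. This is a minimal sketch, assuming it is run from inside `gsm/`; the `run_all_gsm` function and the `DRY_RUN` switch are hypothetical conveniences for this example, not features of the repo. It defaults to printing the commands rather than launching training.

```shell
# DRY_RUN is a hypothetical switch for this sketch: 1 prints each
# command, anything else actually executes it.
DRY_RUN="${DRY_RUN:-1}"

run_all_gsm() {
  # Config names are taken from the GSM column of the method table.
  for cfg in offline_bc nail_f nail_r nail_mixed opd_f opd_r; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "bash scripts/train.sh configs/${cfg}.yaml"
    else
      # Stop at the first failing run instead of continuing.
      bash scripts/train.sh "configs/${cfg}.yaml" || return 1
    fi
  done
}

run_all_gsm
```

Setting `DRY_RUN=0` before running the script executes the training runs one after another.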

Attribution

The base causal transformer in `modadd/model.py` and the `nanogpt` package name are derived from Andrej Karpathy's nanoGPT, which is MIT licensed.

