plau666/NAIL

NAIL — Noise-robust Aggregation for Imitation Learning

The repo has two experiment stacks:

| Directory | What it runs |
| --- | --- |
| gsm/ | LoRA distillation on GSM8K/TinyGSM with Gemma student/expert models. |
| modadd/ | Modular-addition experiments with a small transformer trained from scratch. |

Setup

Install once from the repo root:

uv sync --locked
source .venv/bin/activate

This creates .venv/ from uv.lock and installs the dependencies for both experiment stacks, including PyTorch built for CUDA 12.8, vLLM, Transformers, PEFT, Hydra, and W&B.

Then choose a stack:

cd gsm      # real-model GSM8K/TinyGSM experiments
# or
cd modadd   # modular-addition experiments

See gsm/README.md and modadd/README.md for the full commands.

Method Map

GSM commands below are run from inside gsm/.

| Method | GSM command | Modadd command |
| --- | --- | --- |
| LogLossBC | bash scripts/train.sh configs/offline_bc.yaml | python -m nanogpt.run experiment=modadd_noisy_bc |
| NAIL-F | bash scripts/train.sh configs/nail_f.yaml | python -m nanogpt.run experiment=modadd_nail |
| NAIL-R | bash scripts/train.sh configs/nail_r.yaml | python -m nanogpt.run experiment=modadd_nail_reverse_mc_fixed |
| NAIL-Mixed | bash scripts/train.sh configs/nail_mixed.yaml | python -m nanogpt.run experiment=modadd_nail task.loss=mixed task.kl_beta=<beta> |
| OPD-F | bash scripts/train.sh configs/opd_f.yaml | python -m nanogpt.run experiment=modadd_opd_forward |
| OPD-R | bash scripts/train.sh configs/opd_r.yaml | python -m nanogpt.run experiment=modadd_opd |
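
To go through every GSM method in the table in one pass, a small wrapper along the following lines may be convenient. This is only a sketch, not part of the repo: the function name run_gsm_methods and the DRY_RUN variable are hypothetical, and it assumes the current directory is gsm/ and that the config file names match the table exactly. By default it only prints the commands.

```shell
# Sketch: list (or run) the GSM training command for each method in the table.
# Assumes the current directory is gsm/. DRY_RUN and run_gsm_methods are
# hypothetical names introduced for this example; they are not part of the repo.
gsm_methods="offline_bc nail_f nail_r nail_mixed opd_f opd_r"

run_gsm_methods() {
  for name in $gsm_methods; do
    cmd="bash scripts/train.sh configs/${name}.yaml"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "$cmd"        # default: print the command without running it
    else
      $cmd || return 1   # run it, and stop at the first failing run
    fi
  done
}

run_gsm_methods
```

Calling it as `DRY_RUN=0 run_gsm_methods` would launch the runs one after another. Printing first makes it easy to check the config paths before committing GPU time.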

Attribution

The base causal transformer in modadd/model.py and the nanogpt package name are derived from Andrej Karpathy's nanoGPT, which is MIT licensed.
