Here, we share our ideas and code for building LLMs – including Transformers, GPT-2, and training methods like SFT, DPO, and GRPO – entirely from scratch. We also provide simple mathematical derivations for algorithms such as DPO, GRPO, and on-policy distillation, along with insights into recent LLM research topics such as reasoning. We hope you find these resources helpful.
tianbingsz/LLM