Shubhaditya14/ScratchGPT

A minimal GPT-2 (124M) implementation built from scratch in PyTorch.

ScratchGPT is an educational implementation of GPT-2, focused on understanding transformers end-to-end: tokenization, attention, transformer blocks, training loops, and autoregressive generation. The codebase is lightweight, readable, and easy to extend.
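To make the "124M" figure concrete, the arithmetic below counts parameters for a GPT-2 Small shaped model. This is a sketch, not code from this repository: the hyperparameters are the published GPT-2 Small values rather than values read from config.py, and it assumes biases on every linear layer and an output head that shares weights with the token embedding.

```python
# Assumed GPT-2 Small hyperparameters (published values, not read from config.py).
VOCAB_SIZE = 50257   # GPT-2 BPE vocabulary size
BLOCK_SIZE = 1024    # maximum context length
N_LAYER = 12         # number of transformer blocks
N_EMBD = 768         # embedding width


def count_gpt2_params(vocab_size, block_size, n_layer, n_embd):
    """Count parameters, assuming biases everywhere and a tied output head."""
    token_emb = vocab_size * n_embd
    pos_emb = block_size * n_embd

    layer_norm = 2 * n_embd                        # scale + shift
    qkv_proj = n_embd * 3 * n_embd + 3 * n_embd    # fused Q/K/V projection
    attn_out = n_embd * n_embd + n_embd            # attention output projection
    mlp_up = n_embd * 4 * n_embd + 4 * n_embd      # hidden width is 4x the embedding
    mlp_down = 4 * n_embd * n_embd + n_embd
    per_block = 2 * layer_norm + qkv_proj + attn_out + mlp_up + mlp_down

    final_norm = 2 * n_embd
    # The output head reuses the token embedding matrix, so it adds nothing.
    return token_emb + pos_emb + n_layer * per_block + final_norm


total = count_gpt2_params(VOCAB_SIZE, BLOCK_SIZE, N_LAYER, N_EMBD)
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
# prints: 124,439,808 parameters (~124M)
```

Roughly 31% of the total sits in the token embedding table, which is why tying it to the output head matters at this scale.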

Features

GPT-2 Small architecture (124M params)

GPT-2 BPE tokenization (tiktoken)

Multi-head self-attention

Causal masking

Transformer blocks (MHSA + MLP + residuals + LayerNorm)

AdamW optimizer with warmup + cosine decay

Dataloaders for packed sequences

Text generation (temperature, top-k, top-p)

TPU v5e support (XLA multi-core training)
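As an illustration of the packed-sequence dataloader and next-token targets listed above, here is a minimal sketch in plain Python. The function name `make_batches` and its arguments are hypothetical and are not taken from data/loader.py; the real loader presumably works on tensors of token ids.

```python
def make_batches(tokens, block_size, batch_size):
    """Yield (x, y) batches from one long token stream.

    Each row of x holds block_size consecutive tokens. The matching row of y
    holds the same window shifted one position to the right, so y[i][t] is
    the token that follows x[i][t].
    """
    step = block_size * batch_size
    # A batch needs step + 1 tokens: the extra one is the last target.
    for start in range(0, len(tokens) - step, step):
        chunk = tokens[start : start + step + 1]
        x = [chunk[i * block_size : (i + 1) * block_size]
             for i in range(batch_size)]
        y = [chunk[i * block_size + 1 : (i + 1) * block_size + 1]
             for i in range(batch_size)]
        yield x, y


tokens = list(range(20))  # stand-in for token ids produced by the tokenizer
batches = list(make_batches(tokens, block_size=4, batch_size=2))
first_x, first_y = batches[0]
print(first_x)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(first_y)  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

Because the stream is packed, no padding tokens are needed: every position in every row contributes a training target.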

Project Structure

ScratchGPT/
  model/
    gpt.py
    block.py
    attention.py
  data/
    loader.py
    shakespeare.txt
  tokenizer.py
  config.py
  scripts/
    train_gpu.py
    train_tpu_v5e.py
    generate.py

Training

GPU

pip install -r requirements.txt
python -m scripts.train_gpu

TPU (v5e)

python -m scripts.train_tpu_v5e
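The training scripts use AdamW with warmup and cosine decay. One common way to express such a schedule is sketched below; the function name `lr_at` and all numeric values are illustrative assumptions, not settings taken from config.py or the training scripts.

```python
import math


def lr_at(step, max_lr, min_lr, warmup_steps, max_steps):
    """Learning rate for a given step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Ramp linearly from max_lr / warmup_steps up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step >= max_steps:
        # After the schedule ends, hold the floor value.
        return min_lr
    # Fraction of the decay phase completed, in [0, 1).
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes 1 -> 0
    return min_lr + cosine * (max_lr - min_lr)


# Hypothetical settings, chosen only for the demonstration.
schedule = [lr_at(s, max_lr=6e-4, min_lr=6e-5, warmup_steps=10, max_steps=100)
            for s in range(110)]
```

In a PyTorch loop, the value for the current step would typically be written into each entry of `optimizer.param_groups` before calling `optimizer.step()`.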

Generate Text

python -m scripts.generate --checkpoint checkpoints/gpt2_124m.pt
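Generation supports temperature, top-k, and top-p sampling. The sketch below shows how the three can combine when picking a single next token from a list of logits. It is plain Python for readability; the function name `sample_next_token` and its parameters are hypothetical and do not describe the interface of scripts/generate.py.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Pick one token index from raw logits. temperature must be > 0."""
    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [value / temperature for value in logits]

    # Rank token indices from most to least likely.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)

    # Top-k: keep only the k highest-scoring tokens.
    if top_k is not None:
        order = order[:top_k]

    # Softmax over the surviving tokens (subtract the max for stability).
    peak = scaled[order[0]]
    weights = [math.exp(scaled[i] - peak) for i in order]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    if top_p is not None:
        cumulative = 0.0
        cutoff = len(order)
        for count, p in enumerate(probs, start=1):
            cumulative += p
            if cumulative >= top_p:
                cutoff = count
                break
        order, probs = order[:cutoff], probs[:cutoff]

    # random.choices accepts unnormalized weights.
    return rng.choices(order, weights=probs, k=1)[0]


rng = random.Random(0)  # fixed seed so the demo is repeatable
logits = [1.0, 3.0, 2.0, -1.0]
greedy = sample_next_token(logits, top_k=1, rng=rng)  # top_k=1 is argmax
picks = {sample_next_token(logits, temperature=0.8, top_k=3, top_p=0.9, rng=rng)
         for _ in range(200)}
```

With these example logits, the top-p cutoff of 0.9 leaves only the two most likely tokens (indices 1 and 2) as candidates, so `picks` can never contain index 0 or 3.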

Concepts Learned

GPT-2 BPE tokenization

Shifted (x, y) next-token prediction

Embeddings + positional encodings

Q/K/V projections and attention scores

Multi-head attention + concatenation

Feed-forward MLP layers

Residual connections + LayerNorm

Causal masking

Training loop construction

Sampling for text generation
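Several of the concepts above (attention scores, softmax weighting, causal masking) meet in a single function. The following is a deliberately small, dependency-free sketch of single-head causal attention on already-projected Q/K/V vectors. It is not the code in model/attention.py: a PyTorch version would batch this as tensor operations and usually applies the mask by setting future scores to negative infinity before the softmax.

```python
import math


def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v are lists of T vectors (lists of floats). Position t may only
    attend to positions 0..t, never to later ones.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    value_dim = len(v[0])
    outputs = []
    for t, query in enumerate(q):
        # Causal mask: only score keys at positions 0..t.
        scores = [scale * sum(a * b for a, b in zip(query, k[s]))
                  for s in range(t + 1)]
        # Numerically stable softmax over the visible positions.
        peak = max(scores)
        weights = [math.exp(score - peak) for score in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output is the attention-weighted average of the visible values.
        outputs.append([sum(w * v[s][d] for s, w in enumerate(weights))
                        for d in range(value_dim)])
    return outputs


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy sequence; Q = K = V for brevity
out = causal_attention(x, x, x)

# Changing the last token must not affect earlier outputs.
x_changed = [[1.0, 0.0], [0.0, 1.0], [-5.0, 9.0]]
out_changed = causal_attention(x_changed, x_changed, x_changed)
print(out[:2] == out_changed[:2])  # True
```

The first position can only see itself, so its output equals its own value vector. Multi-head attention runs several such heads on different projections and concatenates their outputs.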
