LLM Stories

A series of essays building up mental models for how modern LLMs are actually served — written in plain language, no math notation, lots of diagrams.

📖 Live site: https://wgzesg.github.io/llm_stories/

The goal isn't to teach you the equations. It's to build the intuitions that make every later equation feel inevitable. Each article picks one slice of the LLM serving pipeline and walks through it as a discovery journey — the kind where each section ends with "oh, that's all it is?"

Most articles come in two languages: English and 中文. The Chinese versions keep technical terms in English (中英混排, mixed Chinese and English text). They are written natively for Chinese-speaking developers and learners, not as literal translations.

Articles

#	Title	English	中文
01	An LLM, end to end	index.md	—
02	Tensor parallelism, built from scratch in your head	index.md	index.zh.md
03	Walking tensor parallelism through a full block	index.md	index.zh.md
04	How to batch many requests through one forward pass	index.md	index.zh.md
05	ORCA and chunked prefill: evening out the iteration	index.md	—
06	Prefill and decode disaggregation: two phases on opposite sides of the roofline	index.md	index.zh.md

The full list — shipped, in progress, and the holes still left to dig — lives in the roadmap.

Tech stack

Static site generator: Hugo (extended)
Theme: PaperMod (added as a git submodule under themes/PaperMod)
Hosting: GitHub Pages, deployed automatically by .github/workflows/hugo.yml on every push to main

Local preview

# Clone with the theme submodule
git clone --recurse-submodules <repo-url>
cd llm_stories

# If you cloned without --recurse-submodules:
git submodule update --init --recursive

# Run the dev server, including drafts (-D is the short form of --buildDrafts)
hugo server -D

# Open http://localhost:1313/llm_stories/
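If the clone skipped the submodule, git leaves themes/PaperMod behind as an empty directory, and the preview will not render with the theme. The sketch below is a hypothetical pre-flight check for that case; the function name `theme_ready` is an assumption for illustration and is not part of this repo.

```shell
# Hypothetical helper: succeed only if the theme directory exists and is non-empty.
# Defaults to themes/PaperMod, the submodule path named in the tech stack above.
theme_ready() {
  theme_dir="${1:-themes/PaperMod}"
  # An uninitialised submodule shows up as an empty directory.
  [ -d "$theme_dir" ] && [ -n "$(ls -A "$theme_dir" 2>/dev/null)" ]
}

# Possible usage before starting the server:
#   theme_ready || git submodule update --init --recursive
```

The check only looks for any file in the directory; it does not verify that the submodule is at the commit the repo pins.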

Adding a new article

hugo new content posts/07-some-article/index.md       # English (default)
hugo new content posts/07-some-article/index.zh.md    # Chinese

Then edit the frontmatter (draft: false when ready) and the body.
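For the "draft: false when ready" step, a small shell sketch can flip the flag without opening an editor. It assumes YAML-style frontmatter with a line that begins with `draft:` (the archetype in this repo may use a different format), and `publish_post` is a hypothetical helper name, not something the repo ships.

```shell
# Hypothetical helper: set the first "draft:" line of a post to "draft: false".
# Assumes YAML-style frontmatter; later lines in the body are left untouched.
publish_post() {
  post="$1"
  [ -f "$post" ] || { echo "no such file: $post" >&2; return 1; }
  tmp="$(mktemp)" || return 1
  # Rewrite only the first line starting with "draft:", then copy the result back
  # over the original so its file permissions are preserved.
  awk '!done && /^draft:/ { print "draft: false"; done = 1; next } { print }' \
    "$post" > "$tmp" && cat "$tmp" > "$post"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

A call would look like `publish_post content/posts/<slug>/index.md`, repeated for `index.zh.md` if the article has a Chinese version.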

Editing published articles

Edit the markdown in content/posts/<slug>/, commit, and git push to main. The GitHub Action rebuilds the site automatically. Markdown is the source of truth; nothing is ever "locked."

Style

No matrix-math notation. Just shapes ([N × d]) and stories.
Pictures over LaTeX. ASCII diagrams for high-level layout sketches; inline SVG for shape traces, attention masks, and anything that needs to be richer.
Discovery journey, not lecture. The reader should feel they derived the answer with us, not had it handed down.
Pick one mental model and stick with it. When metaphors compete, kill the weaker one.
Chinese versions = native voice, not translations. Tech terms stay English; the prose is rewritten for Chinese readers, not translated word for word.
