10 hands-on notebooks. Zero hand-waving. Pure understanding.
You've used LLMs. You've been amazed. You've also been confused, frustrated, or burned when they confidently gave you completely wrong answers.
What if you could understand exactly why that happens?
Most tutorials teach you how to call an API. This series teaches you how LLMs actually think — why they sometimes hallucinate facts, why changing one word in your prompt changes everything, and why your chatbot forgets what you said 5 messages ago.
By the end, you won't just be a user of LLMs. You'll understand them well enough to build serious things with them.
- ✅ Know exactly why your prompts work (or don't) — and fix them
- ✅ Understand why LLMs make stuff up and know how to catch and prevent it
- ✅ Build your own AI agent that uses tools (calculator, search, APIs)
- ✅ Create a RAG system that lets an LLM answer questions from your documents
- ✅ Measure and benchmark any LLM's real-world performance
- ✅ Talk confidently about chain-of-thought, embeddings, temperature, and more
1. Tokenization
Tokenization_In_LLMs.ipynb · Gemini 2.5 Flash
Before an LLM reads a single word, it chops your text into tokens — and those tokens aren't words. They're not characters either. So what are they? And why does typing "dog" vs " dog" (with a space) sometimes produce wildly different results?
You'll discover:
- Why the same sentence in Hindi costs 3× more tokens than in English
- How emojis, newlines, and punctuation mess with token counts
- The trick to writing prompts that save you money (and it's simpler than you think)
- Why repeating text in your prompt is actually more efficient, not less
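The greedy longest-match idea behind subword tokenizers can be sketched in a few lines. The vocabulary below is hand-made for illustration (Gemini's real tokenizer has a vocabulary orders of magnitude larger), but it shows why `dog` and `" dog"` (leading space) can land on different tokens:

```python
# Toy subword tokenizer: a hand-made vocabulary, NOT Gemini's real one,
# but enough to show why "dog" and " dog" can tokenize differently.
VOCAB = ["dog", " dog", "un", "break", "able", " "]

def tokenize(text: str) -> list[str]:
    """Greedy longest-match tokenization against the toy vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        match = next(
            (v for v in sorted(VOCAB, key=len, reverse=True)
             if text.startswith(v, i)),
            text[i],  # fall back to a single character
        )
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenize("dog"))          # one token: ['dog']
print(tokenize(" dog"))         # a *different* single token: [' dog']
print(tokenize("unbreakable"))  # ['un', 'break', 'able']
```

The same mechanism explains the multilingual cost gap: scripts that are rare in the training data get split into many short tokens, so the same sentence costs more.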
2. 📊 Generation & Sampling — "Does the AI 'choose' its next word? Sort of... but not how you think."
Generation_&_Sampling_Behaviour.ipynb · Gemini 2.5 Flash
Every word an LLM outputs is drawn from a probability distribution over its entire vocabulary. That means the AI could technically say anything — so what makes it coherent? The answer is temperature. Turn it up and the AI goes wild. Turn it down and it plays it safe.
You'll discover:
- Why `temperature=0` makes an LLM a robot and `temperature=2` makes it an artist
- What log probabilities are and why they matter for building reliable AI apps
- How to visualize the "decision" an LLM makes at each step
- The real difference between input tokens, output tokens, and why that hits your wallet
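Temperature is nothing more than a rescaling of the model's logits before they are normalized into probabilities. A stdlib-only sketch (the three candidate words and their logits are invented for illustration):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into a probability distribution; dividing by the
    temperature sharpens (T < 1) or flattens (T > 1) the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits for three candidate words.
logits = {"mat": 4.0, "moon": 2.0, "banana": 0.5}
for t in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(list(logits.values()), t)
    # At t=0.1 nearly all mass lands on "mat"; at t=2.0 it spreads out.
    print(t, {w: round(p, 3) for w, p in zip(logits, probs)})
```

Sampling from the `t=0.1` distribution is effectively deterministic; from `t=2.0`, even "banana" gets a real chance.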
3. Prompting Patterns
Prompting_Patterns.ipynb · Gemini 2.5 Flash
This is where a lot of people give up and just say "prompt engineering is an art." It isn't. There are concrete, repeatable patterns that reliably improve LLM output quality — and we test every single one with side-by-side comparisons.
You'll discover:
- How to get great results from zero examples (zero-shot) vs. a few examples (few-shot)
- The wild experiment: two prompts, same task, completely different formatting — just from the examples you gave
- How to turn the LLM into a domain expert by assigning it one sentence of identity ("You are a senior physician...")
- Why "better prompts" often have fewer words, not more
4. Reasoning Techniques
Reasoning_Techniques_In_LLMs.ipynb · Gemini 2.5 Flash
Standard prompting asks the AI for an answer. Chain-of-Thought asks it to think out loud first. The difference in accuracy on complex problems is staggering. And then there are even more powerful techniques on top of that.
You'll discover:
- Chain of Thought (CoT): Add 7 words to your prompt and watch an LLM solve multi-step math it previously got wrong
- Self-consistency: Run the same problem 5 times, take a vote — why this is surprisingly powerful
- Tree of Thoughts: The technique that lets AIs backtrack and explore alternatives like a chess player
- Live experiments on math problems, code generation, and creative writing showing measurable accuracy gains
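Self-consistency needs nothing more than a majority vote over repeated samples. A sketch where a scripted iterator stands in for repeated LLM calls at temperature > 0:

```python
from collections import Counter

def self_consistent_answer(sample_fn, n=5):
    """Self-consistency: sample the same question n times (with some
    temperature, so answers vary) and return the majority answer."""
    answers = [sample_fn() for _ in range(n)]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes

# Stand-in for five real LLM calls: a scripted sequence of sampled answers.
samples = iter(["42", "42", "41", "42", "40"])
answer, votes = self_consistent_answer(lambda: next(samples))
print(answer, votes)  # → 42 3
```

The intuition: reasoning errors tend to scatter across different wrong answers, while correct chains converge on the same one, so the vote filters noise.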
5. Hallucinations
Hallucinations_in_LLMs.ipynb · Gemini 2.5 Flash
By default, an LLM almost never says "I don't know." It says something plausible-sounding that may be completely fabricated. This is the most important thing to understand before deploying AI in any real context.
You'll discover:
- The 5 root causes of hallucination (and why they're hard to fully eliminate)
- The experiment: 10 factual questions, naive prompting → 0% accurate. Add self-verification → 90% accurate. Same model. Different prompt.
- How to ask an LLM to rate its own confidence — and whether that rating is actually reliable
- Real strategies used in production systems to catch and prevent hallucinations
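One self-verification pattern is a simple two-pass wrapper: draft an answer, then feed the draft back and ask the model to check it. A sketch with a scripted stand-in model (the prompt wording and the `verified_answer` helper are illustrative, not the notebook's exact code):

```python
def verified_answer(llm, question):
    """Two-pass self-verification sketch: get a draft answer, then ask the
    model to check its own draft and revise or flag it."""
    draft = llm(f"Answer concisely: {question}")
    check = llm(
        "You previously answered a question. Verify the answer.\n"
        f"Question: {question}\nDraft answer: {draft}\n"
        "Reply VERIFIED if correct; otherwise give the corrected answer, "
        "or reply UNSURE."
    )
    if check.strip() == "VERIFIED":
        return draft
    return check  # corrected answer, or UNSURE

# Scripted stand-in for two real model calls, for illustration only.
script = iter(["Canberra is the capital of Australia.", "VERIFIED"])
print(verified_answer(lambda prompt: next(script),
                      "What is the capital of Australia?"))
```

The second pass costs an extra call, but it gives the model a chance to catch its own fabrication, and an explicit UNSURE escape hatch it lacked in the first pass.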
6. Memory Systems
Memory_Systems.ipynb · Gemini 2.5 Flash + sentence-transformers
Here's the dirty secret: LLMs have no actual memory. They see what's in their context window and nothing else. So how do apps like ChatGPT seem to remember you? This notebook reveals the tricks.
You'll discover:
- The cold truth about LLM "memory" (it's all context window tricks)
- How to build a simple vector database from scratch in pure Python
- Why retrieving relevant memories using semantic search beats keyword search completely
- The technique of memory compression — summarizing old conversations to avoid hitting token limits
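A vector store in pure Python really is this small. The 3-dimensional vectors below are hand-made for illustration; a real system would get them from an embedding model:

```python
import math

class MemoryStore:
    """A minimal in-memory vector store: keep (text, vector) pairs and
    retrieve the most semantically similar entries by cosine similarity."""

    def __init__(self):
        self.items = []  # list of (text, vector) pairs

    def add(self, text, vector):
        self.items.append((text, vector))

    def search(self, query_vector, k=1):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb)
        scored = sorted(self.items,
                        key=lambda item: cosine(query_vector, item[1]),
                        reverse=True)
        return [text for text, _ in scored[:k]]

# Hand-made 3-d "embeddings" for illustration; a real system would use
# sentence-transformers to produce them.
store = MemoryStore()
store.add("user likes dogs",       [0.9, 0.1, 0.0])
store.add("user works in finance", [0.0, 0.2, 0.9])
print(store.search([0.8, 0.2, 0.1]))  # → ['user likes dogs']
```

The "memory" an app injects before each LLM call is just the top-k results of a search like this, pasted into the context window.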
7. 🔗 Embeddings & Semantic Space — "What if you could turn the meaning of a sentence into a point in space?"
Embeddings_And_Semantic_Space.ipynb · sentence-transformers (MiniLM)
The concept that unlocks almost all of modern AI: embeddings. A sentence gets turned into a list of ~384 numbers. And somehow, semantically similar sentences end up mathematically close to each other. This notebook makes that abstract idea very real.
You'll discover:
- How to generate embeddings for any sentence in 3 lines of Python
- Why "dog" and "puppy" are close in embedding space, but "dog" and "automobile" are far apart
- Cosine similarity: the one formula that powers semantic search, RAG, clustering, and more
- How to visualize thousands of sentences in 2D to see semantic clusters form before your eyes
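Cosine similarity is the whole formula. A sketch with tiny hand-made vectors standing in for real 384-dimensional MiniLM embeddings (the actual numbers are invented for illustration):

```python
import math

def cosine_similarity(a, b):
    """cos(θ) = (a · b) / (|a| |b|): 1.0 means same direction (similar
    meaning), near 0 means unrelated, -1.0 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Tiny hand-made vectors standing in for real 384-d MiniLM embeddings.
dog        = [0.9, 0.4, 0.1]
puppy      = [0.8, 0.5, 0.2]
automobile = [0.1, 0.2, 0.9]

print(cosine_similarity(dog, puppy))       # high (close in meaning)
print(cosine_similarity(dog, automobile))  # low (far apart)
```

Because cosine ignores vector length and measures only direction, two sentences of very different lengths can still score as near-identical in meaning.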
8. 🔍 Retrieval Augmented Generation (RAG) — "How to give an LLM a brain transplant using your own documents."
Retrieval_Augmented_Generation.ipynb · Gemini 2.5 Flash + sentence-transformers
LLMs are trained up to a cutoff date and know nothing about your documents. RAG is the architecture that fixes both problems. It's the foundation behind every enterprise AI assistant you've ever used.
You'll discover:
- The exact 4-step RAG pipeline: Chunk → Embed → Retrieve → Generate
- Side-by-side comparison: What the LLM says without your documents vs. with them (the difference is shocking)
- How to build a complete, working RAG system from scratch — no LangChain, no magic, just Python
- The subtle art of chunking documents so retrieval actually works well
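The four steps fit in a page of Python. This sketch uses a toy bag-of-words "embedding" so it runs with no model at all; the real notebook swaps in sentence-transformers, and the stub lambda stands in for a real LLM call:

```python
import math
from collections import Counter

def chunk(text, size=50):
    """1. Chunk: split the document into fixed-size, non-overlapping pieces."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """2. Embed: toy bag-of-words vector (a real system uses an embedding model)."""
    return Counter(text.lower().split())

def similarity(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def retrieve(query, chunks, k=1):
    """3. Retrieve: rank chunks by similarity to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: similarity(q, embed(c)), reverse=True)[:k]

def generate(llm, query, context):
    """4. Generate: answer grounded in the retrieved context only."""
    return llm(f"Using only this context:\n{context}\n\nAnswer: {query}")

doc = ("The refund policy allows returns within 30 days. "
       "Shipping is free on orders over 50 dollars.")
chunks = chunk(doc, size=8)
top = retrieve("how many days for a refund?", chunks)
print(top)  # the refund-policy chunk, not the shipping one
print(generate(lambda prompt: "(model answer here)",  # stub LLM call
               "how many days for a refund?", top[0]))
```

Note how much rides on step 1: if the chunk boundary had split "returns within" from "30 days", retrieval could surface a chunk that no longer answers the question.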
9. ReAct & Tool Use
reAct_&_Tools_Usage_in_LLMs.ipynb · GLM-4.7-Flash via HuggingFace
LLMs confined to text generation are like a brilliant person locked in a room with no internet. Tool use opens the door. ReAct is the framework that lets an LLM reason about when to use a tool and how to interpret the result.
You'll discover:
- The Thought → Action → Observation loop: how an LLM becomes an agent
- Build a real calculator tool that the AI calls to get exact arithmetic (goodbye hallucinated math)
- How tool schemas work: teaching a model what tools exist and how to call them
- Watch an LLM correctly multiply two 15-digit numbers — by outsourcing it to your Python function
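The loop itself is short. Here the model's outputs are scripted so the example runs offline; a real agent would call the LLM where the lambda is, and the Action/Observation text format is one illustrative convention, not the only one:

```python
import re

def calculator(expression):
    """A deliberately narrow tool: exact 'a + b' / 'a * b' arithmetic."""
    a, op, b = re.fullmatch(r"(\d+)\s*([+*])\s*(\d+)", expression).groups()
    return str(int(a) + int(b)) if op == "+" else str(int(a) * int(b))

TOOLS = {"calculator": calculator}

def react_loop(llm, question, max_steps=5):
    """Thought → Action → Observation: the model reasons, optionally calls
    a tool, sees the result, and repeats until it emits a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"
        action = re.search(r"Action: (\w+)\[(.+?)\]", step)
        if action:
            name, arg = action.groups()
            observation = TOOLS[name](arg)          # run the requested tool
            transcript += f"Observation: {observation}\n"
        elif "Final Answer:" in step:
            return step.split("Final Answer:")[1].strip()
    return None

# Scripted model outputs stand in for real LLM calls, for illustration.
script = iter([
    "Thought: I should not do this in my head.\n"
    "Action: calculator[123456789 * 987654321]",
    "Thought: The tool gave the exact product.\n"
    "Final Answer: 121932631112635269",
])
print(react_loop(lambda prompt: next(script),
                 "What is 123456789 * 987654321?"))
```

The key design point: the model never computes anything; it only decides *when* to call the tool, and the exact arithmetic comes back as an Observation it can quote.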
10. Evaluation & Benchmarking
Evaluation_&_Benchmarking_for_LLMs.ipynb · HuggingFace models
This is the notebook professionals use. Vibes are not a metric. If you're building real AI applications, you need to measure them — systematically, reproducibly, and quantitatively.
You'll discover:
- The 4 axes you must measure: Accuracy, Consistency, Latency, and Cost
- BLEU and ROUGE scores: how to compute them and what they actually mean
- Building a fully automated evaluation pipeline with expected-answer comparison
- An interactive widget UI to run evaluations live inside Jupyter
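An exact-match accuracy and latency harness is a good starting point before reaching for BLEU/ROUGE. A sketch with a scripted stand-in model (a real run would call the Gemini or HuggingFace API inside the lambda):

```python
import time

def evaluate(llm, dataset):
    """Tiny evaluation harness: exact-match accuracy plus mean latency over
    a dataset of (prompt, expected_answer) pairs."""
    correct, latencies = 0, []
    for prompt, expected in dataset:
        start = time.perf_counter()
        answer = llm(prompt).strip().lower()
        latencies.append(time.perf_counter() - start)
        correct += answer == expected.strip().lower()  # case-insensitive match
    return {
        "accuracy": correct / len(dataset),
        "mean_latency_s": sum(latencies) / len(latencies),
    }

# A scripted stand-in model, for illustration only.
fake_model = {"2+2=?": "4", "Capital of France?": "Paris"}
dataset = [
    ("2+2=?", "4"),
    ("Capital of France?", "paris"),
    ("Color of sky?", "blue"),
]
report = evaluate(lambda p: fake_model.get(p, "I think it's green"), dataset)
print(report)  # accuracy 2/3, plus mean latency
```

Exact match is deliberately strict; it works for closed-form answers, while free-form outputs need the overlap metrics (BLEU/ROUGE) or an LLM judge.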
Follow the notebooks in order — each one unlocks the next:
[1] Tokenization ──► [2] Generation ──► [3] Prompting ──► [4] Reasoning
       │                   │                  │                 │
       ▼                   ▼                  ▼                 ▼
[5] Hallucinations  [7] Embeddings      [6] Memory     [9] ReAct & Tools
       │                   │                  │
       └───────────────────┴──────────────────┘
                           │
                    [8] RAG System
                           │
                    [10] Evaluation
| 🎯 Your Goal | 📖 Start Here |
|---|---|
| Just getting started with LLMs | Notebooks 1 → 3 |
| Getting better outputs from AI | Notebooks 3 → 5 |
| Building apps with memory & search | Notebooks 6 → 8 |
| Building AI agents | Notebook 9 (after 1-3) |
| Measuring AI in production | Notebook 10 |
- A Google Colab account (free) or Python 3.9+ locally
- A free Google AI Studio API key (for Gemini notebooks)
- A free HuggingFace account + token (for open-source model notebooks)
| Notebooks | Key Needed |
|---|---|
| 1–5 | GOOGLE_API_KEY |
| 7, 9, 10 | HF_TOKEN |
| 6, 8 | Both |
1. Open notebook in Google Colab
2. Click the 🔑 Secrets panel on the left
3. Add GOOGLE_API_KEY and/or HF_TOKEN
4. Runtime → Run All
`pip install google-generativeai huggingface_hub sentence-transformers`

(Each notebook also has its own install cell at the top.)
- Change the temperature (Notebook 2) and run the same prompt 5 times each. The shift from `0.1` to `1.5` has to be experienced, not just read about.
- Break the prompts on purpose. Remove examples from few-shot prompts. Watch quality collapse. Add them back. Watch it recover.
- Feed your own documents to the RAG pipeline (Notebook 8). Use notes, PDFs you've converted to text, anything. Suddenly it becomes personal.
- Add a new tool to the ReAct agent (Notebook 9). A weather tool, a dictionary lookup, anything. The framework is already there.
- Don't skip the hallucination notebook (Notebook 5). It will permanently change how you read AI-generated content.
Finish all 10 notebooks and you'll have the foundation to:
- Build production RAG pipelines with Pinecone, Weaviate, or ChromaDB
- Design multi-agent systems where AIs coordinate with each other
- Fine-tune open-source LLMs for your specific domain
- Build and ship real AI-powered products with confidence
understanding-llms/
├── README.md ← You are here
├── Tokenization_In_LLMs.ipynb ← Notebook 1
├── Generation_&_Sampling_Behaviour.ipynb ← Notebook 2
├── Prompting_Patterns.ipynb ← Notebook 3
├── Reasoning_Techniques_In_LLMs.ipynb ← Notebook 4
├── Hallucinations_in_LLMs.ipynb ← Notebook 5
├── Memory_Systems.ipynb ← Notebook 6
├── Embeddings_And_Semantic_Space.ipynb ← Notebook 7
├── Retrieval_Augmented_Generation.ipynb ← Notebook 8
├── reAct_&_Tools_Usage_in_LLMs.ipynb ← Notebook 9
└── Evaluation_&_Benchmarking_for_LLMs.ipynb ← Notebook 10
Built with ❤️ using Google Gemini 2.5 Flash and HuggingFace open-source models.