10 hands-on notebooks. Zero hand-waving. Pure understanding.
You've used LLMs. You've been amazed. You've also been confused, frustrated, or burned when they confidently gave you completely wrong answers.
What if you could understand exactly why that happens?
Most tutorials teach you how to call an API. This series teaches you how LLMs actually think — why they sometimes hallucinate facts, why changing one word in your prompt changes everything, and why your chatbot forgets what you said 5 messages ago.
By the end, you won't just be a user of LLMs. You'll understand them well enough to build serious things with them.
- ✅ Know exactly why your prompts work (or don't) — and fix them
- ✅ Understand why LLMs make stuff up and know how to catch and prevent it
- ✅ Build your own AI agent that uses tools (calculator, search, APIs)
- ✅ Create a RAG system that lets an LLM answer questions from your documents
- ✅ Measure and benchmark any LLM's real-world performance
- ✅ Talk confidently about chain-of-thought, embeddings, temperature, and more
1. Tokenization
Tokenization_In_LLMs.ipynb · Gemini 2.5 Flash
Before an LLM reads a single word, it chops your text into tokens — and those tokens aren't words. They're not characters either. So what are they? And why does typing "dog" vs " dog" (with a space) sometimes produce wildly different results?
You'll discover:
- Why the same sentence in Hindi costs 3× more tokens than in English
- How emojis, newlines, and punctuation mess with token counts
- The trick to writing prompts that save you money (and it's simpler than you think)
- Why repeating text in your prompt is actually more efficient, not less
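The greedy longest-match idea behind subword tokenizers can be sketched in a few lines. The vocabulary below is hand-made for illustration (Gemini's real tokenizer has a vocabulary orders of magnitude larger), but it shows why `dog` and `" dog"` (leading space) can land on different tokens:

```python
# Toy subword tokenizer: a hand-made vocabulary, NOT Gemini's real one,
# but enough to show why "dog" and " dog" can tokenize differently.
VOCAB = ["dog", " dog", "un", "break", "able", " "]

def tokenize(text: str) -> list[str]:
    """Greedy longest-match tokenization against the toy vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        match = next(
            (v for v in sorted(VOCAB, key=len, reverse=True)
             if text.startswith(v, i)),
            text[i],  # fall back to a single character
        )
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenize("dog"))          # one token: ['dog']
print(tokenize(" dog"))         # a *different* single token: [' dog']
print(tokenize("unbreakable"))  # ['un', 'break', 'able']
```

The same mechanism explains the multilingual cost gap: scripts that are rare in the training data get split into many short tokens, so the same sentence costs more.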
2. 📊 Generation & Sampling — "Does the AI 'choose' its next word? Sort of... but not how you think."
Generation_&_Sampling_Behaviour.ipynb · Gemini 2.5 Flash
Every word an LLM outputs is drawn from a probability distribution over its entire vocabulary. That means the AI could technically say anything — so what makes it coherent? The answer is temperature. Turn it up and the AI goes wild. Turn it down and it plays it safe.
You'll discover:
- Why `temperature=0` makes an LLM a robot and `temperature=2` makes it an artist
- What log probabilities are and why they matter for building reliable AI apps
- How to visualize the "decision" an LLM makes at each step
- The real difference between input tokens, output tokens, and why that hits your wallet
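Temperature is nothing more than a rescaling of the model's logits before they are normalized into probabilities. A stdlib-only sketch (the three candidate words and their logits are invented for illustration):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into a probability distribution; dividing by the
    temperature sharpens (T < 1) or flattens (T > 1) the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits for three candidate words.
logits = {"mat": 4.0, "moon": 2.0, "banana": 0.5}
for t in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(list(logits.values()), t)
    # At t=0.1 nearly all mass lands on "mat"; at t=2.0 it spreads out.
    print(t, {w: round(p, 3) for w, p in zip(logits, probs)})
```

Sampling from the `t=0.1` distribution is effectively deterministic; from `t=2.0`, even "banana" gets a real chance.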
3. Prompting Patterns
Prompting_Patterns.ipynb · Gemini 2.5 Flash
This is where a lot of people give up and just say "prompt engineering is an art." It isn't. There are concrete, repeatable patterns that reliably improve LLM output quality — and we test every single one with side-by-side comparisons.
You'll discover:
- How to get great results from zero examples (zero-shot) vs. a few examples (few-shot)
- The wild experiment: two prompts, same task, completely different formatting — just from the examples you gave
- How to turn the LLM into a domain expert by assigning it one sentence of identity ("You are a senior physician...")
- Why "better prompts" often have fewer words, not more
4. Reasoning Techniques
Reasoning_Techniques_In_LLMs.ipynb · Gemini 2.5 Flash
Standard prompting asks the AI for an answer. Chain-of-Thought asks it to think out loud first. The difference in accuracy on complex problems is staggering. And then there are even more powerful techniques on top of that.
You'll discover:
- Chain of Thought (CoT): Add 7 words to your prompt and watch an LLM solve multi-step math it previously got wrong
- Self-consistency: Run the same problem 5 times, take a vote — why this is surprisingly powerful
- Tree of Thoughts: The technique that lets AIs backtrack and explore alternatives like a chess player
- Live experiments on math problems, code generation, and creative writing showing measurable accuracy gains
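Self-consistency needs nothing more than a majority vote over repeated samples. A sketch where a scripted iterator stands in for repeated LLM calls at temperature > 0:

```python
from collections import Counter

def self_consistent_answer(sample_fn, n=5):
    """Self-consistency: sample the same question n times (with some
    temperature, so answers vary) and return the majority answer."""
    answers = [sample_fn() for _ in range(n)]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes

# Stand-in for five real LLM calls: a scripted sequence of sampled answers.
samples = iter(["42", "42", "41", "42", "40"])
answer, votes = self_consistent_answer(lambda: next(samples))
print(answer, votes)  # → 42 3
```

The intuition: reasoning errors tend to scatter across different wrong answers, while correct chains converge on the same one, so the vote filters noise.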
5. Hallucinations
Hallucinations_in_LLMs.ipynb · Gemini 2.5 Flash
By default, an LLM almost never says "I don't know." It says something plausible-sounding that may be completely fabricated. This is the most important thing to understand before deploying AI in any real context.
You'll discover:
- The 5 root causes of hallucination (and why they're hard to fully eliminate)
- The experiment: 10 factual questions, naive prompting → 0% accurate. Add self-verification → 90% accurate. Same model. Different prompt.
- How to ask an LLM to rate its own confidence — and whether that rating is actually reliable
- Real strategies used in production systems to catch and prevent hallucinations
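One self-verification pattern is a simple two-pass wrapper: draft an answer, then feed the draft back and ask the model to check it. A sketch with a scripted stand-in model (the prompt wording and the `verified_answer` helper are illustrative, not the notebook's exact code):

```python
def verified_answer(llm, question):
    """Two-pass self-verification sketch: get a draft answer, then ask the
    model to check its own draft and revise or flag it."""
    draft = llm(f"Answer concisely: {question}")
    check = llm(
        "You previously answered a question. Verify the answer.\n"
        f"Question: {question}\nDraft answer: {draft}\n"
        "Reply VERIFIED if correct; otherwise give the corrected answer, "
        "or reply UNSURE."
    )
    if check.strip() == "VERIFIED":
        return draft
    return check  # corrected answer, or UNSURE

# Scripted stand-in for two real model calls, for illustration only.
script = iter(["Canberra is the capital of Australia.", "VERIFIED"])
print(verified_answer(lambda prompt: next(script),
                      "What is the capital of Australia?"))
```

The second pass costs an extra call, but it gives the model a chance to catch its own fabrication, and an explicit UNSURE escape hatch it lacked in the first pass.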
6. Memory Systems
Memory_Systems.ipynb · Gemini 2.5 Flash + sentence-transformers
Here's the dirty secret: LLMs have no actual memory. They see what's in their context window and nothing else. So how do apps like ChatGPT seem to remember you? This notebook reveals the tricks.
You'll discover:
- The cold truth about LLM "memory" (it's all context window tricks)
- How to build a simple vector database from scratch in pure Python
- Why retrieving relevant memories using semantic search beats keyword search completely
- The technique of memory compression — summarizing old conversations to avoid hitting token limits
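A vector store in pure Python really is this small. The 3-dimensional vectors below are hand-made for illustration; a real system would get them from an embedding model:

```python
import math

class MemoryStore:
    """A minimal in-memory vector store: keep (text, vector) pairs and
    retrieve the most semantically similar entries by cosine similarity."""

    def __init__(self):
        self.items = []  # list of (text, vector) pairs

    def add(self, text, vector):
        self.items.append((text, vector))

    def search(self, query_vector, k=1):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb)
        scored = sorted(self.items,
                        key=lambda item: cosine(query_vector, item[1]),
                        reverse=True)
        return [text for text, _ in scored[:k]]

# Hand-made 3-d "embeddings" for illustration; a real system would use
# sentence-transformers to produce them.
store = MemoryStore()
store.add("user likes dogs",       [0.9, 0.1, 0.0])
store.add("user works in finance", [0.0, 0.2, 0.9])
print(store.search([0.8, 0.2, 0.1]))  # → ['user likes dogs']
```

The "memory" an app injects before each LLM call is just the top-k results of a search like this, pasted into the context window.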
7. 🔗 Embeddings & Semantic Space — "What if you could turn the meaning of a sentence into a point in space?"
Embeddings_And_Semantic_Space.ipynb · sentence-transformers (MiniLM)
The concept that unlocks almost all of modern AI: embeddings. A sentence gets turned into a list of ~384 numbers. And somehow, semantically similar sentences end up mathematically close to each other. This notebook makes that abstract idea very real.
You'll discover:
- How to generate embeddings for any sentence in 3 lines of Python
- Why "dog" and "puppy" are close in embedding space, but "dog" and "automobile" are far apart
- Cosine similarity: the one formula that powers semantic search, RAG, clustering, and more
- How to visualize thousands of sentences in 2D to see semantic clusters form before your eyes
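Cosine similarity is the whole formula. A sketch with tiny hand-made vectors standing in for real 384-dimensional MiniLM embeddings (the actual numbers are invented for illustration):

```python
import math

def cosine_similarity(a, b):
    """cos(θ) = (a · b) / (|a| |b|): 1.0 means same direction (similar
    meaning), near 0 means unrelated, -1.0 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Tiny hand-made vectors standing in for real 384-d MiniLM embeddings.
dog        = [0.9, 0.4, 0.1]
puppy      = [0.8, 0.5, 0.2]
automobile = [0.1, 0.2, 0.9]

print(cosine_similarity(dog, puppy))       # high (close in meaning)
print(cosine_similarity(dog, automobile))  # low (far apart)
```

Because cosine ignores vector length and measures only direction, two sentences of very different lengths can still score as near-identical in meaning.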
8. 🔍 Retrieval Augmented Generation (RAG) — "How to give an LLM a brain transplant using your own documents."
Retrieval_Augmented_Generation.ipynb · Gemini 2.5 Flash + sentence-transformers
LLMs are trained up to a cutoff date and know nothing about your documents. RAG is the architecture that fixes both problems. It's the foundation behind every enterprise AI assistant you've ever used.
You'll discover:
- The exact 4-step RAG pipeline: Chunk → Embed → Retrieve → Generate
- Side-by-side comparison: What the LLM says without your documents vs. with them (the difference is shocking)
- How to build a complete, working RAG system from scratch — no LangChain, no magic, just Python
- The subtle art of chunking documents so retrieval actually works well
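The four steps fit in a page of Python. This sketch uses a toy bag-of-words "embedding" so it runs with no model at all; the real notebook swaps in sentence-transformers, and the stub lambda stands in for a real LLM call:

```python
import math
from collections import Counter

def chunk(text, size=50):
    """1. Chunk: split the document into fixed-size, non-overlapping pieces."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """2. Embed: toy bag-of-words vector (a real system uses an embedding model)."""
    return Counter(text.lower().split())

def similarity(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def retrieve(query, chunks, k=1):
    """3. Retrieve: rank chunks by similarity to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: similarity(q, embed(c)), reverse=True)[:k]

def generate(llm, query, context):
    """4. Generate: answer grounded in the retrieved context only."""
    return llm(f"Using only this context:\n{context}\n\nAnswer: {query}")

doc = ("The refund policy allows returns within 30 days. "
       "Shipping is free on orders over 50 dollars.")
chunks = chunk(doc, size=8)
top = retrieve("how many days for a refund?", chunks)
print(top)  # the refund-policy chunk, not the shipping one
print(generate(lambda prompt: "(model answer here)",  # stub LLM call
               "how many days for a refund?", top[0]))
```

Note how much rides on step 1: if the chunk boundary had split "returns within" from "30 days", retrieval could surface a chunk that no longer answers the question.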
9. ReAct & Tool Use
reAct_&_Tools_Usage_in_LLMs.ipynb · GLM-4.7-Flash via HuggingFace
LLMs confined to text generation are like a brilliant person locked in a room with no internet. Tool use opens the door. ReAct is the framework that lets an LLM reason about when to use a tool and how to interpret the result.
You'll discover:
- The Thought → Action → Observation loop: how an LLM becomes an agent
- Build a real calculator tool that the AI calls to get exact arithmetic (goodbye hallucinated math)
- How tool schemas work: teaching a model what tools exist and how to call them
- Watch an LLM correctly multiply two 15-digit numbers — by outsourcing it to your Python function
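The loop itself is short. Here the model's outputs are scripted so the example runs offline; a real agent would call the LLM where the lambda is, and the Action/Observation text format is one illustrative convention, not the only one:

```python
import re

def calculator(expression):
    """A deliberately narrow tool: exact 'a + b' / 'a * b' arithmetic."""
    a, op, b = re.fullmatch(r"(\d+)\s*([+*])\s*(\d+)", expression).groups()
    return str(int(a) + int(b)) if op == "+" else str(int(a) * int(b))

TOOLS = {"calculator": calculator}

def react_loop(llm, question, max_steps=5):
    """Thought → Action → Observation: the model reasons, optionally calls
    a tool, sees the result, and repeats until it emits a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"
        action = re.search(r"Action: (\w+)\[(.+?)\]", step)
        if action:
            name, arg = action.groups()
            observation = TOOLS[name](arg)          # run the requested tool
            transcript += f"Observation: {observation}\n"
        elif "Final Answer:" in step:
            return step.split("Final Answer:")[1].strip()
    return None

# Scripted model outputs stand in for real LLM calls, for illustration.
script = iter([
    "Thought: I should not do this in my head.\n"
    "Action: calculator[123456789 * 987654321]",
    "Thought: The tool gave the exact product.\n"
    "Final Answer: 121932631112635269",
])
print(react_loop(lambda prompt: next(script),
                 "What is 123456789 * 987654321?"))
```

The key design point: the model never computes anything; it only decides *when* to call the tool, and the exact arithmetic comes back as an Observation it can quote.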
10. Evaluation & Benchmarking
Evaluation_&_Benchmarking_for_LLMs.ipynb · HuggingFace models
This is the notebook professionals use. Vibes are not a metric. If you're building real AI applications, you need to measure them — systematically, reproducibly, and quantitatively.
You'll discover:
- The 4 axes you must measure: Accuracy, Consistency, Latency, and Cost
- BLEU and ROUGE scores: how to compute them and what they actually mean
- Building a fully automated evaluation pipeline with expected-answer comparison
- An interactive widget UI to run evaluations live inside Jupyter
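An exact-match accuracy and latency harness is a good starting point before reaching for BLEU/ROUGE. A sketch with a scripted stand-in model (a real run would call the Gemini or HuggingFace API inside the lambda):

```python
import time

def evaluate(llm, dataset):
    """Tiny evaluation harness: exact-match accuracy plus mean latency over
    a dataset of (prompt, expected_answer) pairs."""
    correct, latencies = 0, []
    for prompt, expected in dataset:
        start = time.perf_counter()
        answer = llm(prompt).strip().lower()
        latencies.append(time.perf_counter() - start)
        correct += answer == expected.strip().lower()  # case-insensitive match
    return {
        "accuracy": correct / len(dataset),
        "mean_latency_s": sum(latencies) / len(latencies),
    }

# A scripted stand-in model, for illustration only.
fake_model = {"2+2=?": "4", "Capital of France?": "Paris"}
dataset = [
    ("2+2=?", "4"),
    ("Capital of France?", "paris"),
    ("Color of sky?", "blue"),
]
report = evaluate(lambda p: fake_model.get(p, "I think it's green"), dataset)
print(report)  # accuracy 2/3, plus mean latency
```

Exact match is deliberately strict; it works for closed-form answers, while free-form outputs need the overlap metrics (BLEU/ROUGE) or an LLM judge.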
Follow the notebooks in order — each one unlocks the next:
[1] Tokenization ──► [2] Generation ──► [3] Prompting ──► [4] Reasoning
       │                   │                  │                 │
       ▼                   ▼                  ▼                 ▼
[5] Hallucinations  [7] Embeddings      [6] Memory     [9] ReAct & Tools
       │                   │                  │
       └───────────────────┴──────────────────┘
                           │
                    [8] RAG System
                           │
                    [10] Evaluation
| 🎯 Your Goal | 📖 Start Here |
|---|---|
| Just getting started with LLMs | Notebooks 1 → 3 |
| Getting better outputs from AI | Notebooks 3 → 5 |
| Building apps with memory & search | Notebooks 6 → 8 |
| Building AI agents | Notebook 9 (after 1-3) |
| Measuring AI in production | Notebook 10 |
- A Google Colab account (free) or Python 3.9+ locally
- A free Google AI Studio API key (for Gemini notebooks)
- A free HuggingFace account + token (for open-source model notebooks)
| Notebooks | Key Needed |
|---|---|
| 1–5 | GOOGLE_API_KEY |
| 7, 9, 10 | HF_TOKEN |
| 6, 8 | Both |
1. Open notebook in Google Colab
2. Click the 🔑 Secrets panel on the left
3. Add GOOGLE_API_KEY and/or HF_TOKEN
4. Runtime → Run All
`pip install google-generativeai huggingface_hub sentence-transformers`

(Each notebook also has its own install cell at the top.)
- Change the temperature (Notebook 2) and run the same prompt 5 times each. The shift from `0.1` to `1.5` has to be experienced, not just read about.
- Break the prompts on purpose. Remove examples from few-shot prompts. Watch quality collapse. Add them back. Watch it recover.
- Feed your own documents to the RAG pipeline (Notebook 8). Use notes, PDFs you've converted to text, anything. Suddenly it becomes personal.
- Add a new tool to the ReAct agent (Notebook 9). A weather tool, a dictionary lookup, anything. The framework is already there.
- Don't skip the hallucination notebook (Notebook 5). It will permanently change how you read AI-generated content.
Finish all 10 notebooks and you'll have the foundation to:
- Build production RAG pipelines with Pinecone, Weaviate, or ChromaDB
- Design multi-agent systems where AIs coordinate with each other
- Fine-tune open-source LLMs for your specific domain
- Build and ship real AI-powered products with confidence
understanding-llms/
├── README.md ← You are here
├── Tokenization_In_LLMs.ipynb ← Notebook 1
├── Generation_&_Sampling_Behaviour.ipynb ← Notebook 2
├── Prompting_Patterns.ipynb ← Notebook 3
├── Reasoning_Techniques_In_LLMs.ipynb ← Notebook 4
├── Hallucinations_in_LLMs.ipynb ← Notebook 5
├── Memory_Systems.ipynb ← Notebook 6
├── Embeddings_And_Semantic_Space.ipynb ← Notebook 7
├── Retrieval_Augmented_Generation.ipynb ← Notebook 8
├── reAct_&_Tools_Usage_in_LLMs.ipynb ← Notebook 9
└── Evaluation_&_Benchmarking_for_LLMs.ipynb ← Notebook 10
Built with ❤️ using Google Gemini 2.5 Flash and HuggingFace open-source models.