🧠 Understanding Large Language Models (LLMs)

Ever wondered what's REALLY happening inside ChatGPT? Let's find out.

10 hands-on notebooks. Zero hand-waving. Pure understanding.


🤔 Why Does This Exist?

You've used LLMs. You've been amazed. You've also been confused, frustrated, or burned when they confidently gave you completely wrong answers.

What if you could understand exactly why that happens?

Most tutorials teach you how to call an API. This series teaches you how LLMs actually think — why they sometimes hallucinate facts, why changing one word in your prompt changes everything, and why your chatbot forgets what you said 5 messages ago.

By the end, you won't just be a user of LLMs. You'll understand them well enough to build serious things with them.


⚡ What You'll Be Able to Do After This

  • ✅ Know exactly why your prompts work (or don't) — and fix them
  • ✅ Understand why LLMs make stuff up and know how to catch and prevent it
  • ✅ Build your own AI agent that uses tools (calculator, search, APIs)
  • ✅ Create a RAG system that lets an LLM answer questions from your documents
  • ✅ Measure and benchmark any LLM's real-world performance
  • ✅ Talk confidently about chain-of-thought, embeddings, temperature, and more

📚 The 10 Notebooks — What's Inside Each?

1. 🔤 Tokenization in LLMs — "The AI doesn't read words. It reads... what exactly?"

Tokenization_In_LLMs.ipynb · Gemini 2.5 Flash

Before an LLM reads a single word, it chops your text into tokens — and those tokens aren't words. They're not characters either. So what are they? And why does typing "dog" vs " dog" (with a space) sometimes produce wildly different results?

You'll discover:

  • Why the same sentence in Hindi costs 3× more tokens than in English
  • How emojis, newlines, and punctuation mess with token counts
  • The trick to writing prompts that save you money (and it's simpler than you think)
  • Why repeating text in your prompt is actually more efficient, not less
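The notebooks explore tokenization through Gemini's API; the ideas transfer to any subword tokenizer. As a purely illustrative sketch, here is a toy greedy longest-match tokenizer over an invented vocabulary that shows why `"dog"` and `" dog"` can tokenize differently:

```python
def tokenize(text, vocab):
    # Greedy longest-match tokenization over a toy vocabulary.
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # fall back to a single character
            i += 1
    return tokens

# Invented vocabulary: note that " dog" (with leading space) is its own token.
VOCAB = {" dog", "dog", "do", "og", " ", "s"}
print(tokenize("dog", VOCAB))   # ['dog']
print(tokenize(" dog", VOCAB))  # [' dog'] -- the leading space changes the token
print(tokenize("dogs", VOCAB))  # ['dog', 's']
```

Real tokenizers (BPE, SentencePiece) are learned from data, but the same boundary effects apply.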

2. 📊 Generation & Sampling — "Does the AI 'choose' its next word? Sort of... but not how you think."

Generation_&_Sampling_Behaviour.ipynb · Gemini 2.5 Flash

Every word an LLM outputs is drawn from a probability distribution over its entire vocabulary. That means the AI could technically say anything — so what makes it coherent? The answer is temperature. Turn it up and the AI goes wild. Turn it down and it plays it safe.

You'll discover:

  • Why temperature=0 makes an LLM a robot and temperature=2 makes it an artist
  • What log probabilities are and why they matter for building reliable AI apps
  • How to visualize the "decision" an LLM makes at each step
  • The real difference between input tokens, output tokens, and why that hits your wallet
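Under the hood, temperature rescales the model's raw scores (logits) before they are turned into probabilities. A minimal sketch with made-up logits for three candidate next tokens (in practice, temperature=0 is usually special-cased as pure argmax):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by temperature, then softmax.
    # Low temperature sharpens the distribution; high temperature flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]  # invented raw scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.1)
hot = softmax_with_temperature(logits, 2.0)
print([round(p, 3) for p in cold])  # top token takes almost all the mass
print([round(p, 3) for p in hot])   # probability mass spreads out
```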

3. 💬 Prompting Patterns — "The wording of your question changes everything. Here's the science."

Prompting_Patterns.ipynb · Gemini 2.5 Flash

This is where a lot of people give up and just say "prompt engineering is an art." It isn't. There are concrete, repeatable patterns that reliably improve LLM output quality — and we test every single one with side-by-side comparisons.

You'll discover:

  • How to get great results from zero examples (zero-shot) vs. a few examples (few-shot)
  • The wild experiment: two prompts, same task, yet completely different output formatting, driven purely by the examples you supply
  • How to turn the LLM into a domain expert by assigning it one sentence of identity ("You are a senior physician...")
  • Why "better prompts" often have fewer words, not more
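Few-shot prompting is ultimately just careful string assembly. A minimal sketch (the instruction, examples, and `Input:`/`Output:` labels are illustrative choices, not the notebook's exact template):

```python
def few_shot_prompt(instruction, examples, query):
    # Assemble a few-shot prompt: instruction, worked examples, then the query.
    parts = [instruction]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    parts.append(f"Input: {query}\nOutput:")  # the model completes this line
    return "\n\n".join(parts)

prompt = few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("I loved this film", "positive"), ("Terrible service", "negative")],
    "The food was amazing",
)
print(prompt)
```

The examples do double duty: they teach the task and they fix the output format.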

4. 🧩 Reasoning Techniques — "How do you make an AI actually think instead of just pattern-match?"

Reasoning_Techniques_In_LLMs.ipynb · Gemini 2.5 Flash

Standard prompting asks the AI for an answer. Chain-of-Thought asks it to think out loud first. The difference in accuracy on complex problems is staggering. And then there are even more powerful techniques on top of that.

You'll discover:

  • Chain of Thought (CoT): Add 7 words to your prompt and watch an LLM solve multi-step math it previously got wrong
  • Self-consistency: Run the same problem 5 times, take a vote — why this is surprisingly powerful
  • Tree of Thoughts: The technique that lets AIs backtrack and explore alternatives like a chess player
  • Live experiments on math problems, code generation, and creative writing showing measurable accuracy gains
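The self-consistency step above reduces to a majority vote over independently sampled answers. A minimal sketch with invented samples:

```python
from collections import Counter

def self_consistent_answer(sampled_answers):
    # Majority vote over several independently sampled final answers.
    counts = Counter(sampled_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(sampled_answers)

# e.g. five chain-of-thought samples for the same math problem
samples = ["42", "42", "41", "42", "40"]
answer, agreement = self_consistent_answer(samples)
print(answer, agreement)  # majority answer plus its agreement rate
```

In practice you would sample with temperature > 0 so the reasoning paths actually differ, then extract just the final answer from each before voting.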

5. 🌀 Hallucinations — "Why does the AI confidently lie to you, and can you stop it?"

Hallucinations_in_LLMs.ipynb · Gemini 2.5 Flash

An LLM rarely says "I don't know." It says something plausible-sounding that may be completely fabricated. This is the most important thing to understand before deploying AI in any real context.

You'll discover:

  • The 5 root causes of hallucination (and why they're hard to fully eliminate)
  • The experiment: 10 factual questions. Naive prompting → 0% accurate. Add self-verification → 90% accurate. Same model, different prompt.
  • How to ask an LLM to rate its own confidence — and whether that rating is actually reliable
  • Real strategies used in production systems to catch and prevent hallucinations
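One common mitigation pattern is a second verification pass. A hypothetical sketch of such a prompt builder (the wording is an illustrative example, not the notebook's exact prompt):

```python
def self_verification_prompt(question, draft_answer):
    # Second pass: ask the model to audit its own draft before answering.
    return (
        f"Question: {question}\n"
        f"Draft answer: {draft_answer}\n"
        "Check each factual claim in the draft answer. If any claim is "
        "uncertain or unsupported, reply exactly UNSURE. Otherwise reply "
        "with the verified answer and nothing else."
    )

print(self_verification_prompt("Who wrote 'Hamlet'?", "Charles Dickens"))
```

Giving the model an explicit escape hatch ("UNSURE") matters: without one, it will keep producing a plausible-sounding answer.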

6. 🧠 Memory Systems — "Why does your chatbot forget everything after a few messages?"

Memory_Systems.ipynb · Gemini 2.5 Flash + sentence-transformers

Here's the dirty secret: LLMs have no actual memory. They see what's in their context window and nothing else. So how do apps like ChatGPT seem to remember you? This notebook reveals the tricks.

You'll discover:

  • The cold truth about LLM "memory" (it's all context window tricks)
  • How to build a simple vector database from scratch in pure Python
  • Why retrieving relevant memories using semantic search beats keyword search completely
  • The technique of memory compression — summarizing old conversations to avoid hitting token limits
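The "vector database from scratch" idea can be sketched in a few dozen lines: store (text, vector) pairs and rank them by cosine similarity at query time. The example vectors below are hand-written toys standing in for real embeddings:

```python
import math

class TinyVectorStore:
    # Minimal in-memory vector store: add (text, vector) pairs,
    # retrieve by cosine similarity to a query vector.
    def __init__(self):
        self.items = []  # list of (text, vector)

    def add(self, text, vector):
        self.items.append((text, vector))

    @staticmethod
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    def search(self, query_vector, k=1):
        ranked = sorted(self.items, key=lambda item: self.cosine(query_vector, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = TinyVectorStore()
store.add("user likes hiking", [0.9, 0.1, 0.0])     # toy stand-in vectors
store.add("user's name is Priya", [0.0, 0.8, 0.6])
print(store.search([0.85, 0.2, 0.05]))  # retrieves the hiking memory
```

In the notebook, a sentence-transformers model supplies the vectors; the retrieval logic is the same.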

7. 🔗 Embeddings & Semantic Space — "What if you could turn the meaning of a sentence into a point in space?"

Embeddings_And_Semantic_Space.ipynb · sentence-transformers (MiniLM)

The concept that unlocks almost all of modern AI: embeddings. A sentence gets turned into a list of ~384 numbers. And somehow, semantically similar sentences end up mathematically close to each other. This notebook makes that abstract idea very real.

You'll discover:

  • How to generate embeddings for any sentence in 3 lines of Python
  • Why "dog" and "puppy" are close in embedding space, but "dog" and "automobile" are far apart
  • Cosine similarity: the one formula that powers semantic search, RAG, clustering, and more
  • How to visualize thousands of sentences in 2D to see semantic clusters form before your eyes

8. 🔍 Retrieval Augmented Generation (RAG) — "How to give an LLM a brain transplant using your own documents."

Retrieval_Augmented_Generation.ipynb · Gemini 2.5 Flash + sentence-transformers

LLMs are trained up to a cutoff date and know nothing about your documents. RAG is the architecture that fixes both problems. It's the foundation behind every enterprise AI assistant you've ever used.

You'll discover:

  • The exact 4-step RAG pipeline: Chunk → Embed → Retrieve → Generate
  • Side-by-side comparison: What the LLM says without your documents vs. with them (the difference is shocking)
  • How to build a complete, working RAG system from scratch — no LangChain, no magic, just Python
  • The subtle art of chunking documents so retrieval actually works well
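The chunking step deserves its own sketch, since retrieval quality depends on it. A common baseline is fixed-size windows with overlap, so a sentence that straddles a boundary still appears whole in at least one chunk (the sizes here are illustrative defaults, not the notebook's exact values):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    # Split text into overlapping character windows.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping some overlap
    return chunks

doc = "x" * 500  # placeholder document
parts = chunk_text(doc, chunk_size=200, overlap=50)
print(len(parts), [len(p) for p in parts])
```

Each chunk is then embedded and stored; at query time you embed the question, retrieve the top-k chunks, and paste them into the prompt for the generate step.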

9. ⚡ ReAct & Tool Usage — "How to build an AI agent that can actually DO things in the world."

reAct_&_Tools_Usage_in_LLMs.ipynb · GLM-4.7-Flash via HuggingFace

LLMs confined to text generation are like a brilliant person locked in a room with no internet. Tool use opens the door. ReAct is the framework that lets an LLM reason about when to use a tool and how to interpret the result.

You'll discover:

  • The Thought → Action → Observation loop: how an LLM becomes an agent
  • Build a real calculator tool that the AI calls to get exact arithmetic (goodbye hallucinated math)
  • How tool schemas work: teaching a model what tools exist and how to call them
  • Watch an LLM correctly multiply two 15-digit numbers — by outsourcing it to your Python function
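The loop above can be sketched without any API at all. Here the model's output is a hard-coded stand-in string, and the `Action: tool[input]` syntax and calculator are illustrative assumptions rather than the notebook's exact schema:

```python
import re

def calculator(expression):
    # Toy "tool": evaluate simple two-operand arithmetic like "123 * 456".
    a, op, b = re.fullmatch(r"\s*(\d+)\s*([+\-*/])\s*(\d+)\s*", expression).groups()
    a, b = int(a), int(b)
    return {"+": a + b, "-": a - b, "*": a * b, "/": a / b}[op]

def react_step(llm_output, tools):
    # Parse one "Action: tool[input]" line, run the tool, return the Observation.
    match = re.search(r"Action:\s*(\w+)\[(.+?)\]", llm_output)
    if not match:
        return None  # no tool call in this step
    tool_name, tool_input = match.groups()
    return f"Observation: {tools[tool_name](tool_input)}"

tools = {"calculator": calculator}
# Hard-coded stand-in for the model's reasoning step:
llm_output = "Thought: I should not do this in my head.\nAction: calculator[123456 * 789012]"
print(react_step(llm_output, tools))  # exact arithmetic, no hallucinated math
```

The real agent feeds each Observation back into the prompt and loops until the model emits a final answer instead of an Action.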

10. 📏 Evaluation & Benchmarking — "How do you actually know if your AI is getting better or worse?"

Evaluation_&_Benchmarking_for_LLMs.ipynb · HuggingFace models

This is the notebook professionals use. Vibes are not a metric. If you're building real AI applications, you need to measure them — systematically, reproducibly, and quantitatively.

You'll discover:

  • The 4 axes you must measure: Accuracy, Consistency, Latency, and Cost
  • BLEU and ROUGE scores: how to compute them and what they actually mean
  • Building a fully automated evaluation pipeline with expected-answer comparison
  • An interactive widget UI to run evaluations live inside Jupyter
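Two of the four axes, accuracy and latency, can be measured with a loop this simple. The `predict` callable here is a dictionary-backed stand-in; in the notebook it would be a real model call:

```python
import time

def evaluate(predict, dataset):
    # Measure exact-match accuracy and average latency over
    # (question, expected_answer) pairs.
    correct, latencies = 0, []
    for question, expected in dataset:
        start = time.perf_counter()
        answer = predict(question)
        latencies.append(time.perf_counter() - start)
        correct += answer.strip().lower() == expected.strip().lower()
    return {
        "accuracy": correct / len(dataset),
        "avg_latency_s": sum(latencies) / len(latencies),
    }

# Stand-in "model" for demonstration; swap in a real API call.
answers = {"2+2?": "4", "Capital of France?": "Paris"}
report = evaluate(
    lambda q: answers.get(q, ""),
    [("2+2?", "4"), ("Capital of France?", "paris"), ("Largest planet?", "Jupiter")],
)
print(report)
```

Exact match is the bluntest possible metric; BLEU/ROUGE (or an LLM-as-judge) replace the equality check when answers are free-form.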

🗺️ Your Learning Journey

Follow the notebooks in order — each one unlocks the next:

[1] Tokenization ──► [2] Generation ──► [3] Prompting ──► [4] Reasoning
        │                   │                 │                  │
        ▼                   ▼                 ▼                  ▼
[5] Hallucinations    [7] Embeddings    [6] Memory         [9] ReAct & Tools
        │                   │                 │
        └───────────────────┴─────────────────┘
                            │
                     [8] RAG System
                            │
                     [10] Evaluation

| 🎯 Your Goal | 📖 Start Here |
| --- | --- |
| Just getting started with LLMs | Notebooks 1 → 3 |
| Getting better outputs from AI | Notebooks 3 → 5 |
| Building apps with memory & search | Notebooks 6 → 8 |
| Building AI agents | Notebook 9 (after 1–3) |
| Measuring AI in production | Notebook 10 |

🛠️ Setup in 5 Minutes

What You Need

  • A Google Colab account (free) or Python 3.9+ locally
  • A free Google AI Studio API key (for Gemini notebooks)
  • A free HuggingFace account + token (for open-source model notebooks)

API Keys by Notebook

| Notebooks | Key Needed |
| --- | --- |
| 1–5 | GOOGLE_API_KEY |
| 7, 9, 10 | HF_TOKEN |
| 6, 8 | Both |

Google Colab Setup (Recommended — Zero Install)

1. Open notebook in Google Colab
2. Click the 🔑 Secrets panel on the left
3. Add GOOGLE_API_KEY and/or HF_TOKEN
4. Runtime → Run All

Local Setup

pip install google-generativeai huggingface_hub sentence-transformers

(Each notebook also has its own install cell at the top.)


💡 5 Ways to Get 10× More Out of These Notebooks

  1. Change the temperature (Notebook 2) and run the same prompt 5 times at each setting. The shift from 0.1 to 1.5 has to be experienced, not just read about.
  2. Break the prompts on purpose. Remove examples from few-shot prompts. Watch quality collapse. Add them back. Watch it recover.
  3. Feed your own documents to the RAG pipeline (Notebook 8). Use notes, PDFs you've converted to text, anything. Suddenly it becomes personal.
  4. Add a new tool to the ReAct agent (Notebook 9). A weather tool, a dictionary lookup, anything. The framework is already there.
  5. Don't skip the hallucination notebook (Notebook 5). It will permanently change how you read AI-generated content.

🚀 Where This Takes You

Finish all 10 notebooks and you'll have the foundation to:

  • Build production RAG pipelines with Pinecone, Weaviate, or ChromaDB
  • Design multi-agent systems where AIs coordinate with each other
  • Fine-tune open-source LLMs for your specific domain
  • Build and ship real AI-powered products with confidence

📁 Repository Structure

understanding-llms/
├── README.md                                ← You are here
├── Tokenization_In_LLMs.ipynb               ← Notebook 1
├── Generation_&_Sampling_Behaviour.ipynb    ← Notebook 2
├── Prompting_Patterns.ipynb                 ← Notebook 3
├── Reasoning_Techniques_In_LLMs.ipynb      ← Notebook 4
├── Hallucinations_in_LLMs.ipynb             ← Notebook 5
├── Memory_Systems.ipynb                     ← Notebook 6
├── Embeddings_And_Semantic_Space.ipynb      ← Notebook 7
├── Retrieval_Augmented_Generation.ipynb     ← Notebook 8
├── reAct_&_Tools_Usage_in_LLMs.ipynb       ← Notebook 9
└── Evaluation_&_Benchmarking_for_LLMs.ipynb ← Notebook 10

Built with ❤️ using Google Gemini 2.5 Flash and HuggingFace open-source models.

About

Curious how LLMs actually work? Ten hands-on notebooks that go from tokenization and generation to RAG, hallucinations, and building AI agents — all through experimentation.
