Getting Started Guide

Welcome to Week 3 of the LLM learning track! This guide will get you up and running in 5 minutes.

🆓 This project uses FREE local models via Ollama - no API costs!

🎯 What You're Building

A reliable LLM system that extracts structured data from messy text:

"Invoice #123, total $456.78, due March 15th"
                    ↓
{
  "invoice_number": "123",
  "total_amount": 456.78,
  "due_date": "2025-03-15"
}

The pipeline adds schema validation and automatic retries, with a target of 99%+ reliability.

⚡ Quick Start (3 Steps)

Step 1: Setup (2 minutes)

Install Ollama first:

# macOS/Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Or macOS with Homebrew
brew install ollama

# Start Ollama service
ollama serve

# Pull the model (in another terminal)
ollama pull llama3.2

Then set up the project:

# Run the quick start script
./quickstart.sh

This will:

✅ Check Ollama installation
✅ Download llama3.2 model if needed
✅ Create virtual environment
✅ Install dependencies
✅ Run a demo extraction

OR manually:

# Start Ollama and pull the model (Ollama must already be installed)
ollama serve  # Keep running in one terminal
ollama pull llama3.2  # In another terminal

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set up environment (optional - has defaults)
cp .env.example .env

Step 2: Run Your First Extraction (1 minute)

# Extract from a sample invoice
python cli.py extract \
  --input sample_inputs/invoice_tech.txt \
  --type invoice

You should see:

✓ Extraction succeeded after 1 attempt!

┌─ Extracted Data ───────────────────┐
│ {
│   "invoice_number": "INV-2025-0342",
│   "total_amount": 8470.43,
│   "vendor_name": "TechSupply Solutions Inc.",
│   ...
│ }
└────────────────────────────────────┘

Step 3: Try More Examples (2 minutes)

# Extract from an email
python cli.py extract \
  --input sample_inputs/email_project.txt \
  --type email

# Extract from a support ticket
python cli.py extract \
  --input sample_inputs/support_ticket_urgent.txt \
  --type support_ticket

# See all available schema types
python cli.py list-schemas

📚 What to Read Next

1. Understand the Concepts (15 minutes)

Read CONCEPTS.md to understand:

Why function calling matters
How guardrails work
What makes output reliable

2. Explore the Code (20 minutes)

Start with schemas:

# Open the schema definitions
code src/schemas.py

Look for:

InvoiceData - see how fields are defined
Field() validators - see validation rules
@field_validator - see custom validation

Then explore the engine:

# Open the extraction engine
code src/extractor.py

Look for:

extract() method - main entry point
_call_llm() - how we use function calling
Retry logic in the main loop
_build_validation_feedback() - error recovery
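
The retry logic and validation feedback listed above can be sketched roughly like this. It is a standard-library illustration of the pattern, not the code in src/extractor.py: `validate_invoice`, `extract_with_retries`, and `fake_llm` are hypothetical names, and the stub replaces the real model call so the loop runs without Ollama.

```python
import json
from typing import Callable, Optional


def validate_invoice(data: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if not isinstance(data.get("invoice_number"), str):
        errors.append("invoice_number must be a string")
    total = data.get("total_amount")
    if isinstance(total, bool) or not isinstance(total, (int, float)) or total <= 0:
        errors.append("total_amount must be a positive number")
    return errors


def extract_with_retries(
    text: str,
    call_llm: Callable[[str, Optional[str]], str],
    max_attempts: int = 3,
) -> tuple[dict, int]:
    """Call the model, validate, and feed errors back until valid or out of attempts."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        raw = call_llm(text, feedback)
        try:
            data = json.loads(raw)
            if isinstance(data, dict):
                errors = validate_invoice(data)
            else:
                errors = ["output must be a JSON object"]
        except json.JSONDecodeError as exc:
            errors = [f"output is not valid JSON: {exc.msg}"]
        if not errors:
            return data, attempt
        # The next attempt sees exactly what went wrong with this one.
        feedback = "Fix these problems: " + "; ".join(errors)
    raise ValueError(f"Validation failed after {max_attempts} attempts: {feedback}")


def fake_llm(text: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for the model: wrong type first, corrected after feedback."""
    if feedback is None:
        return '{"invoice_number": 123, "total_amount": 456.78}'
    return '{"invoice_number": "123", "total_amount": 456.78}'


result, attempts = extract_with_retries("Invoice #123, total $456.78", fake_llm)
print(result, attempts)
```

The first attempt fails validation and the second succeeds once the error text is fed back. The real engine presumably validates against the Pydantic models in src/schemas.py rather than hand-written checks.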

3. Run with Verbose Logging (10 minutes)

# See everything that happens
python cli.py extract \
  --input sample_inputs/invoice_tech.txt \
  --type invoice \
  --verbose

Then check the logs:

ls -lt logs/ | head -2
cat logs/extraction_*.log
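
The glob above prints every matching log. To open only the most recent one, a small helper along these lines may be handy. It assumes the logs/ directory and extraction_*.log naming shown above; `latest_log` is a hypothetical name, not part of the project.

```python
from pathlib import Path
from typing import Optional


def latest_log(log_dir: str = "logs", pattern: str = "extraction_*.log") -> Optional[Path]:
    """Return the most recently modified log matching the pattern, or None."""
    candidates = list(Path(log_dir).glob(pattern))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


path = latest_log()
if path is None:
    print("No extraction logs found")
else:
    print(f"--- {path} ---")
    print(path.read_text(encoding="utf-8"))
```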

🎮 Interactive Learning Exercises

Exercise 1: Break It (5 minutes)

Create a file with incomplete data:

# Single quotes stop the shell from expanding $50 as a variable
echo 'Invoice #123, total $50' > test_invoice.txt

python cli.py extract \
  --input test_invoice.txt \
  --type invoice \
  --verbose

Questions:

What validation errors occur?
Does it retry? How many times?
What's the final error message?

Exercise 2: Test Validation (5 minutes)

Create invalid JSON:

echo '{"invoice_number": 123}' > bad.json

python cli.py validate \
  --schema invoice \
  --file bad.json

Questions:

What error does Pydantic report?
Why is 123 invalid for invoice_number?
What would be valid?

Exercise 3: Add a Field (15 minutes)

Open src/schemas.py and add a new optional field to InvoiceData:

payment_method: Optional[str] = Field(
    None,
    description="Payment method (credit card, check, etc.)"
)

Save and run:

python cli.py extract \
  --input sample_inputs/invoice_tech.txt \
  --type invoice

Does it extract the new field?

Exercise 4: Create Your Own Schema (30 minutes)

Add a new extraction type for receipts:

Add to src/schemas.py:

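# Requires (if not already imported in schemas.py):
#   from typing import List, Optional
#   from pydantic import BaseModel, Field
class ReceiptData(BaseModel):
    store_name: str
    # YYYY-MM-DD shape, e.g. 2025-03-15
    purchase_date: str = Field(..., pattern=r"^\d{4}-\d{2}-\d{2}$")
    # Must be strictly positive
    total: float = Field(..., gt=0)
    items: List[str]
    payment_method: Optional[str] = None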

Register it:

EXTRACTION_SCHEMAS["receipt"] = {
    "model": ReceiptData,
    "description": "Extract data from receipts",
    "name": "extract_receipt_data"
}

Create a sample receipt in sample_inputs/receipt.txt
Test it:

python cli.py extract \
  --input sample_inputs/receipt.txt \
  --type receipt
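
Before involving the model, it can help to check how the new schema behaves on hand-written data. The sketch below repeats the ReceiptData model from this exercise and validates one good and one bad record. It assumes Pydantic v2 (which the `pattern=` argument implies), and the sample values are made up.

```python
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class ReceiptData(BaseModel):
    store_name: str
    purchase_date: str = Field(..., pattern=r"^\d{4}-\d{2}-\d{2}$")
    total: float = Field(..., gt=0)
    items: List[str]
    payment_method: Optional[str] = None


# Made-up sample records: one that satisfies the schema, one that does not.
good = {"store_name": "Corner Market", "purchase_date": "2025-03-15",
        "total": 12.5, "items": ["milk", "bread"]}
bad = {"store_name": "Corner Market", "purchase_date": "March 15th",
       "total": -3, "items": ["milk"]}

receipt = ReceiptData.model_validate(good)
print(receipt.payment_method)  # optional field falls back to its default

try:
    ReceiptData.model_validate(bad)
except ValidationError as exc:
    # Each error dict carries the path to the offending field in "loc".
    failed_fields = [err["loc"][0] for err in exc.errors()]
    print(failed_fields)
```

Error messages like these are the kind of detail the extractor can feed back to the model on a retry.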

🐛 Troubleshooting

"Ollama not found" or "command not found: ollama"

Install Ollama from https://ollama.ai
On macOS: brew install ollama
Verify: ollama --version

"Connection refused" to localhost:11434

Start Ollama: ollama serve
Check if running: curl http://localhost:11434/api/tags
Make sure no firewall is blocking port 11434
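
The same check can be scripted. The sketch below assumes the default address from the heading above and the /api/tags endpoint used in the curl command. `ollama_models` is a hypothetical helper name, and the response is assumed to carry a `models` list with a `name` per entry.

```python
import json
import urllib.error
import urllib.request


def ollama_models(base_url: str = "http://localhost:11434", timeout: float = 2.0):
    """Return installed model names, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not JSON.
        return None
    return [m.get("name") for m in payload.get("models", [])]


models = ollama_models()
if models is None:
    print("Ollama is not reachable; try `ollama serve`")
else:
    print("Installed models:", models)
```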

"Model not found"

Pull the model: ollama pull llama3.2
List installed models: ollama list
Try a different model: ollama pull mistral

"Validation failed after 3 attempts"

Check the input text - is it really an invoice/email/ticket?
Look at the validation errors in the output
Check logs for detailed error messages
Try adjusting the schema if it's too strict

"Module not found"

Make sure you activated the virtual environment
Run pip install -r requirements.txt again

💡 Tips for Learning

1. Use Verbose Mode

Always run with --verbose when learning:

python cli.py extract -i <file> -t <type> --verbose

2. Read the Logs

Logs show everything:

# Find latest log
ls -lt logs/ | head -2

# View only the newest one
cat "$(ls -t logs/extraction_*.log | head -1)"

3. Test Edge Cases

Try inputs that should fail:

Missing fields
Wrong formats
Ambiguous data
Empty files

4. Experiment with Temperature

# More deterministic
python cli.py extract -i <file> -t <type> --temperature 0.0

# More creative
python cli.py extract -i <file> -t <type> --temperature 0.7
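
For intuition about what the flag changes, here is a generic illustration of temperature-scaled sampling probabilities. It is not code from this project or from Ollama: the scores are invented, `softmax_with_temperature` is a hypothetical name, and treating temperature 0 as a plain argmax is a common convention assumed here.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities; temperature 0 is treated as greedy (argmax)."""
    if temperature <= 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.0, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, which is why low values tend to suit structured extraction.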

5. Use the Validation Command

Test schemas without LLM calls:

python cli.py validate -s invoice -f data.json

📖 Documentation Index

README.md - Complete documentation
CONCEPTS.md - Why these patterns matter
LEARNING_OUTCOMES.md - What you'll master
PROJECT_STRUCTURE.md - Code organization
example_usage.py - Programmatic examples

🎯 Learning Path

1. Quick Start (5 min)
   └─→ Get it running

2. Concepts (15 min)
   └─→ Understand why

3. Code Exploration (30 min)
   ├─→ Read schemas.py
   ├─→ Read extractor.py
   └─→ Run with --verbose

4. Hands-On (60 min)
   ├─→ Exercise 1: Break it
   ├─→ Exercise 2: Test validation
   ├─→ Exercise 3: Add a field
   └─→ Exercise 4: Create new schema

5. Deep Dive (60+ min)
   ├─→ Modify retry logic
   ├─→ Add custom validators
   ├─→ Integrate with your app
   └─→ Deploy to production

✅ Success Checklist

After completing this project, you should be able to:

🚀 Next Steps

Once you're comfortable with this project:

Extend it
- Add new schemas (contracts, resumes, catalogs)
- Add async support for batch processing
- Build a web UI with Streamlit
Apply it
- Use it in your own projects
- Process real documents
- Build a data pipeline
Continue learning
- Week 4: RAG & Knowledge Integration
- Week 5: Agents & Complex Workflows
- Week 6: Production Deployment

🤝 Need Help?

Check logs in logs/
Review error messages carefully
Read the relevant documentation section
Try the troubleshooting guide above
Experiment with simpler inputs first

Remember: The goal isn't just to make it work—it's to understand why it works and when to use these patterns.

Take your time, experiment, break things, and learn! 🎓
