Project Structure

llm_102/
│
├── 📄 README.md                    # Main documentation - START HERE
├── 📄 CONCEPTS.md                  # Deep dive into core concepts
├── 📄 LEARNING_OUTCOMES.md         # What you'll master
│
├── 🔧 Configuration Files
│   ├── .env.example                # Environment variables template
│   ├── .gitignore                  # Git ignore rules
│   ├── requirements.txt            # Python dependencies
│   └── requirements-dev.txt        # Development/testing dependencies
│
├── 🚀 Entry Points
│   ├── cli.py                      # Command-line interface (main entry)
│   ├── example_usage.py            # Programmatic usage examples
│   └── quickstart.sh               # Quick setup script
│
├── 🧠 Core System (src/)
│   ├── __init__.py                 # Package initialization
│   ├── schemas.py                  # Pydantic models for data extraction
│   │                                 • InvoiceData
│   │                                 • EmailData
│   │                                 • SupportTicketData
│   ├── extractor.py                # Core extraction engine
│   │                                 • LLMExtractor class
│   │                                 • Retry logic
│   │                                 • Validation feedback
│   └── logging_config.py           # Logging configuration
│
├── 📥 Sample Inputs (sample_inputs/)
│   ├── invoice_tech.txt            # Sample tech invoice
│   ├── email_project.txt           # Sample project email
│   ├── email_inquiry.txt           # Sample inquiry email
│   └── support_ticket_urgent.txt   # Sample urgent support ticket
│
├── 📤 Sample Outputs (sample_outputs/)
│   ├── invoice_success.json        # Example successful invoice extraction
│   └── email_success.json          # Example successful email extraction
│
├── 🧪 Tests (tests/)
│   ├── __init__.py
│   └── test_schemas.py             # Unit tests for Pydantic schemas
│
└── 📁 Generated Directories
    ├── logs/                       # Execution logs (auto-created)
    └── output/                     # CLI output files (auto-created)

File Descriptions

Core Documentation

  • README.md: Complete guide with quick start, examples, and explanations
  • CONCEPTS.md: Detailed explanations of function calling, guardrails, validation
  • LEARNING_OUTCOMES.md: Summary of skills and concepts mastered

Configuration

  • .env.example: Template for environment variables (copy to .env)
  • requirements.txt: Core Python dependencies (OpenAI, Pydantic, Rich, etc.)
  • requirements-dev.txt: Testing dependencies (pytest)

Entry Points

  • cli.py: Full-featured CLI for extraction and validation

    • extract: Extract data from text files
    • validate: Validate JSON against schemas
    • list-schemas: Show available extraction types
  • example_usage.py: Shows how to use the system programmatically

  • quickstart.sh: One-command setup and demo

Core System (src/)

  • schemas.py: Pydantic models defining data structures

    • Type annotations
    • Field validation rules
    • Custom validators
    • Nested models
  • extractor.py: The heart of the system

    • LLMExtractor class
    • Function calling implementation
    • Retry logic with error feedback
    • Validation orchestration
  • logging_config.py: Structured logging setup

    • Console and file handlers
    • Configurable log levels
    • Timestamp-based log files
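To make the function calling item under extractor.py more concrete, here is a minimal sketch of how a Pydantic schema could be turned into the `tools` entry accepted by the OpenAI Chat Completions API. It assumes Pydantic v2 (`model_json_schema`). The `InvoiceData` fields, the `LineItem` model, and the `build_tool` helper are hypothetical and are not taken from the project's schemas.py or extractor.py.

```python
from pydantic import BaseModel, Field


class LineItem(BaseModel):
    """Hypothetical nested model; the real fields live in schemas.py."""
    description: str
    amount: float = Field(ge=0, description="Line total in the invoice currency")


class InvoiceData(BaseModel):
    """Hypothetical stand-in for the project's InvoiceData schema."""
    invoice_number: str = Field(description="Identifier printed on the invoice")
    total: float = Field(ge=0, description="Grand total")
    items: list[LineItem] = Field(default_factory=list)


def build_tool(schema: type[BaseModel], name: str) -> dict:
    """Wrap a model's JSON Schema in the Chat Completions `tools` format."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": f"Extract {schema.__name__} fields from the input text.",
            "parameters": schema.model_json_schema(),
        },
    }


tool = build_tool(InvoiceData, "extract_invoice")
print(sorted(tool["function"]["parameters"]["properties"]))
```

Because the tool definition is derived from the model, the schema stays the single source of truth: the same class describes what the model is asked to return and validates what comes back.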

Sample Data

  • sample_inputs/: Real-world examples to test extraction

    • Various invoice formats
    • Different email types
    • Support ticket scenarios
  • sample_outputs/: Examples of successful extractions

    • Shows expected output format
    • Useful for understanding structure

Tests

  • tests/: Unit tests for validation logic
    • Test valid inputs
    • Test edge cases
    • Test validation failures
    • Run with pytest
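The three kinds of tests listed above might look roughly like this. This is a sketch only: `SupportTicket` is a hypothetical schema defined inline for the example, not the project's `SupportTicketData`, and the actual contents of test_schemas.py are not shown in this document.

```python
import pytest
from pydantic import BaseModel, Field, ValidationError


class SupportTicket(BaseModel):
    """Hypothetical schema used only for this example."""
    subject: str = Field(min_length=1)
    priority: int = Field(ge=1, le=5)


def test_valid_input():
    ticket = SupportTicket(subject="Login fails", priority=2)
    assert ticket.priority == 2


def test_edge_case_boundary_priority():
    # Both ends of the allowed range should be accepted.
    assert SupportTicket(subject="x", priority=1).priority == 1
    assert SupportTicket(subject="x", priority=5).priority == 5


def test_validation_failure():
    # An empty subject and an out-of-range priority must be rejected.
    with pytest.raises(ValidationError):
        SupportTicket(subject="", priority=9)
```

pytest collects any function whose name starts with `test_`, so a file like this runs with `pytest tests/ -v` without extra configuration.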

Key Design Principles

1. Separation of Concerns

  • Schemas define structure (what)
  • Extractor handles logic (how)
  • CLI handles interaction (interface)

2. Extensibility

  • Add new schemas → just extend schemas.py
  • Add new validation → Pydantic validators
  • Add new features → modify extractor

3. Observability

  • All logs saved to files
  • Attempt tracking in results
  • Clear error messages
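A logging setup along these lines would cover the console handler, file handler, configurable level, and timestamped log files described in this document. It is a sketch using only the standard library; the function name, logger name, and file naming pattern are assumptions rather than the contents of logging_config.py.

```python
import logging
from datetime import datetime
from pathlib import Path


def setup_logging(log_dir: str = "logs", level: str = "INFO") -> Path:
    """Attach console and file handlers; return the path of the log file."""
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)  # auto-create the log directory
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    log_file = directory / f"extraction_{stamp}.log"  # hypothetical naming pattern

    logger = logging.getLogger("llm_extractor")  # hypothetical logger name
    logger.setLevel(level.upper())  # setLevel accepts level names such as "DEBUG"

    # Remove handlers from earlier calls so messages are not duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    handlers = (
        logging.StreamHandler(),
        logging.FileHandler(log_file, encoding="utf-8"),
    )
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return log_file
```

Calling `setup_logging(level="DEBUG")` once at startup and then logging through `logging.getLogger("llm_extractor")` sends each message to both the terminal and a fresh file under logs/.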

4. Type Safety

  • Pydantic models at every data boundary
  • Validated model instances instead of loose dictionaries
  • Type-checker and IDE-assisted development

5. Production-Ready

  • Environment-based config
  • Graceful error handling
  • Comprehensive logging
  • Easy testing
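Environment-based config with graceful failure could be sketched as below. `OPENAI_API_KEY` is the variable the OpenAI client reads by default; `EXTRACTOR_MODEL`, `EXTRACTOR_MAX_RETRIES`, their defaults, and the `Settings` class are hypothetical names chosen for illustration. In the real project, python-dotenv's `load_dotenv()` would typically be called first so that values from .env are present in `os.environ`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    api_key: str
    model: str
    max_retries: int


def load_settings(env=None) -> Settings:
    """Read settings from environment variables, failing early if the key is missing."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; copy .env.example to .env and add your key."
        )
    return Settings(
        api_key=api_key,
        model=env.get("EXTRACTOR_MODEL", "gpt-4o-mini"),  # hypothetical variable and default
        max_retries=int(env.get("EXTRACTOR_MAX_RETRIES", "3")),  # hypothetical variable
    )
```

Accepting an optional mapping keeps the loader easy to test: a plain dictionary can stand in for the process environment.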

Data Flow

1. Input File
   ↓
2. CLI reads text
   ↓
3. LLMExtractor.extract()
   ├─ Build prompt with schema
   ├─ Call OpenAI API (function calling)
   ├─ Parse function response
   └─ Validate with Pydantic
      ├─ ✓ Success → Return result
      └─ ✗ Failure → Retry with feedback
         └─ (repeat up to max_retries)
   ↓
4. CLI displays result
   ↓
5. (Optional) Save to output file
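Step 3 of the flow above, the validate-and-retry loop, can be sketched without any API dependency. Everything here is an assumption made for illustration: the `ExtractionResult` shape, the function names, and the choice to treat `max_retries` as the total number of attempts. The model call is passed in as a plain callable, so a fake can stand in for the OpenAI request.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class ExtractionResult:
    success: bool
    data: Any = None
    attempts: int = 0
    errors: list = field(default_factory=list)  # one message per failed attempt


def extract_with_retries(
    text: str,
    call_model: Callable[[str, Optional[str]], str],
    validate: Callable[[dict], Any],
    max_retries: int = 3,
) -> ExtractionResult:
    """Call the model, validate its JSON output, and retry with the error as feedback."""
    result = ExtractionResult(success=False)
    feedback = None
    for attempt in range(1, max_retries + 1):
        result.attempts = attempt
        raw = call_model(text, feedback)
        try:
            result.data = validate(json.loads(raw))
            result.success = True
            return result
        except ValueError as exc:
            # json.JSONDecodeError and Pydantic's ValidationError both subclass ValueError.
            feedback = f"Previous output was rejected: {exc}"
            result.errors.append(str(exc))
    return result


def fake_model(text: str, feedback: Optional[str]) -> str:
    # Stand-in for the API call: bad payload first, corrected payload after feedback.
    return '{"total": "12.50"}' if feedback else '{"total": "n/a"}'


def validate_total(payload: dict) -> dict:
    return {"total": float(payload["total"])}  # float("n/a") raises ValueError


outcome = extract_with_retries("Invoice text", fake_model, validate_total)
print(outcome.success, outcome.attempts, outcome.data)
```

Keeping the error history on the result object is one way to provide the attempt tracking mentioned under Observability.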

How to Navigate This Project

If you're new:

  1. Read README.md - Quick start and overview
  2. Read CONCEPTS.md - Understand why
  3. Run ./quickstart.sh - See it work
  4. Explore schemas.py - See how schemas are defined
  5. Read extractor.py - See how extraction works

If you're extending:

  1. Add schema to schemas.py
  2. Register in EXTRACTION_SCHEMAS
  3. Create sample input in sample_inputs/
  4. Test with CLI
  5. Add tests to tests/
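Steps 1 and 2 of the extension checklist might look like the following. This document does not show how `EXTRACTION_SCHEMAS` is defined, so the dictionary shape (type name mapped to model class) is an assumption, and `ShippingNoticeData` and `get_schema` are hypothetical names.

```python
from pydantic import BaseModel


class ShippingNoticeData(BaseModel):
    """Hypothetical new schema; field names are illustrative only."""
    tracking_number: str
    carrier: str


# Assumed shape of the registry: extraction type name -> model class.
EXTRACTION_SCHEMAS: dict[str, type[BaseModel]] = {
    "shipping_notice": ShippingNoticeData,
}


def get_schema(name: str) -> type[BaseModel]:
    """Look up a schema by type name, listing the valid names on failure."""
    try:
        return EXTRACTION_SCHEMAS[name]
    except KeyError:
        known = ", ".join(sorted(EXTRACTION_SCHEMAS))
        raise ValueError(f"Unknown extraction type {name!r}; available: {known}") from None
```

With a registry like this, the value passed to `-t <type>` on the command line can be resolved to a model class with a single lookup.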

If you're debugging:

  1. Run with --verbose flag
  2. Check logs/ directory
  3. Use validate command to test schemas
  4. Add temporary debug logging or print statements in extractor.py
  5. Review attempt history in results

Module Dependencies

cli.py
  ├─→ src/__init__.py
  │     ├─→ schemas.py
  │     ├─→ extractor.py
  │     │     └─→ schemas.py
  │     └─→ logging_config.py
  │
  ├─→ typer (CLI framework)
  ├─→ rich (terminal formatting)
  └─→ python-dotenv (environment)

extractor.py
  ├─→ openai (API client)
  └─→ pydantic (validation)

schemas.py
  └─→ pydantic (models)

Quick Commands Reference

# Setup
./quickstart.sh                      # One-command setup

# Basic extraction
python cli.py extract -i <file> -t <type>

# With output
python cli.py extract -i <file> -t <type> -o result.json

# Verbose mode
python cli.py extract -i <file> -t <type> --verbose

# List available schemas
python cli.py list-schemas

# Validate JSON
python cli.py validate -s <type> -f <file>

# Run tests
pytest tests/ -v

# Run examples
python example_usage.py

Environment Setup Checklist

  • Python 3.10+ installed
  • Virtual environment created (python -m venv venv)
  • Virtual environment activated (source venv/bin/activate)
  • Dependencies installed (pip install -r requirements.txt)
  • .env file created from .env.example
  • OpenAI API key added to .env
  • Test run successful (./quickstart.sh)

This structure is designed for:

  • ✅ Easy learning (clear separation, good docs)
  • ✅ Easy extension (add schemas in schemas.py without touching the extractor)
  • ✅ Easy debugging (comprehensive logs, clear errors)
  • ✅ Production use (type safety, error handling, config management)