Project Structure

llm_102/
│
├── 📄 README.md                    # Main documentation - START HERE
├── 📄 CONCEPTS.md                  # Deep dive into core concepts
├── 📄 LEARNING_OUTCOMES.md         # What you'll master
│
├── 🔧 Configuration Files
│   ├── .env.example                # Environment variables template
│   ├── .gitignore                  # Git ignore rules
│   ├── requirements.txt            # Python dependencies
│   └── requirements-dev.txt        # Development/testing dependencies
│
├── 🚀 Entry Points
│   ├── cli.py                      # Command-line interface (main entry)
│   ├── example_usage.py            # Programmatic usage examples
│   └── quickstart.sh               # Quick setup script
│
├── 🧠 Core System (src/)
│   ├── __init__.py                 # Package initialization
│   ├── schemas.py                  # Pydantic models for data extraction
│   │                                 • InvoiceData
│   │                                 • EmailData
│   │                                 • SupportTicketData
│   ├── extractor.py                # Core extraction engine
│   │                                 • LLMExtractor class
│   │                                 • Retry logic
│   │                                 • Validation feedback
│   └── logging_config.py           # Logging configuration
│
├── 📥 Sample Inputs (sample_inputs/)
│   ├── invoice_tech.txt            # Sample tech invoice
│   ├── email_project.txt           # Sample project email
│   ├── email_inquiry.txt           # Sample inquiry email
│   └── support_ticket_urgent.txt   # Sample urgent support ticket
│
├── 📤 Sample Outputs (sample_outputs/)
│   ├── invoice_success.json        # Example successful invoice extraction
│   └── email_success.json          # Example successful email extraction
│
├── 🧪 Tests (tests/)
│   ├── __init__.py
│   └── test_schemas.py             # Unit tests for Pydantic schemas
│
└── 📁 Generated Directories
    ├── logs/                       # Execution logs (auto-created)
    └── output/                     # CLI output files (auto-created)

File Descriptions

Core Documentation

README.md: Complete guide with quick start, examples, and explanations
CONCEPTS.md: Detailed explanations of function calling, guardrails, validation
LEARNING_OUTCOMES.md: Summary of skills and concepts mastered

Configuration

.env.example: Template for environment variables (copy to .env)
requirements.txt: Core Python dependencies (OpenAI, Pydantic, Rich, etc.)
requirements-dev.txt: Testing dependencies (pytest)

Entry Points

cli.py: Full-featured CLI for extraction and validation
- extract: Extract data from text files
- validate: Validate JSON against schemas
- list-schemas: Show available extraction types
example_usage.py: Shows how to use the system programmatically
quickstart.sh: One-command setup and demo

Core System (`src/`)

schemas.py: Pydantic models defining data structures
- Type annotations
- Field validation rules
- Custom validators
- Nested models
extractor.py: The heart of the system
- LLMExtractor class
- Function calling implementation
- Retry logic with error feedback
- Validation orchestration
logging_config.py: Structured logging setup
- Console and file handlers
- Configurable log levels
- Timestamp-based log files
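
Here is a minimal sketch of how the schemas.py / extractor.py split can look in code. It assumes Pydantic v2. The `LineItem` and `InvoiceSketch` models and their fields are hypothetical and are not the project's actual `InvoiceData`. The `to_tool_definition` helper is also hypothetical: it shows one way to wrap a model's JSON Schema in an OpenAI function-calling tool definition. It is not necessarily how `LLMExtractor` does it.

```python
from datetime import date

from pydantic import BaseModel, Field, ValidationError, field_validator


class LineItem(BaseModel):
    """Hypothetical nested model: one row of an invoice."""
    description: str = Field(min_length=1)
    quantity: int = Field(gt=0)
    unit_price: float = Field(ge=0)


class InvoiceSketch(BaseModel):
    """Hypothetical, simplified stand-in for an invoice schema."""
    invoice_number: str
    issue_date: date  # ISO strings like "2024-01-15" are parsed to date
    currency: str = Field(default="USD", min_length=3, max_length=3)
    line_items: list[LineItem] = Field(min_length=1)  # nested models
    total: float = Field(ge=0)

    @field_validator("currency")
    @classmethod
    def currency_upper(cls, value: str) -> str:
        # Custom validator: normalize codes such as "usd" to "USD".
        return value.upper()


def to_tool_definition(model: type[BaseModel], name: str) -> dict:
    """Hypothetical helper: wrap a model's JSON Schema as a function tool."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": (model.__doc__ or "").strip(),
            "parameters": model.model_json_schema(),
        },
    }


if __name__ == "__main__":
    tool = to_tool_definition(InvoiceSketch, "extract_invoice")
    print(sorted(tool["function"]["parameters"]["properties"]))
    try:
        InvoiceSketch(invoice_number="INV-1", issue_date="not-a-date",
                      line_items=[], total=-1)
    except ValidationError as exc:
        print(f"{exc.error_count()} validation errors")
```

All field rules live on the model, so the extractor does not need to know them. It only needs to hand the generated schema to the API and validate whatever comes back.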

Sample Data

sample_inputs/: Real-world examples to test extraction
- Various invoice formats
- Different email types
- Support ticket scenarios
sample_outputs/: Examples of successful extractions
- Shows expected output format
- Useful for understanding structure
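
Sample outputs can double as validation fixtures. Below is a minimal sketch, assuming Pydantic v2, of checking a JSON document against a schema and turning errors into readable lines. `EmailSketch` and `validate_json_text` are hypothetical names, and the real `validate` command may report problems differently.

```python
from pydantic import BaseModel, Field, ValidationError


class EmailSketch(BaseModel):
    """Hypothetical, simplified email schema (fields are illustrative)."""
    sender: str
    subject: str
    action_items: list[str] = Field(default_factory=list)


def validate_json_text(model: type[BaseModel], text: str) -> list[str]:
    """Return human-readable problems; an empty list means the JSON is valid."""
    try:
        model.model_validate_json(text)
    except ValidationError as exc:
        problems = []
        for err in exc.errors():
            # err["loc"] is a path tuple such as ("action_items", 0);
            # it is empty for document-level errors like malformed JSON.
            location = ".".join(str(part) for part in err["loc"]) or "<root>"
            problems.append(f"{location}: {err['msg']}")
        return problems
    return []


if __name__ == "__main__":
    good = '{"sender": "ana@example.com", "subject": "Kickoff", "action_items": ["Book room"]}'
    bad = '{"sender": "ana@example.com", "action_items": "not a list"}'
    print(validate_json_text(EmailSketch, good))
    for problem in validate_json_text(EmailSketch, bad):
        print(problem)
```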

Tests

tests/: Unit tests for validation logic
- Test valid inputs
- Test edge cases
- Test validation failures
- Run with pytest
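
Here is a minimal sketch of schema tests in that spirit, using pytest and Pydantic v2. The `TicketSketch` model, its allowed `priority` values, and the test names are hypothetical. They only illustrate the valid-input / edge-case / failure split described above.

```python
from typing import Literal

import pytest
from pydantic import BaseModel, Field, ValidationError


class TicketSketch(BaseModel):
    """Hypothetical support-ticket schema used only for these tests."""
    title: str = Field(min_length=1)
    priority: Literal["low", "medium", "high", "urgent"]
    affected_users: int = Field(default=0, ge=0)


def test_valid_ticket():
    ticket = TicketSketch(title="Login broken", priority="urgent", affected_users=120)
    assert ticket.priority == "urgent"


def test_edge_case_default_affected_users():
    # An omitted optional field falls back to its default.
    assert TicketSketch(title="Typo", priority="low").affected_users == 0


@pytest.mark.parametrize(
    "payload",
    [
        {"title": "", "priority": "low"},                              # empty title
        {"title": "Crash", "priority": "critical"},                    # not an allowed value
        {"title": "Crash", "priority": "high", "affected_users": -1},  # negative count
    ],
)
def test_invalid_ticket_rejected(payload):
    with pytest.raises(ValidationError):
        TicketSketch(**payload)
```

Saved under tests/, these run with `pytest tests/ -v` like the rest of the suite.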

Key Design Principles

1. Separation of Concerns

Schemas define structure (what)
Extractor handles logic (how)
CLI handles interaction (interface)

2. Extensibility

Add new schemas → define a model in schemas.py and register it in EXTRACTION_SCHEMAS
Add new validation → Pydantic validators
Add new features → modify extractor
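
The project names a schema registry, `EXTRACTION_SCHEMAS`, but its shape is not documented here. The sketch below assumes a plain dict that maps a type name (as passed to `-t`) to a Pydantic v2 model class. `PurchaseOrderSketch`, `register_schema`, and `get_schema` are hypothetical names.

```python
from pydantic import BaseModel


class PurchaseOrderSketch(BaseModel):
    """Hypothetical new schema being added."""
    po_number: str
    vendor: str
    amount: float


# Assumed shape: type name -> model class.
EXTRACTION_SCHEMAS: dict[str, type[BaseModel]] = {}


def register_schema(name: str, model: type[BaseModel]) -> None:
    """Add a model to the registry, refusing silent overwrites."""
    if name in EXTRACTION_SCHEMAS:
        raise ValueError(f"schema {name!r} is already registered")
    EXTRACTION_SCHEMAS[name] = model


def get_schema(name: str) -> type[BaseModel]:
    """Look up a model by type name, listing valid names on a miss."""
    try:
        return EXTRACTION_SCHEMAS[name]
    except KeyError:
        available = ", ".join(sorted(EXTRACTION_SCHEMAS)) or "none"
        raise ValueError(f"unknown schema {name!r}; available: {available}") from None


register_schema("purchase_order", PurchaseOrderSketch)

if __name__ == "__main__":
    # Roughly what a `list-schemas` command could print.
    for name, model in sorted(EXTRACTION_SCHEMAS.items()):
        print(f"{name}: {', '.join(model.model_fields)}")
```

With a registry like this, adding a schema touches only the model definition and one registration line, and the extractor stays unchanged.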

3. Observability

All logs saved to files
Attempt tracking in results
Clear error messages
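
For "all logs saved to files", here is a minimal stdlib-only sketch of a console-plus-file setup with a timestamped log file, following the description of logging_config.py. The function name `setup_logging`, the logger name, the file name pattern, and the level split are assumptions.

```python
from __future__ import annotations

import logging
from datetime import datetime
from pathlib import Path


def setup_logging(log_dir: str | Path = "logs", level: int = logging.INFO,
                  name: str = "llm_extractor") -> tuple[logging.Logger, Path]:
    """Configure a named logger with console and file handlers.

    Returns the logger and the path of the timestamped log file.
    """
    log_path = Path(log_dir)
    log_path.mkdir(parents=True, exist_ok=True)  # auto-create logs/
    log_file = log_path / f"extraction_{datetime.now():%Y%m%d_%H%M%S}.log"

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # let each handler filter by its own level
    for handler in list(logger.handlers):  # avoid duplicates on repeated setup
        logger.removeHandler(handler)
        handler.close()
    logger.propagate = False

    fmt = logging.Formatter("%(asctime)s %(levelname)-8s %(name)s: %(message)s")

    console = logging.StreamHandler()
    console.setLevel(level)  # e.g. logging.DEBUG when --verbose is passed
    console.setFormatter(fmt)

    file_handler = logging.FileHandler(log_file, encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)  # keep full detail on disk
    file_handler.setFormatter(fmt)

    logger.addHandler(console)
    logger.addHandler(file_handler)
    return logger, log_file
```

Keeping the file handler at DEBUG while the console follows `level` means a quiet run still leaves a complete trace in logs/.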

4. Type Safety

Pydantic everywhere
No loose dictionaries
Static type checking and IDE support via type hints

5. Production-Ready

Environment-based config
Graceful error handling
Comprehensive logging
Easy testing
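
Here is a minimal stdlib sketch of environment-based config that fails fast on a missing key. `OPENAI_API_KEY` is the variable the OpenAI Python SDK reads by default. The `OPENAI_MODEL` and `MAX_RETRIES` variables, their defaults, and the `Settings` / `load_settings` names are assumptions. In the real app, `load_dotenv()` from python-dotenv would copy `.env` entries into the environment first.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object built from environment variables."""
    openai_api_key: str
    model: str
    max_retries: int


def load_settings(env: Mapping[str, str] | None = None) -> Settings:
    """Read settings from the environment (or a supplied mapping for tests)."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; copy .env.example to .env and add it")
    raw_retries = env.get("MAX_RETRIES", "3")  # assumed variable and default
    try:
        max_retries = int(raw_retries)
    except ValueError:
        raise RuntimeError(f"MAX_RETRIES must be an integer, got {raw_retries!r}") from None
    if max_retries < 1:
        raise RuntimeError("MAX_RETRIES must be at least 1")
    return Settings(
        openai_api_key=api_key,
        model=env.get("OPENAI_MODEL", "gpt-4o-mini"),  # assumed variable and default
        max_retries=max_retries,
    )
```

Accepting an optional mapping keeps the function testable without touching the real environment.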

Data Flow

1. Input File
   ↓
2. CLI reads text
   ↓
3. LLMExtractor.extract()
   ├─ Build prompt with schema
   ├─ Call OpenAI API (function calling)
   ├─ Parse function response
   └─ Validate with Pydantic
      ├─ ✓ Success → Return result
      └─ ✗ Failure → Retry with feedback
         └─ (repeat up to max_retries)
   ↓
4. CLI displays result
   ↓
5. (Optional) Save to output file
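
The retry loop in the diagram can be sketched without network access by injecting the model call as a function. This is a minimal sketch assuming Pydantic v2, not the project's `LLMExtractor`. `extract_with_retries`, `ExtractionResult`, and `call_model` are hypothetical. `call_model` stands in for the OpenAI function-calling request and is assumed to return the raw JSON arguments string.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Generic, TypeVar

from pydantic import BaseModel, ValidationError

M = TypeVar("M", bound=BaseModel)


@dataclass
class ExtractionResult(Generic[M]):
    """Validated data (or None) plus a per-attempt history for debugging."""
    data: M | None
    attempts: list[dict] = field(default_factory=list)

    @property
    def success(self) -> bool:
        return self.data is not None


def extract_with_retries(
    text: str,
    model: type[M],
    call_model: Callable[[str], str],
    max_retries: int = 3,
) -> ExtractionResult[M]:
    """Prompt, validate, and re-prompt with the validation errors on failure."""
    result: ExtractionResult[M] = ExtractionResult(data=None)
    base_prompt = f"Extract the fields of {model.__name__} from:\n{text}"
    prompt = base_prompt
    for attempt in range(1, max_retries + 1):
        raw_arguments = call_model(prompt)  # stands in for the API call
        try:
            result.data = model.model_validate_json(raw_arguments)
            result.attempts.append({"attempt": attempt, "ok": True})
            return result
        except ValidationError as exc:
            result.attempts.append({"attempt": attempt, "ok": False, "errors": exc.errors()})
            # Feed the latest errors back so the next attempt can correct itself.
            prompt = (
                f"{base_prompt}\n\nYour previous answer failed validation:\n{exc}\n"
                "Return corrected arguments."
            )
    return result
```

Injecting `call_model` keeps the loop testable with canned replies. Recording every attempt supports the "attempt tracking in results" principle.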

How to Navigate This Project

If you're new:

Read README.md - Quick start and overview
Read CONCEPTS.md - Understand why
Run ./quickstart.sh - See it work
Explore schemas.py - See how schemas are defined
Read extractor.py - See how extraction works

If you're extending:

Add schema to schemas.py
Register in EXTRACTION_SCHEMAS
Create sample input in sample_inputs/
Test with CLI
Add tests to tests/

If you're debugging:

Run with --verbose flag
Check logs/ directory
Use validate command to test schemas
Add print statements in extractor
Review attempt history in results

Module Dependencies

cli.py
  ├─→ src/__init__.py
  │     ├─→ schemas.py
  │     ├─→ extractor.py
  │     │     └─→ schemas.py
  │     └─→ logging_config.py
  │
  ├─→ typer (CLI framework)
  ├─→ rich (terminal formatting)
  └─→ python-dotenv (environment)

extractor.py
  ├─→ openai (API client)
  └─→ pydantic (validation)

schemas.py
  └─→ pydantic (models)

Quick Commands Reference

# Setup
./quickstart.sh                      # One-command setup

# Basic extraction
python cli.py extract -i <file> -t <type>

# With output
python cli.py extract -i <file> -t <type> -o result.json

# Verbose mode
python cli.py extract -i <file> -t <type> --verbose

# List available schemas
python cli.py list-schemas

# Validate JSON
python cli.py validate -s <type> -f <file>

# Run tests
pytest tests/ -v

# Run examples
python example_usage.py

Environment Setup Checklist

Python 3.10+ installed
Virtual environment created (python -m venv venv)
Virtual environment activated (source venv/bin/activate; on Windows, venv\Scripts\activate)
Dependencies installed (pip install -r requirements.txt)
.env file created from .env.example
OpenAI API key added to .env
Test run successful (./quickstart.sh)
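
Most of the checklist can be checked automatically. Here is a minimal stdlib-only preflight sketch. The function name and the exact checks are hypothetical. It only looks for `OPENAI_API_KEY` in the environment or as a `KEY=value` line in `.env`, and it does not parse `.env` the way python-dotenv does.

```python
from __future__ import annotations

import os
import sys
from pathlib import Path


def preflight(project_dir: str | Path = ".") -> list[str]:
    """Return unmet checklist items; an empty list means ready to run."""
    root = Path(project_dir)
    problems: list[str] = []
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")
    if sys.prefix == sys.base_prefix:  # differs inside an active venv
        problems.append("No virtual environment is active")
    env_file = root / ".env"
    if not env_file.is_file():
        problems.append(".env not found (copy .env.example to .env)")
    key_in_file = env_file.is_file() and any(
        line.strip().startswith("OPENAI_API_KEY=") and line.split("=", 1)[1].strip()
        for line in env_file.read_text(encoding="utf-8").splitlines()
    )
    if not (os.environ.get("OPENAI_API_KEY") or key_in_file):
        problems.append("OPENAI_API_KEY missing from environment and .env")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print(f"✗ {issue}")
    print("Ready." if not issues else f"{len(issues)} item(s) to fix.")
```

A wrapper script could exit non-zero when the list is non-empty, so CI or quickstart.sh can stop early.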

This structure is designed for:

✅ Easy learning (clear separation, good docs)
✅ Easy extension (add schemas without touching core)
✅ Easy debugging (comprehensive logs, clear errors)
✅ Production use (type safety, error handling, config management)
