llm_102/
│
├── 📄 README.md # Main documentation - START HERE
├── 📄 CONCEPTS.md # Deep dive into core concepts
├── 📄 LEARNING_OUTCOMES.md # What you'll master
│
├── 🔧 Configuration Files
│ ├── .env.example # Environment variables template
│ ├── .gitignore # Git ignore rules
│ ├── requirements.txt # Python dependencies
│ └── requirements-dev.txt # Development/testing dependencies
│
├── 🚀 Entry Points
│ ├── cli.py # Command-line interface (main entry)
│ ├── example_usage.py # Programmatic usage examples
│ └── quickstart.sh # Quick setup script
│
├── 🧠 Core System (src/)
│ ├── __init__.py # Package initialization
│ ├── schemas.py # Pydantic models for data extraction
│ │ • InvoiceData
│ │ • EmailData
│ │ • SupportTicketData
│ ├── extractor.py # Core extraction engine
│ │ • LLMExtractor class
│ │ • Retry logic
│ │ • Validation feedback
│ └── logging_config.py # Logging configuration
│
├── 📥 Sample Inputs (sample_inputs/)
│ ├── invoice_tech.txt # Sample tech invoice
│ ├── email_project.txt # Sample project email
│ ├── email_inquiry.txt # Sample inquiry email
│ └── support_ticket_urgent.txt # Sample urgent support ticket
│
├── 📤 Sample Outputs (sample_outputs/)
│ ├── invoice_success.json # Example successful invoice extraction
│ └── email_success.json # Example successful email extraction
│
├── 🧪 Tests (tests/)
│ ├── __init__.py
│ └── test_schemas.py # Unit tests for Pydantic schemas
│
└── 📁 Generated Directories
    ├── logs/ # Execution logs (auto-created)
    └── output/ # CLI output files (auto-created)
- README.md: Complete guide with quick start, examples, and explanations
- CONCEPTS.md: Detailed explanations of function calling, guardrails, validation
- LEARNING_OUTCOMES.md: Summary of skills and concepts mastered
- .env.example: Template for environment variables (copy to .env)
- requirements.txt: Core Python dependencies (OpenAI, Pydantic, Rich, etc.)
- requirements-dev.txt: Testing dependencies (pytest)
- cli.py: Full-featured CLI for extraction and validation
  - extract: Extract data from text files
  - validate: Validate JSON against schemas
  - list-schemas: Show available extraction types
- example_usage.py: Shows how to use the system programmatically
- quickstart.sh: One-command setup and demo
- schemas.py: Pydantic models defining data structures
  - Type annotations
  - Field validation rules
  - Custom validators
  - Nested models
- extractor.py: The heart of the system
  - LLMExtractor class
  - Function calling implementation
  - Retry logic with error feedback
  - Validation orchestration
- logging_config.py: Structured logging setup
  - Console and file handlers
  - Configurable log levels
  - Timestamp-based log files
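The logging setup described above (console and file handlers, a configurable level, timestamped log files) could look roughly like the following. This is a minimal sketch using only the standard library; the function name `setup_logging`, the logger name `llm_extractor`, the file-name pattern, and the log format are assumptions, not the contents of logging_config.py.

```python
import logging
from datetime import datetime
from pathlib import Path


def setup_logging(log_dir: str = "logs", level: str = "INFO") -> Path:
    """Attach a console handler and a timestamped file handler to one logger.

    Hypothetical helper: name, signature, and format are illustrative only.
    """
    log_path = Path(log_dir)
    log_path.mkdir(parents=True, exist_ok=True)  # auto-create the log directory
    log_file = log_path / f"extraction_{datetime.now():%Y%m%d_%H%M%S}.log"

    logger = logging.getLogger("llm_extractor")  # assumed logger name
    # Unknown level names fall back to INFO.
    logger.setLevel(getattr(logging, level.upper(), logging.INFO))

    # Remove handlers from earlier calls so messages are not duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    formatter = logging.Formatter("%(asctime)s | %(levelname)s | %(name)s | %(message)s")
    handlers = (logging.StreamHandler(), logging.FileHandler(log_file, encoding="utf-8"))
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return log_file
```

Calling `setup_logging(level="DEBUG")` once at startup and then using `logging.getLogger("llm_extractor")` elsewhere keeps every module writing to the same console and file pair.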
- sample_inputs/: Real-world examples to test extraction
  - Various invoice formats
  - Different email types
  - Support ticket scenarios
- sample_outputs/: Examples of successful extractions
  - Shows expected output format
  - Useful for understanding structure
- tests/: Unit tests for validation logic
  - Test valid inputs
  - Test edge cases
  - Test validation failures
  - Run with pytest
- Schemas define structure (what)
- Extractor handles logic (how)
- CLI handles interaction (interface)
- Add new schemas → just extend schemas.py
- Add new validation → Pydantic validators
- Add new features → modify extractor
- All logs saved to files
- Attempt tracking in results
- Clear error messages
- Pydantic everywhere
- No loose dictionaries
- Type-checker- and editor-assisted development
- Environment-based config
- Graceful error handling
- Comprehensive logging
- Easy testing
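To make the "Pydantic everywhere, no loose dictionaries" point concrete, here is a small sketch of typed, nested models with field constraints. The class names `LineItem` and `InvoiceSketch` and all field names are hypothetical; the real InvoiceData schema is not reproduced here. Custom validators are left out so the snippet does not depend on a particular Pydantic major version.

```python
from pydantic import BaseModel, Field, ValidationError


class LineItem(BaseModel):
    # Hypothetical fields, chosen only for illustration.
    description: str = Field(..., min_length=1)
    quantity: int = Field(..., gt=0)
    unit_price: float = Field(..., ge=0)


class InvoiceSketch(BaseModel):
    invoice_number: str = Field(..., min_length=1)
    items: list[LineItem]  # nested model: every element is validated too


# A raw dictionary (for example, parsed model output) becomes a typed object.
raw = {
    "invoice_number": "INV-001",
    "items": [{"description": "Widget", "quantity": 2, "unit_price": 9.5}],
}
invoice = InvoiceSketch(**raw)
total = sum(item.quantity * item.unit_price for item in invoice.items)

# Invalid data raises ValidationError instead of flowing through silently.
try:
    InvoiceSketch(
        invoice_number="INV-002",
        items=[{"description": "", "quantity": 0, "unit_price": 1}],
    )
except ValidationError as exc:
    error_count = len(exc.errors())  # one entry per failing field
```

Because downstream code receives `invoice.items[0].quantity` rather than `raw["items"][0]["quantity"]`, editors and type checkers can flag misspelled or mistyped fields before the code runs.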
1. Input File
↓
2. CLI reads text
↓
3. LLMExtractor.extract()
├─ Build prompt with schema
├─ Call OpenAI API (function calling)
├─ Parse function response
└─ Validate with Pydantic
├─ ✓ Success → Return result
└─ ✗ Failure → Retry with feedback
└─ (repeat up to max_retries)
↓
4. CLI displays result
↓
5. (Optional) Save to output file
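The retry branch in step 3 can be sketched as a loop that appends the validation errors to the prompt before the next attempt. This is an illustration under stated assumptions, not the LLMExtractor implementation: `call_model` stands in for the OpenAI call, `validate_invoice` stands in for Pydantic validation, the result dictionary layout is invented, and `max_retries` is treated here as the total number of attempts.

```python
import json
from typing import Callable


def validate_invoice(data: dict) -> list[str]:
    """Stand-in for Pydantic validation: return a list of error messages."""
    errors = []
    if not isinstance(data.get("invoice_number"), str):
        errors.append("invoice_number: must be a string")
    total = data.get("total")
    if isinstance(total, bool) or not isinstance(total, (int, float)):
        errors.append("total: must be a number")
    return errors


def extract_with_retry(text: str, call_model: Callable[[str], str],
                       max_retries: int = 3) -> dict:
    """Call the model, validate, and re-prompt with the errors until valid."""
    prompt = f"Extract the invoice fields as JSON from:\n{text}"
    attempts = []
    for attempt in range(1, max_retries + 1):
        raw = call_model(prompt)
        try:
            data = json.loads(raw)
            if isinstance(data, dict):
                errors = validate_invoice(data)
            else:
                errors = ["top level: must be a JSON object"]
        except json.JSONDecodeError as exc:
            data, errors = None, [f"invalid JSON: {exc.msg}"]
        attempts.append({"attempt": attempt, "errors": errors})
        if not errors:
            return {"success": True, "data": data, "attempts": attempts}
        # Feed the validation errors back so the next attempt can correct them.
        prompt += "\n\nYour previous answer was rejected:\n- " + "\n- ".join(errors)
    return {"success": False, "data": None, "attempts": attempts}


# Fake model: the first reply fails validation, the second one passes.
replies = iter([
    '{"invoice_number": "INV-001", "total": "n/a"}',
    '{"invoice_number": "INV-001", "total": 120.5}',
])
result = extract_with_retry("Invoice INV-001, total due 120.50",
                            lambda prompt: next(replies))
```

Keeping the per-attempt errors in the result is what makes the attempt history reviewable afterwards.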
If you're new:
- Read README.md - Quick start and overview
- Read CONCEPTS.md - Understand why
- Run ./quickstart.sh - See it work
- Explore schemas.py - See how schemas are defined
- Read extractor.py - See how extraction works
If you're extending:
- Add schema to schemas.py
- Register in EXTRACTION_SCHEMAS
- Create sample input in sample_inputs/
- Test with CLI
- Add tests to tests/
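The first two extension steps might look like the sketch below. It assumes that `EXTRACTION_SCHEMAS` is a dictionary mapping the CLI type name to a Pydantic model class; the actual shape in schemas.py may differ. `ReceiptData`, its fields, and the `get_schema` helper are hypothetical and exist only for illustration.

```python
from pydantic import BaseModel, Field


class ReceiptData(BaseModel):
    """Hypothetical new schema; field names are illustrative only."""
    merchant: str = Field(..., min_length=1)
    total: float = Field(..., ge=0)


# Assumed shape: type name (as passed to the CLI's -t option) -> model class.
EXTRACTION_SCHEMAS: dict[str, type[BaseModel]] = {}
EXTRACTION_SCHEMAS["receipt"] = ReceiptData


def get_schema(name: str) -> type[BaseModel]:
    """Look up a schema by type name, listing the known names on failure."""
    try:
        return EXTRACTION_SCHEMAS[name]
    except KeyError:
        known = ", ".join(sorted(EXTRACTION_SCHEMAS)) or "(none)"
        raise ValueError(f"Unknown schema {name!r}; available: {known}") from None


# The registry returns the class, which is then instantiated with the data.
receipt = get_schema("receipt")(merchant="Corner Cafe", total=12.4)
```

With a registry of this kind, the extractor and CLI never need to import a new schema by name, which is what lets schemas be added without touching the core.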
If you're debugging:
- Run with --verbose flag
- Check logs/ directory
- Use validate command to test schemas
- Add print statements in extractor
- Review attempt history in results
cli.py
├─→ src/__init__.py
│ ├─→ schemas.py
│ ├─→ extractor.py
│ │ └─→ schemas.py
│ └─→ logging_config.py
│
├─→ typer (CLI framework)
├─→ rich (terminal formatting)
└─→ python-dotenv (environment)
extractor.py
├─→ openai (API client)
└─→ pydantic (validation)
schemas.py
└─→ pydantic (models)
# Setup
./quickstart.sh # One-command setup
# Basic extraction
python cli.py extract -i <file> -t <type>
# With output
python cli.py extract -i <file> -t <type> -o result.json
# Verbose mode
python cli.py extract -i <file> -t <type> --verbose
# List available schemas
python cli.py list-schemas
# Validate JSON
python cli.py validate -s <type> -f <file>
# Run tests
pytest tests/ -v
# Run examples
python example_usage.py
- Python 3.10+ installed
- Virtual environment created (python -m venv venv)
- Virtual environment activated (source venv/bin/activate)
- Dependencies installed (pip install -r requirements.txt)
- .env file created from .env.example
- OpenAI API key added to .env
- Test run successful (./quickstart.sh)
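Parts of this checklist can be verified programmatically. The sketch below is a hypothetical `preflight` helper, not part of the project. It checks the process environment, so it should run after .env has been loaded (for example with python-dotenv). `OPENAI_API_KEY` is the variable the openai client reads by default; that .env.example uses the same name is an assumption.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def preflight(project_dir: str = ".",
              env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return a list of setup problems; an empty list means ready to run."""
    env = os.environ if env is None else env
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")
    if not (Path(project_dir) / ".env").is_file():
        problems.append(".env is missing; copy it from .env.example")
    # Assumed variable name; adjust if .env.example uses a different one.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(f"[setup] {problem}")
```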
This structure is designed for:
- ✅ Easy learning (clear separation, good docs)
- ✅ Easy extension (add schemas without touching core)
- ✅ Easy debugging (comprehensive logs, clear errors)
- ✅ Production use (type safety, error handling, config management)