A semantic search API for Austin building permits, built on vector embeddings and the Pinecone vector database.
- 🔍 Semantic Search: Find permits using natural language queries
- 🎯 Advanced Filtering: Filter by permit type, year, location, valuation, and more
- ⚡ Fast Performance: Vector similarity search with sub-second response times
- 📊 Rich Metadata: Comprehensive permit information with similarity scores
- 🚀 RESTful API: Easy-to-use FastAPI endpoints
Create a `.env` file in the root directory:

```env
# OpenAI API Configuration
OPENAI_API_KEY=your_openai_api_key_here

# Pinecone Configuration
PINECONE_API_KEY=your_pinecone_api_key_here
PINECONE_ENVIRONMENT=your_pinecone_environment_here

# Optional: Pinecone Index Name
PINECONE_INDEX_NAME=austin-permits

# Dataset Configuration
DATASET_API_URL=https://data.austintexas.gov/resource/3syk-w9eu.csv
LIMIT=10
OFFSET=1000
```

Install the dependencies:

```bash
pip install -r requirements.txt
```

First, make sure you have processed permit data in the `data/processed/` directory, then run:
```bash
python scripts/create_embeddings.py
```

Option 1: Using the startup script (recommended)

```bash
python run.py
```

Option 2: Using uvicorn directly

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

Option 3: From the app directory

```bash
cd app
python main.py
```

- API Documentation: http://localhost:8000/docs
- Health Check: http://localhost:8000/health
- Search Endpoint: POST http://localhost:8000/api/v1/search
Endpoint: POST /api/v1/search
Request Body:
```json
{
  "query": "commercial remodel downtown",
  "filters": {
    "permit_class": "Commercial",
    "calendar_year_issued": 2011
  },
  "top_k": 5
}
```

The API supports filtering on all metadata fields. Here are the main categories:
- Identifiers: `permit_number`, `project_id`, `master_permit_number`
- Classification: `permit_type`, `permit_type_description`, `permit_class`, `permit_class_original`, `work_class`, `status`, `issue_method`
- Location: `address`, `original_address`, `city`, `state`, `zip_code`, `council_district`, `jurisdiction`, `property_id`, `legal_description`, `latitude`, `longitude`, `total_lot_sqft`
- Dates: `applied_date`, `issue_date`, `expires_date`, `completed_date`, `calendar_year_issued`, `fiscal_year_issued`, `day_issued`
- Project size and valuation: `total_job_valuation`, `total_new_addition_sqft`, `total_existing_building_sqft`, `remodel_repair_sqft`, `total_valuation_remodel`, `number_of_floors`, `housing_units`
- Trade valuations: `building_valuation`, `building_valuation_remodel`, `electrical_valuation`, `electrical_valuation_remodel`, `mechanical_valuation`, `mechanical_valuation_remodel`, `plumbing_valuation`, `plumbing_valuation_remodel`, `medgas_valuation`, `medgas_valuation_remodel`
- Contractor: `contractor_company`, `contractor_trade`, `contractor_full_name`, `contractor_phone`, `contractor_address1`, `contractor_address2`, `contractor_city`, `contractor_zip`
- Applicant: `applicant_name`, `applicant_organization`, `applicant_phone`, `applicant_address1`, `applicant_address2`, `applicant_city`, `applicant_zip`
- Description and links: `project_description`, `permit_link`
- Flags: `condominium`, `certificate_of_occupancy`, `recently_issued`
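As a minimal sketch, the request body described above can be assembled and sent from Python with only the standard library. This assumes the API is running locally on the default port 8000; the helper names (`build_search_payload`, `search`) are illustrative and not part of the project:

```python
import json
import urllib.request

API_URL = "http://localhost:8000/api/v1/search"  # assumes the default local server

def build_search_payload(query, filters=None, top_k=5):
    """Assemble a request body matching the schema shown above."""
    payload = {"query": query, "top_k": top_k}
    if filters:
        payload["filters"] = filters  # metadata filters, e.g. {"permit_class": "Commercial"}
    return payload

def search(query, filters=None, top_k=5):
    """POST the payload to the running API and return the parsed JSON response."""
    body = json.dumps(build_search_payload(query, filters, top_k)).encode()
    req = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires the server to be running):
# results = search("commercial remodel downtown",
#                  filters={"permit_class": "Commercial", "calendar_year_issued": 2011})
```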
Run the test suite to verify everything is working:
```bash
python scripts/test_api.py
```

The API automatically logs all search queries with structured data for monitoring, analytics, and debugging purposes.
Each log entry is stored as a JSON object with the following structure:
```json
{
  "timestamp": "2024-01-15T10:30:45.123456",
  "query_text": "commercial remodel downtown",
  "filters": {
    "permit_class": "Commercial",
    "calendar_year_issued": 2023
  },
  "top_results": [
    {
      "record_id": "PERMIT_123_PROJECT_456",
      "similarity_score": 0.85,
      "permit_number": "2023-123456",
      "address": "123 Main St, Austin, TX",
      "permit_type": "Commercial Remodel",
      "status": "Issued",
      "total_job_valuation": 50000,
      "calendar_year_issued": 2023
    }
  ],
  "search_time_ms": 245.67,
  "user_agent": "Mozilla/5.0...",
  "client_ip": "192.168.1.100",
  "total_results": 5
}
```

- Format: JSONL (JSON Lines) file
- Location: `logs/search_queries.jsonl`
- Rotation: Automatic file creation with timestamps
- Retention: Configurable (default: unlimited)
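Because each line of the log file is an independent JSON object, it can be analyzed offline with a few lines of standard-library Python. A sketch (the helper names are illustrative, not part of the project):

```python
import json
from pathlib import Path

LOG_PATH = Path("logs/search_queries.jsonl")

def iter_log_entries(path=LOG_PATH):
    """Yield one parsed log entry per non-empty JSONL line."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)

def slow_queries(entries, threshold_ms=500.0):
    """Return entries whose search time exceeded the given threshold."""
    return [e for e in entries if e.get("search_time_ms", 0) > threshold_ms]

# Example:
# laggards = slow_queries(iter_log_entries())
```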
| Endpoint | Method | Description | Parameters |
|---|---|---|---|
| `/api/v1/logs/recent` | GET | Get recent search logs | `limit` (default: 25, max: 100) |
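A small helper for building the logs URL, clamping `limit` to the documented 1–100 range before the request is made (an illustrative sketch; the server enforces the same cap, and the function name is not part of the project):

```python
from urllib.parse import urlencode

LOGS_URL = "http://localhost:8000/api/v1/logs/recent"  # assumes the default local server

def recent_logs_url(limit=25):
    """Build the request URL, clamping `limit` to the documented range."""
    limit = min(max(int(limit), 1), 100)
    return f"{LOGS_URL}?{urlencode({'limit': limit})}"

# Example (requires the server to be running):
# import json, urllib.request
# logs = json.load(urllib.request.urlopen(recent_logs_url(50)))
```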
The API uses semantic versioning with the following approach:
- Current Version: `v1`
- Base Path: `/api/v1/`
- Future Versions: `v2`, `v3`, etc.
- Backward Compatibility: New versions maintain backward compatibility where possible
- Deprecation Policy: Deprecated features are announced with advance notice
- Migration Path: Clear migration guides for breaking changes
- Parallel Support: Multiple versions can run simultaneously during transitions
To upgrade from v1 to v2:
- Update Configuration: Modify `app/api_config/api_version.py`
- Test Compatibility: Run migration tests
- Deploy Gradually: Use feature flags for gradual rollout
- Monitor Performance: Track API usage and performance metrics
```python
# Current v1 configuration
API_VERSION = "v1"
API_BASE_PATH = "/api/v1"
SEARCH_ROUTER_PREFIX = "/api/v1"
LOGS_ROUTER_PREFIX = "/api/v1/logs"

# Future v2 configuration
API_VERSION = "v2"
API_BASE_PATH = "/api/v2"
SEARCH_ROUTER_PREFIX = "/api/v2"
LOGS_ROUTER_PREFIX = "/api/v2/logs"
```

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Health check |
| `/health` | GET | Detailed service status |
| `/api/v1/search` | POST | Search permits with filters |
| `/api/v1/logs/recent` | GET | Get recent search logs |
The system includes comprehensive schema versioning for normalized permit records to ensure data compatibility and enable future migrations.
Each normalized permit record includes metadata with schema version information:
```json
{
  "metadata": {
    "schema_version": 1,
    "processing_timestamp": "2024-01-15T10:30:45.123456",
    "data_source": "Austin Texas Government API",
    "record_id": "PERMIT_123_PROJECT_456",
    "raw_field_count": 45
  }
}
```

- Current Version: `1`
- Version Location: `SCHEMA_VERSION` constant in `app/utils/data_processor.py`
- Migration Support: Built-in migration utilities for future schema changes
When schema changes are needed:
- Increment Version: Update `SCHEMA_VERSION` in `PermitDataProcessor`
- Implement Migration: Add migration logic in the `_apply_migration()` method
- Test Migration: Use the `migrate_record()` method to test migrations
- Batch Migration: Process existing data with the new schema
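The workflow above might look roughly like this. This is an illustrative sketch only: the real logic lives in `PermitDataProcessor` in `app/utils/data_processor.py`, and the v1-to-v2 change shown (a `normalized_address` field) is hypothetical:

```python
SCHEMA_VERSION = 2  # step 1: incremented from 1 (hypothetical future version)

def _apply_migration(record, from_version):
    """Step 2: upgrade a record by exactly one schema version."""
    if from_version == 1:
        # Hypothetical v1 -> v2 change: ensure a normalized_address field exists.
        record.setdefault("normalized_address", record.get("address"))
        record.setdefault("metadata", {})["schema_version"] = 2
    return record

def migrate_record(record):
    """Step 3: apply migrations until the record reaches the current version."""
    version = record.get("metadata", {}).get("schema_version", 1)
    while version < SCHEMA_VERSION:
        record = _apply_migration(record, version)
        version = record["metadata"]["schema_version"]
    return record

def migrate_batch(records):
    """Step 4: re-process existing data under the new schema."""
    return [migrate_record(r) for r in records]
```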
```
ConstructIQ/
├── app/
│   ├── main.py                  # FastAPI application entry point
│   ├── config.py                # Application configuration
│   ├── api_config/              # API versioning configuration
│   │   ├── __init__.py
│   │   └── api_version.py       # API version constants
│   ├── models/                  # Pydantic models
│   │   ├── __init__.py
│   │   ├── search.py            # Search request/response models
│   │   └── common.py            # Common response models
│   ├── api/                     # API routes
│   │   ├── __init__.py
│   │   ├── search.py            # Search endpoints
│   │   ├── health.py            # Health check endpoints
│   │   └── logs.py              # Query logging endpoints
│   ├── services/                # Business logic
│   │   ├── __init__.py
│   │   ├── permit.py            # Main permit service
│   │   ├── embedding.py         # OpenAI embedding service
│   │   ├── vector_db.py         # Pinecone vector database service
│   │   └── logging_service.py   # Query logging service
│   └── utils/                   # Utilities
│       ├── __init__.py
│       └── data_processor.py    # Data processing utilities
├── data/
│   ├── raw/                     # Raw permit data
│   └── processed/               # Processed permit data
├── logs/                        # Query log files
│   └── search_queries.jsonl     # Structured query logs
├── scripts/
│   ├── create_embeddings.py     # Data indexing script
│   ├── example_search.py        # Search examples
│   ├── test_api.py              # API test suite
│   ├── process_data.py          # Data processing script
│   └── load_data.py             # Data loading utilities
├── requirements.txt             # Python dependencies
└── README.md                    # This file
```
```bash
# Install development dependencies
pip install -r requirements.txt

# Start with auto-reload
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

- Search Response Time: < 500ms for most queries
- Vector Index: Pinecone serverless for scalability
- Embedding Model: OpenAI text-embedding-3-small (1536 dimensions)
- Batch Processing: Configurable batch sizes for indexing
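The configurable batch sizes mentioned above come down to chunking records before embedding and upserting them. A minimal sketch, with illustrative names (the `index.upsert` line is pseudocode, not the project's actual indexing call):

```python
def batched(items, batch_size=100):
    """Yield successive fixed-size batches; the last batch may be smaller."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# During indexing, each batch would be embedded and upserted in one call, e.g.:
# for batch in batched(records, batch_size=100):
#     index.upsert(vectors=embed(batch))  # pseudocode
```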
- API Key Errors: Ensure your `.env` file has valid API keys
- Index Not Found: Run `python scripts/create_embeddings.py` first
- Connection Errors: Check if the FastAPI server is running on port 8000
Check the console output for detailed logs and error messages.