A multi-agent natural language interface for commercial insurance data, powered by LangGraph, LangChain, and GPT-4o-mini. Intelligently routes queries between structured database operations and semantic document search.
User Question (Streamlit UI)
↓
┌────────────────────────────────────────┐
│ MultiAgentOrchestrator (LangGraph) │
│ Primary Entry Point │
└──────────────┬─────────────────────────┘
↓
┌──────────────┐
│ Supervisor │ ← GPT-4o-mini routing decision
│ Node │
└──────┬───────┘
│
┌──────────┴──────────────┐
│ Conditional Routing │
└─┬────────┬──────────┬───┘
│ │ │
sql_only document hybrid
│ search │
↓ ↓ ↓
┌─────┐ ┌─────┐ ┌─────────┐
│ SQL │ │ Doc │ │SQL→Doc │
│Agent│ │Agent│ │Parallel │
└──┬──┘ └──┬──┘ └────┬────┘
│ │ │
└────────┴───────────┘
↓
┌──────────────┐
│ Synthesizer │ ← Combines multi-agent responses
│ Node │
└──────┬───────┘
↓
Final Response
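The control flow in the diagram above can be sketched without any framework. This is a minimal, framework-free stand-in (the real project wires these as LangGraph nodes with conditional edges, and the supervisor is a GPT-4o-mini call rather than the keyword heuristic used here):

```python
# Framework-free sketch of the routing flow above. The keyword heuristic
# stands in for the GPT-4o-mini routing call; node and route names mirror
# the diagram.

def supervisor(state):
    q = state["question"].lower()
    if "policy" in q and "quote" in q:
        state["route"] = "hybrid"
    elif "policy" in q or "covered" in q:
        state["route"] = "document_search"
    else:
        state["route"] = "sql_only"
    return state

def sql_agent(state):
    state["sql_result"] = f"rows for: {state['question']}"  # stub
    return state

def doc_agent(state):
    state["documents"] = [f"chunk about: {state['question']}"]  # stub
    return state

def synthesizer(state):
    # Single-agent results pass through; hybrid results get combined.
    parts = [state.get("sql_result"), *(state.get("documents") or [])]
    state["final"] = " | ".join(p for p in parts if p)
    return state

def run(question):
    state = supervisor({"question": question})
    if state["route"] in ("sql_only", "hybrid"):
        state = sql_agent(state)
    if state["route"] in ("document_search", "hybrid"):
        state = doc_agent(state)
    return synthesizer(state)

print(run("What was the last phone call?")["route"])            # sql_only
print(run("Show me quotes and what the policy covers")["route"])  # hybrid
```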
User Question + Conversation History
↓
Skill Router (keyword-based)
↓
SQLGenerationChain
├─ Prompt: Skill-specific schema context
├─ LLM: GPT-4o-mini (temp=0.1)
└─ Parser: Pydantic validation
↓
SQL Validator (security checks)
↓
PostgreSQL Execution (with retry)
↓
ResponseGenerationChain
├─ Prompt: Skill-specific response guidelines
├─ LLM: GPT-4o-mini (temp=0.7)
└─ Parser: String output
↓
Natural Language Response
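The generate → validate → execute → respond pipeline above, including the retry loop that feeds database errors back into regeneration, can be sketched as follows. The LLM calls are stubbed and the validation rules are illustrative, not the exact checks in `core/validator.py`:

```python
# Sketch of the SQL pipeline: generate -> validate -> execute (with retry)
# -> respond. Both chain calls are stubs for the real LangChain chains.

def generate_sql(question, error_feedback=None):
    # Stub for SQLGenerationChain (temp=0.1).
    sql = "SELECT * FROM communications.phone_call_silver LIMIT 1"
    if error_feedback:
        # The real chain appends the prior error to the prompt so the
        # model can correct itself on the next attempt.
        sql += "  -- regenerated after: " + error_feedback
    return sql

def validate(sql):
    # Illustrative security check: single SELECT statement only.
    lowered = sql.lower().lstrip()
    banned = ("insert", "update", "delete", "drop", ";")
    return lowered.startswith("select") and not any(b in lowered for b in banned)

def execute(sql):
    return [{"caller": "John Smith", "duration_min": 8}]  # stub result set

def answer(question, max_retries=3):
    feedback = None
    for _ in range(max_retries):
        sql = generate_sql(question, feedback)
        if not validate(sql):
            feedback = "validation failed"
            continue
        try:
            rows = execute(sql)
            return f"Found {len(rows)} row(s)."  # ResponseGenerationChain stub
        except Exception as exc:
            feedback = str(exc)
    return "Sorry, I couldn't answer that."

print(answer("What was the last phone call?"))  # Found 1 row(s).
```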
ai_assistant/
├── graph/ # Multi-agent orchestration (LangGraph)
│ ├── orchestrator.py # Main workflow builder & executor
│ ├── state.py # TypedDict state schema
│ └── nodes/
│ ├── supervisor.py # Routing decision node (GPT)
│ ├── sql_agent.py # SQL execution node
│ ├── document_agent.py # Vector search node
│ └── synthesizer.py # Response synthesis node
├── config/
│ ├── database.py # SQLAlchemy + PostgreSQL
│ └── settings.py # Environment variables & LLM config
├── chains/ # LangChain chain definitions
│ ├── sql_generation_chain.py # SQL generation (temp=0.1)
│ └── response_chain.py # Response generation (temp=0.7)
├── skills/ # Skill-based query routing
│ ├── base.py # BaseSkill interface
│ ├── phone_calls.py # Phone calls skill
│ ├── phone_messages.py # SMS/text messages skill
│ ├── email_communications.py # Email & quotes skill
│ ├── companies_data.py # Company information skill
│ ├── documents.py # Document metadata queries skill
│ └── general.py # General fallback skill
├── agents/
│ ├── sql_agent.py # LangChain agent orchestrator
│ └── skill_router.py # Keyword-based skill detection
├── memory/
│ └── conversation_memory.py # LangChain memory manager
├── core/
│ ├── validator.py # SQL validation & security
│ ├── executor.py # SQL execution with retry
│ └── schema_loader.py # Database schema context
├── database/
│ └── chromadb/ # Vector database for documents
│ ├── client.py # ChromaDB wrapper
│ ├── indexer.py # Document chunking & embedding
│ └── searcher.py # Semantic search
├── evaluation/ # Testing & evaluation
│ ├── runner.py # Test execution runner
│ └── test_questions.py # Test question definitions
└── ui/
└── app.py # Streamlit interface
- 🎯 Multi-Agent Orchestration - LangGraph workflow with intelligent routing
- 🤖 LangChain-Powered Agents - Modern LLM orchestration with chains
- 📄 Document Search - ChromaDB vector database for policy documents
- 🔀 Hybrid Queries - Combine structured data + document content
- 🧠 Smart Routing - Supervisor node analyzes questions and routes optimally
- 🔄 Automatic Retry Logic - Up to 3 attempts with error feedback
- 💬 Conversation Memory - LangChain memory for multi-turn conversations
- 🎯 Skill-Based Routing - 6 specialized skills for different query types
- ✅ SQL Validation - Security checks before execution
- 📊 Results Export - Download query results as CSV
- 🌙 Dark Theme UI - Gemini-style Streamlit interface
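As one concrete example from the list, the CSV export feature amounts to serializing the query result rows. A sketch (the column names are illustrative; in the real UI the string would presumably be handed to Streamlit's `st.download_button`):

```python
# Sketch of the "Results Export" feature: query rows -> CSV text.
import csv
import io

def rows_to_csv(rows):
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

rows = [
    {"caller": "John Smith", "call_timestamp": "2026-01-09", "duration_min": 8},
    {"caller": "Jane Doe", "call_timestamp": "2026-01-08", "duration_min": 3},
]
print(rows_to_csv(rows).splitlines()[0])  # caller,call_timestamp,duration_min
```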
MultiAgentOrchestrator (graph/orchestrator.py)
- Primary entry point for all queries
- Builds and executes LangGraph state machine
- Manages state flow through nodes
- Converts graph output to legacy API format
Supervisor Node (graph/nodes/supervisor.py)
- Analyzes user questions using GPT-4o-mini
- Routes to `sql_only`, `document_search`, or `hybrid`
- Uses structured output for routing decisions
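A structured routing decision can be modeled as a small validated record. The real node presumably binds a Pydantic model through LangChain's structured-output support; this stand-in uses a plain dataclass with manual validation, and the field names are assumptions:

```python
# Sketch of the supervisor's structured routing output.
from dataclasses import dataclass

ROUTES = ("sql_only", "document_search", "hybrid")

@dataclass
class RoutingDecision:
    route: str
    reasoning: str

    def __post_init__(self):
        # Reject anything outside the three known routes.
        if self.route not in ROUTES:
            raise ValueError(f"invalid route: {self.route}")

def parse_decision(llm_json: dict) -> RoutingDecision:
    return RoutingDecision(route=llm_json["route"], reasoning=llm_json["reasoning"])

decision = parse_decision({"route": "hybrid", "reasoning": "needs SQL + documents"})
print(decision.route)  # hybrid
```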
SQL Agent Node (graph/nodes/sql_agent.py)
- Executes structured data queries
- Delegates to LangChainSQLAgent
- Handles retry logic with error feedback
Document Agent Node (graph/nodes/document_agent.py)
- Performs semantic search in ChromaDB
- Returns relevant document chunks
- Configurable top-k and similarity threshold
Synthesizer Node (graph/nodes/synthesizer.py)
- Combines responses from multiple agents
- Uses LLM for intelligent synthesis
- Single-agent: pass-through, Multi-agent: synthesis
State Management (graph/state.py)
- TypedDict schema with 17 fields
- Tracks routing, SQL results, documents, errors
- Accumulates execution path for debugging
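The state schema described above can be sketched as a TypedDict (the real `state.py` has 17 fields; only a representative subset is shown here). The `Annotated[..., operator.add]` pattern is how LangGraph marks an accumulator field, which matches the "accumulates execution path" note:

```python
# Hedged sketch of the graph state; field names beyond those mentioned
# in the text are illustrative.
import operator
from typing import Annotated, List, Optional, TypedDict

class AgentState(TypedDict, total=False):
    question: str
    route: str                     # sql_only / document_search / hybrid
    sql_result: Optional[str]
    documents: List[dict]
    error: Optional[str]
    execution_path: Annotated[List[str], operator.add]  # accumulator field

state: AgentState = {"question": "last call?", "execution_path": ["supervisor"]}
# Each node appends to the path, mimicking LangGraph's accumulation.
state["execution_path"] = state["execution_path"] + ["sql_agent"]
print(state["execution_path"])  # ['supervisor', 'sql_agent']
```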
Document Indexer (database/chromadb/indexer.py)
- Chunks documents (4000 tokens, 200 overlap)
- Creates embeddings (text-embedding-3-small, 1536 dims)
- Stores in ChromaDB collection
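The chunking step (4000 tokens, 200 overlap) reduces to a sliding window. A sketch, with whitespace tokens standing in for real tokenization (the indexer presumably counts tokens properly, e.g. via tiktoken, before embedding):

```python
# Sliding-window chunking sketch: size-token windows advancing by
# (size - overlap) tokens each step.

def chunk(text, size=4000, overlap=200):
    tokens = text.split()
    step = size - overlap
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), step)]

doc = " ".join(f"w{i}" for i in range(9000))
chunks = chunk(doc)
print(len(chunks), len(chunks[0].split()))  # 3 4000
```

The last 200 tokens of each chunk repeat as the first 200 of the next, so sentences straddling a boundary stay searchable.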
Document Searcher (database/chromadb/searcher.py)
- Semantic search with cosine similarity
- Configurable similarity threshold (0.7)
- Returns top-k relevant chunks
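Top-k retrieval with a similarity floor can be sketched over toy vectors; the real searcher delegates the similarity computation and ranking to ChromaDB:

```python
# Cosine-similarity search sketch: rank all chunks, keep the top-k that
# clear the 0.7 threshold. Index contents are illustrative.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query_vec, index, top_k=5, threshold=0.7):
    scored = [(cosine(query_vec, vec), doc) for doc, vec in index.items()]
    scored.sort(reverse=True)
    return [(round(s, 3), d) for s, d in scored[:top_k] if s >= threshold]

index = {
    "liability_chunk": [1.0, 0.1, 0.0],
    "auto_chunk": [0.0, 1.0, 0.2],
}
print(search([1.0, 0.0, 0.0], index))
```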
SQL Generation Chain (chains/sql_generation_chain.py)
- Uses LangChain chains and components for LLM orchestration
ChatPromptTemplatefor reusable, versioned prompt templatesJsonOutputParserwith Pydantic models for type-safe responses- Output:
reasoning,sql,explanation,needs_clarification
Response Generation Chain (chains/response_chain.py)
- Natural language response generation (temp=0.7)
- Skill-specific response guidelines
- Fallback to template responses on LLM failure
ConversationMemoryManager (memory/conversation_memory.py)
- LangChain `ConversationBufferWindowMemory` for automatic conversation tracking
- Per-session memory isolation
- Configurable window size (default: 3 exchanges)
- Methods: `add_exchange()`, `get_conversation_history()`, `clear_session()`
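The windowed, per-session behavior can be reproduced with a bounded deque per session. This is a framework-free sketch, not the real manager (which wraps LangChain's `ConversationBufferWindowMemory`):

```python
# Per-session windowed memory sketch: each session keeps only the last
# `window_size` (question, answer) exchanges.
from collections import defaultdict, deque

class MemorySketch:
    def __init__(self, window_size=3):
        self.window = window_size
        self.sessions = defaultdict(lambda: deque(maxlen=self.window))

    def add_exchange(self, session_id, question, answer):
        self.sessions[session_id].append((question, answer))

    def get_conversation_history(self, session_id):
        return list(self.sessions[session_id])

    def clear_session(self, session_id):
        self.sessions.pop(session_id, None)

mgr = MemorySketch(window_size=3)
for i in range(4):  # the oldest exchange falls out of the 3-turn window
    mgr.add_exchange("s1", f"Q{i}", f"A{i}")
print(mgr.get_conversation_history("s1"))
```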
6 Specialized Skills:
- Phone Calls - Call records, voicemails, recording summaries (`phone_call_silver`)
- Phone Messages - SMS/text messages (`phone_message_silver`)
- Email Communications - Emails, quotes, pricing, account activity (`emails_silver`)
- Companies Data - Company info, contacts, business details (`companies`)
- Documents - Document metadata queries
- General - Fallback for any other queries (all tables)
Each skill provides:
- Skill-specific schema context (as LangChain prompt templates)
- Custom response formatting (fallback if LLM fails)
- Example queries for few-shot learning
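The skill interface above is a Strategy pattern: each skill is a pluggable object supplying its own schema context and match rule. A sketch; the method names are illustrative, not the exact `BaseSkill` API:

```python
# Strategy-pattern sketch of the skill interface. Keyword lists and
# schema strings are illustrative.
from abc import ABC, abstractmethod

class BaseSkill(ABC):
    keywords: tuple = ()

    @abstractmethod
    def schema_context(self) -> str: ...

    def matches(self, question: str) -> bool:
        q = question.lower()
        return any(k in q for k in self.keywords)

class PhoneCallsSkill(BaseSkill):
    keywords = ("call", "voicemail")

    def schema_context(self):
        return "table communications.phone_call_silver(caller, call_timestamp, ...)"

class GeneralSkill(BaseSkill):
    def matches(self, question):  # fallback: always matches
        return True

    def schema_context(self):
        return "all tables"

def route(question, skills):
    # First matching skill wins; GeneralSkill goes last as the fallback.
    return next(s for s in skills if s.matches(question))

skills = [PhoneCallsSkill(), GeneralSkill()]
print(type(route("What was the last call?", skills)).__name__)  # PhoneCallsSkill
```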
- Supervisor Pattern - Centralized routing with conditional branching (LangGraph)
- State Machine - TypedDict state management with accumulator fields
- Strategy Pattern - Skills as pluggable strategies with consistent interface
- Chain of Responsibility - SQL gen → validate → execute → respond with retry
- Multi-Modal Data - Structured (SQL) + Unstructured (Documents) integration
- Response Synthesis - LLM intelligently combines outputs from multiple agents
- Composition Over Inheritance - Agents composed of chains/validators/memory
- Facade Pattern - MultiAgentOrchestrator hides graph complexity
User: "What was the last phone call?"
↓
Supervisor: Analyzes question → Routes to sql_only
↓
SQL Agent:
• SkillDetector: Detects "phone_calls" skill (keyword: "call")
• SQLGenerationChain: Generates SQL with phone_call_silver schema
• Validator: Checks security (SELECT-only, company_id filter)
• Executor: Runs query on PostgreSQL
• ResponseChain: Generates natural language response
↓
Synthesizer: Pass-through (single agent)
↓
Output: "The most recent call was on January 9, 2026 from John Smith
regarding policy renewal. Duration: 8 minutes."
User: "What's covered in the liability policy?"
↓
Supervisor: Analyzes question → Routes to document_search
↓
Document Agent:
• Converts question to search embedding
• Queries ChromaDB with semantic search
• Returns top-5 relevant document chunks (similarity > 0.7)
↓
Synthesizer: Formats document excerpts
↓
Output: "The liability policy covers:
• General Liability: Up to $2M per occurrence
• Professional Liability: $1M aggregate
• Product Liability: Included in coverage
[Source: Policy_2026_Liability.pdf, Page 3]"
User: "Show me quotes and what the policy covers"
↓
Supervisor: Analyzes question → Routes to hybrid
↓
SQL Agent (parallel):
• Detects "email_communications" skill
• Queries emails_silver for quotes
• Returns: 2 quotes found ($1,433.88, $1,250.00)
↓
Document Agent (parallel):
• Searches for "policy coverage" in ChromaDB
• Returns: Policy document excerpts
↓
Synthesizer: LLM combines both sources
↓
Output: "You received 2 quotes:
• Harper Insurance: $1,433.88 (Jan 9, 2026)
• Competitor Quote: $1,250.00 (Jan 8, 2026)
The Harper policy covers:
• General Liability: $2M
• Workers Comp: $500K
• Commercial Auto: $1M combined
[Sources: Database + Policy_2026.pdf]"
User: "What was the last phone call?"
Assistant: "The last call was from John Smith on Jan 9..."
User: "What did they discuss?"
↓
Memory: Retrieves previous exchange
↓
SQL Agent:
• SQLGenerationChain receives conversation history
• Understands "they" = John Smith from previous answer
• Generates query using recording_summary field
↓
Output: "They discussed renewing the business liability policy.
John mentioned expanding operations and needing higher
coverage limits."
cd ai_assistant
# Copy environment template and add your credentials
cp .env.example .env
# Edit .env with your DATABASE_URL and OPENAI_API_KEY
# Build and run
docker compose up --build

The app will be available at http://localhost:8501
Docker Commands:
# Start the container
docker compose up
# Start in detached mode (background)
docker compose up -d
# Rebuild after code changes
docker compose up --build
# Stop the container
docker compose down
# View logs
docker compose logs -f

cd ai_assistant
pip install -r requirements.txt

Create a .env file in the ai_assistant/ directory:
# PostgreSQL (Supabase)
DATABASE_URL=postgresql://postgres.xxx:xxx@aws-0-us-east-1.pooler.supabase.com:6543/postgres
# OpenAI
OPENAI_API_KEY=sk-...
# LangChain Configuration (Optional)
LANGCHAIN_TRACING_V2=false
LANGCHAIN_API_KEY=
LANGCHAIN_PROJECT=ai-assistant
# LLM Configuration
LLM_MODEL=gpt-4o-mini
LLM_TEMPERATURE_SQL=0.1
LLM_TEMPERATURE_RESPONSE=0.7
LLM_TIMEOUT=30
LLM_MAX_RETRIES=2
# Memory Configuration
MEMORY_WINDOW_SIZE=3
# SQL Agent Config
MAX_RETRIES=3
SQL_TIMEOUT_SECONDS=30
NLG_ENABLED=true
NLG_MAX_ROWS=10
# Vector Search Configuration
VECTOR_SEARCH_TOP_K=5
VECTOR_SIMILARITY_THRESHOLD=0.7
EMBEDDINGS_MODEL=text-embedding-3-small
EMBEDDINGS_DIMENSION=1536
# Document Chunking
CHUNK_SIZE=4000
CHUNK_OVERLAP=200

Verify the database connection:

python -c "from config.database import test_connection; print('Connected!' if test_connection() else 'Failed')"

Launch the app:

streamlit run ui/app.py

The app will open in your browser at http://localhost:8501
- `LLM_MODEL`: Model name (default: `gpt-4o-mini`)
- `LLM_TEMPERATURE_SQL`: Temperature for SQL generation (default: `0.1`, deterministic)
- `LLM_TEMPERATURE_RESPONSE`: Temperature for responses (default: `0.7`, creative)
- `LLM_TIMEOUT`: Request timeout in seconds (default: `30`)
- `LLM_MAX_RETRIES`: LangChain retry attempts (default: `2`)
- `MEMORY_WINDOW_SIZE`: Number of conversation turns to remember (default: `3`)
- `MAX_RETRIES`: SQL generation retry attempts (default: `3`)
- `SQL_TIMEOUT_SECONDS`: SQL execution timeout (default: `30`)
- `NLG_ENABLED`: Use LLM for responses (default: `true`)
- `VECTOR_SEARCH_TOP_K`: Number of document chunks to return (default: `5`)
- `VECTOR_SIMILARITY_THRESHOLD`: Minimum similarity score (default: `0.7`)
- `EMBEDDINGS_MODEL`: OpenAI embedding model (default: `text-embedding-3-small`)
- `CHUNK_SIZE`: Document chunk size in tokens (default: `4000`)
- `CHUNK_OVERLAP`: Overlap between chunks in tokens (default: `200`)
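A settings module for these variables might look like the following sketch. The variable names come from the .env above and the defaults mirror the documented ones, but the `Settings` shape itself is an assumption, not the real `config/settings.py`:

```python
# Sketch of env-driven settings loading (subset of the variables above).
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    llm_model: str
    sql_temperature: float
    response_temperature: float
    memory_window: int
    top_k: int

def load_settings(env=os.environ):
    # Every value falls back to the documented default when unset.
    return Settings(
        llm_model=env.get("LLM_MODEL", "gpt-4o-mini"),
        sql_temperature=float(env.get("LLM_TEMPERATURE_SQL", "0.1")),
        response_temperature=float(env.get("LLM_TEMPERATURE_RESPONSE", "0.7")),
        memory_window=int(env.get("MEMORY_WINDOW_SIZE", "3")),
        top_k=int(env.get("VECTOR_SEARCH_TOP_K", "5")),
    )

print(load_settings({}))  # all defaults
```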
User: "What was the last phone call?"
Flow:
1. Supervisor → Routes to sql_only
2. SQL Agent → phone_calls skill
3. SQL: SELECT * FROM communications.phone_call_silver
WHERE matched_company_id = 29447
ORDER BY call_timestamp DESC LIMIT 1
4. Response: "The most recent call was on January 9, 2026..."
User: "What quotes were sent?"
Flow:
1. Supervisor → Routes to sql_only
2. SQL Agent → email_communications skill
3. SQL: SELECT * FROM communications.emails_silver
WHERE matched_company_id = 29447 AND category = 'QUOTE'
4. Response: "On January 9, 2026, Harper sent a quote for $1,433.88..."
User: "What does the policy say about liability coverage?"
Flow:
1. Supervisor → Routes to document_search
2. Document Agent → Semantic search in ChromaDB
3. Returns: Top-5 relevant chunks from policy PDFs
4. Response: "The liability policy covers up to $2M per occurrence..."
User: "Show me the quotes and what they cover"
Flow:
1. Supervisor → Routes to hybrid
2. SQL Agent → Queries emails_silver for quotes (parallel)
3. Document Agent → Searches policy documents (parallel)
4. Synthesizer → LLM combines both sources
5. Response: "You received 2 quotes... The policy covers..."
User: "What was the last call?"
Assistant: "The last call was from John on Jan 9..."
User: "What did they discuss?"
Flow:
1. Memory retrieves previous exchange
2. SQL Agent sees context: "previous question was about the last call"
3. Understands "they" = the caller from previous answer
4. Generates query using recording_summary field
# Test multi-agent orchestrator
python -c "from graph.orchestrator import MultiAgentOrchestrator; orch = MultiAgentOrchestrator(29447); result = orch.process_query('What was the last call?', 'test_session'); print(f'Success: {result[\"success\"]}')"
# Test SQL agent
python -c "from agents.sql_agent import LangChainSQLAgent; agent = LangChainSQLAgent(29447); print(agent.generate_sql('What are the contact details?'))"
# Test conversation memory
python -c "from memory.conversation_memory import ConversationMemoryManager; mgr = ConversationMemoryManager(); mgr.add_exchange('test', 'Q1', 'A1'); print(mgr.get_conversation_history('test'))"
# Test document search
python -c "from database.chromadb.searcher import search_documents; results = search_documents('liability coverage', top_k=3); print(f'Found {len(results)} documents')"
# Test executor (full pipeline)
python -c "from core.executor import execute_with_retry; result = execute_with_retry('What are the contact details?', 29447); print(f'Success: {result[\"success\"]}')"

# Run the evaluation suite
cd ai_assistant/evaluation
python runner.py

Enable LangSmith for debugging and tracing:
# In .env
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=ls-...
LANGCHAIN_PROJECT=ai-assistant

Then run queries and view traces at https://smith.langchain.com
- Average Query Time:
- SQL-Only: 2-4 seconds
- Document Search: 1-3 seconds
- Hybrid: 4-6 seconds
- API Cost per Query: ~$0.01-0.03 (GPT-4o-mini)
- Success Rate: >95% on first attempt, ~100% with retries
- Memory Overhead: Minimal (3-5 exchanges per session)
pip install -r requirements.txt

# Create .env file with:
OPENAI_API_KEY=sk-...

- Check that session_id is consistent
- Verify `ConversationMemoryManager` is in `st.session_state`

pip install --upgrade langchain langchain-openai langchain-core

# Index documents first
cd database/chromadb
python indexer.py

- Orchestration: LangGraph (state-based multi-agent workflows)
- LLM Framework: LangChain (chains, prompts, parsers, memory)
- LLM Provider: OpenAI GPT-4o-mini
- Database: PostgreSQL with SQLAlchemy ORM
- Vector DB: ChromaDB for document embeddings
- UI: Streamlit
- Validation: sqlparse
- Type Safety: Pydantic v2
- LLM-based skill routing (instead of keyword matching)
- Streaming responses with `astream()`
- Multi-provider support (Anthropic Claude, etc.)
- Query caching with LangChain cache
- Agent tools for complex operations
- Async execution for concurrent queries
- Persistent conversation history across sessions