Skip to content

Dynamic context retrieval and Self-Learning Data Query System

License

Notifications You must be signed in to change notification settings

krishna-kudari/Text2SQL-Agent

Repository files navigation

DataChat - Production-Ready Text2SQL Agent

A self-learning Text-to-SQL agent with natural language interface, built for production.

Features

  • Natural Language to SQL: Ask questions in plain English, get SQL queries and results
  • Self-Learning: Improves with every interaction through feedback loops
  • Multi-Database Support: PostgreSQL, MySQL, SQLite, and more
  • Human-in-the-Loop: Validation for complex queries before execution
  • Custom Instructions: Define project-specific rules and terminology
  • Multiple Chats: Maintain context across conversations
  • Plan & Confirm: See execution plans before running complex queries

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     Frontend (Next.js + shadcn)                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Chat UI β”‚ Projects β”‚ Data Sources β”‚ Custom Instructions β”‚ KB   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                 β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   Backend API (FastAPI)                          β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Orchestrator β†’ Planner β†’ SQL Agent β†’ Validator β†’ Learning      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                 β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        Services Layer                            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ LLM (Groq) β”‚ DataSource β”‚ Schema     β”‚ Knowledge  β”‚ Embeddings  β”‚
β”‚            β”‚ Connectors β”‚ Indexer    β”‚ Base       β”‚ (Nomic)     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                 β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        Data Layer                                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚   SQLite/PG     β”‚    ChromaDB     β”‚   User Data Sources         β”‚
β”‚   (Metadata)    β”‚    (Vectors)    β”‚   (via Connectors)          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Tech Stack

Layer Technology
Frontend Next.js 14, shadcn/ui, Tailwind, Framer Motion
Backend Python 3.12, FastAPI, Pydantic
LLM Groq API (default), OpenAI, Ollama
Vector DB ChromaDB
Embeddings nomic-embed-text (via Ollama)
Database SQLite (dev), PostgreSQL (prod)

Quick Start

Prerequisites

  • Python 3.12+
  • Node.js 20+
  • Ollama (for local embeddings)
  • Groq API Key (free at console.groq.com)

1. Clone and Setup

cd Text2Sql

# Backend
cd backend
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Frontend
cd ../frontend
npm install

2. Configure Environment

Create .env in the backend directory:

# LLM
LLM_PROVIDER=groq
GROQ_API_KEY=your-groq-api-key

# Embeddings (Ollama)
OLLAMA_BASE_URL=http://localhost:11434

# Database
DATABASE_URL=sqlite:///./datachat.db

# Debug
DEBUG=true

3. Start Ollama for Embeddings

# Install Ollama (macOS)
brew install ollama

# Start server
ollama serve

# Pull embedding model
ollama pull nomic-embed-text

4. Run the Application

# Terminal 1: Backend
cd backend
source venv/bin/activate
uvicorn backend.main:app --reload --port 8000

# Terminal 2: Frontend
cd frontend
npm run dev

Visit http://localhost:3000

Project Structure

text2sql/
β”œβ”€β”€ backend/
β”‚   β”œβ”€β”€ api/                # FastAPI routes
β”‚   β”œβ”€β”€ agents/             # Agent implementations (SOLID)
β”‚   β”‚   β”œβ”€β”€ orchestrator.py
β”‚   β”‚   β”œβ”€β”€ planner.py
β”‚   β”‚   β”œβ”€β”€ sql_agent.py
β”‚   β”‚   β”œβ”€β”€ validator.py
β”‚   β”‚   └── learning.py
β”‚   β”œβ”€β”€ services/           # Business logic
β”‚   β”‚   β”œβ”€β”€ llm/           # LLM providers
β”‚   β”‚   β”œβ”€β”€ datasources/   # DB connectors
β”‚   β”‚   β”œβ”€β”€ embeddings/    # Embedding service
β”‚   β”‚   β”œβ”€β”€ knowledge/     # Schema indexer
β”‚   β”‚   └── vector_store/  # ChromaDB
β”‚   β”œβ”€β”€ models/            # Pydantic models
β”‚   β”œβ”€β”€ db/                # SQLAlchemy
β”‚   β”œβ”€β”€ core/              # Config & utils
β”‚   └── main.py
β”‚
β”œβ”€β”€ frontend/
β”‚   β”œβ”€β”€ app/               # Next.js pages
β”‚   β”œβ”€β”€ components/        # React components
β”‚   β”‚   β”œβ”€β”€ ui/           # shadcn components
β”‚   β”‚   └── chat/         # Chat components
β”‚   β”œβ”€β”€ lib/              # Utilities
β”‚   └── hooks/            # React hooks
β”‚
β”œβ”€β”€ docker-compose.yaml
└── README.md

Usage

1. Create a Project

Projects organize your data sources and custom instructions.

2. Add Data Source

Connect to your database:

  • PostgreSQL, MySQL, SQLite
  • CSV/Parquet files
  • REST APIs

3. Start Chatting

Ask questions naturally:

  • "Show me top 10 products by revenue"
  • "What's the average order value this month?"
  • "List customers who churned last quarter"

4. Provide Feedback

Help the agent learn:

  • πŸ‘ Correct - adds to knowledge base
  • πŸ‘Ž Incorrect - learn from corrections

5. Custom Instructions

Add project-specific rules:

  • "status field uses values: active, pending, cancelled"
  • "Always filter by tenant_id"
  • "Revenue = quantity * unit_price"

API Endpoints

Endpoint Method Description
/api/projects GET List projects
/api/projects POST Create project
/api/chat/message POST Send chat message
/api/chat/sessions/{id} GET List chat sessions
/api/datasources POST Add data source
/api/datasources/{id}/sync POST Sync schema
/api/feedback POST Submit feedback

Configuration

LLM Providers

# Groq (default - fast & free)
LLM_PROVIDER=groq
GROQ_API_KEY=your-key

# OpenAI
LLM_PROVIDER=openai
OPENAI_API_KEY=your-key

# Ollama (local)
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434

Available Models (Groq)

  • llama-3.3-70b-versatile (default)
  • llama-3.1-8b-instant
  • mixtral-8x7b-32768

Development

Run Tests

cd backend
pytest

Code Formatting

# Backend
black backend/
ruff check backend/

# Frontend
npm run lint

Deployment

Docker

# Build and run
docker-compose up -d

# View logs
docker-compose logs -f

Production Checklist

  • Set DEBUG=false
  • Use PostgreSQL for metadata
  • Configure proper CORS origins
  • Set up rate limiting
  • Enable SSL/TLS
  • Configure logging
  • Set up monitoring

License

MIT

About

Dynamic context retrieval and Self-Learning Data Query System

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •