amannanda-22/EVALMIND

EvalMind — LLM Output Quality Monitor

Monitor, evaluate, and alert on the quality of your RAG pipeline responses using RAGAS metrics, LLM-as-Judge scoring, and hallucination detection.


Architecture

                          ┌─────────────────────────────────────────────┐
                          │                EvalMind Stack                │
                          └─────────────────────────────────────────────┘

 User / Browser
      │
      ▼
┌─────────────┐       ┌──────────────────────────────────────────────────┐
│   nginx     │──────▶│  Streamlit Frontend  :8501                       │
│  :80 / 443  │       │  (frontend/app.py)                               │
│             │       └─────────────────┬────────────────────────────────┘
│             │                         │ HTTP /api/v1/*
│             │       ┌─────────────────▼────────────────────────────────┐
│             │──────▶│  FastAPI Backend  :8000                          │
└─────────────┘       │  (api/main.py)                                   │
                       │  ├── /auth (JWT)                                │
                       │  ├── /evaluate (RAGAS + Judge + Hallucination)  │
                       │  ├── /metrics  (summary, worst)                 │
                       │  ├── /alerts   (CRUD)                           │
                       │  ├── /history  (conversation)                   │
                       │  └── /admin    (admin-only)                     │
                       └───┬──────┬──────┬───────────────────────────────┘
                           │      │      │
             ┌─────────────┘      │      └──────────────────┐
             ▼                    ▼                          ▼
    ┌────────────────┐   ┌────────────────┐        ┌────────────────┐
    │  PostgreSQL    │   │     Redis      │        │  Google Gemini │
    │    :5432       │   │    :6379       │        │  (API calls)   │
    │  (Primary DB)  │   │  (Cache)       │        └────────────────┘
    └────────────────┘   └────────────────┘
             │
             ▼
    ┌────────────────┐         ┌──────────────┐      ┌──────────────┐
    │   Prometheus   │◀────────│   /metrics   │      │   Grafana    │
    │    :9090       │         │  endpoint    │      │    :3000     │
    └────────────────┘         └──────────────┘      └──────┬───────┘
                                                             │ dashboards
                                                    ┌────────▼───────┐
                                                    │  n8n Workflows │
                                                    │  (Alerts/Email)│
                                                    └────────────────┘

Features

  • RAGAS Evaluation: Faithfulness, Answer Relevancy, Context Precision, Context Recall
  • LLM-as-Judge: Accuracy, Completeness, Clarity, Citation Quality (1-10 scale)
  • Hallucination Detection: Claim-level verification with SUPPORTED/UNSUPPORTED/UNCERTAIN verdicts
  • Composite Confidence Score: Weighted combination (0-100) of all metrics
  • Threshold Alerting: Auto-create alerts when scores fall below configurable thresholds
  • Cost Tracking: Per-evaluation and daily USD cost summaries
  • Redis Cache: Semantic caching to avoid redundant evaluations
  • Prometheus + Grafana: Built-in metrics endpoint with a pre-built dashboard
  • n8n Integration: Webhook-triggered email alerts and daily reports
  • JWT Authentication: Role-based access (user/admin)
  • Batch Evaluation: Evaluate up to 10 QA pairs in a single request
  • Conversation History: Per-session conversation tracking
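
To make the composite confidence score concrete, here is a minimal sketch of a weighted combination. It is not the code in evaluator/confidence_scorer.py: the weights, the function name, and the rescaling of the 1-10 judge score onto 0-1 are all assumptions chosen for illustration.

```python
# Hypothetical weights; the real values are defined by the project, not here.
ASSUMED_WEIGHTS = {
    "faithfulness": 0.4,
    "answer_relevancy": 0.3,
    "judge": 0.3,
}


def composite_confidence(faithfulness: float, answer_relevancy: float, judge_score: float) -> float:
    """Combine two 0-1 RAGAS scores and a 1-10 judge score into a 0-100 value."""
    judge_normalized = (judge_score - 1) / 9  # map the 1-10 scale onto 0-1
    weighted = (
        ASSUMED_WEIGHTS["faithfulness"] * faithfulness
        + ASSUMED_WEIGHTS["answer_relevancy"] * answer_relevancy
        + ASSUMED_WEIGHTS["judge"] * judge_normalized
    )
    clamped = max(0.0, min(1.0, weighted))  # guard against out-of-range inputs
    return round(100 * clamped, 1)
```

Because the assumed weights sum to 1, perfect inputs map to 100 and the lowest inputs map to 0.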

Prerequisites

  • Python 3.11+
  • Docker & Docker Compose (for containerized setup)
  • Google Cloud / AI Studio API key (Gemini)
  • Gmail account with App Password (for email alerts)
  • n8n instance (optional, for workflow automation)

Quick Start

Windows

REM Enter the project directory (clone the repository first)
cd C:\path\to\EVALMIND\evalMind

REM Copy environment template
copy .env.example .env.development

REM Edit .env.development and set GOOGLE_API_KEY
notepad .env.development

REM Create virtual environment
python -m venv venv
venv\Scripts\activate

REM Install dependencies
pip install -r requirements.txt

REM Initialize database
python -c "from database.connection import init_db; init_db()"

REM Start backend
uvicorn api.main:app --reload --port 8000

REM In another terminal, start frontend
venv\Scripts\activate
streamlit run frontend/app.py --server.port 8501

Mac / Linux

# Enter the project directory (clone the repository first)
cd /path/to/EVALMIND/evalMind

# Copy environment template
cp .env.example .env.development

# Edit .env.development and set GOOGLE_API_KEY
nano .env.development

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Initialize database
python -c "from database.connection import init_db; init_db()"

# Start backend (terminal 1)
uvicorn api.main:app --reload --port 8000

# Start frontend (terminal 2)
source venv/bin/activate
streamlit run frontend/app.py --server.port 8501

Environment Setup

Copy .env.example to .env.development and populate:

# ── Required ──────────────────────────────────────────────────
GOOGLE_API_KEY=your-google-ai-studio-api-key
SECRET_KEY=your-random-secret-key-min-32-chars
JWT_SECRET_KEY=your-random-jwt-secret-min-32-chars

# ── Database ──────────────────────────────────────────────────
# Dev (SQLite - no setup needed):
DATABASE_URL=sqlite:///./evalMind_dev.db

# Production (PostgreSQL):
# DATABASE_URL=postgresql://evaluser:evalpass@localhost:5432/evalMind

# ── Redis (optional but recommended) ─────────────────────────
REDIS_URL=redis://localhost:6379/0
REDIS_ENABLED=true

# ── Gmail Alerts (optional) ───────────────────────────────────
GMAIL_USER=your.email@gmail.com
GMAIL_APP_PASSWORD=your-16-char-app-password
ALERT_EMAIL_RECIPIENT=alerts@yourdomain.com

# ── Quality Thresholds ────────────────────────────────────────
FAITHFULNESS_THRESHOLD=0.7
RELEVANCY_THRESHOLD=0.7
CONFIDENCE_THRESHOLD=60.0

# ── Environment ───────────────────────────────────────────────
ENVIRONMENT=development
DEBUG=false
PROMETHEUS_ENABLED=true
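
The quality thresholds above drive alerting. The following is a rough sketch of how such a check could work, assuming an alert is raised when a score is strictly below its threshold. The function names and the metric keys (faithfulness, relevancy, confidence) are hypothetical, and the project's own settings loader in config/settings.py may behave differently.

```python
import os


def load_thresholds(env=os.environ) -> dict:
    """Read the three thresholds, falling back to the values shown above."""
    return {
        "faithfulness": float(env.get("FAITHFULNESS_THRESHOLD", "0.7")),
        "relevancy": float(env.get("RELEVANCY_THRESHOLD", "0.7")),
        "confidence": float(env.get("CONFIDENCE_THRESHOLD", "60.0")),
    }


def breached(scores: dict, thresholds: dict) -> list:
    """Return the names of metrics that are present and below their threshold."""
    return [
        name
        for name, limit in thresholds.items()
        if name in scores and scores[name] < limit
    ]
```

For example, scores of 0.65 faithfulness, 0.9 relevancy, and 72.0 confidence would breach only the faithfulness threshold under the default values.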

Getting a Google API Key

  1. Go to Google AI Studio (https://aistudio.google.com/app/apikey)
  2. Click Create API Key
  3. Copy the key and set GOOGLE_API_KEY=... in your .env.development

Getting a Gmail App Password

  1. Enable 2FA on your Google account
  2. Go to Google Account > Security > 2-Step Verification > App passwords
  3. Create an app password for "Mail"
  4. Use the 16-character password (no spaces) as GMAIL_APP_PASSWORD
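
To check the app password independently of EvalMind, a short standard-library script can send a test message through Gmail's SMTP server (smtp.gmail.com, port 587, STARTTLS). This is only a sketch for verifying credentials. The helper names are hypothetical, and it does not describe how EvalMind or n8n sends alert mail.

```python
import smtplib
from email.message import EmailMessage


def normalize_app_password(raw: str) -> str:
    """Remove any spaces copied along with the app password."""
    return raw.replace(" ", "")


def build_alert_email(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Assemble a plain-text message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_gmail(msg: EmailMessage, user: str, app_password: str) -> None:
    """Send the message over Gmail's submission port using STARTTLS."""
    with smtplib.SMTP("smtp.gmail.com", 587, timeout=10) as smtp:
        smtp.starttls()
        smtp.login(user, normalize_app_password(app_password))
        smtp.send_message(msg)
```

Call send_via_gmail with the GMAIL_USER and GMAIL_APP_PASSWORD values; an authentication error at this point usually means the app password was mistyped or 2-Step Verification is not enabled.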

Local Dev Setup (without Docker)

# 1. Set up Python environment
python3 -m venv venv && source venv/bin/activate

# 2. Install all dependencies
pip install -r requirements.txt

# 3. Set up environment
cp .env.example .env.development
# Edit .env.development - set GOOGLE_API_KEY at minimum

# 4. Initialize SQLite database (dev default)
python -c "from database.connection import init_db; init_db()"

# 5. (Optional) Use Alembic migrations instead of step 4
alembic upgrade head

# 6. Create first admin user
python -c "
from database.connection import get_session_local
from database.models import User
from passlib.context import CryptContext
pwd = CryptContext(schemes=['bcrypt'])
db = get_session_local()()
admin = User(email='admin@example.com', username='admin', hashed_password=pwd.hash('changeme'), role='admin')
db.add(admin); db.commit()
print('Admin user created: admin@example.com / changeme')
"

# 7. Start backend
uvicorn api.main:app --reload --host 0.0.0.0 --port 8000

# 8. Start frontend (new terminal)
streamlit run frontend/app.py --server.port 8501 --server.address 0.0.0.0

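Access:

  • Frontend: http://localhost:8501
  • Backend: http://localhost:8000
  • API Docs: http://localhost:8000/docs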

Docker Setup

Development

# Build and start all services
docker compose up --build

# Run in background
docker compose up -d --build

# View logs
docker compose logs -f backend
docker compose logs -f frontend

# Stop all services
docker compose down

# Stop and remove volumes (reset database)
docker compose down -v

Services after docker compose up:

| Service    | URL                        |
|------------|----------------------------|
| Frontend   | http://localhost:8501      |
| Backend    | http://localhost:8000      |
| API Docs   | http://localhost:8000/docs |
| Prometheus | http://localhost:9090      |
| Grafana    | http://localhost:3000      |

Grafana default login: admin / evalMind2024

Initialize DB in Docker

# Run migration inside backend container
docker compose exec backend alembic upgrade head

# Or create tables directly
docker compose exec backend python -c "from database.connection import init_db; init_db()"

API Documentation Reference

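Endpoints are relative to http://localhost:8000/api/v1/, except the health check, which this guide calls at the server root (http://localhost:8000/health).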

| Method | Endpoint             | Auth  | Description                  |
|--------|----------------------|-------|------------------------------|
| POST   | /auth/register       | None  | Register new user            |
| POST   | /auth/login          | None  | Login, get JWT tokens        |
| POST   | /auth/refresh        | None  | Refresh access token         |
| GET    | /auth/me             | User  | Get current user info        |
| POST   | /evaluate/           | User  | Single evaluation            |
| POST   | /evaluate/batch      | User  | Batch evaluation (max 10)    |
| GET    | /metrics/summary     | User  | Aggregated metrics summary   |
| GET    | /metrics/worst       | User  | Worst-performing evaluations |
| GET    | /alerts              | User  | List alerts (paginated)      |
| PATCH  | /alerts/{id}/resolve | User  | Resolve an alert             |
| GET    | /history             | User  | Conversation history         |
| GET    | /documents           | User  | List documents               |
| GET    | /admin/users         | Admin | List all users               |
| GET    | /admin/stats         | Admin | System-wide statistics       |
| GET    | /health              | None  | Health check                 |
| GET    | /live                | None  | Liveness probe               |
| GET    | /ready               | None  | Readiness probe              |
| GET    | /metrics             | None  | Prometheus metrics           |
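
Since the batch endpoint accepts at most 10 QA pairs per request, larger workloads need to be split on the client side. A minimal sketch of that splitting follows. The helper name chunk_batches is hypothetical, and the shape of each QA item is left to the caller.

```python
from typing import Iterator, List

MAX_BATCH_SIZE = 10  # limit stated for POST /evaluate/batch


def chunk_batches(items: List[dict], size: int = MAX_BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

With 23 QA pairs this yields three batches of 10, 10, and 3 items, each small enough for a single batch request.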

Example: Single Evaluation

curl -X POST http://localhost:8000/api/v1/evaluate/ \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is the capital of France?",
    "answer": "The capital of France is Paris.",
    "context_chunks": ["France is a country in Western Europe. Its capital is Paris."],
    "ground_truth": "Paris is the capital of France.",
    "model_used": "gemini-2.5-flash"
  }'
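
The same request can be built from Python with only the standard library. This sketch mirrors the curl call above. The helper name build_evaluation_request is hypothetical, and nothing is assumed about the response schema.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000/api/v1"


def build_evaluation_request(token: str, payload: dict) -> urllib.request.Request:
    """Build the POST request for a single evaluation without sending it."""
    return urllib.request.Request(
        f"{API_BASE}/evaluate/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "question": "What is the capital of France?",
    "answer": "The capital of France is Paris.",
    "context_chunks": ["France is a country in Western Europe. Its capital is Paris."],
    "ground_truth": "Paris is the capital of France.",
    "model_used": "gemini-2.5-flash",
}
request = build_evaluation_request("YOUR_TOKEN", payload)

# To send it against a running backend:
# with urllib.request.urlopen(request, timeout=60) as response:
#     result = json.load(response)
```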

Deployment Guides

a. Render.com (Free Tier)

  1. Push your code to GitHub
  2. Go to render.com and create a new Web Service
  3. Connect your GitHub repo
  4. Set Build Command: pip install -r requirements.txt
  5. Set Start Command: uvicorn api.main:app --host 0.0.0.0 --port $PORT
  6. Add Environment Variables (Settings > Environment):
    • GOOGLE_API_KEY = your key
    • SECRET_KEY = random 32+ char string
    • JWT_SECRET_KEY = random 32+ char string
    • DATABASE_URL = (use Render PostgreSQL add-on URL)
    • REDIS_ENABLED = false (free tier has no Redis)
    • ENVIRONMENT = production
  7. Add a PostgreSQL database from Render Dashboard
  8. For frontend: create another Web Service with Start Command: streamlit run frontend/app.py --server.port $PORT --server.address 0.0.0.0
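
One way to produce the random 32+ character values for SECRET_KEY and JWT_SECRET_KEY is Python's secrets module. This is a small sketch with a hypothetical helper name; any cryptographically secure generator works equally well.

```python
import secrets


def generate_secret(num_bytes: int = 32) -> str:
    """Return a URL-safe random string; 32 random bytes encode to 43 characters."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Print lines that can be pasted into the environment settings.
    print(f"SECRET_KEY={generate_secret()}")
    print(f"JWT_SECRET_KEY={generate_secret()}")
```

Generate a separate value for each variable rather than reusing one secret for both.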

b. AWS EC2 (t3.medium)

# 1. Launch t3.medium Ubuntu 22.04 instance
# 2. SSH into instance
ssh -i your-key.pem ubuntu@your-ec2-ip

# 3. Install Docker
sudo apt-get update
sudo apt-get install -y docker.io docker-compose-v2
sudo usermod -aG docker ubuntu
newgrp docker

# 4. Clone your repo
git clone https://github.com/youruser/evalmind.git
cd evalmind/evalMind

# 5. Create production env file
cp .env.example .env.production
nano .env.production  # Set all required values

# 6. Start production stack
docker compose -f docker-compose.prod.yml up -d

# 7. Set up nginx (already included in prod compose)
# 8. Configure domain DNS to point to EC2 public IP
# 9. Set up SSL with Let's Encrypt
#    (certbot's standalone mode binds port 80, so stop the nginx container while it runs)
sudo apt-get install -y certbot
sudo certbot certonly --standalone -d yourdomain.com
# Update nginx.conf with SSL cert paths

# 10. Open security group ports: 80, 443

c. Azure VM (B2s)

# 1. Create B2s VM with Ubuntu 22.04 in Azure Portal
# 2. Open ports 80, 443, 22 in Network Security Group
# 3. SSH into VM
ssh azureuser@your-vm-ip

# 4. Install Docker
curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker azureuser
newgrp docker

# 5. Clone repo and configure
git clone https://github.com/youruser/evalmind.git
cd evalmind/evalMind
cp .env.example .env.production
nano .env.production

# 6. Start production stack
docker compose -f docker-compose.prod.yml up -d

# 7. Configure Azure DNS and SSL as needed

d. Docker on VPS

# 1. SSH to your VPS
ssh root@your-vps-ip

# 2. Install Docker
curl -fsSL https://get.docker.com | sh

# 3. Set up the project
git clone https://github.com/youruser/evalmind.git
cd evalmind/evalMind
cp .env.example .env.production

# 4. Edit production environment
nano .env.production
# Required:
#   GOOGLE_API_KEY, SECRET_KEY, JWT_SECRET_KEY
#   DATABASE_URL (PostgreSQL in compose handles this)
#   ENVIRONMENT=production

# 5. Launch
docker compose -f docker-compose.prod.yml up -d

# 6. Check status
docker compose -f docker-compose.prod.yml ps
docker compose -f docker-compose.prod.yml logs backend

# 7. Run database migrations
docker compose -f docker-compose.prod.yml exec backend alembic upgrade head

Manual Configuration Checklist

Google API Key Setup

  • Visit https://aistudio.google.com/app/apikey
  • Create a new API key
  • Set GOOGLE_API_KEY=<key> in your .env file
  • Verify: curl "https://generativelanguage.googleapis.com/v1beta/models?key=YOUR_KEY"

Gmail App Password Setup

  • Go to your Google Account settings
  • Security > 2-Step Verification (must be enabled)
  • App passwords > Create for "Mail" and "Other device"
  • Set GMAIL_USER=your@gmail.com
  • Set GMAIL_APP_PASSWORD=<16-char-password> (no spaces)
  • Set ALERT_EMAIL_RECIPIENT=alerts@domain.com

n8n Webhook URL Configuration

  • Install n8n: docker run -d -p 5678:5678 n8nio/n8n
  • Access n8n at http://localhost:5678
  • Import workflows/n8n_workflow.json via Settings > Import Workflow
  • Activate the "Alert Webhook" workflow
  • Copy the webhook URL shown (e.g., http://localhost:5678/webhook/evalMind-alert)
  • Set environment variable: N8N_WEBHOOK_URL=<webhook-url>
  • Configure Gmail credentials in n8n (Credentials > Add > Gmail OAuth2)
  • Set EVALMIND_API_URL and EVALMIND_API_TOKEN in n8n environment variables
  • Test the webhook:
    curl -X POST <webhook-url> \
      -H "Content-Type: application/json" \
      -d '{"alert_type":"test","severity":"critical","message":"Test alert","evaluation_id":1,"confidence_score":25}'

Grafana Dashboard Import

  • Access Grafana at http://localhost:3000
  • Login: admin / evalMind2024 (change on first login)
  • Add Prometheus data source: Configuration > Data Sources > Add > Prometheus
    • URL: http://prometheus:9090 (if in Docker) or http://localhost:9090
    • Click Save & Test
  • Import dashboard: Dashboards > Import
    • Upload monitoring/grafana_dashboard.json
    • Select Prometheus data source
    • Click Import
  • The EvalMind Monitor dashboard will appear with all 10 panels

First Admin User Creation

Option A — via API (after server is running):

curl -X POST http://localhost:8000/api/v1/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email":"admin@example.com","username":"admin","password":"changeme123","role":"admin"}'

Option B — directly in database:

# For local Python
python -c "
import sys; sys.path.insert(0, '.')
from database.connection import get_session_local
from database.models import User
from passlib.context import CryptContext
pwd_ctx = CryptContext(schemes=['bcrypt'])
db = get_session_local()()
user = User(
    email='admin@example.com',
    username='admin',
    hashed_password=pwd_ctx.hash('changeme123'),
    role='admin',
    is_active=True
)
db.add(user); db.commit()
print('Admin created: admin@example.com / changeme123')
"

# For Docker
docker compose exec backend python -c "
import sys; sys.path.insert(0, '.')
from database.connection import get_session_local
from database.models import User
from passlib.context import CryptContext
pwd_ctx = CryptContext(schemes=['bcrypt'])
db = get_session_local()()
user = User(email='admin@example.com', username='admin', hashed_password=pwd_ctx.hash('changeme123'), role='admin')
db.add(user); db.commit()
print('Admin created')
"

Testing Instructions

# Activate virtual environment
source venv/bin/activate  # Mac/Linux
venv\Scripts\activate      # Windows

# Install test dependencies (already in requirements.txt; needed only if you skipped that step)
pip install pytest pytest-asyncio httpx

# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_evaluator.py -v
pytest tests/test_api.py -v
pytest tests/test_integration.py -v

# Run with coverage
pip install pytest-cov
pytest tests/ --cov=. --cov-report=html

# Run a specific test class
pytest tests/test_evaluator.py::TestConfidenceScorer -v

# Run a specific test
pytest tests/test_evaluator.py::TestConfidenceScorer::test_perfect_scores_give_high_confidence -v

# Run with detailed output on failure
pytest tests/ -v --tb=long

Tests use in-memory SQLite and mock all external services (Gemini API, Redis). No real API keys are needed for testing.


Troubleshooting

"GOOGLE_API_KEY not set" or quota errors

  • Verify the key is set: echo $GOOGLE_API_KEY (if exported in your shell), or check the GOOGLE_API_KEY line in your .env file
  • Check Gemini quotas at https://aistudio.google.com
  • The system automatically falls back through its model chain when it hits quota errors

Database connection errors (PostgreSQL)

# Check if Postgres is running
docker compose ps postgres
# Check logs
docker compose logs postgres
# Test connection manually
docker compose exec postgres psql -U evaluser -d evalMind -c "\dt"

Redis connection refused

  • Redis is optional. Set REDIS_ENABLED=false in .env to disable caching
  • Check if Redis is running: docker compose ps redis

Port already in use

# Mac/Linux:
lsof -ti:8000 | xargs kill -9
# Windows:
netstat -ano | findstr :8000
taskkill /PID <PID> /F

Alembic migration errors

# Check current revision
alembic current
# Show migration history
alembic history
# Downgrade to base (reverts every migration, so back up any data you need first)
alembic downgrade base
# Re-run all migrations
alembic upgrade head

Docker "no space left" errors

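# Removes all stopped containers, unused networks, and unused images
docker system prune -af
# Removes unused volumes, including any database data stored in them
docker volume prune -f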

Streamlit cannot connect to backend

  • Ensure API_BASE_URL=http://localhost:8000/api/v1 is set in .env
  • When running in Docker, use API_BASE_URL=http://backend:8000/api/v1
  • Check backend health: curl http://localhost:8000/health
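
The health check can also be scripted, for example to wait for the backend before launching the frontend. This is a minimal sketch that assumes the endpoint answers with HTTP 200 when healthy. The function name and the injectable fetch parameter (useful for testing) are hypothetical.

```python
import urllib.request


def backend_is_healthy(url: str = "http://localhost:8000/health", fetch=urllib.request.urlopen) -> bool:
    """Return True when the health endpoint answers with HTTP 200."""
    try:
        with fetch(url, timeout=5) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError subclass OSError, as do refused connections.
        return False
```

When running inside Docker, pass the service URL (http://backend:8000/health) instead of the localhost default.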

JWT token expired

  • Default access token expires in 30 minutes
  • Use the refresh token endpoint: POST /api/v1/auth/refresh
  • Increase JWT_ACCESS_TOKEN_EXPIRE_MINUTES in settings if needed
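
A client can avoid expired-token errors by checking the token's expiry before each call and refreshing early. The sketch below assumes the access token is a standard three-part JWT carrying an exp claim, which is not confirmed by this README. The function names and the 60-second leeway are hypothetical. The signature is not verified here, so use this only to decide when to refresh, never to trust a token.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> int:
    """Return the exp claim (Unix seconds) without verifying the signature."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return int(claims["exp"])


def needs_refresh(token: str, leeway_seconds: int = 60, now: Optional[float] = None) -> bool:
    """Return True when the token expires within the leeway window."""
    current = time.time() if now is None else now
    return jwt_expiry(token) - current <= leeway_seconds
```

When needs_refresh returns True, call POST /api/v1/auth/refresh and replace the stored access token.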

n8n webhook not receiving alerts

  • Confirm n8n workflow is Active (toggle in workflow editor)
  • Verify webhook URL matches N8N_WEBHOOK_URL env variable
  • Check n8n execution history for errors

Grafana panels show "No data"

  • Verify Prometheus data source is configured and shows "Data source is working"
  • Ensure backend has processed at least one evaluation
  • Check Prometheus targets at http://localhost:9090/targets — backend should show UP
  • Confirm PROMETHEUS_ENABLED=true in your env file

Project Structure

evalMind/
├── api/
│   ├── auth/           # JWT auth: register, login, refresh
│   ├── middleware/     # Logging, rate limiting
│   ├── routes/         # evaluate, metrics, alerts, history, admin
│   └── main.py         # FastAPI app factory
├── cache/
│   └── redis_cache.py  # Redis cache manager
├── config/
│   └── settings.py     # Pydantic settings (all env vars)
├── database/
│   ├── connection.py   # SQLAlchemy engine + session
│   ├── models.py       # ORM models
│   └── migrations/     # Alembic migration files
│       └── versions/
│           └── 001_initial.py
├── evaluator/
│   ├── confidence_scorer.py      # Weighted composite scoring
│   ├── core_evaluator.py         # RAGAS evaluation wrapper
│   ├── hallucination_detector.py # Claim verification
│   └── llm_judge.py              # LLM-as-Judge scoring
├── frontend/
│   └── app.py          # Streamlit UI
├── monitoring/
│   ├── prometheus.py           # Prometheus metric definitions
│   ├── prometheus.yml          # Prometheus scrape config
│   └── grafana_dashboard.json  # Pre-built Grafana dashboard
├── tests/
│   ├── conftest.py          # Pytest fixtures
│   ├── test_evaluator.py    # Unit tests for evaluator modules
│   ├── test_api.py          # API endpoint tests
│   ├── test_integration.py  # Integration tests
│   └── test_cases.json      # 20 real QA test cases
├── tracker/
│   └── cost_tracker.py      # Cost calculation and DB recording
├── workflows/
│   └── n8n_workflow.json    # n8n alert + daily report workflows
├── Dockerfile
├── docker-compose.yml       # Development stack
├── docker-compose.prod.yml  # Production stack
├── alembic.ini
└── requirements.txt
