EvalMind

EvalMind — LLM Output Quality Monitor

Monitor, evaluate, and alert on the quality of your RAG pipeline responses using RAGAS metrics, LLM-as-Judge scoring, and hallucination detection.

Architecture

                          ┌─────────────────────────────────────────────┐
                          │                EvalMind Stack                │
                          └─────────────────────────────────────────────┘

 User / Browser
      │
      ▼
┌─────────────┐       ┌──────────────────────────────────────────────────┐
│   nginx     │──────▶│  Streamlit Frontend  :8501                       │
│  :80 / 443  │       │  (frontend/app.py)                               │
│             │       └─────────────────┬────────────────────────────────┘
│             │                         │ HTTP /api/v1/*
│             │       ┌─────────────────▼────────────────────────────────┐
│             │──────▶│  FastAPI Backend  :8000                          │
└─────────────┘       │  (api/main.py)                                   │
                       │  ├── /auth (JWT)                                │
                       │  ├── /evaluate (RAGAS + Judge + Hallucination)  │
                       │  ├── /metrics  (summary, worst)                 │
                       │  ├── /alerts   (CRUD)                           │
                       │  ├── /history  (conversation)                   │
                       │  └── /admin    (admin-only)                     │
                       └───┬──────┬──────┬───────────────────────────────┘
                           │      │      │
             ┌─────────────┘      │      └──────────────────┐
             ▼                    ▼                          ▼
    ┌────────────────┐   ┌────────────────┐        ┌────────────────┐
    │  PostgreSQL    │   │     Redis      │        │  Google Gemini │
    │    :5432       │   │    :6379       │        │  (API calls)   │
    │  (Primary DB)  │   │  (Cache)       │        └────────────────┘
    └────────────────┘   └────────────────┘
             │
             ▼
    ┌────────────────┐         ┌──────────────┐      ┌──────────────┐
    │   Prometheus   │◀────────│   /metrics   │      │   Grafana    │
    │    :9090       │         │  endpoint    │      │    :3000     │
    └────────────────┘         └──────────────┘      └──────┬───────┘
                                                             │ dashboards
                                                    ┌────────▼───────┐
                                                    │  n8n Workflows │
                                                    │  (Alerts/Email)│
                                                    └────────────────┘

Features

RAGAS Evaluation: Faithfulness, Answer Relevancy, Context Precision, Context Recall
LLM-as-Judge: Accuracy, Completeness, Clarity, Citation Quality (1-10 scale)
Hallucination Detection: Claim-level verification with SUPPORTED/UNSUPPORTED/UNCERTAIN verdicts
Composite Confidence Score: Weighted combination (0-100) of all metrics
Threshold Alerting: Auto-create alerts when scores fall below configurable thresholds
Cost Tracking: Per-evaluation and daily USD cost summaries
Redis Cache: Semantic caching to avoid redundant evaluations
Prometheus + Grafana: Built-in metrics endpoint with a pre-built dashboard
n8n Integration: Webhook-triggered email alerts and daily reports
JWT Authentication: Role-based access (user/admin)
Batch Evaluation: Evaluate up to 10 QA pairs in a single request
Conversation History: Per-session conversation tracking
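
As a rough illustration of the Composite Confidence Score, the sketch below folds RAGAS metrics (0-1), judge scores (1-10), and the share of supported claims into one 0-100 value. The weights, the rescaling, and the name composite_confidence are assumptions made for this example; the actual logic lives in evaluator/confidence_scorer.py and may differ.

```python
# Hypothetical weights; the real ones live in evaluator/confidence_scorer.py.
WEIGHTS = {"ragas": 0.4, "judge": 0.3, "hallucination": 0.3}


def composite_confidence(ragas: dict[str, float],
                         judge: dict[str, float],
                         supported_ratio: float) -> float:
    """Combine three metric groups into a single 0-100 score.

    ragas: metric name -> score in [0, 1]
    judge: criterion name -> score on the 1-10 scale
    supported_ratio: fraction of claims judged SUPPORTED, in [0, 1]
    """
    ragas_avg = sum(ragas.values()) / len(ragas)
    # Rescale the 1-10 judge scale to [0, 1] before weighting.
    judge_avg = sum((s - 1) / 9 for s in judge.values()) / len(judge)
    score = (WEIGHTS["ragas"] * ragas_avg
             + WEIGHTS["judge"] * judge_avg
             + WEIGHTS["hallucination"] * supported_ratio)
    return round(100 * score, 1)
```

Because the weights sum to 1, perfect inputs map to 100 and the worst possible inputs map to 0.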

Prerequisites

Python 3.11+
Docker & Docker Compose (for containerized setup)
Google Cloud / AI Studio API key (Gemini)
Gmail account with App Password (for email alerts)
n8n instance (optional, for workflow automation)

Quick Start

Windows

REM Enter the project directory (after cloning the repository)
cd C:\path\to\EVALMIND\evalMind

REM Copy environment template
copy .env.example .env.development

REM Edit .env.development and set GOOGLE_API_KEY
notepad .env.development

REM Create virtual environment
python -m venv venv
venv\Scripts\activate

REM Install dependencies
pip install -r requirements.txt

REM Initialize database
python -c "from database.connection import init_db; init_db()"

REM Start backend
uvicorn api.main:app --reload --port 8000

REM In another terminal, start frontend
venv\Scripts\activate
streamlit run frontend/app.py --server.port 8501

Mac / Linux

# Enter the project directory (after cloning the repository)
cd /path/to/EVALMIND/evalMind

# Copy environment template
cp .env.example .env.development

# Edit .env.development and set GOOGLE_API_KEY
nano .env.development

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Initialize database
python -c "from database.connection import init_db; init_db()"

# Start backend (terminal 1)
uvicorn api.main:app --reload --port 8000

# Start frontend (terminal 2)
source venv/bin/activate
streamlit run frontend/app.py --server.port 8501

Environment Setup

Copy .env.example to .env.development and populate:

# ── Required ──────────────────────────────────────────────────
GOOGLE_API_KEY=your-google-ai-studio-api-key
SECRET_KEY=your-random-secret-key-min-32-chars
JWT_SECRET_KEY=your-random-jwt-secret-min-32-chars

# ── Database ──────────────────────────────────────────────────
# Dev (SQLite - no setup needed):
DATABASE_URL=sqlite:///./evalMind_dev.db

# Production (PostgreSQL):
# DATABASE_URL=postgresql://evaluser:evalpass@localhost:5432/evalMind

# ── Redis (optional but recommended) ─────────────────────────
REDIS_URL=redis://localhost:6379/0
REDIS_ENABLED=true

# ── Gmail Alerts (optional) ───────────────────────────────────
GMAIL_USER=your.email@gmail.com
GMAIL_APP_PASSWORD=your-16-char-app-password
ALERT_EMAIL_RECIPIENT=alerts@yourdomain.com

# ── Quality Thresholds ────────────────────────────────────────
FAITHFULNESS_THRESHOLD=0.7
RELEVANCY_THRESHOLD=0.7
CONFIDENCE_THRESHOLD=60.0

# ── Environment ───────────────────────────────────────────────
ENVIRONMENT=development
DEBUG=false
PROMETHEUS_ENABLED=true
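
The quality thresholds use two scales: FAITHFULNESS_THRESHOLD and RELEVANCY_THRESHOLD are on the 0-1 RAGAS scale, while CONFIDENCE_THRESHOLD is on the 0-100 composite scale. A minimal sketch of how a threshold check could read these variables follows; the score key names and the function breached_thresholds are hypothetical, not EvalMind's actual API.

```python
import os

# Defaults mirror the values shown in the env template.
DEFAULTS = {
    "FAITHFULNESS_THRESHOLD": 0.7,
    "RELEVANCY_THRESHOLD": 0.7,
    "CONFIDENCE_THRESHOLD": 60.0,
}

# Hypothetical mapping from env variable to the score it guards.
SCORE_KEYS = {
    "FAITHFULNESS_THRESHOLD": "faithfulness",
    "RELEVANCY_THRESHOLD": "answer_relevancy",
    "CONFIDENCE_THRESHOLD": "confidence",
}


def breached_thresholds(scores, env=os.environ):
    """Return the names of scores that fall below their configured threshold."""
    breached = []
    for var, default in DEFAULTS.items():
        threshold = float(env.get(var, default))
        key = SCORE_KEYS[var]
        if key in scores and scores[key] < threshold:
            breached.append(key)
    return breached
```

Passing a plain dict as env makes the check easy to exercise without touching the process environment.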

Getting a Google API Key

Go to Google AI Studio
Click Create API Key
Copy the key and set GOOGLE_API_KEY=... in your .env.development

Getting a Gmail App Password

Enable 2FA on your Google account
Go to Google Account > Security > 2-Step Verification > App passwords
Create an app password for "Mail"
Use the 16-character password (no spaces) as GMAIL_APP_PASSWORD

Local Dev Setup (without Docker)

# 1. Set up Python environment
python3 -m venv venv && source venv/bin/activate

# 2. Install all dependencies
pip install -r requirements.txt

# 3. Set up environment
cp .env.example .env.development
# Edit .env.development - set GOOGLE_API_KEY at minimum

# 4. Initialize SQLite database (dev default)
python -c "from database.connection import init_db; init_db()"

# 5. (Optional) Use Alembic migrations instead of step 4
alembic upgrade head

# 6. Create first admin user
python -c "
from database.connection import get_session_local
from database.models import User
from passlib.context import CryptContext
pwd = CryptContext(schemes=['bcrypt'])
db = get_session_local()()
admin = User(email='admin@example.com', username='admin', hashed_password=pwd.hash('changeme'), role='admin')
db.add(admin); db.commit()
print('Admin user created: admin@example.com / changeme')
"

# 7. Start backend
uvicorn api.main:app --reload --host 0.0.0.0 --port 8000

# 8. Start frontend (new terminal)
streamlit run frontend/app.py --server.port 8501 --server.address 0.0.0.0

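Access the frontend at http://localhost:8501, the backend at http://localhost:8000, and the API docs at http://localhost:8000/docs.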

Docker Setup

Development

# Build and start all services
docker compose up --build

# Run in background
docker compose up -d --build

# View logs
docker compose logs -f backend
docker compose logs -f frontend

# Stop all services
docker compose down

# Stop and remove volumes (reset database)
docker compose down -v

Services after docker compose up:

Service	URL
Frontend	http://localhost:8501
Backend	http://localhost:8000
API Docs	http://localhost:8000/docs
Prometheus	http://localhost:9090
Grafana	http://localhost:3000

Grafana default login: admin / evalMind2024

Initialize DB in Docker

# Run migration inside backend container
docker compose exec backend alembic upgrade head

# Or create tables directly
docker compose exec backend python -c "from database.connection import init_db; init_db()"

API Documentation Reference

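All API endpoints are under http://localhost:8000/api/v1/. The health check is queried at the server root elsewhere in this README (http://localhost:8000/health), so /health, /live, /ready, and the Prometheus /metrics endpoint are likely served from the root rather than under /api/v1.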

Method	Endpoint	Auth	Description
POST	/auth/register	None	Register new user
POST	/auth/login	None	Login, get JWT tokens
POST	/auth/refresh	None	Refresh access token
GET	/auth/me	User	Get current user info
POST	/evaluate/	User	Single evaluation
POST	/evaluate/batch	User	Batch evaluation (max 10)
GET	/metrics/summary	User	Aggregated metrics summary
GET	/metrics/worst	User	Worst-performing evaluations
GET	/alerts	User	List alerts (paginated)
PATCH	/alerts/{id}/resolve	User	Resolve an alert
GET	/history	User	Conversation history
GET	/documents	User	List documents
GET	/admin/users	Admin	List all users
GET	/admin/stats	Admin	System-wide statistics
GET	/health	None	Health check
GET	/live	None	Liveness probe
GET	/ready	None	Readiness probe
GET	/metrics	None	Prometheus metrics
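
Since POST /evaluate/batch accepts at most 10 QA pairs, a larger set has to be split on the client side. The helper below is a sketch of one way to do that; chunk_batches and MAX_BATCH_SIZE are names invented for this example.

```python
from typing import Iterator, Sequence

MAX_BATCH_SIZE = 10  # limit stated for POST /evaluate/batch


def chunk_batches(items: Sequence[dict],
                  size: int = MAX_BATCH_SIZE) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

Each yielded list can then be sent as one batch request, in order.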

Example: Single Evaluation

curl -X POST http://localhost:8000/api/v1/evaluate/ \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is the capital of France?",
    "answer": "The capital of France is Paris.",
    "context_chunks": ["France is a country in Western Europe. Its capital is Paris."],
    "ground_truth": "Paris is the capital of France.",
    "model_used": "gemini-2.5-flash"
  }'
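
The same request can be built from Python using only the standard library. This is a sketch that mirrors the curl call above; the helper name build_evaluation_request is an assumption, and the response shape is not documented here, so the sending step is left commented out.

```python
import json
import urllib.request

API_BASE_URL = "http://localhost:8000/api/v1"


def build_evaluation_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /evaluate/ request."""
    return urllib.request.Request(
        url=f"{API_BASE_URL}/evaluate/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "question": "What is the capital of France?",
    "answer": "The capital of France is Paris.",
    "context_chunks": ["France is a country in Western Europe. Its capital is Paris."],
    "ground_truth": "Paris is the capital of France.",
    "model_used": "gemini-2.5-flash",
}
request = build_evaluation_request("YOUR_TOKEN", payload)

# To send it against a running backend:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```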

Deployment Guides

a. Render.com (Free Tier)

Push your code to GitHub
Go to render.com and create a new Web Service
Connect your GitHub repo
Set Build Command: pip install -r requirements.txt
Set Start Command: uvicorn api.main:app --host 0.0.0.0 --port $PORT
Add Environment Variables (Settings > Environment):
- GOOGLE_API_KEY = your key
- SECRET_KEY = random 32+ char string
- JWT_SECRET_KEY = random 32+ char string
- DATABASE_URL = (use Render PostgreSQL add-on URL)
- REDIS_ENABLED = false (free tier has no Redis)
- ENVIRONMENT = production
Add a PostgreSQL database from Render Dashboard
For frontend: create another Web Service with Start Command: streamlit run frontend/app.py --server.port $PORT --server.address 0.0.0.0
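
SECRET_KEY and JWT_SECRET_KEY each need a random string of at least 32 characters. One way to produce them is Python's secrets module, sketched below; the helper name generate_secret is only for illustration.

```python
import secrets


def generate_secret(num_bytes: int = 32) -> str:
    """Return a URL-safe random string built from `num_bytes` random bytes."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Print ready-to-paste lines for the env file or the Render dashboard.
    for name in ("SECRET_KEY", "JWT_SECRET_KEY"):
        print(f"{name}={generate_secret()}")
```

Generate a separate value for each variable rather than reusing one secret for both.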

b. AWS EC2 (t3.medium)

# 1. Launch t3.medium Ubuntu 22.04 instance
# 2. SSH into instance
ssh -i your-key.pem ubuntu@your-ec2-ip

# 3. Install Docker
sudo apt-get update
sudo apt-get install -y docker.io docker-compose-v2
sudo usermod -aG docker ubuntu
newgrp docker

# 4. Clone your repo
git clone https://github.com/youruser/evalmind.git
cd evalmind/evalMind

# 5. Create production env file
cp .env.example .env.production
nano .env.production  # Set all required values

# 6. Start production stack
docker compose -f docker-compose.prod.yml up -d

# 7. Set up nginx (already included in prod compose)
# 8. Configure domain DNS to point to EC2 public IP
# 9. Set up SSL with Let's Encrypt
# (--standalone binds port 80: open ports 80/443 first and stop nginx while certbot runs)
sudo apt-get install -y certbot
sudo certbot certonly --standalone -d yourdomain.com
# Update nginx.conf with SSL cert paths

# 10. Open security group ports: 80, 443

c. Azure VM (B2s)

# 1. Create B2s VM with Ubuntu 22.04 in Azure Portal
# 2. Open ports 80, 443, 22 in Network Security Group
# 3. SSH into VM
ssh azureuser@your-vm-ip

# 4. Install Docker
curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker azureuser
newgrp docker

# 5. Clone repo and configure
git clone https://github.com/youruser/evalmind.git
cd evalmind/evalMind
cp .env.example .env.production
nano .env.production

# 6. Start production stack
docker compose -f docker-compose.prod.yml up -d

# 7. Configure Azure DNS and SSL as needed

d. Docker on VPS

# 1. SSH to your VPS
ssh root@your-vps-ip

# 2. Install Docker
curl -fsSL https://get.docker.com | sh

# 3. Set up the project
git clone https://github.com/youruser/evalmind.git
cd evalmind/evalMind
cp .env.example .env.production

# 4. Edit production environment
nano .env.production
# Required:
#   GOOGLE_API_KEY, SECRET_KEY, JWT_SECRET_KEY
#   DATABASE_URL (PostgreSQL in compose handles this)
#   ENVIRONMENT=production

# 5. Launch
docker compose -f docker-compose.prod.yml up -d

# 6. Check status
docker compose -f docker-compose.prod.yml ps
docker compose -f docker-compose.prod.yml logs backend

# 7. Run database migrations
docker compose -f docker-compose.prod.yml exec backend alembic upgrade head

Manual Configuration Checklist

Google API Key Setup

Visit https://aistudio.google.com/app/apikey
Create a new API key
Set GOOGLE_API_KEY=<key> in your .env file
Verify: curl "https://generativelanguage.googleapis.com/v1beta/models?key=YOUR_KEY"

Gmail App Password Setup

Go to your Google Account settings
Security > 2-Step Verification (must be enabled)
App passwords > Create for "Mail" and "Other device"
Set GMAIL_USER=your@gmail.com
Set GMAIL_APP_PASSWORD=<16-char-password> (no spaces)
Set ALERT_EMAIL_RECIPIENT=alerts@domain.com

n8n Webhook URL Configuration

Install n8n: docker run -d -p 5678:5678 n8nio/n8n
Access n8n at http://localhost:5678
Import workflows/n8n_workflow.json via Settings > Import Workflow
Activate the "Alert Webhook" workflow
Copy the webhook URL shown (e.g., http://localhost:5678/webhook/evalMind-alert)
Set environment variable: N8N_WEBHOOK_URL=<webhook-url>
Configure Gmail credentials in n8n (Credentials > Add > Gmail OAuth2)
Set EVALMIND_API_URL and EVALMIND_API_TOKEN in n8n environment variables

Test the webhook:

curl -X POST <webhook-url> \
  -H "Content-Type: application/json" \
  -d '{"alert_type":"test","severity":"critical","message":"Test alert","evaluation_id":1,"confidence_score":25}'

Grafana Dashboard Import

Access Grafana at http://localhost:3000
Login: admin / evalMind2024 (change on first login)
Add Prometheus data source: Configuration > Data Sources > Add > Prometheus
- URL: http://prometheus:9090 (if in Docker) or http://localhost:9090
- Click Save & Test
Import dashboard: Dashboards > Import
- Upload monitoring/grafana_dashboard.json
- Select Prometheus data source
- Click Import
The EvalMind Monitor dashboard will appear with all 10 panels

First Admin User Creation

Option A — via API (after server is running):

curl -X POST http://localhost:8000/api/v1/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email":"admin@example.com","username":"admin","password":"changeme123","role":"admin"}'

Option B — directly in database:

# For local Python
python -c "
import sys; sys.path.insert(0, '.')
from database.connection import get_session_local
from database.models import User
from passlib.context import CryptContext
pwd_ctx = CryptContext(schemes=['bcrypt'])
db = get_session_local()()
user = User(
    email='admin@example.com',
    username='admin',
    hashed_password=pwd_ctx.hash('changeme123'),
    role='admin',
    is_active=True
)
db.add(user); db.commit()
print('Admin created: admin@example.com / changeme123')
"

# For Docker
docker compose exec backend python -c "
import sys; sys.path.insert(0, '.')
from database.connection import get_session_local
from database.models import User
from passlib.context import CryptContext
pwd_ctx = CryptContext(schemes=['bcrypt'])
db = get_session_local()()
user = User(email='admin@example.com', username='admin', hashed_password=pwd_ctx.hash('changeme123'), role='admin')
db.add(user); db.commit()
print('Admin created')
"

Testing Instructions

# Activate virtual environment
source venv/bin/activate  # Mac/Linux
venv\Scripts\activate      # Windows

# Test dependencies are already included in requirements.txt; reinstall only if missing
pip install pytest pytest-asyncio httpx

# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_evaluator.py -v
pytest tests/test_api.py -v
pytest tests/test_integration.py -v

# Run with coverage
pip install pytest-cov
pytest tests/ --cov=. --cov-report=html

# Run a specific test class
pytest tests/test_evaluator.py::TestConfidenceScorer -v

# Run a specific test
pytest tests/test_evaluator.py::TestConfidenceScorer::test_perfect_scores_give_high_confidence -v

# Run with detailed output on failure
pytest tests/ -v --tb=long

Tests use in-memory SQLite and mock all external services (Gemini API, Redis). No real API keys are needed for testing.
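
To show the general pattern, here is a self-contained sketch of a test that combines an in-memory SQLite database with a mocked LLM client. The table layout, the store_evaluation helper, and the client's score method are all hypothetical and do not reflect the fixtures in tests/conftest.py.

```python
import sqlite3
from unittest.mock import Mock


def store_evaluation(conn, llm_client, question, answer):
    """Score an answer with the LLM client and persist the result (hypothetical helper)."""
    score = llm_client.score(question=question, answer=answer)
    conn.execute(
        "INSERT INTO evaluations (question, answer, score) VALUES (?, ?, ?)",
        (question, answer, score),
    )
    conn.commit()
    return score


def test_store_evaluation_uses_mocked_llm():
    # In-memory database: created fresh for the test, discarded afterwards.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE evaluations (question TEXT, answer TEXT, score REAL)")

    # The mock stands in for the Gemini client, so no API key is needed.
    llm_client = Mock()
    llm_client.score.return_value = 0.9

    assert store_evaluation(conn, llm_client, "Q?", "A.") == 0.9
    llm_client.score.assert_called_once_with(question="Q?", answer="A.")
    rows = conn.execute("SELECT question, answer, score FROM evaluations").fetchall()
    assert rows == [("Q?", "A.", 0.9)]
    conn.close()
```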

Troubleshooting

"GOOGLE_API_KEY not set" or quota errors

Verify the key is set in .env.development (echo $GOOGLE_API_KEY only prints a value if you also exported it in your shell)
Check Gemini quotas at https://aistudio.google.com
On quota errors, the system automatically falls back through its model chain
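
The fallback behaviour can be pictured roughly as follows. This is a generic sketch of a model fallback chain, not EvalMind's implementation: QuotaExceededError, call_with_fallback, and the second model name are placeholders.

```python
class QuotaExceededError(Exception):
    """Hypothetical stand-in for the provider's quota error."""


def call_with_fallback(models, call):
    """Try each model in order; move to the next one only on a quota error."""
    last_error = None
    for model in models:
        try:
            return call(model)
        except QuotaExceededError as exc:
            last_error = exc
    raise RuntimeError("all models in the chain hit their quota") from last_error


def fake_call(model):
    # Simulates the first model being out of quota.
    if model == "gemini-2.5-flash":
        raise QuotaExceededError(model)
    return f"answered by {model}"


result = call_with_fallback(["gemini-2.5-flash", "fallback-model"], fake_call)
```

Errors other than the quota error are deliberately not caught, so real failures still surface.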

Database connection errors (PostgreSQL)

# Check if Postgres is running
docker compose ps postgres
# Check logs
docker compose logs postgres
# Test connection manually
docker compose exec postgres psql -U evaluser -d evalMind -c "\dt"

Redis connection refused

Redis is optional. Set REDIS_ENABLED=false in .env to disable caching
Check if Redis is running: docker compose ps redis

Port already in use

# Mac/Linux:
lsof -ti:8000 | xargs kill -9
# Windows:
netstat -ano | findstr :8000
taskkill /PID <PID> /F

Alembic migration errors

# Check current revision
alembic current
# Show migration history
alembic history
# Downgrade to base (reverts every migration; expect data loss)
alembic downgrade base
# Re-run all migrations
alembic upgrade head

Docker "no space left" errors

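# Remove all unused images, stopped containers, networks, and build cache
docker system prune -af
# Remove unused volumes (any data stored in them is lost)
docker volume prune -f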

Streamlit cannot connect to backend

Ensure API_BASE_URL=http://localhost:8000/api/v1 is set in .env
When running in Docker, use API_BASE_URL=http://backend:8000/api/v1
Check backend health: curl http://localhost:8000/health
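
When API_BASE_URL differs between local and Docker runs, the matching health URL can be derived from it instead of being hard-coded. The sketch below assumes the health check sits at the server root, as in the curl command above; health_url is a name chosen for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def health_url(api_base_url: str) -> str:
    """Derive the root-level /health URL from an API_BASE_URL value."""
    parts = urlsplit(api_base_url)
    # Keep scheme and host:port, replace the /api/v1 path with /health.
    return urlunsplit((parts.scheme, parts.netloc, "/health", "", ""))
```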

JWT token expired

Default access token expires in 30 minutes
Use the refresh token endpoint: POST /api/v1/auth/refresh
Increase JWT_ACCESS_TOKEN_EXPIRE_MINUTES in settings if needed

n8n webhook not receiving alerts

Confirm n8n workflow is Active (toggle in workflow editor)
Verify webhook URL matches N8N_WEBHOOK_URL env variable
Check n8n execution history for errors

Grafana panels show "No data"

Verify Prometheus data source is configured and shows "Data source is working"
Ensure backend has processed at least one evaluation
Check Prometheus targets at http://localhost:9090/targets — backend should show UP
Confirm PROMETHEUS_ENABLED=true in your env file

Project Structure

evalMind/
├── api/
│   ├── auth/           # JWT auth: register, login, refresh
│   ├── middleware/     # Logging, rate limiting
│   ├── routes/         # evaluate, metrics, alerts, history, admin
│   └── main.py         # FastAPI app factory
├── cache/
│   └── redis_cache.py  # Redis cache manager
├── config/
│   └── settings.py     # Pydantic settings (all env vars)
├── database/
│   ├── connection.py   # SQLAlchemy engine + session
│   ├── models.py       # ORM models
│   └── migrations/     # Alembic migration files
│       └── versions/
│           └── 001_initial.py
├── evaluator/
│   ├── confidence_scorer.py      # Weighted composite scoring
│   ├── core_evaluator.py         # RAGAS evaluation wrapper
│   ├── hallucination_detector.py # Claim verification
│   └── llm_judge.py              # LLM-as-Judge scoring
├── frontend/
│   └── app.py          # Streamlit UI
├── monitoring/
│   ├── prometheus.py           # Prometheus metric definitions
│   ├── prometheus.yml          # Prometheus scrape config
│   └── grafana_dashboard.json  # Pre-built Grafana dashboard
├── tests/
│   ├── conftest.py          # Pytest fixtures
│   ├── test_evaluator.py    # Unit tests for evaluator modules
│   ├── test_api.py          # API endpoint tests
│   ├── test_integration.py  # Integration tests
│   └── test_cases.json      # 20 real QA test cases
├── tracker/
│   └── cost_tracker.py      # Cost calculation and DB recording
├── workflows/
│   └── n8n_workflow.json    # n8n alert + daily report workflows
├── Dockerfile
├── docker-compose.yml       # Development stack
├── docker-compose.prod.yml  # Production stack
├── alembic.ini
└── requirements.txt
