Monitor, evaluate, and alert on the quality of your RAG pipeline responses using RAGAS metrics, LLM-as-Judge scoring, and hallucination detection.
┌─────────────────────────────────────────────┐
│               EvalMind Stack                │
└─────────────────────────────────────────────┘
User / Browser
       │
       ▼
┌─────────────┐       ┌──────────────────────────────────────────────────┐
│    nginx    │──────▶│  Streamlit Frontend :8501                        │
│  :80 / 443  │       │  (frontend/app.py)                               │
│             │       └─────────────────┬────────────────────────────────┘
│             │                         │ HTTP /api/v1/*
│             │       ┌─────────────────▼────────────────────────────────┐
│             │──────▶│  FastAPI Backend :8000                           │
└─────────────┘       │  (api/main.py)                                   │
                      │  ├── /auth     (JWT)                             │
                      │  ├── /evaluate (RAGAS + Judge + Hallucination)   │
                      │  ├── /metrics  (summary, worst)                  │
                      │  ├── /alerts   (CRUD)                            │
                      │  ├── /history  (conversation)                    │
                      │  └── /admin    (admin-only)                      │
                      └───┬──────┬──────┬────────────────────────────────┘
                          │      │      │
            ┌─────────────┘      │      └──────────────────┐
            ▼                    ▼                         ▼
    ┌────────────────┐  ┌────────────────┐        ┌────────────────┐
    │  PostgreSQL    │  │  Redis         │        │ Google Gemini  │
    │  :5432         │  │  :6379         │        │  (API calls)   │
    │  (Primary DB)  │  │  (Cache)       │        └────────────────┘
    └────────────────┘  └────────────────┘
            │
            ▼
    ┌────────────────┐         ┌──────────────┐   ┌──────────────┐
    │  Prometheus    │◀────────│  /metrics    │   │  Grafana     │
    │  :9090         │         │  endpoint    │   │  :3000       │
    └────────────────┘         └──────────────┘   └──────┬───────┘
                                                         │ dashboards
                                                ┌────────▼───────┐
                                                │ n8n Workflows  │
                                                │ (Alerts/Email) │
                                                └────────────────┘
- RAGAS Evaluation: Faithfulness, Answer Relevancy, Context Precision, Context Recall
- LLM-as-Judge: Accuracy, Completeness, Clarity, Citation Quality (1-10 scale)
- Hallucination Detection: Claim-level verification with SUPPORTED/UNSUPPORTED/UNCERTAIN verdicts
- Composite Confidence Score: Weighted combination (0-100) of all metrics
- Threshold Alerting: Auto-create alerts when scores fall below configurable thresholds
- Cost Tracking: Per-evaluation and daily USD cost summaries
- Redis Cache: Semantic caching to avoid redundant evaluations
- Prometheus + Grafana: Built-in metrics endpoint with a pre-built dashboard
- n8n Integration: Webhook-triggered email alerts and daily reports
- JWT Authentication: Role-based access (user/admin)
- Batch Evaluation: Evaluate up to 10 QA pairs in a single request
- Conversation History: Per-session conversation tracking
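
The composite confidence score is described above only as a weighted 0-100 combination of the metrics. As a minimal sketch of what such a combination could look like, the snippet below normalizes each input onto 0-1 and applies weights. The weights, the function name, and the choice of inputs are assumptions for illustration, not the values used by `evaluator/confidence_scorer.py`.

```python
from typing import Dict

# Hypothetical weights for illustration only; they sum to 1.0.
WEIGHTS: Dict[str, float] = {
    "faithfulness": 0.30,
    "answer_relevancy": 0.20,
    "judge": 0.30,
    "hallucination": 0.20,
}


def composite_confidence(
    faithfulness: float,       # RAGAS score in [0, 1]
    answer_relevancy: float,   # RAGAS score in [0, 1]
    judge_score: float,        # LLM-as-Judge average on the 1-10 scale
    unsupported_ratio: float,  # share of claims judged UNSUPPORTED, in [0, 1]
) -> float:
    """Combine normalized metrics into a single 0-100 score."""
    normalized = {
        "faithfulness": faithfulness,
        "answer_relevancy": answer_relevancy,
        "judge": (judge_score - 1.0) / 9.0,        # map 1-10 onto 0-1
        "hallucination": 1.0 - unsupported_ratio,  # fewer unsupported claims is better
    }
    # Clamp every component to [0, 1] before weighting.
    total = sum(
        WEIGHTS[name] * min(max(value, 0.0), 1.0)
        for name, value in normalized.items()
    )
    return round(100.0 * total, 1)
```

A score computed this way can be compared directly against `CONFIDENCE_THRESHOLD`, which uses the same 0-100 scale.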
- Python 3.11+
- Docker & Docker Compose (for containerized setup)
- Google Cloud / AI Studio API key (Gemini)
- Gmail account with App Password (for email alerts)
- n8n instance (optional, for workflow automation)
REM Clone and enter the project
cd C:\path\to\EVALMIND\evalMind
REM Copy environment template
copy .env.example .env.development
REM Edit .env.development and set GOOGLE_API_KEY
notepad .env.development
REM Create virtual environment
python -m venv venv
venv\Scripts\activate
REM Install dependencies
pip install -r requirements.txt
REM Initialize database
python -c "from database.connection import init_db; init_db()"
REM Start backend
uvicorn api.main:app --reload --port 8000
REM In another terminal, start frontend
venv\Scripts\activate
streamlit run frontend/app.py --server.port 8501

# Clone and enter the project
cd /path/to/EVALMIND/evalMind
# Copy environment template
cp .env.example .env.development
# Edit .env.development and set GOOGLE_API_KEY
nano .env.development
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Initialize database
python -c "from database.connection import init_db; init_db()"
# Start backend (terminal 1)
uvicorn api.main:app --reload --port 8000
# Start frontend (terminal 2)
source venv/bin/activate
streamlit run frontend/app.py --server.port 8501

Copy .env.example to .env.development and populate:
# ── Required ──────────────────────────────────────────────────
GOOGLE_API_KEY=your-google-ai-studio-api-key
SECRET_KEY=your-random-secret-key-min-32-chars
JWT_SECRET_KEY=your-random-jwt-secret-min-32-chars
# ── Database ──────────────────────────────────────────────────
# Dev (SQLite - no setup needed):
DATABASE_URL=sqlite:///./evalMind_dev.db
# Production (PostgreSQL):
# DATABASE_URL=postgresql://evaluser:evalpass@localhost:5432/evalMind
# ── Redis (optional but recommended) ─────────────────────────
REDIS_URL=redis://localhost:6379/0
REDIS_ENABLED=true
# ── Gmail Alerts (optional) ───────────────────────────────────
GMAIL_USER=your.email@gmail.com
GMAIL_APP_PASSWORD=your-16-char-app-password
ALERT_EMAIL_RECIPIENT=alerts@yourdomain.com
# ── Quality Thresholds ────────────────────────────────────────
FAITHFULNESS_THRESHOLD=0.7
RELEVANCY_THRESHOLD=0.7
CONFIDENCE_THRESHOLD=60.0
# ── Environment ───────────────────────────────────────────────
ENVIRONMENT=development
DEBUG=false
PROMETHEUS_ENABLED=true

- Go to Google AI Studio
- Click Create API Key
- Copy the key and set GOOGLE_API_KEY=... in your .env.development

- Enable 2FA on your Google account
- Go to Google Account > Security > 2-Step Verification > App passwords
- Create an app password for "Mail"
- Use the 16-character password (no spaces) as GMAIL_APP_PASSWORD
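
Once the env file is populated, a quick sanity check can catch missing or malformed values before the backend starts. The following is a minimal sketch: `parse_env` and `validate_env` are hypothetical helpers, not part of the project, and the 0-1 range for the two RAGAS thresholds is an assumption based on the example values above.

```python
from typing import Dict, List

REQUIRED_KEYS = ("GOOGLE_API_KEY", "SECRET_KEY", "JWT_SECRET_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def validate_env(values: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems (empty when all checks pass)."""
    problems: List[str] = []
    for key in REQUIRED_KEYS:
        if not values.get(key):
            problems.append(f"{key} is missing")
    # The template asks for secrets of at least 32 characters.
    for key in ("SECRET_KEY", "JWT_SECRET_KEY"):
        if values.get(key) and len(values[key]) < 32:
            problems.append(f"{key} should be at least 32 characters")
    # Assumption: RAGAS thresholds are fractions between 0 and 1.
    for key in ("FAITHFULNESS_THRESHOLD", "RELEVANCY_THRESHOLD"):
        if key in values:
            try:
                in_range = 0.0 <= float(values[key]) <= 1.0
            except ValueError:
                in_range = False
            if not in_range:
                problems.append(f"{key} should be a number between 0 and 1")
    return problems
```

In practice the text would come from reading `.env.development`, and a non-empty result would be printed before launching `uvicorn`.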
# 1. Set up Python environment
python3 -m venv venv && source venv/bin/activate
# 2. Install all dependencies
pip install -r requirements.txt
# 3. Set up environment
cp .env.example .env.development
# Edit .env.development - set GOOGLE_API_KEY at minimum
# 4. Initialize SQLite database (dev default)
python -c "from database.connection import init_db; init_db()"
# 5. (Optional) Run Alembic migrations instead
alembic upgrade head
# 6. Create first admin user
python -c "
from database.connection import get_session_local
from database.models import User
from passlib.context import CryptContext
pwd = CryptContext(schemes=['bcrypt'])
db = get_session_local()()
admin = User(email='admin@example.com', username='admin', hashed_password=pwd.hash('changeme'), role='admin')
db.add(admin); db.commit()
print('Admin user created: admin@example.com / changeme')
"
# 7. Start backend
uvicorn api.main:app --reload --host 0.0.0.0 --port 8000
# 8. Start frontend (new terminal)
streamlit run frontend/app.py --server.port 8501 --server.address 0.0.0.0

Access:
- Frontend: http://localhost:8501
- API Docs: http://localhost:8000/docs
- Health: http://localhost:8000/health
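
When scripting the startup steps, it can be useful to wait for the health endpoint before opening the frontend. This is a minimal sketch using only the standard library; `wait_until_healthy` and its parameters are hypothetical, and it assumes `/health` answers with HTTP 200 once the backend is ready.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True when the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(
    url: str = "http://localhost:8000/health",
    attempts: int = 10,
    delay: float = 1.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll the URL until the probe succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # pause between attempts, not after the last one
    return False
```

The `probe` argument is injectable so the polling logic can be exercised without a running server.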
# Build and start all services
docker compose up --build
# Run in background
docker compose up -d --build
# View logs
docker compose logs -f backend
docker compose logs -f frontend
# Stop all services
docker compose down
# Stop and remove volumes (reset database)
docker compose down -v

Services after docker compose up:
| Service | URL |
|---|---|
| Frontend | http://localhost:8501 |
| Backend | http://localhost:8000 |
| API Docs | http://localhost:8000/docs |
| Prometheus | http://localhost:9090 |
| Grafana | http://localhost:3000 |
Grafana default login: admin / evalMind2024
# Run migration inside backend container
docker compose exec backend alembic upgrade head
# Or create tables directly
docker compose exec backend python -c "from database.connection import init_db; init_db()"

All endpoints are at http://localhost:8000/api/v1/
| Method | Endpoint | Auth | Description |
|---|---|---|---|
| POST | /auth/register | None | Register new user |
| POST | /auth/login | None | Login, get JWT tokens |
| POST | /auth/refresh | None | Refresh access token |
| GET | /auth/me | User | Get current user info |
| POST | /evaluate/ | User | Single evaluation |
| POST | /evaluate/batch | User | Batch evaluation (max 10) |
| GET | /metrics/summary | User | Aggregated metrics summary |
| GET | /metrics/worst | User | Worst-performing evaluations |
| GET | /alerts | User | List alerts (paginated) |
| PATCH | /alerts/{id}/resolve | User | Resolve an alert |
| GET | /history | User | Conversation history |
| GET | /documents | User | List documents |
| GET | /admin/users | Admin | List all users |
| GET | /admin/stats | Admin | System-wide statistics |
| GET | /health | None | Health check |
| GET | /live | None | Liveness probe |
| GET | /ready | None | Readiness probe |
| GET | /metrics | None | Prometheus metrics |
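
Because the batch endpoint accepts at most 10 QA pairs, larger sets have to be split on the client side. The sketch below shows one way to do that and to build an authenticated request for the single-evaluation endpoint from Python. The helper names are hypothetical, the payload fields follow the curl example in this section, and the request is only constructed here, not sent.

```python
import json
import urllib.request
from typing import Dict, Iterator, List

BASE_URL = "http://localhost:8000/api/v1"
MAX_BATCH = 10  # the batch endpoint accepts at most 10 QA pairs


def build_evaluate_request(token: str, payload: Dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for /evaluate/."""
    return urllib.request.Request(
        url=f"{BASE_URL}/evaluate/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chunk_pairs(pairs: List[Dict], size: int = MAX_BATCH) -> Iterator[List[Dict]]:
    """Yield consecutive slices of at most `size` QA pairs."""
    for start in range(0, len(pairs), size):
        yield pairs[start:start + size]
```

Passing the built request to `urllib.request.urlopen` would send it; the response schema is not documented here, so parsing it is left out.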
Example: Single Evaluation
curl -X POST http://localhost:8000/api/v1/evaluate/ \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"question": "What is the capital of France?",
"answer": "The capital of France is Paris.",
"context_chunks": ["France is a country in Western Europe. Its capital is Paris."],
"ground_truth": "Paris is the capital of France.",
"model_used": "gemini-2.5-flash"
}'

- Push your code to GitHub
- Go to render.com and create a new Web Service
- Connect your GitHub repo
- Set Build Command: pip install -r requirements.txt
- Set Start Command: uvicorn api.main:app --host 0.0.0.0 --port $PORT
- Add Environment Variables (Settings > Environment):
  - GOOGLE_API_KEY = your key
  - SECRET_KEY = random 32+ char string
  - JWT_SECRET_KEY = random 32+ char string
  - DATABASE_URL = (use Render PostgreSQL add-on URL)
  - REDIS_ENABLED=false (free tier has no Redis)
  - ENVIRONMENT=production
- Add a PostgreSQL database from Render Dashboard
- For frontend: create another Web Service with Start Command: streamlit run frontend/app.py --server.port $PORT --server.address 0.0.0.0
# 1. Launch t3.medium Ubuntu 22.04 instance
# 2. SSH into instance
ssh -i your-key.pem ubuntu@your-ec2-ip
# 3. Install Docker
sudo apt-get update
sudo apt-get install -y docker.io docker-compose-v2
sudo usermod -aG docker ubuntu
newgrp docker
# 4. Clone your repo
git clone https://github.com/youruser/evalmind.git
cd evalmind/evalMind
# 5. Create production env file
cp .env.example .env.production
nano .env.production # Set all required values
# 6. Start production stack
docker compose -f docker-compose.prod.yml up -d
# 7. Set up nginx (already included in prod compose)
# 8. Configure domain DNS to point to EC2 public IP
# 9. Set up SSL with Let's Encrypt
sudo apt-get install -y certbot
# Standalone mode binds port 80, so stop the nginx container while the certificate is issued
sudo certbot certonly --standalone -d yourdomain.com
# Update nginx.conf with SSL cert paths
# 10. Open security group ports: 80, 443

# 1. Create B2s VM with Ubuntu 22.04 in Azure Portal
# 2. Open ports 80, 443, 22 in Network Security Group
# 3. SSH into VM
ssh azureuser@your-vm-ip
# 4. Install Docker
curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker azureuser
newgrp docker
# 5. Clone repo and configure
git clone https://github.com/youruser/evalmind.git
cd evalmind/evalMind
cp .env.example .env.production
nano .env.production
# 6. Start production stack
docker compose -f docker-compose.prod.yml up -d
# 7. Configure Azure DNS and SSL as needed

# 1. SSH to your VPS
ssh root@your-vps-ip
# 2. Install Docker
curl -fsSL https://get.docker.com | sh
# 3. Set up the project
git clone https://github.com/youruser/evalmind.git
cd evalmind/evalMind
cp .env.example .env.production
# 4. Edit production environment
nano .env.production
# Required:
# GOOGLE_API_KEY, SECRET_KEY, JWT_SECRET_KEY
# DATABASE_URL (PostgreSQL in compose handles this)
# ENVIRONMENT=production
# 5. Launch
docker compose -f docker-compose.prod.yml up -d
# 6. Check status
docker compose -f docker-compose.prod.yml ps
docker compose -f docker-compose.prod.yml logs backend
# 7. Run database migrations
docker compose -f docker-compose.prod.yml exec backend alembic upgrade head

- Visit https://aistudio.google.com/app/apikey
- Create a new API key
- Set GOOGLE_API_KEY=<key> in your .env file
- Verify: curl "https://generativelanguage.googleapis.com/v1beta/models?key=YOUR_KEY"

- Go to your Google Account settings
- Security > 2-Step Verification (must be enabled)
- App passwords > Create for "Mail" and "Other device"
- Set GMAIL_USER=your@gmail.com
- Set GMAIL_APP_PASSWORD=<16-char-password> (no spaces)
- Set ALERT_EMAIL_RECIPIENT=alerts@domain.com

- Install n8n: docker run -d -p 5678:5678 n8nio/n8n
- Access n8n at http://localhost:5678
- Import workflows/n8n_workflow.json via Settings > Import Workflow
- Activate the "Alert Webhook" workflow
- Copy the webhook URL shown (e.g., http://localhost:5678/webhook/evalMind-alert)
- Set environment variable: N8N_WEBHOOK_URL=<webhook-url>
- Configure Gmail credentials in n8n (Credentials > Add > Gmail OAuth2)
- Set EVALMIND_API_URL and EVALMIND_API_TOKEN in n8n environment variables
- Test the webhook:
curl -X POST <webhook-url> \
-H "Content-Type: application/json" \
-d '{"alert_type":"test","severity":"critical","message":"Test alert","evaluation_id":1,"confidence_score":25}'
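
To illustrate how threshold alerting could feed this webhook, here is a minimal sketch that compares scores against the thresholds from the env template and builds payloads with the same fields as the test request above. The `alert_type` values, the severity rule, and the function name are assumptions; the real alerting logic may differ.

```python
from typing import Dict, List

# Defaults mirror the example values in the env template.
THRESHOLDS: Dict[str, float] = {
    "faithfulness": 0.7,
    "answer_relevancy": 0.7,
    "confidence_score": 60.0,
}


def build_alerts(evaluation_id: int, scores: Dict[str, float]) -> List[Dict]:
    """Return one webhook payload per metric that falls below its threshold."""
    alerts: List[Dict] = []
    for metric, threshold in THRESHOLDS.items():
        value = scores.get(metric)
        if value is None or value >= threshold:
            continue
        # Hypothetical severity rule: "critical" below half the threshold.
        severity = "critical" if value < threshold / 2 else "warning"
        alerts.append({
            "alert_type": f"low_{metric}",
            "severity": severity,
            "message": f"{metric} {value} is below threshold {threshold}",
            "evaluation_id": evaluation_id,
            "confidence_score": scores.get("confidence_score"),
        })
    return alerts
```

Each returned dictionary can be serialized with `json.dumps` and posted to the URL stored in `N8N_WEBHOOK_URL`.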
- Access Grafana at http://localhost:3000
- Login: admin / evalMind2024 (change on first login)
- Add Prometheus data source: Configuration > Data Sources > Add > Prometheus
  - URL: http://prometheus:9090 (if in Docker) or http://localhost:9090
  - Click Save & Test
- Import dashboard: Dashboards > Import
  - Upload monitoring/grafana_dashboard.json
  - Select Prometheus data source
  - Click Import
- The EvalMind Monitor dashboard will appear with all 10 panels
Option A (via the API, after the server is running):
curl -X POST http://localhost:8000/api/v1/auth/register \
-H "Content-Type: application/json" \
-d '{"email":"admin@example.com","username":"admin","password":"changeme123","role":"admin"}'

Option B (directly in the database):
# For local Python
python -c "
import sys; sys.path.insert(0, '.')
from database.connection import get_session_local
from database.models import User
from passlib.context import CryptContext
pwd_ctx = CryptContext(schemes=['bcrypt'])
db = get_session_local()()
user = User(
email='admin@example.com',
username='admin',
hashed_password=pwd_ctx.hash('changeme123'),
role='admin',
is_active=True
)
db.add(user); db.commit()
print('Admin created: admin@example.com / changeme123')
"
# For Docker
docker compose exec backend python -c "
import sys; sys.path.insert(0, '.')
from database.connection import get_session_local
from database.models import User
from passlib.context import CryptContext
pwd_ctx = CryptContext(schemes=['bcrypt'])
db = get_session_local()()
user = User(email='admin@example.com', username='admin', hashed_password=pwd_ctx.hash('changeme123'), role='admin')
db.add(user); db.commit()
print('Admin created')
"

# Activate virtual environment
source venv/bin/activate # Mac/Linux
venv\Scripts\activate # Windows
# Install test dependencies (included in requirements.txt)
pip install pytest pytest-asyncio httpx
# Run all tests
pytest tests/ -v
# Run specific test file
pytest tests/test_evaluator.py -v
pytest tests/test_api.py -v
pytest tests/test_integration.py -v
# Run with coverage
pip install pytest-cov
pytest tests/ --cov=. --cov-report=html
# Run a specific test class
pytest tests/test_evaluator.py::TestConfidenceScorer -v
# Run a specific test
pytest tests/test_evaluator.py::TestConfidenceScorer::test_perfect_scores_give_high_confidence -v
# Run with detailed output on failure
pytest tests/ -v --tb=long

Tests use in-memory SQLite and mock all external services (Gemini API, Redis). No real API keys are needed for testing.
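
The pattern described above, an in-memory database plus mocked external calls, can be sketched with the standard library alone. The table layout, `record_evaluation`, and the `judge.score` interface below are hypothetical stand-ins and do not reflect the project's actual fixtures in `tests/conftest.py`.

```python
import sqlite3
from unittest.mock import Mock


def record_evaluation(conn: sqlite3.Connection, judge, question: str, answer: str) -> float:
    """Score an answer with the injected judge and persist the result."""
    score = judge.score(question, answer)
    conn.execute(
        "INSERT INTO evaluations (question, answer, score) VALUES (?, ?, ?)",
        (question, answer, score),
    )
    conn.commit()
    return score


def test_record_evaluation_uses_mocked_judge():
    conn = sqlite3.connect(":memory:")  # nothing touches disk
    conn.execute("CREATE TABLE evaluations (question TEXT, answer TEXT, score REAL)")
    judge = Mock()
    judge.score.return_value = 8.5  # stands in for a real Gemini call

    assert record_evaluation(conn, judge, "q", "a") == 8.5
    judge.score.assert_called_once_with("q", "a")
    rows = conn.execute("SELECT score FROM evaluations").fetchall()
    assert rows == [(8.5,)]
    conn.close()
```

Because the judge is passed in as an argument, the test never needs an API key or network access.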
- Verify the key is set: echo $GOOGLE_API_KEY
- Check Gemini quotas at https://aistudio.google.com
- The system automatically falls back through the model chain on quota errors

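
The fallback behavior mentioned above can be pictured as trying each model in order until one succeeds. This is a minimal sketch under stated assumptions: `QuotaExceeded`, `call_with_fallback`, and the `call_model` callable are hypothetical names, and the actual chain and error types used by the evaluator are not documented here.

```python
from typing import Callable, Optional, Sequence


class QuotaExceeded(Exception):
    """Hypothetical error raised when a model's quota is exhausted."""


def call_with_fallback(
    models: Sequence[str],
    call_model: Callable[[str, str], str],
    prompt: str,
) -> str:
    """Try each model in order and return the first successful response."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return call_model(model, prompt)
        except QuotaExceeded as error:
            last_error = error  # remember the failure and try the next model
    raise RuntimeError("all models in the chain are over quota") from last_error
```

Only quota errors trigger the fallback in this sketch; any other exception propagates immediately, which keeps genuine bugs visible.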
# Check if Postgres is running
docker compose ps postgres
# Check logs
docker compose logs postgres
# Test connection manually
docker compose exec postgres psql -U evaluser -d evalMind -c "\dt"

- Redis is optional. Set REDIS_ENABLED=false in .env to disable caching
- Check if Redis is running: docker compose ps redis

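
To show why disabling Redis is safe, the sketch below wraps a computation in a cache that simply calls through when caching is off. It is a simplification: it uses exact-match keys on normalized text rather than semantic similarity, and an in-memory dictionary stands in for Redis. The class and method names are hypothetical and do not mirror `cache/redis_cache.py`.

```python
import hashlib
import json
from typing import Callable, Dict


class EvaluationCache:
    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled
        self._store: Dict[str, str] = {}  # stands in for Redis

    @staticmethod
    def key(question: str, answer: str) -> str:
        """Build a deterministic key from the normalized question and answer."""
        normalized = json.dumps([question.strip().lower(), answer.strip().lower()])
        return "eval:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_compute(self, question: str, answer: str,
                       compute: Callable[[], dict]) -> dict:
        if not self.enabled:
            return compute()  # caching disabled: always evaluate
        cache_key = self.key(question, answer)
        if cache_key in self._store:
            return json.loads(self._store[cache_key])
        result = compute()
        self._store[cache_key] = json.dumps(result)
        return result
```

With `enabled=False` every call reaches the evaluator, so results stay correct and only the cost savings are lost.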
# Mac/Linux:
lsof -ti:8000 | xargs kill -9
# Windows:
netstat -ano | findstr :8000
taskkill /PID <PID> /F

# Check current revision
alembic current
# Show migration history
alembic history
# Downgrade to base
alembic downgrade base
# Re-run all migrations
alembic upgrade head

docker system prune -af
docker volume prune -f

- Ensure API_BASE_URL=http://localhost:8000/api/v1 is set in .env
- When running in Docker, use API_BASE_URL=http://backend:8000/api/v1
- Check backend health: curl http://localhost:8000/health

- Default access token expires in 30 minutes
- Use the refresh token endpoint: POST /api/v1/auth/refresh
- Increase JWT_ACCESS_TOKEN_EXPIRE_MINUTES in settings if needed

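
A client can avoid expired-token errors by checking the token's expiry before each call and refreshing when it is close. The sketch below reads the standard `exp` claim from the JWT payload without verifying the signature, which is acceptable for a client-side hint but never for authorization. It assumes the access tokens carry an `exp` claim; the helper names are hypothetical.

```python
import base64
import json
import time
from typing import Optional


def token_expiry(token: str) -> Optional[int]:
    """Return the 'exp' claim (Unix seconds) without verifying the signature."""
    try:
        payload_segment = token.split(".")[1]
        # JWT segments are base64url without padding; restore it before decoding.
        padded = payload_segment + "=" * (-len(payload_segment) % 4)
        payload = json.loads(base64.urlsafe_b64decode(padded))
    except (IndexError, ValueError):
        return None
    if not isinstance(payload, dict):
        return None
    exp = payload.get("exp")
    return exp if isinstance(exp, int) else None


def needs_refresh(token: str, leeway_seconds: int = 60,
                  now: Optional[float] = None) -> bool:
    """True when the token is unreadable or expires within the leeway."""
    exp = token_expiry(token)
    current = time.time() if now is None else now
    return exp is None or exp - current <= leeway_seconds
```

When `needs_refresh` returns true, the client would call `POST /api/v1/auth/refresh` before retrying the original request.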
- Confirm n8n workflow is Active (toggle in workflow editor)
- Verify the webhook URL matches the N8N_WEBHOOK_URL env variable
- Check n8n execution history for errors

- Verify Prometheus data source is configured and shows "Data source is working"
- Ensure backend has processed at least one evaluation
- Check Prometheus targets at http://localhost:9090/targets — backend should show UP
- Confirm PROMETHEUS_ENABLED=true in your env file
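
When dashboards stay empty, it can help to look at the raw output of the `/metrics` endpoint and confirm that evaluation counters are present and non-zero. The sketch below parses the Prometheus text format in a simplified way (it assumes no trailing timestamps on sample lines). The metric name used in the usage note is hypothetical; check `monitoring/prometheus.py` for the real names.

```python
from typing import Dict


def parse_samples(exposition: str) -> Dict[str, float]:
    """Parse Prometheus text-format lines into {series: value}."""
    samples: Dict[str, float] = {}
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last whitespace-separated field on the line.
        series, _, value = line.rpartition(" ")
        try:
            samples[series] = float(value)
        except ValueError:
            continue
    return samples


def has_activity(samples: Dict[str, float], prefix: str) -> bool:
    """True when any series starting with `prefix` has a value above zero."""
    return any(
        series.startswith(prefix) and value > 0
        for series, value in samples.items()
    )
```

For example, `has_activity(parse_samples(text), "evalmind_evaluations_total")` would report whether at least one evaluation has been counted, given the text fetched from http://localhost:8000/metrics.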
evalMind/
├── api/
│   ├── auth/                      # JWT auth: register, login, refresh
│   ├── middleware/                # Logging, rate limiting
│   ├── routes/                    # evaluate, metrics, alerts, history, admin
│   └── main.py                    # FastAPI app factory
├── cache/
│   └── redis_cache.py             # Redis cache manager
├── config/
│   └── settings.py                # Pydantic settings (all env vars)
├── database/
│   ├── connection.py              # SQLAlchemy engine + session
│   ├── models.py                  # ORM models
│   └── migrations/                # Alembic migration files
│       └── versions/
│           └── 001_initial.py
├── evaluator/
│   ├── confidence_scorer.py       # Weighted composite scoring
│   ├── core_evaluator.py          # RAGAS evaluation wrapper
│   ├── hallucination_detector.py  # Claim verification
│   └── llm_judge.py               # LLM-as-Judge scoring
├── frontend/
│   └── app.py                     # Streamlit UI
├── monitoring/
│   ├── prometheus.py              # Prometheus metric definitions
│   ├── prometheus.yml             # Prometheus scrape config
│   └── grafana_dashboard.json     # Pre-built Grafana dashboard
├── tests/
│   ├── conftest.py                # Pytest fixtures
│   ├── test_evaluator.py          # Unit tests for evaluator modules
│   ├── test_api.py                # API endpoint tests
│   ├── test_integration.py        # Integration tests
│   └── test_cases.json            # 20 real QA test cases
├── tracker/
│   └── cost_tracker.py            # Cost calculation and DB recording
├── workflows/
│   └── n8n_workflow.json          # n8n alert + daily report workflows
├── Dockerfile
├── docker-compose.yml             # Development stack
├── docker-compose.prod.yml        # Production stack
├── alembic.ini
└── requirements.txt