rag-eval-observatory

A RAG evaluation harness with API endpoints, scenario datasets, and a failure taxonomy for regression tracking.

Problem

Most RAG projects demo answers but do not provide repeatable evaluation evidence for retrieval and grounding quality.

Architecture

API: src/rag_eval_observatory/api/main.py
Datasets: legal and support scenarios (src/rag_eval_observatory/datasets.py)
Evaluation engine: src/rag_eval_observatory/evaluate.py
Error taxonomy: src/rag_eval_observatory/taxonomy.py
Persistence: SQLite run store (src/rag_eval_observatory/db.py)

See docs/ARCHITECTURE.md.
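A run store of the kind listed above could look roughly like the following. This is a minimal sketch, not the contents of `db.py`: the `eval_runs` table, its columns, and the `init_store`/`save_run`/`load_run` helpers are hypothetical names chosen for illustration. Metrics are stored as a JSON blob so new metrics do not require schema migrations.

```python
import json
import sqlite3
import uuid
from typing import Optional

# Hypothetical schema: the real db.py may store runs differently.
SCHEMA = """
CREATE TABLE IF NOT EXISTS eval_runs (
    run_id TEXT PRIMARY KEY,
    dataset TEXT NOT NULL,
    metrics_json TEXT NOT NULL
)
"""


def init_store(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) runs table if it does not exist yet."""
    conn.execute(SCHEMA)
    conn.commit()


def save_run(conn: sqlite3.Connection, dataset: str, metrics: dict) -> str:
    """Persist one evaluation run and return its generated run_id."""
    run_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO eval_runs (run_id, dataset, metrics_json) VALUES (?, ?, ?)",
        (run_id, dataset, json.dumps(metrics)),
    )
    conn.commit()
    return run_id


def load_run(conn: sqlite3.Connection, run_id: str) -> Optional[dict]:
    """Fetch a stored run by id, or return None if no such run exists."""
    row = conn.execute(
        "SELECT dataset, metrics_json FROM eval_runs WHERE run_id = ?",
        (run_id,),
    ).fetchone()
    if row is None:
        return None
    return {"run_id": run_id, "dataset": row[0], "metrics": json.loads(row[1])}
```

For a quick experiment, `sqlite3.connect(":memory:")` gives a throwaway database; pointing `sqlite3.connect` at a file path gives the persistent variant.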

Local Run

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
python scripts/init_db.py
uvicorn rag_eval_observatory.api.main:app --reload --port 8800

Optionally, override the SQLite database path:

export RAG_EVAL_DB_PATH=data/runs.db
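One plausible way to honor the `RAG_EVAL_DB_PATH` override is sketched below. The fallback `runs.db` and the `resolve_db_path`/`ensure_parent_dir` helpers are hypothetical; the actual default and lookup logic live in `db.py`. Blank values fall back to the default so an accidentally empty export does not produce an empty path.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical fallback; the real default is defined in db.py and may differ.
DEFAULT_DB_PATH = Path("runs.db")


def resolve_db_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return RAG_EVAL_DB_PATH if set and non-blank, else the fallback path."""
    env = os.environ if env is None else env
    raw = env.get("RAG_EVAL_DB_PATH", "").strip()
    return Path(raw) if raw else DEFAULT_DB_PATH


def ensure_parent_dir(path: Path) -> None:
    """sqlite3.connect creates the file but not missing parent directories."""
    path.parent.mkdir(parents=True, exist_ok=True)
```

With the export above, `resolve_db_path()` would return `Path("data/runs.db")`, and `ensure_parent_dir` would create `data/` before the first connection.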

API Spec

GET /health
GET /version
POST /v1/eval/run
GET /v1/eval/{run_id}
GET /v1/eval/summary

Response envelope:

{
  "status": "ok",
  "data": {},
  "meta": {"model_version": "0.1.0", "latency_ms": 0},
  "error": null
}
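A small standard-library client can call these endpoints and unwrap the envelope, treating any non-`ok` status or non-null `error` as a failure. This is a sketch under assumptions: the base URL uses the port from the uvicorn command above, and the helper names (`call`, `unwrap`, `EnvelopeError`) are hypothetical. The request body accepted by `POST /v1/eval/run` is not documented here, so any payload you send is a guess to check against `api/main.py`.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:8800"  # port from the local uvicorn command


class EnvelopeError(RuntimeError):
    """Raised when the response envelope reports a failure."""


def unwrap(envelope: dict) -> Any:
    """Return envelope["data"], or raise if status/error signal a failure."""
    if envelope.get("status") != "ok" or envelope.get("error") is not None:
        raise EnvelopeError(envelope.get("error") or f"status={envelope.get('status')!r}")
    return envelope.get("data")


def call(method: str, path: str, body: Optional[dict] = None,
         base_url: str = BASE_URL) -> Any:
    """Send a JSON request and return the unwrapped envelope data."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            payload = json.load(resp)
    except urllib.error.HTTPError as exc:
        # Assumption: error responses also carry the JSON envelope in the body.
        payload = json.load(exc)
    return unwrap(payload)
```

For example, `call("GET", "/health")` should return the `data` field of the health response, and `call("GET", "/v1/eval/summary")` the aggregated summary.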

Evaluation

Run the test suite:

pytest

Benchmark artifacts:

reports/benchmark.md
reports/metrics.json

Results

Reports retrieval metrics (precision@k, recall@k, MRR) and an answer-relevance score, and assigns each failure to an explicit taxonomy bucket.
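The retrieval metrics follow their standard definitions: precision@k is the fraction of the top k results that are relevant, recall@k is the fraction of all relevant documents found in the top k, and MRR averages 1/rank of the first relevant result across queries (0 when none is retrieved). The sketch below uses those textbook definitions; it is not copied from `evaluate.py`, whose edge-case handling (for example, empty relevance sets) may differ.

```python
from typing import Iterable, Sequence, Set, Tuple


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved ids that are relevant (divides by k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Assumes retrieved contains no duplicate ids.
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant ids that appear in the top-k results."""
    if not relevant:
        return 0.0  # convention chosen here; undefined in the strict sense
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1/rank of the first relevant id (ranks start at 1), or 0.0 if none."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (retrieved, relevant) pairs."""
    scores = [reciprocal_rank(ret, rel) for ret, rel in queries]
    return sum(scores) / len(scores) if scores else 0.0
```

Worked example: with `retrieved = ["d3", "d1", "d7"]`, `relevant = {"d1", "d9"}` and k = 3, precision@3 = 1/3, recall@3 = 1/2, and the reciprocal rank is 1/2 because the first hit sits at rank 2.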

Limitations

Heuristic (non-LLM) answer relevance scoring
Only two built-in scenario datasets (legal and support)
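To make the first limitation concrete, here is one common lexical heuristic: token-overlap F1 between a generated answer and a reference answer. This is an illustrative stand-in, not the scoring used by this project; the `token_f1` helper and its tokenization rule are assumptions. Heuristics like this reward shared wording, so a correct paraphrase can score low, which is the gap an LLM-judge mode would address.

```python
import re
from collections import Counter
from typing import List

_TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lowercase and keep alphanumeric runs (punctuation is dropped)."""
    return _TOKEN_RE.findall(text.lower())


def token_f1(answer: str, reference: str) -> float:
    """Harmonic mean of token precision and recall (multiset overlap)."""
    pred = Counter(tokenize(answer))
    gold = Counter(tokenize(reference))
    overlap = sum((pred & gold).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(gold.values())
    return 2 * precision * recall / (precision + recall)
```

For instance, `token_f1("the cat", "the cat sat")` gives precision 1 and recall 2/3, so F1 = 0.8.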

Roadmap

Add a persistence backend for run history
Add an optional LLM-judge evaluation mode
Add CI regression thresholds that compare each run against baseline metrics
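A CI regression gate could compare a fresh metrics file with a committed baseline and fail when any metric drops by more than a tolerance. The sketch below assumes `reports/metrics.json` is a flat mapping of metric name to float where higher is better; the real file layout, the `find_regressions`/`check_files` helpers, and the 0.02 tolerance are all hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List


def find_regressions(current: Dict[str, float], baseline: Dict[str, float],
                     tolerance: float = 0.02) -> List[str]:
    """List metrics that are missing or fell below baseline minus tolerance."""
    problems = []
    for name, base_value in sorted(baseline.items()):
        if name not in current:
            problems.append(f"{name}: missing from current run")
        elif current[name] < base_value - tolerance:
            problems.append(
                f"{name}: {current[name]:.3f} below baseline {base_value:.3f} "
                f"(tolerance {tolerance})"
            )
    return problems


def check_files(current_path: str, baseline_path: str, tolerance: float = 0.02) -> int:
    """Return a process exit code: 0 if no regressions, 1 otherwise."""
    current = json.loads(Path(current_path).read_text())
    baseline = json.loads(Path(baseline_path).read_text())
    problems = find_regressions(current, baseline, tolerance)
    for line in problems:
        print("REGRESSION", line)
    return 1 if problems else 0
```

A CI step would call `check_files(...)` on the new metrics file and a baseline file and pass the result to `sys.exit`, so a nonzero code fails the job. Using a tolerance rather than a strict comparison keeps small run-to-run noise from breaking builds.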

Docs

docs/ARCHITECTURE.md
docs/CASE_STUDY.md
docs/DEMO_SCRIPT_90S.md
SECURITY.md
