GitHub - Suhaskumard/RAG-Insight-Evaluator: Evaluation and optimization of Retrieval-Augmented Generation systems

RAG Evaluation and Improvement System

📌 Overview

This project implements a Retrieval-Augmented Generation (RAG) based question answering system and evaluates its performance using quantitative metrics. The system focuses on identifying weaknesses in RAG pipelines such as poor retrieval, hallucinations, and inconsistent answers.

The project is designed as an evaluation layer on top of RAG, rather than just another chatbot.

🎯 Problem Statement

Many RAG systems are deployed without systematic evaluation. This leads to:

Irrelevant document retrieval
Hallucinated answers
Unstable responses for the same query

This project addresses the problem by measuring and visualizing RAG quality.

🧠 System Architecture

Document ingestion
Text chunking
Embedding generation
Vector search using FAISS
Reranking retrieved documents
Answer generation
Evaluation using faithfulness and stability metrics
Visualization using Streamlit dashboard
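
The retrieval stages above (chunking, embedding, vector search) can be illustrated with a minimal, dependency-free sketch. This is not the repository's code: `chunk_text`, `embed`, `cosine`, and `retrieve` are hypothetical helpers, a bag-of-words count vector stands in for a Sentence Transformers embedding, and a brute-force cosine scan stands in for a FAISS index. Chunk sizes are counted in words purely for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=8, overlap=2):
    """Split text into overlapping word windows (hypothetical chunker)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    # Stop once the remaining words are already covered by the previous window.
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text):
    """Toy embedding: lowercase token counts (assumption, not a real encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k highest-scoring (score, chunk) pairs by brute-force scan."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


document = (
    "FAISS performs vector similarity search over dense embeddings. "
    "Streamlit renders an interactive dashboard for the evaluation results."
)
for score, chunk in retrieve("vector similarity search", chunk_text(document), k=1):
    print(f"{score:.2f}  {chunk}")
```

In a real pipeline the reranking and answer generation stages would follow `retrieve`; they are omitted here because the README does not say which models the project uses for them.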

File Structure

RAG-Evaluation-System/
  data/
    docs.txt
  app.py
  main.py
  requirements.txt
  README.md

📊 Evaluation Metrics

Faithfulness: Measures how well the generated answer is supported by retrieved documents. Low scores may indicate hallucinations.

Stability: Measures consistency of answers when the same query is asked multiple times. Low stability indicates unreliable generation.
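
The README does not state how the two scores are computed, so the following is only one plausible sketch under simple assumptions: faithfulness is approximated as the fraction of answer tokens that also occur in the retrieved documents, and stability as the mean pairwise Jaccard similarity between answers from repeated runs. The function names `tokens`, `faithfulness`, and `stability` are hypothetical, and a production system would more likely use embedding similarity or an entailment model.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def faithfulness(answer, retrieved_docs):
    """Share of answer tokens found in the retrieved context (assumed proxy)."""
    answer_tokens = tokens(answer)
    if not answer_tokens:
        return 0.0
    context_tokens = set().union(*(tokens(doc) for doc in retrieved_docs))
    return len(answer_tokens & context_tokens) / len(answer_tokens)


def stability(answers):
    """Mean pairwise Jaccard similarity across repeated answers (assumed proxy)."""
    if len(answers) < 2:
        return 1.0  # a single answer cannot disagree with itself
    similarities = []
    for first, second in combinations(answers, 2):
        a, b = tokens(first), tokens(second)
        union = a | b
        similarities.append(len(a & b) / len(union) if union else 1.0)
    return sum(similarities) / len(similarities)


context = ["Paris is the capital of France"]
print(faithfulness("Paris is the capital", context))   # 1.0
print(faithfulness("Berlin is the capital", context))  # 0.75
print(stability(["Paris is the capital", "Paris is the capital"]))  # 1.0
```

Token overlap is cheap and easy to inspect, but it rewards answers that merely copy context words, so low scores are a prompt for manual review rather than proof of hallucination.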

🛠️ Tech Stack

Python
Sentence Transformers
FAISS
HuggingFace Transformers
Streamlit
ngrok (for dashboard exposure)

▶️ How to Run

1️⃣ Install Dependencies

pip install -r requirements.txt

2️⃣ Run Evaluation Pipeline

python main.py

3️⃣ Launch Dashboard

streamlit run app.py

(Optional) Use ngrok to expose the dashboard publicly.

📈 Output

Evaluated answers with faithfulness and stability scores
Logged failure cases
Interactive dashboard showing evaluation results

🎓 Academic Relevance

This project demonstrates:

Applied Natural Language Processing
Vector similarity search
Model evaluation techniques
System-level design thinking

🔮 Future Work

Add Recall@K metric for retrieval evaluation
Compare multiple embedding models
Automate chunk size optimization
Deploy dashboard using Streamlit Cloud
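
As a reference point for the first item, Recall@K is the fraction of relevant documents that appear among the top K retrieved results. A minimal sketch follows; it assumes each query comes with a labeled set of relevant document IDs, which the project does not currently describe, and `recall_at_k` is a hypothetical helper name.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant documents found in the top-k retrieved results."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0  # undefined without ground truth; reported as 0 by convention here
    hits = relevant & set(retrieved_ids[:k])
    return len(hits) / len(relevant)


# Hypothetical ranking for one query, best match first.
ranking = ["d3", "d1", "d7", "d2"]
print(recall_at_k(ranking, {"d1", "d2"}, k=2))  # 0.5
print(recall_at_k(ranking, {"d1", "d2"}, k=4))  # 1.0
```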
