🧠 RAG Engine

Note: This README may not reflect the current code and folder structure, because development is currently focused on implementing new features. I will update it as soon as possible 😄.

A modular Retrieval-Augmented Generation (RAG) system built for scalable, real-world AI applications.

🚀 Overview

This project implements the core pipeline of a RAG system:

🔍 Semantic retrieval using embeddings
⚡ Fast similarity search with FAISS
🧠 Hybrid ranking (semantic + keyword)
🧩 Modular design for easy extension (LLMs, APIs, datasets)

Designed to start simple and scale into a full AI system.

🧱 Architecture

User Query
    ↓
Embedding (query)
    ↓
Vector Search (FAISS)
    ↓
Top-K Retrieval
    ↓
Ranking (semantic + keyword)
    ↓
Context Output → (LLM ready)
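The flow above can be traced end to end in a small, dependency-free sketch. To keep it runnable without downloads, two substitutions are assumed: a hashed bag-of-words vector stands in for the SentenceTransformers embedding, and an exhaustive inner-product scan stands in for FAISS. The function names, the 256-dimensional toy space, and the ranking weight `alpha=0.7` are illustrative choices, not the project's actual code.

```python
import math
import re
import zlib

DIM = 256  # size of the toy embedding space (illustrative assumption)

def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def embed(text: str) -> list[float]:
    # Toy stand-in for an embedding model (assumption): hash each word into one
    # of DIM buckets, then L2-normalize so inner product equals cosine similarity.
    vec = [0.0] * DIM
    for word in tokens(text):
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def vector_search(query_vec, chunk_vecs, k):
    # Exhaustive inner-product scan; FAISS performs this step far faster.
    scores = [sum(q * c for q, c in zip(query_vec, vec)) for vec in chunk_vecs]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [(i, scores[i]) for i in top]

def rank(query, chunks, hits, alpha=0.7):
    # Hybrid score: alpha * semantic + (1 - alpha) * share of query words in the chunk.
    q_words = set(tokens(query))

    def keyword(i):
        return len(q_words & set(tokens(chunks[i]))) / max(len(q_words), 1)

    return sorted(hits, key=lambda h: alpha * h[1] + (1 - alpha) * keyword(h[0]), reverse=True)

def retrieve_context(query, chunks, k=2):
    chunk_vecs = [embed(c) for c in chunks]            # indexing: embed every chunk once
    hits = vector_search(embed(query), chunk_vecs, k)  # query embedding -> top-k
    ranked = rank(query, chunks, hits)                 # semantic + keyword ranking
    return [chunks[i] for i, _ in ranked]              # context ready for an LLM

chunks = [
    "Python is a programming language used for AI and web development.",
    "FAISS is a library for efficient similarity search.",
    "Bananas are rich in potassium.",
]
print(retrieve_context("What is Python?", chunks))
```

Swapping `embed` for a real sentence-embedding model and `vector_search` for a FAISS index leaves the rest of the flow unchanged, which is the point of keeping each stage separate.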

✨ Features

Sentence-level chunking for precise retrieval
Semantic similarity search using embeddings
FAISS-based vector database
Hybrid ranking to improve relevance
Modular structure for scalability
Local-first setup (can be extended to APIs)
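Sentence-level chunking can be approximated with a regular-expression splitter. This is a minimal sketch assuming sentences end in `.`, `!` or `?` followed by whitespace; the function name `chunk_sentences` and the optional `max_chars` packing are illustrative and not taken from `app/core/chunking.py`.

```python
import re

def chunk_sentences(text: str, max_chars: int = 0) -> list[str]:
    # Split after ., ! or ? when followed by whitespace; drop empty pieces.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if max_chars <= 0:
        return sentences  # one chunk per sentence
    # Optionally pack consecutive sentences into chunks of at most max_chars
    # (a single sentence longer than max_chars still becomes its own chunk).
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

text = "Python is a programming language. FAISS searches vectors fast! Is RAG useful?"
print(chunk_sentences(text))
# ['Python is a programming language.', 'FAISS searches vectors fast!', 'Is RAG useful?']
```

A naive splitter like this breaks on abbreviations such as "e.g. " or "Dr. ", so a dedicated sentence tokenizer is the safer choice for real documents.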

📁 Project Structure

rag-engine/
│
├── app/
│   │
│   ├── core/                     # 🧠 Core logic (no I/O)
│   │   ├── __init__.py
│   │   ├── chunking.py
│   │   ├── embeddings.py
│   │   ├── retriever.py
│   │
│   ├── storage/                  # 💾 Persistence layer
│   │   ├── faiss_store.py
│   │
│   ├── ingestion/                # 📂 Data loading
│   │   ├── loader.py             # txt/md (PDF later)
│   │
│   ├── services/                 # ⚙️ Orchestration
│       ├── __init__.py
│       ├── rag_pipeline.py
│
│
├── data/                         # ⚠️ Runtime-generated (gitignored)
│   ├── raw/                      # user files
│   │   ├── *.txt
│   │   ├── *.md
│   │
│   ├── faiss.index
│   ├── metadata.pkl
│
├── models/                       # 🤖 Embedding models (gitignored)
│
│
├── test.py                       # Scratch testing (gitignored)
├── main.py                       # Entry point
├── requirements.txt
├── README.md
├── Dare.md
├── .gitignore
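The data/ folder pairs faiss.index with metadata.pkl. One common layout, assumed here rather than confirmed from `app/storage/faiss_store.py`, stores the chunk texts in a list whose position `i` matches vector id `i` in the index, so a search hit can be mapped back to its text. A minimal round-trip with the standard `pickle` module; the helper names are hypothetical:

```python
import os
import pickle
import tempfile

def save_metadata(chunks: list[str], path: str) -> None:
    # Assumed layout: position i in this list lines up with vector id i in the index.
    with open(path, "wb") as f:
        pickle.dump(chunks, f)

def load_metadata(path: str) -> list[str]:
    # Only unpickle files you created yourself: pickle can execute arbitrary code.
    with open(path, "rb") as f:
        return pickle.load(f)

def ids_to_text(ids: list[int], chunks: list[str]) -> list[str]:
    # FAISS returns id -1 when fewer than k results exist; skip those.
    return [chunks[i] for i in ids if i >= 0]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metadata.pkl")
    save_metadata(["first chunk", "second chunk"], path)
    restored = load_metadata(path)
    print(ids_to_text([1, -1], restored))  # ['second chunk']
```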

IMPORTANT NOTES:

The data/ folder was added only as part of a feature release, so the system can start immediately without building the index first; future pushes will not update data/.
sample.txt is AI-generated and was added so you can explore the system right away; future pushes will not change it.
Everything in main.py comes from test.py, where it was thoroughly tested, so main.py should run without errors.

⚙️ Setup

1. Clone the repository

git clone https://github.com/your-username/rag-engine.git
cd rag-engine

2. Install dependencies

pip install -r requirements.txt

3. Download embedding model

Download from: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

Place inside:

model/all-MiniLM-L6-v2/

4. Run the system

python main.py

🧪 Example

Note: sample.txt may or may not contain this text, so your results can differ.

Ask something: What is Python?

Top Matches:

- Python is a programming language used for AI and web development.
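The console interaction above could be produced by a loop along these lines. This is a sketch only: `format_matches`, `run`, and the `retrieve` callback are hypothetical names, and main.py may be structured differently.

```python
from typing import Callable

def format_matches(matches: list[str]) -> str:
    # Reproduce the "Top Matches:" layout shown in the example.
    return "\n".join(["Top Matches:", ""] + [f"- {m}" for m in matches])

def run(retrieve: Callable[[str], list[str]]) -> None:
    # Prompt repeatedly; stop on an empty line or end of input (Ctrl-D / Ctrl-Z).
    while True:
        try:
            query = input("Ask something: ").strip()
        except EOFError:
            break
        if not query:
            break
        print(format_matches(retrieve(query)))
```

Passing the retriever in as a function keeps the console loop independent of how retrieval is implemented, so the same loop works with any backend, e.g. `run(my_retriever)`.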

🧠 How it Works

Text is split into meaningful chunks
Each chunk is converted into embeddings
FAISS indexes embeddings for fast search
Query is embedded and compared
Relevant chunks are retrieved and ranked
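The indexing and search steps rest on one property: if every embedding is L2-normalized, the inner product of two vectors equals their cosine similarity, so an inner-product index returns the most similar chunks. The class below is a pure-Python sketch of the exact (brute-force) search that a flat FAISS inner-product index such as `faiss.IndexFlatIP` performs; the class and method names are illustrative, and real FAISS is far faster.

```python
import heapq
import math

def normalize(vec: list[float]) -> list[float]:
    # Scale to unit length so inner product == cosine similarity.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

class FlatIPIndex:
    """Exact inner-product search over stored vectors (illustrative stand-in)."""

    def __init__(self, dim: int):
        self.dim = dim
        self.vectors: list[list[float]] = []

    def add(self, vectors: list[list[float]]) -> None:
        # faiss.IndexFlatIP does not normalize for you (faiss.normalize_L2 is
        # usually called first); this sketch normalizes on insert instead.
        for v in vectors:
            if len(v) != self.dim:
                raise ValueError(f"expected dim {self.dim}, got {len(v)}")
            self.vectors.append(normalize(v))

    def search(self, query: list[float], k: int) -> list[tuple[int, float]]:
        # Score every stored vector against the query and keep the k best.
        q = normalize(query)
        scores = ((i, sum(a * b for a, b in zip(q, v))) for i, v in enumerate(self.vectors))
        return heapq.nlargest(k, scores, key=lambda pair: pair[1])

index = FlatIPIndex(dim=3)
index.add([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
print(index.search([1.0, 0.2, 0.0], k=2))  # nearest ids: 0, then 2
```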

🔌 Extensibility

This system is designed to be extended:

🔹 Plug in LLMs (OpenAI, local models, etc.)
🔹 Add PDF or other document ingestion (document ingestion was added in a feature release)
🔹 Replace embedding providers
🔹 Build APIs or UI layers
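Swapping embedding providers or plugging in an LLM is easiest when the pipeline depends on small interfaces rather than concrete classes. The sketch below uses `typing.Protocol`; `Embedder`, `Generator`, `build_prompt`, `answer` and `EchoGenerator` are hypothetical names, not the project's current API. A real `Embedder` could, for example, wrap `SentenceTransformer.encode`, and a real `Generator` could call any local or hosted model.

```python
from typing import Protocol

class Embedder(Protocol):
    # Any embedding provider exposing this method can be swapped in.
    def embed(self, texts: list[str]) -> list[list[float]]: ...

class Generator(Protocol):
    # Any LLM wrapper exposing this method can be plugged in.
    def generate(self, prompt: str) -> str: ...

def build_prompt(question: str, context: list[str]) -> str:
    # Put the retrieved chunks before the question so the LLM answers from them.
    numbered = "\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(context))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )

def answer(question: str, context: list[str], llm: Generator) -> str:
    return llm.generate(build_prompt(question, context))

class EchoGenerator:
    # Placeholder "LLM" for testing: returns the prompt unchanged for inspection.
    def generate(self, prompt: str) -> str:
        return prompt
```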

🚧 Future Improvements

Context-aware response generation
Multi-document ingestion (PDF, web, etc.)
Persistent vector storage
API backend (FastAPI)
User interface (UI)

🧨 Why this Project?

Many RAG implementations focus mainly on the LLM.

This project focuses on the retrieval layer, which is the foundation of any reliable RAG system.

📌 Tech Stack

Python
SentenceTransformers
FAISS
NumPy

📄 License

MIT License
