Voice‑Enabled Retrieval‑Augmented Generation (RAG) System

Overview

This repository presents an end-to-end, voice-enabled conversational system designed to answer user queries grounded in external knowledge sources. The system follows a modular, production-oriented design and covers the full pipeline from speech input to answer generation, with an optional user interface layer. The implementation is provided solely for technical evaluation purposes and adheres to the stated guidelines.

High‑Level Architecture

Audio Input (User speaks a question)
ASR Service (FastAPI) – Transcribes speech to text
Translation Module – Translates text to English via API call
Vector Database (FAISS) – Retrieves relevant knowledge chunks
RAG Pipeline – Combines query + retrieved context
Answer Output (Text response)
Bonus UI – Simple chat‑style interface for interaction

Each component is modular and can be tested and replaced independently.

Project Structure

rag-voice-bot/
│
├── asr_service/               # Task 3: ASR deployment
│   ├── __init__.py
│   ├── app.py                 # FastAPI ASR service with docs & validation
│   └── asr_model.py           # ASR model wrapper
│
├── data_collection/           # Task 1: Wikipedia data collection
│   ├── wiki_scraper.py        # CLI script to fetch & store article text
│   └── wiki_machine_learning.txt
│
├── vector_db/                 # Task 2: Vector database creation
│   ├── build_vector_db.py     # Chunking + embeddings + FAISS index
│   ├── faiss.index
│   └── chunks.pkl
│
├── translation/               # Task 4: Translation
│   ├── __init__.py
│   └── sarvam_translate.py    # API‑based translation to English
│
├── rag/                       # Task 5: RAG pipeline
│   └── rag_pipeline.py
│
├── ui/                        # Bonus task: Simple UI
│   └── app.py                 # Gradio chat interface
│
├── requirements.txt
├── .env.example
└── README.md

Setup Instructions

1. Clone the Repository

git clone <repository_url>
cd rag-voice-bot

2. Create Virtual Environment

python -m venv venv
source venv/bin/activate      # Linux / macOS
venv\Scripts\activate         # Windows

3. Install Dependencies

pip install -r requirements.txt

4. Environment Variables

Create a .env file in the project root (see .env.example) using the template below:

SARVAM_API_KEY=your_api_key_here
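
A minimal sketch of how the key could be checked at startup, assuming the contents of .env have already been loaded into the process environment. The helper name `require_env` is hypothetical and is not taken from the repository code.

```python
import os


def require_env(name, environ=None):
    """Return the value of a required environment variable.

    `environ` defaults to os.environ; passing a dict makes the check easy to test.
    Raises RuntimeError with a clear message if the variable is missing or blank.
    """
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


if __name__ == "__main__":
    # Fails fast, before any API call is attempted, if the key is absent.
    api_key = require_env("SARVAM_API_KEY")
    print("SARVAM_API_KEY is set")
```

Failing at startup keeps a missing key from surfacing later as an opaque translation error.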

How to Run Each Task

Task 1 – Wikipedia Data Collection

Fetch and store the closest-matching Wikipedia article for a given topic.

python data_collection/wiki_scraper.py --query "Machine Learning"

Output: A cleaned .txt file containing article content.

Task 2 – Build Vector Database

Chunks the article text, generates embeddings, and stores them in FAISS.

python vector_db/build_vector_db.py

Chunking Strategy Justification:

Chunk size: 500 characters
Overlap: 50 characters

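Chunks of 500 characters are long enough to hold a complete idea yet short enough to keep retrieval precise. The 50-character overlap reduces context loss at chunk boundaries, since text that straddles a boundary appears in both neighbouring chunks.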
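
The chunking parameters above can be illustrated with a short sketch. It assumes plain fixed-width character windows; the function name `chunk_text` is hypothetical, and the actual logic in build_vector_db.py may differ (for example, by splitting on sentence boundaries).

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # each new chunk starts this many characters later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "".join(chr(65 + i % 26) for i in range(1200))
    parts = chunk_text(sample)
    print([len(p) for p in parts])
```

With the default settings each chunk begins 450 characters after the previous one, so the last 50 characters of one chunk are repeated at the start of the next.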

Task 3 – Deploy ASR Service

Start the FastAPI ASR server.

uvicorn asr_service.app:app --reload

Available endpoints:

GET /health – Service health check
POST /transcribe – Upload WAV/FLAC audio and receive transcription

Interactive API docs:

http://127.0.0.1:8000/docs

Task 4 – Translation

Translation is handled internally via API calls. No separate deployment is required.

Task 5 – End‑to‑End RAG Pipeline

Ask a question using an audio file and receive a grounded answer.

python rag/rag_pipeline.py

Pipeline steps executed automatically:

Audio → ASR endpoint
Text → English translation
Query embedding & vector search
Context‑aware answer generation
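
The retrieval and prompt-assembly steps above can be sketched in plain Python. This is an illustration under stated assumptions: brute-force cosine similarity stands in for the FAISS index, the embeddings are toy vectors, and the names `top_k` and `build_prompt` as well as the prompt wording are hypothetical rather than taken from rag_pipeline.py.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, chunk_vecs, k=3):
    """Return indices of the k chunk vectors most similar to the query, best first."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, chunks):
    """Combine the translated question with the retrieved chunks into one prompt string."""
    context = "\n\n".join(f"[{n}] {c}" for n, c in enumerate(chunks, 1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    chunks = ["ML learns from data.", "Paris is in France.", "Models generalise from examples."]
    vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # toy embeddings, not real model output
    hits = top_k([1.0, 0.0], vectors, k=2)
    print(build_prompt("What is machine learning?", [chunks[i] for i in hits]))
```

Numbering the chunks in the prompt makes it easier to trace which retrieved passage an answer draws on.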

Bonus Task – UI

Launch the interactive voice‑enabled chatbot UI.

python ui/app.py

Features:

Audio upload
Chat‑style response display
End‑to‑end pipeline integration

Error Handling & Edge Cases

Unsupported audio formats rejected at API level
Temporary file cleanup ensured
API failures handled with meaningful messages
Environment variables validated before runtime
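
Two of the points above, format rejection and temporary file cleanup, can be sketched as follows. The helper names `is_supported_audio` and `temp_audio_file` are hypothetical, and the sketch assumes validation by file extension; the actual service may also inspect content types or file headers.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path

# Formats accepted by the /transcribe endpoint according to this README.
ALLOWED_EXTENSIONS = {".wav", ".flac"}


def is_supported_audio(filename):
    """Return True if the file extension is one of the accepted audio formats."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


@contextmanager
def temp_audio_file(data, suffix):
    """Write uploaded bytes to a temporary file and always delete it afterwards."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        yield path
    finally:
        # Runs even if transcription raises, so no stray files are left behind.
        if os.path.exists(path):
            os.remove(path)


if __name__ == "__main__":
    print(is_supported_audio("question.WAV"))
    print(is_supported_audio("question.mp3"))
    with temp_audio_file(b"fake-bytes", ".wav") as tmp_path:
        print(os.path.exists(tmp_path))
```

Placing the cleanup in a `finally` block means the temporary file is removed whether the request succeeds or fails.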

Observations & Challenges Faced

Large ASR models on Windows may raise filesystem permission errors due to symbolic link restrictions in the local cache.
This is a known OS‑level constraint and does not affect correctness of the code or its execution on Linux‑based systems.
The codebase is fully compatible with GPU‑enabled Linux servers, which are typically used in production and evaluation environments.

Design & Best Practices Followed

Clear separation of concerns across modules
Extensive inline comments and docstrings
Single‑responsibility functions
Configurable and replaceable components
Clean FastAPI documentation

License & Usage

This repository is open-sourced under the MIT License.

The code has been developed as part of a technical evaluation and is intended for learning, experimentation, and research demonstration purposes.

Thank you for reviewing this submission.
