Vikhram-S/Rag-Voice-Bot


Voice‑Enabled Retrieval‑Augmented Generation (RAG) System

Overview

This repository contains an end-to-end, voice-enabled conversational system that answers user queries using external knowledge sources. The design is modular and covers the full pipeline from speech input to answer generation, with an optional user interface layer. The implementation was developed for a technical evaluation and follows that evaluation's guidelines.


High‑Level Architecture

  1. Audio Input (User speaks a question)
  2. ASR Service (FastAPI) – Transcribes speech to text
  3. Translation Module – Translates text to English via API call
  4. Vector Database (FAISS) – Retrieves relevant knowledge chunks
  5. RAG Pipeline – Combines query + retrieved context
  6. Answer Output (Text response)
  7. Bonus UI – Simple chat‑style interface for interaction

Each component is independently modular, testable, and replaceable.


Project Structure

rag-voice-bot/
│
├── asr_service/               # Task 3: ASR deployment
│   ├── __init__.py
│   ├── app.py                 # FastAPI ASR service with docs & validation
│   └── asr_model.py           # ASR model wrapper
│
├── data_collection/           # Task 1: Wikipedia data collection
│   ├── wiki_scraper.py        # CLI script to fetch & store article text
│   └── wiki_machine_learning.txt
│
├── vector_db/                 # Task 2: Vector database creation
│   ├── build_vector_db.py     # Chunking + embeddings + FAISS index
│   ├── faiss.index
│   └── chunks.pkl
│
├── translation/               # Task 4: Translation
│   ├── __init__.py
│   └── sarvam_translate.py    # API‑based translation to English
│
├── rag/                       # Task 5: RAG pipeline
│   └── rag_pipeline.py
│
├── ui/                        # Bonus task: Simple UI
│   └── app.py                 # Gradio chat interface
│
├── requirements.txt
├── .env.example
└── README.md


Setup Instructions

1. Clone the Repository

git clone <repository_url>
cd rag-voice-bot

2. Create Virtual Environment

python -m venv venv
source venv/bin/activate      # Linux / macOS
venv\Scripts\activate         # Windows

3. Install Dependencies

pip install -r requirements.txt

4. Environment Variables

Create a .env file using the template below:

SARVAM_API_KEY=your_api_key_here

How to Run Each Task

Task 1 – Wikipedia Data Collection

Fetches and stores the closest-matching Wikipedia article for a given topic.

python data_collection/wiki_scraper.py --query "Machine Learning"

Output: A cleaned .txt file containing article content.
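
As a rough illustration of how the command-line option and the output file might relate, the sketch below parses --query and derives an output filename. The naming scheme is an assumption inferred from wiki_machine_learning.txt in the project tree, and parse_args and output_path_for are hypothetical helpers, not necessarily the functions used in wiki_scraper.py.

```python
import argparse
import re
from pathlib import Path


def parse_args(argv=None):
    """Parse the --query option shown in the command above."""
    parser = argparse.ArgumentParser(description="Fetch a Wikipedia article as text.")
    parser.add_argument("--query", required=True, help="Topic to search for")
    return parser.parse_args(argv)


def output_path_for(query, out_dir="data_collection"):
    """Map a query to an output path (assumed scheme: wiki_<lowercase_slug>.txt)."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", query.lower()).strip("_")
    return Path(out_dir) / f"wiki_{slug}.txt"
```

Under this assumed scheme, the query "Machine Learning" maps to data_collection/wiki_machine_learning.txt.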


Task 2 – Build Vector Database

Chunks the article text, generates embeddings, and stores them in FAISS.

python vector_db/build_vector_db.py

Chunking Strategy Justification:

  • Chunk size: 500 characters
  • Overlap: 50 characters

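Chunks of 500 characters are large enough to stay semantically complete yet small enough for precise retrieval. The 50-character overlap repeats the end of each chunk at the start of the next, which reduces context loss at chunk boundaries.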
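
A minimal sketch of character-based chunking with these settings is shown below. The function name chunk_text is hypothetical, and the actual logic in build_vector_db.py may differ (for example, it might split on sentence boundaries).

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character chunks that overlap their neighbours."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # with the defaults, each chunk starts 450 characters later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks
```

With the defaults, a 1,000-character text yields three chunks starting at offsets 0, 450, and 900, and the last 50 characters of each chunk reappear at the start of the next.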


Task 3 – Deploy ASR Service

Start the FastAPI ASR server.

uvicorn asr_service.app:app --reload

Available endpoints:

  • GET /health – Service health check
  • POST /transcribe – Upload WAV/FLAC audio and receive transcription

Interactive API docs:

http://127.0.0.1:8000/docs

Task 4 – Translation

Translation is handled internally via API calls. No separate deployment is required.


Task 5 – End‑to‑End RAG Pipeline

Ask a question using an audio file and receive a grounded answer.

python rag/rag_pipeline.py

Pipeline steps executed automatically:

  1. Audio → ASR endpoint
  2. Text → English translation
  3. Query embedding & vector search
  4. Context‑aware answer generation
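
The four steps above can be sketched as a single function that takes each stage as a callable. This is an illustrative structure only: answer_from_audio and its parameter names are hypothetical, and rag_pipeline.py may wire the stages together differently.

```python
from typing import Callable, List


def answer_from_audio(
    audio_path: str,
    transcribe: Callable[[str], str],
    translate: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> str:
    """Run the four pipeline steps in order, passing each result to the next stage."""
    text = transcribe(audio_path)      # 1. audio -> ASR endpoint
    english = translate(text)          # 2. text -> English translation
    context = retrieve(english)        # 3. query embedding and vector search
    return generate(english, context)  # 4. context-aware answer generation
```

Passing the stages in as arguments keeps each one replaceable, so a stage can be swapped for a stub in tests without touching the others.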

Bonus Task – UI

Launch the interactive voice‑enabled chatbot UI.

python ui/app.py

Features:

  • Audio upload
  • Chat‑style response display
  • End‑to‑end pipeline integration

Error Handling & Edge Cases

  • Unsupported audio formats rejected at API level
  • Temporary file cleanup ensured
  • API failures handled with meaningful messages
  • Environment variables validated before runtime
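
Two of these checks can be sketched with the standard library alone. The helper names is_supported_audio and require_env are hypothetical, and the extension-based check is an assumption; the service itself may validate uploads differently, for example by content type.

```python
import os
from pathlib import Path

# Formats named for POST /transcribe in this README.
ALLOWED_AUDIO_SUFFIXES = {".wav", ".flac"}


def is_supported_audio(filename):
    """Return True when the filename ends in an accepted audio extension."""
    return Path(filename).suffix.lower() in ALLOWED_AUDIO_SUFFIXES


def require_env(name, environ=None):
    """Return the value of a required environment variable or raise a clear error."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

Calling require_env("SARVAM_API_KEY") at startup would surface a missing key before any request reaches the translation step.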

Observations & Challenges Faced

  • Large ASR models on Windows may raise filesystem permission errors due to symbolic link restrictions in the local cache.
  • This is a known OS‑level constraint and does not affect correctness of the code or its execution on Linux‑based systems.
  • The codebase is fully compatible with GPU‑enabled Linux servers, which are typically used in production and evaluation environments.

Design & Best Practices Followed

  • Clear separation of concerns across modules
  • Extensive inline comments and docstrings
  • Single‑responsibility functions
  • Configurable and replaceable components
  • Clean FastAPI documentation

License & Usage

This repository is open-sourced under the MIT License.

The code has been developed as part of a technical evaluation and is intended for learning, experimentation, and research demonstration purposes.


Thank you for reviewing this submission.
