This repository presents an end-to-end, voice-enabled conversational system designed to answer user queries grounded in external knowledge sources. The system follows a modular, production-oriented design and covers the full pipeline from speech input to answer generation, with an optional user interface layer. The implementation is provided solely for technical evaluation purposes and adheres to the stated guidelines.
- Audio Input (User speaks a question)
- ASR Service (FastAPI) – Transcribes speech to text
- Translation Module – Translates text to English via API call
- Vector Database (FAISS) – Retrieves relevant knowledge chunks
- RAG Pipeline – Combines query + retrieved context
- Answer Output (Text response)
- Bonus UI – Simple chat‑style interface for interaction
Each component is independently modular, testable, and replaceable.
rag-voice-bot/
│
├── asr_service/ # Task 3: ASR deployment
│ ├── __init__.py
│ ├── app.py # FastAPI ASR service with docs & validation
│ └── asr_model.py # ASR model wrapper
│
├── data_collection/ # Task 1: Wikipedia data collection
│ ├── wiki_scraper.py # CLI script to fetch & store article text
│ └── wiki_machine_learning.txt
│
├── vector_db/ # Task 2: Vector database creation
│ ├── build_vector_db.py # Chunking + embeddings + FAISS index
│ ├── faiss.index
│ └── chunks.pkl
│
├── translation/ # Task 4: Translation
│ ├── __init__.py
│ └── sarvam_translate.py # API‑based translation to English
│
├── rag/ # Task 5: RAG pipeline
│ └── rag_pipeline.py
│
├── ui/ # Bonus task: Simple UI
│ └── app.py # Gradio chat interface
│
├── requirements.txt
├── .env.example
├── README.md
git clone <repository_url>
cd rag-voice-bot
python -m venv venv
source venv/bin/activate # Linux / macOS
venv\Scripts\activate # Windows
pip install -r requirements.txt
Create a .env file using the template below:
SARVAM_API_KEY=your_api_key_here
Fetch and store the closest Wikipedia article for a given topic.
python data_collection/wiki_scraper.py --query "Machine Learning"
Output: A cleaned .txt file containing article content.
Chunks the article text, generates embeddings, and stores them in FAISS.
python vector_db/build_vector_db.py
Chunking Strategy Justification:
- Chunk size: 500 characters
- Overlap: 50 characters
This balances semantic completeness with retrieval precision and avoids context loss at chunk boundaries.
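The chunking parameters above can be illustrated with a minimal sketch. It assumes a plain fixed-size sliding window over characters; the function name `chunk_text` is hypothetical, and the actual logic in build_vector_db.py may differ (for example, by splitting on sentence boundaries).

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper: defaults mirror the 500/50 values stated above.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks
```

With these defaults, consecutive chunks share 50 characters, so a sentence cut at one boundary still appears intact in the neighbouring chunk.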
Start the FastAPI ASR server.
uvicorn asr_service.app:app --reload
Available endpoints:
- GET /health – Service health check
- POST /transcribe – Upload WAV/FLAC audio and receive transcription
Interactive API docs:
http://127.0.0.1:8000/docs
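Before sending audio, a client may want to confirm that the ASR server is up. The sketch below polls GET /health using only the standard library. The function name `asr_is_healthy` is hypothetical, and the check relies only on an HTTP 200 status because the response body of /health is not documented here.

```python
import http.client
import urllib.error
import urllib.request


def asr_is_healthy(base_url: str = "http://127.0.0.1:8000", timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200.

    Hypothetical helper: the default base_url is the local uvicorn address
    used above; any connection or HTTP error is reported as "not healthy".
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers refused connections, timeouts, and non-2xx responses.
        return False
```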
Translation is handled internally via API calls. No separate deployment is required.
Ask a question using an audio file and receive a grounded answer.
python rag/rag_pipeline.py
Pipeline steps executed automatically:
- Audio → ASR endpoint
- Text → English translation
- Query embedding & vector search
- Context‑aware answer generation
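The steps above can be sketched as a single function that receives each stage as a callable. This is an illustrative outline, not the code in rag_pipeline.py: all names are hypothetical, the prompt wording is an assumption, and a brute-force cosine search in pure Python stands in for the FAISS index lookup.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: Vector, chunk_vecs: Sequence[Vector], k: int = 3) -> list[int]:
    """Indices of the k most similar chunks (stand-in for a FAISS search)."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]


def build_prompt(question: str, contexts: Sequence[str]) -> str:
    """Combine retrieved context and the question (wording is an assumption)."""
    context_block = "\n\n".join(contexts)
    return f"Context:\n{context_block}\n\nQuestion: {question}\nAnswer:"


def answer_from_audio(audio_path: str,
                      transcribe: Callable[[str], str],
                      translate: Callable[[str], str],
                      embed: Callable[[str], Vector],
                      chunks: Sequence[str],
                      chunk_vecs: Sequence[Vector],
                      generate: Callable[[str], str],
                      k: int = 3) -> str:
    """Audio -> ASR -> translation -> retrieval -> answer generation."""
    text = transcribe(audio_path)              # Audio -> ASR endpoint
    english = translate(text)                  # Text -> English translation
    hits = top_k(embed(english), chunk_vecs, k)  # Query embedding & vector search
    prompt = build_prompt(english, [chunks[i] for i in hits])
    return generate(prompt)                    # Context-aware answer generation
```

Passing the stages in as callables keeps each one replaceable and lets the flow be exercised with stubs, without a running ASR server or API key.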
Launch the interactive voice‑enabled chatbot UI.
python ui/app.py
Features:
- Audio upload
- Chat‑style response display
- End‑to‑end pipeline integration
Error handling:
- Unsupported audio formats rejected at API level
- Temporary file cleanup ensured
- API failures handled with meaningful messages
- Environment variables validated before runtime
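A minimal sketch of how checks like these might look, using only the standard library. The helper names are hypothetical and the repository's own implementation may differ; the WAV/FLAC set and the `SARVAM_API_KEY` variable come from the sections above.

```python
import os
import tempfile
from pathlib import Path

# Formats the /transcribe endpoint is described as accepting.
SUPPORTED_AUDIO_EXTENSIONS = {".wav", ".flac"}


def is_supported_audio(filename: str) -> bool:
    """Check the file extension case-insensitively (hypothetical helper)."""
    return Path(filename).suffix.lower() in SUPPORTED_AUDIO_EXTENSIONS


def require_env(names, environ=None) -> dict:
    """Fail early with a readable message if any variable is missing or empty."""
    env = os.environ if environ is None else environ
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}


def with_temp_copy(data: bytes, suffix: str, handler):
    """Write data to a temporary file, run handler(path), and always delete the file."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        return handler(path)
    finally:
        if os.path.exists(path):
            os.remove(path)
```

For example, calling `require_env(["SARVAM_API_KEY"])` at startup reports a missing key before any request is made.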
Known limitations:
- Large ASR models on Windows may raise filesystem permission errors due to symbolic link restrictions in the local cache.
- This is a known OS‑level constraint and does not affect correctness of the code or its execution on Linux‑based systems.
- The codebase is fully compatible with GPU‑enabled Linux servers, which are typically used in production and evaluation environments.
Code quality:
- Clear separation of concerns across modules
- Extensive inline comments and docstrings
- Single‑responsibility functions
- Configurable and replaceable components
- Clean FastAPI documentation
This repository is open-sourced under the MIT License.
The code has been developed as part of a technical evaluation and is intended for learning, experimentation, and research demonstration purposes.
Thank you for reviewing this submission.