🎙️ Vaani RAG — Multi-Language Voice Agent

A production-grade, vectorless RAG-powered voice agent supporting Hindi, English, Hinglish, Bengali, Marathi & Bhojpuri.

Features • Architecture • Installation • Usage • Configuration • Contributing • License

📌 Overview

Vaani RAG is a multilingual voice-enabled Retrieval-Augmented Generation (RAG) system built for Indian languages. Unlike traditional RAG systems that rely on vector databases, Vaani RAG uses a Vectorless RAG approach powered by an LLM Tree Index — making it simpler to set up, cheaper to run, and more context-aware.

Ask questions about your PDF documents using your voice in any of 6 supported languages, and get accurate, sourced answers in the same language — spoken back to you.

✨ Features

Feature	Description
🎤 Voice Input	Speak your question using your microphone
🔊 Voice Output	Hear the answer via Text-to-Speech
🌍 6 Languages	Hindi, English, Hinglish, Bengali, Marathi, Bhojpuri
📄 PDF Upload	Upload any PDF at runtime or use pre-loaded docs
🌳 Vectorless RAG	JSON Tree Index — no Vector DB required
⚡ Groq Powered	Ultra-fast LLM inference (sub-second responses)
📊 Confidence Score	Know how reliable each answer is
🌳 Tree Visualization	See the document structure in the sidebar
⚡ Cache System	Built trees are cached, so a document is not reprocessed on later queries
🎨 Dark UI	Clean, modern dark-themed Streamlit interface

🏗️ Architecture

Why Vectorless RAG?

Traditional RAG requires a heavy pipeline:

Document → Chunking → Embedding Model → Vector DB → Similarity Search → LLM → Answer

Vaani RAG simplifies this to:

Document → LLM Tree Builder → JSON Index → LLM Traversal → Answer

Benefits:

✅ No Vector DB setup or maintenance
✅ No embedding model or costs
✅ Better context preservation (no arbitrary chunking)
✅ Human-like document navigation
✅ Cached JSON tree for repeated queries
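To make the idea of navigating a tree instead of searching vectors concrete, here is a minimal sketch. It is not the code in tree_builder.py: the `TreeNode` fields, the `traverse` function, and the sample document are all made up for illustration, and the `keyword_overlap` scorer is a plain-Python stand-in for the step where an LLM would judge which branch is most relevant.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TreeNode:
    """One section of a document: a title, a short summary, and its text."""
    title: str
    summary: str
    text: str = ""
    children: list["TreeNode"] = field(default_factory=list)


def keyword_overlap(query: str, node: TreeNode) -> float:
    """Stand-in scorer: counts query words found in the title and summary.

    In an LLM-driven traversal this would be a model call rating relevance.
    """
    words = set(query.lower().split())
    haystack = f"{node.title} {node.summary}".lower()
    return float(sum(1 for word in words if word in haystack))


def traverse(root: TreeNode, query: str,
             score: Callable[[str, TreeNode], float] = keyword_overlap) -> TreeNode:
    """Walk from the root to a leaf, following the best-scoring child each step."""
    node = root
    while node.children:
        node = max(node.children, key=lambda child: score(query, child))
    return node


# Hypothetical two-section document, used only to exercise the traversal.
doc = TreeNode("Leave Policy", "HR rules on leave", children=[
    TreeNode("Sick Leave", "medical leave rules", "Up to 12 sick days per year."),
    TreeNode("Travel", "travel reimbursement rules", "Economy fares are reimbursed."),
])

best = traverse(doc, "how many sick days")
print(best.title, "->", best.text)
```

Because the scorer is passed in as a function, the same traversal loop could be pointed at an LLM-backed scorer without changing the tree structure.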

System Flow

┌─────────────────────────────────────────────────────────┐
│                    INGESTION PIPELINE                    │
│  PDF File → PDF Parser → Plain Text → LLM Tree Builder  │
│                              ↓                          │
│                    JSON Tree Index (cached)              │
└─────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────┐
│                     QUERY PIPELINE                       │
│  User Voice/Text → Language Detection → Query Text      │
│       ↓                                                  │
│  LLM Traverses Tree → Relevant Nodes Selected           │
│       ↓                                                  │
│  Groq LLM → Multilingual Answer → TTS Output            │
└─────────────────────────────────────────────────────────┘
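The "JSON Tree Index (cached)" step could work roughly as follows. This is a sketch under assumptions, not the project's actual cache code: it assumes the cache key is a SHA-256 hash of the PDF bytes and that the tree is a plain JSON-serializable dict. `load_or_build_tree` and its `build` callback are hypothetical names.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def load_or_build_tree(pdf_bytes: bytes, cache_dir: Path,
                       build: Callable[[bytes], dict]) -> dict:
    """Return the cached tree for this PDF, building and saving it on a miss."""
    # Hashing the file content means a renamed copy of the same PDF still hits.
    key = hashlib.sha256(pdf_bytes).hexdigest()
    cache_file = cache_dir / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))

    tree = build(pdf_bytes)  # the expensive LLM tree-building step
    cache_dir.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps Indic-script text readable in the cache file.
    cache_file.write_text(json.dumps(tree, ensure_ascii=False), encoding="utf-8")
    return tree
```

Keying on content rather than filename also means an edited PDF gets a fresh tree automatically, at the cost of leaving the old entry on disk.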

📁 Project Structure

vaani-rag/
├── 📄 app.py                        # Main Streamlit entry point
├── 📄 requirements.txt              # Python dependencies
├── 📄 .env.example                  # Environment variable template
├── 📄 .gitignore                    # Git ignore rules
├── 📄 README.md                     # Project documentation
├── 📄 CHANGELOG.md                  # Version history
├── 📄 CONTRIBUTING.md               # Contribution guidelines
├── 📄 LICENSE                       # MIT License
│
├── 📁 src/
│   ├── 📁 core/
│   │   ├── __init__.py
│   │   ├── pdf_parser.py            # PDF → Text bridge (PyMuPDF)
│   │   ├── tree_builder.py          # Vectorless RAG — JSON Tree
│   │   ├── rag_pipeline.py          # LangChain RAG pipeline
│   │   └── groq_client.py           # Groq API wrapper
│   │
│   ├── 📁 ui/
│   │   ├── __init__.py
│   │   ├── components.py            # Reusable Streamlit components
│   │   └── styles.py                # Custom dark theme CSS
│   │
│   └── 📁 utils/
│       ├── __init__.py
│       ├── language_detector.py     # 6-language auto detection
│       ├── voice_handler.py         # STT (Whisper) + TTS (gTTS)
│       └── logger.py                # Loguru logging setup
│
├── 📁 assets/
│   └── 📁 sample_docs/              # Pre-loaded sample PDF documents
│
├── 📁 tests/
│   ├── test_pdf_parser.py
│   ├── test_tree_builder.py
│   └── test_rag_pipeline.py
│
├── 📁 logs/                         # Auto-generated log files
├── 📁 .cache/trees/                 # Cached JSON tree indexes
└── 📁 .github/
    └── 📁 workflows/
        └── ci.yml                   # GitHub Actions CI pipeline

🚀 Installation

Prerequisites

Python 3.10 or higher
Groq API Key (available for free from Groq)

ffmpeg — Required for audio processing

# Windows (via Chocolatey)
choco install ffmpeg

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg
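If you want to verify both prerequisites before installing anything else, a small standard-library check along these lines may help. `check_prerequisites` is a hypothetical helper, not part of the repository.

```python
from __future__ import annotations

import shutil
import sys


def check_prerequisites(min_python: tuple[int, int] = (3, 10)) -> list[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ is required")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(f"Missing prerequisite: {problem}")
```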

Step-by-Step Setup

# 1. Clone the repository
git clone https://github.com/yourusername/vaani-rag.git
cd vaani-rag

# 2. Create and activate virtual environment
python -m venv venv

# Windows
venv\Scripts\activate

# macOS/Linux
source venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment variables
cp .env.example .env
# Open .env and add your GROQ_API_KEY

# 5. Launch the application
streamlit run app.py

🎯 Usage

Voice Mode

Click the 🎤 Record button in the chat interface
Speak your question in any supported language
Vaani automatically detects your language
Receive a spoken answer in the same language
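One cheap first step toward automatic language detection is to look at the script of the transcribed text. The sketch below is an illustration, not the logic in language_detector.py: `detect_script` is a hypothetical function that only narrows the candidates. Devanagari text could be Hindi, Marathi, or Bhojpuri, and Latin text could be English or Hinglish, so a real detector would need a further step (word lists or a model) to tell those apart.

```python
def detect_script(text: str) -> str:
    """Guess the dominant script of `text` from Unicode code point ranges."""
    counts = {"devanagari": 0, "bengali": 0, "latin": 0}
    for ch in text:
        code = ord(ch)
        if 0x0900 <= code <= 0x097F:      # Devanagari Unicode block
            counts["devanagari"] += 1
        elif 0x0980 <= code <= 0x09FF:    # Bengali Unicode block
            counts["bengali"] += 1
        elif ch.isascii() and ch.isalpha():
            counts["latin"] += 1
    best = max(counts, key=counts.get)
    # No letters from any known script (digits, punctuation, empty input).
    return best if counts[best] > 0 else "unknown"
```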

Text Mode

Type your question in the chat input box
Select language manually or leave on Auto Detect
Get an instant answer sourced from your documents

PDF Management

Runtime Upload — Drag and drop any PDF in the sidebar
Pre-loaded Documents — Select from the available sample documents
Tree View — Expand the sidebar to see the document's JSON tree structure

⚙️ Configuration

Copy .env.example to .env and configure:

# Groq API
GROQ_API_KEY=your_groq_api_key_here
GROQ_MODEL=llama3-70b-8192

# LLM Settings
MAX_TOKENS=1024
TEMPERATURE=0.1

# RAG Settings
DEFAULT_LANGUAGE=auto
TREE_CACHE_DIR=.cache/trees

# Voice Settings
STT_MODEL=base

# App Settings
DEBUG=false
LOG_LEVEL=INFO
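For reference, these variables could be read into a typed settings object as sketched below. This is an assumption about how an app might consume them, not the project's actual loader: `Settings` and `load_settings` are hypothetical names, and the fallback values simply mirror the template above. The function reads from the process environment, so a .env file would first have to be loaded into it (for example with a dotenv loader).

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    groq_model: str
    max_tokens: int
    temperature: float
    default_language: str
    tree_cache_dir: str
    stt_model: str
    debug: bool
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on a missing key."""
    api_key = env.get("GROQ_API_KEY", "")
    if not api_key:
        raise ValueError("GROQ_API_KEY is not set")
    return Settings(
        groq_api_key=api_key,
        groq_model=env.get("GROQ_MODEL", "llama3-70b-8192"),
        max_tokens=int(env.get("MAX_TOKENS", "1024")),
        temperature=float(env.get("TEMPERATURE", "0.1")),
        default_language=env.get("DEFAULT_LANGUAGE", "auto"),
        tree_cache_dir=env.get("TREE_CACHE_DIR", ".cache/trees"),
        stt_model=env.get("STT_MODEL", "base"),
        # Environment values are strings, so parse the boolean explicitly.
        debug=env.get("DEBUG", "false").strip().lower() == "true",
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing the mapping in as a parameter keeps the loader easy to test with a plain dict.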

🌍 Supported Languages

Language	Code	Voice Input	Voice Output
English	`en`	✅	✅
Hindi	`hi`	✅	✅
Hinglish	`hi-en`	✅	✅
Bengali	`bn`	✅	✅
Marathi	`mr`	✅	✅
Bhojpuri	`bho`	✅	✅
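Speech engines do not always offer a dedicated voice for every code in this table. The sketch below shows one way an app might pick a voice in that case. It is purely illustrative: the `TTS_FALLBACK` mapping (Hinglish and Bhojpuri falling back to a Hindi voice) and the `tts_language` helper are assumptions, not the behaviour of voice_handler.py.

```python
# Hypothetical fallbacks for codes a TTS engine may not support directly.
TTS_FALLBACK = {"hi-en": "hi", "bho": "hi"}


def tts_language(code: str, supported: set[str]) -> str:
    """Pick the TTS voice for `code`, falling back when it is unsupported."""
    if code in supported:
        return code
    fallback = TTS_FALLBACK.get(code)
    if fallback in supported:
        return fallback
    raise ValueError(f"No TTS voice available for language code {code!r}")


# Example: an engine that only ships these four voices.
voices = {"en", "hi", "bn", "mr"}
print(tts_language("bho", voices))
```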

🧪 Running Tests

# Run all tests
pytest

# Run with coverage report
pytest --cov=src tests/

# Run specific test file
pytest tests/test_pdf_parser.py -v

🤝 Contributing

Contributions are welcome! Please read CONTRIBUTING.md before submitting a PR.

# 1. Fork the repository
# 2. Create your feature branch
git checkout -b feature/your-feature-name

# 3. Commit your changes
git commit -m "feat: add your feature description"

# 4. Push to your branch
git push origin feature/your-feature-name

# 5. Open a Pull Request

Commit Convention

We follow Conventional Commits:

Prefix	Usage
`feat:`	New feature
`fix:`	Bug fix
`docs:`	Documentation update
`refactor:`	Code refactor
`test:`	Adding tests
`chore:`	Maintenance
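If you would like to check a message against these prefixes before committing, a small helper like the following could be wired into a local hook. It is a sketch, not part of the repository: `is_conventional` is a hypothetical name, and the optional `(scope)` part is allowed because the Conventional Commits format permits it.

```python
import re

# Prefixes from the table above, an optional "(scope)", then ": description".
COMMIT_PATTERN = re.compile(
    r"^(feat|fix|docs|refactor|test|chore)(\([\w-]+\))?: \S.*"
)


def is_conventional(message: str) -> bool:
    """Return True if the first line of `message` follows the convention."""
    first_line = message.splitlines()[0] if message else ""
    return COMMIT_PATTERN.match(first_line) is not None
```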

📋 Roadmap

📝 License

This project is licensed under the MIT License — see the LICENSE file for details.

🙏 Acknowledgements

Groq — Lightning-fast LLM inference
LangChain — RAG pipeline framework
Streamlit — Python-native UI framework
OpenAI Whisper — Accurate multilingual STT
PyMuPDF — Fast PDF parsing
gTTS — Google Text-to-Speech

Built with ❤️ by Ashutosh for learning production-grade AI systems

⭐ Star this repo if you found it helpful!
