A production-grade, vectorless RAG-powered voice agent supporting Hindi, English, Hinglish, Bengali, Marathi & Bhojpuri.
Features • Architecture • Installation • Usage • Configuration • Contributing • License
Vaani RAG is a multilingual, voice-enabled Retrieval-Augmented Generation (RAG) system built for Indian languages. Unlike traditional RAG systems that rely on vector databases, Vaani RAG uses a vectorless RAG approach powered by an LLM tree index, making it simpler to set up, cheaper to run, and more context-aware.
Ask questions about your PDF documents using your voice in any of the 6 supported languages, and get accurate, sourced answers in the same language, spoken back to you.
| Feature | Description |
|---|---|
| Voice Input | Speak your question using your microphone |
| Voice Output | Hear the answer via Text-to-Speech |
| 6 Languages | Hindi, English, Hinglish, Bengali, Marathi, Bhojpuri |
| PDF Upload | Upload any PDF at runtime or use pre-loaded docs |
| Vectorless RAG | JSON tree index, no vector DB required |
| Groq Powered | Ultra-fast LLM inference (sub-second responses) |
| Confidence Score | Know how reliable each answer is |
| Tree Visualization | See the document structure in the sidebar |
| Cache System | Built trees are cached, so documents are not reprocessed |
| Dark UI | Clean, modern dark-themed Streamlit interface |
Traditional RAG requires a heavy pipeline:
Document → Chunking → Embedding Model → Vector DB → Similarity Search → LLM → Answer
Vaani RAG simplifies this to:
Document → LLM Tree Builder → JSON Index → LLM Traversal → Answer
Benefits:
- No vector DB setup or maintenance
- No embedding model or embedding costs
- Better context preservation (no arbitrary chunking)
- Human-like document navigation
- Cached JSON tree for repeated queries
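To make the tree-based flow concrete, here is a minimal sketch of a JSON-serializable tree and a top-down traversal. It is not the project's actual implementation: the `Node` fields, the `traverse` function, and the sample document are hypothetical, and a simple keyword-overlap score stands in for the LLM's choice of which branch to follow.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One document section in a (hypothetical) tree index."""
    title: str
    summary: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def keyword_overlap(question: str, node: Node) -> float:
    """Toy stand-in for the LLM relevance judgement: count shared words."""
    q_words = set(question.lower().split())
    n_words = set(f"{node.title} {node.summary}".lower().split())
    return float(len(q_words & n_words))


def traverse(root: Node, question: str,
             choose: Callable[[str, Node], float] = keyword_overlap) -> Node:
    """Walk from the root to a leaf, following the best-scoring child each step."""
    node = root
    while node.children:
        node = max(node.children, key=lambda child: choose(question, child))
    return node


# Made-up sample document, for illustration only.
tree = Node(
    title="Leave Policy",
    summary="company leave rules",
    children=[
        Node("Sick Leave", "rules for sick leave and medical certificates",
             text="Employees get 12 sick days per year."),
        Node("Travel", "rules for travel reimbursement",
             text="Economy fares are reimbursed."),
    ],
)

index_json = json.dumps(asdict(tree), ensure_ascii=False)  # the "JSON index"
leaf = traverse(tree, "how many sick leave days do I get")
print(leaf.title, "->", leaf.text)
```

In a real pipeline the `choose` callable would wrap an LLM call that reads each child's title and summary; swapping it in does not change the traversal loop.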
Ingestion pipeline:
PDF File → PDF Parser → Plain Text → LLM Tree Builder → JSON Tree Index (cached)

The cached JSON tree index then feeds the query pipeline.

Query pipeline:
User Voice/Text → Language Detection → Query Text → LLM Traverses Tree → Relevant Nodes Selected → Groq LLM → Multilingual Answer → TTS Output
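The caching step in the ingestion pipeline can be sketched as follows. This is an assumption about how such a cache might work, not the project's code: the tree is keyed by a SHA-256 hash of the PDF bytes, and `cache_path`, `load_or_build`, and `fake_builder` are hypothetical names. The builder is passed in as a callable so the example runs without an LLM.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable


def cache_path(pdf_bytes: bytes, cache_dir: Path) -> Path:
    """Derive a cache file name from the document content, not its file name."""
    digest = hashlib.sha256(pdf_bytes).hexdigest()
    return cache_dir / f"{digest}.json"


def load_or_build(pdf_bytes: bytes, cache_dir: Path,
                  build: Callable[[bytes], dict]) -> dict:
    """Return the cached tree if present; otherwise build it and store it."""
    path = cache_path(pdf_bytes, cache_dir)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    tree = build(pdf_bytes)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(tree, ensure_ascii=False), encoding="utf-8")
    return tree


calls = []  # records how many times the (expensive) builder actually runs


def fake_builder(data: bytes) -> dict:
    """Placeholder for the LLM tree builder."""
    calls.append(1)
    return {"title": "demo", "children": []}


with tempfile.TemporaryDirectory() as tmp:
    first = load_or_build(b"%PDF-demo", Path(tmp), fake_builder)
    second = load_or_build(b"%PDF-demo", Path(tmp), fake_builder)

print(len(calls))  # the builder runs only for the first request
```

Hashing the content means a renamed copy of the same PDF reuses the cached tree, while an edited PDF gets a fresh one.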
vaani-rag/
├── app.py                        # Main Streamlit entry point
├── requirements.txt              # Python dependencies
├── .env.example                  # Environment variable template
├── .gitignore                    # Git ignore rules
├── README.md                     # Project documentation
├── CHANGELOG.md                  # Version history
├── CONTRIBUTING.md               # Contribution guidelines
├── LICENSE                       # MIT License
│
├── src/
│   ├── core/
│   │   ├── __init__.py
│   │   ├── pdf_parser.py         # PDF-to-text bridge (PyMuPDF)
│   │   ├── tree_builder.py       # Vectorless RAG: JSON tree builder
│   │   ├── rag_pipeline.py       # LangChain RAG pipeline
│   │   └── groq_client.py        # Groq API wrapper
│   │
│   ├── ui/
│   │   ├── __init__.py
│   │   ├── components.py         # Reusable Streamlit components
│   │   └── styles.py             # Custom dark theme CSS
│   │
│   └── utils/
│       ├── __init__.py
│       ├── language_detector.py  # 6-language auto detection
│       ├── voice_handler.py      # STT (Whisper) + TTS (gTTS)
│       └── logger.py             # Loguru logging setup
│
├── assets/
│   └── sample_docs/              # Pre-loaded sample PDF documents
│
├── tests/
│   ├── test_pdf_parser.py
│   ├── test_tree_builder.py
│   └── test_rag_pipeline.py
│
├── logs/                         # Auto-generated log files
├── .cache/trees/                 # Cached JSON tree indexes
└── .github/
    └── workflows/
        └── ci.yml                # GitHub Actions CI pipeline
- Python 3.10 or higher
- Groq API key (free to obtain)
- ffmpeg, required for audio processing
# Windows (via Chocolatey)
choco install ffmpeg
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt install ffmpeg
# 1. Clone the repository
git clone https://github.com/yourusername/vaani-rag.git
cd vaani-rag
# 2. Create and activate virtual environment
python -m venv venv
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Configure environment variables
cp .env.example .env
# Open .env and add your GROQ_API_KEY
# 5. Launch the application
streamlit run app.py
- Click the Record button in the chat interface
- Speak your question in any supported language
- Vaani automatically detects your language
- Receive a spoken answer in the same language
- Type your question in the chat input box
- Select language manually or leave on Auto Detect
- Get an instant answer sourced from your documents
- Runtime Upload: Drag and drop any PDF in the sidebar
- Pre-loaded Documents: Select from the available sample documents
- Tree View: Expand the sidebar to see the document's JSON tree structure
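The automatic language detection mentioned in the usage steps could start from a script check like the one below. This is only a sketch under stated assumptions, not the logic in `language_detector.py`: `detect_script` and `CANDIDATES` are hypothetical names, and script alone cannot separate Hindi, Marathi, and Bhojpuri (all written in Devanagari) or English from romanized Hinglish, so a real detector would need a second step.

```python
def detect_script(text: str) -> str:
    """Return the dominant script in the text, based on Unicode block ranges."""
    counts = {"devanagari": 0, "bengali": 0, "latin": 0}
    for ch in text:
        code = ord(ch)
        if 0x0900 <= code <= 0x097F:      # Devanagari block
            counts["devanagari"] += 1
        elif 0x0980 <= code <= 0x09FF:    # Bengali block
            counts["bengali"] += 1
        elif ch.isascii() and ch.isalpha():
            counts["latin"] += 1
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "unknown"


# Language codes that remain possible for each script (codes from this README).
CANDIDATES = {
    "devanagari": ["hi", "mr", "bho"],
    "bengali": ["bn"],
    "latin": ["en", "hi-en"],
    "unknown": [],
}

for sample in ["नमस्ते", "আমি ভালো আছি", "kya haal hai"]:
    script = detect_script(sample)
    print(sample, "->", script, CANDIDATES[script])
```

A script check is cheap and narrows the choice before any model is involved; the remaining ambiguity within a script is where vocabulary-based or LLM-based detection would come in.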
Copy .env.example to .env and configure:
# Groq API
GROQ_API_KEY=your_groq_api_key_here
GROQ_MODEL=llama3-70b-8192
# LLM Settings
MAX_TOKENS=1024
TEMPERATURE=0.1
# RAG Settings
DEFAULT_LANGUAGE=auto
TREE_CACHE_DIR=.cache/trees
# Voice Settings
STT_MODEL=base
# App Settings
DEBUG=false
LOG_LEVEL=INFO

| Language | Code | Voice Input | Voice Output |
|---|---|---|---|
| English | en | Yes | Yes |
| Hindi | hi | Yes | Yes |
| Hinglish | hi-en | Yes | Yes |
| Bengali | bn | Yes | Yes |
| Marathi | mr | Yes | Yes |
| Bhojpuri | bho | Yes | Yes |
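As an illustration of how the environment variables and language codes above fit together, the following sketch reads the settings with the documented defaults and rejects an unsupported `DEFAULT_LANGUAGE`. The `Settings` class and `load_settings` function are hypothetical and only show one possible way to consume the `.env` values; the application may load them differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# "auto" plus the six codes from the language table.
SUPPORTED_LANGUAGES = {"auto", "en", "hi", "hi-en", "bn", "mr", "bho"}


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    groq_model: str
    max_tokens: int
    temperature: float
    default_language: str
    tree_cache_dir: str
    stt_model: str
    debug: bool
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, using the README defaults."""
    language = env.get("DEFAULT_LANGUAGE", "auto")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported DEFAULT_LANGUAGE: {language!r}")
    return Settings(
        groq_api_key=env.get("GROQ_API_KEY", ""),
        groq_model=env.get("GROQ_MODEL", "llama3-70b-8192"),
        max_tokens=int(env.get("MAX_TOKENS", "1024")),
        temperature=float(env.get("TEMPERATURE", "0.1")),
        default_language=language,
        tree_cache_dir=env.get("TREE_CACHE_DIR", ".cache/trees"),
        stt_model=env.get("STT_MODEL", "base"),
        debug=env.get("DEBUG", "false").strip().lower() == "true",
        log_level=env.get("LOG_LEVEL", "INFO"),
    )


# Passing a plain dict instead of os.environ keeps the example reproducible.
settings = load_settings({"GROQ_API_KEY": "test-key", "TEMPERATURE": "0.3"})
print(settings.groq_model, settings.temperature, settings.default_language)
```

Validating the language code at startup surfaces a typo in `.env` immediately rather than at the first query.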
# Run all tests
pytest
# Run with coverage report
pytest --cov=src tests/
# Run specific test file
pytest tests/test_pdf_parser.py -v

Contributions are welcome! Please read CONTRIBUTING.md before submitting a PR.
# 1. Fork the repository
# 2. Create your feature branch
git checkout -b feature/your-feature-name
# 3. Commit your changes
git commit -m "feat: add your feature description"
# 4. Push to your branch
git push origin feature/your-feature-name
# 5. Open a Pull Request

We follow Conventional Commits:

| Prefix | Usage |
|---|---|
| feat: | New feature |
| fix: | Bug fix |
| docs: | Documentation update |
| refactor: | Code refactor |
| test: | Adding tests |
| chore: | Maintenance |
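If you want to check a commit message against the prefixes in the table before pushing, a small helper along these lines would do. It is a sketch, not part of the repository: `is_conventional` is a hypothetical name, and the optional `(scope)` after the prefix comes from the Conventional Commits convention rather than from this project's guidelines.

```python
import re

# Prefixes listed in the table above.
ALLOWED_PREFIXES = ("feat", "fix", "docs", "refactor", "test", "chore")

# prefix, optional "(scope)", then ": " and a non-empty description
_PATTERN = re.compile(
    r"^(" + "|".join(ALLOWED_PREFIXES) + r")(\([\w.-]+\))?: \S"
)


def is_conventional(message: str) -> bool:
    """Check only the first line (the subject) of a commit message."""
    lines = message.splitlines()
    return bool(lines) and bool(_PATTERN.match(lines[0]))


for msg in ["feat: add your feature description",
            "fix(parser): handle empty pages",
            "added some stuff"]:
    print(f"{msg!r}: {is_conventional(msg)}")
```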
- [x] Vectorless RAG with JSON Tree
- [x] 6 Indian language support
- [x] Voice Input + Output
- [x] PDF runtime upload
- [x] Confidence scoring
- [x] Cache system
- [ ] ElevenLabs TTS integration (English)
- [ ] Vapi voice agent integration
- [ ] Docker support
- [ ] REST API endpoint
- [ ] Support for DOCX, TXT files
This project is licensed under the MIT License. See the LICENSE file for details.
- Groq: Lightning-fast LLM inference
- LangChain: RAG pipeline framework
- Streamlit: Python-native UI framework
- OpenAI Whisper: Accurate multilingual STT
- PyMuPDF: Fast PDF parsing
- gTTS: Google Text-to-Speech
Built with ❤️ by Ashutosh for learning production-grade AI systems
⭐ Star this repo if you found it helpful!