ASHUTOSH-KUMAR-RAO/Vaani-Rag

🎙️ Vaani RAG: Multi-Language Voice Agent


A production-grade, vectorless RAG-powered voice agent supporting Hindi, English, Hinglish, Bengali, Marathi & Bhojpuri.

Features • Architecture • Installation • Usage • Configuration • Contributing • License


📌 Overview

Vaani RAG is a multilingual voice-enabled Retrieval-Augmented Generation (RAG) system built for Indian languages. Unlike traditional RAG systems that rely on vector databases, Vaani RAG uses a Vectorless RAG approach powered by an LLM Tree Index, making it simpler to set up, cheaper to run, and more context-aware.

Ask questions about your PDF documents using your voice in any of 6 supported languages, and get accurate, sourced answers in the same language, spoken back to you.


✨ Features

Feature Description
🎀 Voice Input Speak your question using your microphone
πŸ”Š Voice Output Hear the answer via Text-to-Speech
🌍 6 Languages Hindi, English, Hinglish, Bengali, Marathi, Bhojpuri
πŸ“„ PDF Upload Upload any PDF at runtime or use pre-loaded docs
🌳 Vectorless RAG JSON Tree Index β€” no Vector DB required
⚑ Groq Powered Ultra-fast LLM inference (sub-second responses)
πŸ“Š Confidence Score Know how reliable each answer is
🌳 Tree Visualization See the document structure in the sidebar
⚑ Cache System Rebuilt trees are cached β€” no reprocessing
🎨 Dark UI Clean, modern dark-themed Streamlit interface

🏗️ Architecture

Why Vectorless RAG?

Traditional RAG requires a heavy pipeline:

Document → Chunking → Embedding Model → Vector DB → Similarity Search → LLM → Answer

Vaani RAG simplifies this to:

Document → LLM Tree Builder → JSON Index → LLM Traversal → Answer

Benefits:

  • ✅ No Vector DB setup or maintenance
  • ✅ No embedding model or embedding costs
  • ✅ Better context preservation (no arbitrary chunking)
  • ✅ Human-like document navigation
  • ✅ Cached JSON tree for repeated queries
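To make the traversal idea concrete, here is a minimal sketch of selecting nodes from a JSON tree index. The node shape (`title`, `summary`, `text`, `children`), the sample tree, and the function names are all hypothetical, and `keyword_score` is a simple keyword-overlap stand-in for the LLM relevance call that the real pipeline would make. It is an illustration of the approach, not the code in `tree_builder.py`.

```python
from typing import Callable

# Hypothetical tree index: every node carries a title, a summary, and children.
TREE = {
    "title": "Employee Handbook",
    "summary": "Company policies",
    "children": [
        {
            "title": "Sick Leave",
            "summary": "Rules for medical leave",
            "text": "Employees get 12 sick days per year.",
            "children": [],
        },
        {
            "title": "Travel",
            "summary": "Reimbursement for business trips",
            "text": "Book travel through the portal.",
            "children": [],
        },
    ],
}


def keyword_score(question: str, node: dict) -> float:
    """Stand-in for an LLM relevance call: share of question words found in the node."""
    words = {w.strip("?.,!").lower() for w in question.split()}
    words.discard("")
    if not words:
        return 0.0
    haystack = f"{node['title']} {node['summary']}".lower()
    return sum(w in haystack for w in words) / len(words)


def select_nodes(
    node: dict,
    question: str,
    score: Callable[[str, dict], float] = keyword_score,
    top_k: int = 1,
) -> list:
    """Walk down from the root, following the top_k best-scoring children per level."""
    children = node.get("children", [])
    if not children:
        return [node]  # a leaf holds the text passed to the answering LLM
    ranked = sorted(children, key=lambda c: score(question, c), reverse=True)
    selected = []
    for child in ranked[:top_k]:
        selected.extend(select_nodes(child, question, score, top_k))
    return selected
```

Calling `select_nodes(TREE, "How many sick leave days?")` returns the "Sick Leave" leaf. Swapping `score` for a function that asks an LLM to rate each child would turn the same walk into LLM-driven traversal.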

System Flow

┌──────────────────────────────────────────────────────────┐
│                    INGESTION PIPELINE                    │
│  PDF File → PDF Parser → Plain Text → LLM Tree Builder   │
│                              ↓                           │
│                 JSON Tree Index (cached)                 │
└──────────────────────────────────────────────────────────┘
                               ↓
┌──────────────────────────────────────────────────────────┐
│                      QUERY PIPELINE                      │
│  User Voice/Text → Language Detection → Query Text       │
│       ↓                                                  │
│  LLM Traverses Tree → Relevant Nodes Selected            │
│       ↓                                                  │
│  Groq LLM → Multilingual Answer → TTS Output             │
└──────────────────────────────────────────────────────────┘
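The "JSON Tree Index (cached)" step could be implemented along these lines. This is a sketch under assumptions: keying the cache on a SHA-256 hash of the PDF bytes, the function names, and the `build` callback are illustrative choices, not the project's confirmed design. Only the `.cache/trees` directory comes from this README.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cache_path(pdf_bytes: bytes, cache_dir: Path) -> Path:
    """Derive a stable file name from the document contents (assumed scheme)."""
    digest = hashlib.sha256(pdf_bytes).hexdigest()
    return cache_dir / f"{digest}.json"


def load_or_build_tree(
    pdf_bytes: bytes,
    build: Callable[[bytes], dict],
    cache_dir: Path = Path(".cache/trees"),
) -> dict:
    """Return the cached tree if present; otherwise build it once and store it."""
    path = cache_path(pdf_bytes, cache_dir)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    tree = build(pdf_bytes)  # the expensive LLM tree-building step
    cache_dir.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps Hindi, Bengali, and Marathi text readable on disk.
    path.write_text(json.dumps(tree, ensure_ascii=False), encoding="utf-8")
    return tree
```

Hashing the contents rather than the file name means a re-uploaded copy of the same PDF reuses the existing tree, while an edited PDF with the same name is rebuilt.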

📁 Project Structure

vaani-rag/
├── 📄 app.py                        # Main Streamlit entry point
├── 📄 requirements.txt              # Python dependencies
├── 📄 .env.example                  # Environment variable template
├── 📄 .gitignore                    # Git ignore rules
├── 📄 README.md                     # Project documentation
├── 📄 CHANGELOG.md                  # Version history
├── 📄 CONTRIBUTING.md               # Contribution guidelines
├── 📄 LICENSE                       # MIT License
│
├── 📁 src/
│   ├── 📁 core/
│   │   ├── __init__.py
│   │   ├── pdf_parser.py            # PDF → Text bridge (PyMuPDF)
│   │   ├── tree_builder.py          # Vectorless RAG: JSON Tree
│   │   ├── rag_pipeline.py          # LangChain RAG pipeline
│   │   └── groq_client.py           # Groq API wrapper
│   │
│   ├── 📁 ui/
│   │   ├── __init__.py
│   │   ├── components.py            # Reusable Streamlit components
│   │   └── styles.py                # Custom dark theme CSS
│   │
│   └── 📁 utils/
│       ├── __init__.py
│       ├── language_detector.py     # 6-language auto detection
│       ├── voice_handler.py         # STT (Whisper) + TTS (gTTS)
│       └── logger.py                # Loguru logging setup
│
├── 📁 assets/
│   └── 📁 sample_docs/              # Pre-loaded sample PDF documents
│
├── 📁 tests/
│   ├── test_pdf_parser.py
│   ├── test_tree_builder.py
│   └── test_rag_pipeline.py
│
├── 📁 logs/                         # Auto-generated log files
├── 📁 .cache/trees/                 # Cached JSON tree indexes
└── 📁 .github/
    └── 📁 workflows/
        └── ci.yml                   # GitHub Actions CI pipeline

🚀 Installation

Prerequisites

  • Python 3.10 or higher
  • Groq API key (free to obtain from Groq)
  • ffmpeg, required for audio processing
    # Windows (via Chocolatey)
    choco install ffmpeg
    
    # macOS
    brew install ffmpeg
    
    # Ubuntu/Debian
    sudo apt install ffmpeg

Step-by-Step Setup

# 1. Clone the repository
git clone https://github.com/ASHUTOSH-KUMAR-RAO/Vaani-Rag.git
cd Vaani-Rag

# 2. Create and activate virtual environment
python -m venv venv

# Windows
venv\Scripts\activate

# macOS/Linux
source venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment variables
cp .env.example .env
# Open .env and add your GROQ_API_KEY

# 5. Launch the application
streamlit run app.py

🎯 Usage

Voice Mode

  1. Click the 🎤 Record button in the chat interface
  2. Speak your question in any supported language
  3. Vaani automatically detects your language
  4. Receive a spoken answer in the same language
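Answering "in the same language" requires mapping the detected language code to a voice the TTS engine can actually speak. The sketch below shows one way to do that. The fallback table is an assumption: it presumes the TTS backend has no dedicated Hinglish or Bhojpuri voice and reuses the Hindi voice for both. `tts_language` is a hypothetical helper, not a function from `voice_handler.py`.

```python
# Language codes listed in this README's "Supported Languages" table.
SUPPORTED = {"en", "hi", "hi-en", "bn", "mr", "bho"}

# Assumed fallbacks for codes the TTS backend may not offer a voice for.
TTS_FALLBACK = {"hi-en": "hi", "bho": "hi"}


def tts_language(code: str) -> str:
    """Map a detected language code to the code passed to the TTS engine."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED:
        raise ValueError(f"Unsupported language code: {code!r}")
    return TTS_FALLBACK.get(normalized, normalized)
```

With gTTS, the result would be passed as the `lang` argument, for example `gTTS(text=answer, lang=tts_language("bho"))`.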

Text Mode

  1. Type your question in the chat input box
  2. Select language manually or leave on Auto Detect
  3. Get an instant answer sourced from your documents

PDF Management

  • Runtime Upload: Drag and drop any PDF in the sidebar
  • Pre-loaded Documents: Select from the available sample documents
  • Tree View: Expand the sidebar to see the document's JSON tree structure

⚙️ Configuration

Copy .env.example to .env and configure:

# Groq API
GROQ_API_KEY=your_groq_api_key_here
GROQ_MODEL=llama3-70b-8192

# LLM Settings
MAX_TOKENS=1024
TEMPERATURE=0.1

# RAG Settings
DEFAULT_LANGUAGE=auto
TREE_CACHE_DIR=.cache/trees

# Voice Settings
STT_MODEL=base

# App Settings
DEBUG=false
LOG_LEVEL=INFO
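A minimal sketch of how these variables might be read and validated at startup. The `Settings` class and `load_settings` function are hypothetical names introduced for illustration. The defaults mirror the template above, and the sketch reads from `os.environ`, so it assumes the `.env` file has already been loaded into the process environment by whatever mechanism the app uses.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    groq_model: str
    max_tokens: int
    temperature: float
    default_language: str
    tree_cache_dir: str
    stt_model: str
    debug: bool
    log_level: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build typed settings from environment variables, failing fast on a missing key."""
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "")
    if not key or key == "your_groq_api_key_here":
        raise RuntimeError("GROQ_API_KEY is not set; copy .env.example to .env and fill it in")
    return Settings(
        groq_api_key=key,
        groq_model=env.get("GROQ_MODEL", "llama3-70b-8192"),
        max_tokens=int(env.get("MAX_TOKENS", "1024")),
        temperature=float(env.get("TEMPERATURE", "0.1")),
        default_language=env.get("DEFAULT_LANGUAGE", "auto"),
        tree_cache_dir=env.get("TREE_CACHE_DIR", ".cache/trees"),
        stt_model=env.get("STT_MODEL", "base"),
        debug=env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Rejecting the placeholder value from `.env.example` catches the common mistake of copying the template without editing it.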

🌍 Supported Languages

Language Code Voice Input Voice Output
English en βœ… βœ…
Hindi hi βœ… βœ…
Hinglish hi-en βœ… βœ…
Bengali bn βœ… βœ…
Marathi mr βœ… βœ…
Bhojpuri bho βœ… βœ…
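A first step in auto-detecting among these languages is identifying the writing script, since Hindi, Marathi, and Bhojpuri share Devanagari while English and Hinglish are typically typed in Latin letters. The sketch below covers only that first step, using Unicode block ranges. It is not the logic in `language_detector.py`, and `detect_script` and `SCRIPT_CANDIDATES` are hypothetical names. Telling apart languages that share a script would need a further step, such as a vocabulary check or an LLM call.

```python
# Languages from the table above that are commonly written in each script.
SCRIPT_CANDIDATES = {
    "devanagari": ["hi", "mr", "bho"],
    "bengali": ["bn"],
    "latin": ["en", "hi-en"],
    "unknown": [],
}


def detect_script(text: str) -> str:
    """Return the dominant script of the text, based on Unicode code point ranges."""
    counts = {"devanagari": 0, "bengali": 0, "latin": 0}
    for ch in text:
        cp = ord(ch)
        if 0x0900 <= cp <= 0x097F:      # Devanagari block
            counts["devanagari"] += 1
        elif 0x0980 <= cp <= 0x09FF:    # Bengali block
            counts["bengali"] += 1
        elif ch.isascii() and ch.isalpha():
            counts["latin"] += 1
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "unknown"
```

For example, `SCRIPT_CANDIDATES[detect_script("নমস্কার")]` narrows the choice to Bengali alone, whereas Devanagari input still leaves three candidates.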

🧪 Running Tests

# Run all tests
pytest

# Run with coverage report
pytest --cov=src tests/

# Run specific test file
pytest tests/test_pdf_parser.py -v

🀝 Contributing

Contributions are welcome! Please read CONTRIBUTING.md before submitting a PR.

# 1. Fork the repository
# 2. Create your feature branch
git checkout -b feature/your-feature-name

# 3. Commit your changes
git commit -m "feat: add your feature description"

# 4. Push to your branch
git push origin feature/your-feature-name

# 5. Open a Pull Request

Commit Convention

We follow Conventional Commits:

| Prefix | Usage |
|---|---|
| feat: | New feature |
| fix: | Bug fix |
| docs: | Documentation update |
| refactor: | Code refactor |
| test: | Adding tests |
| chore: | Maintenance |
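If you want to check a commit message against the prefixes above before pushing, a small validator could look like the following. It is an illustrative sketch, not a tool shipped with this repository. It accepts only the six prefixes listed here plus an optional scope such as `fix(parser):`, which is a subset of what the Conventional Commits specification allows.

```python
import re

# Prefixes from the table above, with an optional "(scope)" before the colon.
COMMIT_PATTERN = re.compile(
    r"^(feat|fix|docs|refactor|test|chore)(\([\w.-]+\))?: \S"
)


def is_conventional(message: str) -> bool:
    """Check whether the first line of a commit message uses an accepted prefix."""
    lines = message.splitlines()
    if not lines:
        return False
    return bool(COMMIT_PATTERN.match(lines[0]))
```

Only the subject line is checked, so a multi-line message with a free-form body still passes as long as its first line is well formed.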

📋 Roadmap

  • Vectorless RAG with JSON Tree
  • 6 Indian language support
  • Voice Input + Output
  • PDF runtime upload
  • Confidence scoring
  • Cache system
  • ElevenLabs TTS integration (English)
  • Vapi voice agent integration
  • Docker support
  • REST API endpoint
  • Support for DOCX, TXT files

📝 License

This project is licensed under the MIT License. See the LICENSE file for details.


🙏 Acknowledgements

  • Groq: Lightning-fast LLM inference
  • LangChain: RAG pipeline framework
  • Streamlit: Python-native UI framework
  • OpenAI Whisper: Accurate multilingual STT
  • PyMuPDF: Fast PDF parsing
  • gTTS: Google Text-to-Speech

Built with ❤️ by Ashutosh for learning production-grade AI systems

⭐ Star this repo if you found it helpful!
