Multilingual legal information assistant for Indian law — ask questions about the Bharatiya Nyaya Sanhita (BNS), Indian Penal Code (IPC), and their mappings in 13 languages.
Built on Databricks Free Edition with FAISS RAG, Llama Maverick (AI Gateway), and Sarvam AI (translation, speech-to-text, text-to-speech). Deployed as a Databricks App via Gradio.
Not legal advice. General information only — consult a qualified lawyer for your specific situation.
Question (any of 13 languages)
→ Sarvam Mayura translates to English
→ FAISS semantic search (900+ legal text chunks)
→ Databricks Llama Maverick generates answer with citations
→ Sarvam Mayura translates back to selected language
→ Bilingual response (selected language + English) + sources
Supported languages: English, Hindi, Bengali, Kannada, Tamil, Telugu, Malayalam, Marathi, Gujarati, Odia, Punjabi, Assamese, Urdu.
Voice support: microphone input via Sarvam Saaras STT, answer read aloud via Sarvam Bulbul TTS.
| Document | Audience | What it covers |
|---|---|---|
| App User Guide | End users | How to use the app — language selection, asking questions, understanding responses, voice features |
| Developer Guide | Developers | Deploying the app, secrets/auth, Gradio + Databricks Apps pitfalls, translation pipeline, dependency pins |
| UI Design | Designers / developers | UI/UX spec and Sarvam pipeline design |
| Playground to App | Developers | Mapping Playground "Get code" to app env vars |
| Workspace Setup | Admins | Databricks secret scopes, GitHub Repos, key rotation |
| Benchmark Evaluation | Developers | RAG quality evaluation with BhashaBench-Legal + internal test questions |
| Plan | Team | Product plan and architecture decisions |
Open the app URL → select language → ask a question. See App User Guide.
# 1. Authenticate
databricks auth login --host https://dbc-6651e87a-25a5.cloud.databricks.com --profile free-aws
export DATABRICKS_CONFIG_PROFILE=free-aws
# 2. Store secrets
databricks secrets create-scope nyaya-dhwani
databricks secrets put-secret nyaya-dhwani sarvam_api_key
databricks secrets put-secret nyaya-dhwani hf_token # HuggingFace (for benchmark dataset)
# 3. Run notebooks (on a Databricks cluster)
# - notebooks/india_legal_policy_ingest.ipynb (ingest → legal_rag_corpus table)
# - notebooks/build_rag_index.ipynb (FAISS index → UC Volume)
# 4. Deploy the app
# Compute → Apps → Create → connect this Git repo → Deploy
# 5. Grant service principal permissions
# - CAN_QUERY on AI Gateway endpoint
# - READ on UC Volume main.india_legal.legal_files
# - READ on secret scope nyaya-dhwaniFull details: Developer Guide.
pip install -e ".[dev,rag,rag_embed,app]"
cp .env.example .env # fill in values
export $(grep -v '^#' .env | xargs)
python app/main.py| Path | Purpose |
|---|---|
app/main.py |
Gradio app (RAG + LLM + Sarvam multilingual) |
app.yaml |
Databricks Apps entry point + env config |
src/nyaya_dhwani/ |
Python package: embedder, retrieval, llm_client, sarvam_client |
notebooks/ |
Data ingestion + FAISS index build |
requirements.txt |
Databricks Apps pip install |
tests/ |
pytest suite |
docs/ |
All documentation (see table above) |
pip install -e ".[dev,rag]"
pytest tests/ -v| Component | Technology |
|---|---|
| LLM | Databricks Llama 4 Maverick (AI Gateway) |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| Vector search | FAISS (IndexFlatIP, cosine similarity) |
| Translation | Sarvam Mayura |
| Speech-to-text | Sarvam Saaras |
| Text-to-speech | Sarvam Bulbul |
| App framework | Gradio 4.44 on Databricks Apps |
| Data platform | Databricks (Unity Catalog, Volumes, Apps) |