Understand complex legal documents in seconds using AI.
Upload contracts, notices, agreements, or scanned legal files and receive intelligent legal insights, OCR-powered extraction, clause analysis, and AI-generated explanations.
- ✨ Features
- 🖼️ Screenshots
- 🛠️ Tech Stack
- 📂 Project Structure
- ⚙️ Installation & Setup
- 🧪 Frontend Validation
- 🔑 Environment Variables
- 📡 API Endpoints
- 🔍 OCR Workflow
- ⚠️ OCR Failure Protection
- 🤝 Contributing
- 🔒 Security & Disclaimer
- 🗺️ Future Roadmap
- 🐛 Troubleshooting
- ❤️ Contributors
- 📄 License
NyayaVanni intelligently analyzes legal documents and provides:
- Document type detection
- Key party identification
- Important dates extraction
- Simplified legal summaries
- Legal clause understanding
- Agreements
- Contracts
- Legal notices
- Consumer complaints
- Financial/legal documents
- Scanned PDFs and images
Automatically identifies:
- High-risk clauses
- Legal obligations
- Financial liabilities
- Penalty conditions
- Potential legal consequences
- Risk severity analysis
- Recommended actions
- Easy-to-understand explanations
Chat directly with uploaded legal documents.
- “What are my risks in this contract?”
- “What is the termination clause?”
- “Who is liable for damages?”
- “Summarize this document.”
- Gemini AI
- Context-aware legal retrieval
- RAG-based querying
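
As a rough illustration of the retrieval step behind RAG-based querying, the sketch below ranks document chunks against a question using bag-of-words cosine similarity. It is a simplified stand-in, not the project's implementation: NyayaVanni lists FAISS for vector search and Gemini for generation, and the names `tokenize`, `cosine`, and `top_chunks` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question, best first."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, tokenize(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Made-up chunks for illustration; a real system would split the uploaded document.
    sample_chunks = [
        "Termination requires thirty days written notice from either party.",
        "Rent must be paid on or before day five of each month.",
        "Disputes go to arbitration in Mumbai.",
    ]
    print(top_chunks("How much notice is required for termination?", sample_chunks, k=1))
```

In a full RAG flow, the retrieved chunks would be passed to the language model together with the user's question, so that answers stay grounded in the uploaded document.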
Supports OCR for:
- Scanned PDFs
- Images
- Low-quality legal scans
- Handwritten-friendly preprocessing
- PNG
- JPG
- JPEG
Extracts:
- Payment clauses
- Liability clauses
- Arbitration clauses
- Termination clauses
- Legal obligations
- Penalty sections
- Responsive design
- Clean dashboard
- Dark/Light mode support
- Beginner-friendly interface
- Fast document uploads
| Technology | Purpose |
|---|---|
| FastAPI | Backend framework |
| Gemini AI | AI-powered legal analysis |
| FAISS | Vector similarity search |
| PyMuPDF | PDF text extraction |
| Tesseract OCR | OCR engine |
| Pillow | Image preprocessing |
| Python | Core backend language |
| Technology | Purpose |
|---|---|
| React 19 | Frontend framework |
| Vite | Build tool |
| Tailwind CSS | Styling |
| Axios | API requests |
| Lucide React | Icons |
NyayaVanni/
│
├── .github/ # GitHub configuration
│ ├── ISSUE_TEMPLATE/ # Issue report templates
│ └── workflows/ # CI/CD GitHub Actions workflows
│
├── backend/ # FastAPI backend server
│ ├── api/ # API route definitions (routes.py)
│ ├── data/ # Static data and reference files
│ ├── models/ # Pydantic schemas (schemas.py, llm_schemas.py)
│ ├── scripts/ # Utility and manual test scripts
│ ├── services/ # Core business logic
│ │ ├── document_classifier.py # Legal document type classification
│ │ ├── gemini_service.py # Google Gemini AI integration
│ │ ├── knowledge_graph_service.py # Knowledge graph construction
│ │ ├── legal_processor.py # Legal document processing pipeline
│ │ ├── ocr_service.py # OCR text extraction
│ │ ├── rag_service.py # Retrieval-Augmented Generation
│ │ └── storage_service.py # File storage management
│ ├── uploads/ # Uploaded document storage (runtime)
│ └── main.py # FastAPI application entry point
│
├── frontend/ # React + Tailwind CSS frontend
│ ├── public/ # Static public assets
│ └── src/ # React source code
│   ├── assets/ # Images, icons, and static assets
│   ├── components/ # Reusable UI components
│   ├── contexts/ # React context providers (global state)
│   ├── hooks/ # Custom React hooks
│   ├── pages/ # Page-level components (routes)
│   └── utils/ # Helper utilities and API clients
│
├── designs/ # UI/UX design files and mockups
├── screenshots/ # Application screenshots for docs
├── tests/ # Backend integration tests
│
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── SECURITY.md
├── main.py # Top-level entry point
└── README.md
Before running the project, install:
- Python 3.10+
- Node.js 18+
- Git
- Tesseract OCR
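
A quick way to confirm the prerequisites before setup is a small script like the one below. It is only a sketch: it assumes the tools are exposed on `PATH` under their usual executable names (`node`, `git`, `tesseract`), it does not verify the Node.js version, and the helper `missing_tools` is a hypothetical name.

```python
import shutil
import sys


def missing_tools(tools: list[str]) -> list[str]:
    """Return the tools from the list that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Python is checked by interpreter version; the other tools are looked up on PATH.
    if sys.version_info < (3, 10):
        print("Python 3.10+ is required, found", sys.version.split()[0])
    for tool in missing_tools(["node", "git", "tesseract"]):
        print(f"Missing from PATH: {tool}")
```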
git clone https://github.com/<actual-owner>/NyayaVanni.git
cd NyayaVanni

cd backend

On Windows:
python -m venv venv
venv\Scripts\activate

On macOS/Linux:
python3 -m venv venv
source venv/bin/activate

pip install -r requirements.txt

Create .env
GEMINI_API_KEY=your_gemini_api_key

uvicorn main:app --reload

Backend runs at:
http://127.0.0.1:8000

cd frontend
npm install

Create .env
VITE_API_URL=http://127.0.0.1:8000

npm run dev

Frontend runs at:
http://localhost:5173
Run these commands from the frontend/ directory before submitting UI changes.
npm install
npm run lint
npm run build
npm run preview

Open the preview URL shown in the terminal and manually verify the touched UI flow.
The frontend currently does not define a dedicated unit or integration test script in frontend/package.json. Until a test runner is added, use npm run lint, npm run build, and a local preview smoke check as the required frontend validation path.
When a test runner is introduced, add the command to frontend/package.json and document it here, for example:
npm run test

GEMINI_API_KEY=your_api_key
VITE_API_URL=http://localhost:8000

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/upload | Upload legal document |
| POST | /api/analyze/{id} | Analyze uploaded document |
| POST | /api/chat | Chat with uploaded document |
| GET | /health | Health check |
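
The sketch below shows one way a Python client might build these endpoint URLs and probe the health check. It assumes the backend address from the setup steps. Request and response bodies are not documented here, so it deliberately stops at URL construction and a status check. The helper names `endpoint`, `analyze_url`, and `check_health` are hypothetical.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # backend address from the setup steps


def endpoint(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def analyze_url(document_id: str, base_url: str = BASE_URL) -> str:
    """Fill the {id} placeholder of /api/analyze/{id}, URL-encoding the id."""
    safe_id = urllib.parse.quote(document_id, safe="")
    return endpoint(f"/api/analyze/{safe_id}", base_url)


def check_health(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False on any network error."""
    try:
        with urllib.request.urlopen(endpoint("/health", base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    print(endpoint("/api/upload"))
    print(analyze_url("example-id"))  # "example-id" is a placeholder, not a real document id
```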
Document Upload
↓
PDF/Image Processing
↓
OCR/Text Extraction
↓
Text Validation
↓
AI Legal Analysis
↓
Risk Assessment & Clause Extraction
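
The ordering of these stages can be sketched as a small function that takes each stage as a callable and skips analysis when validation fails. This is an illustrative outline under assumptions, not the backend's code: `run_workflow`, its parameters, and the `"ok"` status value are hypothetical.

```python
from typing import Callable


def run_workflow(
    raw: bytes,
    extract_text: Callable[[bytes], str],
    is_readable: Callable[[str], bool],
    analyze: Callable[[str], dict],
) -> dict:
    """Run the stages in order and skip analysis when validation fails."""
    text = extract_text(raw)       # PDF/image processing and OCR/text extraction
    if not is_readable(text):      # text validation
        return {"status": "ocr_failed"}
    # AI legal analysis, risk assessment, and clause extraction happen only on valid text.
    return {"status": "ok", "analysis": analyze(text)}


if __name__ == "__main__":
    # Stub stages for illustration only.
    result = run_workflow(
        b"Termination requires thirty days notice.",
        extract_text=lambda raw: raw.decode("utf-8"),
        is_readable=lambda text: len(text.strip()) > 0,
        analyze=lambda text: {"word_count": len(text.split())},
    )
    print(result)
```

Passing the stages in as callables keeps the ordering logic testable without a real OCR engine or model behind it.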
NyayaVanni prevents AI hallucinations when OCR extraction fails.
Before running AI analysis, the backend checks for:
- Empty OCR output
- Extremely low text extraction
- Unreadable or corrupted documents
- Failed parsing attempts
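
A minimal sketch of what such checks could look like is shown below. The thresholds `MIN_CHARS` and `MIN_READABLE_RATIO`, and the function name `ocr_failure`, are assumptions chosen for illustration. Only the `ocr_failed` status and its message come from this README.

```python
from typing import Optional

MIN_CHARS = 50             # assumed threshold for "extremely low text extraction"
MIN_READABLE_RATIO = 0.6   # assumed share of letters, digits, and spaces in readable text

OCR_FAILED = {
    "status": "ocr_failed",
    "message": "Unable to extract readable text from the document.",
}


def ocr_failure(text: Optional[str]) -> Optional[dict]:
    """Return the fallback payload if the extracted text is unusable, else None."""
    if text is None:                       # parsing failed outright
        return dict(OCR_FAILED)
    stripped = text.strip()
    if not stripped:                       # empty OCR output
        return dict(OCR_FAILED)
    if len(stripped) < MIN_CHARS:          # too little text to analyze
        return dict(OCR_FAILED)
    readable = sum(ch.isalnum() or ch.isspace() for ch in stripped)
    if readable / len(stripped) < MIN_READABLE_RATIO:   # mostly symbols or noise
        return dict(OCR_FAILED)
    return None


if __name__ == "__main__":
    print(ocr_failure(""))
    print(ocr_failure("This agreement is made between the landlord and the tenant on 1 January 2024."))
```

A caller would run AI analysis only when `ocr_failure` returns `None`, and return the payload to the user otherwise.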
If OCR fails:
- AI legal analysis is stopped
- Fake legal sections are NOT generated
- Users receive a clean fallback message
{
  "status": "ocr_failed",
  "message": "Unable to extract readable text from the document."
}

We welcome contributions from developers and open-source enthusiasts.
git checkout -b fix/your-feature-name
git commit -m "fix: improve OCR validation flow"
git push origin fix/your-feature-name

| Type | Example |
|---|---|
| feat | feat: add OCR confidence validation |
| fix | fix: prevent fake legal analysis |
| docs | docs: improve README formatting |
| style | style: improve dashboard UI |
- Uploaded documents are processed securely
- Sensitive legal information should be handled carefully
- Avoid uploading confidential government/legal records publicly
NyayaVanni provides AI-generated legal assistance for educational and informational purposes only.
It does NOT replace:
- Professional legal consultation
- Certified legal advice
- Court-approved legal interpretation
Always consult a qualified legal professional for official legal decisions.
- Multi-language legal support
- Voice-based legal assistant
- PDF annotation support
- Case law recommendation engine
- Cloud document storage
- Advanced legal summarization
- AI-powered compliance checker
- Mobile application support
Make sure your backend .env contains:
GEMINI_API_KEY=your_api_key

Activate virtual environment first.
venv\Scripts\activate
Then run:
uvicorn main:app --reload

Install Tesseract OCR.
https://github.com/UB-Mannheim/tesseract/wiki
After installation, add Tesseract to system PATH.

Check frontend .env
VITE_API_URL=http://127.0.0.1:8000

Proudly contributing to:
- GirlScript Summer of Code (GSSoC)
- Open-source legal AI innovation
Heartiest thanks to all the brilliant minds helping shape NyayaVanni! Open source is all about collaboration, and this project wouldn't be where it is today without your invaluable contributions.
Big or small, your pull requests, issue reports, and feedback make a world of difference. Thank you for being a part of this journey! 💖
This project is licensed under the MIT License.
See the LICENSE file for more information.
If you found this project useful:
- ⭐ Star the repository
- 🍴 Fork the project
- 🐛 Report issues
- 🚀 Contribute improvements
Built with ❤️ using FastAPI, React, Gemini AI, and OCR Intelligence



