Live Demo: https://explainable-legal-rag.streamlit.app/
LegalRAG is an advanced Agentic AI system designed to act as an intelligent co-pilot for legal defense strategy. Unlike standard chatbots that simply "guess" answers or basic RAG systems that blindly retrieve text, LegalRAG uses a Reason+Act (ReAct) cognitive architecture.
It autonomously analyzes case files, identifies the correct jurisdiction (Civil vs. Military), selects the appropriate legal statutes using discrete tools, and generates a procedurally sound, step-by-step legal roadmap.
Powered by LangGraph, the system does not follow a linear script. It reasons about each step before it acts.
- Dynamic Routing: Automatically detects if a case involves Military Personnel and routes queries to the `Army Act` instead of the Civil Code.
- Chain of Thought: You can see the agent's reasoning process (e.g., "The user is asking about a court-martial. I should check Section 63 of the Army Act.").
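The routing decision can be illustrated with a small sketch. This is not the project's LangGraph code: in the real system the LLM makes the decision, whereas here a hypothetical keyword heuristic (`MILITARY_HINTS`, `route_query`) stands in for it so the control flow is easy to follow. The two tool names come from the agent's toolkit; everything else is an assumption.

```python
# Hypothetical cue words. The real agent relies on LLM reasoning, not a keyword list.
MILITARY_HINTS = ("court-martial", "army act", "commanding officer", "regiment")


def route_query(query: str) -> str:
    """Return the name of the statute tool a query should be sent to."""
    text = query.lower()
    if any(hint in text for hint in MILITARY_HINTS):
        # Military context detected: consult the Army Act, 1950.
        return "retrieve_from_army_code"
    # Default: consult the Code of Civil Procedure, 1908.
    return "retrieve_from_cpc"


if __name__ == "__main__":
    print(route_query("Can a court-martial verdict be appealed?"))
    print(route_query("How do I file a civil suit for recovery of money?"))
```

In a graph-based orchestrator, a function of this shape would typically serve as the condition that picks the next node.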
Legal documents are complex—filled with multi-column layouts, tables, and marginalia.
- Layout Awareness: We use `UnstructuredPDFLoader` to parse PDFs, ensuring that tabular data (like schedules of fines or dates) is preserved as structured information, not jumbled text.
- Semantic Chunking: Text is split in a way that preserves the meaning of legal clauses.
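One simple way to keep legal clauses intact is to split only at clause boundaries. The sketch below is an illustration, not the logic in `data_ingestion.py`: it assumes each clause begins on its own line with `Section <number>`, and the names `CLAUSE_START` and `split_into_clauses` are hypothetical.

```python
import re

# Assumption: every clause starts at the beginning of a line with "Section <number>".
# The lookahead makes the match zero-width, so the heading stays with its clause.
CLAUSE_START = re.compile(r"(?m)^(?=Section\s+\d+[A-Z]?\b)")


def split_into_clauses(text: str) -> list[str]:
    """Split statute text into chunks, one per section, never cutting mid-clause."""
    parts = CLAUSE_START.split(text)
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    sample = (
        "Preliminary note.\n"
        "Section 1. Short title.\n"
        "This Act may be cited by its short title.\n"
        "Section 2. Definitions.\n"
        "In this Act, terms have the meanings given here.\n"
    )
    for chunk in split_into_clauses(sample):
        print(repr(chunk))
```

Text that precedes the first section heading is kept as its own chunk, so nothing is dropped.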
The agent has access to a secure "Toolkit" to prevent hallucination:
- `retrieve_from_case_file`: Reads the specific facts of the uploaded PDF case.
- `retrieve_from_cpc`: Searches the Code of Civil Procedure (CPC) 1908.
- `retrieve_from_army_code`: Searches the Army Act, 1950 and Rules.
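The idea of a closed toolkit can be sketched as a registry that only dispatches to known tools. This is a minimal illustration under stated assumptions: the three function bodies are stubs returning placeholder strings (the real tools query the vector store), and `TOOLKIT` and `call_tool` are hypothetical names.

```python
from typing import Callable


def retrieve_from_case_file(query: str) -> str:
    # Stub: the real tool would search the uploaded case PDF.
    return f"[case file] results for: {query}"


def retrieve_from_cpc(query: str) -> str:
    # Stub: the real tool would search the Code of Civil Procedure, 1908.
    return f"[CPC 1908] results for: {query}"


def retrieve_from_army_code(query: str) -> str:
    # Stub: the real tool would search the Army Act, 1950 and Rules.
    return f"[Army Act 1950] results for: {query}"


# The agent may only call tools registered here.
TOOLKIT: dict[str, Callable[[str], str]] = {
    "retrieve_from_case_file": retrieve_from_case_file,
    "retrieve_from_cpc": retrieve_from_cpc,
    "retrieve_from_army_code": retrieve_from_army_code,
}


def call_tool(name: str, query: str) -> str:
    """Dispatch a tool call by name, rejecting anything outside the toolkit."""
    if name not in TOOLKIT:
        raise ValueError(f"Unknown tool: {name!r}")
    return TOOLKIT[name](query)


if __name__ == "__main__":
    print(call_tool("retrieve_from_cpc", "limitation period"))
```

Rejecting unknown names means a malformed model output fails loudly instead of silently producing an ungrounded answer.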
Trust is critical in law.
- Trace Logs: The sidebar displays the agent's "Thought Process" as it runs, along with the JSON output of every tool call.
- Citations: Every claim is backed by a specific Section or Page Number from the source document.
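A trace entry that pairs each thought with its tool call and citation might look like the sketch below. The field names and the `TraceStep` class are assumptions made for illustration and may not match the JSON the app actually emits. The tool output is left as a placeholder rather than real statute text.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TraceStep:
    """One reasoning step: what the agent thought, what it called, what it cited."""
    thought: str
    tool: str
    tool_input: str
    tool_output: str
    citation: str


def trace_to_json(steps: list[TraceStep]) -> str:
    """Serialize the execution trace so it can be displayed or stored."""
    return json.dumps([asdict(step) for step in steps], indent=2)


if __name__ == "__main__":
    step = TraceStep(
        thought="The user is asking about a court-martial. "
                "I should check Section 63 of the Army Act.",
        tool="retrieve_from_army_code",
        tool_input="Section 63",
        tool_output="<retrieved text>",  # placeholder, not real statute text
        citation="Army Act, 1950, Section 63",
    )
    print(trace_to_json([step]))
```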
| Feature | Standard LLM (ChatGPT/GPT-4) | Standard RAG | LegalRAG (Agentic) |
|---|---|---|---|
| Data Source | Training Data (Often Outdated) | Static Document Search | Dynamic Tool Selection |
| Reasoning | Implicit / Hazy | None (Keyword Matching) | Explicit Multi-Step (ReAct) |
| Hallucination | High Risk (Invents Laws) | Medium (Wrong Context) | Near Zero (Grounded) |
| Input Parsing | Text Paste (Loses Formatting) | Basic PDF Readers | Unstructured.io (Layout Aware) |
| Transparency | Black Box | "Sources" Link | Full Execution Trace |
We rigorously tested the pipeline against complex, multi-jurisdictional synthetic scenarios.
- 100% Keyword Recall: The agent successfully identified all key legal concepts (e.g., "Section 63", "Tribunal", "Appeal Procedure") in every test case.
- 100% Routing Accuracy: It correctly distinguished when to apply Military Law vs. Civil Law in 3/3 complex test scenarios.
- Latency: ~100s average response time. (We prioritize deep reasoning correctness over speed).
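For readers who want to see how metrics of this kind are usually computed, here is a minimal sketch. It assumes keyword recall means the fraction of expected keywords found in the answer (case-insensitive substring match) and routing accuracy means the fraction of scenarios sent to the expected statute. `evaluate_pipeline.py` may define them differently; the function names are hypothetical.

```python
def keyword_recall(answer: str, expected_keywords: list[str]) -> float:
    """Fraction of expected keywords that appear in the answer (case-insensitive)."""
    if not expected_keywords:
        raise ValueError("expected_keywords must not be empty")
    text = answer.lower()
    hits = sum(1 for keyword in expected_keywords if keyword.lower() in text)
    return hits / len(expected_keywords)


def routing_accuracy(predicted: list[str], expected: list[str]) -> float:
    """Fraction of scenarios where the chosen route matches the expected one."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("predicted and expected must be non-empty and equal in length")
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / len(expected)


if __name__ == "__main__":
    answer = "Under Section 63, the Tribunal may hear the matter."
    print(keyword_recall(answer, ["Section 63", "Tribunal", "Appeal Procedure"]))
    print(routing_accuracy(["military", "civil", "military"],
                           ["military", "civil", "military"]))
```

With only three routing scenarios, a single misroute would move accuracy from 100% to about 67%, so a larger test set gives a more stable estimate.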
1. Clone the Repository

       git clone https://github.com/yourusername/LegalRAG.git
       cd LegalRAG

2. Install Dependencies

       pip install -r requirements.txt

3. Set API Keys. Create a `.env` file:

       GROQ_API_KEY=your_groq_api_key_here
       HUGGINGFACEHUB_API_TOKEN=your_hf_token_here

4. Ingest Data (Build Vector Store)

       python LegalRAG/data_ingestion.py

5. Run the App

       streamlit run LegalRAG/app.py
To run the automated evaluation suite:

    python LegalRAG/evaluate_pipeline.py

This will generate a `pipeline_evaluation_report.md` with detailed performance metrics.
- Orchestration: LangChain, LangGraph
- LLM: Llama 3 / GPT-OSS (via Groq)
- Vector Store: FAISS
- Embeddings: HuggingFace (`all-MiniLM-L6-v2`)
- ETL/Ingestion: Unstructured.io
- Frontend: Streamlit
- Parthiv Godrihal
- Nilay Jain
- Manan Bansal