SamvidAI — Enterprise Contract Intelligence powered by OpticalRAG Multimodal document understanding system for clause extraction, legal risk scoring, and explainable contract analysis using layout-aware RAG pipelines.

VisionExpo/SamvidAI

🧠 SamvidAI

Intelligent Contract Analysis Engine powered by OpticalRAG


📐 System Design:
High Level Design → docs/HLD.md


🚧 Current Project Status

SamvidAI is in active design and prototyping.

  • Core architecture and OpticalRAG pipeline are defined
  • Experiments use synthetic or publicly available documents
  • No confidential or client legal data is used
  • Benchmarks are prototype / controlled estimates
  • Legal workflow validation with practicing legal professionals is ongoing

The project prioritizes correctness, safety, and validation before scale.


🧭 What is SamvidAI?

SamvidAI is a next-generation legal document intelligence system built to analyze long, complex contracts (50–300+ pages) using layout-aware, vision-first retrieval.

Traditional OCR + NLP systems flatten documents into plain text, losing structure such as clause hierarchy and tables, which makes downstream LLM answers more prone to hallucination.
SamvidAI instead introduces OpticalRAG, a multimodal retrieval-augmented architecture that preserves spatial context while reducing cost and latency.


🧠 Philosophy

We do not replace attorneys. We empower them.

SamvidAI automates extraction, retrieval, and risk flagging so legal professionals can focus on judgment, validation, and strategy.

Human-in-the-loop review is a first-class design principle.


📘 Architecture & Design Documentation

SamvidAI follows a documentation-first, system-design-driven approach.
All major architectural and design decisions are formally documented and versioned.

🔹 High Level Design (HLD)

The High Level Design document covers:

  • End-to-end system architecture
  • OpticalRAG design rationale
  • Component responsibilities
  • Model & LLM strategy (Gemini 2.5 Pro usage)
  • Data flow, security, ethics, and deployment
  • Scalability, risks, and future extensions

📄 Read on GitHub
👉 docs/HLD.md

⬇️ Download full design document (DOCX, 90+ pages)
👉 docs/HLD.docx

The DOCX version is the authoritative long-form design, suitable for deep review and offline reading.


❌ The Problem

Legal contracts are:

  • Long and dense
  • Highly structured (clauses, tables, headers)
  • Extremely risk-sensitive

Existing approaches fail because:

  • OCR destroys layout semantics
  • Full-document LLM ingestion is expensive
  • Long-context hallucinations are common
  • Clause hierarchy and visual grouping are ignored

✅ The Solution — OpticalRAG

OpticalRAG is a vision-first RAG pipeline that:

  • Treats documents as visual data
  • Retrieves only relevant regions
  • Converts to text only when necessary
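The "convert to text only when necessary" idea can be sketched as a region object that defers OCR until its text is first read. This is a minimal illustration, not the project's implementation: `Region`, `ocr_fn`, and `fake_ocr` are hypothetical names, and the OCR call is a stub standing in for a real engine.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Region:
    """A cropped page region whose text is extracted lazily (hypothetical type)."""
    page: int
    bbox: Tuple[int, int, int, int]        # (x0, y0, x1, y1) in pixels
    ocr_fn: Callable[["Region"], str]      # stand-in for a real OCR engine
    _text: Optional[str] = field(default=None, repr=False)

    @property
    def text(self) -> str:
        # Run OCR only on first access, then reuse the cached result.
        if self._text is None:
            self._text = self.ocr_fn(self)
        return self._text


ocr_calls: List[int] = []


def fake_ocr(region: Region) -> str:
    """Stub OCR: records which page was processed and returns placeholder text."""
    ocr_calls.append(region.page)
    return f"text of page {region.page}"


regions = [Region(page=p, bbox=(0, 0, 100, 50), ocr_fn=fake_ocr) for p in range(1, 4)]
selected = regions[1]       # pretend retrieval picked only this region
first = selected.text       # triggers the single OCR call
second = selected.text      # served from the cache
```

Only the retrieved region is ever passed to the OCR stub; the other regions stay as images.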

Key Benefits

  • 🔻 Significant token reduction
  • ⚡ Faster inference
  • 🧭 Layout-aware reasoning
  • 📄 Scales to very long contracts
  • 🧠 Reduced hallucinations

🧠 OpticalRAG Architecture

Traditional RAG pipelines struggle on very long legal documents because OCR is lossy and context windows are limited.

OpticalRAG addresses both by keeping layout intact through retrieval and sending only the relevant regions to the LLM.

PDF Contract
↓
High-Resolution Page Images (300 DPI)
↓
Layout-Aware Segmentation (LayoutLMv3)
↓
Semantic Regions (Clauses, Tables, Headers)
↓
Multimodal Embeddings (Text + Vision)
↓
Vector Retrieval (Query-Aware)
↓
LLM Reasoning on Relevant Regions Only
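To give a sense of what the 300 DPI rendering step in the pipeline above produces, the sketch below converts a PDF page size (in points, 1 pt = 1/72 inch) to pixels and estimates the uncompressed image size. The helper names are hypothetical, and US Letter is used only as an example page size.

```python
from typing import Tuple


def page_pixels(width_pt: float, height_pt: float, dpi: int = 300) -> Tuple[int, int]:
    """Convert a PDF page size in points (1 pt = 1/72 inch) to pixels at `dpi`."""
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


def rgb_bytes(width_px: int, height_px: int) -> int:
    """Size in bytes of one uncompressed 8-bit RGB image at this resolution."""
    return width_px * height_px * 3


letter = page_pixels(612, 792)            # US Letter is 612 x 792 pt
print(letter)                             # (2550, 3300)
print(rgb_bytes(*letter) / 1e6)           # 25.245 (MB per page, uncompressed)
print(120 * rgb_bytes(*letter) / 1e9)     # about 3.03 GB for a 120-page contract
```

At roughly 25 MB per uncompressed page, holding every page of a long contract in memory at once is expensive, which is one reason to load regions on demand.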

Why OpticalRAG Works

  • Preserves spatial and structural context
  • Mitigates lost-in-the-middle failures
  • Optimized for consumer GPUs
  • LLM is used for reasoning, not retrieval

🧩 Core Capabilities

🔍 Layout-Aware Retrieval

  • Vision-first document understanding
  • Hierarchical retrieval (page → section → clause)
  • Query-aware region selection
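Hierarchical retrieval can be sketched as a coarse-to-fine search: rank pages against the query first, then rank clauses only inside the best pages. The example below is a toy under stated assumptions: it uses two levels (page → clause) rather than three, hand-written 3-dimensional vectors instead of real multimodal embeddings, and hypothetical names (`INDEX`, `retrieve`).

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy index: page id -> (page embedding, {clause id -> clause embedding}).
INDEX: Dict[str, Tuple[List[float], Dict[str, List[float]]]] = {
    "page-12": ([0.9, 0.1, 0.0], {"12.1 Termination": [1.0, 0.0, 0.0],
                                   "12.2 Notice": [0.6, 0.4, 0.0]}),
    "page-40": ([0.0, 0.2, 0.9], {"40.1 Liability cap": [0.0, 0.1, 1.0]}),
}


def retrieve(query: Sequence[float], index, top_pages: int = 1, top_clauses: int = 1):
    """Rank pages first, then rank clauses only within the top-ranked pages."""
    pages = sorted(index, key=lambda p: cosine(query, index[p][0]), reverse=True)[:top_pages]
    candidates = [(cosine(query, vec), page, clause)
                  for page in pages
                  for clause, vec in index[page][1].items()]
    return sorted(candidates, reverse=True)[:top_clauses]


hits = retrieve([1.0, 0.05, 0.0], INDEX)
score, page, clause = hits[0]
print(page, clause)   # page-12 12.1 Termination
```

Clauses on pages that lose the first round are never scored, which is where the savings over a flat search come from.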

⚠️ Risk Flagging (Prototype)

Identifies potentially risky clauses such as:

  • One-sided obligations
  • Unusual termination rights
  • Missing liability protections

Risk levels:

  • 🔴 High Risk
  • 🟠 Review Needed
  • 🟢 Standard
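The three risk levels map naturally onto an enum. The sketch below pairs it with a few keyword rules to show how a rule-based first pass might assign a level. The rules are invented for illustration, are not legal guidance, and are not taken from the project's risk engine; `RiskLevel`, `RULES`, and `flag_clause` are hypothetical names.

```python
import re
from enum import Enum


class RiskLevel(Enum):
    HIGH = "High Risk"
    REVIEW = "Review Needed"
    STANDARD = "Standard"


# Hypothetical keyword rules, ordered from most to least severe.
RULES = [
    (re.compile(r"\bsole discretion\b", re.I), RiskLevel.HIGH),
    (re.compile(r"\bterminate\b.*\bwithout (cause|notice)\b", re.I), RiskLevel.HIGH),
    (re.compile(r"\bindemnif(y|ies|ication)\b", re.I), RiskLevel.REVIEW),
]


def flag_clause(text: str) -> RiskLevel:
    """Return the level of the first matching rule, or STANDARD if none match."""
    for pattern, level in RULES:
        if pattern.search(text):
            return level
    return RiskLevel.STANDARD


print(flag_clause("Vendor may terminate this Agreement without notice.").value)  # High Risk
print(flag_clause("Customer shall indemnify Vendor against claims.").value)      # Review Needed
print(flag_clause("Payment is due within 30 days.").value)                       # Standard
```

In a hybrid setup, rules like these would only pre-screen clauses; the final call stays with the reviewing attorney.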

📝 Smart Summarization

Role-specific summaries:

  • Executive overview
  • Key obligations
  • Financial exposure
  • Termination & liability highlights

👨‍⚖️ Human-in-the-Loop Review

  • Attorneys can accept or reject AI findings
  • Feedback enables iterative improvement
  • Designed for assistive decision-making
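One way to capture accept/reject feedback is a small record per reviewed finding, serialized as JSON Lines so it can be stored and analyzed later. This is a sketch under assumptions: the field names and the `ReviewDecision` type are hypothetical, not the project's schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class ReviewDecision:
    """One attorney decision on one AI finding (hypothetical schema)."""
    clause_id: str
    ai_risk: str            # e.g. "High Risk", "Review Needed", "Standard"
    accepted: bool
    reviewer_note: str = ""


def to_jsonl(decisions: List[ReviewDecision]) -> str:
    """Serialize decisions as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(d)) for d in decisions)


def acceptance_rate(decisions: List[ReviewDecision]) -> float:
    """Fraction of AI findings the reviewer accepted (0.0 for an empty list)."""
    return sum(d.accepted for d in decisions) / len(decisions) if decisions else 0.0


decisions = [
    ReviewDecision("12.1", "High Risk", True),
    ReviewDecision("12.2", "Review Needed", False, "Standard notice period"),
    ReviewDecision("40.1", "High Risk", True),
    ReviewDecision("41.3", "Standard", True),
]
print(acceptance_rate(decisions))   # 0.75
```

Rejected findings, together with the reviewer's note, are the most useful records for later improvement.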

🛠️ Tech Stack

Core

  • Python 3.10+
  • FastAPI
  • Streamlit

Vision & Layout

  • LayoutLMv3
  • OpenCV
  • PaddleOCR

Embeddings & Retrieval

  • OpenCLIP (ViT-H/14)
  • BGE / E5
  • ChromaDB

LLMs

  • Gemini 2.5 Pro (primary, cloud reasoning)
  • Qwen2.5-7B / Mistral-7B (local fallback, quantized)

Gemini is used strictly for reasoning over retrieved regions, not full-document ingestion.
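The primary/fallback arrangement can be sketched as an ordered list of backends tried in turn. The two backends below are stubs, not real Gemini or local-model clients, and `answer_with_fallback` is a hypothetical helper rather than the project's inference code.

```python
from typing import Callable, List, Sequence, Tuple

Backend = Tuple[str, Callable[[str], str]]


def answer_with_fallback(prompt: str, backends: Sequence[Backend]) -> Tuple[str, str]:
    """Try each backend in order; return (backend_name, answer) from the first success."""
    errors: List[str] = []
    for name, call in backends:
        try:
            return name, call(prompt)
        except Exception as exc:   # any backend failure triggers the next one
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all LLM backends failed: " + "; ".join(errors))


def cloud_stub(prompt: str) -> str:
    """Stand-in for the cloud model; simulates an outage."""
    raise ConnectionError("cloud unavailable")


def local_stub(prompt: str) -> str:
    """Stand-in for a quantized local model."""
    return f"[local] answer to: {prompt}"


used, answer = answer_with_fallback(
    "Summarize clause 12.1",
    [("cloud", cloud_stub), ("local", local_stub)],
)
print(used)   # local
```

The prompt passed to either backend would contain only the retrieved regions, in line with the note above.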


💻 Local Setup (Prototype)

1️⃣ Clone Repository

git clone https://github.com/your-username/SamvidAI.git
cd SamvidAI

2️⃣ Create Virtual Environment

python -m venv venv
source venv/bin/activate   # Linux / Mac
venv\Scripts\activate      # Windows

3️⃣ Install Dependencies

pip install -r requirements.txt

🎮 GPU Usage & Hardware

Tested Configuration

| Component | Specification |
|-----------|---------------|
| GPU | RTX 4060 (8 GB VRAM) |
| RAM | 24 GB |
| CPU | 12-core |
| OS | Windows / Linux |

Optimized for consumer GPUs (RTX 4060, 8 GB VRAM) using quantization.

Memory Optimization

  • 4-bit quantized LLMs
  • Batched embeddings
  • Lazy region loading
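Batched embedding keeps peak memory bounded by processing a fixed number of regions at a time. A minimal sketch follows, assuming a hypothetical `embed_batch` callable that maps a list of regions to a list of vectors; the stub here returns dummy one-element vectors.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def embed_all(regions: Sequence[T],
              embed_batch: Callable[[List[T]], List[List[float]]],
              batch_size: int = 8) -> List[List[float]]:
    """Embed regions batch by batch so only one batch is in flight at a time."""
    vectors: List[List[float]] = []
    for batch in batched(regions, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


seen_batch_sizes: List[int] = []


def stub_embed(batch: List[str]) -> List[List[float]]:
    """Stand-in for a real embedding model: one dummy vector per region."""
    seen_batch_sizes.append(len(batch))
    return [[float(len(r))] for r in batch]


vectors = embed_all([f"region-{i}" for i in range(20)], stub_embed, batch_size=8)
print(seen_batch_sizes)   # [8, 8, 4]
```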

Peak VRAM Usage

  • LayoutLMv3: ~2.1 GB
  • OpenCLIP: ~1.8 GB
  • LLM (7B, 4-bit): ~3.5 GB
  • ✅ Combined ~7.4 GB if all three are loaded at once, within the 8 GB of the tested RTX 4060
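The figures above can be checked against the 8 GB card with simple arithmetic. The sketch assumes the worst case where all three models are resident at the same time, which the list does not state, and it ignores activations and OS or display overhead.

```python
# Peak VRAM per component in GB, copied from the list above.
PEAK_VRAM_GB = {"LayoutLMv3": 2.1, "OpenCLIP": 1.8, "LLM (7B, 4-bit)": 3.5}
GPU_VRAM_GB = 8.0   # RTX 4060 from the tested configuration

total = sum(PEAK_VRAM_GB.values())
headroom = GPU_VRAM_GB - total
print(f"total={total:.1f} GB, headroom={headroom:.1f} GB")   # total=7.4 GB, headroom=0.6 GB
```

With about 0.6 GB of headroom in that worst case, loading models one stage at a time leaves considerably more margin.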

▶️ Run Demo Locally

Start Backend

uvicorn api.main:app --reload

Launch UI

streamlit run ui/streamlit_app.py

Demo Flow

1. Upload a contract PDF

2. Ask a question (e.g. "What are termination risks?")

3. View:

  • Highlighted contract regions
  • Risk flags
  • Explanations

4. Accept or reject AI findings


📊 Benchmarks (Internal Evaluation)

Contract Size: 120 Pages

Based on controlled experiments and synthetic contracts. Not production guarantees.

| Metric | OCR + Text RAG | SamvidAI (OpticalRAG) |
|--------|----------------|------------------------|
| Tokens to LLM | High | Significantly Lower |
| Latency | High | Reduced |
| Layout Accuracy | Poor | High |
| Hallucination Risk | High | Lower |

💰 Cost Comparison (Per Document)

| Approach | Estimated Cost |
|----------|----------------|
| Full-Text GPT-4 | ~$4.20 |
| OCR + RAG | ~$1.90 |
| SamvidAI | ~$0.65 |

➡ ~65% cost reduction versus OCR + RAG (~85% versus full-text GPT-4)
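The reduction follows directly from the per-document estimates in the table. A short calculation, using only those three figures:

```python
# Estimated cost per document in USD, copied from the table above.
COST_USD = {"Full-Text GPT-4": 4.20, "OCR + RAG": 1.90, "SamvidAI": 0.65}


def reduction_pct(baseline: float, new: float) -> float:
    """Percentage by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline * 100


for name in ("Full-Text GPT-4", "OCR + RAG"):
    pct = reduction_pct(COST_USD[name], COST_USD["SamvidAI"])
    print(f"vs {name}: {pct:.1f}% lower")
# vs Full-Text GPT-4: 84.5% lower
# vs OCR + RAG: 65.8% lower
```

The roughly 65% figure therefore corresponds to the OCR + RAG baseline; against full-text ingestion the estimated saving is larger.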


📂 Project Structure

SamvidAI/
│
├── README.md                  # Product-facing overview (FIRST IMPRESSION)
├── WEBSITE.md                 # Landing page copy
├── DEMO.md                    # Demo links + walkthrough
│
├── docs/                      # SYSTEM & ENGINEERING (AUTHORITATIVE DESIGN)
│   ├── HLD.md                 # High-Level Design
│   ├── HLD.docx               # High-Level Design (Long-form, downloadable)    
│   ├── LLD.md                 # Low-Level Design
│   ├── ARCHITECTURE.md        # Component & deployment architecture
│   ├── PIPELINE.md            # End-to-end data & inference pipeline
│   ├── DATA_REPORTS.md        # Metrics, charts, evaluations
│   ├── EXPERIMENTS.md         # Ablations, experiments
│   ├── BENCHMARKS.md          # Performance comparisons
│   ├── SECURITY.md            # Security considerations
│   └── ETHICS.md              # Ethics & safety
│
├── research/                  # SCIENTIFIC THINKING
│   ├── related_work.md        # Prior research & models
│   ├── papers.md              # Paper summaries & links
│   └── findings.md            # Your insights & failures
│
├── product/                   # FOUNDER MODE
│   ├── roadmap.md             # 30-90-365 day plan
│   ├── monetization.md        # Business model
│   ├── user_personas.md       # Target users
│   └── go_to_market.md        # Distribution strategy
│
├── src/                       # CODE
│   └── samvidai/
│       ├── __init__.py
│       │
│       ├── ingestion/         # PDF → image → layout
│       │   ├── __init__.py
│       │   ├── pdf_to_image.py
│       │   └── preprocess.py
│       │
│       ├── layout/            # Layout-aware segmentation
│       │   ├── __init__.py
│       │   └── layoutlm.py
│       │
│       ├── retrieval/         # OpticalRAG core
│       │   ├── __init__.py
│       │   ├── embeddings.py
│       │   ├── vector_store.py
│       │   └── retriever.py
│       │
│       ├── risk_engine/       # Clause classification & risk scoring
│       │   ├── __init__.py
│       │   ├── classifier.py
│       │   └── scorer.py
│       │
│       ├── llm/               # LLM interfaces
│       │   ├── __init__.py
│       │   ├── prompts.py
│       │   └── inference.py
│       │
│       └── utils/
│           ├── __init__.py
│           └── logger.py
│
├── api/                       # BACKEND
│   └── main.py                # FastAPI app
│
├── ui/                        # FRONTEND
│   └── streamlit_app.py
│
├── assets/                    # VISUALS
│   ├── images/
│   ├── videos/
│   └── diagrams/
│
├── tests/
│   └── TESTING_PLAN.md
│
├── docker/
│   └── Dockerfile
│
├── requirements.txt
└── .gitignore


🧪 Research Techniques Used

SamvidAI incorporates modern retrieval and LLM research, including:

  • Hierarchical RAG
  • Query-aware retrieval
  • Late chunking
  • Lost-in-the-middle mitigation
  • Contrastive multimodal embeddings
  • Hybrid rule-based + LLM reasoning
  • Human-in-the-loop active learning

🗺️ Roadmap

Phase 1 — Ingestion

  • PDF → image conversion
  • Layout segmentation

Phase 2 — OpticalRAG

  • Multimodal retrieval
  • Query-aware chunking

Phase 3 — Risk Engine

  • Clause classification
  • Red / Amber / Green scoring

Phase 4 — Review UI

  • Attorney validation
  • Feedback storage

Phase 5 — Optimization

  • Latency tuning
  • Dataset-driven improvements

🎯 Vision

SamvidAI is built with a startup-first mindset:

  • Solves a real legal pain point
  • Optimized for limited hardware
  • Open-source friendly
  • Enterprise-ready foundation

The long-term goal is to evolve SamvidAI into a full legal intelligence platform for contract review, compliance, and dispute risk forecasting.


🤝 Contributing

Contributions, ideas, and discussions are welcome.

If you're interested in:

  • Legal AI
  • Multimodal RAG
  • Human-in-the-loop systems

You’ll feel right at home here.


📜 License

MIT License

If you like this project, ⭐ star the repo and join the journey.
