
danialtranz/LexCompanion



Lex Companion

Agentic Legal AI for Vietnam

🇻🇳 Tiếng Việt · 🇬🇧 English

Lex Companion is an agentic AI legal companion for Vietnam that helps individuals and businesses understand regulations, research legal issues, evaluate options, and generate legal documents through specialized legal agents grounded in authoritative legal sources.

Python 3.12+ · FastAPI · LangGraph · Next.js


Project Overview

Navigating Vietnamese law often requires searching across thousands of legal provisions, understanding relationships between regulations, and translating legal language into practical actions.

Lex Companion acts as an AI legal companion that assists users throughout this process. Instead of functioning as a traditional chatbot, it coordinates specialized legal agents capable of legal research, information retrieval, document drafting, decision support, and problem-solving.

Core capabilities include:

  • Agentic legal workflows powered by intent-specific LangGraph agents
  • Grounded legal reasoning over the Vietnamese Pháp điển legal codex (~64k articles)
  • Hybrid retrieval architecture combining keyword search, semantic search, and reranking
  • Citation-backed responses with transparent references to legal sources
  • Human-in-the-loop document generation for contracts and legal forms
  • Session-based knowledge augmentation through user-provided documents
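
Citation-backed responses pair inline [n] markers with a reference panel. The following is a minimal sketch of that bookkeeping, assuming retrieved chunks are dicts with hypothetical article_id and title keys (not the project's actual schema). It numbers the sources for the prompt, then keeps only the sources the answer actually cites.

```python
import re
from typing import TypedDict


class Chunk(TypedDict):
    # Hypothetical fields; the real chunk schema is defined by the project's ES index.
    article_id: str
    title: str


# Matches inline markers such as [1] or [12].
CITATION_RE = re.compile(r"\[(\d+)\]")


def number_sources(chunks: list[Chunk]) -> str:
    """Render retrieved chunks as a numbered context block for the LLM prompt."""
    return "\n".join(f"[{i}] {c['title']}" for i, c in enumerate(chunks, start=1))


def build_reference_panel(answer: str, chunks: list[Chunk]) -> list[dict]:
    """Map each [n] cited in the answer to its source, in order of first use.

    Numbers that do not correspond to a retrieved chunk are dropped.
    """
    cited: list[int] = []
    for match in CITATION_RE.finditer(answer):
        n = int(match.group(1))
        if 1 <= n <= len(chunks) and n not in cited:
            cited.append(n)
    return [{"n": n, **chunks[n - 1]} for n in cited]
```

For an answer that cites [2], [1], and [7] over two retrieved chunks, the panel lists sources 2 and 1 and drops the out-of-range [7].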

For detailed architecture documentation, see docs/ARCHITECTURE.md (Vietnamese version: docs/ARCHITECTURE.vi.md).


Features

| Feature | Description |
|---|---|
| Legal Q&A | Ask questions about Vietnamese law; get answers grounded in Pháp điển articles |
| Legal Research | Multi-query RAG with ontology-aware query expansion and retry |
| Hybrid Search | Elasticsearch keyword + KNN vector fusion with BGE reranking |
| Citation Tracking | Every factual claim links to [n] inline citations and a reference panel |
| Intent Routing | 6 specialized agent workflows: information, decision, problem-solving, exploration, task execution, communication |
| Document Generation | Contract template selection, form fill, and DOCX output with HITL checkpoints |
| User Knowledge Base | Upload personal documents for session-scoped retrieval |
| Legal Corpus Visualization | Interactive graph of topics, subjects, and articles (admin) |
| Web Fallback | Tavily web search when legal corpus context is insufficient |
| i18n | Vietnamese and English UI |
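
Elasticsearch 8.x accepts a full-text query and a top-level knn clause in the same search request; each hit's final score is the sum of its boosted scores from whichever legs matched it. The following is a sketch of such a request body. It assumes hypothetical field names content and embedding (the project's real mapping may differ) and a query vector produced by the embedding service. It illustrates ES's built-in score combination, which is not necessarily the exact fusion strategy Lex Companion uses.

```python
def build_hybrid_query(
    text: str,
    query_vector: list[float],
    size: int = 10,
    num_candidates: int = 100,
    text_boost: float = 0.5,
    knn_boost: float = 0.5,
) -> dict:
    """Build an ES 8.x search body that mixes BM25 and approximate kNN scores."""
    # Elasticsearch rejects kNN requests where num_candidates < k.
    if num_candidates < size:
        raise ValueError("num_candidates must be >= size")
    return {
        "size": size,
        # Lexical leg: BM25 over the chunk text (hypothetical field name).
        "query": {"match": {"content": {"query": text, "boost": text_boost}}},
        # Semantic leg: approximate kNN over the embedding field (hypothetical name).
        "knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": size,
            "num_candidates": num_candidates,
            "boost": knn_boost,
        },
    }
```

With the official Python client, the body's keys can be passed as keyword arguments, for example es.search(index=os.environ["LEX_CHUNKS_INDEX"], **body). Raising num_candidates trades latency for kNN recall.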

Architecture Overview

flowchart TB
    subgraph Client
        WEB["Next.js :3004"]
    end

    subgraph Backend
        API["FastAPI :5999"]
        WORKER["Redis Worker"]
    end

    subgraph AI
        AGENTS["LangGraph Agents<br/>(6 intents)"]
        RAG["RAG Pipeline<br/>ES → Rerank → LLM"]
    end

    subgraph Infrastructure
        PG[(PostgreSQL)]
        ES[(Elasticsearch)]
        MINIO[(MinIO)]
        REDIS[(Redis)]
    end

    WEB --> API
    API --> AGENTS
    AGENTS --> RAG
    RAG --> ES
    API --> PG
    API --> MINIO
    WORKER --> REDIS
    WORKER --> ES

Request flow: User message → JWT auth → intent routing → LangGraph agent → hybrid retrieval → rerank → LLM with citations → persist & respond.

See docs/ARCHITECTURE.md for complete diagrams covering request lifecycle, agent workflows, RAG pipeline, database design, and deployment.


Technology Stack

| Layer | Technologies |
|---|---|
| Frontend | Next.js 16, React 19, TanStack Query, Tailwind CSS 4 |
| Backend | FastAPI, Uvicorn, Peewee ORM, Pydantic v2 |
| AI/Agents | LangGraph, LangChain, FlagEmbedding |
| Search | Elasticsearch 8.13 (hybrid keyword + KNN) |
| Embedding | AITeamVN/Vietnamese_Embedding_v2 (1024 dims) |
| Reranking | BAAI/bge-reranker-v2-m3 |
| LLM | OpenAI-compatible API |
| Document Processing | Docling, PyMuPDF, python-docx |
| Storage | PostgreSQL, MinIO, Redis |
| Package Management | uv (Python), npm (Frontend) |
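
The embedding model's output size must match the dims of the Elasticsearch dense_vector field, which is why LEGAL_VECTOR_DIMS is 1024 in the configuration. The following is a sketch of a compatible index mapping. The field names content, article_id, and embedding are hypothetical, and cosine similarity is an assumption rather than the project's documented choice.

```python
import os


def build_index_mapping(dims: int) -> dict:
    """Mapping for a chunk index with one text field and one kNN-searchable vector."""
    if dims <= 0:
        raise ValueError("dims must be positive")
    return {
        "mappings": {
            "properties": {
                "article_id": {"type": "keyword"},  # exact-match filter / citation key
                "content": {"type": "text"},  # BM25 leg of hybrid search
                "embedding": {
                    "type": "dense_vector",
                    "dims": dims,  # must equal the embedding model's output size
                    "index": True,  # enables approximate kNN search on this field
                    "similarity": "cosine",
                },
            }
        }
    }


# Read the dimension from the same variable the backend is configured with.
mapping = build_index_mapping(int(os.environ.get("LEGAL_VECTOR_DIMS", "1024")))
```

If the embedding model is ever swapped, both LEGAL_VECTOR_DIMS and the index mapping must change together, and the corpus must be re-embedded.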

Quick Start

Prerequisites

| Component | Version / Notes |
|---|---|
| Python | 3.12+ |
| uv | Latest |
| Docker | For infrastructure services |
| Node.js | 20+ (for frontend) |

1. Clone and configure

git clone <repository-url>
cd langgraph-base   # adjust to the directory name git clone created
cp .env.example .env
# Edit .env with your credentials (see Configuration section)

2. Start with Docker Compose (recommended)

# Linux: raise the memory-map limit so Elasticsearch can start
sudo sysctl -w vm.max_map_count=262144

docker compose -f docker/docker-compose.yml up -d --build
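
Once the containers are running:

| Service | URL |
|---|---|
| Web UI | http://localhost:3005 |
| API | http://localhost:6000 |
| API Docs | http://localhost:6000/docs |
| Kibana | http://localhost:5602 |

Note: Chromium-based browsers and Firefox refuse connections to port 6000 (reserved for X11), so the API docs may not open in a browser on that port; Chrome reports ERR_UNSAFE_PORT. Remapping the API's host port in docker/docker-compose.yml avoids this.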

3. Or run locally (development)

Infrastructure (Postgres, MinIO, Redis, Elasticsearch):

# See api/deployment.readme.md and
# model_serving/retrievers/elastic_search/deployment.readme.md

Backend:

uv venv --python 3.12
uv sync
uv run --env-file .env python -m api.lex_companion_server
# API at http://localhost:5999

Frontend:

cd web
npm install
npm run dev
# UI at http://localhost:3004

Installation

Backend dependencies

All Python dependencies are managed via uv from the repository root:

uv sync                  # Install production dependencies
uv sync --group dev      # Include LangGraph CLI for development

Important: Do not install the project with uv pip install -e . (editable install). The project sets package = false under [tool.uv] in pyproject.toml and runs via PYTHONPATH instead.

Frontend dependencies

cd web
npm install

Embedding service (optional, self-hosted)

cd model_serving/embeddings/vie_embedding_v2
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python app.py  # Runs on port 6501
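
Because the configuration sets EMBEDDING_PROVIDER=openai with EMBEDDING_BASE_URL=http://localhost:6501/v1, the self-hosted service is presumably addressed through an OpenAI-compatible embeddings endpoint. The following stdlib-only sketch shows such a call. It assumes the service implements POST {base}/embeddings with the standard OpenAI request and response shape, which has not been verified against model_serving/embeddings/vie_embedding_v2/app.py.

```python
import json
import urllib.request


def build_embedding_request(base_url: str, model: str, texts: list[str]) -> urllib.request.Request:
    """Prepare an OpenAI-style embeddings request (assumed API shape)."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(base_url: str, model: str, texts: list[str], timeout: float = 30.0) -> list[list[float]]:
    req = build_embedding_request(base_url, model, texts)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry one {"embedding": [...], "index": i} item per input.
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Usage (requires the embedding service above to be running):
# vectors = embed("http://localhost:6501/v1", "AITeamVN/Vietnamese_Embedding_v2", ["hợp đồng lao động"])
# len(vectors[0]) should equal LEGAL_VECTOR_DIMS (1024).
```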

Configuration

Copy .env.example to .env and configure:

Required

# Database
POSTGRES_USER=your_user
POSTGRES_PASSWORD=your_password
POSTGRES_DB=lex_companion
POSTGRES_HOST=localhost
POSTGRES_PORT=5432

# Object Storage
MINIO_HOST=localhost:6503
MINIO_USER=your_user
MINIO_PASSWORD=your_password
MINIO_BUCKET=lex-companion

# Search
ELASTIC_HOST=localhost:6505
ELASTIC_PASSWORD=your_password
LEX_CHUNKS_INDEX=lex_chunks_v1
LEGAL_VECTOR_DIMS=1024

# LLM
OPENAI_API_KEY=your_key
OPENAI_BASE_URL=https://api.openai.com/v1
LLM_MODEL=gpt-4o

# Embedding
EMBEDDING_PROVIDER=openai
EMBEDDING_BASE_URL=http://localhost:6501/v1
EMBEDDING_MODEL=AITeamVN/Vietnamese_Embedding_v2

# Auth
JWT_SECRET_KEY=your_secret
GOOGLE_CLIENT_ID=your_client_id
GOOGLE_CLIENT_SECRET=your_client_secret
GOOGLE_REDIRECT_URI=http://localhost:3004/auth/google/callback
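
A missing variable can otherwise surface only later as a connection or authentication error, so checking the required settings at startup can save debugging time. The following is a minimal fail-fast sketch. REQUIRED_VARS is copied from the block above; the helper names are hypothetical and are not part of the codebase.

```python
import os
from collections.abc import Mapping

# Variable names taken from the "Required" block above.
REQUIRED_VARS = (
    "POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DB", "POSTGRES_HOST", "POSTGRES_PORT",
    "MINIO_HOST", "MINIO_USER", "MINIO_PASSWORD", "MINIO_BUCKET",
    "ELASTIC_HOST", "ELASTIC_PASSWORD", "LEX_CHUNKS_INDEX", "LEGAL_VECTOR_DIMS",
    "OPENAI_API_KEY", "OPENAI_BASE_URL", "LLM_MODEL",
    "EMBEDDING_PROVIDER", "EMBEDDING_BASE_URL", "EMBEDDING_MODEL",
    "JWT_SECRET_KEY", "GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET", "GOOGLE_REDIRECT_URI",
)


def missing_vars(env: Mapping[str, str], required: tuple[str, ...] = REQUIRED_VARS) -> list[str]:
    """Return required names that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]


def check_env(env: Mapping[str, str] = os.environ) -> None:
    """Raise RuntimeError listing every missing setting at once."""
    missing = missing_vars(env)
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    if int(env["LEGAL_VECTOR_DIMS"]) <= 0:
        raise RuntimeError("LEGAL_VECTOR_DIMS must be a positive integer")
```

Reporting all missing names in one error avoids a fix-one-restart-repeat loop.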

Recommended

# Reranking (significantly improves retrieval quality)
RERANK_ENABLED=true
RERANK_MODEL_NAME=BAAI/bge-reranker-v2-m3

# Redis (enables background document processing)
REDIS_HOST=localhost
REDIS_PORT=6376
REDIS_PASSWORD=your_password

# Web search fallback
TAVILY_API_KEY=your_key

Frontend (build-time)

# web/.env
NEXT_PUBLIC_API_SERVER=http://localhost:5999
NEXT_PUBLIC_GOOGLE_CLIENT_ID=your_client_id
NEXT_PUBLIC_GOOGLE_OAUTH2_CALLBACK=http://localhost:3004/auth/google/callback

See docs/ARCHITECTURE.md for the complete environment variable reference.


Development

Project structure

langgraph-base/
├── api/                    # FastAPI backend
│   ├── lex_companion_server.py
│   ├── apps/
│   │   ├── routers/        # Auto-loaded route definitions
│   │   ├── controllers/    # Request handlers
│   │   └── services/       # Business logic + orchestration
│   ├── db/models.py        # Peewee ORM models
│   └── worker/             # Redis stream background worker
├── deepagent/              # LangGraph agents + document processing
│   ├── multiagent/legal_assistant/  # Intent-specific graph workflows
│   └── core/               # Rerank, splitters, embeddings, HITL
├── model_serving/          # Standalone embedding + LLM services
├── web/                    # Next.js frontend
├── docker/                 # Docker Compose + Dockerfiles
├── docs/                   # Technical documentation
├── scripts/                # Startup scripts
├── pyproject.toml          # Python dependencies (uv)
└── .env.example            # Environment template

Running the API

# Recommended
uv run --env-file .env python -m api.lex_companion_server

# Or via script
./scripts/start_lex_api.sh

Do not run python api/lex_companion_server.py directly. Run it as a module (python -m api.lex_companion_server) from the repository root with PYTHONPATH=. so that imports of the api package resolve.

Adding dependencies

uv add requests              # Production dependency
uv add --dev pytest          # Dev dependency
uv sync                      # Reinstall from lockfile

Creating new API endpoints

Follow the layered pattern documented in api_creating_instruction.md:

Router → Controller → Service → DB/ES/Agent
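
The following is a minimal sketch of that layering. It assumes FastAPI on top and the { code, msg, data } envelope listed under Code conventions. The service and controller are plain functions so they can be tested without a server. The route, the function names, current_user_id, SESSION_STORE, and the code = 0 success convention are hypothetical.

```python
from typing import Any


# Service layer: business logic only, no HTTP concerns.
def list_sessions_service(user_id: str, store: dict[str, list[str]]) -> list[str]:
    return store.get(user_id, [])


# Controller layer: validates input and wraps results in the response envelope.
def list_sessions_controller(user_id: str, store: dict[str, list[str]]) -> dict[str, Any]:
    if not user_id:
        return {"code": 400, "msg": "user_id is required", "data": None}
    return {"code": 0, "msg": "success", "data": list_sessions_service(user_id, store)}


# Router layer (FastAPI), shown as comments so this sketch runs without FastAPI installed.
# current_user_id and SESSION_STORE are hypothetical.
# router = APIRouter(prefix="/v1/user")
#
# @router.get("/sessions")
# def get_sessions(user_id: str = Depends(current_user_id)):
#     return list_sessions_controller(user_id, SESSION_STORE)
```

Keeping HTTP concerns in the router and controller lets the service be reused by agents or the background worker.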

LangGraph development

uv sync --group dev
uv run langgraph dev

Note: langgraph.json references a legacy graph path. Active graphs are in deepagent/multiagent/legal_assistant/.

Running tests

uv run python -m pytest tests/

Deployment

Docker Compose (full stack)

docker compose -f docker/docker-compose.yml up -d --build

Services and ports:

| Service | Host Port | Purpose |
|---|---|---|
| PostgreSQL | 5445 | Relational database |
| MinIO | 6503/6504 | Object storage |
| Redis | 6376 | Task queue |
| Elasticsearch | 6505 | Search + vectors |
| Kibana | 5602 | ES management UI |
| Embedding | 6502 | Vietnamese embedding model |
| API | 6000 | FastAPI backend |
| Web | 3005 | Next.js frontend |

Production considerations

  • Set strong secrets for JWT_SECRET_KEY, database passwords, and MinIO credentials
  • Set RERANK_DEVICE=cuda:0 if a GPU is available
  • Implement a persistent checkpointer (a Redis/Postgres scaffold exists) so HITL contract-fill sessions survive restarts
  • Import the Pháp điển corpus via POST /v1/admin/doc/upload after deployment
  • No CI/CD pipeline is included; set up your own

See docs/ARCHITECTURE.md for networking diagrams and detailed deployment notes.


API Overview

| Domain | Prefix | Key Endpoints |
|---|---|---|
| Auth | /v1/user | POST /oAuth-login |
| Chat | /v1/user | POST /user_chat, GET /sessions, GET /session |
| Contract | /v1/user | POST /contract/fill, GET /contract/draft/* |
| Documents | /v1 | POST /doc/upload, GET /docs, POST /doc/run |
| Admin | /v1/admin | POST /doc/retrieval, POST /doc/upload, GET /doc/topic |

Full API documentation with inputs/outputs: docs/ARCHITECTURE.md

Interactive docs available at /docs when the API is running.


Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/your-feature)
  3. Follow the existing code patterns:
    • Backend: Router → Controller → Service layering
    • Agents: Add nodes to intent-specific graphs in deepagent/multiagent/legal_assistant/
    • Frontend: React Query hooks in web/hooks/, services in web/service/
  4. Run tests: uv run python -m pytest tests/
  5. Submit a pull request

Code conventions

  • Python 3.12+, type hints, Pydantic v2 models
  • Peewee ORM for database (not SQLAlchemy)
  • LangGraph StateGraph with typed state (LegalAssistantState)
  • API response envelope: { code, msg, data }
  • Environment variables via .env (never commit secrets)
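
In a LangGraph StateGraph, each node receives the current state and returns only the keys it changes, which the graph then merges into the shared state. The following stdlib sketch illustrates that contract. The fields shown are a hypothetical subset, not the actual definition of LegalAssistantState, and apply_update imitates LangGraph's default overwrite behaviour for keys that have no reducer.

```python
from typing import TypedDict


class LegalAssistantStateSketch(TypedDict, total=False):
    # Hypothetical fields for illustration only.
    question: str
    intent: str
    contexts: list[str]
    answer: str


def retrieve_node(state: LegalAssistantStateSketch) -> LegalAssistantStateSketch:
    """A node returns a partial update rather than mutating the state in place."""
    return {"contexts": [f"context for: {state['question']}"]}


def answer_node(state: LegalAssistantStateSketch) -> LegalAssistantStateSketch:
    cited = " ".join(f"[{i}]" for i, _ in enumerate(state["contexts"], start=1))
    return {"answer": f"Answer grounded in {cited}"}


def apply_update(
    state: LegalAssistantStateSketch, update: LegalAssistantStateSketch
) -> LegalAssistantStateSketch:
    # Mimics the default channel behaviour: new values overwrite old ones.
    return {**state, **update}


def run(question: str) -> LegalAssistantStateSketch:
    state: LegalAssistantStateSketch = {"question": question, "intent": "information"}
    for node in (retrieve_node, answer_node):
        state = apply_update(state, node(state))
    return state
```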

License

Maintained by project contributors.


Documentation

| Document | Description |
|---|---|
| docs/ARCHITECTURE.md | Full technical architecture (English) |
| docs/ARCHITECTURE.vi.md | Technical architecture (Vietnamese) |
| api/deployment.readme.md | Manual Docker run for Postgres/MinIO/Redis |
| api_creating_instruction.md | API development conventions |
| model_serving/retrievers/elastic_search/deployment.readme.md | Elasticsearch setup |

Acknowledgements

Lex Companion would not be possible without the open legal data shared by the community.

We are grateful to tmquan/phapdien-moj-gov-vn on Hugging Face for publishing the Vietnamese legal codex (Pháp điển) dataset sourced from the Ministry of Justice. This project uses multiple configs from that dataset — including tree_nodes, articles, subjects, and ontology metadata — as the foundation of our legal knowledge base, Elasticsearch indexing pipeline, and citation-backed retrieval.

Thank you to the maintainers and contributors of that dataset for making structured Vietnamese legal knowledge openly available.


Roadmap & Future Improvements

Lex Companion is actively evolving toward a full agentic legal assistant for Vietnam. The core RAG and information-intent workflows are in place; several specialized agents still need to be built out.

Agent completion

| Agent | Path | Current state | Target |
|---|---|---|---|
| Decision | deepagent/multiagent/legal_assistant/decision/ | Single-node flow: retrieval + placeholder options and estimates | Multi-step decision reasoning: risk analysis, option comparison, consequence mapping, and structured recommendations grounded in retrieved law |
| Problem solving | deepagent/multiagent/legal_assistant/problem_solving/ | Single-node flow: retrieval + static strategy template | Dynamic legal problem decomposition: step-by-step action plans, milestone tracking, and iterative clarification when facts are incomplete |

Other areas on the roadmap:

  • Exploration agent — richer open-ended legal research with web + corpus fusion
  • Persistent HITL checkpointing — Redis/Postgres checkpointer for reliable contract-fill resume across restarts
  • User document ingestion — complete Docling parse pipeline for uploaded KB documents
  • Calculator tools — real fine/penalty estimation logic (currently placeholder)
  • CI/CD & production hardening — automated testing, deployment pipelines, and observability

Contributions toward any of these areas are welcome — see Contributing above.

About

Vietnamese Legal AI Assistant built with LangGraph, DeepAgent, Elasticsearch, and RAG. Supports legal research, legal Q&A, document generation, and citation-backed reasoning.
