A full-stack, knowledge-graph-based Retrieval-Augmented Generation (RAG) system for Vietnamese tax law documents. It compares vector search with graph-based retrieval.
Document-Graph-Representation/
├── api/ # FastAPI backend
│ ├── routers/ # API endpoints (graph, rag, health)
│ ├── services/ # Business logic
│ └── schemas/
├── frontend/ # React frontend
│ ├── src/
│ │ ├── components/
│ │ ├── pages/
│ │ ├── services/
│ │ └── stores/ # Zustand state
├── rag_model/ # ML pipeline
│ ├── model/ # NER, RE, document processing
│ └── retrieval_pipeline/ # Retrieval strategies
├── shared_functions/ # Utilities (Neo4j, S3, eval)
└── docs/ # Documentation
| Component | Technology |
|---|---|
| Framework | FastAPI 0.115.6 |
| Graph DB | Neo4j 5.27.0 (AuraDB) |
| Storage | AWS S3 |
| Embeddings | sentence-transformers 3.3.1 |
| NLP | Underthesea (Vietnamese) |
| Component | Technology |
|---|---|
| Framework | React 18.3.1 |
| Language | TypeScript 5.8.3 |
| Build | Vite 5.4.19 |
| State | Zustand 5.0.8 |
| Data | TanStack Query 5.83.0 |
| UI | shadcn/ui + Tailwind CSS |
| Graph Viz | react-force-graph |
- Python 3.8+
- Node.js 18+
- Neo4j database (local or AuraDB) - Required for backend to work
# Clone repo
git clone https://github.com/GinHikat/Document-Graph-Representation.git
cd Document-Graph-Representation
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirement.txt
pip install -r requirements-api.txt
# Copy example config
cp .env.example .env
# Ask team lead for the actual credentials to fill in:
# - NEO4J_URI, NEO4J_AUTH (Neo4j database)
# - GOOGLE_API_KEY (Gemini API for RAG answers)
Note: The project uses a shared Neo4j database. Contact the team for credentials - don't create a new one.
# From project root (NOT from api/ folder)
uvicorn api.main:app --reload --port 8000
# Verify: Open http://localhost:8000/api/health
# Should return {"status": "healthy", ...}
# In a new terminal
cd frontend
npm install
# Configure environment
cp .env.example .env
# Default VITE_API_URL=http://localhost:8000/api is correct
# Run dev server
npm run dev
Frontend runs at http://localhost:8080, API at http://localhost:8000.
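The manual health check from the backend step can also be scripted. The sketch below assumes only what the setup text states: the endpoint lives at `http://localhost:8000/api/health` and a healthy backend returns JSON with `"status": "healthy"`. The function name `backend_is_healthy` and its `opener` parameter are illustrative and not part of the project.

```python
import json
import urllib.request

HEALTH_URL = "http://localhost:8000/api/health"  # URL taken from the setup steps


def backend_is_healthy(url=HEALTH_URL, opener=urllib.request.urlopen, timeout=5):
    """Return True if the health endpoint answers with status "healthy".

    `opener` is a hypothetical injection point so the check can be exercised
    without a running server; by default it is urllib's urlopen.
    """
    try:
        with opener(url, timeout=timeout) as response:
            payload = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts and HTTP errors
        # (urllib's URLError subclasses OSError); ValueError covers a
        # response body that is not valid JSON.
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"
```

Calling `backend_is_healthy()` after starting `uvicorn` gives a quick yes/no answer before the frontend is launched.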
| Method | Endpoint | Description |
|---|---|---|
| GET | /nodes | Fetch graph nodes with optional filters |
| POST | /execute | Execute Cypher queries |
| GET | /schema | Get graph schema |
| GET | /stats | Graph statistics |
| Method | Endpoint | Description |
|---|---|---|
| POST | /query | RAG query with SSE streaming |
| POST | /retrieve | Retrieve relevant context |
| POST | /rerank | Rerank retrieved results |
| GET | /tools | List available tools |
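Because `/query` streams its answer over SSE, a client has to reassemble events from the line stream. The sketch below is a minimal parser for the standard `text/event-stream` framing (`data:` lines, blank line as event terminator). It assumes the endpoint follows that framing; the shape of each payload is not documented here, so payloads are returned as plain strings. The name `iter_sse_data` is illustrative.

```python
def iter_sse_data(lines):
    """Yield the data payload of each server-sent event.

    `lines` is any iterable of decoded text lines, for example the lines
    of a streaming HTTP response body.
    """
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops one optional space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    if data_parts:
        # Lenient choice: flush a final event that lacks a trailing blank line.
        yield "\n".join(data_parts)
```

Multi-line events are joined with newlines, as the SSE format specifies, so a consumer can append each yielded string to the answer shown in the UI.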
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check |
modes = {
1: "default", # Standard embedding search
2: "traverse_embed", # Embeddings + Graph Traversal
3: "traverse_exact", # Exact Match + Graph Traversal
4: "exact_match", # Exact Match
5: "exact_match_with_rerank", # Exact match then Rerank with embeddings
6: "hybrid_search", # Top k by both Embeddings and Exact match
}
models = {
0: "paraphrase-multilingual-MiniLM-L12-v2",
1: "distiluse-base-multilingual-cased-v2",
2: "all-mpnet-base-v2",
3: "all-MiniLM-L12-v2",
4: "vinai/phobert-base", # Vietnamese-specific
5: "BAAI/bge-m3" # Evaluation only
}
mode_map = {
1: 'embedding',
2: 'jaccard',
3: 'combined'
}
# Neo4j
NEO4J_URI=neo4j+s://xxx.databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your-password
# AWS S3
AWS_ACCESS_KEY_ID=your-key
AWS_SECRET_ACCESS_KEY=your-secret
AWS_BUCKET_NAME=your-bucket
AWS_REGION=ap-southeast-1
# Optional
OPENAI_API_KEY=your-openai-key
VITE_API_URL=http://localhost:8000/api
VITE_ENABLE_GRAPH_VIEW=true
VITE_ENABLE_ANNOTATIONS=true
from shared_functions.batch_retrieval_neo4j import Neo4j_retriever
retriever = Neo4j_retriever()
# Single query
result = retriever.query_neo4j(
text="Thuế thu nhập cá nhân",
mode=6, # Hybrid search
graph=True, # Use graph embeddings; pass None to use only textual embeddings
chunks=None, # Include chunk nodes (only available in GraphSAGE integrated database)
hop=2, # Number of hops in traversal
namespace = 'Test_rel_3' # Namespace (Node label for filtering)
)
# Batch query; df must include a "question" column
df = retriever.batch_query(df, mode=2, graph=True, chunks=True, hop=2, namespace = 'Test')
from shared_functions.eval import Evaluator
from shared_functions.batch_retrieve_neo4j import *
retriever = Neo4j_retriever()
eval = Evaluator(embedding_as_judge=5)
# Combined evaluation
result = eval.combined_evaluator(
referenced_context="...",
retrieved_context="...",
embedding_threshold=0.7,
jaccard_threshold=0.3,
scaling_factor=0.5
)
# For batch evaluation, df must have supporting_context and retrieved_context columns that hold lists
retriever.str_to_list(df, 'supporting_context')
retriever.str_to_list(df, 'retrieved_context')
# Fill in each ... placeholder with your own value before running
eval.run_evaluation(df, embedding_threshold=..., jaccard_threshold=..., scaling_factor=..., mode=...)
# RAGAS evaluation
eval.ragas(df) # df: question, answer, retrieved_contexts
| Page | Route | Description |
|---|---|---|
| Home | / | Dashboard overview |
| Documents | /documents | Document management |
| Q&A | /qa | Query interface |
| Graph | /graph | Knowledge graph visualization |
| Annotate | /annotate | Document annotation |
Backend Tests
# Run all tests (40 tests, 100% pass rate)
pytest api/tests/ -v
# Run with coverage (69% coverage)
pytest api/tests/ --cov=api --cov-report=html
# Run specific test file
pytest api/tests/test_auth.py -v # 13 auth tests
pytest api/tests/test_documents.py -v # 17 document tests
pytest api/tests/test_rag.py -v # 10 RAG tests
# See docs/testing.md for detailed testing documentation
Frontend Tests
# Linting
npm run lint
# Type checking
npm run build
# Frontend
cd frontend
npm run build
# Output in frontend/dist/
# Serve with backend
# Configure FastAPI to serve static files
See docs/ for detailed documentation:
- docs/system-architecture.md - Architecture diagrams
- docs/codebase-summary.md - Component details
- docs/code-standards.md - Coding conventions
- docs/project-roadmap.md - Development roadmap
- docs/testing.md - Testing guide and coverage reports
Cause: Backend server is not running or crashed on startup.
Solutions:
- Make sure you're running the backend from project root:
  # Correct (from project root)
  uvicorn api.main:app --reload --port 8000
  # Wrong (from api/ folder)
  cd api && uvicorn main:app --reload --port 8000
- Check if .env has valid credentials (ask team for credentials):
  # Required in .env
  NEO4J_URI=<get from team>
  NEO4J_AUTH=<get from team>
- Verify backend health:
  curl http://localhost:8000/api/health
  # Should return {"status": "healthy", ...}
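The credentials check in the list above can be automated before starting the server. This is a rough sketch that treats `.env` as simple `KEY=VALUE` lines and flags the two keys named in the checklist (`NEO4J_URI`, `NEO4J_AUTH`) when they are absent, empty, or still hold a `<...>` placeholder. The helper names `parse_env` and `missing_keys` are hypothetical, and the project itself may load `.env` differently.

```python
REQUIRED_KEYS = ("NEO4J_URI", "NEO4J_AUTH")  # keys named in the checklist


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return required keys that are absent, empty, or still a <placeholder>."""
    values = parse_env(text)
    return [
        key for key in required
        if not values.get(key) or values[key].startswith("<")
    ]
```

For example, `missing_keys(open(".env").read())` returns an empty list once both keys are filled in. It does not verify that the credentials are actually accepted by Neo4j.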
The frontend can't reach the API. Check:
- Is backend running on port 8000?
- Is VITE_API_URL=http://localhost:8000/api set in frontend/.env?
- Any CORS errors in browser console?
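The first question in the checklist above (is anything listening on port 8000?) can be answered with a short TCP probe. This is a minimal sketch using only the standard library; `port_is_open` is an illustrative name. A successful connection shows that some process is listening, not that it is this project's backend.

```python
import socket


def port_is_open(host="localhost", port=8000, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `port_is_open()` returns `False`, start the backend first; if it returns `True` but the frontend still fails, the `VITE_API_URL` and CORS items are the next things to check.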
# Find process on port 8000
lsof -i :8000
# Kill it
kill -9 <PID>
MIT
- Tax Legal RAG Team