VIK-GraphRAG/Finance_GraphRAG

VIK AI - Privacy-First Financial GraphRAG

Enterprise-grade financial intelligence system powered by knowledge graphs.

🚀 Quick Start

# 1. Install dependencies
pip install -r requirements.txt

# 2. Start Neo4j (Docker)
docker run -d \
  --name neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  neo4j:latest

# 3. Configure environment
cp .env.backup .env
# Edit .env with your settings

# 4. Start services
./start.sh

Visit: http://localhost:8501

✨ Features

  • Privacy-First: Offline processing with local LLMs (Ollama)
  • Graph Intelligence: Neo4j-powered knowledge graph
  • Multi-Hop Reasoning: 2-3 hop logical inference for hidden insights
  • Data Integration: Merge PDF + CSV + JSON into unified knowledge graph
  • Multi-Agent: Collaborative AI agents for deep analysis
  • 8GB RAM Optimized: Efficient memory management
  • Real-time Analysis: Fast query processing with caching
  • Path Visualization: Interactive reasoning path display

📦 Architecture

src/
├── agents/          # Multi-agent system (Analyst, Planner, Writer)
├── engine/          # Graph processing engine
│   ├── extractor.py       # Entity/Relationship extraction
│   ├── translator.py      # JSON → Cypher
│   ├── integrator.py      # PDF + CSV + JSON integration
│   ├── reasoner.py        # Multi-hop reasoning engine
│   ├── graphrag_engine.py # Core engine
│   └── privacy_graph_builder.py # Privacy-optimized builder
├── db/              # Neo4j integration
├── mcp/             # External tool integration
├── streamlit_app.py # Web UI
└── reasoning_ui.py  # Multi-hop reasoning UI
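
The JSON → Cypher step handled by translator.py could look roughly like the sketch below. This is not the repository's code: the input dictionary shape, the function name relationship_to_cypher, the DEPENDS_ON relationship type, and the label allowlist (borrowed from the node types in the Graph Visualization legend) are all assumptions made for illustration.

```python
import re

# Assumed label set, taken from the node types in this README's color legend.
ALLOWED_LABELS = {"Company", "Country", "Industry", "MacroIndicator", "FinancialMetric"}
REL_TYPE_PATTERN = re.compile(r"^[A-Z][A-Z0-9_]*$")


def relationship_to_cypher(rel: dict) -> tuple:
    """Turn one extracted relationship (hypothetical JSON shape) into a
    parameterized Cypher MERGE statement plus its parameter dict."""
    src_label, dst_label = rel["source_label"], rel["target_label"]
    rel_type = rel["type"]
    # Labels and relationship types cannot be passed as Cypher parameters,
    # so they are validated before being interpolated into the query text.
    if src_label not in ALLOWED_LABELS or dst_label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label in {rel!r}")
    if not REL_TYPE_PATTERN.match(rel_type):
        raise ValueError(f"unexpected relationship type: {rel_type!r}")
    query = (
        f"MERGE (a:{src_label} {{name: $source}}) "
        f"MERGE (b:{dst_label} {{name: $target}}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return query, {"source": rel["source"], "target": rel["target"]}


query, params = relationship_to_cypher({
    "source": "Nvidia", "source_label": "Company",
    "target": "TSMC", "target_label": "Company",
    "type": "DEPENDS_ON",
})
print(query)
print(params)
```

Using MERGE rather than CREATE keeps repeated ingestion of the same document from duplicating nodes, and passing entity names as parameters avoids building queries from untrusted text.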

🔧 Configuration

Key environment variables in .env:

# Mode
RUN_MODE=API              # API (OpenAI) or LOCAL (Ollama)
PRIVACY_MODE=true         # Enable privacy-first mode

# OpenAI
OPENAI_API_KEY=sk-...

# Neo4j
NEO4J_URI=bolt://localhost:7687
NEO4J_PASSWORD=password

# Ollama (for Privacy Mode)
OLLAMA_BASE_URL=http://localhost:11434
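
As a rough illustration of how these variables might be read and validated at startup, here is a minimal sketch. The Settings class, the load_settings function, the fallback defaults, and the rule that API mode needs a key are assumptions for the example, not the project's actual loader.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    run_mode: str
    privacy_mode: bool
    neo4j_uri: str
    neo4j_password: str
    ollama_base_url: str
    openai_api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables listed above from a mapping (os.environ by default)."""
    run_mode = env.get("RUN_MODE", "LOCAL").strip().upper()  # assumed default
    if run_mode not in {"API", "LOCAL"}:
        raise ValueError(f"RUN_MODE must be API or LOCAL, got {run_mode!r}")
    api_key = env.get("OPENAI_API_KEY") or None
    if run_mode == "API" and api_key is None:
        raise ValueError("RUN_MODE=API requires OPENAI_API_KEY")
    if "NEO4J_PASSWORD" not in env:
        raise ValueError("NEO4J_PASSWORD is not set")
    return Settings(
        run_mode=run_mode,
        privacy_mode=env.get("PRIVACY_MODE", "false").strip().lower() in {"1", "true", "yes"},
        # Fallback URLs mirror the sample values shown in this README.
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_password=env["NEO4J_PASSWORD"],
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        openai_api_key=api_key,
    )


settings = load_settings({"RUN_MODE": "LOCAL", "PRIVACY_MODE": "true", "NEO4J_PASSWORD": "password"})
print(settings.run_mode, settings.privacy_mode, settings.neo4j_uri)
```

Failing fast on a missing key or an unknown mode surfaces configuration mistakes before any service starts.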

📊 Usage

PDF Analysis

  1. Go to the "Data Ingestion" tab
  2. Upload a PDF document
  3. The system extracts entities and builds the knowledge graph

Query Interface

  1. Go to the "Query Interface" tab
  2. Ask questions about your data
  3. Get citation-backed answers with confidence scores

Advanced Settings

  • Temperature: Control creativity (0.0-2.0)
  • Retrieval Chunks: Number of context chunks (5-50)
  • Web Search: Enable real-time web data
  • Multi-Agent: Use collaborative AI pipeline

๐Ÿ› ๏ธ Development

# Run tests
python -m pytest tests/

# Check lints
python -m flake8 src/

# Format code
python -m black src/

๐Ÿ“ License

MIT License - See LICENSE file for details

๐Ÿ•ธ๏ธ Graph Visualization

์‹ค์‹œ๊ฐ„ ๊ทธ๋ž˜ํ”„ ์‹œ๊ฐํ™”

๋ฉ”์ธ Streamlit UI์˜ "๐Ÿ•ธ๏ธ Graph Visualizer" ํƒญ์—์„œ ๋ฐ”๋กœ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค!

./start.sh
# ๋˜๋Š”
streamlit run src/streamlit_app.py --server.port 8501

Visit: http://localhost:8501 โ†’ Graph Visualizer ํƒญ

๊ธฐ๋Šฅ

  • All Nodes: ์ „์ฒด ๊ทธ๋ž˜ํ”„ ๋ณด๊ธฐ
  • Company Focus: ํŠน์ • ๊ธฐ์—… ์ค‘์‹ฌ ๋„คํŠธ์›Œํฌ
  • Risk Analysis: ๋ฆฌ์Šคํฌ ๊ด€๊ณ„ ์‹œ๊ฐํ™”
  • Custom Query: Cypher ์ฟผ๋ฆฌ ์ง์ ‘ ์ž…๋ ฅ

์ƒ‰์ƒ ๊ตฌ๋ถ„

  • ๐Ÿ”ด Company (๊ธฐ์—…)
  • ๐Ÿ”ต Country (๊ตญ๊ฐ€)
  • ๐ŸŸข Industry (์‚ฐ์—…)
  • ๐ŸŸ  MacroIndicator (๊ฑฐ์‹œ๊ฒฝ์ œ)
  • ๐ŸŸฃ FinancialMetric (์žฌ๋ฌด์ง€ํ‘œ)

์ธํ„ฐ๋ž™ํ‹ฐ๋ธŒ ๊ธฐ๋Šฅ

  • ๋…ธ๋“œ ๋“œ๋ž˜๊ทธ๋กœ ์œ„์น˜ ์กฐ์ •
  • ํด๋ฆญ์œผ๋กœ ์—ฐ๊ฒฐ๋œ ๋…ธ๋“œ ํ™•์ธ
  • ์คŒ/ํŒฌ์œผ๋กœ ๊ทธ๋ž˜ํ”„ ํƒ์ƒ‰
  • ๋ฌผ๋ฆฌ ์‹œ๋ฎฌ๋ ˆ์ด์…˜์œผ๋กœ ์ž๋™ ๋ฐฐ์น˜
  • ์‹ค์‹œ๊ฐ„ ๋…ธ๋“œ ๊ฒ€์ƒ‰ ๋ฐ ํ•„ํ„ฐ๋ง

🧠 Multi-Hop Reasoning System

Unified Interface

All features are integrated into a single Streamlit app (port 8501).

./start.sh

Visit: http://localhost:8501

Tab structure:

  • 📊 Query Interface: Questions and answers
  • 📥 Data Ingestion: PDF upload and indexing
  • 📁 Data Sources: Data source management
  • 🕸️ Graph Visualizer: Knowledge graph visualization

Core Features

1. Data Integration

  • Unified indexing of PDF + CSV + JSON
  • Automatic entity merging (e.g., 'NVDA' → 'Nvidia')
  • Linking of indicator data
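
Entity merging of the 'NVDA' → 'Nvidia' kind can be sketched as an alias lookup applied before records are combined. The alias table, the record shape, and the function names below are hypothetical and only illustrate the idea; the project's Entity Resolver may work differently.

```python
# Hypothetical alias table: keys are normalized spellings, values are canonical names.
ALIASES = {
    "nvda": "Nvidia",
    "nvidia corporation": "Nvidia",
    "tsm": "TSMC",
}


def resolve_entity(name: str, aliases: dict = ALIASES) -> str:
    """Map a ticker or spelling variant to its canonical entity name."""
    key = " ".join(name.split()).casefold()  # collapse whitespace, ignore case
    return aliases.get(key, name.strip())


def merge_records(records: list) -> dict:
    """Combine records from different sources under one canonical entity."""
    merged = {}
    for record in records:
        canonical = resolve_entity(record["entity"])
        merged.setdefault(canonical, {}).update(record["fields"])
    return merged


records = [
    {"entity": "NVDA", "fields": {"ticker": "NVDA"}},      # e.g. a CSV row
    {"entity": "Nvidia", "fields": {"supplier": "TSMC"}},  # e.g. extracted from a PDF
]
print(merge_records(records))
# {'Nvidia': {'ticker': 'NVDA', 'supplier': 'TSMC'}}
```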

2. Multi-Hop Reasoning

  • 2-3 hop logical inference chains
  • A → B → C → D causal relationship analysis
  • Discovery of hidden risks
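
To make the hop limit concrete, the sketch below runs a breadth-first search over a tiny in-memory edge list and returns every path of at most max_hops edges. The edge list, the relationship names, and the find_paths function are illustrative assumptions; the actual reasoner queries Neo4j and is not reproduced here.

```python
from collections import deque

# Illustrative edges (source, relationship, target); relationship names are made up.
EDGES = [
    ("Taiwan Strait Tension", "AFFECTS", "Taiwan"),
    ("Taiwan", "HOSTS", "TSMC"),
    ("TSMC", "SUPPLIES", "Nvidia"),
]


def find_paths(edges, start, goal, max_hops=3):
    """Return all simple paths from start to goal using at most max_hops edges."""
    adjacency = {}
    for src, _rel, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            paths.append(path)
            continue
        if len(path) - 1 >= max_hops:  # hop budget used up, stop expanding
            continue
        for nxt in adjacency.get(node, []):
            if nxt not in path:  # skip cycles
                queue.append(path + [nxt])
    return paths


for path in find_paths(EDGES, "Taiwan Strait Tension", "Nvidia", max_hops=3):
    print(" → ".join(path))
# Taiwan Strait Tension → Taiwan → TSMC → Nvidia
```

With max_hops=2 the same call returns no paths, which is why an indirect exposure like this one only shows up when three hops are allowed.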

3. Reasoning Path Visualization

  • Interactive path graph
  • Detailed node and relationship information
  • Confidence-based color coding

Usage Example

# Question: "How does Taiwan tension affect Nvidia?"

# Reasoning result:
💡 Because Nvidia depends on TSMC (high criticality), 
   and TSMC is located in Taiwan, and Taiwan faces 
   geopolitical tension, therefore Nvidia is exposed 
   to significant supply chain disruption risk.

📊 Confidence: 85%

🔗 Reasoning Path:
   Taiwan Strait Tension → Taiwan → TSMC → Nvidia

Advanced Usage

See the Multi-Hop Reasoning Guide for details.

API Usage

import asyncio
from engine.reasoner import MultiHopReasoner

async def analyze():
    reasoner = MultiHopReasoner()
    try:
        # max_hops caps the length of the reasoning chain
        result = await reasoner.reason(
            question="What are Nvidia's supply chain risks?",
            max_hops=3
        )
        print(result['inference'])
    finally:
        # Close the reasoner even if reasoning fails
        reasoner.close()

asyncio.run(analyze())

🧪 Testing

Multi-Hop System Test

python test_multihop_system.py

Test items:

  1. ✅ Entity Resolver - Entity name normalization
  2. ✅ Data Integrator - CSV/JSON integration
  3. ✅ Multi-Hop Reasoner - Reasoning engine
  4. ✅ End-to-End - Full workflow
