AditM99/RAG

Graph-Powered Conversational Search (RAG + Neo4j Vector DB)

Example project demonstrating a Hybrid RAG flow using Neo4j as both a knowledge graph and a vector database.

Use Hugging Face embeddings to generate passage vectors and store them directly in Neo4j.
Use spaCy to extract entities and store them as nodes/relationships in Neo4j (entity graph).
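The entity-graph step can be sketched roughly as follows. This is a minimal illustration, not the code in backend/: the `Entity` label, the `MENTIONS` relationship, the `id` property on `Passage`, and the helper name `entity_params` are all assumptions. The (text, label) pairs are what spaCy exposes as `ent.text` and `ent.label_` for each item in `doc.ents`.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical schema: (:Passage {id})-[:MENTIONS]->(:Entity {name, type}).
# None of these names are confirmed by the repository.
ENTITY_MERGE = (
    "UNWIND $entities AS ent "
    "MERGE (e:Entity {name: ent.name, type: ent.type}) "
    "WITH e "
    "MATCH (p:Passage {id: $passage_id}) "
    "MERGE (p)-[:MENTIONS]->(e)"
)


def entity_params(passage_id: str, pairs: Iterable[Tuple[str, str]]) -> Dict[str, Any]:
    """Turn (text, label) pairs into parameters for ENTITY_MERGE.

    With spaCy the pairs would come from something like
    [(ent.text, ent.label_) for ent in nlp(text).ents].
    """
    seen = set()
    entities: List[Dict[str, str]] = []
    for text, label in pairs:
        key = (text.strip(), label)
        # Skip blank names and repeated mentions of the same entity.
        if not key[0] or key in seen:
            continue
        seen.add(key)
        entities.append({"name": key[0], "type": label})
    return {"passage_id": passage_id, "entities": entities}
```

The query text and parameters can then be handed to the Neo4j Python driver, for example through `session.run(ENTITY_MERGE, params)`. Using MERGE rather than CREATE keeps repeated ingestion of the same passage from duplicating entity nodes.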

Hybrid retrieval: Neo4j vector index returns relevant passages. Neo4j graph returns connected entities. Both are merged into a prompt for the LLM.
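One way to picture the merge step is a small function that takes the passages returned by the vector index and the entities returned by the graph, and formats them into a single prompt. The function name `build_prompt` and the prompt wording are illustrative assumptions; the actual template in backend/ may differ.

```python
from typing import Iterable


def build_prompt(question: str, passages: Iterable[str], entities: Iterable[str]) -> str:
    """Merge vector-search passages and graph entities into one LLM prompt."""
    passage_block = "\n".join(f"- {p.strip()}" for p in passages) or "- (none)"
    # De-duplicate entities while keeping their first-seen order.
    unique_entities = list(dict.fromkeys(e.strip() for e in entities))
    entity_block = ", ".join(unique_entities) or "(none)"
    return (
        "Answer the question using only the context below.\n\n"
        f"Passages:\n{passage_block}\n\n"
        f"Related entities: {entity_block}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Keeping passages and entities in separate, labelled sections lets the model tell free-text evidence apart from graph facts.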

FastAPI backend exposes /ingest and /query endpoints for ingestion and question answering.

What is included

backend/ : FastAPI app, ingestion and retrieval code
docker-compose.yml : spins up Neo4j for local testing
.env.template : environment variables required
requirements.txt : Python packages
example_data/ : small example text file to ingest

Setup (quick)

Copy .env.template to .env and fill in the values:

HUGGINGFACEHUB_API_TOKEN (required for Hugging Face models)
NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD (if using local docker-compose, defaults are in the template)

Create a Python venv and install requirements:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Start Neo4j locally for testing:

docker-compose up -d

Open http://localhost:7474 in a browser and log in with neo4j/letmein (see .env.template).

Create a vector index in Neo4j (one-time setup, example for 384-dim embeddings):

CREATE VECTOR INDEX chunk_embeddings IF NOT EXISTS
FOR (c:Passage) ON (c.embedding)
OPTIONS {indexConfig: {
  `vector.dimensions`: 384,
  `vector.similarity_function`: 'cosine'
}};
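Once the index exists, it can be queried with Neo4j's `db.index.vector.queryNodes` procedure. The sketch below only builds the Cypher text and its parameters. The helper name `vector_search_request` and the `text` property on `Passage` nodes are assumptions, since the repository's schema is not shown here.

```python
from typing import Any, Dict, Sequence, Tuple

# Cypher that asks the vector index for the k nearest Passage nodes.
# `text` is an assumed property name on Passage nodes.
VECTOR_QUERY = (
    "CALL db.index.vector.queryNodes($index, $k, $embedding) "
    "YIELD node, score "
    "RETURN node.text AS text, score"
)


def vector_search_request(
    embedding: Sequence[float],
    k: int = 5,
    index: str = "chunk_embeddings",
) -> Tuple[str, Dict[str, Any]]:
    """Return the Cypher query and parameters for a top-k vector search."""
    if k < 1:
        raise ValueError("k must be at least 1")
    params = {"index": index, "k": k, "embedding": [float(x) for x in embedding]}
    return VECTOR_QUERY, params
```

With the Neo4j Python driver, the pair can be passed to `session.run(query, params)`. Each returned row then carries the passage text and its similarity score.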

Run the API:

uvicorn backend.main:app --reload --port 8000

Ingest a file (example provided):

curl -X POST "http://localhost:8000/ingest" -F "file=@example_data/example.txt"

Query:

curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"query":"What unusual activity was flagged on John Doe’s account?"}'
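The same query can be issued from Python with only the standard library. This is a sketch: the helper name `build_query_request` is hypothetical, and the shape of the JSON response is not documented here, so the usage comment simply prints the raw body.

```python
import json
import urllib.request


def build_query_request(
    question: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a POST request for the /query endpoint with a JSON body."""
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage (requires the API to be running on port 8000):
#   req = build_query_request("What unusual activity was flagged?")
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```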

Notes & caveats

All semantic search is performed inside Neo4j’s vector index; no external vector DB such as Pinecone is needed.
In production, you should add batching, error handling, authentication, rate limiting, and robust entity linking.
The LLM used here is a Hugging Face model (flan-t5-base or similar). You can swap it for OpenAI or any other provider.
Keep an eye on embedding dimensions: they must match between the Hugging Face embedding model and the Neo4j vector index configuration.
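A cheap guard against a dimension mismatch is to check vector lengths before writing them to Neo4j. The sketch below assumes the 384-dimension index from the setup step. The constant `INDEX_DIMENSIONS` and the function name are illustrative, not part of the repository.

```python
from typing import Sequence

# Must equal `vector.dimensions` of the Neo4j index (384 in the setup example).
INDEX_DIMENSIONS = 384


def check_embedding_dimensions(
    vectors: Sequence[Sequence[float]], expected: int = INDEX_DIMENSIONS
) -> None:
    """Raise ValueError if any embedding's length differs from the index size."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected:
            raise ValueError(
                f"embedding {i} has {len(vec)} dimensions, index expects {expected}"
            )
```

Calling this right after the embedding model runs, and before any write to Neo4j, surfaces a model swap (for example to a 768-dimension model) as a clear error at ingestion time.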
