Discover the right papers faster: hybrid BM25 + BERT search with an interactive citation graph for research influence patterns.
AI-powered research paper discovery engine with hybrid search (BM25 + BERT embeddings), citation network visualization, and a modern Next.js frontend. Backend is built with FastAPI, PostgreSQL, Redis caching, and ChromaDB for vector search. Fully containerized with Docker Compose.
- Hybrid search: custom BM25 (implemented from scratch) keyword relevance + BERT semantic similarity (Sentence Transformers all-MiniLM-L6-v2)
- Redis-backed caching with cache warming and cache management endpoints
- ChromaDB vector store integration for semantic retrieval
- PostgreSQL relational schema for papers, authors, venues, citations
- Interactive citation network visualization using graph data structures (O(1) vertex lookup, efficient traversal) rendered with D3/React
- Production-friendly: health checks, service-level status, and observability hooks
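The citation network feature above mentions O(1) vertex lookup and efficient traversal. A minimal sketch of one way to get those properties is an adjacency map keyed by paper ID with a breadth-first walk. The class and method names here (`CitationGraph`, `neighborhood`) are illustrative assumptions, not the project's actual code.

```python
from collections import deque


class CitationGraph:
    """Directed graph: an edge src -> dst means paper src cites paper dst."""

    def __init__(self):
        # Dict keys give average O(1) lookup of a paper's adjacency set.
        self.cites = {}
        self.cited_by = {}

    def add_paper(self, paper_id):
        self.cites.setdefault(paper_id, set())
        self.cited_by.setdefault(paper_id, set())

    def add_citation(self, src, dst):
        self.add_paper(src)
        self.add_paper(dst)
        self.cites[src].add(dst)
        self.cited_by[dst].add(src)

    def neighborhood(self, start, max_depth):
        """Breadth-first walk over outgoing citations; returns {paper_id: depth}."""
        if start not in self.cites:
            return {}
        depth = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if depth[node] == max_depth:
                continue  # do not expand past the requested depth
            for nxt in self.cites[node]:
                if nxt not in depth:
                    depth[nxt] = depth[node] + 1
                    queue.append(nxt)
        return depth


graph = CitationGraph()
graph.add_citation("A", "B")
graph.add_citation("B", "C")
graph.add_citation("A", "C")
graph.add_citation("C", "D")
print(graph.neighborhood("A", 1))
```

Keeping a reverse map (`cited_by`) alongside the forward one makes "who cites this paper" queries as cheap as "what does this paper cite", at the cost of storing each edge twice.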
- Backend: FastAPI, SQLAlchemy, Pydantic, Uvicorn
- Search/AI: Custom BM25 implementation, Sentence Transformers, ChromaDB
- Data: PostgreSQL, Redis
- Frontend: Next.js 15, React 19, TypeScript, Tailwind, Radix UI, D3
- Infra: Docker, Docker Compose
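For readers unfamiliar with BM25, the keyword half of the hybrid search, here is a small self-contained sketch of the scoring function. It is not the project's implementation: the whitespace tokenizer, the `k1` and `b` defaults, and the smoothed IDF variant are all assumptions chosen for brevity.

```python
import math
from collections import Counter


class BM25:
    """Toy BM25 index over a list of strings."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b
        self.docs = [doc.lower().split() for doc in docs]
        self.doc_len = [len(d) for d in self.docs]
        self.avgdl = sum(self.doc_len) / len(self.docs)
        self.tf = [Counter(d) for d in self.docs]
        # Document frequency: number of documents containing each term.
        df = Counter()
        for d in self.docs:
            df.update(set(d))
        n = len(self.docs)
        # Smoothed IDF that stays positive even for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, index):
        tf = self.tf[index]
        # Length normalization: longer-than-average documents are penalized.
        norm = self.k1 * (1 - self.b + self.b * self.doc_len[index] / self.avgdl)
        total = 0.0
        for term in query.lower().split():
            if term in tf:
                freq = tf[term]
                total += self.idf[term] * freq * (self.k1 + 1) / (freq + norm)
        return total

    def rank(self, query):
        scores = [(i, self.score(query, i)) for i in range(len(self.docs))]
        return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = [
    "attention is all you need",
    "deep residual learning for image recognition",
    "bert pre-training of deep bidirectional transformers",
]
bm25 = BM25(docs)
ranking = bm25.rank("deep transformers")
print(ranking)
```

In a hybrid setup, a keyword score like this would be combined with an embedding similarity score; how the project weights the two is controlled by the search request parameters, and the exact formula is not shown here.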
- src/app: FastAPI application (APIs, services, models)
- visual-search-engine: Next.js + TypeScript frontend
- docker-compose.yml: Postgres, Redis, ChromaDB services
- requirements.txt: Python backend dependencies
Prerequisites: Docker and Docker Compose installed.
# From repo root
docker compose up -d
# Services exposed
# - FastAPI: http://localhost:8000
# - PostgreSQL: localhost:5432 (user / password / database: scholarnet / scholarnet / scholarnet)
# - Redis: localhost:6379
# - ChromaDB: http://localhost:8001
Then, run the frontend:
cd visual-search-engine
pnpm install # or npm install / yarn
pnpm dev # or npm run dev
# Frontend: http://localhost:3000
Then, run the backend. In the root directory, create and activate a Python virtual environment and install the requirements:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
In the src directory, run:
python run.py
To load the database, either use your own data matching the models in src.app.models, or use the following CSV dataset from Kaggle: https://www.kaggle.com/datasets/nechbamohammed/research-papers-dataset. Place whichever CSV file you use in the root directory.
Next, initialize the database by running init_db.py in src.app.core. Before running it, set the variable PAPERS_CSV_FILE to the name of your CSV file in the root directory.
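Before running init_db.py on your own data, it can help to look at the CSV header and compare it against the fields in src.app.models. The following is a small sketch for that; `preview_csv` is a hypothetical helper, and the column names in the demo are made up rather than taken from the Kaggle dataset.

```python
import csv
import io


def preview_csv(handle, rows=3):
    """Return (header, first few data rows) from an open text file."""
    reader = csv.reader(handle)
    header = next(reader, [])
    # zip stops after `rows` items, so the rest of the file is never read.
    sample = [row for _, row in zip(range(rows), reader)]
    return header, sample


# In-memory stand-in for a real file; these column names are invented.
demo = io.StringIO("title,year\nPaper A,2020\nPaper B,2021\n")
header, sample = preview_csv(demo, rows=1)
print(header)
print(sample)

# With a real file in the root directory (substitute your own file name):
# with open("your_file.csv", newline="", encoding="utf-8") as f:
#     print(preview_csv(f))
```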
Prerequisites: Python 3.11+, Node 18+/20+, PostgreSQL 15+, Redis 7+, ChromaDB server.
- Python environment and deps
In the root directory, create and activate a Python virtual environment and install the requirements:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
- Environment variables (adjust as needed)
Create .env from env.example at repo root.
Key variables the backend reads (defaults shown):
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0
DATABASE_URL=postgresql+psycopg2://scholarnet:scholarnet@localhost:5432/scholarnet
CHROMA_HOST=localhost
CHROMA_PORT=8001
- Start backend (FastAPI)
In the src directory, run:
python run.py
- Start frontend (Next.js)
cd visual-search-engine
pnpm install && pnpm dev
To load the database, either use your own data matching the models in src.app.models, or use the following CSV dataset from Kaggle: https://www.kaggle.com/datasets/nechbamohammed/research-papers-dataset. Place whichever CSV file you use in the root directory.
Next, initialize the database by running init_db.py in src.app.core. Before running it, set the variable PAPERS_CSV_FILE to the name of your CSV file in the root directory.
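To illustrate how the environment variables listed in this section map to typed settings, here is a sketch that reads them with the defaults shown above. The `Settings` dataclass and `load_settings` function are assumptions for illustration; the backend may load its configuration differently (for example through Pydantic).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    redis_host: str
    redis_port: int
    redis_db: int
    database_url: str
    chroma_host: str
    chroma_port: int


def load_settings(env=None):
    """Build Settings from a mapping, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        redis_host=env.get("REDIS_HOST", "localhost"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
        redis_db=int(env.get("REDIS_DB", "0")),
        database_url=env.get(
            "DATABASE_URL",
            "postgresql+psycopg2://scholarnet:scholarnet@localhost:5432/scholarnet",
        ),
        chroma_host=env.get("CHROMA_HOST", "localhost"),
        chroma_port=int(env.get("CHROMA_PORT", "8001")),
    )


# An empty mapping yields every default; pass nothing to read os.environ.
settings = load_settings({})
print(settings)
```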
- Create/update papers via REST (see endpoints below). The BM25 index auto-builds and updates on changes.
- Add vectors to ChromaDB for semantic search:
# Triggers embedding of all non-stub, unembedded papers
POST http://localhost:8000/api/v1/papers/vectors/
Base URL: http://localhost:8000
GET /: API info and feature flags
GET /health: Aggregated service health (DB, Redis, Chroma)
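A script that depends on the backend may want to wait until GET /health responds. The sketch below polls that endpoint using only the standard library. The helper names, retry count, and delay are assumptions, and the response body is treated as opaque JSON because its exact shape is not documented here.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"


def fetch_json(path, timeout=5.0):
    """GET a path under BASE_URL and return (status code, decoded JSON body)."""
    with urllib.request.urlopen(BASE_URL + path, timeout=timeout) as response:
        return response.status, json.loads(response.read().decode("utf-8"))


def wait_until_healthy(fetch=fetch_json, attempts=10, delay=1.0):
    """Poll /health until it answers with HTTP 200; return the body or None."""
    for _ in range(attempts):
        try:
            status, body = fetch("/health")
            if status == 200:
                return body
        except (OSError, ValueError):
            # Connection refused, HTTP error status, timeout, or invalid JSON.
            pass
        time.sleep(delay)
    return None


# Usage once the backend is running:
# print(wait_until_healthy())
```

The `fetch` parameter exists so the polling loop can be exercised without a live server.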
POST /api/v1/search
- Request body:
{ "query": "transformers", "page": 1, "size": 20, "bert_weight": 2.0, "citation_weight": 0.5 }
- Returns hybrid-ranked results combining BM25 and BERT (with optional citation boost)
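As a sketch of calling the search endpoint from Python, the snippet below builds the request body shown above and posts it with the standard library. The function names are hypothetical, the defaults simply mirror the example body, and the response is returned as decoded JSON without assuming its fields.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_search_request(query, page=1, size=20, bert_weight=2.0, citation_weight=0.5):
    """Create a POST request for /api/v1/search with a JSON body."""
    payload = {
        "query": query,
        "page": page,
        "size": size,
        "bert_weight": bert_weight,
        "citation_weight": citation_weight,
    }
    return urllib.request.Request(
        BASE_URL + "/api/v1/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query, **options):
    """Send the search request and return the decoded JSON response."""
    request = build_search_request(query, **options)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network.
request = build_search_request("transformers")
print(request.get_method(), request.full_url)

# With the backend running:
# print(search("transformers", bert_weight=1.0))
```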
GET /api/v1/suggest/{text}: Semantic suggestions using ChromaDB
GET /api/v1/papers: Paginated list
GET /api/v1/papers/{paper_id}: Paper details (authors, references)
POST /api/v1/papers: Bulk create papers (see PaperTemplate in backend)
PUT /api/v1/papers/{paper_id}: Update paper (also updates BM25 index)
DELETE /api/v1/papers/{paper_id}