ScholarSearch

Discover the right papers faster: hybrid BM25 + BERT search with an interactive citation graph for exploring research influence patterns.

AI-powered research paper discovery engine with hybrid search (BM25 + BERT embeddings), citation network visualization, and a modern Next.js frontend. The backend is built with FastAPI, PostgreSQL, Redis caching, and ChromaDB for vector search, with the services containerized via Docker Compose.

Features

  • Hybrid search: custom BM25 (implemented from scratch) for keyword relevance + BERT semantic similarity (Sentence Transformers all-MiniLM-L6-v2); see the scoring sketch after this list
  • Redis-backed caching with cache warming and cache management endpoints
  • ChromaDB vector store integration for semantic retrieval
  • PostgreSQL relational schema for papers, authors, venues, citations
  • Interactive citation network visualization backed by graph data structures (O(1) vertex lookup, efficient traversal) and rendered with D3/React; see the graph sketch after this list
  • Production-friendly: health checks, service-level status, and observability hooks
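
A minimal sketch of the hybrid ranking idea is shown below. It is illustrative only, not the project's actual code: the function names, tokenization, and BM25 parameters are assumptions, and the default bert_weight of 2.0 simply mirrors the search API example further down.

# hybrid_rank_sketch.py -- illustrative only, not the repository's implementation
import math
from collections import Counter
from sentence_transformers import SentenceTransformer, util

K1, B = 1.5, 0.75  # common BM25 parameters (assumed values)

def bm25_scores(query_terms, docs):
    """Classic BM25 over pre-tokenized documents."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            score += idf * tf[t] * (K1 + 1) / (tf[t] + K1 * (1 - B + B * len(d) / avgdl))
        scores.append(score)
    return scores

model = SentenceTransformer("all-MiniLM-L6-v2")

def hybrid_rank(query, papers, bert_weight=2.0):
    """Blend BM25 keyword relevance with embedding cosine similarity."""
    tokenized = [p.lower().split() for p in papers]
    keyword = bm25_scores(query.lower().split(), tokenized)
    semantic = util.cos_sim(model.encode(query), model.encode(papers))[0].tolist()
    combined = [k + bert_weight * s for k, s in zip(keyword, semantic)]
    return sorted(zip(papers, combined), key=lambda pair: -pair[1])

if __name__ == "__main__":
    docs = ["Attention Is All You Need", "Okapi BM25 for sparse retrieval"]
    print(hybrid_rank("transformer attention models", docs))

The citation graph can likewise be held in a plain adjacency map, which is what gives O(1) vertex lookup and cheap traversal. The class below is a sketch under assumed names, not the repository's data structure.

# citation_graph_sketch.py -- adjacency-map citation graph (illustrative)
from collections import deque

class CitationGraph:
    def __init__(self):
        self.adj = {}  # paper_id -> set of cited paper_ids; dict gives O(1) vertex lookup

    def add_paper(self, paper_id):
        self.adj.setdefault(paper_id, set())

    def add_citation(self, citing_id, cited_id):
        self.add_paper(citing_id)
        self.add_paper(cited_id)
        self.adj[citing_id].add(cited_id)

    def neighborhood(self, paper_id, depth=2):
        """Breadth-first traversal: papers reachable within `depth` citation hops."""
        seen, frontier = {paper_id}, deque([(paper_id, 0)])
        while frontier:
            node, d = frontier.popleft()
            if d == depth:
                continue
            for nxt in self.adj.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, d + 1))
        return seen - {paper_id}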

Tech Stack

  • Backend: FastAPI, SQLAlchemy, Pydantic, Uvicorn
  • Search/AI: Custom BM25 implementation, Sentence Transformers, ChromaDB
  • Data: PostgreSQL, Redis
  • Frontend: Next.js 15, React 19, TypeScript, Tailwind, Radix UI, D3
  • Infra: Docker, Docker Compose

Repository Layout

  • src/app — FastAPI application (APIs, services, models)
  • visual-search-engine — Next.js + TypeScript frontend
  • docker-compose.yml — Postgres, Redis, ChromaDB services
  • requirements.txt — Python backend dependencies

Quickstart (Docker Compose)

Prerequisites: Docker and Docker Compose installed.

# From repo root
docker compose up -d

# Services exposed
# - FastAPI:     http://localhost:8000 (once the backend is running; see below)
# - PostgreSQL:  localhost:5432 (user / password / database: scholarnet)
# - Redis:       localhost:6379
# - ChromaDB:    http://localhost:8001

Then, run the frontend:

cd visual-search-engine
pnpm install   # or npm install / yarn
pnpm dev       # or npm run dev
# Frontend: http://localhost:3000

Then, run the backend.

In the root directory, create a Python virtual environment and install the requirements:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

In the src directory, run:

python run.py

To load the database with paper data, see the Data Ingestion and Embedding section below.

Local Development (without Docker)

Prerequisites: Python 3.11+, Node 18+/20+, PostgreSQL 15+, Redis 7+, ChromaDB server.

  1. Python environment and dependencies. In the root directory, create a Python virtual environment and install the requirements:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
  2. Environment variables. Create .env from env.example at the repo root and adjust as needed (see the connectivity check after this list).

Key variables the backend reads (defaults shown):

REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0
DATABASE_URL=postgresql+psycopg2://scholarnet:scholarnet@localhost:5432/scholarnet
CHROMA_HOST=localhost
CHROMA_PORT=8001
  3. Start the backend (FastAPI). In the src directory, run:
python run.py
  4. Start the frontend (Next.js):
cd visual-search-engine
pnpm install && pnpm dev
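
With the variables above in place, a short script like the following can confirm that PostgreSQL, Redis, and ChromaDB are reachable before you start the backend. It is an optional sketch using the documented defaults, not a file that ships with the repository.

# check_services.py -- optional sanity check for the local data services (sketch)
import os

import chromadb
import redis
from sqlalchemy import create_engine, text

# PostgreSQL via SQLAlchemy, using the documented DATABASE_URL default
db_url = os.getenv("DATABASE_URL", "postgresql+psycopg2://scholarnet:scholarnet@localhost:5432/scholarnet")
with create_engine(db_url).connect() as conn:
    conn.execute(text("SELECT 1"))
print("PostgreSQL OK")

# Redis ping
r = redis.Redis(host=os.getenv("REDIS_HOST", "localhost"),
                port=int(os.getenv("REDIS_PORT", 6379)),
                db=int(os.getenv("REDIS_DB", 0)))
r.ping()
print("Redis OK")

# ChromaDB heartbeat
chroma = chromadb.HttpClient(host=os.getenv("CHROMA_HOST", "localhost"),
                             port=int(os.getenv("CHROMA_PORT", 8001)))
chroma.heartbeat()
print("ChromaDB OK")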

Data Ingestion and Embedding

To load the database, either use your own data conforming to the models in src.app.models, or use this CSV sourced from Kaggle: https://www.kaggle.com/datasets/nechbamohammed/research-papers-dataset. Place the CSV file in the root directory, set the PAPERS_CSV_FILE variable to the name of that file, and then initialize the database by navigating to src.app.core and running init_db.py.

  • Create/update papers via REST (see endpoints below). The BM25 index auto-builds and updates on changes.
  • Add vectors to ChromaDB for semantic search:
# Triggers embedding of all non-stub, unembedded papers
POST http://localhost:8000/api/v1/papers/vectors/
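
For example, a minimal ingestion pass from Python might look like the sketch below. The payload fields are placeholders, not the real PaperTemplate schema, and the exact request shape may differ; check the backend models before using it.

# ingest_sketch.py -- illustrative client calls; payload fields are placeholders
import requests

BASE = "http://localhost:8000/api/v1"

# Bulk-create papers (consult PaperTemplate in the backend for the actual schema)
papers = [{"title": "Attention Is All You Need", "abstract": "placeholder abstract"}]
resp = requests.post(f"{BASE}/papers", json=papers, timeout=30)
resp.raise_for_status()

# Trigger embedding of all non-stub, unembedded papers into ChromaDB
resp = requests.post(f"{BASE}/papers/vectors/", timeout=300)
resp.raise_for_status()
print(resp.json())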

Key API Endpoints

Base URL: http://localhost:8000

Health and status

  • GET / — API info and feature flags
  • GET /health — Aggregated service health (DB, Redis, Chroma)
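
A quick way to confirm the stack is wired together from Python (a sketch; assumes the backend is running on port 8000):

import requests

# Aggregated health: reports database, Redis, and ChromaDB status
health = requests.get("http://localhost:8000/health", timeout=5)
health.raise_for_status()
print(health.json())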

Search

  • POST /api/v1/search
    • Request body:
      {
        "query": "transformers",
        "page": 1,
        "size": 20,
        "bert_weight": 2.0,
        "citation_weight": 0.5
      }
    • Returns hybrid-ranked results combining BM25 and BERT (with optional citation boost)
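
For example, issuing the request body above from Python (a sketch; the exact response shape is defined by the backend):

import requests

payload = {
    "query": "transformers",
    "page": 1,
    "size": 20,
    "bert_weight": 2.0,      # weight of BERT semantic similarity relative to BM25
    "citation_weight": 0.5,  # optional citation boost
}
resp = requests.post("http://localhost:8000/api/v1/search", json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())  # hybrid-ranked results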

Suggestions

  • GET /api/v1/suggest/{text} — Semantic suggestions using ChromaDB

Papers

  • GET /api/v1/papers — Paginated list
  • GET /api/v1/papers/{paper_id} — Paper details (authors, references)
  • POST /api/v1/papers — Bulk create papers (see PaperTemplate in backend)
  • PUT /api/v1/papers/{paper_id} — Update paper (also updates BM25 index)
  • DELETE /api/v1/papers/{paper_id}
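
A small read-only example against these endpoints (a sketch; the paper id and any pagination parameters are placeholders):

import requests

BASE = "http://localhost:8000/api/v1"

# Paginated list of papers (query parameter names may differ; see the backend)
print(requests.get(f"{BASE}/papers", timeout=30).json())

# Details for a single paper, including authors and references
paper_id = 1  # placeholder id
print(requests.get(f"{BASE}/papers/{paper_id}", timeout=30).json())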
