VectorDB — Vector Database Engine from Scratch

A production-grade vector database built in C++ with HNSW, KD-Tree, and Brute Force ANN search, a REST API backend, and a full RAG pipeline using local LLMs via Ollama.

What This Is

Most developers use vector databases as a black box. This project builds one from scratch.

It implements the core algorithms that power production databases like Pinecone, Weaviate, and Chroma — and layers a full RAG (Retrieval-Augmented Generation) pipeline on top using a local LLM. The result: you can insert text documents, ask questions in natural language, and get answers grounded in your own data — no cloud, no API keys.

Tech Stack

Language: C++17
HTTP Server: cpp-httplib (single-header)
Local LLM runtime: Ollama
Embedding model: nomic-embed-text (768-dimensional vectors)
LLM: llama3.2
Frontend: Vanilla HTML/CSS/JS

Features

Three ANN search algorithms implemented from scratch: Brute Force, KD-Tree, and HNSW
Pluggable distance metrics: Cosine, Euclidean, Manhattan
Real-time benchmarking — compare all three algorithms side-by-side with microsecond-level timing
HNSW graph visualizer — inspect layers, nodes, and edges in the browser
Document Q&A (RAG pipeline) — insert any text, ask questions, get LLM answers backed by retrieved context
Thread-safe inserts and searches via std::mutex
Full REST API — 10 endpoints, all returning JSON
Interactive web UI — no framework needed

Architecture

main.cpp (1100+ lines)
├── Distance Metrics        cosine, euclidean, manhattan
├── BruteForce              O(N·d) linear scan
├── KDTree                  O(log N) axis-aligned pruning
├── HNSW                    O(log N) multilayer graph, approximate
├── VectorDB                wraps all 3 — 16D demo vectors, benchmarking
├── DocumentDB              HNSW only — real 768D Ollama embeddings
├── OllamaClient            HTTP calls to /api/embeddings and /api/generate
├── chunkText()             splits docs into 250-word overlapping windows
└── main()                  httplib::Server — 10 REST endpoints

Two databases run in parallel:

VectorDB — 16-dimensional demo vectors across all 3 algorithms (for comparison and benchmarking)
DocumentDB — 768-dimensional real text embeddings, HNSW only (for actual RAG)

How the Algorithms Work

Brute Force

Computes distance from the query vector to every single vector in the database. Exact results, but O(N) — doubles in cost as data doubles. Used as a correctness baseline.

KD-Tree

Builds a binary tree by splitting space along alternating dimensions. Prunes entire subtrees during search when they can't contain a closer neighbor. Works well at low dimensions but degrades at 768D due to the curse of dimensionality.

HNSW (Hierarchical Navigable Small World)

The production algorithm. Builds a multi-layer graph:

Upper layers — sparse, long-range connections (like highways)
Lower layers — denser, short-range connections (local streets)
Search — enter at the top, greedily zoom toward the query, drop to the next layer, repeat

Maintains O(log N) performance even at 768 dimensions because navigation is graph-based, not geometry-based. This is what Pinecone, Weaviate, and Chroma use internally.

RAG Pipeline

Ingestion:

Document → split into 250-word chunks → embed each chunk (nomic-embed-text) → store in HNSW index

Query:

Question → embed question → HNSW finds top 3 chunks → chunks passed as context to llama3.2 → answer

The LLM only answers from the retrieved context — no hallucination on your own documents.

REST API

Method	Endpoint	Description
`POST`	`/insert`	Insert a demo vector
`GET`	`/search`	K-NN search with algo + metric params
`GET`	`/benchmark`	Run all 3 algos and compare latency
`GET`	`/hnsw-info`	Full HNSW graph structure (layers, edges)
`DELETE`	`/delete`	Remove a vector by ID
`POST`	`/doc/insert`	Embed + store a text document
`POST`	`/doc/ask`	Full RAG: question → retrieve → LLM answer

Example search request:

GET /search?v=0.1,0.4,0.9,...&k=5&metric=cosine&algo=hnsw

Getting Started

Prerequisites

C++17 compiler (GCC / Clang / MSVC)
Ollama installed and running locally
Pull the required models:

ollama pull nomic-embed-text
ollama pull llama3.2

Build

g++ -std=c++17 -O2 -o vectordb main.cpp

Run

./vectordb

Server starts at http://localhost:8080. Open index.html in your browser.

Demo

Benchmark — HNSW vs KD-Tree vs Brute Force

Insert vectors and hit /benchmark to compare all three algorithms. Output shows latency in microseconds.

Document Q&A

Insert any text via the Documents tab in the UI (or POST to /doc/insert)
Ask a question in natural language
The system retrieves the most relevant chunks and generates a grounded answer via llama3.2

Key C++ Concepts Used

std::priority_queue — max-heap for HNSW beam search
std::unordered_map<int, Node> — O(1) node lookup in the graph
std::mutex + std::lock_guard — thread-safe concurrent access (RAII)
std::chrono::high_resolution_clock — microsecond benchmarking
std::function<float(...)> — strategy pattern for pluggable distance metrics
Lambda captures on HTTP route handlers

What I Learned

Building this forced a deep understanding of:

Why KD-Trees fail at high dimensions (curse of dimensionality)
How HNSW's multilayer graph achieves sub-linear search without geometry tricks
The full RAG pipeline — from chunking and embedding to context-grounded generation
Thread safety in a multi-client HTTP server
The difference between cosine, Euclidean, and Manhattan distance for semantic similarity tasks

References

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.gitignore		.gitignore
README.md		README.md
httplib.h		httplib.h
index.html		index.html
main.cpp		main.cpp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VectorDB — Vector Database Engine from Scratch

What This Is

Tech Stack

Features

Architecture

How the Algorithms Work

Brute Force

KD-Tree

HNSW (Hierarchical Navigable Small World)

RAG Pipeline

REST API

Getting Started

Prerequisites

Build

Run

Demo

Benchmark — HNSW vs KD-Tree vs Brute Force

Document Q&A

Key C++ Concepts Used

What I Learned

References

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

VectorDB — Vector Database Engine from Scratch

What This Is

Tech Stack

Features

Architecture

How the Algorithms Work

Brute Force

KD-Tree

HNSW (Hierarchical Navigable Small World)

RAG Pipeline

REST API

Getting Started

Prerequisites

Build

Run

Demo

Benchmark — HNSW vs KD-Tree vs Brute Force

Document Q&A

Key C++ Concepts Used

What I Learned

References

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages