sdi2200288/vector-search-algorithms

🔍 Vector Search Algorithms in C/C++

Approximate Nearest Neighbor (ANN) search on high-dimensional vectors — implemented from scratch.


About This Project

This project implements and benchmarks four Approximate Nearest Neighbor (ANN) algorithms for searching high-dimensional vectors. The algorithms are evaluated on two real-world datasets (MNIST and SIFT) and compared across key performance metrics: QPS (Queries Per Second), Recall, and Approximation Factor (AF).
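
As a rough sketch of how the three metrics could be computed, the helpers below assume that Recall is the fraction of true nearest neighbors found, that AF is the ratio of the approximate to the true nearest-neighbor distance, and that QPS is the query count divided by total search time. The function names are hypothetical and are not taken from this repository.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Recall for one query: fraction of the true nearest-neighbor ids
// that also appear in the approximate result list.
double recall_at_n(const std::vector<int>& approx_ids,
                   const std::vector<int>& true_ids) {
    if (true_ids.empty()) return 0.0;
    std::size_t hits = 0;
    for (int id : true_ids) {
        if (std::find(approx_ids.begin(), approx_ids.end(), id) != approx_ids.end()) {
            ++hits;
        }
    }
    return static_cast<double>(hits) / static_cast<double>(true_ids.size());
}

// Approximation factor for one query (assumed definition):
// distance to the approximate NN divided by distance to the true NN.
// A value of 1.0 means the approximate answer is exact.
double approximation_factor(double approx_dist, double true_dist) {
    if (true_dist == 0.0) {
        // Degenerate case: the query coincides with its true neighbor.
        return approx_dist == 0.0 ? 1.0
                                  : std::numeric_limits<double>::infinity();
    }
    return approx_dist / true_dist;
}

// Queries per second over a whole query set.
double queries_per_second(std::size_t num_queries, double total_seconds) {
    if (total_seconds <= 0.0) return 0.0;
    return static_cast<double>(num_queries) / total_seconds;
}
```

Per-query recall and AF values would typically be averaged over the query set before being reported next to QPS.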

This is the 1st Programming Assignment for the course "Software Development for Algorithmic Problems".


Team

Name                   Student ID
Παπαθανασίου Ελένη     1115202200135
Τόντου Αλτάνη-Δάφνη    1115202200288

Algorithms Implemented

Algorithm    Description
LSH          Locality Sensitive Hashing: hash-based ANN search
Hypercube    Random projection onto a hypercube for fast ANN
IVFFlat      Inverted File index with flat (exact) distance computation
IVFPQ        Inverted File index with Product Quantization for compressed search
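
To illustrate the hash-based idea behind LSH, here is a minimal sketch of a single Euclidean LSH function of the standard form h(p) = floor((a · p + t) / w). It assumes a Gaussian random direction `a`, a uniform offset `t` in [0, w), and a bucket width `w`. The `LshHash` struct is hypothetical and may differ from what `lsh.cpp` actually does.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// One Euclidean LSH function: h(p) = floor((a . p + t) / w).
// Nearby points are likely to fall in the same bucket.
struct LshHash {
    std::vector<double> a;  // random projection direction, entries ~ N(0, 1)
    double t;               // random offset, uniform in [0, w)
    double w;               // bucket width (a tuning parameter)

    LshHash(std::size_t dim, double width, std::mt19937& rng)
        : a(dim), t(0.0), w(width) {
        std::normal_distribution<double> normal(0.0, 1.0);
        std::uniform_real_distribution<double> uniform(0.0, width);
        for (double& x : a) x = normal(rng);
        t = uniform(rng);
    }

    // p is expected to have at least a.size() coordinates.
    long long operator()(const std::vector<double>& p) const {
        double dot = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) dot += a[i] * p[i];
        return static_cast<long long>(std::floor((dot + t) / w));
    }
};
```

In practice several such functions are combined into one bucket key per hash table, and several tables are queried, so that the chance of missing a true neighbor drops while candidate lists stay short.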

Tech Stack

  • C++ (g++) — algorithm implementation
  • Python 3 — experimental analysis & plotting (pandas, matplotlib, seaborn, numpy)
  • Makefile — build system
  • Bash — automated experiment scripts

Project Structure

📁 src/
   ├── lsh.cpp           → LSH algorithm
   ├── hypercube.cpp     → Hypercube algorithm
   ├── IVFFlat.cpp       → IVFFlat algorithm
   ├── ivfpq.cpp         → IVFPQ algorithm
   ├── ivfpq_index.cpp   → IVFPQ auxiliary index structure
   ├── k_means.cpp       → k-means clustering (used by IVF methods)
   ├── mnist_data.cpp    → MNIST dataset loader
   ├── sift_data.cpp     → SIFT dataset loader
   └── main.cpp

📁 include/
   └── *.hpp             → Header files for all modules

📁 experiment/
   ├── parse_results.py  → Extract metrics from output files
   └── create_plot.py    → Generate comparison plots

📄 Makefile
📄 run_sift.sh           → SIFT experiment runner
📄 run_mnist.sh          → MNIST experiment runner

Build & Run

Compile

make all

Run individual algorithms

make run_lsh
make run_hypercube
make run_ivfflat
make run_ivfpq

Run experimental analysis (SIFT)

make sift_bash
make run_sift_bash
make run_parse_results
make run_create_plots

Run experimental analysis (MNIST)

make mnist_bash
make run_mnist_bash
make run_parse_results
make run_create_plots

Clean build files

make clean

Experimental Output

Running the analysis pipeline generates the following files:

CSV reports:

  • algorithm_comparison_table.csv
  • final_metrics_comparison.csv
  • detailed_summary.csv

Plots:

  • algorithm_comparison.png
  • qps_by_algorithm.png — queries per second per algorithm
  • recall_by_algorithm.png — recall per algorithm
  • qps_vs_recall.png — speed vs accuracy tradeoff
  • qps_vs_af.png — speed vs approximation factor
  • af_vs_recall_correlation.png
  • lsh_parameter_sensitivity.png
  • correlation_matrix.png

System Requirements

  • OS: Linux
  • C++ compiler: g++
  • Python 3 with: pandas, matplotlib, seaborn, numpy

Key Concepts Demonstrated

  • Approximate Nearest Neighbor search in high-dimensional spaces
  • Locality Sensitive Hashing (LSH) theory and implementation
  • Product Quantization for vector compression
  • k-means clustering from scratch
  • Performance benchmarking: QPS, Recall, Approximation Factor
  • Experimental analysis and data visualization
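
As an illustration of the Product Quantization concept listed above, the sketch below encodes a vector by splitting it into equal-length subvectors and storing, for each one, the index of its nearest centroid. It assumes the per-subspace codebooks were already trained (for example with k-means), that the dimension is divisible by the number of subspaces, and that each codebook holds at most 256 centroids. `Codebook` and `pq_encode` are hypothetical names, not the repository's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// One codebook per subspace: codebook[c] is centroid c, of length sub_dim.
using Codebook = std::vector<std::vector<float>>;

// Encode v as one centroid index per subspace.
std::vector<std::uint8_t> pq_encode(const std::vector<float>& v,
                                    const std::vector<Codebook>& codebooks) {
    const std::size_t m_count = codebooks.size();
    std::vector<std::uint8_t> code(m_count, 0);
    if (m_count == 0) return code;

    // Assumption: v.size() is an exact multiple of the number of subspaces.
    const std::size_t sub_dim = v.size() / m_count;

    for (std::size_t m = 0; m < m_count; ++m) {
        float best = std::numeric_limits<float>::max();
        for (std::size_t c = 0; c < codebooks[m].size(); ++c) {
            // Squared Euclidean distance between subvector m and centroid c.
            float dist = 0.0f;
            for (std::size_t j = 0; j < sub_dim; ++j) {
                const float diff = v[m * sub_dim + j] - codebooks[m][c][j];
                dist += diff * diff;
            }
            if (dist < best) {
                best = dist;
                code[m] = static_cast<std::uint8_t>(c);
            }
        }
    }
    return code;
}
```

With 8-bit codes, a 128-dimensional float vector split into 16 subspaces shrinks from 512 bytes to 16 bytes, which is the compression that makes searching over quantized vectors cheap.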
