Approximate Nearest Neighbor (ANN) search on high-dimensional vectors — implemented from scratch.
This project implements and benchmarks four Approximate Nearest Neighbor (ANN) algorithms for searching high-dimensional vectors. The algorithms are evaluated on two real-world datasets (MNIST and SIFT) and compared across key performance metrics: QPS (Queries Per Second), Recall, and Approximation Factor (AF).
This is the 1st Programming Assignment for the course "Software Development for Algorithmic Problems".
| Name | Student ID |
|---|---|
| Παπαθανασίου Ελένη | 1115202200135 |
| Τόντου Αλτάνη-Δάφνη | 1115202200288 |
| Algorithm | Description |
|---|---|
| LSH | Locality Sensitive Hashing — hash-based ANN search |
| Hypercube | Random projection onto a hypercube for fast ANN |
| IVFFlat | Inverted File index with flat (exact) distance computation |
| IVFPQ | Inverted File index with Product Quantization for compressed search |
- C++ (g++) — algorithm implementation
- Python 3 — experimental analysis & plotting (
pandas,matplotlib,seaborn,numpy) - Makefile — build system
- Bash — automated experiment scripts
📁 src/
├── lsh.cpp → LSH algorithm
├── hypercube.cpp → Hypercube algorithm
├── IVFFlat.cpp → IVFFlat algorithm
├── ivfpq.cpp → IVFPQ algorithm
├── ivfpq_index.cpp → IVFPQ auxiliary index structure
├── k_means.cpp → k-means clustering (used by IVF methods)
├── mnist_data.cpp → MNIST dataset loader
├── sift_data.cpp → SIFT dataset loader
└── main.cpp
📁 include/
└── *.hpp → Header files for all modules
📁 experiment/
├── parse_results.py → Extract metrics from output files
└── create_plot.py → Generate comparison plots
📄 Makefile
📄 run_sift.sh → SIFT experiment runner
📄 run_mnist.sh → MNIST experiment runner
make allmake run_lsh
make run_hypercube
make run_ivfflat
make run_ivfpqmake sift_bash
make run_sift_bash
make run_parse_results
make run_create_plotsmake mnist_bash
make run_mnist_bash
make run_parse_results
make run_create_plotsmake cleanRunning the analysis pipeline generates the following files:
CSV reports:
algorithm_comparison_table.csvfinal_metrics_comparison.csvdetailed_summary.csv
Plots:
algorithm_comparison.pngqps_by_algorithm.png— queries per second per algorithmrecall_by_algorithm.png— recall per algorithmqps_vs_recall.png— speed vs accuracy tradeoffqps_vs_af.png— speed vs approximation factoraf_vs_recall_correlation.pnglsh_parameter_sensitivity.pngcorrelation_matrix.png
- OS: Linux
- C++ compiler: g++
- Python 3 with:
pandas,matplotlib,seaborn,numpy
- Approximate Nearest Neighbor search in high-dimensional spaces
- Locality Sensitive Hashing (LSH) theory and implementation
- Product Quantization for vector compression
- k-means clustering from scratch
- Performance benchmarking: QPS, Recall, Approximation Factor
- Experimental analysis and data visualization
1st Programming Assignment · Software Development for Algorithmic Problems