Benchmark semantic search (nearest-neighbor search) on NVIDIA DGX Spark hardware, comparing Milvus CPU indexes vs GPU-accelerated indexes (powered by cuVS) in the same database.
| Index Type | Methods | Backend |
|---|---|---|
| Milvus CPU | FLAT, IVF_FLAT, HNSW | CPU |
| Milvus GPU | GPU_BRUTE_FORCE, GPU_IVF_FLAT, GPU_CAGRA | cuVS on GPU |
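
For orientation, the six index types in the table could be described to Milvus with index-parameter dictionaries along the following lines. This is only a sketch, not the notebook's actual configuration: the `INDEX_CONFIGS` name, the `make_index_params` and `is_gpu_index` helpers, and every tuning value (`nlist`, `M`, `efConstruction`, the graph degrees) are assumptions chosen for illustration.

```python
# Hypothetical index configurations. The tuning values are illustrative
# assumptions, not the values used by benchmark.ipynb.
INDEX_CONFIGS = {
    # CPU indexes
    "FLAT": {},
    "IVF_FLAT": {"nlist": 1024},
    "HNSW": {"M": 16, "efConstruction": 200},
    # GPU indexes (cuVS-backed)
    "GPU_BRUTE_FORCE": {},
    "GPU_IVF_FLAT": {"nlist": 1024},
    "GPU_CAGRA": {"intermediate_graph_degree": 64, "graph_degree": 32},
}


def make_index_params(index_type, metric_type="L2"):
    """Build an index-parameter dict with index_type, metric_type, and params keys."""
    if index_type not in INDEX_CONFIGS:
        raise ValueError(f"unknown index type: {index_type}")
    return {
        "index_type": index_type,
        "metric_type": metric_type,
        "params": dict(INDEX_CONFIGS[index_type]),  # copy so callers can mutate safely
    }


def is_gpu_index(index_type):
    """GPU index names in the table all share the GPU_ prefix."""
    return index_type.startswith("GPU_")
```

Keeping the configurations in one dictionary makes it straightforward to loop over all methods and to drop the GPU entries when they are unavailable.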
- NVIDIA DGX Spark (or any machine with an NVIDIA GPU and CUDA)
- Docker and Docker Compose (with NVIDIA Container Toolkit for GPU support)
- uv package manager

Start the services with `docker compose up -d`. This starts Milvus standalone with etcd and MinIO. The default configuration uses the GPU-enabled Milvus image.
If the GPU image doesn't work on your platform (e.g., no ARM64 GPU image available), edit docker-compose.yml:
- Change the image from `milvusdb/milvus:v2.5.6-gpu` to `milvusdb/milvus:v2.5.6`
- Remove the `deploy` block
The notebook will detect whether GPU indexes are available and skip them if not.
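
One way such a check could work is to attempt each GPU index build on a small throwaway collection and treat any failure as "not available". The sketch below is an assumption about the approach, not the notebook's code: `detect_gpu_indexes` and the `try_build_index` callable are hypothetical names, and the callable stands in for whatever Milvus client call actually creates the index.

```python
GPU_INDEX_TYPES = ("GPU_BRUTE_FORCE", "GPU_IVF_FLAT", "GPU_CAGRA")


def detect_gpu_indexes(try_build_index, gpu_index_types=GPU_INDEX_TYPES):
    """Return the GPU index types that `try_build_index` manages to build.

    `try_build_index(index_type)` is a hypothetical callable that tries to
    build the index on a tiny test collection and raises on failure.
    """
    available = []
    for index_type in gpu_index_types:
        try:
            try_build_index(index_type)
        except Exception as exc:
            # Assumption: a CPU-only Milvus image rejects GPU index types.
            print(f"Skipping {index_type}: {exc}")
        else:
            available.append(index_type)
    return available


# Usage with a stand-in probe that simulates a CPU-only server.
def cpu_only_probe(index_type):
    raise RuntimeError("GPU index not supported by this server")


print(detect_gpu_indexes(cpu_only_probe))
```

Probing each type separately lets the benchmark keep any GPU index that does work instead of disabling all of them together.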
Install the dependencies with `uv sync`. For GPU support (cuVS and CuPy), run `uv sync --extra gpu`.

Launch Jupyter with `uv run jupyter lab`, then open benchmark.ipynb and run all cells.
Key parameters in Cell 2 of the notebook:
| Parameter | Default | Description |
|---|---|---|
| `CORPUS_SIZES` | `[10K, 100K, 500K, 1M]` | Vector corpus sizes to benchmark |
| `DIM` | `384` | Embedding dimension |
| `N_QUERIES` | `100` | Number of query vectors |
| `K` | `10` | Top-K neighbors to retrieve |
| `METRIC_TYPE` | `L2` | Distance metric (GPU indexes don't support COSINE) |
| `N_RUNS` | `10` | Timed iterations per method |
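
As a rough illustration of how these parameters might fit together, the sketch below writes the defaults out as Python constants and uses `N_RUNS` and `N_QUERIES` in a small timing helper. The `time_search` function, its use of the median, and the `search_fn` callable are assumptions for illustration. The notebook may aggregate its timings differently.

```python
import statistics
import time

# Defaults from the table above, with the K/M shorthand expanded to integers.
CORPUS_SIZES = [10_000, 100_000, 500_000, 1_000_000]
DIM = 384
N_QUERIES = 100
K = 10
METRIC_TYPE = "L2"
N_RUNS = 10


def time_search(search_fn, n_runs=N_RUNS, n_queries=N_QUERIES):
    """Time `search_fn` over `n_runs` runs; one call searches `n_queries` vectors.

    `search_fn` is a hypothetical zero-argument callable wrapping one batch search.
    """
    durations = []
    for _ in range(n_runs):
        start = time.perf_counter()
        search_fn()
        durations.append(time.perf_counter() - start)
    median_s = statistics.median(durations)
    return {
        "latency_ms_per_batch": median_s * 1000.0,
        "queries_per_second": n_queries / median_s if median_s > 0 else float("inf"),
    }
```

Using the median rather than the mean makes the reported latency less sensitive to a single slow run, such as a cold first query.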
- benchmark_results.png -- Four-panel chart (latency, throughput, build time, recall)
- Speedup summary table -- CPU vs GPU latency with speedup factors
- Full results DataFrame -- All metrics for every method at every corpus size
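
The recall and speedup figures can be understood through the following sketch. It assumes recall is measured against exact results (for example from a brute-force index) and that speedup is CPU latency divided by GPU latency. Both definitions, and the `recall_at_k` and `speedup` names, are assumptions rather than the notebook's confirmed code, and the IDs below are toy values, not benchmark results.

```python
def recall_at_k(retrieved_ids, ground_truth_ids, k):
    """Mean fraction of the true top-k neighbors that each query retrieved."""
    if len(retrieved_ids) != len(ground_truth_ids):
        raise ValueError("retrieved and ground-truth lists must have equal length")
    if not retrieved_ids:
        return 0.0
    total = 0.0
    for got, truth in zip(retrieved_ids, ground_truth_ids):
        total += len(set(got[:k]) & set(truth[:k])) / k
    return total / len(retrieved_ids)


def speedup(cpu_latency_ms, gpu_latency_ms):
    """Speedup factor: values above 1.0 mean the GPU index was faster."""
    if gpu_latency_ms <= 0:
        raise ValueError("gpu_latency_ms must be positive")
    return cpu_latency_ms / gpu_latency_ms


# Toy example: the first query misses one of three true neighbors,
# while the second query finds all three.
approx = [[1, 2, 3], [4, 5, 6]]
exact = [[1, 2, 9], [4, 5, 6]]
print(recall_at_k(approx, exact, k=3))
```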

Stop the services with `docker compose down`. To also remove persisted data, run `docker compose down -v`.