1,592 changes: 1,297 additions & 295 deletions Cargo.lock

Large diffs are not rendered by default.

19 changes: 17 additions & 2 deletions README.md
@@ -1031,7 +1031,22 @@ Run RuVector wherever your application lives — as a server, a PostgreSQL exten
 
 ## Performance
 
-Real numbers from real benchmarks — measured on Apple M4 Pro (48GB RAM) with Criterion.rs statistical sampling.
+### Independent Benchmark (Real Competitors)
+
+Measured against hnswlib (C++) and numpy brute-force with ground-truth recall (2026-03-24, aarch64 Linux):
+
+| Scale | Engine | QPS | Recall@10 | Build (s) | p50 (ms) |
+|-------|--------|-----|-----------|-----------|----------|
+| 10K | hnswlib (M=32) | 1153 | 0.9895 | 7.5 | 0.73 |
+| 10K | **ruvector-core** | **443** | **0.9830** | **44.0** | **1.98** |
+| 100K | hnswlib (M=32) | 250 | 0.7427 | 395 | 2.57 |
+| 100K | **ruvector-core** | **86** | **0.8675** | **856** | **10.14** |
+
+See [`bench_results/real_comparison_benchmark.md`](./bench_results/real_comparison_benchmark.md) for full methodology and raw data.
+
+### Criterion.rs Benchmarks
+
+Numbers from Criterion.rs statistical sampling on Apple M4 Pro (48GB RAM):
 
 <details>
 <summary>📈 Performance Benchmarks</summary>
@@ -1046,7 +1061,7 @@ Real numbers from real benchmarks — measured on Apple M4 Pro (48GB RAM) with C
 | Python baseline | 77 | 11.88ms | 11.88ms | 100% | 10K vectors, 384D |
 | Brute force | 12 | 77.76ms | 77.76ms | 100% | 10K vectors, 384D |
 
-**15.7x faster than Python** — 100% recall at every configuration.
+**Note:** The "Python baseline" and "Brute force" rows above are from the internal Criterion benchmark which simulates competitors. See the Independent Benchmark section above for real competitor comparisons. Actual recall@10 ranges from 86.75% (100K) to 98.3% (10K) — not 100%.
 
 | Search k | p50 Latency | Throughput |
 |----------|-------------|------------|
6 changes: 5 additions & 1 deletion bench_results/comparison_benchmark.md
@@ -1,4 +1,8 @@
-# Ruvector Benchmark Results
+# Ruvector Benchmark Results (DEPRECATED)
+
+> **Note**: These results use simulated competitors and hardcoded memory/recall values.
+> See [`real_comparison_benchmark.md`](./real_comparison_benchmark.md) for actual measurements
+> against real competitors (hnswlib, numpy) with ground-truth recall.
 
 Generated: 2026-01-18 21:59:06 UTC
 
99 changes: 99 additions & 0 deletions bench_results/real_comparison_benchmark.md
@@ -0,0 +1,99 @@
# RuVector Real Benchmark Results

**Generated**: 2026-03-24
**Platform**: aarch64 Linux, Rust 1.94.0, Python 3.11.2
**Method**: All measurements are real: recall is measured against brute-force ground truth, memory is read from RSS, and no competitors are simulated.

---

## Test Configuration

| Parameter | Value |
|-----------|-------|
| HNSW M | 32 |
| HNSW ef_construction | 200 |
| HNSW ef_search | 200 |
| Distance metric | Cosine |
| Dimensions | 128 |
| Query count | 200 (ruvector-core), 1000 (hnswlib, numpy) |
| Dataset | Random uniform, deterministic seed |
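
The dataset is described only as random uniform with a deterministic seed. A minimal sketch of such a generator follows, assuming SplitMix64 as a stand-in PRNG (the benchmark's actual generator is not named in this report); `SplitMix64` and `random_uniform_dataset` are hypothetical names. An identical seed reproduces an identical dataset only when both harnesses use the same generator algorithm and the same sampling order.

```rust
/// SplitMix64: a small, well-known 64-bit PRNG, used here as a stand-in.
/// The benchmark's real generator is an assumption, not taken from the source.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f32 in [0, 1), built from the top 24 bits so every value is exact.
    fn next_f32(&mut self) -> f32 {
        (self.next_u64() >> 40) as f32 / (1u64 << 24) as f32
    }
}

/// `n` vectors of `dim` components, each drawn uniformly from [0, 1).
/// The same `seed` always yields the same dataset.
fn random_uniform_dataset(n: usize, dim: usize, seed: u64) -> Vec<Vec<f32>> {
    let mut rng = SplitMix64::new(seed);
    (0..n)
        .map(|_| (0..dim).map(|_| rng.next_f32()).collect::<Vec<f32>>())
        .collect()
}
```

For example, `random_uniform_dataset(10_000, 128, 42)` would produce the 10K × 128-dimension shape used in this report (the seed value 42 is arbitrary).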

---

## 10,000 Vectors (128 dimensions)

| Engine | QPS | Recall@10 | Build (s) | Latency p50 (ms) | Latency p95 (ms) |
|--------|-----|-----------|-----------|-------------------|-------------------|
| numpy brute-force (baseline) | 134.8 | 1.0000 | 0.003 | 3.264 | 27.540 |
| hnswlib (M=16, ef_c=128, ef_s=64) | 2568.0 | 0.7572 | 4.514 | 0.276 | 0.568 |
| hnswlib (M=16, ef_c=200, ef_s=200) | 1899.6 | 0.9188 | 6.419 | 0.470 | 0.743 |
| hnswlib (M=32, ef_c=200, ef_s=200) | 1152.6 | 0.9895 | 7.494 | 0.730 | 1.369 |
| **ruvector-core (M=32, ef=200)** | **443.1** | **0.9830** | **43.940** | **1.975** | **4.069** |

### Analysis (10K)

- ruvector recall@10 (98.3%) is within 0.65 percentage points of hnswlib (98.95%), i.e. comparable search quality
- ruvector QPS (443) is 2.6x lower than hnswlib (1153)
- ruvector build time (44s) is 5.9x longer than hnswlib (7.5s)
- Recall for every engine is measured against the same brute-force ground truth
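
The comparisons above mix two kinds of arithmetic: the throughput and build-time gaps are ratios, while the recall gap is a difference in percentage points. A minimal sketch that reproduces the 10K figures from the table; `ratio` and `gap_pp` are illustrative helpers, not part of the benchmark code.

```rust
/// How many times larger `a` is than `b`, e.g. hnswlib QPS over ruvector QPS.
fn ratio(a: f64, b: f64) -> f64 {
    a / b
}

/// Absolute gap between two recall fractions, expressed in percentage points.
fn gap_pp(recall_a: f64, recall_b: f64) -> f64 {
    (recall_a - recall_b).abs() * 100.0
}

/// Prints the three 10K comparisons using values from the table above.
fn print_10k_comparison() {
    println!("QPS ratio:   {:.1}x", ratio(1152.6, 443.1)); // hnswlib over ruvector
    println!("Build ratio: {:.1}x", ratio(43.940, 7.494)); // ruvector over hnswlib
    println!("Recall gap:  {:.2} pp", gap_pp(0.9895, 0.9830));
}
```

Reporting the recall gap in percentage points avoids confusing it with a relative change (0.65 points is about 0.66% of hnswlib's 98.95%).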

---

## 100,000 Vectors (128 dimensions)

| Engine | QPS | Recall@10 | Build (s) | Latency p50 (ms) | Latency p95 (ms) |
|--------|-----|-----------|-----------|-------------------|-------------------|
| numpy brute-force (baseline) | 69.2 | 1.0000 | 0.016 | 10.202 | 35.417 |
| hnswlib (M=16, ef_c=128, ef_s=64) | 1471.6 | 0.2993 | 72.544 | 0.607 | 0.941 |
| hnswlib (M=16, ef_c=200, ef_s=200) | 739.2 | 0.4777 | 114.454 | 1.201 | 2.147 |
| hnswlib (M=32, ef_c=200, ef_s=200) | 249.5 | 0.7427 | 395.322 | 2.567 | 11.101 |
| **ruvector-core (M=32, ef=200)** | **85.7** | **0.8675** | **855.646** | **10.144** | **21.850** |

### Analysis (100K)

- ruvector recall@10 (86.75%) is **higher** than hnswlib (74.27%) with identical parameters (M=32, ef_construction=200, ef_search=200)
- This may indicate that ruvector's HNSW search explores more candidates per query (higher recall at lower QPS)
- ruvector QPS (86) is 2.9x lower than hnswlib (250) but still about 1.2x higher than numpy brute-force (69)
- ruvector build time (856s) is 2.2x longer than hnswlib (395s); the gap narrows from 5.9x at 10K
- ruvector memory: ~523MB RSS for 100K vectors (includes the HNSW graph and REDB persistence overhead)
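
The candidate-exploration explanation above rests on how `ef` bounds an HNSW search. A minimal sketch of the generic single-layer search routine from the HNSW paper shows the mechanism: a larger `ef` keeps more candidates alive and visits more nodes per query, which typically raises recall and lowers QPS. This is the textbook procedure over one graph layer, not ruvector-core's or hnswlib's code; `search_layer` and `Scored` are hypothetical names.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::{BinaryHeap, HashSet};

/// A node id paired with its distance to the query, ordered by distance.
#[derive(Clone, Copy, Debug)]
struct Scored {
    dist: f32,
    id: usize,
}

impl PartialEq for Scored {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Scored {}
impl PartialOrd for Scored {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for Scored {
    fn cmp(&self, other: &Self) -> Ordering {
        self.dist.total_cmp(&other.dist).then(self.id.cmp(&other.id))
    }
}

/// Cosine distance (1 - cosine similarity); assumes non-zero vectors.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (na * nb)
}

/// Beam search over one graph layer; returns up to `ef` (id, distance) pairs, closest first.
fn search_layer(graph: &[Vec<usize>], vectors: &[Vec<f32>], query: &[f32], entry: usize, ef: usize) -> Vec<(usize, f32)> {
    let start = Scored { dist: cosine_distance(query, &vectors[entry]), id: entry };
    let mut visited = HashSet::from([entry]);
    let mut candidates = BinaryHeap::from([Reverse(start)]); // min-heap: closest unexpanded node on top
    let mut results = BinaryHeap::from([start]); // max-heap: furthest kept result on top, capped at ef

    while let Some(Reverse(current)) = candidates.pop() {
        // Stop once the closest unexpanded candidate is worse than every kept result.
        if current.dist > results.peek().unwrap().dist {
            break;
        }
        for &n in &graph[current.id] {
            if !visited.insert(n) {
                continue;
            }
            let d = cosine_distance(query, &vectors[n]);
            if results.len() < ef || d < results.peek().unwrap().dist {
                let s = Scored { dist: d, id: n };
                candidates.push(Reverse(s));
                results.push(s);
                if results.len() > ef {
                    results.pop(); // drop the furthest result to keep the beam at size ef
                }
            }
        }
    }
    results.into_sorted_vec().into_iter().map(|s| (s.id, s.dist)).collect()
}
```

With `ef = 1` the loop degenerates to a greedy walk; raising `ef` widens the beam at the cost of more distance computations per query.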

---

## Comparison with Previously Published Results

The previous benchmark results in this directory (`comparison_benchmark.md`) contained:

| Issue | Details |
|-------|---------|
| **Memory: 0.00 MB** | Memory was hardcoded to 0.0 in benchmark source. Real RSS: ~523MB for 100K vectors. |
| **Recall: 100%** | Recall was hardcoded to 1.0 without ground-truth measurement. Real recall@10: 86.75-98.3% depending on scale. |
| **Simulated competitors** | Python and brute-force baselines were simulated by multiplying ruvector's own latency. This report uses real hnswlib (C++) measurements. |
| **Build Time: 0.00s** | Build time was hardcoded to 0.0. Real build: 44-856s depending on scale. |

These issues were identified in the [benchmark audit](https://github.com/ruvnet/RuVector/issues/269) and are addressed by this report.
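
Replacing a hardcoded 0.0 with a measured value is straightforward on Linux, the platform used here. A minimal sketch, assuming the figure is read from the `VmRSS` field of `/proc/self/status` (the report says memory comes from RSS but not how it is read); `parse_vm_rss_kb` and `current_rss_mib` are hypothetical helpers. RSS covers the whole process, so it includes the dataset, allocator overhead, and any storage layer alongside the index itself.

```rust
use std::fs;

/// Extract the resident set size in kB from the text of a Linux
/// `/proc/<pid>/status` file (line format: `VmRSS:    1234 kB`).
fn parse_vm_rss_kb(status: &str) -> Option<u64> {
    status
        .lines()
        .find(|line| line.starts_with("VmRSS:"))?
        .split_whitespace()
        .nth(1)?
        .parse()
        .ok()
}

/// Resident set size of the current process in MiB (the kernel's "kB" is 1024 bytes);
/// returns `None` where `/proc/self/status` does not exist, e.g. on macOS.
fn current_rss_mib() -> Option<f64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    parse_vm_rss_kb(&status).map(|kb| kb as f64 / 1024.0)
}
```

Sampling `current_rss_mib()` before and after building an index gives an approximate figure for the index alone rather than the whole-process total.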

---

## Methodology

### ruvector-core
- Rust test binary (`tests/bench_hnsw.rs`) using the ruvector-core VectorDB API
- Release build (`--release`)
- Each query measured individually with `Instant::now()` wall-clock timing
- Recall computed against brute-force cosine similarity ground truth
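
The per-query timing described above can be sketched as follows. Nearest-rank percentiles and QPS as query count over total wall-clock time are assumptions (the report does not state its exact definitions), and `measure`, `percentile_ms`, and `LatencySummary` are hypothetical names, not ruvector-core or `tests/bench_hnsw.rs` APIs.

```rust
use std::time::{Duration, Instant};

/// Nearest-rank percentile (`p` in 0..=100) of latencies sorted ascending, in ms.
fn percentile_ms(sorted: &[Duration], p: f64) -> f64 {
    assert!(!sorted.is_empty(), "need at least one latency sample");
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    let idx = rank.clamp(1, sorted.len()) - 1;
    sorted[idx].as_secs_f64() * 1000.0
}

struct LatencySummary {
    qps: f64,
    p50_ms: f64,
    p95_ms: f64,
    p99_ms: f64,
}

/// Runs `search` once per query, timing each call individually with `Instant::now()`.
/// QPS is taken as query count divided by total wall-clock time for the sequential loop.
fn measure<Q>(queries: &[Q], mut search: impl FnMut(&Q)) -> LatencySummary {
    let mut latencies = Vec::with_capacity(queries.len());
    let wall = Instant::now();
    for q in queries {
        let start = Instant::now();
        search(q);
        latencies.push(start.elapsed());
    }
    let total_secs = wall.elapsed().as_secs_f64();
    latencies.sort();
    LatencySummary {
        qps: queries.len() as f64 / total_secs,
        p50_ms: percentile_ms(&latencies, 50.0),
        p95_ms: percentile_ms(&latencies, 95.0),
        p99_ms: percentile_ms(&latencies, 99.0),
    }
}
```

In a harness, `search` would wrap one k=10 query against the index under test. Timing each call separately is what makes p50/p95/p99 available; timing only the whole loop would yield QPS alone.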

### hnswlib
- Python 3.11 with `hnswlib` 0.8.0 (C++ via Python bindings)
- Same dataset (generated with same PRNG seed, same dimensions)
- Same HNSW parameters as ruvector-core for the head-to-head run (M=32, ef_construction=200, ef_search=200); two M=16 configurations are also reported for reference
- Recall computed against numpy brute-force ground truth

### Ground Truth
- numpy brute-force: exact cosine similarity, sorted, top-k
- Used as recall reference for both hnswlib and ruvector
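
The ground-truth and recall computation can be sketched in Rust. The report computes ground truth in numpy; this sketch assumes recall@k is the fraction of the exact top-k ids found in the approximate top-k, averaged over queries (the averaging step is an assumption). Function names are illustrative.

```rust
use std::collections::HashSet;

/// Cosine similarity; assumes neither vector is all zeros.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

/// Exact top-k: score every vector, sort by similarity (highest first), keep k ids.
fn brute_force_top_k(data: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = data
        .iter()
        .enumerate()
        .map(|(id, v)| (id, cosine_similarity(query, v)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(id, _)| id).collect()
}

/// recall@k for one query: share of the true top-k ids present in the approximate top-k.
fn recall_at_k(approx: &[usize], truth: &[usize], k: usize) -> f64 {
    let truth_set: HashSet<usize> = truth.iter().take(k).copied().collect();
    let hits = approx.iter().take(k).filter(|id| truth_set.contains(*id)).count();
    hits as f64 / k as f64
}

/// Mean recall@k over all queries (one approximate and one exact id list per query).
fn mean_recall_at_k(approx: &[Vec<usize>], truth: &[Vec<usize>], k: usize) -> f64 {
    let total: f64 = approx.iter().zip(truth).map(|(a, t)| recall_at_k(a, t, k)).sum();
    total / approx.len() as f64
}
```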

---

## Raw Data

Machine-readable results: [`results/competitors.json`](./results/competitors.json)
138 changes: 138 additions & 0 deletions bench_results/results/competitors.json
@@ -0,0 +1,138 @@
[
  {
    "engine": "numpy-brute-force",
    "dataset": "random-10000",
    "dimensions": 128,
    "num_vectors": 10000,
    "num_queries": 1000,
    "build_time_sec": 0.0032,
    "memory_mb": 4.88,
    "qps": 134.8,
    "latency_p50_ms": 3.264,
    "latency_p95_ms": 27.54,
    "latency_p99_ms": 45.44,
    "recall_at_1": 1.0,
    "recall_at_10": 1.0,
    "recall_at_100": 1.0,
    "simulated": false
  },
  {
    "engine": "hnswlib (M=16, ef_c=128, ef_s=64)",
    "dataset": "random-10000",
    "dimensions": 128,
    "num_vectors": 10000,
    "num_queries": 1000,
    "build_time_sec": 4.5135,
    "memory_mb": 0.15,
    "qps": 2568.0,
    "latency_p50_ms": 0.276,
    "latency_p95_ms": 0.568,
    "latency_p99_ms": 3.649,
    "recall_at_1": 0.832,
    "recall_at_10": 0.7572,
    "recall_at_100": 0.6468,
    "simulated": false
  },
  {
    "engine": "hnswlib (M=16, ef_c=200, ef_s=200)",
    "dataset": "random-10000",
    "dimensions": 128,
    "num_vectors": 10000,
    "num_queries": 1000,
    "build_time_sec": 6.419,
    "memory_mb": 0.15,
    "qps": 1899.6,
    "latency_p50_ms": 0.47,
    "latency_p95_ms": 0.743,
    "latency_p99_ms": 1.311,
    "recall_at_1": 0.952,
    "recall_at_10": 0.9188,
    "recall_at_100": 0.8452,
    "simulated": false
  },
  {
    "engine": "hnswlib (M=32, ef_c=200, ef_s=200)",
    "dataset": "random-10000",
    "dimensions": 128,
    "num_vectors": 10000,
    "num_queries": 1000,
    "build_time_sec": 7.4937,
    "memory_mb": 0.15,
    "qps": 1152.6,
    "latency_p50_ms": 0.73,
    "latency_p95_ms": 1.369,
    "latency_p99_ms": 3.303,
    "recall_at_1": 0.997,
    "recall_at_10": 0.9895,
    "recall_at_100": 0.9646,
    "simulated": false
  },
  {
    "engine": "numpy-brute-force",
    "dataset": "random-100000",
    "dimensions": 128,
    "num_vectors": 100000,
    "num_queries": 1000,
    "build_time_sec": 0.0159,
    "memory_mb": 48.83,
    "qps": 69.2,
    "latency_p50_ms": 10.202,
    "latency_p95_ms": 35.417,
    "latency_p99_ms": 52.396,
    "recall_at_1": 1.0,
    "recall_at_10": 1.0,
    "recall_at_100": 1.0,
    "simulated": false
  },
  {
    "engine": "hnswlib (M=16, ef_c=128, ef_s=64)",
    "dataset": "random-100000",
    "dimensions": 128,
    "num_vectors": 100000,
    "num_queries": 1000,
    "build_time_sec": 72.5436,
    "memory_mb": 1.53,
    "qps": 1471.6,
    "latency_p50_ms": 0.607,
    "latency_p95_ms": 0.941,
    "latency_p99_ms": 2.342,
    "recall_at_1": 0.355,
    "recall_at_10": 0.2993,
    "recall_at_100": 0.2298,
    "simulated": false
  },
  {
    "engine": "hnswlib (M=16, ef_c=200, ef_s=200)",
    "dataset": "random-100000",
    "dimensions": 128,
    "num_vectors": 100000,
    "num_queries": 1000,
    "build_time_sec": 114.4544,
    "memory_mb": 1.53,
    "qps": 739.2,
    "latency_p50_ms": 1.201,
    "latency_p95_ms": 2.147,
    "latency_p99_ms": 3.368,
    "recall_at_1": 0.548,
    "recall_at_10": 0.4777,
    "recall_at_100": 0.3829,
    "simulated": false
  },
  {
    "engine": "hnswlib (M=32, ef_c=200, ef_s=200)",
    "dataset": "random-100000",
    "dimensions": 128,
    "num_vectors": 100000,
    "num_queries": 1000,
    "build_time_sec": 395.322,
    "memory_mb": 1.53,
    "qps": 249.5,
    "latency_p50_ms": 2.567,
    "latency_p95_ms": 11.101,
    "latency_p99_ms": 18.729,
    "recall_at_1": 0.802,
    "recall_at_10": 0.7427,
    "recall_at_100": 0.626,
    "simulated": false
  }
]