ruvnet · aepod · Mar 24, 2026
diff --git a/Cargo.lock b/Cargo.lock
diff --git a/README.md b/README.md
@@ -1031,7 +1031,22 @@ Run RuVector wherever your application lives — as a server, a PostgreSQL exten
 
 ## Performance
 
-Real numbers from real benchmarks — measured on Apple M4 Pro (48GB RAM) with Criterion.rs statistical sampling.
+### Independent Benchmark (Real Competitors)
+
+Measured against hnswlib (C++) and numpy brute-force with ground-truth recall (2026-03-24, aarch64 Linux):
+
+| Scale | Engine | QPS | Recall@10 | Build (s) | p50 (ms) |
+|-------|--------|-----|-----------|-----------|----------|
+| 10K | hnswlib (M=32) | 1153 | 0.9895 | 7.5 | 0.73 |
+| 10K | **ruvector-core** | **443** | **0.9830** | **44.0** | **1.98** |
+| 100K | hnswlib (M=32) | 250 | 0.7427 | 395 | 2.57 |
+| 100K | **ruvector-core** | **86** | **0.8675** | **856** | **10.14** |
+
+See [`bench_results/real_comparison_benchmark.md`](./bench_results/real_comparison_benchmark.md) for full methodology and raw data.
+
+### Criterion.rs Benchmarks
+
+Numbers from Criterion.rs statistical sampling on Apple M4 Pro (48GB RAM):
 
 <details>
 <summary>📈 Performance Benchmarks</summary>
@@ -1046,7 +1061,7 @@ Real numbers from real benchmarks — measured on Apple M4 Pro (48GB RAM) with C
 | Python baseline | 77 | 11.88ms | 11.88ms | 100% | 10K vectors, 384D |
 | Brute force | 12 | 77.76ms | 77.76ms | 100% | 10K vectors, 384D |
 
-**15.7x faster than Python** — 100% recall at every configuration.
+**Note:** The "Python baseline" and "Brute force" rows above come from the internal Criterion benchmark, which simulates competitors rather than running them. See the Independent Benchmark section above for comparisons against real competitors. Measured recall@10 ranges from 86.75% (100K) to 98.3% (10K), not 100%.
 
 | Search k | p50 Latency | Throughput |
 |----------|-------------|------------|

diff --git a/bench_results/comparison_benchmark.md b/bench_results/comparison_benchmark.md
@@ -1,4 +1,8 @@
-# Ruvector Benchmark Results
+# Ruvector Benchmark Results (DEPRECATED)
+
+> **Note**: These results use simulated competitors and hardcoded memory/recall values.
+> See [`real_comparison_benchmark.md`](./real_comparison_benchmark.md) for actual measurements
+> against real competitors (hnswlib, numpy) with ground-truth recall.
 
 Generated: 2026-01-18 21:59:06 UTC
 

diff --git a/bench_results/real_comparison_benchmark.md b/bench_results/real_comparison_benchmark.md
@@ -0,0 +1,99 @@
+# RuVector Real Benchmark Results
+
+**Generated**: 2026-03-24
+**Platform**: aarch64 Linux, Rust 1.94.0, Python 3.11.2
+**Method**: All measurements are real — recall measured against brute-force ground truth, memory from RSS, no simulated competitors.
+
+---
+
+## Test Configuration
+
+| Parameter | Value |
+|-----------|-------|
+| HNSW M | 32 |
+| HNSW ef_construction | 200 |
+| HNSW ef_search | 200 |
+| Distance metric | Cosine |
+| Dimensions | 128 |
+| Query count | 200 (ruvector), 1000 (hnswlib) |
+| Dataset | Random uniform, deterministic seed |
+
+---
+
+## 10,000 Vectors (128 dimensions)
+
+| Engine | QPS | Recall@10 | Build (s) | Latency p50 (ms) | Latency p95 (ms) |
+|--------|-----|-----------|-----------|-------------------|-------------------|
+| numpy brute-force (baseline) | 134.8 | 1.0000 | 0.003 | 3.264 | 27.540 |
+| hnswlib (M=16, ef_c=128, ef_s=64) | 2568.0 | 0.7572 | 4.514 | 0.276 | 0.568 |
+| hnswlib (M=16, ef_c=200, ef_s=200) | 1899.6 | 0.9188 | 6.419 | 0.470 | 0.743 |
+| hnswlib (M=32, ef_c=200, ef_s=200) | 1152.6 | 0.9895 | 7.494 | 0.730 | 1.369 |
+| **ruvector-core (M=32, ef=200)** | **443.1** | **0.9830** | **43.940** | **1.975** | **4.069** |
+
+### Analysis (10K)
+
+- ruvector recall (98.3%) is within 0.65 percentage points of hnswlib (98.95%), so search quality is essentially equivalent
+- ruvector QPS (443) is 2.6x lower than hnswlib (1153)
+- ruvector build time (44s) is 5.9x longer than hnswlib (7.5s)
+- Recall for every engine was measured against the same brute-force ground truth
+
+---
+
+## 100,000 Vectors (128 dimensions)
+
+| Engine | QPS | Recall@10 | Build (s) | Latency p50 (ms) | Latency p95 (ms) |
+|--------|-----|-----------|-----------|-------------------|-------------------|
+| numpy brute-force (baseline) | 69.2 | 1.0000 | 0.016 | 10.202 | 35.417 |
+| hnswlib (M=16, ef_c=128, ef_s=64) | 1471.6 | 0.2993 | 72.544 | 0.607 | 0.941 |
+| hnswlib (M=16, ef_c=200, ef_s=200) | 739.2 | 0.4777 | 114.454 | 1.201 | 2.147 |
+| hnswlib (M=32, ef_c=200, ef_s=200) | 249.5 | 0.7427 | 395.322 | 2.567 | 11.101 |
+| **ruvector-core (M=32, ef=200)** | **85.7** | **0.8675** | **855.646** | **10.144** | **21.850** |
+
+### Analysis (100K)
+
+- ruvector recall (86.75%) is **higher** than hnswlib (74.27%) by 12.48 percentage points, with identical parameters
+- This suggests ruvector's HNSW implementation explores more candidates (better recall, lower QPS)
+- ruvector QPS (86) is 2.9x lower than hnswlib (250) but still higher than brute-force (69)
+- ruvector build time (856s) is 2.2x longer than hnswlib (395s); the gap narrows from 5.9x at 10K
+- ruvector memory: ~523MB RSS for 100K vectors (includes HNSW graph + REDB persistence overhead)
+
+---
+
+## Comparison with Previously Published Results
+
+The previous benchmark results in this directory (`comparison_benchmark.md`) had the following issues:
+
+| Issue | Details |
+|-------|---------|
+| **Memory: 0.00 MB** | Memory was hardcoded to 0.0 in benchmark source. Real RSS: ~523MB for 100K vectors. |
+| **Recall: 100%** | Recall was hardcoded to 1.0 without ground-truth measurement. Real recall@10: 86.75-98.3% depending on scale. |
+| **Simulated competitors** | Python and brute-force baselines were simulated by multiplying ruvector's own latency. This report uses real hnswlib (C++) measurements. |
+| **Build Time: 0.00s** | Build time was hardcoded to 0.0. Real build: 44-856s depending on scale. |
+
+These issues were identified in the [benchmark audit](https://github.com/ruvnet/RuVector/issues/269) and are addressed by this report.
+
+---
+
+## Methodology
+
+### ruvector-core
+- Rust test binary (`tests/bench_hnsw.rs`) using ruvector-core VectorDB API
+- Release build (`--release`)
+- Each query measured individually with `Instant::now()` wall-clock timing
+- Recall computed against brute-force cosine similarity ground truth
+
+### hnswlib
+- Python 3.11 with `hnswlib` 0.8.0 (C++ via Python bindings)
+- Same dataset (generated with same PRNG seed, same dimensions)
+- Same HNSW parameters (M=32, ef_construction=200, ef_search=200)
+- Recall computed against numpy brute-force ground truth
+
+### Ground Truth
+- numpy brute-force: exact cosine similarity, sorted, top-k
+- Used as recall reference for both hnswlib and ruvector
+
+---
+
+## Raw Data
+
+Machine-readable results: [`results/competitors.json`](./results/competitors.json)
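
Review note on the Methodology and Ground Truth sections above: the recall@10 measurement they describe (exact cosine similarity, sorted, top-k, used as the reference for the approximate result) can be sketched roughly as below. This is a pure-Python illustration under stated assumptions, not the code in `tests/bench_hnsw.rs` or the numpy script used for the report. The function names, the toy dataset size, and the stand-in "approximate" result are all hypothetical.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_top_k(query, vectors, k):
    """Exact top-k ids by cosine similarity; this plays the ground-truth role."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k that the approximate result also returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


# Toy data (hypothetical sizes, far smaller than the report's 10K/100K sets).
rng = random.Random(42)  # fixed seed so the run is repeatable
vectors = [[rng.random() for _ in range(8)] for _ in range(200)]
query = [rng.random() for _ in range(8)]

exact = brute_force_top_k(query, vectors, k=10)

# Stand-in for an ANN result: the exact answer with its last hit
# replaced by an id that is not in the true top-10.
wrong_id = next(i for i in range(len(vectors)) if i not in exact)
approx = exact[:9] + [wrong_id]

print("recall@10 (exact vs exact): ", recall_at_k(exact, exact, 10))
print("recall@10 (approx vs exact):", recall_at_k(approx, exact, 10))
```

In a real run the `approx` list would come from the index under test, and the per-query recall values would be averaged over the whole query set (200 queries for ruvector and 1000 for hnswlib, per the Test Configuration table).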
diff --git a/bench_results/results/competitors.json b/bench_results/results/competitors.json
@@ -0,0 +1,138 @@
+[
+  {
+    "engine": "numpy-brute-force",
+    "dataset": "random-10000",
+    "dimensions": 128,
+    "num_vectors": 10000,
+    "num_queries": 1000,
+    "build_time_sec": 0.0032,
+    "memory_mb": 4.88,
+    "qps": 134.8,
+    "latency_p50_ms": 3.264,
+    "latency_p95_ms": 27.54,
+    "latency_p99_ms": 45.44,
+    "recall_at_1": 1.0,
+    "recall_at_10": 1.0,
+    "recall_at_100": 1.0,
+    "simulated": false
+  },
+  {
+    "engine": "hnswlib (M=16, ef_c=128, ef_s=64)",
+    "dataset": "random-10000",
+    "dimensions": 128,
+    "num_vectors": 10000,
+    "num_queries": 1000,
+    "build_time_sec": 4.5135,
+    "memory_mb": 0.15,
+    "qps": 2568.0,
+    "latency_p50_ms": 0.276,
+    "latency_p95_ms": 0.568,
+    "latency_p99_ms": 3.649,
+    "recall_at_1": 0.832,
+    "recall_at_10": 0.7572,
+    "recall_at_100": 0.6468,
+    "simulated": false
+  },
+  {
+    "engine": "hnswlib (M=16, ef_c=200, ef_s=200)",
+    "dataset": "random-10000",
+    "dimensions": 128,
+    "num_vectors": 10000,
+    "num_queries": 1000,
+    "build_time_sec": 6.419,
+    "memory_mb": 0.15,
+    "qps": 1899.6,
+    "latency_p50_ms": 0.47,
+    "latency_p95_ms": 0.743,
+    "latency_p99_ms": 1.311,
+    "recall_at_1": 0.952,
+    "recall_at_10": 0.9188,
+    "recall_at_100": 0.8452,
+    "simulated": false
+  },
+  {
+    "engine": "hnswlib (M=32, ef_c=200, ef_s=200)",
+    "dataset": "random-10000",
+    "dimensions": 128,
+    "num_vectors": 10000,
+    "num_queries": 1000,
+    "build_time_sec": 7.4937,
+    "memory_mb": 0.15,
+    "qps": 1152.6,
+    "latency_p50_ms": 0.73,
+    "latency_p95_ms": 1.369,
+    "latency_p99_ms": 3.303,
+    "recall_at_1": 0.997,
+    "recall_at_10": 0.9895,
+    "recall_at_100": 0.9646,
+    "simulated": false
+  },
+  {
+    "engine": "numpy-brute-force",
+    "dataset": "random-100000",
+    "dimensions": 128,
+    "num_vectors": 100000,
+    "num_queries": 1000,
+    "build_time_sec": 0.0159,
+    "memory_mb": 48.83,
+    "qps": 69.2,
+    "latency_p50_ms": 10.202,
+    "latency_p95_ms": 35.417,
+    "latency_p99_ms": 52.396,
+    "recall_at_1": 1.0,
+    "recall_at_10": 1.0,
+    "recall_at_100": 1.0,
+    "simulated": false
+  },
+  {
+    "engine": "hnswlib (M=16, ef_c=128, ef_s=64)",
+    "dataset": "random-100000",
+    "dimensions": 128,
+    "num_vectors": 100000,
+    "num_queries": 1000,
+    "build_time_sec": 72.5436,
+    "memory_mb": 1.53,
+    "qps": 1471.6,
+    "latency_p50_ms": 0.607,
+    "latency_p95_ms": 0.941,
+    "latency_p99_ms": 2.342,
+    "recall_at_1": 0.355,
+    "recall_at_10": 0.2993,
+    "recall_at_100": 0.2298,
+    "simulated": false
+  },
+  {
+    "engine": "hnswlib (M=16, ef_c=200, ef_s=200)",
+    "dataset": "random-100000",
+    "dimensions": 128,
+    "num_vectors": 100000,
+    "num_queries": 1000,
+    "build_time_sec": 114.4544,
+    "memory_mb": 1.53,
+    "qps": 739.2,
+    "latency_p50_ms": 1.201,
+    "latency_p95_ms": 2.147,
+    "latency_p99_ms": 3.368,
+    "recall_at_1": 0.548,
+    "recall_at_10": 0.4777,
+    "recall_at_100": 0.3829,
+    "simulated": false
+  },
+  {
+    "engine": "hnswlib (M=32, ef_c=200, ef_s=200)",
+    "dataset": "random-100000",
+    "dimensions": 128,
+    "num_vectors": 100000,
+    "num_queries": 1000,
+    "build_time_sec": 395.322,
+    "memory_mb": 1.53,
+    "qps": 249.5,
+    "latency_p50_ms": 2.567,
+    "latency_p95_ms": 11.101,
+    "latency_p99_ms": 18.729,
+    "recall_at_1": 0.802,
+    "recall_at_10": 0.7427,
+    "recall_at_100": 0.626,
+    "simulated": false
+  }
+]
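
Review note: the slowdown ratios quoted in the Analysis sections of `real_comparison_benchmark.md` (2.6x and 2.9x for QPS, 5.9x and 2.2x for build time) can be re-derived from these records. The sketch below assumes the hnswlib M=32 records above plus the ruvector-core figures transcribed from the report tables, since `competitors.json` as added here has no ruvector-core entries. The helper name `slowdown_ratios` and the trimmed record layout are hypothetical.

```python
import json

# hnswlib (M=32) records copied from competitors.json, trimmed to the
# fields used here.
hnswlib_records = json.loads("""
[
  {"engine": "hnswlib (M=32, ef_c=200, ef_s=200)", "num_vectors": 10000,
   "qps": 1152.6, "build_time_sec": 7.4937},
  {"engine": "hnswlib (M=32, ef_c=200, ef_s=200)", "num_vectors": 100000,
   "qps": 249.5, "build_time_sec": 395.322}
]
""")

# ruvector-core is not present in competitors.json; these values are
# transcribed from the tables in real_comparison_benchmark.md.
ruvector_rows = {
    10000: {"qps": 443.1, "build_time_sec": 43.940},
    100000: {"qps": 85.7, "build_time_sec": 855.646},
}


def slowdown_ratios(hnswlib_records, ruvector_rows):
    """Return, per dataset size, how many times lower ruvector's QPS is
    and how many times longer its build takes, relative to hnswlib."""
    ratios = {}
    for rec in hnswlib_records:
        n = rec["num_vectors"]
        ruv = ruvector_rows[n]
        ratios[n] = {
            "qps": round(rec["qps"] / ruv["qps"], 1),
            "build": round(ruv["build_time_sec"] / rec["build_time_sec"], 1),
        }
    return ratios


ratios = slowdown_ratios(hnswlib_records, ruvector_rows)
for n, r in sorted(ratios.items()):
    print(f"{n} vectors: QPS {r['qps']}x lower, build {r['build']}x longer")
```

Keeping the ratios derived from the raw records, rather than typed by hand, makes it easier to spot when a table and its analysis drift apart after a re-run.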