Razshy/Resonance-Search

Resonance Search

Every search engine in production right now does the same thing — find stuff near your query and add up scores. Vector search, BM25, reranking, whatever. It's all the same algorithm from 1973 with a fresh coat of paint.

I built something different. Resonance Search treats documents like they have gravity. Your query doesn't just find the nearest neighbors and call it a day — it falls into the deepest basin of relevant results through actual gradient descent. And instead of just adding scores together (which loses information), I use interference terms that only fire when multiple signals agree at the same time.

The result: better ranking, no ML at query time, sub-millisecond one-shot queries (0.94 ms, or 2.6 ms with five convergence steps; see Speed), pure Rust.

How It Works

| Traditional Search | Resonance Search |
|---|---|
| HNSW finds nearest neighbors | Queries fall into potential basins via gradient descent |
| BM25 ranks by tf-idf | Text scoring is mass-weighted: connected documents amplify matches |
| Linear fusion: alpha * V + beta * T | Interference terms: sqrt(V * T) fires only when both signals agree |
| One-shot: query in, results out, done | Convergence: query moves itself toward the deepest basin |
| Fixed top-k | Basin detection adapts recall to the data |

The Math (short version)

Every document warps embedding space like a gravitational body:

Phi(x) = -Sum_i m_i / ||x - e_i||^2
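
As a rough illustration of the potential above, here is a brute-force sketch in Rust. It assumes `e_i` is a document embedding and `m_i` its mass, and it adds a small softening constant so the field stays finite when the query sits exactly on a document. The names `potential`, `downhill`, and `EPS` are hypothetical and not part of the crate, whose `physics.rs` uses a Barnes-Hut tree rather than a direct sum.

```rust
/// Softening term (an assumption of this sketch, not taken from the crate)
/// that keeps the potential finite when x coincides with an embedding.
const EPS: f32 = 1e-6;

/// Squared Euclidean distance between two vectors of equal length.
fn dist_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Phi(x) = -sum_i m_i / (||x - e_i||^2 + EPS), summed over all bodies.
/// Each body is an (embedding, mass) pair.
fn potential(x: &[f32], bodies: &[(Vec<f32>, f32)]) -> f32 {
    bodies.iter().map(|(e, m)| -*m / (dist_sq(x, e) + EPS)).sum()
}

/// Downhill direction -grad Phi(x) = sum_i 2 m_i (e_i - x) / (r_i^2 + EPS)^2.
/// It points toward nearby, heavy documents.
fn downhill(x: &[f32], bodies: &[(Vec<f32>, f32)]) -> Vec<f32> {
    let mut g = vec![0.0_f32; x.len()];
    for (e, m) in bodies {
        let r2 = dist_sq(x, e) + EPS;
        let scale = 2.0 * *m / (r2 * r2);
        for (gi, (xi, ei)) in g.iter_mut().zip(x.iter().zip(e)) {
            *gi += scale * (ei - xi);
        }
    }
    g
}

fn main() {
    // Two bodies at equal distance from the origin; the second is 3x heavier.
    let bodies: Vec<(Vec<f32>, f32)> =
        vec![(vec![1.0, 0.0], 1.0), (vec![-1.0, 0.0], 3.0)];
    let x = [0.0_f32, 0.0];
    println!("Phi = {:.3}", potential(&x, &bodies));
    println!("downhill = {:?}", downhill(&x, &bodies));
}
```

With both bodies one unit away, the downhill direction points along negative x, toward the heavier document.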

Scoring uses interference instead of linear addition:

score = a*V + b*T + c*P
      + d*sqrt(V*T)       <-- vector-text resonance
      + e*sqrt(V*P)       <-- vector-pattern resonance
      + f*sqrt(T*P)       <-- text-pattern resonance
      + g*cbrt(V*T*P)     <-- triple resonance
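
To make the interference idea concrete, the formula above can be sketched as a plain function. This assumes V, T, and P are the vector, text, and pattern scores, each non-negative (here normalized to [0, 1]). The `Weights` struct and `resonance_score` function are hypothetical names, and the all-ones weights are placeholders, not the crate's tuned defaults.

```rust
/// Weights a..g from the scoring formula. Hypothetical struct for this sketch.
struct Weights { a: f32, b: f32, c: f32, d: f32, e: f32, f: f32, g: f32 }

/// Resonance score for signals v (vector), t (text), p (pattern).
/// Assumes all three signals are non-negative so the roots are defined.
fn resonance_score(w: &Weights, v: f32, t: f32, p: f32) -> f32 {
    w.a * v + w.b * t + w.c * p          // linear terms
        + w.d * (v * t).sqrt()           // vector-text resonance
        + w.e * (v * p).sqrt()           // vector-pattern resonance
        + w.f * (t * p).sqrt()           // text-pattern resonance
        + w.g * (v * t * p).cbrt()       // triple resonance
}

fn main() {
    // Placeholder weights for illustration only.
    let w = Weights { a: 1.0, b: 1.0, c: 1.0, d: 1.0, e: 1.0, f: 1.0, g: 1.0 };
    // Both inputs have the same linear total (V + T = 1.0).
    let lopsided = resonance_score(&w, 1.0, 0.0, 0.0);
    let balanced = resonance_score(&w, 0.5, 0.5, 0.0);
    // prints "lopsided = 1.000, balanced = 1.500"
    println!("lopsided = {lopsided:.3}, balanced = {balanced:.3}");
}
```

A chunk that matches on one signal only gets no interference bonus, because every product containing a zero signal vanishes. A chunk with the same linear total split across two agreeing signals picks up the extra `sqrt(V*T)` term.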

The search converges:

1. Start at your query embedding
2. Score chunks using all three signals
3. Compute the combined gradient (field + text + pattern)
4. Move the query: q1 = q0 + lr * gradient(q0)
5. Track the best position across all steps
6. Repeat until it converges
7. Return the basin contents, ranked by resonance score
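
The loop in steps 1 to 6 can be sketched generically. This is a minimal version under stated assumptions: `score` and `gradient` are caller-supplied closures standing in for the combined three-signal score and its gradient, the stopping rule is a step-length tolerance, and there is no momentum (the crate's `convergence.rs` is described as using momentum). The function name `converge` and all of its parameters are hypothetical. Step 7, collecting and ranking the basin contents, is left out.

```rust
/// Iterates q_{k+1} = q_k + lr * gradient(q_k), tracks the best-scoring
/// position seen, and stops when a step is shorter than `tol` or after
/// `max_steps` iterations. Returns (best position, best score).
fn converge(
    q0: &[f32],
    score: impl Fn(&[f32]) -> f32,
    gradient: impl Fn(&[f32]) -> Vec<f32>,
    lr: f32,
    tol: f32,
    max_steps: usize,
) -> (Vec<f32>, f32) {
    let mut q = q0.to_vec();
    let mut best = (q.clone(), score(&q));
    for _ in 0..max_steps {
        let g = gradient(&q);
        // Length of the step about to be taken.
        let step_len = g.iter().map(|x| (lr * x).powi(2)).sum::<f32>().sqrt();
        for (qi, gi) in q.iter_mut().zip(&g) {
            *qi += lr * gi;
        }
        let s = score(&q);
        if s > best.1 {
            best = (q.clone(), s);
        }
        if step_len < tol {
            break; // converged
        }
    }
    best
}

fn main() {
    // Toy landscape with a single peak at (1, 2): score = -||q - peak||^2.
    let peak = [1.0_f32, 2.0];
    let score = |q: &[f32]| -> f32 {
        -q.iter().zip(&peak).map(|(a, b)| (a - b).powi(2)).sum::<f32>()
    };
    let gradient = |q: &[f32]| -> Vec<f32> {
        q.iter().zip(&peak).map(|(a, b)| -2.0 * (a - b)).collect()
    };
    let start = [0.0_f32, 0.0];
    let (best_q, best_s) = converge(&start, score, gradient, 0.25, 1e-4, 50);
    println!("best position = {best_q:?}, score = {best_s:.6}");
}
```

On this toy landscape each step halves the distance to the peak, so the query settles near (1, 2) well before the step limit.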

Full math breakdown in MATH.md. Formal proofs in PROOFS.md. Paper-style writeup in PAPER.md.

Quick Start

use resonance_search::{ResonanceIndex, Document, Chunk, Query};

let mut index = ResonanceIndex::new();

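// Index one document containing a single chunk.
index.add_document(Document {
    id: "case1".into(),
    chunks: vec![Chunk {
        id: "c1".into(),
        text: "human trafficking 18 USC 1591".into(),
        // Toy 3-dim embedding; the CUAD benchmark below uses 384-dim MiniLM vectors.
        embedding: vec![1.0, 0.1, 0.0],
        metadata: Default::default(),
    }],
    metadata: Default::default(),
});

// A query carries all three signals: text, embedding, and an optional regex pattern.
let results = index.search(&Query {
    text: "trafficking statute".into(),
    embedding: vec![0.9, 0.0, 0.0],
    pattern: Some(r"1591".into()),
    weights: None,
    max_steps: None,
});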

for hit in &results.hits {
    println!("[{:.4}] {} — {}", hit.score, hit.chunk_id, hit.text);
}

Running

# Run all tests
cargo test

# Start the REST API server
cargo run --release --bin resonance-api

# Run the CUAD benchmark with ablation study
cargo run --release --bin resonance-cuad -- data/cuad_embedded.json

# Run the scale benchmark (1K -> 25K chunks)
cargo run --release --bin resonance-scale

# Run the micro-benchmark (per-query timing)
cargo run --release --bin resonance-micro

# Run the weight optimizer
cargo run --release --bin resonance-optimize -- data/cuad_embedded.json

# Embed a dataset
python3 scripts/embed_beir.py

Architecture

resonance-search/
├── src/
│   ├── lib.rs              # Crate root
│   ├── types.rs            # Core types (Document, Chunk, Query, Hit)
│   ├── error.rs            # Error types
│   ├── simd_ops.rs         # SIMD-optimized vector ops + brute-force top-k
│   ├── stemmer.rs          # Custom Porter stemmer (zero dependencies)
│   ├── vptree.rs           # VP-Tree for spatial locality
│   ├── physics.rs          # Barnes-Hut N-body tree (potential field, gradients, mass)
│   ├── landscape.rs        # Score landscape analysis + basin detection
│   ├── text_field.rs       # Text Resonance Field (replaces BM25) + batch scoring
│   ├── pattern_field.rs    # Pattern Field (native regex in the paradigm)
│   ├── resonance.rs        # Resonance scoring formula + basin boost
│   ├── convergence.rs      # Convergence search (gradient descent with momentum)
│   ├── chaos_rng.rs        # True entropy RNG for multi-seed search
│   ├── query_expansion.rs  # Pseudo-relevance feedback during convergence
│   ├── index.rs            # ResonanceIndex (top-level API + parallel search)
│   ├── eval.rs             # IR metrics (NDCG, MRR, Recall@k, Precision@k, MAP)
│   ├── dataset.rs          # Dataset loading + synthetic generation
│   ├── persistence.rs      # Index serialization (save/load JSON)
│   └── bin/
│       ├── api.rs           # REST API server (axum)
│       ├── cuad_bench.rs    # CUAD benchmark with ablation study
│       ├── scale_bench.rs   # Scale benchmark (1K -> 25K chunks)
│       ├── micro_bench.rs   # Per-component profiling
│       ├── bench.rs         # Synthetic adversarial benchmark
│       ├── optimize_weights.rs  # Nelder-Mead weight optimizer
│       ├── ab_sweep.rs      # Coordinate-descent parameter sweep
│       └── demo.rs          # Interactive CLI demo
├── scripts/
│   ├── embed_beir.py        # Embed BEIR benchmark datasets
│   ├── embed_cohere_v4.py   # Embed with Cohere embed-v4
│   ├── embed_cuad.py        # Generate MiniLM embeddings for CUAD
│   ├── embed_legal.py       # Generate legal domain embeddings
│   ├── bench_all.py         # Run benchmarks across all datasets
│   ├── cross_encoder_baseline.py  # Cross-encoder baselines
│   ├── sota_baselines.py    # Published BEIR/MTEB baselines
│   └── plot_convergence.py  # Convergence trajectory visualization
├── MATH.md                  # Why the math works
├── PAPER.md                 # Paper-style writeup
├── PROOFS.md                # Formal proofs
├── Dockerfile               # Container build for resonance-api
└── Cargo.toml

Benchmarks

CUAD — Real Legal Contracts

50 contracts, 980 chunks, 200 queries, all-MiniLM-L6-v2 (384-dim)

| Method | NDCG@10 | MRR | Recall@20 | Speed |
|---|---|---|---|---|
| Resonance Search (full) | 0.2751 | 0.2506 | 0.5400 | 8.7 ms |
| Deep Convergence | 0.2612 | - | 0.5350 | 9.5 ms |
| One-Shot Resonance | 0.2154 | 0.1930 | 0.4450 | 7.0 ms |
| Linear Fusion | 0.2321 | 0.2166 | 0.5400 | 7.0 ms |
| BM25 (text-only) | 0.1685 | - | 0.5200 | 8.3 ms |
| Vector Only | 0.1103 | 0.0963 | 0.4800 | 6.9 ms |
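
For readers unfamiliar with the column headers, here is a sketch of two of the metrics using their standard textbook definitions with binary relevance. This is an assumption about how the numbers are computed: the crate's `eval.rs` may use graded relevance or other conventions. The function names are hypothetical.

```rust
/// Reciprocal rank of the first relevant hit (0.0 if there is none).
/// MRR is the mean of this value over all queries.
fn reciprocal_rank(relevant: &[bool]) -> f64 {
    relevant
        .iter()
        .position(|&r| r)
        .map_or(0.0, |i| 1.0 / (i as f64 + 1.0))
}

/// Log discount for a zero-based rank: 1 / log2(rank + 2).
fn discount(rank0: usize) -> f64 {
    1.0 / (rank0 as f64 + 2.0).log2()
}

/// NDCG@k with binary relevance: DCG of the returned ranking divided by
/// the DCG of an ideal ranking that places all relevant items first.
fn ndcg_at_k(relevant: &[bool], total_relevant: usize, k: usize) -> f64 {
    let dcg: f64 = relevant
        .iter()
        .take(k)
        .enumerate()
        .map(|(i, &r)| if r { discount(i) } else { 0.0 })
        .sum();
    let ideal: f64 = (0..total_relevant.min(k)).map(discount).sum();
    if ideal == 0.0 { 0.0 } else { dcg / ideal }
}

fn main() {
    // Ranked hits for one query: relevant items at ranks 2 and 4.
    let ranking = [false, true, false, true];
    println!("RR      = {:.4}", reciprocal_rank(&ranking));
    println!("NDCG@10 = {:.4}", ndcg_at_k(&ranking, 2, 10));
}
```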

Every claim confirmed

| Claim | Status | Delta |
|---|---|---|
| Resonance beats Linear Fusion | Confirmed | +18.5% NDCG@10 |
| Convergence beats One-Shot | Confirmed | +27.7% NDCG@10 |
| Deep Convergence beats One-Shot | Confirmed | +21.3% NDCG@10 |
| Triple (V+T+P) beats Dual (V+T) | Confirmed | +150.0% NDCG@10 |
| Resonance beats BM25 | Confirmed | +63.3% NDCG@10 |

Ablation — each interference term matters

| Configuration | NDCG@10 | Cumulative Delta |
|---|---|---|
| Linear (no interference) | 0.2321 | baseline |
| + VP resonance | 0.2386 | +2.8% |
| + TP resonance | 0.2527 | +8.9% |
| + Triple resonance | 0.2552 | +10.0% |
| + Convergence (5 steps) | 0.2751 | +18.5% |
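
The cumulative deltas are relative improvements over the linear baseline, (value - baseline) / baseline. A small sketch that recomputes them from the NDCG@10 column (the helper name `delta_pct` is hypothetical):

```rust
/// Relative improvement of `value` over `baseline`, in percent.
fn delta_pct(baseline: f64, value: f64) -> f64 {
    (value - baseline) / baseline * 100.0
}

fn main() {
    let baseline = 0.2321; // Linear (no interference)
    let rows = [
        ("+ VP resonance", 0.2386),
        ("+ TP resonance", 0.2527),
        ("+ Triple resonance", 0.2552),
        ("+ Convergence (5 steps)", 0.2751),
    ];
    for (name, ndcg) in rows {
        println!("{name}: {:+.1}%", delta_pct(baseline, ndcg));
    }
}
```

The same formula reproduces the deltas in the claims table, for example 0.2751 against the BM25 score of 0.1685 gives +63.3%.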

Convergence gets better with scale

| Chunks | One-Shot | Convergence | Delta |
|---|---|---|---|
| 1,000 | 0.6406 | 0.6431 | +0.4% |
| 5,000 | 0.6768 | 0.6917 | +2.2% |
| 10,000 | 0.7269 | 0.7692 | +5.8% |
| 25,000 | 0.7485 | 0.7856 | +5.0% |

This is the whole point: the bigger your corpus, the more convergence matters. At 1K chunks the query already sees most of the data, so convergence adds only +0.4%. At 10K to 25K chunks, one-shot retrieval misses entire basins that convergence finds, and the gain grows to between +5.0% and +5.8%.

Speed

| Metric | Time |
|---|---|
| One-shot query | 0.94 ms |
| Full convergence (5 steps) | 2.6 ms |
| Per-step cost | 327 µs |
| 71 unit tests | 0.04 s |

License

Dual license: GNU AGPL-3.0 or a commercial license from the copyright holder. For copyright attribution, see NOTICE.

  • AGPL-3.0: free to use, modify, and distribute. If you run it as a network service, AGPL requires you to offer users the corresponding source, including your own code that is combined with this library in the way AGPL describes. That is stricter than MIT and is why many enterprises prefer a paid license.
  • Commercial: for companies that need to use Resonance Search in proprietary products or hosted offerings without AGPL obligations. See LICENSE-COMMERCIAL.md for how to inquire.

Reality check: A license cannot stop Google (or anyone) from reimplementing the ideas in a new codebase; it only governs this code. Patents are a separate strategy if you need that kind of protection.

About

A new search paradigm where documents have gravity, queries converge into basins, and multi-signal scoring uses interference instead of linear fusion.
