
research(memory): context-aware seed selection for SYNAPSE using structural graph features #2167

@bug-ops

Background

SYNAPSE currently finds seed entities via FTS5 fuzzy word matching on entity names (find_seed_entities). This approach has two weaknesses:

  1. WAL visibility bug (#2166, "fix(memory): SYNAPSE seeds=0 immediately after entity extraction (FTS5+WAL visibility)"): FTS5 returns 0 results immediately after cross-session entity insertion.
  2. Surface-level matching: FTS5 matches on name tokens but ignores graph structure (entity degree, edge type distribution, community membership).

Research

SE-GNN (2025) proposes seed expansion strategies that combine local/global structural information with semantic attributes:

  • Neighborhood profiling: high-degree entities (hubs) make better seeds than leaf nodes
  • Edge type distribution: entities connected via multiple edge types are more informative starting points
  • Community anchors: entities bridging multiple communities activate broader subgraphs

Paper: https://arxiv.org/abs/2503.20801
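
To make the first two signals concrete, here is a minimal sketch of deriving degree and edge-type diversity from a plain edge list. The `Edge` and `StructuralFeatures` types, their field names, and the log-damped `structural_score` formula are assumptions for illustration; they are not taken from the paper or from the Zeph schema. Community membership is left out because it needs a separate community-detection pass.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical edge record; field names are assumptions, not the Zeph schema.
#[derive(Debug, Clone)]
pub struct Edge {
    pub source: u64,
    pub target: u64,
    pub edge_type: String,
}

/// Per-entity structural features derived purely from the edge list.
#[derive(Debug, Clone, PartialEq)]
pub struct StructuralFeatures {
    /// Number of edge endpoints touching the entity.
    pub degree: usize,
    /// Number of distinct edge types touching the entity.
    pub edge_type_count: usize,
}

/// Count degree and distinct edge types for every entity that appears in `edges`.
pub fn structural_features(edges: &[Edge]) -> HashMap<u64, StructuralFeatures> {
    let mut degree: HashMap<u64, usize> = HashMap::new();
    let mut types: HashMap<u64, HashSet<&str>> = HashMap::new();
    for e in edges {
        // Treat edges as undirected: both endpoints gain degree and see the edge type.
        for id in [e.source, e.target] {
            *degree.entry(id).or_insert(0) += 1;
            types.entry(id).or_default().insert(e.edge_type.as_str());
        }
    }
    degree
        .into_iter()
        .map(|(id, d)| {
            let t = types.get(&id).map_or(0, |s| s.len());
            (id, StructuralFeatures { degree: d, edge_type_count: t })
        })
        .collect()
}

/// Assumed scoring: log-damp both features so hubs do not dominate, then squash into [0, 1).
pub fn structural_score(f: &StructuralFeatures) -> f64 {
    let raw = (1.0 + f.degree as f64).ln() + (1.0 + f.edge_type_count as f64).ln();
    raw / (1.0 + raw)
}
```

With this scoring, an entity with three edges of two types scores higher than a leaf with a single edge, which matches the hub-over-leaf preference described above. The damping is one possible choice and would need tuning against real graphs.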

Applicability to Zeph

High impact, medium complexity.

Current find_seed_entities only searches by name similarity. Proposed enhancement:

  1. Hybrid seed ranking: combine FTS5/embedding similarity score with structural score (entity degree, edge diversity)
  2. Fallback to embedding seeds: when FTS5 returns 0 results (WAL issue), fall back to an embedding similarity search over the graph_entities.summary field
  3. Community-aware seed capping: limit seeds per community to force activation breadth
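
Item 3 could look roughly like the sketch below. It assumes each candidate already carries a precomputed community id and a combined score; the `Seed` type, its fields, and the `cap_seeds_per_community` name are hypothetical and do not exist in the codebase.

```rust
use std::collections::HashMap;

/// Hypothetical scored seed; `community` is an assumed precomputed community id.
#[derive(Debug, Clone, PartialEq)]
pub struct Seed {
    pub entity_id: u64,
    pub community: u32,
    pub score: f64,
}

/// Keep at most `per_community` seeds from each community and at most `limit`
/// seeds overall, preferring higher scores.
pub fn cap_seeds_per_community(
    mut seeds: Vec<Seed>,
    per_community: usize,
    limit: usize,
) -> Vec<Seed> {
    // Highest score first; total_cmp gives a total order even if a score is NaN.
    seeds.sort_by(|a, b| b.score.total_cmp(&a.score));

    let mut taken: HashMap<u32, usize> = HashMap::new();
    let mut out = Vec::with_capacity(limit.min(seeds.len()));
    for seed in seeds {
        if out.len() >= limit {
            break;
        }
        let count = taken.entry(seed.community).or_insert(0);
        if *count < per_community {
            *count += 1;
            out.push(seed);
        }
        // Seeds from a community that is already full are skipped, which lets
        // lower-scored seeds from other communities through.
    }
    out
}
```

The trade-off is deliberate: a lower-scored seed from an unrepresented community can displace a higher-scored seed from a saturated one, which is what forces activation breadth.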

Implementation sketch

// In find_seed_entities:
// 1. Try FTS5 fuzzy matching (existing)
let fts_seeds = store.find_entities_fuzzy(word, limit * 2).await?;

// 2. If FTS5 returns nothing, try embedding similarity (new fallback)
if fts_seeds.is_empty() {
    let emb_seeds = store.find_entities_by_embedding(query_embedding, limit).await?;
    // score = cosine_sim * degree_boost
}

// 3. Rank by: fts_score * 0.6 + structural_score * 0.4
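
The ranking step can be pulled out as a pure function, independent of the store. The sketch below is illustrative only: the `Candidate` type and all function names are hypothetical, both input scores are assumed to be already normalized to [0, 1] with higher meaning better (raw FTS5 `bm25()` values are lower-is-better, so they would need converting first), and the `degree_boost` formula is one possible choice since the sketch above does not define it.

```rust
/// Hypothetical ranking candidate; all fields are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub entity_id: u64,
    /// Text relevance in [0, 1], higher is better (assumed pre-normalized).
    pub text_score: f64,
    /// Structural score in [0, 1], e.g. from degree and edge diversity.
    pub structural_score: f64,
}

/// Weights from the sketch: fts_score * 0.6 + structural_score * 0.4.
pub const TEXT_WEIGHT: f64 = 0.6;
pub const STRUCTURAL_WEIGHT: f64 = 0.4;

/// Weighted sum of text relevance and structural score.
pub fn hybrid_score(c: &Candidate) -> f64 {
    TEXT_WEIGHT * c.text_score + STRUCTURAL_WEIGHT * c.structural_score
}

/// Sort candidates by hybrid score, best first, and keep the top `limit`.
pub fn rank_seeds(mut candidates: Vec<Candidate>, limit: usize) -> Vec<Candidate> {
    candidates.sort_by(|a, b| hybrid_score(b).total_cmp(&hybrid_score(a)));
    candidates.truncate(limit);
    candidates
}

/// Assumed boost: 1.0 for an isolated entity, growing logarithmically with degree.
pub fn degree_boost(degree: usize) -> f64 {
    1.0 + (1.0 + degree as f64).ln()
}

/// Fallback-path score from the sketch: cosine_sim * degree_boost.
pub fn fallback_score(cosine_sim: f64, degree: usize) -> f64 {
    cosine_sim * degree_boost(degree)
}
```

Keeping the scoring free of I/O makes the weights easy to unit-test and tune without touching SQLite. Note that with 0.6/0.4 weights a well-connected entity with a weaker name match can outrank a leaf with a stronger one, which is the intended hub bias.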

Expected Benefit

  • Fixes cross-session zero-seed problem as side effect (embedding fallback)
  • Improves causal/relational query results (hub-biased seeds activate more relevant subgraphs)
  • No schema change required (degree can be derived from edge count)

Priority

Medium. Implement after #2166 (WAL checkpoint fix) is merged. If the WAL fix resolves the zero-seed issue, this becomes a quality improvement rather than a bug fix.

Labels: memory (zeph-memory crate (SQLite)), research (Research-driven improvement)
