Open
Labels
memory (zeph-memory crate, SQLite), research (Research-driven improvement)
Description
Background
SYNAPSE currently finds seed entities via FTS5 fuzzy word matching on entity names (find_seed_entities). This approach has two weaknesses:
- WAL visibility bug (#2166, "fix(memory): SYNAPSE seeds=0 immediately after entity extraction (FTS5+WAL visibility)"): FTS5 returns 0 results immediately after cross-session entity insertion.
- Surface-level matching: FTS5 matches on name tokens but ignores graph structure (entity degree, edge type distribution, community membership).
Research
SE-GNN (2025) proposes seed expansion strategies that combine local/global structural information with semantic attributes:
- Neighborhood profiling: high-degree entities (hubs) make better seeds than leaf nodes
- Edge type distribution: entities connected via multiple edge types are more informative starting points
- Community anchors: entities bridging multiple communities activate broader subgraphs
Paper: https://arxiv.org/abs/2503.20801
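To make the first two signals concrete, here is a minimal sketch of how degree and edge-type diversity could be computed from a plain edge list. It is not the SE-GNN method and not Zeph's actual schema: the `Edge` struct, its field names, and `structural_profiles` are hypothetical names introduced only for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical edge record; field names are illustrative, not Zeph's schema.
pub struct Edge {
    pub source: i64,
    pub target: i64,
    pub relation: String,
}

/// Per-entity structural statistics derived purely from the edge list.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct StructuralProfile {
    /// Number of edge endpoints touching the entity (a self-loop counts twice).
    pub degree: usize,
    /// Number of distinct relation types on those edges.
    pub edge_types: usize,
}

/// Builds a profile for every entity that appears in at least one edge.
/// Entities with no edges are absent from the returned map.
pub fn structural_profiles(edges: &[Edge]) -> HashMap<i64, StructuralProfile> {
    let mut degree: HashMap<i64, usize> = HashMap::new();
    let mut types: HashMap<i64, HashSet<&str>> = HashMap::new();

    for e in edges {
        // Treat edges as undirected for profiling: both endpoints gain degree.
        for id in [e.source, e.target] {
            *degree.entry(id).or_insert(0) += 1;
            types.entry(id).or_default().insert(e.relation.as_str());
        }
    }

    degree
        .into_iter()
        .map(|(id, d)| {
            let t = types.get(&id).map_or(0, |s| s.len());
            (id, StructuralProfile { degree: d, edge_types: t })
        })
        .collect()
}
```

Because everything here is derived from the edges themselves, a hub connected through several relation types ends up with both a high `degree` and a high `edge_types` count. Community bridging is left out, since it depends on how communities are assigned.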
Applicability to Zeph
High impact, medium complexity.
Current find_seed_entities only searches by name similarity. Proposed enhancement:
- Hybrid seed ranking: combine FTS5/embedding similarity score with structural score (entity degree, edge diversity)
- Fallback to embedding seeds: when FTS5 returns 0 results (WAL issue), fall back to embedding similarity search on the graph_entities.summary field
- Community-aware seed capping: limit seeds per community to force activation breadth
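The ranking and capping ideas in this list can be sketched as a pure function over already-scored candidates. This is only an illustration under stated assumptions: `Candidate`, `hybrid_score`, and `select_seeds` are hypothetical names, both scores are assumed to be normalised to [0, 1], and the 0.6/0.4 weights are the ones proposed in this issue rather than tuned values.

```rust
use std::collections::HashMap;

/// Hypothetical seed candidate; all fields are illustrative.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub entity_id: i64,
    /// Text relevance (FTS5 or embedding similarity), assumed to lie in [0, 1].
    pub text_score: f64,
    /// Structural score (e.g. from degree and edge diversity), assumed in [0, 1].
    pub structural_score: f64,
    /// Community the entity belongs to, if one has been assigned.
    pub community: Option<i64>,
}

// Weights proposed in this issue: 0.6 for text, 0.4 for structure.
const TEXT_WEIGHT: f64 = 0.6;
const STRUCT_WEIGHT: f64 = 0.4;

/// Weighted sum of the text and structural scores.
pub fn hybrid_score(c: &Candidate) -> f64 {
    TEXT_WEIGHT * c.text_score + STRUCT_WEIGHT * c.structural_score
}

/// Picks up to `limit` seeds in descending hybrid-score order, taking at most
/// `per_community` seeds from any single community. Candidates without a
/// community are never capped.
pub fn select_seeds(
    mut candidates: Vec<Candidate>,
    limit: usize,
    per_community: usize,
) -> Vec<Candidate> {
    // total_cmp gives a total order over f64, so sorting cannot panic on NaN.
    candidates.sort_by(|a, b| hybrid_score(b).total_cmp(&hybrid_score(a)));

    let mut taken: HashMap<i64, usize> = HashMap::new();
    let mut seeds = Vec::with_capacity(limit);

    for c in candidates {
        if seeds.len() == limit {
            break;
        }
        if let Some(comm) = c.community {
            let n = taken.entry(comm).or_insert(0);
            if *n >= per_community {
                continue; // community quota reached, skip to force breadth
            }
            *n += 1;
        }
        seeds.push(c);
    }
    seeds
}
```

Keeping the selection free of database access means it can be unit-tested on its own and applied to either the FTS5 results or the embedding fallback results. Whether uncategorised entities should be exempt from the cap is a design choice left open here.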
Implementation sketch
// In find_seed_entities:
// 1. Try FTS5 fuzzy (existing)
let fts_seeds = store.find_entities_fuzzy(word, limit * 2).await?;
// 2. If FTS5 empty, try embedding similarity (new fallback)
if fts_seeds.is_empty() {
    let emb_seeds = store.find_entities_by_embedding(query_embedding, limit).await?;
    // score = cosine_sim * degree_boost
}
// 3. Rank by: fts_score * 0.6 + structural_score * 0.4
Expected Benefit
- Fixes cross-session zero-seed problem as side effect (embedding fallback)
- Improves causal/relational query results (hub-biased seeds activate more relevant subgraphs)
- No schema change required (degree can be derived from edge count)
Priority
Medium: implement after #2166 (WAL checkpoint fix) is merged. If the WAL fix resolves the zero-seed issue, this becomes a quality improvement rather than a bug fix.