Summary
Currently, --path filtering works for FTS but is ignored by vector/semantic search. The vector pipeline scans the full embedding table and scores all documents; the path is never used to narrow the candidates before ANN scoring. As a result, kbx search "topic" --path memory/meetings/ returns FTS results scoped to that path but vector results from the entire KB.
Proposed behaviour: Pre-filter the vector candidate set by document path before ANN scoring, so both FTS and vector search respect the same --path argument consistently.
Use Cases
- Scope to a date range of meetings: kbx search "deployment" --path memory/meetings/2026/03/ (only search March 2026 meetings)
- Search only entity files: kbx search "infrastructure lead" --path memory/people/ (find people matching a role)
- Agent-driven scoped search: callers that know exactly which KB subtree is relevant can reduce noise from unrelated documents
- Project-scoped search: kbx search "blockers" --path memory/projects/ (only search project entity files)
Current Behaviour
# FTS results are correctly scoped to meetings
kbx search "migration timeline" --fast --path memory/meetings/ --json
# → only results from memory/meetings/
# Hybrid search: FTS results are scoped, but vector results come from everywhere
kbx search "migration timeline" --path memory/meetings/ --json
# → FTS hits from memory/meetings/ + vector hits from memory/people/, memory/notes/, etc.
The inconsistency means --path is unreliable in hybrid mode — users can't trust that all results come from the specified path.
Implementation Notes
LanceDB Pre-Filtering
LanceDB supports where clauses on vector search that filter before ANN scoring:
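# Current: no path filter on vector search
results = table.search(query_vector).limit(limit).to_list()

# Proposed: pre-filter by path prefix
# The prefix is interpolated into SQL, so double any single quotes first.
# Note that '%' and '_' inside the prefix also act as LIKE wildcards.
safe_prefix = path_prefix.replace("'", "''")
results = (
    table.search(query_vector)
    # prefilter=True is explicit because the default has varied across LanceDB versions
    .where(f"source_path LIKE '{safe_prefix}%'", prefilter=True)
    .limit(limit)
    .to_list()
)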
Pre-filtering reduces the ANN candidate set, which is both more correct and potentially faster (smaller search space).
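One wrinkle with the LIKE approach is that an underscore or percent sign in the path acts as a wildcard, so the pre-filter can over-match slightly. A minimal sketch of one way to handle this, assuming the column is named source_path as in the snippet above; both helper names are hypothetical and not part of kbx or LanceDB:

```python
def prefix_where_clause(path_prefix: str, column: str = "source_path") -> str:
    """Build a LIKE clause matching rows whose path starts with path_prefix."""
    # Double single quotes so the prefix cannot terminate the SQL string literal.
    escaped = path_prefix.replace("'", "''")
    return f"{column} LIKE '{escaped}%'"


def keep_exact_prefix(rows: list, path_prefix: str, column: str = "source_path") -> list:
    """Drop rows that LIKE over-matched because the prefix contained '_' or '%'."""
    return [row for row in rows if row[column].startswith(path_prefix)]
```

The exact check runs only on the rows the pre-filter already returned, so it costs little compared with a full post-filter over an unscoped result set.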
Glob Pattern Support
--path currently accepts glob patterns (e.g. memory/meetings/2026/0[1-3]/) as well as plain prefixes. The vector pre-filter needs to handle both forms:
| Pattern | LanceDB where clause |
|---|---|
| memory/meetings/ | source_path LIKE 'memory/meetings/%' |
| memory/meetings/2026/03/ | source_path LIKE 'memory/meetings/2026/03/%' |
| memory/people/*.md | source_path LIKE 'memory/people/%.md' (approximate) |
| Complex globs | Fall back to post-filter (fetch more candidates, filter in Python) |
For simple prefix patterns (the common case), translate directly to a LIKE clause. For complex globs with character classes or alternation, fetch a larger candidate set and post-filter — still better than no filtering at all.
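The split between simple and complex patterns could be sketched roughly as follows. This is an illustration under assumptions, not kbx's actual logic: glob_to_like and post_filter are hypothetical names, the set of characters treated as "complex" is a guess, and fnmatch semantics (where * also crosses directory separators) may differ from how kbx interprets globs.

```python
import fnmatch
import re
from typing import Optional

# Characters assumed to make a glob too complex for a single LIKE clause.
_COMPLEX = re.compile(r"[\[\]{}?]")


def glob_to_like(pattern: str) -> Optional[str]:
    """Return a LIKE pattern for simple globs, or None when post-filtering is needed."""
    if _COMPLEX.search(pattern):
        return None
    escaped = pattern.replace("'", "''")  # keep the SQL string literal intact
    if "*" not in escaped:
        return escaped + "%"  # plain prefix
    return escaped.replace("*", "%")  # approximate: '%' also matches across '/'


def post_filter(rows: list, pattern: str, column: str = "source_path") -> list:
    """Fallback for complex globs: filter an over-fetched candidate list in Python."""
    # A bare directory pattern is treated as "everything underneath it".
    glob = pattern + "*" if pattern.endswith("/") else pattern
    return [row for row in rows if fnmatch.fnmatchcase(row[column], glob)]
```

A caller would try glob_to_like first and, when it returns None, request a larger limit from the vector search before passing the rows through post_filter.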
FTS Parity
FTS already filters by path correctly. The fix is isolated to the vector search path — ensure it receives and applies the same --path argument that FTS does. The hybrid merge step should then combine two equally-scoped result sets.
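To illustrate the parity requirement, here is a small sketch in which both pipelines receive the same path argument. The function signature and the result shape (dicts with a path key) are assumptions, and the de-duplicating concatenation is only a stand-in for whatever fusion the real hybrid merge step uses.

```python
from typing import Callable

# Hypothetical signature shared by both pipelines: (query, path, limit) -> hits.
SearchFn = Callable[[str, str, int], list]


def hybrid_search(query: str, path: str, fts: SearchFn, vector: SearchFn, limit: int = 10) -> list:
    """Run both pipelines with the same path scope, then merge without duplicates."""
    fts_hits = fts(query, path, limit)
    vector_hits = vector(query, path, limit)  # same path argument as FTS

    merged, seen = [], set()
    for hit in fts_hits + vector_hits:
        if hit["path"] not in seen:
            seen.add(hit["path"])
            merged.append(hit)
    return merged[:limit]
```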
Acceptance Criteria
- kbx search "query" --path memory/meetings/ returns only results from memory/meetings/ in both FTS and vector results
- kbx search "query" --path memory/people/ --json: all results have path starting with memory/people/
- [...] where pre-filter (no post-filter overhead)
- [...] --path [...]
- --explain (feat: search explain/trace mode for retrieval diagnostics #68) shows the path filter applied to both FTS and vector pipelines
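The path-scoping criteria lend themselves to an automated check. The sketch below assumes that --json prints a JSON array of objects with a path field; the criteria mention the field, but the exact output shape is not specified here, so treat it as an assumption. The function name is hypothetical.

```python
import json


def out_of_scope(results_json: str, path_prefix: str) -> list:
    """Return the paths of results that fall outside path_prefix."""
    results = json.loads(results_json)
    return [r["path"] for r in results if not r["path"].startswith(path_prefix)]
```

A test could capture the output of a hybrid search with --path and --json and assert that out_of_scope returns an empty list.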