feat(vector): support scoped embedding builds #413
roborev: Combined Review (
looking at this
I think I'll have to wait to rebase this after #411 lands
Add parsed message_type filters through the search, vector, remote, and MCP paths so scoped SMS/MMS/email queries do not silently widen after parsing. This keeps result rows, stats, DuckDB fast paths, SQLite FTS, pgvector filters, hybrid search, and remote query reconstruction aligned around the same user-visible operator.

The embedding scope work also needs durable enqueue behavior across Synctech imports and generation transitions. Preserve best-effort import semantics, enqueue already-persisted Synctech messages even after partial failures, wire the manual Synctech sync path into vector enqueueing, and filter pending embeddings per generation scope so full-corpus and scoped generations remain independently complete.

Included follow-up fixes:

- feat(vector): support scoped embedding builds
- fix(vector): enqueue synctech sms imports
- fix(search): honor message_type scopes
- fix(vector): close scoped search gaps
- fix(mcp): validate similar index before seed load
- fix(mcp): preserve no-active similar error
- fix(vector): align pg parity sqlite fixture
- fix(search): keep scoped stats consistent
- fix(search): complete scoped stats coverage
- fix(remote): preserve message type searches
- fix(vector): preserve scoped error contracts
- fix(vector): enqueue manual synctech syncs
- fix(vector): enqueue per generation scope

Generated with Codex

Co-authored-by: Wes McKinney <wesmckinn+git@gmail.com>
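To make the "filter pending embeddings per generation scope" point concrete, here is a minimal sketch in Python with SQLite. It is not the code from this PR: the `messages` and `stamps` tables, their columns, and the function names are all hypothetical. It only illustrates the idea that each generation computes its pending set against its own scope.

```python
import sqlite3


def make_pending_demo_db():
    """Build a tiny in-memory fixture. Table and column names are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE messages (id INTEGER PRIMARY KEY, message_type TEXT);
        CREATE TABLE stamps (
            message_id INTEGER,
            generation_id INTEGER,
            PRIMARY KEY (message_id, generation_id)
        );
        INSERT INTO messages VALUES (1, 'email'), (2, 'sms'), (3, 'mms'), (4, 'sms');
        -- Generation 1 is assumed full-corpus, generation 2 scoped to SMS/MMS.
        INSERT INTO stamps VALUES (1, 1), (2, 2);
    """)
    return conn


def pending_for_generation(conn, generation_id, scope=None):
    """Return ids of in-scope messages not yet stamped for this generation.

    scope is an optional list of message_type values; None means full corpus.
    """
    sql = (
        "SELECT m.id FROM messages m "
        "WHERE NOT EXISTS (SELECT 1 FROM stamps s "
        "WHERE s.message_id = m.id AND s.generation_id = ?)"
    )
    params = [generation_id]
    if scope:
        placeholders = ",".join("?" for _ in scope)
        sql += f" AND m.message_type IN ({placeholders})"
        params.extend(scope)
    sql += " ORDER BY m.id"
    return [row[0] for row in conn.execute(sql, params)]
```

With this fixture, the full-corpus generation still has messages 2, 3, and 4 pending, while the SMS/MMS generation has only 3 and 4 pending. A stamp for one generation does not count toward the other.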
d08f444 to f1983c5
The shared pgvector test schema already includes messages.message_type. Re-adding that column in individual tests makes the setup non-idempotent on PostgreSQL and can fail before the filters are exercised.

Remove the redundant ALTER TABLE statements so the tests rely on the common fixture schema and only seed message_type values needed by each case.

Generated with Codex

Co-authored-by: Codex <codex@openai.com>
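The failure mode can be reproduced in miniature. The sketch below uses SQLite from Python's standard library as a stand-in for PostgreSQL (an assumption made only so the example is self-contained). Both engines reject a plain `ALTER TABLE ... ADD COLUMN` for a column that already exists. The schema and helper names are hypothetical, not the project's fixture.

```python
import sqlite3


def create_shared_schema(conn):
    """Hypothetical shared fixture: message_type is already part of the schema."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages "
        "(id INTEGER PRIMARY KEY, message_type TEXT)"
    )


def readd_column_fails(conn):
    """Return True if re-adding the existing column raises an error."""
    try:
        conn.execute("ALTER TABLE messages ADD COLUMN message_type TEXT")
    except sqlite3.OperationalError:
        # SQLite reports a duplicate column; the test would abort here,
        # before any filter logic runs.
        return True
    return False


def seed_message_types(conn, rows):
    """What a test should do instead: only seed the values it needs."""
    conn.executemany(
        "INSERT INTO messages (id, message_type) VALUES (?, ?)", rows
    )
    return conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
```

Calling `create_shared_schema` twice is harmless because of `IF NOT EXISTS`, whereas the per-test `ALTER TABLE` is what breaks idempotency.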
Scoped embedding builds only stamp messages that match the configured build scope. Management commands were still evaluating activation coverage and backend coverage gates against the full live corpus, so a valid scoped generation could look incomplete whenever out-of-scope messages were unstamped.

Thread the configured build scope through management backend opens and coverage reads so list/activate/retire evaluate the same message universe as the build worker. The regression seeds an out-of-scope missing email beside an in-scope stamped SMS to lock the activation preflight to scoped coverage.

Generated with Codex

Co-authored-by: Codex <codex@openai.com>
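The regression scenario described above can be sketched as follows. This is an illustrative Python/SQLite model, not the project's code: the `messages` and `stamps` tables, the `Coverage` type, and `read_coverage` are hypothetical names. It shows why an unscoped coverage read reports a missing message while the scoped read reports full coverage.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Coverage:
    live: int
    stamped: int
    missing: int


def make_coverage_demo_db():
    """One out-of-scope unstamped email beside one in-scope stamped SMS."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE messages (id INTEGER PRIMARY KEY, message_type TEXT);
        CREATE TABLE stamps (
            message_id INTEGER,
            generation_id INTEGER,
            PRIMARY KEY (message_id, generation_id)
        );
        INSERT INTO messages VALUES (1, 'email'), (2, 'sms');
        INSERT INTO stamps VALUES (2, 1);
    """)
    return conn


def read_coverage(conn, generation_id, scope=None):
    """Count live, stamped, and missing messages within the given scope."""
    where, params = "", [generation_id]
    if scope:
        placeholders = ",".join("?" for _ in scope)
        where = f" WHERE m.message_type IN ({placeholders})"
        params.extend(scope)
    live, stamped = conn.execute(
        "SELECT COUNT(*), COUNT(s.message_id) FROM messages m "
        "LEFT JOIN stamps s ON s.message_id = m.id AND s.generation_id = ?"
        + where,
        params,
    ).fetchone()
    return Coverage(live=live, stamped=stamped, missing=live - stamped)
```

Against the full corpus the generation looks incomplete (one message missing). Restricted to the SMS scope, it is complete, which is the universe an activation check for a scoped build should use.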
Scoped embedding coverage must keep all four displayed legs in the same message universe. After management switched live/stamped/missing to the configured build scope, the backend embedded count could still include out-of-scope stamped vectors, producing impossible list output such as embedded > live.

Apply the backend build scope inside EmbeddedMessageCount for sqlitevec and pgvector so the embedded leg matches scoped coverage reads. Add a command-level fillFullCoverage regression with an out-of-scope stamped vector, plus backend invariant tests for both storage engines.

Generated with Codex

Co-authored-by: Codex <codex@openai.com>
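A small model of the "embedded > live" anomaly, again as a hedged Python/SQLite sketch. The `embeddings` table and the `live_count` and `embedded_count` helpers are hypothetical stand-ins. They are not the sqlitevec or pgvector implementation of EmbeddedMessageCount, only an illustration of applying the same scope to both counts.

```python
import sqlite3


def make_embedded_demo_db():
    """An out-of-scope email and an in-scope SMS, both with stored vectors."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE messages (id INTEGER PRIMARY KEY, message_type TEXT);
        CREATE TABLE embeddings (generation_id INTEGER, message_id INTEGER);
        INSERT INTO messages VALUES (1, 'email'), (2, 'sms');
        INSERT INTO embeddings VALUES (1, 1), (1, 2);
    """)
    return conn


def _scope_filter(scope):
    """Return (sql_fragment, params) restricting m.message_type to scope."""
    if not scope:
        return "", []
    placeholders = ",".join("?" for _ in scope)
    return f" AND m.message_type IN ({placeholders})", list(scope)


def live_count(conn, scope=None):
    """Count live messages inside the scope (None means full corpus)."""
    clause, params = _scope_filter(scope)
    sql = "SELECT COUNT(*) FROM messages m WHERE 1 = 1" + clause
    return conn.execute(sql, params).fetchone()[0]


def embedded_count(conn, generation_id, scope=None):
    """Count embedded messages for a generation, restricted to the scope."""
    clause, params = _scope_filter(scope)
    sql = (
        "SELECT COUNT(DISTINCT e.message_id) FROM embeddings e "
        "JOIN messages m ON m.id = e.message_id "
        "WHERE e.generation_id = ?" + clause
    )
    return conn.execute(sql, [generation_id] + params).fetchone()[0]
```

If only the live count is scoped, this fixture yields one live message but two embedded ones. Passing the same scope to both keeps the invariant that embedded never exceeds live.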
Depends on #412. This branch is stacked on the message_type filter PR until that lands.