feat: EmbeddingProvider protocol, hash/tfidf embedders, vector search#36
feat: EmbeddingProvider protocol, hash/tfidf embedders, vector search#36prakashUXtech wants to merge 3 commits intomainfrom
Conversation
… strategy Add pluggable embedding subsystem for semantic memory search: - EmbeddingProvider: runtime-checkable Protocol for any embedding backend - HashEmbedder: deterministic hash-based embedder for testing pipelines - TFIDFEmbedder: stdlib-only TF-IDF embedder with corpus fitting - Similarity functions: cosine_similarity, euclidean_distance, dot_product - VectorSearchStrategy: semantic search over MemoryEntry candidates All implementations use only stdlib (no numpy required). 81 new tests covering protocol compliance, embedder behavior, similarity math, vector strategy integration, and end-to-end topic-based search. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Issues (must fix)
Heads up
Please update your PR to address these points. |
Two-layer architecture: core/ holds protocol primitives (EmbeddingProvider, cosine_similarity, euclidean_distance, dot_product), while embeddings/ re-exports them for backward compatibility. Implementations (HashEmbedder, TFIDFEmbedder, VectorSearchStrategy) stay at engine level. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Self-Review: PR #36 — EmbeddingProvider + Vector SearchWhat's Good
Issues Found (Blocking)
Issues Found (Non-blocking)
VerdictChanges requested. Two blocking issues: the index/search disconnect and the silent vector length truncation. The overall implementation quality is high — fix those two and this is a strong merge. |
…tions - search() now uses pre-built index as vector cache instead of re-embedding every candidate on each call, making index()/index_batch() useful - Added vector length mismatch guards (ValueError) to cosine_similarity, euclidean_distance, and dot_product - Fixed vacuous assertion in test_threshold_filters_results (was <=, now <) - Added warnings.warn() to TFIDFEmbedder.embed() when called before fit() - Added tests for vector length mismatch errors Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Implements the vector search layer from the vision spec:
dimensions,embed(),embed_batch()fit()(better similarity, L2-normalized)cosine_similarity,euclidean_distance,dot_product(pure stdlib)Test plan