fix: re-embed deletes the vector it just inserted (single-chunk entries vanish from recall) by mikestanley00 · Pull Request #136 · rahilp/second-brain-cloudflare

mikestanley00 · 2026-06-08T00:09:56Z

Summary

For single-chunk entries, re-embedding could delete the entry's vector from Vectorize, leaving the row in D1 but unsearchable. This deletes only genuinely-stale vectors instead.

Root cause

storeEntry keys a single chunk by the entry id:

id: chunks.length === 1 ? id : `${id}-chunk-${i}`

The re-embed paths use "insert new → delete old", deleting the full previous vector_ids set. For a single-chunk entry the new vector reuses the old id, so the cleanup step deletes the vector that was just inserted. The entry then exists in D1 with vector_ids pointing at a vector that's no longer in the index, so recall never returns it — semantic search, and even exact-term queries, miss it.
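The failure can be modelled without Vectorize at all. The sketch below is an illustration only: `FakeIndex`, `vectorIds`, and `buggyReembed` are hypothetical names, and the in-memory map stands in for the real index. It assumes only the id scheme quoted above and the "insert new → delete old" ordering.

```typescript
// Hypothetical in-memory stand-in for a vector index (not the Vectorize API).
class FakeIndex {
  private vectors = new Map<string, number[]>();
  upsert(id: string, values: number[]): void {
    this.vectors.set(id, values);
  }
  deleteByIds(ids: string[]): void {
    for (const id of ids) this.vectors.delete(id);
  }
  has(id: string): boolean {
    return this.vectors.has(id);
  }
}

// Same id scheme as the line quoted from storeEntry:
// a single chunk is keyed by the entry id itself.
function vectorIds(id: string, chunkCount: number): string[] {
  return Array.from({ length: chunkCount }, (_, i) =>
    chunkCount === 1 ? id : `${id}-chunk-${i}`
  );
}

// "Insert new -> delete old", deleting the full previous id set.
function buggyReembed(
  index: FakeIndex,
  entryId: string,
  oldIds: string[],
  newChunkCount: number
): string[] {
  const newIds = vectorIds(entryId, newChunkCount);
  for (const id of newIds) index.upsert(id, [0]); // placeholder embedding
  index.deleteByIds(oldIds); // also removes any id that was just reused
  return newIds;
}

const demoIndex = new FakeIndex();
const demoOldIds = vectorIds("entry-1", 1); // ["entry-1"]
demoIndex.upsert("entry-1", [1]);

const demoNewIds = buggyReembed(demoIndex, "entry-1", demoOldIds, 1);
// The new vector shared its id with the old one, so the cleanup removed it.
const singleChunkSurvived = demoIndex.has("entry-1");
```

In this model `singleChunkSurvived` ends up `false`, while an entry going from two chunks to one keeps its vector, because the old `-chunk-N` ids do not collide with the new id.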

Affected: POST /update, the MCP update tool, the large-append re-embed, and the smart-merge / replace capture paths.

Fix

Add deleteStaleVectors(old, new) which deletes only ids not reused by the new embedding, and route the four re-embed sites through it. The genuine full-deletes (forget, conflicting-entry removal) are unchanged.
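A minimal sketch of what such a helper might look like, not the code merged in this PR. The `VectorDeleter` interface, the `staleVectorIds` helper, and the parameter names are assumptions made for illustration; only the behavior (delete old ids that the new embedding does not reuse) comes from the description above.

```typescript
// Hypothetical minimal shape of the index dependency; the real binding may differ.
interface VectorDeleter {
  deleteByIds(ids: string[]): Promise<unknown>;
}

// Pure helper: the old ids that the new embedding did not reuse.
function staleVectorIds(
  oldIds: readonly string[],
  newIds: readonly string[]
): string[] {
  const reused = new Set(newIds);
  return oldIds.filter((id) => !reused.has(id));
}

// Deletes only the stale ids and skips the call when there is nothing to delete.
async function deleteStaleVectors(
  index: VectorDeleter,
  oldIds: readonly string[],
  newIds: readonly string[]
): Promise<string[]> {
  const stale = staleVectorIds(oldIds, newIds);
  if (stale.length > 0) {
    await index.deleteByIds(stale);
  }
  return stale;
}

// Single chunk re-embedded as a single chunk: the id is reused, nothing is stale.
const sameIdStale = staleVectorIds(["entry-1"], ["entry-1"]);

// Two chunks shrinking to one: both old chunk ids are stale.
const shrinkStale = staleVectorIds(
  ["entry-1-chunk-0", "entry-1-chunk-1"],
  ["entry-1"]
);
```

Splitting the set difference into a pure function keeps the id logic testable without a live index; the async wrapper only adds the delete call.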

Reproduce (before this PR)

1. Capture a short (single-chunk) memory.
2. Update it (POST /update), or trigger a smart-merge / replace.
3. recall no longer returns it; `wrangler vectorize get-vectors <index> --ids <entryId>` returns nothing, though D1 still lists it in vector_ids.

Testing

Updated the four tests that asserted the old full-delete behavior.
Added a single-chunk id-reuse regression test.
npm run typecheck clean; npm run test:coverage → 271 tests pass.

For single-chunk entries the Vectorize id equals the entry id (storeEntry keys a single chunk by `id`). The "insert new -> delete old" re-embed pattern then deleted the full previous `vector_ids` set, including the id the new embedding had just reused, so the entry was left in D1 but absent from Vectorize, and therefore invisible to recall (semantic search, and even exact-term queries).

This affected POST /update, the MCP `update` tool, the large-append re-embed path, and the smart-merge / replace capture paths.

Add `deleteStaleVectors(old, new)` which deletes only ids not reused by the new embedding, and route the four re-embed sites through it. The genuine full-deletes (forget, conflicting-entry removal) are unchanged.

Update the four tests that asserted the old full-delete behavior, and add a single-chunk id-reuse regression test. typecheck + 271 tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

rahilp · 2026-06-08T02:28:29Z

Thanks @mikestanley00 -- great find and perfect fix! I'm merging this now, so it will be available in main. I'll be cutting a new release later this week, and this fix will be included in it. Thank you so much!

rahilp merged commit e7eefc6 into rahilp:main Jun 8, 2026
1 check passed

rahilp linked an issue Jun 8, 2026 that may be closed by this pull request

Single-chunk memories disappear from recall after update/merge #135

Closed

mikestanley00 deleted the fix/reembed-preserve-reused-vector branch June 8, 2026 17:40
