Multi-process live memory, namespaces, topic erasure, 8x cheaper hybrid #2

Merged: vornicx merged 1 commit into main from cursor/eval-methodology-guard-provenance on Jun 10, 2026

vornicx (Owner) commented Jun 10, 2026

Weak points hardened against the LLM-at-ingest / hosted-memory field, each measured or tested:

  • SQLiteStore: lock-guarded shared connection (MCP worker threads) + PRAGMA data_version staleness probe, so several MCP clients pointed at one DB file see each other's writes live, without restarts. Two-process + threaded tests.
  • Namespaces: SDK metadata_filter on recall/build_context (neighbour-window expansion respects it too — no scope leaks), MIDAS_MCP_NAMESPACE + per-call namespace on every MCP tool, stats by_namespace.
  • Right-to-be-forgotten: Memory.forget_matching (relevance-matched erasure, dry-run preview by default in MCP, full deletion audit, deliberately bypasses durability protections) and chain-safe Memory.forget (supersession chains are relinked, not orphaned). Policy text gains a FORGET ON REQUEST step.
  • Hybrid recall: BM25 index cached on a store change counter + per-term posting lists — ~66 ms -> ~8 ms/query on a stable 5k store, scores bit-identical.
  • MCP build_context: ships the measured temporal grounding ("# Today is" anchor + relative ages) and exposes limit/hybrid/namespace.
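For readers unfamiliar with the staleness probe in the SQLiteStore item, here is a minimal sketch of the general technique, not the code in this PR. `StalenessProbe`, `is_stale`, and `demo` are hypothetical names, and the real store may cache and invalidate differently. It relies only on the documented behaviour that `PRAGMA data_version` returns a different value on a connection after another connection commits a change.

```python
import os
import sqlite3
import tempfile
import threading


class StalenessProbe:
    """Hypothetical helper: reports whether another connection has committed."""

    def __init__(self, path):
        # One shared connection, guarded by a lock so worker threads can use it.
        self._lock = threading.Lock()
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._seen = self._data_version()

    def _data_version(self):
        # fetchall() exhausts the cursor so no read lock is left open.
        return self._conn.execute("PRAGMA data_version").fetchall()[0][0]

    def is_stale(self):
        """True if the file changed via another connection since the last call."""
        with self._lock:
            current = self._data_version()
            stale = current != self._seen
            self._seen = current
            return stale

    def close(self):
        with self._lock:
            self._conn.close()


def demo():
    path = os.path.join(tempfile.mkdtemp(), "memory.db")
    writer = sqlite3.connect(path)  # stands in for a second MCP client
    writer.execute("CREATE TABLE notes (body TEXT)")
    writer.commit()

    probe = StalenessProbe(path)
    before = probe.is_stale()  # nothing written since the probe opened
    writer.execute("INSERT INTO notes VALUES ('hello')")
    writer.commit()
    after = probe.is_stale()   # the other connection committed
    again = probe.is_stale()   # no further writes

    writer.close()
    probe.close()
    return before, after, again
```

A store built this way would run the probe before serving a read and drop any in-memory caches when it reports a change, which is cheaper than re-reading the tables on every call.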

Measured negative, kept opt-in: hybrid BM25+RRF hurts LongMemEval-s retrieval (multi-session recall@k 0.97->0.81, temporal 0.95->0.86; ties elsewhere), so it stays off by default — documented in BENCHMARKS.md. Default dense path re-run after all changes: bit-identical to baseline (0.97/0.95/0.89, n=40 seed 0).
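As background for the BM25+RRF result above, reciprocal rank fusion merges ranked lists by summing 1 / (k + rank) for each document. The sketch below is a generic illustration under stated assumptions: the function name is hypothetical, and k = 60 is the conventional constant, not a value confirmed for this project.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Ties are broken by document id so the output
    is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Illustrative inputs only: a dense ranking and a BM25 ranking.
dense_ranking = ["a", "b", "c"]
bm25_ranking = ["c", "a", "d"]
fused = reciprocal_rank_fusion([dense_ranking, bm25_ranking])
# fused == ["a", "c", "b", "d"]
```

Because fusion uses only ranks, a lexical list that ranks an irrelevant document highly can push a correct dense hit down, which is one plausible way a hybrid can lower recall@k.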

156 tests pass.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
vornicx merged commit 4be2533 into main on Jun 10, 2026
2 checks passed