feat(mcp): MCP server exposing memory tools by brgsk · Pull Request #35 · vstorm-co/memv

brgsk · 2026-05-17T23:24:02Z

Summary

New memv-mcp CLI exposing memory operations over MCP (stdio + streamable-http) with 5 tools: search_memory, add_memory, add_conversation, list_memories, delete_memory.
Optional mcp extra (pip install memvee[mcp]); LLM model is optional — knowledge extraction stays off without it.
Server factory (memv.mcp.server.create_server) accepts injected embedding/LLM clients; tool logic split into plain do_* coroutines for direct unit testing.
Docs page advanced/mcp-server.md with install, CLI flags, client configs (Claude Desktop / Code / Cursor), and HTTP-transport notes.
Pin griffe>=1.0,<2 — griffe 2.0 split into griffe/griffecli/griffelib and breaks mkdocstrings.

Test plan

make all — ruff format/check, ty, 242 pytest passed (incl. tests/test_mcp.py)
make docs builds clean under --strict
memv-mcp --help loads via the mcp extra
End-to-end round-trip from an MCP client (Claude Desktop): add_conversation → search_memory returns expected statement

5 tools: search_memory, add_memory, add_conversation, list_memories, delete_memory. Server runs via `memv-mcp` CLI entry point over stdio.

- `memvee[mcp]` optional dependency
- Config-level default_user_id, optional per-call override
- `dev.py` entry point for MCP Inspector (`mcp dev`)
- Accepts pre-built embedding/LLM clients for programmatic use
- 13 tests covering tool logic + full add/search/delete cycle

- New advanced/mcp-server.md covers install, CLI flags, tools, client setup (Claude Desktop / Code / Cursor), HTTP transport, and programmatic embedding.
- mkdocs nav: add MCP Server under Advanced.
- Pin griffe>=1.0,<2: griffe 2.0 split into griffe/griffecli/griffelib, and the new wheel ships no public API surface that mkdocstrings expects.

- Move griffe pin from core dependencies to the docs group; it's a mkdocstrings transitive, not a runtime dep.
- Add include_expired param to list_memories so the [expired] status branch is reachable; default off keeps current behavior.
- Document add_conversation latency (inline LLM round-trip) in the tool docstring and the MCP docs page.

- do_add_conversation: reword extraction message to "from all pending messages". process() drains the whole user buffer, not just the freshly-added exchange, so the prior wording overclaimed.
- test_mcp: switch top_k assertion to line-level matching ('- ' could match inside a statement body).
- test_mcp: add test_add_conversation_with_llm_extracts_knowledge covering the has_llm=True path end-to-end via MockLLM.

delete_knowledge in the storage layer keys only on UUID. Without an explicit ownership check at the MCP boundary, any caller knowing another user's knowledge UUID could delete it.

- do_delete_memory now requires user_id and verifies ownership via get_knowledge before deleting; unknown UUIDs and foreign-user UUIDs both return "not found" to avoid leaking existence.
- delete_memory MCP tool gains an optional user_id arg (falls back to the server's default), matching the rest of the surface.
- New test_delete_rejects_cross_user covers the isolation guarantee.
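For readers skimming the commit, the ownership check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `InMemoryStore`, the `Knowledge` dataclass, and the exact reply strings are hypothetical stand-ins, and only the names `do_delete_memory`, `get_knowledge`, and `delete_knowledge` come from the commit message.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Knowledge:
    # Hypothetical record shape; the real model may differ.
    id: str
    user_id: str
    statement: str


class InMemoryStore:
    """Hypothetical stand-in for the storage layer: deletes key only on UUID."""

    def __init__(self) -> None:
        self._items: Dict[str, Knowledge] = {}

    async def add(self, item: Knowledge) -> None:
        self._items[item.id] = item

    async def get_knowledge(self, knowledge_id: str) -> Optional[Knowledge]:
        return self._items.get(knowledge_id)

    async def delete_knowledge(self, knowledge_id: str) -> bool:
        return self._items.pop(knowledge_id, None) is not None


async def do_delete_memory(store: InMemoryStore, knowledge_id: str, user_id: str) -> str:
    item = await store.get_knowledge(knowledge_id)
    # Unknown UUIDs and foreign-user UUIDs get the same reply,
    # so a caller cannot probe for the existence of another user's data.
    if item is None or item.user_id != user_id:
        return f"Memory {knowledge_id} not found"
    await store.delete_knowledge(knowledge_id)
    return f"Deleted memory {knowledge_id}"


async def demo() -> Tuple[str, str, str]:
    store = InMemoryStore()
    await store.add(Knowledge(id="k1", user_id="alice", statement="likes tea"))
    foreign = await do_delete_memory(store, "k1", "bob")    # wrong owner
    unknown = await do_delete_memory(store, "nope", "bob")  # no such UUID
    owner = await do_delete_memory(store, "k1", "alice")    # rightful owner
    return foreign, unknown, owner
```

The point of the sketch is that the check lives at the tool boundary, so the storage layer's UUID-only delete can stay unchanged.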

claude · 2026-05-17T23:56:21Z

Review

Two issues found; everything else looks good.

Bug — concurrent `process()` calls in `do_add_conversation` (blocking)

memory.process(user_id) has no per-user lock at the Pipeline level. Two concurrent add_conversation tool calls for the same user_id (easy to trigger under the HTTP transport) both race through Pipeline.process: they load the same unprocessed messages, fire separate LLM round-trips, and create duplicate extraction results that then have to be suppressed by the knowledge-dedup threshold. The existing guard lives in TaskManager.schedule_processing but memory.process() bypasses the TaskManager entirely.

Suggested fix: replace memory.process(user_id) with memory.flush(user_id) in do_add_conversation. flush routes through schedule_processing (which skips if a task is already running) and then awaits the result, so concurrent callers coalesce onto the same task. (See inline comment on server.py:49.)
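To make the coalescing behavior concrete, here is a rough asyncio sketch of a per-user guard. It is an assumption-laden illustration and not memv's TaskManager: `PerUserCoalescer`, `schedule`, and `fake_process` are hypothetical names, and the real `schedule_processing` / `flush` may be structured differently.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


class PerUserCoalescer:
    """Hypothetical sketch: at most one in-flight processing task per user."""

    def __init__(self, process: Callable[[str], Awaitable[int]]) -> None:
        self._process = process
        self._tasks: Dict[str, asyncio.Task] = {}

    def schedule(self, user_id: str) -> asyncio.Task:
        task = self._tasks.get(user_id)
        # Start a new task only if none is running for this user.
        if task is None or task.done():
            task = asyncio.ensure_future(self._process(user_id))
            self._tasks[user_id] = task
        return task

    async def flush(self, user_id: str) -> int:
        # Concurrent callers await the same task instead of racing.
        return await self.schedule(user_id)


async def demo() -> Tuple[int, List[int]]:
    calls = 0

    async def fake_process(user_id: str) -> int:
        # Stands in for the LLM round-trip; counts how often it runs.
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)
        return 3

    coalescer = PerUserCoalescer(fake_process)
    results = await asyncio.gather(coalescer.flush("u1"), coalescer.flush("u1"))
    return calls, list(results)
```

In this sketch two simultaneous `flush("u1")` calls trigger a single `fake_process` run and both receive its result, which is the property the suggested fix relies on.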

Minor — weak assertion in `test_search_respects_top_k`

assert len(lines) <= 2 passes with 0 results, so it does not actually verify that top_k limits the output. Should be 1 <= len(lines) <= 2. (See inline comment on test_mcp.py:55.)
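A small self-contained illustration of the difference, assuming (hypothetically) that search output lists one result per line prefixed with `- `; the helper names and sample strings are made up for this sketch and are not taken from test_mcp.py.

```python
from typing import List


def result_lines(output: str) -> List[str]:
    # Line-level matching: count only lines that start with "- ",
    # so a "- " inside a statement body is not counted as a result.
    return [line for line in output.splitlines() if line.startswith("- ")]


def weak_check(output: str) -> bool:
    # Upper bound only: also true when there are no results at all.
    return len(result_lines(output)) <= 2


def tight_check(output: str) -> bool:
    # Closed range: requires at least one result and at most top_k=2.
    return 1 <= len(result_lines(output)) <= 2


empty_output = "No memories found."
two_results = "- likes tea - and coffee\n- lives in Berlin"
```

With these inputs, `weak_check(empty_output)` is true even though nothing was retrieved, while `tight_check(empty_output)` is false and `tight_check(two_results)` is true.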

…ertion

- do_add_conversation: switch memory.process() -> memory.flush(), which routes through TaskManager.schedule_processing's per-user guard. Concurrent add_conversation calls for the same user now coalesce onto a single task instead of racing through the pipeline and double-charging the LLM.
- test_search_respects_top_k: tighten the bound to a closed range so the assertion can't pass trivially when the retriever returns zero results.

brgsk added 2 commits April 8, 2026 10:12