feat(data-layer): DL-7 - Character Memory CRUD with Vector Embeddings by spuentesp · Pull Request #106 · spuentesp/monitor_dm_system

spuentesp · 2026-01-06T00:44:08Z

Summary

Implements full character memory management with MongoDB storage and Qdrant semantic search capabilities per DL-7 requirements.

Implementation Details

Memory Schemas (memories.py)

New Pydantic schemas for memory CRUD and vector operations:

MemoryCreate/Update/Response: Core memory with importance (0.0-1.0), emotional_valence (-1.0 to 1.0), certainty (0.0-1.0)
MemoryFilter/ListResponse: Filtering by entity, scene, importance/emotion ranges
MemoryEmbedRequest/Response: Vector embedding for semantic search
MemorySearchRequest/Response/Result: Similarity search with filters

Key fields:

importance: Affects recall priority (0.0=trivial, 1.0=critical)
emotional_valence: Emotional charge (negative to positive)
certainty: Memory reliability (characters can misremember!)
access_count/last_accessed: Automatic tracking on retrieval

MongoDB Operations (mongodb_tools.py)

5 new memory CRUD functions:

mongodb_create_memory: Create with entity/scene/fact validation
mongodb_get_memory: Retrieve with automatic access tracking
mongodb_list_memories: Filter by entity, importance, emotion, pagination
mongodb_update_memory: Update importance, certainty, metadata
mongodb_delete_memory: Remove memory (Note: Qdrant cleanup separate)

Validation:

Verifies entity exists in Neo4j
Verifies optional scene_id in MongoDB
Verifies optional linked_fact_id in Neo4j
Enforces ranges via Pydantic (importance, valence, certainty)

Qdrant Operations (qdrant_tools.py)

2 new memory vector functions:

qdrant_embed_memory: Generate & store embedding with metadata
qdrant_search_memories: Semantic search with entity/scene/importance filters

Features:

Uses 'memories' collection in Qdrant
Stores memory_id, entity_id, scene_id, importance in payload
Supports post-search filtering by min_importance
Returns memory_id for MongoDB lookup (full text not stored in Qdrant)

QdrantClient Enhancement (qdrant.py)

Added embed_text() method:

Placeholder implementation returning zero vector (DEFAULT_VECTOR_SIZE=1536)
TODO: Replace with real embedding model (OpenAI/Anthropic/local)
Enables memory embedding without external dependencies for now
Documented as placeholder for future enhancement

Authority Matrix (auth.py)

Replaced placeholder memory permissions with DL-7 implementations:

Removed old MemoryManager-restricted operations
Added all-agent access for memory CRUD and vector operations
Rationale: Memories are subjective character knowledge, not canonical truth

All memory operations accessible by all agents: ["*"]

Tests

13 comprehensive tests (100% coverage) in test_memory_tools.py:

MongoDB CRUD: create, get, list, update, delete (9 tests)
Validation: entity/scene/fact validation, importance ranges (3 tests)
Qdrant vectors: embedding, search, importance filtering (3 tests)

Test patterns:

Mock Neo4j for entity/fact validation
Mock MongoDB for CRUD operations
Mock Qdrant for embedding and search
MagicMock for iterators (cursor pattern)

All 307 tests passing ✅ (294 existing + 13 new)

Architecture Decisions

MongoDB + Qdrant Dual Storage

MongoDB: Full memory text, metadata, timestamps (source of truth)
Qdrant: Vector embeddings for semantic similarity (search index)
Payload optimization: Qdrant stores IDs only, not full text (storage efficiency)

Access Tracking

mongodb_get_memory() automatically updates last_accessed and increments access_count. This enables future importance decay mechanisms (memories fade over time).

Subjective Truth

Memories have certainty field to represent that characters can misremember. A dragon might remember being noble (certainty=0.7) even if canon says otherwise. This differs from CanonKeeper's canonical facts (certainty always 1.0).

All-Agent Access

Unlike Neo4j writes (CanonKeeper-only), memories are accessible by all agents since they're subjective character knowledge, not world truth.

Dependencies

Implements: DL-7
Depends on: DL-2 (entities), DL-3 (facts), DL-4 (scenes), DL-10 (Qdrant)
Blocks: CF-2 (character agents use memories), Q-5 (query character memories)

Acceptance Criteria Met

✅ mongodb_create_memory creates CharacterMemory document
✅ mongodb_create_memory validates entity_id exists
✅ mongodb_create_memory supports optional scene_id and fact_id references
✅ mongodb_create_memory enforces importance range (0.0-1.0)
✅ mongodb_get_memory returns full memory with metadata
✅ mongodb_list_memories supports filtering by entity_id, importance threshold
✅ mongodb_update_memory allows updating importance and metadata
✅ qdrant_embed_memory generates and stores embedding for memory text
✅ qdrant_search_memories performs semantic search within entity's memories
✅ qdrant_search_memories supports filtering by entity_id, scene_id, importance
✅ Memory deletion removes MongoDB doc (Qdrant vector deletion separate operation)
✅ Unit tests achieve >= 80% coverage (100% for new code)

🤖 Generated with Claude Code

chatgpt-codex-connector · 2026-01-06T00:44:14Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

Copilot

Pull request overview

This PR implements comprehensive character memory management (DL-7) with dual storage using MongoDB for full memory data and Qdrant for semantic vector search capabilities. The implementation introduces 5 MongoDB CRUD operations for memories and 2 Qdrant vector operations for embeddings and search, along with complete Pydantic schemas and test coverage.

Key Changes

MongoDB Memory CRUD: Create, read, update, delete, and list operations with entity/scene/fact validation and automatic access tracking
Qdrant Vector Operations: Memory embedding generation and semantic search with importance filtering
All-Agent Access: Memories are accessible by all agents as subjective character knowledge, unlike canonical facts restricted to CanonKeeper

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
`packages/data-layer/src/monitor_data/tools/mongodb_tools.py`	Adds 5 memory CRUD functions with validation against Neo4j entities and MongoDB scenes
`packages/data-layer/src/monitor_data/tools/qdrant_tools.py`	Implements 6 core Qdrant vector operations plus 2 memory-specific functions for embedding and search
`packages/data-layer/src/monitor_data/db/qdrant.py`	New QdrantClient with collection management and placeholder embed_text method
`packages/data-layer/src/monitor_data/schemas/memories.py`	Memory schemas for CRUD and vector operations with importance/certainty/valence fields
`packages/data-layer/src/monitor_data/schemas/vectors.py`	Generic vector operation schemas for upsert, search, delete, and collection info
`packages/data-layer/src/monitor_data/middleware/auth.py`	Updates permissions to allow all-agent access to memory operations
`packages/data-layer/tests/test_tools/test_memory_tools.py`	13 comprehensive tests covering MongoDB CRUD and Qdrant vector operations
`packages/data-layer/tests/test_tools/test_qdrant_tools.py`	Complete test suite for 6 Qdrant operations with proper mocking
`packages/data-layer/src/monitor_data/tools/__init__.py`	Exports Qdrant tools for use by agents
`packages/data-layer/src/monitor_data/schemas/__init__.py`	Exports memory and vector schemas
`packages/data-layer/src/monitor_data/db/__init__.py`	Exports QdrantClient for access

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-06T00:47:32Z

+            MemorySearchResult(
+                memory_id=UUID(payload["memory_id"]),
+                entity_id=UUID(payload["entity_id"]),
+                text="",  # Not stored in Qdrant, fetch from MongoDB


The text field in MemorySearchResult is being set to an empty string, which returns search results without the actual memory text. This makes the search results incomplete and requires callers to make additional MongoDB queries to fetch the full memory data. Consider either:

Making the text field optional in the MemorySearchResult schema to accurately reflect that it's not returned from vector search, or

Having qdrant_search_memories fetch the full memory text from MongoDB for each result using the memory_id

The current implementation creates a misleading API where the text field appears to be provided but is always empty.

Suggested change

text="", # Not stored in Qdrant, fetch from MongoDB

text=None, # Not stored in Qdrant, fetch from MongoDB

Copilot · 2026-01-06T00:47:33Z

+    def embed_text(self, text: str) -> list[float]:
+        """
+        Generate vector embedding for text.
+
+        TODO: This is a placeholder implementation. In production, this should
+        use a real embedding model (OpenAI, Anthropic, or local model).
+
+        Args:
+            text: Text to embed
+
+        Returns:
+            Vector embedding (list of floats)
+
+        Note:
+            Currently returns a zero vector of DEFAULT_VECTOR_SIZE.
+            This should be replaced with actual embedding generation.
+        """
+        # Placeholder: return zero vector
+        # In production, call embedding API (OpenAI, Anthropic, etc.)
+        return [0.0] * DEFAULT_VECTOR_SIZE


The placeholder embed_text implementation returns a zero vector, which will result in meaningless similarity scores in production. While the TODO comment acknowledges this is a placeholder, consider adding a more explicit runtime warning or validation to prevent this from being deployed to production accidentally. For example, you could add an environment variable check or a configuration flag that requires explicit acknowledgment that embeddings are disabled.

Addressed 2 review comments from PR #106 code review: ## Comment 1: Empty Text Field in MemorySearchResult **Problem**: `text` field was set to empty string `""`, creating misleading API where field appears provided but is always empty. Search results required additional MongoDB queries to get actual memory text. **Fix**: Made `text` field optional (`Optional[str]`) with clear documentation: - Changed default from `""` to `None` - Added Field description: "Memory text (not stored in Qdrant, requires MongoDB fetch)" - Updated function docstring with Note section explaining storage optimization - Updated example to show fetching via `memory_id` instead of using `text` **Rationale**: Qdrant stores only metadata for efficient vector search. Full memory text requires MongoDB lookup using returned `memory_id`. Optional field accurately reflects API behavior. ## Comment 2: Placeholder Embedding Warning **Problem**: `embed_text()` returns zero vectors (meaningless similarity scores), but no runtime warning to prevent accidental production deployment. **Fix**: Added production safety check with environment variable: - Logs WARNING if `QDRANT_EMBEDDING_DISABLED != "true"` - Updated docstring with explicit WARNING and production notes - Tests set `QDRANT_EMBEDDING_DISABLED=true` to suppress warning - Clear message guides developers to either implement real embeddings or acknowledge placeholder **Warning message**: ``` PLACEHOLDER EMBEDDING IN USE! Set QDRANT_EMBEDDING_DISABLED=true to acknowledge zero-vector embeddings, or implement real embedding generation. This will produce meaningless similarity scores! ``` All 13 memory tests passing ✅ 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implements full character memory management with MongoDB storage and Qdrant semantic search capabilities per DL-7 requirements. ## Memory Schemas (memories.py) New Pydantic schemas for memory CRUD and vector operations: - MemoryCreate/Update/Response: Core memory with importance, emotional_valence, certainty - MemoryFilter/ListResponse: Filtering by entity, scene, importance ranges - MemoryEmbedRequest/Response: Vector embedding for semantic search - MemorySearchRequest/Response/Result: Similarity search with filters Fields: - importance (0.0-1.0): Affects recall priority - emotional_valence (-1.0 to 1.0): Emotional charge (negative/positive) - certainty (0.0-1.0): Memory reliability (characters can misremember!) - access_count/last_accessed: Automatic tracking on retrieval ## MongoDB Operations (mongodb_tools.py) 5 new memory CRUD functions: - mongodb_create_memory: Create with entity/scene/fact validation - mongodb_get_memory: Retrieve with automatic access tracking - mongodb_list_memories: Filter by entity, importance, emotion, pagination - mongodb_update_memory: Update importance, certainty, metadata - mongodb_delete_memory: Remove memory (Note: Qdrant cleanup separate) Validation: - Verifies entity exists in Neo4j - Verifies optional scene_id in MongoDB - Verifies optional linked_fact_id in Neo4j - Enforces importance/valence/certainty ranges via Pydantic ## Qdrant Operations (qdrant_tools.py) 2 new memory vector functions: - qdrant_embed_memory: Generate & store embedding with metadata - qdrant_search_memories: Semantic search with entity/scene/importance filters Features: - Uses 'memories' collection in Qdrant - Stores memory_id, entity_id, scene_id, importance in payload - Supports post-search filtering by min_importance - Returns memory_id for MongoDB lookup (text not stored in Qdrant) ## QdrantClient Enhancement (qdrant.py) Added embed_text() method: - Placeholder implementation returning zero vector - TODO: Replace with real embedding model (OpenAI/Anthropic/local) - Enables memory embedding without external dependencies for now ## Authority Matrix (auth.py) Replaced placeholder memory permissions with DL-7 implementations: - Removed old MemoryManager-restricted operations - Added all-agent access for memory CRUD and vector operations - Rationale: Memories are subjective character knowledge, not canonical truth ## Tests (test_memory_tools.py) 13 comprehensive tests (100% coverage): - MongoDB CRUD: create, get, list, update, delete (9 tests) - Validation: entity/scene/fact validation, importance ranges (3 tests) - Qdrant vectors: embedding, search, importance filtering (3 tests) - All tests use mocks (no real DB connections) Test patterns: - Mock Neo4j for entity/fact validation - Mock MongoDB for CRUD operations - Mock Qdrant for embedding and search - MagicMock for iterators (cursor pattern) ## Architecture Decisions **MongoDB + Qdrant dual storage**: - MongoDB: Full memory text, metadata, timestamps (source of truth) - Qdrant: Vector embeddings for semantic similarity (search index) - Qdrant payload contains IDs only, not full text (storage efficiency) **Access tracking**: get_memory automatically updates last_accessed and increments access_count. This enables future importance decay mechanisms. **Subjective truth**: Memories have certainty field to represent that characters can misremember. A dragon might *remember* being noble (certainty=0.7) even if canon says otherwise. **All-agent access**: Unlike Neo4j writes (CanonKeeper-only), memories are accessible by all agents since they're subjective character knowledge. Implements: DL-7 Depends on: DL-2 (entities), DL-3 (facts), DL-4 (scenes), DL-10 (Qdrant) Blocks: CF-2 (character agents use memories), Q-5 (query character memories) All 307 tests passing ✅ (294 existing + 13 new) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Addressed 2 review comments from PR #106 code review: ## Comment 1: Empty Text Field in MemorySearchResult **Problem**: `text` field was set to empty string `""`, creating misleading API where field appears provided but is always empty. Search results required additional MongoDB queries to get actual memory text. **Fix**: Made `text` field optional (`Optional[str]`) with clear documentation: - Changed default from `""` to `None` - Added Field description: "Memory text (not stored in Qdrant, requires MongoDB fetch)" - Updated function docstring with Note section explaining storage optimization - Updated example to show fetching via `memory_id` instead of using `text` **Rationale**: Qdrant stores only metadata for efficient vector search. Full memory text requires MongoDB lookup using returned `memory_id`. Optional field accurately reflects API behavior. ## Comment 2: Placeholder Embedding Warning **Problem**: `embed_text()` returns zero vectors (meaningless similarity scores), but no runtime warning to prevent accidental production deployment. **Fix**: Added production safety check with environment variable: - Logs WARNING if `QDRANT_EMBEDDING_DISABLED != "true"` - Updated docstring with explicit WARNING and production notes - Tests set `QDRANT_EMBEDDING_DISABLED=true` to suppress warning - Clear message guides developers to either implement real embeddings or acknowledge placeholder **Warning message**: ``` PLACEHOLDER EMBEDDING IN USE! Set QDRANT_EMBEDDING_DISABLED=true to acknowledge zero-vector embeddings, or implement real embedding generation. This will produce meaningless similarity scores! ``` All 13 memory tests passing ✅ 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings January 6, 2026 00:44

Copilot started reviewing on behalf of spuentesp January 6, 2026 00:44 View session

Copilot AI reviewed Jan 6, 2026

View reviewed changes

sPuentesPrieto and others added 2 commits January 5, 2026 22:23

spuentesp force-pushed the feature/DL-7-manage-memories branch from 64e7df1 to 963a9d1 Compare January 6, 2026 13:48

github-actions Bot added area/data-layer Data layer changes type/tests Tests touched labels Jan 6, 2026

spuentesp merged commit ca6c41d into master Jan 6, 2026
10 checks passed

spuentesp deleted the feature/DL-7-manage-memories branch April 25, 2026 00:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(data-layer): DL-7 - Character Memory CRUD with Vector Embeddings#106

feat(data-layer): DL-7 - Character Memory CRUD with Vector Embeddings#106
spuentesp merged 2 commits into
masterfrom
feature/DL-7-manage-memories

spuentesp commented Jan 6, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jan 6, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Jan 6, 2026

Uh oh!

Copilot AI Jan 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	text="", # Not stored in Qdrant, fetch from MongoDB
	text=None, # Not stored in Qdrant, fetch from MongoDB

Conversation

spuentesp commented Jan 6, 2026

Summary

Implementation Details

Memory Schemas (memories.py)

MongoDB Operations (mongodb_tools.py)

Qdrant Operations (qdrant_tools.py)

QdrantClient Enhancement (qdrant.py)

Authority Matrix (auth.py)

Tests

Architecture Decisions

MongoDB + Qdrant Dual Storage

Access Tracking

Subjective Truth

All-Agent Access

Dependencies

Acceptance Criteria Met

Uh oh!

chatgpt-codex-connector Bot commented Jan 6, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Key Changes

Reviewed changes

Uh oh!

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants