
refactor: Cypher pushdown for entity filters + bounded queue in GraphMemoryUpdater #8

Merged
arn0ld87 merged 1 commit into main from
claude/architecture-audit-refactor-9Vn6W
Apr 23, 2026
Conversation

@arn0ld87
Owner

Summary

Implements the two concretely scoped refactorings from the architecture audit:

  • Cypher pushdown for EntityReader — instead of filtering get_all_nodes + get_all_edges in RAM, filtering and adjacency now run as a Cypher query inside Neo4j. New storage method GraphStorage.get_filtered_entities_with_edges. The FilteredEntities contract stays identical; the four existing call sites (simulation_entities, simulation_prepare, simulation_history, simulation_manager) need no changes.
  • Bounded queue + backpressure in GraphMemoryUpdater — the unbounded Queue() is replaced by Queue(maxsize=GRAPH_MEMORY_QUEUE_MAX) with put(timeout=GRAPH_MEMORY_PUT_TIMEOUT). On overflow the event is dropped (new dropped_count in the stats) instead of triggering an OOM. max_queue_size=0 keeps the old unbounded semantics as an opt-in.
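
The pushdown shape can be sketched roughly as follows; the function name and query text are illustrative assumptions, not the actual code in neo4j_storage.py. The key point is that the type whitelist travels as a bound Cypher parameter rather than being interpolated into the query string:

```python
# Hypothetical sketch of the pushdown query construction. Names and query
# text are illustrative; only the parameter-vs-interpolation pattern and
# the with/without-edges split mirror the PR description.

def build_filtered_entities_query(entity_types, with_edges=True):
    """Return a (query, params) pair for the filtered-entities fetch.

    The type whitelist is passed as the bound parameter $entity_types,
    never spliced into the query text.
    """
    query = "MATCH (n:Entity) WHERE n.type IN $entity_types "
    if with_edges:
        # Edge-enrichment variant: also collect the adjacency per node.
        query += (
            "OPTIONAL MATCH (n)-[r]-(m:Entity) "
            "RETURN n, collect(DISTINCT {rel: type(r), other: m.id}) AS edges"
        )
    else:
        # Edgeless variant: return an empty adjacency list.
        query += "RETURN n, [] AS edges"
    params = {"entity_types": list(entity_types)}
    return query, params
```

Running the query through the Neo4j driver (session.run(query, params)) then keeps all filtering server-side; Python never sees the unfiltered node set.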

The larger architecture proposals from the audit (event-bus IPC via Redis, temporal graph evolution, dynamic ontology mutation, multi-agent consensus metrics) have been filed as separate GitHub issues — see the issue list below.

Concrete changes

  • backend/app/storage/graph_storage.py — new abstract method get_filtered_entities_with_edges
  • backend/app/storage/neo4j_storage.py — Cypher implementation with two variants (with/without edge enrichment) and defensive post-processing (direction logic, dedup, type whitelist passed as a parameter, not interpolated)
  • backend/app/services/entity_reader.py — now delegates to the storage method
  • backend/app/services/graph_memory_updater.py — bounded queue, dropped_count, config wiring
  • backend/app/config.py — GRAPH_MEMORY_QUEUE_MAX (10000), GRAPH_MEMORY_PUT_TIMEOUT (2.0)
  • package.json — lint:backend scope extended to cover the refactored files and new tests
  • CHANGELOG.md — entry under v0.4.1
  • New tests (14 cases): test_entity_reader.py, test_graph_memory_updater.py, test_neo4j_filtered_entities.py

Test plan

  • npm run test:backend — 102/102 green
  • npm run lint:backend (scoped rollout, incl. new files) — clean
  • npm run lint:frontend — clean
  • npm run build:frontend — clean
  • uv run python -m compileall app scripts — clean

Outstanding before merge (requires a running Neo4j): live probe of the new Cypher query against a real, fully built graph, including measurement of the memory/latency improvement on a graph with >1k entities.

https://claude.ai/code/session_01CiS1Gg8J8YBkRy3S2QJvPi


Generated by Claude Code

Addresses two concrete bottlenecks from the architecture audit:

1. EntityReader.filter_defined_entities used to pull every node and every
   edge of a graph into Python memory and filter there. For graphs with
   more than a few thousand entities this caused large memory spikes and
   latency. The filtering, type whitelist check, and adjacency lookup now
   run inside Neo4j via a new GraphStorage.get_filtered_entities_with_edges
   method. EntityReader only assembles EntityNode dataclasses from the
   returned rows.

2. GraphMemoryUpdater held an unbounded queue of agent activities. When
   Neo4j ingestion was slower than OASIS event generation the queue grew
   without limit and could OOM the backend. The queue is now bounded via
   GRAPH_MEMORY_QUEUE_MAX (default 10000) with a put-side timeout
   (GRAPH_MEMORY_PUT_TIMEOUT) that applies backpressure and, on overflow,
   drops events with a dropped_count stat instead of crashing.
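
A minimal sketch of this backpressure pattern; the constant and attribute names mirror the description above but are otherwise hypothetical, not the actual GraphMemoryUpdater code:

```python
# Sketch of the bounded-queue backpressure pattern described above.
# Constant names follow the PR description; class and method names are
# illustrative assumptions.
import queue

GRAPH_MEMORY_QUEUE_MAX = 10000   # default bound from the PR description
GRAPH_MEMORY_PUT_TIMEOUT = 2.0   # seconds of producer-side backpressure

class BoundedActivityQueue:
    def __init__(self, max_queue_size=GRAPH_MEMORY_QUEUE_MAX,
                 put_timeout=GRAPH_MEMORY_PUT_TIMEOUT):
        # queue.Queue(maxsize=0) is unbounded in the stdlib, so
        # max_queue_size=0 naturally opts back into the old behaviour.
        self._queue = queue.Queue(maxsize=max_queue_size)
        self._put_timeout = put_timeout
        self.dropped_count = 0

    def enqueue(self, activity):
        try:
            # Block up to put_timeout seconds (backpressure on the producer)...
            self._queue.put(activity, timeout=self._put_timeout)
            return True
        except queue.Full:
            # ...then drop the event and count it instead of growing
            # without bound and risking an OOM.
            self.dropped_count += 1
            return False
```

The design choice worth noting: dropping on the put side with a timeout lets the producer absorb short ingestion stalls without losing events, while a sustained stall degrades gracefully into counted drops.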

Public contracts are preserved: FilteredEntities keeps the same shape and
callers in simulation_entities / simulation_prepare / simulation_history /
simulation_manager do not need changes. GraphMemoryUpdater's constructor
keeps backwards compatibility via default parameters, and max_queue_size=0
still gives the old unbounded behaviour as an opt-in.
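
The reader-side delegation described above might look roughly like this; the EntityNode fields and the storage row shape are assumptions for illustration, not the project's actual schema:

```python
# Hypothetical sketch of EntityReader delegating to the storage layer.
# Field names and row keys are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class EntityNode:
    entity_id: str
    entity_type: str
    edges: list

def filter_defined_entities(storage, entity_types):
    # Filtering, whitelist check, and adjacency all happen inside Neo4j;
    # Python only assembles dataclasses from the returned rows.
    rows = storage.get_filtered_entities_with_edges(entity_types)
    return [EntityNode(r["id"], r["type"], r["edges"]) for r in rows]
```

Because callers only ever see the assembled result, swapping the in-memory filter for the pushdown is invisible at the call sites.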

Tests: 14 new tests covering EntityReader delegation, Neo4jStorage dict
assembly (direction logic, dedup, type whitelist as parameter, edgeless
path), and the bounded queue (drop on overflow, DO_NOTHING skip path,
unbounded opt-in, stats surface).

Quality gate: 102/102 backend tests green, scoped ruff clean, frontend
lint + build clean.

https://claude.ai/code/session_01CiS1Gg8J8YBkRy3S2QJvPi
@arn0ld87 arn0ld87 marked this pull request as ready for review April 23, 2026 14:59
@arn0ld87 arn0ld87 merged commit 9f0050e into main Apr 23, 2026
5 checks passed

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a Cypher-pushdown optimization for entity filtering and adjacency fetching, delegating these operations to Neo4j to reduce memory consumption in the backend. It also implements a bounded queue with backpressure in the GraphMemoryUpdater to protect against OOM crashes during high-load simulations. The updates include new configuration options, storage interface extensions, and a suite of unit tests. Reviewer feedback suggests adopting f-strings for logging to maintain stylistic consistency with the rest of the project.

Comment on lines +246 to +253
        logger.info(
            "GraphMemoryUpdater initialized: graph_id=%s, batch_size=%s, "
            "queue_max=%s, put_timeout=%ss",
            graph_id,
            self.BATCH_SIZE,
            self._max_queue_size or "unbounded",
            self._put_timeout,
        )


medium

To improve code-style consistency, I would suggest using f-strings for logging here, as is done in other parts of this file (the stop() method) and of the project.

While %-formatting is not wrong for logging, mixing different styles can hurt readability and maintainability.

        logger.info(
            f"GraphMemoryUpdater initialized: graph_id={graph_id}, batch_size={self.BATCH_SIZE}, "
            f"queue_max={self._max_queue_size or 'unbounded'}, put_timeout={self._put_timeout}s"
        )

Comment on lines +311 to +318
            logger.error(
                "Activity queue full for graph %s (maxsize=%s) — dropping "
                "event: %s by %s",
                self.graph_id,
                self._max_queue_size,
                activity.action_type,
                activity.agent_name,
            )


medium

Here too, %-formatting is used for logging while f-strings are used elsewhere. To keep the code consistent, I recommend switching this log call to f-strings as well.

            logger.error(
                f"Activity queue full for graph {self.graph_id} (maxsize={self._max_queue_size}) — dropping "
                f"event: {activity.action_type} by {activity.agent_name}"
            )
