Summary
Graph extraction silently fails with a timeout in production when the context is large; 6+ timeouts were observed in a single session.
Evidence (production log 2026-03-23)
WARN zeph_memory::semantic::graph: graph extraction timed out
WARN zeph_memory::semantic::graph: graph extraction timed out
WARN zeph_memory::semantic::graph: graph extraction timed out
WARN zeph_memory::semantic::graph: graph extraction timed out
WARN zeph_memory::semantic::graph: graph extraction timed out
WARN zeph_memory::semantic::graph: graph extraction timed out
Session had 71 messages, ~313KB request payload, 77 tools loaded.
Root Cause
The default config sets extraction_timeout_secs = 30. The extraction task calls the LLM provider with the full content plus context messages; for gpt-5.4-mini-class models with large prompts, inference can take longer than 30s.
Config path: [memory.graph] extraction_timeout_secs = 30
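For reference, the `[memory.graph]` section syntax suggests a TOML config file; the current default would look like:

```toml
# Default graph-extraction deadline (seconds).
[memory.graph]
extraction_timeout_secs = 30
```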
Code: crates/zeph-memory/src/semantic/graph.rs — spawn_graph_extraction wraps with tokio::time::timeout(Duration::from_secs(config.extraction_timeout_secs), ...).
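A minimal, dependency-free sketch of the failure mode (the real code wraps the provider call in `tokio::time::timeout`; the std threads and function names below are stand-ins): when the deadline is shorter than inference time, the result is dropped and the caller only ever sees the timeout.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for the LLM extraction call; sleeps longer
// than the short deadline used below to simulate slow inference.
fn slow_extraction(tx: mpsc::Sender<Vec<String>>) {
    thread::sleep(Duration::from_millis(50));
    let _ = tx.send(vec!["entity_a".to_string()]);
}

// Mirrors the timeout wrapper in spawn_graph_extraction: on deadline
// expiry the extracted entities are dropped and the caller sees None —
// the "silent failure" described above.
fn extract_with_timeout(timeout: Duration) -> Option<Vec<String>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || slow_extraction(tx));
    rx.recv_timeout(timeout).ok()
}

fn main() {
    // Deadline shorter than inference time: entities silently lost.
    assert!(extract_with_timeout(Duration::from_millis(10)).is_none());
    // Generous deadline: extraction succeeds.
    assert!(extract_with_timeout(Duration::from_millis(500)).is_some());
}
```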
Impact
- Silent failure — no user-visible error, no retry. Entity extraction silently skipped.
- Entities from those messages are never added to the graph → SYNAPSE recall degraded for affected sessions.
- Metrics: graph_extraction_failures increments, but it is only visible in the TUI debug panel.
Proposed Fix
Options (in preference order):
- Increase the default extraction_timeout_secs from 30 to 60 or 90.
- Add a configurable extraction_max_content_bytes to truncate input before sending it to the LLM (faster inference).
- Log a WARN with the content length when the timeout fires, to make diagnosis easier: graph extraction timed out content_len=N timeout_secs=30.
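Option 2 could look like the sketch below. `extraction_max_content_bytes` is the name proposed in this issue; the `truncate_content` helper is hypothetical. The only subtlety is backing up to a UTF-8 char boundary so slicing never panics.

```rust
/// Truncate `content` to at most `max_bytes`, backing up to a char
/// boundary so the result stays valid UTF-8 (slicing mid-character
/// would panic). Sketch of the proposed extraction_max_content_bytes
/// option; the function name is hypothetical.
fn truncate_content(content: &str, max_bytes: usize) -> &str {
    if content.len() <= max_bytes {
        return content;
    }
    let mut end = max_bytes;
    while !content.is_char_boundary(end) {
        end -= 1;
    }
    &content[..end]
}

fn main() {
    assert_eq!(truncate_content("hello", 10), "hello");
    assert_eq!(truncate_content("hello", 3), "hel");
    // 'é' is 2 bytes; a 1-byte budget backs up to the boundary at 0.
    assert_eq!(truncate_content("é", 1), "");
}
```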
Option 3 should be done regardless as a diagnostic improvement.
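For option 3, the diagnostic line could be built as below. The function name is hypothetical, and the real code would presumably emit it through the same logging facility that produced the WARN lines in the evidence above; a plain string keeps the sketch dependency-free.

```rust
/// Build the proposed timeout diagnostic (option 3): same message as the
/// existing WARN, plus the input size and configured deadline.
fn timeout_warn_line(content_len: usize, timeout_secs: u64) -> String {
    format!("graph extraction timed out content_len={content_len} timeout_secs={timeout_secs}")
}

fn main() {
    // E.g. the ~313KB payload from the log above with the 30s default.
    println!("{}", timeout_warn_line(313 * 1024, 30));
}
```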
Reproduction
Use the production config with gpt-5.4-mini, a conversation with 50+ messages, and extraction_timeout_secs = 30. Each assistant turn triggers extraction, so timeouts accumulate.