fix(memory): graph extraction timeout too short for large contexts (30s) #2169

@bug-ops

Summary

Graph extraction silently times out in production when the context is large. Observed 6+ timeouts in a single session.

Evidence (production log 2026-03-23)

WARN zeph_memory::semantic::graph: graph extraction timed out
WARN zeph_memory::semantic::graph: graph extraction timed out
WARN zeph_memory::semantic::graph: graph extraction timed out
WARN zeph_memory::semantic::graph: graph extraction timed out
WARN zeph_memory::semantic::graph: graph extraction timed out
WARN zeph_memory::semantic::graph: graph extraction timed out

Session had 71 messages, ~313KB request payload, 77 tools loaded.

Root Cause

extraction_timeout_secs = 30 (default in config). The extraction task calls the LLM provider with the full content + context messages. For gpt-5.4-mini class models with large prompts, inference can take >30s.

Config path: [memory.graph] extraction_timeout_secs = 30

Code: in crates/zeph-memory/src/semantic/graph.rs, spawn_graph_extraction wraps the extraction call with tokio::time::timeout(Duration::from_secs(config.extraction_timeout_secs), ...).
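
The failure mode can be illustrated with a minimal sketch. This is not the code in graph.rs: it uses only the standard library, with a worker thread and `recv_timeout` standing in for the async task and `tokio::time::timeout`, and `run_with_timeout` is a hypothetical helper name. It shows how a call that outlives its budget produces no result at all, which is why the affected messages contribute nothing to the graph.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the timeout wrapper: runs `work` on a worker
/// thread and waits at most `timeout` for its result.
/// Returns `None` when the deadline passes first.
fn run_with_timeout<T, F>(timeout: Duration, work: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the receiver has already given up, the send fails and the
        // late result is discarded. Unlike a dropped future, the thread
        // itself is not cancelled; it simply finishes unobserved.
        let _ = tx.send(work());
    });
    rx.recv_timeout(timeout).ok()
}
```

With a budget shorter than the work, the caller sees `None` and nothing else, so unless the timeout branch logs enough detail or retries, the loss is invisible to the user.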

Impact

  • Silent failure: no user-visible error and no retry. Entity extraction is skipped.
  • Entities from those messages are never added to the graph → SYNAPSE recall is degraded for affected sessions.
  • The graph_extraction_failures metric increments, but it is only visible in the TUI debug panel.

Proposed Fix

Options (in preference order):

  1. Increase default extraction_timeout_secs from 30 to 60 or 90.
  2. Add configurable extraction_max_content_bytes to truncate input before sending to LLM (faster inference).
  3. Log a WARN with content length when timeout fires to make diagnosis easier: graph extraction timed out content_len=N timeout_secs=30.

Option 3 should be done regardless as a diagnostic improvement.
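
Options 2 and 3 could look roughly like the following sketch. The helper names `truncate_to_bytes` and `timeout_warning` are hypothetical and do not exist in the crate. The sketch assumes truncation keeps a prefix of the content, and it builds the warning as a plain string, whereas the real code would presumably emit it through its existing logging call.

```rust
/// Hypothetical helper for option 2: returns the longest prefix of `content`
/// that fits in `max_bytes` without splitting a multi-byte UTF-8 character.
fn truncate_to_bytes(content: &str, max_bytes: usize) -> &str {
    if content.len() <= max_bytes {
        return content;
    }
    let mut end = max_bytes;
    // Walk back to the nearest character boundary (index 0 is always one).
    while !content.is_char_boundary(end) {
        end -= 1;
    }
    &content[..end]
}

/// Hypothetical helper for option 3: builds the diagnostic line in the
/// format proposed above.
fn timeout_warning(content_len: usize, timeout_secs: u64) -> String {
    format!(
        "graph extraction timed out content_len={} timeout_secs={}",
        content_len, timeout_secs
    )
}
```

Truncating on a byte budget rather than a character count keeps the limit aligned with payload size, which is what the ~313KB observation measures. Keeping a prefix is only one possible policy; keeping the most recent content may suit extraction better.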

Reproduction

Use production config with gpt-5.4-mini, conversation with 50+ messages, and extraction_timeout_secs = 30. Each assistant turn triggers extraction → timeouts accumulate.

Metadata

Labels: bug (Something isn't working), memory (zeph-memory crate, SQLite)
