Skip to content

fix(memory): enforce wall-clock deadline on compact LLM path (#131)#162

Open
tcconnally wants to merge 1 commit into
mainfrom
fix/131-memory-compact-total-timeout
Open

fix(memory): enforce wall-clock deadline on compact LLM path (#131)#162
tcconnally wants to merge 1 commit into
mainfrom
fix/131-memory-compact-total-timeout

Conversation

@tcconnally
Copy link
Copy Markdown
Owner

Summary

Closes #131 — fourth PR in v1.0.6 milestone. Second-most complex remaining item (concurrency + deadline + deterministic fallback).

Pre-1.0.6, perseus memory compact with an LLM provider configured could hang for hours. The root cause: _mneme_compact_llm()run_llm() only enforced llm.timeout_s (default 30s) on the HTTP request itself. With streaming-token providers like Ollama serving large models, individual tokens arrive within timeout but total wall time was unbounded.

Fix

Wall-clock deadline + deterministic fallback

_memory_do_compact() now wraps the LLM call in a ThreadPoolExecutor.future.result(timeout=…). On timeout, the LLM future is abandoned and _deterministic_narrative() produces a usable narrative — operators get SOMETHING, plus a clear stderr signal:

> ⚠ Mnēmē compact: LLM provider 'ollama' exceeded
compact_total_timeout_s=180s; falling back to deterministic narrative.

Same fallback for any LLM exception

If the LLM call raises (provider unreachable, payload error, etc.) — memory compact no longer propagates the failure. Deterministic narrative is built; a stderr message is printed. Operators always get a usable narrative.

New config knob

memory:
  compact_total_timeout_s: 180   # 0 = pre-1.0.6 unbounded behavior

Observability

New audit event memory_compact_timeout with provider, total_timeout_s, workspace_hash fields.

Limitation (documented)

Python's ThreadPoolExecutor cannot truly kill a running thread. The in-flight HTTP request continues until urllib's per-request timeout fires. Worst-case wait is therefore compact_total_timeout_s + llm.timeout_s. The leaked thread is daemonized via cancel_futures=True + wait=False and cannot block process exit.

A true interruption would require switching from urllib.request to requests with a (connect, read) tuple timeout, OR running the LLM call in a child process. Both are larger refactors; deferred to v1.1+.

Files Changed (5)

  • src/perseus/agora.py_memory_do_compact() deadline wrapper + fallback
  • src/perseus/config.pymemory.compact_total_timeout_s: 180 in DEFAULT_CONFIG
  • tests/test_memory.py — 4 regression tests
  • CHANGELOG.md — v1.0.6 entry
  • perseus.py — rebuilt artifact

Tests

4 new regression tests in tests/test_memory.py:

  • test_memory_compact_total_timeout_falls_back_to_deterministic — slow LLM mock (2.0s) exceeds 0.5s deadline; assert <1.5s return + deterministic body + stderr
  • test_memory_compact_succeeds_within_total_timeout — fast LLM mock under deadline; assert LLM body used
  • test_memory_compact_llm_exception_falls_back_to_deterministic — exception path; assert no propagation + deterministic body + stderr
  • test_memory_compact_default_timeout_is_180s — config sanity

Test results

  • All 4 new regression tests pass
  • All 47 tests in test_memory.py + test_mneme.py pass

Migration Notes

The new default compact_total_timeout_s: 180 is strictly safer than pre-1.0.6 behavior. Users who want the old (unbounded) behavior can set it to 0 — but this is not recommended.


Fourth of 12 PRs in the v1.0.6 milestone. Suggested next: #139 (MCP _call_tool subprocess leak — third-most complex remaining).

Pre-1.0.6, `perseus memory compact` with an LLM provider configured could
hang for hours. The root cause: _mneme_compact_llm() → run_llm() only
enforced llm.timeout_s (default 30s) on the HTTP request itself. With
streaming-token providers like Ollama serving large models, individual
tokens arrive within timeout but total wall time was unbounded.

This patch adds a true wall-clock deadline at the _memory_do_compact()
level, with deterministic fallback so operators always get a usable
narrative.

src/perseus/agora.py:
- _memory_do_compact() now wraps the LLM call in
  ThreadPoolExecutor.future.result(timeout=total_timeout).
- New knob: memory.compact_total_timeout_s (default 180s).
- On timeout: stderr message + audit_event('memory_compact_timeout', ...)
  + fall back to _deterministic_narrative.
- On generic LLM exception (provider unreachable, payload error): same
  deterministic fallback path. memory compact never propagates LLM
  failures up to the operator.
- Executor is shutdown(wait=False, cancel_futures=True) so the call
  returns immediately on timeout. The worker thread is daemonized and
  cannot block process exit.

src/perseus/config.py:
- Add memory.compact_total_timeout_s: 180 to DEFAULT_CONFIG with
  explanatory comment about pre-1.0.6 behavior and the 0=disabled escape
  hatch.

Limitation (documented in code + CHANGELOG): Python's ThreadPoolExecutor
cannot truly kill a running thread. The in-flight HTTP request continues
until urllib's per-request timeout fires. Worst-case wait is therefore
compact_total_timeout_s + llm.timeout_s. Daemonized so it doesn't block
exit.

Tests (tests/test_memory.py):
- test_memory_compact_total_timeout_falls_back_to_deterministic — slow
  LLM mock exceeds 0.5s deadline; assert <1.5s return + deterministic body
  + stderr message
- test_memory_compact_succeeds_within_total_timeout — fast LLM mock under
  deadline; assert LLM body present
- test_memory_compact_llm_exception_falls_back_to_deterministic —
  exception in LLM path; assert no propagation + deterministic body +
  stderr message
- test_memory_compact_default_timeout_is_180s — config default

All 4 new regression tests pass. All 47 tests in test_memory.py and
test_mneme.py pass.

Closes #131
Refs milestone v1.0.6
@tcconnally tcconnally added this to the v1.0.6 milestone Jun 3, 2026
@tcconnally tcconnally added bug Something isn't working mneme P1-high labels Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working mneme P1-high

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: perseus memory compact hangs indefinitely when Ollama is configured but the model is slow

2 participants