fix(memory): enforce wall-clock deadline on compact LLM path (#131)#162
Open
tcconnally wants to merge 1 commit into
Open
fix(memory): enforce wall-clock deadline on compact LLM path (#131)#162tcconnally wants to merge 1 commit into
tcconnally wants to merge 1 commit into
Conversation
Pre-1.0.6, `perseus memory compact` with an LLM provider configured could
hang for hours. The root cause: _mneme_compact_llm() → run_llm() only
enforced llm.timeout_s (default 30s) on the HTTP request itself. With
streaming-token providers like Ollama serving large models, individual
tokens arrive within timeout but total wall time was unbounded.
This patch adds a true wall-clock deadline at the _memory_do_compact()
level, with deterministic fallback so operators always get a usable
narrative.
src/perseus/agora.py:
- _memory_do_compact() now wraps the LLM call in
ThreadPoolExecutor.future.result(timeout=total_timeout).
- New knob: memory.compact_total_timeout_s (default 180s).
- On timeout: stderr message + audit_event('memory_compact_timeout', ...)
+ fall back to _deterministic_narrative.
- On generic LLM exception (provider unreachable, payload error): same
deterministic fallback path. memory compact never propagates LLM
failures up to the operator.
- Executor is shutdown(wait=False, cancel_futures=True) so the call
returns immediately on timeout. The worker thread is daemonized and
cannot block process exit.
src/perseus/config.py:
- Add memory.compact_total_timeout_s: 180 to DEFAULT_CONFIG with
explanatory comment about pre-1.0.6 behavior and the 0=disabled escape
hatch.
Limitation (documented in code + CHANGELOG): Python's ThreadPoolExecutor
cannot truly kill a running thread. The in-flight HTTP request continues
until urllib's per-request timeout fires. Worst-case wait is therefore
compact_total_timeout_s + llm.timeout_s. Daemonized so it doesn't block
exit.
Tests (tests/test_memory.py):
- test_memory_compact_total_timeout_falls_back_to_deterministic — slow
LLM mock exceeds 0.5s deadline; assert <1.5s return + deterministic body
+ stderr message
- test_memory_compact_succeeds_within_total_timeout — fast LLM mock under
deadline; assert LLM body present
- test_memory_compact_llm_exception_falls_back_to_deterministic —
exception in LLM path; assert no propagation + deterministic body +
stderr message
- test_memory_compact_default_timeout_is_180s — config default
All 4 new regression tests pass. All 47 tests in test_memory.py and
test_mneme.py pass.
Closes #131
Refs milestone v1.0.6
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #131 — fourth PR in v1.0.6 milestone. Second-most complex remaining item (concurrency + deadline + deterministic fallback).
Pre-1.0.6,
perseus memory compactwith an LLM provider configured could hang for hours. The root cause:_mneme_compact_llm()→run_llm()only enforcedllm.timeout_s(default 30s) on the HTTP request itself. With streaming-token providers like Ollama serving large models, individual tokens arrive within timeout but total wall time was unbounded.Fix
Wall-clock deadline + deterministic fallback
_memory_do_compact()now wraps the LLM call in aThreadPoolExecutor.future.result(timeout=…). On timeout, the LLM future is abandoned and_deterministic_narrative()produces a usable narrative — operators get SOMETHING, plus a clear stderr signal:Same fallback for any LLM exception
If the LLM call raises (provider unreachable, payload error, etc.) —
memory compactno longer propagates the failure. Deterministic narrative is built; a stderr message is printed. Operators always get a usable narrative.New config knob
Observability
New audit event
memory_compact_timeoutwithprovider,total_timeout_s,workspace_hashfields.Limitation (documented)
Python's
ThreadPoolExecutorcannot truly kill a running thread. The in-flight HTTP request continues until urllib's per-request timeout fires. Worst-case wait is thereforecompact_total_timeout_s + llm.timeout_s. The leaked thread is daemonized viacancel_futures=True+wait=Falseand cannot block process exit.A true interruption would require switching from
urllib.requesttorequestswith a(connect, read)tuple timeout, OR running the LLM call in a child process. Both are larger refactors; deferred to v1.1+.Files Changed (5)
src/perseus/agora.py—_memory_do_compact()deadline wrapper + fallbacksrc/perseus/config.py—memory.compact_total_timeout_s: 180in DEFAULT_CONFIGtests/test_memory.py— 4 regression testsCHANGELOG.md— v1.0.6 entryperseus.py— rebuilt artifactTests
4 new regression tests in
tests/test_memory.py:test_memory_compact_total_timeout_falls_back_to_deterministic— slow LLM mock (2.0s) exceeds 0.5s deadline; assert <1.5s return + deterministic body + stderrtest_memory_compact_succeeds_within_total_timeout— fast LLM mock under deadline; assert LLM body usedtest_memory_compact_llm_exception_falls_back_to_deterministic— exception path; assert no propagation + deterministic body + stderrtest_memory_compact_default_timeout_is_180s— config sanityTest results
test_memory.py+test_mneme.pypassMigration Notes
The new default
compact_total_timeout_s: 180is strictly safer than pre-1.0.6 behavior. Users who want the old (unbounded) behavior can set it to 0 — but this is not recommended.Fourth of 12 PRs in the v1.0.6 milestone. Suggested next: #139 (MCP
_call_toolsubprocess leak — third-most complex remaining).