## Summary

The agentmemory service (`@agentmemory/agentmemory@0.9.3`) crashes intermittently with an uncaught `IIIInvocationError: TIMEOUT: invocation timed out after 30000ms` on `function_id: 'state::set'`. The rejection escapes through `iii-sdk/dist/index.mjs:405` and terminates the Node process. Under sustained write load (passive observation capture from CC hooks across multiple projects) we observed 5–15 crashes/hour. systemd auto-restart recovers in ~10s, but in-flight BM25/vector index updates that hadn't yet fired their 5s `IndexPersistence` debounce are lost across the crash boundary, breaking `memory_smart_search` recall for very recent saves.
## Environment

- Node.js v20.20.2
- Linux 6.8.0-106-generic, x86_64
- `@agentmemory/agentmemory` v0.9.3, installed via `npm install -g`
- `iii` engine v0.11.0, native binary
- Embedding provider: `openai`
- `AGENTMEMORY_AUTO_COMPRESS=true`, `CONSOLIDATION_ENABLED=true`, `GRAPH_EXTRACTION_ENABLED=true`
- Load profile: 5 Claude Code agents, ~1.7K observations/day (~75/hour average) via plugin hooks
## Reproduction

- Run the service with the env vars above.
- Drive sustained write load: ~1 observation/sec via `POST /agentmemory/observe` (or via the plugin hooks under active CC sessions).
- Within 1–10 minutes, the process exits with the trace below.
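For reference, the loop we use to drive that write load looks roughly like this (the endpoint is the one above; the payload fields mirror what our plugin hooks send, and the port is our local default; both are assumptions rather than the canonical schema):

```ts
// drive-load.ts: post ~1 observation/sec against a local agentmemory instance.
// Payload fields and port are from our setup, not from the agentmemory docs.
const BASE_URL = process.env.AGENTMEMORY_URL ?? "http://localhost:3000";

async function observe(i: number): Promise<void> {
  const res = await fetch(`${BASE_URL}/agentmemory/observe`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      content: `synthetic observation #${i} ${"x".repeat(512)}`,
      project: "crash-repro",
    }),
  });
  if (!res.ok) console.error(`observe #${i} failed: HTTP ${res.status}`);
}

let i = 0;
setInterval(() => {
  // Fire-and-forget so a slow response doesn't throttle the write rate.
  observe(i++).catch((err) => console.error(`observe #${i} rejected:`, err));
}, 1000);
```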
## Crash log

```
[agentmemory] Ready. Triple-stream (BM25+Vector+Graph) search active.
... (40–90 s of normal operation, observations captured/compressed) ...
file:///usr/lib/node_modules/@agentmemory/agentmemory/node_modules/iii-sdk/dist/index.mjs:405
        reject(new IIIInvocationError({
               ^

IIIInvocationError: TIMEOUT: invocation timed out after 30000ms
    at Timeout._onTimeout (file:///.../iii-sdk/dist/index.mjs:405:14)
    at listOnTimeout (node:internal/timers:581:17)
    at process.processTimers (node:internal/timers:519:7) {
  code: 'TIMEOUT',
  function_id: 'state::set',
  stacktrace: undefined
}

Node.js v20.20.2
agentmemory.service: Main process exited, code=exited, status=1/FAILURE
```
The same crash then repeats, with the restart counter climbing (we see >40 starts per 24h on a busy day).
## What we expect

The `state::set` invocation timing out shouldn't crash the whole process. Either:

- Catch and log the rejection (degrade gracefully: drop or queue the write), or
- Surface a configurable longer timeout / retry policy for KV writes, or
- Add a `process.on('unhandledRejection', …)` handler at the entrypoint as a hard floor.
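To make the third option concrete, here is a minimal sketch of the hard floor we have in mind (the TIMEOUT code and the `{ code, function_id }` shape are taken from the crash log above; whether dropping the write is acceptable at this layer is your call):

```ts
// Entrypoint hard floor: a timed-out SDK invocation is logged and dropped
// instead of killing the process.
process.on("unhandledRejection", (reason: unknown) => {
  const err = reason as { code?: string; function_id?: string };
  if (err?.code === "TIMEOUT") {
    console.error(
      `[agentmemory] dropped timed-out invocation on ${err.function_id ?? "unknown"}`
    );
    return; // degrade gracefully: one lost write, process survives
  }
  // Unknown rejections should stay fatal: rethrowing here surfaces them
  // as an uncaughtException instead of silently swallowing them.
  throw reason;
});
```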
## Side effects observed

- Recent BM25 / vector index additions (since the last `IndexPersistence.scheduleSave()` debounce flush, a 5s window) are lost across the crash, since they live only in memory until persisted via `state::set(KV.bm25Index, …)` (see the sketch after this list).
- `memory_smart_search` doesn't return content saved within ~30s of the crash, even though `kv.set(KV.memories, …)` itself completed.
- Even after restart, an OTel WebSocket reconnect loop ("WebSocket error: Unexpected server response: 404") spams the logs with exponential backoff up to ~30s.
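Our reading of that loss mechanism, as a sketch (the class and method names come from strings we see in the bundle; the internals are inferred from behavior, not from source):

```ts
// What we believe IndexPersistence is doing, reconstructed from observed
// behavior: index mutations accumulate in memory and only reach the engine's
// KV store on a 5s-debounced flush, so a crash inside that window loses them.
class IndexPersistence {
  private timer: NodeJS.Timeout | null = null;

  constructor(private flush: () => Promise<void>) {}

  scheduleSave(): void {
    if (this.timer) return; // a flush is already pending
    this.timer = setTimeout(() => {
      this.timer = null;
      // In v0.9.3 this promise appears to reject uncaught when the underlying
      // state::set(KV.bm25Index, …) times out; that is the crash above, and
      // everything accumulated since the previous flush dies with the process.
      void this.flush();
    }, 5_000);
  }
}
```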
## Possibly related

- Whether the iii engine has an internal queue limit / write backpressure that surfaces as a 30s timeout under load; happy to share a `journalctl` dump if useful.
- `function_id: 'state::set'` is the only function_id we see crash; `state::get` and others time out gracefully.
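If no engine-side knob exists, even a small client-side bound would turn the pile-up into fast failures; purely as an illustration of the kind of backpressure we mean (not an existing agentmemory or iii-sdk API):

```ts
// Illustrative only: cap concurrent KV writes so that, under sustained load,
// excess writes fail fast instead of queueing until the engine's 30s timeout.
class BoundedWrites {
  private inFlight = 0;

  constructor(private readonly limit = 64) {}

  async run<T>(write: () => Promise<T>): Promise<T> {
    if (this.inFlight >= this.limit) {
      // Shedding here is a policy choice; queueing with a cap would also work.
      throw new Error(`write backpressure: ${this.limit} writes in flight`);
    }
    this.inFlight++;
    try {
      return await write();
    } finally {
      this.inFlight--;
    }
  }
}
```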
## Suggested fixes (any one helps)

- Wrap the `state::set` calls in the `IndexPersistence.save()` and `kv.set(...)` paths with a `.catch()` that logs + drops, instead of letting the rejection propagate.
- Add a `process.on('unhandledRejection', …)` handler in the service entrypoint so a single SDK timeout doesn't take down the whole memory mesh.
- Document recommended `iii-engine` tuning for sustained-write workloads (e.g. `AGENTMEMORY_OBSERVE_QUEUE_LIMIT=…`), if such knobs exist.

Happy to PR a `.catch()` wrapper if you point me at the preferred location.
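For the first item, roughly the shape we'd PR (a sketch only; `setState` below stands in for the real iii-sdk binding, whose export name and signature we haven't confirmed):

```ts
// Placeholder for the real iii-sdk state write; the actual export name and
// signature are assumptions here.
declare function setState(key: string, value: unknown): Promise<void>;

// Wrapper for KV writes on the index-persistence path: a TIMEOUT is logged
// and the write dropped instead of the rejection propagating uncaught.
async function safeSet(key: string, value: unknown): Promise<void> {
  try {
    await setState(key, value);
  } catch (err) {
    const e = err as { code?: string };
    if (e?.code === "TIMEOUT") {
      console.warn(`[agentmemory] dropped KV write to ${key}: engine timeout`);
      return; // the index delta is re-persisted on the next debounce flush
    }
    throw err; // anything else is still a real error
  }
}
```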