CRITICAL fix: use async save_session to avoid blocking tokio runtime by skymoore · Pull Request #534 · RightNow-AI/openfang

skymoore · 2026-03-11T21:11:18Z

Summary

Prevents single-CPU openfang deployments from hanging during session saves and becoming unresponsive to all requests.

Changes

save_session() was synchronous and held a Mutex<Connection> on the tokio worker thread during SQLite writes. On pods with 1 CPU core (1 tokio worker thread), this starved the entire runtime, including the health check endpoints, so K8s marked the pod not-ready and all subsequent requests returned 504.

Add save_session_async() that wraps the SQLite write in spawn_blocking, matching the pattern already used by other memory operations (recall, remember, etc.). Update all 12 call sites in the agent loop.
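The offloading pattern described above can be modeled without any dependencies. The sketch below is not the PR's code: `SessionStore`, `save_session_offloaded`, and `load_session` are hypothetical names, a `HashMap` stands in for the SQLite `Connection`, and `std::thread::spawn` stands in for tokio's `spawn_blocking`. It only illustrates the shape of the change: the lock is taken and the write is performed on a separate thread, so the caller's thread is never parked on the mutex.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

// Hypothetical stand-in for the SQLite-backed store. The HashMap plays the
// role of the `Connection` behind the shared mutex.
#[derive(Clone, Default)]
struct SessionStore {
    conn: Arc<Mutex<HashMap<String, String>>>,
}

impl SessionStore {
    /// Synchronous save: the calling thread holds the lock for the whole write.
    fn save_session(&self, id: &str, data: &str) -> Result<(), String> {
        let mut conn = self.conn.lock().map_err(|e| e.to_string())?;
        conn.insert(id.to_string(), data.to_string());
        Ok(())
    }

    /// Offloaded save: the lock and the write happen on another thread.
    /// In a tokio application this is where `spawn_blocking` would be used
    /// instead of `thread::spawn`; that substitution is an assumption here.
    fn save_session_offloaded(&self, id: String, data: String) -> JoinHandle<Result<(), String>> {
        let store = self.clone();
        thread::spawn(move || store.save_session(&id, &data))
    }

    /// Reads a session back, returning None if it is missing or the lock is poisoned.
    fn load_session(&self, id: &str) -> Option<String> {
        let conn = self.conn.lock().ok()?;
        conn.get(id).cloned()
    }
}
```

A caller would keep the returned handle and join (or, with tokio, await) it when the result is actually needed, instead of blocking at the call site.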

Testing

cargo clippy --workspace --all-targets -- -D warnings passes
cargo test --workspace passes
Live integration tested (if applicable)

Security

No new unsafe code
No secrets or API keys in diff
User input validated at boundaries


The health endpoint called structured_get() synchronously on the tokio async runtime, acquiring the shared std::sync::Mutex<Connection> on a worker thread. When the agent loop held this mutex during session saves, the health check blocked the tokio thread, starving the SSE stream and causing Kubernetes probe timeouts.

- Health and health_detail now run the DB check via spawn_blocking
- SSE message/stream endpoint now includes keep_alive to flush periodic heartbeats even during contention
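The effect of the two changes above can be sketched with a small std-only model. This is not the PR's handler code: `health_with_heartbeats` is a hypothetical function, a plain `Mutex<u32>` stands in for the shared `Mutex<Connection>`, a helper thread stands in for `spawn_blocking`, and the heartbeat counter stands in for SSE keep-alive frames. The point is only that once the lock acquisition is moved off the serving thread, that thread stays free to do periodic work while the database is contended.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Runs the DB check on a helper thread and counts how many heartbeats the
/// calling thread could emit while waiting. Returns (check_ok, heartbeats).
fn health_with_heartbeats(db: Arc<Mutex<u32>>, interval: Duration) -> (bool, u32) {
    let (tx, rx) = mpsc::channel();

    // The helper thread is the one that blocks if the lock is held elsewhere.
    thread::spawn(move || {
        let ok = db.lock().is_ok();
        // The receiver may already be gone; ignore a failed send.
        let _ = tx.send(ok);
    });

    let mut heartbeats = 0;
    loop {
        match rx.recv_timeout(interval) {
            // The check finished: report its result.
            Ok(ok) => return (ok, heartbeats),
            // Still waiting: this is where a keep-alive frame would be flushed.
            Err(RecvTimeoutError::Timeout) => heartbeats += 1,
            // The helper exited without reporting: treat as unhealthy.
            Err(RecvTimeoutError::Disconnected) => return (false, heartbeats),
        }
    }
}
```

If another thread holds the lock for longer than `interval`, the function still returns the check result once the lock is released, and the heartbeat count shows that the caller was not stalled in the meantime.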

This was referenced Mar 12, 2026

Avoid blocking the tokio runtime on session saves librefang/librefang#14

Open

fix: avoid blocking tokio on session saves librefang/librefang#15

Open

skymoore added 2 commits March 12, 2026 15:50

skymoore force-pushed the main branch from 83c7ad4 to 5b69050 on March 12, 2026 15:50

skymoore wants to merge 2 commits into RightNow-AI:main from skymoore:main
