feat: add connection-phase timeout for agent calls #1

Open
baochipham942-eng wants to merge 1 commit into eLeanwang:main from baochipham942-eng:feat/agent-timeout

@baochipham942-eng

Problem

When an agent's query() call hangs during the connection/handshake phase (before the event stream starts), the session's message queue is permanently blocked. The existing StreamIdleMonitor only protects the event iteration phase — it relies on recordEvent() calls that never fire if the stream never starts.

This means:

  • A network issue during the API handshake → session blocked indefinitely
  • All subsequent messages for that session queue up and are never processed
  • The idle monitor's kill threshold (5× = 600s) applies only after the stream begins

Solution

Two-layer protection:

1. AbortController for Claude Runner (claude-runner.ts)

Added an AbortController per query (matching the pattern already used in Codex Runner):

  • Created in runQuery() and passed to query() via the abortController option
  • interrupt() calls controller.abort() first (covering the connection phase), then falls through to the SDK's stream interrupt
  • Cleaned up in cleanupStream() and closeSession()
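The create → pass → abort → cleanup lifecycle can be sketched roughly as follows. This is a minimal sketch, not the PR's code: `ClaudeRunnerSketch`, `QueryFn`, and `hangingQuery` are hypothetical names, and `hangingQuery` stands in for the SDK's query() by never yielding an event until its controller is aborted, which simulates a stuck handshake. Whether the real closeSession() aborts before clearing the controller is also an assumption.

```typescript
// Hypothetical stand-in for the SDK's query() signature (assumption, not the real SDK types).
type QueryFn = (
  prompt: string,
  opts: { abortController: AbortController },
) => AsyncIterable<string>;

// Simulates a hang before the first event: waits until the controller is
// aborted, then rejects, so no event is ever yielded.
const hangingQuery: QueryFn = (_prompt, { abortController }) => ({
  async *[Symbol.asyncIterator]() {
    await new Promise<void>((_resolve, reject) => {
      const fail = () => reject(new Error("aborted during connection phase"));
      if (abortController.signal.aborted) fail();
      else abortController.signal.addEventListener("abort", fail, { once: true });
    });
    yield "unreachable";
  },
});

class ClaudeRunnerSketch {
  private readonly query: QueryFn;
  private abortController: AbortController | null = null;

  constructor(query: QueryFn) {
    this.query = query;
  }

  // create -> pass: a fresh controller for every query, handed to query().
  runQuery(prompt: string): AsyncIterable<string> {
    this.abortController = new AbortController();
    return this.query(prompt, { abortController: this.abortController });
  }

  // abort: fire the controller first so a query stuck before its first event
  // is also cancelled. A real runner would then fall through to the SDK's
  // stream interrupt (omitted here).
  interrupt(): void {
    this.abortController?.abort();
  }

  // cleanup: drop the controller once the stream has ended.
  cleanupStream(): void {
    this.abortController = null;
  }

  // Assumption: closing a session aborts any in-flight query before cleanup.
  closeSession(): void {
    this.interrupt();
    this.cleanupStream();
  }

  // Exposed only so the example can observe the controller state.
  get currentSignal(): AbortSignal | undefined {
    return this.abortController?.signal;
  }
}

// Usage: interrupt a query that never starts streaming.
const runner = new ClaudeRunnerSketch(hangingQuery);
(async () => {
  try {
    for await (const event of runner.runQuery("hello")) console.log(event);
  } catch (err) {
    console.log((err as Error).message); // the abort error raised by hangingQuery
  } finally {
    runner.cleanupStream();
  }
})();
setTimeout(() => runner.interrupt(), 100);
```

Because the abort is wired through the controller rather than through the event loop of the stream, it works even when no event has ever arrived, which is exactly the gap StreamIdleMonitor leaves open.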

2. Connection-phase timeout (message-processor.ts)

Wrapped agent.runQuery() with Promise.race against a configurable timeout:

  • Default: 30 seconds (vs. 600s for the idle monitor's kill threshold)
  • On timeout: calls agent.interrupt() to abort the query, then rejects with a clear error message
  • The timer is cleared when runQuery() succeeds, so a query that started in time is never killed (no false kills)
  • Configurable via config.idleMonitor.connectionTimeout (seconds)
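A minimal sketch of the Promise.race pattern, assuming runQuery() returns a promise that resolves once the stream has started. `withConnectionTimeout` and `ConnectionTimeoutError` are hypothetical names, not the PR's code; `onTimeout` is where agent.interrupt() would be called. The `finally` block clears the timer on every exit path, which is what prevents a query that started in time from being killed later.

```typescript
// Hypothetical error type so callers can tell a connection-phase timeout
// apart from other failures.
class ConnectionTimeoutError extends Error {
  readonly seconds: number;
  constructor(seconds: number) {
    super(`agent query did not start within ${seconds}s (connection-phase timeout)`);
    this.name = "ConnectionTimeoutError";
    this.seconds = seconds;
  }
}

// Races `start` (e.g. () => agent.runQuery(msg)) against a timer. If the timer
// wins, `onTimeout` (e.g. () => agent.interrupt()) aborts the stuck query and
// the returned promise rejects with ConnectionTimeoutError.
async function withConnectionTimeout<T>(
  start: () => Promise<T>,
  timeoutSeconds: number,
  onTimeout: () => void,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      try {
        onTimeout();
      } finally {
        // Reject even if onTimeout throws, so the caller is never left hanging.
        reject(new ConnectionTimeoutError(timeoutSeconds));
      }
    }, timeoutSeconds * 1000);
  });
  try {
    return await Promise.race([start(), timeout]);
  } finally {
    // Clear the timer on every path so a successful start is never interrupted.
    clearTimeout(timer);
  }
}

// Usage inside a message processor (names hypothetical):
//   await withConnectionTimeout(() => agent.runQuery(msg), 30, () => agent.interrupt());
```

Calling the interrupt before rejecting means the stuck query is already being torn down by the time the caller sees the error, so the session's queue can move on to the next message.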

Changes

| File | Change |
|------|--------|
| src/agents/claude-runner.ts | AbortController lifecycle (create → pass → abort → cleanup) |
| src/core/message/message-processor.ts | Promise.race connection timeout around runQuery() |
| src/types.ts | connectionTimeout field in idleMonitor config |
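For reference, the config addition might look something like this. Only the `idleMonitor.connectionTimeout` name, its unit (seconds), and the 30-second default come from this PR; `IdleMonitorConfig`, `AppConfig`, and `resolveConnectionTimeoutMs` are hypothetical names, and falling back to the default for missing or non-positive values is an assumed policy.

```typescript
// Hypothetical shape of the idleMonitor section of the config (src/types.ts).
interface IdleMonitorConfig {
  /** Seconds to wait for runQuery() to start before interrupting. Default: 30. */
  connectionTimeout?: number;
}

interface AppConfig {
  idleMonitor?: IdleMonitorConfig;
}

const DEFAULT_CONNECTION_TIMEOUT_SECONDS = 30;

// Converts the configured value (seconds) to milliseconds for setTimeout.
// Assumption: missing, non-finite, or non-positive values fall back to the default.
function resolveConnectionTimeoutMs(config: AppConfig): number {
  const configured = config.idleMonitor?.connectionTimeout;
  const seconds =
    typeof configured === "number" && Number.isFinite(configured) && configured > 0
      ? configured
      : DEFAULT_CONNECTION_TIMEOUT_SECONDS;
  return seconds * 1000;
}
```

Keeping the field optional means existing configs without `connectionTimeout` keep working and simply get the 30-second default.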

What's NOT changed

  • Codex Runner — already has AbortController
  • Gemini Runner — already has SIGINT/SIGTERM subprocess management
  • StreamIdleMonitor thresholds — separate concern, still protects the streaming phase

Add AbortController to Claude Runner (matching Codex Runner's pattern) and
wrap the runQuery() call in MessageProcessor with a connection-phase timeout
(default 30s, configurable via idleMonitor.connectionTimeout). This prevents
permanent session blocking when the API connection hangs before the stream
starts — a gap the existing StreamIdleMonitor cannot cover.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>