fix(search): bound glob/grep tree-walk with cooperative + watchdog timeout #201
Merged
A `Glob` over a path containing a network mount (rclone/OSS NFS) hung the agent loop forever: the single-threaded walker exhaustively traversed the whole tree for a zero-match pattern with no deadline, blocking the `spawn_blocking` thread that the agent loop awaits.

Two-layer fix:

- Inner cooperative deadline (`ResourceLimits::walk_timeout`, 30s), checked between walk entries; it returns partial results plus `timed_out` so the tool can tell the LLM to narrow `path`.
- Outer hard floor via the runtime per-tool watchdog, now extended from Bash-only to all fs-read tools (Glob/Grep/Ls/Read/ReadPdf/ReadImage/ReadHtml) with a typed `StaleReason`. The watchdog is applied at a single convergence point (`execute_tool_watchdogged`) that BOTH the streaming early-start path and the normal approval path route through, so a tool can never be bounded on one path and unbounded on the other.

Also: parallelize the glob walker (mirrors grep's `build_parallel`), default `follow_links` to false (ripgrep parity; avoids cross-mount escape and cycles), decouple `truncated` from the timeout signal (no spurious overflow files), add a deterministic path tiebreak to glob output, dedup the timeout notice into loopal-tool-api, and clamp search `max` to >=1.

Design notes under design/glob-traversal-hang/.
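A minimal, std-only sketch of the two layers described above, assuming a plain iterator of walk entries in place of the `ignore` crate's parallel walker, and an OS thread plus channel in place of the runtime's async per-tool watchdog. `WalkOutcome`, `walk_with_deadline`, `run_bounded`, `watchdog_budget`, and the fields of the `StaleReason` variant are hypothetical; only the `execute_tool_watchdogged` name, the fs-read tool list, and the timeout idea come from this PR.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// Layer 1 (hypothetical shape): partial results plus a `timed_out` flag.
#[derive(Debug)]
pub struct WalkOutcome<T> {
    pub matches: Vec<T>,
    pub timed_out: bool,
}

/// Cooperative deadline, checked *between* entries. An entry that blocks
/// inside the iterator (e.g. a stalled stat on a network mount) is not
/// interrupted here; that case is left to the outer watchdog.
pub fn walk_with_deadline<T, I, F>(entries: I, timeout: Duration, mut is_match: F) -> WalkOutcome<T>
where
    I: IntoIterator<Item = T>,
    F: FnMut(&T) -> bool,
{
    let deadline = Instant::now() + timeout;
    let mut matches = Vec::new();
    for entry in entries {
        if Instant::now() >= deadline {
            // Stop early but keep what was found so far.
            return WalkOutcome { matches, timed_out: true };
        }
        if is_match(&entry) {
            matches.push(entry);
        }
    }
    WalkOutcome { matches, timed_out: false }
}

/// Layer 2: hypothetical stand-in for the PR's typed `StaleReason`.
#[derive(Debug, PartialEq)]
pub enum StaleReason {
    WatchdogTimeout { tool: String, after: Duration },
}

/// The seven fs-read tools listed in the PR, with its 60s budget. Bash's own
/// watchdog budget is not given in the PR text, so it is not modeled here.
const FS_READ_TOOLS: [&str; 7] = ["Glob", "Grep", "Ls", "Read", "ReadPdf", "ReadImage", "ReadHtml"];

pub fn watchdog_budget(tool: &str) -> Option<Duration> {
    FS_READ_TOOLS.contains(&tool).then(|| Duration::from_secs(60))
}

/// Run `run` on a worker thread and stop waiting after `budget`. The worker is
/// abandoned, not killed: as with `spawn_blocking`, a blocked OS thread cannot
/// be cancelled, which is why the cooperative deadline still matters.
pub fn run_bounded<T, F>(tool: &str, budget: Option<Duration>, run: F) -> Result<T, StaleReason>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let Some(after) = budget else { return Ok(run()) };
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already have given up; ignore the send error.
        let _ = tx.send(run());
    });
    match rx.recv_timeout(after) {
        Ok(value) => Ok(value),
        Err(RecvTimeoutError::Timeout) => Err(StaleReason::WatchdogTimeout { tool: tool.to_string(), after }),
        Err(RecvTimeoutError::Disconnected) => panic!("worker for {tool} panicked"),
    }
}

/// Single convergence point: both the streaming early-start path and the
/// approved-tool path would call this, so neither can skip the bound.
pub fn execute_tool_watchdogged<T, F>(tool: &str, run: F) -> Result<T, StaleReason>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    run_bounded(tool, watchdog_budget(tool), run)
}
```

In a parallel `ignore` walker, the per-entry visitor closure would instead return `WalkState::Quit` once the deadline passes. The watchdog only stops *waiting*: the abandoned thread keeps running until its blocking call returns, so the cooperative deadline remains the primary bound and the watchdog is the floor underneath it.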
Summary
A `Glob` over a path containing a network mount (rclone/OSS NFS) hung the agent loop forever: the single-threaded walker exhaustively traversed the whole tree for a zero-match pattern with no deadline, blocking the `spawn_blocking` thread the loop awaits.

- Inner cooperative `walk_timeout` (partial results) plus an outer per-tool watchdog hard floor, applied at a single convergence point that both execution paths route through.
- Also: `follow_links(false)` (ripgrep parity), `truncated`/timeout decoupling, deterministic glob ordering, and a shared timeout-notice const.

Changes
- `glob.rs` rewritten to a parallel `ignore` walker with a cooperative `Instant` deadline; `grep.rs`: deadline + `truncated` derived from the match cap (not the timeout flag); `walker.rs`: `follow_links(true)` → `false`; `mod.rs` reverts to plain `spawn_blocking` (outer bound moved to the watchdog); `limits.rs` adds `walk_timeout` (30s).
- `execute_tool_watchdogged` convergence helper in `tool_exec.rs`; `streaming_tool_exec.rs` (early-start path) and `execute_approved_tools` both route through it; `tool_watchdog.rs` covers all 7 fs-read tools at 60s with `StaleReason::WatchdogTimeout`.
- `timed_out` field on `Glob`/`GrepSearchResult`; shared `SEARCH_TIMEOUT_NOTICE`.
- Deterministic `path` sort tiebreak in glob output.
- Tests: … (`cfg(unix)`), truncation tolerance, timeout, determinism, watchdog coverage.

Test plan
- … (`bazel build //...`, affected tests, clippy, rustfmt)
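The post-walk rules listed under Changes (`truncated` derived only from the match cap, a `path` tiebreak for deterministic ordering, and clamping `max` to at least 1) can be sketched as one finalize step. This is an illustration, not the PR's code: the `Hit` and `GlobPage` types, the `finalize` function, and the "newest first" primary sort key are all assumptions.

```rust
use std::path::PathBuf;
use std::time::SystemTime;

/// Hypothetical glob hit; the PR does not show its real result type.
#[derive(Debug, Clone)]
pub struct Hit {
    pub path: PathBuf,
    pub modified: SystemTime,
}

#[derive(Debug)]
pub struct GlobPage {
    pub hits: Vec<Hit>,
    /// More matches existed than `max`; says nothing about the timeout.
    pub truncated: bool,
    /// The cooperative deadline fired; reported separately.
    pub timed_out: bool,
}

pub fn finalize(mut hits: Vec<Hit>, max: usize, timed_out: bool) -> GlobPage {
    // Clamp so `max = 0` cannot produce an empty, "truncated" page.
    let max = max.max(1);
    // Assumed primary key: newest first. The `path` tiebreak makes the order
    // deterministic even though a parallel walker yields entries in any order.
    hits.sort_by(|a, b| b.modified.cmp(&a.modified).then_with(|| a.path.cmp(&b.path)));
    // Derived from the match cap only, so a timeout alone never sets it.
    let truncated = hits.len() > max;
    hits.truncate(max);
    GlobPage { hits, truncated, timed_out }
}
```

Keeping `truncated` and `timed_out` as separate flags lets a caller distinguish "too many results, narrow the pattern" from "the walk was cut short, narrow `path`", which is the distinction the PR draws to avoid spurious overflow files.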