fix(threat-model): bound document_endpoint child await with a liveness watchdog #819
jorgeraad wants to merge 4 commits into
…s watchdog

document_endpoint blocks on `await generateThreatModelForEndpoint`, which awaits a spawned CodeAgent's consume() with no timeout. The only global idle timeout is suppressed while a tool is in flight, so when the threat-model child wedges (a hung post-stream step, a stuck provider call, or a consume() that never settles even after the child's session already ended) the await blocks forever, freezing the tool-call, the parent agent session, and the whole recon at "N-1/N apps".

Bound the await on child liveness: as long as the child emits stream activity we keep waiting (a slow-but-healthy threat model is never killed); if it goes completely silent past 8m (above the child's own 5m model-idle timeout + auto-resume) we abort the child via a child-scoped AbortController and return null. document_endpoint already degrades to the heuristic risk score on a null return, so the tool-call completes and the recon advances instead of hanging.
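The liveness bound described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `awaitWithLiveness`, `start`, `touch`, and the `Watched` type are hypothetical names, and the real implementation listens on the child's stream events rather than taking a `touch` callback.

```typescript
// Illustrative sketch only: all names here are hypothetical, not the PR's code.
type Watched<T> = { kind: "done"; value: T } | { kind: "silent" };

async function awaitWithLiveness<T>(
  start: (signal: AbortSignal, touch: () => void) => Promise<T>,
  silenceMs: number,
  parentSignal?: AbortSignal,
): Promise<Watched<T>> {
  // Child-scoped controller: aborting it cancels only this child,
  // while a parent abort is still forwarded to it.
  const child = new AbortController();
  const forward = () => child.abort();
  if (parentSignal?.aborted) child.abort();
  parentSignal?.addEventListener("abort", forward, { once: true });

  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;
  let onSilent: () => void = () => {};
  const silent = new Promise<Watched<T>>((resolve) => {
    onSilent = () => resolve({ kind: "silent" });
  });
  // Every sign of life pushes the silence deadline out again.
  const touch = () => {
    if (stopped) return;
    clearTimeout(timer);
    timer = setTimeout(onSilent, silenceMs);
  };
  touch();

  const work = start(child.signal, touch).then(
    (value): Watched<T> => ({ kind: "done", value }),
  );
  // If the watchdog wins, a late rejection of `work` must not go unhandled.
  work.catch(() => {});

  try {
    const outcome = await Promise.race([work, silent]);
    if (outcome.kind === "silent") child.abort();
    return outcome;
  } catch (err) {
    child.abort(); // errors also cancel the child explicitly
    throw err;
  } finally {
    stopped = true;
    clearTimeout(timer);
    parentSignal?.removeEventListener("abort", forward);
  }
}
```

A caller would map the `silent` outcome to `null` so the tool can fall back to the heuristic score; a slow run that keeps calling `touch()` is never cut off.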
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issues.
Reviewed by Cursor Bugbot for commit 05ff46a.
    `${Math.round(THREAT_MODEL_LIVENESS_TIMEOUT_MS / 1000)}s); ` +
    `abandoning and falling back to heuristic-only.`,
);
return null;
Orphaned consume after race
Medium Severity
When the liveness watchdog wins Promise.race, the function returns after childAbort.abort() without handling the still-running agent.consume() promise. That promise can later reject or resolve, producing an unhandled rejection or stray completion after the threat model was already abandoned.
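One common way to address this kind of finding is to keep observing the losing promise after the race. The sketch below assumes a hypothetical helper `raceAndDetach` and an `onLate` logging callback; neither appears in the PR, and the actual fix may differ.

```typescript
// Illustrative sketch only: `raceAndDetach` and `onLate` are hypothetical.
async function raceAndDetach<T>(
  work: Promise<T>,
  timeoutMs: number,
  onLate: (outcome: "resolved" | "rejected") => void,
): Promise<T | null> {
  let abandoned = false;
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<null>((resolve) => {
    timer = setTimeout(() => {
      abandoned = true;
      resolve(null);
    }, timeoutMs);
  });

  // Attach handlers up front. They run whether or not `work` wins, so a
  // rejection that arrives after abandonment is observed, not unhandled.
  work.then(
    () => {
      if (abandoned) onLate("resolved");
    },
    () => {
      if (abandoned) onLate("rejected");
    },
  );

  try {
    // If `work` rejects before the timeout, the error still reaches the caller.
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```

The design choice here is to log and drop late outcomes rather than act on them, since the caller has already fallen back by the time they arrive.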
    "tool-call-delta",
    "tool-call-complete",
    "tool-result",
] as const;
Long tools trip liveness
Medium Severity
The watchdog only resets on stream events, not while a child tool is executing. The child’s own stream idle timeout pauses during in-flight tools, but this timer keeps counting after tool-call-complete. A legitimate tool run longer than eight minutes (e.g. execute_command without timeout or a long whitebox job) can be misclassified as stuck and return null.
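One possible way to avoid misclassifying a long tool run is to disarm the silence timer while a tool is in flight. The sketch below is an assumption-laden illustration: `ToolAwareWatchdog` is a hypothetical class, and it assumes a tool starts executing at `tool-call-complete` and finishes at `tool-result`, which the PR does not confirm.

```typescript
// Illustrative sketch only. Assumes (unconfirmed) that a tool is executing
// between "tool-call-complete" and "tool-result".
type ChildEvent =
  | "text-delta"
  | "tool-call-delta"
  | "tool-call-complete"
  | "tool-result";

class ToolAwareWatchdog {
  private inFlight = 0;
  private timer: ReturnType<typeof setTimeout> | undefined;
  private readonly silenceMs: number;
  private readonly onSilent: () => void;

  constructor(silenceMs: number, onSilent: () => void) {
    this.silenceMs = silenceMs;
    this.onSilent = onSilent;
  }

  handle(event: ChildEvent): void {
    if (event === "tool-call-complete") this.inFlight += 1;
    if (event === "tool-result") this.inFlight = Math.max(0, this.inFlight - 1);
    this.rearm();
  }

  get armed(): boolean {
    return this.timer !== undefined;
  }

  stop(): void {
    clearTimeout(this.timer);
    this.timer = undefined;
  }

  private rearm(): void {
    this.stop();
    // While a tool is executing, silence is expected: leave the timer disarmed.
    if (this.inFlight > 0) return;
    this.timer = setTimeout(() => {
      this.timer = undefined;
      this.onSilent();
    }, this.silenceMs);
  }
}
```

The trade-off is that a sub-tool that hangs forever would no longer trip this watchdog, so such tools would need their own bound.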
@jorgeraad Would you mind cleaning up the comments and making them much more concise?

@Yuvanesh-ux hold up, my b. This should still be a draft. But will do.
The silence-based liveness watchdog failed in production: a threat-model child that loops (never calls `response`) keeps emitting events, so it's never "silent", and a finished-but-non-closing stream iterator isn't silent either. document_endpoint's await hung for hours, freezing the recon.

Replace it with two first-principles bounds (no wall-clock deadline on the work):

- stepCountIs(40) on the child agent so its stream ALWAYS terminates. A threat model that can't emit its structured `response` within 40 steps is looping, not progressing: this is an iteration bound, not a clock. Merges with the responseSchema's own response-tool stop. Fixes the never-finishing child.
- Race consume() against streamResult.finishReason (authoritative "stream done") with a short settle window, so a finished stream whose fullStream iterator never closes can't hang consume(). Uses the captured response when present (only the iterator hung, so the result is good), else degrades to the heuristic score. Fixes the finished-but-wedged-iterator case.

Expose OffensiveSecurityAgent.capturedResponse so the terminal-signal path can recover the structured result without routing through the hung consume().
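The terminal-signal race with a settle window might look roughly like this. It is a sketch under stated assumptions: `consumeOrTerminal`, `getCaptured`, and the `Settled` type are hypothetical, and `terminal` stands in for whatever promise signals "stream done" (the commit uses `streamResult.finishReason`).

```typescript
// Illustrative sketch only: names and signature are hypothetical.
type Settled<T> = { via: "consumed" | "terminal"; result: T | null };

async function consumeOrTerminal<T>(
  consume: Promise<unknown>,
  terminal: Promise<unknown>,
  getCaptured: () => T | undefined,
  settleMs: number,
): Promise<Settled<T>> {
  let finished = false;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const consumed = consume.then(() => "consumed" as const);
  // After the terminal signal, wait `settleMs` so that a healthy consume()
  // returning a tick later still wins the race.
  const terminalThenSettle = terminal.then(
    () =>
      new Promise<"terminal">((resolve) => {
        if (finished) return;
        timer = setTimeout(() => resolve("terminal"), settleMs);
      }),
  );
  // Whichever side loses must not surface as an unhandled rejection.
  consumed.catch(() => {});
  terminalThenSettle.catch(() => {});

  try {
    const via = await Promise.race([consumed, terminalThenSettle]);
    // Use the captured response when present, else degrade to null.
    return { via, result: getCaptured() ?? null };
  } finally {
    finished = true;
    clearTimeout(timer);
  }
}
```

Note that this only helps if the terminal signal itself settles, which is a precondition of the sketch rather than something it guarantees.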
The document_endpoint tool blocked forever on a single threat-model child that never let consume() return. Three distinct modes produce this and no single mechanism catches all of them, so consume() is now raced against all three abandonment signals (no wall-clock deadline on the work itself):

- mode A (loop, never calls response): stepCountIs bounds the stream so it always terminates.
- mode B (stream finished, iterator never closed): terminal-signal race on streamResult.finishReason, with a settle window so a healthy run where consume() returns a tick later still wins.
- mode C (frozen mid-step on a hung sub-tool, completely silent): liveness watchdog abandons after a silence threshold no progressing run can hit.

On abandonment, salvage the child's captured structured response if it has one (only the plumbing wedged), else degrade to null; document_endpoint falls back to the heuristic risk score so the recon advances. cleanup() clears both timers and the bus listeners in finally.
Production proof showed the prior race did not drain wedged children: a threat-model child would call its response tool, produce a complete result (full part count, session marked completed), then its fullStream iterator wedged before emitting the final finish chunk. In that state BOTH consume() and streamResult.finishReason hang on the same missing chunk, so the finishReason terminal-signal can never fire. The silence watchdog also failed to fire: the wedged child emits low-level bus chatter (an AI-SDK retry loop) that keeps bumping lastActivity even though no part is persisted and no real progress is made. Result: document_endpoint hung 15+ min and the recon froze at N/M apps, reproducing the original bug.

Root-principle fix: a threat-model child is semantically DONE the instant it calls response and captures the structured result. Everything after is stream teardown, which can wedge. OffensiveSecurityAgent now exposes a responseCaptured promise resolved synchronously from the response-tool callback; generateThreatModelForEndpoint races consume() against it (plus a 5s grace so a healthy consume() still wins cleanly) and settles with the captured result. This is immune to finishReason wedging, retry chatter, and stream-teardown hangs: it settles in ~5s instead of never.

The liveness watchdog (freeze BEFORE any response) and step budget (loop before response) remain as backstops. The inert finishReason terminal-signal is removed.
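The "done the instant it calls response" idea can be sketched with a deferred promise. This is an illustration only: `ResponseCapture`, `onResponse`, and `settleOnCapture` are hypothetical names, and the sketch assumes consume() and the capture yield the same result type, which may not match the real agent.

```typescript
// Illustrative sketch only: names are hypothetical, not the agent's real API.
class ResponseCapture<T> {
  readonly captured: Promise<T>;
  private resolveCaptured: (value: T) => void = () => {};

  constructor() {
    this.captured = new Promise<T>((resolve) => {
      this.resolveCaptured = resolve;
    });
  }

  // Called synchronously from the response-tool callback.
  onResponse(value: T): void {
    this.resolveCaptured(value);
  }
}

async function settleOnCapture<T>(
  consume: Promise<T>,
  capture: ResponseCapture<T>,
  graceMs: number,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  // Once the response is captured, give a healthy consume() `graceMs` to
  // finish first; after that, settle with the captured value.
  const afterGrace = capture.captured.then(
    (value) =>
      new Promise<T>((resolve) => {
        timer = setTimeout(() => resolve(value), graceMs);
      }),
  );
  // A consume() that fails during teardown must not become unhandled.
  consume.catch(() => {});

  try {
    return await Promise.race([consume, afterGrace]);
  } finally {
    clearTimeout(timer);
  }
}
```

Because the capture promise is resolved by the tool callback itself, it does not depend on any later stream chunk arriving.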


Problem
A recon run gets stuck at N-1/N apps with a permanent spinner: one `document_endpoint` tool call hangs forever even though the threat-model sub-agent it spawned finished.

Root cause: a tool-await / agent-lifecycle divergence.
- `document_endpoint.execute()` blocks on `await generateThreatModelForEndpoint(...)`, which awaits a spawned `CodeAgent`'s `consume()` (`threatModelGenerator.ts`) with no timeout.
- The only global idle timeout is suppressed while a tool is in flight (`createToolExecutionGate`), so there is no wall-clock bound on the parent while the child runs.
- When the threat-model child wedges (a hung post-stream step, a stuck provider call, or a `consume()` that never settles even after the child's session already reached terminal), the `await` blocks forever. That freezes the tool-call, the parent agent session (`status` stuck `running`), and the whole recon.
- … `execute()` and never reaches the `finally` that emits synthetics.

Confirmed in production data: a stuck "API Endpoints" session with 23 threat-model children all `completed`, but 2 `document_endpoint` tool parts with no `output`.

Fix
Bound the child await on liveness, not a wall-clock deadline (so a slow-but-healthy threat model is never killed):
- A child-scoped `AbortController` (parent aborts still propagate) lets us cancel just this threat model.
- A liveness watchdog resets on every child stream event (`text-delta`, `tool-call-*`, `tool-result`). `text-delta` fires per token, so a healthy run resets it constantly.
- If the child stays completely silent past `THREAT_MODEL_LIVENESS_TIMEOUT_MS` (8 min, comfortably above the child's own 5-min model-idle timeout + auto-resume, so a recoverable stall produces activity and resets the timer first), we abort the child and return `null`.
- `document_endpoint` already degrades to the heuristic risk score on a `null` return, so the tool-call completes and the recon advances instead of hanging.

Single call site (`document_endpoint`), no behavior change on the happy path. `tsc` + `biome` clean.

Note
Medium Risk
Changes failure handling for long-running subagents; happy path is unchanged, but silent wedges now drop full threat models in favor of heuristic scoring.
Overview
`generateThreatModelForEndpoint` no longer waits indefinitely on a wedged threat-model child. It races `agent.consume()` against a liveness watchdog that resets whenever the child emits stream events (`text-delta`, tool-call events, `tool-result`). After 8 minutes of silence, the child is aborted via a child-scoped `AbortController` (parent abort still propagates), the subagent is marked failed, and the function returns `null` so `document_endpoint` can finish with the existing heuristic risk fallback instead of freezing recon.

Watchdog timers and listeners are torn down in `finally`; errors also abort the child explicitly.