fix(threat-model): bound document_endpoint child await with a liveness watchdog #819

Draft
jorgeraad wants to merge 4 commits into feat/native-execution-ids from fix/threat-model-liveness-watchdog

Conversation

@jorgeraad
Collaborator

@jorgeraad jorgeraad commented Jun 4, 2026

Problem

A recon run gets stuck at N-1/N apps with a permanent spinner: one document_endpoint tool call hangs forever even though the threat-model sub-agent it spawned finished.

Root cause — a tool-await / agent-lifecycle divergence:

  • document_endpoint.execute() blocks on await generateThreatModelForEndpoint(...), which awaits a spawned CodeAgent's consume() (threatModelGenerator.ts) with no timeout.
  • The only global guard, the 5-min stream idle timeout, is deliberately suppressed while a tool is in flight (createToolExecutionGate), so there is no wall-clock bound on the parent while the child runs.
  • When the child wedges in a way its own idle timeout doesn't catch (a hung post-stream step, a stuck provider call, or a consume() that never settles even after the child's session has already reached a terminal state), the await blocks forever. That freezes the tool-call, the parent agent session (status stuck at running), and the whole recon.
  • The synthetic-tool-result safeguard from PR #779 (fix(offSecAgent): emit synthetic tool-result on stream abort/error) can't fire here: the parent's stream is blocked inside execute() and never reaches the finally that emits synthetics.

Confirmed in production data: a stuck "API Endpoints" session had 23 threat-model children, all completed, yet 2 document_endpoint tool parts still had no output.

Fix

Bound the child await on liveness, not a wall-clock deadline (so a slow-but-healthy threat model is never killed):

  • A child-scoped AbortController (parent aborts still propagate) lets us cancel just this threat model.
  • A watchdog resets on every child stream event (text-delta, tool-call-*, tool-result). text-delta fires per token, so a healthy run resets it constantly.
  • If the child goes completely silent past THREAT_MODEL_LIVENESS_TIMEOUT_MS (8 min — comfortably above the child's own 5-min model-idle timeout + auto-resume, so a recoverable stall produces activity and resets the timer first), we abort the child and return null.
  • document_endpoint already degrades to the heuristic risk score on a null return, so the tool-call completes and the recon advances instead of hanging.
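The bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `awaitWithLiveness`, the `run` callback, and the `onChildEvent` subscription shape are hypothetical stand-ins for however `generateThreatModelForEndpoint` reaches the child's `consume()` and event bus. Only the 8-minute constant is taken from the description.

```typescript
// Sketch only: every name except THREAT_MODEL_LIVENESS_TIMEOUT_MS is hypothetical.
const THREAT_MODEL_LIVENESS_TIMEOUT_MS = 8 * 60 * 1000;

type Unsubscribe = () => void;

async function awaitWithLiveness<T>(
  run: (signal: AbortSignal) => Promise<T>,
  onChildEvent: (listener: () => void) => Unsubscribe,
  parentSignal?: AbortSignal,
  timeoutMs: number = THREAT_MODEL_LIVENESS_TIMEOUT_MS,
): Promise<T | null> {
  // Child-scoped controller: aborting it cancels only this child.
  const childAbort = new AbortController();
  const forwardAbort = () => childAbort.abort();
  if (parentSignal?.aborted) forwardAbort();
  parentSignal?.addEventListener("abort", forwardAbort, { once: true });

  let timer: ReturnType<typeof setTimeout> | undefined;
  let unsubscribe: Unsubscribe = () => {};
  const silence = new Promise<{ silent: true }>((resolve) => {
    const arm = () => {
      clearTimeout(timer);
      timer = setTimeout(() => resolve({ silent: true }), timeoutMs);
    };
    unsubscribe = onChildEvent(arm); // each stream event re-arms the timer
    arm();
  });

  try {
    const winner = await Promise.race([
      run(childAbort.signal).then((value) => ({ silent: false as const, value })),
      silence,
    ]);
    if (winner.silent) {
      childAbort.abort(); // child went silent: cancel it, caller degrades
      return null;
    }
    return winner.value;
  } catch (err) {
    childAbort.abort(); // errors also abort the child explicitly
    throw err;
  } finally {
    clearTimeout(timer);
    unsubscribe();
    parentSignal?.removeEventListener("abort", forwardAbort);
  }
}
```

A caller would treat a `null` return the same way `document_endpoint` treats a missing threat model today, by falling back to the heuristic risk score.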

Single call site (document_endpoint), no behavior change on the happy path. tsc + biome clean.


Note

Medium Risk
Changes failure handling for long-running subagents; happy path is unchanged, but silent wedges now drop full threat models in favor of heuristic scoring.

Overview
generateThreatModelForEndpoint no longer waits indefinitely on a wedged threat-model child. It races agent.consume() against a liveness watchdog that resets whenever the child emits stream events (text-delta, tool-call events, tool-result). After 8 minutes of silence, the child is aborted via a child-scoped AbortController (parent abort still propagates), the subagent is marked failed, and the function returns null so document_endpoint can finish with the existing heuristic risk fallback instead of freezing recon.

Watchdog timers and listeners are torn down in finally; errors also abort the child explicitly.

Reviewed by Cursor Bugbot for commit 05ff46a. Bugbot is set up for automated code reviews on this repo. Configure here.

…s watchdog

document_endpoint blocks on `await generateThreatModelForEndpoint`, which awaits
a spawned CodeAgent's consume() with no timeout. The only global idle timeout is
suppressed while a tool is in flight, so when the threat-model child wedges (a
hung post-stream step, a stuck provider call, or a consume() that never settles
even after the child's session already ended) the await blocks forever —
freezing the tool-call, the parent agent session, and the whole recon at
"N-1/N apps".

Bound the await on child liveness: as long as the child emits stream activity we
keep waiting (a slow-but-healthy threat model is never killed); if it goes
completely silent past 8m (above the child's own 5m model-idle timeout +
auto-resume) we abort the child via a child-scoped AbortController and return
null. document_endpoint already degrades to the heuristic risk score on a null
return, so the tool-call completes and the recon advances instead of hanging.

@github-actions github-actions Bot requested a review from Yuvanesh-ux June 4, 2026 17:44

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issues.

`${Math.round(THREAT_MODEL_LIVENESS_TIMEOUT_MS / 1000)}s); ` +
`abandoning and falling back to heuristic-only.`,
);
return null;

Orphaned consume after race

Medium Severity

When the liveness watchdog wins Promise.race, the function returns after childAbort.abort() without handling the still-running agent.consume() promise. That promise can later reject or resolve, producing an unhandled rejection or stray completion after the threat model was already abandoned.
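One way to make the losing promise's fate explicit is sketched below. It is an illustration under assumptions, not the fix applied in this PR: `raceWithAbandon` and `onLateSettle` are hypothetical names, `work` stands in for `agent.consume()`, and `abandon` stands in for the watchdog firing. The point is that the handlers sit on `work` itself, so whatever it does after abandonment is observed and routed somewhere deliberate.

```typescript
// Sketch only: all names here are hypothetical.
type RaceOutcome<T> =
  | { kind: "completed"; value: T }
  | { kind: "abandoned" };

function raceWithAbandon<T>(
  work: Promise<T>,
  abandon: Promise<void>,
  onLateSettle: (late: { ok: boolean; error?: unknown }) => void,
): Promise<RaceOutcome<T>> {
  let abandoned = false;

  const observed = work.then(
    (value): RaceOutcome<T> => {
      // Resolved after we gave up: report it, the race result is already fixed.
      if (abandoned) onLateSettle({ ok: true });
      return { kind: "completed", value };
    },
    (error): RaceOutcome<T> => {
      if (!abandoned) throw error; // a real failure before abandonment
      onLateSettle({ ok: false, error }); // late rejection is absorbed here
      return { kind: "abandoned" };
    },
  );

  const gaveUp = abandon.then((): RaceOutcome<T> => {
    abandoned = true;
    return { kind: "abandoned" };
  });

  return Promise.race([observed, gaveUp]);
}
```

In this shape, `onLateSettle` is the natural place to log a stray completion or to confirm that the abort actually stopped the child.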


"tool-call-delta",
"tool-call-complete",
"tool-result",
] as const;

Long tools trip liveness

Medium Severity

The watchdog only resets on stream events, not while a child tool is executing. The child’s own stream idle timeout pauses during in-flight tools, but this timer keeps counting after tool-call-complete. A legitimate tool run longer than eight minutes (e.g. execute_command without timeout or a long whitebox job) can be misclassified as stuck and return null.
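A possible shape for a watchdog that does not count tool-execution time is sketched here. It is only an illustration of the concern raised above, not something this PR implements, and `ToolAwareWatchdog` and its method names are hypothetical. It assumes the caller can tell when a child tool starts and when its result arrives.

```typescript
// Sketch only: the class and its method names are hypothetical.
class ToolAwareWatchdog {
  private timer: ReturnType<typeof setTimeout> | undefined;
  private toolsInFlight = 0;
  private readonly timeoutMs: number;
  private readonly onSilent: () => void;

  constructor(timeoutMs: number, onSilent: () => void) {
    this.timeoutMs = timeoutMs;
    this.onSilent = onSilent;
    this.arm();
  }

  /** Call on every child stream event. */
  activity(): void {
    if (this.toolsInFlight === 0) this.arm();
  }

  /** Call when a child tool starts executing: silence is expected now. */
  toolStarted(): void {
    this.toolsInFlight += 1;
    this.disarm();
  }

  /** Call when a child tool returns; the clock restarts once none remain. */
  toolFinished(): void {
    this.toolsInFlight = Math.max(0, this.toolsInFlight - 1);
    if (this.toolsInFlight === 0) this.arm();
  }

  stop(): void {
    this.disarm();
  }

  private arm(): void {
    this.disarm();
    this.timer = setTimeout(this.onSilent, this.timeoutMs);
  }

  private disarm(): void {
    clearTimeout(this.timer);
    this.timer = undefined;
  }
}
```

The trade-off is that a paused watchdog cannot detect a tool that itself hangs, so this approach would need a separate bound on tool execution to cover that case.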


@Yuvanesh-ux
Collaborator

@jorgeraad Would you mind cleaning up the comments and making them much more concise?

@jorgeraad
Collaborator Author

jorgeraad commented Jun 4, 2026

@Yuvanesh-ux hold up, my b. This should still be a draft. But will do.

@jorgeraad jorgeraad marked this pull request as draft June 4, 2026 17:55
jorgeraad added 3 commits June 4, 2026 22:07

The silence-based liveness watchdog failed in production: a threat-model child
that loops (never calls `response`) keeps emitting events, so it's never
"silent", and a finished-but-non-closing stream iterator isn't silent either —
document_endpoint's await hung for hours, freezing the recon. Replace it with
two first-principles bounds (no wall-clock deadline on the work):

- stepCountIs(40) on the child agent so its stream ALWAYS terminates. A threat
  model that can't emit its structured `response` within 40 steps is looping,
  not progressing — an iteration bound, not a clock. Merges with the
  responseSchema's own response-tool stop. Fixes the never-finishing child.
- Race consume() against streamResult.finishReason (authoritative "stream
  done") with a short settle window, so a finished stream whose fullStream
  iterator never closes can't hang consume(). Uses the captured response when
  present (only the iterator hung — the result is good), else degrades to the
  heuristic score. Fixes the finished-but-wedged-iterator case.
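
The second bullet, as this commit describes it, could look roughly like the sketch below. All names are hypothetical: `streamFinished` stands in for `streamResult.finishReason`, `getCaptured` for reading the agent's captured response, and the 250 ms settle window is an arbitrary placeholder because the commit only says "short".

```typescript
// Sketch only: all names and the 250 ms default are hypothetical.
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function consumeOrTerminal<T>(
  consume: Promise<T>,
  streamFinished: Promise<unknown>,
  getCaptured: () => T | undefined,
  settleMs = 250,
): Promise<T | null> {
  // The terminal signal only counts after the settle window, so a healthy
  // consume() that returns shortly after the stream finishes still wins.
  const terminal = streamFinished.then(() => sleep(settleMs));

  const winner = await Promise.race([
    consume.then((value) => ({ consumed: true as const, value })),
    terminal.then(() => ({ consumed: false as const })),
  ]);

  if (winner.consumed) return winner.value;
  // Stream is done but consume() never settled: salvage or degrade.
  return getCaptured() ?? null;
}
```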

Expose OffensiveSecurityAgent.capturedResponse so the terminal-signal path can
recover the structured result without routing through the hung consume().

The document_endpoint tool blocked forever on a single threat-model child
that never let consume() return. Three distinct modes produce this and no
single mechanism catches all of them, so consume() is now raced against
all three abandonment signals (no wall-clock deadline on the work itself):

- mode A (loop, never calls response): stepCountIs bounds the stream so it
  always terminates.
- mode B (stream finished, iterator never closed): terminal-signal race on
  streamResult.finishReason, with a settle window so a healthy run where
  consume() returns a tick later still wins.
- mode C (frozen mid-step on a hung sub-tool, completely silent): liveness
  watchdog abandons after a silence threshold no progressing run can hit.

On abandonment, salvage the child's captured structured response if it has
one (only the plumbing wedged) else degrade to null; document_endpoint
falls back to the heuristic risk score so the recon advances. cleanup()
clears both timers and the bus listeners in finally.

Production proof showed the prior race did not drain wedged children: a
threat-model child would call its response tool, produce a complete result
(full part count, session marked completed), then its fullStream iterator
wedged before emitting the final finish chunk. In that state BOTH consume()
and streamResult.finishReason hang on the same missing chunk, so the
finishReason terminal-signal can never fire. The silence watchdog also
failed to fire — the wedged child emits low-level bus chatter (an AI-SDK
retry loop) that keeps bumping lastActivity even though no part is persisted
and no real progress is made. Result: document_endpoint hung 15+ min and
the recon froze at N/M apps, reproducing the original bug.

Root-principle fix: a threat-model child is semantically DONE the instant
it calls response and captures the structured result. Everything after is
stream teardown, which can wedge. OffensiveSecurityAgent now exposes a
responseCaptured promise resolved synchronously from the response-tool
callback; generateThreatModelForEndpoint races consume() against it (plus a
5s grace so a healthy consume() still wins cleanly) and settles with the
captured result. This is immune to finishReason wedging, retry chatter, and
stream-teardown hangs — it settles in ~5s instead of never.

The liveness watchdog (freeze BEFORE any response) and step budget (loop
before response) remain as backstops. The inert finishReason terminal-
signal is removed.
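
The capture-then-grace race described in this commit might be sketched as below. This is an assumption-laden illustration rather than the actual change: `createDeferred` and `consumeOrCaptured` are hypothetical names, and the real `responseCaptured` promise lives on `OffensiveSecurityAgent`. The 5 s default mirrors the grace period stated in the commit message.

```typescript
// Sketch only: names are hypothetical; the 5 s grace is the figure from the commit.
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// A promise whose resolve function can be called from elsewhere,
// e.g. synchronously from the response-tool callback.
function createDeferred<T>() {
  let resolve!: (value: T) => void;
  const promise = new Promise<T>((r) => {
    resolve = r;
  });
  return { promise, resolve };
}

async function consumeOrCaptured<T>(
  consume: Promise<T | null>,
  responseCaptured: Promise<T>,
  graceMs = 5_000,
): Promise<T | null> {
  // Once the response is captured the child is semantically done; wait a
  // short grace so a healthy consume() can still finish first.
  const afterGrace = responseCaptured.then(async (value) => {
    await sleep(graceMs);
    return value;
  });
  return Promise.race([consume, afterGrace]);
}
```

Under these assumptions the response tool would call `deferred.resolve(result)` the moment it records the structured result, so the race no longer depends on the stream ever emitting its final chunk.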

2 participants