fix(threat-model): bound document_endpoint child await with a liveness watchdog by jorgeraad · Pull Request #819 · pensarai/apex

jorgeraad · 2026-06-04T17:44:16Z

Problem

A recon run gets stuck at N-1/N apps with a permanent spinner: one document_endpoint tool call hangs forever even though the threat-model sub-agent it spawned finished.

Root cause — a tool-await / agent-lifecycle divergence:

document_endpoint.execute() blocks on await generateThreatModelForEndpoint(...), which awaits a spawned CodeAgent's consume() (threatModelGenerator.ts) with no timeout.
The only global guard, the 5-min stream idle timeout, is deliberately suppressed while a tool is in flight (createToolExecutionGate), so there is no wall-clock bound on the parent while the child runs.
When the child wedges in a way its own idle timeout doesn't catch (a hung post-stream step, a stuck provider call, or a consume() that never settles even after the child's session has already reached a terminal state), the await blocks forever. That freezes the tool call, the parent agent session (status stuck at running), and the whole recon.
The synthetic-tool-result safeguard from PR #779 (fix(offSecAgent): emit synthetic tool-result on stream abort/error) can't fire here: the parent's stream is blocked inside execute() and never reaches the finally that emits synthetics.

Confirmed in production data: a stuck "API Endpoints" session with 23 threat-model children all completed, but 2 document_endpoint tool parts with no output.

Fix

Bound the child await on liveness, not a wall-clock deadline (so a slow-but-healthy threat model is never killed):

A child-scoped AbortController (parent aborts still propagate) lets us cancel just this threat model.
A watchdog resets on every child stream event (text-delta, tool-call-*, tool-result). text-delta fires per token, so a healthy run resets it constantly.
If the child goes completely silent past THREAT_MODEL_LIVENESS_TIMEOUT_MS (8 min — comfortably above the child's own 5-min model-idle timeout + auto-resume, so a recoverable stall produces activity and resets the timer first), we abort the child and return null.
document_endpoint already degrades to the heuristic risk score on a null return, so the tool-call completes and the recon advances instead of hanging.
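The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `ChildRun`, `awaitWithLiveness`, and `onActivity` are hypothetical names, and rethrowing on error is an assumption about error handling.

```typescript
// Hypothetical names throughout; a sketch of the pattern, not the PR's code.
type Unsubscribe = () => void;

interface ChildRun<T> {
  /** Runs the child to completion; expected to stop early if the signal aborts. */
  consume(signal: AbortSignal): Promise<T>;
  /** Subscribes to child stream events (text-delta, tool-call-*, tool-result). */
  onActivity(listener: () => void): Unsubscribe;
}

type Outcome<T> = { kind: "done"; value: T } | { kind: "silent" };

async function awaitWithLiveness<T>(
  child: ChildRun<T>,
  silenceMs: number,
  parentSignal?: AbortSignal,
): Promise<T | null> {
  // Child-scoped controller: aborting it cancels only this child.
  const childAbort = new AbortController();
  const forwardAbort = () => childAbort.abort();
  if (parentSignal?.aborted) childAbort.abort();
  parentSignal?.addEventListener("abort", forwardAbort, { once: true });

  let timer: ReturnType<typeof setTimeout> | null = null;
  const disarm = () => {
    if (timer !== null) clearTimeout(timer);
    timer = null;
  };
  let unsubscribe: Unsubscribe = () => {};
  const silent = new Promise<Outcome<T>>((resolve) => {
    const arm = () => {
      disarm();
      timer = setTimeout(() => resolve({ kind: "silent" }), silenceMs);
    };
    unsubscribe = child.onActivity(arm); // every stream event resets the timer
    arm();
  });

  try {
    const done = child
      .consume(childAbort.signal)
      .then((value): Outcome<T> => ({ kind: "done", value }));
    const outcome = await Promise.race([done, silent]);
    if (outcome.kind === "done") return outcome.value;
    childAbort.abort(); // silent past the threshold: abandon this child
    return null; // the caller falls back to its heuristic path
  } catch (error) {
    childAbort.abort(); // errors also abort the child explicitly
    throw error; // assumption: rethrow; the real code may handle this differently
  } finally {
    disarm();
    unsubscribe();
    parentSignal?.removeEventListener("abort", forwardAbort);
  }
}
```

Because the timer is re-armed on every event, total run time is unbounded as long as the child keeps emitting; only a gap longer than `silenceMs` triggers abandonment.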

Single call site (document_endpoint), no behavior change on the happy path. tsc + biome clean.

Note

Medium Risk
Changes failure handling for long-running subagents; happy path is unchanged, but silent wedges now drop full threat models in favor of heuristic scoring.

Overview
generateThreatModelForEndpoint no longer waits indefinitely on a wedged threat-model child. It races agent.consume() against a liveness watchdog that resets whenever the child emits stream events (text-delta, tool-call events, tool-result). After 8 minutes of silence, the child is aborted via a child-scoped AbortController (parent abort still propagates), the subagent is marked failed, and the function returns null so document_endpoint can finish with the existing heuristic risk fallback instead of freezing recon.

Watchdog timers and listeners are torn down in finally; errors also abort the child explicitly.

^{Reviewed by Cursor Bugbot for commit 05ff46a. Bugbot is set up for automated code reviews on this repo. Configure here.}

…s watchdog

document_endpoint blocks on `await generateThreatModelForEndpoint`, which awaits a spawned CodeAgent's consume() with no timeout. The only global idle timeout is suppressed while a tool is in flight, so when the threat-model child wedges (a hung post-stream step, a stuck provider call, or a consume() that never settles even after the child's session already ended) the await blocks forever, freezing the tool-call, the parent agent session, and the whole recon at "N-1/N apps".

Bound the await on child liveness: as long as the child emits stream activity we keep waiting (a slow-but-healthy threat model is never killed); if it goes completely silent past 8m (above the child's own 5m model-idle timeout + auto-resume) we abort the child via a child-scoped AbortController and return null. document_endpoint already degrades to the heuristic risk score on a null return, so the tool-call completes and the recon advances instead of hanging.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issues.}

cursor · 2026-06-04T17:45:49Z

+            `${Math.round(THREAT_MODEL_LIVENESS_TIMEOUT_MS / 1000)}s); ` +
+            `abandoning and falling back to heuristic-only.`,
+        );
+        return null;


Orphaned consume after race

Medium Severity

When the liveness watchdog wins Promise.race, the function returns after childAbort.abort() without handling the still-running agent.consume() promise. That promise can later reject or resolve, producing an unhandled rejection or stray completion after the threat model was already abandoned.
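One way to make the abandoned promise's late outcome explicit is to subscribe to it before racing. The sketch below uses hypothetical names (`raceOrAbandon`, `onLateSettle`) and is not taken from the PR. Note that `Promise.race` itself subscribes to every input, so whether an unhandled rejection can actually occur depends on how the real code holds the `consume()` promise; the sketch only shows how to observe a late settlement deliberately.

```typescript
// Hypothetical helper; not the PR's code.
type RaceOutcome<T> = { kind: "done"; value: T } | { kind: "abandoned" };

async function raceOrAbandon<T>(
  work: Promise<T>,
  abandon: Promise<unknown>,
  onLateSettle: (outcome: "resolved" | "rejected") => void = () => {},
): Promise<T | null> {
  let abandoned = false;

  // Observe the work promise up front. This subscription always fulfills,
  // so a rejection that arrives after abandonment is logged, not lost.
  work.then(
    () => { if (abandoned) onLateSettle("resolved"); },
    () => { if (abandoned) onLateSettle("rejected"); },
  );

  const outcome = await Promise.race([
    work.then((value): RaceOutcome<T> => ({ kind: "done", value })),
    abandon.then((): RaceOutcome<T> => ({ kind: "abandoned" })),
  ]);

  if (outcome.kind === "done") return outcome.value;
  abandoned = true; // anything the work promise does from here on is "late"
  return null;
}
```

If `work` rejects before `abandon` settles, the rejection still propagates to the caller as usual; only settlements after abandonment are routed to `onLateSettle`.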

cursor · 2026-06-04T17:45:49Z

+  "tool-call-delta",
+  "tool-call-complete",
+  "tool-result",
+] as const;


Long tools trip liveness

Medium Severity

The watchdog only resets on stream events, not while a child tool is executing. The child’s own stream idle timeout pauses during in-flight tools, but this timer keeps counting after tool-call-complete. A legitimate tool run longer than eight minutes (e.g. execute_command without timeout or a long whitebox job) can be misclassified as stuck and return null.
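A possible mitigation for this finding, sketched under assumptions, is to leave the timer disarmed while a child tool is executing. Nothing here is confirmed to be in the PR: `ToolAwareWatchdog` is a hypothetical name, and the sketch assumes a tool starts executing at `tool-call-complete` and finishes at `tool-result`.

```typescript
// Hypothetical sketch; event semantics are an assumption, see lead-in.
type ChildEvent =
  | "text-delta"
  | "tool-call-delta"
  | "tool-call-complete"
  | "tool-result";

class ToolAwareWatchdog {
  private timer: ReturnType<typeof setTimeout> | null = null;
  private inFlightTools = 0;
  private readonly silenceMs: number;
  private readonly onSilent: () => void;

  constructor(silenceMs: number, onSilent: () => void) {
    this.silenceMs = silenceMs;
    this.onSilent = onSilent;
  }

  start(): void {
    this.arm();
  }

  onEvent(type: ChildEvent): void {
    if (type === "tool-call-complete") this.inFlightTools += 1;
    if (type === "tool-result") {
      this.inFlightTools = Math.max(0, this.inFlightTools - 1);
    }
    this.arm();
  }

  stop(): void {
    this.disarm();
  }

  private arm(): void {
    this.disarm();
    // While a tool is executing, silence is expected, so keep the timer off.
    if (this.inFlightTools === 0) {
      this.timer = setTimeout(this.onSilent, this.silenceMs);
    }
  }

  private disarm(): void {
    if (this.timer !== null) clearTimeout(this.timer);
    this.timer = null;
  }
}
```

The trade-off is that a sub-tool that hangs forever would no longer be caught by this timer, so this approach would need a separate bound on tool execution.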

Additional Locations (1)

src/core/agents/offSecAgent/tools/threatModelGenerator.ts#L387-L393

Yuvanesh-ux · 2026-06-04T17:47:00Z

@jorgeraad Would you mind cleaning up the comments and making them much more concise?

jorgeraad · 2026-06-04T17:55:44Z

@Yuvanesh-ux hold up, my b. This should still be a draft. But will do.

The silence-based liveness watchdog failed in production: a threat-model child that loops (never calls `response`) keeps emitting events, so it's never "silent", and a finished-but-non-closing stream iterator isn't silent either. document_endpoint's await hung for hours, freezing the recon.

Replace it with two first-principles bounds (no wall-clock deadline on the work):

- stepCountIs(40) on the child agent so its stream ALWAYS terminates. A threat model that can't emit its structured `response` within 40 steps is looping, not progressing: an iteration bound, not a clock. Merges with the responseSchema's own response-tool stop. Fixes the never-finishing child.
- Race consume() against streamResult.finishReason (authoritative "stream done") with a short settle window, so a finished stream whose fullStream iterator never closes can't hang consume(). Uses the captured response when present (only the iterator hung, the result is good), else degrades to the heuristic score. Fixes the finished-but-wedged-iterator case.

Expose OffensiveSecurityAgent.capturedResponse so the terminal-signal path can recover the structured result without routing through the hung consume().
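The idea of "an iteration bound, not a clock" can be illustrated with a hand-rolled loop. This is only a sketch of the principle: the commit applies `stepCountIs(40)` to the child agent, whereas `runWithStepBudget` and `StepResult` below are hypothetical names that do not come from the PR or any library.

```typescript
// Hypothetical illustration of a step budget; not the PR's mechanism.
type StepResult<R> = { done: true; response: R } | { done: false };

async function runWithStepBudget<R>(
  runStep: (stepIndex: number) => Promise<StepResult<R>>,
  maxSteps: number,
): Promise<R | null> {
  for (let i = 0; i < maxSteps; i++) {
    const result = await runStep(i);
    // The structured response ends the run, however long each step took.
    if (result.done) return result.response;
  }
  // Budget exhausted without a response: treat the run as looping.
  return null;
}
```

A slow run that answers on step 3 still succeeds, while a fast run that never answers stops after `maxSteps` iterations; elapsed time plays no role in either outcome.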

The document_endpoint tool blocked forever on a single threat-model child that never let consume() return. Three distinct modes produce this and no single mechanism catches all of them, so consume() is now raced against all three abandonment signals (no wall-clock deadline on the work itself):

- mode A (loop, never calls response): stepCountIs bounds the stream so it always terminates.
- mode B (stream finished, iterator never closed): terminal-signal race on streamResult.finishReason, with a settle window so a healthy run where consume() returns a tick later still wins.
- mode C (frozen mid-step on a hung sub-tool, completely silent): liveness watchdog abandons after a silence threshold no progressing run can hit.

On abandonment, salvage the child's captured structured response if it has one (only the plumbing wedged), else degrade to null; document_endpoint falls back to the heuristic risk score so the recon advances. cleanup() clears both timers and the bus listeners in finally.

Production proof showed the prior race did not drain wedged children: a threat-model child would call its response tool, produce a complete result (full part count, session marked completed), then its fullStream iterator wedged before emitting the final finish chunk. In that state BOTH consume() and streamResult.finishReason hang on the same missing chunk, so the finishReason terminal-signal can never fire. The silence watchdog also failed to fire: the wedged child emits low-level bus chatter (an AI-SDK retry loop) that keeps bumping lastActivity even though no part is persisted and no real progress is made. Result: document_endpoint hung 15+ min and the recon froze at N/M apps, reproducing the original bug.

Root-principle fix: a threat-model child is semantically DONE the instant it calls response and captures the structured result. Everything after is stream teardown, which can wedge. OffensiveSecurityAgent now exposes a responseCaptured promise resolved synchronously from the response-tool callback; generateThreatModelForEndpoint races consume() against it (plus a 5s grace so a healthy consume() still wins cleanly) and settles with the captured result. This is immune to finishReason wedging, retry chatter, and stream-teardown hangs: it settles in ~5s instead of never.

The liveness watchdog (freeze BEFORE any response) and step budget (loop before response) remain as backstops. The inert finishReason terminal-signal is removed.
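The "done the instant the response is captured" idea can be sketched with a deferred promise that the response-tool callback resolves. The names `ResponseCapture` and `awaitThreatModel` are hypothetical, and the shape of the real `responseCaptured` promise on OffensiveSecurityAgent is not shown in this PR page, so treat this as an illustration under those assumptions.

```typescript
// Hypothetical sketch; not the actual OffensiveSecurityAgent API.
const delay = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

class ResponseCapture<T> {
  value: T | undefined = undefined;
  readonly captured: Promise<void>;
  private markCaptured: () => void = () => {};

  constructor() {
    this.captured = new Promise<void>((resolve) => {
      this.markCaptured = resolve;
    });
  }

  /** Called synchronously from the response-tool callback. */
  capture(value: T): void {
    this.value = value;
    this.markCaptured();
  }
}

async function awaitThreatModel<T>(
  consume: Promise<unknown>,
  capture: ResponseCapture<T>,
  graceMs: number,
): Promise<T | null> {
  // Once a response exists, give consume() a short grace period to finish
  // cleanly, then stop waiting on stream teardown.
  const graceElapsed = capture.captured.then(() => delay(graceMs));
  await Promise.race([consume, graceElapsed]);
  // No captured response means the caller degrades to its fallback.
  return capture.value ?? null;
}
```

With a grace of 5000 ms, a child whose teardown wedges after responding would settle about five seconds after the capture, while a healthy child returns as soon as `consume` resolves.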

github-actions Bot requested a review from Yuvanesh-ux June 4, 2026 17:44

cursor Bot reviewed Jun 4, 2026

jorgeraad marked this pull request as draft June 4, 2026 17:55

jorgeraad added 3 commits June 4, 2026 22:07

jorgeraad wants to merge 4 commits into feat/native-execution-ids from fix/threat-model-liveness-watchdog
