v1.16.2 — T33 TRIM: ask_agentic loop streaming + stallMs heartbeat watchdog#58
Merged
Conversation
…tchdog Loop iterations now use generateContentStream + collectStream instead of non-streaming generateContent. New per-call `stallMs` schema field + `GEMINI_CODE_CONTEXT_AGENTIC_STALL_MS` env var. TIMEOUT errorResult surfaces `timeoutKind: 'stall' | 'total'` so wrappers can route retry policy. Rescue at the post-loop forced-finalization site is intentionally NOT migrated — the v1.15.0 P3+P4+P5 `apiCallSucceeded` contract was stabilized through 4 review rounds + /6step cross-corroboration; mid-stream failures cannot be retried (Gemini has no resume), so re-opening that surface for a once-per-call path is the wrong trade. 3 new tests + buildCtx mock factory updated to support generateContentStream. 762 tests pass (759 → 762). Lint + typecheck clean. Closes T33 — last open item in the v1.6/v1.7 streaming refactor track. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ode parity Round-1 cross-review (gemini-code-context + grok + codex). Codex + grok clean; gemini flagged two empirically verifiable issues. Both folded: Finding #1 (HIGH) — collectStream candidate preservation src/tools/shared/stream-collector.ts:174 — pre-fix gate was outer-array length only. If Gemini's stream protocol ever emits a final terminator shape `{ candidates: [{ content: { parts: [] }, finishReason: 'STOP' }] }`, the previous chunk's functionCall Part would be silently overwritten — ask_agentic's loop would treat the iteration as final-text and never dispatch the tool. Defensive fix: gate also requires non-empty parts. Empirical current-protocol behaviour (functionCall + finishReason packed into one chunk) preserved either way; this hardens against fragmentation patterns thinkingLevel=HIGH or future SDK versions could legitimately introduce. Finding #2 (MEDIUM) — timeoutMs payload contract divergence from ask/code src/tools/ask-agentic.tool.ts — initial commit collapsed timeoutMs to "the limit that fired". ask.tool.ts:1132 always reports the configured total wall-clock cap regardless of which watchdog fired — discriminator is timeoutKind. Reverted to ask/code parity. Two new regression tests: - test/unit/stream-collector.test.ts — fragmentation-guard pin at the collector level - test/unit/ask-agentic.test.ts — fragmentation-guard pin at the agentic-loop level (verifies read_file dispatches + organic final-text) 764 tests pass (762 → 764). Lint + typecheck + build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR completes the ask_agentic loop’s migration to streaming, adding a heartbeat-aware stall watchdog (stallMs) and live “thinking” progress events to match ask/code behavior, while keeping the post-loop rescue path non-streaming.
Changes:
- Switch
ask_agenticloop iterations togenerateContentStream+collectStream, wiring chunk heartbeats to the stall watchdog and emitting live thought previews. - Harden
collectStreamso terminator-only chunks with emptypartsdon’t overwrite earlierfunctionCallcandidates (fragmentation guard). - Add/extend unit tests for stall-vs-total timeout behavior and fragmented functionCall dispatch; bump version/docs/changelog for v1.16.2.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/tools/ask-agentic.tool.ts |
Streams loop iterations, adds stallMs input + env fallback, surfaces timeoutKind, emits live thought progress. |
src/tools/shared/stream-collector.ts |
Prevents empty-parts terminator candidates from overwriting earlier candidates (preserves functionCalls). |
test/unit/ask-agentic.test.ts |
Adds stall watchdog + timeoutKind regression tests and a fragmented functionCall dispatch test. |
test/unit/stream-collector.test.ts |
Adds a collector-level regression test for the empty-parts terminator overwrite case. |
docs/FOLLOW-UP-PRS.md |
Marks T33 as shipped and documents TRIM scope/rationale. |
CHANGELOG.md |
Adds v1.16.2 release notes for streaming + stall watchdog changes. |
package.json |
Bumps package version to 1.16.2. |
server.json |
Bumps server/package version references to 1.16.2. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| - **`stallMs` per-call schema field** (1s–10min, optional) — heartbeat-aware stall watchdog that resets on every chunk yielded by `generateContentStream` (text or thought). Fires ONLY when the stream goes silent for this long. Does NOT fire while the model is actively thinking. Documented as the right knob for "kill dead sockets quickly without penalising deep reasoning." | ||
| - **`GEMINI_CODE_CONTEXT_AGENTIC_STALL_MS` env var** — fallback path identical to `ask`/`code` (`GEMINI_CODE_CONTEXT_ASK_STALL_MS`, `GEMINI_CODE_CONTEXT_CODE_STALL_MS`). Recommended: `60_000` (60s). | ||
| - **`structuredContent.timeoutKind`** on `errorCode: 'TIMEOUT'` — `'stall' | 'total'` discriminates which watchdog fired. Surfaces the active limit in `timeoutMs` and the configured stall in `stallMs` (mirrors the ask/code pattern). Wrappers can apply different retry policies — `'stall'` is usually safe to retry (dead socket); `'total'` means raise `iterationTimeoutMs` or narrow the prompt. |
| - Existing `iterationTimeoutMs` stays orthogonal (wall-clock cap; both can be set, whichever fires first wins). | ||
| - Update `createTimeoutController` call site to pass composite `{ totalMs, stallMs }` opts (already supported since v1.12.0; just call with the new field). | ||
| - `createTimeoutController` loop call site now passes composite `{ totalMs, stallMs }` opts. | ||
| - TIMEOUT errorResult now surfaces `timeoutKind: 'stall' | 'total'` + both `timeoutMs` (active limit that fired) and `stallMs` (configured stall watchdog) — mirrors ask/code conventions. |
… timeoutMs semantics Copilot Round-1 review on PR #58 flagged 2 stale wording residues from the Round-1 fold of gemini Finding #2. Both doc-only: - CHANGELOG.md "What's new" bullet (line 24) — claimed `timeoutMs` surfaces "the active limit", contradicting the actual implementation (always-configured-total mirroring ask.tool.ts:1132). - docs/FOLLOW-UP-PRS.md T33 entry (line 577) — same stale "active limit that fired" wording. Both reworded to match the implementation: `timeoutMs` reports the configured TOTAL wall-clock cap (or null when disabled); `stallMs` reports the configured stall watchdog (or null when disabled); `timeoutKind` is the discriminator. Both findings are TRUE POSITIVE LOW per /6step. Empirical verification: ask.tool.ts:1132 = `timeoutMs: ms` (always configured total). Implementation in ask-agentic.tool.ts post-Round-1-fold mirrors verbatim. Doc text was the only residue left from the initial commit's collapsed-to-active-limit shape. No code or test changes — pure doc-correctness fold. 764 tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ask_agenticloop iterations now usegenerateContentStream+collectStream(closes T33 — the last open item in the v1.6/v1.7 streaming refactor track).stallMsschema field (1s–10min) +GEMINI_CODE_CONTEXT_AGENTIC_STALL_MSenv var. Heartbeat-aware watchdog that resets on every chunk; fires only when the stream goes silent. Does NOT fire while the model is actively thinking.timeoutKind: 'stall' | 'total'so wrappers can route retry policy. Mirrorsask/codev1.12.0 conventions.thinking: …progress events during the 30-120s thinking phase. UX parity with ask/code.Scope-exclusion: rescue stays non-streaming
The post-loop forced-finalization rescue intentionally keeps
generateContent(not streaming). After 2-of-2 cross-reviewer consultation (gcc-ask: GO full scope; grok: TRIM), TRIM was adopted because:apiCallSucceededcontract was stabilized in v1.15.0 (P3+P4+P5) through 4 review rounds +/6stepcross-corroboration — re-opening that recently-stabilized surface for marginal liveness gain on a once-per-call path is the wrong trade.generateContentStreamhas no resume) — the v1.6/v1.7 plan §11 explicitly cites this. The rescue's transient-failure budget routing depends on knowing whether the API was billed; partial-stream-then-fail blurs that signal.iterationTimeoutMs.docs/FOLLOW-UP-PRS.mdT33 entry updated with ACCEPTED-DEFERRED rationale + revisit criteria.Backward compatibility
stallMsis an optional new schema field. Existing callers that don't set it see identical behavior to v1.16.1.structuredContent.timeoutMsnow reflects the LIMIT THAT FIRED (mirrors ask/code). Pre-v1.16.2 onlyiterationTimeoutMscould fire, so that field always held theiterationTimeoutMsvalue. Wrappers that branch ontimeoutKindget unambiguous routing; wrappers that read onlytimeoutMssee the active cap.convergenceForced,rescueErrorCode, etc.).Test plan
npm run lintclean.npm run typecheckclean.npm run buildclean./Users/,qmt-mail,michal@, or.claude/localreferences in published artifacts./6stepon every finding.mcp__github__request_copilot_review)./6stepon Copilot findings + fold any TP.GEMINI_CODE_CONTEXT_AGENTIC_STALL_MS=30000, run a multi-file investigation prompt, observe livethinking: …progress events.🤖 Generated with Claude Code