v1.16.2 — T33 TRIM: ask_agentic loop streaming + stallMs heartbeat watchdog #58

Merged: qmt merged 3 commits into main from v1.16.2-t33-trim-loop-streaming on Apr 30, 2026


qmt (Member) commented Apr 30, 2026

Summary

  • ask_agentic loop iterations now use generateContentStream + collectStream (closes T33 — the last open item in the v1.6/v1.7 streaming refactor track).
  • New per-call stallMs schema field (1s–10min) + GEMINI_CODE_CONTEXT_AGENTIC_STALL_MS env var. Heartbeat-aware watchdog that resets on every chunk; fires only when the stream goes silent. Does NOT fire while the model is actively thinking.
  • TIMEOUT errorResult surfaces timeoutKind: 'stall' | 'total' so wrappers can route retry policy. Mirrors ask / code v1.12.0 conventions.
  • Live thinking: … progress events during the 30-120s thinking phase. UX parity with ask/code.
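
The heartbeat-aware behaviour in the bullets above can be sketched as follows. This is a minimal illustration, assuming a polling design with an injectable clock; `HeartbeatWatchdog`, `heartbeat()`, and `check()` are hypothetical names and do not represent the project's actual `createTimeoutController`.

```typescript
type TimeoutKind = "stall" | "total";

interface WatchdogOptions {
  totalMs?: number; // wall-clock cap for the whole iteration
  stallMs?: number; // maximum allowed silence between chunks
}

// Hypothetical sketch: tracks two deadlines and reports which one was crossed first.
class HeartbeatWatchdog {
  private readonly opts: WatchdogOptions;
  private readonly now: () => number;
  private readonly startedAt: number;
  private lastChunkAt: number;

  constructor(opts: WatchdogOptions, now: () => number = Date.now) {
    this.opts = opts;
    this.now = now;
    this.startedAt = now();
    this.lastChunkAt = this.startedAt;
  }

  // Call on every streamed chunk (text or thought) to push the stall deadline out.
  heartbeat(): void {
    this.lastChunkAt = this.now();
  }

  // Returns null while both limits hold, otherwise the limit that expired first.
  check(): TimeoutKind | null {
    const t = this.now();
    const totalDeadline =
      this.opts.totalMs === undefined ? Infinity : this.startedAt + this.opts.totalMs;
    const stallDeadline =
      this.opts.stallMs === undefined ? Infinity : this.lastChunkAt + this.opts.stallMs;
    if (t < totalDeadline && t < stallDeadline) return null;
    return stallDeadline <= totalDeadline ? "stall" : "total";
  }
}
```

Because `heartbeat()` moves only the stall deadline, a model that keeps emitting thought chunks never trips `stallMs`, while the total cap still bounds the iteration.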

Scope-exclusion: rescue stays non-streaming

The post-loop forced-finalization rescue intentionally keeps generateContent (not streaming). After 2-of-2 cross-reviewer consultation (gcc-ask: GO full scope; grok: TRIM), TRIM was adopted because:

  1. The rescue's apiCallSucceeded contract was stabilized in v1.15.0 (P3+P4+P5) through 4 review rounds + /6step cross-corroboration — re-opening that recently-stabilized surface for marginal liveness gain on a once-per-call path is the wrong trade.
  2. Mid-stream failures cannot be retried (Gemini's generateContentStream has no resume) — the v1.6/v1.7 plan §11 explicitly cites this. The rescue's transient-failure budget routing depends on knowing whether the API was billed; partial-stream-then-fail blurs that signal.
  3. Empirical wall-clock dominance is in the LOOP, not the rescue. Streaming where the wall-clock lives gets 80-90% of the value at half the effort and zero risk to recently-stabilized code.
  4. Rescue cost is ONE call. Stalls there are bounded by iterationTimeoutMs.

docs/FOLLOW-UP-PRS.md T33 entry updated with ACCEPTED-DEFERRED rationale + revisit criteria.

Backward compatibility

  • No breaking change. stallMs is an optional new schema field. Existing callers that don't set it see identical behavior to v1.16.1.
  • structuredContent.timeoutMs always reports the configured total wall-clock cap, regardless of which watchdog fired (mirrors ask/code); timeoutKind is the discriminator. Pre-v1.16.2 only iterationTimeoutMs could fire, so wrappers that read only timeoutMs see the same value as before; wrappers that branch on timeoutKind get unambiguous routing.
  • Rescue path completely unchanged. Operators who hit the rescue see byte-identical behavior to v1.16.1 (convergenceForced, rescueErrorCode, etc.).
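
A wrapper that routes retry policy on `timeoutKind` might look like the sketch below. The field names `errorCode`, `timeoutKind`, `timeoutMs`, and `stallMs` come from this PR; the `TimeoutContent` and `RetryDecision` types, the `routeTimeout` function, and the retry budget of 2 are illustrative assumptions, not part of the tool's published contract.

```typescript
type TimeoutKind = "stall" | "total";

// Assumed subset of structuredContent on a TIMEOUT errorResult.
interface TimeoutContent {
  errorCode: "TIMEOUT";
  timeoutKind: TimeoutKind;
  timeoutMs: number | null; // reported alongside; not needed for routing
  stallMs: number | null; // reported alongside; not needed for routing
}

type RetryDecision =
  | { retry: true; reason: string }
  | { retry: false; advice: string };

// Hypothetical policy: retry stalls a bounded number of times, never retry total timeouts.
function routeTimeout(
  content: TimeoutContent,
  attempt: number,
  maxStallRetries = 2,
): RetryDecision {
  if (content.timeoutKind === "stall") {
    return attempt < maxStallRetries
      ? { retry: true, reason: "stream went silent; likely a dead socket" }
      : { retry: false, advice: "stall retries exhausted" };
  }
  // A total timeout means the work itself ran long, so retrying unchanged rarely helps.
  return { retry: false, advice: "raise iterationTimeoutMs or narrow the prompt" };
}
```

Branching only on `timeoutKind` keeps the wrapper independent of how the numeric limit fields are populated.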

Test plan

  • 762 unit tests pass (759 → 762; added 3 new stallMs tests).
  • All v1.15.0 rescue tests still green (proves rescue scope-exclusion held).
  • All v1.16.0 content-aware NO_PROGRESS tests still green (regression check).
  • npm run lint clean.
  • npm run typecheck clean.
  • npm run build clean.
  • Pre-publish audit clean — no /Users/, qmt-mail, michal@, or .claude/local references in published artifacts.
  • CI green on initial commit.
  • 3-way review (gemini-cli + gemini-chat + grok via MCP) with /6step on every finding.
  • Round-2 review LGTM.
  • Copilot review (via mcp__github__request_copilot_review).
  • /6step on Copilot findings + fold any TP.
  • CI green on every fold-commit.
  • Manual: set GEMINI_CODE_CONTEXT_AGENTIC_STALL_MS=30000, run a multi-file investigation prompt, observe live thinking: … progress events.
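
The env-var fallback exercised in the manual step could resolve along these lines. This is a sketch under stated assumptions: the 1s to 10min range and the variable name come from this PR, but `resolveStallMs`, the integer requirement, rejecting an out-of-range per-call value with a `RangeError`, and silently ignoring an invalid env value are all guesses for illustration.

```typescript
const STALL_ENV_VAR = "GEMINI_CODE_CONTEXT_AGENTIC_STALL_MS";
const STALL_MIN_MS = 1_000; // 1s
const STALL_MAX_MS = 600_000; // 10min

function isValidStallMs(value: number): boolean {
  return Number.isInteger(value) && value >= STALL_MIN_MS && value <= STALL_MAX_MS;
}

// Hypothetical resolver: the per-call field wins, the env var is the fallback,
// and undefined means the stall watchdog stays disabled.
function resolveStallMs(
  perCall: number | undefined,
  env: Record<string, string | undefined>,
): number | undefined {
  if (perCall !== undefined) {
    if (!isValidStallMs(perCall)) {
      throw new RangeError(
        `stallMs must be an integer in [${STALL_MIN_MS}, ${STALL_MAX_MS}]`,
      );
    }
    return perCall;
  }
  const raw = env[STALL_ENV_VAR];
  if (raw === undefined) return undefined;
  const parsed = Number(raw);
  // Assumption: a malformed or out-of-range env value is ignored rather than fatal.
  return isValidStallMs(parsed) ? parsed : undefined;
}
```

In a Node process this would be called as `resolveStallMs(input.stallMs, process.env)`, where `input` stands for the parsed tool arguments.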

🤖 Generated with Claude Code

qmt and others added 2 commits April 30, 2026 23:27
…tchdog

Loop iterations now use generateContentStream + collectStream instead of
non-streaming generateContent. New per-call `stallMs` schema field +
`GEMINI_CODE_CONTEXT_AGENTIC_STALL_MS` env var. TIMEOUT errorResult
surfaces `timeoutKind: 'stall' | 'total'` so wrappers can route retry
policy.

Rescue at the post-loop forced-finalization site is intentionally
NOT migrated — the v1.15.0 P3+P4+P5 `apiCallSucceeded` contract was
stabilized through 4 review rounds + /6step cross-corroboration; mid-stream
failures cannot be retried (Gemini has no resume), so re-opening that
surface for a once-per-call path is the wrong trade.

3 new tests + buildCtx mock factory updated to support generateContentStream.
762 tests pass (759 → 762). Lint + typecheck clean.

Closes T33 — last open item in the v1.6/v1.7 streaming refactor track.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ode parity

Round-1 cross-review (gemini-code-context + grok + codex). Codex + grok
clean; gemini flagged two empirically verifiable issues. Both folded:

Finding #1 (HIGH) — collectStream candidate preservation
src/tools/shared/stream-collector.ts:174 — pre-fix gate was outer-array
length only. If Gemini's stream protocol ever emits a final terminator
shape `{ candidates: [{ content: { parts: [] }, finishReason: 'STOP' }] }`,
the previous chunk's functionCall Part would be silently overwritten —
ask_agentic's loop would treat the iteration as final-text and never
dispatch the tool. Defensive fix: gate also requires non-empty parts.
Empirical current-protocol behaviour (functionCall + finishReason packed
into one chunk) preserved either way; this hardens against fragmentation
patterns thinkingLevel=HIGH or future SDK versions could legitimately
introduce.
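
The guard described in Finding #1 can be illustrated with a simplified collector. This is a sketch, not the code in stream-collector.ts: it consumes a synchronous iterable instead of the SDK's async stream, the `Part`, `Candidate`, and `Chunk` types are trimmed-down assumptions, and tracking `finishReason` separately is an illustrative choice.

```typescript
interface Part {
  text?: string;
  functionCall?: { name: string; args: Record<string, unknown> };
}
interface Candidate {
  content?: { parts?: Part[] };
  finishReason?: string;
}
interface Chunk {
  candidates?: Candidate[];
}

function hasParts(candidate: Candidate): boolean {
  return (candidate.content?.parts?.length ?? 0) > 0;
}

// Hypothetical collector: keeps the last candidates that actually carried parts.
function collectLastCandidates(chunks: Iterable<Chunk>): {
  candidates: Candidate[];
  finishReason: string | undefined;
} {
  let candidates: Candidate[] = [];
  let finishReason: string | undefined;
  for (const chunk of chunks) {
    const incoming = chunk.candidates ?? [];
    if (incoming.length === 0) continue;
    // The terminator may still carry the finish reason, so record it either way.
    finishReason = incoming[0]?.finishReason ?? finishReason;
    // Pre-fix gate: incoming.length > 0 alone. Guarded gate: also require non-empty parts,
    // so an empty-parts terminator cannot overwrite an earlier functionCall.
    if (incoming.some(hasParts)) candidates = incoming;
  }
  return { candidates, finishReason };
}
```

With the guard in place, a stream that splits the functionCall and the `STOP` terminator across two chunks still yields the functionCall to the loop.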

Finding #2 (MEDIUM) — timeoutMs payload contract divergence from ask/code
src/tools/ask-agentic.tool.ts — initial commit collapsed timeoutMs to
"the limit that fired". ask.tool.ts:1132 always reports the configured
total wall-clock cap regardless of which watchdog fired — discriminator
is timeoutKind. Reverted to ask/code parity.

Two new regression tests:
- test/unit/stream-collector.test.ts — fragmentation-guard pin at the
  collector level
- test/unit/ask-agentic.test.ts — fragmentation-guard pin at the
  agentic-loop level (verifies read_file dispatches + organic final-text)

764 tests pass (762 → 764). Lint + typecheck + build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

This PR completes the ask_agentic loop’s migration to streaming, adding a heartbeat-aware stall watchdog (stallMs) and live “thinking” progress events to match ask/code behavior, while keeping the post-loop rescue path non-streaming.

Changes:

  • Switch ask_agentic loop iterations to generateContentStream + collectStream, wiring chunk heartbeats to the stall watchdog and emitting live thought previews.
  • Harden collectStream so terminator-only chunks with empty parts don’t overwrite earlier functionCall candidates (fragmentation guard).
  • Add/extend unit tests for stall-vs-total timeout behavior and fragmented functionCall dispatch; bump version/docs/changelog for v1.16.2.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/tools/ask-agentic.tool.ts | Streams loop iterations, adds stallMs input + env fallback, surfaces timeoutKind, emits live thought progress. |
| src/tools/shared/stream-collector.ts | Prevents empty-parts terminator candidates from overwriting earlier candidates (preserves functionCalls). |
| test/unit/ask-agentic.test.ts | Adds stall watchdog + timeoutKind regression tests and a fragmented functionCall dispatch test. |
| test/unit/stream-collector.test.ts | Adds a collector-level regression test for the empty-parts terminator overwrite case. |
| docs/FOLLOW-UP-PRS.md | Marks T33 as shipped and documents TRIM scope/rationale. |
| CHANGELOG.md | Adds v1.16.2 release notes for streaming + stall watchdog changes. |
| package.json | Bumps package version to 1.16.2. |
| server.json | Bumps server/package version references to 1.16.2. |


Comment thread CHANGELOG.md Outdated

- **`stallMs` per-call schema field** (1s–10min, optional) — heartbeat-aware stall watchdog that resets on every chunk yielded by `generateContentStream` (text or thought). Fires ONLY when the stream goes silent for this long. Does NOT fire while the model is actively thinking. Documented as the right knob for "kill dead sockets quickly without penalising deep reasoning."
- **`GEMINI_CODE_CONTEXT_AGENTIC_STALL_MS` env var** — fallback path identical to `ask`/`code` (`GEMINI_CODE_CONTEXT_ASK_STALL_MS`, `GEMINI_CODE_CONTEXT_CODE_STALL_MS`). Recommended: `60_000` (60s).
- **`structuredContent.timeoutKind`** on `errorCode: 'TIMEOUT'` — `'stall' | 'total'` discriminates which watchdog fired. Surfaces the active limit in `timeoutMs` and the configured stall in `stallMs` (mirrors the ask/code pattern). Wrappers can apply different retry policies — `'stall'` is usually safe to retry (dead socket); `'total'` means raise `iterationTimeoutMs` or narrow the prompt.

Comment thread docs/FOLLOW-UP-PRS.md Outdated

- Existing `iterationTimeoutMs` stays orthogonal (wall-clock cap; both can be set, whichever fires first wins).
- Update `createTimeoutController` call site to pass composite `{ totalMs, stallMs }` opts (already supported since v1.12.0; just call with the new field).
- `createTimeoutController` loop call site now passes composite `{ totalMs, stallMs }` opts.
- TIMEOUT errorResult now surfaces `timeoutKind: 'stall' | 'total'` + both `timeoutMs` (active limit that fired) and `stallMs` (configured stall watchdog) — mirrors ask/code conventions.
… timeoutMs semantics

Copilot Round-1 review on PR #58 flagged 2 stale wording residues from
the Round-1 fold of gemini Finding #2. Both doc-only:

- CHANGELOG.md "What's new" bullet (line 24) — claimed `timeoutMs`
  surfaces "the active limit", contradicting the actual implementation
  (always-configured-total mirroring ask.tool.ts:1132).
- docs/FOLLOW-UP-PRS.md T33 entry (line 577) — same stale "active limit
  that fired" wording.

Both reworded to match the implementation: `timeoutMs` reports the
configured TOTAL wall-clock cap (or null when disabled); `stallMs` reports
the configured stall watchdog (or null when disabled); `timeoutKind` is
the discriminator.

Both findings are TRUE POSITIVE LOW per /6step. Empirical verification:
ask.tool.ts:1132 = `timeoutMs: ms` (always configured total). Implementation
in ask-agentic.tool.ts post-Round-1-fold mirrors verbatim. Doc text was
the only residue left from the initial commit's collapsed-to-active-limit
shape.

No code or test changes — pure doc-correctness fold. 764 tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

qmt merged commit 371bd8f into main on Apr 30, 2026 (4 checks passed)
qmt deleted the v1.16.2-t33-trim-loop-streaming branch on April 30, 2026 21:56

2 participants