v1.16.0: content-aware NO_PROGRESS dedupe (P2 Phase B) #56
The v1.14.4 threshold bump (3 → 5) was a stop-gap. v1.16.0 ships the
planned Phase B content-aware refactor:
Same call signature counts toward the no-progress threshold ONLY
if no `read_file` against an unseen path happened between repeats.
If filesReadSet grew between identical signatures, count resets to 1.
Why: in today's empirical runs, Gemini 3 Pro under HIGH thinking re-runs the
same grep ("FunctionCallingConfigMode") while ALSO reading new files
between iterations. That IS progress. The simple counter (v1.14.4)
tripped before maxIterations, killing the rescue path. Phase B
distinguishes "exploring with repeated lookups" from "genuinely stuck".
Why read_file specifically: list_directory/find_files/grep are
exploration without commitment. read_file is the progress signal — it
grows the conversation the rescue path will replay if the loop
exhausts iterations.
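The progress oracle described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical `ToolCall` shape and a `path` argument on `read_file`; `recordProgress` is an illustrative name, not the function in `src/tools/ask-agentic.tool.ts`.

```typescript
// Hypothetical tool-call shape; the real ask_agentic types may differ.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Returns true only when the call is read_file against a path not yet in
// filesReadSet (and records it). Exploration tools never grow the set.
function recordProgress(filesReadSet: Set<string>, call: ToolCall): boolean {
  if (call.name !== "read_file") return false; // grep, list_directory, find_files: exploration only
  const path = call.args["path"]; // assumed argument name
  if (typeof path !== "string") return false;
  if (filesReadSet.has(path)) return false; // re-reading a seen file is not progress
  filesReadSet.add(path);
  return true;
}

// Usage: one grep, two reads of the same file, a listing, then a new file.
const filesReadSet = new Set<string>();
const calls: ToolCall[] = [
  { name: "grep", args: { pattern: "FunctionCallingConfigMode" } },
  { name: "read_file", args: { path: "src/a.ts" } },
  { name: "read_file", args: { path: "src/a.ts" } },
  { name: "list_directory", args: { path: "src" } },
  { name: "read_file", args: { path: "src/b.ts" } },
];
const progress = calls.map((c) => recordProgress(filesReadSet, c));
console.log(progress); // [ false, true, false, false, true ]
console.log(filesReadSet.size); // 2
```

Re-reading an already-seen path and any `grep`/`list_directory`/`find_files` call leave the set unchanged, so only first reads of new files register as progress.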
Implementation:
- signatureCounts Map type: <string, number> → <string, {count, lastFilesReadSize}>
- Capture filesReadSet.size ONCE per iter (after tool execution) so
all signatures within the same iter see the same snapshot.
- On each sig: if prev.lastFilesReadSize < current → count=1 (progress);
else count=prev.count+1.
- Threshold (5) unchanged. Hard caps (maxIterations, maxTotalInputTokens)
unchanged.
- Error message now reads "... repeated N times without new file reads
between repeats" so operators see why.
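A minimal TypeScript sketch of the per-signature update described above, assuming the caller passes the iteration's signatures plus a single `filesReadSet.size` snapshot taken after tool execution. `SigState`, `updateSignatureCounts`, `simulate`, and the signature string format are hypothetical; the real loop in `ask_agentic` may be organised differently.

```typescript
// Per-signature state, mirroring Map<string, { count; lastFilesReadSize }>.
interface SigState {
  count: number;
  lastFilesReadSize: number;
}

const NO_PROGRESS_CALL_THRESHOLD = 5; // unchanged from v1.14.4

// Called once per iteration after tool execution. `filesReadSize` is the
// single snapshot of filesReadSet.size shared by every signature in the iteration.
// Returns the signature that reached the threshold, or null.
function updateSignatureCounts(
  signatureCounts: Map<string, SigState>,
  iterSignatures: string[],
  filesReadSize: number,
): string | null {
  for (const sig of iterSignatures) {
    const prev = signatureCounts.get(sig);
    const count =
      prev === undefined || prev.lastFilesReadSize < filesReadSize
        ? 1 // first sighting, or new files read since the last repeat: progress
        : prev.count + 1; // no growth since the last repeat: potentially stuck
    signatureCounts.set(sig, { count, lastFilesReadSize: filesReadSize });
    if (count >= NO_PROGRESS_CALL_THRESHOLD) return sig;
  }
  return null;
}

// Replays one identical grep signature per iteration against a sequence of
// filesReadSet.size snapshots; returns the 1-based iteration that tripped, or null.
function simulate(sizesPerIter: number[]): number | null {
  const counts = new Map<string, SigState>();
  for (let i = 0; i < sizesPerIter.length; i++) {
    if (updateSignatureCounts(counts, ['grep:{"pattern":"X"}'], sizesPerIter[i]) !== null) {
      return i + 1;
    }
  }
  return null;
}

console.log(simulate([1, 2, 3, 4, 5])); // null (files grew every iteration)
console.log(simulate([0, 0, 0, 0, 0])); // 5 (tripped on the fifth repeat)
```

The two `simulate` calls mirror the positive and negative pins. Because this loop counts every entry in `iterSignatures`, duplicate signatures inside one parallel batch would each add +1; that is the case the AA1 follow-up in this PR addresses.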
Coverage: 758 pass | 9 skipped (was 756). +2 test cases:
- content-aware positive: 5× same grep + read_file interspersed → continues
- content-aware negative: 5× same grep + NO file growth → still trips
No API/schema change. No structuredContent field changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
Implements Phase B of ask_agentic’s NO_PROGRESS detection by making repeated tool-call signature dedupe “content-aware”: identical signatures only count toward the threshold when no new read_file (unseen path) occurred between repeats, addressing premature bails during multi-file investigations.
Changes:
- Update `ask_agentic` no-progress tracking to store `{ count, lastFilesReadSize }` per signature and reset counts when `filesReadSet` grows.
- Update the operator-facing NO_PROGRESS error message to explicitly mention “without new file reads between repeats”.
- Add unit tests pinning both the positive (progress via file reads) and negative (no progress) cases; bump version + changelog for v1.16.0.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/tools/ask-agentic.tool.ts | Refines NO_PROGRESS dedupe to be content-aware via filesReadSet.size snapshots and per-signature state; updates error message. |
| test/unit/ask-agentic.test.ts | Adds two unit tests pinning Phase B behavior (progress resets count; true no-progress still trips). |
| CHANGELOG.md | Documents v1.16.0 behavior change, rationale, and coverage notes. |
| package.json | Bumps package version to 1.16.0. |
| server.json | Bumps server/package version references to 1.16.0. |
> ### Behavioural impact
>
> - **Wins**: Gemini 3 Pro multi-file investigations that re-issue the same `grep`/`list_directory` between `read_file` calls now run to completion (or to maxIterations + rescue) instead of bailing prematurely. Today's empirical regression on the v1.14.1 self-review prompt (50% NO_PROGRESS rate at threshold-3, partially mitigated by threshold-5 in v1.14.4) is now structurally addressed.
> - **Negative pin preserved**: when the model truly is stuck (5× identical signature with NO new `read_file` between), dedupe still trips. The new test `content-aware: 5× same signature with NO file growth between WINS trips dedupe` pins this.
Copilot review comment:
Changelog entry references the new test name with “between WINS trips dedupe”, which looks like a typo and won’t match the actual test title once corrected. Please update this snippet to use the corrected wording so the changelog remains searchable/accurate.
> 2 net new test cases in `test/unit/ask-agentic.test.ts`:
>
> - `content-aware: same signature does NOT trip dedupe when read_file growth happens between repeats` — scripts 5× same `grep` with `read_file` interspersed; loop reaches the scripted final-text on iter 6 (would have tripped at iter 5 pre-Phase-B).
> - `content-aware: 5× same signature with NO file growth between WINS trips dedupe` — negative pin: pure-grep loop with no file reads still bails at threshold-5. Also pins the new error message wording.
Copilot review comment:
Same typo appears again in the Coverage bullet for the test name (“between WINS trips dedupe”). Please update to the corrected wording to keep changelog/test names consistent.
Suggested change, replacing:
> - `content-aware: 5× same signature with NO file growth between WINS trips dedupe` — negative pin: pure-grep loop with no file reads still bails at threshold-5. Also pins the new error message wording.

with:
> - `content-aware: 5× same signature with NO file growth between repeats trips dedupe` — negative pin: pure-grep loop with no file reads still bails at threshold-5. Also pins the new error message wording.
> * number>` tripped on N consecutive identical signatures regardless
> * of whether the model was making progress between repeats. That
> * was empirically wrong on 2026-04-30 — Gemini 3 Pro under HIGH
> * thinking re-runs the same `grep` while ALSO reading new files,
> * which IS progress, but the simple counter still tripped.
Copilot review comment:
The comment says the pre-Phase-B counter “tripped on N consecutive identical signatures”, but the implementation (and the tests above) treat it as a per-signature repeat counter rather than requiring consecutive identical calls. Please reword this to avoid implying a stricter “consecutive” requirement than what the code enforces (e.g., describe it as repeating the same signature without intervening new file reads).
Suggested change, replacing:
> * number>` tripped on N consecutive identical signatures regardless
> * of whether the model was making progress between repeats. That
> * was empirically wrong on 2026-04-30 — Gemini 3 Pro under HIGH
> * thinking re-runs the same `grep` while ALSO reading new files,
> * which IS progress, but the simple counter still tripped.

with:
> * number>` tracked repeat occurrences of the same signature and
> * tripped after N repeats even if the model had made progress
> * elsewhere since an earlier occurrence. That was empirically wrong
> * on 2026-04-30 — Gemini 3 Pro under HIGH thinking re-runs the same
> * `grep` while ALSO reading new files, which IS progress, but the
> * simple counter still tripped.
> // `NO_PROGRESS_CALL_THRESHOLD` times CONSECUTIVELY without the
> // `filesReadSet` growing between repeats. If `filesReadSet` grew,
> // the model is exploring (productive) — reset count to 1.
Copilot review comment:
This comment says the guard triggers after a signature is issued NO_PROGRESS_CALL_THRESHOLD times “CONSECUTIVELY”, but the logic increments per signature across the loop and only resets on filesReadSet growth. To prevent future confusion, consider removing “CONSECUTIVELY” or clarifying it as “repeated N times without any new file reads between repeats”.
Suggested change, replacing:
> // `NO_PROGRESS_CALL_THRESHOLD` times CONSECUTIVELY without the
> // `filesReadSet` growing between repeats. If `filesReadSet` grew,
> // the model is exploring (productive) — reset count to 1.

with:
> // `NO_PROGRESS_CALL_THRESHOLD` times without any new
> // `filesReadSet` growth between repeats. If `filesReadSet` grew,
> // the model is exploring (productive) — reset that signature's
> // count to 1.
> expect(result.structuredContent?.filesRead).toBe(4); // a, b, c, d
> });
>
> it('content-aware: 5× same signature with NO file growth between WINS trips dedupe (v1.16.0 P2 Phase B)', async () => {
Copilot review comment:
Test name contains an apparent typo: “between WINS trips dedupe”. Consider renaming to something like “between repeats still trips dedupe” to keep the description clear and consistent with the behavior being pinned.
Suggested change, replacing:
> it('content-aware: 5× same signature with NO file growth between WINS trips dedupe (v1.16.0 P2 Phase B)', async () => {

with:
> it('content-aware: 5× same signature with NO file growth between repeats trips dedupe (v1.16.0 P2 Phase B)', async () => {
AA1 (3-of-3 cross-corroborated TP HIGH per grok escalation): within-iter parallel duplicates compound the counter.
Pre-fix: 5× identical signature in ONE parallel batch instantly trips AGENTIC_NO_PROGRESS even though zero stuck iters elapsed.
Fix: dedupe signatures via a Set BEFORE counting; same-iter duplicates count as +1 toward the threshold, not +N.

AA2 (grok F2 MEDIUM, ACCEPTED-DEFERRED): read_file as the sole progress oracle assumes a read-only tool surface. ask_agentic is intentionally read-only (no write_file / terminal exec), so the failure mode grok described doesn't exist in our codebase. Future-proof guidance documented: when adding write tools, expand the oracle.

Coverage: 759 pass | 9 skipped (was 758). +1 test pinning AA1 (within-iter parallel batch must not compound).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
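The AA1 fix can be sketched by collapsing each iteration's batch into a `Set` before updating counters. This is a hedged illustration: `SigState`, `countIteration`, and the signature string format are hypothetical names, not the actual implementation.

```typescript
interface SigState {
  count: number;
  lastFilesReadSize: number;
}

const NO_PROGRESS_CALL_THRESHOLD = 5;

// AA1: parallel duplicates within one iteration collapse to a single
// occurrence, so each distinct signature advances by at most +1 per iteration.
function countIteration(
  signatureCounts: Map<string, SigState>,
  batchSignatures: string[],
  filesReadSize: number, // filesReadSet.size snapshot taken once after tool execution
): string | null {
  const unique = Array.from(new Set(batchSignatures)); // dedupe BEFORE counting
  for (const sig of unique) {
    const prev = signatureCounts.get(sig);
    const count =
      prev === undefined || prev.lastFilesReadSize < filesReadSize ? 1 : prev.count + 1;
    signatureCounts.set(sig, { count, lastFilesReadSize: filesReadSize });
    if (count >= NO_PROGRESS_CALL_THRESHOLD) return sig;
  }
  return null;
}

// Usage: five identical greps issued in ONE parallel batch.
const GREP_SIG = 'grep:{"pattern":"X"}';
const counts = new Map<string, SigState>();
const batch = new Array<string>(5).fill(GREP_SIG);
console.log(countIteration(counts, batch, 0)); // null
console.log(counts.get(GREP_SIG)?.count); // 1
```

With the batch collapsed, a genuinely stuck signature still reaches the threshold after five iterations without file growth; it just can no longer reach it within a single iteration.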
Copilot found 5 NIT findings, all doc/comment/test-name typos:
- "between WINS trips dedupe" → "between repeats still trips dedupe" (typo I introduced; appears in the test name + 2× in CHANGELOG).
- Comment said "tripped on N consecutive identical signatures"; the code doesn't require strict consecutiveness and only checks growth since the last occurrence of that signature. Reworded: "tracked repeat occurrences of the same signature and tripped after N repeats even if the model had made progress elsewhere".
- Comment said "issued N times CONSECUTIVELY"; same overstated wording. Reworded: "issued N times without any new filesReadSet growth between repeats".

Pure doc/wording; no code/behaviour change. Test count unchanged (759 pass | 9 skipped). Lint + typecheck green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Ships the planned Phase B content-aware refactor of the no-progress dedupe (queued since the v1.14.4 stop-gap). The same call signature counts toward the threshold ONLY if no `read_file` against an unseen path happened between repeats. Closes today's empirical Gemini 3 Pro regression where multi-file investigations alternated `grep` + `read_file` and tripped the simple counter prematurely.
- `signatureCounts` Map type changed `<string, number>` → `<string, { count: number; lastFilesReadSize: number }>`.
- Per-iter snapshot of `filesReadSet.size` taken once after tool execution.
- Same signature with growth → count=1 (progress). Same signature without growth → count=prev.count+1 (potentially stuck).
- Hard caps (`maxIterations`, `maxTotalInputTokens`) unchanged.
- Error message now reads `"... repeated N times without new file reads between repeats"`.

Scope
Testing
758 passed | 9 skipped (was 756 in v1.15.3 — +2 net new test cases).
- Positive pin: 5× same `grep` + `read_file` interspersed → loop continues.
- Negative pin: 5× same `grep` + no file growth → still bails.
- Pre-publish audit: clean.
- Unit tests added/updated
- Integration tests added/updated (P10 covers the rescue payload contract; P2 Phase B tested via mocks)
- `npm run lint` passes
- `npm run typecheck` passes
- `npm run test` passes

Backwards compatibility
- `structuredContent` fields (`repeatedSignature`, `filesRead`, `apiCalls`, etc.) unchanged.
- … `@1.15.3`. The v1.16.x branch will receive the R2 + T33 follow-ups.

Workflow
Standard `/6step` rigor. Bigger refactor than recent patches; running the full 3-way Round-1 + Round-2 + Copilot review pipeline.

🤖 Generated with Claude Code