
v1.16.0 — content-aware NO_PROGRESS dedupe (P2 Phase B)#56

Merged: qmt merged 3 commits into main from v1.16.0-content-aware-no-progress on Apr 30, 2026

@qmt qmt commented Apr 30, 2026

Summary

Ships the planned Phase B content-aware refactor of the no-progress dedupe (queued since the v1.14.4 stop-gap). A repeated call signature counts toward the threshold ONLY if no read_file against an unseen path happened between repeats. Fixes the Gemini 3 Pro regression observed today (2026-04-30), where multi-file investigations alternated grep + read_file and tripped the simple counter prematurely.

  • Implementation: signatureCounts Map type changed from <string, number> to <string, { count: number; lastFilesReadSize: number }>. Per-iter snapshot of filesReadSet.size taken once after tool execution. Same signature with growth → count=1 (progress). Same signature without growth → count=prev.count+1 (potentially stuck).
  • Threshold (5) unchanged. Hard upper bounds (maxIterations, maxTotalInputTokens) unchanged.
  • Error message updated so operators see why: "... repeated N times without new file reads between repeats".
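
The per-signature update described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in src/tools/ask-agentic.tool.ts: `recordSignature` and `SignatureState` are hypothetical names, and tripping at `count >= NO_PROGRESS_CALL_THRESHOLD` is an assumption inferred from the "5× same grep + no file growth → still bails" pin.

```typescript
// Sketch only. `SignatureState` and `recordSignature` are illustrative names;
// `signatureCounts`, `count`, `lastFilesReadSize` and the threshold of 5
// come from the PR description.
type SignatureState = { count: number; lastFilesReadSize: number };

const NO_PROGRESS_CALL_THRESHOLD = 5;

/**
 * Records one occurrence of `signature` and returns true when the
 * no-progress guard should trip (assumed: count reaches the threshold).
 */
function recordSignature(
  signatureCounts: Map<string, SignatureState>,
  signature: string,
  filesReadSize: number, // snapshot of filesReadSet.size, taken after tool execution
): boolean {
  const prev = signatureCounts.get(signature);
  // No growth since this signature was last seen: potentially stuck, so increment.
  // First sighting, or the set grew in between: treat as progress, count = 1.
  const count =
    prev !== undefined && prev.lastFilesReadSize >= filesReadSize
      ? prev.count + 1
      : 1;
  signatureCounts.set(signature, { count, lastFilesReadSize: filesReadSize });
  return count >= NO_PROGRESS_CALL_THRESHOLD;
}
```

Storing the snapshot next to the count keeps the check local to each signature: a signature only needs to know how large filesReadSet was the last time it was seen.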

Scope

  • Bug fix (refines no-progress signal — closes today's empirical regression)
  • New feature
  • Refactor
  • Documentation
  • Breaking change

Testing

  • 758 passed | 9 skipped (was 756 in v1.15.3 — +2 net new test cases).

  • Positive pin: 5× same grep + read_file interspersed → loop continues.

  • Negative pin: 5× same grep + no file growth → still bails.

  • Pre-publish audit: clean.

  • Unit tests added/updated

  • Integration tests added/updated (P10 covers the rescue payload contract; P2 Phase B tested via mocks)

  • npm run lint passes

  • npm run typecheck passes

  • npm run test passes
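
As a rough illustration of what the positive and negative pins assert, the toy loop below replays a scripted sequence of iterations against the content-aware counter. Everything here is an assumption made for the example: `simulate`, `ToolCall`, the signature format, and the scripts themselves are hypothetical, and the real tests drive ask_agentic through mocks rather than a stand-alone function. The sketch also assumes no duplicate signatures inside a single iteration.

```typescript
// Toy replay of the two pinned scenarios. All names are illustrative.
type ToolCall = { name: string; args: Record<string, string> };
type Outcome =
  | { status: "completed" }
  | { status: "NO_PROGRESS"; iteration: number };

const THRESHOLD = 5;

function simulate(iterations: ToolCall[][]): Outcome {
  const filesReadSet = new Set<string>();
  const signatureCounts = new Map<string, { count: number; lastFilesReadSize: number }>();
  let iteration = 0;
  for (const calls of iterations) {
    iteration++;
    // "Execute" the tools: only read_file touches filesReadSet.
    for (const call of calls) {
      const path = call.args["path"];
      if (call.name === "read_file" && path !== undefined) filesReadSet.add(path);
    }
    // Snapshot once per iteration, after tool execution.
    const snapshot = filesReadSet.size;
    for (const call of calls) {
      const sig = `${call.name}:${JSON.stringify(call.args)}`;
      const prev = signatureCounts.get(sig);
      const count =
        prev !== undefined && prev.lastFilesReadSize >= snapshot ? prev.count + 1 : 1;
      signatureCounts.set(sig, { count, lastFilesReadSize: snapshot });
      if (count >= THRESHOLD) return { status: "NO_PROGRESS", iteration };
    }
  }
  return { status: "completed" };
}

const grep: ToolCall = { name: "grep", args: { pattern: "FunctionCallingConfigMode" } };
const read = (path: string): ToolCall => ({ name: "read_file", args: { path } });

// Positive scenario: the same grep five times, with new files read along the way.
const positive = simulate([
  [grep, read("a")], [grep, read("b")], [grep, read("c")], [grep, read("d")], [grep],
]);

// Negative scenario: the same grep five times and no file reads at all.
const negative = simulate([[grep], [grep], [grep], [grep], [grep]]);
```

In the positive script the grep count resets each time a new file lands, so the loop runs through all five iterations; in the negative script the count climbs to the threshold on the fifth iteration.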

Backwards compatibility

  • No API/schema change. structuredContent fields (repeatedSignature, filesRead, apiCalls, etc.) unchanged.
  • Behaviour change observable only in dedupe-trip decision: agentic calls that previously bailed at NO_PROGRESS while reading new files now run to completion (or to maxIterations + rescue).
  • Operators wanting to revert to the v1.15.x simple counter can pin @1.15.3. v1.16.x branch will receive R2 + T33 followups.

Workflow

Standard /6step rigor. Bigger refactor than recent patches; running the full 3-way Round-1 + Round-2 + Copilot review pipeline.

🤖 Generated with Claude Code

The v1.14.4 threshold bump (3 → 5) was a stop-gap. v1.16.0 ships the
planned Phase B content-aware refactor:

  Same call signature counts toward the no-progress threshold ONLY
  if no `read_file` against an unseen path happened between repeats.
  If filesReadSet grew between identical signatures, count resets to 1.

Why: today's empirical Gemini 3 Pro under HIGH thinking re-runs the
same grep ("FunctionCallingConfigMode") while ALSO reading new files
between iterations. That IS progress. The simple counter (v1.14.4)
tripped before maxIterations, killing the rescue path. Phase B
distinguishes "exploring with repeated lookups" from "genuinely stuck".

Why read_file specifically: list_directory/find_files/grep are
exploration without commitment. read_file is the progress signal — it
grows the conversation the rescue path will replay if the loop
exhausts iterations.
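
A minimal sketch of that progress oracle, assuming the set is keyed by the path string exactly as passed to the tool (how the real code normalises paths is not stated here). `noteToolCall` is a hypothetical helper name:

```typescript
/**
 * Returns true when the call counts as progress, i.e. filesReadSet grew.
 * Hypothetical helper: only read_file against an unseen path qualifies.
 */
function noteToolCall(
  filesReadSet: Set<string>,
  toolName: string,
  path?: string,
): boolean {
  // list_directory / find_files / grep are exploration, not progress.
  if (toolName !== "read_file" || path === undefined) return false;
  const before = filesReadSet.size;
  filesReadSet.add(path); // a Set ignores paths it has already seen
  return filesReadSet.size > before;
}

const seen = new Set<string>();
const grepProgress = noteToolCall(seen, "grep");                     // exploration only
const firstRead = noteToolCall(seen, "read_file", "src/a.ts");       // unseen path
const repeatRead = noteToolCall(seen, "read_file", "src/a.ts");      // already seen
const listProgress = noteToolCall(seen, "list_directory", "src");    // exploration only
```

Because the signal is set growth rather than "a read_file call happened", re-reading a file that is already in the set does not reset any counter.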

Implementation:
- signatureCounts Map type: <string, number> → <string, {count, lastFilesReadSize}>
- Capture filesReadSet.size ONCE per iter (after tool execution) so
  all signatures within the same iter see the same snapshot.
- On each sig: if prev.lastFilesReadSize < current → count=1 (progress);
  else count=prev.count+1.
- Threshold (5) unchanged. Hard caps (maxIterations, maxTotalInputTokens)
  unchanged.
- Error message now reads "... repeated N times without new file reads
  between repeats" so operators see why.

Coverage: 758 pass | 9 skipped (was 756). +2 test cases:
- content-aware positive: 5× same grep + read_file interspersed → continues
- content-aware negative: 5× same grep + NO file growth → still trips

No API/schema change. No structuredContent field changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Implements Phase B of ask_agentic’s NO_PROGRESS detection by making repeated tool-call signature dedupe “content-aware”: identical signatures only count toward the threshold when no new read_file (unseen path) occurred between repeats, addressing premature bails during multi-file investigations.

Changes:

  • Update ask_agentic no-progress tracking to store { count, lastFilesReadSize } per signature and reset counts when filesReadSet grows.
  • Update operator-facing NO_PROGRESS error message to explicitly mention “without new file reads between repeats”.
  • Add unit tests pinning both the positive (progress via file reads) and negative (no progress) cases; bump version + changelog for v1.16.0.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/tools/ask-agentic.tool.ts | Refines NO_PROGRESS dedupe to be content-aware via filesReadSet.size snapshots and per-signature state; updates error message. |
| test/unit/ask-agentic.test.ts | Adds two unit tests pinning Phase B behavior (progress resets count; true no-progress still trips). |
| CHANGELOG.md | Documents v1.16.0 behavior change, rationale, and coverage notes. |
| package.json | Bumps package version to 1.16.0. |
| server.json | Bumps server/package version references to 1.16.0. |


Comment thread CHANGELOG.md Outdated
### Behavioural impact

- **Wins**: Gemini 3 Pro multi-file investigations that re-issue the same `grep`/`list_directory` between `read_file` calls now run to completion (or to maxIterations + rescue) instead of bailing prematurely. Today's empirical regression on the v1.14.1 self-review prompt (50% NO_PROGRESS rate at threshold-3, partially mitigated by threshold-5 in v1.14.4) is now structurally addressed.
- **Negative pin preserved**: when the model truly is stuck (5× identical signature with NO new `read_file` between), dedupe still trips. The new test `content-aware: 5× same signature with NO file growth between WINS trips dedupe` pins this.

Copilot AI Apr 30, 2026


Changelog entry references the new test name with “between WINS trips dedupe”, which looks like a typo and won’t match the actual test title once corrected. Please update this snippet to use the corrected wording so the changelog remains searchable/accurate.

Comment thread CHANGELOG.md Outdated
2 net new test cases in `test/unit/ask-agentic.test.ts`:

- `content-aware: same signature does NOT trip dedupe when read_file growth happens between repeats` — scripts 5× same `grep` with `read_file` interspersed; loop reaches the scripted final-text on iter 6 (would have tripped at iter 5 pre-Phase-B).
- `content-aware: 5× same signature with NO file growth between WINS trips dedupe` — negative pin: pure-grep loop with no file reads still bails at threshold-5. Also pins the new error message wording.

Copilot AI Apr 30, 2026


Same typo appears again in the Coverage bullet for the test name (“between WINS trips dedupe”). Please update to the corrected wording to keep changelog/test names consistent.

Suggested change
- `content-aware: 5× same signature with NO file growth between WINS trips dedupe` — negative pin: pure-grep loop with no file reads still bails at threshold-5. Also pins the new error message wording.
- `content-aware: 5× same signature with NO file growth between repeats trips dedupe` — negative pin: pure-grep loop with no file reads still bails at threshold-5. Also pins the new error message wording.

Comment thread src/tools/ask-agentic.tool.ts Outdated
Comment on lines +666 to +670
* number>` tripped on N consecutive identical signatures regardless
* of whether the model was making progress between repeats. That
* was empirically wrong on 2026-04-30 — Gemini 3 Pro under HIGH
* thinking re-runs the same `grep` while ALSO reading new files,
* which IS progress, but the simple counter still tripped.

Copilot AI Apr 30, 2026


The comment says the pre-Phase-B counter “tripped on N consecutive identical signatures”, but the implementation (and the tests above) treat it as a per-signature repeat counter rather than requiring consecutive identical calls. Please reword this to avoid implying a stricter “consecutive” requirement than what the code enforces (e.g., describe it as repeating the same signature without intervening new file reads).

Suggested change
* number>` tripped on N consecutive identical signatures regardless
* of whether the model was making progress between repeats. That
* was empirically wrong on 2026-04-30 Gemini 3 Pro under HIGH
* thinking re-runs the same `grep` while ALSO reading new files,
* which IS progress, but the simple counter still tripped.
* number>` tracked repeat occurrences of the same signature and
* tripped after N repeats even if the model had made progress
* elsewhere since an earlier occurrence. That was empirically wrong
* on 2026-04-30 Gemini 3 Pro under HIGH thinking re-runs the same
* `grep` while ALSO reading new files, which IS progress, but the
* simple counter still tripped.
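
To make the "repeat occurrences, not consecutive calls" point concrete, here is a small sketch of a per-signature counter in the shape the comment describes for the pre-Phase-B `Map<string, number>`. It is an illustration under that description only; `firstTripIndex` is a hypothetical helper, not a function from the codebase.

```typescript
// Sketch of a simple per-signature repeat counter (pre-Phase-B shape).
// Returns the index of the call that trips the guard, or -1 if none does.
function firstTripIndex(signatures: string[], threshold: number): number {
  const signatureCounts = new Map<string, number>();
  let index = 0;
  for (const sig of signatures) {
    // Counts accumulate per signature; other signatures in between do not reset them.
    const count = (signatureCounts.get(sig) ?? 0) + 1;
    signatureCounts.set(sig, count);
    if (count >= threshold) return index;
    index++;
  }
  return -1;
}

// "A" is never issued twice in a row, yet its fifth occurrence still trips.
const alternating = ["A", "B", "A", "B", "A", "B", "A", "B", "A"];
const tripAt = firstTripIndex(alternating, 5);
```

Here `tripAt` is 8, the position of the fifth "A", even though no two "A" calls are adjacent. That is why "consecutive" overstates what the counter requires.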

Comment thread src/tools/ask-agentic.tool.ts Outdated
Comment on lines +983 to +985
// `NO_PROGRESS_CALL_THRESHOLD` times CONSECUTIVELY without the
// `filesReadSet` growing between repeats. If `filesReadSet` grew,
// the model is exploring (productive) — reset count to 1.

Copilot AI Apr 30, 2026


This comment says the guard triggers after a signature is issued NO_PROGRESS_CALL_THRESHOLD times “CONSECUTIVELY”, but the logic increments per signature across the loop and only resets on filesReadSet growth. To prevent future confusion, consider removing “CONSECUTIVELY” or clarifying it as “repeated N times without any new file reads between repeats”.

Suggested change
// `NO_PROGRESS_CALL_THRESHOLD` times CONSECUTIVELY without the
// `filesReadSet` growing between repeats. If `filesReadSet` grew,
// the model is exploring (productive) — reset count to 1.
// `NO_PROGRESS_CALL_THRESHOLD` times without any new
// `filesReadSet` growth between repeats. If `filesReadSet` grew,
// the model is exploring (productive) — reset that signature's
// count to 1.

Comment thread test/unit/ask-agentic.test.ts Outdated
expect(result.structuredContent?.filesRead).toBe(4); // a, b, c, d
});

it('content-aware: 5× same signature with NO file growth between WINS trips dedupe (v1.16.0 P2 Phase B)', async () => {

Copilot AI Apr 30, 2026


Test name contains an apparent typo: “between WINS trips dedupe”. Consider renaming to something like “between repeats still trips dedupe” to keep the description clear and consistent with the behavior being pinned.

Suggested change
it('content-aware: 5× same signature with NO file growth between WINS trips dedupe (v1.16.0 P2 Phase B)', async () => {
it('content-aware: 5× same signature with NO file growth between repeats trips dedupe (v1.16.0 P2 Phase B)', async () => {

qmt and others added 2 commits April 30, 2026 22:43
AA1 (3-of-3 cross-corroborated TP HIGH per grok escalation): within-iter
parallel duplicates compound counter. Pre-fix: 5× identical signature in
ONE parallel batch instantly trips AGENTIC_NO_PROGRESS even though zero
stuck-iters elapsed. Fix: dedupe signatures via Set BEFORE counting —
same-iter duplicates count as +1 toward threshold, not +N.
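
The AA1 fix can be sketched as below: collapse the iteration's signatures into a Set before touching the counters, so a parallel batch of identical calls contributes one occurrence. This is a minimal sketch under assumptions. `countIteration` and `SigState` are hypothetical names, and returning the offending signature is just a convenient shape for the example.

```typescript
// Sketch only. `countIteration` and `SigState` are illustrative names.
type SigState = { count: number; lastFilesReadSize: number };

const NO_PROGRESS_CALL_THRESHOLD = 5;

/**
 * Counts one iteration's tool-call signatures and returns the signature
 * that tripped the guard, or undefined if none did.
 */
function countIteration(
  signatureCounts: Map<string, SigState>,
  signaturesThisIter: string[],
  filesReadSize: number, // filesReadSet.size snapshot for this iteration
): string | undefined {
  // Dedupe BEFORE counting: N identical calls in one batch count as +1, not +N.
  const unique = Array.from(new Set(signaturesThisIter));
  for (const sig of unique) {
    const prev = signatureCounts.get(sig);
    const count =
      prev !== undefined && prev.lastFilesReadSize >= filesReadSize
        ? prev.count + 1
        : 1;
    signatureCounts.set(sig, { count, lastFilesReadSize: filesReadSize });
    if (count >= NO_PROGRESS_CALL_THRESHOLD) return sig;
  }
  return undefined;
}

// One parallel batch with five identical calls: counted once, no trip.
const batchCounts = new Map<string, SigState>();
const afterBatch = countIteration(
  batchCounts,
  ["grep:x", "grep:x", "grep:x", "grep:x", "grep:x"],
  0,
);
```

Without the Set, the same batch would push the count from 0 to 5 inside a single iteration, which is the instant trip the commit message describes.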

AA2 (grok F2 MEDIUM, ACCEPTED-DEFERRED): read_file as sole progress
oracle assumes read-only tool surface. ask_agentic is intentionally
read-only — no write_file / terminal exec. The failure mode grok
described doesn't exist in our codebase. Future-proof guidance
documented: when adding write tools, expand the oracle.

Coverage: 759 pass | 9 skipped (was 758). +1 test pinning AA1
(within-iter parallel batch must not compound).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot found 5 NIT findings, all doc/comment/test-name typos:

- "between WINS trips dedupe" → "between repeats still trips dedupe"
  (typo I introduced; appears in test name + 2× in CHANGELOG).
- Comment said "tripped on N consecutive identical signatures" — code
  doesn't require strict consecutiveness; only checks growth-since-last
  for that signature. Reworded: "tracked repeat occurrences of the
  same signature and tripped after N repeats even if the model had
  made progress elsewhere".
- Comment said "issued N times CONSECUTIVELY" — same wording overstating.
  Reworded: "issued N times without any new filesReadSet growth between
  repeats".

Pure doc/wording — no code/behaviour change. Test count unchanged
(759 pass | 9 skipped). Lint + typecheck green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@qmt qmt merged commit 42109b9 into main Apr 30, 2026
4 checks passed
@qmt qmt deleted the v1.16.0-content-aware-no-progress branch April 30, 2026 20:47