fix(ai): preserve Anthropic server tool replay #48

Merged: code-yeongyu merged 2 commits into main from codex/fix-anthropic-thinking-replay on Jun 18, 2026

code-yeongyu (Owner) commented Jun 17, 2026

Summary

Fix Anthropic same-model replay for assistant messages that contain provider-native server tool blocks and signed thinking.

The failing session 019ed4f0-2914-701d-9614-dd0f58c0b103 ended with Anthropic rejecting the next request because thinking blocks had been modified. The local replay serializer preserved signed thinking text, but it dropped same-model Anthropic providerNative blocks such as server_tool_use and web_search_tool_result. That changed the protected assistant content sequence before the follow-up tool-result request.

This patch keeps replayable Anthropic provider-native raw blocks only for the same provider/api/model, while continuing to drop non-portable cross-provider native blocks.
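
The rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `ModelIdentity` and `ContentBlock` shapes, the `filterForReplay` helper, and the `origin`/`raw` field names are all hypothetical, and the whitelist contains only the two block types named in this description.

```typescript
// Hypothetical shapes for illustration only; the real types in the repo may differ.
type ModelIdentity = { provider: string; api: string; model: string };

type ContentBlock =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string; signature: string }
  | { type: "toolCall"; id: string; name: string; arguments: Record<string, unknown> }
  | {
      type: "providerNative";
      origin: ModelIdentity; // assumed: the provider/api/model that produced the block
      raw: { type: string; [key: string]: unknown }; // assumed: untouched provider payload
    };

// Assumed whitelist: only the block types mentioned in the PR description.
const REPLAYABLE_ANTHROPIC_NATIVE_TYPES = new Set<string>([
  "server_tool_use",
  "web_search_tool_result",
]);

function sameModel(a: ModelIdentity, b: ModelIdentity): boolean {
  return a.provider === b.provider && a.api === b.api && a.model === b.model;
}

// Keeps every non-native block in its original position. A provider-native block
// survives only when it came from Anthropic, the replay target is the exact same
// provider/api/model, and its raw type is whitelisted. Everything else is dropped
// as non-portable.
function filterForReplay(blocks: ContentBlock[], target: ModelIdentity): ContentBlock[] {
  return blocks.filter((block) => {
    if (block.type !== "providerNative") return true;
    return (
      block.origin.provider === "anthropic" &&
      sameModel(block.origin, target) &&
      REPLAYABLE_ANTHROPIC_NATIVE_TYPES.has(block.raw.type)
    );
  });
}
```

Because the filter never reorders or rewrites the blocks it keeps, a same-model replay sends the assistant content sequence in the order the provider originally returned it, which is what the signed thinking check depends on.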

QA Evidence

Evidence directory: local-ignore/qa-evidence/20260617-thinking-session-fix/

  • Failing-first: red-anthropic-provider-native.txt showed same-model provider-native server blocks were dropped from the Anthropic replay payload.
  • Regression: green-anthropic-provider-native-replay-after-review.txt passed test/anthropic-provider-native-replay.test.ts with the real failure shape: server native blocks + signed thinking + text + tool call + tool result.
  • Focused hermetic suite: green-focused-ai-tests-hermetic.txt passed 13 tests with 1 live Anthropic E2E skipped.
  • Full check: npm-run-check-after-install.txt passed npm run check after npm install --ignore-scripts hydrated already-declared local @types packages.
  • Senpi real CLI QA: senpi-qa-mock-loop-anthropic-self.txt and senpi-qa-mock-loop-anthropic-tool.txt passed the Anthropic mock loop and multi-turn tool loop with zero real provider calls and real auth unchanged.
  • Review: ultrawork reviewer approval recorded after the regression test was staged.

Summary by cubic

Fixes Anthropic same-model replay to keep server-native tool blocks around signed thinking so follow-up tool-result requests are accepted. Restores stable multi-turn tool flows and scopes PR530 benchmark suites by package for faster CI.

  • Bug Fixes
    • Preserve a whitelist of Anthropic provider-native blocks when replaying the same provider/api/model (e.g., server_tool_use, web_search_tool_result); continue dropping cross-provider native blocks.
    • Added a regression test to verify signed thinking + native server blocks + tool use replay; removed the obsolete test that expected provider-native blocks to be dropped.

Written for commit 856288b. Summary will update on new commits.


code-yeongyu merged commit 5f1c892 into main on Jun 18, 2026
4 checks passed
code-yeongyu deleted the codex/fix-anthropic-thinking-replay branch on June 18, 2026 01:42