feat: advance prompt-cache breakpoint within the tool loop#1437
Conversation
The only message-side cache breakpoint sat before the current inbound user turn and never moved during the tool loop. Every round N > 0 re-sent the current turn (carrying the full dynamic context: memory up to 25KB, integrations, cross-session) plus all prior rounds' tool calls and results as uncached input. With max_tool_rounds=10 and large tool results the per-turn cost grows quadratically with round count. apply_in_turn_cache_breakpoint stamps a cache_control marker on the trailing tool_result block when the request ends in tool results, so round N reads the current turn plus rounds 0..N-1 from cache and pays cache-write only on the newest round. Message dicts are re-serialized from typed messages every round, so the marker advances with the loop instead of accumulating: at most four breakpoints per request (system, tools, prior-history tail, in-turn), which is Anthropic's limit. Round 0 ends in the current user turn (string content) and is a no-op, same as today. Fixes #1430 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
WalkthroughThis PR adds an in-turn cache breakpoint helper that marks tool-result blocks for caching during multi-round tool-calling flows. The new ChangesIn-turn cache breakpoint for tool-result rounds
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Description
Fixes #1430.
The only message-side cache breakpoint (
apply_history_cache_breakpoint) sits before the current inbound user turn and never moves during the tool loop. Every round N > 0 therefore re-sent the current turn (which carries the full dynamic context: MEMORY.md up to 25KB, integration status, cross-session context) plus all prior rounds' tool calls and results as fresh, uncached input. Withmax_tool_rounds=10and large tool results, the per-turn cost grows quadratically with round count.New
apply_in_turn_cache_breakpointstamps acache_controlmarker on the trailingtool_resultblock when the request ends in tool results (rounds N > 0). Round N then reads the current turn plus rounds 0..N-1 from cache and pays cache-write only on the newest round's content.Safety properties:
AgentMessageobjects on every round (messages_to_messages_api), so the marker naturally advances with the loop; it cannot pile up across rounds.ContextLengthExceededErrorretry path.The existing
Agent turn cache summarylog line (cache_read_input_tokens) is the production signal to confirm the improvement on multi-round turns.Type
Checklist
uv run pytest -v) (2890 passed, 2 skipped)ruff check backend/ && ruff format --check backend/)AI Usage
Overview
This PR implements an in-turn cache breakpoint feature to improve efficiency in multi-round tool-using agent calls. Previously, when an agent ran multiple rounds of tool calls and tool results, each subsequent round would re-send the entire current user turn (which can include large dynamic context like memory files) and all prior tool calls and results as uncached input, wasting cache capacity and tokens. This PR adds a smart cache marker that allows subsequent rounds to reuse cached content from earlier rounds.
What Changed
Added:
apply_in_turn_cache_breakpoint) that marks the last tool-result message block in the current turn as cacheableModified:
No Changes to:
Benefits
Technical Details
The implementation:
cache_controlmarker on the trailingtool_resultblock in the message list when a request ends in tool resultsContextLengthExceededErrorretry handlerTests validate that the marker is correctly placed on tool-result blocks, not applied on round zero, and that when combined with the existing history breakpoint, the total marker count stays within limits.