Skip to content

feat: advance prompt-cache breakpoint within the tool loop#1437

Merged
njbrake merged 2 commits into
mainfrom
feat/in-turn-cache-breakpoint
Jun 10, 2026
Merged

feat: advance prompt-cache breakpoint within the tool loop#1437
njbrake merged 2 commits into
mainfrom
feat/in-turn-cache-breakpoint

Conversation

@njbrake

@njbrake njbrake commented Jun 10, 2026

Copy link
Copy Markdown
Member

Note: this PR was drafted by Claude via back-and-forth with @njbrake. The reasoning and decisions are his; the prose is Claude's.

Description

Fixes #1430.

The only message-side cache breakpoint (apply_history_cache_breakpoint) sits before the current inbound user turn and never moves during the tool loop. Every round N > 0 therefore re-sent the current turn (which carries the full dynamic context: MEMORY.md up to 25KB, integration status, cross-session context) plus all prior rounds' tool calls and results as fresh, uncached input. With max_tool_rounds=10 and large tool results, the per-turn cost grows quadratically with round count.

New apply_in_turn_cache_breakpoint stamps a cache_control marker on the trailing tool_result block when the request ends in tool results (rounds N > 0). Round N then reads the current turn plus rounds 0..N-1 from cache and pays cache-write only on the newest round's content.

Safety properties:

  • No marker accumulation. Message dicts are re-serialized from typed AgentMessage objects on every round (messages_to_messages_api), so the marker naturally advances with the loop; it cannot pile up across rounds.
  • At most four breakpoints per request (system, tools, prior-history tail, in-turn), which is Anthropic's limit. A test asserts the message side carries at most two.
  • Round 0 unchanged. The request ends in the current user turn (plain string content), so the function is a no-op there and the request shape is byte-identical to today.
  • Applied on both the main call path and the ContextLengthExceededError retry path.

The existing Agent turn cache summary log line (cache_read_input_tokens) is the production signal to confirm the improvement on multi-round turns.

Type

  • Feature
  • Bug fix
  • Refactor
  • Test
  • CI/CD
  • Documentation

Checklist

  • Tests pass (uv run pytest -v) (2890 passed, 2 skipped)
  • Lint passes (ruff check backend/ && ruff format --check backend/)
  • New tests added for new functionality
  • Bug fixes include regression tests (not a bug fix; performance feature)

AI Usage

  • AI-assisted (describe how): Claude analyzed the per-round cache behavior, implemented the breakpoint rotation, and wrote the tests, with direction and review by @njbrake.
  • No AI used

Overview

This PR implements an in-turn cache breakpoint feature to improve efficiency in multi-round tool-using agent calls. Previously, when an agent ran multiple rounds of tool calls and tool results, each subsequent round would re-send the entire current user turn (which can include large dynamic context like memory files) and all prior tool calls and results as uncached input, wasting cache capacity and tokens. This PR adds a smart cache marker that allows subsequent rounds to reuse cached content from earlier rounds.

What Changed

Added:

  • A new cache control function (apply_in_turn_cache_breakpoint) that marks the last tool-result message block in the current turn as cacheable
  • Logic in the agent core to apply this marker before sending requests with tool results
  • Comprehensive test coverage validating the marker is placed correctly and doesn't accumulate

Modified:

  • Agent call pipeline to invoke the new cache breakpoint function on both the main request path and on error retry paths (when context length is exceeded)

No Changes to:

  • Single-round behavior (marked as "Round 0") — requests ending in user messages behave identically
  • Overall request structure or message sequencing

Benefits

  • Reduced token waste: Multi-round tool calls now reuse cached context from earlier rounds, paying cache tokens only once for the current turn's content
  • Better scalability: Agents with large context (up to 25KB dynamic content per turn) benefit from avoiding repeated uncached sends
  • Maintained safety: The implementation ensures at most four cache breakpoints per request (Anthropic limit) and prevents marker accumulation
  • Backward compatible: Single-round requests and existing behavior remain unchanged

Technical Details

The implementation:

  • Stamps a cache_control marker on the trailing tool_result block in the message list when a request ends in tool results
  • Performs safety checks to ensure the marker only applies to valid tool-result messages
  • Leverages message re-serialization each round to prevent marker accumulation
  • Works with both the main LLM call path and the ContextLengthExceededError retry handler
  • Maintains the existing prior-history cache breakpoint placement for cross-turn consistency

Tests validate that the marker is correctly placed on tool-result blocks, not applied on round zero, and that when combined with the existing history breakpoint, the total marker count stays within limits.

The only message-side cache breakpoint sat before the current inbound
user turn and never moved during the tool loop. Every round N > 0
re-sent the current turn (carrying the full dynamic context: memory up
to 25KB, integrations, cross-session) plus all prior rounds' tool calls
and results as uncached input. With max_tool_rounds=10 and large tool
results the per-turn cost grows quadratically with round count.

apply_in_turn_cache_breakpoint stamps a cache_control marker on the
trailing tool_result block when the request ends in tool results, so
round N reads the current turn plus rounds 0..N-1 from cache and pays
cache-write only on the newest round. Message dicts are re-serialized
from typed messages every round, so the marker advances with the loop
instead of accumulating: at most four breakpoints per request (system,
tools, prior-history tail, in-turn), which is Anthropic's limit. Round
0 ends in the current user turn (string content) and is a no-op, same
as today.

Fixes #1430

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 79bcdd28-fe4e-450a-95a1-252c81224f73

📥 Commits

Reviewing files that changed from the base of the PR and between b2f8eb4 and 5195c73.

📒 Files selected for processing (3)
  • backend/app/agent/core.py
  • backend/app/services/llm_service.py
  • tests/test_llm_service.py

Walkthrough

This PR adds an in-turn cache breakpoint helper that marks tool-result blocks for caching during multi-round tool-calling flows. The new apply_in_turn_cache_breakpoint function is imported, integrated into the main LLM request path and its error-recovery flow, and validated with comprehensive tests covering both the happy path and edge cases.

Changes

In-turn cache breakpoint for tool-result rounds

Layer / File(s) Summary
Cache breakpoint helper
backend/app/services/llm_service.py
apply_in_turn_cache_breakpoint stamps cache_control onto the final tool_result block when safe, with guards for empty messages, non-list content, and round-zero user turns; returns messages unchanged on edge cases.
LLM pipeline integration
backend/app/agent/core.py
Import statement and two application points: before the main LLM request in _call_llm_with_retry (rounds N > 0), and on trimmed dicts in the ContextLengthExceededError recovery path to maintain cache behavior across retries.
Test coverage
tests/test_llm_service.py
New TestApplyInTurnCacheBreakpoint class validates tool-result marking, edge cases (empty lists, round-zero strings, trailing assistants), and combined breakpoint-count limits against Anthropic's four-marker ceiling.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes


Possibly related PRs

  • mozilla-ai/clawbolt#1424: Implements a parallel apply_history_cache_breakpoint helper in the same LLM-service utilities module and applies it in the adjacent message-pipeline code path, sharing the same architecture and cache-control-marker placement pattern.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 45.45% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title follows the required Conventional Commit format with 'feat:' prefix, uses imperative mood, and clearly describes the main change at 58 characters.
Description check ✅ Passed The PR description is comprehensive, includes all required template sections with checkmarks, clearly explains the change, its rationale, and safety properties.
Linked Issues check ✅ Passed The changes fully implement the requirements from #1430: adds in-turn cache breakpoint rotation, maintains the four-marker limit, preserves round-0 behavior, and applies the fix on both main and retry paths.
Out of Scope Changes check ✅ Passed All changes directly support the objective of advancing the cache breakpoint within the tool loop; no unrelated modifications are present.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/in-turn-cache-breakpoint
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch feat/in-turn-cache-breakpoint

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@njbrake njbrake merged commit 59bfe1d into main Jun 10, 2026
10 checks passed
@njbrake njbrake deleted the feat/in-turn-cache-breakpoint branch June 10, 2026 10:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Advance the prompt-cache breakpoint within the tool loop so rounds reuse this turn's prefix

1 participant