feat: advance prompt-cache breakpoint within the tool loop by njbrake · Pull Request #1437 · mozilla-ai/clawbolt

njbrake · 2026-06-10T08:33:14Z

Note: this PR was drafted by Claude via back-and-forth with @njbrake. The reasoning and decisions are his; the prose is Claude's.

Description

Fixes #1430.

The only message-side cache breakpoint (apply_history_cache_breakpoint) sits before the current inbound user turn and never moves during the tool loop. Every round N > 0 therefore re-sent the current turn (which carries the full dynamic context: MEMORY.md up to 25KB, integration status, cross-session context) plus all prior rounds' tool calls and results as fresh, uncached input. With max_tool_rounds=10 and large tool results, the per-turn cost grows quadratically with round count.

New apply_in_turn_cache_breakpoint stamps a cache_control marker on the trailing tool_result block when the request ends in tool results (rounds N > 0). Round N then reads the current turn plus rounds 0..N-1 from cache and pays cache-write only on the newest round's content.

Safety properties:

No marker accumulation. Message dicts are re-serialized from typed AgentMessage objects on every round (messages_to_messages_api), so the marker naturally advances with the loop; it cannot pile up across rounds.
At most four breakpoints per request (system, tools, prior-history tail, in-turn), which is Anthropic's limit. A test asserts the message side carries at most two.
Round 0 unchanged. The request ends in the current user turn (plain string content), so the function is a no-op there and the request shape is byte-identical to today.
Applied on both the main call path and the ContextLengthExceededError retry path.

The existing Agent turn cache summary log line (cache_read_input_tokens) is the production signal to confirm the improvement on multi-round turns.

Type

Checklist

Tests pass (uv run pytest -v) (2890 passed, 2 skipped)
Lint passes (ruff check backend/ && ruff format --check backend/)
New tests added for new functionality
Bug fixes include regression tests (not a bug fix; performance feature)

AI Usage

AI-assisted (describe how): Claude analyzed the per-round cache behavior, implemented the breakpoint rotation, and wrote the tests, with direction and review by @njbrake.
No AI used

Overview

This PR implements an in-turn cache breakpoint feature to improve efficiency in multi-round tool-using agent calls. Previously, when an agent ran multiple rounds of tool calls and tool results, each subsequent round would re-send the entire current user turn (which can include large dynamic context like memory files) and all prior tool calls and results as uncached input, wasting cache capacity and tokens. This PR adds a smart cache marker that allows subsequent rounds to reuse cached content from earlier rounds.

What Changed

Added:

A new cache control function (apply_in_turn_cache_breakpoint) that marks the last tool-result message block in the current turn as cacheable
Logic in the agent core to apply this marker before sending requests with tool results
Comprehensive test coverage validating the marker is placed correctly and doesn't accumulate

Modified:

Agent call pipeline to invoke the new cache breakpoint function on both the main request path and on error retry paths (when context length is exceeded)

No Changes to:

Single-round behavior (marked as "Round 0") — requests ending in user messages behave identically
Overall request structure or message sequencing

Benefits

Reduced token waste: Multi-round tool calls now reuse cached context from earlier rounds, paying cache tokens only once for the current turn's content
Better scalability: Agents with large context (up to 25KB dynamic content per turn) benefit from avoiding repeated uncached sends
Maintained safety: The implementation ensures at most four cache breakpoints per request (Anthropic limit) and prevents marker accumulation
Backward compatible: Single-round requests and existing behavior remain unchanged

Technical Details

The implementation:

Stamps a cache_control marker on the trailing tool_result block in the message list when a request ends in tool results
Performs safety checks to ensure the marker only applies to valid tool-result messages
Leverages message re-serialization each round to prevent marker accumulation
Works with both the main LLM call path and the ContextLengthExceededError retry handler
Maintains the existing prior-history cache breakpoint placement for cross-turn consistency

Tests validate that the marker is correctly placed on tool-result blocks, not applied on round zero, and that when combined with the existing history breakpoint, the total marker count stays within limits.

The only message-side cache breakpoint sat before the current inbound user turn and never moved during the tool loop. Every round N > 0 re-sent the current turn (carrying the full dynamic context: memory up to 25KB, integrations, cross-session) plus all prior rounds' tool calls and results as uncached input. With max_tool_rounds=10 and large tool results the per-turn cost grows quadratically with round count. apply_in_turn_cache_breakpoint stamps a cache_control marker on the trailing tool_result block when the request ends in tool results, so round N reads the current turn plus rounds 0..N-1 from cache and pays cache-write only on the newest round. Message dicts are re-serialized from typed messages every round, so the marker advances with the loop instead of accumulating: at most four breakpoints per request (system, tools, prior-history tail, in-turn), which is Anthropic's limit. Round 0 ends in the current user turn (string content) and is a no-op, same as today. Fixes #1430 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai · 2026-06-10T08:33:26Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 79bcdd28-fe4e-450a-95a1-252c81224f73

📥 Commits

Reviewing files that changed from the base of the PR and between b2f8eb4 and 5195c73.

📒 Files selected for processing (3)

backend/app/agent/core.py
backend/app/services/llm_service.py
tests/test_llm_service.py

Walkthrough

This PR adds an in-turn cache breakpoint helper that marks tool-result blocks for caching during multi-round tool-calling flows. The new apply_in_turn_cache_breakpoint function is imported, integrated into the main LLM request path and its error-recovery flow, and validated with comprehensive tests covering both the happy path and edge cases.

Changes

In-turn cache breakpoint for tool-result rounds

Layer / File(s)	Summary
Cache breakpoint helper `backend/app/services/llm_service.py`	`apply_in_turn_cache_breakpoint` stamps `cache_control` onto the final `tool_result` block when safe, with guards for empty messages, non-list content, and round-zero user turns; returns messages unchanged on edge cases.
LLM pipeline integration `backend/app/agent/core.py`	Import statement and two application points: before the main LLM request in `_call_llm_with_retry` (rounds N > 0), and on trimmed dicts in the `ContextLengthExceededError` recovery path to maintain cache behavior across retries.
Test coverage `tests/test_llm_service.py`	New `TestApplyInTurnCacheBreakpoint` class validates tool-result marking, edge cases (empty lists, round-zero strings, trailing assistants), and combined breakpoint-count limits against Anthropic's four-marker ceiling.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

mozilla-ai/clawbolt#1424: Implements a parallel apply_history_cache_breakpoint helper in the same LLM-service utilities module and applies it in the adjacent message-pipeline code path, sharing the same architecture and cache-control-marker placement pattern.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 45.45% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title follows the required Conventional Commit format with 'feat:' prefix, uses imperative mood, and clearly describes the main change at 58 characters.
Description check	✅ Passed	The PR description is comprehensive, includes all required template sections with checkmarks, clearly explains the change, its rationale, and safety properties.
Linked Issues check	✅ Passed	The changes fully implement the requirements from `#1430`: adds in-turn cache breakpoint rotation, maintains the four-marker limit, preserves round-0 behavior, and applies the fix on both main and retry paths.
Out of Scope Changes check	✅ Passed	All changes directly support the objective of advancing the cache breakpoint within the tool loop; no unrelated modifications are present.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/in-turn-cache-breakpoint

✨ Simplify code

Create PR with simplified code
Commit simplified code in branch feat/in-turn-cache-breakpoint

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Merge branch 'main' into feat/in-turn-cache-breakpoint

aa53822

njbrake merged commit 59bfe1d into main Jun 10, 2026
10 checks passed

njbrake deleted the feat/in-turn-cache-breakpoint branch June 10, 2026 10:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: advance prompt-cache breakpoint within the tool loop#1437

feat: advance prompt-cache breakpoint within the tool loop#1437
njbrake merged 2 commits into
mainfrom
feat/in-turn-cache-breakpoint

njbrake commented Jun 10, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 10, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

njbrake commented Jun 10, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type

Checklist

AI Usage

Overview

What Changed

Benefits

Technical Details

Uh oh!

coderabbitai Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

njbrake commented Jun 10, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 10, 2026 •

edited

Loading