fix(llm): strip Vertex Gemini thought signatures from archival history by juanmichelini · Pull Request #3581 · OpenHands/software-agent-sdk

juanmichelini · 2026-06-09T03:22:20Z

Why

While investigating the high cost of running swebench on litellm_proxy/gemini-3.5-flash (mean $2.81/instance on a slice where all 10 instances resolved; one instance at $11.11), I pulled the conversation event logs and found the dominant cost driver is not iteration count, not the condenser (which never fired on this run), and not the agent flailing. It is Vertex Gemini's thoughtSignature blob being re-shipped in every subsequent prompt.

Mechanism

When Vertex Gemini is used with reasoning_effort, the provider returns a thoughtSignature field on each function-calling turn that encodes the model's internal reasoning state. The signature must be passed back on the immediately following tool-result turn so the model can resume. LiteLLM smuggles it through the OpenAI-shaped tool_call.id as:

call_f0be918123f4462bb482dd9df123__thought__AY89a18oWjPi7IVOiw5FIMB22r9...

The SDK currently stores those ids verbatim on ActionEvent and ObservationEvent, so when events_to_messages builds the history for the next LLM call, every prior signature is re-serialised into every prompt — once in the assistant message's tool_calls and once in the matching tool result's tool_call_id.

Empirical impact

Decompressed results.tar.gz from a real eval run and replayed the events from django__django-11999:

actions: 47, observations: 47
raw tool_call_id bytes:      1,210,168
stripped tool_call_id bytes:     3,492
saved:                       1,206,676 (99.7%)

signatures larger than 1 KB: 14 actions / 14 observations
biggest tcid: 278,100 bytes

The accumulated Metrics.usage_to_metrics["default"] for that instance reports 5,063,835 prompt tokens with only 26 % cache hit rate and 0 cache writes — i.e. 74 % of those tokens are billed at $1.50/M uncached. Re-shipping 1.2 MB of dead signatures across 47 turns is roughly half of that prompt bill.

The same pattern (smaller magnitudes) holds across the other 9 instances on that run — every Gemini turn that uses reasoning is affected.

What this PR does

Adds a post-processing pass at the bottom of LLMConvertibleEvent.events_to_messages:

Walks the produced messages from the end, finds the most recent assistant message that has tool_calls, and records those ids as "kept".
For every other assistant tool_call.id and every tool message tool_call_id not in the kept set, strips everything from the literal marker __thought__ onwards.
The pair stays consistent: assistant and matching tool result are both stripped, or both kept.
Stripping creates a new MessageToolCall via model_copy(update={"id": ...}) so the underlying ActionEvent.tool_call is untouched — the on-disk event log still has the full signature for forensic / replay use.

The marker check (__thought__ substring) is a no-op for Anthropic toolu_*, OpenAI call_* without signatures, ACP ids, and anything else that doesn't carry the marker.

Files

New: openhands-sdk/openhands/sdk/llm/utils/thought_signature.py — THOUGHT_SIGNATURE_MARKER, has_thought_signature(id), strip_thought_signature(id).
Modified: openhands-sdk/openhands/sdk/event/base.py — adds _strip_archival_thought_signatures(messages) and calls it at the end of events_to_messages.
New: tests/sdk/llm/test_thought_signature.py — 13 unit tests for the classifier and stripper (Gemini ids, OpenAI ids, Anthropic ids, empty/None, the 278 KB pathological case, idempotence, multiple markers).
Modified: tests/sdk/event/test_events_to_messages.py — 5 new integration tests in TestThoughtSignatureStripping:
- Older turns get stripped, latest turn keeps its signature.
- Stripped pairs stay consistent (assistant tool_call.id == tool tool_call_id).
- Source ActionEvent.tool_call.id is unchanged after conversion.
- No-op for ids without the marker (Anthropic / OpenAI shape).
- Parallel tool calls within the most-recent assistant turn all keep their signatures.

Test plan

uv run pytest tests/sdk/llm/test_thought_signature.py -v        # 13 passed
uv run pytest tests/sdk/event/test_events_to_messages.py -v     # 20 passed (15 existing + 5 new)
uv run pytest tests/sdk/event/ tests/sdk/llm/ -q                # 942 passed
uv run ruff format <files> && uv run ruff check <files>         # clean
uv run pyright <files>                                          # 0 errors

Also did the real-world replay above as a sanity check.

Scope / what this does not fix

It only strips __thought__<blob> suffixes; it does not change how Vertex prompt caching works. cache_write_tokens=0 and the 26 % implicit-cache hit rate are a separate problem and need a follow-up to wire actual CachedContent.create explicit caching for Vertex.
It does not change reasoning_effort defaults. Lowering it for gemini-3.5-flash is a separate model-config change.
For non-Gemini models the behaviour is byte-for-byte identical to today (the marker is never present).

This PR was created by an AI agent (OpenHands) on behalf of @juanmichelini, following an investigation triggered by the cost analysis in OpenHands/benchmarks#741.

@juanmichelini can click here to continue refining the PR

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22-slim`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:f5efa63-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-f5efa63-python \
  ghcr.io/openhands/agent-server:f5efa63-python

All tags pushed for this build

ghcr.io/openhands/agent-server:f5efa63-golang-amd64
ghcr.io/openhands/agent-server:f5efa635ab520c11155a3ee82629330be0f60452-golang-amd64
ghcr.io/openhands/agent-server:fix-strip-gemini-thought-signatures-from-history-golang-amd64
ghcr.io/openhands/agent-server:f5efa63-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:f5efa63-golang-arm64
ghcr.io/openhands/agent-server:f5efa635ab520c11155a3ee82629330be0f60452-golang-arm64
ghcr.io/openhands/agent-server:fix-strip-gemini-thought-signatures-from-history-golang-arm64
ghcr.io/openhands/agent-server:f5efa63-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:f5efa63-java-amd64
ghcr.io/openhands/agent-server:f5efa635ab520c11155a3ee82629330be0f60452-java-amd64
ghcr.io/openhands/agent-server:fix-strip-gemini-thought-signatures-from-history-java-amd64
ghcr.io/openhands/agent-server:f5efa63-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:f5efa63-java-arm64
ghcr.io/openhands/agent-server:f5efa635ab520c11155a3ee82629330be0f60452-java-arm64
ghcr.io/openhands/agent-server:fix-strip-gemini-thought-signatures-from-history-java-arm64
ghcr.io/openhands/agent-server:f5efa63-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:f5efa63-python-amd64
ghcr.io/openhands/agent-server:f5efa635ab520c11155a3ee82629330be0f60452-python-amd64
ghcr.io/openhands/agent-server:fix-strip-gemini-thought-signatures-from-history-python-amd64
ghcr.io/openhands/agent-server:f5efa63-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:f5efa63-python-arm64
ghcr.io/openhands/agent-server:f5efa635ab520c11155a3ee82629330be0f60452-python-arm64
ghcr.io/openhands/agent-server:fix-strip-gemini-thought-signatures-from-history-python-arm64
ghcr.io/openhands/agent-server:f5efa63-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:f5efa63-golang
ghcr.io/openhands/agent-server:f5efa635ab520c11155a3ee82629330be0f60452-golang
ghcr.io/openhands/agent-server:fix-strip-gemini-thought-signatures-from-history-golang
ghcr.io/openhands/agent-server:f5efa63-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:f5efa63-java
ghcr.io/openhands/agent-server:f5efa635ab520c11155a3ee82629330be0f60452-java
ghcr.io/openhands/agent-server:fix-strip-gemini-thought-signatures-from-history-java
ghcr.io/openhands/agent-server:f5efa63-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:f5efa63-python
ghcr.io/openhands/agent-server:f5efa635ab520c11155a3ee82629330be0f60452-python
ghcr.io/openhands/agent-server:fix-strip-gemini-thought-signatures-from-history-python
ghcr.io/openhands/agent-server:f5efa63-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

Each variant tag (e.g., f5efa63-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., f5efa63-python-amd64) are also available if needed

LiteLLM smuggles Vertex Gemini's `thoughtSignature` blob through the OpenAI-shaped `tool_call.id` field as `call_<hex>__thought__<base64>`. The signature is only required on the immediately-following tool-result turn so the model can resume reasoning; on every later turn it is dead weight that gets re-shipped in every prompt. On a real swe-bench-verified instance (`django__django-11999`, $11.11 with gemini-3.5-flash + reasoning_effort=high) the cumulative tool_call_id payload was 1.21 MB; only 3.5 KB of that is the actual canonical id. The remaining ~1.2 MB is the same signatures replayed across 47 turns. This commit adds a post-processing pass at the bottom of `events_to_messages` that: * Identifies the tool_call ids on the most recent assistant turn that has tool calls. * Strips the `__thought__<blob>` suffix from every other assistant `tool_call.id` and every matching `tool` message `tool_call_id`, so the paired ids stay consistent. * Is a no-op for Anthropic, OpenAI, and ACP ids that do not contain the `__thought__` marker. The pass mutates only the produced `Message` objects (via `MessageToolCall.model_copy(update=...)` and a plain string reassignment on `tool_call_id`); the underlying `ActionEvent` / `ObservationEvent` data is untouched, so on-disk event logs preserve the signatures. Tests added: * `tests/sdk/llm/test_thought_signature.py` — unit tests for the classifier and stripping helpers. * `tests/sdk/event/test_events_to_messages.py::TestThoughtSignatureStripping` — five integration tests covering older-turn stripping, paired consistency, source-event immutability, the non-Gemini no-op case, and parallel tool calls within the most-recent assistant turn. Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-06-09T03:22:48Z

Python API breakage checks — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-06-09T03:22:59Z

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-06-09T03:25:18Z

Coverage Report •

File	Stmts	Miss	Cover	Missing
openhands-sdk/openhands/sdk/event
base.py	108	9	91%	52, 63, 75–76, 82, 85–86, 88, 121
TOTAL	29671	8346	71%

enyst · 2026-06-09T20:53:35Z

LiteLLM smuggles it through the OpenAI-shaped tool_call.id

Ooh wow…! Sometimes I think we might be happier to implement Claude and Gemini native APIs, and maybe use liteLLM only for openai-compatible providers…
/me ducks

On a different note, I think they should have been cached anyway, and maybe @VascoSch92 ’s fix on that addresses a lot of the problem.

…tory

juanmichelini · 2026-06-09T21:36:44Z

@enyst interesting take!
@VascoSch92 #3586 fix reduces costs a lot, I'm testing this other fixes on top of that.

all-hands-bot

⚠️ QA Report: PASS WITH ISSUES

The SDK behavior change works as intended: archival Gemini __thought__ signatures are stripped from older history while the latest tool-call turn, matching tool results, source events, parallel calls, and non-Gemini IDs remain correct.

Does this PR achieve its stated goal?

Yes. I exercised the SDK as a library user by constructing real ActionEvent/ObservationEvent histories and calling LLMConvertibleEvent.events_to_messages(). On main, all 6 synthetic Gemini tool-call IDs were re-emitted with __thought__ blobs (message_tool_id_bytes=120288); on this PR, only the latest assistant/tool pair kept signatures (message_marker_count=2, message_tool_id_bytes=40244), earlier assistant/tool pairs stayed consistent after stripping, and the original event log still retained the full first ID.

Phase	Result
Environment Setup	✅ `make build` completed successfully and installed editable SDK packages via `uv sync --dev`.
CI Status	⚠️ Most checks are green, but `Validate PR description` is failing and `qa-changes` is still in progress at review time.
Functional Verification	✅ Before/after SDK execution confirms the claimed stripping behavior and no-op behavior for plain IDs.

Functional Verification

Test 1: Archival Gemini signatures are stripped only after the immediate tool-result turn

Step 1 — Reproduce baseline without the fix:
Checked out origin/main and ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_thought_signature.py, which constructs three real SDK action/observation pairs with large call_*__thought__... IDs and converts them through LLMConvertibleEvent.events_to_messages():

{
  "archival_stripping": {
    "assistant_markers_by_turn": [true, true, true],
    "message_marker_count": 6,
    "message_tool_id_bytes": 120288,
    "pairs_consistent": [true, true, true],
    "raw_tool_id_bytes": 120288,
    "source_event_first_kept_full": true,
    "tool_markers_by_turn": [true, true, true]
  },
  "parallel_latest_turn": {
    "latest_parallel_markers": [true, true],
    "latest_parallel_pairs_consistent": [true, true],
    "old_turn_stripped": false
  },
  "non_gemini_ids_unchanged": {"unchanged": true, "marker_count": 0}
}

This confirms the pre-fix problem: every previous assistant/tool message re-ships the thought signature, and message ID bytes equal the raw event-log ID bytes.

Step 2 — Apply the PR's changes:
Checked out f5efa635ab520c11155a3ee82629330be0f60452.

Step 3 — Re-run with the fix in place:
Ran the same command on the PR commit:

{
  "archival_stripping": {
    "assistant_markers_by_turn": [false, false, true],
    "message_marker_count": 2,
    "message_tool_id_bytes": 40244,
    "pairs_consistent": [true, true, true],
    "raw_tool_id_bytes": 120288,
    "source_event_first_kept_full": true,
    "tool_markers_by_turn": [false, false, true]
  },
  "parallel_latest_turn": {
    "latest_parallel_markers": [true, true],
    "latest_parallel_pairs_consistent": [true, true],
    "old_turn_stripped": true
  },
  "non_gemini_ids_unchanged": {"unchanged": true, "marker_count": 0}
}

This shows the fix works: older assistant and tool-result IDs are stripped together, the latest turn keeps signatures, source events are not mutated, and prompt-history ID bytes dropped from 120,288 to 40,244 in this reproduction.

Test 2: Related behavior remains intact

The same script also verified two side paths on the PR commit: a latest assistant message with two parallel tool calls kept both signatures and both tool results matched their assistant IDs, while plain call_plain_* IDs without __thought__ were byte-for-byte unchanged.

Issues Found

🟡 Minor: CI is not fully green at review time because PR Description Check / Validate PR description is failing and qa-changes is still in progress. I did not inspect or edit the human-only PR description fields.

This review was created by an AI agent (OpenHands) on behalf of the user.

Final verdict: PASS WITH ISSUES

juanmichelini mentioned this pull request Jun 9, 2026

feat(llm): add vertex_cached_content config for explicit Vertex AI caching #3583

Draft

Merge branch 'main' into fix/strip-gemini-thought-signatures-from-his…

f5efa63

…tory

juanmichelini marked this pull request as ready for review June 9, 2026 21:36

all-hands-bot reviewed Jun 9, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(llm): strip Vertex Gemini thought signatures from archival history#3581

fix(llm): strip Vertex Gemini thought signatures from archival history#3581
juanmichelini wants to merge 2 commits into
mainfrom
fix/strip-gemini-thought-signatures-from-history

juanmichelini commented Jun 9, 2026 •

edited by github-actions Bot

Loading

Uh oh!

github-actions Bot commented Jun 9, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 9, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 9, 2026 •

edited

Loading

Uh oh!

enyst commented Jun 9, 2026

Uh oh!

juanmichelini commented Jun 9, 2026

Uh oh!

all-hands-bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

juanmichelini commented Jun 9, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

Mechanism

Empirical impact

What this PR does

Files

Test plan

Scope / what this does not fix

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Python API breakage checks — ✅ PASSED

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

REST API breakage checks (OpenAPI) — ✅ PASSED

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

enyst commented Jun 9, 2026

Uh oh!

juanmichelini commented Jun 9, 2026

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

⚠️ QA Report: PASS WITH ISSUES

Does this PR achieve its stated goal?

Test 1: Archival Gemini signatures are stripped only after the immediate tool-result turn

Test 2: Related behavior remains intact

Issues Found

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

juanmichelini commented Jun 9, 2026 •

edited by github-actions Bot

Loading

github-actions Bot commented Jun 9, 2026 •

edited

Loading

github-actions Bot commented Jun 9, 2026 •

edited

Loading

github-actions Bot commented Jun 9, 2026 •

edited

Loading