Skip to content

fix(server): Anthropic stream emission corner cases around tool_use#845

Open
mrtkrcm wants to merge 2 commits intojundot:mainfrom
mrtkrcm:fix/server-anthropic-stream-tool
Open

fix(server): Anthropic stream emission corner cases around tool_use#845
mrtkrcm wants to merge 2 commits intojundot:mainfrom
mrtkrcm:fix/server-anthropic-stream-tool

Conversation

@mrtkrcm
Copy link
Copy Markdown
Contributor

@mrtkrcm mrtkrcm commented Apr 18, 2026

Summary

Fix Anthropic /v1/messages streaming when a response is mostly or only tool_use content.

Changes

  • Drop leading whitespace deltas around tool envelopes before any text block starts.
  • Parse tool calls before deciding whether to emit an empty text block.
  • Start tool-only streams at content block index 0 instead of emitting an empty text block first.
  • Preserve cleaned empty text in non-streaming Anthropic responses.
  • Add regression coverage for tool-only Anthropic streams.

Local validation

Built and installed from this branch into /Applications/oMLX.app (0.3.8.dev2), with port 8801 owned by the visible app process.

  • Focused tests: 201 passed, 12 deselected for tests/test_tool_calling.py, tests/integration/test_e2e_streaming.py, and tests/test_admin_profiles_api.py.
  • Live Anthropic proxy path: claude-sonnet-4-6 forced get_weather; streaming emitted first content block as tool_use at index 0, no leading empty text block, stop_reason=tool_use.
  • Direct OpenAI control: Ternary-Bonsai-8B-mlx-2bit returned ready; benchmark tool OK in 0.6s.
  • Related Qwen tool path: Qwen3-Coder-30B-A3B-Instruct-4bit forced read_file; structured tool_calls, benchmark tool OK in 2.3s.

Note: benchmark host was not clean; preflight saw active desktop/client load, so throughput is smoke data only.

Test Plan

  • uv run pytest tests/integration/test_e2e_streaming.py -q
  • python3 -m py_compile omlx/server.py tests/integration/test_e2e_streaming.py

Murat Karacam and others added 2 commits April 28, 2026 19:01
Three related bugs in the Anthropic streaming code path surface when
a response contains tool_calls and little-to-no text:

1. Text-leak: cleaned_text.strip() was guarded by a falsy check, so a
   non-empty string that stripped to empty produced regular_content
   (the un-cleaned original). Replaced with explicit None-check so
   cleaned text is preferred whenever present.

2. Empty-block emission: when the response contained only tool_calls,
   the emitter produced an extra content_block_start/stop pair for a
   zero-length text block. Move tool-call extraction before the block
   close and gate the empty-text emission on `not tool_calls`; fall
   back to tool_block_start = 0 when no preceding text block was opened.

3. Whitespace adjacency: a pure-whitespace content_delta preceding a
   tool envelope would break strict Anthropic clients. Drop the delta
   when kwargs.get('tools') is set and no text block has been started.

Together these make oMLX's /v1/messages output pass strict Anthropic
tool-use validators (Claude Code, kern) without proxy-side cleanup.
@mrtkrcm mrtkrcm force-pushed the fix/server-anthropic-stream-tool branch from 76cd257 to 986d5b2 Compare April 28, 2026 12:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant