fix(server): Anthropic stream emission corner cases around tool_use #845
Open
mrtkrcm wants to merge 2 commits into jundot:main
Three related bugs in the Anthropic streaming code path surface when
a response contains tool_calls and little-to-no text:
1. Text leak: cleaned_text.strip() was guarded by a truthiness check, so a
   non-empty string that stripped to empty fell through to regular_content
   (the un-cleaned original). Replace it with an explicit None check so
   cleaned text is preferred whenever present.
2. Empty-block emission: when the response contained only tool_calls,
   the emitter produced an extra content_block_start/stop pair for a
   zero-length text block. Move tool-call extraction before the block
   close and gate the empty-text emission on `not tool_calls`; fall
   back to tool_block_start = 0 when no preceding text block was opened.
3. Whitespace adjacency: a pure-whitespace content_delta preceding a
   tool envelope would break strict Anthropic clients. Drop the delta
   when kwargs.get('tools') is set and no text block has been started.
Together these make oMLX's /v1/messages output pass strict Anthropic
tool-use validators (Claude Code, kern) without proxy-side cleanup.
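
The three guards described above can be sketched as small pure functions. This is an illustration only, not the code in omlx/server.py: the helper names (`pick_text`, `should_emit_empty_text_block`, `tool_block_start`, `should_drop_delta`) are hypothetical, and the assumption that a preceding text block occupies index 0 (so tools start at 1) is ours.

```python
from typing import Optional, Sequence


def pick_text(cleaned_text: Optional[str], regular_content: str) -> str:
    """Fix 1: prefer cleaned text whenever it exists, even if it strips to ''."""
    # A truthiness test such as `cleaned_text.strip() or regular_content`
    # would leak the un-cleaned original when the cleaned text is whitespace.
    if cleaned_text is not None:
        return cleaned_text.strip()
    return regular_content


def should_emit_empty_text_block(text: str, tool_calls: Sequence[dict]) -> bool:
    """Fix 2: emit a zero-length text block only when there are no tool calls."""
    return text == "" and not tool_calls


def tool_block_start(text_block_opened: bool) -> int:
    """Fix 2: tool blocks start at index 0 unless a text block came first.

    Assumption: at most one text block precedes the tool blocks.
    """
    return 1 if text_block_opened else 0


def should_drop_delta(
    delta: str,
    tools: Optional[Sequence[dict]],
    text_block_started: bool,
) -> bool:
    """Fix 3: drop whitespace-only deltas that would precede a tool envelope."""
    return bool(tools) and not text_block_started and delta.strip() == ""
```

Keeping each decision in its own predicate makes the corner cases easy to unit-test without driving a full stream.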
Summary
Fix Anthropic `/v1/messages` streaming when a response is mostly or only `tool_use` content.

Changes
- Start the first `tool_use` block at index `0` instead of emitting an empty text block first.

Local validation
- Built and installed from this branch into `/Applications/oMLX.app` (`0.3.8.dev2`), with port `8801` owned by the visible app process.
- `201 passed, 12 deselected` for `tests/test_tool_calling.py`, `tests/integration/test_e2e_streaming.py`, and `tests/test_admin_profiles_api.py`.
- `claude-sonnet-4-6` forced `get_weather`; streaming emitted the first content block as `tool_use` at index `0`, no leading empty text block, `stop_reason=tool_use`.
- `Ternary-Bonsai-8B-mlx-2bit` returned `ready`; benchmark tool `OK` in `0.6s`.
- `Qwen3-Coder-30B-A3B-Instruct-4bit` forced `read_file`; structured `tool_calls`, benchmark tool `OK` in `2.3s`.

Note: benchmark host was not clean; preflight saw active desktop/client load, so throughput is smoke data only.
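
The forced-tool streaming check in the local validation could be automated roughly as follows. This is a sketch, not part of the PR's test suite: `check_tool_use_stream` is a hypothetical helper, and it assumes the server-sent events have already been parsed into dicts using the Anthropic streaming event shapes (`content_block_start`, `content_block_delta`, `message_delta`).

```python
from typing import Dict, Iterable, List, Optional


def check_tool_use_stream(events: Iterable[dict]) -> List[str]:
    """Return problems found in a forced-tool stream (empty list means OK)."""
    problems: List[str] = []
    block_types: Dict[int, str] = {}  # block index -> content block type
    text_seen: Dict[int, str] = {}    # block index -> accumulated text
    stop_reason: Optional[str] = None

    for event in events:
        kind = event.get("type")
        if kind == "content_block_start":
            index = event["index"]
            block = event["content_block"]
            block_types[index] = block["type"]
            text_seen[index] = block.get("text", "")
        elif kind == "content_block_delta":
            delta = event["delta"]
            if delta.get("type") == "text_delta":
                index = event["index"]
                text_seen[index] = text_seen.get(index, "") + delta.get("text", "")
        elif kind == "message_delta":
            stop_reason = event["delta"].get("stop_reason", stop_reason)

    # A forced tool call should open with tool_use at index 0.
    if block_types.get(0) != "tool_use":
        problems.append("first content block is not tool_use at index 0")
    # No text block may be opened and closed without any text in it.
    for index, block_type in block_types.items():
        if block_type == "text" and text_seen.get(index, "") == "":
            problems.append(f"empty text block at index {index}")
    if stop_reason != "tool_use":
        problems.append(f"stop_reason is {stop_reason!r}, expected 'tool_use'")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to see every violation in a single failing run.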
Test Plan
- `uv run pytest tests/integration/test_e2e_streaming.py -q`
- `python3 -m py_compile omlx/server.py tests/integration/test_e2e_streaming.py`