Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
97 changes: 97 additions & 0 deletions e2e_test/mlx/test_mlx_backend.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
import platform
import sys

import openai
import pytest

pytestmark = pytest.mark.skipif(
Expand All @@ -21,6 +22,36 @@
)


def collect_streamed_completion(stream):
"""Collect all text and the final choice from a streaming completion response."""
chunks = list(stream)
text = "".join(c.choices[0].text for c in chunks if c.choices and c.choices[0].text)
final_choice = next(
c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason
)
Comment on lines +29 to +31
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The next() function is called on a generator that may be empty if no chunk in the stream contains a finish_reason. This would raise a StopIteration exception, which is less informative than an assertion failure. It is better to provide a default value to next() and then assert that a valid choice was found to improve test failure diagnostics.

Suggested change
final_choice = next(
c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason
)
final_choice = next(
(c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason),
None,
)
assert final_choice is not None, "No chunk with finish_reason found in stream"

return text, final_choice
Comment on lines +25 to +32
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard missing-final-chunk case in streaming helper.

next(...) can raise StopIteration if no chunk carries finish_reason, producing a non-diagnostic failure. Add an explicit assertion for debuggability.

Suggested fix
 def collect_streamed_completion(stream):
     """Collect all text and the final choice from a streaming completion response."""
     chunks = list(stream)
     text = "".join(c.choices[0].text for c in chunks if c.choices and c.choices[0].text)
-    final_choice = next(
-        c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason
-    )
+    final_choice = next(
+        (c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason),
+        None,
+    )
+    assert final_choice is not None, "Expected a final streamed chunk with finish_reason"
     return text, final_choice
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@e2e_test/mlx/test_mlx_backend.py` around lines 25 - 32, The helper
collect_streamed_completion can raise StopIteration when no chunk has a
finish_reason; update it to explicitly check for a final chunk before using
next() — e.g., search for collect_streamed_completion, compute
final_choice_candidate by scanning reversed(chunks) for a chunk with c.choices
and c.choices[0].finish_reason, assert that such a chunk exists (or raise a
clear ValueError/AssertionError with a diagnostic message) and then set
final_choice from that candidate; this makes failures explicit and debuggable
instead of letting next() raise StopIteration.



def assert_stop_text_trimmed(text, stop_text):
assert stop_text not in text, f"Stop text {stop_text!r} should not appear in output: {text!r}"


def assert_matched_stop(choice, expected):
actual = getattr(choice, "matched_stop", None)
assert actual == expected, f"Expected matched_stop={expected!r}, got {actual!r}"

Comment on lines +39 to +42
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

matched_stop type assertion is inconsistent with MLX response mapping.

assert_matched_stop currently expects a stop string (e.g., "6"), but MLX is mapped via matched_stop_token_id to a JSON number in the router wrapper. These assertions are likely to fail even if stop handling works.

Suggested adjustment
-def assert_matched_stop(choice, expected):
+def assert_matched_stop(choice, expected):
     actual = getattr(choice, "matched_stop", None)
     assert actual == expected, f"Expected matched_stop={expected!r}, got {actual!r}"
-        assert_matched_stop(choice, self.SINGLE_STRING_STOP)
+        # MLX currently surfaces token-id matched_stop
+        assert isinstance(getattr(choice, "matched_stop", None), int)

Also applies to: 203-204, 216-217, 252-252

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@e2e_test/mlx/test_mlx_backend.py` around lines 39 - 42, The test helper
assert_matched_stop currently compares choice.matched_stop (a string) to an
expected string, but the MLX router wrapper sets matched_stop_token_id as a JSON
number; update assert_matched_stop to normalize types before comparing: fetch
getattr(choice, "matched_stop", None) and also check for getattr(choice,
"matched_stop_token_id", None), coerce numeric token ids to strings (or coerce
expected to int) and then assert equality; update all call-sites that pass
expected values (lines around 203-204, 216-217, 252) to use the same
normalization approach so the assertion treats "6" and 6 as equivalent.


def assert_api_error(err, expected_status: int, expected_code: str) -> None:
assert err.status_code == expected_status, (
f"Expected HTTP {expected_status}, got {err.status_code}"
)
body = getattr(err, "body", None) or {}
error_str = str(body) + str(getattr(err, "message", ""))
assert expected_code in error_str, (
f"Expected {expected_code!r} in error body, got: {error_str!r}"
)


WEATHER_TOOL = {
"type": "function",
"function": {
Expand Down Expand Up @@ -60,6 +91,9 @@ class TestMlxBackend:
# actual content within max_tokens.
NO_THINKING = {"chat_template_kwargs": {"enable_thinking": False}}

STOP_SEQUENCE_TEST_PROMPT = "Repeat: 1 2 3 hello world 4 5 6 7"
SINGLE_STRING_STOP = "6"

def test_basic_chat(self, model, api_client):
response = api_client.chat.completions.create(
model=model,
Expand Down Expand Up @@ -153,3 +187,66 @@ def test_max_tokens_finish_reason(self, model, api_client):
)
assert response.choices[0].finish_reason == "length"
assert response.usage.completion_tokens == 10

def test_chat_stop_string_non_streaming(self, model, api_client):
response = api_client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": self.STOP_SEQUENCE_TEST_PROMPT}],
max_tokens=160,
temperature=0,
stop=[self.SINGLE_STRING_STOP],
extra_body=self.NO_THINKING,
)
choice = response.choices[0]
assert choice.finish_reason == "stop"
assert_stop_text_trimmed(choice.message.content or "", self.SINGLE_STRING_STOP)
assert_matched_stop(choice, self.SINGLE_STRING_STOP)

def test_completion_stop_string_non_streaming(self, model, api_client):
response = api_client.completions.create(
model=model,
prompt=self.STOP_SEQUENCE_TEST_PROMPT,
max_tokens=160,
temperature=0,
stop=[self.SINGLE_STRING_STOP],
)
choice = response.choices[0]
assert choice.finish_reason == "stop"
assert_stop_text_trimmed(choice.text, self.SINGLE_STRING_STOP)
assert_matched_stop(choice, self.SINGLE_STRING_STOP)

Comment on lines +191 to +217
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Single-stop “success” expectations conflict with current MLX contract.

These tests assert HTTP 200 and finish_reason=="stop" for stop=["6"], but the current MLX gRPC request builder rejects any non-empty stop strings (reject_stop_strings(...)). This will make these cases fail until backend behavior is changed.

Suggested adjustment
-    def test_chat_stop_string_non_streaming(self, model, api_client):
-        response = api_client.chat.completions.create(
-            model=model,
-            messages=[{"role": "user", "content": self.STOP_SEQUENCE_TEST_PROMPT}],
-            max_tokens=160,
-            temperature=0,
-            stop=[self.SINGLE_STRING_STOP],
-            extra_body=self.NO_THINKING,
-        )
-        choice = response.choices[0]
-        assert choice.finish_reason == "stop"
-        assert_stop_text_trimmed(choice.message.content or "", self.SINGLE_STRING_STOP)
-        assert_matched_stop(choice, self.SINGLE_STRING_STOP)
+    # Keep/enable success-path tests only once MLX stop-string support is implemented.
+    # For current behavior, assert 400 unsupported_stop_string.

Also applies to: 239-252

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@e2e_test/mlx/test_mlx_backend.py` around lines 191 - 217, Tests assume string
stops are accepted but MLX currently rejects non-empty stop strings via
reject_stop_strings(...), so update the failing tests to match backend behavior:
in test_chat_stop_string_non_streaming and
test_completion_stop_string_non_streaming (and the similar cases at the later
block), remove or do not pass stop=[self.SINGLE_STRING_STOP] (use no stop
parameter or an accepted stop form), and update assertions accordingly (do not
assert finish_reason == "stop", remove
assert_stop_text_trimmed/assert_matched_stop) or mark the tests as
expected-to-fail/skip until MLX accepts string stops; target the two test
functions by name to locate and modify them.

def test_chat_rejects_multi_token_stop_string(self, model, api_client):
with pytest.raises((openai.BadRequestError, openai.APIStatusError)) as exc_info:
api_client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": self.STOP_SEQUENCE_TEST_PROMPT}],
max_tokens=10,
stop=["hello world"],
extra_body=self.NO_THINKING,
)
assert_api_error(exc_info.value, 400, "unsupported_stop_string")

def test_completion_rejects_multi_token_stop_string(self, model, api_client):
with pytest.raises((openai.BadRequestError, openai.APIStatusError)) as exc_info:
api_client.completions.create(
model=model,
prompt=self.STOP_SEQUENCE_TEST_PROMPT,
max_tokens=10,
stop=["hello world"],
)
assert_api_error(exc_info.value, 400, "unsupported_stop_string")

def test_completion_stop_string_streaming_final_chunk(self, model, api_client):
"""Streaming completion: final chunk has finish_reason='stop' and matched_stop set."""
stream = api_client.completions.create(
model=model,
prompt=self.STOP_SEQUENCE_TEST_PROMPT,
max_tokens=160,
temperature=0,
stop=[self.SINGLE_STRING_STOP],
stream=True,
)
text, final_choice = collect_streamed_completion(stream)
assert final_choice.finish_reason == "stop"
assert_stop_text_trimmed(text, self.SINGLE_STRING_STOP)
assert_matched_stop(final_choice, self.SINGLE_STRING_STOP)
Loading