test(e2e): add basic string stop and matched stop test coverage for MLX#1538
test(e2e): add basic string stop and matched stop test coverage for MLX#1538zach-li-sudo wants to merge 2 commits into
Conversation
Signed-off-by: Zhuo Li <zhuo.li.ca@outlook.com>
Signed-off-by: Zhuo Li <zhuo.li.ca@outlook.com>
📝 WalkthroughWalkthroughThe MLX backend E2E test suite is extended with stop-string validation tests and streaming helpers. A new ChangesMLX stop-sequence validation tests
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces end-to-end tests for stop sequence functionality in the MLX backend, covering both chat and completion endpoints in streaming and non-streaming modes. It also includes tests to verify that multi-token stop strings are correctly rejected with a 400 error. Feedback was provided to improve the robustness of the collect_streamed_completion helper by handling cases where no chunk contains a finish reason, preventing uninformative StopIteration errors and improving test failure diagnostics.
| final_choice = next( | ||
| c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason | ||
| ) |
There was a problem hiding this comment.
The next() function is called on a generator that may be empty if no chunk in the stream contains a finish_reason. This would raise a StopIteration exception, which is less informative than an assertion failure. It is better to provide a default value to next() and then assert that a valid choice was found to improve test failure diagnostics.
| final_choice = next( | |
| c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason | |
| ) | |
| final_choice = next( | |
| (c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason), | |
| None, | |
| ) | |
| assert final_choice is not None, "No chunk with finish_reason found in stream" |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@e2e_test/mlx/test_mlx_backend.py`:
- Around line 39-42: The test helper assert_matched_stop currently compares
choice.matched_stop (a string) to an expected string, but the MLX router wrapper
sets matched_stop_token_id as a JSON number; update assert_matched_stop to
normalize types before comparing: fetch getattr(choice, "matched_stop", None)
and also check for getattr(choice, "matched_stop_token_id", None), coerce
numeric token ids to strings (or coerce expected to int) and then assert
equality; update all call-sites that pass expected values (lines around 203-204,
216-217, 252) to use the same normalization approach so the assertion treats "6"
and 6 as equivalent.
- Around line 191-217: Tests assume string stops are accepted but MLX currently
rejects non-empty stop strings via reject_stop_strings(...), so update the
failing tests to match backend behavior: in test_chat_stop_string_non_streaming
and test_completion_stop_string_non_streaming (and the similar cases at the
later block), remove or do not pass stop=[self.SINGLE_STRING_STOP] (use no stop
parameter or an accepted stop form), and update assertions accordingly (do not
assert finish_reason == "stop", remove
assert_stop_text_trimmed/assert_matched_stop) or mark the tests as
expected-to-fail/skip until MLX accepts string stops; target the two test
functions by name to locate and modify them.
- Around line 25-32: The helper collect_streamed_completion can raise
StopIteration when no chunk has a finish_reason; update it to explicitly check
for a final chunk before using next() — e.g., search for
collect_streamed_completion, compute final_choice_candidate by scanning
reversed(chunks) for a chunk with c.choices and c.choices[0].finish_reason,
assert that such a chunk exists (or raise a clear ValueError/AssertionError with
a diagnostic message) and then set final_choice from that candidate; this makes
failures explicit and debuggable instead of letting next() raise StopIteration.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 24ba503a-0b9a-4d8d-911a-ca0337f5fd4c
📒 Files selected for processing (1)
e2e_test/mlx/test_mlx_backend.py
| def collect_streamed_completion(stream): | ||
| """Collect all text and the final choice from a streaming completion response.""" | ||
| chunks = list(stream) | ||
| text = "".join(c.choices[0].text for c in chunks if c.choices and c.choices[0].text) | ||
| final_choice = next( | ||
| c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason | ||
| ) | ||
| return text, final_choice |
There was a problem hiding this comment.
Guard missing-final-chunk case in streaming helper.
next(...) can raise StopIteration if no chunk carries finish_reason, producing a non-diagnostic failure. Add an explicit assertion for debuggability.
Suggested fix
def collect_streamed_completion(stream):
"""Collect all text and the final choice from a streaming completion response."""
chunks = list(stream)
text = "".join(c.choices[0].text for c in chunks if c.choices and c.choices[0].text)
- final_choice = next(
- c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason
- )
+ final_choice = next(
+ (c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason),
+ None,
+ )
+ assert final_choice is not None, "Expected a final streamed chunk with finish_reason"
return text, final_choice🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@e2e_test/mlx/test_mlx_backend.py` around lines 25 - 32, The helper
collect_streamed_completion can raise StopIteration when no chunk has a
finish_reason; update it to explicitly check for a final chunk before using
next() — e.g., search for collect_streamed_completion, compute
final_choice_candidate by scanning reversed(chunks) for a chunk with c.choices
and c.choices[0].finish_reason, assert that such a chunk exists (or raise a
clear ValueError/AssertionError with a diagnostic message) and then set
final_choice from that candidate; this makes failures explicit and debuggable
instead of letting next() raise StopIteration.
| def assert_matched_stop(choice, expected): | ||
| actual = getattr(choice, "matched_stop", None) | ||
| assert actual == expected, f"Expected matched_stop={expected!r}, got {actual!r}" | ||
|
|
There was a problem hiding this comment.
matched_stop type assertion is inconsistent with MLX response mapping.
assert_matched_stop currently expects a stop string (e.g., "6"), but MLX is mapped via matched_stop_token_id to a JSON number in the router wrapper. These assertions are likely to fail even if stop handling works.
Suggested adjustment
-def assert_matched_stop(choice, expected):
+def assert_matched_stop(choice, expected):
actual = getattr(choice, "matched_stop", None)
assert actual == expected, f"Expected matched_stop={expected!r}, got {actual!r}"- assert_matched_stop(choice, self.SINGLE_STRING_STOP)
+ # MLX currently surfaces token-id matched_stop
+ assert isinstance(getattr(choice, "matched_stop", None), int)Also applies to: 203-204, 216-217, 252-252
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@e2e_test/mlx/test_mlx_backend.py` around lines 39 - 42, The test helper
assert_matched_stop currently compares choice.matched_stop (a string) to an
expected string, but the MLX router wrapper sets matched_stop_token_id as a JSON
number; update assert_matched_stop to normalize types before comparing: fetch
getattr(choice, "matched_stop", None) and also check for getattr(choice,
"matched_stop_token_id", None), coerce numeric token ids to strings (or coerce
expected to int) and then assert equality; update all call-sites that pass
expected values (lines around 203-204, 216-217, 252) to use the same
normalization approach so the assertion treats "6" and 6 as equivalent.
| def test_chat_stop_string_non_streaming(self, model, api_client): | ||
| response = api_client.chat.completions.create( | ||
| model=model, | ||
| messages=[{"role": "user", "content": self.STOP_SEQUENCE_TEST_PROMPT}], | ||
| max_tokens=160, | ||
| temperature=0, | ||
| stop=[self.SINGLE_STRING_STOP], | ||
| extra_body=self.NO_THINKING, | ||
| ) | ||
| choice = response.choices[0] | ||
| assert choice.finish_reason == "stop" | ||
| assert_stop_text_trimmed(choice.message.content or "", self.SINGLE_STRING_STOP) | ||
| assert_matched_stop(choice, self.SINGLE_STRING_STOP) | ||
|
|
||
| def test_completion_stop_string_non_streaming(self, model, api_client): | ||
| response = api_client.completions.create( | ||
| model=model, | ||
| prompt=self.STOP_SEQUENCE_TEST_PROMPT, | ||
| max_tokens=160, | ||
| temperature=0, | ||
| stop=[self.SINGLE_STRING_STOP], | ||
| ) | ||
| choice = response.choices[0] | ||
| assert choice.finish_reason == "stop" | ||
| assert_stop_text_trimmed(choice.text, self.SINGLE_STRING_STOP) | ||
| assert_matched_stop(choice, self.SINGLE_STRING_STOP) | ||
|
|
There was a problem hiding this comment.
Single-stop “success” expectations conflict with current MLX contract.
These tests assert HTTP 200 and finish_reason=="stop" for stop=["6"], but the current MLX gRPC request builder rejects any non-empty stop strings (reject_stop_strings(...)). This will make these cases fail until backend behavior is changed.
Suggested adjustment
- def test_chat_stop_string_non_streaming(self, model, api_client):
- response = api_client.chat.completions.create(
- model=model,
- messages=[{"role": "user", "content": self.STOP_SEQUENCE_TEST_PROMPT}],
- max_tokens=160,
- temperature=0,
- stop=[self.SINGLE_STRING_STOP],
- extra_body=self.NO_THINKING,
- )
- choice = response.choices[0]
- assert choice.finish_reason == "stop"
- assert_stop_text_trimmed(choice.message.content or "", self.SINGLE_STRING_STOP)
- assert_matched_stop(choice, self.SINGLE_STRING_STOP)
+ # Keep/enable success-path tests only once MLX stop-string support is implemented.
+ # For current behavior, assert 400 unsupported_stop_string.Also applies to: 239-252
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@e2e_test/mlx/test_mlx_backend.py` around lines 191 - 217, Tests assume string
stops are accepted but MLX currently rejects non-empty stop strings via
reject_stop_strings(...), so update the failing tests to match backend behavior:
in test_chat_stop_string_non_streaming and
test_completion_stop_string_non_streaming (and the similar cases at the later
block), remove or do not pass stop=[self.SINGLE_STRING_STOP] (use no stop
parameter or an accepted stop form), and update assertions accordingly (do not
assert finish_reason == "stop", remove
assert_stop_text_trimmed/assert_matched_stop) or mark the tests as
expected-to-fail/skip until MLX accepts string stops; target the two test
functions by name to locate and modify them.
Description
Problem
The MLX gRPC backend added support for string stop sequences and
matched_stopreporting, but had no end-to-end test coverage validating this through the full router → gRPC → MLX worker path.String stop support PR: #1524
Solution
Add 5 e2e tests to
e2e_test/mlx/test_mlx_backend.pycovering the stop sequence feature for the regular (non-Harmony) pipeline:finish_reason == "stop", stop text is excludedfrom the output, and
matched_stopechoes back the stop string.unsupported_stop_stringerror code.finish_reason == "stop"and the correctmatched_stopvalue.Helper functions
assert_stop_text_trimmed,assert_matched_stop,assert_api_error, andcollect_streamed_completionare extracted to keep thetest bodies concise.
Changes
e2e_test/mlx/test_mlx_backend.py: add 5 stop-sequence/matched-stop tests and supporting helper functions; addSTOP_SEQUENCE_TEST_PROMPTandSINGLE_STRING_STOPclass constants toTestMlxBackendTest Plan
Run on Apple Silicon (macOS arm64) with
mlx-community/Qwen3-0.6B-4bit:also run python format checks:
Summary by CodeRabbit