
test(e2e): add basic string stop and matched stop test coverage for MLX #1538

Open
zach-li-sudo wants to merge 2 commits into lightseekorg:main from zach-li-sudo:zhuoli/mlx-e2e


@zach-li-sudo (Contributor) commented May 25, 2026

Description

Problem

The MLX gRPC backend added support for string stop sequences and matched_stop reporting, but had no end-to-end test coverage validating this through the full router → gRPC → MLX worker path.

String stop support PR: #1524

Solution

Add 5 e2e tests to e2e_test/mlx/test_mlx_backend.py covering the stop sequence feature for the regular (non-Harmony) pipeline:

  • 2 tests (chat and completion), non-streaming, single-token string stop: verify that the response returns HTTP 200 with finish_reason == "stop", that the stop text is excluded from the output, and that matched_stop echoes back the stop string.
  • 2 tests (chat and completion), non-streaming, multi-token string stop: verify that the gateway returns HTTP 400 with the unsupported_stop_string error code.
  • 1 test (completion), streaming, single-token string stop: verifies that the final SSE chunk carries finish_reason == "stop" and the correct matched_stop value.
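
The accept/reject split these tests check can be modeled as a simple rule: a stop string that maps to exactly one token is accepted, anything else is rejected. This is a hypothetical model, not the gateway's code. `validate_stop_strings`, `UnsupportedStopString`, and the character-level `toy_encode` tokenizer are assumptions for illustration; with a real model, the tokenizer decides which strings count as single-token.

```python
from typing import Callable, List


class UnsupportedStopString(ValueError):
    """Hypothetical error mirroring the gateway's HTTP 400 unsupported_stop_string response."""

    status_code = 400
    code = "unsupported_stop_string"


def validate_stop_strings(stops: List[str], encode: Callable[[str], List[int]]) -> List[int]:
    """Map each stop string to exactly one token id; reject anything else (sketch)."""
    token_ids = []
    for stop in stops:
        ids = encode(stop)
        if len(ids) != 1:
            # This sketch only handles stops that encode to a single token.
            raise UnsupportedStopString(f"stop string {stop!r} encodes to {len(ids)} tokens")
        token_ids.append(ids[0])
    return token_ids


def toy_encode(text: str) -> List[int]:
    # Toy tokenizer for illustration only: one token per character (its code point).
    return [ord(ch) for ch in text]


print(validate_stop_strings(["6"], toy_encode))  # [54]

try:
    validate_stop_strings(["6", "stop here"], toy_encode)
except UnsupportedStopString as exc:
    print(exc.status_code, exc.code)  # 400 unsupported_stop_string
```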

Helper functions assert_stop_text_trimmed, assert_matched_stop, assert_api_error, and collect_streamed_completion are extracted to keep the test bodies concise.
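
The body of assert_matched_stop is quoted in the review below, but assert_stop_text_trimmed is not shown. A minimal sketch, assuming assert_stop_text_trimmed only checks that the stop string is absent from the returned text; `types.SimpleNamespace` stands in for `response.choices[0]`, and its field values are made up for illustration:

```python
from types import SimpleNamespace


def assert_stop_text_trimmed(text, stop):
    # Assumed behavior: the stop string itself must not appear in the returned text.
    assert stop not in text, f"Stop string {stop!r} leaked into output: {text!r}"


def assert_matched_stop(choice, expected):
    # Same body as the helper quoted in the review below.
    actual = getattr(choice, "matched_stop", None)
    assert actual == expected, f"Expected matched_stop={expected!r}, got {actual!r}"


# Made-up stand-in for response.choices[0] after a completion stopped on "6".
choice = SimpleNamespace(text="1, 2, 3, 4, 5, ", finish_reason="stop", matched_stop="6")
assert choice.finish_reason == "stop"
assert_stop_text_trimmed(choice.text, "6")
assert_matched_stop(choice, "6")
```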

Changes

  • e2e_test/mlx/test_mlx_backend.py: add 5 stop-sequence/matched-stop tests and supporting helper functions; add STOP_SEQUENCE_TEST_PROMPT and SINGLE_STRING_STOP class constants to TestMlxBackend

Test Plan

Run on Apple Silicon (macOS arm64) with mlx-community/Qwen3-0.6B-4bit:

E2E_RUNTIME=mlx pytest e2e_test/mlx/test_mlx_backend.py -v

All 10 tests pass (5 pre-existing + 5 new).
  • cargo +nightly fmt passes
  • cargo clippy --all-targets --all-features -- -D warnings passes
  • (Optional) Documentation updated

Also ran the Python lint, format, and type checks:

# Lint (with auto-fix)
ruff check --fix e2e_test/ bindings/python/ scripts/

# Format
ruff format e2e_test/ bindings/python/ scripts/

# Type check
mypy e2e_test/ --config-file mypy.ini
mypy bindings/python/ --config-file mypy.ini

Summary by CodeRabbit

  • Tests
    • Expanded MLX backend test coverage for stop-sequence handling in both streaming and non-streaming modes
    • Added validation tests to ensure proper handling of unsupported multi-token stop sequences


Signed-off-by: Zhuo Li <zhuo.li.ca@outlook.com>
@coderabbitai (Bot) commented May 25, 2026

📝 Walkthrough


The MLX backend E2E test suite is extended with stop-string validation tests and streaming helpers. A new openai import and reusable assertion functions support test cases that verify stop-text trimming, matched_stop correctness, and rejection of unsupported multi-token stop strings across both chat and completion endpoints in non-streaming and streaming modes.

Changes

MLX stop-sequence validation tests

  • Test helper utilities and imports (e2e_test/mlx/test_mlx_backend.py): the openai import enables error-type assertions. Helper functions collect_streamed_completion(), assert_stop_text_trimmed(), assert_matched_stop(), and assert_api_error() provide reusable test infrastructure for stop-string and streaming validation.
  • Stop-sequence validation test cases (e2e_test/mlx/test_mlx_backend.py): class constants STOP_SEQUENCE_TEST_PROMPT and SINGLE_STRING_STOP define the test data. Five new test methods validate chat and completion stop behavior in non-streaming mode (correct finish_reason, trimmed output, matched stop), check that multi-token stop strings are rejected with 400/unsupported_stop_string errors, and verify that the final chunk of a streaming completion reports the correct stop state.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • lightseekorg/smg#1398: Both PRs extend TestMlxBackend with new test logic and helpers for MLX gRPC E2E validation.

Suggested labels

grpc, tests

Suggested reviewers

  • key4ng
  • slin1237
  • XinyueZhang369
  • CatherineSue

Poem

🐰 A fluffy test suite hops into place,
Stop-strings trimmed with graceful pace,
Streaming chunks and error states align,
The MLX backend tests now shine! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 18.18%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Title check: ✅ Passed. The title directly describes the main change: adding E2E tests for MLX backend's string stop and matched_stop functionality.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.


@gemini-code-assist (Bot) left a comment


Code Review

This pull request introduces end-to-end tests for stop sequence functionality in the MLX backend, covering both chat and completion endpoints in streaming and non-streaming modes. It also includes tests to verify that multi-token stop strings are correctly rejected with a 400 error. Feedback was provided to improve the robustness of the collect_streamed_completion helper by handling cases where no chunk contains a finish reason, preventing uninformative StopIteration errors and improving test failure diagnostics.

Comment on lines +29 to +31
final_choice = next(
    c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason
)

medium

The next() function is called on a generator that may be empty if no chunk in the stream contains a finish_reason. This would raise a StopIteration exception, which is less informative than an assertion failure. It is better to provide a default value to next() and then assert that a valid choice was found to improve test failure diagnostics.

Suggested change
-final_choice = next(
-    c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason
-)
+final_choice = next(
+    (c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason),
+    None,
+)
+assert final_choice is not None, "No chunk with finish_reason found in stream"
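
To see why the default matters, here is a self-contained run of the guarded helper against `SimpleNamespace` stand-ins for streamed chunks (the chunk shape is assumed to mirror the OpenAI client's `choices[0].text` and `choices[0].finish_reason`). Without the `None` default, the truncated stream in the second call would make `next()` raise a bare StopIteration; with it, the failure carries a readable message:

```python
from types import SimpleNamespace


def chunk(text, finish_reason=None):
    # Minimal stand-in for a streamed completion chunk.
    return SimpleNamespace(choices=[SimpleNamespace(text=text, finish_reason=finish_reason)])


def collect_streamed_completion(stream):
    """Collect all text and the final choice from a streaming completion response."""
    chunks = list(stream)
    text = "".join(c.choices[0].text for c in chunks if c.choices and c.choices[0].text)
    final_choice = next(
        (c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason),
        None,
    )
    assert final_choice is not None, "No chunk with finish_reason found in stream"
    return text, final_choice


text, final = collect_streamed_completion([chunk("1, 2, "), chunk("3", "stop")])
print(repr(text), final.finish_reason)  # '1, 2, 3' stop

try:
    collect_streamed_completion([chunk("1, 2, ")])  # truncated stream, no finish_reason
except AssertionError as exc:
    print("AssertionError:", exc)  # AssertionError: No chunk with finish_reason found in stream
```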

@coderabbitai (Bot) left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@e2e_test/mlx/test_mlx_backend.py`:
- Around line 39-42: The test helper assert_matched_stop currently compares choice.matched_stop (a string) to an expected string, but the MLX router wrapper sets matched_stop_token_id as a JSON number; update assert_matched_stop to normalize types before comparing: fetch getattr(choice, "matched_stop", None) and also check for getattr(choice, "matched_stop_token_id", None), coerce numeric token ids to strings (or coerce expected to int) and then assert equality; update all call-sites that pass expected values (lines around 203-204, 216-217, 252) to use the same normalization approach so the assertion treats "6" and 6 as equivalent.
- Around line 191-217: Tests assume string stops are accepted but MLX currently rejects non-empty stop strings via reject_stop_strings(...), so update the failing tests to match backend behavior: in test_chat_stop_string_non_streaming and test_completion_stop_string_non_streaming (and the similar cases at the later block), remove or do not pass stop=[self.SINGLE_STRING_STOP] (use no stop parameter or an accepted stop form), and update assertions accordingly (do not assert finish_reason == "stop", remove assert_stop_text_trimmed/assert_matched_stop) or mark the tests as expected-to-fail/skip until MLX accepts string stops; target the two test functions by name to locate and modify them.
- Around line 25-32: The helper collect_streamed_completion can raise StopIteration when no chunk has a finish_reason; update it to explicitly check for a final chunk before using next(). For example, search for collect_streamed_completion, compute final_choice_candidate by scanning reversed(chunks) for a chunk with c.choices and c.choices[0].finish_reason, assert that such a chunk exists (or raise a clear ValueError/AssertionError with a diagnostic message) and then set final_choice from that candidate; this makes failures explicit and debuggable instead of letting next() raise StopIteration.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 24ba503a-0b9a-4d8d-911a-ca0337f5fd4c

📥 Commits

Reviewing files that changed from the base of the PR and between d31f92a and 0077d46.

📒 Files selected for processing (1)
  • e2e_test/mlx/test_mlx_backend.py

Comment on lines +25 to +32
def collect_streamed_completion(stream):
    """Collect all text and the final choice from a streaming completion response."""
    chunks = list(stream)
    text = "".join(c.choices[0].text for c in chunks if c.choices and c.choices[0].text)
    final_choice = next(
        c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason
    )
    return text, final_choice

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard missing-final-chunk case in streaming helper.

next(...) can raise StopIteration if no chunk carries finish_reason, producing a non-diagnostic failure. Add an explicit assertion for debuggability.

Suggested fix
 def collect_streamed_completion(stream):
     """Collect all text and the final choice from a streaming completion response."""
     chunks = list(stream)
     text = "".join(c.choices[0].text for c in chunks if c.choices and c.choices[0].text)
-    final_choice = next(
-        c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason
-    )
+    final_choice = next(
+        (c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason),
+        None,
+    )
+    assert final_choice is not None, "Expected a final streamed chunk with finish_reason"
     return text, final_choice

Comment on lines +39 to +42
def assert_matched_stop(choice, expected):
    actual = getattr(choice, "matched_stop", None)
    assert actual == expected, f"Expected matched_stop={expected!r}, got {actual!r}"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

matched_stop type assertion is inconsistent with MLX response mapping.

assert_matched_stop currently expects a stop string (e.g., "6"), but MLX is mapped via matched_stop_token_id to a JSON number in the router wrapper. These assertions are likely to fail even if stop handling works.

Suggested adjustment
-def assert_matched_stop(choice, expected):
+def assert_matched_stop(choice, expected):
     actual = getattr(choice, "matched_stop", None)
     assert actual == expected, f"Expected matched_stop={expected!r}, got {actual!r}"
-        assert_matched_stop(choice, self.SINGLE_STRING_STOP)
+        # MLX currently surfaces token-id matched_stop
+        assert isinstance(getattr(choice, "matched_stop", None), int)

Also applies to: 203-204, 216-217, 252-252
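
One caution on this suggestion: a token id is a vocabulary index, so coercing a numeric id such as 6 to the string "6" would only match by coincidence. If MLX does surface a token id as the review claims (the PR does not confirm this), a safer comparison goes through the tokenizer. A hedged sketch: `assert_matched_stop_normalized` and its `encode` parameter are hypothetical, `matched_stop_token_id` is the field name taken from the review comment, and `toy_encode` is a character-level stand-in for the model's tokenizer.

```python
from types import SimpleNamespace
from typing import Callable, List


def assert_matched_stop_normalized(choice, expected: str, encode: Callable[[str], List[int]]):
    """Accept matched_stop as the stop string itself or as that string's single token id (sketch)."""
    actual = getattr(choice, "matched_stop", None)
    if actual is None:
        # Fallback field name taken from the review comment; unverified.
        actual = getattr(choice, "matched_stop_token_id", None)
    if isinstance(actual, int) and not isinstance(actual, bool):
        # Compare ids, not text: map the expected stop string through the tokenizer.
        expected_ids = encode(expected)
        assert expected_ids == [actual], (
            f"matched_stop token id {actual} does not match {expected!r} -> {expected_ids}"
        )
    else:
        assert actual == expected, f"Expected matched_stop={expected!r}, got {actual!r}"


def toy_encode(text: str) -> List[int]:
    # Toy one-token-per-character tokenizer, for illustration only.
    return [ord(ch) for ch in text]


assert_matched_stop_normalized(SimpleNamespace(matched_stop="6"), "6", toy_encode)
assert_matched_stop_normalized(SimpleNamespace(matched_stop=54), "6", toy_encode)
```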


Comment on lines +191 to +217
    def test_chat_stop_string_non_streaming(self, model, api_client):
        response = api_client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": self.STOP_SEQUENCE_TEST_PROMPT}],
            max_tokens=160,
            temperature=0,
            stop=[self.SINGLE_STRING_STOP],
            extra_body=self.NO_THINKING,
        )
        choice = response.choices[0]
        assert choice.finish_reason == "stop"
        assert_stop_text_trimmed(choice.message.content or "", self.SINGLE_STRING_STOP)
        assert_matched_stop(choice, self.SINGLE_STRING_STOP)

    def test_completion_stop_string_non_streaming(self, model, api_client):
        response = api_client.completions.create(
            model=model,
            prompt=self.STOP_SEQUENCE_TEST_PROMPT,
            max_tokens=160,
            temperature=0,
            stop=[self.SINGLE_STRING_STOP],
        )
        choice = response.choices[0]
        assert choice.finish_reason == "stop"
        assert_stop_text_trimmed(choice.text, self.SINGLE_STRING_STOP)
        assert_matched_stop(choice, self.SINGLE_STRING_STOP)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Single-stop “success” expectations conflict with current MLX contract.

These tests assert HTTP 200 and finish_reason=="stop" for stop=["6"], but the current MLX gRPC request builder rejects any non-empty stop strings (reject_stop_strings(...)). This will make these cases fail until backend behavior is changed.

Suggested adjustment
-    def test_chat_stop_string_non_streaming(self, model, api_client):
-        response = api_client.chat.completions.create(
-            model=model,
-            messages=[{"role": "user", "content": self.STOP_SEQUENCE_TEST_PROMPT}],
-            max_tokens=160,
-            temperature=0,
-            stop=[self.SINGLE_STRING_STOP],
-            extra_body=self.NO_THINKING,
-        )
-        choice = response.choices[0]
-        assert choice.finish_reason == "stop"
-        assert_stop_text_trimmed(choice.message.content or "", self.SINGLE_STRING_STOP)
-        assert_matched_stop(choice, self.SINGLE_STRING_STOP)
+    # Keep/enable success-path tests only once MLX stop-string support is implemented.
+    # For current behavior, assert 400 unsupported_stop_string.

Also applies to: 239-252
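
If the success-path tests are parked, the reviewer's alternative (assert the current rejection instead) could look like the sketch below. The PR's real assert_api_error signature is not shown, so this callable-based version is hypothetical. It assumes the raised error exposes `status_code` and `code` attributes, as the `openai` Python client's `APIStatusError` does when the error body carries a `code` field; `FakeBadRequest` stands in for `openai.BadRequestError` so the example runs without a server.

```python
class FakeBadRequest(Exception):
    """Stand-in for openai.BadRequestError carrying only the attributes the helper reads."""

    def __init__(self, status_code, code):
        super().__init__(f"Error code: {status_code} ({code})")
        self.status_code = status_code
        self.code = code


def assert_api_error(call, status_code, code):
    """Run `call` and expect an API error with the given HTTP status and error code (sketch)."""
    try:
        call()
    except Exception as exc:  # narrow to openai.APIStatusError in a real suite
        assert getattr(exc, "status_code", None) == status_code, f"Unexpected status: {exc!r}"
        assert getattr(exc, "code", None) == code, f"Unexpected error code: {exc!r}"
        return exc
    raise AssertionError(f"Expected HTTP {status_code} ({code}), but the call succeeded")


def reject_stop_request():
    # Simulates the gateway rejecting a stop string, per the contract described in the review.
    raise FakeBadRequest(400, "unsupported_stop_string")


err = assert_api_error(reject_stop_request, 400, "unsupported_stop_string")
print(err)  # Error code: 400 (unsupported_stop_string)
```

Against a live gateway, the request would be wrapped in a lambda such as `lambda: api_client.completions.create(model=model, prompt=self.STOP_SEQUENCE_TEST_PROMPT, stop=[self.SINGLE_STRING_STOP])`; with pytest, `pytest.raises(openai.BadRequestError)` is the more idiomatic equivalent.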



Labels: tests (Test changes)
Projects: None yet
1 participant