test(e2e): add basic string stop and matched stop test coverage for MLX by zach-li-sudo · Pull Request #1538 · lightseekorg/smg

zach-li-sudo · 2026-05-25T01:25:32Z

Description

Problem

The MLX gRPC backend added support for string stop sequences and matched_stop reporting, but had no end-to-end test coverage validating this through the full router → gRPC → MLX worker path.

String stop support PR: #1524

Solution

Add 5 e2e tests to e2e_test/mlx/test_mlx_backend.py covering the stop sequence feature for the regular (non-Harmony) pipeline:

- 2 tests for chat and completion, non-streaming, single-token string stop: verify that the response returns HTTP 200 with finish_reason == "stop", that the stop text is excluded from the output, and that matched_stop echoes back the stop string.
- 2 tests for chat and completion, non-streaming, multi-token string stop: verify that the gateway returns HTTP 400 with the unsupported_stop_string error code.
- 1 test for completion, streaming, single-token string stop: verifies that the final SSE chunk carries finish_reason == "stop" and the correct matched_stop value.
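As a rough illustration of what these tests expect, the sketch below models the stop semantics in pure Python. It is hypothetical and not taken from the MLX worker or the router: `apply_stop_strings`, `check_stop_strings`, and `UnsupportedStopString` are illustrative names, the tokenizer is any callable assumed to return a list of token ids, and trimming is done after the fact on finished text, whereas the real backend stops generation as soon as the stop is produced.

```python
from typing import Callable, List, Optional, Sequence, Tuple


class UnsupportedStopString(ValueError):
    """Hypothetical stand-in for the gateway's 400 unsupported_stop_string error."""


def check_stop_strings(stops: Sequence[str], tokenize: Callable[[str], List[int]]) -> None:
    """Reject any stop string that does not encode to exactly one token (assumed rule)."""
    for stop in stops:
        if len(tokenize(stop)) != 1:
            raise UnsupportedStopString(f"unsupported_stop_string: {stop!r}")


def apply_stop_strings(text: str, stops: Sequence[str]) -> Tuple[str, Optional[str]]:
    """Cut text at the earliest stop string (stops are assumed non-empty).

    Returns (output_text, matched_stop): the stop text itself is excluded from
    output_text, and matched_stop echoes the stop string that fired (None if
    no stop string occurs in text).
    """
    best_pos: Optional[int] = None
    best_stop: Optional[str] = None
    for stop in stops:
        pos = text.find(stop)
        if pos != -1 and (best_pos is None or pos < best_pos):
            best_pos, best_stop = pos, stop
    if best_pos is None:
        return text, None
    return text[:best_pos], best_stop


if __name__ == "__main__":
    # Hypothetical vocabulary: "6" is one token, "6, 7" is three.
    vocab = {"6": [21], "6, 7": [21, 11, 22]}
    check_stop_strings(["6"], vocab.__getitem__)  # accepted
    print(apply_stop_strings("1, 2, 3, 4, 5, 6, 7", ["6"]))  # ('1, 2, 3, 4, 5, ', '6')
    try:
        check_stop_strings(["6, 7"], vocab.__getitem__)
    except UnsupportedStopString as err:
        print(err)  # unsupported_stop_string: '6, 7'
```

In this model a matched stop corresponds to finish_reason == "stop"; the single-token check only mirrors the 400 unsupported_stop_string path in spirit, since the gateway's actual token-counting rule is not shown in this PR.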

Helper functions assert_stop_text_trimmed, assert_matched_stop, assert_api_error, and collect_streamed_completion are extracted to keep the test bodies concise.
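Only collect_streamed_completion and assert_matched_stop are quoted in the review threads; the other two helpers are not shown. A minimal sketch of what they could look like follows. assert_matched_stop is copied from the quoted diff; assert_stop_text_trimmed and assert_api_error are hypothetical, and assert_api_error assumes the client exception exposes `status_code` and `code` attributes (it is duck-typed so the sketch runs without the openai package).

```python
from types import SimpleNamespace


def assert_stop_text_trimmed(text: str, stop: str) -> None:
    """Hypothetical: the stop string must not appear anywhere in the returned text."""
    assert stop not in text, f"Stop string {stop!r} leaked into output: {text!r}"


def assert_matched_stop(choice, expected) -> None:
    """As quoted in the review: compare the choice's matched_stop with the expected value."""
    actual = getattr(choice, "matched_stop", None)
    assert actual == expected, f"Expected matched_stop={expected!r}, got {actual!r}"


def assert_api_error(exc: object, status_code: int, code: str) -> None:
    """Hypothetical: check HTTP status and error code on a client exception.

    Assumes the exception object exposes `status_code` and `code` attributes.
    """
    actual_status = getattr(exc, "status_code", None)
    actual_code = getattr(exc, "code", None)
    assert actual_status == status_code, f"Expected HTTP {status_code}, got {actual_status!r}"
    assert actual_code == code, f"Expected error code {code!r}, got {actual_code!r}"


if __name__ == "__main__":
    # Stand-ins for SDK objects; only the attributes the helpers read are set.
    choice = SimpleNamespace(text="1, 2, 3, 4, 5, ", finish_reason="stop", matched_stop="6")
    assert_stop_text_trimmed(choice.text, "6")
    assert_matched_stop(choice, "6")
    error = SimpleNamespace(status_code=400, code="unsupported_stop_string")
    assert_api_error(error, 400, "unsupported_stop_string")
    print("helpers ok")
```

Checking that the stop string is absent anywhere (not just at the end) is enough for a single stop, because generation would have halted at its first occurrence. In the real tests the exception would come from the client call, for example inside a `pytest.raises(...)` block.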

Changes

e2e_test/mlx/test_mlx_backend.py: add 5 stop-sequence/matched-stop tests and supporting helper functions; add the STOP_SEQUENCE_TEST_PROMPT and SINGLE_STRING_STOP class constants to TestMlxBackend

Test Plan

Run on Apple Silicon (macOS arm64) with mlx-community/Qwen3-0.6B-4bit:

E2E_RUNTIME=mlx pytest e2e_test/mlx/test_mlx_backend.py -v

All 10 tests pass (5 pre-existing + 5 new).

cargo +nightly fmt passes
cargo clippy --all-targets --all-features -- -D warnings passes
(Optional) Documentation updated
(Optional) Please join us on Slack #sig-smg (https://slack.lightseek.org) to discuss, review, and merge PRs

Also run the Python format checks:

# Lint (with auto-fix)
ruff check --fix e2e_test/ bindings/python/ scripts/

# Format
ruff format e2e_test/ bindings/python/ scripts/

# Type check
mypy e2e_test/ --config-file mypy.ini
mypy bindings/python/ --config-file mypy.ini

Summary by CodeRabbit

Tests
- Expanded MLX backend test coverage for stop-sequence handling in both streaming and non-streaming modes
- Added validation tests to ensure proper handling of unsupported multi-token stop sequences

Signed-off-by: Zhuo Li <zhuo.li.ca@outlook.com>

coderabbitai · 2026-05-25T01:25:45Z

📝 Walkthrough

The MLX backend E2E test suite is extended with stop-string validation tests and streaming helpers. A new openai import and reusable assertion functions support test cases that verify stop-text trimming, matched_stop correctness, and rejection of unsupported multi-token stop strings across both chat and completion endpoints in non-streaming and streaming modes.

Changes

MLX stop-sequence validation tests

Layer / File(s)	Summary
Test helper utilities and imports `e2e_test/mlx/test_mlx_backend.py`	`openai` import enables error type assertions. Helper functions `collect_streamed_completion()`, `assert_stop_text_trimmed()`, `assert_matched_stop()`, and `assert_api_error()` provide reusable test infrastructure for stop-string and streaming validation.
Stop-sequence validation test cases `e2e_test/mlx/test_mlx_backend.py`	Class constants `STOP_SEQUENCE_TEST_PROMPT` and `SINGLE_STRING_STOP` define test data. Five new test methods validate chat and completion stop behavior in non-streaming mode (correct `finish_reason`, trimmed output, matched stop), reject multi-token stop strings with 400/`unsupported_stop_string` errors, and verify streaming completion final chunks report correct stop state.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

lightseekorg/smg#1398: Both PRs extend TestMlxBackend with new test logic and helpers for MLX gRPC E2E validation.

Suggested labels

grpc, tests

Suggested reviewers

key4ng
slin1237
XinyueZhang369
CatherineSue

Poem

🐰 A fluffy test suite hops into place,
Stop-strings trimmed with graceful pace,
Streaming chunks and error states align,
The MLX backend tests now shine! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 18.18% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title directly describes the main change: adding E2E tests for MLX backend's string stop and matched_stop functionality.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

gemini-code-assist

Code Review

This pull request introduces end-to-end tests for stop sequence functionality in the MLX backend, covering both chat and completion endpoints in streaming and non-streaming modes. It also includes tests to verify that multi-token stop strings are correctly rejected with a 400 error. Feedback was provided to improve the robustness of the collect_streamed_completion helper by handling cases where no chunk contains a finish reason, preventing uninformative StopIteration errors and improving test failure diagnostics.

gemini-code-assist · 2026-05-25T01:27:13Z

+    final_choice = next(
+        c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason
+    )


The next() function is called on a generator that may be empty if no chunk in the stream contains a finish_reason. This would raise a StopIteration exception, which is less informative than an assertion failure. It is better to provide a default value to next() and then assert that a valid choice was found to improve test failure diagnostics.

Suggested change

-    final_choice = next(
-        c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason
-    )
+    final_choice = next(
+        (c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason),
+        None,
+    )
+    assert final_choice is not None, "No chunk with finish_reason found in stream"
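To see the difference, here is a self-contained run of the helper with the suggested change applied, fed with stand-in chunks. `fake_chunk` is hypothetical and only models the fields the helper reads; real chunks come from the OpenAI client's streaming iterator.

```python
from types import SimpleNamespace


def collect_streamed_completion(stream):
    """Collect all text and the final choice from a streaming completion response."""
    chunks = list(stream)
    text = "".join(c.choices[0].text for c in chunks if c.choices and c.choices[0].text)
    # Passing None as the default makes an empty match return None instead of
    # raising StopIteration, so the assert below can report a clear message.
    final_choice = next(
        (c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason),
        None,
    )
    assert final_choice is not None, "No chunk with finish_reason found in stream"
    return text, final_choice


def fake_chunk(text, finish_reason=None, matched_stop=None):
    """Hypothetical stand-in for an SSE completion chunk (only the fields used above)."""
    choice = SimpleNamespace(text=text, finish_reason=finish_reason, matched_stop=matched_stop)
    return SimpleNamespace(choices=[choice])


if __name__ == "__main__":
    stream = [fake_chunk("1, 2, "), fake_chunk("3, "), fake_chunk("", "stop", "6")]
    text, final = collect_streamed_completion(stream)
    print(repr(text), final.finish_reason, final.matched_stop)  # '1, 2, 3, ' stop 6
```

Without the default, a stream that never sets finish_reason surfaces as a bare StopIteration; with it, the failure message names the missing field.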

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@e2e_test/mlx/test_mlx_backend.py`:
- Around line 39-42: The test helper assert_matched_stop currently compares
choice.matched_stop (a string) to an expected string, but the MLX router wrapper
sets matched_stop_token_id as a JSON number; update assert_matched_stop to
normalize types before comparing: fetch getattr(choice, "matched_stop", None)
and also check for getattr(choice, "matched_stop_token_id", None), coerce
numeric token ids to strings (or coerce expected to int) and then assert
equality; update all call-sites that pass expected values (lines around 203-204,
216-217, 252) to use the same normalization approach so the assertion treats "6"
and 6 as equivalent.
- Around line 191-217: Tests assume string stops are accepted but MLX currently
rejects non-empty stop strings via reject_stop_strings(...), so update the
failing tests to match backend behavior: in test_chat_stop_string_non_streaming
and test_completion_stop_string_non_streaming (and the similar cases at the
later block), remove or do not pass stop=[self.SINGLE_STRING_STOP] (use no stop
parameter or an accepted stop form), and update assertions accordingly (do not
assert finish_reason == "stop", remove
assert_stop_text_trimmed/assert_matched_stop) or mark the tests as
expected-to-fail/skip until MLX accepts string stops; target the two test
functions by name to locate and modify them.
- Around line 25-32: The helper collect_streamed_completion can raise
StopIteration when no chunk has a finish_reason; update it to explicitly check
for a final chunk before using next() — e.g., search for
collect_streamed_completion, compute final_choice_candidate by scanning
reversed(chunks) for a chunk with c.choices and c.choices[0].finish_reason,
assert that such a chunk exists (or raise a clear ValueError/AssertionError with
a diagnostic message) and then set final_choice from that candidate; this makes
failures explicit and debuggable instead of letting next() raise StopIteration.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 24ba503a-0b9a-4d8d-911a-ca0337f5fd4c

📥 Commits

Reviewing files that changed from the base of the PR and between d31f92a and 0077d46.

📒 Files selected for processing (1)

e2e_test/mlx/test_mlx_backend.py

coderabbitai · 2026-05-25T01:31:11Z

+def collect_streamed_completion(stream):
+    """Collect all text and the final choice from a streaming completion response."""
+    chunks = list(stream)
+    text = "".join(c.choices[0].text for c in chunks if c.choices and c.choices[0].text)
+    final_choice = next(
+        c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason
+    )
+    return text, final_choice


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard missing-final-chunk case in streaming helper.

next(...) can raise StopIteration if no chunk carries finish_reason, producing a non-diagnostic failure. Add an explicit assertion for debuggability.

Suggested fix

 def collect_streamed_completion(stream):
     """Collect all text and the final choice from a streaming completion response."""
     chunks = list(stream)
     text = "".join(c.choices[0].text for c in chunks if c.choices and c.choices[0].text)
-    final_choice = next(
-        c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason
-    )
+    final_choice = next(
+        (c.choices[0] for c in reversed(chunks) if c.choices and c.choices[0].finish_reason),
+        None,
+    )
+    assert final_choice is not None, "Expected a final streamed chunk with finish_reason"
     return text, final_choice

coderabbitai · 2026-05-25T01:31:11Z

+def assert_matched_stop(choice, expected):
+    actual = getattr(choice, "matched_stop", None)
+    assert actual == expected, f"Expected matched_stop={expected!r}, got {actual!r}"
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

matched_stop type assertion is inconsistent with MLX response mapping.

assert_matched_stop currently expects a stop string (e.g., "6"), but the router wrapper maps the MLX matched_stop_token_id to a JSON number. These assertions are therefore likely to fail even if stop handling works.

Suggested adjustment

-def assert_matched_stop(choice, expected):
+def assert_matched_stop(choice, expected):
     actual = getattr(choice, "matched_stop", None)
     assert actual == expected, f"Expected matched_stop={expected!r}, got {actual!r}"

-        assert_matched_stop(choice, self.SINGLE_STRING_STOP)
+        # MLX currently surfaces token-id matched_stop
+        assert isinstance(getattr(choice, "matched_stop", None), int)

Also applies to: 203-204, 216-217, 252-252
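Coercing a numeric token id to a string and comparing it with the stop text, as the accompanying AI-agent prompt suggests, would treat token id 6 and the stop text "6" as equal even though they are unrelated in general. If matched_stop really can arrive in either form, a hypothetical variant could check each form against its own expectation. The `expected_text` / `expected_token_id` parameters are illustrative, and the `matched_stop_token_id` fallback follows the reviewer's prompt, not a confirmed response field.

```python
from types import SimpleNamespace
from typing import Optional


def assert_matched_stop(choice, expected_text: Optional[str] = None,
                        expected_token_id: Optional[int] = None) -> None:
    """Hypothetical variant: accept matched_stop as a stop string or as a token id.

    String values are compared with expected_text; int values (including a
    fallback matched_stop_token_id field) are compared with expected_token_id.
    """
    actual = getattr(choice, "matched_stop", None)
    if actual is None:
        actual = getattr(choice, "matched_stop_token_id", None)
    if isinstance(actual, str):
        assert actual == expected_text, f"Expected matched_stop={expected_text!r}, got {actual!r}"
    elif isinstance(actual, int):
        assert actual == expected_token_id, (
            f"Expected matched stop token id {expected_token_id!r}, got {actual!r}"
        )
    else:
        raise AssertionError(f"matched_stop missing or of unexpected type: {actual!r}")


if __name__ == "__main__":
    assert_matched_stop(SimpleNamespace(matched_stop="6"), expected_text="6")
    assert_matched_stop(SimpleNamespace(matched_stop=42), expected_token_id=42)
    print("ok")
```

Per the PR description, the MLX path is expected to echo the stop string, in which case only the `expected_text` branch is exercised.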

coderabbitai · 2026-05-25T01:31:11Z

+    def test_chat_stop_string_non_streaming(self, model, api_client):
+        response = api_client.chat.completions.create(
+            model=model,
+            messages=[{"role": "user", "content": self.STOP_SEQUENCE_TEST_PROMPT}],
+            max_tokens=160,
+            temperature=0,
+            stop=[self.SINGLE_STRING_STOP],
+            extra_body=self.NO_THINKING,
+        )
+        choice = response.choices[0]
+        assert choice.finish_reason == "stop"
+        assert_stop_text_trimmed(choice.message.content or "", self.SINGLE_STRING_STOP)
+        assert_matched_stop(choice, self.SINGLE_STRING_STOP)
+
+    def test_completion_stop_string_non_streaming(self, model, api_client):
+        response = api_client.completions.create(
+            model=model,
+            prompt=self.STOP_SEQUENCE_TEST_PROMPT,
+            max_tokens=160,
+            temperature=0,
+            stop=[self.SINGLE_STRING_STOP],
+        )
+        choice = response.choices[0]
+        assert choice.finish_reason == "stop"
+        assert_stop_text_trimmed(choice.text, self.SINGLE_STRING_STOP)
+        assert_matched_stop(choice, self.SINGLE_STRING_STOP)
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Single-stop “success” expectations conflict with current MLX contract.

These tests assert HTTP 200 and finish_reason=="stop" for stop=["6"], but the current MLX gRPC request builder rejects any non-empty stop strings (reject_stop_strings(...)). This will make these cases fail until backend behavior is changed.

Suggested adjustment

-    def test_chat_stop_string_non_streaming(self, model, api_client):
-        response = api_client.chat.completions.create(
-            model=model,
-            messages=[{"role": "user", "content": self.STOP_SEQUENCE_TEST_PROMPT}],
-            max_tokens=160,
-            temperature=0,
-            stop=[self.SINGLE_STRING_STOP],
-            extra_body=self.NO_THINKING,
-        )
-        choice = response.choices[0]
-        assert choice.finish_reason == "stop"
-        assert_stop_text_trimmed(choice.message.content or "", self.SINGLE_STRING_STOP)
-        assert_matched_stop(choice, self.SINGLE_STRING_STOP)
+    # Keep/enable success-path tests only once MLX stop-string support is implemented.
+    # For current behavior, assert 400 unsupported_stop_string.

Also applies to: 239-252

zach-li-sudo added 2 commits May 24, 2026 17:57

test(e2e): add basic string stop and matched stop test coverage for MLX

8160b9d

Signed-off-by: Zhuo Li <zhuo.li.ca@outlook.com>

test(e2e): formatting

0077d46

Signed-off-by: Zhuo Li <zhuo.li.ca@outlook.com>

zach-li-sudo requested review from CatherineSue, XinyueZhang369, key4ng and slin1237 as code owners May 25, 2026 01:25

github-actions Bot added the tests label May 25, 2026

gemini-code-assist Bot reviewed May 25, 2026

zach-li-sudo mentioned this pull request May 25, 2026

feat(mlx-grpc): String stop sequence support for MLX on all 6 pipeline/path combinations #1524

Open

4 tasks

coderabbitai Bot requested changes May 25, 2026

zach-li-sudo wants to merge 2 commits into lightseekorg:main from zach-li-sudo:zhuoli/mlx-e2e
