fix(reasoning/qwen3): route bare-text 'thinking process:' preamble to reasoning_content (#570) by raullenchai · Pull Request #573 · raullenchai/Rapid-MLX

raullenchai · 2026-06-14T21:36:44Z

Summary

Closes #570.

Qwen3ReasoningParser now recognises bare-text "thinking process" preambles at the start of an assistant turn and routes them to reasoning_content instead of letting them leak into the user-facing content field.

Root cause

Qwen3 chat templates inject <think>\n after the assistant generation marker when enable_thinking=True, putting the model in implicit-think mode — the model is supposed to emit its chain-of-thought followed by </think> and then the user-facing answer.

In practice the model occasionally restates the channel boundary inline with a bare-text preamble like:

Here's a thinking process:

1.  **Analyze User Input:**
   - **Route:** Seattle to San Diego
2.  **Evaluate Each Option:**
   - Portland, OR: World-renowned food scene.

When the request also runs out of max_tokens before the model reaches </think>, neither tag appears anywhere in the model output. The previous Qwen3ReasoningParser.extract_reasoning "no end token, no start token" branch then returned (None, model_output) — routing the whole chain-of-thought into the user-facing content field while reasoning_content stayed empty. Any OpenAI-compatible client consuming /v1/chat/completions saw raw reasoning leak into the answer.

The bug reproduces both on multi-turn and single-turn prompts; the trigger is "model emits bare-text thinking preamble AND output is truncated before </think>", which becomes more likely as conversation length and max_tokens interact.

Fix

Extend Qwen3ReasoningParser to recognise bare-text "thinking process" preambles and route them to reasoning_content with cleaned_text cleared. Applies to both non-streaming extract_reasoning and the finalize_streaming correction path — matching the existing streaming heuristic in BaseThinkingReasoningParser that already routes pre-tag text to reasoning while </think> has not yet been seen.

The regex is scoped narrowly to explicit scratchpad labels — preambles that end with : and use known scratchpad nouns. Conversational openers (Let me think..., I need to analyze..., I'll analyze...) are deliberately NOT matched: they are common direct-answer phrasings when enable_thinking=False, and matching them would blank out valid responses.

Recognised preambles (matched at ^\s*, case-insensitive, all require terminating :):

Here's a thinking process:, Here is my reasoning:, Here's the chain-of-thought:, Here is the scratchpad:, Here's the thought process:, Here is the reasoning process:
Thinking step by step:, Thinking out loud:, Thinking through this:, Thinking carefully:, Thinking aloud:
Step by step: / Step-by-step:
My thought process: / My reasoning process:

Why server-side is the right layer

Two alternative paths were considered:

Patch the chat template to keep <think> open across continuation turns. The chat template is owned by the upstream HF tokenizer config (Qwen3.6-A3B), so we'd have to maintain a local override; any HF refresh would silently revert the fix. Captured for future upstream contribution; not blocked on it.
Inject <think> server-side at the prompt boundary. Would mask the symptom but bare-text preambles also occur in single-turn prompts (the trigger is "truncated before </think>", not "continuation turn"). Server-side parser remains the right place to be defensive.

Regression coverage

tests/test_reasoning_parsers.py::TestQwen3 — bare-text-fallback positive cases (12 explicit-label variants), the explicit-tag passthrough, the streaming finalizer behavior, and a dedicated test_ambiguous_phrases_not_misclassified negative test that pins the conservative scope (9 conversational openers that must stay as content).
tests/test_chat_route_tool_tag_leak.py::TestBareThinkingProcessLeakRegression — drives the real _finalize_content_and_reasoning orchestration used by /v1/chat/completions, including the exact text captured from a failing request against qwen3.6-35b-4bit.

Scope

No version bump (parser fix; release SOP 11-gate walk happens at the next bundled cut).
Diff limited to one parser file + two test files (+335 / -2).
No alias changes, no template changes, no upstream pin changes.

Test plan

python3.12 -m pytest tests/test_reasoning_parsers.py tests/test_chat_route_tool_tag_leak.py — 142 passed
Broader sweep: python3.12 -m pytest tests/ -k "reason or qwen or think or finalize or tool_tag" --ignore=tests/integrations --ignore=tests/test_streaming_simulator.py — 1003 passed
ruff check + ruff format --check on touched files — clean
Direct verification on the captured failing payload via _finalize_content_and_reasoning — reasoning_content populates with the chain-of-thought, content clears

… reasoning_content (#570) Qwen3 chat templates inject `<think>\n` after the assistant generation marker when `enable_thinking=True`, putting the model in implicit-think mode. The model is supposed to emit its chain-of-thought followed by `</think>` and then the user-facing answer. In practice the model occasionally restates the channel boundary inline as a bare-text preamble (`Here's a thinking process:\n\n1. **Analyze...`); when the request also runs out of `max_tokens` before the model reaches `</think>`, neither tag appears anywhere in the output. The previous `Qwen3ReasoningParser.extract_reasoning` "no end token, no start token" branch then returned `(None, model_output)` — routing the whole chain-of-thought into the user-facing `content` field while `reasoning_content` stayed empty. Any OpenAI-compatible client consuming `/v1/chat/completions` saw raw reasoning leak into the answer. Fix: recognise bare-text "thinking process" preambles at the very start of the output and route them to `reasoning_content` with `cleaned_text` cleared. The regex anchors to `^\s*` and is scoped narrowly to explicit scratchpad labels — preambles that end with `:` and use known scratchpad nouns (`thinking process`, `reasoning`, `chain-of-thought`, `scratchpad`, `thought process`, `reasoning process`) or the labelled `Thinking step by step:` / `Step by step:` / `My thought process:` forms. Conversational openers like `Let me think...` or `I need to analyze...` are deliberately NOT matched — they are common direct-answer phrasings and matching them would blank out valid responses when `enable_thinking=False`. Applies to both non-streaming `extract_reasoning` and the `finalize_streaming` correction path, matching the existing streaming heuristic in `BaseThinkingReasoningParser` that already routes pre-tag text to reasoning while `</think>` has not yet been seen. Regression coverage: - `tests/test_reasoning_parsers.py::TestQwen3` — bare-text-fallback positive cases, the false-positive guard for normal answers that mention `think` mid-sentence, the explicit-tag passthrough, the streaming finalizer behavior, and an `test_ambiguous_phrases_not_ misclassified` negative test that pins the conservative scope. - `tests/test_chat_route_tool_tag_leak.py::TestBareThinkingProcessLeakRegression` — drives the real `_finalize_content_and_reasoning` orchestration used by `/v1/chat/completions`, including the exact text captured from a failing request against `qwen3.6-35b-4bit`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-14T21:37:20Z

PR #573 validation scorecard

Title: fix(reasoning/qwen3): route bare-text 'thinking process:' preamble to reasoning_content (#570)
Author: raullenchai
Diff: 3 file(s), +425/-2 LOC, blast radius: medium

Verdict: MERGE-SAFE

step	status	summary	time
`fetch`	PASS	3 files, +425/-2 LOC, blast=medium	0.9s
`test_plan_check`	PASS	all 4 test-plan item(s) checked	0.0s
`cl_description_quality`	PASS	title OK + body has rationale (4856 chars)	0.0s
`codex_review`	skip	codex CLI not found on PATH (install: `npm i -g @openai/codex`)	0.0s
`supply_chain`	PASS	no hooks touched, no suspicious patterns, deps clean	0.0s
`lint`	PASS	clean (3 file(s))	0.0s
`targeted_tests`	PASS	exit 2 (in 2 target file(s))	2.0s

…p bare Step-by-step (codex r2) Codex round-2 review on PR #573 caught two further BLOCKING items: * **r2 B1 — tool-call markup leaked into reasoning_content.** When ``_finalize_content_and_reasoning`` runs with ``tool_calls`` set, it calls ``extract_reasoning(raw_text)`` on the unstripped raw output. If that output both starts with a recognised bare-text preamble AND embeds ``<tool_call>...`` markup, the bare-text branch would return the whole raw output (markup and all) as reasoning_content — exposing in the reasoning channel the same tags the tool parser already stripped from ``content``. Add a tool-tag detector (``<tool_call>``, ``<function=``, ``<|tool_call|>``, ``<invoke ``, ``<minimax:tool_call>``) and skip the fallback whenever it fires; defer to the upstream tool/text pipeline for that case. * **r2 B2 — bare ``Step by step:`` / ``Step-by-step:`` is a normal answer heading.** Users frequently ask "explain step by step" and the canonical answer heading is exactly ``Step by step:`` — matching it would clobber legitimate tutorials. Drop those two alternates from the regex. The labelled ``Thinking step by step:`` form (which is unambiguous scratchpad) is retained. New negative-coverage tests: * ``test_bare_think_prefix_with_tool_call_markup_not_routed`` — five tool-tag flavors (matching every variant ``test_chat_route_tool_tag_ leak`` recognises) + the combined "preamble + tool tag" payload. * ``test_ambiguous_phrases_not_misclassified`` extended with two ``Step by step:`` / ``Step-by-step:`` direct-answer cases. * ``test_bare_thinking_preamble_with_tool_call_does_not_leak_markup`` drives the full ``_drive_chat_route_pipeline`` (tool parser + reasoning parser + finalize helper) to confirm neither sink leaks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(reasoning/qwen3): route bare-text 'thinking process:' preamble to reasoning_content (#570)#573

fix(reasoning/qwen3): route bare-text 'thinking process:' preamble to reasoning_content (#570)#573
raullenchai wants to merge 2 commits into
mainfrom
fix/qwen3-parser-bare-think-prefix-v2

raullenchai commented Jun 14, 2026

Uh oh!

github-actions Bot commented Jun 14, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

raullenchai commented Jun 14, 2026

Summary

Root cause

Fix

Why server-side is the right layer

Regression coverage

Scope

Test plan

Uh oh!

github-actions Bot commented Jun 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR #573 validation scorecard

Verdict: MERGE-SAFE

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions Bot commented Jun 14, 2026 •

edited

Loading