fix(reasoning/qwen3): route bare-text 'thinking process:' preamble to reasoning_content (#570)#573
Open
raullenchai wants to merge 2 commits into
Open
fix(reasoning/qwen3): route bare-text 'thinking process:' preamble to reasoning_content (#570)#573raullenchai wants to merge 2 commits into
raullenchai wants to merge 2 commits into
Conversation
… reasoning_content (#570) Qwen3 chat templates inject `<think>\n` after the assistant generation marker when `enable_thinking=True`, putting the model in implicit-think mode. The model is supposed to emit its chain-of-thought followed by `</think>` and then the user-facing answer. In practice the model occasionally restates the channel boundary inline as a bare-text preamble (`Here's a thinking process:\n\n1. **Analyze...`); when the request also runs out of `max_tokens` before the model reaches `</think>`, neither tag appears anywhere in the output. The previous `Qwen3ReasoningParser.extract_reasoning` "no end token, no start token" branch then returned `(None, model_output)` — routing the whole chain-of-thought into the user-facing `content` field while `reasoning_content` stayed empty. Any OpenAI-compatible client consuming `/v1/chat/completions` saw raw reasoning leak into the answer. Fix: recognise bare-text "thinking process" preambles at the very start of the output and route them to `reasoning_content` with `cleaned_text` cleared. The regex anchors to `^\s*` and is scoped narrowly to explicit scratchpad labels — preambles that end with `:` and use known scratchpad nouns (`thinking process`, `reasoning`, `chain-of-thought`, `scratchpad`, `thought process`, `reasoning process`) or the labelled `Thinking step by step:` / `Step by step:` / `My thought process:` forms. Conversational openers like `Let me think...` or `I need to analyze...` are deliberately NOT matched — they are common direct-answer phrasings and matching them would blank out valid responses when `enable_thinking=False`. Applies to both non-streaming `extract_reasoning` and the `finalize_streaming` correction path, matching the existing streaming heuristic in `BaseThinkingReasoningParser` that already routes pre-tag text to reasoning while `</think>` has not yet been seen. Regression coverage: - `tests/test_reasoning_parsers.py::TestQwen3` — bare-text-fallback positive cases, the false-positive guard for normal answers that mention `think` mid-sentence, the explicit-tag passthrough, the streaming finalizer behavior, and an `test_ambiguous_phrases_not_ misclassified` negative test that pins the conservative scope. - `tests/test_chat_route_tool_tag_leak.py::TestBareThinkingProcessLeakRegression` — drives the real `_finalize_content_and_reasoning` orchestration used by `/v1/chat/completions`, including the exact text captured from a failing request against `qwen3.6-35b-4bit`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #573 validation scorecardTitle: fix(reasoning/qwen3): route bare-text 'thinking process:' preamble to reasoning_content (#570) Verdict: MERGE-SAFE
|
…p bare Step-by-step (codex r2) Codex round-2 review on PR #573 caught two further BLOCKING items: * **r2 B1 — tool-call markup leaked into reasoning_content.** When ``_finalize_content_and_reasoning`` runs with ``tool_calls`` set, it calls ``extract_reasoning(raw_text)`` on the unstripped raw output. If that output both starts with a recognised bare-text preamble AND embeds ``<tool_call>...`` markup, the bare-text branch would return the whole raw output (markup and all) as reasoning_content — exposing in the reasoning channel the same tags the tool parser already stripped from ``content``. Add a tool-tag detector (``<tool_call>``, ``<function=``, ``<|tool_call|>``, ``<invoke ``, ``<minimax:tool_call>``) and skip the fallback whenever it fires; defer to the upstream tool/text pipeline for that case. * **r2 B2 — bare ``Step by step:`` / ``Step-by-step:`` is a normal answer heading.** Users frequently ask "explain step by step" and the canonical answer heading is exactly ``Step by step:`` — matching it would clobber legitimate tutorials. Drop those two alternates from the regex. The labelled ``Thinking step by step:`` form (which is unambiguous scratchpad) is retained. New negative-coverage tests: * ``test_bare_think_prefix_with_tool_call_markup_not_routed`` — five tool-tag flavors (matching every variant ``test_chat_route_tool_tag_ leak`` recognises) + the combined "preamble + tool tag" payload. * ``test_ambiguous_phrases_not_misclassified`` extended with two ``Step by step:`` / ``Step-by-step:`` direct-answer cases. * ``test_bare_thinking_preamble_with_tool_call_does_not_leak_markup`` drives the full ``_drive_chat_route_pipeline`` (tool parser + reasoning parser + finalize helper) to confirm neither sink leaks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #570.
Qwen3ReasoningParsernow recognises bare-text "thinking process" preambles at the start of an assistant turn and routes them toreasoning_contentinstead of letting them leak into the user-facingcontentfield.Root cause
Qwen3 chat templates inject
<think>\nafter the assistant generation marker whenenable_thinking=True, putting the model in implicit-think mode — the model is supposed to emit its chain-of-thought followed by</think>and then the user-facing answer.In practice the model occasionally restates the channel boundary inline with a bare-text preamble like:
When the request also runs out of
max_tokensbefore the model reaches</think>, neither tag appears anywhere in the model output. The previousQwen3ReasoningParser.extract_reasoning"no end token, no start token" branch then returned(None, model_output)— routing the whole chain-of-thought into the user-facingcontentfield whilereasoning_contentstayed empty. Any OpenAI-compatible client consuming/v1/chat/completionssaw raw reasoning leak into the answer.The bug reproduces both on multi-turn and single-turn prompts; the trigger is "model emits bare-text thinking preamble AND output is truncated before
</think>", which becomes more likely as conversation length andmax_tokensinteract.Fix
Extend
Qwen3ReasoningParserto recognise bare-text "thinking process" preambles and route them toreasoning_contentwithcleaned_textcleared. Applies to both non-streamingextract_reasoningand thefinalize_streamingcorrection path — matching the existing streaming heuristic inBaseThinkingReasoningParserthat already routes pre-tag text to reasoning while</think>has not yet been seen.The regex is scoped narrowly to explicit scratchpad labels — preambles that end with
:and use known scratchpad nouns. Conversational openers (Let me think...,I need to analyze...,I'll analyze...) are deliberately NOT matched: they are common direct-answer phrasings whenenable_thinking=False, and matching them would blank out valid responses.Recognised preambles (matched at
^\s*, case-insensitive, all require terminating:):Here's a thinking process:,Here is my reasoning:,Here's the chain-of-thought:,Here is the scratchpad:,Here's the thought process:,Here is the reasoning process:Thinking step by step:,Thinking out loud:,Thinking through this:,Thinking carefully:,Thinking aloud:Step by step:/Step-by-step:My thought process:/My reasoning process:Why server-side is the right layer
Two alternative paths were considered:
<think>open across continuation turns. The chat template is owned by the upstream HF tokenizer config (Qwen3.6-A3B), so we'd have to maintain a local override; any HF refresh would silently revert the fix. Captured for future upstream contribution; not blocked on it.<think>server-side at the prompt boundary. Would mask the symptom but bare-text preambles also occur in single-turn prompts (the trigger is "truncated before</think>", not "continuation turn"). Server-side parser remains the right place to be defensive.Regression coverage
tests/test_reasoning_parsers.py::TestQwen3— bare-text-fallback positive cases (12 explicit-label variants), the explicit-tag passthrough, the streaming finalizer behavior, and a dedicatedtest_ambiguous_phrases_not_misclassifiednegative test that pins the conservative scope (9 conversational openers that must stay ascontent).tests/test_chat_route_tool_tag_leak.py::TestBareThinkingProcessLeakRegression— drives the real_finalize_content_and_reasoningorchestration used by/v1/chat/completions, including the exact text captured from a failing request againstqwen3.6-35b-4bit.Scope
Test plan
python3.12 -m pytest tests/test_reasoning_parsers.py tests/test_chat_route_tool_tag_leak.py— 142 passedpython3.12 -m pytest tests/ -k "reason or qwen or think or finalize or tool_tag" --ignore=tests/integrations --ignore=tests/test_streaming_simulator.py— 1003 passedruff check+ruff format --checkon touched files — clean_finalize_content_and_reasoning—reasoning_contentpopulates with the chain-of-thought,contentclears