fix: disable thinking/reasoning for submodel calls #1392
Draft
Lap1acian wants to merge 2 commits into
Reasoning models (Gemini thinking, Claude thinking, DeepSeek R1, OpenRouter reasoning, local models with <think> tags) produce chain-of-thought content that pollutes the image generation prompt when used as a submodel for Stable Diffusion / NovelAI tag generation.

Three changes to address the root cause:

1. request.ts: Strip thinking-related flags (geminiThinking, deepSeekThinkingOutput, claudeThinking, claudeAdaptiveThinking) from modelInfo when mode is 'submodel', so the API never requests thinking in the first place.
2. openAI/requests.ts: Skip wrapping reasoning_content / reasoning API response fields in <Thoughts> tags for submodel calls. If the model's text content is empty, fall back to reasoning content directly (without wrapper tags) so downstream processing can still work with it.
3. stableDiff.ts: Strip both <Thoughts> and <think> blocks from the submodel result, extending the previous <Thoughts>-only strip to also cover local models that emit <think> tags in their text output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
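The request-side change (item 1) can be illustrated with a minimal sketch. The `ModelInfo` shape, the string-based flag list, and the `stripThinkingFlags` helper are hypothetical stand-ins, not the project's actual types; only the four flag names and the 'submodel' mode value come from the commit message.

```typescript
// Hypothetical, simplified model description. The real modelInfo in
// request.ts may be shaped differently (for example, flags may be an enum).
interface ModelInfo {
  id: string;
  flags: string[];
}

// The four thinking-related flag names listed in the commit message.
const THINKING_FLAGS: ReadonlySet<string> = new Set([
  "geminiThinking",
  "deepSeekThinkingOutput",
  "claudeThinking",
  "claudeAdaptiveThinking",
]);

// Return a copy of modelInfo without thinking flags when the call is a
// submodel call; leave every other mode untouched.
function stripThinkingFlags(modelInfo: ModelInfo, mode: string): ModelInfo {
  if (mode !== "submodel") {
    return modelInfo;
  }
  return {
    ...modelInfo,
    flags: modelInfo.flags.filter((flag) => !THINKING_FLAGS.has(flag)),
  };
}

// Usage: "someOtherFlag" is a made-up flag that should survive the strip.
const exampleInfo: ModelInfo = {
  id: "example-model",
  flags: ["geminiThinking", "someOtherFlag"],
};
const strippedInfo = stripThinkingFlags(exampleInfo, "submodel");
console.log(strippedInfo.flags);
```

Returning a copy rather than mutating the input keeps the original model info intact for calls in other modes that still want thinking enabled.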
The previous approach fell back to raw reasoning_content when the model's text content was empty, which caused reasoning text to leak into the SD prompt for models that put everything in the reasoning field.

Simply discard reasoning for submodel calls: if the model produces no text content, an empty result is preferable to garbage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
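The behaviour described in this commit can be sketched as follows. The `ChatMessage` interface and `extractResult` function are hypothetical names for illustration, and the exact <Thoughts> wrapper layout for non-submodel calls is an assumption; only the reasoning_content / reasoning field names and the "discard for submodel" rule come from the commit messages.

```typescript
// Hypothetical subset of an OpenAI-compatible response message.
interface ChatMessage {
  content?: string | null;
  reasoning_content?: string | null;
  reasoning?: string | null;
}

// For submodel calls, return only the text content and drop any reasoning,
// even when the text content is empty. For other modes, prepend the
// reasoning wrapped in <Thoughts> tags (wrapper layout is an assumption).
function extractResult(message: ChatMessage, mode: string): string {
  const text = message.content ?? "";
  const reasoningText = message.reasoning_content ?? message.reasoning ?? "";

  if (mode === "submodel" || reasoningText === "") {
    return text;
  }
  return `<Thoughts>\n${reasoningText}\n</Thoughts>\n${text}`;
}

// Usage: a model that put everything into the reasoning field yields an
// empty submodel result instead of leaking reasoning into the SD prompt.
const onlyReasoning: ChatMessage = { content: "", reasoning: "Let me plan the tags..." };
const submodelResult = extractResult(onlyReasoning, "submodel");
console.log(JSON.stringify(submodelResult));
```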
Problem
#1373 fixed the <Thoughts> block leaking into SD submodel output via a post-processing strip.

However, with a local thinking model (gemma4:31b via an OpenAI-compatible API), a deeper issue was found: the model's actual result (image tags for {{slot}}) is often empty or incomplete because the model spends its output budget on chain-of-thought reasoning that gets discarded anyway.

Disabling thinking at the request level, not just stripping it from the output, is needed so the model focuses entirely on producing useful tag content.
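For reference, an output-side strip of the kind described above might look like the sketch below. The `stripThinkingBlocks` name and the regular expressions are assumptions for illustration, not the code in stableDiff.ts; the sketch covers both the <Thoughts> wrapper and the <think> tags that local models emit in their text output.

```typescript
// Remove complete <Thoughts>...</Thoughts> and <think>...</think> blocks
// from a submodel result, then trim the leftover whitespace.
// [\s\S]*? matches lazily across newlines, so each block is removed
// separately instead of swallowing text between two blocks.
function stripThinkingBlocks(output: string): string {
  return output
    .replace(/<Thoughts>[\s\S]*?<\/Thoughts>/g, "")
    .replace(/<think>[\s\S]*?<\/think>/g, "")
    .trim();
}

// Usage with a made-up model output.
const rawOutput = "<think>The scene needs a smiling girl.</think>\n1girl, smile, outdoors";
const cleanedTags = stripThinkingBlocks(rawOutput);
console.log(cleanedTags);
```

This sketch only removes well-formed blocks. If the model runs out of output budget before emitting the closing tag, the unclosed block is left in place, which is one reason a strip alone does not address the empty-result problem.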
Solution
Three layers of defense for mode === 'submodel' calls:

1. request.ts: Strip thinking LLMFlags (geminiThinking, deepSeekThinkingOutput, claudeThinking, claudeAdaptiveThinking) from model info before the request is sent.
2. openAI/requests.ts: Skip wrapping reasoning_content / reasoning response fields in <Thoughts> tags.
3. stableDiff.ts: Strip <Thoughts> and <think> blocks from output (safety net for local models that emit thinking in text).

Testing
Tested with gemma4:31b (local, OpenAI-compatible API):
Known Limitations
Feedback, additional test results from other thinking models, or insights on the intermittent issue would be very welcome.
Related