
fix: disable thinking/reasoning for submodel calls #1392

Draft
Lap1acian wants to merge 2 commits into kwaroran:main from Lap1acian:fix/sd-submodel-disable-thinking


@Lap1acian

Problem

#1373 stopped the <Thoughts> block from leaking into SD submodel output by stripping it in post-processing.

However, with a local thinking model (gemma4:31b via OpenAI-compatible API), a deeper issue was found: the model's actual result (image tags for {{slot}}) is often empty or incomplete because the model spends its output budget on chain-of-thought reasoning that gets discarded anyway.

Thinking therefore needs to be disabled at the request level, not just stripped from the output, so that the model spends its whole output budget on useful tag content.

Solution

Three layers of defense for mode === 'submodel' calls:

| File | Change |
| --- | --- |
| request.ts | Strip thinking-related LLMFlags (geminiThinking, deepSeekThinkingOutput, claudeThinking, claudeAdaptiveThinking) from model info before the request is sent |
| openAI/requests.ts | Skip wrapping reasoning_content / reasoning response fields in <Thoughts> tags |
| stableDiff.ts | Strip residual <Thoughts> and <think> blocks from output (safety net for local models that emit thinking in text) |
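
As a rough illustration of the first layer, the flag strip might look something like the sketch below. The `LLMFlags` enum, the `ModelInfo` shape, and the `withoutThinkingFlags` helper are hypothetical stand-ins written for this example, not the project's actual definitions. Only the four thinking flag names come from this PR.

```typescript
// Hypothetical stand-in for the project's LLMFlags enum.
// Only the four thinking flags are named in this PR; hasStreaming is a placeholder.
enum LLMFlags {
  hasStreaming,
  geminiThinking,
  deepSeekThinkingOutput,
  claudeThinking,
  claudeAdaptiveThinking,
}

// Hypothetical, simplified model-info shape (assumption for this sketch).
interface ModelInfo {
  id: string;
  flags: LLMFlags[];
}

// Flags that ask a provider to produce or return reasoning output.
const THINKING_FLAGS: ReadonlySet<LLMFlags> = new Set([
  LLMFlags.geminiThinking,
  LLMFlags.deepSeekThinkingOutput,
  LLMFlags.claudeThinking,
  LLMFlags.claudeAdaptiveThinking,
]);

// Returns a copy of the model info without thinking flags for submodel calls.
// Other modes get the original object back unchanged.
function withoutThinkingFlags(info: ModelInfo, mode: string): ModelInfo {
  if (mode !== "submodel") {
    return info;
  }
  return { ...info, flags: info.flags.filter((flag) => !THINKING_FLAGS.has(flag)) };
}
```

Returning a copy rather than mutating the input keeps the shared model definition intact for normal chat requests, which presumably should still be able to use thinking.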

Testing

Tested with gemma4:31b (local, OpenAI-compatible API):

  • Tag generation works correctly in most cases — model produces clean keyword output without thinking wrappers
  • In some cases, the thinking result is still not included in the prompt; the exact conditions for this are not yet identified

Known Limitations

  • Only verified with a single local model (gemma4:31b). Behavior with other thinking models (DeepSeek R1, Claude thinking, Gemini thinking) has not been tested.
  • The intermittent failure condition (thinking result missing from prompt) needs further investigation.

Feedback, additional test results from other thinking models, or insights on the intermittent issue would be very welcome.

Related

kamukande and others added 2 commits April 10, 2026 17:35
Reasoning models (Gemini thinking, Claude thinking, DeepSeek R1,
OpenRouter reasoning, local models with <think> tags) produce
chain-of-thought content that pollutes the image generation prompt
when used as submodel for Stable Diffusion / NovelAI tag generation.

Three changes to address the root cause:

1. request.ts: Strip thinking-related flags (geminiThinking,
   deepSeekThinkingOutput, claudeThinking, claudeAdaptiveThinking)
   from modelInfo when mode is 'submodel', so the API never requests
   thinking in the first place.

2. openAI/requests.ts: Skip wrapping reasoning_content / reasoning
   API response fields in <Thoughts> tags for submodel calls. If the
   model's text content is empty, fall back to reasoning content
   directly (without wrapper tags) so downstream processing can still
   work with it.

3. stableDiff.ts: Strip both <Thoughts> and <think> blocks from the
   submodel result, extending the previous <Thoughts>-only strip to
   also cover local models that emit <think> tags in their text output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
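
The strip described in item 3 could be approximated with a single regular expression, as in the sketch below. The helper name `stripThinkingBlocks` and the exact pattern are assumptions made for illustration; the regex actually used in stableDiff.ts may differ.

```typescript
// Matches a complete <Thoughts>...</Thoughts> or <think>...</think> block.
// The \1 backreference requires the closing tag to match the opening tag,
// and [\s\S]*? matches lazily across newlines.
const THINKING_BLOCK = /<(Thoughts|think)>[\s\S]*?<\/\1>/g;

// Hypothetical helper: removes every complete thinking block and trims the rest.
function stripThinkingBlocks(text: string): string {
  return text.replace(THINKING_BLOCK, "").trim();
}

const cleaned = stripThinkingBlocks("<think>plan the scene</think>\n1girl, smile, outdoors");
// cleaned === "1girl, smile, outdoors"
```

One limitation of this sketch: a block that is opened but never closed (for example when the model hits its token limit mid-thought) is left in place, because the pattern only matches complete pairs.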
The previous approach fell back to raw reasoning_content when the
model's text content was empty, which caused reasoning text to leak
into the SD prompt for models that put everything in the reasoning
field. Simply discard reasoning for submodel calls — if the model
produces no text content, an empty result is preferable to garbage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
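
The behavior after this second commit can be sketched roughly as follows. The `ChatMessage` interface and the `extractText` function are hypothetical simplifications. The exact wrapper format used for non-submodel calls is also an assumption, not copied from openAI/requests.ts.

```typescript
// Simplified, hypothetical shape of an OpenAI-compatible response message.
// reasoning_content and reasoning are the two field names mentioned in this PR.
interface ChatMessage {
  content?: string | null;
  reasoning_content?: string | null;
  reasoning?: string | null;
}

// Hypothetical helper: builds the text that is passed on to downstream processing.
function extractText(message: ChatMessage, mode: string): string {
  const content = message.content ?? "";
  const reasoning = message.reasoning_content ?? message.reasoning ?? "";

  // Submodel calls discard reasoning entirely, even when content is empty:
  // an empty result is preferred over reasoning text leaking into the SD prompt.
  if (mode === "submodel" || reasoning === "") {
    return content;
  }

  // Other modes keep the reasoning, wrapped in <Thoughts> tags (format assumed).
  return `<Thoughts>\n${reasoning}\n</Thoughts>\n${content}`;
}
```

With this shape, a response such as `{ content: "", reasoning_content: "..." }` yields an empty string in submodel mode, so the caller can detect the empty result instead of sending reasoning text to the image generator.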