fix: strip think tags before extracting title when saving to wiki #406
Open
shisonghong-git wants to merge 2 commits into
Reasoning models (e.g. deepseek-r1) may start their reply with <think>…</think>. The previous code extracted the title from the raw content first and cleaned think tags second, causing the saved query page title to be polluted with the model's internal reasoning text instead of the actual answer. Move content cleanup before title extraction so the saved title reflects the real answer content.
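To illustrate why the order matters, here is a minimal sketch. The helper names `stripThinkTags` and `firstLineTitle` are hypothetical and are not taken from the PR; the actual code in the repository may be organized differently.

```typescript
// Hypothetical helpers for illustration only; not the PR's actual code.

// Remove every <think>…</think> block, including multi-line ones.
function stripThinkTags(content: string): string {
  return content.replace(/<think>[\s\S]*?<\/think>/g, "").trim();
}

// Take the first non-empty line and drop leading markdown heading markers.
function firstLineTitle(content: string): string {
  const line =
    content
      .split("\n")
      .map((l) => l.trim())
      .find((l) => l.length > 0) ?? "";
  return line.replace(/^#+\s*/, "");
}

const raw =
  "<think>The user wants a summary of ownership.</think>\n\n" +
  "# Rust ownership basics\nOwnership moves values.";

// Old order: the title is extracted from the raw reply, so it contains reasoning text.
const pollutedTitle = firstLineTitle(raw);

// New order: clean the content first, then extract the title.
const cleanTitle = firstLineTitle(stripThinkTags(raw));
```

With the cleanup applied first, the title comes from the answer's heading rather than from the `<think>` block.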
Author
This can resolve issue #414 (comment).
…pping

The saved query title now prefers the user's question. Long questions are summarized at the nearest sentence/clause boundary (with an ellipsis) instead of a hard mid-word slice, and image markdown / base64 data URIs are stripped so that image-only or image-heavy questions don't leak a blob into the title (falling back to the answer's first line when no usable question text remains).

Refs nashsu#404
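The image stripping and the fallback described in this commit message could look roughly like the sketch below. `stripImages` and `pickTitle` are hypothetical names, and the regular expressions are assumptions about what counts as image markdown or a base64 data URI; the PR's own patterns may differ.

```typescript
// Hypothetical sketch; helper names and patterns are assumptions, not the PR's code.

function stripImages(text: string): string {
  return text
    // Markdown images: ![alt](url-or-data-uri)
    .replace(/!\[[^\]]*\]\([^)]*\)/g, " ")
    // Bare base64 image data URIs, with or without a subtype (e.g. image/png)
    .replace(/data:image\/?[\w.+-]*;base64,[A-Za-z0-9+\/=]+/g, " ")
    // Collapse newlines and runs of whitespace into single spaces
    .replace(/\s+/g, " ")
    .trim();
}

function pickTitle(question: string, answer: string): string {
  const fromQuestion = stripImages(question);
  if (fromQuestion.length > 0) return fromQuestion;
  // No usable question text remains: fall back to the answer's first non-empty line.
  return (
    answer
      .split("\n")
      .map((l) => l.trim())
      .find((l) => l.length > 0) ?? ""
  );
}
```

An image-only question yields an empty string after stripping, which triggers the fallback; a mixed question keeps only its text.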
Fixes #404
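Problem
When a reasoning model (e.g. deepseek-r1) is used, the LLM reply may start with <think>…</think>. The previous code extracted the title from the raw content first and stripped the think tags afterwards, so the saved query page title was polluted with the model's reasoning. In addition, even when the correct content is used, taking the answer's first line as the title is not ideal: the first line is often boilerplate, it is hard-truncated when too long, and base64 data leaks into the title when screenshots are present.

Fix
- Strip <think> tags and sources comments first, then extract the title.
- Prefer the user's question as the title via deriveTitleFromQuestion(). When the question is too long, it is cut at the nearest sentence/clause boundary within a 60-character budget and … is appended, which avoids breaking in the middle of a word; a hard cut is used only when no boundary is found.
- Strip image markdown and bare data:image;base64,... URIs. An image-only question yields an empty title, which falls back to the answer's first line; a mixed text-and-image question keeps only the text.

Tests
deriveTitleFromQuestion unit tests: short question, newline collapsing, empty/image-only input, mixed text and images, long-question summarization, and hard cut with no boundary. All pass.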
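The long-question and no-boundary cases can be illustrated with a small sketch of boundary-aware truncation. This is not the PR's `deriveTitleFromQuestion()`; the name `summarizeAtBoundary`, the set of boundary characters, dropping the boundary character itself, and appending the ellipsis on a hard cut are all assumptions. Only the 60-character budget comes from the description above.

```typescript
// Hypothetical sketch; the boundary set and ellipsis handling are assumptions.
const TITLE_BUDGET = 60; // character budget quoted in the PR description
const BOUNDARIES = new Set([
  ".", "?", "!", ",", ";", ":",
  "。", "?", "!", ",", ";", ":", "、",
]);

function summarizeAtBoundary(text: string, budget: number = TITLE_BUDGET): string {
  const chars = Array.from(text); // iterate by code point, not UTF-16 unit
  if (chars.length <= budget) return text;

  // Walk backwards from the budget to find the nearest sentence/clause boundary.
  for (let i = budget - 1; i > 0; i--) {
    const ch = chars[i];
    if (ch !== undefined && BOUNDARIES.has(ch)) {
      // Cut just before the boundary character and mark the elision.
      return chars.slice(0, i).join("").trimEnd() + "…";
    }
  }

  // No boundary inside the budget: fall back to a hard cut.
  return chars.slice(0, budget).join("") + "…";
}
```

Searching backwards from the budget keeps as much of the question as fits while still ending on a natural pause.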