fix: create individual entity pages when insight has affectedPages list#413
Open
shisonghong-git wants to merge 3 commits into
added 2 commits
June 16, 2026 21:47
Reasoning models (e.g. deepseek-r1) may start their reply with <think>…</think>. The previous code extracted the title from the raw content first and cleaned think tags second, causing the saved query page title to be polluted with the model's internal reasoning text instead of the actual answer. Move content cleanup before title extraction so the saved title reflects the real answer content.
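The ordering fix described in this commit can be illustrated with a small sketch. The helper names `stripThinkBlocks`, `deriveTitle`, and `titleForSavedPage` are hypothetical and are not taken from the PR's diff; the point is only that cleanup runs before title extraction.

```typescript
// Hypothetical helpers for illustration; the PR's real function names are not shown in this page.

// Remove every closed <think>…</think> block (non-greedy, spanning newlines).
function stripThinkBlocks(raw: string): string {
  return raw.replace(/<think>[\s\S]*?<\/think>/gi, "").trim();
}

// Assumption: the title is the first non-empty line, minus any markdown heading marker.
function deriveTitle(content: string, maxLength: number = 60): string {
  const lines: string[] = content.split("\n");
  for (const line of lines) {
    const trimmed = line.trim();
    if (trimmed.length > 0) {
      return trimmed.replace(/^#+\s*/, "").slice(0, maxLength);
    }
  }
  return "";
}

// Clean first, then extract the title, so reasoning text never reaches the title.
function titleForSavedPage(rawReply: string): string {
  return deriveTitle(stripThinkBlocks(rawReply));
}
```

With the order reversed (title taken from the raw reply), the first non-empty line would be the opening of the `<think>` block, which is the pollution the commit describes.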
When a 'missing-page' insight lists specific entity paths in affectedPages (e.g. wiki/entities/CallMethod.md), clicking 'Create Page' now creates individual pages for each entity instead of a single query page with the insight description.
- Added multi-page creation mode that iterates affectedPages
- Added extractEntityInfo helper to pull entity-specific text from insights
- Pages are created with proper frontmatter (type, title, created, tags)
- Existing pages are skipped to avoid overwriting
- Wiki index and log are updated for all created pages
- Backwards compatible: single-page mode preserved when affectedPages is empty
A missing-page insight names the missing entities in its TITLE; affectedPages lists the EXISTING pages that reference the gap, not pages to create. The prior logic created/overwrote those reference pages and only ran when affectedPages was set (which the model rarely emits), so it fell back to a single query page. Now parse the entity name(s) from the title (handling '、'/comma lists and hyphen-joined identifier runs), create one wiki/entities/<slug>.md per entity, skip existing pages, and record affectedPages as related: back-references. Also tighten the ingest prompt so missing-page titles carry the bare entity names. Refs nashsu#414
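A minimal sketch of the page-planning step this commit describes: one `wiki/entities/<slug>.md` per entity, existing pages skipped, and `affectedPages` recorded under `related:`. The names `slugify`, `planEntityPages`, and `PagePlan`, the slug rules, and the exact frontmatter layout are assumptions for illustration, not the PR's code.

```typescript
// Hypothetical types and helpers; only the path pattern and the related: field come from the PR text.
interface PagePlan {
  path: string;
  content: string;
}

// Assumed slug rule: lowercase, with runs of other characters collapsed to "-".
function slugify(name: string): string {
  return name
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9\u4e00-\u9fff]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

function planEntityPages(
  entityNames: string[],
  affectedPages: string[],
  existingPaths: Set<string>,
  today: string
): PagePlan[] {
  const plans: PagePlan[] = [];
  for (const name of entityNames) {
    const path = `wiki/entities/${slugify(name)}.md`;
    // Never overwrite: an existing page is skipped entirely.
    if (existingPaths.has(path)) continue;
    const lines: string[] = ["---", "type: entity", `title: ${name}`, `created: ${today}`];
    if (affectedPages.length > 0) {
      // affectedPages are back-references, not creation targets.
      lines.push("related:");
      for (const page of affectedPages) lines.push(`  - ${page}`);
    }
    lines.push("---");
    plans.push({ path, content: `${lines.join("\n")}\n\n# ${name}\n` });
  }
  return plans;
}
```

Keeping the planning step free of file I/O makes the skip-existing rule easy to test; the caller would write each planned page and then update the index and log.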
Fixes #414
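Problem
When the insight system generates a missing-page insight that lists missing entities (e.g. CallMethod, StartFunc, Print), clicking "Create Page" does not create a page for each entity. Instead, the whole insight is saved as a single query page (e.g. wiki/concepts/核心测试项实体页缺失-callmethod-startfunc-print-….md).

Root cause
The previous implementation treated affectedPages as the list of pages to create, but its real meaning is "existing pages that reference the gap", not pages to create. The entity names to create are in the insight title. In addition, the LLM almost never emits a PAGES: line, so affectedPages is empty, the multi-page branch never runs, and the code falls back to a single query page.

Fix
- extractMissingEntityNames(): strips prefixes such as 缺失页面: / Missing page: and CJK descriptive prefixes such as …实体页缺失, then splits on 、 , , ; / |. Runs of capitalized identifiers such as Foo-Bar-Baz are also split on hyphens, while lowercase kebab-case such as self-attention is left intact.
- One wiki/entities/<slug>.md per entity (may land under concepts depending on the insight type). Existing pages are skipped and never overwritten, and the index/log are updated.
- affectedPages is now treated as references: it is written into the new page as related: back-references instead of being used as creation targets.
- Ingest prompt tightened: missing-page titles carry the bare entity names separated by 、, and PAGES lists only the referencing pages.

Tests
Unit tests for extractMissingEntityNames: single entity, Chinese and English prefixes, 、/comma-separated multiple entities, the hyphenated list from issue #414, case preserved, kebab-case not split, empty title. All pass.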
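The parsing rules listed under the fix can be sketched roughly as follows. This is not the PR's `extractMissingEntityNames`; the function is deliberately named `extractMissingEntityNamesSketch`, and the exact regular expressions (including which separator may follow the CJK prefix) are assumptions reconstructed from the description.

```typescript
// Illustrative reconstruction from the PR description; regexes are assumptions.
function extractMissingEntityNamesSketch(title: string): string[] {
  let rest = title.trim();
  // Strip a leading "Missing page:" / "缺失页面:" label (ASCII or fullwidth colon).
  rest = rest.replace(/^(missing page|缺失页面)\s*[::]\s*/i, "");
  // Strip a CJK descriptive prefix ending in "实体页缺失", plus one optional separator.
  rest = rest.replace(/^.*实体页缺失\s*[::\-]?\s*/, "");

  const names: string[] = [];
  // Split on 、 , , ; / |
  for (const part of rest.split(/[、,,;\/|]/)) {
    const piece = part.trim();
    if (piece.length === 0) continue;
    const segments = piece.split("-");
    // Only split on hyphens when every segment looks like a capitalized identifier,
    // so lowercase kebab-case such as "self-attention" stays whole.
    const isIdentifierRun =
      segments.length > 1 && segments.every((s) => /^[A-Z][A-Za-z0-9_]*$/.test(s));
    if (isIdentifierRun) {
      names.push(...segments);
    } else {
      names.push(piece);
    }
  }
  return names;
}
```

Original casing is kept in the returned names; lowercasing is left to whatever builds the slug.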