fix: create individual entity pages when insight has affectedPages list#413

Open
shisonghong-git wants to merge 3 commits into nashsu:main from shisonghong-git:fix/create-page-multiple-entities

shisonghong-git commented Jun 17, 2026

Fixes #414

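Problem

When the insight system generates a missing-page insight that lists missing entities (e.g. CallMethod, StartFunc, Print), clicking "Create Page" does not create a page for each entity. Instead, it saves the whole insight as a single query page (e.g. wiki/concepts/核心测试项实体页缺失-callmethod-startfunc-print-….md).

Root cause

The previous implementation treated affectedPages as the list of pages to create, but its actual meaning is "existing pages that reference the gap", not pages to create. The entity names to create are in the insight title. In addition, the LLM almost never emits a PAGES: line, so affectedPages is empty, the multi-page branch never fires, and the code falls back to a single query page.

Fix

  1. Parse entity names from the title: the new extractMissingEntityNames() strips prefixes such as 缺失页面: / Missing page: and descriptive CJK prefixes such as …实体页缺失, then splits on 、,,;/|. Runs of capitalized identifiers joined by hyphens, such as Foo-Bar-Baz, are also split on the hyphen, while lowercase kebab-case names such as self-attention are left intact.
  2. Create one page per entity: each entity gets its own wiki/entities/<slug>.md (or a page under concepts, depending on the insight type). Existing pages are skipped and never overwritten, and the index and log are updated.
  3. Use affectedPages as references: they are written into the new page's related: back-references rather than treated as creation targets.
  4. Tighten the prompt: require that a missing-page title be exactly the missing entity name(s), with multiple names joined by a separator, and that PAGES list only the referencing pages.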
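
The title-parsing step in item 1 could look roughly like the sketch below. This is an illustration only, not the PR's code: the regular expressions, the full-width comma and semicolon in the separator set, and the order-preserving de-duplication are all assumptions.

```typescript
// Hypothetical sketch of title parsing; the real extractMissingEntityNames()
// in the PR may be implemented differently.

// Label prefix such as "缺失页面:" or "Missing page:" (ASCII or full-width colon).
const LABEL_PREFIX = /^\s*(?:缺失页面|Missing page)\s*[::]\s*/i;
// Descriptive CJK lead-in ending in "实体页缺失", e.g. "核心测试项实体页缺失:".
const CJK_PREFIX = /^[\u4e00-\u9fff]*实体页缺失\s*[::\-]?\s*/;
// List separators; the full-width comma and semicolon are an assumption.
const LIST_SEPARATORS = /[、,,;;\/|]/;
// A run like Foo-Bar-Baz: every hyphen-separated part starts with an uppercase letter.
const CAPITALIZED_RUN = /^[A-Z][A-Za-z0-9]*(?:-[A-Z][A-Za-z0-9]*)+$/;

function extractMissingEntityNames(title: string): string[] {
  const stripped = title.replace(LABEL_PREFIX, "").replace(CJK_PREFIX, "");
  const names: string[] = [];
  for (const chunk of stripped.split(LIST_SEPARATORS)) {
    const part = chunk.trim();
    if (part === "") continue;
    if (CAPITALIZED_RUN.test(part)) {
      // Split identifier runs, but leave lowercase kebab-case names alone.
      names.push(...part.split("-"));
    } else {
      names.push(part);
    }
  }
  // Drop duplicates while keeping first-seen order.
  return names.filter((name, index) => names.indexOf(name) === index);
}
```

Under these assumptions, a title such as `Missing page: Foo-Bar-Baz; self-attention` yields four names, with `self-attention` kept whole because it does not start with an uppercase letter.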

Tests

shisonghong added 2 commits June 16, 2026 21:47
Reasoning models (e.g. deepseek-r1) may start their reply with
<think>…</think>. The previous code extracted the title from the raw
content first and cleaned think tags second, causing the saved
query page title to be polluted with the model's internal
reasoning text instead of the actual answer.

Move content cleanup before title extraction so the saved title
reflects the real answer content.
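
A minimal sketch of the reordering this commit describes, assuming the title is taken from the first non-empty line of the reply. The helper names `stripThinkTags`, `extractTitle`, and `titleForSavedPage` are hypothetical and are not taken from the repository.

```typescript
// Hypothetical helpers illustrating "clean first, then extract the title".

// Remove <think>…</think> blocks that reasoning models may prepend.
function stripThinkTags(raw: string): string {
  return raw.replace(/<think>[\s\S]*?<\/think>/gi, "").trim();
}

// Assumption: the title is the first non-empty line, minus Markdown heading markers.
function extractTitle(content: string): string {
  for (const line of content.split("\n")) {
    const text = line.replace(/^#+\s*/, "").trim();
    if (text !== "") return text;
  }
  return "Untitled";
}

// The order matters: extracting from the raw reply would pick up the reasoning text.
function titleForSavedPage(rawReply: string): string {
  return extractTitle(stripThinkTags(rawReply));
}
```

With a reply that starts with a `<think>` block, `extractTitle` on the raw text returns the reasoning line, while `titleForSavedPage` returns the first line of the actual answer.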

When a 'missing-page' insight lists specific entity paths in affectedPages
(e.g. wiki/entities/CallMethod.md), clicking 'Create Page' now creates
individual pages for each entity instead of a single query page with the
insight description.

- Added multi-page creation mode that iterates affectedPages
- Added extractEntityInfo helper to pull entity-specific text from insights
- Pages are created with proper frontmatter (type, title, created, tags)
- Existing pages are skipped to avoid overwriting
- Wiki index and log are updated for all created pages
- Backwards compatible: single-page mode preserved when affectedPages is empty
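
The page-creation behaviour listed above (frontmatter with type, title, created, and tags, and skipping existing pages) can be sketched as follows. The in-memory `Map` stands in for the wiki directory, and `renderEntityPage` and `createPages` are hypothetical names, not the PR's functions.

```typescript
// Hypothetical sketch: an in-memory Map (path -> content) replaces real file I/O.
type WikiStore = Map<string, string>;

// Build a page body with the frontmatter fields named in the commit message.
function renderEntityPage(title: string, created: string): string {
  return [
    "---",
    "type: entity",
    `title: ${JSON.stringify(title)}`,
    `created: ${created}`,
    "tags: []",
    "---",
    "",
    `# ${title}`,
    "",
  ].join("\n");
}

// Create one page per path; never overwrite a page that already exists.
function createPages(
  store: WikiStore,
  paths: string[],
  created: string,
): { created: string[]; skipped: string[] } {
  const result = { created: [] as string[], skipped: [] as string[] };
  for (const path of paths) {
    if (store.has(path)) {
      result.skipped.push(path);
      continue;
    }
    // Assumption: the page title is the file name without its .md extension.
    const fileName = path.slice(path.lastIndexOf("/") + 1);
    store.set(path, renderEntityPage(fileName.replace(/\.md$/, ""), created));
    result.created.push(path);
  }
  return result;
}
```

Returning both the created and the skipped paths lets a caller update the index and log only for pages that were actually written.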
@shisonghong-git

Fixes #414

A missing-page insight names the missing entities in its TITLE; affectedPages
lists the EXISTING pages that reference the gap, not pages to create. The prior
logic created/overwrote those reference pages and only ran when affectedPages
was set (which the model rarely emits), so it fell back to a single query page.

Now parse the entity name(s) from the title (handling '、'/comma lists and
hyphen-joined identifier runs), create one wiki/entities/<slug>.md per entity,
skip existing pages, and record affectedPages as related: back-references. Also
tighten the ingest prompt so missing-page titles carry the bare entity names.

Refs nashsu#414
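
As a rough illustration of recording affectedPages as related: back-references, the sketch below plans one page per entity name. The slug rule (lowercase, with non-alphanumeric and non-CJK runs collapsed to hyphens), the frontmatter layout, and the names `slugify` and `planEntityPage` are assumptions; the PR's actual rules are not shown on this page.

```typescript
// Hypothetical sketch; the slug rule and frontmatter layout are assumptions.
interface PlannedPage {
  path: string;
  frontmatter: string;
}

// Lowercase the name and collapse anything that is not a-z, 0-9, or CJK into "-".
function slugify(name: string): string {
  return name
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9\u4e00-\u9fff]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// affectedPages are existing pages that mention the gap, so they become
// back-references on the new page instead of creation targets.
function planEntityPage(name: string, affectedPages: string[]): PlannedPage {
  const lines = ["---", "type: entity", `title: ${name}`];
  if (affectedPages.length > 0) {
    lines.push("related:", ...affectedPages.map((page) => `  - ${page}`));
  }
  lines.push("---");
  return {
    path: `wiki/entities/${slugify(name)}.md`,
    frontmatter: lines.join("\n"),
  };
}
```

When affectedPages is empty, which the description says is the common case, the page is still planned and the related: key is simply omitted.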

Development

Successfully merging this pull request may close these issues.

The insight reports missing entities, but clicking Create Page does not create the corresponding pages; it saves the issue as a single md page instead.
