
feat(kb): hierarchical organization Phase 1 — project MOCs + de-hubbed index — 0.23.0#11

Merged: Cain-Ish merged 9 commits into main from feat/kb-hierarchical-organization on Jun 2, 2026


Cain-Ish (Owner) commented Jun 2, 2026

Why

The knowledge base read as a flat hairball. There is a single root cause: the hierarchy lives in the edge log but is drowned out and never surfaced. Active edges run ~461 relates to 9 part_of; a flat index.md link-hubs all ~108 pages; and with no project: facet, a project's notes scatter across type-folders (kiri-* pages are only loosely tied; cainish-bridge-* pages have no grouping at all). Design: docs/specs/2026-06-02-knowledge-base-hierarchical-organization-design.md (brainstormed, plus a 5-facet web research sweep in which all facets converged). Plan: docs/plans/2026-06-02-kb-hierarchical-organization-phase1.md.

What (Phase 1: deterministic read-side wins; no file is ever moved)

  • project: facet. parseDoc exposes project/area; membership is a frontmatter facet (a query), not a synthetic edge.
  • Project MOCs. reindex projects one wiki/projects/<slug>.md per project with ≥ SB_MOC_MIN_MEMBERS (default 3) members, grouped by type; these MOCs are FORGET-protected like themes/ and marked graph: exclude.
  • De-hubbed two-tier index.md. A Home page with ## Maps of Content (project/theme MOC links) and ## Categories (per-type counts and plain-text slug rows, no [[wikilinks]]), marked graph: exclude, so a graph viewer shows clusters instead of a 108-edge star.
  • Pure projection. Structure regenerates idempotently on each reindex (byte-identical modulo the generated timestamp); the editing skills preserve it simply by keeping edges and facets correct. Setting SB_KB_MOC=off disables it.
  • One-shot seed. kb-project-backfill.sh walks part_of ancestry from registry anchors and sets project: on members (idempotent, reversible). Optional; otherwise facets accrue from new edits.
  • projects is a known category (validate + FORGET PROTECT). MCP server bumped to 2.4.0. Additive and back-compatible: a KB with no facets gets the prior flat index.
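The MOC projection described above is a pure function of the pages' project: facets. A minimal sketch of that grouping, assuming a simplified page shape: PageFacet, buildProjectMocs and the exact frontmatter/row layout are hypothetical illustrations, not the code in project-moc.ts.

```typescript
// Hypothetical, simplified page shape; the real ParsedDoc may differ.
interface PageFacet {
  slug: string;         // filename without .md
  type: string;         // type-folder, e.g. "decisions"
  project?: string;     // value of the project: frontmatter facet, if any
  description?: string; // one-line member description
}

// Sketch only: group pages by project facet and render one MOC body per
// project with at least minMembers members. Projects, types and slugs are all
// sorted by plain string order, so identical input yields identical markdown.
function buildProjectMocs(pages: PageFacet[], minMembers: number): Map<string, string> {
  const byProject = new Map<string, PageFacet[]>();
  for (const p of pages) {
    if (!p.project) continue; // no facet: the page belongs to no MOC
    byProject.set(p.project, [...(byProject.get(p.project) ?? []), p]);
  }

  const mocs = new Map<string, string>();
  for (const project of [...byProject.keys()].sort()) {
    const members = byProject.get(project)!;
    if (members.length < minMembers) continue; // gate: too small for a MOC

    // Group members by type so the MOC mirrors the type-folders.
    const byType = new Map<string, PageFacet[]>();
    for (const m of members) byType.set(m.type, [...(byType.get(m.type) ?? []), m]);

    // Assumed frontmatter: generated MOCs are type: projects and graph: exclude.
    const lines = ["---", "type: projects", "graph: exclude", "---", `# ${project}`, ""];
    for (const type of [...byType.keys()].sort()) {
      lines.push(`## ${type}`);
      const sorted = [...byType.get(type)!].sort((a, b) => (a.slug < b.slug ? -1 : a.slug > b.slug ? 1 : 0));
      for (const m of sorted) {
        lines.push(m.description ? `- [[${m.slug}]]: ${m.description}` : `- [[${m.slug}]]`);
      }
      lines.push("");
    }
    mocs.set(project, lines.join("\n"));
  }
  return mocs;
}
```

Sorting with plain string comparison rather than localeCompare keeps the output independent of the host locale, which is what "byte-identical on every reindex" requires.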

Phases 2–3 (on-write facet preservation + relates→part_of promotion + maintainer plurality-vote; lint/drift enforcement) are separate future plans.

Release-gate review: 8 findings, 0 false positives, all CRITICAL/HIGH/MEDIUM fixed

A deep review (unit + architectural + history lenses) found real bugs; the last commit fixes them with regression tests:

  • de-hub + idempotency: graph-project and graph-cluster now skip the generated MOC dirs, so MOC [[slug]] hubs never re-enter our clustering, and a MOC whose project key is also an edge endpoint is no longer mangled by graph projection (covered by a 2-reindex byte-identical test).
  • member-desc leak: firstSentence strips the projected ## Dependencies block before extracting a member's MOC description.
  • duplicate_slug: validate excludes generated MOCs, so a project named after an existing page no longer triggers a recurring error.
  • stale prune: reindex deletes MOCs whose project has dropped below the threshold, so output depends only on current input.
  • NaN gate: a garbage, 0, or negative SB_MOC_MIN_MEMBERS now falls back to 3 (previously the NaN gate filtered out nothing).
  • lint: the orphan candidate set excludes projects/ and themes/ (no false orphans).
  • backfill: deterministic file pick when basenames are duplicated.
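Two of these fixes (the threshold clamp and the stale prune) are small enough to show directly. A hedged sketch, assuming the env value arrives as a string: resolveMinMembers and staleMocs are hypothetical names, not functions from knowledge-reindex.ts.

```typescript
const DEFAULT_MOC_MIN_MEMBERS = 3;

// Parse SB_MOC_MIN_MEMBERS; anything that is not a positive integer falls back to 3.
// Without the fallback, a gate like `members.length < NaN` is always false, so
// no project is ever skipped and every single-member project becomes a MOC.
function resolveMinMembers(raw: string | undefined): number {
  const n = Number(raw ?? ""); // "" -> 0, "abc" -> NaN, "2.5" -> 2.5
  return Number.isInteger(n) && n > 0 ? n : DEFAULT_MOC_MIN_MEMBERS;
}

// MOC slugs found on disk whose project no longer qualifies; reindex deletes
// these so the projects/ dir reflects only the current input.
function staleMocs(onDisk: Iterable<string>, qualifying: ReadonlySet<string>): string[] {
  return [...onDisk].filter((slug) => !qualifying.has(slug)).sort();
}
```

Typical use would be `resolveMinMembers(process.env.SB_MOC_MIN_MEMBERS)` once per reindex, then unlinking each path returned by staleMocs.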

Verification

Full suite green (61 shell + 281→ vitest, 0 fail; 1 intentional skip); validate + version lockstep (0.23.0) + migration-row gates pass; pre-push hook green.

🤖 Generated with Claude Code

Cain-Ish and others added 9 commits June 2, 2026 15:40
Brainstormed + web-researched (5-facet parallel sweep: PARA/MOC, graph-hub
hygiene, GraphRAG hierarchical communities, taxonomy-vs-tags, auto-maintenance).
All facets converged: keep files FLAT (type-folders), make hierarchy a SOFT
overlay projected from the edge log — a project: facet + page->page part_of +
deterministically-generated project/theme MOC pages + a de-hubbed two-tier index.
Structure becomes a pure projection regenerated idempotently each reindex, so the
editing skills (extractor/maintainer/dream/reindex/lint) preserve it by keeping
edges + facets correct — fix the source, never re-file by hand.

Root cause confirmed from the live KB: 461 relates : 9 part_of (hierarchy drowned),
flat index.md hubs all 108 pages, no project facet so siblings scatter across
type-folders. Decision: one full spec (all phases), project assignment = registry +
LLM-fallback (staged). Honors the heterogeneous-main-groups invariant.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
8 bite-sized TDD tasks for the deterministic read-side wins: parseDoc project
facet, pure project-MOC builder (>=3 gate), projects/ category, reindex MOC
projection, de-hubbed two-tier index (graph:exclude), idempotency guard, one-shot
part_of-ancestry backfill, and the build+gate+version task. Phases 2-3 (on-write
preservation, lint/drift enforcement) follow as their own plans.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Retargeted from the planned validate test (which would pass vacuously — validate
doesn't flag unknown sub-categories) to the real RED-able capability: a generated
project-MOC must be PROTECT:category like themes, else FORGET would archive it.
Also adds projects to KNOWN_CATEGORIES (created-date inference / type correctness).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…dempotent)

Tasks 4-6: reindex now collects the project: facet per page, projects one
wiki/projects/<slug>.md MOC per project with >= SB_MOC_MIN_MEMBERS (default 3)
members (grouped by type, deterministic, FORGET-protected, graph:exclude), and
rebuilds index.md as a thin two-tier Home (## Maps of Content -> project/theme MOC
links + ## Categories -> per-type counts with plain-text slug rows) marked
graph:exclude so a viewer never hubs it. Pure projection: a second reindex is
byte-identical modulo the generated timestamp. SB_KB_MOC=off disables.
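The two-tier Home page can be rendered so that the only wikilinks it contains point at MOCs. A minimal sketch under stated assumptions: renderIndex, MocRef, the generated: frontmatter key and the "type (count): slugs" row format are all hypothetical; only the section names, graph: exclude and path-style [[projects/x]] links come from this PR.

```typescript
// Hypothetical reference to a generated MOC page.
interface MocRef {
  dir: "projects" | "themes";
  slug: string;
}

// Sketch: render index.md with MOC links as the ONLY wikilinks. Category rows
// are plain text, so a graph viewer draws no edge from index.md to each page.
function renderIndex(mocs: MocRef[], pagesByType: Map<string, string[]>, generated: string): string {
  const out = ["---", "graph: exclude", `generated: ${generated}`, "---", "# Home", "", "## Maps of Content"];
  for (const link of mocs.map((m) => `${m.dir}/${m.slug}`).sort()) {
    out.push(`- [[${link}]]`); // path-style link into projects/ or themes/
  }
  out.push("", "## Categories");
  for (const type of [...pagesByType.keys()].sort()) {
    const slugs = [...pagesByType.get(type)!].sort();
    out.push(`- ${type} (${slugs.length}): ${slugs.join(", ")}`); // count + plain slugs
  }
  return out.join("\n") + "\n";
}
```

Because every list is sorted and the timestamp is the only varying input, two renders of the same KB differ only on the generated: line.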

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…versible)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
….4.0

Rebuild bundles (project-moc inlined into knowledge-reindex.bundle), bump plugin +
marketplace 0.22.5->0.23.0 lockstep + MCP server 2.3.1->2.4.0, add the 0.23.0
migration row (project: facet + project-MOC projection + de-hubbed two-tier index +
projects/ FORGET-protection + one-shot kb-project-backfill seed). Full suite green
(61 shell + 281 vitest); validate + migration-row gate pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The deep-review gate caught real bugs in Phase 1; all CRITICAL/HIGH/MEDIUM fixed:

- #1/#2 de-hub + idempotency: graph-project.ts and graph-cluster-cli.ts now SKIP the
  generated MOC dirs (projects/, themes/) — so MOC pages (pure [[slug]] hubs) never
  re-enter our own clustering, and projectGraphToPages never injects related:/##
  Dependencies into a MOC (which broke "pure projection" when a project key equals an
  edge endpoint). A 2-reindex byte-identical test now exercises this.
- firstSentence: strip the generated <!-- graph:begin..end --> block before extracting
  the member description, so a member without a description: frontmatter no longer
  leaks the projected ## Dependencies block into its MOC row.
- #3 collision: knowledge-validate excludes projects/+themes/ from the duplicate_slug
  check — a project named after an existing page (e.g. architecture-v1) is no longer a
  recurring error.
- #4 stale prune: reindex deletes projects/*.md whose project no longer qualifies, so
  output depends only on CURRENT input.
- #5 clamp: SB_MOC_MIN_MEMBERS NaN/0/negative now falls back to 3 (was: NaN gate made
  every single-member project a MOC).
- history#1 lint: orphan candidate set ($ALL) excludes projects/+themes/ so generated
  MOCs (path-style [[projects/x]] links, no authored inbound links) aren't false orphans.
- bash#1 backfill: find | sort | head -1 (deterministic file pick on duplicate basenames).
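The firstSentence fix amounts to removing the projected block before looking for prose. A hedged sketch: the literal markers `<!-- graph:begin -->` / `<!-- graph:end -->` are an assumption (the commit abbreviates them as graph:begin..end), and this firstSentence is an illustration, not the shipped helper.

```typescript
// Assumed marker text for the block that graph projection injects into pages.
const GRAPH_BLOCK = /<!-- graph:begin -->[\s\S]*?<!-- graph:end -->/g;

// Sketch: extract a member's first prose sentence for its MOC row, ignoring
// frontmatter, headings, and the projected related/## Dependencies block.
function firstSentence(body: string): string {
  const prose = body
    .replace(/^---\n[\s\S]*?\n---\n/, "") // drop leading frontmatter
    .replace(GRAPH_BLOCK, "")             // drop the generated graph block
    .split("\n")
    .filter((line) => line.trim() !== "" && !line.startsWith("#"))
    .join(" ");
  const m = prose.match(/^.*?[.!?](?=\s|$)/); // shortest run ending in . ! or ?
  return (m ? m[0] : prose).trim();
}
```

Without the GRAPH_BLOCK step, the `- [[x]]` rows inside the Dependencies block would be the first non-heading text and would leak into the MOC description.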

New regression tests: stale-prune, clamp, MOC-not-mangled (2-reindex), duplicate_slug
collision. Full suite green (61 shell + vitest). Advisory (writeProjectMocs helper
extraction) + a low backfill edge-case test deferred as non-blocking follow-ups.
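The 2-reindex regression test checks that a second run changes nothing except the timestamp. A minimal sketch of such a comparison, assuming each run is captured as a path-to-content snapshot and the timestamp lives on a `generated:` line; Snapshot and diffModuloTimestamp are hypothetical names.

```typescript
// Hypothetical snapshot of the wiki after one reindex: relative path -> content.
type Snapshot = Map<string, string>;

// Return the paths that differ between two runs, ignoring only generated: lines.
// An empty result means the second reindex was byte-identical modulo timestamp.
function diffModuloTimestamp(first: Snapshot, second: Snapshot): string[] {
  const norm = (s: string) => s.replace(/^generated: .*$/gm, "generated: <ts>");
  const paths = new Set([...first.keys(), ...second.keys()]);
  return [...paths].sort().filter((p) => {
    const a = first.get(p);
    const b = second.get(p);
    return a === undefined || b === undefined || norm(a) !== norm(b); // added, removed, or changed
  });
}
```

Including added and removed paths means the same check also catches a MOC that appears or disappears between runs, not just content drift.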

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 2, 2026 14:14

Copilot AI left a comment


Pull request overview

This PR introduces Phase 1 of hierarchical knowledge-base organization by adding a project: (and area:) frontmatter facet and using it to deterministically project project MOC pages plus a de-hubbed two-tier index.md, without moving any existing wiki files. It also updates validation/forget/lint behaviors to treat generated MOC directories as protected, non-source “view” outputs, and bumps the MCP server + plugin versions for release.

Changes:

  • Extend parseDoc to expose project/area facets and add tests.
  • Add deterministic project MOC projection (wiki/projects/<slug>.md) and de-hubbed index.md generation to knowledge_reindex, with idempotency + pruning behavior and tests.
  • Add an optional one-shot kb-project-backfill.sh (with tests) and update validation/forget scoring/lint to recognize and exclude generated MOC dirs; bump versions/docs.

Reviewed changes

Copilot reviewed 21 out of 57 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| tests/test-wiki-forget-projects.sh | Ensures generated type: projects MOCs are category-protected from FORGET. |
| tests/test-kb-project-backfill.sh | Tests deterministic, bounded, idempotent project: backfill via part_of ancestry. |
| skills/upgrade/SKILL.md | Adds the 0.23.0 migration row documenting Phase 1 behavior and optional seed steps. |
| skills/lint/SKILL.md | Excludes generated MOC dirs from orphan detection to avoid false positives. |
| scripts/wiki-forget-score.sh | Treats projects category as protected for FORGET candidate generation. |
| scripts/kb-project-backfill.sh | New one-shot script to backfill project: from registry anchors + part_of graph. |
| mcp/src/tools/project-moc.ts | New pure function for deterministic project MOC markdown generation. |
| mcp/src/tools/project-moc.test.ts | Unit tests for project MOC generation determinism and gating. |
| mcp/src/tools/knowledge-validate.ts | Recognizes projects category and excludes projects/themes from duplicate slug checks. |
| mcp/src/tools/knowledge-validate.test.ts | Tests duplicate slug exemption for generated project MOCs. |
| mcp/src/tools/knowledge-search.ts | Adds project/area fields to ParsedDoc and extracts them in parseDoc. |
| mcp/src/tools/knowledge-search.test.ts | Tests parseDoc project facet extraction/defaulting. |
| mcp/src/tools/knowledge-reindex.ts | Projects project MOCs, prunes stale MOCs, and generates de-hubbed two-tier index. |
| mcp/src/tools/knowledge-reindex.test.ts | Integration tests for MOC writing, index format, idempotency, pruning, env clamp, and edge-endpoint mangle guard. |
| mcp/src/tools/graph-project.ts | Skips projects/ and themes/ to avoid mangling generated MOCs during graph projection. |
| mcp/src/tools/graph-cluster-cli.ts | Excludes generated MOC dirs from clustering input to prevent hub artifacts. |
| mcp/src/server.ts | Bumps MCP server version to 2.4.0. |
| mcp/dist/tools/project-moc.test.js.map | Built artifact for project-moc tests. |
| mcp/dist/tools/project-moc.test.js | Built artifact for project-moc tests. |
| mcp/dist/tools/project-moc.test.d.ts.map | Built artifact for project-moc test typings map. |
| mcp/dist/tools/project-moc.test.d.ts | Built artifact for project-moc test typings. |
| mcp/dist/tools/project-moc.js.map | Built artifact sourcemap for project-moc. |
| mcp/dist/tools/project-moc.js | Built artifact for project-moc. |
| mcp/dist/tools/project-moc.d.ts.map | Built artifact typings map for project-moc. |
| mcp/dist/tools/project-moc.d.ts | Built artifact typings for project-moc. |
| mcp/dist/tools/knowledge-validate.test.js.map | Built artifact sourcemap for updated validate tests. |
| mcp/dist/tools/knowledge-validate.test.js | Built artifact for updated validate tests. |
| mcp/dist/tools/knowledge-validate.js.map | Built artifact sourcemap for updated validate tool. |
| mcp/dist/tools/knowledge-validate.js | Built artifact for updated validate tool. |
| mcp/dist/tools/knowledge-validate.d.ts.map | Built artifact typings map for validate tool. |
| mcp/dist/tools/knowledge-validate.bundle.js | Built bundled artifact including validate changes. |
| mcp/dist/tools/knowledge-search.test.js.map | Built artifact sourcemap for updated search tests. |
| mcp/dist/tools/knowledge-search.test.js | Built artifact for updated search tests. |
| mcp/dist/tools/knowledge-search.js | Built artifact for updated search tool. |
| mcp/dist/tools/knowledge-search.d.ts.map | Built artifact typings map for updated search tool. |
| mcp/dist/tools/knowledge-search.d.ts | Built artifact typings for updated search tool. |
| mcp/dist/tools/knowledge-search-cli.bundle.js | Built bundled artifact including search facet changes. |
| mcp/dist/tools/knowledge-reindex.test.js.map | Built artifact sourcemap for updated reindex tests. |
| mcp/dist/tools/knowledge-reindex.test.js | Built artifact for updated reindex tests. |
| mcp/dist/tools/knowledge-reindex.js.map | Built artifact sourcemap for updated reindex tool. |
| mcp/dist/tools/knowledge-reindex.js | Built artifact for updated reindex tool. |
| mcp/dist/tools/knowledge-reindex.d.ts.map | Built artifact typings map for updated reindex tool. |
| mcp/dist/tools/knowledge-reindex.bundle.js | Built bundled artifact including reindex + MOC projection changes. |
| mcp/dist/tools/graph-project.js.map | Built artifact sourcemap for updated graph projection tool. |
| mcp/dist/tools/graph-project.js | Built artifact for updated graph projection tool. |
| mcp/dist/tools/graph-project.d.ts.map | Built artifact typings map for updated graph projection tool. |
| mcp/dist/tools/graph-cluster-cli.js.map | Built artifact sourcemap for updated graph cluster CLI. |
| mcp/dist/tools/graph-cluster-cli.js | Built artifact for updated graph cluster CLI. |
| mcp/dist/tools/graph-cluster-cli.bundle.js | Built bundled artifact including MOC-dir exclusion. |
| mcp/dist/server.js | Built artifact for MCP server version bump. |
| mcp/dist/server.bundle.js | Built bundled artifact for MCP server version bump + tool changes. |
| mcp/dist/cli/sb-entry.bundle.js | Built CLI bundle incorporating facet parsing changes. |
| docs/specs/2026-06-02-knowledge-base-hierarchical-organization-design.md | New design spec documenting the model, goals, and projection approach. |
| docs/plans/2026-06-02-kb-hierarchical-organization-phase1.md | New step-by-step implementation plan for Phase 1. |
| .claude-plugin/plugin.json | Plugin version bump to 0.23.0. |
| .claude-plugin/marketplace.json | Marketplace version bump to 0.23.0. |


Comment on lines 42 to 44

```typescript
for (const filePath of files.sort()) {
  const slug = filePath.split('/').pop()!.replace(/\.md$/, '');
  try {
```
Comment on lines +71 to +73

```typescript
for (const existing of await mocSlugs(projDir)) {
  if (!mocs.has(existing)) { try { await fs.unlink(join(projDir, `${existing}.md`)); } catch { /* gone */ } }
}
```
Comment on lines +45 to 49

```typescript
// Skip index.md and the generated MOC dirs (projects/, themes/) — they are pure
// projections; injecting related:/## Dependencies into a MOC would mangle it and break
// reindex idempotency.
if (file.endsWith('index.md') || /\/(projects|themes)\//.test(file)) continue;
const slug = slugFromPath(file);
```
Comment thread skills/lint/SKILL.md
Comment on lines +40 to +43

```bash
# All wiki page slugs (filename without .md). Exclude the generated MOC dirs
# (projects/, themes/) — like index.md they are auto-generated projections with no
# AUTHORED inbound links by design, so they must never be flagged as orphans.
ALL=$(find "$KD/wiki" -name '*.md' -type f -not -path '*/projects/*' -not -path '*/themes/*' | while read f; do basename "$f" .md; done | sort -u)
```
Comment on lines +31 to +33

```bash
[ -n "$f" ] || return 0
grep -qE '^project:' "$f" && return 0 # idempotent: never overwrite an existing facet
awk -v p="$proj" '
```
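The backfill's core idea, walking part_of upward from each page until a registry anchor is reached, can be sketched outside bash. This is a hypothetical TypeScript illustration, not a port of kb-project-backfill.sh: it assumes at most one part_of parent per page, and backfillProjects, its parameters and the default maxDepth of 16 are invented for the example.

```typescript
// Sketch: assign each page the project of the nearest registry anchor on its
// part_of ancestry. Pages that already carry a project: facet are skipped
// (idempotent), maxDepth bounds the walk, and a visited set stops on cycles.
function backfillProjects(
  partOf: Map<string, string>,   // child slug -> parent slug (assumes one parent)
  anchors: Map<string, string>,  // registry anchor slug -> project
  existing: Map<string, string>, // slug -> project: facet already present
  maxDepth = 16,
): Map<string, string> {
  const assigned = new Map<string, string>();
  for (const slug of [...partOf.keys()].sort()) { // sorted for deterministic order
    if (existing.has(slug)) continue; // never overwrite an existing facet
    const seen = new Set<string>([slug]);
    let cur = slug;
    for (let depth = 0; depth < maxDepth; depth++) {
      const parent = partOf.get(cur);
      if (parent === undefined || seen.has(parent)) break; // root reached or cycle
      const project = anchors.get(parent);
      if (project !== undefined) {
        assigned.set(slug, project);
        break;
      }
      seen.add(parent);
      cur = parent;
    }
  }
  return assigned;
}
```

Feeding the result back in as `existing` makes a second run assign nothing, which mirrors the "idempotent: never overwrite an existing facet" guard in the excerpt above.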
Cain-Ish merged commit b1064c6 into main Jun 2, 2026
2 checks passed
Cain-Ish deleted the feat/kb-hierarchical-organization branch June 2, 2026 14:21