Skip to content

feat(kb): AI-native representation Phase 3 — maintenance + backfill — 0.24.3#16

Merged
Cain-Ish merged 11 commits into
mainfrom
feat/ai-block-phase3-maintenance
Jun 2, 2026
Merged

feat(kb): AI-native representation Phase 3 — maintenance + backfill — 0.24.3#16
Cain-Ish merged 11 commits into
mainfrom
feat/ai-block-phase3-maintenance

Conversation

@Cain-Ish

@Cain-Ish Cain-Ish commented Jun 2, 2026

Copy link
Copy Markdown
Owner

What — Phase 3 (the block lifecycle closes)

The AI-native block now gets refreshed, surfaced, and backfilled — not just created (Phase 1b) and consumed (Phase 2).

  1. Refresh on update (the automatic half). merge-project-update.sh's UPDATE path now replaces a complete <!-- ai:begin … ai:end --> region in place (injects one after the frontmatter when absent), so an authored block is refreshed with the page. extract-prompt.txt instructs the extractor to emit ai_block on update too. Idempotent, mawk-safe, and it leaves a malformed begin-without-end page untouched (never eats the body — the FORGET-bug class).
  2. Lint staleness. knowledge_validate gains ai_block_missing (a structured, substantive — ≥200 prose chars — page with no block → gentle warning; stubs / non-structured types / generated projects+themes MOCs exempt — spec §7 "warns on missing block"). /second-brain:lint Check 4 surfaces the same signal standalone (offline bash).
  3. Backfill. New deterministic, read-only kb-ai-block-candidates.sh (idempotent work-list of blockless structured pages) feeds the knowledge-maintainer's new Phase 4b, which authors a block per page from its existing prose only (never invents), renders via the CLI, injects between frontmatter and H1, self-checks via knowledge_validate, counts each against the 50/run cap, and runs explicit-invocation only (auto-dispatched maintenance runs skip it). The dream stays surface-only (counts blockless staging pages, recommends /second-brain:maintain; single authoring path = the maintainer; gated SB_DREAM_AI_BLOCKS).

MCP server → 2.6.1. Additive + back-compat throughout. The §7 timestamp block↔prose drift heuristic is deferred (the block carries no authored-time; the robust offline signal is structural-missing). Phase 2b (the block's own embedding) still needs embeddings → deferred.

Release-gate review — 11 findings, 0 false positives, all fixed + guarded

First pass (6 reviewers): CRITICAL a stray <!-- ai:begin --> in prose re-armed the refresh awk → duplicate region + runaway drop-to-EOF on this repo's own ai-block-documenting pages (fixed with a replaced latch); MED dedup continue skipped the refresh (hoisted above it); MED AI_BLOCK_SCHEMAS[ptype] hit Object.prototype keys (constructor/__proto__) → spurious flag + TypeError aborting the whole validate run (fixed at source: guarded schemaFor()); HIGH maintainer Autonomous Dispatch contradicted Phase 4b (reconciled); CRLF strip, env-override, <block-json> clarity, no-frontmatter inject. Re-review pass (2 reviewers): no regressions; one LOW single-quote type: strip parity — fixed.

End-to-end functional verification on the live 120-page KB (you asked — it's a KB-wide change)

  • knowledge_validate scans all 120 pages with no crash (the proto-pollution fix) → 107 backfill candidates (the migration scope; 13 stubs/MOCs correctly exempt).
  • knowledge_search returns ranked real hits; knowledge_reindex is byte-stable (idempotent modulo timestamp); knowledge_fetch block-tier falls back cleanly on a blockless page.
  • All three backfill surfaces agree at 107/120 (validate, kb-ai-block-candidates.sh, lint Check 4) after aligning their type-resolution.
  • This verification caught a real bug the diff-reviewers couldn't: lint Check 4's grep -q … && continue SIGPIPEs find under a job-control shell (the way /second-brain:lint pastes the block), silently reporting 0 — fixed with grep -l, plus a static anti-regression guard.

Verification

Full suite green (70 shell + vitest, 0 fail, 1 pre-existing skip); lockstep 0.24.3 / MCP 2.6.1; 8 TDD tasks each RED→GREEN; migration row added; live KB smoke as above.

🤖 Generated with Claude Code

Cain-Ish and others added 11 commits June 2, 2026 19:41
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… update (Phase 3 Tasks 1-2)

UPDATE path now replaces a complete ai:begin..ai:end region in place, injects one
after frontmatter when absent, and leaves a malformed (begin-without-end) page
untouched (never eats the body). extract-prompt instructs refresh on update.
mawk-safe via ENVIRON. Idempotent. Offline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sk 3)

Structured, substantive (>=200 prose chars), blockless page -> gentle warning
(spec 7 'warns on missing block'). Stubs, non-structured types, and generated
projects/themes MOCs exempt. Block-present pages never double-flagged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Standalone offline bash check mirrors knowledge_validate's ai_block_missing
signal in the lint idiom: structured pages (>=200 prose chars) lacking an
ai:begin block. mawk-safe (infm/drop). Defers backfill to /second-brain:maintain.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
kb-ai-block-candidates.sh: read-only, idempotent enumeration of blockless,
substantive, structured pages (TSV type/slug/path) -> the maintainer's Phase 4b
work-list. Mirrors kb-project-* deterministic-script pattern. mawk-safe.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… Task 6)

New Enrich sub-phase: consume kb-ai-block-candidates.sh -> extract field values
from existing prose (never invent) -> render via the CLI -> inject between
frontmatter and H1 -> self-check via knowledge_validate. Closed-vocab (six
types), counted against the 50/run cap, explicit-invocation only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Task 7)

Dream (skill + runner) is ai-block aware but surface-only: counts blockless
structured staging pages and recommends /second-brain:maintain. Does NOT author
in staging -- single authoring path through the maintainer (correct rationale;
reindex never overwrites ai-blocks). Gated SB_DREAM_AI_BLOCKS.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… + migration row

Version lockstep (plugin.json + marketplace.json), knowledge-base server 2.6.1
(validate gained ai_block_missing), upgrade migration row, rebuilt dist.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ives)

CRITICAL F1: merge-refresh REPLACE awk re-triggered on a stray <!-- ai:begin -->
in prose -> duplicate region + runaway drop-to-EOF (FORGET-class data loss on
this repo's own ai-block-documenting pages). Fixed with a 'replaced' latch
(first begin only; drop never re-arms).
MED F2: dedup early-continue skipped the block refresh (the dominant refresh
case) -> hoisted the refresh above the prose-dedup continue.
MED F3: AI_BLOCK_SCHEMAS[ptype] hit inherited Object.prototype keys
(constructor/__proto__) -> spurious flag + TypeError aborting the whole validate
run. Fixed at source: guarded schemaFor() accessor in ai-block.ts, used by
render/snippet/validate + the new validate call site.
LOW F4: frontmatter-strip regex LF-only -> CRLF false-positive. Now \r?\n.
MED/LOW F5: candidate+lint prose-awk dropped-to-EOF on an unterminated region
(regression of 59a9b25) -> buffer-emit at END so trailing prose still counts;
flexible ai:begin grep so an odd-spaced block-page is skipped not mis-listed.
HIGH F7: maintainer 'Autonomous Dispatch' contradicted Phase 4b
'explicit-invocation only' -> reconciled (auto-dispatched runs skip Phase 4b).
LOW W1/F8/merge#3: validate honors SB_AI_BLOCK_MIN_PROSE; <block-json> must be
valid JSON; inject guarded against a frontmatter-less page (no mid-prose block).
Tests added for every finding (inline-marker, dedup-skip, proto-key, CRLF,
unterminated-region, odd-spacing, negative prompt-guards). Full suite green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-verification fixes)

Caught by running the new surfaces against the live 120-page KB:
1. lint Check 4 used 'grep -q ... && continue' inside find|while. grep -q's
   early exit SIGPIPEs the upstream find under a job-control (monitor) shell --
   the way /second-brain:lint pastes the block -- ending the loop after one page,
   so the check silently reported 0 (worked as a script file, hence the guard
   test + candidate script passed). Switched both copies to 'grep -lE' (whole-
   file). Added a static anti-regression guard (a script-context dynamic test
   can't catch the inline-only failure).
2. Candidate script keyed on directory, listing 4 pages whose explicit
   frontmatter type: is non-structured/typo'd (type: index, type: concept) that
   knowledge_validate (canonical doc.type) correctly skips. Both bash surfaces
   now resolve the canonical type: -> all three surfaces agree at 107/120 on the
   live KB. Test added.

Live verification: validate scans 120 pages with no crash (proto fix), search
returns ranked hits, reindex byte-stable, fetch block-tier falls back cleanly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The canonical-type guard stripped " and , but not ' -> a single-quoted
type: 'security' would diverge from knowledge_validate (which strips both quote
styles), silently skipping the page in lint + the backfill work-list. Use
\047 (octal single-quote, mawk-safe) in both lockstep copies. Test added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 2, 2026 18:51
@Cain-Ish Cain-Ish merged commit ba32de9 into main Jun 2, 2026
2 checks passed
@Cain-Ish Cain-Ish deleted the feat/ai-block-phase3-maintenance branch June 2, 2026 18:54

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR advances the knowledge base’s AI-native block lifecycle (Phase 3) by ensuring authored <!-- ai:begin … ai:end --> regions are refreshed on page updates, surfacing “missing block” staleness signals in both MCP validation and the offline lint skill, and enabling deterministic backfill via a read-only candidate worklist plus maintainer/dream contract updates. It also bumps the plugin to 0.24.3 and the MCP server to 2.6.1, with extensive test coverage added around the new behaviors.

Changes:

  • Refresh/inject ai-block regions on update in merge-project-update.sh, and update the extractor prompt to emit ai_block on updates.
  • Add ai_block_missing warnings to knowledge_validate, plus an equivalent /second-brain:lint Check 4 and a deterministic kb-ai-block-candidates.sh enumerator for backfill work.
  • Update agent/skill contracts (maintainer Phase 4b, dream surface-only) and add regression/guard tests; rebuild bundled mcp/dist artifacts and bump versions.

Reviewed changes

Copilot reviewed 20 out of 37 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/test-merge-ai-block-refresh.sh Adds end-to-end shell test coverage for refresh/inject/no-corrupt semantics on update.
tests/test-maintainer-ai-block-backfill.sh Adds grep-based guard ensuring maintainer contract includes Phase 4b backfill boundaries.
tests/test-lint-skill.sh Extends lint skill tests to validate presence/behavior of “Missing ai-block” Check 4 and guards.
tests/test-kb-ai-block-candidates.sh Adds deterministic/read-only behavior tests for the candidate enumerator script.
tests/test-dream-ai-block-parity.sh Adds guard ensuring dream remains surface-only for ai-blocks and honors kill switch.
skills/upgrade/SKILL.md Documents the 0.24.3 migration/feature row for Phase 3 functionality.
skills/lint/SKILL.md Adds Check 4 to surface substantive structured pages missing ai-blocks.
skills/dream/SKILL.md Updates dream skill to surface-only count/report missing ai-blocks behind SB_DREAM_AI_BLOCKS.
scripts/merge-project-update.sh Implements refresh/inject of ai-block regions on UPDATE before dedup early-continue.
scripts/kb-ai-block-candidates.sh Introduces deterministic TSV work-list enumerator for backfill candidates.
scripts/extract-prompt.txt Instructs extractor to emit ai_block for update actions as well as create.
mcp/src/tools/knowledge-validate.ts Adds ai_block_missing warning logic and env-configurable prose threshold.
mcp/src/tools/knowledge-validate.test.ts Adds vitest cases for ai_block_missing, MOC exemptions, and prototype-key safety.
mcp/src/tools/ai-block.ts Adds schemaFor() to prevent prototype-key lookups and uses it across consumers.
mcp/src/server.ts Bumps MCP server version to 2.6.1.
mcp/dist/tools/knowledge-validate.test.js.map Rebuilt bundle artifact reflecting updated validate tests.
mcp/dist/tools/knowledge-validate.test.js Rebuilt JS output for updated validate tests.
mcp/dist/tools/knowledge-validate.js.map Rebuilt source map for updated validate implementation.
mcp/dist/tools/knowledge-validate.js Rebuilt JS output for updated validate implementation.
mcp/dist/tools/knowledge-validate.d.ts.map Rebuilt typings map for updated validate types.
mcp/dist/tools/knowledge-validate.d.ts Updates validate issue type union to include ai_block_missing.
mcp/dist/tools/knowledge-validate.bundle.js Rebuilt bundled validate tool with new missing-block logic.
mcp/dist/tools/knowledge-search-cli.bundle.js Rebuilt bundle to incorporate schemaFor-safe lookup in snippet rendering path.
mcp/dist/tools/knowledge-reindex.bundle.js Rebuilt reindex bundle to incorporate schemaFor-safe lookup and validate changes.
mcp/dist/tools/ai-block.js.map Rebuilt map for ai-block module changes (schemaFor).
mcp/dist/tools/ai-block.js Rebuilt JS output for ai-block module changes (schemaFor).
mcp/dist/tools/ai-block.d.ts.map Rebuilt typings map for ai-block changes.
mcp/dist/tools/ai-block.d.ts Exposes schemaFor() in typings.
mcp/dist/tools/ai-block-render-cli.bundle.js Rebuilt render CLI bundle to use schemaFor-safe schema lookup.
mcp/dist/server.js Rebuilt server JS output with version bump and bundled changes.
mcp/dist/server.bundle.js Rebuilt server bundle with version bump and bundled changes.
mcp/dist/cli/sb-entry.bundle.js Rebuilt CLI entry bundle to incorporate schemaFor-safe snippet logic.
docs/plans/2026-06-02-ai-native-representation-phase3.md Adds Phase 3 implementation plan documentation (refresh/lint/backfill).
agents/knowledge-maintainer.md Adds Phase 4b maintainer contract for explicit-invocation-only ai-block backfill.
agents/dream-runner.md Adds dream runner contract to surface-only missing ai-block counts (no staging authoring).
.claude-plugin/plugin.json Bumps plugin version to 0.24.3.
.claude-plugin/marketplace.json Bumps marketplace version to 0.24.3.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

// A structured page with this much prose (non-frontmatter, marked regions stripped) but no
// ai-block is a backfill candidate; shorter pages are legitimate stubs, exempt. Env-overridable
// in lockstep with kb-ai-block-candidates.sh / lint Check 4 (default 200).
const AI_BLOCK_MIN_PROSE = Number(process.env.SB_AI_BLOCK_MIN_PROSE) || 200;
.replace(/<!--\s*graph:begin[\s\S]*?graph:end\s*-->/g, '')
.replace(/<!--\s*theme:begin[\s\S]*?theme:end\s*-->/g, '')
.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n/, '');
if (prose.trim().length >= AI_BLOCK_MIN_PROSE) issues.push({
END { if (drop) printf "%s", buf }
' "$f" | tr -d '[:space:]' | wc -c)
[ "$prose" -ge "$MINPROSE" ] || continue
printf '%s\t%s\t%s\n' "$type" "$(basename "$f" .md)" "$f"
Comment thread skills/lint/SKILL.md
{ print }
END { if (drop) printf "%s", buf }
' "$f" | tr -d '[:space:]' | wc -c)
[ "$prose" -ge 200 ] && echo "MISSING-BLOCK: $type/$(basename "$f" .md) ($f)"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants