feat(kb): AI-native representation Phase 3 — maintenance + backfill — 0.24.3#16
Merged
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… update (Phase 3 Tasks 1-2) UPDATE path now replaces a complete ai:begin..ai:end region in place, injects one after frontmatter when absent, and leaves a malformed (begin-without-end) page untouched (never eats the body). extract-prompt instructs refresh on update. mawk-safe via ENVIRON. Idempotent. Offline. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sk 3) Structured, substantive (>=200 prose chars), blockless page -> gentle warning (spec 7 'warns on missing block'). Stubs, non-structured types, and generated projects/themes MOCs exempt. Block-present pages never double-flagged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Standalone offline bash check mirrors knowledge_validate's ai_block_missing signal in the lint idiom: structured pages (>=200 prose chars) lacking an ai:begin block. mawk-safe (infm/drop). Defers backfill to /second-brain:maintain. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
kb-ai-block-candidates.sh: read-only, idempotent enumeration of blockless, substantive, structured pages (TSV type/slug/path) -> the maintainer's Phase 4b work-list. Mirrors kb-project-* deterministic-script pattern. mawk-safe. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… Task 6) New Enrich sub-phase: consume kb-ai-block-candidates.sh -> extract field values from existing prose (never invent) -> render via the CLI -> inject between frontmatter and H1 -> self-check via knowledge_validate. Closed-vocab (six types), counted against the 50/run cap, explicit-invocation only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Task 7) Dream (skill + runner) is ai-block aware but surface-only: counts blockless structured staging pages and recommends /second-brain:maintain. Does NOT author in staging -- single authoring path through the maintainer (correct rationale; reindex never overwrites ai-blocks). Gated SB_DREAM_AI_BLOCKS. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… + migration row Version lockstep (plugin.json + marketplace.json), knowledge-base server 2.6.1 (validate gained ai_block_missing), upgrade migration row, rebuilt dist. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ives)
CRITICAL F1: merge-refresh REPLACE awk re-triggered on a stray <!-- ai:begin --> in prose -> duplicate region + runaway drop-to-EOF (FORGET-class data loss on this repo's own ai-block-documenting pages). Fixed with a 'replaced' latch (first begin only; drop never re-arms).
MED F2: dedup early-continue skipped the block refresh (the dominant refresh case) -> hoisted the refresh above the prose-dedup continue.
MED F3: AI_BLOCK_SCHEMAS[ptype] hit inherited Object.prototype keys (constructor/__proto__) -> spurious flag + TypeError aborting the whole validate run. Fixed at source: guarded schemaFor() accessor in ai-block.ts, used by render/snippet/validate + the new validate call site.
LOW F4: frontmatter-strip regex LF-only -> CRLF false-positive. Now \r?\n.
MED/LOW F5: candidate+lint prose-awk dropped-to-EOF on an unterminated region (regression of 59a9b25) -> buffer-emit at END so trailing prose still counts; flexible ai:begin grep so an odd-spaced block-page is skipped not mis-listed.
HIGH F7: maintainer 'Autonomous Dispatch' contradicted Phase 4b 'explicit-invocation only' -> reconciled (auto-dispatched runs skip Phase 4b).
LOW W1/F8/merge#3: validate honors SB_AI_BLOCK_MIN_PROSE; <block-json> must be valid JSON; inject guarded against a frontmatter-less page (no mid-prose block).
Tests added for every finding (inline-marker, dedup-skip, proto-key, CRLF, unterminated-region, odd-spacing, negative prompt-guards). Full suite green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-verification fixes) Caught by running the new surfaces against the live 120-page KB:
1. lint Check 4 used 'grep -q ... && continue' inside find|while. grep -q's early exit SIGPIPEs the upstream find under a job-control (monitor) shell -- the way /second-brain:lint pastes the block -- ending the loop after one page, so the check silently reported 0 (worked as a script file, hence the guard test + candidate script passed). Switched both copies to 'grep -lE' (whole-file). Added a static anti-regression guard (a script-context dynamic test can't catch the inline-only failure).
2. Candidate script keyed on directory, listing 4 pages whose explicit frontmatter type: is non-structured/typo'd (type: index, type: concept) that knowledge_validate (canonical doc.type) correctly skips. Both bash surfaces now resolve the canonical type: -> all three surfaces agree at 107/120 on the live KB. Test added.
Live verification: validate scans 120 pages with no crash (proto fix), search returns ranked hits, reindex byte-stable, fetch block-tier falls back cleanly.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The canonical-type guard stripped " and , but not ' -> a single-quoted type: 'security' would diverge from knowledge_validate (which strips both quote styles), silently skipping the page in lint + the backfill work-list. Use \047 (octal single-quote, mawk-safe) in both lockstep copies. Test added. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
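The quote-stripping parity described in the last commit above can be sketched in TypeScript as follows. This is only an illustration of the idea (resolve the explicit frontmatter `type:` and strip both quote styles plus commas); the function name `frontmatterType` is hypothetical, and the real guard lives in awk inside the two bash surfaces, which this page does not show.

```typescript
// Hypothetical helper, not taken from the PR: read the explicit `type:` value
// from a page's frontmatter. Double quotes, single quotes, and commas are
// stripped so `type: 'security'`, `type: "security"`, and `type: security`
// all resolve to the same string. Returns undefined when the page has no
// frontmatter or no `type:` key.
function frontmatterType(page: string): string | undefined {
  const fm = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/.exec(page);
  if (!fm) return undefined;
  const line = /^type:[ \t]*(.*)$/m.exec(fm[1] ?? "");
  if (!line) return undefined;
  const value = (line[1] ?? "").replace(/["',]/g, "").trim();
  return value === "" ? undefined : value;
}
```

If one surface strips only one quote style, the same page resolves to different types in different tools, which is the divergence that commit describes.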
Pull request overview
This PR advances the knowledge base’s AI-native block lifecycle (Phase 3) by ensuring authored <!-- ai:begin … ai:end --> regions are refreshed on page updates, surfacing “missing block” staleness signals in both MCP validation and the offline lint skill, and enabling deterministic backfill via a read-only candidate worklist plus maintainer/dream contract updates. It also bumps the plugin to 0.24.3 and the MCP server to 2.6.1, with extensive test coverage added around the new behaviors.
Changes:
- Refresh/inject ai-block regions on `update` in `merge-project-update.sh`, and update the extractor prompt to emit `ai_block` on updates.
- Add `ai_block_missing` warnings to `knowledge_validate`, plus an equivalent `/second-brain:lint` Check 4 and a deterministic `kb-ai-block-candidates.sh` enumerator for backfill work.
- Update agent/skill contracts (maintainer Phase 4b, dream surface-only) and add regression/guard tests; rebuild bundled `mcp/dist` artifacts and bump versions.
Reviewed changes
Copilot reviewed 20 out of 37 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/test-merge-ai-block-refresh.sh | Adds end-to-end shell test coverage for refresh/inject/no-corrupt semantics on update. |
| tests/test-maintainer-ai-block-backfill.sh | Adds grep-based guard ensuring maintainer contract includes Phase 4b backfill boundaries. |
| tests/test-lint-skill.sh | Extends lint skill tests to validate presence/behavior of “Missing ai-block” Check 4 and guards. |
| tests/test-kb-ai-block-candidates.sh | Adds deterministic/read-only behavior tests for the candidate enumerator script. |
| tests/test-dream-ai-block-parity.sh | Adds guard ensuring dream remains surface-only for ai-blocks and honors kill switch. |
| skills/upgrade/SKILL.md | Documents the 0.24.3 migration/feature row for Phase 3 functionality. |
| skills/lint/SKILL.md | Adds Check 4 to surface substantive structured pages missing ai-blocks. |
| skills/dream/SKILL.md | Updates dream skill to surface-only count/report missing ai-blocks behind SB_DREAM_AI_BLOCKS. |
| scripts/merge-project-update.sh | Implements refresh/inject of ai-block regions on UPDATE before dedup early-continue. |
| scripts/kb-ai-block-candidates.sh | Introduces deterministic TSV work-list enumerator for backfill candidates. |
| scripts/extract-prompt.txt | Instructs extractor to emit ai_block for update actions as well as create. |
| mcp/src/tools/knowledge-validate.ts | Adds ai_block_missing warning logic and env-configurable prose threshold. |
| mcp/src/tools/knowledge-validate.test.ts | Adds vitest cases for ai_block_missing, MOC exemptions, and prototype-key safety. |
| mcp/src/tools/ai-block.ts | Adds schemaFor() to prevent prototype-key lookups and uses it across consumers. |
| mcp/src/server.ts | Bumps MCP server version to 2.6.1. |
| mcp/dist/tools/knowledge-validate.test.js.map | Rebuilt bundle artifact reflecting updated validate tests. |
| mcp/dist/tools/knowledge-validate.test.js | Rebuilt JS output for updated validate tests. |
| mcp/dist/tools/knowledge-validate.js.map | Rebuilt source map for updated validate implementation. |
| mcp/dist/tools/knowledge-validate.js | Rebuilt JS output for updated validate implementation. |
| mcp/dist/tools/knowledge-validate.d.ts.map | Rebuilt typings map for updated validate types. |
| mcp/dist/tools/knowledge-validate.d.ts | Updates validate issue type union to include ai_block_missing. |
| mcp/dist/tools/knowledge-validate.bundle.js | Rebuilt bundled validate tool with new missing-block logic. |
| mcp/dist/tools/knowledge-search-cli.bundle.js | Rebuilt bundle to incorporate schemaFor-safe lookup in snippet rendering path. |
| mcp/dist/tools/knowledge-reindex.bundle.js | Rebuilt reindex bundle to incorporate schemaFor-safe lookup and validate changes. |
| mcp/dist/tools/ai-block.js.map | Rebuilt map for ai-block module changes (schemaFor). |
| mcp/dist/tools/ai-block.js | Rebuilt JS output for ai-block module changes (schemaFor). |
| mcp/dist/tools/ai-block.d.ts.map | Rebuilt typings map for ai-block changes. |
| mcp/dist/tools/ai-block.d.ts | Exposes schemaFor() in typings. |
| mcp/dist/tools/ai-block-render-cli.bundle.js | Rebuilt render CLI bundle to use schemaFor-safe schema lookup. |
| mcp/dist/server.js | Rebuilt server JS output with version bump and bundled changes. |
| mcp/dist/server.bundle.js | Rebuilt server bundle with version bump and bundled changes. |
| mcp/dist/cli/sb-entry.bundle.js | Rebuilt CLI entry bundle to incorporate schemaFor-safe snippet logic. |
| docs/plans/2026-06-02-ai-native-representation-phase3.md | Adds Phase 3 implementation plan documentation (refresh/lint/backfill). |
| agents/knowledge-maintainer.md | Adds Phase 4b maintainer contract for explicit-invocation-only ai-block backfill. |
| agents/dream-runner.md | Adds dream runner contract to surface-only missing ai-block counts (no staging authoring). |
| .claude-plugin/plugin.json | Bumps plugin version to 0.24.3. |
| .claude-plugin/marketplace.json | Bumps marketplace version to 0.24.3. |
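The `schemaFor()` accessor listed for `mcp/src/tools/ai-block.ts` guards against prototype-key lookups. A minimal sketch of that kind of guard is shown here, assuming a plain object table; the `BlockSchema` shape, the `SCHEMAS` entries, and the `decision` type name are hypothetical stand-ins, since the real `AI_BLOCK_SCHEMAS` contents do not appear on this page.

```typescript
// Hypothetical schema shape and table, for illustration only.
type BlockSchema = { fields: string[] };

const SCHEMAS: Record<string, BlockSchema> = {
  decision: { fields: ["status", "rationale"] },
  security: { fields: ["severity", "mitigation"] },
};

// Unguarded lookup: a page type such as "constructor" resolves to the
// inherited Object constructor, which is truthy but not a schema.
const unguarded: unknown = SCHEMAS["constructor"];

// Guarded lookup: only the table's own keys count, so inherited names like
// "constructor" or "__proto__" resolve to undefined.
function schemaFor(ptype: string): BlockSchema | undefined {
  return Object.prototype.hasOwnProperty.call(SCHEMAS, ptype)
    ? SCHEMAS[ptype]
    : undefined;
}
```

Routing every consumer through one accessor, rather than patching each call site, matches the "fixed at source" wording in the commit message; building the table with `Object.create(null)` or a `Map` would be alternative ways to get the same effect.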
    // A structured page with this much prose (non-frontmatter, marked regions stripped) but no
    // ai-block is a backfill candidate; shorter pages are legitimate stubs, exempt. Env-overridable
    // in lockstep with kb-ai-block-candidates.sh / lint Check 4 (default 200).
    const AI_BLOCK_MIN_PROSE = Number(process.env.SB_AI_BLOCK_MIN_PROSE) || 200;

    .replace(/<!--\s*graph:begin[\s\S]*?graph:end\s*-->/g, '')
    .replace(/<!--\s*theme:begin[\s\S]*?theme:end\s*-->/g, '')
    .replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n/, '');
    if (prose.trim().length >= AI_BLOCK_MIN_PROSE) issues.push({

    END { if (drop) printf "%s", buf }
    ' "$f" | tr -d '[:space:]' | wc -c)
    [ "$prose" -ge "$MINPROSE" ] || continue
    printf '%s\t%s\t%s\n' "$type" "$(basename "$f" .md)" "$f"

    { print }
    END { if (drop) printf "%s", buf }
    ' "$f" | tr -d '[:space:]' | wc -c)
    [ "$prose" -ge 200 ] && echo "MISSING-BLOCK: $type/$(basename "$f" .md) ($f)"
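The excerpts above measure prose in two ways: the TypeScript side compares `prose.trim().length`, while the bash side pipes through `tr -d '[:space:]' | wc -c`. The lines elided from the excerpts may already reconcile the two, so the following is only a sketch of one way to keep the counts aligned, by counting non-whitespace characters after stripping the frontmatter and the graph/theme regions. The names `minProse` and `proseLength` are hypothetical, and the stricter threshold parsing is an assumption, not the PR's code.

```typescript
// Hypothetical threshold parser: accept only positive integers, otherwise
// fall back to the default. The raw value is passed in (for example from an
// environment variable) so the function has no runtime dependencies.
function minProse(raw: string | undefined, fallback = 200): number {
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Count non-whitespace characters of a page's prose, after removing the
// frontmatter and the generated graph/theme regions. This mirrors the
// `tr -d '[:space:]'` style of counting rather than `trim().length`.
function proseLength(page: string): number {
  const prose = page
    .replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n/, "")
    .replace(/<!--\s*graph:begin[\s\S]*?graph:end\s*-->/g, "")
    .replace(/<!--\s*theme:begin[\s\S]*?theme:end\s*-->/g, "");
  return prose.replace(/\s/g, "").length;
}
```

A page would then count as substantive when `proseLength(page) >= minProse(rawThreshold)`. Note that `wc -c` counts bytes while a JavaScript string length counts UTF-16 code units, so pages with non-ASCII text could still differ slightly between surfaces.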
What — Phase 3 (the block lifecycle closes)
The AI-native block now gets refreshed, surfaced, and backfilled — not just created (Phase 1b) and consumed (Phase 2).
- `merge-project-update.sh`'s UPDATE path now replaces a complete `<!-- ai:begin … ai:end -->` region in place (injects one after the frontmatter when absent), so an authored block is refreshed with the page. `extract-prompt.txt` instructs the extractor to emit `ai_block` on `update` too. Idempotent, mawk-safe, and it leaves a malformed begin-without-end page untouched (never eats the body: the FORGET-bug class).
- `knowledge_validate` gains `ai_block_missing`: a structured, substantive (≥200 prose chars) page with no block gets a gentle warning; stubs, non-structured types, and generated `projects` + `themes` MOCs are exempt (spec §7 "warns on missing block"). `/second-brain:lint` Check 4 surfaces the same signal standalone (offline bash).
- `kb-ai-block-candidates.sh` (idempotent work-list of blockless structured pages) feeds the knowledge-maintainer's new Phase 4b, which authors a block per page from its existing prose only (never invents), renders via the CLI, injects between frontmatter and H1, self-checks via `knowledge_validate`, counts each against the 50/run cap, and runs explicit-invocation only (auto-dispatched maintenance runs skip it). The dream stays surface-only (counts blockless staging pages, recommends `/second-brain:maintain`; single authoring path = the maintainer; gated `SB_DREAM_AI_BLOCKS`).

MCP server → 2.6.1. Additive + back-compat throughout. The §7 timestamp block↔prose drift heuristic is deferred (the block carries no authored-time; the robust offline signal is structural-missing). Phase 2b (the block's own embedding) still needs embeddings → deferred.
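The refresh semantics described above (replace the first complete region, inject after the frontmatter when absent, leave a malformed page untouched) can be sketched as a small line-based function. This is a TypeScript rendering for illustration, not the awk in `merge-project-update.sh`; the function name `refreshAiBlock` is hypothetical, and the sketch assumes the markers sit on their own lines as `<!-- ai:begin -->` and `<!-- ai:end -->`.

```typescript
// Assumed marker shapes: a begin marker at the start of a line, an end
// marker at the end of a line.
const BEGIN = /^<!--\s*ai:begin\b/;
const END = /\bai:end\s*-->\s*$/;

function refreshAiBlock(page: string, block: string): string {
  const lines = page.split("\n");
  const begin = lines.findIndex((l) => BEGIN.test(l));

  if (begin === -1) {
    // No region yet: inject right after the closing frontmatter fence.
    // A page without frontmatter is returned unchanged (no mid-prose block).
    if (lines[0]?.trim() !== "---") return page;
    const close = lines.findIndex((l, i) => i > 0 && l.trim() === "---");
    if (close === -1) return page;
    return [...lines.slice(0, close + 1), block, ...lines.slice(close + 1)].join("\n");
  }

  // Malformed region (begin without end): leave the page untouched so the
  // body is never dropped.
  if (!lines.slice(begin).some((l) => END.test(l))) return page;

  const out: string[] = [];
  let dropping = false;
  let replaced = false; // latch: once set, later begin markers are plain text
  for (const line of lines) {
    if (!replaced && !dropping && BEGIN.test(line)) {
      out.push(block);
      dropping = true;
    }
    if (dropping) {
      if (END.test(line)) {
        dropping = false;
        replaced = true;
      }
      continue;
    }
    out.push(line);
  }
  return out.join("\n");
}
```

Because the `replaced` flag is never cleared, a stray begin marker later in the prose passes through as ordinary text, and running the function twice with the same block yields the same page.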
Release-gate review — 11 findings, 0 false positives, all fixed + guarded
First pass (6 reviewers): CRITICAL a stray `<!-- ai:begin -->` in prose re-armed the refresh awk → duplicate region + runaway drop-to-EOF on this repo's own ai-block-documenting pages (fixed with a `replaced` latch); MED dedup `continue` skipped the refresh (hoisted above it); MED `AI_BLOCK_SCHEMAS[ptype]` hit `Object.prototype` keys (`constructor`/`__proto__`) → spurious flag + TypeError aborting the whole validate run (fixed at source: guarded `schemaFor()`); HIGH maintainer `Autonomous Dispatch` contradicted Phase 4b (reconciled); CRLF strip, env-override, `<block-json>` clarity, no-frontmatter inject.

Re-review pass (2 reviewers): no regressions; one LOW single-quote `type:` strip parity, fixed.

End-to-end functional verification on the live 120-page KB (you asked; it's a KB-wide change)
- `knowledge_validate` scans all 120 pages with no crash (the proto-pollution fix) → 107 backfill candidates (the migration scope; 13 stubs/MOCs correctly exempt).
- `knowledge_search` returns ranked real hits; `knowledge_reindex` is byte-stable (idempotent modulo timestamp); `knowledge_fetch` block-tier falls back cleanly on a blockless page.
- All three surfaces agree at 107/120 (`knowledge_validate`, `kb-ai-block-candidates.sh`, lint Check 4) after aligning their type-resolution.
- `grep -q … && continue` SIGPIPEs `find` under a job-control shell (the way `/second-brain:lint` pastes the block), silently reporting 0; fixed with `grep -l`, plus a static anti-regression guard.

Verification
Full suite green (70 shell + vitest, 0 fail, 1 pre-existing skip); lockstep 0.24.3 / MCP 2.6.1; 8 TDD tasks each RED→GREEN; migration row added; live KB smoke as above.
🤖 Generated with Claude Code