feat(kb): setup deep-scan (SP-3) — seed the raw inbox from repo docs#25
Merged
Curated high-signal repo docs -> raw inbox (captured_by: setup-scan), folded into setup, reusing doc-sources filterIgnored + SP-2 captureItem. Dry-run preview then confirm; cap SB_SCAN_MAX=50; secret + low-signal denylists; content-hash dedup. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Cross-OS (high): `isHighSignal` normalizes path separators before splitting. `path.relative` emits the native separator, so on Windows `docs\adr\x.md` was one segment, making the root rule match everything and the `.env` secret-anchor miss. Exported and unit-tested with backslash paths.
- Setup capture block (high): the preview and capture run as separate bash fences (separate shells), so the capture block recomputes `SCAN_ROOT_DIR`/`SCAN_CLI` instead of referencing unset vars (the scan otherwise silently did nothing). Guarded by a test asserting the assignment appears in both fences.
- `glob` `follow:false` (avoids a symlink-loop hang); dry-run now lists the over-cap paths so the preview hides nothing; CLI reports unreadable captures distinctly from dedup skips.
Pull request overview
Adds SP-3 “setup deep-scan” to seed a project’s raw inbox from existing high-signal repo markdown docs during /second-brain:setup (preview first, then capture on explicit confirm), reusing existing ignore/junk filtering and SP-2 raw capture dedup.
Changes:
- Introduces `raw-scan` module + CLI to curate markdown candidates, apply cap (`SB_SCAN_MAX`), and capture them as `captured_by: setup-scan` (dedup on re-run).
- Wires the new scan step into `skills/setup` (dry-run preview + explicit confirm → capture), and documents it in the upgrade migrations.
- Adds vitest coverage for curation/cap/dry-run/dedup + an end-to-end bash test for CLI + setup wiring (including gitignore behavior).
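The cap and re-run dedup described above can be sketched roughly as follows. This is a minimal illustration, not the code in `raw-scan.ts`: the `Candidate` and `ScanPlan` shapes, the function names `planScan` and `captureNew`, the in-memory `seen` set, and the choice of SHA-256 are all assumptions made for the example.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes for illustration; the real raw-scan types may differ.
interface Candidate {
  relPath: string;
  content: string;
}

interface ScanPlan {
  toCapture: Candidate[];
  overCap: Candidate[];
}

// Split an already-sorted candidate list at the cap so a preview can show both halves.
function planScan(candidates: Candidate[], max: number): ScanPlan {
  return { toCapture: candidates.slice(0, max), overCap: candidates.slice(max) };
}

// Hash the content only, so an unchanged file produces the same key on every run.
function contentHash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Capture each planned item whose hash is new; report already-seen items as skipped.
function captureNew(
  plan: ScanPlan,
  seen: Set<string>,
): { captured: string[]; skipped: string[] } {
  const captured: string[] = [];
  const skipped: string[] = [];
  for (const c of plan.toCapture) {
    const h = contentHash(c.content);
    if (seen.has(h)) {
      skipped.push(c.relPath);
    } else {
      seen.add(h);
      captured.push(c.relPath);
    }
  }
  return { captured, skipped };
}
```

A dry run would call only `planScan` and print both lists; the capture step would run `captureNew` after the user confirms, so a second run over unchanged files captures nothing.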
Reviewed changes
Copilot reviewed 12 out of 29 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/test-setup-scan.sh | E2E test for preview/capture/dedup and setup skill wiring |
| skills/upgrade/SKILL.md | Adds 0.24.12 migration row documenting SP-3 deep-scan |
| skills/setup/SKILL.md | Adds deep-scan setup step and allows Bash(node *) |
| mcp/src/tools/raw-scan.ts | New scanner: curation heuristic, cap logic, capture loop |
| mcp/src/tools/raw-scan.test.ts | New vitest coverage for curation, Windows paths, cap, dry-run, dedup |
| mcp/src/tools/raw-scan-cli.ts | New CLI wrapper around runScan for --dry-run and capture |
| mcp/src/tools/doc-sources.ts | Exports filterIgnored for reuse by raw-scan |
| mcp/package.json | Bundles raw-scan-cli into dist/tools/raw-scan-cli.bundle.js |
| mcp/dist/tools/raw-scan.test.js.map | Built artifact for new test |
| mcp/dist/tools/raw-scan.test.js | Built artifact for new test |
| mcp/dist/tools/raw-scan.test.d.ts.map | Built artifact for new test typings |
| mcp/dist/tools/raw-scan.test.d.ts | Built artifact for new test typings |
| mcp/dist/tools/raw-scan.js.map | Built artifact for raw-scan module |
| mcp/dist/tools/raw-scan.js | Built artifact for raw-scan module |
| mcp/dist/tools/raw-scan.d.ts.map | Built artifact for raw-scan typings |
| mcp/dist/tools/raw-scan.d.ts | Built artifact for raw-scan typings |
| mcp/dist/tools/raw-scan-cli.js.map | Built artifact for raw-scan CLI |
| mcp/dist/tools/raw-scan-cli.js | Built artifact for raw-scan CLI |
| mcp/dist/tools/raw-scan-cli.d.ts.map | Built artifact for raw-scan CLI typings |
| mcp/dist/tools/raw-scan-cli.d.ts | Built artifact for raw-scan CLI typings |
| mcp/dist/tools/doc-sources.js.map | Built artifact update due to filterIgnored export |
| mcp/dist/tools/doc-sources.js | Built artifact update due to filterIgnored export |
| mcp/dist/tools/doc-sources.d.ts.map | Built artifact update due to filterIgnored export |
| mcp/dist/tools/doc-sources.d.ts | Built artifact update due to filterIgnored export |
| docs/specs/2026-06-03-setup-deep-scan-design.md | Design spec for SP-3 deep-scan |
| docs/plans/2026-06-03-setup-deep-scan.md | Implementation plan for SP-3 deep-scan |
| .claude-plugin/plugin.json | Version bump to 0.24.12 |
| .claude-plugin/marketplace.json | Version bump to 0.24.12 |
Comments suppressed due to low confidence (1)
mcp/src/tools/doc-sources.ts:56
`filterIgnored` uses `relative(projectRoot, p).split('/')` to detect junk directories. On Windows, `path.relative` returns backslashes, so junk filtering (e.g., `node_modules`) can fail and allow high-signal matches under junk dirs to be captured by the new raw scan (and also affects existing doc-sources scanning). Normalize separators before splitting, and (ideally) pass normalized rel paths to `git check-ignore` as well.
export function filterIgnored(projectRoot: string, absPaths: string[]): string[] {
const nonJunk = absPaths.filter((p) => !relative(projectRoot, p).split('/').some((seg) => JUNK_DIRS.has(seg)));
if (nonJunk.length === 0) return [];
const rels = nonJunk.map((p) => relative(projectRoot, p));
const res = spawnSync('git', ['-C', projectRoot, 'check-ignore', '--stdin'], { input: rels.join('\n'), encoding: 'utf-8' });
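One way the suggested normalization could look is sketched below. It is not the fix that landed: `toSegments`, `isUnderJunkDir`, and `toPosixRel` are hypothetical helper names, and the `JUNK_DIRS` entries here are placeholders rather than the real set in `doc-sources.ts`.

```typescript
import * as path from "node:path";

// Illustrative junk set; the real JUNK_DIRS in doc-sources.ts may contain other entries.
const JUNK_DIRS = new Set(["node_modules", "dist", ".git"]);

// Split on runs of either separator so each directory becomes its own segment,
// regardless of which platform produced the relative path.
function toSegments(relPath: string): string[] {
  return relPath.split(/[\\/]+/).filter((seg) => seg.length > 0);
}

// True when any segment of the relative path is a junk directory name.
function isUnderJunkDir(relPath: string): boolean {
  return toSegments(relPath).some((seg) => JUNK_DIRS.has(seg));
}

// Forward-slash relative path, suitable for comparing or for piping to other tools.
function toPosixRel(projectRoot: string, absPath: string): string {
  return toSegments(path.relative(projectRoot, absPath)).join("/");
}
```

Splitting on both separators treats a backslash as a separator on every platform. On POSIX a backslash is a legal filename character, so this trades a rare edge case for consistent behaviour across operating systems.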
Comment on lines +161 to +163
SCAN_CLI="${CLAUDE_PLUGIN_ROOT}/mcp/dist/tools/raw-scan-cli.bundle.js"
SCAN_ROOT_DIR=$(git rev-parse --show-toplevel 2>/dev/null || pwd)
SCAN_ROOT="$SCAN_ROOT_DIR" node "$SCAN_CLI"
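Because each bash fence in a skill file runs in its own shell, a variable assigned in one fence is unset in the next. The PR guards against this with a bash test; the sketch below shows the same idea in TypeScript under assumptions: `bashFences` and `fencesMissingAssignment` are hypothetical names, and the check is a simple text match rather than a real shell parse.

```typescript
// Three backticks, built at runtime so this example can itself live inside a fenced block.
const FENCE = "`".repeat(3);

// Extract the bodies of all bash-tagged fenced blocks from a markdown document.
function bashFences(markdown: string): string[] {
  const re = new RegExp(FENCE + "bash\\r?\\n([\\s\\S]*?)" + FENCE, "g");
  const bodies: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(markdown)) !== null) {
    bodies.push(m[1] ?? "");
  }
  return bodies;
}

// Return the fences that reference $NAME or ${NAME} without assigning NAME= themselves.
function fencesMissingAssignment(markdown: string, name: string): string[] {
  const uses = new RegExp("\\$\\{?" + name + "\\b");
  const assigns = new RegExp("(^|[\\s;])" + name + "=", "m");
  return bashFences(markdown).filter(
    (body) => uses.test(body) && !assigns.test(body),
  );
}
```

A test over `skills/setup/SKILL.md` could then assert that `fencesMissingAssignment(text, "SCAN_CLI")` and the same call for `SCAN_ROOT_DIR` both return an empty array.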
SP-3 — Setup Deep-Scan
`/second-brain:setup` gains a step that seeds the raw inbox from the repo's existing high-signal docs, so a fresh project starts with material for the maintainer (SP-4) to refine into wiki nodes. Builds on SP-2's raw inbox. Spec: `docs/specs/2026-06-03-setup-deep-scan-design.md`.

What it does
- Curation (`raw-scan.ts`): globs `**/*.{md,markdown}` → high-signal include rules (root `*.md` · a `docs`/`doc`/`adr`/`rfc`/`spec`/`decisions`/`.ai-docs`/`notes` dir segment · basename `README|ARCHITECTURE|DESIGN|CONTRIBUTING|ROADMAP`) → drop low-signal (`CHANGELOG|LICENSE|*template*`) + secrets (`.env|*.pem|*.key|*secret*|*credential*`) → reused `doc-sources.filterIgnored` (junk + `git check-ignore`) → byte-stable sort.
- Capture: `captureItem({capturedBy:'setup-scan'})`, capped at `SB_SCAN_MAX=50`, content-hash dedup on re-run.
- Setup: previews first (`--dry-run`, showing both the to-capture set and the over-cap remainder), then captures only on explicit confirm. Kill switch: `SB_SCAN_SKIP=1`.

Deep-review gate (findings fixed before merge)
- Cross-OS (high): `isHighSignal` now normalizes path separators. `path.relative` emits the native separator, so on Windows `docs\adr\x.md` was collapsing to one segment (root rule matched everything; `.env` secret-anchor missed). Exported and unit-tested with backslash paths.
- Setup capture block (high): the preview and capture run as separate bash fences (separate shells), so the capture block recomputes `SCAN_ROOT_DIR`/`SCAN_CLI` instead of referencing unset vars (otherwise the scan silently did nothing). Guarded by a test asserting the assignment appears in both fences.
- `glob` `follow:false` (avoids a symlink-loop hang); dry-run lists over-cap paths (nothing hidden); CLI reports unreadable captures distinctly from dedup skips.

Notes
- `setup-scan` was already a valid `captured_by`; server stays 2.6.4. Reuse-heavy: `filterIgnored` + the whole SP-2 capture path.
- Plugin 0.24.11 → 0.24.12 + migration row.
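The curation rules listed above can be read as a single predicate over a repo-relative path. The sketch below is one plausible reading, not the exported `isHighSignal`: the name `isHighSignalSketch`, the case-insensitive matching, and the exact regex shapes for the denylists are assumptions.

```typescript
// Directory names that mark a path as high-signal when they appear as a segment.
const SIGNAL_DIRS = new Set(["docs", "doc", "adr", "rfc", "spec", "decisions", ".ai-docs", "notes"]);
// Basenames (without extension) that are high-signal anywhere in the tree.
const SIGNAL_NAMES = /^(README|ARCHITECTURE|DESIGN|CONTRIBUTING|ROADMAP)$/i;
// Low-signal basenames: CHANGELOG, LICENSE, or anything containing "template".
const LOW_SIGNAL = /^(CHANGELOG|LICENSE)$|template/i;
// Secret-looking names: starts with ".env", ends in ".pem"/".key", or mentions secret/credential.
const SECRET = /^\.env|\.pem$|\.key$|secret|credential/i;

function isHighSignalSketch(relPath: string): boolean {
  // Normalize first: split on either separator so Windows paths yield real segments.
  const segs = relPath.split(/[\\/]+/).filter(Boolean);
  if (segs.length === 0) return false;
  const base = segs[segs.length - 1] ?? "";
  if (!/\.(md|markdown)$/i.test(base)) return false;
  const stem = base.replace(/\.(md|markdown)$/i, "");

  // Denylists win over every include rule.
  if (segs.some((s) => SECRET.test(s)) || SECRET.test(stem)) return false;
  if (LOW_SIGNAL.test(stem)) return false;

  const isRoot = segs.length === 1;
  const inSignalDir = segs.slice(0, -1).some((s) => SIGNAL_DIRS.has(s.toLowerCase()));
  return isRoot || inSignalDir || SIGNAL_NAMES.test(stem);
}
```

Without the normalization step, `docs\adr\x.md` would be a single segment, which is the failure mode the deep-review gate describes: the root rule would accept every Windows path and a `.env` directory segment would never be seen.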
🤖 Generated with Claude Code