feat(kb): setup deep-scan (SP-3) — seed the raw inbox from repo docs#25

Merged: Cain-Ish merged 7 commits into main from feat/sp3-setup-deep-scan on Jun 4, 2026
Cain-Ish (Owner) commented Jun 4, 2026

SP-3 — Setup Deep-Scan

/second-brain:setup gains a step that seeds the raw inbox from the repo's existing high-signal docs, so a fresh project starts with material for the maintainer (SP-4) to refine into wiki nodes. Builds on SP-2's raw inbox. Spec: docs/specs/2026-06-03-setup-deep-scan-design.md.

What it does

  • Curation (raw-scan.ts): globs **/*.{md,markdown} → high-signal include rules (root *.md · a docs/doc/adr/rfc/spec/decisions/.ai-docs/notes dir segment · basename README|ARCHITECTURE|DESIGN|CONTRIBUTING|ROADMAP) → drop low-signal (CHANGELOG|LICENSE|*template*) + secrets (.env|*.pem|*.key|*secret*|*credential*) → reused doc-sources.filterIgnored (junk + git check-ignore) → byte-stable sort.
  • Capture: survivors → raw inbox via SP-2 captureItem({capturedBy:'setup-scan'}), capped at SB_SCAN_MAX=50, content-hash dedup on re-run.
  • Surface: the setup step previews (--dry-run, showing both the to-capture set and the over-cap remainder) then captures only on explicit confirm. Kill switch SB_SCAN_SKIP=1.
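
The curation rules in the first bullet can be read as a single predicate plus a sort. The following is a minimal sketch under stated assumptions, not the code in raw-scan.ts: the names `isHighSignalSketch` and `curateSketch` are hypothetical, the regexes are one possible reading of the listed patterns (case-insensitive matching and applying the secret patterns to every path segment are both assumptions), and the `filterIgnored` step is left out.

```typescript
// Illustrative sketch only: names and regexes are assumptions, not the code in raw-scan.ts.
// Input paths are assumed to be repo-relative and already "/"-separated.
const SIGNAL_DIRS = new Set(["docs", "doc", "adr", "rfc", "spec", "decisions", ".ai-docs", "notes"]);
const SIGNAL_BASENAME = /^(README|ARCHITECTURE|DESIGN|CONTRIBUTING|ROADMAP)\b/i;
const LOW_SIGNAL = /^(CHANGELOG|LICENSE)|template/i;
const SECRET = /^\.env|\.pem$|\.key$|secret|credential/i;

function isHighSignalSketch(relPath: string): boolean {
  const segments = relPath.split("/");
  const base = segments[segments.length - 1] ?? "";
  // Only markdown files are candidates at all.
  if (!/\.(md|markdown)$/i.test(base)) return false;
  // Denylists win over include rules (secrets assumed to be checked on every segment).
  if (LOW_SIGNAL.test(base) || segments.some((s) => SECRET.test(s))) return false;
  const isRootFile = segments.length === 1;
  const inSignalDir = segments.slice(0, -1).some((s) => SIGNAL_DIRS.has(s));
  return isRootFile || inSignalDir || SIGNAL_BASENAME.test(base);
}

function curateSketch(relPaths: string[]): string[] {
  // Plain </> comparison orders by UTF-16 code unit, so the result does not depend on locale.
  return relPaths.filter(isHighSignalSketch).sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
}
```

With this reading, `packages/api/README.md` is kept by the basename rule, while `docs/pr-template.md` and `docs/secret-plan.md` are dropped by the denylists even though they sit under `docs/`.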

Deep-review gate (findings fixed before merge)

  • Cross-OS (high): isHighSignal now normalizes path separators — path.relative emits native sep, so on Windows docs\adr\x.md was collapsing to one segment (root rule matched everything; .env secret-anchor missed). Exported + unit-tested with backslash paths.
  • Broken capture (high): the preview and capture run as separate bash fences (separate shells); the capture fence now recomputes SCAN_ROOT_DIR/SCAN_CLI instead of referencing unset vars (otherwise the scan silently did nothing). Guarded by a test asserting the assignment appears in both fences.
  • glob follow:false (symlink-loop hang); dry-run lists over-cap paths (nothing hidden); CLI reports unreadable captures distinctly from dedup skips.
  • History/regression pass: clean.
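
The cross-OS finding is a string-splitting problem, so it can be shown without a Windows machine. This is a small sketch with hypothetical helper names (`naiveSegments`, `normalizedSegments`); the PR's actual fix lives inside isHighSignal and may look different.

```typescript
// Hypothetical helpers illustrating the separator bug; not the PR's code.

// Pre-fix behaviour: only "/" is treated as a separator.
function naiveSegments(rel: string): string[] {
  return rel.split("/");
}

// Post-fix idea: treat both "\" and "/" as separators before applying segment rules.
function normalizedSegments(rel: string): string[] {
  return rel.split(/[\\/]+/).filter((s) => s.length > 0);
}

const windowsRel = "docs\\adr\\x.md"; // what path.relative yields on Windows
console.log(naiveSegments(windowsRel).length); // 1, so a "root file" rule would match
console.log(normalizedSegments(windowsRel)); // [ 'docs', 'adr', 'x.md' ]
```

One trade-off of this approach: on POSIX systems a backslash is a legal filename character, so normalizing unconditionally treats such a name as a nested path. For a curation heuristic that only decides what to preview, that seems an acceptable cost.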

Notes

  • No new MCP server tool, no SP-2 schema change (setup-scan was already a valid captured_by; server stays 2.6.4). Reuse-heavy: filterIgnored + the whole SP-2 capture path.
  • Additive + opt-in: only writes on the user's confirm. Full suite 81 pass / 0 fail / 1 known skip.

Plugin 0.24.11 → 0.24.12 + migration row.

🤖 Generated with Claude Code

Cain-Ish and others added 7 commits June 3, 2026 23:43
Curated high-signal repo docs -> raw inbox (captured_by: setup-scan), folded into
setup, reusing doc-sources filterIgnored + SP-2 captureItem. Dry-run preview then
confirm; cap SB_SCAN_MAX=50; secret + low-signal denylists; content-hash dedup.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Cross-OS (high): isHighSignal normalizes path separators before splitting —
  path.relative emits native sep, so on Windows 'docs\adr\x.md' was one segment,
  making the root rule match everything and the .env secret-anchor miss. Exported +
  unit-tested with backslash paths.
- Setup capture block (high): the preview and capture run as separate bash fences
  (separate shells), so the capture block recomputes SCAN_ROOT_DIR/SCAN_CLI instead
  of referencing unset vars (the scan otherwise silently did nothing). Guarded by a
  test asserting the assignment appears in both fences.
- glob follow:false (symlink-loop hang); dry-run now lists the over-cap paths so the
  preview hides nothing; CLI reports unreadable captures distinctly from dedup skips.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 4, 2026 08:44
Cain-Ish merged commit a515e61 into main Jun 4, 2026
1 check passed
Cain-Ish deleted the feat/sp3-setup-deep-scan branch June 4, 2026 08:44

Copilot AI (Contributor) left a comment

Pull request overview

Adds SP-3 “setup deep-scan” to seed a project’s raw inbox from existing high-signal repo markdown docs during /second-brain:setup (preview first, then capture on explicit confirm), reusing existing ignore/junk filtering and SP-2 raw capture dedup.

Changes:

  • Introduces raw-scan module + CLI to curate markdown candidates, apply cap (SB_SCAN_MAX), and capture them as captured_by: setup-scan (dedup on re-run).
  • Wires the new scan step into skills/setup (dry-run preview + explicit confirm → capture), and documents it in the upgrade migrations.
  • Adds vitest coverage for curation/cap/dry-run/dedup + an end-to-end bash test for CLI + setup wiring (including gitignore behavior).
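
The cap and dedup behaviour from the first bullet can be sketched as three small functions. Everything here is an assumption made for illustration: the names `resolveMax`, `planScan`, `contentHash` and `captureNew` are hypothetical, SHA-256 is an assumed hash (the PR only says "content-hash"), and an in-memory `Set` stands in for whatever SP-2's captureItem uses to detect duplicates.

```typescript
import { createHash } from "node:crypto";

interface Doc { path: string; content: string }

// SB_SCAN_MAX and its default of 50 come from the PR; the parsing rules are assumptions.
function resolveMax(env: Record<string, string | undefined>): number {
  const n = Number.parseInt(env["SB_SCAN_MAX"] ?? "", 10);
  return Number.isInteger(n) && n > 0 ? n : 50;
}

// Split the sorted candidates into what a run would capture and the over-cap remainder,
// so a dry-run can show both sets.
function planScan(sortedPaths: string[], max: number): { toCapture: string[]; overCap: string[] } {
  return { toCapture: sortedPaths.slice(0, max), overCap: sortedPaths.slice(max) };
}

// Assumed hash choice; any stable digest of the content would serve the same purpose.
function contentHash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Capture only documents whose content has not been seen before.
function captureNew(docs: Doc[], seen: Set<string>): { captured: string[]; skipped: string[] } {
  const captured: string[] = [];
  const skipped: string[] = [];
  for (const doc of docs) {
    const hash = contentHash(doc.content);
    if (seen.has(hash)) {
      skipped.push(doc.path);
      continue;
    }
    seen.add(hash);
    captured.push(doc.path);
  }
  return { captured, skipped };
}
```

Running `captureNew` twice over the same documents with the same `seen` set captures everything on the first pass and skips everything on the second, which is the re-run behaviour the PR describes.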

Reviewed changes

Copilot reviewed 12 out of 29 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/test-setup-scan.sh | E2E test for preview/capture/dedup and setup skill wiring |
| skills/upgrade/SKILL.md | Adds 0.24.12 migration row documenting SP-3 deep-scan |
| skills/setup/SKILL.md | Adds deep-scan setup step and allows Bash(node *) |
| mcp/src/tools/raw-scan.ts | New scanner: curation heuristic, cap logic, capture loop |
| mcp/src/tools/raw-scan.test.ts | New vitest coverage for curation, Windows paths, cap, dry-run, dedup |
| mcp/src/tools/raw-scan-cli.ts | New CLI wrapper around runScan for --dry-run and capture |
| mcp/src/tools/doc-sources.ts | Exports filterIgnored for reuse by raw-scan |
| mcp/package.json | Bundles raw-scan-cli into dist/tools/raw-scan-cli.bundle.js |
| mcp/dist/tools/raw-scan.test.js.map | Built artifact for new test |
| mcp/dist/tools/raw-scan.test.js | Built artifact for new test |
| mcp/dist/tools/raw-scan.test.d.ts.map | Built artifact for new test typings |
| mcp/dist/tools/raw-scan.test.d.ts | Built artifact for new test typings |
| mcp/dist/tools/raw-scan.js.map | Built artifact for raw-scan module |
| mcp/dist/tools/raw-scan.js | Built artifact for raw-scan module |
| mcp/dist/tools/raw-scan.d.ts.map | Built artifact for raw-scan typings |
| mcp/dist/tools/raw-scan.d.ts | Built artifact for raw-scan typings |
| mcp/dist/tools/raw-scan-cli.js.map | Built artifact for raw-scan CLI |
| mcp/dist/tools/raw-scan-cli.js | Built artifact for raw-scan CLI |
| mcp/dist/tools/raw-scan-cli.d.ts.map | Built artifact for raw-scan CLI typings |
| mcp/dist/tools/raw-scan-cli.d.ts | Built artifact for raw-scan CLI typings |
| mcp/dist/tools/doc-sources.js.map | Built artifact update due to filterIgnored export |
| mcp/dist/tools/doc-sources.js | Built artifact update due to filterIgnored export |
| mcp/dist/tools/doc-sources.d.ts.map | Built artifact update due to filterIgnored export |
| mcp/dist/tools/doc-sources.d.ts | Built artifact update due to filterIgnored export |
| docs/specs/2026-06-03-setup-deep-scan-design.md | Design spec for SP-3 deep-scan |
| docs/plans/2026-06-03-setup-deep-scan.md | Implementation plan for SP-3 deep-scan |
| .claude-plugin/plugin.json | Version bump to 0.24.12 |
| .claude-plugin/marketplace.json | Version bump to 0.24.12 |
Comments suppressed due to low confidence (1)

mcp/src/tools/doc-sources.ts:56

  • filterIgnored uses relative(projectRoot, p).split('/') to detect junk directories. On Windows, path.relative returns backslashes, so junk filtering (e.g., node_modules) can fail and allow high-signal matches under junk dirs to be captured by the new raw scan (and also affects existing doc-sources scanning). Normalize separators before splitting, and (ideally) pass normalized rel paths to git check-ignore as well.
export function filterIgnored(projectRoot: string, absPaths: string[]): string[] {
  const nonJunk = absPaths.filter((p) => !relative(projectRoot, p).split('/').some((seg) => JUNK_DIRS.has(seg)));
  if (nonJunk.length === 0) return [];
  const rels = nonJunk.map((p) => relative(projectRoot, p));
  const res = spawnSync('git', ['-C', projectRoot, 'check-ignore', '--stdin'], { input: rels.join('\n'), encoding: 'utf-8' });
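
One way to act on the suggestion above is to convert each relative path to "/" separators before testing its segments. The sketch below is an assumption-laden illustration, not a patch to doc-sources.ts: `toPosixRel` and `dropJunk` are hypothetical names, `JUNK_DIRS_SKETCH` contains only the one directory the comment mentions, and the path implementation is passed in so the Windows case can be exercised on any OS through Node's `path.win32`.

```typescript
import { posix, win32 } from "node:path";

// Minimal shape shared by path.posix and path.win32; declared locally for the sketch.
interface PathImpl {
  relative(from: string, to: string): string;
  sep: string;
}

// Placeholder set: the real JUNK_DIRS in doc-sources.ts is not shown in this PR excerpt.
const JUNK_DIRS_SKETCH = new Set(["node_modules"]);

// Relative path with native separators rewritten to "/".
function toPosixRel(projectRoot: string, absPath: string, impl: PathImpl): string {
  return impl.relative(projectRoot, absPath).split(impl.sep).join("/");
}

// Drop any path that has a junk directory among its segments.
function dropJunk(projectRoot: string, absPaths: string[], impl: PathImpl): string[] {
  return absPaths.filter(
    (p) => !toPosixRel(projectRoot, p, impl).split("/").some((seg) => JUNK_DIRS_SKETCH.has(seg)),
  );
}

// Windows-style inputs, evaluated with path.win32 regardless of the host OS.
console.log(dropJunk("C:\\repo", ["C:\\repo\\node_modules\\pkg\\README.md", "C:\\repo\\docs\\a.md"], win32));
// POSIX-style inputs behave the same way.
console.log(dropJunk("/repo", ["/repo/node_modules/pkg/README.md", "/repo/docs/a.md"], posix));
```

The same normalized relative paths could also be what gets written to `git check-ignore --stdin`, as the comment proposes; that part is not sketched here.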

Comment thread skills/setup/SKILL.md
Comment on lines +161 to +163
SCAN_CLI="${CLAUDE_PLUGIN_ROOT}/mcp/dist/tools/raw-scan-cli.bundle.js"
SCAN_ROOT_DIR=$(git rev-parse --show-toplevel 2>/dev/null || pwd)
SCAN_ROOT="$SCAN_ROOT_DIR" node "$SCAN_CLI"
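
The lines above are the assignments that the deep-review gate requires in both the preview and capture fences. The PR's guard lives in tests/test-setup-scan.sh; what follows is only a TypeScript sketch of the same idea, with hypothetical names (`bashFences`, `fencesMissingAssignment`) and the assumption that the fences are tagged `bash`. It reports every fence that uses a variable without assigning it in the same fence.

```typescript
// Hypothetical re-statement of the "assignment appears in both fences" guard.
const FENCE = "`".repeat(3);

// Extract the bodies of all bash-tagged fenced blocks from a markdown document.
function bashFences(markdown: string): string[] {
  const re = new RegExp(FENCE + "bash\\n([\\s\\S]*?)" + FENCE, "g");
  const bodies: string[] = [];
  for (const m of markdown.matchAll(re)) bodies.push(m[1] ?? "");
  return bodies;
}

// Indices of fences that reference $VAR or ${VAR} but never assign VAR= themselves.
// Each fence runs in its own shell, so an assignment in an earlier fence does not carry over.
function fencesMissingAssignment(markdown: string, varName: string): number[] {
  const uses = new RegExp("\\$\\{?" + varName + "\\b");
  const assigns = new RegExp("(^|\\s)" + varName + "=", "m");
  return bashFences(markdown).flatMap((body, i) =>
    uses.test(body) && !assigns.test(body) ? [i] : [],
  );
}
```

A check like `fencesMissingAssignment(skillMarkdown, "SCAN_CLI")` returning an empty array would correspond to the passing state described in the PR, where `skillMarkdown` is the contents of skills/setup/SKILL.md read by the caller.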