feat(kb): raw inbox (SP-2) — per-project staging for unprocessed material #24

Merged: Cain-Ish merged 9 commits into main from feat/sp2-raw-inbox (Jun 3, 2026)


Cain-Ish (Owner) commented Jun 3, 2026

SP-2 — Raw Inbox (foundation + producer)

Gives the KB a per-project staging area for unprocessed material: the foundation that SP-3 (setup deep-scan) will fill and SP-4 (maintainer) will drain. Spec: docs/specs/2026-06-03-raw-inbox-design.md.

What it adds

  • Storage: ~/.second-brain/projects/<slug>/raw/ (per-project, hot tier, never returned by knowledge_search).
  • Items: raw/<id>.md with provenance frontmatter (source, captured_at, captured_by, content_type, status, and optional target_node, blob, gist). Binary input is stored as a blob sidecar; a URL is stored as an offline pointer (never fetched). The work-list is derived by scanning each item's status, so there is no index file that could drift.
  • Producer: /second-brain:capture <path|url|"text"> (plus paste, --node, --list, --discard); capture is idempotent via a content hash.
  • Contract: a raw group in kb-schema.json, read by both the TS and bash readers (searchable:false).
  • Surfacing: a session-load backlog banner (open = unprocessed + malformed); set SB_RAW_INBOX=off to disable it.
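A minimal sketch of content-hash idempotent capture for inline text, assuming an item id is a truncated SHA-256 of the captured text and the frontmatter fields listed above. `captureText`, `contentId`, the 12-character id length, and the `captured_by`/`content_type` values are hypothetical, not the actual raw-inbox.ts API, and the sketch skips the id-collision handling the real module may do.

```typescript
import { createHash } from "node:crypto";
import { existsSync, mkdirSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical result shape, not the PR's actual return type.
interface CaptureResult {
  id: string;
  duplicate: boolean;
}

// Same text in, same id out: this is what turns a repeated capture into a no-op.
function contentId(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex").slice(0, 12);
}

function captureText(rawDir: string, text: string): CaptureResult {
  mkdirSync(rawDir, { recursive: true }); // raw/ is created lazily on first capture
  const id = contentId(text);
  const file = join(rawDir, `${id}.md`);
  if (existsSync(file)) return { id, duplicate: true };
  const frontmatter = [
    "---",
    "source: paste", // inline text records the canonical source (I3)
    `captured_at: ${new Date().toISOString()}`,
    "captured_by: capture", // hypothetical value
    "content_type: text", // hypothetical value
    "status: unprocessed",
    "---",
  ].join("\n");
  // "wx" fails instead of overwriting if another writer created the file first.
  writeFileSync(file, `${frontmatter}\n${text}\n`, { flag: "wx" });
  return { id, duplicate: false };
}

// Usage: capturing the same text twice writes one file and flags the second call.
const dir = join(mkdtempSync(join(tmpdir(), "raw-")), "raw");
console.log(captureText(dir, "hello").duplicate); // false
console.log(captureText(dir, "hello").duplicate); // true
```

Keying the file name on the content hash means dedup needs no lookup table: the existence of `<id>.md` is the record.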

Deep-review gate (findings fixed before merge)

  • C1 (critical): setStatus now rejects path-traversal ids — --discard "../../wiki/page" can no longer rewrite an arbitrary .md.
  • W1 (high): all frontmatter values are newline-sanitized — a newline in source can no longer inject a fake status: line that silently flips an item to discarded.
  • Also fixed: I3 (inline capture now records source: paste), W2 (resolveSlug rejects a .. cwd basename), and banner/CLI count alignment (malformed items are now counted in the session-start banner).
  • Reviewed clean (no findings): the history/regression pass, the kb-schema unit, and dedup/id-loop soundness.
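A minimal sketch of the C1 and W1 guards described above, assuming raw ids are plain alphanumeric tokens. `itemPath`, `SAFE_ID`, and `serializeFrontmatter` are hypothetical names; the commit message names an `fmValue` helper, but its body here (collapsing CR/LF runs to a space) is an assumption.

```typescript
import { resolve, sep } from "node:path";

// Hypothetical id rule: anything outside [A-Za-z0-9_-] is rejected before touching disk.
const SAFE_ID = /^[A-Za-z0-9_-]+$/;

// C1: map an id to raw/<id>.md, refusing ids such as "../../wiki/page".
function itemPath(rawDir: string, id: string): string {
  if (!SAFE_ID.test(id)) throw new Error(`unsafe raw id: ${JSON.stringify(id)}`);
  const root = resolve(rawDir);
  const file = resolve(root, `${id}.md`);
  // Defense in depth: the resolved path must still sit directly under raw/.
  if (!file.startsWith(root + sep)) throw new Error(`raw id escapes the inbox: ${id}`);
  return file;
}

// W1: collapse CR/LF so a value can never start a new frontmatter line.
function fmValue(value: string): string {
  return value.replace(/[\r\n]+/g, " ").trim();
}

function serializeFrontmatter(fields: Record<string, string>): string {
  const lines = Object.entries(fields).map(([key, value]) => `${key}: ${fmValue(value)}`);
  return ["---", ...lines, "---"].join("\n");
}
```

With sanitization applied at serialization time, a `source` of `"x\nstatus: discarded"` is written as the single line `source: x status: discarded`, so a first-match parser still sees only the real `status:` line.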

Notes

  • No MCP server tool change: capture runs through a standalone raw-capture-cli bundle, and knowledge_validate stays wiki-scoped (raw items self-validate via --list, per the spec's heterogeneous-groups contract). The server version stays at 2.6.4.
  • Additive and backward-compatible: raw/ is created lazily on first capture.
  • Tests: 9 tests in raw-inbox.test.ts (including traversal and injection regressions), test-raw-capture.sh, test-raw-inbox-banner.sh, and raw-group schema guards. Full suite: 80 pass / 0 fail / 1 known skip.

Plugin 0.24.10 → 0.24.11, plus a migration row.

🤖 Generated with Claude Code

Cain-Ish and others added 9 commits June 3, 2026 18:22
Per-project raw staging area (~/.second-brain/projects/<slug>/raw/), markdown-item
format with provenance frontmatter, derived (drift-free) work-list, kb-schema 'raw'
group, /second-brain:capture producer, session-load backlog banner. Foundation +
real producer; SP-3 setup-scan + SP-4 maintainer drain deferred.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sk 1)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…k 2)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- C1 (critical): setStatus now rejects a path-traversal id, so a malicious
  '--discard ../../wiki/page' can no longer rewrite an arbitrary .md outside raw/.
- W1/#1 (high): serialize sanitizes newlines in every frontmatter value (fmValue),
  so a newline in 'source' can no longer inject a fake 'status:' line that the
  first-match parser reads back and silently flips the item to discarded.
- I3: inline capture records canonical 'source: paste' (was the raw text).
- W2: resolveSlug rejects a '..' cwd basename.
- Banner: count open backlog as total - (processed|discarded), matching the
  module's unprocessedCount (malformed items are now visible at session start too).
New regression tests for the traversal + injection exploits.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
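A minimal sketch of the W2 fix listed in the commit message above (resolveSlug rejecting a `..` cwd basename), assuming the slug is the cwd basename and the inbox lives at ~/.second-brain/projects/<slug>/raw/. `SAFE_SLUG` is a hypothetical stand-in for the PR's assertSafeSlug, whose exact rules are not shown, and `rawDirFor` is a hypothetical helper.

```typescript
import { homedir } from "node:os";
import { basename, join } from "node:path";

// Hypothetical slug rule standing in for assertSafeSlug.
const SAFE_SLUG = /^[A-Za-z0-9][A-Za-z0-9._-]*$/;

function resolveSlug(cwd: string): string {
  const base = basename(cwd); // basename does not normalize, so "/work/.." yields ".."
  // W2: "", "." and ".." would alias or escape the projects/ directory.
  if (base === "" || base === "." || base === "..") {
    throw new Error(`cannot derive a project slug from ${JSON.stringify(cwd)}`);
  }
  if (!SAFE_SLUG.test(base)) throw new Error(`unsafe project slug: ${JSON.stringify(base)}`);
  return base;
}

// ~/.second-brain/projects/<slug>/raw, with home injectable for testing.
function rawDirFor(cwd: string, home: string = homedir()): string {
  return join(home, ".second-brain", "projects", resolveSlug(cwd), "raw");
}
```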
Copilot AI review requested due to automatic review settings June 3, 2026 21:18
Cain-Ish merged commit f919c10 into main Jun 3, 2026
1 check passed
Cain-Ish deleted the feat/sp2-raw-inbox branch June 3, 2026 21:18

Copilot AI (Contributor) left a comment

Pull request overview

Adds a per-project raw inbox staging area for unprocessed material in the KB, including a user-invocable capture producer, schema contract, and session-start backlog surfacing (SP-2).

Changes:

  • Introduces raw/ inbox module + thin capture CLI bundle, plus /second-brain:capture skill wiring.
  • Extends kb-schema.json (and both TS + bash readers) with a raw group (searchable:false).
  • Surfaces a raw-inbox backlog banner in session-load.sh and adds bash + vitest coverage.

Reviewed changes

Copilot reviewed 19 out of 45 changed files in this pull request and generated 5 comments.

Summary per file:

| File | Description |
|---|---|
| tests/test-raw-inbox-banner.sh | Adds a banner-count regression test for the “open backlog” calculation. |
| tests/test-raw-capture.sh | Adds end-to-end coverage for the raw capture CLI bundle + skill wiring. |
| tests/test-kb-schema.sh | Extends schema guard tests to ensure raw is visible to bash/json readers. |
| skills/upgrade/SKILL.md | Adds a 0.24.11 migration row documenting the raw inbox feature. |
| skills/capture/SKILL.md | New user-invocable /second-brain:capture skill that shells out to the CLI bundle. |
| scripts/session-load.sh | Adds session-start raw inbox backlog banner gated by SB_RAW_INBOX. |
| scripts/kb-schema.sh | Exports SB_RAW_DIR / SB_RAW_STATUSES from kb-schema.json. |
| mcp/src/tools/raw-inbox.ts | New raw-inbox core module (capture/list/status/count + parsing/serialization). |
| mcp/src/tools/raw-inbox.test.ts | New vitest suite covering capture modes, dedup, traversal/id safety, and malformed handling. |
| mcp/src/tools/raw-capture-cli.ts | New CLI wrapper around raw-inbox for skill consumption (capture/paste/list/discard). |
| mcp/src/tools/doc-sources.ts | Exports assertSafeSlug for reuse by raw-inbox. |
| mcp/src/constants/kb-schema.ts | Exposes RAW_DIR / RAW_STATUSES from kb-schema.json on the TS side. |
| mcp/src/constants/kb-schema.test.ts | Verifies TS constants match kb-schema.json for the new raw group. |
| mcp/package.json | Bundles the new raw-capture-cli with esbuild. |
| mcp/dist/tools/raw-inbox.test.js.map | Built artifact for raw-inbox vitest file. |
| mcp/dist/tools/raw-inbox.test.js | Built artifact for raw-inbox vitest file. |
| mcp/dist/tools/raw-inbox.test.d.ts.map | Built typings map for raw-inbox vitest file. |
| mcp/dist/tools/raw-inbox.test.d.ts | Built typings for raw-inbox vitest file. |
| mcp/dist/tools/raw-inbox.js.map | Built artifact map for raw-inbox module. |
| mcp/dist/tools/raw-inbox.js | Built artifact for raw-inbox module. |
| mcp/dist/tools/raw-inbox.d.ts.map | Built typings map for raw-inbox module. |
| mcp/dist/tools/raw-inbox.d.ts | Built typings for raw-inbox module. |
| mcp/dist/tools/raw-capture-cli.js.map | Built artifact map for raw-capture CLI. |
| mcp/dist/tools/raw-capture-cli.js | Built artifact for raw-capture CLI. |
| mcp/dist/tools/raw-capture-cli.d.ts.map | Built typings map for raw-capture CLI. |
| mcp/dist/tools/raw-capture-cli.d.ts | Built typings for raw-capture CLI. |
| mcp/dist/tools/knowledge-validate.bundle.js | Updates bundled schema constants to include raw group. |
| mcp/dist/tools/knowledge-reindex.bundle.js | Updates bundled schema constants to include raw group. |
| mcp/dist/tools/doc-sources.js.map | Updates built artifact map due to assertSafeSlug export. |
| mcp/dist/tools/doc-sources.js | Updates built artifact due to assertSafeSlug export. |
| mcp/dist/tools/doc-sources.d.ts.map | Updates built typings map due to assertSafeSlug export. |
| mcp/dist/tools/doc-sources.d.ts | Updates built typings due to assertSafeSlug export. |
| mcp/dist/server.bundle.js | Updates bundled schema constants to include raw group. |
| mcp/dist/constants/kb-schema.test.js.map | Updates built artifact map for TS schema tests. |
| mcp/dist/constants/kb-schema.test.js | Updates built artifact for TS schema tests. |
| mcp/dist/constants/kb-schema.js.map | Updates built artifact map for TS schema constants. |
| mcp/dist/constants/kb-schema.js | Updates built artifact for TS schema constants (adds RAW_*). |
| mcp/dist/constants/kb-schema.d.ts.map | Updates built typings map for TS schema constants. |
| mcp/dist/constants/kb-schema.d.ts | Updates built typings for TS schema constants (adds RAW_*). |
| kb-schema.json | Adds raw group contract (dir/tier/statuses/searchable). |
| docs/specs/2026-06-03-raw-inbox-design.md | Adds SP-2 raw inbox design spec. |
| docs/plans/2026-06-03-raw-inbox.md | Adds SP-2 implementation plan / checklist. |
| .claude-plugin/plugin.json | Bumps plugin version to 0.24.11. |
| .claude-plugin/marketplace.json | Bumps marketplace version to 0.24.11. |


Comment on lines +151 to +153
const next = /^status:[ \t]*.*$/m.test(content)
? content.replace(/^status:[ \t]*.*$/m, `status: ${status}`)
: content.replace(/^---\r?\n/, `---\nstatus: ${status}\n`);
if (items.length === 0) console.log(' (empty — capture something, e.g. /second-brain:capture ./notes.md)');
} else if (action === 'discard') {
const id = rest[0];
if (!id) { console.log('usage: capture --discard <id>'); return; }
console.log(`${r.duplicate ? 'Already captured' : 'Captured'} ${r.id} (${kind}) — ${r.unprocessed} unprocessed.`);
} else {
const n = await unprocessedCount(brainDir, slug);
console.log(`usage: capture <path|url> | capture paste | capture --list | capture --discard <id> (${n} unprocessed)`);
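As quoted, the setStatus hunk uses an `m`-flagged `^status:` regex, which matches the first `status:` line anywhere in the file. An item whose frontmatter lacks a status but whose body has a line beginning with `status:` would therefore have its body rewritten. A hedged sketch that confines the rewrite to the leading frontmatter block, assuming every item opens with a `---` fence; `setStatusInContent` is a hypothetical name, not the module's API.

```typescript
// Leading frontmatter: opening fence, lazily matched body, closing fence.
const FRONTMATTER = /^(---\r?\n)([\s\S]*?)(\r?\n---(?:\r?\n|$))/;
const STATUS_LINE = /^status:[ \t]*.*$/m;

function setStatusInContent(content: string, status: string): string {
  const m = FRONTMATTER.exec(content);
  if (!m) throw new Error("item has no frontmatter block");
  const [whole, open, fm, close] = m;
  const line = `status: ${status}`;
  // A replacer function avoids "$&"-style substitution if status contains "$".
  const nextFm = STATUS_LINE.test(fm)
    ? fm.replace(STATUS_LINE, () => line)
    : fm === ""
      ? line
      : `${line}\n${fm}`;
  // Everything after the closing fence (the item body) is left untouched.
  return open + nextFm + close + content.slice(whole.length);
}
```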
Comment thread scripts/session-load.sh
Comment on lines +422 to +424
RAW_TOTAL=$(find "$RAW_DIR_PATH" -maxdepth 1 -name '*.md' 2>/dev/null | wc -l | tr -d ' ')
RAW_CLOSED=$(grep -rlE '^status: (processed|discarded)$' "$RAW_DIR_PATH" 2>/dev/null | wc -l | tr -d ' ')
RAW_N=$(( ${RAW_TOTAL:-0} - ${RAW_CLOSED:-0} ))
Comment on lines +18 to +20
RAW_TOTAL=$(find "$RAW" -maxdepth 1 -name '*.md' 2>/dev/null | wc -l | tr -d ' ')
RAW_CLOSED=$(grep -rlE '^status: (processed|discarded)$' "$RAW" 2>/dev/null | wc -l | tr -d ' ')
N=$(( RAW_TOTAL - RAW_CLOSED ))
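In both banner hunks, RAW_TOTAL counts only top-level `*.md` files (`-maxdepth 1`), while `grep -rl` recurses, scans every file type, and matches a `status:` line anywhere in the body, so the two counts can disagree. A hedged TypeScript sketch of the intended rule (open = every top-level item not closed in its frontmatter, malformed items included); `frontmatterStatus` and `openCount` are hypothetical names, and treating processed and discarded as the only closed statuses is taken from the commit message.

```typescript
import { mkdirSync, mkdtempSync, readdirSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Statuses that close an item; anything else, including a missing status, stays open.
const CLOSED = new Set(["processed", "discarded"]);

// Read `status:` from the leading frontmatter block only, never from the body.
function frontmatterStatus(content: string): string | undefined {
  const fm = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/.exec(content);
  if (!fm) return undefined; // no frontmatter: malformed, counted as open
  const line = /^status:[ \t]*(.*?)[ \t]*$/m.exec(fm[1]);
  return line ? line[1] : undefined;
}

function listEntries(dir: string) {
  try {
    return readdirSync(dir, { withFileTypes: true });
  } catch {
    return []; // raw/ does not exist until the first capture
  }
}

function openCount(rawDir: string): number {
  let open = 0;
  for (const entry of listEntries(rawDir)) {
    // Top-level regular *.md files only; sidecars and subdirectories are ignored.
    if (!entry.isFile() || !entry.name.endsWith(".md")) continue;
    const status = frontmatterStatus(readFileSync(join(rawDir, entry.name), "utf8"));
    if (status === undefined || !CLOSED.has(status)) open++;
  }
  return open;
}

// Usage: one open item, one closed item, one malformed item, plus files that must be ignored.
const demo = mkdtempSync(join(tmpdir(), "raw-"));
writeFileSync(join(demo, "a.md"), "---\nstatus: unprocessed\n---\nbody\n");
writeFileSync(join(demo, "b.md"), "---\nstatus: processed\n---\nbody\n");
writeFileSync(join(demo, "c.md"), "no frontmatter, so malformed\n");
writeFileSync(join(demo, "d.bin"), "status: processed\n");
mkdirSync(join(demo, "blobs"));
writeFileSync(join(demo, "blobs", "e.md"), "---\nstatus: processed\n---\n");
console.log(openCount(demo)); // 2
```

Deriving the count from the same parser the CLI uses keeps the banner and `--list` in agreement by construction, rather than relying on two separate shell pipelines.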