feat(audit): add oddkit_audit MCP action — Phase 2 PR-2.3a#143

Merged
klappy merged 6 commits into main from feat/oddkit-audit on Apr 27, 2026

Conversation

Owner

@klappy klappy commented Apr 26, 2026

Summary

Phase 2 PR-2.3a of the link-rot-elimination campaign. Implements the oddkit_audit MCP action per klappy://docs/oddkit/specs/oddkit-audit (DRAFT v2 — KISS, landed in klappy/klappy.dev#142).

Walks every markdown file in canon, calls the same supersession-aware index lookup as oddkit_resolve on each klappy:// link target, emits structured findings for dead references and legacy markdown link patterns. Built for CI use; designed to be the mechanical replacement for the discipline that empirically failed.

This is one of two PRs landing the campaign's PR-2.3:

  • This PR (klappy/oddkit#TBD): the action implementation
  • Sibling PR (klappy/klappy.dev#TBD): three bug-class canon constraints from PR-2.1, bundled here per operator decision

What lands

  • workers/src/orchestrate.ts: runAudit function (~200 LOC) + audit case in dispatch switch + audit in VALID_ACTIONS array. Reads only the existing index + fetcher.getFile; no new caches.
  • workers/src/index.ts — standalone oddkit_audit tool definition + audit added to the unified router's action enum and description.
  • tests/cloudflare-production.test.sh — two new smoke tests (default-scope audit, narrow-scope audit honors paths filter).
  • CHANGELOG.md — 0.26.0 entry with full context, Vodka notes, and explicit list of what's deferred.
  • Version bumps: package.json, workers/package.json, both lockfiles → 0.26.0.

Behavior contract

| Input | Status | Behavior |
| --- | --- | --- |
| Empty / no scope | OK or FINDINGS | Audits writings/, canon/, odd/, docs/ (excluding docs/archive/) |
| { scope: { paths: ["writings/"] } } | OK or FINDINGS | Honors narrow scope |
| klappy://... link in markdown | | dead-reference finding (severity: error) if NOT_FOUND or circular |
| [label](/page/...) in writings/ | | legacy-link-pattern finding (severity: error) |
| [label](./*.md) in writings/ | | legacy-link-pattern finding (severity: error) |
| External URL, anchor, valid non-klappy path outside writings | | Ignored (not this action's job) |
| <!-- audit-allow: dead-reference reason="..." --> directive | | Suppresses next matching finding; appears in suppressed_findings field |
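
The contract above can be sketched as a small classifier. This is an illustration only, not the PR's classifyLink: the names classifyTarget and Finding, and the resolves callback standing in for the supersession-aware index lookup, are assumptions. The sketch also assumes a ./*.md target carries no trailing anchor.

```typescript
// Hypothetical sketch of the behavior contract; not the PR's classifyLink.
type RuleId = "dead-reference" | "legacy-link-pattern";

interface Finding {
  rule_id: RuleId;
  severity: "error";
  occurrence: string;
}

// `resolves` stands in for the supersession-aware index lookup (assumption).
function classifyTarget(
  target: string,
  filePath: string,
  resolves: (uri: string) => boolean,
): Finding | null {
  if (target.startsWith("klappy://")) {
    // A klappy:// URI is a finding only when it fails to resolve.
    return resolves(target)
      ? null
      : { rule_id: "dead-reference", severity: "error", occurrence: target };
  }
  const inWritings = filePath.startsWith("writings/");
  const isLegacy =
    target.startsWith("/page/") ||
    (target.startsWith("./") && target.endsWith(".md"));
  if (inWritings && isLegacy) {
    return { rule_id: "legacy-link-pattern", severity: "error", occurrence: target };
  }
  // External URLs, anchors, and other paths are not this action's job.
  return null;
}
```

Passing the resolver in as a callback keeps the classifier free of index access, which is roughly how the diff's classifyLink receives uriResolves.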

Bounded by design

  • MAX_AUDIT_FILES=1000 per call (production canon is ~560 docs)
  • MAX_AUDIT_FINDINGS=500 per call (summary.truncated: true flags overflow)
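
A minimal sketch of how the two caps might interact, assuming the file list is sliced up front and the truncation flag is set as soon as the findings cap is reached. The helper collectBounded and its BoundedResult shape are hypothetical; only the two constant names come from this PR.

```typescript
// Hypothetical helper illustrating the two caps. Only the constant names come
// from the PR; the limits are parameters so the behavior can be exercised.
const MAX_AUDIT_FILES = 1000;
const MAX_AUDIT_FINDINGS = 500;

interface BoundedResult {
  findings: string[];
  filesScanned: number;
  truncated: boolean;
}

function collectBounded(
  files: string[],
  findingsFor: (file: string) => string[],
  maxFiles: number = MAX_AUDIT_FILES,
  maxFindings: number = MAX_AUDIT_FINDINGS,
): BoundedResult {
  const findings: string[] = [];
  let filesScanned = 0;
  let truncated = false;
  // The file cap is applied by slicing; it does not set the truncation flag.
  for (const file of files.slice(0, maxFiles)) {
    if (truncated) break;
    filesScanned++;
    for (const f of findingsFor(file)) {
      findings.push(f);
      // Stop as soon as the findings cap is reached and flag the overflow.
      if (findings.length >= maxFindings) {
        truncated = true;
        break;
      }
    }
  }
  return { findings, filesScanned, truncated };
}
```

Checking the cap inside the inner loop means a single file with many findings cannot push the total past the limit.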

Vodka discipline

v1 of the spec proposed four checks (dead-references + terminological-drift + projection-staleness + epoch-gaps) plus a deprecated-terms registry, epoch-completeness rules, and an audit_allow: frontmatter field. v2 cut to one check, two rule_ids, line-level allowlist only. Cuts captured with explicit revisit triggers in klappy://docs/planning/link-rot-deferred-concerns.

Three places updated for the new action surface

Per the lesson encoded in klappy://canon/constraints/oddkit-action-registration-completeness (landing in the sibling canon PR):

  1. ✅ Dispatch switch in handleUnifiedAction
  2. ✅ VALID_ACTIONS array
  3. ✅ Central router enum + standalone tool definition

This is the lesson Cursor Bugbot caught on PR-2.1; this PR proactively respects it.

Release-validation-gate (E0008.3)

This PR introduces a new action surface — load-bearing for the CI gate landing in Phase 3. Per klappy://canon/constraints/release-validation-gate, an independent Sonnet 4.6 validator should dispatch before promotion to verify:

  1. Default-scope audit returns shape: { status, summary, findings, scope }
  2. Real klappy:// URI that doesn't resolve → dead-reference finding with severity: error
  3. Real [label](/page/...) pattern in writings/ → legacy-link-pattern finding
  4. Real [label](klappy://valid-uri) → no finding (no false positives)
  5. Line-level <!-- audit-allow --> directive suppresses correctly (finding in suppressed_findings, not in findings)
  6. Backward-compat smoke: every existing action behaves identically (orient, challenge, gate, encode, search, get, resolve, catalog, validate, preflight, version, cleanup_storage)

Smoke tests in this PR cover (1) and partial (4). Validator should exercise (2)/(3)/(5)/(6) against the live preview after CF auto-deploy.
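
For check (1), a validator could start from a shape test along these lines. This is a sketch under assumptions: hasAuditShape and isRecord are hypothetical helpers, and the field names follow the PR description and smoke tests rather than a published schema.

```typescript
// Hypothetical shape check for item (1). Field names follow the PR description
// and smoke tests; the helper functions themselves are assumptions.
function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function hasAuditShape(result: unknown): boolean {
  if (!isRecord(result)) return false;
  const { status, summary, findings, scope } = result;
  if (status !== "OK" && status !== "FINDINGS") return false;
  if (!isRecord(summary) || typeof summary.total_findings !== "number") return false;
  if (!Array.isArray(findings)) return false;
  if (!isRecord(scope) || !Array.isArray(scope.paths)) return false;
  // A status of OK should coincide with an empty findings list.
  return (status === "OK") === (findings.length === 0);
}
```

The final line encodes the relationship visible in the diff, where status is OK exactly when no unsuppressed findings remain.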

What this PR does NOT do

  • Does not implement CI workflow (.github/workflows/canon-quality.yml is Phase 3 PR-3.1)
  • Does not flip enforcement to hard-block (Phase 3 PR-3.2)
  • Does not add the three bug-class canon constraints (sibling PR in klappy/klappy.dev)
  • Does not include the deferred audit checks (terminological-drift, projection-staleness, epoch-gaps) — see deferred-concerns ledger

Sibling PR

The three bug-class lessons from PR-2.1 (this campaign's resolver implementation) land as canon constraints in a sibling PR in klappy/klappy.dev. Per the campaign sequencing amendment (klappy/klappy.dev#145), changes that touch multiple repos need explicit PR-per-repo. The two PRs are not strictly ordered — the constraints document patterns the audit honors; the audit ships the patterns as code. Either can merge first.

Refs

  • Spec: klappy://docs/oddkit/specs/oddkit-audit (DRAFT v2 — KISS)
  • Resolver dependency: klappy://docs/oddkit/specs/oddkit-resolve (in prod at v0.25.0)
  • Principle: klappy://canon/principles/identity-resolved-by-protocol
  • Campaign: klappy://docs/planning/link-rot-elimination-campaign
  • Deferred concerns: klappy://docs/planning/link-rot-deferred-concerns
  • Canon basis: klappy://canon/constraints/release-validation-gate, klappy://canon/principles/vodka-architecture, klappy://canon/principles/ritual-is-a-smell
  • Sibling canon PR (klappy/klappy.dev): TBD

Note

Medium Risk
Adds a new MCP action that scans many markdown files and performs supersession-chain resolution, which could affect worker latency/limits and introduce edge-case false positives/negatives. Existing actions are largely unchanged aside from action registration and input normalization for the new tool.

Overview
Introduces a new audit capability (oddkit_audit standalone tool and oddkit unified action) that scans markdown in a scoped set of paths (defaulting to writings/, canon/, odd/, and docs/, excluding docs/archive/) and reports structured findings for dead klappy:// references (including supersession-chain cycles) and legacy link patterns (/page/... and ./*.md in writings), with optional line-level suppression via <!-- audit-allow: ... reason="..." -->.

Wires the new action into the worker router (VALID_ACTIONS + dispatch), adds input normalization so individual tools can pass object scope as JSON, and extends the Cloudflare production smoke tests to cover oddkit_audit basic response shape and scope filtering.
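
The input normalization can be pictured roughly as follows. This is a simplified sketch, not the PR's code: parseScope and isPlainObject are hypothetical names, and AuditScope is reduced to its paths field.

```typescript
// Hypothetical normalizer; AuditScope is reduced to `paths` for illustration.
interface AuditScope {
  paths?: string[];
}

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function parseScope(input: unknown): AuditScope {
  let value: unknown = input;
  if (typeof value === "string") {
    if (value.trim() === "") return {};
    try {
      value = JSON.parse(value);
    } catch {
      return {}; // unparseable input falls back to the default scope
    }
  }
  if (!isPlainObject(value)) return {};
  // Accept both { scope: { paths } } and a bare { paths }.
  const inner = isPlainObject(value.scope) ? value.scope : value;
  const paths = inner.paths;
  return Array.isArray(paths) ? { paths: paths.map(String) } : {};
}
```

An empty result here means "use the defaults", which mirrors how the diff treats a missing or empty paths array.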

Bumps package/worker versions and documents the release as 0.26.0 in the changelog.

Reviewed by Cursor Bugbot for commit 6bc0595.

…link detection

Phase 2 PR-2.3 of the link-rot-elimination campaign. Implements
oddkit_audit per klappy://docs/oddkit/specs/oddkit-audit (DRAFT v2 — KISS).

Walks every markdown file in scope (writings/, canon/, odd/, docs/,
excluding docs/archive/). For each link target:
  - klappy:// URI: resolves through the index (with same shape-tolerance
    as oddkit_resolve for superseded_by chains). NOT_FOUND or circular
    → dead-reference error.
  - /page/... or ./*.md in writings/: legacy-link-pattern error.
  - everything else (external, anchors, valid non-klappy paths): ignored.

Line-level allowlist via <!-- audit-allow: <rule-id> reason="..." -->.
Suppressed findings returned in a separate envelope field so reviewers
can challenge the reason.

Bounded by MAX_AUDIT_FILES=1000 and MAX_AUDIT_FINDINGS=500 with truncation
flagged in summary.truncated. Production canon is ~560 docs; well below cap.

Three places updated for the new action surface (per
klappy://canon/constraints/oddkit-action-registration-completeness):
  - dispatch switch in handleUnifiedAction
  - VALID_ACTIONS array
  - central router enum + standalone tool definition

Two new smoke tests added:
  - 14j: default-scope audit returns OK or FINDINGS with valid summary
  - 14k: narrow-scope audit honors paths filter

Vodka discipline preserved: v1 of spec proposed four checks plus
supporting registries; v2 cut to one check, two rule_ids. Other checks
deferred per klappy://docs/planning/link-rot-deferred-concerns.

Version bump: 0.25.0 → 0.26.0

Refs:
- Spec: klappy://docs/oddkit/specs/oddkit-audit (DRAFT v2)
- Resolver: klappy://docs/oddkit/specs/oddkit-resolve (in prod v0.25.0)
- Principle: klappy://canon/principles/identity-resolved-by-protocol
- Bug-class lessons (separate canon PR in klappy/klappy.dev):
  klappy://canon/constraints/oddkit-action-registration-completeness
  klappy://canon/constraints/superseded-by-shape-normalization
  klappy://canon/constraints/bash-test-rig-assignment-chain-discipline
- Canon basis: klappy://canon/constraints/release-validation-gate,
  klappy://canon/principles/vodka-architecture

cloudflare-workers-and-pages Bot commented Apr 26, 2026

Deploying with Cloudflare Workers

Status: ✅ Deployment successful!
Name: oddkit
Latest commit: 6bc0595
Updated (UTC): Apr 26 2026, 10:26 PM

Comment thread workers/src/orchestrate.ts
Comment thread workers/src/orchestrate.ts Outdated
Comment thread workers/src/orchestrate.ts
Comment thread workers/src/orchestrate.ts
Comment thread workers/src/orchestrate.ts
cursoragent and others added 2 commits April 26, 2026 21:18
…, findings cap

- Remove no-op ternary in handleUnifiedAction audit dispatch
- Preserve audit-allow suppression across blank/prose lines until a link is seen
- Surface suppression reason on suppressed findings via suppression_reason field
- Match runResolve.lookupSuccessor normalization in uriResolves (.md stem fallback)
- Honor MAX_AUDIT_FINDINGS within per-line loop to enforce the 500-per-call cap
Comment thread workers/src/orchestrate.ts Outdated
Comment thread workers/src/orchestrate.ts Outdated
Comment thread workers/src/orchestrate.ts Outdated
Comment thread workers/src/index.ts
…chema bridge

- audit: suppression directives now expire only on finding-producing
  links, not on out-of-scope links classifyLink ignores.
- audit: depth-cap exhaustion in uriResolves now matches runResolve --
  treat as circular only when the last entry still declares a successor.
- audit: drop unreachable uriExists helper; uriResolves is only invoked
  with klappy:// URIs, so an absent index entry is a definitive miss.
- bridge: normalize object input to a JSON string before calling
  handleUnifiedAction so UnifiedParams.input: string holds at runtime
  for oddkit_audit's union schema.
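
The depth-cap and missing-entry rules described in this commit can be illustrated with a simplified chain walk. This is a sketch under stated assumptions: chainResolves is a hypothetical name, the index is reduced to a Map from URI to an optional successor URI, and the shape-tolerant successor lookup (path, .md, and URI forms) is left out.

```typescript
// Hypothetical, simplified chain walk. The index maps a URI to an entry with an
// optional successor URI; the shape-tolerant lookup from the PR is omitted.
type ChainIndex = Map<string, { superseded_by?: string }>;

function chainResolves(index: ChainIndex, uri: string, maxDepth: number = 16): boolean {
  let current = index.get(uri);
  if (!current) return false; // absent entry: definitive miss
  const visited = new Set<string>([uri]);
  for (let depth = 0; depth < maxDepth; depth++) {
    const next = current.superseded_by;
    if (!next) return true; // stable terminus
    const nextEntry = index.get(next);
    if (!nextEntry) return true; // unknown successor: the last known doc is real
    if (visited.has(next)) return false; // cycle
    visited.add(next);
    current = nextEntry;
  }
  // Depth cap exhausted: dead only if the last entry still points onward.
  return !current.superseded_by;
}
```

The final check is the point of the depth-cap fix: a chain that ends exactly at the cap still resolves, while one that keeps declaring successors is treated as circular.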

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Suppression directive expires prematurely on non-matching findings
    • Removed the post-line expiration block (and its now-unused lineHadFinding tracker) so an audit-allow directive remains pending across non-matching findings until its rule_id-matched finding is encountered, restoring the documented "suppresses next matching finding" contract.
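
The "suppresses next matching finding" contract can be sketched over a flat stream of directives and findings. The AuditEvent and Outcome types and the applySuppressions function are hypothetical; the real loop in the diff works line by line over file content.

```typescript
// Hypothetical event stream: either an audit-allow directive or a finding.
type AuditEvent =
  | { kind: "allow"; rule: string; reason?: string }
  | { kind: "finding"; rule_id: string; occurrence: string };

interface Outcome {
  findings: string[];
  suppressed: { occurrence: string; suppression_reason?: string }[];
}

function applySuppressions(events: AuditEvent[]): Outcome {
  const out: Outcome = { findings: [], suppressed: [] };
  let pending: { rule: string; reason?: string } | null = null;
  for (const ev of events) {
    if (ev.kind === "allow") {
      pending = { rule: ev.rule, reason: ev.reason };
      continue;
    }
    if (pending && pending.rule === ev.rule_id) {
      out.suppressed.push({ occurrence: ev.occurrence, suppression_reason: pending.reason });
      pending = null; // one directive suppresses exactly one finding
    } else {
      // A non-matching finding is reported and leaves the directive pending.
      out.findings.push(ev.occurrence);
    }
  }
  return out;
}
```

With a dead-reference directive followed by a legacy-link-pattern finding and then two dead-reference findings, only the first dead-reference finding is suppressed; the legacy finding and the second dead reference are both reported.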
Preview (477bc2f2b3)
diff --git a/CHANGELOG.md b/CHANGELOG.md
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,32 @@
 
 ## [Unreleased]
 
+## [0.26.0] - 2026-04-26
+
+### Added
+
+- **`oddkit_audit` MCP action — mechanical detection of dead `klappy://` references and legacy markdown link patterns.** Per `klappy://docs/oddkit/specs/oddkit-audit` (DRAFT v2 — KISS). Walks every markdown file in the configured scope, classifies each link, emits structured findings. Two `rule_id`s: `dead-reference` (a `klappy://` URI that doesn't resolve through the index, including chains that end NOT_FOUND or cycle) and `legacy-link-pattern` (a `[label](/page/...)` or `[label](./*.md)` pattern in `writings/` — the patterns that caused the original reader complaints). Both severity `error` by default. Line-level allowlist via `<!-- audit-allow: <rule-id> reason="..." -->` directives. Returns suppressed findings separately so reviewers can challenge suppression reasons. Wired into the unified `oddkit` router (`action: "audit"`), exposed as a standalone `oddkit_audit` tool. Backward-compatible — purely additive. Internal supersession-walk shares normalization logic with `oddkit_resolve` (path/.md/URI shapes per `klappy://canon/constraints/superseded-by-shape-normalization`). Phase 2 PR-2.3 of the link-rot-elimination campaign.
+
+### Notes
+
+- **Vodka discipline preserved.** v1 of the spec proposed four checks (dead-references + terminological-drift + projection-staleness + epoch-gaps) plus a deprecated-terms registry, epoch-completeness rules, and an `audit_allow:` frontmatter field. v2 cut to one check, two rule_ids, line-level allowlist only. The other three checks moved to the deferred-concerns ledger with explicit revisit triggers.
+- **Three places updated for the new action surface** per `klappy://canon/constraints/oddkit-action-registration-completeness`: dispatch switch, `VALID_ACTIONS` array, central router enum + standalone tool definition. Smoke tests confirmed before push.
+- **No `PARTIAL_INDEX` status in v1.** Same as resolve: matches existing convention. If real cold-start visibility becomes load-bearing, follow-up.
+- **`since_commit` parameter accepted but ignored in v1.** The worker has no git access; CI workflows can pass file lists via `paths` instead. Documented in spec; reserves the field for a future implementation that reads from a git mirror or works against staged files.
+- **Bounded by `MAX_AUDIT_FILES=1000` and `MAX_AUDIT_FINDINGS=500`.** When truncated, `summary.truncated: true` flags it. Production canon is ~560 docs today; well below the cap.
+
+### Refs
+
+- Spec: `klappy://docs/oddkit/specs/oddkit-audit` (DRAFT v2 — KISS)
+- Resolver dependency: `klappy://docs/oddkit/specs/oddkit-resolve` (DRAFT v4 — in production at v0.25.0)
+- Principle: `klappy://canon/principles/identity-resolved-by-protocol`
+- Campaign: `klappy://docs/planning/link-rot-elimination-campaign`
+- Bug-class lessons (separate canon PR in klappy/klappy.dev):
+  - `klappy://canon/constraints/oddkit-action-registration-completeness`
+  - `klappy://canon/constraints/superseded-by-shape-normalization`
+  - `klappy://canon/constraints/bash-test-rig-assignment-chain-discipline`
+- Canon basis: `klappy://canon/constraints/release-validation-gate`, `klappy://canon/principles/vodka-architecture`, `klappy://canon/principles/ritual-is-a-smell`
+
 ## [0.25.0] - 2026-04-26
 
 ### Added

diff --git a/package-lock.json b/package-lock.json
--- a/package-lock.json
+++ b/package-lock.json
@@ -1,12 +1,12 @@
 {
   "name": "oddkit",
-  "version": "0.25.0",
+  "version": "0.26.0",
   "lockfileVersion": 3,
   "requires": true,
   "packages": {
     "": {
       "name": "oddkit",
-      "version": "0.25.0",
+      "version": "0.26.0",
       "license": "MIT",
       "dependencies": {
         "@modelcontextprotocol/sdk": "^1.0.0",

diff --git a/package.json b/package.json
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "oddkit",
-  "version": "0.25.0",
+  "version": "0.26.0",
   "description": "Agent-first CLI for ODD-governed repos. Epistemic terrain rendering with portable baseline.",
   "type": "module",
   "bin": {

diff --git a/tests/cloudflare-production.test.sh b/tests/cloudflare-production.test.sh
--- a/tests/cloudflare-production.test.sh
+++ b/tests/cloudflare-production.test.sh
@@ -494,6 +494,64 @@
   FAILED=$((FAILED + 1))
 fi
 
+# Test 14j: oddkit_audit — basic invocation, returns OK or FINDINGS
+# Per klappy://docs/oddkit/specs/oddkit-audit. Walks every klappy:// URI in canon
+# markdown and emits findings for those that don't resolve, plus legacy markdown
+# link patterns in writings/.
+echo ""
+echo "Test 14j: tools/call oddkit_audit (default scope)"
+RAW=$(curl -sf --max-time 120 "$WORKER_URL/mcp" -X POST \
+  -H "Content-Type: application/json" \
+  -H "Accept: application/json, text/event-stream" \
+  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"oddkit_audit","arguments":{}}}')
+RESULT=$(extract_json "$RAW")
+INNER=$(echo "$RESULT" | python3 -c "import sys, json; d=json.load(sys.stdin); print(d.get('result',{}).get('content',[{}])[0].get('text',''))" 2>/dev/null)
+if echo "$INNER" | python3 -c "
+import sys, json
+d = json.load(sys.stdin)
+r = d.get('result', {})
+status = r.get('status')
+assert status in ('OK', 'FINDINGS'), f'unexpected status: {status}'
+summary = r.get('summary', {})
+assert 'total_findings' in summary, 'missing summary.total_findings'
+assert 'by_severity' in summary, 'missing summary.by_severity'
+assert summary.get('files_scanned', 0) > 10, f'suspiciously few files scanned: {summary.get(\"files_scanned\")}'
+" 2>/dev/null; then
+  echo "PASS - audit returns OK or FINDINGS with valid summary"
+  PASSED=$((PASSED + 1))
+else
+  echo "FAIL - audit response shape unexpected"
+  echo "  Inner: $(echo "$INNER" | head -c 600)"
+  FAILED=$((FAILED + 1))
+fi
+
+# Test 14k: oddkit_audit — narrow scope (single path)
+echo ""
+echo "Test 14k: tools/call oddkit_audit (narrow scope: writings/ only)"
+RAW=$(curl -sf --max-time 120 "$WORKER_URL/mcp" -X POST \
+  -H "Content-Type: application/json" \
+  -H "Accept: application/json, text/event-stream" \
+  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"oddkit_audit","arguments":{"input":{"scope":{"paths":["writings/"]}}}}}')
+RESULT=$(extract_json "$RAW")
+INNER=$(echo "$RESULT" | python3 -c "import sys, json; d=json.load(sys.stdin); print(d.get('result',{}).get('content',[{}])[0].get('text',''))" 2>/dev/null)
+if echo "$INNER" | python3 -c "
+import sys, json
+d = json.load(sys.stdin)
+r = d.get('result', {})
+scope = r.get('scope', {})
+paths = scope.get('paths', [])
+assert paths == ['writings/'], f'scope echoed back unexpectedly: {paths}'
+status = r.get('status')
+assert status in ('OK', 'FINDINGS'), f'unexpected status: {status}'
+" 2>/dev/null; then
+  echo "PASS - audit honors narrow scope"
+  PASSED=$((PASSED + 1))
+else
+  echo "FAIL - audit narrow scope shape unexpected"
+  echo "  Inner: $(echo "$INNER" | head -c 600)"
+  FAILED=$((FAILED + 1))
+fi
+
 # ============================================
 # SECTION 4: Response Content Validation
 # ============================================

diff --git a/workers/package-lock.json b/workers/package-lock.json
--- a/workers/package-lock.json
+++ b/workers/package-lock.json
@@ -1,12 +1,12 @@
 {
   "name": "oddkit-mcp-worker",
-  "version": "0.25.0",
+  "version": "0.26.0",
   "lockfileVersion": 3,
   "requires": true,
   "packages": {
     "": {
       "name": "oddkit-mcp-worker",
-      "version": "0.25.0",
+      "version": "0.26.0",
       "dependencies": {
         "agents": "^0.4.1",
         "fflate": "^0.8.2",

diff --git a/workers/package.json b/workers/package.json
--- a/workers/package.json
+++ b/workers/package.json
@@ -1,6 +1,6 @@
 {
   "name": "oddkit-mcp-worker",
-  "version": "0.25.0",
+  "version": "0.26.0",
   "private": true,
   "type": "module",
   "scripts": {

diff --git a/workers/src/index.ts b/workers/src/index.ts
--- a/workers/src/index.ts
+++ b/workers/src/index.ts
@@ -193,13 +193,14 @@
 
   server.tool(
     "oddkit",
-    `Epistemic guide for Outcomes-Driven Development. Routes to orient, challenge, gate, encode, search, get, resolve, catalog, validate, preflight, version, or cleanup_storage actions.
+    `Epistemic guide for Outcomes-Driven Development. Routes to orient, challenge, gate, encode, search, get, resolve, audit, catalog, validate, preflight, version, or cleanup_storage actions.
 
 Use when:
 - Starting work: action="orient" to assess epistemic mode
 - Policy/canon questions: action="search" with your query
 - Fetching a specific doc: action="get" with URI
 - Resolving a URI to its current canonical answer (walks supersession): action="resolve" with URI
+- Auditing canon for dead references and legacy link patterns: action="audit" (CI use)
 - Pressure-testing claims: action="challenge"
 - Checking transition readiness: action="gate"
 - Recording decisions: action="encode"
@@ -208,7 +209,7 @@
 - Listing available docs: action="catalog"`,
     {
       action: z.enum([
-        "orient", "challenge", "gate", "encode", "search", "get", "resolve",
+        "orient", "challenge", "gate", "encode", "search", "get", "resolve", "audit",
         "catalog", "validate", "preflight", "version", "cleanup_storage",
       ]).describe("Which epistemic action to perform."),
       input: z.string().describe("Primary input — query, claim, URI, goal, or completion claim depending on action."),
@@ -347,6 +348,16 @@
       annotations: { readOnlyHint: true, destructiveHint: false, idempotentHint: true, openWorldHint: true },
     },
     {
+      name: "oddkit_audit",
+      description: "Walk every klappy:// URI in canon markdown and emit findings for those that don't resolve, plus any legacy markdown link patterns (/page/..., ./*.md) in writings/. Returns structured findings with rule_id, severity, location, occurrence, message. Designed for CI use. Per klappy://docs/oddkit/specs/oddkit-audit (DRAFT v2 — KISS).",
+      action: "audit",
+      schema: {
+        input: z.union([z.string(), z.object({}).passthrough()]).optional().describe("Optional scope: { paths: string[], since_commit?: string }. Default scope: writings/, canon/, odd/, docs/ (excluding docs/archive/). Pass as object or JSON string."),
+        knowledge_base_url: z.string().optional().describe("Optional: GitHub repo URL for your knowledge base. When set, strict mode is automatic: missing files fall through to the bundled governance tier."),
+      },
+      annotations: { readOnlyHint: true, destructiveHint: false, idempotentHint: true, openWorldHint: true },
+    },
+    {
       name: "oddkit_catalog",
       description: "Lists available documentation with categories, counts, and start-here suggestions. Supports temporal discovery: use sort_by='date' to get recent articles with full frontmatter metadata.",
       action: "catalog",
@@ -405,9 +416,19 @@
       tool.schema,
       tool.annotations,
       async (args: Record<string, unknown>) => {
+        // Most tools declare `input` as a string, but oddkit_audit accepts
+        // an object scope as well. Normalize objects to a JSON string so
+        // the UnifiedParams.input: string contract holds for every action.
+        const rawInput = args.input;
+        const normalizedInput =
+          typeof rawInput === "string"
+            ? rawInput
+            : rawInput && typeof rawInput === "object"
+              ? JSON.stringify(rawInput)
+              : "";
         const result = await handleUnifiedAction({
           action: tool.action,
-          input: (args.input as string) || "",
+          input: normalizedInput,
           context: args.context as string | undefined,
           mode: args.mode as string | undefined,
           knowledge_base_url: args.knowledge_base_url as string | undefined,

diff --git a/workers/src/orchestrate.ts b/workers/src/orchestrate.ts
--- a/workers/src/orchestrate.ts
+++ b/workers/src/orchestrate.ts
@@ -1775,6 +1775,303 @@
   return "/" + uri.slice("klappy://".length);
 }
 
+// ──────────────────────────────────────────────────────────────────────────────
+// runAudit — mechanical detection of dead klappy:// references and legacy
+// markdown link patterns.
+//
+// Per klappy://docs/oddkit/specs/oddkit-audit (DRAFT v2 — KISS): walk every
+// `klappy://` URI in canon, call resolve internally on each, report findings.
+// Plus one additional rule: legacy markdown patterns `/page/...` and
+// `./*.md` in writings/ are emitted as `legacy-link-pattern` errors.
+//
+// One check, two rule_ids. Other audit checks (terminological-drift,
+// projection-staleness, epoch-gap) are deferred per
+// klappy://docs/planning/link-rot-deferred-concerns.
+// ──────────────────────────────────────────────────────────────────────────────
+
+interface AuditFinding {
+  rule_id: "dead-reference" | "legacy-link-pattern";
+  severity: "error" | "warning";
+  location: { path: string; line: number };
+  occurrence: string;
+  message: string;
+  suppression_reason?: string;
+}
+
+interface AuditScope {
+  paths?: string[];
+  // since_commit is part of the spec but not implementable from the worker without
+  // git access. CI workflows can pass file lists via paths instead. Documented in
+  // the action's input schema; ignored here for v1.
+  since_commit?: string;
+}
+
+const DEFAULT_AUDIT_PATHS = ["writings/", "canon/", "odd/", "docs/"];
+const AUDIT_EXCLUDE_PREFIXES = ["docs/archive/"];
+const MAX_AUDIT_FILES = 1000;
+const MAX_AUDIT_FINDINGS = 500;
+
+// Match [label](target) — non-greedy label, balanced-paren-naive target (good
+// enough for the link forms canon uses; nested parens in URIs are rare and
+// handled by simply taking up to the first `)`).
+const MARKDOWN_LINK_RE = /\[([^\]]*?)\]\(([^)\s]+)(?:\s+"[^"]*")?\)/g;
+
+// Match the line-level allowlist directive. Captures: rule_id, optional reason.
+//   <!-- audit-allow: dead-reference reason="placeholder" -->
+const AUDIT_ALLOW_RE = /<!--\s*audit-allow:\s*([a-z-]+)(?:\s+reason="([^"]*)")?\s*-->/;
+
+async function runAudit(
+  input: AuditScope | string | undefined,
+  fetcher: KnowledgeBaseFetcher,
+  knowledgeBaseUrl?: string,
+  state?: OddkitState,
+): Promise<ActionResult> {
+  const startMs = Date.now();
+  const updatedState = state ? initState(state) : undefined;
+
+  // Normalize input: accept scope object, JSON string, or undefined (= defaults).
+  let scope: AuditScope = {};
+  if (typeof input === "string" && input.trim().length > 0) {
+    try {
+      const parsed = JSON.parse(input);
+      if (parsed && typeof parsed === "object" && !Array.isArray(parsed)) {
+        scope = parsed.scope || parsed;
+      }
+    } catch {
+      // Ignore — empty scope is a valid full-default audit
+    }
+  } else if (input && typeof input === "object" && !Array.isArray(input)) {
+    scope = (input as { scope?: AuditScope }).scope || (input as AuditScope);
+  }
+
+  const paths = Array.isArray(scope.paths) && scope.paths.length > 0
+    ? scope.paths
+    : DEFAULT_AUDIT_PATHS;
+
+  const index = await fetcher.getIndex(knowledgeBaseUrl);
+
+  // Build URI lookup for inline resolution. Same logic as runResolve's
+  // lookupSuccessor + initial lookup, but inlined here to avoid the overhead
+  // of constructing full ActionResult envelopes for each URI.
+  const byUri = new Map<string, IndexEntry>();
+  const byPath = new Map<string, IndexEntry>();
+  for (const entry of index.entries) {
+    if (entry.uri) byUri.set(entry.uri, entry);
+    if (entry.path) byPath.set(entry.path, entry);
+  }
+
+  // Walk a klappy:// URI through any superseded_by chain to a terminus,
+  // matching runResolve's algorithm. Returns true iff the chain reaches
+  // a stable terminus (FOUND); false on NOT_FOUND or CIRCULAR_SUPERSESSION.
+  // Only invoked from classifyLink with klappy:// URIs, so an absent entry
+  // is a definitive NOT_FOUND.
+  function uriResolves(uri: string): boolean {
+    const start = byUri.get(uri);
+    if (!start) return false;
+    let current: IndexEntry = start;
+    const visited = new Set<string>([current.uri]);
+    for (let depth = 0; depth < 16; depth++) {
+      const fm = current.frontmatter || {};
+      const next = fm.superseded_by;
+      if (typeof next !== "string" || next.length === 0) return true;
+      // Resolve next via same shape-tolerance as runResolve.lookupSuccessor
+      let nextEntry: IndexEntry | undefined = byUri.get(next) || byPath.get(next);
+      if (!nextEntry && !next.startsWith("klappy://") && !next.endsWith(".md")) {
+        nextEntry = byPath.get(next + ".md");
+      }
+      if (!nextEntry && !next.startsWith("klappy://")) {
+        const stem = next.endsWith(".md") ? next.slice(0, -".md".length) : next;
+        nextEntry = byUri.get("klappy://" + stem);
+      }
+      if (!nextEntry) {
+        // Chain points at unknown successor — runResolve treats this as FOUND
+        // with warning; the audit treats the URI as "resolves" because the
+        // last known entry is a real document.
+        return true;
+      }
+      const nextCanonical = nextEntry.uri;
+      if (visited.has(nextCanonical)) return false; // circular
+      visited.add(nextCanonical);
+      current = nextEntry;
+    }
+    // Depth-cap exhausted — match runResolve: only circular if the last
+    // entry still declares a further successor. Otherwise the chain
+    // properly terminates and the URI resolves.
+    const finalFm = current.frontmatter || {};
+    if (typeof finalFm.superseded_by === "string" && finalFm.superseded_by.length > 0) {
+      return false;
+    }
+    return true;
+  }
+
+  // Filter the index to markdown files within the configured scope.
+  const inScope = (path: string): boolean => {
+    if (!path.endsWith(".md")) return false;
+    if (AUDIT_EXCLUDE_PREFIXES.some((p) => path.startsWith(p))) return false;
+    return paths.some((p) => path.startsWith(p));
+  };
+
+  const targetPaths = index.entries
+    .filter((e) => inScope(e.path))
+    .map((e) => e.path)
+    .slice(0, MAX_AUDIT_FILES);
+
+  const findings: AuditFinding[] = [];
+  const suppressedFindings: AuditFinding[] = [];
+  let truncated = false;
+  let filesScanned = 0;
+
+  for (const path of targetPaths) {
+    if (findings.length >= MAX_AUDIT_FINDINGS) {
+      truncated = true;
+      break;
+    }
+    const content = await fetcher.getFile(path, knowledgeBaseUrl);
+    if (!content) continue;
+    filesScanned++;
+    const isWriting = path.startsWith("writings/");
+
+    const lines = content.split("\n");
+    // Track allowlist directives: when one appears, it suppresses the next
+    // finding of the matching rule_id on the *next* link (any subsequent line).
+    let pendingSuppress: { rule: string; reason: string | null; lineSeen: number } | null = null;
+
+    for (let lineIdx = 0; lineIdx < lines.length; lineIdx++) {
+      if (truncated) break;
+      const line = lines[lineIdx];
+
+      // Check for allowlist directive on this line
+      const allowMatch = AUDIT_ALLOW_RE.exec(line);
+      if (allowMatch) {
+        pendingSuppress = {
+          rule: allowMatch[1],
+          reason: allowMatch[2] || null,
+          lineSeen: lineIdx + 1,
+        };
+        // Don't continue — allowlist directives may sit on a line that also
+        // contains a link they are NOT meant to suppress (rare, but possible).
+        // The directive applies to the next link encountered.
+      }
+
+      // Reset link-finder regex state per line
+      MARKDOWN_LINK_RE.lastIndex = 0;
+      let linkMatch: RegExpExecArray | null;
+      while ((linkMatch = MARKDOWN_LINK_RE.exec(line)) !== null) {
+        const target = linkMatch[2];
+
+        const finding = classifyLink(target, path, lineIdx + 1, isWriting, uriResolves);
+        if (!finding) continue;
+
+        // Apply pending suppression if the rule matches
+        if (pendingSuppress && pendingSuppress.rule === finding.rule_id) {
+          if (pendingSuppress.reason) {
+            finding.suppression_reason = pendingSuppress.reason;
+          }
+          suppressedFindings.push(finding);
+          pendingSuppress = null;
+          continue;
+        }
+
+        findings.push(finding);
+        if (findings.length >= MAX_AUDIT_FINDINGS) {
+          truncated = true;
+          break;
+        }
+      }
+    }
+  }
+
+  const errorCount = findings.filter((f) => f.severity === "error").length;
+  const warningCount = findings.filter((f) => f.severity === "warning").length;
+
+  const status: "OK" | "FINDINGS" =
+    findings.length === 0 ? "OK" : "FINDINGS";
+
+  const summaryByRule: Record<string, number> = {};
+  for (const f of findings) {
+    summaryByRule[f.rule_id] = (summaryByRule[f.rule_id] || 0) + 1;
+  }
+
+  return {
+    action: "audit",
+    result: {
+      status,
+      summary: {
+        total_findings: findings.length,
+        by_severity: { error: errorCount, warning: warningCount },
+        by_rule: summaryByRule,
+        files_scanned: filesScanned,
+        suppressed_count: suppressedFindings.length,
+        truncated,
+      },
+      findings,
+      ...(suppressedFindings.length > 0 ? { suppressed_findings: suppressedFindings } : {}),
+      scope: { paths, excluded_prefixes: AUDIT_EXCLUDE_PREFIXES },
+    },
+    state: updatedState,
+    assistant_text:
+      findings.length === 0
+        ? `Audited ${filesScanned} files. No findings.`
+        : `Audited ${filesScanned} files. ${errorCount} error${errorCount === 1 ? "" : "s"}, ${warningCount} warning${warningCount === 1 ? "" : "s"}.${suppressedFindings.length > 0 ? ` ${suppressedFindings.length} suppressed.` : ""}${truncated ? ` Truncated at ${MAX_AUDIT_FINDINGS} findings.` : ""}`,
+    debug: { duration_ms: Date.now() - startMs, generated_at: new Date().toISOString() },
+  };
+}
+
+/**
+ * Classify a single markdown link target.
+ * Returns null when the target is out of scope (external URL, anchor, valid
+ * non-klappy path outside writings) — those are not this action's job.
+ */
+function classifyLink(
+  target: string,
+  filePath: string,
+  line: number,
+  isWriting: boolean,
+  uriResolves: (uri: string) => boolean,
+): AuditFinding | null {
+  // Strip fragment for resolution check
+  const bareTarget = target.split("#")[0];
+  if (!bareTarget) return null; // pure anchor link
+
+  if (bareTarget.startsWith("klappy://")) {
+    if (!uriResolves(bareTarget)) {
+      return {
+        rule_id: "dead-reference",
+        severity: "error",
+        location: { path: filePath, line },
+        occurrence: target,
+        message: "URI does not resolve",
+      };
+    }
+    return null;
+  }
+
+  if (isWriting) {
+    if (bareTarget.startsWith("/page/")) {
+      return {
+        rule_id: "legacy-link-pattern",
+        severity: "error",
+        location: { path: filePath, line },
+        occurrence: target,
+        message: "Use a klappy:// URI instead of /page/ path",
+      };
+    }
+    if (bareTarget.startsWith("./") && bareTarget.endsWith(".md")) {
+      return {
+        rule_id: "legacy-link-pattern",
+        severity: "error",
+        location: { path: filePath, line },
+        occurrence: target,
+        message: "Use a klappy:// URI instead of relative .md path",
+      };
+    }
+  }
+
+  // Out of scope: external URLs, mailto, anchors-only, valid non-klappy paths
+  // outside writings, etc.
+  return null;
+}
+
 async function runCleanupStorage(
   fetcher: KnowledgeBaseFetcher,
   knowledgeBaseUrl?: string,
@@ -2935,6 +3232,7 @@
   "search",
   "get",
   "resolve",
+  "audit",
   "catalog",
   "validate",
   "preflight",
@@ -2983,6 +3281,9 @@
       case "resolve":
         result = await runResolve(input, fetcher, knowledge_base_url, state);
         break;
+      case "audit":
+        result = await runAudit(input, fetcher, knowledge_base_url, state);
+        break;
       case "catalog":
         result = await runCatalog(fetcher, knowledge_base_url, state, { sort_by, limit, offset, filter_epoch });
         break;
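
The allowlist bookkeeping in `runAudit` is easy to misread, so here is a minimal sketch of it in isolation. It is a simplified model, not the code from this PR: the `AuditEvent` type and `partitionFindings` helper are hypothetical, and directive parsing is skipped entirely because `AUDIT_ALLOW_RE` is not shown in this hunk. It only illustrates that one directive consumes exactly one matching finding, and that non-matching findings pass through without clearing it.

```typescript
// Simplified model of the suppression bookkeeping; NOT the actual runAudit code.
// AuditEvent and partitionFindings are hypothetical names used for illustration.
type AuditEvent =
  | { kind: "allow"; rule: string } // an allowlist directive was seen
  | { kind: "finding"; rule_id: string; line: number }; // a link produced a finding

interface Partitioned {
  reported: number[]; // line numbers of findings that are reported
  suppressed: number[]; // line numbers of findings consumed by a directive
}

function partitionFindings(events: AuditEvent[]): Partitioned {
  const out: Partitioned = { reported: [], suppressed: [] };
  let pending: { rule: string } | null = null;

  for (const ev of events) {
    if (ev.kind === "allow") {
      // A newer directive replaces any directive that is still pending.
      pending = { rule: ev.rule };
      continue;
    }
    if (pending !== null && pending.rule === ev.rule_id) {
      out.suppressed.push(ev.line);
      pending = null; // one directive suppresses exactly one finding
      continue;
    }
    // Non-matching findings are reported and leave the directive pending.
    out.reported.push(ev.line);
  }
  return out;
}

const demo = partitionFindings([
  { kind: "allow", rule: "legacy-link-pattern" },
  { kind: "finding", rule_id: "dead-reference", line: 3 },
  { kind: "finding", rule_id: "legacy-link-pattern", line: 5 },
  { kind: "finding", rule_id: "legacy-link-pattern", line: 8 },
]);
console.log(demo); // reported: [3, 8], suppressed: [5]
```

In this model the dead reference on line 3 is still reported, the first legacy link on line 5 is suppressed, and the second on line 8 is reported because the directive was already consumed.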

Reviewed by Cursor Bugbot for commit 164e69d.

…surface

CF Preview test 14j (default-scope audit) timed out at 120s on the prior
default of [writings/, canon/, odd/, docs/]. Cold-cache fetching ~560
files through the worker's zip-extract path exceeded the curl budget.

v1 default scope is writings/ only. Reasons it's honest, not a hack:
- PR-2.2's actual cleanup was writings-only; the campaign motivation
  was reader complaints about broken links in published essays.
- April-9 reference-integrity audit classified the 49 unfixed refs as
  intentional (template placeholders, site routes, historical archive,
  .cursor/plans) — none in writings/.
- writings/ is where authors write klappy:// URIs as body links most
  often; canon/odd/docs use frontmatter cross-refs which the resolver
  governs separately.

canon/, odd/, docs/ become explicit opt-in via scope.paths. Reversal
is one line if a real consumer demonstrates wider need (or if
parallelized fetching graduates from the deferred-concerns ledger).
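
A rough sketch of what the narrowed default means for callers, assuming the scope is resolved roughly as described above. `resolveAuditPaths`, `inScope`, `AuditInput`, and both constants are hypothetical names, not identifiers from this PR. Treating an empty `paths` array as "use the default" is also an assumption, as is the `docs/archive/` exclusion still applying to opt-in scopes.

```typescript
// Hypothetical helpers; names and exact semantics are illustrative assumptions.
interface AuditInput {
  scope?: { paths?: string[] };
}

const DEFAULT_AUDIT_PATHS = ["writings/"]; // v1 default described in this commit
const EXCLUDED_PREFIXES = ["docs/archive/"]; // exclusion named in the PR summary

// Use the caller's paths when given; otherwise fall back to the default.
function resolveAuditPaths(input: AuditInput): string[] {
  const requested = input.scope?.paths;
  return requested && requested.length > 0 ? [...requested] : [...DEFAULT_AUDIT_PATHS];
}

// A file is audited when it sits under a scoped prefix and no excluded prefix.
function inScope(filePath: string, paths: string[]): boolean {
  if (EXCLUDED_PREFIXES.some((p) => filePath.startsWith(p))) return false;
  return paths.some((p) => filePath.startsWith(p));
}

const narrow = resolveAuditPaths({});
const wide = resolveAuditPaths({
  scope: { paths: ["writings/", "canon/", "odd/", "docs/"] },
});

console.log(inScope("canon/constraints/example.md", narrow)); // false
console.log(inScope("canon/constraints/example.md", wide)); // true
console.log(inScope("docs/archive/old.md", wide)); // false
```

Under these assumptions, a consumer that wants the previous four-directory behaviour passes the wider `scope.paths` list explicitly, and pays the cold-cache fetch cost only when it asks for it.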

Spec amendment to klappy://docs/oddkit/specs/oddkit-audit (v2.1) lands
in the sibling canon PR (klappy/klappy.dev#146) so the spec self-
documents the deviation rather than the code silently diverging.

Refs:
- klappy://docs/oddkit/specs/oddkit-audit (DRAFT v2.1 — to be amended)
- klappy://docs/planning/link-rot-deferred-concerns (parallelized
  fetching is a candidate for the deferred ledger)
@klappy klappy merged commit 7080d2e into main Apr 27, 2026
5 checks passed
@klappy klappy deleted the feat/oddkit-audit branch April 27, 2026 00:58