
feat(security): Spec 076 US3 — detect-engine eval corpus + CI recall/FP gate (T017-T019) #777

Merged
Dumbris merged 3 commits into main from 076-t4-scan-eval-gate
Jun 28, 2026

Dumbris commented Jun 27, 2026

Member

Spec 076 · US3 (T017–T019) — make detector reliability a blocking CI number

Closes the US3 increment of the deterministic offline tool-scanner (MCP-3579). Builds on the merged US1 engine (#769 T1 foundation, #770 hard checks). Branched off origin/main.

What this does

A CI step now fails the build if the offline detect.Engine drops below 0.90 recall on malicious samples or exceeds a 0.05 false-positive rate on hard-negatives.

T017 — labeled detect corpus

specs/065-evaluation-foundation/datasets/detect_corpus_v1.json — a new corpus (the existing security_corpus_v1.json is immutable per CN-002 and is description-only, so it can't exercise the structural checks). 32 self-authored entries carrying the full ToolView the checks need (server, tool name/description/schema, cross-server peers so shadowing fires):

category             malicious    mapped check             gated today
unicode_smuggling    6            unicode.hidden           ✅ US1
decoded_payload      6            payload.decoded          ✅ US1
shadowing            4            shadowing.cross_server   ✅ US1
capability_mismatch  2            capability.mismatch      ⏳ US2 (reported, not enforced)
hard_negative        9 (benign)   -                        -
benign               5            -                        -

Validated by detect_corpus_test.go (coherent label/category, redistributable provenance reusing the package allowlist, per-category coverage). README updated with the file + counts.

T018 — scan-eval --gate

scan-eval --corpus <detect_corpus> --gate --min-recall 0.90 --max-fp 0.05

Runs detect.Engine over the corpus, emits per-category recall/precision/FP/F1 JSON, exits non-zero on breach (exit 6). Forward-compatible: a category is only enforced when its check is registered — capability_mismatch begins gating automatically the moment the US2 check lands, with no corpus change.
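
To make the gate semantics above concrete, here is a minimal Go sketch of the two decisions involved: whether a category is enforced at all, and whether the measured numbers breach the thresholds. Every identifier in it (`registered`, `categoryCheck`, `gated`, `breaches`, `exitCode`) is hypothetical and chosen for illustration; only the category names, check names, thresholds, and exit code 6 come from this description, and the real gate.go may be structured differently.

```go
package main

import (
	"fmt"
	"os"
)

// registered lists the checks assumed to be live today (the three US1 checks).
// capability.mismatch is absent because the US2 check has not landed yet.
var registered = map[string]bool{
	"unicode.hidden":         true,
	"payload.decoded":        true,
	"shadowing.cross_server": true,
}

// categoryCheck maps a corpus category to the check expected to catch it.
var categoryCheck = map[string]string{
	"unicode_smuggling":   "unicode.hidden",
	"decoded_payload":     "payload.decoded",
	"shadowing":           "shadowing.cross_server",
	"capability_mismatch": "capability.mismatch",
}

// gated reports whether a category counts toward the gate decision.
// A category is enforced only when its mapped check is registered.
func gated(category string) bool {
	check, ok := categoryCheck[category]
	return ok && registered[check]
}

// breaches returns one message per violated threshold.
func breaches(recall, fpRate, minRecall, maxFP float64) []string {
	var out []string
	if recall < minRecall {
		out = append(out, fmt.Sprintf("recall %.2f < min %.2f", recall, minRecall))
	}
	if fpRate > maxFP {
		out = append(out, fmt.Sprintf("fp rate %.2f > max %.2f", fpRate, maxFP))
	}
	return out
}

// exitCode maps the breach list to a process exit code (6 on breach).
func exitCode(b []string) int {
	if len(b) > 0 {
		return 6
	}
	return 0
}

func main() {
	fmt.Println(gated("shadowing"), gated("capability_mismatch")) // true false

	b := breaches(1.0, 0.0, 0.90, 0.05)
	for _, msg := range b {
		fmt.Fprintln(os.Stderr, msg)
	}
	os.Exit(exitCode(b))
}
```

In this sketch, adding `"capability.mismatch": true` to `registered` is the only change needed for that category to start gating, which mirrors the forward-compatibility claim without touching the corpus.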

T019 — CI wiring

Blocking step in the existing eval.yml security-d2 job (already offline + Go + triggered on internal/security/**, cmd/scan-eval/**, dataset paths). Pure-Go, no mcp-eval/Python; runs first so a detector regression fails fast.

Verification

  • TDD: gate_test.go (incl. a committed-corpus regression anchor) written before gate.go.
  • Committed corpus passes at recall 1.0 (16/16 gated), FP 0/14.
  • go test -race ./cmd/scan-eval/... ./specs/065-evaluation-foundation/datasets/... ./internal/security/...
  • golangci-lint run --config .github/.golangci.yml ✅ (0 issues; ST1018 zero-width-literal gotcha handled via escape)
  • actionlint .github/workflows/eval.yml

Notes for reviewers

Related #MCP-3579

…FP gate (T017-T019)

Make detector reliability a blocking CI number for the offline Spec-076
detect.Engine.

T017 — new labeled corpus specs/065-evaluation-foundation/datasets/
detect_corpus_v1.json (32 self-authored entries) carrying the full ToolView
fields the structural checks need (server, tool name/description/schema,
cross-server peers). Categories map to detect checks: unicode_smuggling,
decoded_payload, shadowing (US1, gated today) plus capability_mismatch (US2,
reported but not yet gated) and attack-resembling hard-negatives. Validated by
detect_corpus_test.go (coherent labels, redistributable provenance, per-category
coverage). README documents the file + counts.

T018 — `scan-eval --gate --min-recall --max-fp` runs detect.Engine over the
corpus, prints per-category recall/precision/FP/F1 JSON, and exits non-zero on a
breach. A category is only enforced when its check is registered, so future
checks (capability.mismatch) begin gating automatically with no corpus change.

T019 — blocking step in the eval.yml security-d2 job:
`scan-eval --gate --min-recall 0.90 --max-fp 0.05` (pure Go, offline, runs first
so a detector regression fails fast).

TDD: gate_test.go (incl. a committed-corpus regression anchor) written first.
Committed corpus passes at recall 1.0 (16/16 gated), FP 0/14.

Related #MCP-3579

cloudflare-workers-and-pages Bot commented Jun 27, 2026

Deploying mcpproxy-docs with Cloudflare Pages

Latest commit: b1406a8
Status: ✅  Deploy successful!
Preview URL: https://40c37c69.mcpproxy-docs.pages.dev
Branch Preview URL: https://076-t4-scan-eval-gate.mcpproxy-docs.pages.dev

codecov-commenter commented Jun 27, 2026

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 87.80488% with 20 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
cmd/scan-eval/gate.go 89.03% 10 Missing and 7 partials ⚠️
cmd/scan-eval/main.go 66.66% 2 Missing and 1 partial ⚠️

github-actions Bot commented Jun 27, 2026

📦 Build Artifacts

Workflow Run: View Run
Branch: 076-t4-scan-eval-gate

Available Artifacts

  • archive-darwin-amd64 (28 MB)
  • archive-darwin-arm64 (25 MB)
  • archive-linux-amd64 (16 MB)
  • archive-linux-arm64 (14 MB)
  • archive-windows-amd64 (28 MB)
  • archive-windows-arm64 (25 MB)
  • frontend-dist-pr (0 MB)
  • installer-dmg-darwin-amd64 (21 MB)
  • installer-dmg-darwin-arm64 (19 MB)

How to Download

Option 1: GitHub Web UI (easiest)

  1. Go to the workflow run page linked above
  2. Scroll to the bottom "Artifacts" section
  3. Click on the artifact you want to download

Option 2: GitHub CLI

gh run download 28311451170 --repo smart-mcp-proxy/mcpproxy-go

Note: Artifacts expire in 14 days.

CodexReviewer re-review of #777: the gated false-positive rate computed its
denominator over every non-malicious entry (benign + hard_negative). SC-002
(spec.md:48,52,114) requires the ≤5% FP threshold to be measured on the
hard-negative set specifically — otherwise adding clean-benign corpus entries
dilutes the rate and the gate can pass while hard-negatives regress.

- fp_rate denominator = hard_negative entries only (the gated SC-002 metric).
- Report benign_total / benign_false_positives separately for transparency
  (SC-003 still expects zero FP across benign + hard-negatives), but only the
  hard-negative fp_rate feeds the gate decision.
- Precision now uses all-benign FPs; recall accounting unchanged.
- Guard: a corpus with zero hard-negatives fails the gate as vacuous (mirrors
  the zero-gated-malicious guard) rather than silently passing the FP side.
- New test TestGateFP_HardNegativeDenominatorOnly proves benign-corpus growth
  does not move the gated fp_rate (old code would dilute 1/3 -> 1/23).
- README documents the hard-negative denominator.

Committed corpus still passes: recall 1.0 (16/16 gated), fp_rate 0/9 hard-negs.
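
The dilution described in this commit can be reproduced with a small Go sketch. It assumes a simplified corpus entry with only a category and a flagged bit; `entry`, `sampleCorpus`, `fpRateAllBenign`, and `fpRateHardNegatives` are hypothetical names, not the functions in gate.go. The numbers follow the 1/3 -> 1/23 case named above.

```go
package main

import "fmt"

// entry is a deliberately reduced stand-in for a corpus entry.
type entry struct {
	Category string // "hard_negative", "benign", or a malicious category
	Flagged  bool   // true when the engine raised a finding
}

// ratio guards against a zero denominator.
func ratio(num, den int) float64 {
	if den == 0 {
		return 0
	}
	return float64(num) / float64(den)
}

// fpRateAllBenign is the old behaviour: every non-malicious entry
// (benign + hard_negative) lands in the denominator.
func fpRateAllBenign(es []entry) float64 {
	var fp, total int
	for _, e := range es {
		if e.Category == "benign" || e.Category == "hard_negative" {
			total++
			if e.Flagged {
				fp++
			}
		}
	}
	return ratio(fp, total)
}

// fpRateHardNegatives is the gated metric: hard negatives only.
func fpRateHardNegatives(es []entry) float64 {
	var fp, total int
	for _, e := range es {
		if e.Category == "hard_negative" {
			total++
			if e.Flagged {
				fp++
			}
		}
	}
	return ratio(fp, total)
}

// sampleCorpus returns 3 hard negatives (1 flagged) plus n clean benign entries.
func sampleCorpus(n int) []entry {
	es := []entry{
		{Category: "hard_negative", Flagged: true},
		{Category: "hard_negative", Flagged: false},
		{Category: "hard_negative", Flagged: false},
	}
	for i := 0; i < n; i++ {
		es = append(es, entry{Category: "benign", Flagged: false})
	}
	return es
}

func main() {
	grown := sampleCorpus(20)
	fmt.Printf("all-benign denominator:    %.3f\n", fpRateAllBenign(grown))     // 0.043
	fmt.Printf("hard-negative denominator: %.3f\n", fpRateHardNegatives(grown)) // 0.333
}
```

With 20 clean benign entries added, the old denominator yields 1/23 (about 0.043), which slips under a 0.05 threshold even though one in three hard negatives is misflagged; the hard-negative denominator stays at 1/3 and fails the gate.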

Related #MCP-3579

Co-Authored-By: Paperclip <noreply@paperclip.ing>

Dumbris commented Jun 28, 2026

Member Author

GeminiCritic (fallback) — ACCEPT

Head: f8cc0a4ec13a72d5ff78f924ca280e1542c2743f
Reviewer: GeminiCritic fallback (routed via MCP-3671 — primary CodexReviewer stalled)

Summary

Reviewed gate.go (312 lines), gate_test.go (229 lines), .github/workflows/eval.yml CI step, and detect_corpus_v1.json. No blocking issues found.

Findings

Correctness — OK

  • evaluateGateCorpus: gated/ungated split via gatedCategory() is correct. capability_mismatch maps to an unregistered check and is excluded from the gate decision — measured/reported but not enforced yet (correct for US2 landing later).
  • scanEntryFlagged: per-entry RegistryView isolation is correct; entries cannot cross-contaminate the shadowing check.
  • runGate: vacuous-corpus guards (zero gated malicious / zero hard negatives) prevent silent gate passes on a bad corpus — important correctness invariant.
  • ratio: division-by-zero guard correct; returns 0 not NaN.
  • FP gate scoped to hard-negatives only (SC-002) is the right design; diluting with generic benign entries would mask regressions as the corpus grows.
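
The interaction between the ratio guard and the vacuous-corpus guards is worth spelling out: because a zero denominator yields 0 rather than NaN, an empty hard-negative set would report an FP rate of 0 and pass a `<= 0.05` check unless it is rejected first. A minimal Go sketch of that reasoning follows; `ratio` matches the name cited above, while `checkNotVacuous` and its error messages are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// ratio returns num/den, or 0 when the denominator is zero, so callers
// never see NaN or +Inf in the emitted metrics.
func ratio(num, den int) float64 {
	if den == 0 {
		return 0
	}
	return float64(num) / float64(den)
}

// checkNotVacuous rejects corpora on which the gate could not fail.
// Hypothetical helper: the review only states that such guards exist.
func checkNotVacuous(gatedMalicious, hardNegatives int) error {
	if gatedMalicious == 0 {
		return errors.New("vacuous gate: no gated malicious entries")
	}
	if hardNegatives == 0 {
		return errors.New("vacuous gate: no hard-negative entries")
	}
	return nil
}

func main() {
	// Without the guard, an empty hard-negative set looks perfect.
	fmt.Println(ratio(0, 0) <= 0.05) // true

	// With the guard, the same corpus is rejected before thresholds apply.
	fmt.Println(checkNotVacuous(16, 0)) // vacuous gate: no hard-negative entries
	fmt.Println(checkNotVacuous(16, 9)) // <nil>
}
```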

Corpus quality — OK

  • Malicious set covers all three registered checks: unicode smuggling (ZWS, ZWJ, BIDI override, PUA, tag-block, multi-class), decoded payload (base64 curl/wget/chmod/rm-rf/hex/revshell), shadowing (name collision × 2, cross-server reference × 2).
  • Hard negatives are well-chosen: accented/CJK text (won't trip unicode.hidden), benign JSON blob (won't trip payload.decoded), plain-text curl … | sh in description (correct — check decodes base64 first, so plaintext is not flagged), generic same-name peer (won't trip shadowing.cross_server), same-server peer (correctly excluded by shadowing check design).
  • zeroWidthSpace constant defined numerically in test source — prevents the test file itself from containing hidden characters.
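
For readers unfamiliar with the zero-width-literal problem, this Go sketch shows one way to define the constant by code point so the source file contains only visible characters. The constant name comes from the review above; its exact form in the test file is an assumption, and `smuggle` and `hasZeroWidth` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// zeroWidthSpace is U+200B given by its code point, so no invisible
// character appears in the source file itself.
const zeroWidthSpace rune = 0x200B

// smuggle joins two strings with a hidden zero-width space between them.
func smuggle(a, b string) string {
	return a + string(zeroWidthSpace) + b
}

// hasZeroWidth reports whether s contains U+200B.
func hasZeroWidth(s string) bool {
	return strings.ContainsRune(s, zeroWidthSpace)
}

func main() {
	desc := smuggle("read", "file")

	// The two strings render alike but differ in byte length:
	// U+200B occupies three bytes in UTF-8.
	fmt.Println(len("readfile"), len(desc)) // 8 11

	// %+q escapes non-ASCII characters, making the hidden one visible.
	fmt.Printf("%+q\n", desc)
	fmt.Println(hasZeroWidth(desc), hasZeroWidth("readfile")) // true false
}
```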

Minor notes (non-blocking)

  • gateChecks() must be manually synced with the live scanner registrations. Acceptable risk; flagged in-code.
  • Precision counts false positives across all benign entries (benignFP: hard-negatives + plain benign), whereas FPRate counts them over hard-negatives only, so the two differ slightly. Semantically correct and documented in gateMetrics.

Verdict: ACCEPT

No correctness bugs, no security issues in the gate code, corpus adequate for thresholds (≥0.90 recall, ≤0.05 FP).

CodexReviewer re-review of #777: T018 (tasks.md:75) requires `scan-eval --gate`
to print per-category recall/precision/FP/F1, but categoryMetric only carried
recall (precision/FP/F1 existed only as overall metrics).

- categoryMetric now carries hard_negatives, false_positives, fp_rate,
  precision, and f1 per category, populated in the gate computation and JSON.
- Per-category FP is attributed via a new `resembles` field on hard_negative
  corpus entries (the attack class a benign mimics — the SC-003 framing): a
  flagged hard-negative lowers its resembled category's precision. Clean-benign
  entries carry no `resembles` and affect only the overall benign FP count.
- detect_corpus_v1.json: every hard_negative now declares `resembles`
  (consistent with its hn_<class> id); validator asserts it is set, names a
  gated category, and matches the id prefix.
- Extracted an f1() helper; overall F1 reuses it.
- Tests: TestGateMetrics_PerCategoryShapeAndFPAttribution proves the
  per-category JSON exposes recall/precision/FP/F1 and that a resembling
  hard-negative FP drops that category's precision (1 TP + 1 FP -> precision
  0.5); TestEvaluateGateCorpus asserts per-category recall/precision/f1 = 1.0.

Committed corpus: recall 1.0 (16/16 gated), fp_rate 0/9; every gated category
reports recall/precision/f1 = 1.0, FP 0.
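
The `resembles` attribution can be illustrated with a short Go sketch that reproduces the "1 TP + 1 FP -> precision 0.5" case from the test description. The `sample` and `counts` types and the `perCategory` function are hypothetical simplifications; only the `resembles` idea, the category names, and the `f1` helper name come from this commit.

```go
package main

import "fmt"

// sample is a reduced corpus entry. Resembles is set only on hard negatives
// and names the attack class the benign entry mimics.
type sample struct {
	Malicious bool
	Category  string
	Resembles string
	Flagged   bool
}

// counts holds per-category confusion counts.
type counts struct{ TP, FN, FP int }

func ratio(num, den int) float64 {
	if den == 0 {
		return 0
	}
	return float64(num) / float64(den)
}

// f1 is the harmonic mean of precision and recall, 0 when both are 0.
func f1(precision, recall float64) float64 {
	if precision+recall == 0 {
		return 0
	}
	return 2 * precision * recall / (precision + recall)
}

func (c counts) precision() float64 { return ratio(c.TP, c.TP+c.FP) }
func (c counts) recall() float64    { return ratio(c.TP, c.TP+c.FN) }

// perCategory charges a flagged hard negative to the category it resembles.
// Clean-benign entries (no Resembles) do not touch any category.
func perCategory(samples []sample) map[string]counts {
	out := map[string]counts{}
	for _, s := range samples {
		switch {
		case s.Malicious:
			c := out[s.Category]
			if s.Flagged {
				c.TP++
			} else {
				c.FN++
			}
			out[s.Category] = c
		case s.Resembles != "" && s.Flagged:
			c := out[s.Resembles]
			c.FP++
			out[s.Resembles] = c
		}
	}
	return out
}

func main() {
	m := perCategory([]sample{
		{Malicious: true, Category: "unicode_smuggling", Flagged: true},
		{Category: "hard_negative", Resembles: "unicode_smuggling", Flagged: true},
		{Category: "benign", Flagged: true}, // counted only in the overall benign FP total
	})
	c := m["unicode_smuggling"]
	fmt.Println(c.precision(), c.recall()) // 0.5 1
	fmt.Printf("%.3f\n", f1(c.precision(), c.recall())) // 0.667
}
```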

Related #MCP-3579

Co-Authored-By: Paperclip <noreply@paperclip.ing>

mcpproxy-gatekeeper Bot left a comment

Gatekeeper approval — Codex review verdict: ACCEPT.

This approval is posted automatically by the MCPProxy Gatekeeper App on behalf of the Codex reviewer (verdict of record lives in the Paperclip review thread). Author≠approver satisfied; QA + CI gates enforced separately.

Auto-approved per Model B (MCP-1249).

@Dumbris Dumbris merged commit 171249d into main Jun 28, 2026
48 checks passed

2 participants