feat(security): Spec 076 US3 — detect-engine eval corpus + CI recall/FP gate (T017-T019) by Dumbris · Pull Request #777 · smart-mcp-proxy/mcpproxy-go

Dumbris · 2026-06-27T05:51:39Z

Spec 076 · US3 (T017–T019) — make detector reliability a blocking CI number

Closes the US3 increment of the deterministic offline tool-scanner (MCP-3579). Builds on the merged US1 engine (#769 T1 foundation, #770 hard checks). Branched off origin/main.

What this does

A CI step now fails the build if the offline detect.Engine regresses below 0.90 recall on malicious samples or above 0.05 false-positive rate on hard-negatives.

T017 — labeled detect corpus

specs/065-evaluation-foundation/datasets/detect_corpus_v1.json — a new corpus (the existing security_corpus_v1.json is immutable per CN-002 and is description-only, so it can't exercise the structural checks). 32 self-authored entries carrying the full ToolView the checks need (server, tool name/description/schema, cross-server peers so shadowing fires):

category	malicious	mapped check	gated today
`unicode_smuggling`	6	`unicode.hidden`	✅ US1
`decoded_payload`	6	`payload.decoded`	✅ US1
`shadowing`	4	`shadowing.cross_server`	✅ US1
`capability_mismatch`	2	`capability.mismatch`	⏳ US2 — reported, not enforced
hard_negative	9 (benign)	—	—
benign	5	—	—

Validated by detect_corpus_test.go (coherent label/category, redistributable provenance reusing the package allowlist, per-category coverage). README updated with the file + counts.

T018 — `scan-eval --gate`

scan-eval --corpus <detect_corpus> --gate --min-recall 0.90 --max-fp 0.05

Runs detect.Engine over the corpus, emits per-category recall/precision/FP/F1 JSON, exits non-zero on breach (exit 6). Forward-compatible: a category is only enforced when its check is registered — capability_mismatch begins gating automatically the moment the US2 check lands, with no corpus change.

T019 — CI wiring

Blocking step in the existing eval.yml security-d2 job (already offline + Go + triggered on internal/security/**, cmd/scan-eval/**, dataset paths). Pure-Go, no mcp-eval/Python; runs first so a detector regression fails fast.

Verification

TDD: gate_test.go (incl. a committed-corpus regression anchor) written before gate.go.
Committed corpus passes at recall 1.0 (16/16 gated), FP 0/14.
go test -race ./cmd/scan-eval/... ./specs/065-evaluation-foundation/datasets/... ./internal/security/... ✅
golangci-lint run --config .github/.golangci.yml ✅ (0 issues; ST1018 zero-width-literal gotcha handled via escape)
actionlint .github/workflows/eval.yml ✅

Notes for reviewers

T019 touches .github/workflows/eval.yml (one step). It's part of this assigned issue and mirrors precedent (ci: add bench.yml — publish benchmark dashboard on release tag (MCP-3133) #749 bench CI). Not a packaging/release change.
Docs for the six checks / two-tier model live in T022 (Polish); this PR documents the corpus + gate in the dataset README.

Related #MCP-3579

…FP gate (T017-T019) Make detector reliability a blocking CI number for the offline Spec-076 detect.Engine. T017 — new labeled corpus specs/065-evaluation-foundation/datasets/ detect_corpus_v1.json (32 self-authored entries) carrying the full ToolView fields the structural checks need (server, tool name/description/schema, cross-server peers). Categories map to detect checks: unicode_smuggling, decoded_payload, shadowing (US1, gated today) plus capability_mismatch (US2, reported but not yet gated) and attack-resembling hard-negatives. Validated by detect_corpus_test.go (coherent labels, redistributable provenance, per-category coverage). README documents the file + counts. T018 — `scan-eval --gate --min-recall --max-fp` runs detect.Engine over the corpus, prints per-category recall/precision/FP/F1 JSON, and exits non-zero on a breach. A category is only enforced when its check is registered, so future checks (capability.mismatch) begin gating automatically with no corpus change. T019 — blocking step in the eval.yml security-d2 job: `scan-eval --gate --min-recall 0.90 --max-fp 0.05` (pure Go, offline, runs first so a detector regression fails fast). TDD: gate_test.go (incl. a committed-corpus regression anchor) written first. Committed corpus passes at recall 1.0 (16/16 gated), FP 0/14. Related #MCP-3579

cloudflare-workers-and-pages · 2026-06-27T05:52:53Z

Deploying mcpproxy-docs with Cloudflare Pages

Latest commit:	`b1406a8`
Status:	✅ Deploy successful!
Preview URL:	https://40c37c69.mcpproxy-docs.pages.dev
Branch Preview URL:	https://076-t4-scan-eval-gate.mcpproxy-docs.pages.dev

View logs

codecov-commenter · 2026-06-27T05:56:34Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 87.80488% with 20 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
cmd/scan-eval/gate.go	89.03%	10 Missing and 7 partials ⚠️
cmd/scan-eval/main.go	66.66%	2 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

github-actions · 2026-06-27T05:59:14Z

📦 Build Artifacts

Workflow Run: View Run
Branch: 076-t4-scan-eval-gate

Available Artifacts

archive-darwin-amd64 (28 MB)
archive-darwin-arm64 (25 MB)
archive-linux-amd64 (16 MB)
archive-linux-arm64 (14 MB)
archive-windows-amd64 (28 MB)
archive-windows-arm64 (25 MB)
frontend-dist-pr (0 MB)
installer-dmg-darwin-amd64 (21 MB)
installer-dmg-darwin-arm64 (19 MB)

How to Download

Option 1: GitHub Web UI (easiest)

Go to the workflow run page linked above
Scroll to the bottom "Artifacts" section
Click on the artifact you want to download

Option 2: GitHub CLI

gh run download 28311451170 --repo smart-mcp-proxy/mcpproxy-go

Note: Artifacts expire in 14 days.

CodexReviewer re-review of #777: the gated false-positive rate computed its denominator over every non-malicious entry (benign + hard_negative). SC-002 (spec.md:48,52,114) requires the ≤5% FP threshold to be measured on the hard-negative set specifically — otherwise adding clean-benign corpus entries dilutes the rate and the gate can pass while hard-negatives regress. - fp_rate denominator = hard_negative entries only (the gated SC-002 metric). - Report benign_total / benign_false_positives separately for transparency (SC-003 still expects zero FP across benign + hard-negatives), but only the hard-negative fp_rate feeds the gate decision. - Precision now uses all-benign FPs; recall accounting unchanged. - Guard: a corpus with zero hard-negatives fails the gate as vacuous (mirrors the zero-gated-malicious guard) rather than silently passing the FP side. - New test TestGateFP_HardNegativeDenominatorOnly proves benign-corpus growth does not move the gated fp_rate (old code would dilute 1/3 -> 1/23). - README documents the hard-negative denominator. Committed corpus still passes: recall 1.0 (16/16 gated), fp_rate 0/9 hard-negs. Related #MCP-3579 Co-Authored-By: Paperclip <noreply@paperclip.ing>

Dumbris · 2026-06-28T04:32:49Z

GeminiCritic (fallback) — ACCEPT

Head: f8cc0a4ec13a72d5ff78f924ca280e1542c2743f
Reviewer: GeminiCritic fallback (routed via MCP-3671 — primary CodexReviewer stalled)

Summary

Reviewed gate.go (312 lines), gate_test.go (229 lines), .github/workflows/eval.yml CI step, and detect_corpus_v1.json. No blocking issues found.

Findings

Correctness — OK

evaluateGateCorpus: gated/ungated split via gatedCategory() is correct. capability_mismatch maps to an unregistered check and is excluded from the gate decision — measured/reported but not enforced yet (correct for US2 landing later).
scanEntryFlagged: per-entry RegistryView isolation is correct; entries cannot cross-contaminate the shadowing check.
runGate: vacuous-corpus guards (zero gated malicious / zero hard negatives) prevent silent gate passes on a bad corpus — important correctness invariant.
ratio: division-by-zero guard correct; returns 0 not NaN.
FP gate scoped to hard-negatives only (SC-002) is the right design; diluting with generic benign entries would mask regressions as the corpus grows.

Corpus quality — OK

Malicious set covers all three registered checks: unicode smuggling (ZWS, ZWJ, BIDI override, PUA, tag-block, multi-class), decoded payload (base64 curl/wget/chmod/rm-rf/hex/revshell), shadowing (name collision × 2, cross-server reference × 2).
Hard negatives are well-chosen: accented/CJK text (won't trip unicode.hidden), benign JSON blob (won't trip payload.decoded), plain-text curl … | sh in description (correct — check decodes base64 first, so plaintext is not flagged), generic same-name peer (won't trip shadowing.cross_server), same-server peer (correctly excluded by shadowing check design).
zeroWidthSpace constant defined numerically in test source — prevents the test file itself from containing hidden characters.

Minor notes (non-blocking)

gateChecks() must be manually synced with the live scanner registrations. Acceptable risk; flagged in-code.
Precision uses benignFP (hard-negatives + plain benign) as FP denominator, slightly differing from FPRate. Semantically correct and documented in gateMetrics.

Verdict: ACCEPT

No correctness bugs, no security issues in the gate code, corpus adequate for thresholds (≥0.90 recall, ≤0.05 FP).

CodexReviewer re-review of #777: T018 (tasks.md:75) requires `scan-eval --gate` to print per-category recall/precision/FP/F1, but categoryMetric only carried recall (precision/FP/F1 existed only as overall metrics). - categoryMetric now carries hard_negatives, false_positives, fp_rate, precision, and f1 per category, populated in the gate computation and JSON. - Per-category FP is attributed via a new `resembles` field on hard_negative corpus entries (the attack class a benign mimics — the SC-003 framing): a flagged hard-negative lowers its resembled category's precision. Clean-benign entries carry no `resembles` and affect only the overall benign FP count. - detect_corpus_v1.json: every hard_negative now declares `resembles` (consistent with its hn_<class> id); validator asserts it is set, names a gated category, and matches the id prefix. - Extracted an f1() helper; overall F1 reuses it. - Tests: TestGateMetrics_PerCategoryShapeAndFPAttribution proves the per-category JSON exposes recall/precision/FP/F1 and that a resembling hard-negative FP drops that category's precision (1 TP + 1 FP -> precision 0.5); TestEvaluateGateCorpus asserts per-category recall/precision/f1 = 1.0. Committed corpus: recall 1.0 (16/16 gated), fp_rate 0/9; every gated category reports recall/precision/f1 = 1.0, FP 0. Related #MCP-3579 Co-Authored-By: Paperclip <noreply@paperclip.ing>

mcpproxy-gatekeeper

✅ Gatekeeper approval — Codex review verdict: ACCEPT.

This approval is posted automatically by the MCPProxy Gatekeeper App on behalf of the Codex reviewer (verdict of record lives in the Paperclip review thread). Author≠approver satisfied; QA + CI gates enforced separately.

Auto-approved per Model B (MCP-1249).

mcpproxy-gatekeeper Bot approved these changes Jun 28, 2026

View reviewed changes

Dumbris merged commit 171249d into main Jun 28, 2026
48 checks passed

Dumbris mentioned this pull request Jun 28, 2026

docs(security): document deterministic tool-scanner detect engine (Spec 076 T022) #780

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(security): Spec 076 US3 — detect-engine eval corpus + CI recall/FP gate (T017-T019)#777

feat(security): Spec 076 US3 — detect-engine eval corpus + CI recall/FP gate (T017-T019)#777
Dumbris merged 3 commits into
mainfrom
076-t4-scan-eval-gate

Dumbris commented Jun 27, 2026

Uh oh!

cloudflare-workers-and-pages Bot commented Jun 27, 2026 •

edited

Loading

Uh oh!

codecov-commenter commented Jun 27, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 27, 2026 •

edited

Loading

Uh oh!

Dumbris commented Jun 28, 2026

Uh oh!

mcpproxy-gatekeeper Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Dumbris commented Jun 27, 2026

Spec 076 · US3 (T017–T019) — make detector reliability a blocking CI number

What this does

T017 — labeled detect corpus

T018 — scan-eval --gate

T019 — CI wiring

Verification

Notes for reviewers

Uh oh!

cloudflare-workers-and-pages Bot commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying mcpproxy-docs with Cloudflare Pages

Uh oh!

codecov-commenter commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

github-actions Bot commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📦 Build Artifacts

Available Artifacts

How to Download

Uh oh!

Dumbris commented Jun 28, 2026

GeminiCritic (fallback) — ACCEPT

Summary

Findings

Verdict: ACCEPT

Uh oh!

mcpproxy-gatekeeper Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

T018 — `scan-eval --gate`

cloudflare-workers-and-pages Bot commented Jun 27, 2026 •

edited

Loading

codecov-commenter commented Jun 27, 2026 •

edited

Loading

github-actions Bot commented Jun 27, 2026 •

edited

Loading