feat(security): Spec 076 US3 - detect-engine eval corpus + CI recall/FP gate (T017-T019) #777
…FP gate (T017-T019)

Make detector reliability a blocking CI number for the offline Spec-076 detect.Engine.

T017: new labeled corpus specs/065-evaluation-foundation/datasets/detect_corpus_v1.json (32 self-authored entries) carrying the full ToolView fields the structural checks need (server, tool name/description/schema, cross-server peers). Categories map to detect checks: unicode_smuggling, decoded_payload, shadowing (US1, gated today) plus capability_mismatch (US2, reported but not yet gated) and attack-resembling hard-negatives. Validated by detect_corpus_test.go (coherent labels, redistributable provenance, per-category coverage). README documents the file + counts.

T018: `scan-eval --gate --min-recall --max-fp` runs detect.Engine over the corpus, prints per-category recall/precision/FP/F1 JSON, and exits non-zero on a breach. A category is only enforced when its check is registered, so future checks (capability.mismatch) begin gating automatically with no corpus change.

T019: blocking step in the eval.yml security-d2 job: `scan-eval --gate --min-recall 0.90 --max-fp 0.05` (pure Go, offline, runs first so a detector regression fails fast).

TDD: gate_test.go (incl. a committed-corpus regression anchor) written first. Committed corpus passes at recall 1.0 (16/16 gated), FP 0/14.

Related #MCP-3579
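The gating rule described in this commit (thresholds on recall and FP, with a category enforced only once its check is registered) can be sketched roughly as follows. This is an illustration only, not the code in gate.go: the `sample` type, the `gate` function, and the exit code are hypothetical, and the FP rate here is taken over all non-malicious samples, which is an assumption based on the "FP 0/14" figure in this commit.

```go
package main

import (
	"fmt"
	"os"
)

// sample is a hypothetical, simplified corpus row. The real corpus carries
// full ToolView fields; only what the gate arithmetic needs is kept here.
type sample struct {
	Category  string // attack class, e.g. "shadowing"
	Malicious bool   // ground-truth label
	Flagged   bool   // detector verdict
}

// gate computes recall over malicious samples whose category has a
// registered check, and the FP rate over non-malicious samples.
// A corpus with no gated malicious samples fails as vacuous.
func gate(samples []sample, registered map[string]bool, minRecall, maxFP float64) (recall, fpRate float64, pass bool) {
	var gatedMal, tp, nonMal, fp int
	for _, s := range samples {
		switch {
		case s.Malicious && registered[s.Category]:
			gatedMal++
			if s.Flagged {
				tp++
			}
		case !s.Malicious:
			nonMal++
			if s.Flagged {
				fp++
			}
		}
		// Malicious samples of unregistered categories are skipped here:
		// they would be reported but not enforced.
	}
	if gatedMal == 0 {
		return 0, 0, false
	}
	recall = float64(tp) / float64(gatedMal)
	if nonMal > 0 {
		fpRate = float64(fp) / float64(nonMal)
	}
	return recall, fpRate, recall >= minRecall && fpRate <= maxFP
}

func main() {
	samples := []sample{
		{Category: "unicode_smuggling", Malicious: true, Flagged: true},
		{Category: "shadowing", Malicious: true, Flagged: true},
		// Missed, but its check is not registered yet, so it is not gated.
		{Category: "capability_mismatch", Malicious: true, Flagged: false},
		{Malicious: false, Flagged: false},
	}
	registered := map[string]bool{"unicode_smuggling": true, "shadowing": true}

	recall, fpRate, pass := gate(samples, registered, 0.90, 0.05)
	fmt.Printf("recall=%.2f fp_rate=%.2f pass=%v\n", recall, fpRate, pass)
	if !pass {
		os.Exit(1) // placeholder non-zero exit code
	}
}
```

Registering `capability_mismatch` in the `registered` map would pull the missed sample into the recall denominator without touching the corpus, which is the forward-compatibility property the commit message describes.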
Deploying mcpproxy-docs with
| Latest commit: | b1406a8 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://40c37c69.mcpproxy-docs.pages.dev |
| Branch Preview URL: | https://076-t4-scan-eval-gate.mcpproxy-docs.pages.dev |
Codecov Report
❌ Patch coverage is
📦 Build Artifacts
Workflow Run: View Run
Available Artifacts
How to Download
Option 1: GitHub Web UI (easiest)
Option 2: GitHub CLI: `gh run download 28311451170 --repo smart-mcp-proxy/mcpproxy-go`
CodexReviewer re-review of #777: the gated false-positive rate computed its denominator over every non-malicious entry (benign + hard_negative). SC-002 (spec.md:48,52,114) requires the ≤5% FP threshold to be measured on the hard-negative set specifically; otherwise adding clean-benign corpus entries dilutes the rate and the gate can pass while hard-negatives regress.

- fp_rate denominator = hard_negative entries only (the gated SC-002 metric).
- Report benign_total / benign_false_positives separately for transparency (SC-003 still expects zero FP across benign + hard-negatives), but only the hard-negative fp_rate feeds the gate decision.
- Precision now uses all-benign FPs; recall accounting unchanged.
- Guard: a corpus with zero hard-negatives fails the gate as vacuous (mirrors the zero-gated-malicious guard) rather than silently passing the FP side.
- New test TestGateFP_HardNegativeDenominatorOnly proves benign-corpus growth does not move the gated fp_rate (old code would dilute 1/3 -> 1/23).
- README documents the hard-negative denominator.

Committed corpus still passes: recall 1.0 (16/16 gated), fp_rate 0/9 hard-negs.

Related #MCP-3579

Co-Authored-By: Paperclip <noreply@paperclip.ing>
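The dilution this commit fixes is easy to reproduce with a few lines of arithmetic. The sketch below is not the project's gate code: `entry` and `fpRates` are hypothetical names, and the label strings are assumed from the commit text. It takes one flagged hard-negative out of three, adds twenty clean benign entries, and compares the gated rate with the old all-non-malicious rate.

```go
package main

import (
	"errors"
	"fmt"
)

// entry is a hypothetical, minimal non-malicious corpus row.
type entry struct {
	Label   string // "hard_negative" or "benign" (assumed label values)
	Flagged bool   // detector verdict; true means a false positive here
}

// fpRates returns the gated rate (hard-negative denominator only) and the
// old diluted rate (denominator over every non-malicious entry).
// A corpus without hard negatives is rejected as vacuous.
func fpRates(entries []entry) (gated, diluted float64, err error) {
	var hardNeg, hardNegFP, benign, benignFP int
	for _, e := range entries {
		switch e.Label {
		case "hard_negative":
			hardNeg++
			if e.Flagged {
				hardNegFP++
			}
		case "benign":
			benign++
			if e.Flagged {
				benignFP++
			}
		}
	}
	if hardNeg == 0 {
		return 0, 0, errors.New("no hard-negative entries: FP gate would be vacuous")
	}
	gated = float64(hardNegFP) / float64(hardNeg)
	diluted = float64(hardNegFP+benignFP) / float64(hardNeg+benign)
	return gated, diluted, nil
}

func main() {
	entries := []entry{
		{Label: "hard_negative", Flagged: true},
		{Label: "hard_negative", Flagged: false},
		{Label: "hard_negative", Flagged: false},
	}
	// Growing the clean-benign set must not move the gated rate.
	for i := 0; i < 20; i++ {
		entries = append(entries, entry{Label: "benign", Flagged: false})
	}
	gated, diluted, err := fpRates(entries)
	if err != nil {
		fmt.Println("gate error:", err)
		return
	}
	fmt.Printf("gated fp_rate=%.4f (1/3), diluted=%.4f (1/23)\n", gated, diluted)
}
```

With a 0.05 threshold the diluted rate (about 0.043) would pass while the hard-negative rate (about 0.333) clearly breaches, which is why the denominator matters for the gate decision.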
GeminiCritic (fallback): ACCEPT
Head:
Summary
Reviewed
Findings
Correctness: OK
Corpus quality: OK
Minor notes (non-blocking)
Verdict: ACCEPT
No correctness bugs, no security issues in the gate code, corpus adequate for thresholds (≥0.90 recall, ≤0.05 FP).
CodexReviewer re-review of #777: T018 (tasks.md:75) requires `scan-eval --gate` to print per-category recall/precision/FP/F1, but categoryMetric only carried recall (precision/FP/F1 existed only as overall metrics).

- categoryMetric now carries hard_negatives, false_positives, fp_rate, precision, and f1 per category, populated in the gate computation and JSON.
- Per-category FP is attributed via a new `resembles` field on hard_negative corpus entries (the attack class a benign mimics, per the SC-003 framing): a flagged hard-negative lowers its resembled category's precision. Clean-benign entries carry no `resembles` and affect only the overall benign FP count.
- detect_corpus_v1.json: every hard_negative now declares `resembles` (consistent with its hn_<class> id); validator asserts it is set, names a gated category, and matches the id prefix.
- Extracted an f1() helper; overall F1 reuses it.
- Tests: TestGateMetrics_PerCategoryShapeAndFPAttribution proves the per-category JSON exposes recall/precision/FP/F1 and that a resembling hard-negative FP drops that category's precision (1 TP + 1 FP -> precision 0.5); TestEvaluateGateCorpus asserts per-category recall/precision/f1 = 1.0.

Committed corpus: recall 1.0 (16/16 gated), fp_rate 0/9; every gated category reports recall/precision/f1 = 1.0, FP 0.

Related #MCP-3579

Co-Authored-By: Paperclip <noreply@paperclip.ing>
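The `resembles`-based attribution can be illustrated with a small sketch. Only the names `categoryMetric`, `f1`, `resembles`, and the JSON keys hard_negatives, false_positives, fp_rate, precision, and f1 come from the commit message. The struct layout, the remaining fields, the `entry` type, and the `perCategory` and `ratio` helpers are assumptions made for the example, as is the convention that a ratio with a zero denominator is reported as 0.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// entry is a hypothetical, simplified corpus row.
type entry struct {
	Label     string // "malicious", "hard_negative", or "benign"
	Category  string // attack class, set on malicious entries
	Resembles string // attack class a hard negative mimics
	Flagged   bool   // detector verdict
}

// categoryMetric is an assumed shape for the per-category report.
type categoryMetric struct {
	Malicious      int     `json:"malicious"`
	TruePositives  int     `json:"true_positives"`
	HardNegatives  int     `json:"hard_negatives"`
	FalsePositives int     `json:"false_positives"`
	Recall         float64 `json:"recall"`
	FPRate         float64 `json:"fp_rate"`
	Precision      float64 `json:"precision"`
	F1             float64 `json:"f1"`
}

// ratio returns num/den, or 0 when den is 0 (an assumption for this sketch).
func ratio(num, den int) float64 {
	if den == 0 {
		return 0
	}
	return float64(num) / float64(den)
}

// f1 is the harmonic mean of precision and recall.
func f1(precision, recall float64) float64 {
	if precision+recall == 0 {
		return 0
	}
	return 2 * precision * recall / (precision + recall)
}

// perCategory attributes each flagged hard negative to the category it
// resembles; clean-benign entries carry no Resembles and are ignored here.
func perCategory(entries []entry) map[string]categoryMetric {
	out := map[string]categoryMetric{}
	for _, e := range entries {
		switch e.Label {
		case "malicious":
			m := out[e.Category]
			m.Malicious++
			if e.Flagged {
				m.TruePositives++
			}
			out[e.Category] = m
		case "hard_negative":
			m := out[e.Resembles]
			m.HardNegatives++
			if e.Flagged {
				m.FalsePositives++
			}
			out[e.Resembles] = m
		}
	}
	for name, m := range out {
		m.Recall = ratio(m.TruePositives, m.Malicious)
		m.FPRate = ratio(m.FalsePositives, m.HardNegatives)
		m.Precision = ratio(m.TruePositives, m.TruePositives+m.FalsePositives)
		m.F1 = f1(m.Precision, m.Recall)
		out[name] = m
	}
	return out
}

func main() {
	entries := []entry{
		{Label: "malicious", Category: "shadowing", Flagged: true},
		{Label: "hard_negative", Resembles: "shadowing", Flagged: true},
		{Label: "benign", Flagged: false},
	}
	report, err := json.MarshalIndent(perCategory(entries), "", "  ")
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(string(report))
}
```

In this input the `shadowing` category has one true positive and one attributed false positive, so its precision is 0.5 while its recall stays at 1.0, matching the case the commit's test description names.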
✅ Gatekeeper approval — Codex review verdict: ACCEPT.
This approval is posted automatically by the MCPProxy Gatekeeper App on behalf of the Codex reviewer (verdict of record lives in the Paperclip review thread). Author≠approver satisfied; QA + CI gates enforced separately.
Auto-approved per Model B (MCP-1249).
Spec 076 · US3 (T017–T019) — make detector reliability a blocking CI number
Closes the US3 increment of the deterministic offline tool-scanner (MCP-3579). Builds on the merged US1 engine (#769 T1 foundation, #770 hard checks). Branched off `origin/main`.

What this does
A CI step now fails the build if the offline `detect.Engine` regresses below 0.90 recall on malicious samples or above 0.05 false-positive rate on hard-negatives.

T017: labeled detect corpus
`specs/065-evaluation-foundation/datasets/detect_corpus_v1.json`: a new corpus (the existing `security_corpus_v1.json` is immutable per CN-002 and is description-only, so it can't exercise the structural checks). 32 self-authored entries carrying the full `ToolView` the checks need (server, tool name/description/schema, cross-server `peers` so `shadowing` fires):

| Category | Check |
| unicode_smuggling | unicode.hidden |
| decoded_payload | payload.decoded |
| shadowing | shadowing.cross_server |
| capability_mismatch | capability.mismatch |

Validated by `detect_corpus_test.go` (coherent label/category, redistributable provenance reusing the package allowlist, per-category coverage). README updated with the file + counts.

T018: `scan-eval --gate`

Runs `detect.Engine` over the corpus, emits per-category recall/precision/FP/F1 JSON, exits non-zero on breach (exit 6). Forward-compatible: a category is only enforced when its check is registered, so `capability_mismatch` begins gating automatically the moment the US2 check lands, with no corpus change.

T019: CI wiring
Blocking step in the existing `eval.yml` `security-d2` job (already offline + Go + triggered on `internal/security/**`, `cmd/scan-eval/**`, dataset paths). Pure-Go, no mcp-eval/Python; runs first so a detector regression fails fast.

Verification
- `gate_test.go` (incl. a committed-corpus regression anchor) written before `gate.go`.
- `go test -race ./cmd/scan-eval/... ./specs/065-evaluation-foundation/datasets/... ./internal/security/...` ✅
- `golangci-lint run --config .github/.golangci.yml` ✅ (0 issues; ST1018 zero-width-literal gotcha handled via escape)
- `actionlint .github/workflows/eval.yml` ✅

Notes for reviewers
- `.github/workflows/eval.yml` (one step). It's part of this assigned issue and mirrors precedent (#749 bench CI, "ci: add bench.yml - publish benchmark dashboard on release tag (MCP-3133)"). Not a packaging/release change.

Related #MCP-3579