rexcoleman/cycle12-agent-security-tooling


Agent Security Scanner Benchmark: Can Bandit, Semgrep, or MCP Scanners Detect Agent Vulnerabilities?

No agent security scanner achieves a Youden Index above 0.30 across OWASP Agentic AI categories. Scanner union provides zero complementarity: all three scanners combined detect exactly what Sigil alone detects (80% TPR). Tool poisoning (ASI01) and identity/privilege attacks (ASI03) have 0% detection by Cisco and MEDUSA at discriminating thresholds. AOQL (Average Outgoing Quality Limit) ranges 23x across scanners.

License: MIT Python 3.9+

Operating Characteristic Curves

Key Results

| Finding | Metric | Evidence |
|---|---|---|
| No scanner achieves adequate discrimination | Max Youden Index: 0.30 (Sigil+bandit) | 37 MCP test cases (25 vulnerable, 12 safe) |
| Scanner union = Sigil alone | Combined TPR = 80% = Sigil TPR | Cisco and MEDUSA detections are strict subsets |
| AOQL spans 23x across scanners | 0.04 (MEDUSA best) to 0.92 (Cisco) | Operating Characteristic curve analysis |
| Strong category-level specialization | ASI01/ASI03: 0% detection; ASI05: 100% | 5 OWASP Agentic AI categories |
| MEDUSA: starkest tradeoff | 96% TPR / 100% FPR → 16% TPR / 0% FPR | Score threshold sweep |
| Cisco MCP Scanner: lowest detection | 8% TPR (2/25) at all operating points | Only detects ASI05 code execution |
| Statistically significant differences | Fisher's exact p<0.001 (Bonferroni-corrected) | Sigil vs Cisco, Sigil vs MEDUSA |
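
The "score threshold sweep" in the table can be illustrated with a minimal sketch. It assumes a scanner emits one numeric risk score per server and flags a server when its score exceeds the threshold. The scores below are invented for illustration and are not the benchmark data, and `sweep_thresholds` is a hypothetical helper rather than a function from this repository. Only the thresholds 13 and 19 echo the operating points reported for Sigil+bandit.

```python
def sweep_thresholds(vuln_scores, safe_scores, thresholds):
    """Return (threshold, TPR, FPR, Youden) for each candidate threshold.

    A server counts as flagged when its score is strictly greater than
    the threshold (an assumption about how the scanners are thresholded).
    """
    points = []
    for t in thresholds:
        tpr = sum(s > t for s in vuln_scores) / len(vuln_scores)
        fpr = sum(s > t for s in safe_scores) / len(safe_scores)
        points.append((t, tpr, fpr, tpr - fpr))
    return points


# Hypothetical scores, NOT taken from the benchmark corpus.
vuln_scores = [5, 14, 15, 20, 22]   # servers with a known vulnerability
safe_scores = [3, 10, 14, 21]       # safe control servers

points = sweep_thresholds(vuln_scores, safe_scores, thresholds=[0, 13, 19])

# Pick the operating point that maximizes the Youden Index (TPR - FPR).
best = max(points, key=lambda p: p[3])
for t, tpr, fpr, j in points:
    print(f"threshold>{t}: TPR={tpr:.2f} FPR={fpr:.2f} Youden={j:.2f}")
```

Lowering the threshold raises both TPR and FPR together, which is the tradeoff the MEDUSA row describes at its two extremes.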

Best Operating Points by Scanner

| Scanner | Operating Point | TPR | FPR | Youden | TPR 95% CI |
|---|---|---|---|---|---|
| Cisco MCP Scanner | OP1 (static, all) | 0.08 | 0.00 | 0.08 | [0.01, 0.26] |
| MEDUSA | OP3 (high threshold) | 0.16 | 0.00 | 0.16 | [0.05, 0.36] |
| MEDUSA | OP1 (any finding) | 0.96 | 1.00 | -0.04 | [0.80, 1.00] |
| Sigil+bandit | OP1 (score >13) | 0.80 | 0.50 | 0.30 | [0.59, 0.93] |
| Sigil+bandit | OP2 (score >19) | 0.36 | 0.25 | 0.11 | [0.18, 0.57] |
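
The Youden column is TPR minus FPR, and the TPR interval can be approximated from the detection counts. The sketch below assumes the interval is an exact (Clopper-Pearson) binomial interval; the source does not name the method, so treat that as a guess that happens to be consistent with the reported values. The counts (20 of 25 detected, 6 of 12 safe servers flagged) are derived from the reported rates for Sigil+bandit OP1, and the function names are hypothetical.

```python
from math import comb


def youden_index(tp, fn, fp, tn):
    """Youden Index J = TPR - FPR from confusion-matrix counts."""
    return tp / (tp + fn) - fp / (fp + tn)


def _binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_increasing(g, iters=60):
    """Bisection on [0, 1] for a function g that increases through zero."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided binomial interval for x successes in n trials."""
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _root_increasing(
        lambda p: (1 - _binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _root_increasing(
        lambda p: alpha / 2 - _binom_cdf(x, n, p))
    return lower, upper


# Sigil+bandit OP1: counts derived from TPR=0.80 (n=25) and FPR=0.50 (n=12).
j = youden_index(tp=20, fn=5, fp=6, tn=6)
ci_low, ci_high = clopper_pearson(20, 25)
print(f"Youden={j:.2f}, TPR 95% CI=[{ci_low:.2f}, {ci_high:.2f}]")
```

With only 25 vulnerable cases the interval is wide, so differences of a few detections between operating points should be read with caution.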

The Finding

We benchmarked three agent security scanners: Cisco MCP Scanner (v4.6.0), MEDUSA (v2026.4.0), and Sigil (with bandit integration). The ground truth is a corpus of 37 MCP server test cases covering 5 OWASP Agentic AI Security categories. Traditional SAST tools (bandit, semgrep, CodeQL) were not designed for agent-specific vulnerabilities, and the MCP-specific scanners don't fill the gap.

The core result: the scanners don't complement each other. Adding Cisco and MEDUSA to Sigil adds zero detection coverage. Even the best scanner (Sigil) reaches only TPR=0.80 at FPR=0.50: to catch 80% of real vulnerabilities, it flags half of all safe servers as vulnerable.
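
Zero complementarity means the union of all detections equals the detections of the best single scanner. A minimal sketch of that check follows, using Python sets of detected test-case IDs. The IDs are hypothetical placeholders chosen only to match the reported counts (20, 4, and 2 of 25), and whether Cisco's detections also sit inside MEDUSA's is an assumption made here for illustration.

```python
N_VULNERABLE = 25  # vulnerable test cases in the corpus, per the README

# Hypothetical detected-case IDs; only the set sizes follow the README.
sigil = set(range(1, 21))   # 20 of 25 detected
medusa = {1, 2, 3, 4}       # 4 of 25 detected (high-threshold point)
cisco = {1, 2}              # 2 of 25 detected


def marginal_gain(base, *others):
    """Cases detected by the other scanners that the base scanner misses."""
    return set().union(*others) - base


union = sigil | medusa | cisco
gain = marginal_gain(sigil, medusa, cisco)

print(f"Union TPR: {len(union) / N_VULNERABLE:.2f}")
print(f"Cases added beyond Sigil: {len(gain)}")
```

An empty `gain` set is the set-theoretic statement of "strict subsets": running the extra scanners costs time without catching anything new.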

Category-level analysis reveals why: ASI01 (tool poisoning) and ASI03 (identity/privilege) have 0% detection by Cisco and MEDUSA at discriminating thresholds. These are arguably the most dangerous agentic attack categories, and no scanner reliably detects them.

Quick Start

git clone https://github.com/rexcoleman/cycle12-agent-security-tooling.git
cd cycle12-agent-security-tooling
pip install -r requirements.txt
bash reproduce.sh                    # full reproduction

Scanners Evaluated

| Scanner | Version | Type | Detection Approach |
|---|---|---|---|
| Cisco MCP Scanner | v4.6.0 | MCP-specific | Pattern matching on tool descriptions |
| MEDUSA | v2026.4.0 | MCP-specific | Static analysis + LLM-assisted scoring |
| Sigil + bandit | latest | General + Python | AST analysis + security linting |

Methodology

  • Test corpus: 37 MCP server implementations (25 vulnerable across 5 OWASP categories, 12 safe controls)
  • Analysis: Operating Characteristic curves, Youden Index optimization, AOQL computation
  • Statistical tests: Fisher's exact test with Bonferroni correction for pairwise comparisons
  • Framework: Manufacturing QA methodology (OC curves, AOQL) adapted for security scanner evaluation
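
The pairwise comparison step can be sketched with the standard library alone. This is an illustration under stated assumptions, not the repository's analysis code: it treats each scanner's detections as an independent sample of 25, uses the detection counts implied by the best-operating-point table (Sigil 20, MEDUSA 4, Cisco 2), and applies a Bonferroni factor of 3 for the three possible scanner pairs. The source does not state which operating points or how many comparisons entered the correction.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


def bonferroni(p, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_comparisons)


N = 25              # vulnerable test cases per scanner
N_COMPARISONS = 3   # assumed: all pairs among three scanners

# Rows are scanners; columns are (detected, missed).
p_sigil_cisco = fisher_exact_two_sided(20, N - 20, 2, N - 2)
p_sigil_medusa = fisher_exact_two_sided(20, N - 20, 4, N - 4)

adj_cisco = bonferroni(p_sigil_cisco, N_COMPARISONS)
adj_medusa = bonferroni(p_sigil_medusa, N_COMPARISONS)
print(f"Sigil vs Cisco:  adjusted p = {adj_cisco:.2e}")
print(f"Sigil vs MEDUSA: adjusted p = {adj_medusa:.2e}")
```

Because every scanner ran on the same 25 cases, a paired test such as McNemar's would also be a defensible choice; Fisher's exact test is shown here because it is the test the methodology names.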

Full methodology in EXPERIMENTAL_DESIGN.md. All results in FINDINGS.md.

Figures

| Combined OC | Detection Heatmap |
|---|---|
| Operating Characteristic curves for all scanners | Category-level detection heatmap |

Related Work

Citation

@software{coleman2026scanneroc,
  title = {Agent Security Scanner Operating Characteristics: A Manufacturing QA Framework for Comparative Evaluation},
  author = {Coleman, Rex},
  year = {2026},
  url = {https://github.com/rexcoleman/cycle12-agent-security-tooling},
  license = {MIT}
}

License

MIT. See LICENSE.