Agent Security Scanner Benchmark: Can Bandit, Semgrep, or MCP Scanners Detect Agent Vulnerabilities?
No agent security scanner achieves a Youden Index above 0.30 across OWASP Agentic AI categories. Combining scanners adds no coverage: all three together detect exactly what Sigil alone detects (80% of vulnerable cases). Tool poisoning (ASI01) and identity/privilege attacks (ASI03) get 0% detection from Cisco and MEDUSA at discriminating thresholds. AOQL varies 23x across scanners (0.04 to 0.92).
| Finding | Metric | Evidence |
|---|---|---|
| No scanner achieves adequate discrimination | Max Youden Index: 0.30 (Sigil+bandit) | 37 MCP test cases (25 vulnerable, 12 safe) |
| Scanner union = Sigil alone | Combined TPR = 80% = Sigil TPR | Cisco and MEDUSA detections are strict subsets of Sigil's detections |
| AOQL spans 23x across scanners | 0.04 (MEDUSA best) to 0.92 (Cisco) | Operating Characteristic curve analysis |
| Strong category-level specialization | ASI01/ASI03: 0% detection; ASI05: 100% | 5 OWASP Agentic AI categories |
| MEDUSA: starkest tradeoff | 96% TPR / 100% FPR → 16% TPR / 0% FPR | Score threshold sweep |
| Cisco MCP Scanner: lowest detection | 8% TPR (2/25) at all operating points | Only detects ASI05 code execution |
| Statistically significant differences | Fisher's exact p<0.001 (Bonferroni-corrected) | Sigil vs Cisco, Sigil vs MEDUSA |

Operating points by scanner:

| Scanner | Operating Point | TPR | FPR | Youden | TPR 95% CI |
|---|---|---|---|---|---|
| Cisco MCP Scanner | OP1 (static, all) | 0.08 | 0.00 | 0.08 | [0.01, 0.26] |
| MEDUSA | OP3 (high threshold) | 0.16 | 0.00 | 0.16 | [0.05, 0.36] |
| MEDUSA | OP1 (any finding) | 0.96 | 1.00 | -0.04 | [0.80, 1.00] |
| Sigil+bandit | OP1 (score >13) | 0.80 | 0.50 | 0.30 | [0.59, 0.93] |
| Sigil+bandit | OP2 (score >19) | 0.36 | 0.25 | 0.11 | [0.18, 0.57] |
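The Youden column is TPR minus FPR, and the TPR intervals are consistent with exact (Clopper-Pearson) binomial intervals on the 25 vulnerable cases. The sketch below recomputes both for the Sigil+bandit OP1 row using only the standard library. The interval method is an assumption (the table does not name it), the counts 20/25 and 6/12 are back-calculated from the reported rates, and the helper names are illustrative rather than taken from the repository.

```python
from math import comb


def youden(tpr: float, fpr: float) -> float:
    """Youden Index J = TPR - FPR (sensitivity + specificity - 1)."""
    return tpr - fpr


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided binomial CI for k successes in n trials (assumed method)."""

    def solve(g, target: float) -> float:
        # g must be increasing in p on [0, 1]; bisect for g(p) == target.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if g(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else solve(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2
    upper = 1.0 if k == n else solve(lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


# Sigil+bandit at OP1: counts inferred from TPR = 0.80 (n = 25) and FPR = 0.50 (n = 12).
tp, n_vuln = 20, 25
fp, n_safe = 6, 12

j = youden(tp / n_vuln, fp / n_safe)
lo, hi = clopper_pearson(tp, n_vuln)
print(f"Youden = {j:.2f}, TPR 95% CI = [{lo:.2f}, {hi:.2f}]")
```

With only 25 vulnerable cases the intervals are wide, so differences of a few detections between operating points should be read with that uncertainty in mind.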
We benchmarked three agent security scanners — Cisco MCP Scanner (v4.6.0), MEDUSA (v2026.4.0), and Sigil (with bandit integration) — against a ground-truth corpus of 37 MCP server test cases covering 5 OWASP Agentic AI Security categories. Traditional SAST tools (bandit, semgrep, CodeQL) were not designed for agent-specific vulnerabilities, and the MCP-specific scanners don't fill the gap.
The core result: the scanners don't complement each other. Adding Cisco and MEDUSA to Sigil adds zero detection coverage. Even the best scanner (Sigil) reaches only TPR=0.80 at FPR=0.50: to catch 80% of the vulnerable servers (20 of 25), it flags half of the safe ones (6 of 12).
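Complementarity can be checked with plain set operations on per-scanner detections. The sketch below is illustrative only: the case IDs are hypothetical placeholders, and only the set sizes (20, 2, and 4 of 25) follow from the reported TPRs, taking MEDUSA at its high-threshold operating point. The real per-case results are in the repository's findings, not here.

```python
# Hypothetical case IDs; only the set sizes come from the reported TPRs.
vulnerable = {f"vuln-{i:02d}" for i in range(1, 26)}   # 25 vulnerable cases
sigil = {f"vuln-{i:02d}" for i in range(1, 21)}        # 20 detections (TPR 0.80)
cisco = {"vuln-01", "vuln-02"}                          # 2 detections (TPR 0.08)
medusa = {"vuln-01", "vuln-02", "vuln-03", "vuln-04"}   # 4 detections (TPR 0.16)


def tpr(detected: set[str], positives: set[str]) -> float:
    """Fraction of truly vulnerable cases that were flagged."""
    return len(detected & positives) / len(positives)


union = sigil | cisco | medusa
marginal_gain = union - sigil  # cases that only Cisco or MEDUSA catch

print(f"Sigil TPR: {tpr(sigil, vulnerable):.2f}")
print(f"Union TPR: {tpr(union, vulnerable):.2f}")
print(f"Cases added by the other scanners: {len(marginal_gain)}")
print(f"Cisco is a strict subset of Sigil: {cisco < sigil}")
print(f"MEDUSA is a strict subset of Sigil: {medusa < sigil}")
```

An empty `marginal_gain` is what "zero complementarity" means operationally: running the extra scanners costs time and adds alerts but surfaces no vulnerable case that Sigil missed.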
Category-level analysis reveals why: ASI01 (tool poisoning) and ASI03 (identity/privilege) have 0% detection by Cisco and MEDUSA at discriminating thresholds. These are arguably the most dangerous agentic attack categories, and no scanner reliably detects them.
To reproduce the benchmark:

    git clone https://github.com/rexcoleman/cycle12-agent-security-tooling.git
    cd cycle12-agent-security-tooling
    pip install -r requirements.txt
    bash reproduce.sh  # full reproduction

| Scanner | Version | Type | Detection Approach |
|---|---|---|---|
| Cisco MCP Scanner | v4.6.0 | MCP-specific | Pattern matching on tool descriptions |
| MEDUSA | v2026.4.0 | MCP-specific | Static analysis + LLM-assisted scoring |
| Sigil + bandit | latest | General + Python | AST analysis + security linting |
- Test corpus: 37 MCP server implementations (25 vulnerable across 5 OWASP categories, 12 safe controls)
- Analysis: Operating Characteristic curves, Youden Index optimization, AOQL computation
- Statistical tests: Fisher's exact test with Bonferroni correction for pairwise comparisons
- Framework: Manufacturing QA methodology (OC curves, AOQL) adapted for security scanner evaluation
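As a rough illustration of the pairwise tests listed above, the following standard-library sketch runs a two-sided Fisher's exact test on detected/missed counts over the 25 vulnerable cases and applies a Bonferroni correction. The counts are back-calculated from the reported TPRs, the choice of three pairwise comparisons is an assumption, and the function names are hypothetical; the repository's own analysis may build its contingency tables differently.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one.
    return sum(
        prob(x) for x in range(x_min, x_max + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


N_VULN = 25
N_TESTS = 3  # assumption: all three scanner pairs are compared
detected = {"Sigil": 20, "Cisco": 2, "MEDUSA": 4}  # inferred from reported TPRs

adjusted = {}
for other in ("Cisco", "MEDUSA"):
    s, o = detected["Sigil"], detected[other]
    p = fisher_exact_two_sided(s, N_VULN - s, o, N_VULN - o)
    adjusted[other] = bonferroni(p, N_TESTS)
    print(f"Sigil vs {other}: adjusted p = {adjusted[other]:.2e}")
```

Fisher's exact test suits this corpus because the cell counts are small (for example, 2 detections for Cisco), where a chi-squared approximation would be unreliable.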
Full methodology in EXPERIMENTAL_DESIGN.md. All results in FINDINGS.md.
Figures: Operating Characteristic curves for all scanners; category-level detection heatmap.
- Blog post: Can Bandit or Semgrep Detect Agent Vulnerabilities? — Accessible summary of this research
- agent-skill-scanner — PyPI-installable agent security scanner (SE-157)
- agent-skill-scan-action — GitHub Action for agent security (SE-158)
- agent-skill-scan-mcp — MCP server for agent security checks (SE-159)
- controllability-bound — Defense difficulty decomposition framework
To cite this work:

    @software{coleman2026scanneroc,
      title = {Agent Security Scanner Operating Characteristics: A Manufacturing QA Framework for Comparative Evaluation},
      author = {Coleman, Rex},
      year = {2026},
      url = {https://github.com/rexcoleman/cycle12-agent-security-tooling},
      license = {MIT}
    }

MIT. See LICENSE.

