Tooling for security analysis of open-source AI agent frameworks.
Two parts:
- analyzer/ - a scan pipeline built around Semgrep, Trivy, Grype, Gitleaks, and a set of custom Semgrep rules aimed at agent-specific patterns (agent-security.yml).
- validator/ - a coordinated-disclosure workflow: scope-read against the project's SECURITY.md, PoC construction, CVSS scoring, private report and email templates.
The two are meant to be used together. The analyzer surfaces candidate findings. The validator turns a candidate into a defensible report or drops it.
Generic SAST catches the obvious subprocess.run(..., shell=True) in a CLI. It misses the class of bugs that only make sense in agent frameworks: PythonREPLTool handed an LLM output stream, yaml.load on a downloaded tool spec, pickle-based checkpoint files, eval() wrapping a "plan" the model just emitted.
rules/agent-security.yml is a small, hand-written ruleset targeting those patterns. It is not a replacement for p/security-audit; it runs alongside it.
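To make the deserialization class concrete, here is a minimal and deliberately harmless sketch of why pickle-based checkpoint loading gets flagged. The `Payload` class and the `6 * 7` expression are illustrative assumptions, not code from any framework; a real payload would return something like `os.system` with an attacker's command.

```python
import json
import pickle


class Payload:
    """Hypothetical attacker-controlled object stored in a 'checkpoint'."""

    def __reduce__(self):
        # pickle calls the returned callable with these args at load time.
        # A harmless expression stands in for an attacker's command.
        return (eval, ("6 * 7",))


# The "checkpoint" an agent might download or read from shared storage.
blob = pickle.dumps(Payload())

# Loading runs eval() before the caller ever inspects the result.
loaded = pickle.loads(blob)
print(loaded)

# A data-only format such as JSON does not execute code on load.
safe_state = json.loads('{"step": 3, "plan": "summarize"}')
print(safe_state["step"])
```

The point of the sketch is that the caller of `pickle.loads` never opts in to execution; the byte stream decides what runs. That is why a generic "pickle is used" finding matters more when the file comes from a model, a tool, or a remote store.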
analyzer/
    scripts/
        install-tools.sh        # semgrep, trivy, grype, gitleaks, snyk
        repo-inventory.py       # language, dep-manager, IaC, LOC inventory
        run-scanners.py         # orchestrates trivy + grype + gitleaks + semgrep
    rules/
        agent-security.yml      # 15 custom rules, see docs/rules.md
    references/
        real-risk-scoring.md    # contextual risk model
        stride-model.md         # threat model reference
    templates/
        report-template.md
        pr-template.md
validator/
    CVSS_GUIDE.md
    SCOPE_MAPPING.md
    templates/
        REPORT_TEMPLATE.md
        EMAIL_TEMPLATE.md
docs/
    patterns/                   # pattern write-ups (no unfixed-bug attribution)
Install the scanner toolchain:
bash analyzer/scripts/install-tools.sh
Inventory a repo:
python3 analyzer/scripts/repo-inventory.py /path/to/repo --output inventory.json
Run the scan pipeline against a cloned repo:
python3 analyzer/scripts/run-scanners.py \
    --repo /path/to/repo \
    --output ./scan-results \
    --depth standard
Depth options:
- quick - Trivy SCA + Gitleaks only
- standard - full scanner set with p/default, p/security-audit, and the local agent-security.yml rules
- deep - standard plus p/owasp-top-ten and p/cwe-top-25
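As a rough illustration of how the depth levels relate to Semgrep configs, the mapping could be sketched as below. This is not the code in run-scanners.py; the names `SEMGREP_CONFIGS` and `semgrep_command` are hypothetical, and only the `semgrep scan --config ...` invocation style is taken from this README.

```python
from typing import Dict, List

# Path to the local ruleset, as used elsewhere in this README.
LOCAL_RULES = "analyzer/rules/agent-security.yml"

# Hypothetical mapping from depth to Semgrep configs.
# "quick" runs Trivy SCA + Gitleaks only, so it has no Semgrep configs.
SEMGREP_CONFIGS: Dict[str, List[str]] = {
    "quick": [],
    "standard": ["p/default", "p/security-audit", LOCAL_RULES],
}
SEMGREP_CONFIGS["deep"] = SEMGREP_CONFIGS["standard"] + [
    "p/owasp-top-ten",
    "p/cwe-top-25",
]


def semgrep_command(depth: str, repo: str) -> List[str]:
    """Build the Semgrep argv for a depth, or [] if Semgrep is skipped."""
    if depth not in SEMGREP_CONFIGS:
        raise ValueError(f"unknown depth: {depth!r}")
    configs = SEMGREP_CONFIGS[depth]
    if not configs:
        return []
    cmd = ["semgrep", "scan"]
    for config in configs:
        cmd += ["--config", config]
    cmd.append(repo)
    return cmd


print(" ".join(semgrep_command("deep", "/path/to/repo")))
```

Building the command as a list rather than a shell string keeps the orchestrator itself clear of the shell-injection pattern the ruleset flags.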
Run only the custom rules against a target:
semgrep scan --config analyzer/rules/agent-security.yml /path/to/repo
| Rule ID | Class | Severity |
|---|---|---|
| agent-pickle-load-untrusted | Deserialization (CWE-502) | ERROR |
| agent-dill-load-untrusted | Deserialization (CWE-502) | ERROR |
| agent-yaml-unsafe-load | Deserialization (CWE-502) | ERROR |
| agent-joblib-load-untrusted | Deserialization (CWE-502) | WARNING |
| agent-marshal-loads | Deserialization (CWE-502) | ERROR |
| agent-subprocess-shell-true | Command injection (CWE-78) | ERROR |
| agent-os-system | Command injection (CWE-78) | ERROR |
| agent-os-popen | Command injection (CWE-78) | ERROR |
| agent-shell-tool-langchain | Prompt-to-shell (CWE-77) | ERROR |
| agent-eval-dynamic-input | Dynamic code eval (CWE-95) | ERROR |
| agent-exec-dynamic-input | Dynamic code eval (CWE-95) | ERROR |
| agent-compile-then-exec | Dynamic code eval (CWE-95) | WARNING |
| agent-python-repl-unsandboxed | Prompt-to-RCE (CWE-94) | ERROR |
| agent-http-fetch-no-scheme-allowlist | SSRF (CWE-918) | WARNING |
| agent-secret-in-serialization | Credential exposure (CWE-522) | WARNING |
Full rationale for each rule is in the ruleset file itself.
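When triaging output from the custom rules, it can help to group findings by severity before reading them. The sketch below assumes Semgrep's JSON output shape (a top-level `results` list with `check_id`, `path`, `start.line`, and `extra.severity`); the sample findings and file paths are invented for illustration, and real `check_id` values are usually prefixed with the rule file's path.

```python
import json
from collections import defaultdict
from typing import Dict, List, Tuple

# Invented sample in the shape of Semgrep's JSON output.
SAMPLE = json.loads("""
{
  "results": [
    {"check_id": "agent-pickle-load-untrusted", "path": "agent/checkpoint.py",
     "start": {"line": 41}, "extra": {"severity": "ERROR"}},
    {"check_id": "agent-joblib-load-untrusted", "path": "agent/cache.py",
     "start": {"line": 12}, "extra": {"severity": "WARNING"}},
    {"check_id": "agent-python-repl-unsandboxed", "path": "agent/tools.py",
     "start": {"line": 88}, "extra": {"severity": "ERROR"}}
  ]
}
""")

Finding = Tuple[str, str, int]


def group_by_severity(report: dict) -> Dict[str, List[Finding]]:
    """Return {severity: [(rule id, path, line), ...]} for a Semgrep report."""
    groups: Dict[str, List[Finding]] = defaultdict(list)
    for result in report.get("results", []):
        severity = result.get("extra", {}).get("severity", "UNKNOWN")
        groups[severity].append(
            (result["check_id"], result["path"], result["start"]["line"])
        )
    return dict(groups)


groups = group_by_severity(SAMPLE)
for severity in ("ERROR", "WARNING"):
    for rule_id, path, line in groups.get(severity, []):
        print(f"{severity:8} {rule_id} {path}:{line}")
```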
The validator directory is deliberately opinionated:
- Read the project's published policy first. Quote the in-scope and out-of-scope clauses.
- Build a PoC against the current release. Pin the version.
- Score with CVSS 3.1. Do not inflate.
- Send the report to the address in SECURITY.md. Not to a bug bounty platform unless the project runs one.
- One bug per report.
- No public discussion (issues, PRs, posts) until the maintainer has responded or the embargo expires.
The templates in validator/templates/ follow that flow.
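For the scoring step, a small calculator makes it harder to inflate a score by eye. The following is a sketch of the CVSS 3.1 base score equations as published in the FIRST specification; it is not part of validator/, and the function names are hypothetical. It handles base metrics only (no temporal or environmental metrics).

```python
import math

WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required weights depend on whether Scope is changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal, using the integer method from the 3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a 'CVSS:3.1/AV:.../A:...' vector string."""
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    m = dict(part.split(":") for part in parts[1:])
    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]

    # Impact sub-score.
    iss = 1 - (
        (1 - WEIGHTS["C"][m["C"]])
        * (1 - WEIGHTS["I"][m["I"]])
        * (1 - WEIGHTS["A"][m["A"]])
    )
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    if impact <= 0:
        return 0.0

    # Exploitability sub-score.
    exploitability = (
        8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]]
        * pr * WEIGHTS["UI"][m["UI"]]
    )
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


# A deserialization bug that needs a victim to open a local file.
print(base_score("CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"))  # 7.8
```

Scoring the same finding once with the attack vector you can actually demonstrate in the PoC, and once with the one you merely suspect, is a quick way to see how much of the score rests on unproven assumptions.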
Rule submissions welcome. Requirements:
- Every rule must include a cwe metadata field.
- Every rule must be paired with a positive and negative test case (add to tests/, coming in a future revision).
- No severity: INFO. If it is not worth a WARNING or an ERROR, it is not worth a rule.
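The first and third requirements are mechanical enough to check before opening a PR. The sketch below is a hypothetical helper, not something shipped in this repo: it takes a rule already parsed into a dict (for example by a YAML loader) and assumes the usual Semgrep rule layout with `id`, `severity`, and `metadata.cwe` keys.

```python
from typing import Dict, List

ALLOWED_SEVERITIES = {"WARNING", "ERROR"}


def check_rule(rule: Dict) -> List[str]:
    """Return a list of problems for one parsed Semgrep rule (empty if OK)."""
    problems: List[str] = []
    rule_id = rule.get("id", "<missing id>")
    metadata = rule.get("metadata") or {}
    if not metadata.get("cwe"):
        problems.append(f"{rule_id}: missing metadata.cwe")
    if rule.get("severity") not in ALLOWED_SEVERITIES:
        problems.append(f"{rule_id}: severity must be WARNING or ERROR")
    return problems


# Two invented rules: one that passes, one that breaks both requirements.
good = {
    "id": "agent-example-ok",
    "severity": "ERROR",
    "metadata": {"cwe": "CWE-502"},
}
bad = {"id": "agent-example-bad", "severity": "INFO", "metadata": {}}

for rule in (good, bad):
    for problem in check_rule(rule):
        print(problem)
```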
MIT. See LICENSE.
