isaacs-12/pentest-agent

pentest-agent

An autonomous, LLM-driven penetration testing agent that doesn't just find vulnerabilities — it exploits them.

pentest-agent uses large language models to iteratively plan, execute, and analyze security assessments against target systems. It combines AI reasoning with 40 offensive security skills to conduct authorized penetration tests, from reconnaissance through exploitation.

XBOW Benchmark: 62.2% (28/45)

Tested against 45 CTF challenges from the XBOW validation benchmark. Each challenge runs in Docker with a random flag that the agent must find through exploitation.

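| Difficulty | Solved | Rate |
| --- | --- | --- |
| Easy | 28/45 | 62% |

| Vulnerability Type | Solved | Rate |
| --- | --- | --- |
| xss | 1/8 | 12% |
| command_injection | 5/6 | 83% |
| sqli | 2/5 | 40% |
| information_disclosure | 3/5 | 60% |
| privilege_escalation | 4/5 | 80% |
| idor | 3/4 | 75% |
| default_credentials | 4/4 | 100% |
| ssti | 4/4 | 100% |
| lfi | 2/4 | 50% |
| business_logic | 4/4 | 100% |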

99.64s avg per challenge · 134,383 total tokens · Claude Code (zero-config, no API key)

Approach: Why a Helper Library Instead of Prompt Engineering

Early iterations relied on prompt engineering to guide the LLM in writing correct HTTP/login code from scratch using urllib.request. This approach proved insufficient:

  • Login misinterpretation: The LLM would call json.loads() on HTML responses from /token, get a JSONDecodeError, and wrongly conclude the login failed — when a 200 response with HTML actually means success (cookie/session-based auth). This single failure mode caused the agent to spend 10+ iterations stuck retrying credentials that had already worked.
  • Brittle HTTP plumbing: Every python_exec script reimplemented cookie handling, token extraction, redirect following, and error handling. Small variations (missing http.cookiejar, not checking Set-Cookie, crashing on HTTPError) caused cascading failures.
  • Prompt instructions ignored: Despite explicit instructions ("use http.cookiejar", "don't json.loads() first"), the LLM would revert to its trained patterns and write the exact code the prompts warned against. Adding more prompt constraints didn't improve reliability.

The current approach uses a helper library pattern: instead of teaching the LLM to write correct HTTP code, we provide a pre-tested CTFClient class (ctf_helpers.py) that handles login detection, cookie/JWT auth, response parsing, and flag extraction correctly. The LLM only needs to call CTFClient.login() — not reimplement the plumbing. This is deployed to /tmp/ctf_helpers.py before every python_exec invocation.
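The login failure mode described above can be illustrated with a minimal sketch. This is not the actual ctf_helpers.py code: the function name interpret_login, the LoginResult type, and the JSON field names it checks are assumptions made for illustration. The point it shows is that the response is classified by status code, Content-Type, and Set-Cookie, and the body is parsed as JSON only when the server declares it as JSON.

```python
import json
from dataclasses import dataclass


@dataclass
class LoginResult:
    success: bool
    auth_kind: str  # "jwt", "cookie", or "none"
    detail: str


def interpret_login(status: int, headers: dict[str, str], body: str) -> LoginResult:
    """Classify a login response without assuming the body is JSON.

    Hypothetical helper: the real CTFClient.login() may use different rules.
    """
    # Normalise header names so lookups are case-insensitive.
    h = {k.lower(): v for k, v in headers.items()}
    if status >= 400:
        return LoginResult(False, "none", f"HTTP {status}")

    # Only attempt JSON parsing when the server says the body is JSON.
    if "application/json" in h.get("content-type", ""):
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            payload = None
        if isinstance(payload, dict):
            for key in ("access_token", "token"):  # assumed field names
                if payload.get(key):
                    return LoginResult(True, "jwt", f"token in JSON field {key!r}")

    # A session cookie on a non-error response indicates cookie-based auth.
    if "set-cookie" in h:
        return LoginResult(True, "cookie", "session cookie set")

    return LoginResult(False, "none", "no token or cookie in response")
```

With this shape, an HTML body on a 200 response never raises JSONDecodeError, so the agent does not mistake a successful cookie login for a failure.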

Additional reliability measures:

  • Action deduplication: Hash (skill, params) to prevent the agent from repeating identical actions
  • Structured fact extraction: Parse raw stdout for credentials, flags, status codes, and endpoints before passing to the Planner, so the LLM doesn't misinterpret raw output
  • Schema validation: Validate Planner JSON output has required fields before execution
  • Rolling analysis window: Last 3 results fed to the Planner so it doesn't lose context from earlier steps
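Two of the measures above, action deduplication and schema validation, can be sketched in a few lines. The class and function names, and the assumption that a plan needs "skill" and "params" fields, are illustrative and not taken from the project's source.

```python
import hashlib
import json

REQUIRED_FIELDS = ("skill", "params")  # assumed field names for Planner output


def action_key(skill: str, params: dict) -> str:
    """Stable hash of (skill, params); key order in params does not matter."""
    canonical = json.dumps({"skill": skill, "params": params}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ActionDeduplicator:
    """Remembers which actions have already been attempted."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_new(self, skill: str, params: dict) -> bool:
        key = action_key(skill, params)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


def validate_plan(raw: str) -> dict:
    """Parse Planner output and check required fields before execution."""
    plan = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(plan, dict):
        raise ValueError("plan must be a JSON object")
    missing = [f for f in REQUIRED_FIELDS if f not in plan]
    if missing:
        raise ValueError(f"plan is missing fields: {missing}")
    return plan
```

Serializing with sort_keys=True means two plans that differ only in parameter order hash to the same key, which is what keeps the agent from retrying an action it has effectively already run.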

Features

  • Autonomous agent loop — Plans attacks, executes tools, analyzes results, chains vulnerabilities, and adapts strategy
  • Multi-agent architecture — Planner/Executor/Analyzer roles with clean context windows prevent reasoning degradation
  • 40 built-in skills across 10 categories: recon, web, injection, auth, exploit, fuzzing, stress testing, and more
  • MCP server mode — Expose all skills to Claude Desktop, VS Code Copilot, Cursor, or any MCP-compatible client
  • Exploitation-focused — Doesn't stop at detection. Proves exploitability with RCE, auth bypass, data exfiltration
  • Vulnerability chaining — Automatically chains findings (SSRF → cloud creds, LFI → RCE, JWT weakness + IDOR → account takeover)
  • Multi-provider LLM support — Anthropic, OpenAI, or any LiteLLM-compatible model
  • Zero-config with Claude Code — Uses your existing Claude subscription, no API keys needed
  • Safety-first design — Scope enforcement, rate limiting, risk levels, time windows, approval gates
  • Knowledge persistence — SQLite-based learning across engagements
  • Docker sandboxing — Isolate tool execution in containers
  • Multi-format reporting — JSON, HTML, and Markdown reports with evidence
  • XBOW benchmarking — Built-in runner for the 104-challenge XBOW CTF benchmark suite with parallel execution and detailed metrics
  • Plugin architecture — Drop in a Python file and it's auto-discovered

Quick Start

Install

pip install -e .

# With MCP server support (for Claude Desktop / IDE integration)
pip install -e ".[mcp]"

Create an engagement config

pentest-agent init engagement.yaml
# Edit engagement.yaml with your targets and scope

Run

# Zero-config if you have Claude Code installed — no API key needed!
pentest-agent run --target https://target.example.com

# From config file
pentest-agent run --config engagement.yaml

# Explicit provider (if you prefer direct API access)
pentest-agent run \
  --config engagement.yaml \
  --provider anthropic \
  --model claude-sonnet-4-20250514 \
  --report-format all \
  --max-iterations 30

List available skills

pentest-agent skills

View engagement history

pentest-agent history

MCP Server Mode

pentest-agent can run as an MCP (Model Context Protocol) server, exposing all 40 skills as tools that any MCP-compatible AI client can call directly. This means you can use pentest-agent's skills through natural conversation in Claude Desktop, VS Code Copilot, Cursor, or any other MCP client.

Setup with Claude Desktop

  1. Install with MCP support:
pip install -e ".[mcp]"
  2. Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
  "mcpServers": {
    "pentest-agent": {
      "command": "pentest-agent",
      "args": ["mcp-serve"]
    }
  }
}
  3. Restart Claude Desktop. You'll see pentest-agent's 40 skills in the tools menu.

Setup with Claude Code

Add to your Claude Code MCP config (~/.claude/claude_code_config.json):

{
  "mcpServers": {
    "pentest-agent": {
      "command": "pentest-agent",
      "args": ["mcp-serve"]
    }
  }
}

Setup with VS Code (Copilot / Continue)

{
  "mcpServers": {
    "pentest-agent": {
      "command": "pentest-agent",
      "args": ["mcp-serve"]
    }
  }
}

What you can do

Once connected, just talk to your AI assistant:

  • "Scan example.com for open ports and services" → calls nmap_port_scan
  • "Test this login form for SQL injection" → calls sqli_check
  • "Check if the JWT token is vulnerable to alg:none" → calls jwt_attack
  • "Fuzz for hidden API endpoints on the target" → calls dir_fuzz and param_fuzz
  • "Run a full SSTI test against this search parameter" → calls ssti_check
  • "List all available skills" → calls the list_skills meta-tool
  • "Show me all findings so far" → calls the get_findings meta-tool

The AI client handles reasoning and orchestration — it picks the right skills, chains them together, and interprets results, while pentest-agent provides the security testing capabilities.

Run standalone

You can also run the MCP server directly:

# stdio mode (for client integration)
pentest-agent mcp-serve

# Or run the module directly
python -m pentest_agent.mcp_server

Architecture

┌─────────────────────────────────────────────────────────────┐
│          CLI (click + rich) / MCP Server (stdio)            │
├─────────────────────────────────────────────────────────────┤
│                   PentestAgent (Coordinator)                 │
│                                                             │
│   ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐ │
│   │   PLANNER    │  │   EXECUTOR   │  │    ANALYZER      │ │
│   │  (LLM role)  │→ │(deterministic│→ │   (LLM role)     │ │
│   │  Decides     │  │  Validates   │  │   Interprets     │ │
│   │  next action │  │  & executes) │  │   results        │ │
│   └──────┬───────┘  └──────────────┘  └────────┬─────────┘ │
│          │          ↑ intelligence              │           │
│          └──────────┘ feedback loop ────────────┘           │
│                                                             │
│  ┌───────────┐ ┌───────────┐ ┌────────────┐ ┌───────────┐  │
│  │ LLM       │ │ Safety    │ │ Skill      │ │ State     │  │
│  │ Provider  │ │ Control   │ │ Registry   │ │ Manager   │  │
│  └───────────┘ └───────────┘ └────────────┘ └───────────┘  │
│  ┌───────────┐ ┌───────────┐ ┌────────────┐                 │
│  │ Reporter  │ │ Knowledge │ │ Evidence   │                 │
│  │           │ │ Base      │ │ Graph      │                 │
│  └───────────┘ └───────────┘ └────────────┘                 │
├─────────────────────────────────────────────────────────────┤
│                     Skills (40 Plugins)                      │
│  recon/ web/ inject/ auth/ exploit/ fuzz/ stress/           │
│  network/ cloud/ vuln/                                      │
├─────────────────────────────────────────────────────────────┤
│               Docker Sandbox (optional)                      │
└─────────────────────────────────────────────────────────────┘

Multi-Agent Loop (Planner → Executor → Analyzer)

Each agent role gets a clean context window per call — no conversation history accumulation — preventing reasoning degradation over long engagements.

1. Initialize → Load config, register skills, set scope
2. PLANNER    → Analyzes state + last analysis, decides next skill(s) to run
3. EXECUTOR   → Validates safety, executes skill(s) — in parallel when possible
4. ANALYZER   → Interprets results, populates evidence graph, extracts attack chains
5. Feed back  → Analyzer's intelligence feeds forward to Planner
6. DAG check  → If task graph has independent ready tasks, batch them for parallel execution
7. Reflect    → Every 8 iterations, Analyzer assesses progress & strategy
8. Repeat     → Go to 2 (until objectives met or max iterations)
9. Report     → Generate reports with evidence graph, attack chains, and task execution history

The Executor is deterministic (no LLM) — skill execution shouldn't depend on AI reasoning. Only the Planner and Analyzer use the LLM, each with focused system prompts.
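The control flow of steps 2 through 8 can be sketched as a plain loop over three callables. This is a simplified illustration, not the project's coordinator: run_loop, the callable signatures, and the "objective_met" key are assumptions, and the DAG batching and periodic reflection steps are left out.

```python
from typing import Callable, Optional

Action = dict
Analysis = dict


def run_loop(
    plan: Callable[[dict, list], Optional[Action]],
    execute: Callable[[Action], dict],
    analyze: Callable[[Action, dict], Analysis],
    max_iterations: int = 30,
    window: int = 3,
) -> dict:
    """Planner -> Executor -> Analyzer loop with a rolling analysis window."""
    state = {"findings": [], "iterations": 0}
    analyses: list[Analysis] = []
    for iteration in range(1, max_iterations + 1):
        # The Planner sees the current state plus only the last `window` analyses.
        action = plan(state, analyses[-window:])
        if action is None:  # Planner has nothing left to try
            break
        result = execute(action)            # deterministic, no LLM involved
        analysis = analyze(action, result)  # LLM role in the real agent
        analyses.append(analysis)
        state["findings"].extend(analysis.get("findings", []))
        state["iterations"] = iteration
        if analysis.get("objective_met"):
            break
    return state
```

Because each role is just a function of its inputs, nothing carries over between calls except the explicit state and the short analysis window, which mirrors the clean-context design described above.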

Evidence Graph

Every finding is backed by a chain of evidence. The evidence graph prevents hallucination by requiring proof at every level:

EVIDENCE (raw facts)  →  HYPOTHESIS (needs testing)  →  VULNERABILITY (confirmed)  →  EXPLOIT (proven)

Node types:

  • Evidence — Raw data from tool output (e.g., "nmap found port 8080 running Tomcat 9.0.50"). Always proven confidence.
  • Hypothesis — Inferred possibility that needs validation (e.g., "Tomcat 9.0.50 may be vulnerable to a known path traversal CVE"). Starts at low confidence.
  • Vulnerability — Confirmed security issue backed by evidence (e.g., "Path traversal confirmed via ..;/ bypass"). high confidence.
  • Exploit — Proven exploitation with demonstrated impact (e.g., "RCE achieved via JSP upload through path traversal"). proven confidence.

Edge types:

  • supports — Evidence supports a hypothesis
  • confirms — Evidence/test confirms a vulnerability
  • chains_to — One finding enables another (attack chain: SSRF → cloud creds → lateral movement)
  • disproves — Evidence disproves a hypothesis
  • leads_to — One node leads to discovering another

Anti-hallucination: Vulnerabilities and exploits without evidence backing are flagged as unsubstantiated in the final report. The Analyzer creates hypotheses for uncertain observations and promotes them to vulnerabilities only when confirmed by further testing.

Attack chain tracking: The graph automatically identifies multi-step attack paths (e.g., port scan → version detection → CVE match → path traversal → RCE) and reports them as exploitable chains with full evidence trails.
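A minimal sketch of the anti-hallucination check, assuming a much simpler data model than the real evidence graph: nodes carry only a type, edges are (source, kind, target) tuples, and a vulnerability or exploit counts as substantiated when a chain of non-disproving edges leads back to at least one evidence node. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field

NODE_TYPES = {"evidence", "hypothesis", "vulnerability", "exploit"}
# Edge kinds that can carry proof forward; "disproves" deliberately excluded.
BACKING_EDGES = {"supports", "confirms", "chains_to", "leads_to"}


@dataclass
class EvidenceGraph:
    nodes: dict[str, str] = field(default_factory=dict)               # id -> type
    edges: list[tuple[str, str, str]] = field(default_factory=list)   # (src, kind, dst)

    def add_node(self, node_id: str, node_type: str) -> None:
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[node_id] = node_type

    def add_edge(self, src: str, kind: str, dst: str) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append((src, kind, dst))

    def is_substantiated(self, node_id: str) -> bool:
        """Walk incoming backing edges until an evidence node is reached."""
        stack, seen = [node_id], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if self.nodes[current] == "evidence":
                return True
            stack.extend(
                src for src, kind, dst in self.edges
                if dst == current and kind in BACKING_EDGES
            )
        return False

    def unsubstantiated(self) -> list[str]:
        """Vulnerabilities and exploits with no evidence trail behind them."""
        return sorted(
            node_id for node_id, node_type in self.nodes.items()
            if node_type in {"vulnerability", "exploit"}
            and not self.is_substantiated(node_id)
        )
```

A report generator could call unsubstantiated() and flag whatever it returns, which is the behavior the paragraph on anti-hallucination describes.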

Parallel Task Execution (DAG Scheduler)

Tasks are organized as a Directed Acyclic Graph (DAG) where edges represent dependencies. Independent tasks execute in parallel via asyncio.gather, which shortens engagements whenever several tasks have no dependencies on each other.

recon-nmap ──┐
recon-dns  ──┼──→ fuzz-dirs  ──┐
recon-sub  ──┘    fuzz-params ──┼──→ inject-sqli ──→ exploit-chain
                               └──→ inject-xss
                               └──→ inject-ssti

In this example:

  • 3 recon tasks run in parallel (no dependencies on each other)
  • 2 fuzz tasks run in parallel once all three recon tasks complete
  • 3 injection tasks run in parallel once both fuzz tasks complete
  • exploit-chain only starts after injection confirms a vulnerability

The Planner can dynamically modify the DAG during execution via graph edit operations (ADD_TASK, UPDATE_TASK, ADD_DEPENDENCY, DEPRECATE_TASK). When the Analyzer discovers new attack surface, the Planner adds new tasks with appropriate dependencies — no full plan regeneration needed.

Cycle detection prevents invalid dependency chains. Topological sort determines execution order. Tasks are batched by priority when multiple are ready.
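The scheduling idea can be sketched with the standard library alone: graphlib.TopologicalSorter provides the topological ordering and cycle detection, and asyncio.gather runs each batch of ready tasks concurrently. The dependency map below is one reading of the example diagram, and run_task is a placeholder for real skill execution; neither is taken from the project's scheduler, which also handles priorities and live graph edits.

```python
import asyncio
from graphlib import CycleError, TopologicalSorter  # CycleError is raised on invalid graphs

# Assumed dependency map: task -> set of tasks that must finish first.
DEPENDENCIES = {
    "recon-nmap": set(),
    "recon-dns": set(),
    "recon-sub": set(),
    "fuzz-dirs": {"recon-nmap", "recon-dns", "recon-sub"},
    "fuzz-params": {"recon-nmap", "recon-dns", "recon-sub"},
    "inject-sqli": {"fuzz-dirs", "fuzz-params"},
    "inject-xss": {"fuzz-dirs", "fuzz-params"},
    "inject-ssti": {"fuzz-dirs", "fuzz-params"},
    "exploit-chain": {"inject-sqli"},
}


async def run_task(name: str) -> str:
    """Placeholder for executing one skill."""
    await asyncio.sleep(0)
    return name


async def run_dag(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Run tasks in dependency order, one parallel batch at a time."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the graph contains a cycle
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        await asyncio.gather(*(run_task(name) for name in ready))
        sorter.done(*ready)  # unlocks tasks whose dependencies are now met
        batches.append(ready)
    return batches


if __name__ == "__main__":
    for batch in asyncio.run(run_dag(DEPENDENCIES)):
        print(batch)
```

This version waits for a whole batch before releasing the next one. A scheduler that marks each task done as soon as it finishes would start dependents earlier, at the cost of slightly more bookkeeping.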

Built-in Skills (40)

Reconnaissance

| Skill | Risk | Description |
| --- | --- | --- |
| nmap_port_scan | active | Port scanning and service detection |
| dns_enumeration | passive | DNS record enumeration and zone transfers |
| subdomain_enum | passive | Subdomain discovery (brute-force + subfinder) |
| whois_lookup | passive | WHOIS registration information |

Web Application

| Skill | Risk | Description |
| --- | --- | --- |
| http_header_check | passive | Security header analysis |
| dir_bruteforce | active | Directory and file discovery |
| nuclei_scan | active | Nuclei vulnerability scanner |
| ssl_check | passive | SSL/TLS certificate and config analysis |
| tech_detect | passive | Technology fingerprinting |
| web_crawler | passive | Page, form, and endpoint discovery |
| waf_detect | passive | WAF and CDN detection |
| graphql_audit | active | GraphQL introspection, schema dump, batching abuse, DoS testing |

Injection

| Skill | Risk | Description |
| --- | --- | --- |
| sqli_check | active | SQL injection testing (sqlmap) |
| xss_check | active | Reflected XSS testing |
| ssti_check | intrusive | SSTI across 7+ template engines with RCE proof (Jinja2, Twig, FreeMarker, ERB, Mako, Smarty, EL) |
| command_injection | intrusive | OS command injection: output-based, blind time-based with verification, OOB callbacks |
| nosql_injection | intrusive | MongoDB operator injection ($ne, $gt, $regex), $where JS injection, blind time-based |
| path_traversal | intrusive | Path traversal / LFI with encoding bypasses, null bytes, PHP wrappers (php://filter, data://, expect://) |
| ssrf_check | intrusive | SSRF targeting cloud metadata (AWS/GCP/Azure IMDS), internal services, protocol smuggling (file://, gopher://) |
| xxe_injection | intrusive | XML External Entity: file read, blind OOB exfiltration, SVG upload, encoding bypasses |

Authentication & Authorization

| Skill | Risk | Description |
| --- | --- | --- |
| jwt_attack | intrusive | JWT alg:none bypass, weak secret brute-force, algorithm confusion (RS256→HS256), kid injection |
| auth_bypass | active | HTTP verb tampering, path/header bypasses, 401/403 bypass, IDOR, parameter privilege escalation |
| session_attack | active | Session fixation, cookie attribute audit, token entropy analysis, logout invalidation |

Exploitation

| Skill | Risk | Description |
| --- | --- | --- |
| race_condition | intrusive | Concurrent request racing for TOCTOU/double-spend bugs |
| request_smuggling | intrusive | HTTP request smuggling (CL.TE, TE.CL, TE.TE) via raw sockets |
| prototype_pollution | intrusive | JS prototype pollution via JSON body and query params with persistence check |
| deserialization | active | Detect insecure deserialization (Java, PHP, Python pickle, .NET ViewState) |
| metasploit_exploit | intrusive | Metasploit module execution |
| metasploit_search | passive | Search Metasploit for exploits |
| cve_lookup | passive | CVE search via NVD API |

Fuzzing & Discovery

| Skill | Risk | Description |
| --- | --- | --- |
| param_fuzz | active | Hidden parameter discovery (100+ names) and value fuzzing (30+ payloads) |
| dir_fuzz | active | Directory/file brute-force with 150+ paths, extension testing, soft-404 detection |

Network

| Skill | Risk | Description |
| --- | --- | --- |
| smb_enum | active | SMB share and user enumeration |
| snmp_enum | active | SNMP community string and info enumeration |
| service_bruteforce | intrusive | Credential brute-force (Hydra) |

Cloud

| Skill | Risk | Description |
| --- | --- | --- |
| s3_bucket_check | passive | AWS S3 misconfiguration checks |
| azure_storage_check | passive | Azure Blob Storage checks |
| gcp_bucket_check | passive | GCP Cloud Storage checks |

Stress Testing

| Skill | Risk | Description |
| --- | --- | --- |
| slowloris | intrusive | Slow-connection DoS resilience testing via raw sockets |
| resource_exhaustion | intrusive | Body size, parameter count, JSON depth, header length, concurrency thresholds |

How the Agent Attacks

The agent follows an aggressive methodology, driven by LLM reasoning:

  1. Recon — Maps the full attack surface: subdomains, ports, services, technologies, endpoints
  2. Discovery — Fuzzes for hidden directories, parameters, APIs, and files
  3. Injection — Tests every discovered input for SQLi, XSS, SSTI, command injection, NoSQL injection, XXE, path traversal
  4. Auth attacks — Probes authentication: JWT manipulation, session flaws, auth bypass, IDOR
  5. Exploitation — SSRF for cloud credentials, LFI-to-RCE via PHP wrappers, prototype pollution, request smuggling, race conditions
  6. Chaining — Combines findings for maximum impact (e.g., SSRF → AWS IAM creds → lateral movement)
  7. Stress testing — Tests resilience against slowloris, resource exhaustion

The agent doesn't just report "possible SQL injection" — it proves it with extracted data, achieved RCE, or bypassed authentication.

Adding Custom Skills

Create a new file in any skills subdirectory:

from pentest_agent.skills.base import BaseSkill

class MyCustomSkill(BaseSkill):
    @property
    def name(self) -> str:
        return "my_custom_skill"

    @property
    def category(self) -> str:
        return "web"

    @property
    def description(self) -> str:
        return "Does something useful"

    @property
    def parameters(self) -> dict:
        return {
            "target": {"type": "str", "required": True, "description": "Target URL"},
        }

    @property
    def risk_level(self) -> str:
        return "active"  # passive | active | intrusive

    async def execute(self, params, state):
        # Your tool logic here
        return {
            "success": True,
            "summary": "What happened",
            "data": {"details": "here"},
            "errors": [],
        }

Skills are auto-discovered — just drop the file in and it's available.

Engagement Configuration

name: My Assessment
targets:
  - https://target.example.com
  - 10.0.1.0/24

scope:
  allowed_hosts:
    - 10.0.1.0/24
    - target.example.com
  excluded_hosts:
    - 10.0.1.1
  allowed_domains:
    - "*.example.com"
  allowed_ports: [80, 443, 8080]
  max_severity: intrusive   # passive | active | intrusive
  time_window:
    start: "2024-01-15T09:00:00Z"
    end: "2024-01-15T18:00:00Z"

objectives:
  - Identify and exploit web application vulnerabilities
  - Attempt authentication bypass and privilege escalation
  - Test for RCE via injection and deserialization
  - Enumerate exposed services and cloud misconfigurations

llm:
  provider: auto          # auto | claude-code | anthropic | openai | litellm
  model: claude-sonnet-4-20250514

max_iterations: 50
require_approval:
  - command_injection
  - ssti_check
  - ssrf_check
  - path_traversal
  - nosql_injection
  - xxe_injection
  - jwt_attack
  - race_condition
  - request_smuggling
  - slowloris
  - resource_exhaustion
  - service_bruteforce
  - metasploit_exploit

report:
  format: all             # json | html | markdown | all
  output_dir: ./reports

Safety Controls

pentest-agent enforces multiple safety layers:

| Control | Description |
| --- | --- |
| Scope enforcement | Only approved targets, domains, hosts, and ports |
| Exclusion lists | Hosts that must never be touched |
| Risk level caps | Maximum intrusiveness (passive → active → intrusive) |
| Rate limiting | Global RPM and per-target request caps |
| Time windows | Testing only during approved hours |
| Approval gates | Human sign-off for intrusive operations |
| Audit logging | Every action recorded with timestamps |

Intrusive skills (command injection, SSTI, SSRF, stress tests, etc.) require explicit approval by default.
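Scope enforcement can be sketched as a single predicate over the scope fields shown in the engagement configuration. This is an illustrative approximation, not the project's safety controller: the function name in_scope and the exact precedence rules (exclusions first, then ports, then hosts and domain patterns) are assumptions.

```python
import ipaddress
from fnmatch import fnmatch
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def _ip_allowed(addr, allowed_hosts) -> bool:
    """Return True if addr falls inside any allowed IP or CIDR entry."""
    for entry in allowed_hosts:
        try:
            network = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            continue  # entry is a hostname, not an IP or CIDR
        if addr in network:
            return True
    return False


def in_scope(target, allowed_hosts, excluded_hosts, allowed_domains, allowed_ports) -> bool:
    """Decide whether a URL or host[:port] string may be tested."""
    # urlsplit only parses host and port when a "//" authority marker is present.
    parts = urlsplit(target if "://" in target else f"//{target}")
    host = parts.hostname or ""
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)

    if host in excluded_hosts:  # exclusions always win
        return False
    if port is not None and port not in allowed_ports:
        return False

    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        addr = None  # host is a name, not an IP literal

    if addr is not None:
        return _ip_allowed(addr, allowed_hosts)
    return host in allowed_hosts or any(fnmatch(host, pat) for pat in allowed_domains)
```

Checking the exclusion list before anything else means a host such as 10.0.1.1 stays off limits even though it sits inside the allowed 10.0.1.0/24 range.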

Docker

Run the agent in Docker

docker compose up pentest-agent

Build the sandbox image

docker compose build sandbox

The sandbox image includes Kali tools (nmap, sqlmap, hydra, nuclei, subfinder, etc.) and runs with security constraints (read-only filesystem, memory limits, no privilege escalation).

Benchmarking (XBOW)

pentest-agent includes a built-in benchmark runner compatible with the XBOW validation benchmarks: 104 Docker-based CTF challenges designed for evaluating AI penetration testing agents.

Each challenge spins up a vulnerable Docker application with a random flag. The agent must find and extract the flag through exploitation — not just detection.

Running benchmarks

# Clone the XBOW benchmark suite
git clone https://github.com/xbow-engineering/validation-benchmarks.git

# Run all challenges
pentest-agent benchmark ./validation-benchmarks

# Run a specific category
pentest-agent benchmark ./validation-benchmarks --category web

# Run 10 challenges with 3 in parallel
pentest-agent benchmark ./validation-benchmarks -n 10 --parallel 3

# Filter by difficulty
pentest-agent benchmark ./validation-benchmarks --difficulty easy

# Use a specific model
pentest-agent benchmark ./validation-benchmarks --model claude-sonnet-4-20250514 --provider anthropic

How it works

1. Discovery  → Scan benchmark directory for challenges (docker-compose.yml + metadata)
2. Setup      → Build & start Docker container with random flag (FLAG{<hex>})
3. Agent run  → pentest-agent runs a full engagement against the challenge
4. Flag check → Search all agent output, findings, and evidence for the flag
5. Teardown   → Stop and remove containers
6. Report     → Aggregate results by category, difficulty, and overall success rate
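Steps 2 and 4 hinge on a random flag and a search for it. A minimal sketch of that pair follows, assuming the FLAG{<hex>} format shown in step 2; the function names and the 16-byte flag length are illustrative choices, not details of the real runner.

```python
import re
import secrets

# Matches the FLAG{<hex>} format described in the setup step.
FLAG_PATTERN = re.compile(r"FLAG\{[0-9a-f]+\}")


def new_flag(nbytes: int = 16) -> str:
    """Generate a fresh random flag for one challenge run."""
    return f"FLAG{{{secrets.token_hex(nbytes)}}}"


def flag_captured(expected: str, *outputs: str) -> bool:
    """True if the exact expected flag appears in any captured output."""
    return any(expected in FLAG_PATTERN.findall(text) for text in outputs)
```

Comparing against the exact per-run flag, rather than accepting anything that looks like a flag, prevents a run from passing on a guessed or stale value.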

Benchmark output

Results are saved to ./benchmark-results/benchmark-results.json with:

  • Per-challenge: pass/fail, time, iterations, tokens, findings count
  • By category: success rate per challenge type (web, crypto, misc, etc.)
  • By difficulty: success rate per difficulty level
  • Aggregates: overall success rate, total time, total tokens, averages
┌───────────────────┬────────┐
│ Metric            │  Value │
├───────────────────┼────────┤
│ Total challenges  │    104 │
│ Passed            │     42 │
│ Failed            │     58 │
│ Errored           │      4 │
│ Success rate      │  40.4% │
│ Total time        │ 5420s  │
│ Total tokens      │ 892000 │
└───────────────────┴────────┘
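The aggregate figures can be derived from per-challenge records with a short reduction. The record fields used here ("category", "passed") and the output layout are assumptions for illustration; the actual schema of benchmark-results.json may differ.

```python
from collections import defaultdict


def summarize(results: list[dict]) -> dict:
    """Aggregate per-challenge results into overall and per-category rates."""
    by_category: dict[str, dict] = defaultdict(lambda: {"passed": 0, "total": 0})
    for record in results:
        bucket = by_category[record["category"]]
        bucket["total"] += 1
        bucket["passed"] += int(bool(record["passed"]))

    def rate(passed: int, total: int) -> float:
        # Percentage rounded to one decimal place; 0.0 for an empty set.
        return round(100 * passed / total, 1) if total else 0.0

    total = len(results)
    passed = sum(b["passed"] for b in by_category.values())
    return {
        "total": total,
        "passed": passed,
        "success_rate": rate(passed, total),
        "by_category": {
            name: {**b, "success_rate": rate(b["passed"], b["total"])}
            for name, b in by_category.items()
        },
    }
```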

Challenge structure

The runner expects this directory layout:

validation-benchmarks/
  web/
    sqli-login/
      docker-compose.yml    # Must accept FLAG env var
      challenge.json        # Optional: {difficulty, description, target_port}
    xss-reflected/
      docker-compose.yml
  crypto/
    weak-jwt/
      docker-compose.yml

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check .

# Type check
mypy pentest_agent

Requirements

  • Python 3.11+
  • One of the following for LLM access:
    • Claude Code CLI installed and authenticated (zero-config, uses your existing subscription)
    • ANTHROPIC_API_KEY environment variable for direct Anthropic API
    • OPENAI_API_KEY for OpenAI-compatible models
  • Security tools installed for skills you want to use (nmap, sqlmap, nuclei, etc.)
  • Docker (optional, for sandboxed execution)

Plug-and-play: If you have Claude Code installed, just pip install -e . and run. No API keys needed.

License

MIT

Disclaimer

This tool is designed for authorized security testing only. Always obtain proper written authorization before testing any systems. Unauthorized access to computer systems is illegal. The authors are not responsible for misuse of this tool.
