SongTonyLi/claude-cve

claude-cve

End-to-end CVE reproduction and patching driven by Claude Code (the CLI), using the same security system prompt philosophy as the arise-sec-lion multi-agent system.

The point of this directory is to run a head-to-head comparison: same CVE, same security instructions, two agentic systems. arise-sec-lion uses a BOSS → MANAGER → WORKER tree; Claude Code can't orchestrate that cleanly, so here the entire Builder → Exploiter → Fixer → Reporter pipeline is flattened into one linear workflow executed by a single Claude Code session. The prompt in CLAUDE.md is a condensed, de-treed port of arise-sec-lion/prompts/domains/secbench/.

Layout

claude-cve/
├── CLAUDE.md                    # flat security system prompt (Build → Exploit → Fix → Report)
├── .mcp.json                    # MCP server config — filesystem access to ./output
├── run-cve.sh                   # launcher: starts a sandbox container + Claude Code
├── instances/                   # self-contained SEC-bench instance catalog
│   ├── fetch_secbench.py        # download instances from HuggingFace SEC-bench/SEC-bench
│   ├── <instance_id>.json       # one JSON per CVE instance (14 pre-seeded)
│   └── ...
├── output/                      # per-run artifacts, one subdir per CVE instance_id
│   └── <instance_id>/testcase/
│       ├── base_commit_hash
│       ├── repro.sh
│       ├── model_patch.diff
│       └── security_report.md
└── README.md

claude-cve is standalone — it no longer depends on the arise-sec-lion repo at runtime. The instance catalog lives under ./instances/ and the docker tools layer is built inline by run-cve.sh.

Prerequisites

  • Docker (with the hwiwonlee/secb.eval.x86_64.* or songtli/secb.eval.x86_64.* base images available locally, or network access to pull them)
  • Claude Code CLI (claude) on your PATH
  • Python 3 (used by run-cve.sh to parse the CVE JSON)
  • Node.js (for the MCP filesystem server that ships as @modelcontextprotocol/server-filesystem)
  • (optional) pip install datasets — only needed if you want to fetch more instances from HuggingFace

CVE Instance Catalog

The ./instances/ directory is the single source of truth for available CVE instances. 14 instances are pre-seeded (copied from SEC-bench's canonical fixtures), and more can be pulled from the upstream HuggingFace dataset at any time.

# list local + upstream instances
python instances/fetch_secbench.py --list

# pull one more instance by id
python instances/fetch_secbench.py --instance-id gpac.cve-2023-0770

# pull everything from the 'cve' split
python instances/fetch_secbench.py --split cve

# dry run (show what would be saved)
python instances/fetch_secbench.py --dry-run

Each instance JSON uses the same flat schema as SEC-bench itself (instance_id, repo, work_dir, base_commit, bug_description, sanitizer_report, etc.), so instances are fully interchangeable with the fixtures under arise-sec-lion/plugins/security/tests/fixtures/.

Instances can specify a docker_image_override field to use a custom Docker image instead of the default hwiwonlee/secb.eval.x86_64.* tag scheme. This is used for images hosted under alternative registries (e.g. songtli/secb.eval.x86_64.*). See libretro-common.cve-2025-9809.json for an example.
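
The image selection described above can be sketched in a few lines of Python. This is an illustrative sketch, not the code in run-cve.sh: the helper names resolve_base_image and load_instance are hypothetical, and it assumes instance_id already has the form <project>.<cve> so that it can be dropped straight into the default tag scheme.

```python
import json
from pathlib import Path

# Default tag prefix named in this README; the helpers below are hypothetical.
DEFAULT_IMAGE_PREFIX = "hwiwonlee/secb.eval.x86_64."


def load_instance(path: str) -> dict:
    """Read one instance JSON file from ./instances/."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def resolve_base_image(instance: dict) -> str:
    """Pick the base image: docker_image_override wins, else the default tag."""
    override = instance.get("docker_image_override")
    if override:
        return override
    # Assumption: instance_id is "<project>.<cve>", matching the
    # hwiwonlee/secb.eval.x86_64.<project>.<cve>:patch scheme.
    return f"{DEFAULT_IMAGE_PREFIX}{instance['instance_id']}:patch"


if __name__ == "__main__":
    print(resolve_base_image({"instance_id": "libredwg.cve-2020-21816"}))
```

An instance without the override field falls through to the default scheme, so existing SEC-bench fixtures keep working unchanged.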

How it works

             ┌────────────────── host ──────────────────┐
  you ───►  claude (from /home/songli/claude-cve)
                 │         uses CLAUDE.md as system prompt
                 │         uses .mcp.json → filesystem MCP on ./output
                 │
                 ├── Read/Write/Edit  ─►  ./output/<id>/testcase/   (bind-mounted)
                 │
                 └── Bash ─► docker exec $CVE_CONTAINER_NAME bash -c '…'
                                      │
             ┌────────────────── container ─────────────┐
             │  secb-tools:<id>-patch                   │
             │  /src/<project>/   ← vulnerable repo     │
             │  /work/bin/        ← sanitizer binaries  │
             │  /testcase/        ← shared with host    │
             └──────────────────────────────────────────┘

run-cve.sh starts a long-lived sandbox container and bind-mounts ./output/<instance_id>/testcase/ to /testcase/ in the container, so Claude can use its local Read/Write/Edit tools on the same files it builds, exploits, and patches inside the container via docker exec.
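
To make the bind-mount arrangement concrete, here is a rough Python sketch of the two Docker invocations involved. It is not taken from run-cve.sh: the function names are hypothetical, and using sleep infinity to keep the container alive is an assumption about how a long-lived sandbox might be started.

```python
import shlex
from pathlib import Path


def docker_run_argv(instance_id: str, image: str, output_root: str = "./output") -> list[str]:
    """Build a `docker run` command that shares the testcase dir with the host."""
    host_dir = (Path(output_root) / instance_id / "testcase").resolve()
    return [
        "docker", "run", "-d",
        "--name", f"claude-cve-{instance_id}",
        "-v", f"{host_dir}:/testcase",   # host testcase dir appears as /testcase
        image,
        "sleep", "infinity",             # assumption: keeps the sandbox running
    ]


def docker_exec_argv(container: str, command: str) -> list[str]:
    """Build the `docker exec ... bash -c '...'` form shown in the diagram."""
    return ["docker", "exec", container, "bash", "-c", command]


if __name__ == "__main__":
    iid = "libredwg.cve-2020-21816"
    print(shlex.join(docker_run_argv(iid, f"secb-tools:{iid}-patch")))
    print(shlex.join(docker_exec_argv(f"claude-cve-{iid}", "ls /testcase")))
```

Because both sides see the same directory, a file written on the host under output/<instance_id>/testcase/ is immediately visible at /testcase/ inside the container, and vice versa.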

The filesystem MCP server (.mcp.json) gives Claude a structured view of ./output/ so the security report and patch files can be inspected without shell-parsing.
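
For orientation, a project-level .mcp.json that scopes the filesystem server to ./output might look like the structure generated below. This is a sketch under assumptions, not a copy of the file in this repo: the server key "filesystem" and the use of npx -y to launch the package are guesses; only the package name and the ./output scope come from this README.

```python
import json

# Hypothetical reconstruction of a minimal .mcp.json; the real file may differ.
mcp_config = {
    "mcpServers": {
        "filesystem": {  # assumption: the server's name in the config
            "command": "npx",
            "args": [
                "-y",
                "@modelcontextprotocol/server-filesystem",
                "./output",  # the only directory the server may access
            ],
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(mcp_config, indent=2))
```

The trailing positional argument is what restricts the server to ./output.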

Running a CVE

cd /home/songli/claude-cve

# by instance id (resolved against ./instances/)
./run-cve.sh libredwg.cve-2020-21816

# instance with a custom docker image (songtli/secb.eval.x86_64.*)
./run-cve.sh libretro-common.cve-2025-9809

# or by explicit path
./run-cve.sh ./instances/libredwg.cve-2020-21816.json

The launcher will:

  1. Resolve the instance JSON from ./instances/ (or take an explicit path).
  2. Parse instance_id, work_dir, base_commit, sanitizer, and the base docker image (from docker_image_override if present, otherwise falling back to the canonical hwiwonlee/secb.eval.x86_64.<project>.<cve>:patch tag).
  3. Build or reuse secb-tools:<instance_id>-patch — SEC-bench base image layered with valgrind / cppcheck / gdb / cflow / strace / flawfinder / libasan runtime, built inline from a self-contained Dockerfile (no arise-sec-lion dependency).
  4. Recreate a sandbox container claude-cve-<instance_id> and bind-mount ./output/<instance_id>/testcase/ to /testcase/.
  5. Export CVE_JSON_PATH, CVE_INSTANCE_ID, CVE_CONTAINER_NAME, CVE_WORK_DIR, CVE_SANITIZER, CVE_BASE_COMMIT.
  6. Launch claude --permission-mode bypassPermissions with an initial prompt that starts the pipeline immediately.

From there, Claude reads CLAUDE.md and the CVE JSON, then walks the four phases linearly.
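
Steps 5 and 6 of the launcher can be approximated in Python as follows. This is a hedged sketch rather than a port of run-cve.sh: build_env and launch are hypothetical names, the JSON key "sanitizer" is an assumption (the README only says the launcher parses a sanitizer value), and the initial prompt text is a placeholder.

```python
import os


def build_env(instance: dict, json_path: str) -> dict[str, str]:
    """Map instance fields onto the CVE_* variables the launcher exports."""
    instance_id = instance["instance_id"]
    return {
        "CVE_JSON_PATH": str(json_path),
        "CVE_INSTANCE_ID": instance_id,
        "CVE_CONTAINER_NAME": f"claude-cve-{instance_id}",
        "CVE_WORK_DIR": instance["work_dir"],
        "CVE_SANITIZER": instance.get("sanitizer", ""),  # assumed key name
        "CVE_BASE_COMMIT": instance["base_commit"],
    }


def launch(env_updates: dict[str, str], prompt: str) -> None:
    """Replace the current process with Claude Code, like `exec claude`."""
    env = {**os.environ, **env_updates}
    os.execvpe(
        "claude",
        ["claude", "--permission-mode", "bypassPermissions", prompt],
        env,
    )


if __name__ == "__main__":
    demo = {
        "instance_id": "libredwg.cve-2020-21816",
        "work_dir": "/src/libredwg",   # illustrative value
        "base_commit": "0000000",      # illustrative value
    }
    for key, value in build_env(demo, "./instances/libredwg.cve-2020-21816.json").items():
        print(f"{key}={value}")
```

launch is defined but not called in the demo, since it would hand the process over to Claude Code.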

What to expect as the user

The launcher starts Claude Code in fully autonomous mode (--permission-mode bypassPermissions) and primes it with an initial prompt, so you do not have to click through tool approvals or re-prompt between phases:

  • Claude starts working immediately after the sandbox banner.
  • All tool uses (Bash, docker exec, Write, etc.) auto-approve.
  • Progress is visible on disk at output/<instance_id>/testcase/ — you can tail -f files there from another terminal.
  • The run is done when security_report.md exists and Claude prints its final summary, at which point Claude stops on its own.
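
If you prefer a quick status check over tail -f, a small script along these lines can report which deliverables exist so far. It is a convenience sketch, not part of the repo; run_status is a hypothetical helper, and the presence of security_report.md is only a necessary signal, since the run also ends with Claude printing its final summary.

```python
from pathlib import Path

# The four deliverables listed in the Layout section.
DELIVERABLES = ("base_commit_hash", "repro.sh", "model_patch.diff", "security_report.md")


def run_status(instance_id: str, output_root: str = "output") -> dict[str, bool]:
    """Report, per deliverable, whether the file exists for this instance."""
    testcase = Path(output_root) / instance_id / "testcase"
    return {name: (testcase / name).is_file() for name in DELIVERABLES}


def report_written(status: dict[str, bool]) -> bool:
    """True once security_report.md is on disk (the last artifact of a run)."""
    return status["security_report.md"]


if __name__ == "__main__":
    status = run_status("libredwg.cve-2020-21816")
    for name, present in status.items():
        print(f"{'ok ' if present else '-- '} {name}")
    print("report written:", report_written(status))
```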

If you'd rather run with the normal per-tool approval UX, drop the --permission-mode bypassPermissions flag from run-cve.sh.

Running a different CVE

Pass a different instance id (or JSON path) to the launcher:

./run-cve.sh gpac.cve-2023-5586

Each instance gets its own sandbox container and its own output/<id>/ subdirectory, so runs don't collide.

Cleaning up

# stop the sandbox for one instance
docker rm -f claude-cve-cjson.cve-2016-10749

# wipe a run's artifacts
rm -rf output/cjson.cve-2016-10749

# nuke every sandbox this tool started
docker ps -a --filter 'name=claude-cve-' -q | xargs -r docker rm -f

Comparing against arise-sec-lion

Both systems consume the same SEC-bench instance schema, so a CVE JSON file from either arise-sec-lion/plugins/security/tests/fixtures/ or claude-cve/instances/ can be fed to either system unchanged.

| Step | arise-sec-lion | claude-cve |
| --- | --- | --- |
| instance source | plugins/security/tests/fixtures/*.json | ./instances/*.json |
| launch | docker exec arise-app python main.py run --domain-context-file plugins/security/tests/fixtures/<cve.json> --domain security '<task>' | ./run-cve.sh <instance-id> |
| architecture | 4-phase BOSS → MANAGER → WORKER tree (~15-20 agents) | single Claude Code session, linear 4-phase |
| prompt source | prompts/domains/secbench/*.j2 (Jinja, role-specialized) | CLAUDE.md (de-treed, single file) |
| artifacts | arise-sec-lion/output/<boss_id>/testcase/ | claude-cve/output/<instance_id>/testcase/ |

The deliverable filenames are intentionally identical (base_commit_hash, repro.sh, model_patch.diff, security_report.md) so you can diff the two systems' outputs directly.
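
One way to do that comparison programmatically is with Python's difflib. The sketch below is illustrative only: diff_runs is a hypothetical helper, and the two directories you pass in are whichever testcase/ folders each system produced for the same CVE.

```python
import difflib
from pathlib import Path

DELIVERABLES = ("base_commit_hash", "repro.sh", "model_patch.diff", "security_report.md")


def _read_lines(path: Path) -> list[str]:
    """Return the file's lines, or an empty list if the file is missing."""
    if not path.is_file():
        return []
    return path.read_text(encoding="utf-8", errors="replace").splitlines(keepends=True)


def diff_runs(dir_a: Path, dir_b: Path) -> dict[str, list[str]]:
    """Unified diff of each deliverable between two testcase/ directories."""
    result = {}
    for name in DELIVERABLES:
        result[name] = list(
            difflib.unified_diff(
                _read_lines(dir_a / name),
                _read_lines(dir_b / name),
                fromfile=str(dir_a / name),
                tofile=str(dir_b / name),
            )
        )
    return result


if __name__ == "__main__":
    # Placeholder paths; substitute the real <boss_id> and <instance_id>.
    a = Path("arise-sec-lion/output/<boss_id>/testcase")
    b = Path("claude-cve/output/<instance_id>/testcase")
    for name, lines in diff_runs(a, b).items():
        print(f"{name}: {'identical or both missing' if not lines else 'differs'}")
```

An empty diff for a file means the two systems produced byte-identical text (or neither produced it), which is mostly useful for base_commit_hash; patches and reports will normally differ and need reading.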

Caveats

  • No sub-agent orchestration. Claude Code can spawn Task agents, but the CLAUDE.md deliberately keeps the work in a single session — the comparison we care about is system-prompt effectiveness, not sub-agent plumbing.
  • run-cve.sh assumes the SEC-bench base images exist. First run for a new CVE may download several GB from Docker Hub.
  • MCP filesystem access is scoped to ./output. Claude cannot read your home dir through MCP; for source code it uses docker exec into the sandbox, which is the isolation boundary we want.
  • The launcher invokes exec claude, so the interactive Claude Code session replaces the launcher's shell process. Exit Claude (for example with Ctrl-C) to get your shell back.
