End-to-end CVE reproduction and patching driven by Claude Code (the CLI), using the
same security system prompt philosophy as the arise-sec-lion multi-agent system.
The point of this directory is to run a head-to-head comparison: same CVE, same security
instructions, two agentic systems. arise-sec-lion uses a BOSS → MANAGER → WORKER tree;
Claude Code can't orchestrate that cleanly, so here the entire Builder → Exploiter → Fixer
→ Reporter pipeline is flattened into one linear workflow executed by a single Claude
Code session. The prompt in CLAUDE.md is a condensed, de-treed port of
arise-sec-lion/prompts/domains/secbench/.
```text
claude-cve/
├── CLAUDE.md               # flat security system prompt (Build → Exploit → Fix → Report)
├── .mcp.json               # MCP server config: filesystem access to ./output
├── run-cve.sh              # launcher: starts a sandbox container + Claude Code
├── instances/              # self-contained SEC-bench instance catalog
│   ├── fetch_secbench.py   # download instances from HuggingFace SEC-bench/SEC-bench
│   ├── <instance_id>.json  # one JSON per CVE instance (14 pre-seeded)
│   └── ...
├── output/                 # per-run artifacts, one subdir per CVE instance_id
│   └── <instance_id>/testcase/
│       ├── base_commit_hash
│       ├── repro.sh
│       ├── model_patch.diff
│       └── security_report.md
└── README.md
```
claude-cve is standalone — it no longer depends on the arise-sec-lion repo at
runtime. The instance catalog lives under ./instances/ and the docker tools layer is
built inline by run-cve.sh.
- Docker, with the `hwiwonlee/secb.eval.x86_64.*` or `songtli/secb.eval.x86_64.*` base images accessible (or network access to pull them)
- Claude Code CLI (`claude`) on your `PATH`
- Python 3 (used by `run-cve.sh` to parse the CVE JSON)
- Node.js (for the MCP filesystem server, which ships as `@modelcontextprotocol/server-filesystem`)
- (optional) `pip install datasets`, only needed if you want to fetch more instances from HuggingFace
The ./instances/ directory is the single source of truth for available CVE instances.
14 instances are pre-seeded (copied from SEC-bench's canonical fixtures), and more can
be pulled from the upstream HuggingFace dataset at any time.
```bash
# list local + upstream instances
python instances/fetch_secbench.py --list

# pull one more instance by id
python instances/fetch_secbench.py --instance-id gpac.cve-2023-0770

# pull everything from the 'cve' split
python instances/fetch_secbench.py --split cve

# dry run (show what would be saved)
python instances/fetch_secbench.py --dry-run
```

Each instance JSON uses the same flat schema as SEC-bench itself (`instance_id`, `repo`,
`work_dir`, `base_commit`, `bug_description`, `sanitizer_report`, etc.), so instances are
fully interchangeable with the fixtures under `arise-sec-lion/plugins/security/tests/fixtures/`.
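The catalog layout (one `<instance_id>.json` per instance, same flat schema) can be sketched as a small writer. This is a hypothetical helper, not the actual `fetch_secbench.py`: `save_instances` and `REQUIRED_FIELDS` are illustrative names, and the required-field list is only the subset of fields named in this README.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterable

# Hypothetical: only the SEC-bench fields named in this README. Real records
# carry more fields ("etc."); those are written through unchanged.
REQUIRED_FIELDS = (
    "instance_id",
    "repo",
    "work_dir",
    "base_commit",
    "bug_description",
    "sanitizer_report",
)


def save_instances(records: Iterable[dict], out_dir: Path, dry_run: bool = False) -> list[Path]:
    """Write one <instance_id>.json per record into out_dir.

    With dry_run=True nothing is written; the would-be paths are returned.
    """
    planned: list[Path] = []
    for rec in records:
        missing = [field for field in REQUIRED_FIELDS if field not in rec]
        if missing:
            raise ValueError(f"{rec.get('instance_id', '?')}: missing fields {missing}")
        path = out_dir / f"{rec['instance_id']}.json"
        planned.append(path)
        if not dry_run:
            out_dir.mkdir(parents=True, exist_ok=True)
            path.write_text(json.dumps(rec, indent=2) + "\n", encoding="utf-8")
    return planned
```

With the optional `datasets` package installed, `records` could plausibly come from iterating `datasets.load_dataset("SEC-bench/SEC-bench", split="cve")`; that dataset and split pairing is taken from the commands above and is an assumption here.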
Instances can specify a docker_image_override field to use a custom Docker image instead
of the default hwiwonlee/secb.eval.x86_64.* tag scheme. This is used for images hosted
under alternative registries (e.g. songtli/secb.eval.x86_64.*). See
libretro-common.cve-2025-9809.json for an example.
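The image-selection rule reduces to a small lookup. A minimal sketch, assuming the canonical tag is formed by substituting the whole `instance_id` for `<project>.<cve>` (this README does not spell out that mapping); `load_instance` and `resolve_base_image` are hypothetical names, not functions from `run-cve.sh`.

```python
from __future__ import annotations

import json
from pathlib import Path

# Assumption: the "<project>.<cve>" part of the canonical tag equals instance_id.
CANONICAL_IMAGE = "hwiwonlee/secb.eval.x86_64.{instance_id}:patch"


def load_instance(path: Path) -> dict:
    """Load one flat SEC-bench instance JSON file."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def resolve_base_image(instance: dict) -> str:
    """Prefer docker_image_override; otherwise fall back to the canonical tag.

    A missing, null, or empty override all count as "not set".
    """
    override = instance.get("docker_image_override")
    if override:
        return override
    return CANONICAL_IMAGE.format(instance_id=instance["instance_id"])
```

For example, `resolve_base_image(load_instance(Path("instances/libretro-common.cve-2025-9809.json")))` would return whatever override that file declares, while an instance without the field gets the `hwiwonlee/...:patch` tag.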
```text
┌────────────────── host ──────────────────┐
you ───► claude (from /home/songli/claude-cve)
         │  uses CLAUDE.md as system prompt
         │  uses .mcp.json → filesystem MCP on ./output
         │
         ├── Read/Write/Edit ─► ./output/<id>/testcase/ (bind-mounted)
         │
         └── Bash ─► docker exec $CVE_CONTAINER_NAME bash -c '…'
                          │
             ┌────────────────── container ─────────────┐
             │  secb-tools:<id>-patch                   │
             │  /src/<project>/   ← vulnerable repo     │
             │  /work/bin/        ← sanitizer binaries  │
             │  /testcase/        ← shared with host    │
             └──────────────────────────────────────────┘
```
run-cve.sh starts a long-lived sandbox container and bind-mounts
./output/<instance_id>/testcase/ to /testcase/ in the container, so Claude can use its
local Read/Write/Edit tools on the same files it builds, exploits, and patches
inside the container via docker exec.
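Because `/testcase/` in the container and `./output/<instance_id>/testcase/` on the host are the same directory, a host path maps to a container path by swapping the prefix, and shell work goes through `docker exec`. A sketch under those assumptions; the helper names are hypothetical, and only `CVE_CONTAINER_NAME` comes from the launcher.

```python
from __future__ import annotations

import os
import shlex
from pathlib import PurePosixPath

# Mount point of the shared testcase directory inside the sandbox container.
CONTAINER_TESTCASE = PurePosixPath("/testcase")


def host_testcase_dir(repo_root: str, instance_id: str) -> PurePosixPath:
    """Host side of the bind mount: <repo_root>/output/<instance_id>/testcase."""
    return PurePosixPath(repo_root) / "output" / instance_id / "testcase"


def to_container_path(host_path: str, repo_root: str, instance_id: str) -> PurePosixPath:
    """Map a host file inside the bind-mounted dir to its path under /testcase.

    Purely lexical; raises ValueError if host_path is outside the mounted dir.
    """
    rel = PurePosixPath(host_path).relative_to(host_testcase_dir(repo_root, instance_id))
    return CONTAINER_TESTCASE / rel


def docker_exec_argv(command: str, container: str | None = None) -> list[str]:
    """argv for `docker exec <container> bash -c '<command>'`.

    Defaults to the CVE_CONTAINER_NAME variable exported by run-cve.sh.
    """
    name = container or os.environ["CVE_CONTAINER_NAME"]
    return ["docker", "exec", name, "bash", "-c", command]


if __name__ == "__main__":
    argv = docker_exec_argv("ls -la /testcase", "claude-cve-libredwg.cve-2020-21816")
    print(shlex.join(argv))
```

Building an argv list (rather than one string) keeps the inner `bash -c` command as a single argument; `shlex.join` is only used to print a copy-pasteable form.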
The filesystem MCP server (.mcp.json) gives Claude a structured view of ./output/ so
the security report and patch files can be inspected without shell-parsing.
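This README does not show `.mcp.json` itself. A plausible shape, written as Python that renders the JSON, registers `@modelcontextprotocol/server-filesystem` via `npx` with `./output` as its only allowed directory. The server key `filesystem` and the relative path are assumptions; the real file may use an absolute path or different names.

```python
import json

# Hypothetical .mcp.json contents: one stdio MCP server, launched with npx,
# restricted to the ./output directory (resolved from the launch directory).
MCP_CONFIG = {
    "mcpServers": {
        "filesystem": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "./output"],
        }
    }
}


def render_mcp_config(config: dict) -> str:
    """Serialize the config the way a .mcp.json file would store it."""
    return json.dumps(config, indent=2) + "\n"


if __name__ == "__main__":
    print(render_mcp_config(MCP_CONFIG), end="")
```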
```bash
cd /home/songli/claude-cve

# by instance id (resolved against ./instances/)
./run-cve.sh libredwg.cve-2020-21816

# instance with a custom docker image (songtli/secb.eval.x86_64.*)
./run-cve.sh libretro-common.cve-2025-9809

# or by explicit path
./run-cve.sh ./instances/libredwg.cve-2020-21816.json
```

The launcher will:
- Resolve the instance JSON from `./instances/` (or take an explicit path).
- Parse `instance_id`, `work_dir`, `base_commit`, `sanitizer`, and the base Docker image (from `docker_image_override` if present, otherwise falling back to the canonical `hwiwonlee/secb.eval.x86_64.<project>.<cve>:patch` tag).
- Build or reuse `secb-tools:<instance_id>-patch`: the SEC-bench base image layered with valgrind / cppcheck / gdb / cflow / strace / flawfinder / libasan runtime, built inline from a self-contained Dockerfile (no arise-sec-lion dependency).
- Recreate a sandbox container `claude-cve-<instance_id>` and bind-mount `./output/<instance_id>/testcase/` → `/testcase/`.
- Export `CVE_JSON_PATH`, `CVE_INSTANCE_ID`, `CVE_CONTAINER_NAME`, `CVE_WORK_DIR`, `CVE_SANITIZER`, and `CVE_BASE_COMMIT`.
- Launch `claude --permission-mode bypassPermissions` with an initial prompt that starts the pipeline immediately.
From there, Claude reads `CLAUDE.md` and the CVE JSON, then walks the four phases linearly.
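The resolve and export steps can be sketched in Python, which `run-cve.sh` already relies on for JSON parsing. This is a hypothetical reconstruction, not the script's code: the function names are illustrative, and reading a top-level `sanitizer` field is an assumption based on the parse step listed above.

```python
from __future__ import annotations

import json
from pathlib import Path


def resolve_instance_path(arg: str, instances_dir: Path = Path("instances")) -> Path:
    """Accept an explicit path to a JSON file, or a bare instance id."""
    explicit = Path(arg)
    if explicit.suffix == ".json" and explicit.is_file():
        return explicit
    by_id = instances_dir / f"{arg}.json"
    if by_id.is_file():
        return by_id
    raise FileNotFoundError(f"no instance JSON found for {arg!r}")


def launcher_env(json_path: Path) -> dict[str, str]:
    """Environment variables the launcher exports, derived from one instance."""
    inst = json.loads(json_path.read_text(encoding="utf-8"))
    iid = inst["instance_id"]
    return {
        "CVE_JSON_PATH": str(json_path.resolve()),
        "CVE_INSTANCE_ID": iid,
        "CVE_CONTAINER_NAME": f"claude-cve-{iid}",
        "CVE_WORK_DIR": inst["work_dir"],
        # Assumption: "sanitizer" is a top-level field; empty string if absent.
        "CVE_SANITIZER": inst.get("sanitizer", ""),
        "CVE_BASE_COMMIT": inst["base_commit"],
    }


def tools_image_tag(instance_id: str) -> str:
    """Tag of the tools layer built on top of the SEC-bench base image."""
    return f"secb-tools:{instance_id}-patch"
```

Deriving both the container name and the image tag from `instance_id` is what keeps runs for different CVEs from colliding.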
The launcher starts Claude Code in fully autonomous mode
(--permission-mode bypassPermissions) and primes it with an initial prompt, so you do
not have to click through tool approvals or re-prompt between phases:
- Claude starts working immediately after the sandbox banner.
- All tool uses (`Bash`, `docker exec`, `Write`, etc.) are auto-approved.
- Progress is visible on disk at `output/<instance_id>/testcase/`; you can `tail -f` files there from another terminal.
- The run is done when `security_report.md` exists and Claude prints its final summary, at which point Claude stops on its own.
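Since all progress lands in `output/<instance_id>/testcase/`, a run can also be watched programmatically. A minimal sketch with hypothetical helper names; it checks only the on-disk half of the completion condition (the report file), not Claude's final summary.

```python
from __future__ import annotations

import time
from pathlib import Path

# The four deliverables named in this README.
DELIVERABLES = ("base_commit_hash", "repro.sh", "model_patch.diff", "security_report.md")


def artifact_status(testcase_dir: Path) -> dict[str, bool]:
    """Which deliverables currently exist under output/<instance_id>/testcase/."""
    return {name: (testcase_dir / name).is_file() for name in DELIVERABLES}


def wait_for_report(testcase_dir: Path, timeout_s: float = 3600.0, poll_s: float = 10.0) -> bool:
    """Poll until security_report.md appears; False if timeout_s elapses first.

    Claude's final summary in the terminal is not checked here.
    """
    report = testcase_dir / "security_report.md"
    deadline = time.monotonic() + timeout_s
    while not report.is_file():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    return True
```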
If you'd rather run with the normal per-tool approval UX, drop the
--permission-mode bypassPermissions flag from run-cve.sh.
To run a different CVE, point the launcher at another instance id or JSON file:

```bash
./run-cve.sh gpac.cve-2023-5586
```

Each instance gets its own sandbox container and its own `output/<id>/` subdirectory, so
runs don't collide.
```bash
# stop the sandbox for one instance (container name is claude-cve-<instance_id>)
docker rm -f claude-cve-cjson.cve-2016-10749

# wipe a run's artifacts
rm -rf output/cjson.cve-2016-10749

# nuke every sandbox this tool started
docker ps -a --filter 'name=claude-cve-' -q | xargs -r docker rm -f
```

Both systems consume the same SEC-bench instance schema, so a CVE JSON file from
either `arise-sec-lion/plugins/security/tests/fixtures/` or
`claude-cve/instances/` can be fed to either system unchanged.
| Aspect | arise-sec-lion | claude-cve |
|---|---|---|
| instance source | `plugins/security/tests/fixtures/*.json` | `./instances/*.json` |
| launch | `docker exec arise-app python main.py run --domain-context-file plugins/security/tests/fixtures/<cve.json> --domain security '<task>'` | `./run-cve.sh <instance-id>` |
| architecture | 4-phase BOSS → MANAGER → WORKER tree (~15-20 agents) | single Claude Code session, linear 4-phase |
| prompt source | `prompts/domains/secbench/*.j2` (Jinja, role-specialized) | `CLAUDE.md` (de-treed, single file) |
| artifacts | `arise-sec-lion/output/<boss_id>/testcase/` | `claude-cve/output/<instance_id>/testcase/` |
The deliverable filenames are intentionally identical (base_commit_hash, repro.sh,
model_patch.diff, security_report.md) so you can diff the two systems' outputs directly.
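With identical filenames, comparing the two systems reduces to a per-file diff. A sketch using Python's standard `difflib`; `diff_deliverables` is a hypothetical helper, and treating a missing file as empty is a choice made here, not behavior of either system.

```python
from __future__ import annotations

import difflib
from pathlib import Path

DELIVERABLES = ("base_commit_hash", "repro.sh", "model_patch.diff", "security_report.md")


def _read(path: Path) -> list[str]:
    """File lines with line endings kept; a missing file counts as empty."""
    if not path.is_file():
        return []
    return path.read_text(encoding="utf-8", errors="replace").splitlines(keepends=True)


def diff_deliverables(left: Path, right: Path) -> dict[str, str]:
    """Unified diff of each deliverable between two testcase/ directories.

    An empty string means the files are identical (or both missing).
    """
    return {
        name: "".join(
            difflib.unified_diff(
                _read(left / name),
                _read(right / name),
                fromfile=str(left / name),
                tofile=str(right / name),
            )
        )
        for name in DELIVERABLES
    }
```

For example, call it with `Path("arise-sec-lion/output/<boss_id>/testcase")` and `Path("claude-cve/output/<instance_id>/testcase")`, placeholders filled in for real runs. An empty `base_commit_hash` diff is a quick check that both systems started from the same commit.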
- No sub-agent orchestration. Claude Code can spawn Task agents, but `CLAUDE.md` deliberately keeps the work in a single session: the comparison we care about is system-prompt effectiveness, not sub-agent plumbing.
- `run-cve.sh` assumes the SEC-bench base images exist locally or can be pulled. The first run for a new CVE may download several GB from Docker Hub.
- MCP filesystem access is scoped to `./output`. Claude cannot read your home directory through MCP; for source code it uses `docker exec` into the sandbox, which is the isolation boundary we want.
- The launcher invokes `exec claude`, so your interactive Claude Code session replaces the launcher's shell process. Press `Ctrl-C` to exit Claude Code and get your shell back.