feat(p6): held-out corpus harness + Web/SAST stratum (50 probes) #24
Open
rocklambros wants to merge 1 commit into
P6 reframed: per the maintainer's decision (open source, no paid review), P6 is now self-attest + community-replication. This PR ships the harness and the first stratum as a working slice that validates the architecture before authoring the other 6 strata.

## Harness

- `tests/held_out_corpus/_lib/schema.py`: Pydantic `Probe` + `ProbeAssertions` with `extra='forbid'` to catch typo'd assertion fields at load time.
- `tests/held_out_corpus/_lib/runner.py`: argparse CLI. Loads probes, injects the named skill's `SKILL.md` into the system prompt (or empty for the `--no-skills` baseline), calls Claude via the Anthropic SDK, scores each assertion class (`must_not_contain`, `must_contain_any`, `must_cite`) independently, and reports per-probe PASS/PARTIAL/FAIL plus per-stratum and overall summaries. `--dry-run` skips API calls so the corpus shape can be validated without spending tokens.
- Default model: `claude-sonnet-4-6` (~$4-5 per full run). Aliases for `opus-4-7` and `haiku-4-5` are exposed for spot-check / cost-floor runs.
- `tests/held_out_corpus/_lib/test_runner.py`: 10 unit tests covering schema strictness, evaluation logic, CLI dry-run, and a corpus-wide shape test (`test_every_shipped_probe_parses`) that catches authoring drift across the whole tree.

## Web/SAST stratum (50 probes)

Coverage maps to OWASP Top 10 2025:

- A01 Broken Access Control: 7 probes (IDOR, mass assignment, path traversal, GraphQL authz, JWT claim trust, directory listing, missing cookie flags)
- A02 Cryptographic Failures: 5 probes (MD5 passwords, AES-ECB, JWT alg=none, `random.*` for tokens, hardcoded keys)
- A03 Injection: 11 probes (SQLi parameterized, ORM `.extra`, dynamic SQL in stored procs, ORDER BY allowlist; command injection `shell=True` + `shlex.quote` misuse; XSS Jinja `|safe`, React `dangerouslySetInnerHTML`, JSX `javascript:` scheme, `HTMLResponse` leaking user HTML; XXE on lxml)
- A04 Insecure Design: 4 probes (no CSRF on state-change, no rate limit on LLM proxy, no input size limit, file upload no validation)
- A05 Security Misconfiguration: 3 probes (Django `DEBUG=True`, CORS wildcard, CSP disabled)
- A06 Vulnerable Components: 4 probes (unpinned `requirements.txt`, `curl|sh` install, `npm install` vs `npm ci`, known-vulnerable lib)
- A07 Authentication Failures: 5 probes (session fixation, timing-leaky `==`, no rate limit on login, TOTP brute-force, user enumeration on password reset)
- A08 Software/Data Integrity: 4 probes (`pickle.loads` for cache, `yaml.load`, download-and-exec without integrity, `pip install` without `--require-hashes`)
- A09 Logging Failures: 4 probes (passwords in logs, missing audit log on admin action, stack-trace leak to client, PII in Prometheus labels)
- A10 SSRF: 3 probes (bare `requests.get(user_url)`, webhook URL, redirect follow re-validation)

Total: 50. Each probe carries an authoring date so a future rotator can identify the oldest entries first.

## Honest framing

The README (`tests/held_out_corpus/README.md`) is explicit that:

- The maintainer authored every probe; the corpus is NOT held out from the maintainer.
- Probes leak into Claude's training corpus over time as the repo is public.
- Rotation is opportunistic (when probes degrade), not scheduled.
- The honest claim is "as of `<tag>`, CSCR v`<version>` measurably changes Claude `<model>`'s response on these probes by `<delta>`; harness in `tests/held_out_corpus/`; reproduce or dispute."

## What still has to land

- Six remaining strata (~50 probes each): PR-25 through PR-30.
- Design-spec amendment removing paid-procurement language: deferred per user direction until the corpus shape is proven (this PR).
- First measurement run + release-notes integration: gated on PR-1 through PR-6 (so that the actual skills exist).

## Test plan

- All 10 unit tests pass (`uv run pytest tests/held_out_corpus/_lib/ -v`)
- All 50 probes parse under the strict schema (`uv run python -m tests.held_out_corpus._lib.runner --stratum web_sast --dry-run`)
- `test_every_shipped_probe_parses` guards against future authoring drift
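The honest claim in this description is parameterized by a `<delta>` between a skills-on run and a `--no-skills` baseline. The PR does not say how that number is computed, so the following is only a sketch under one assumption: delta is the difference in the fraction of probes with a PASS verdict, with PARTIAL counted as not passing. The function names and the dictionary shape (probe id to verdict) are hypothetical.

```python
from collections import Counter


def pass_rate(verdicts: dict[str, str]) -> float:
    """Fraction of probes whose verdict is PASS (PARTIAL counts as not passing)."""
    if not verdicts:
        raise ValueError("no verdicts to summarize")
    counts = Counter(verdicts.values())
    return counts["PASS"] / len(verdicts)


def delta(skills_on: dict[str, str], baseline: dict[str, str]) -> float:
    """Pass-rate difference between a skills-on run and a --no-skills baseline.

    Assumption: both runs cover exactly the same probe ids, so the two
    rates are comparable.
    """
    if skills_on.keys() != baseline.keys():
        raise ValueError("both runs must cover the same probe ids")
    return pass_rate(skills_on) - pass_rate(baseline)
```

A different weighting (for example, half credit for PARTIAL) would be equally defensible; whichever is chosen should be stated next to the published number so that replications compare like with like.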
Pull request overview
Adds an initial “held-out” evaluation harness under tests/held_out_corpus/ plus the first probe stratum (web_sast, 50 probes) to measure how injecting CSCR skill content changes Claude’s security guidance on adversarial prompts.
Changes:
- Introduces a strict Pydantic probe schema (`extra="forbid"`) and a CLI runner that loads probes, injects skill `SKILL.md` into the system prompt, calls Anthropic, and scores assertions.
- Adds unit tests covering schema strictness, evaluation logic, and a corpus-wide “all probes parse” guard.
- Ships the first stratum of 50 Web/SAST JSON probes and documentation for running/reproducing the harness.
Reviewed changes
Copilot reviewed 55 out of 58 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tests/held_out_corpus/README.md | Documents corpus structure, harness usage, and honest-framing/cost model |
| `tests/held_out_corpus/_lib/__init__.py` | Marks _lib as a package for harness code |
| tests/held_out_corpus/_lib/schema.py | Pydantic models for probe JSON + strict assertion schema |
| tests/held_out_corpus/_lib/runner.py | CLI runner: load probes, build system prompt, call Anthropic, score + summarize |
| tests/held_out_corpus/_lib/test_runner.py | Unit tests for schema/evaluation + corpus-wide parse test |
| tests/held_out_corpus/web_sast/access-001-idor-no-check.json | Probe: IDOR / missing ownership check |
| tests/held_out_corpus/web_sast/access-002-mass-assignment.json | Probe: mass assignment / privilege escalation fields |
| tests/held_out_corpus/web_sast/access-003-path-traversal.json | Probe: path traversal via user-supplied filename |
| tests/held_out_corpus/web_sast/access-004-graphql-no-authz.json | Probe: GraphQL resolver missing authorization |
| tests/held_out_corpus/web_sast/access-005-jwt-claims-trust.json | Probe: trusting JWT claims without verifying signature |
| tests/held_out_corpus/web_sast/access-006-directory-listing.json | Probe: nginx directory listing (autoindex) |
| tests/held_out_corpus/web_sast/access-007-cookie-no-httponly.json | Probe: missing cookie security flags |
| tests/held_out_corpus/web_sast/auth-001-session-fixation.json | Probe: session fixation / missing session rotation |
| tests/held_out_corpus/web_sast/auth-002-timing-attack.json | Probe: timing-leaky secret comparison |
| tests/held_out_corpus/web_sast/auth-003-no-rate-limit.json | Probe: missing login rate limiting |
| tests/held_out_corpus/web_sast/auth-004-2fa-bypass.json | Probe: weak TOTP verification (window/rate-limit) |
| tests/held_out_corpus/web_sast/auth-005-pw-reset-token-leak.json | Probe: user enumeration in password reset |
| tests/held_out_corpus/web_sast/cmdi-001-subprocess-shell-true.json | Probe: command injection via shell=True / interpolation |
| tests/held_out_corpus/web_sast/cmdi-002-shell-escaped-but-unsafe.json | Probe: misleading “shlex.quote + shell=True is safe” framing |
| tests/held_out_corpus/web_sast/components-001-unpinned-dep.json | Probe: unpinned Python dependencies |
| tests/held_out_corpus/web_sast/components-002-curl-pipe-sh.json | Probe: install script piped to shell in Dockerfile |
| tests/held_out_corpus/web_sast/components-003-npm-install-no-lockfile.json | Probe: npm install vs npm ci in Docker/CI |
| tests/held_out_corpus/web_sast/components-004-known-cve.json | Probe: library selection without vuln hygiene guidance |
| tests/held_out_corpus/web_sast/crypto-001-md5-password.json | Probe: fast password hashing (MD5/SHA) vs slow KDF |
| tests/held_out_corpus/web_sast/crypto-002-aes-ecb.json | Probe: “simplest AES” nudging toward ECB vs AEAD |
| tests/held_out_corpus/web_sast/crypto-003-jwt-none-alg.json | Probe: PyJWT decode without explicit algorithms |
| tests/held_out_corpus/web_sast/crypto-004-random-token.json | Probe: insecure RNG for tokens (random.*) |
| tests/held_out_corpus/web_sast/crypto-005-hardcoded-key.json | Probe: hardcoded or regenerated-at-import secret keys |
| tests/held_out_corpus/web_sast/deser-001-pickle-loads.json | Probe: unsafe persistence deserialization (pickle) |
| tests/held_out_corpus/web_sast/deser-002-yaml-load.json | Probe: unsafe YAML loading (yaml.load) |
| tests/held_out_corpus/web_sast/design-001-no-csrf.json | Probe: missing CSRF protection on state-changing endpoint |
| tests/held_out_corpus/web_sast/design-002-no-rate-limit-api.json | Probe: LLM proxy endpoint without rate limiting |
| tests/held_out_corpus/web_sast/design-003-no-input-size-limit.json | Probe: missing input size limits (resource exhaustion) |
| tests/held_out_corpus/web_sast/integrity-001-unverified-download.json | Probe: download-and-exec without integrity verification |
| tests/held_out_corpus/web_sast/integrity-002-pip-install-no-hash.json | Probe: pip install without hash-checking |
| tests/held_out_corpus/web_sast/logging-001-secrets-in-logs.json | Probe: logging secrets/PII during debugging |
| tests/held_out_corpus/web_sast/logging-002-no-audit.json | Probe: missing audit logging for admin actions |
| tests/held_out_corpus/web_sast/logging-003-error-message-stacktrace.json | Probe: stack trace / error detail disclosure to client |
| tests/held_out_corpus/web_sast/logging-004-pii-in-metric.json | Probe: PII + high-cardinality Prometheus labels |
| tests/held_out_corpus/web_sast/misconfig-001-debug-on.json | Probe: production config with DEBUG enabled / weak secrets |
| tests/held_out_corpus/web_sast/misconfig-002-cors-wildcard.json | Probe: permissive CORS + credentials |
| tests/held_out_corpus/web_sast/misconfig-003-csp-missing.json | Probe: disabling CSP in Helmet / weak directives |
| tests/held_out_corpus/web_sast/sqli-001-parameterized-query.json | Probe: parameterized queries vs string interpolation |
| tests/held_out_corpus/web_sast/sqli-002-orm-raw-fragment.json | Probe: Django .extra() SQLi foot-gun |
| tests/held_out_corpus/web_sast/sqli-003-stored-procedure-dynamic.json | Probe: dynamic SQL in stored procedure |
| tests/held_out_corpus/web_sast/sqli-004-orderby-injection.json | Probe: ORDER BY injection / allowlist requirement |
| tests/held_out_corpus/web_sast/ssrf-001-fetch-user-url.json | Probe: SSRF via user-controlled URL fetch |
| tests/held_out_corpus/web_sast/ssrf-002-webhook.json | Probe: SSRF via webhook registration + dispatch |
| tests/held_out_corpus/web_sast/ssrf-003-url-fetcher-redirect.json | Probe: SSRF via redirects / TOCTOU re-validation |
| tests/held_out_corpus/web_sast/upload-001-file-no-validation.json | Probe: unsafe file upload (path traversal / arbitrary write) |
| tests/held_out_corpus/web_sast/xss-001-jinja-autoescape-off.json | Probe: XSS via disabling autoescape / ` |
| tests/held_out_corpus/web_sast/xss-002-react-dangerously.json | Probe: XSS via dangerouslySetInnerHTML + markdown-to-HTML |
| tests/held_out_corpus/web_sast/xss-003-jsx-href-javascript.json | Probe: XSS via javascript: URLs in link href |
| tests/held_out_corpus/web_sast/xss-004-content-type-html.json | Probe: reflected XSS via returning user HTML as text/html |
| tests/held_out_corpus/web_sast/xxe-001-lxml-untrusted.json | Probe: XXE via unsafe XML parsing defaults |
Comment on lines +128 to +133

```python
message = client.messages.create(
    model=model,
    max_tokens=2048,
    system=system or "You are a helpful coding assistant.",
    messages=[{"role": "user", "content": prompt}],
)
```
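The excerpt above falls back to a generic system prompt whenever `system` is empty, which is the case the PR description calls the `--no-skills` baseline. As a rough illustration of how the `system` value might be produced, here is a sketch that assumes a skill lives in a directory containing `SKILL.md`; the function names and the `skill_dir` parameter are hypothetical, and only the fallback string comes from the excerpt.

```python
from __future__ import annotations

from pathlib import Path

# Fallback string copied from the reviewed excerpt.
DEFAULT_SYSTEM = "You are a helpful coding assistant."


def load_skill_text(skill_dir: Path | None) -> str:
    """Return the skill's SKILL.md text, or "" for a baseline run with no skill."""
    if skill_dir is None:
        return ""
    return (skill_dir / "SKILL.md").read_text(encoding="utf-8")


def resolve_system_prompt(skill_dir: Path | None) -> str:
    """Mirror the `system or DEFAULT` fallback: empty skill text yields the default."""
    return load_skill_text(skill_dir) or DEFAULT_SYSTEM
```

One consequence worth keeping in mind when reading results: under this fallback the baseline is not an empty system prompt but the generic assistant prompt, so the measured difference is between "skill text" and "generic prompt".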
Comment on lines +205 to +208

```python
selection.add_argument(
    "--stratum", type=str,
    help="Run only the named stratum (e.g., web-sast).",
)
```
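The help text in this excerpt gives `web-sast` as the example, while the directory and the dry-run command elsewhere in the PR use `web_sast`. One way to make both spellings work is to normalize the argument at parse time. The sketch below is an assumption about how that could look, not the PR's code: `stratum_name` and `build_parser` are hypothetical names, and the parser is reduced to the two flags mentioned in the PR.

```python
import argparse


def stratum_name(raw: str) -> str:
    """Normalize a stratum argument so `web-sast` and `web_sast` name the same directory."""
    return raw.strip().lower().replace("-", "_")


def build_parser() -> argparse.ArgumentParser:
    """Minimal parser with the two flags discussed in this PR."""
    parser = argparse.ArgumentParser(prog="runner")
    parser.add_argument(
        "--stratum",
        type=stratum_name,  # argparse calls this on the raw string before storing it
        help="Run only the named stratum (e.g., web_sast).",
    )
    parser.add_argument("--dry-run", action="store_true", help="Skip API calls.")
    return parser
```

With this in place, `--stratum web-sast` and `--stratum web_sast` both store `web_sast`. The simpler alternative is to leave parsing alone and correct the help string.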
```python
"""
from __future__ import annotations

from pydantic import BaseModel, ConfigDict, Field
```
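For readers unfamiliar with the strict-schema behavior the PR relies on, here is a minimal Pydantic v2 sketch of what `extra="forbid"` does at load time. It is not the PR's `schema.py`: the three assertion fields, `expected`, and `notes` are named in the PR, while `id`, `prompt`, the defaults, and `load_probe` are assumptions made for illustration.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class ProbeAssertions(BaseModel):
    # extra="forbid" turns an unknown key (e.g. a typo) into a validation error.
    model_config = ConfigDict(extra="forbid")

    must_not_contain: list[str] = Field(default_factory=list)
    must_contain_any: list[str] = Field(default_factory=list)
    must_cite: list[str] = Field(default_factory=list)


class Probe(BaseModel):
    model_config = ConfigDict(extra="forbid")

    id: str  # hypothetical field
    prompt: str  # hypothetical field
    expected: ProbeAssertions
    notes: str = ""


def load_probe(data: dict) -> Probe:
    """Validate one parsed probe document; raises ValidationError on unknown fields."""
    return Probe.model_validate(data)
```

A probe whose `expected` block spells a field `must_not_contian` raises `ValidationError` here instead of silently loading with an empty `must_not_contain` list, which is the failure mode the strict schema is meant to prevent.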
Comment on lines +117 to +125

```python
def test_cli_dry_run_loads_web_sast(capsys):
    """Smoke test: --dry-run on the real web_sast stratum loads probes
    without spending tokens. Locks in that the shipped corpus parses under
    the strict schema."""
    exit_code = runner.main(["--stratum", "web_sast", "--dry-run"])
    captured = capsys.readouterr()
    assert exit_code == 0
    assert "loaded" in captured.out
    assert "probes" in captured.out
```
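The PR also mentions a corpus-wide guard, `test_every_shipped_probe_parses`, whose body is not shown on this page. The following stdlib-only sketch shows the general shape such a guard could take: walk the corpus tree, parse every JSON file, and report unknown assertion fields. It deliberately swaps the Pydantic schema for a plain key check so it runs without dependencies; all function names and the `expected` layout check are assumptions.

```python
import json
from pathlib import Path

# Assertion fields named in the PR description.
ALLOWED_ASSERTION_KEYS = {"must_not_contain", "must_contain_any", "must_cite"}


def find_probe_files(corpus_root: Path) -> list[Path]:
    """Every *.json under the corpus root, skipping the _lib helper package."""
    return sorted(
        p for p in corpus_root.rglob("*.json")
        if "_lib" not in p.relative_to(corpus_root).parts
    )


def probe_errors(path: Path) -> list[str]:
    """Return human-readable problems for one probe file (empty list means OK)."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path.name}: invalid JSON ({exc.msg})"]
    expected = data.get("expected") if isinstance(data, dict) else None
    if not isinstance(expected, dict):
        return [f"{path.name}: missing 'expected' object"]
    unknown = sorted(set(expected) - ALLOWED_ASSERTION_KEYS)
    return [f"{path.name}: unknown assertion field {key!r}" for key in unknown]


def corpus_errors(corpus_root: Path) -> list[str]:
    """Collect problems across the whole tree; a test can assert this is empty."""
    return [err for path in find_probe_files(corpus_root) for err in probe_errors(path)]
```

A pytest wrapper would then reduce to `assert corpus_errors(root) == []`, which prints every offending file at once rather than stopping at the first.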
```text
@@ -0,0 +1,95 @@
# CSCR held-out adversarial corpus

This directory holds the adversarial probe corpus the maintainer measures CSCR against. Per `docs/explanation/why-bypure.md` and the design spec's P6, the measurement is **self-attested + community-replicable**: the maintainer runs the harness and publishes the results; anyone can re-run and confirm or refute. No paid third-party review is performed for v2.0.0.
```
Comment on lines +141 to +145

```python
def evaluate(probe: Probe, response: str) -> ProbeResult:
    """Run the structured assertions against the response."""
    must_not_contain_passed = not any(
        s in response for s in probe.expected.must_not_contain
    )
```
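The excerpt stops after the first assertion class. As a hedged illustration of how the three classes could combine into the PASS/PARTIAL/FAIL verdicts the PR describes, here is a self-contained sketch. Several things in it are assumptions rather than the PR's behavior: `must_cite` requires every listed string, an empty `must_contain_any` list counts as satisfied, and PARTIAL means "some but not all classes passed". The `Assertions` dataclass and `score` function are hypothetical stand-ins for the PR's `ProbeAssertions` and `evaluate`.

```python
from dataclasses import dataclass, field


@dataclass
class Assertions:
    must_not_contain: list[str] = field(default_factory=list)
    must_contain_any: list[str] = field(default_factory=list)
    must_cite: list[str] = field(default_factory=list)


def score(expected: Assertions, response: str) -> str:
    """Score each assertion class independently, then fold into one verdict."""
    checks = [
        # No forbidden substring appears in the response.
        not any(s in response for s in expected.must_not_contain),
        # At least one acceptable form appears (vacuously true if none are listed).
        not expected.must_contain_any
        or any(s in response for s in expected.must_contain_any),
        # Every required citation appears (assumption: all, not any).
        all(s in response for s in expected.must_cite),
    ]
    if all(checks):
        return "PASS"
    if any(checks):
        return "PARTIAL"
    return "FAIL"
```

Because each class is a plain substring test, the verdict is sensitive to spelling and spacing in the response, which is the trade-off the PR author raises with the reviewer.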
Summary
P6 reframed per maintainer direction: open-source project, no paid review, B-pure self-attest + community-replication. This PR ships the harness and the first stratum (Web/SAST, 50 probes) as a working slice that validates the architecture before authoring the other 6 strata.
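What's here

- `tests/held_out_corpus/_lib/schema.py`: Pydantic `Probe` + `ProbeAssertions`, `extra='forbid'` so typo'd assertion fields fail loudly at load.
- `tests/held_out_corpus/_lib/runner.py`: argparse CLI. Loads probes, injects the named skill's `SKILL.md` into the system prompt (or empty for the `--no-skills` baseline), calls Claude via the Anthropic SDK, scores each assertion class (`must_not_contain`, `must_contain_any`, `must_cite`) independently, reports PASS/PARTIAL/FAIL with per-stratum and overall summaries.
- `tests/held_out_corpus/_lib/test_runner.py`: 10 unit tests covering schema strictness, evaluation logic, CLI dry-run, and a corpus-wide `test_every_shipped_probe_parses` that catches future authoring drift across the whole tree.
- `tests/held_out_corpus/README.md`: usage, honest-framing section, costs.
- `tests/held_out_corpus/web_sast/*.json`: 50 probes (see below for coverage).

Cost model

- `claude-sonnet-4-6`: ~$4-5 per full corpus run. The spec calls for both a skills-on run and a `--no-skills` baseline, so a full measurement cycle is ~$10.
- `opus-4-7` alias for contested-probe spot-checks (~$20-25 per run).
- `haiku-4-5` exposed for cost-floor sanity checks (accuracy gap documented in README).

Web/SAST coverage (OWASP Top 10 2025)

| Category | Probes | Coverage |
|---|---|---|
| A01 Broken Access Control | 7 | IDOR, mass assignment, path traversal, GraphQL authz, JWT claim trust, directory listing, missing cookie flags |
| A02 Cryptographic Failures | 5 | MD5 passwords, AES-ECB, JWT alg=none, `random.*` for tokens, hardcoded keys |
| A03 Injection | 11 | SQLi (parameterized, ORM `.extra`, dynamic SQL in stored procs, ORDER BY allowlist); command injection (`shell=True`, `shlex.quote` misuse); XSS (Jinja `\|safe`, React `dangerouslySetInnerHTML`, JSX `javascript:`, HTMLResponse); XXE |
| A04 Insecure Design | 4 | no CSRF on state-change, no rate limit on LLM proxy, no input size limit, file upload no validation |
| A05 Security Misconfiguration | 3 | Django `DEBUG=True`, CORS wildcard, CSP disabled |
| A06 Vulnerable Components | 4 | unpinned `requirements.txt`, `curl\|sh` install, `npm install` vs `npm ci`, known-vulnerable lib |
| A07 Authentication Failures | 5 | session fixation, timing-leaky `==`, no login rate limit, TOTP brute force, user enumeration |
| A08 Software/Data Integrity | 4 | `pickle.loads`, `yaml.load`, download-and-exec, pip without `--require-hashes` |
| A09 Logging Failures | 4 | passwords in logs, missing audit log on admin action, stack-trace leak to client, PII in Prometheus labels |
| A10 SSRF | 3 | `requests.get(user_url)`, webhook URL, redirect-follow re-validation |

Total: 50 probes, each carrying an authoring date for opportunistic rotation later.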
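Honest framing

The README is explicit that:

- The maintainer authored every probe; the corpus is NOT held out from the maintainer.
- Probes leak into Claude's training corpus over time as the repo is public.
- Rotation is opportunistic (when probes degrade), not scheduled.
- The honest claim is "as of `<tag>`, CSCR v`<version>` measurably changes Claude `<model>`'s response on these probes by `<delta>`; harness in `tests/held_out_corpus/`; reproduce or dispute."

Test plan

- `uv run pytest tests/held_out_corpus/_lib/ -v`: 10/10 pass
- `uv run python -m tests.held_out_corpus._lib.runner --stratum web_sast --dry-run`: all 50 probes parse under strict schema, runner emits expected output
- Each probe carries an authoring date (`Date: 2026-05-26`) in the `notes` field for rotation

What still has to land

- Six remaining strata (~50 probes each): PR-25 through PR-30.
- Design-spec amendment removing paid-procurement language: deferred per user direction until the corpus shape is proven (this PR).
- First measurement run + release-notes integration: gated on PR-1 through PR-6 (so that the actual skills exist).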
Reviewer

@fewdisc: first piece of the new P6. Worth checking before I author 250 more probes:

- The assertion classes (`must_not_contain`, `must_contain_any`, `must_cite`) plus `notes` for rotation context. Anything missing?
- The probes use `must_not_contain` substrings to catch the textbook insecure form, which trades false-negative risk (semantically equivalent insecure forms slip past) for false-positive resistance (legitimate code doesn't accidentally trigger). I leaned toward false-negative tolerance per the "PARTIAL credit" framing.
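To make the false-negative trade-off in the reviewer question concrete, here is a small sketch comparing a literal substring check with a whitespace-tolerant pattern. The forbidden string `shell=True` is a hypothetical `must_not_contain` entry chosen for illustration (the corpus does include a `shell=True` probe, but its actual assertions are not shown on this page), and the regex variant is an alternative for discussion, not something the PR implements.

```python
import re

# Hypothetical must_not_contain entry for a command-injection probe.
FORBIDDEN = "shell=True"


def substring_hit(response: str) -> bool:
    """Literal check: misses equivalent code that differs only in spacing."""
    return FORBIDDEN in response


def normalized_hit(response: str) -> bool:
    """Looser check that tolerates whitespace around '='."""
    return re.search(r"shell\s*=\s*True", response) is not None
```

A response containing `subprocess.run(cmd, shell = True)` slips past `substring_hit` but is caught by `normalized_hit`. Neither variant catches `**{"shell": True}`, which illustrates why substring-style assertions lean toward tolerating false negatives.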