
feat(p6): held-out corpus harness + Web/SAST stratum (50 probes) #24

Open
rocklambros wants to merge 1 commit into v2/modernization from feat/p6-harness-and-web-sast



@rocklambros
Member

Summary

P6 reframed per maintainer direction: open-source project, no paid review, B-pure self-attest + community-replication. This PR ships the harness and the first stratum (Web/SAST, 50 probes) as a working slice that validates the architecture before authoring the other 6 strata.

What's here

  • tests/held_out_corpus/_lib/schema.py — Pydantic Probe + ProbeAssertions, extra='forbid' so typo'd assertion fields fail loud at load.
  • tests/held_out_corpus/_lib/runner.py — argparse CLI. Loads probes, injects the named skill's SKILL.md into the system prompt (or empty for --no-skills baseline), calls Claude via Anthropic SDK, scores each assertion class (must_not_contain, must_contain_any, must_cite) independently, reports PASS/PARTIAL/FAIL with per-stratum and overall summaries.
  • tests/held_out_corpus/_lib/test_runner.py — 10 unit tests covering schema strictness, evaluation logic, CLI dry-run, and a corpus-wide test_every_shipped_probe_parses that catches future authoring drift across the whole tree.
  • tests/held_out_corpus/README.md — usage, honest-framing section, costs.
  • tests/held_out_corpus/web_sast/*.json — 50 probes (see below for coverage).
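A minimal sketch of how a strict schema of this kind can fail loudly on a typo'd assertion field. The three assertion classes, `expected`, and `notes` come from this PR; the `id` and `prompt` field names, the `load_probe` helper, and the sample values are assumptions made for illustration and may not match the real `schema.py`.

```python
import json

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class ProbeAssertions(BaseModel):
    # extra="forbid" rejects unknown keys, so a misspelled field name
    # raises at load time instead of being silently ignored.
    model_config = ConfigDict(extra="forbid")

    must_not_contain: list[str] = Field(default_factory=list)
    must_contain_any: list[str] = Field(default_factory=list)
    must_cite: list[str] = Field(default_factory=list)


class Probe(BaseModel):
    model_config = ConfigDict(extra="forbid")

    id: str      # hypothetical field name
    prompt: str  # hypothetical field name
    expected: ProbeAssertions
    notes: str = ""


def load_probe(raw: str) -> Probe:
    """Parse one probe from a JSON string (hypothetical helper)."""
    return Probe.model_validate(json.loads(raw))


good = (
    '{"id": "sqli-001", "prompt": "...", '
    '"expected": {"must_cite": ["CWE-89"]}, "notes": "Date: 2026-05-26"}'
)
# "must_not_contian" is a deliberate typo.
typo = '{"id": "sqli-001", "prompt": "...", "expected": {"must_not_contian": ["x"]}}'

probe = load_probe(good)
try:
    load_probe(typo)
    typo_rejected = False
except ValidationError:
    typo_rejected = True

print(probe.expected.must_cite, typo_rejected)
```

Without `extra="forbid"`, Pydantic's default is to ignore unknown keys, so the typo'd probe would load with an empty `must_not_contain` list and never fail on that assertion.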

Cost model

  • Default model claude-sonnet-4-6: ~$4-5 per full corpus run. Both a skills-on run and a --no-skills baseline are specified, so a full measurement cycle is ~$10.
  • opus-4-7 alias for contested-probe spot-checks (~$20-25 per run).
  • haiku-4-5 exposed for cost-floor sanity checks (accuracy gap documented in README).
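The cycle arithmetic above can be sketched as follows. The per-run ranges are the figures quoted in this PR; treating an Opus spot-check as a full two-run cycle is an assumption made only to show the calculation, and no figure is given here for Haiku.

```python
# Per-run cost ranges in USD, taken from the figures quoted above.
PER_RUN_USD = {
    "claude-sonnet-4-6": (4.0, 5.0),
    "opus-4-7": (20.0, 25.0),
}

# One skills-on run plus one --no-skills baseline.
RUNS_PER_CYCLE = 2


def cycle_cost(model: str) -> tuple[float, float]:
    """Return the (low, high) cost of one full measurement cycle."""
    low, high = PER_RUN_USD[model]
    return (low * RUNS_PER_CYCLE, high * RUNS_PER_CYCLE)


for model in PER_RUN_USD:
    low, high = cycle_cost(model)
    print(f"{model}: ${low:.0f}-${high:.0f} per measurement cycle")
```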

Web/SAST coverage (OWASP Top 10 2025)

| Category | Probes | Examples |
| --- | --- | --- |
| A01 Broken Access Control | 7 | IDOR, mass assignment, path traversal, GraphQL authz, JWT claim trust, dir listing, cookie flags |
| A02 Cryptographic Failures | 5 | MD5 passwords, AES-ECB, JWT alg=none, random.* for tokens, hardcoded keys |
| A03 Injection | 11 | SQLi (parameterized, ORM .extra, dynamic SP, ORDER BY allowlist); CMDi (shell=True + shlex misuse); XSS (Jinja autoescape, dangerouslySetInnerHTML, JSX javascript:, HTMLResponse); XXE |
| A04 Insecure Design | 4 | No CSRF, no LLM rate limit, no input size limit, file upload no validation |
| A05 Misconfiguration | 3 | DEBUG=True, CORS wildcard, CSP disabled |
| A06 Vulnerable Components | 4 | Unpinned reqs, install-script-to-shell, npm install vs npm ci, known-vulnerable lib |
| A07 Authentication Failures | 5 | Session fixation, timing-leaky ==, no login rate limit, TOTP brute force, user enumeration |
| A08 Software/Data Integrity | 4 | pickle.loads, yaml.load, download-and-exec, pip without --require-hashes |
| A09 Logging Failures | 4 | Passwords in logs, missing audit log, stack-trace leak, PII in Prometheus labels |
| A10 SSRF | 3 | Bare requests.get(user_url), webhook URL, redirect-follow re-validation |

Total: 50 probes, each carrying an authoring date for opportunistic rotation later.
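As a cross-check on the total, the per-category counts can be tallied from the probe filenames. This is a sketch: the prefix counts are read off the file list in this PR, while the prefix-to-category mapping is an assumption inferred from the examples given for each category.

```python
from collections import Counter

# Probe counts per filename prefix, as they appear in this PR's file list.
PREFIX_COUNTS = {
    "access": 7, "auth": 5, "cmdi": 2, "components": 4, "crypto": 5,
    "deser": 2, "design": 3, "integrity": 2, "logging": 4, "misconfig": 3,
    "sqli": 4, "ssrf": 3, "upload": 1, "xss": 4, "xxe": 1,
}

# Assumed mapping from filename prefix to category, inferred from the examples.
PREFIX_TO_CATEGORY = {
    "access": "A01", "crypto": "A02",
    "sqli": "A03", "cmdi": "A03", "xss": "A03", "xxe": "A03",
    "design": "A04", "upload": "A04",
    "misconfig": "A05", "components": "A06", "auth": "A07",
    "deser": "A08", "integrity": "A08",
    "logging": "A09", "ssrf": "A10",
}


def tally_by_category(prefix_counts: dict, prefix_to_category: dict) -> Counter:
    """Sum probe counts per category."""
    totals = Counter()
    for prefix, count in prefix_counts.items():
        totals[prefix_to_category[prefix]] += count
    return totals


totals = tally_by_category(PREFIX_COUNTS, PREFIX_TO_CATEGORY)
for category in sorted(totals):
    print(category, totals[category])
print("total", sum(totals.values()))
```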

Honest framing

The README is explicit that:

  • The maintainer authored every probe; the corpus is not held out from the maintainer.
  • Probes leak into Claude's training corpus over time as the repo is public.
  • Rotation is opportunistic (when probes degrade), not on a fixed cadence.
  • The honest claim is "as of <tag>, CSCR v<version> measurably changes Claude <model>'s response on these probes by <delta>; harness in tests/held_out_corpus/; reproduce or dispute."
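One way the `<delta>` in that claim could be computed is as the difference in PASS rate between the skills-on run and the `--no-skills` baseline. This is a sketch under that assumption; the PR does not define the delta, and the outcome lists below are made up for illustration, not measured results.

```python
def pass_rate(outcomes: list[str]) -> float:
    """Fraction of probes scored PASS; PARTIAL and FAIL both count as misses."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return sum(1 for outcome in outcomes if outcome == "PASS") / len(outcomes)


def delta(skills_on: list[str], baseline: list[str]) -> float:
    """PASS-rate difference: positive means the skill content helped."""
    return pass_rate(skills_on) - pass_rate(baseline)


# Illustrative outcomes only; these are not real measurements.
skills_on = ["PASS", "PASS", "PARTIAL", "PASS"]
baseline = ["PASS", "FAIL", "PARTIAL", "FAIL"]

print(f"delta = {delta(skills_on, baseline):+.2f}")
```

Counting PARTIAL as a miss is the stricter choice; a weighted variant (for example, half credit for PARTIAL) would be an equally valid definition as long as it is stated alongside the number.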

Test plan

  • uv run pytest tests/held_out_corpus/_lib/ -v — 10/10 pass
  • uv run python -m tests.held_out_corpus._lib.runner --stratum web_sast --dry-run — all 50 probes parse under strict schema, runner emits expected output
  • All probes carry an authoring date (Date: 2026-05-26) in the notes field for rotation

What still has to land

  • Design-spec amendment removing paid-procurement language — deferred per your direction until corpus shape is proven (this PR).
  • Six remaining strata (~50 probes each) — PR-25 through PR-30 (ai_ml, supply_chain, iac, containers, frontend, languages).
  • First measurement run + release-notes integration — gated on PR-1 through PR-6 (the actual skills existing). The runner already has a graceful fallback when the skill SKILL.md is missing.
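The graceful fallback mentioned in the last item might look roughly like this. The `<skills_root>/<skill>/SKILL.md` layout and the `build_system_prompt` name are assumptions; only the behavior (empty prompt for the baseline or a missing skill file) and the default system string are taken from this PR.

```python
import tempfile
from pathlib import Path

# Default used by the runner when the system prompt is empty.
DEFAULT_SYSTEM = "You are a helpful coding assistant."


def build_system_prompt(skills_root: Path, skill: str, no_skills: bool) -> str:
    """Return the SKILL.md text, or "" for the baseline or a missing skill."""
    if no_skills:
        return ""
    skill_file = skills_root / skill / "SKILL.md"  # assumed directory layout
    if not skill_file.is_file():
        return ""  # skill not written yet: fall back instead of crashing
    return skill_file.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "demo-skill").mkdir()
    (root / "demo-skill" / "SKILL.md").write_text(
        "Use parameterized queries.", encoding="utf-8"
    )

    with_skill = build_system_prompt(root, "demo-skill", no_skills=False)
    baseline = build_system_prompt(root, "demo-skill", no_skills=True)
    missing = build_system_prompt(root, "not-written-yet", no_skills=False)

# An empty prompt falls through to the default, as in `system or "..."`.
effective = missing or DEFAULT_SYSTEM
print(repr(with_skill), repr(baseline), repr(effective))
```

One consequence worth noting: with this fallback a missing skill is indistinguishable from a `--no-skills` run, so a real runner would probably want to log a warning when the file is absent.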

Reviewer

@fewdisc — first piece of the new P6. Worth checking before I author 250 more probes:

  1. Is the JSON schema right? Three assertion classes (must_not_contain, must_contain_any, must_cite) plus notes for rotation context. Anything missing?
  2. Are the assertion strings reasonable? Some probes have very specific must_not_contain substrings to catch the textbook insecure form, which trades false-negative risk (semantically-equivalent insecure forms slip past) for false-positive resistance (legitimate code doesn't accidentally trigger). I leaned toward false-negative tolerance per the "PARTIAL credit" framing.
  3. The default-model choice (Sonnet 4.6, ~$4-5/run). Switch to Haiku to halve cost or to Opus to maximize signal?
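The trade-off in question 2 can be made concrete with a small sketch. The substring check mirrors the `s in response` logic in the runner; the forbidden string and the sample responses are illustrative assumptions, not taken from a shipped probe.

```python
def must_not_contain_passed(response: str, forbidden: list[str]) -> bool:
    """True when none of the forbidden substrings appear in the response."""
    return not any(s in response for s in forbidden)


forbidden = ["shell=True"]  # illustrative assertion string

textbook = "subprocess.run(cmd, shell=True)"
reformatted = "subprocess.run(cmd, shell = True)"  # same risk, extra spaces
safe = "subprocess.run(['ls', path], check=True)"

print(must_not_contain_passed(textbook, forbidden))     # caught
print(must_not_contain_passed(reformatted, forbidden))  # slips past
print(must_not_contain_passed(safe, forbidden))         # not flagged
```

The second case is the false negative being tolerated: an equivalent insecure form passes the check. Normalizing whitespace or matching with a regular expression would narrow that gap, at the cost of more ways for legitimate code to trigger a match.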

Related

  • Followups: PR-25 (ai_ml), PR-26 (supply_chain), PR-27 (iac), PR-28 (containers), PR-29 (frontend), PR-30 (languages)
  • Companion: design-spec amendment to remove paid-procurement language (separate PR after the corpus is in)

P6 reframed: per the maintainer's decision (open source, no paid review),
P6 is now self-attest + community-replication. This PR ships the harness
and the first stratum as a working slice that validates the architecture
before authoring the other 6 strata.

## Harness

- tests/held_out_corpus/_lib/schema.py — Pydantic Probe + ProbeAssertions
  with extra='forbid' to catch typo'd assertion fields at load time.
- tests/held_out_corpus/_lib/runner.py — argparse CLI. Loads probes,
  injects the named skill's SKILL.md into the system prompt (or empty for
  --no-skills baseline), calls Claude via the Anthropic SDK, scores each
  assertion class (must_not_contain, must_contain_any, must_cite)
  independently, reports per-probe PASS/PARTIAL/FAIL and per-stratum +
  overall summaries. --dry-run skips API calls so the corpus shape can be
  validated without spending tokens.
- Default model: claude-sonnet-4-6 (~$4-5 per full run). Aliases for
  opus-4-7 and haiku-4-5 are exposed for spot-check / cost-floor runs.
- tests/held_out_corpus/_lib/test_runner.py — 10 unit tests covering
  schema strictness, evaluation logic, CLI dry-run, and a corpus-wide
  shape test (test_every_shipped_probe_parses) that catches authoring
  drift across the whole tree.
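A sketch of how the three assertion classes could be scored independently and combined into PASS/PARTIAL/FAIL. The combination rule (PASS when all three classes pass, FAIL when none do, PARTIAL otherwise), the all-of semantics for `must_cite`, and the handling of empty lists are assumptions; the real `evaluate` in `runner.py` may differ. The assertion strings are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Assertions:
    must_not_contain: list[str] = field(default_factory=list)
    must_contain_any: list[str] = field(default_factory=list)
    must_cite: list[str] = field(default_factory=list)


def score(expected: Assertions, response: str) -> str:
    """Score each assertion class on its own, then combine (assumed rule)."""
    checks = [
        # No forbidden substring may appear.
        not any(s in response for s in expected.must_not_contain),
        # At least one acceptable substring; an empty list passes vacuously.
        not expected.must_contain_any
        or any(s in response for s in expected.must_contain_any),
        # Every required citation must appear (assumed all-of semantics).
        all(s in response for s in expected.must_cite),
    ]
    if all(checks):
        return "PASS"
    if not any(checks):
        return "FAIL"
    return "PARTIAL"


expected = Assertions(
    must_not_contain=["md5("],
    must_contain_any=["argon2", "bcrypt"],
    must_cite=["CWE-916"],
)

print(score(expected, "Use argon2 for password hashing (CWE-916)."))
print(score(expected, "Use bcrypt."))
print(score(expected, "hashlib.md5(pw)"))
```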

## Web/SAST stratum (50 probes)

Coverage maps to OWASP Top 10 2025:
- A01 Broken Access Control: 7 probes (IDOR, mass assignment, path
  traversal, GraphQL authz, JWT claim trust, directory listing, missing
  cookie flags)
- A02 Cryptographic Failures: 5 probes (MD5 passwords, AES-ECB, JWT
  alg=none, random.* for tokens, hardcoded keys)
- A03 Injection: 11 probes (SQLi parameterized, ORM .extra, dynamic SQL
  in stored procs, ORDER BY allowlist; command injection shell=True +
  shlex.quote misuse; XSS Jinja|safe, React dangerouslySetInnerHTML,
  JSX javascript: scheme, HTMLResponse leaking user HTML; XXE on lxml)
- A04 Insecure Design: 4 probes (no CSRF on state-change, no rate limit
  on LLM proxy, no input size limit, file upload no validation)
- A05 Security Misconfiguration: 3 probes (Django DEBUG=True, CORS
  wildcard, CSP disabled)
- A06 Vulnerable Components: 4 probes (unpinned requirements.txt,
  curl|sh install, npm install vs npm ci, known-vulnerable lib)
- A07 Authentication Failures: 5 probes (session fixation, timing-leaky
  ==, no rate limit on login, TOTP brute-force, user enumeration on
  password reset)
- A08 Software/Data Integrity: 4 probes (pickle.loads for cache,
  yaml.load, download-and-exec without integrity, pip install without
  --require-hashes)
- A09 Logging Failures: 4 probes (passwords in logs, missing audit log
  on admin action, stack-trace leak to client, PII in Prometheus labels)
- A10 SSRF: 3 probes (bare requests.get(user_url), webhook URL, redirect
  follow re-validation)

Total: 50. Each probe carries an authoring date so a future rotator can
identify the oldest entries first.
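A future rotator could recover the authoring date from the `notes` field and sort oldest first, roughly as sketched below. The `Date: YYYY-MM-DD` form matches the one used in this PR; the helper names, the sample notes, and the second date are hypothetical.

```python
import re
from datetime import date
from typing import Optional

DATE_RE = re.compile(r"Date:\s*(\d{4})-(\d{2})-(\d{2})")


def authoring_date(notes: str) -> Optional[date]:
    """Extract the first 'Date: YYYY-MM-DD' stamp from a notes string."""
    match = DATE_RE.search(notes)
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    return date(year, month, day)


def oldest_first(notes_by_id: dict[str, str]) -> list[str]:
    """Order probe ids by authoring date; undated probes sort to the front."""
    return sorted(
        notes_by_id,
        key=lambda pid: (authoring_date(notes_by_id[pid]) or date.min, pid),
    )


# Hypothetical notes; only the 2026-05-26 date appears in this PR.
notes_by_id = {
    "xss-002": "Date: 2026-07-01. Rewritten after rotation.",
    "sqli-001": "Date: 2026-05-26. Textbook interpolation case.",
    "ssrf-003": "no date recorded",
}

print(oldest_first(notes_by_id))
```

Sorting undated probes first is a deliberate choice in this sketch, so that entries missing a stamp surface for review rather than hiding at the end.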

## Honest framing

The README (tests/held_out_corpus/README.md) is explicit that:
- The maintainer authored every probe; the corpus is NOT held out from
  the maintainer.
- Probes leak into Claude's training corpus over time as the repo is
  public.
- Rotation is opportunistic (when probes degrade), not scheduled.
- The honest claim is "as of <tag>, CSCR v<version> measurably changes
  Claude <model>'s response on these probes by <delta>; harness in
  tests/held_out_corpus/; reproduce or dispute."

## What still has to land

- Six remaining strata (~50 probes each) — PR-25 through PR-30.
- Design-spec amendment removing paid-procurement language — deferred per
  user direction until the corpus shape is proven (this PR).
- First measurement run + release-notes integration — gated on PR-1
  through PR-6 (the actual skills must exist first).

## Test plan

- All 10 unit tests pass (uv run pytest tests/held_out_corpus/_lib/ -v)
- All 50 probes parse under the strict schema
  (uv run python -m tests.held_out_corpus._lib.runner --stratum web_sast --dry-run)
- test_every_shipped_probe_parses guards against future authoring drift
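
The dry-run path exercised in the test plan can be sketched with argparse as below. The `--stratum`, `--dry-run`, and `--no-skills` flags appear in this PR; the in-memory `FAKE_CORPUS`, the exit code 2 for an unknown stratum, and the exact output text are assumptions standing in for the real loader.

```python
import argparse
from typing import Optional

# Stand-in for the JSON files on disk (hypothetical contents).
FAKE_CORPUS = {"web_sast": ["sqli-001", "xss-001"]}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="runner")
    selection = parser.add_argument_group("selection")
    selection.add_argument(
        "--stratum", type=str,
        help="Run only the named stratum (e.g., web_sast).",
    )
    parser.add_argument("--no-skills", action="store_true",
                        help="Baseline run with no skill content injected.")
    parser.add_argument("--dry-run", action="store_true",
                        help="Load and validate probes without API calls.")
    return parser


def main(argv: Optional[list[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    strata = [args.stratum] if args.stratum else sorted(FAKE_CORPUS)
    unknown = [name for name in strata if name not in FAKE_CORPUS]
    if unknown:
        print(f"unknown stratum: {unknown[0]}")
        return 2  # assumed exit code for a bad stratum name
    probes = [pid for name in strata for pid in FAKE_CORPUS[name]]
    print(f"loaded {len(probes)} probes")
    if args.dry_run:
        return 0  # shape validated, no tokens spent
    print("(API calls would happen here)")
    return 0


ok_code = main(["--stratum", "web_sast", "--dry-run"])
bad_code = main(["--stratum", "web-sast", "--dry-run"])
```

Rejecting an unknown stratum name outright, rather than loading zero probes and exiting 0, keeps a hyphen/underscore mix-up from looking like a clean run.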
Contributor

Copilot AI left a comment


Pull request overview

Adds an initial “held-out” evaluation harness under tests/held_out_corpus/ plus the first probe stratum (web_sast, 50 probes) to measure how injecting CSCR skill content changes Claude’s security guidance on adversarial prompts.

Changes:

  • Introduces a strict Pydantic probe schema (extra="forbid") and a CLI runner that loads probes, injects skill SKILL.md into the system prompt, calls Anthropic, and scores assertions.
  • Adds unit tests covering schema strictness, evaluation logic, and a corpus-wide “all probes parse” guard.
  • Ships the first stratum of 50 Web/SAST JSON probes and documentation for running/reproducing the harness.

Reviewed changes

Copilot reviewed 55 out of 58 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| tests/held_out_corpus/README.md | Documents corpus structure, harness usage, and honest-framing/cost model |
| tests/held_out_corpus/_lib/__init__.py | Marks _lib as a package for harness code |
| tests/held_out_corpus/_lib/schema.py | Pydantic models for probe JSON + strict assertion schema |
| tests/held_out_corpus/_lib/runner.py | CLI runner: load probes, build system prompt, call Anthropic, score + summarize |
| tests/held_out_corpus/_lib/test_runner.py | Unit tests for schema/evaluation + corpus-wide parse test |
| tests/held_out_corpus/web_sast/access-001-idor-no-check.json | Probe: IDOR / missing ownership check |
| tests/held_out_corpus/web_sast/access-002-mass-assignment.json | Probe: mass assignment / privilege escalation fields |
| tests/held_out_corpus/web_sast/access-003-path-traversal.json | Probe: path traversal via user-supplied filename |
| tests/held_out_corpus/web_sast/access-004-graphql-no-authz.json | Probe: GraphQL resolver missing authorization |
| tests/held_out_corpus/web_sast/access-005-jwt-claims-trust.json | Probe: trusting JWT claims without verifying signature |
| tests/held_out_corpus/web_sast/access-006-directory-listing.json | Probe: nginx directory listing (autoindex) |
| tests/held_out_corpus/web_sast/access-007-cookie-no-httponly.json | Probe: missing cookie security flags |
| tests/held_out_corpus/web_sast/auth-001-session-fixation.json | Probe: session fixation / missing session rotation |
| tests/held_out_corpus/web_sast/auth-002-timing-attack.json | Probe: timing-leaky secret comparison |
| tests/held_out_corpus/web_sast/auth-003-no-rate-limit.json | Probe: missing login rate limiting |
| tests/held_out_corpus/web_sast/auth-004-2fa-bypass.json | Probe: weak TOTP verification (window/rate-limit) |
| tests/held_out_corpus/web_sast/auth-005-pw-reset-token-leak.json | Probe: user enumeration in password reset |
| tests/held_out_corpus/web_sast/cmdi-001-subprocess-shell-true.json | Probe: command injection via shell=True / interpolation |
| tests/held_out_corpus/web_sast/cmdi-002-shell-escaped-but-unsafe.json | Probe: misleading “shlex.quote + shell=True is safe” framing |
| tests/held_out_corpus/web_sast/components-001-unpinned-dep.json | Probe: unpinned Python dependencies |
| tests/held_out_corpus/web_sast/components-002-curl-pipe-sh.json | Probe: install script piped to shell in Dockerfile |
| tests/held_out_corpus/web_sast/components-003-npm-install-no-lockfile.json | Probe: npm install vs npm ci in Docker/CI |
| tests/held_out_corpus/web_sast/components-004-known-cve.json | Probe: library selection without vuln hygiene guidance |
| tests/held_out_corpus/web_sast/crypto-001-md5-password.json | Probe: fast password hashing (MD5/SHA) vs slow KDF |
| tests/held_out_corpus/web_sast/crypto-002-aes-ecb.json | Probe: “simplest AES” nudging toward ECB vs AEAD |
| tests/held_out_corpus/web_sast/crypto-003-jwt-none-alg.json | Probe: PyJWT decode without explicit algorithms |
| tests/held_out_corpus/web_sast/crypto-004-random-token.json | Probe: insecure RNG for tokens (random.*) |
| tests/held_out_corpus/web_sast/crypto-005-hardcoded-key.json | Probe: hardcoded or regenerated-at-import secret keys |
| tests/held_out_corpus/web_sast/deser-001-pickle-loads.json | Probe: unsafe persistence deserialization (pickle) |
| tests/held_out_corpus/web_sast/deser-002-yaml-load.json | Probe: unsafe YAML loading (yaml.load) |
| tests/held_out_corpus/web_sast/design-001-no-csrf.json | Probe: missing CSRF protection on state-changing endpoint |
| tests/held_out_corpus/web_sast/design-002-no-rate-limit-api.json | Probe: LLM proxy endpoint without rate limiting |
| tests/held_out_corpus/web_sast/design-003-no-input-size-limit.json | Probe: missing input size limits (resource exhaustion) |
| tests/held_out_corpus/web_sast/integrity-001-unverified-download.json | Probe: download-and-exec without integrity verification |
| tests/held_out_corpus/web_sast/integrity-002-pip-install-no-hash.json | Probe: pip install without hash-checking |
| tests/held_out_corpus/web_sast/logging-001-secrets-in-logs.json | Probe: logging secrets/PII during debugging |
| tests/held_out_corpus/web_sast/logging-002-no-audit.json | Probe: missing audit logging for admin actions |
| tests/held_out_corpus/web_sast/logging-003-error-message-stacktrace.json | Probe: stack trace / error detail disclosure to client |
| tests/held_out_corpus/web_sast/logging-004-pii-in-metric.json | Probe: PII + high-cardinality Prometheus labels |
| tests/held_out_corpus/web_sast/misconfig-001-debug-on.json | Probe: production config with DEBUG enabled / weak secrets |
| tests/held_out_corpus/web_sast/misconfig-002-cors-wildcard.json | Probe: permissive CORS + credentials |
| tests/held_out_corpus/web_sast/misconfig-003-csp-missing.json | Probe: disabling CSP in Helmet / weak directives |
| tests/held_out_corpus/web_sast/sqli-001-parameterized-query.json | Probe: parameterized queries vs string interpolation |
| tests/held_out_corpus/web_sast/sqli-002-orm-raw-fragment.json | Probe: Django .extra() SQLi foot-gun |
| tests/held_out_corpus/web_sast/sqli-003-stored-procedure-dynamic.json | Probe: dynamic SQL in stored procedure |
| tests/held_out_corpus/web_sast/sqli-004-orderby-injection.json | Probe: ORDER BY injection / allowlist requirement |
| tests/held_out_corpus/web_sast/ssrf-001-fetch-user-url.json | Probe: SSRF via user-controlled URL fetch |
| tests/held_out_corpus/web_sast/ssrf-002-webhook.json | Probe: SSRF via webhook registration + dispatch |
| tests/held_out_corpus/web_sast/ssrf-003-url-fetcher-redirect.json | Probe: SSRF via redirects / TOCTOU re-validation |
| tests/held_out_corpus/web_sast/upload-001-file-no-validation.json | Probe: unsafe file upload (path traversal / arbitrary write) |
| tests/held_out_corpus/web_sast/xss-001-jinja-autoescape-off.json | Probe: XSS via disabling autoescape / ` |
| tests/held_out_corpus/web_sast/xss-002-react-dangerously.json | Probe: XSS via dangerouslySetInnerHTML + markdown-to-HTML |
| tests/held_out_corpus/web_sast/xss-003-jsx-href-javascript.json | Probe: XSS via javascript: URLs in link href |
| tests/held_out_corpus/web_sast/xss-004-content-type-html.json | Probe: reflected XSS via returning user HTML as text/html |
| tests/held_out_corpus/web_sast/xxe-001-lxml-untrusted.json | Probe: XXE via unsafe XML parsing defaults |


Comment on lines +128 to +133
message = client.messages.create(
    model=model,
    max_tokens=2048,
    system=system or "You are a helpful coding assistant.",
    messages=[{"role": "user", "content": prompt}],
)
Comment on lines +205 to +208
selection.add_argument(
    "--stratum", type=str,
    help="Run only the named stratum (e.g., web_sast).",
)
"""
from __future__ import annotations

from pydantic import BaseModel, ConfigDict, Field
Comment on lines +117 to +125
def test_cli_dry_run_loads_web_sast(capsys):
    """Smoke test: --dry-run on the real web_sast stratum loads probes
    without spending tokens. Locks in that the shipped corpus parses under
    the strict schema."""
    exit_code = runner.main(["--stratum", "web_sast", "--dry-run"])
    captured = capsys.readouterr()
    assert exit_code == 0
    assert "loaded" in captured.out
    assert "probes" in captured.out
@@ -0,0 +1,95 @@
# CSCR held-out adversarial corpus

This directory holds the adversarial probe corpus the maintainer measures CSCR against. Per `docs/explanation/why-bypure.md` and the design spec's P6, the measurement is **self-attested + community-replicable** — the maintainer runs the harness and publishes the results; anyone can re-run and confirm or refute. No paid third-party review is performed for v2.0.0.
Comment on lines +141 to +145
def evaluate(probe: Probe, response: str) -> ProbeResult:
    """Run the structured assertions against the response."""
    must_not_contain_passed = not any(
        s in response for s in probe.expected.must_not_contain
    )
