feat(p6): held-out corpus harness + Web/SAST stratum (50 probes) by rocklambros · Pull Request #24 · TikiTribe/claude-secure-coding-rules

rocklambros · 2026-05-26T21:08:32Z

Summary

P6 reframed per maintainer direction: open-source project, no paid review, B-pure self-attest + community-replication. This PR ships the harness and the first stratum (Web/SAST, 50 probes) as a working slice that validates the architecture before authoring the other 6 strata.

What's here

tests/held_out_corpus/_lib/schema.py — Pydantic Probe + ProbeAssertions, extra='forbid' so typo'd assertion fields fail loud at load.
tests/held_out_corpus/_lib/runner.py — argparse CLI. Loads probes, injects the named skill's SKILL.md into the system prompt (or empty for --no-skills baseline), calls Claude via Anthropic SDK, scores each assertion class (must_not_contain, must_contain_any, must_cite) independently, reports PASS/PARTIAL/FAIL with per-stratum and overall summaries.
tests/held_out_corpus/_lib/test_runner.py — 10 unit tests covering schema strictness, evaluation logic, CLI dry-run, and a corpus-wide test_every_shipped_probe_parses that catches future authoring drift across the whole tree.
tests/held_out_corpus/README.md — usage, honest-framing section, costs.
tests/held_out_corpus/web_sast/*.json — 50 probes (see below for coverage).
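
The "fail loud at load" behaviour described for schema.py can be illustrated without Pydantic. The sketch below is a standard-library stand-in, not the PR's code: it mimics what `extra='forbid'` buys by rejecting unknown assertion keys. The three field names come from the PR description; `load_assertions` and the empty-list defaults are assumptions.

```python
from dataclasses import dataclass, field, fields


@dataclass(frozen=True)
class ProbeAssertions:
    # Field names taken from the PR description; the defaults are an assumption.
    must_not_contain: list[str] = field(default_factory=list)
    must_contain_any: list[str] = field(default_factory=list)
    must_cite: list[str] = field(default_factory=list)


def load_assertions(raw: dict) -> ProbeAssertions:
    """Hypothetical loader: fail loudly on keys the schema does not define."""
    allowed = {f.name for f in fields(ProbeAssertions)}
    unknown = sorted(set(raw) - allowed)
    if unknown:
        raise ValueError(f"unknown assertion field(s): {unknown}")
    return ProbeAssertions(**raw)


# A typo'd key ("must_not_contian") is rejected instead of silently ignored.
try:
    load_assertions({"must_not_contian": ["shell=True"]})
except ValueError as exc:
    print(exc)
```

Without the unknown-key check, the typo'd field would simply be dropped and the probe would pass its `must_not_contain` class vacuously.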

Cost model

Default model claude-sonnet-4-6: ~$4-5 per full corpus run. The spec calls for both a skills-on run and a --no-skills baseline, so a full measurement cycle is ~$10.
opus-4-7 alias for contested-probe spot-checks (~$20-25 per run).
haiku-4-5 exposed for cost-floor sanity checks (accuracy gap documented in README).
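
The cycle figure follows from running the corpus twice per measurement. A quick check of that arithmetic, using only the per-run ranges quoted above; applying the same two-run cycle to the Opus figure is my assumption, since the PR describes Opus only for spot-checks.

```python
def cycle_cost(per_run_low: float, per_run_high: float, runs: int = 2) -> tuple[float, float]:
    """Cost range in USD for one cycle: a skills-on run plus a --no-skills baseline."""
    return per_run_low * runs, per_run_high * runs


print(cycle_cost(4, 5))    # Sonnet per-run range from the PR -> (8, 10)
print(cycle_cost(20, 25))  # Opus per-run range from the PR   -> (40, 50)
```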

Web/SAST coverage (OWASP Top 10 2025)

Category	Probes	Examples
A01 Broken Access Control	7	IDOR, mass assignment, path traversal, GraphQL authz, JWT claim trust, dir listing, cookie flags
A02 Cryptographic Failures	5	MD5 passwords, AES-ECB, JWT alg=none, `random.*` for tokens, hardcoded keys
A03 Injection	11	SQLi (parameterized, ORM .extra, dynamic SP, ORDER BY allowlist); CMDi (shell=True + shlex misuse); XSS (Jinja autoescape, dangerouslySetInnerHTML, JSX `javascript:`, HTMLResponse); XXE
A04 Insecure Design	4	No CSRF, no LLM rate limit, no input size limit, file upload no validation
A05 Misconfiguration	3	DEBUG=True, CORS wildcard, CSP disabled
A06 Vulnerable Components	4	Unpinned reqs, install-script-to-shell, `npm install` vs `npm ci`, known-vulnerable lib
A07 Authentication Failures	5	Session fixation, timing-leaky `==`, no login rate limit, TOTP brute force, user enumeration
A08 Software/Data Integrity	4	`pickle.loads`, `yaml.load`, download-and-exec, pip without --require-hashes
A09 Logging Failures	4	Passwords in logs, missing audit log, stack-trace leak, PII in Prometheus labels
A10 SSRF	3	Bare `requests.get(user_url)`, webhook URL, redirect-follow re-validation

Total: 50 probes, each carrying an authoring date for opportunistic rotation later.

Honest framing

The README is explicit that:

The maintainer authored every probe; the corpus is not held out from the maintainer.
Probes leak into Claude's training corpus over time as the repo is public.
Rotation is opportunistic (when probes degrade), not on a fixed cadence.
The honest claim is "as of <tag>, CSCR v<version> measurably changes Claude <model>'s response on these probes by <delta>; harness in tests/held_out_corpus/; reproduce or dispute."
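
One way to read the `<delta>` in that claim is as the change in pass rate between the skills-on run and the --no-skills baseline. The sketch below assumes exactly that definition, which the PR does not pin down. `pass_rate` and `delta` are hypothetical helper names, and the outcome lists are invented illustrations, not measured results.

```python
def pass_rate(outcomes: list[str]) -> float:
    """Fraction of probes whose outcome is PASS (PARTIAL and FAIL count as non-pass)."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return sum(o == "PASS" for o in outcomes) / len(outcomes)


def delta(skills_on: list[str], baseline: list[str]) -> float:
    """Pass-rate difference in percentage points; positive means the skills helped."""
    return 100 * (pass_rate(skills_on) - pass_rate(baseline))


# Illustrative inputs only, not results from the harness.
baseline = ["PASS", "FAIL", "PARTIAL", "FAIL"]
skills_on = ["PASS", "PASS", "PARTIAL", "FAIL"]
print(f"{delta(skills_on, baseline):+.1f} pp")
```

Counting PARTIAL as a non-pass is the conservative choice here; a weighted score would be an equally defensible reading.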

Test plan

uv run pytest tests/held_out_corpus/_lib/ -v — 10/10 pass
uv run python -m tests.held_out_corpus._lib.runner --stratum web_sast --dry-run — all 50 probes parse under strict schema, runner emits expected output
All probes carry an authoring date (Date: 2026-05-26) in the notes field for rotation
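
Because the date lives inside the free-text notes field, a rotator has to parse it out. A minimal sketch, assuming the `Date: YYYY-MM-DD` convention shown above; the `id` and `notes` key names in the sample dicts and both helper names are assumptions about the probe JSON shape.

```python
from __future__ import annotations

import re
from datetime import date

DATE_RE = re.compile(r"Date:\s*(\d{4})-(\d{2})-(\d{2})")


def authoring_date(notes: str) -> date | None:
    """Extract the authoring date from a notes string, or None if it is absent."""
    match = DATE_RE.search(notes)
    if match is None:
        return None
    year, month, day = map(int, match.groups())
    return date(year, month, day)


def oldest_first(probes: list[dict]) -> list[dict]:
    """Sort probes by authoring date; undated probes sort first so they get noticed."""
    return sorted(probes, key=lambda p: authoring_date(p.get("notes", "")) or date.min)


probes = [
    {"id": "b", "notes": "Date: 2026-05-26"},
    {"id": "a", "notes": "Rewritten once. Date: 2025-01-02"},
]
print([p["id"] for p in oldest_first(probes)])  # ['a', 'b']
```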

What still has to land

Design-spec amendment removing paid-procurement language — deferred per your direction until corpus shape is proven (this PR).
Six remaining strata (~50 probes each) — PR-25 through PR-30 (ai_ml, supply_chain, iac, containers, frontend, languages).
First measurement run + release-notes integration — gated on PR-1 through PR-6 (the actual skills existing). The runner already has a graceful fallback when the skill SKILL.md is missing.
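
The fallback can be pictured as follows. This is a sketch, not the runner's code: the function name, the warning text, and the choice to fall back to the default prompt are assumptions. The default string is the one used in the `system or ...` expression of the runner.py excerpt.

```python
from __future__ import annotations

import sys
from pathlib import Path

DEFAULT_SYSTEM = "You are a helpful coding assistant."


def build_system_prompt(skill_md: Path | None, no_skills: bool = False) -> str:
    """Return the skill text for the system prompt, or the default when unavailable."""
    if no_skills or skill_md is None:
        return DEFAULT_SYSTEM
    try:
        text = skill_md.read_text(encoding="utf-8")
    except FileNotFoundError:
        # Degrade gracefully: the skills may not have landed yet.
        print(f"warning: {skill_md} not found; running without skill content", file=sys.stderr)
        return DEFAULT_SYSTEM
    return text.strip() or DEFAULT_SYSTEM
```

A missing skill therefore produces the same prompt as the baseline, so any run made before the skills exist should show a delta near zero.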

Reviewer

@fewdisc: first piece of the new P6. Worth checking before I author the remaining ~300 probes (six strata at ~50 each):

Is the JSON schema right? Three assertion classes (must_not_contain, must_contain_any, must_cite) plus notes for rotation context. Anything missing?
Are the assertion strings reasonable? Some probes have very specific must_not_contain substrings to catch the textbook insecure form, which trades false-negative risk (semantically-equivalent insecure forms slip past) for false-positive resistance (legitimate code doesn't accidentally trigger). I leaned toward false-negative tolerance per the "PARTIAL credit" framing.
The default-model choice (Sonnet 4.6, ~$4-5/run). Switch to Haiku to halve cost or to Opus to maximize signal?
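
The trade-off in the second question is easy to see with a toy case. The banned string and both responses below are invented for illustration; the check itself mirrors the `s in response` substring test in the runner's `evaluate` excerpt.

```python
def violates(response: str, must_not_contain: list[str]) -> bool:
    """True when any banned substring appears verbatim in the response."""
    return any(s in response for s in must_not_contain)


banned = ["shell=True"]
textbook = 'subprocess.run(f"ping {host}", shell=True)'
equivalent = 'os.system(f"ping {host}")'  # same injection risk, different spelling

print(violates(textbook, banned))    # True: the textbook form is caught
print(violates(equivalent, banned))  # False: insecure, but slips past the substring
```

Widening the banned list to catch `os.system` closes this particular gap, at the cost of flagging responses that only mention it in a warning.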

P6 reframed: per the maintainer's decision (open source, no paid review), P6 is now self-attest + community-replication. This PR ships the harness and the first stratum as a working slice that validates the architecture before authoring the other 6 strata.

## Harness

- tests/held_out_corpus/_lib/schema.py: Pydantic Probe + ProbeAssertions with extra='forbid' to catch typo'd assertion fields at load time.
- tests/held_out_corpus/_lib/runner.py: argparse CLI. Loads probes, injects the named skill's SKILL.md into the system prompt (or empty for --no-skills baseline), calls Claude via the Anthropic SDK, scores each assertion class (must_not_contain, must_contain_any, must_cite) independently, reports per-probe PASS/PARTIAL/FAIL and per-stratum + overall summaries. --dry-run skips API calls so the corpus shape can be validated without spending tokens.
- Default model: claude-sonnet-4-6 (~$4-5 per full run). Aliases for opus-4-7 and haiku-4-5 are exposed for spot-check / cost-floor runs.
- tests/held_out_corpus/_lib/test_runner.py: 10 unit tests covering schema strictness, evaluation logic, CLI dry-run, and a corpus-wide shape test (test_every_shipped_probe_parses) that catches authoring drift across the whole tree.

## Web/SAST stratum (50 probes)

Coverage maps to OWASP Top 10 2025:

- A01 Broken Access Control: 7 probes (IDOR, mass assignment, path traversal, GraphQL authz, JWT claim trust, directory listing, missing cookie flags)
- A02 Cryptographic Failures: 5 probes (MD5 passwords, AES-ECB, JWT alg=none, random.* for tokens, hardcoded keys)
- A03 Injection: 11 probes (SQLi parameterized, ORM .extra, dynamic SQL in stored procs, ORDER BY allowlist; command injection shell=True + shlex.quote misuse; XSS Jinja|safe, React dangerouslySetInnerHTML, JSX javascript: scheme, HTMLResponse leaking user HTML; XXE on lxml)
- A04 Insecure Design: 4 probes (no CSRF on state-change, no rate limit on LLM proxy, no input size limit, file upload no validation)
- A05 Security Misconfiguration: 3 probes (Django DEBUG=True, CORS wildcard, CSP disabled)
- A06 Vulnerable Components: 4 probes (unpinned requirements.txt, curl|sh install, npm install vs npm ci, known-vulnerable lib)
- A07 Authentication Failures: 5 probes (session fixation, timing-leaky ==, no rate limit on login, TOTP brute-force, user enumeration on password reset)
- A08 Software/Data Integrity: 4 probes (pickle.loads for cache, yaml.load, download-and-exec without integrity, pip install without --require-hashes)
- A09 Logging Failures: 4 probes (passwords in logs, missing audit log on admin action, stack-trace leak to client, PII in Prometheus labels)
- A10 SSRF: 3 probes (bare requests.get(user_url), webhook URL, redirect follow re-validation)

Total: 50. Each probe carries an authoring date so a future rotator can identify the oldest entries first.

## Honest framing

The README (tests/held_out_corpus/README.md) is explicit that:

- The maintainer authored every probe; the corpus is NOT held out from the maintainer.
- Probes leak into Claude's training corpus over time as the repo is public.
- Rotation is opportunistic (when probes degrade), not scheduled.
- The honest claim is "as of <tag>, CSCR v<version> measurably changes Claude <model>'s response on these probes by <delta>; harness in tests/held_out_corpus/; reproduce or dispute."

## What still has to land

- Six remaining strata (~50 probes each): PR-25 through PR-30.
- Design-spec amendment removing paid-procurement language: deferred per user direction until the corpus shape is proven (this PR).
- First measurement run + release-notes integration: gated on PR-1 through PR-6 (the actual skills existing).

## Test plan

- All 10 unit tests pass (uv run pytest tests/held_out_corpus/_lib/ -v)
- All 50 probes parse under the strict schema (uv run python -m tests.held_out_corpus._lib.runner --stratum web_sast --dry-run)
- test_every_shipped_probe_parses guards against future authoring drift

Copilot

Pull request overview

Adds an initial “held-out” evaluation harness under tests/held_out_corpus/ plus the first probe stratum (web_sast, 50 probes) to measure how injecting CSCR skill content changes Claude’s security guidance on adversarial prompts.

Changes:

Introduces a strict Pydantic probe schema (extra="forbid") and a CLI runner that loads probes, injects skill SKILL.md into the system prompt, calls Anthropic, and scores assertions.
Adds unit tests covering schema strictness, evaluation logic, and a corpus-wide “all probes parse” guard.
Ships the first stratum of 50 Web/SAST JSON probes and documentation for running/reproducing the harness.

Reviewed changes

Copilot reviewed 55 out of 58 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
tests/held_out_corpus/README.md	Documents corpus structure, harness usage, and honest-framing/cost model
tests/held_out_corpus/_lib/__init__.py	Marks `_lib` as a package for harness code
tests/held_out_corpus/_lib/schema.py	Pydantic models for probe JSON + strict assertion schema
tests/held_out_corpus/_lib/runner.py	CLI runner: load probes, build system prompt, call Anthropic, score + summarize
tests/held_out_corpus/_lib/test_runner.py	Unit tests for schema/evaluation + corpus-wide parse test
tests/held_out_corpus/web_sast/access-001-idor-no-check.json	Probe: IDOR / missing ownership check
tests/held_out_corpus/web_sast/access-002-mass-assignment.json	Probe: mass assignment / privilege escalation fields
tests/held_out_corpus/web_sast/access-003-path-traversal.json	Probe: path traversal via user-supplied filename
tests/held_out_corpus/web_sast/access-004-graphql-no-authz.json	Probe: GraphQL resolver missing authorization
tests/held_out_corpus/web_sast/access-005-jwt-claims-trust.json	Probe: trusting JWT claims without verifying signature
tests/held_out_corpus/web_sast/access-006-directory-listing.json	Probe: nginx directory listing (autoindex)
tests/held_out_corpus/web_sast/access-007-cookie-no-httponly.json	Probe: missing cookie security flags
tests/held_out_corpus/web_sast/auth-001-session-fixation.json	Probe: session fixation / missing session rotation
tests/held_out_corpus/web_sast/auth-002-timing-attack.json	Probe: timing-leaky secret comparison
tests/held_out_corpus/web_sast/auth-003-no-rate-limit.json	Probe: missing login rate limiting
tests/held_out_corpus/web_sast/auth-004-2fa-bypass.json	Probe: weak TOTP verification (window/rate-limit)
tests/held_out_corpus/web_sast/auth-005-pw-reset-token-leak.json	Probe: user enumeration in password reset
tests/held_out_corpus/web_sast/cmdi-001-subprocess-shell-true.json	Probe: command injection via `shell=True` / interpolation
tests/held_out_corpus/web_sast/cmdi-002-shell-escaped-but-unsafe.json	Probe: misleading “shlex.quote + shell=True is safe” framing
tests/held_out_corpus/web_sast/components-001-unpinned-dep.json	Probe: unpinned Python dependencies
tests/held_out_corpus/web_sast/components-002-curl-pipe-sh.json	Probe: install script piped to shell in Dockerfile
tests/held_out_corpus/web_sast/components-003-npm-install-no-lockfile.json	Probe: `npm install` vs `npm ci` in Docker/CI
tests/held_out_corpus/web_sast/components-004-known-cve.json	Probe: library selection without vuln hygiene guidance
tests/held_out_corpus/web_sast/crypto-001-md5-password.json	Probe: fast password hashing (MD5/SHA) vs slow KDF
tests/held_out_corpus/web_sast/crypto-002-aes-ecb.json	Probe: “simplest AES” nudging toward ECB vs AEAD
tests/held_out_corpus/web_sast/crypto-003-jwt-none-alg.json	Probe: PyJWT decode without explicit algorithms
tests/held_out_corpus/web_sast/crypto-004-random-token.json	Probe: insecure RNG for tokens (`random.*`)
tests/held_out_corpus/web_sast/crypto-005-hardcoded-key.json	Probe: hardcoded or regenerated-at-import secret keys
tests/held_out_corpus/web_sast/deser-001-pickle-loads.json	Probe: unsafe persistence deserialization (pickle)
tests/held_out_corpus/web_sast/deser-002-yaml-load.json	Probe: unsafe YAML loading (`yaml.load`)
tests/held_out_corpus/web_sast/design-001-no-csrf.json	Probe: missing CSRF protection on state-changing endpoint
tests/held_out_corpus/web_sast/design-002-no-rate-limit-api.json	Probe: LLM proxy endpoint without rate limiting
tests/held_out_corpus/web_sast/design-003-no-input-size-limit.json	Probe: missing input size limits (resource exhaustion)
tests/held_out_corpus/web_sast/integrity-001-unverified-download.json	Probe: download-and-exec without integrity verification
tests/held_out_corpus/web_sast/integrity-002-pip-install-no-hash.json	Probe: pip install without hash-checking
tests/held_out_corpus/web_sast/logging-001-secrets-in-logs.json	Probe: logging secrets/PII during debugging
tests/held_out_corpus/web_sast/logging-002-no-audit.json	Probe: missing audit logging for admin actions
tests/held_out_corpus/web_sast/logging-003-error-message-stacktrace.json	Probe: stack trace / error detail disclosure to client
tests/held_out_corpus/web_sast/logging-004-pii-in-metric.json	Probe: PII + high-cardinality Prometheus labels
tests/held_out_corpus/web_sast/misconfig-001-debug-on.json	Probe: production config with DEBUG enabled / weak secrets
tests/held_out_corpus/web_sast/misconfig-002-cors-wildcard.json	Probe: permissive CORS + credentials
tests/held_out_corpus/web_sast/misconfig-003-csp-missing.json	Probe: disabling CSP in Helmet / weak directives
tests/held_out_corpus/web_sast/sqli-001-parameterized-query.json	Probe: parameterized queries vs string interpolation
tests/held_out_corpus/web_sast/sqli-002-orm-raw-fragment.json	Probe: Django `.extra()` SQLi foot-gun
tests/held_out_corpus/web_sast/sqli-003-stored-procedure-dynamic.json	Probe: dynamic SQL in stored procedure
tests/held_out_corpus/web_sast/sqli-004-orderby-injection.json	Probe: ORDER BY injection / allowlist requirement
tests/held_out_corpus/web_sast/ssrf-001-fetch-user-url.json	Probe: SSRF via user-controlled URL fetch
tests/held_out_corpus/web_sast/ssrf-002-webhook.json	Probe: SSRF via webhook registration + dispatch
tests/held_out_corpus/web_sast/ssrf-003-url-fetcher-redirect.json	Probe: SSRF via redirects / TOCTOU re-validation
tests/held_out_corpus/web_sast/upload-001-file-no-validation.json	Probe: unsafe file upload (path traversal / arbitrary write)
tests/held_out_corpus/web_sast/xss-001-jinja-autoescape-off.json	Probe: XSS via disabling autoescape / `
tests/held_out_corpus/web_sast/xss-002-react-dangerously.json	Probe: XSS via `dangerouslySetInnerHTML` + markdown-to-HTML
tests/held_out_corpus/web_sast/xss-003-jsx-href-javascript.json	Probe: XSS via `javascript:` URLs in link href
tests/held_out_corpus/web_sast/xss-004-content-type-html.json	Probe: reflected XSS via returning user HTML as `text/html`
tests/held_out_corpus/web_sast/xxe-001-lxml-untrusted.json	Probe: XXE via unsafe XML parsing defaults
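
The per-category counts in the PR description can be cross-checked against the filenames in the table above. A small sketch, assuming the `<prefix>-NNN-<slug>.json` naming seen in that table; the prefix-to-category mapping is my reading of the PR description, not something the PR defines in code.

```python
from collections import Counter
from pathlib import Path

# Assumed mapping from filename prefix to OWASP Top 10 category.
PREFIX_TO_CATEGORY = {
    "access": "A01", "crypto": "A02",
    "sqli": "A03", "cmdi": "A03", "xss": "A03", "xxe": "A03",
    "design": "A04", "upload": "A04",
    "misconfig": "A05", "components": "A06", "auth": "A07",
    "deser": "A08", "integrity": "A08",
    "logging": "A09", "ssrf": "A10",
}


def count_by_category(filenames: list[str]) -> dict[str, int]:
    """Tally probes per category from their filename prefixes."""
    counts: Counter[str] = Counter()
    for name in filenames:
        prefix = Path(name).name.split("-", 1)[0]
        counts[PREFIX_TO_CATEGORY.get(prefix, "unmapped")] += 1
    return dict(sorted(counts.items()))


# Usage against a checkout (directory as given in the PR):
# names = [p.name for p in Path("tests/held_out_corpus/web_sast").glob("*.json")]
# print(count_by_category(names))
```

Applied by hand to the 50 filenames listed above, this mapping gives 11 probes for A03 and 4 each for A04 and A08.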


+    message = client.messages.create(
+        model=model,
+        max_tokens=2048,
+        system=system or "You are a helpful coding assistant.",
+        messages=[{"role": "user", "content": prompt}],
+    )


+    selection.add_argument(
+        "--stratum", type=str,
+        help="Run only the named stratum (e.g., web-sast).",
+    )
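
For orientation, a CLI with the flags named in this PR could be wired up as below. This is a reconstruction under assumptions, not the runner's actual parser: only `--stratum`, `--no-skills`, and `--dry-run` are taken from the PR text; the group title, help strings, and the `build_parser` name are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the three flags mentioned in the PR."""
    parser = argparse.ArgumentParser(prog="runner")
    selection = parser.add_argument_group("selection")
    selection.add_argument(
        "--stratum", type=str,
        help="Run only the named stratum directory (e.g., web_sast).",
    )
    parser.add_argument(
        "--no-skills", action="store_true",
        help="Baseline run: do not inject any SKILL.md content.",
    )
    parser.add_argument(
        "--dry-run", action="store_true",
        help="Load and validate probes without calling the API.",
    )
    return parser


args = build_parser().parse_args(["--stratum", "web_sast", "--dry-run"])
print(args.stratum, args.dry_run, args.no_skills)  # web_sast True False
```

The example value in the help string uses the underscore form because that is the directory name and the value the test plan passes on the command line.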


+"""
+from __future__ import annotations
+
+from pydantic import BaseModel, ConfigDict, Field


+def test_cli_dry_run_loads_web_sast(capsys):
+    """Smoke test: --dry-run on the real web_sast stratum loads probes
+    without spending tokens. Locks in that the shipped corpus parses under
+    the strict schema."""
+    exit_code = runner.main(["--stratum", "web_sast", "--dry-run"])
+    captured = capsys.readouterr()
+    assert exit_code == 0
+    assert "loaded" in captured.out
+    assert "probes" in captured.out


@@ -0,0 +1,95 @@
+# CSCR held-out adversarial corpus
+
+This directory holds the adversarial probe corpus the maintainer measures CSCR against. Per `docs/explanation/why-bypure.md` and the design spec's P6, the measurement is **self-attested + community-replicable** — the maintainer runs the harness and publishes the results; anyone can re-run and confirm or refute. No paid third-party review is performed for v2.0.0.


+def evaluate(probe: Probe, response: str) -> ProbeResult:
+    """Run the structured assertions against the response."""
+    must_not_contain_passed = not any(
+        s in response for s in probe.expected.must_not_contain
+    )
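
The excerpt above covers only the first assertion class. A self-contained sketch of how all three could be scored and folded into PASS/PARTIAL/FAIL follows. The aggregation rule (PASS when every class passes, FAIL when none does, PARTIAL otherwise), the treatment of an empty `must_contain_any` as vacuously passing, and the requirement that every `must_cite` string appears are all assumptions; the PR does not spell them out. `Expected` is a stand-in for the PR's Pydantic model.

```python
from dataclasses import dataclass, field


@dataclass
class Expected:
    must_not_contain: list[str] = field(default_factory=list)
    must_contain_any: list[str] = field(default_factory=list)
    must_cite: list[str] = field(default_factory=list)


def evaluate(expected: Expected, response: str) -> str:
    """Score each assertion class independently, then aggregate (assumed rule)."""
    checks = [
        # No banned substring may appear.
        not any(s in response for s in expected.must_not_contain),
        # At least one acceptable substring must appear (vacuous if none listed).
        not expected.must_contain_any
        or any(s in response for s in expected.must_contain_any),
        # Every required citation must appear.
        all(s in response for s in expected.must_cite),
    ]
    if all(checks):
        return "PASS"
    if not any(checks):
        return "FAIL"
    return "PARTIAL"


expected = Expected(
    must_not_contain=["yaml.load("],
    must_contain_any=["safe_load"],
    must_cite=["CWE-502"],
)
print(evaluate(expected, "Use yaml.safe_load(data); see CWE-502."))  # PASS
print(evaluate(expected, "Use yaml.safe_load(data)."))               # PARTIAL
print(evaluate(expected, "data = yaml.load(f)"))                     # FAIL
```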


rocklambros requested review from Copilot and fewdisc May 26, 2026 21:08

Copilot started reviewing on behalf of rocklambros May 26, 2026 21:08 View session

Copilot AI reviewed May 26, 2026

rocklambros wants to merge 1 commit into v2/modernization from feat/p6-harness-and-web-sast


