Attack-surface catalog: schema, tooling, and 70-case bootstrap by jorgeraad · Pull Request #17 · pensarai/argus-validation-benchmarks

jorgeraad · 2026-05-08T12:54:37Z

Summary

Implements the attack-surface catalog design doc — a per-benchmark attack-surface.json covering every entry point in argus's 70 benchmarks (60 APEX + 10 TM-APP), backed by a schema, validator, and @pensar/surface-driven bootstrapper.

The catalog answers a question the existing GT files do not: "what is every door into this app?" It sits alongside "which vulnerabilities exist?" (expected_results/*vulns.json) and "what threat model should an analyst produce?" (ground-truth.json).

Phase 1 — Schema & tooling

attack-surface-schema.json — JSON Schema (Draft 2020-12) with a kind discriminator (http, web, asset, grpc, graphql, event_handler) defaulting to "http". Strictly additive: existing apex/argus emissions stay valid as kind: "http" records.
scripts/validate-attack-surface.ts — CI validator. Implements all seven design checks: schema validation, canary, file resolution, per-kind source-grep, duplicate identity, service refs, and the D3 empty-catalog gate (empty entry_points requires top-level notes ≥ 20 chars).
scripts/bootstrap-attack-surface.ts — surface map driver with the auto-validate-and-iterate loop from the spec. Maps EndpointInfo[] → EntryPoint[], drops unresolvable entries, merges duplicates, defaults single-service refs. Prints a one-line per-case diagnostic to stdout (frameworks, raw vs. kept counts, auto-fix tallies) as the curator handoff.
.github/workflows/validate-attack-surface.yml — runs the validator on every PR touching catalogs, the schema, the scripts, or the lockfile.
package.json + bun.lock + tsconfig.json — pin @pensar/surface@0.2.1 (D4: exact-pin, no ^) plus ajv for schema validation. Bun is the script runner.
README.md — documents the new ground-truth layer with a comparison table of the three GT artifacts (catalog vs. vulnerability vs. threat-model) and the curator workflow.
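To make the kind discriminator and its "http" default concrete, here is a hypothetical TypeScript view of a catalog record. Only the six kind values, the "http" default, entry_points, top-level notes, handler, and the idea of file and service refs come from this PR. The methods, path, and identifier fields, and the HttpEntry/OtherEntry split itself, are assumptions for illustration, not the contents of attack-surface-schema.json.

```typescript
// Hypothetical TypeScript view of an attack-surface.json record.
// From the PR: the six kind values, the "http" default, entry_points,
// top-level notes, handler, and file/service refs. Everything else is assumed.
type Kind = "http" | "web" | "asset" | "grpc" | "graphql" | "event_handler";

interface CommonFields {
  file: string;      // source file, relative to the benchmark root
  handler?: string;  // genuine handler name only (noise names are dropped)
  service?: string;  // service ref (assumed field name)
}

// kind may be omitted on HTTP records; the schema default makes them "http".
interface HttpEntry extends CommonFields {
  kind?: "http";
  methods: string[]; // assumed: HTTP method set
  path: string;      // assumed: route path
}

// Every other kind must state what it is explicitly.
interface OtherEntry extends CommonFields {
  kind: Exclude<Kind, "http">;
  identifier: string; // assumed: route, RPC, operation, asset, or event name
}

type EntryPoint = HttpEntry | OtherEntry;

interface AttackSurfaceCatalog {
  entry_points: EntryPoint[];
  notes?: string; // must be >= 20 chars when entry_points is empty (D3)
}

// Resolve the effective kind, applying the "http" default.
function kindOf(ep: EntryPoint): Kind {
  return ep.kind ?? "http";
}

// Type guard: true for explicit and implicit (kind-less) HTTP records.
function isHttp(ep: EntryPoint): ep is HttpEntry {
  return kindOf(ep) === "http";
}
```

Making kind optional only on the HTTP branch mirrors the "strictly additive" property: a legacy record with no kind still type-checks and is read as HTTP, while every newer kind has to name itself.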

Phase 2 — 70-case catalog bootstrap

Every benchmark now has an attack-surface.json, generated by running bun run bootstrap:attack-surface --all against the repo:

62 cases populated with HTTP / web entries that surface successfully extracted (Express, Next.js, Django, Apollo-on-Express, Spring, Gin, …).
8 cases ship empty entry_points: [] per the D3 rule (with explanatory notes): APEX-011, APEX-016, APEX-026, APEX-037 (custom PHP / proxy / no-framework cases), and TM-APP-003, TM-APP-004, TM-APP-005, TM-APP-007 (gRPC / Lambda / non-HTTP infrastructure).

These are draft ground truth — schema-valid and source-traceable, not hand-verified. Curators re-run the bootstrap locally to see the per-case diagnostic line and identify gaps to fill (gRPC RPCs, GraphQL operations, Lambda triggers, cloud assets, custom-PHP routes).

CI status: bun run validate:attack-surface reports 70 valid · 0 missing catalog · 0 errors against the bootstrapped corpus.

Out of scope

This PR is argus-side only. The evalgate-side work (matcher kind dispatch, judge integration, suite expansion, submodule bump — phase 3 / T6–T9 in the design doc) lands in evalgate after this merges.

Verification

bun install
bun run validate:attack-surface           # 70 valid · 0 errors
bun run bootstrap:attack-surface --dry-run benchmarks/APEX-019-25
bun run bootstrap:attack-surface benchmarks/APEX-019-25 --force
bunx tsc --noEmit                          # types clean

Manually verified that the validator catches the five common failure modes:

wrong/missing canary
duplicate (kind, method-set, path) tuples
unresolvable file references
path not appearing in the cited file
empty entry_points with no rationale
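Two of these checks fit in a few lines of TypeScript: the duplicate-identity check over (kind, method-set, path) tuples and the D3 empty-catalog gate. This is an illustrative sketch, not scripts/validate-attack-surface.ts. The Entry and Catalog field names are assumptions, and normalizing methods (upper-casing, de-duplicating, sorting) before comparing is a design choice this PR does not specify.

```typescript
// Sketch of two validator checks; Entry/Catalog field names are assumptions.
interface Entry {
  kind?: string;      // missing means "http" (the schema default)
  methods?: string[];
  path: string;
}

interface Catalog {
  entry_points: Entry[];
  notes?: string;
}

// Identity tuple (kind, method-set, path). Methods are upper-cased,
// de-duplicated, and sorted so ["post", "GET"] and ["GET", "POST"] collide.
function identityKey(e: Entry): string {
  const methods = Array.from(new Set((e.methods ?? []).map((m) => m.toUpperCase()))).sort();
  return JSON.stringify([e.kind ?? "http", methods, e.path]);
}

function checkCatalog(c: Catalog): string[] {
  const errors: string[] = [];

  // D3 gate: an empty catalog must carry a top-level rationale of >= 20 chars.
  if (c.entry_points.length === 0 && (c.notes ?? "").length < 20) {
    errors.push("empty entry_points requires top-level notes of at least 20 characters");
  }

  // Duplicate identity: report every entry whose tuple was already seen.
  const firstIndex = new Map<string, number>();
  c.entry_points.forEach((e, i) => {
    const key = identityKey(e);
    const first = firstIndex.get(key);
    if (first === undefined) {
      firstIndex.set(key, i);
    } else {
      errors.push(`entry_points[${i}] duplicates entry_points[${first}]`);
    }
  });

  return errors;
}
```

For example, checkCatalog({ entry_points: [] }) returns the D3 error, while the same call with a notes string explaining a custom PHP router returns no errors.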

Notes for reviewers

Per D5 (single PR), the schema, scripts, CI, README, and all 70 catalogs land together. Hand-curation of individual cases is intentionally a follow-up; the bootstrap's stdout diagnostic line is the curator's TODO list for that case.
The bootstrap drops surface's noise handler names (req, res, <anonymous>), so handler only carries genuine names. Surface-emitted source paths are prefixed with src/ to match the catalog's "relative to benchmark root" convention.
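A minimal sketch of those two clean-ups, assuming surface reports a handler name and a file path relative to the src/ tree it scanned. RawEndpoint, cleanHandler, toCatalogPath, and normalizeEndpoint are hypothetical names, not the bootstrap's actual functions.

```typescript
import * as path from "node:path";

// Hypothetical raw record from surface; the real EndpointInfo shape may differ.
interface RawEndpoint {
  handler?: string;
  file: string; // relative to the src/ tree that surface scanned
}

interface NormalizedEndpoint {
  handler?: string;
  file: string; // relative to the benchmark root
}

// Names surface emits when it cannot find a real handler identifier.
const NOISE_HANDLERS = new Set(["req", "res", "<anonymous>"]);

function cleanHandler(name: string | undefined): string | undefined {
  const trimmed = name?.trim();
  return trimmed && !NOISE_HANDLERS.has(trimmed) ? trimmed : undefined;
}

// Re-root a src/-relative path at the benchmark root, using forward slashes.
function toCatalogPath(file: string): string {
  return path.posix.join("src", file.replace(/\\/g, "/"));
}

function normalizeEndpoint(raw: RawEndpoint): NormalizedEndpoint {
  const handler = cleanHandler(raw.handler);
  const file = toCatalogPath(raw.file);
  // Omit the key entirely rather than writing handler: undefined.
  return handler === undefined ? { file } : { handler, file };
}
```

Using path.posix keeps catalog paths slash-separated even when the bootstrap runs on Windows, so the same catalog validates on any CI runner.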

Introduce the per-benchmark attack-surface catalog tooling described in the attack-surface-catalog design doc:

- attack-surface-schema.json: JSON Schema (Draft 2020-12) for the catalog, with a 'kind' discriminator (http, web, asset, grpc, graphql, event_handler) defaulting to 'http' so existing apex/argus emissions stay valid as kind: 'http' records.
- scripts/validate-attack-surface.ts: CI validator. Runs schema validation, canary check, file reference resolution, per-kind source-grep verification, duplicate identity check, service reference check, and the D3 empty-catalog gate (empty entry_points requires top-level notes).
- scripts/bootstrap-attack-surface.ts: surface-driven catalog bootstrapper. Wraps @pensar/surface, maps EndpointInfo[] -> EntryPoint[], runs auto-fix-and-iterate loop (drop file/grep failures, merge duplicates, default single-service refs), and writes a sibling attack-surface.curator-notes.md with bootstrap diagnostics + curator TODOs.
- .github/workflows/validate-attack-surface.yml: runs the validator on every PR touching catalogs, the schema, or the scripts.
- package.json + bun.lock + tsconfig.json: pinned @pensar/surface@0.2.1 (D4) plus ajv for schema validation. Bun is the script runner.
- README.md: documents the new ground-truth layer alongside vulnerability and threat-model GT, with curator workflow examples.

Co-authored-by: Jorge Alejandro Raad <jorge@pensarai.com>
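The auto-fix-and-iterate loop can be read as a fixed-point computation: apply the fixes, re-check, and stop once a pass changes nothing. The sketch below assumes a simplified Candidate shape rather than @pensar/surface's real EndpointInfo, takes the file-resolution and source-grep checks as injected predicates, and merges duplicates by unioning the method sets of entries that share a file and path; that merge rule is an assumption, not taken from the script.

```typescript
// Simplified stand-in for @pensar/surface's EndpointInfo (assumed shape).
interface Candidate {
  methods: string[];
  path: string;
  file: string;
  service?: string;
}

interface FixTally {
  dropped: number;
  merged: number;
  defaultedService: number;
}

// Injected stand-ins for the validator's file-resolution and source-grep checks.
type Check = (c: Candidate) => boolean;

// One pass of fixes; records what it changed in `tally`.
function fixPass(
  entries: Candidate[],
  resolves: Check,
  grepHit: Check,
  services: string[],
  tally: FixTally,
): Candidate[] {
  const byKey = new Map<string, Candidate>();
  for (const e of entries) {
    // Drop entries that fail file resolution or source-grep.
    if (!resolves(e) || !grepHit(e)) {
      tally.dropped++;
      continue;
    }
    // Default the service ref when the benchmark has exactly one service.
    let ep = e;
    if (ep.service === undefined && services.length === 1) {
      ep = { ...ep, service: services[0] };
      tally.defaultedService++;
    }
    // Merge entries sharing file + path by unioning their method sets (assumed rule).
    const key = JSON.stringify([ep.file, ep.path]);
    const prev = byKey.get(key);
    if (prev === undefined) {
      byKey.set(key, ep);
    } else {
      const methods = Array.from(new Set([...prev.methods, ...ep.methods])).sort();
      byKey.set(key, { ...prev, methods });
      tally.merged++;
    }
  }
  return Array.from(byKey.values());
}

// Iterate until a pass leaves the entries unchanged, with a safety cap.
function autoFix(
  raw: Candidate[],
  resolves: Check,
  grepHit: Check,
  services: string[],
  maxPasses = 5,
): { kept: Candidate[]; tally: FixTally } {
  const tally: FixTally = { dropped: 0, merged: 0, defaultedService: 0 };
  let current = raw;
  for (let pass = 0; pass < maxPasses; pass++) {
    const next = fixPass(current, resolves, grepHit, services, tally);
    const stable = JSON.stringify(next) === JSON.stringify(current);
    current = next;
    if (stable) break;
  }
  return { kept: current, tally };
}
```

Run over a GET and a POST on /login plus one entry whose file does not exist, with a single service, this keeps one merged GET/POST entry and records one drop, one merge, and two service defaults: the kind of tallies the per-case diagnostic reports.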

Generated by scripts/bootstrap-attack-surface.ts running @pensar/surface against each benchmark's src/ tree, then auto-fixing the validation failures (drop unresolved file/grep entries, merge duplicates, default single-service refs). Each benchmark gets two new files:

- attack-surface.json: the catalog itself. Schema-valid, canary present, every entry's file reference resolves under the benchmark root, every entry's path/identifier appears in its cited file. Cases where surface could not derive any entry points (custom PHP, AWS SAM Lambda triggers, subdomain-takeover infrastructure, etc.) ship with empty entry_points and a top-level notes field per the D3 empty-catalog rule.
- attack-surface.curator-notes.md: bootstrap diagnostics + curator TODO list. Lists which entries surface produced, which were auto-dropped and why, and which kinds (gRPC, GraphQL, event_handlers, cloud assets) need manual addition.

These catalogs are draft ground truth: schema-valid and source-traceable, but not yet hand-curated. Future curation passes (per the design doc's phase 2) replace the bootstrap output with verified, complete entries.

Coverage summary:

- 70 benchmarks total (60 APEX, 10 TM-APP)
- 8 cases with empty entry_points (D3: no source-derivable surface): APEX-011, APEX-016, APEX-026, APEX-037, TM-APP-003, TM-APP-004, TM-APP-005, TM-APP-007
- All other cases populated with HTTP / web kind entries from surface

- Add !attack-surface.curator-notes.md exception in benchmarks/APEX-003-25/.gitignore so its notes file is tracked.

Co-authored-by: Jorge Alejandro Raad <jorge@pensarai.com>
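The two traceability guarantees above (the file reference resolves under the benchmark root, and the path or identifier appears in the cited file) could be checked roughly as follows. This is a hedged sketch using Node's fs and path modules; TracedEntry, ReadFile, and traceEntry are hypothetical names. The plain substring search stands in for the validator's per-kind grep, which would need extra handling for cases such as Next.js file-system routes whose paths never appear literally in source.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Assumed entry fields for illustration.
interface TracedEntry {
  file: string;        // relative to the benchmark root
  path?: string;       // HTTP/web route
  identifier?: string; // non-HTTP name (RPC, operation, asset, event)
}

// Returns file contents, or undefined if the path is not a readable file.
type ReadFile = (absPath: string) => string | undefined;

const readFromDisk: ReadFile = (absPath) =>
  fs.existsSync(absPath) && fs.statSync(absPath).isFile()
    ? fs.readFileSync(absPath, "utf8")
    : undefined;

// Returns null when the entry is traceable, otherwise a reason string.
function traceEntry(root: string, e: TracedEntry, read: ReadFile = readFromDisk): string | null {
  const rootAbs = path.resolve(root);
  const abs = path.resolve(rootAbs, e.file);
  const rel = path.relative(rootAbs, abs);
  // Reject references that escape the benchmark root, e.g. "../other/app.js".
  if (rel === "" || rel === ".." || rel.startsWith(`..${path.sep}`) || path.isAbsolute(rel)) {
    return `${e.file}: not under the benchmark root`;
  }
  const source = read(abs);
  if (source === undefined) {
    return `${e.file}: file does not resolve`;
  }
  const needle = e.path ?? e.identifier;
  if (!needle) {
    return `${e.file}: entry has neither path nor identifier`;
  }
  // Plain substring search; a per-kind grep would be more targeted.
  return source.includes(needle) ? null : `${e.file}: "${needle}" not found in cited file`;
}
```

Injecting the reader keeps the check testable without touching disk; the default falls back to reading the real file.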

Remove the 70 curator-notes markdown files generated alongside each attack-surface.json. The bootstrap script no longer writes them; instead it prints a one-line per-case diagnostic to stdout (frameworks, raw vs. kept counts, auto-fix tallies, surface error if any) when a curator re-runs it locally.

- Drop curatorNotes() from scripts/bootstrap-attack-surface.ts and replace with diagnosticSummary() printed to the existing WROTE/DRY/FAIL output line.
- Delete all 70 benchmarks/**/attack-surface.curator-notes.md files.
- Revert benchmarks/APEX-003-25/.gitignore back to '*.md' (the !attack-surface.curator-notes.md exception is no longer needed).
- README: remove curator-notes from the directory layout and reword the curator-handoff sentence to point at the bootstrap's stdout output.

Co-authored-by: Jorge Alejandro Raad <jorge@pensarai.com>
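A hypothetical diagnosticSummary() along these lines would produce that one-line handoff. The commit names the function, what it reports (frameworks, raw vs. kept counts, auto-fix tallies, surface error), and the WROTE/DRY/FAIL line it feeds, but not its signature or output format, so the Diagnostics shape, the statusLine helper, the field labels, and the separators below are all assumptions.

```typescript
// Hypothetical inputs; the real signature and format are not given in the PR.
interface Diagnostics {
  frameworks: string[];
  raw: number;              // entries surface emitted
  kept: number;             // entries written after auto-fix
  dropped: number;
  merged: number;
  defaultedService: number;
  surfaceError?: string;
}

type Status = "WROTE" | "DRY" | "FAIL";

function diagnosticSummary(d: Diagnostics): string {
  const parts = [
    `frameworks=${d.frameworks.length > 0 ? d.frameworks.join(",") : "none"}`,
    `raw=${d.raw}`,
    `kept=${d.kept}`,
    `fixes: dropped=${d.dropped} merged=${d.merged} service-defaulted=${d.defaultedService}`,
  ];
  // Quote the error so embedded spaces or separators stay unambiguous.
  if (d.surfaceError) parts.push(`surface-error=${JSON.stringify(d.surfaceError)}`);
  return parts.join(" | ");
}

// Append the summary to the existing per-case status line.
function statusLine(status: Status, benchmark: string, d: Diagnostics): string {
  return `${status} ${benchmark} ${diagnosticSummary(d)}`;
}
```

Keeping everything on one line means a curator can grep the --all output for a single benchmark, or for every case where raw and kept differ.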

cursoragent and others added 3 commits May 8, 2026 12:53

jorgeraad wants to merge 3 commits into main from cursor/attack-surface-catalog-c4f5

jorgeraad commented May 8, 2026 • edited by cursor Bot


2 participants
