Attack-surface catalog: schema, tooling, and 70-case bootstrap #17

Draft
jorgeraad wants to merge 3 commits into main from cursor/attack-surface-catalog-c4f5

@jorgeraad jorgeraad commented May 8, 2026

Summary

Implements the attack-surface catalog design doc — a per-benchmark attack-surface.json covering every entry point in argus's 70 benchmarks (60 APEX + 10 TM-APP), backed by a schema, validator, and @pensar/surface-driven bootstrapper.

The catalog answers a different question than the existing GT files: "what is every door into this app?" alongside "which vulnerabilities exist?" (expected_results/*vulns.json) and "what threat model should an analyst produce?" (ground-truth.json).

Phase 1 — Schema & tooling

  • attack-surface-schema.json — JSON Schema (Draft 2020-12) with a kind discriminator (http, web, asset, grpc, graphql, event_handler) defaulting to "http". Strictly additive: existing apex/argus emissions stay valid as kind: "http" records.
  • scripts/validate-attack-surface.ts — CI validator. Implements all seven design checks: schema validation, canary, file resolution, per-kind source-grep, duplicate identity, service refs, and the D3 empty-catalog gate (empty entry_points requires top-level notes ≥ 20 chars).
  • scripts/bootstrap-attack-surface.ts: surface map driver with the auto-validate-and-iterate loop from the spec. Maps EndpointInfo[] -> EntryPoint[], drops unresolvable entries, merges duplicates, and defaults single-service refs. Prints a one-line per-case diagnostic to stdout (frameworks, raw vs. kept counts, auto-fix tallies) as the curator handoff.
  • .github/workflows/validate-attack-surface.yml — runs the validator on every PR touching catalogs, the schema, the scripts, or the lockfile.
  • package.json + bun.lock + tsconfig.json — pin @pensar/surface@0.2.1 (D4: exact-pin, no ^) plus ajv for schema validation. Bun is the script runner.
  • README.md — documents the new ground-truth layer with a comparison table of the three GT artifacts (catalog vs. vulnerability vs. threat-model) and the curator workflow.
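
As a rough illustration of the "strictly additive" kind discriminator described above, the sketch below normalizes a record that omits kind into a kind: "http" entry. The RawEntry and EntryPoint shapes and the normalizeEntry helper are hypothetical names for this example only; the real field names are whatever attack-surface-schema.json defines.

```typescript
// The six kinds are taken from the PR description; everything else is assumed.
const KINDS = ["http", "web", "asset", "grpc", "graphql", "event_handler"] as const;
type Kind = (typeof KINDS)[number];

// Hypothetical record shapes, not the schema's actual definitions.
interface RawEntry {
  kind?: string;
  path: string;
  methods?: string[];
}

interface EntryPoint {
  kind: Kind;
  path: string;
  methods: string[];
}

// A record without `kind` is treated as an "http" record, so older
// emissions that predate the discriminator remain valid.
function normalizeEntry(raw: RawEntry): EntryPoint {
  const kind = raw.kind ?? "http";
  if (!(KINDS as readonly string[]).includes(kind)) {
    throw new Error(`unknown kind: ${kind}`);
  }
  return { kind: kind as Kind, path: raw.path, methods: raw.methods ?? [] };
}
```

In the real tooling the default would presumably be applied by the JSON Schema itself (or by ajv's default handling) rather than by hand-written code like this.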

Phase 2 — 70-case catalog bootstrap

Every benchmark now has an attack-surface.json. Generated by running bun run bootstrap:attack-surface --all against the repo:

  • 62 cases populated with HTTP / web entries that surface successfully extracted (Express, Next.js, Django, Apollo-on-Express, Spring, Gin, …).
  • 8 cases ship empty entry_points: [] per the D3 rule (with explanatory notes): APEX-011, APEX-016, APEX-026, APEX-037 (custom PHP / proxy / no-framework cases), and TM-APP-003, TM-APP-004, TM-APP-005, TM-APP-007 (gRPC / Lambda / non-HTTP infrastructure).
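
The D3 rule that lets these eight cases ship empty can be sketched as below. This is a minimal sketch, assuming a catalog object with entry_points and an optional top-level notes string. The checkEmptyCatalogGate name and the whitespace trimming are assumptions for the example, not the validator's actual code.

```typescript
// Hypothetical minimal catalog shape for this sketch.
interface Catalog {
  entry_points: unknown[];
  notes?: string;
}

const MIN_NOTES_LENGTH = 20; // threshold stated in the PR description

// Returns an error message when the gate fails, or null when it passes.
function checkEmptyCatalogGate(catalog: Catalog): string | null {
  if (catalog.entry_points.length > 0) return null; // gate only applies to empty catalogs
  const notes = (catalog.notes ?? "").trim(); // trimming is an assumption
  return notes.length >= MIN_NOTES_LENGTH
    ? null
    : `empty entry_points requires top-level notes of at least ${MIN_NOTES_LENGTH} characters`;
}
```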

These are draft ground truth — schema-valid and source-traceable, not hand-verified. Curators re-run the bootstrap locally to see the per-case diagnostic line and identify gaps to fill (gRPC RPCs, GraphQL operations, Lambda triggers, cloud assets, custom-PHP routes).

CI status: bun run validate:attack-surface reports 70 valid · 0 missing catalog · 0 errors against the bootstrapped corpus.

Out of scope

This PR is argus-side only. The evalgate-side work (matcher kind dispatch, judge integration, suite expansion, submodule bump — phase 3 / T6–T9 in the design doc) lands in evalgate after this merges.

Verification

bun install
bun run validate:attack-surface           # 70 valid · 0 errors
bun run bootstrap:attack-surface --dry-run benchmarks/APEX-019-25
bun run bootstrap:attack-surface benchmarks/APEX-019-25 --force
bunx tsc --noEmit                          # types clean

Manually verified that the validator catches the five common failure modes:

  • wrong/missing canary
  • duplicate (kind, method-set, path) tuples
  • unresolvable file references
  • path not appearing in the cited file
  • empty entry_points with no rationale
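
For reviewers who want a concrete picture of the duplicate-identity check, here is a minimal sketch of detecting repeated (kind, method-set, path) tuples. The Identity shape and the identityKey and findDuplicates helpers are hypothetical, and uppercasing methods before comparison is an assumption about how the method set is normalized.

```typescript
// Hypothetical identity fields for this sketch.
interface Identity {
  kind: string;
  methods: string[];
  path: string;
}

// Builds an order-insensitive key: methods are de-duplicated and sorted so
// ["GET", "POST"] and ["POST", "GET"] count as the same method set.
function identityKey(e: Identity): string {
  const methodSet = [...new Set(e.methods.map((m) => m.toUpperCase()))].sort();
  return JSON.stringify([e.kind, methodSet, e.path]);
}

// Returns each identity key that occurs more than once.
function findDuplicates(entries: Identity[]): string[] {
  const seen = new Set<string>();
  const dups = new Set<string>();
  for (const e of entries) {
    const key = identityKey(e);
    if (seen.has(key)) dups.add(key);
    else seen.add(key);
  }
  return [...dups];
}
```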

Notes for reviewers

  • Per D5 (single PR), the schema, scripts, CI, README, and all 70 catalogs land together. Hand-curation of individual cases is intentionally a follow-up; the bootstrap's stdout diagnostic line is the curator's TODO list for that case.
  • The bootstrap drops surface's noise handlers (req, res, <anonymous>) so handler only carries genuine names. Surface-emitted source paths are rewritten with the src/ prefix to match the catalog's "relative to benchmark root" convention.
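
The two clean-up steps in the last note could look roughly like the sketch below. The three noise names come from the note itself; the cleanHandler and toBenchmarkRelative helpers, the handling of a leading "./", and the idempotent prefixing are assumptions for illustration, not the bootstrap script's actual code.

```typescript
// Handler names that the note above treats as noise.
const NOISE_HANDLERS = new Set(["req", "res", "<anonymous>"]);

// Keeps a handler only when it is a genuine name; otherwise omits it.
function cleanHandler(handler?: string): string | undefined {
  if (!handler || NOISE_HANDLERS.has(handler)) return undefined;
  return handler;
}

// Rewrites a path relative to src/ into a path relative to the benchmark root.
// Assumes the prefix should not be applied twice.
function toBenchmarkRelative(sourcePath: string): string {
  const normalized = sourcePath.replace(/^\.\//, "");
  return normalized.startsWith("src/") ? normalized : `src/${normalized}`;
}
```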

cursoragent and others added 3 commits May 8, 2026 12:53
Introduce the per-benchmark attack-surface catalog tooling described in the
attack-surface-catalog design doc:

- attack-surface-schema.json: JSON Schema (Draft 2020-12) for the catalog,
  with a 'kind' discriminator (http, web, asset, grpc, graphql,
  event_handler) defaulting to 'http' so existing apex/argus emissions stay
  valid as kind: 'http' records.

- scripts/validate-attack-surface.ts: CI validator. Runs schema validation,
  canary check, file reference resolution, per-kind source-grep
  verification, duplicate identity check, service reference check, and the
  D3 empty-catalog gate (empty entry_points requires top-level notes).

- scripts/bootstrap-attack-surface.ts: surface-driven catalog bootstrapper.
  Wraps @pensar/surface, maps EndpointInfo[] -> EntryPoint[], runs
  auto-fix-and-iterate loop (drop file/grep failures, merge duplicates,
  default single-service refs), and writes a sibling
  attack-surface.curator-notes.md with bootstrap diagnostics + curator
  TODOs.

- .github/workflows/validate-attack-surface.yml: runs the validator on
  every PR touching catalogs, the schema, or the scripts.

- package.json + bun.lock + tsconfig.json: pinned @pensar/surface@0.2.1
  (D4) plus ajv for schema validation. Bun is the script runner.

- README.md: documents the new ground-truth layer alongside vulnerability
  and threat-model GT, with curator workflow examples.

Co-authored-by: Jorge Alejandro Raad <jorge@pensarai.com>
Generated by scripts/bootstrap-attack-surface.ts running @pensar/surface
against each benchmark's src/ tree, then auto-fixing the validation
failures (drop unresolved file/grep entries, merge duplicates, default
single-service refs).

Each benchmark gets two new files:

- attack-surface.json: the catalog itself. Schema-valid, canary present,
  every entry's file reference resolves under the benchmark root, every
  entry's path/identifier appears in its cited file. Cases where surface
  could not derive a surface (custom PHP, AWS SAM Lambda triggers,
  subdomain-takeover infrastructure, etc.) ship with empty entry_points
  and a top-level notes field per the D3 empty-catalog rule.

- attack-surface.curator-notes.md: bootstrap diagnostics + curator TODO
  list. Lists which entries surface produced, which were auto-dropped and
  why, and which kinds (gRPC, GraphQL, event_handlers, cloud assets) need
  manual addition.

These catalogs are draft ground truth: schema-valid and source-traceable,
but not yet hand-curated. Future curation passes (per the design doc's
phase 2) replace the bootstrap output with verified, complete entries.

Coverage summary:
- 70 benchmarks total (60 APEX, 10 TM-APP)
- 8 cases with empty entry_points (D3 — no source-derivable surface):
  APEX-011, APEX-016, APEX-026, APEX-037, TM-APP-003, TM-APP-004,
  TM-APP-005, TM-APP-007
- All other cases populated with HTTP / web kind entries from surface
- Add !attack-surface.curator-notes.md exception in
  benchmarks/APEX-003-25/.gitignore so its notes file is tracked.

Co-authored-by: Jorge Alejandro Raad <jorge@pensarai.com>
Remove the 70 curator-notes markdown files generated alongside each
attack-surface.json. The bootstrap script no longer writes them; instead
it prints a one-line per-case diagnostic to stdout (frameworks, raw vs.
kept counts, auto-fix tallies, surface error if any) when a curator
re-runs it locally.

- Drop curatorNotes() from scripts/bootstrap-attack-surface.ts and
  replace with diagnosticSummary() printed to the existing WROTE/DRY/FAIL
  output line.
- Delete all 70 benchmarks/**/attack-surface.curator-notes.md files.
- Revert benchmarks/APEX-003-25/.gitignore back to '*.md' (the
  !attack-surface.curator-notes.md exception is no longer needed).
- README: remove curator-notes from the directory layout and reword the
  curator-handoff sentence to point at the bootstrap's stdout output.

Co-authored-by: Jorge Alejandro Raad <jorge@pensarai.com>