Attack-surface catalog: schema, tooling, and 70-case bootstrap #17
Draft
jorgeraad wants to merge 3 commits into
Introduce the per-benchmark attack-surface catalog tooling described in the attack-surface-catalog design doc:

- attack-surface-schema.json: JSON Schema (Draft 2020-12) for the catalog, with a 'kind' discriminator (http, web, asset, grpc, graphql, event_handler) defaulting to 'http' so existing apex/argus emissions stay valid as kind: 'http' records.
- scripts/validate-attack-surface.ts: CI validator. Runs schema validation, canary check, file reference resolution, per-kind source-grep verification, duplicate identity check, service reference check, and the D3 empty-catalog gate (empty entry_points requires top-level notes).
- scripts/bootstrap-attack-surface.ts: surface-driven catalog bootstrapper. Wraps @pensar/surface, maps EndpointInfo[] -> EntryPoint[], runs auto-fix-and-iterate loop (drop file/grep failures, merge duplicates, default single-service refs), and writes a sibling attack-surface.curator-notes.md with bootstrap diagnostics + curator TODOs.
- .github/workflows/validate-attack-surface.yml: runs the validator on every PR touching catalogs, the schema, or the scripts.
- package.json + bun.lock + tsconfig.json: pinned @pensar/surface@0.2.1 (D4) plus ajv for schema validation. Bun is the script runner.
- README.md: documents the new ground-truth layer alongside vulnerability and threat-model GT, with curator workflow examples.

Co-authored-by: Jorge Alejandro Raad <jorge@pensarai.com>
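As a rough illustration of the 'kind' discriminator described in this commit, the sketch below resolves a missing kind to 'http' and rejects unknown values. The `RawEntry` shape and the `resolveKind` helper are hypothetical names introduced here; the authoritative definition is attack-surface-schema.json.

```typescript
// Hypothetical sketch: the real rules live in attack-surface-schema.json.
const KINDS = ["http", "web", "asset", "grpc", "graphql", "event_handler"] as const;
type Kind = (typeof KINDS)[number];

// Assumed minimal record shape: only the optional discriminator matters here.
interface RawEntry {
  kind?: string;
  [field: string]: unknown;
}

// A record without `kind` is treated as "http", which is how records
// emitted before the discriminator existed can stay valid.
function resolveKind(entry: RawEntry): Kind {
  const kind = entry.kind ?? "http";
  if (!(KINDS as readonly string[]).includes(kind)) {
    throw new Error(`unknown kind: ${kind}`);
  }
  return kind as Kind;
}
```

Under these assumptions, `resolveKind({ path: "/login" })` yields `"http"`, while `resolveKind({ kind: "soap" })` throws.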
Generated by scripts/bootstrap-attack-surface.ts running @pensar/surface against each benchmark's src/ tree, then auto-fixing the validation failures (drop unresolved file/grep entries, merge duplicates, default single-service refs).

Each benchmark gets two new files:

- attack-surface.json: the catalog itself. Schema-valid, canary present, every entry's file reference resolves under the benchmark root, every entry's path/identifier appears in its cited file. Cases where surface could not derive a surface (custom PHP, AWS SAM Lambda triggers, subdomain-takeover infrastructure, etc.) ship with empty entry_points and a top-level notes field per the D3 empty-catalog rule.
- attack-surface.curator-notes.md: bootstrap diagnostics + curator TODO list. Lists which entries surface produced, which were auto-dropped and why, and which kinds (gRPC, GraphQL, event_handlers, cloud assets) need manual addition.

These catalogs are draft ground truth: schema-valid and source-traceable, but not yet hand-curated. Future curation passes (per the design doc's phase 2) replace the bootstrap output with verified, complete entries.

Coverage summary:

- 70 benchmarks total (60 APEX, 10 TM-APP)
- 8 cases with empty entry_points (D3: no source-derivable surface): APEX-011, APEX-016, APEX-026, APEX-037, TM-APP-003, TM-APP-004, TM-APP-005, TM-APP-007
- All other cases populated with HTTP / web kind entries from surface

- Add !attack-surface.curator-notes.md exception in benchmarks/APEX-003-25/.gitignore so its notes file is tracked.

Co-authored-by: Jorge Alejandro Raad <jorge@pensarai.com>
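The "merge duplicates" auto-fix mentioned in this commit could look roughly like the sketch below. It assumes an entry's identity is its kind, its method set, and its path, and that the first entry seen for each identity is kept. The `Entry` field names and both helper functions are hypothetical, not taken from scripts/bootstrap-attack-surface.ts.

```typescript
// Hypothetical entry shape; field names are assumptions, not the real schema.
interface Entry {
  kind: string;
  methods: string[];
  path: string;
  file: string;
}

// Identity key: kind + order-insensitive, case-insensitive method set + path.
function identityKey(e: Entry): string {
  const methods = Array.from(new Set(e.methods.map((m) => m.toUpperCase()))).sort();
  return `${e.kind}|${methods.join(",")}|${e.path}`;
}

// Keep the first entry seen per identity and report how many were merged away.
function mergeDuplicates(entries: Entry[]): { kept: Entry[]; merged: number } {
  const seen = new Map<string, Entry>();
  for (const e of entries) {
    const key = identityKey(e);
    if (!seen.has(key)) seen.set(key, e);
  }
  return { kept: Array.from(seen.values()), merged: entries.length - seen.size };
}
```

Keeping the first occurrence is only one possible policy; a real merge might instead combine fields from the colliding entries.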
Remove the 70 curator-notes markdown files generated alongside each attack-surface.json. The bootstrap script no longer writes them; instead it prints a one-line per-case diagnostic to stdout (frameworks, raw vs. kept counts, auto-fix tallies, surface error if any) when a curator re-runs it locally.

- Drop curatorNotes() from scripts/bootstrap-attack-surface.ts and replace with diagnosticSummary() printed to the existing WROTE/DRY/FAIL output line.
- Delete all 70 benchmarks/**/attack-surface.curator-notes.md files.
- Revert benchmarks/APEX-003-25/.gitignore back to '*.md' (the !attack-surface.curator-notes.md exception is no longer needed).
- README: remove curator-notes from the directory layout and reword the curator-handoff sentence to point at the bootstrap's stdout output.

Co-authored-by: Jorge Alejandro Raad <jorge@pensarai.com>
Summary
Implements the attack-surface catalog design doc: a per-benchmark `attack-surface.json` covering every entry point in argus's 70 benchmarks (60 APEX + 10 TM-APP), backed by a schema, validator, and `@pensar/surface`-driven bootstrapper.

The catalog answers a different question than the existing GT files: "what is every door into this app?" alongside "which vulnerabilities exist?" (`expected_results/*vulns.json`) and "what threat model should an analyst produce?" (`ground-truth.json`).

Phase 1: Schema & tooling

- `attack-surface-schema.json`: JSON Schema (Draft 2020-12) with a `kind` discriminator (`http`, `web`, `asset`, `grpc`, `graphql`, `event_handler`) defaulting to `"http"`. Strictly additive: existing apex/argus emissions stay valid as `kind: "http"` records.
- `scripts/validate-attack-surface.ts`: CI validator. Implements all seven design checks: schema validation, canary, file resolution, per-kind source-grep, duplicate identity, service refs, and the D3 empty-catalog gate (empty `entry_points` requires top-level `notes` ≥ 20 chars).
- `scripts/bootstrap-attack-surface.ts`: `surface map` driver with the auto-validate-and-iterate loop from the spec. Maps `EndpointInfo[]` → `EntryPoint[]`, drops unresolvable entries, merges duplicates, defaults single-service refs. Prints a one-line per-case diagnostic to stdout (frameworks, raw vs. kept counts, auto-fix tallies) as the curator handoff.
- `.github/workflows/validate-attack-surface.yml`: runs the validator on every PR touching catalogs, the schema, the scripts, or the lockfile.
- `package.json` + `bun.lock` + `tsconfig.json`: pin `@pensar/surface@0.2.1` (D4: exact-pin, no `^`) plus `ajv` for schema validation. Bun is the script runner.
- `README.md`: documents the new ground-truth layer with a comparison table of the three GT artifacts (catalog vs. vulnerability vs. threat-model) and the curator workflow.

Phase 2: 70-case catalog bootstrap
Every benchmark now has an `attack-surface.json`. Generated by running `bun run bootstrap:attack-surface --all` against the repo:

- 8 cases have `entry_points: []` per the D3 rule (with explanatory `notes`): APEX-011, APEX-016, APEX-026, APEX-037 (custom PHP / proxy / no-framework cases), and TM-APP-003, TM-APP-004, TM-APP-005, TM-APP-007 (gRPC / Lambda / non-HTTP infrastructure).

These are draft ground truth: schema-valid and source-traceable, not hand-verified. Curators re-run the bootstrap locally to see the per-case diagnostic line and identify gaps to fill (gRPC RPCs, GraphQL operations, Lambda triggers, cloud assets, custom-PHP routes).
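For reviewers who want a concrete picture of the D3 rule, here is a minimal sketch of an empty-catalog gate: a catalog with no entry points passes only if it carries a top-level `notes` of at least 20 characters. The `Catalog` interface, the `checkEmptyCatalogGate` name, and the whitespace trimming are assumptions for illustration, not the validator's actual code.

```typescript
// Hypothetical minimal catalog shape; the real one is defined by the schema.
interface Catalog {
  entry_points: unknown[];
  notes?: string;
}

const MIN_NOTES_LENGTH = 20;

// Returns null when the catalog passes the gate, or an error message otherwise.
function checkEmptyCatalogGate(catalog: Catalog): string | null {
  // Non-empty catalogs are not subject to the gate.
  if (catalog.entry_points.length > 0) return null;
  // Assumption: surrounding whitespace does not count toward the minimum.
  const notes = (catalog.notes ?? "").trim();
  if (notes.length >= MIN_NOTES_LENGTH) return null;
  return `empty entry_points requires top-level notes of at least ${MIN_NOTES_LENGTH} characters`;
}
```

With this sketch, `{ entry_points: [], notes: "n/a" }` is rejected, while an empty catalog whose `notes` explains why no surface could be derived passes.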
CI status: `bun run validate:attack-surface` reports `70 valid · 0 missing catalog · 0 errors` against the bootstrapped corpus.

Out of scope

This PR is argus-side only. The evalgate-side work (matcher kind dispatch, judge integration, suite expansion, submodule bump; phase 3 / T6–T9 in the design doc) lands in evalgate after this merges.
Verification
Manually verified that the validator catches the four common failure modes:
- Duplicate `(kind, method-set, path)` tuples
- Unresolvable `file` references
- `path` not appearing in the cited file
- Empty `entry_points` with no rationale

Notes for reviewers

[…] `req`, `res`, `<anonymous>`) so `handler` only carries genuine names. Surface-emitted source paths are rewritten with the `src/` prefix to match the catalog's "relative to benchmark root" convention.
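The two normalizations in the reviewer notes can be sketched as follows, assuming that `req`, `res`, and `<anonymous>` are placeholder handler names to be dropped and that surface emits paths relative to the benchmark's `src/` tree. `cleanHandler` and `toCatalogPath` are hypothetical helper names, not functions from the bootstrap script.

```typescript
// Assumed placeholder names, taken from the reviewer note above.
const PLACEHOLDER_HANDLERS = new Set(["req", "res", "<anonymous>"]);

// Drop placeholder handler names so only genuine identifiers survive.
function cleanHandler(name: string | undefined): string | undefined {
  if (!name || PLACEHOLDER_HANDLERS.has(name)) return undefined;
  return name;
}

// Rewrite a surface-emitted path to be relative to the benchmark root.
// Assumption: paths arrive relative to src/, optionally with a leading "./" or "/".
function toCatalogPath(surfacePath: string): string {
  const normalized = surfacePath.replace(/^\.?\/+/, "");
  return normalized.startsWith("src/") ? normalized : `src/${normalized}`;
}
```

For example, `toCatalogPath("./routes/users.js")` gives `"src/routes/users.js"`, and `cleanHandler("<anonymous>")` gives `undefined`.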