Detection-engineering compiler for threatmodel-gcp-bigquery.json #4
Draft
karthikarunapuram8-dot wants to merge 1 commit into
Treats the existing threat-model JSON as a first-class intermediate representation and compiles it into multi-target detection content for each modeled threat:
- BigQuery-native scheduled-query detections (.sql + Terraform module)
- Chronicle YARA-L 2.0 rules
- Splunk SPL searches
- Microsoft Sentinel KQL queries
- MITRE ATT&CK Navigator coverage layer (single JSON across all threats)
- Per-threat metadata.json + auto-generated coverage matrix README

Three reference threats wired end-to-end (T1 destruction, T6 GCS exfiltration, T9 unauthorized query). Adding a new threat is then permission-map + per-emitter template + recompile, with no compiler changes.

Includes:
- tools/threatmodel_compiler/ Python package (no runtime deps; PyYAML avoided via a tiny inline parser scoped to the permission-map shape)
- pytest suite with 25 offline checks: IR loading, access-tree flatten, emitter content, sqlglot BigQuery SQL parse, KQL paren balance, YARA-L structure, MITRE technique resolution, ATT&CK Navigator well-formedness, terraform validate, and a no-destructive-SQL guard
- .github/workflows/compile-detections.yml that runs on release tags + PRs touching the JSON or compiler, uploads artifacts, and hard-fails PRs whose detections/ tree drifts from the compiler output
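Two of the offline checks named above, the KQL paren balance and the no-destructive-SQL guard, are simple enough to sketch. This is only an illustration of the idea, not the PR's actual test code: the function names are hypothetical, and the sketch assumes that parentheses and keywords inside string literals do not need special handling.

```python
import re

# Hypothetical helpers; the PR's real tests may be structured differently.
# Keyword list taken from the guard described in this PR.
DESTRUCTIVE = re.compile(r"\b(DROP|DELETE|UPDATE|GRANT|REVOKE)\b", re.IGNORECASE)


def parens_balanced(query: str) -> bool:
    """Return True if every ')' closes an earlier '(' and none stay open.

    Assumption: parentheses inside string literals are not special-cased.
    """
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its '('
                return False
    return depth == 0


def strip_sql_comments(sql: str) -> str:
    """Naively remove /* */ block comments and -- line comments."""
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.DOTALL)
    return re.sub(r"--[^\n]*", " ", sql)


def destructive_keywords(sql: str) -> list[str]:
    """List destructive keywords found outside comments, upper-cased, in order."""
    cleaned = strip_sql_comments(sql)
    return [m.group(1).upper() for m in DESTRUCTIVE.finditer(cleaned)]
```

A test could then assert `parens_balanced(kql_text)` for each generated .kql file and `destructive_keywords(sql_text) == []` for each generated .sql file. Because the keyword match is whole-word, an audit-log method name such as DeleteTable would not trip the guard, but a quoted literal containing a bare keyword would, which is why the string-literal assumption matters.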
What this PR does
This PR makes threatmodel-gcp-bigquery.json a compilation source. From the same source of truth you already maintain, it emits detective controls across Chronicle / Splunk / Sentinel + native BigQuery SQL, plus an ATT&CK Navigator coverage layer, regenerated on every release tag. No new content burden on maintainers; existing JSON drives everything.
Why I think it's a good fit for this repo
The repo today ships prevention (Wiz Rego custom rules, GCP custom org constraints) and documentation (docx, pdf, drawio). It's missing the right-hand side of the kill chain (behavioral detect-and-respond), and nothing programmatically consumes the JSON. The existing NIST CSF Detect-class controls are still Rego config-posture checks, not audit-log behavior.
This PR closes the loop in one structural change:
Scope
tools/threatmodel_compiler/)
Reference threats wired end-to-end
SELECT */ metadata abuse)
Generated tree (this PR)
Validation evidence
The pytest suite has 25 tests, runs fully offline, and finishes in ~16s:
sqlglot (BigQuery dialect)
meta:, events:, condition: sections
terraform validate
DROP/DELETE/UPDATE/GRANT/REVOKE guard)
CI workflow:
.github/workflows/compile-detections.yml runs on every push of a v* tag and on PRs touching the JSON or compiler. It compiles, tests, uploads artifacts, and hard-fails PRs whose detections/ tree drifts from the freshly compiled output (so contributors are forced to commit regenerated artifacts).
Adding new threats: one PR pattern
access tree exists in tools/threatmodel_compiler/permission_map.yaml. Add entries if not.
bigquery_sql.py, chronicle_yaral.py, splunk_spl.py, sentinel_kql.py).
PYTHONPATH=tools python -m threatmodel_compiler --repo-root . compile --threats T<N>.
PYTHONPATH=tools python -m pytest tools/threatmodel_compiler/tests.
detections/ tree. CI fails the PR if it drifts.
That's the move: pattern, not volume.
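The drift rule in the steps above can be pictured as a comparison between the committed detections/ tree and a fresh compile. The sketch below is one possible way to do that comparison, not the workflow's actual implementation; the function names are hypothetical, and it assumes the fresh output has already been written to a separate directory by some means.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> dict[str, str]:
    """Map each file's POSIX-style relative path to the SHA-256 of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def drift(committed: Path, compiled: Path) -> list[str]:
    """Return sorted relative paths that are missing, extra, or differ in content."""
    a, b = tree_digest(committed), tree_digest(compiled)
    # A path drifts if it exists on only one side or its hashes differ.
    return sorted(path for path in a.keys() | b.keys() if a.get(path) != b.get(path))
```

A CI step could fail the build whenever `drift(...)` returns a non-empty list and print the offending paths. An equivalent approach that needs no Python is to recompile in place and run `git diff --exit-code` on detections/, although that only reports changes to tracked files.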
Test plan for reviewers
detections/Bigquery.T9/bigquery_sql/detect_t9.sql and confirm the SELECT * + >1 GiB/>50 GiB scan/>=10 referenced tables rules match how you'd model T9 yourself.
detections/Bigquery.T6/sentinel_kql/detect_t6.kql and confirm the GCS allowlist regex pattern is the right shape.
detections/README.md (the coverage matrix): does the prevent / existing-detect / compiled-detect tri-column read clearly?
detections/attack_navigator/bigquery_threat_coverage.json in the ATT&CK Navigator and confirm the technique mapping is defensible.
Open questions for maintainers (please weigh in)
Is tools/threatmodel_compiler/ the right home, or would you prefer top-level compiler/ or detections/_compiler/?
gcloud chronicle rules validate and Splunk's btool would catch real vendor-syntax errors but require gated credentials. OK to ship as optional CI jobs in a follow-up?
Happy to address feedback in this PR or split into smaller PRs if preferred.