Detection-engineering compiler for threatmodel-gcp-bigquery.json by karthikarunapuram8-dot · Pull Request #4 · trustoncloud/threatmodel-for-google-bigquery

karthikarunapuram8-dot · 2026-04-29T22:07:42Z

Draft PR. The compiler + 3 reference threats (T1, T6, T9) are wired end-to-end. Maintainers approve patterns, not volume — once the pattern is accepted, the remaining threats are mechanical follow-ups.

What this PR does

This PR makes threatmodel-gcp-bigquery.json a compilation target. From the same source of truth you already maintain, it emits detective controls across Chronicle / Splunk / Sentinel + native BigQuery SQL, plus an ATT&CK Navigator coverage layer — regenerated on every release tag. No new content burden on maintainers; existing JSON drives everything.

Why I think it's a good fit for this repo

The repo today ships prevention (Wiz Rego custom rules, GCP custom org constraints) and documentation (docx, pdf, drawio). It's missing the right-hand side of the kill chain — behavioral detect-and-respond — and nothing programmatically consumes the JSON. The existing NIST CSF Detect-class controls are still Rego config-posture checks, not audit-log behavior.

This PR closes the loop in one structural change:

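The JSON gets promoted from documentation artifact to first-class IR. Every future threat you add yields detection content + ATT&CK mapping from the same file; the only per-threat work is the permission-map entries and emitter templates described under "Adding new threats", with no compiler changes.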
BigQuery defending itself — SQL detections that run inside BigQuery against the audit-log sink. There's almost no public SQL-native detection library for BigQuery audit logs; this could anchor a conference talk for TrustOnCloud.
prevent → detect → respond is what every security buyer evaluates a posture artifact on. Closing it raises the value of the whole repo.

Scope

| In scope | Out of scope (explicit non-goals) |
| --- | --- |
| Compiler (`tools/threatmodel_compiler/`) | Runtime detection engine / SaaS deployment |
| BigQuery SQL + Terraform module | Replacing the existing Wiz Rego prevention controls |
| Chronicle YARA-L, Splunk SPL, Sentinel KQL emitters | Generalizing the compiler across other TrustOnCloud services |
| ATT&CK Navigator coverage layer | Enumerating every threat (only T1, T6, T9 are seeded) |
| pytest validation suite (25 checks, fully offline) | Vendor-SaaS validators (Chronicle / Splunk) gated behind optional creds |
| GitHub Actions workflow on release tags + PRs | Auto-deploying detections to a customer environment |

Reference threats wired end-to-end

| Threat | hlgoal | Tactic | Why it's a good first detection |
| --- | --- | --- | --- |
| T1 Destruction (delete table/dataset) | DoS | TA0040 Impact | Highest-signal, simplest admin-activity emitter; proves the pattern |
| T6 Exfiltration via export to GCS/DLP | DataTheft | TA0010 Exfiltration | Job-config inspection + authorized-bucket allowlist regex |
| T9 Unauthorized query (`SELECT *` / metadata abuse) | DataTheft | TA0010 Exfiltration | Query-pattern detection inside BigQuery's own audit logs; the flagship example |
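
To make the T6 row's "authorized-bucket allowlist regex" concrete, here is a minimal Python sketch of the idea. It is not the emitter's actual template: the bucket names and both helper functions are hypothetical placeholders, and the generated SQL/KQL may express the same check differently.

```python
import re

# Hypothetical allowlist; these bucket names are placeholders, not values from this PR.
AUTHORIZED_BUCKETS = ["corp-analytics-exports", "corp-dlp-staging"]

# Anchored pattern: gs://<allowlisted bucket>/<object path>.
# re.escape keeps bucket names from being read as regex syntax, and the
# trailing "/" stops look-alike names such as "corp-analytics-exports-evil".
ALLOWLIST_RE = re.compile(
    r"^gs://(?:" + "|".join(re.escape(b) for b in AUTHORIZED_BUCKETS) + r")/"
)


def is_authorized_destination(uri: str) -> bool:
    """Return True when an export destination URI targets an allowlisted bucket."""
    return ALLOWLIST_RE.match(uri) is not None


def unauthorized_destinations(uris):
    """Return the destination URIs that a T6-style detection would flag."""
    return [u for u in uris if not is_authorized_destination(u)]
```

The anchoring at both ends of the bucket name is the part worth reviewing in the generated rules: an unanchored or prefix-only match would let an attacker-controlled bucket with a similar name pass.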

Generated tree (this PR)

detections/
├── README.md                                     # auto coverage matrix
├── attack_navigator/bigquery_threat_coverage.json
└── Bigquery.T{1,6,9}/
    ├── metadata.json
    ├── bigquery_sql/detect_t<n>.sql
    ├── bigquery_sql/terraform/main.tf            # passes terraform validate
    ├── chronicle_yaral/detect_t<n>.yaral
    ├── splunk_spl/detect_t<n>.spl
    └── sentinel_kql/detect_t<n>.kql

Validation evidence

The pytest suite has 25 tests, runs fully offline, and finishes in ~16s. It checks:

IR loading + AND/OR/OPTIONAL access-tree flattening
Every emitter produces non-empty content for the seeded threats
BigQuery SQL parses through sqlglot (BigQuery dialect)
KQL passes structural sanity checks (parenthesis balance)
YARA-L rules contain meta:, events:, condition: sections
MITRE technique IDs resolve and the ATT&CK Navigator layer is well-formed
Terraform module passes terraform validate
Compiled SQL contains no destructive statements outside string literals (DROP/DELETE/UPDATE/GRANT/REVOKE guard)
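
The last check in the list, the no-destructive-SQL guard, could be sketched roughly as follows. This is an illustration under stated assumptions, not the test shipped in the suite: the function name and regexes are hypothetical, and the sketch only blanks out single- and double-quoted string literals (it does not handle comments, triple-quoted strings, or backtick-quoted identifiers).

```python
import re
from typing import List

# Statement keywords the guard treats as destructive (taken from the list above).
DESTRUCTIVE = re.compile(r"\b(DROP|DELETE|UPDATE|GRANT|REVOKE)\b", re.IGNORECASE)

# Single- or double-quoted string literals, allowing backslash escapes.
STRING_LITERAL = re.compile(r"'(?:\\.|[^'\\])*'|\"(?:\\.|[^\"\\])*\"", re.DOTALL)


def destructive_keywords(sql: str) -> List[str]:
    """Return destructive keywords that appear outside string literals."""
    # Blank out literals first so patterns like LIKE '%DROP TABLE%' are ignored.
    stripped = STRING_LITERAL.sub("''", sql)
    return sorted({m.group(1).upper() for m in DESTRUCTIVE.finditer(stripped)})
```

A test would then assert that `destructive_keywords(compiled_sql)` is empty for every emitted `.sql` file. The word-boundary match means column names such as `update_time` are not flagged, since `_` counts as a word character.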

CI workflow: .github/workflows/compile-detections.yml runs on every push of a v* tag and on PRs touching the JSON or compiler. It compiles, tests, uploads artifacts, and hard-fails PRs whose detections/ tree drifts from the freshly compiled output (so contributors are forced to commit regenerated artifacts).
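
The drift check can be pictured as a comparison between the committed `detections/` tree and a fresh compile into a scratch directory. The sketch below is one way to do that in Python; it is an assumption about the approach, not the workflow's actual step (which could equally be a plain `git diff --exit-code` after recompiling), and both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> dict:
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def drift(committed: Path, compiled: Path) -> dict:
    """Describe how the committed tree differs from freshly compiled output."""
    a, b = tree_digest(committed), tree_digest(compiled)
    return {
        # Generated by the compiler but never committed.
        "missing": sorted(set(b) - set(a)),
        # Committed but no longer produced by the compiler.
        "stale": sorted(set(a) - set(b)),
        # Present in both trees with different contents.
        "changed": sorted(k for k in set(a) & set(b) if a[k] != b[k]),
    }
```

A CI job built on this would fail when any of the three lists is non-empty and print them, which tells the contributor exactly which artifacts to regenerate.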

Adding new threats — one PR pattern

1. Confirm every IAM permission referenced by the threat's access tree exists in `tools/threatmodel_compiler/permission_map.yaml`. Add entries if not.
2. Add a per-threat detection template to each emitter (`bigquery_sql.py`, `chronicle_yaral.py`, `splunk_spl.py`, `sentinel_kql.py`).
3. Run `PYTHONPATH=tools python -m threatmodel_compiler --repo-root . compile --threats T<N>`.
4. Run `PYTHONPATH=tools python -m pytest tools/threatmodel_compiler/tests`.
5. Commit the new templates + the regenerated `detections/` tree. CI fails the PR if it drifts.

That's the move — pattern, not volume.
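
The first step, checking the access tree against the permission map, can be sketched as below. The tree shape used here (nested dicts keyed by `AND` / `OR` / `OPTIONAL` with permission strings as leaves) and the map's value format are assumptions for illustration; the real JSON and `permission_map.yaml` layouts may differ, and both function names are hypothetical.

```python
def flatten_permissions(node) -> set:
    """Collect every IAM permission named anywhere in an access tree."""
    if isinstance(node, str):
        # Leaf: a single IAM permission string.
        return {node}
    perms = set()
    # For a coverage check the operator does not matter; every branch counts.
    for operator in ("AND", "OR", "OPTIONAL"):
        for child in node.get(operator, []):
            perms |= flatten_permissions(child)
    return perms


def missing_permissions(access_tree, permission_map: dict) -> list:
    """Return permissions in the tree that have no permission-map entry."""
    return sorted(flatten_permissions(access_tree) - set(permission_map))


# Hypothetical example data, not taken from the repo.
example_tree = {
    "AND": [
        "bigquery.jobs.create",
        {"OR": ["bigquery.tables.export", "bigquery.tables.getData"]},
    ],
    "OPTIONAL": ["storage.objects.create"],
}
example_map = {
    "bigquery.jobs.create": "google.cloud.bigquery.v2.JobService.InsertJob",
    "bigquery.tables.export": "google.cloud.bigquery.v2.JobService.InsertJob",
}
```

With the example data, `missing_permissions(example_tree, example_map)` names the two permissions that would need new map entries before the threat can be compiled.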

Test plan for reviewers

Inspect detections/Bigquery.T9/bigquery_sql/detect_t9.sql and confirm the SELECT * + >1 GiB / >50 GiB scan / >=10 referenced tables rules match how you'd model T9 yourself.
Inspect detections/Bigquery.T6/sentinel_kql/detect_t6.kql and confirm the GCS allowlist regex pattern is the right shape.
Check detections/README.md (the coverage matrix) — does the prevent / existing-detect / compiled-detect tri-column read clearly?
Open detections/attack_navigator/bigquery_threat_coverage.json in the ATT&CK Navigator and confirm the technique mapping is defensible.
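
For reviewers unfamiliar with Navigator layers, the sketch below shows roughly what the coverage layer contains and what a basic well-formedness check might look like. The field names follow the ATT&CK Navigator layer format as I understand it and should be checked against the layer specification; the technique-to-threat mapping in the usage note is a placeholder, not the mapping this PR ships, and both function names are hypothetical.

```python
import json
import re

# ATT&CK technique IDs look like T1485 or T1567.002.
TECHNIQUE_ID = re.compile(r"^T\d{4}(?:\.\d{3})?$")


def build_layer(coverage: dict) -> dict:
    """Build a minimal Navigator-style layer from {technique ID: set of threat IDs}."""
    return {
        "name": "BigQuery threat coverage (sketch)",
        "domain": "enterprise-attack",
        "techniques": [
            {
                "techniqueID": tid,
                # Score = number of modeled threats that map to this technique.
                "score": len(threats),
                "comment": ", ".join(sorted(threats)),
            }
            for tid, threats in sorted(coverage.items())
        ],
    }


def is_well_formed(layer: dict) -> bool:
    """Structural check only: domain, technique list, and ID syntax."""
    techniques = layer.get("techniques")
    return (
        layer.get("domain") == "enterprise-attack"
        and isinstance(techniques, list)
        and all(TECHNIQUE_ID.match(t.get("techniqueID", "")) for t in techniques)
    )
```

For example, `json.dumps(build_layer({"T1485": {"T1"}, "T1537": {"T6"}}), indent=2)` produces a file of the same general shape. A syntax check like this does not confirm that an ID actually exists in ATT&CK, which is why the defensibility of the mapping still needs a human look.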

Run the pytest suite locally:

python -m pip install pytest sqlglot
PYTHONPATH=tools python -m pytest tools/threatmodel_compiler/tests -v

Run the compiler against the seeded threats:

PYTHONPATH=tools python -m threatmodel_compiler --repo-root . compile --threats T1 T6 T9
git diff --stat detections/   # should be clean

Open questions for maintainers (please weigh in)

Naming: is tools/threatmodel_compiler/ the right home, or would you prefer top-level compiler/ or detections/_compiler/?
Authorized-bucket allowlist in T6: should this be configurable per deployment via a YAML file alongside the SQL, or is a docstring marker fine?
Per-threat templates vs algorithmic: I deliberately wrote per-threat detection templates rather than an algorithmic rule generator — detection engineering is a content discipline, not codegen. If you'd prefer a more generic dispatch, happy to refactor.
Vendor validators in CI: Chronicle's gcloud chronicle rules validate and Splunk's btool would catch real vendor-syntax errors but require gated credentials. OK to ship as optional CI jobs in a follow-up?

Happy to address feedback in this PR or split into smaller PRs if preferred.

Treats the existing threat-model JSON as a first-class intermediate representation and compiles it into multi-target detection content for each modeled threat:

- BigQuery-native scheduled-query detections (.sql + Terraform module)
- Chronicle YARA-L 2.0 rules
- Splunk SPL searches
- Microsoft Sentinel KQL queries
- MITRE ATT&CK Navigator coverage layer (single JSON across all threats)
- Per-threat metadata.json + auto-generated coverage matrix README

Three reference threats wired end-to-end (T1 destruction, T6 GCS exfiltration, T9 unauthorized query). Adding a new threat is then permission-map + per-emitter template + recompile, with no compiler changes.

Includes:

- tools/threatmodel_compiler/ Python package (no runtime deps; PyYAML avoided via a tiny inline parser scoped to the permission-map shape)
- pytest suite with 25 offline checks: IR loading, access-tree flatten, emitter content, sqlglot BigQuery SQL parse, KQL paren balance, YARA-L structure, MITRE technique resolution, ATT&CK Navigator well-formedness, terraform validate, and a no-destructive-SQL guard
- .github/workflows/compile-detections.yml that runs on release tags + PRs touching the JSON or compiler, uploads artifacts, and hard-fails PRs whose detections/ tree drifts from the compiler output

karthikarunapuram8-dot wants to merge 1 commit into trustoncloud:main from karthikarunapuram8-dot:detection-engineering-compiler
