Detection-engineering compiler for threatmodel-gcp-bigquery.json #4

Draft
karthikarunapuram8-dot wants to merge 1 commit into trustoncloud:main from karthikarunapuram8-dot:detection-engineering-compiler

Draft PR. The compiler + 3 reference threats (T1, T6, T9) are wired end-to-end. Maintainers approve patterns, not volume — once the pattern is accepted, the remaining threats are mechanical follow-ups.

What this PR does

This PR makes threatmodel-gcp-bigquery.json a compilation target. From the same source of truth you already maintain, it emits detective controls across Chronicle / Splunk / Sentinel + native BigQuery SQL, plus an ATT&CK Navigator coverage layer — regenerated on every release tag. No new content burden on maintainers; existing JSON drives everything.

Why I think it's a good fit for this repo

The repo today ships prevention (Wiz Rego custom rules, GCP custom org constraints) and documentation (docx, pdf, drawio). It's missing the right-hand side of the kill chain — behavioral detect-and-respond — and nothing programmatically consumes the JSON. The existing NIST CSF Detect-class controls are still Rego config-posture checks, not audit-log behavior.

This PR closes the loop in one structural change:

  1. The JSON gets promoted from documentation artifact to first-class IR. Every future threat you add automatically yields detection content + ATT&CK mapping with zero extra labor.
  2. BigQuery defending itself — SQL detections that run inside BigQuery against the audit-log sink. There's almost no public SQL-native detection library for BigQuery audit logs; this could anchor a conference talk for TrustOnCloud.
  3. Security buyers evaluate a posture artifact on the full prevent → detect → respond chain. Completing that chain raises the value of the whole repo.

Scope

| In scope | Out of scope (explicit non-goals) |
| --- | --- |
| Compiler (tools/threatmodel_compiler/) | Runtime detection engine / SaaS deployment |
| BigQuery SQL + Terraform module | Replacing the existing Wiz Rego prevention controls |
| Chronicle YARA-L, Splunk SPL, Sentinel KQL emitters | Generalizing the compiler across other TrustOnCloud services |
| ATT&CK Navigator coverage layer | Enumerating every threat (only T1, T6, T9 are seeded) |
| pytest validation suite (25 checks, fully offline) | Vendor-SaaS validators (Chronicle / Splunk) gated behind optional creds |
| GitHub Actions workflow on release tags + PRs | Auto-deploying detections to a customer environment |

Reference threats wired end-to-end

| Threat | hlgoal | Tactic | Why it's a good first detection |
| --- | --- | --- | --- |
| T1 Destruction (delete table/dataset) | DoS | TA0040 Impact | Highest-signal, simplest admin-activity emitter; proves the pattern |
| T6 Exfiltration via export to GCS/DLP | DataTheft | TA0010 Exfiltration | Job-config inspection + authorized-bucket allowlist regex |
| T9 Unauthorized query (SELECT * / metadata abuse) | DataTheft | TA0010 | Query-pattern detection inside BigQuery's own audit logs; the flagship example |
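
For reviewers who want a feel for the T6 allowlist idea, here is a minimal Python sketch of an authorized-bucket regex check. It assumes export destinations arrive as gs:// URIs. The bucket names and the helper names (`is_authorized_destination`, `unauthorized_destinations`) are hypothetical and are not taken from the emitters in this PR.

```python
import re

# Hypothetical allowlist; real bucket names would come from the deployment.
AUTHORIZED_BUCKETS = ["corp-exports", "analytics-archive"]

# Match the whole bucket segment so that "corp-exports-evil" is rejected:
# the bucket name must be followed by "/" or the end of the URI.
ALLOWLIST_RE = re.compile(
    r"^gs://(?:" + "|".join(re.escape(b) for b in AUTHORIZED_BUCKETS) + r")(?:/|$)"
)


def is_authorized_destination(uri: str) -> bool:
    """Return True when the URI points into an allowlisted bucket."""
    return ALLOWLIST_RE.match(uri) is not None


def unauthorized_destinations(uris):
    """Return the export destinations that fall outside the allowlist."""
    return [u for u in uris if not is_authorized_destination(u)]
```

The point of the `(?:/|$)` suffix is that a bare prefix match would accept look-alike bucket names, which is the kind of shape issue the T6 review item asks about.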

Generated tree (this PR)

detections/
├── README.md                                     # auto coverage matrix
├── attack_navigator/bigquery_threat_coverage.json
└── Bigquery.T{1,6,9}/
    ├── metadata.json
    ├── bigquery_sql/detect_t<n>.sql
    ├── bigquery_sql/terraform/main.tf            # passes terraform validate
    ├── chronicle_yaral/detect_t<n>.yaral
    ├── splunk_spl/detect_t<n>.spl
    └── sentinel_kql/detect_t<n>.kql
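
The layout above is regular enough to enumerate per threat. A small Python sketch, assuming the directory and file names follow the tree exactly as shown; `expected_artifacts` and `TARGETS` are illustrative names, not part of the compiler:

```python
from pathlib import PurePosixPath

# Sub-directory and file extension per target, read off the tree above.
TARGETS = {
    "bigquery_sql": "sql",
    "chronicle_yaral": "yaral",
    "splunk_spl": "spl",
    "sentinel_kql": "kql",
}


def expected_artifacts(threat_id: str, root: str = "detections") -> list[str]:
    """List the per-threat files the tree implies for an ID such as "T9"."""
    n = threat_id.lstrip("T")
    base = PurePosixPath(root) / f"Bigquery.{threat_id}"
    paths = [
        base / "metadata.json",
        base / "bigquery_sql" / "terraform" / "main.tf",
    ]
    paths += [base / d / f"detect_t{n}.{ext}" for d, ext in TARGETS.items()]
    return sorted(str(p) for p in paths)
```

A reviewer could compare this list against the files actually present under detections/ to spot a missing emitter output.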

Validation evidence

The pytest suite has 25 tests, runs fully offline, and finishes in ~16s:

  • IR loading + AND/OR/OPTIONAL access-tree flattening
  • Every emitter produces non-empty content for the seeded threats
  • BigQuery SQL parses through sqlglot (BigQuery dialect)
  • KQL passes structural sanity checks
  • YARA-L rules contain meta:, events:, condition: sections
  • MITRE technique IDs resolve and the ATT&CK Navigator layer is well-formed
  • Terraform module passes terraform validate
  • Compiled SQL contains no destructive statements outside string literals (DROP/DELETE/UPDATE/GRANT/REVOKE guard)
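
To illustrate the last check, here is a rough Python sketch of a no-destructive-SQL guard. It is not the test shipped in this PR: it assumes simple single- or double-quoted literals with backslash escapes, and it ignores comments, triple-quoted strings, and raw strings. The function name `destructive_keywords` is hypothetical.

```python
import re

# Keywords named in the guard above.
DESTRUCTIVE = re.compile(r"\b(DROP|DELETE|UPDATE|GRANT|REVOKE)\b", re.IGNORECASE)

# Single- or double-quoted string literals, allowing backslash escapes.
STRING_LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"")


def destructive_keywords(sql: str) -> list[str]:
    """Return destructive keywords that appear outside string literals."""
    # Blank out literals first so that e.g. 'delete table' inside a
    # comparison value does not trigger the guard.
    stripped = STRING_LITERAL.sub("''", sql)
    return [m.group(1).upper() for m in DESTRUCTIVE.finditer(stripped)]
```

A test in this style would assert that `destructive_keywords(compiled_sql)` is empty for every generated .sql file.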

CI workflow: .github/workflows/compile-detections.yml runs on every push of a v* tag and on PRs touching the JSON or compiler. It compiles, tests, uploads artifacts, and hard-fails PRs whose detections/ tree drifts from the freshly compiled output (so contributors are forced to commit regenerated artifacts).
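
The drift check can be pictured as a comparison between the committed detections/ tree and a fresh compile into a scratch directory. The sketch below is one way to do that in Python and is an assumption about the approach, not a copy of the workflow (which may simply rely on git diff); `tree_digest` and `drift` are hypothetical names.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def drift(committed: Path, compiled: Path) -> list[str]:
    """Return relative paths that are missing, extra, or differ in content."""
    a, b = tree_digest(committed), tree_digest(compiled)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

A CI step built on this would exit non-zero whenever `drift(...)` returns a non-empty list.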

Adding new threats — one PR pattern

  1. Confirm every IAM permission referenced by the threat's access tree exists in tools/threatmodel_compiler/permission_map.yaml. Add any that are missing.
  2. Add a per-threat detection template to each emitter (bigquery_sql.py, chronicle_yaral.py, splunk_spl.py, sentinel_kql.py).
  3. Run PYTHONPATH=tools python -m threatmodel_compiler --repo-root . compile --threats T<N>.
  4. Run PYTHONPATH=tools python -m pytest tools/threatmodel_compiler/tests.
  5. Commit the new templates + the regenerated detections/ tree. CI fails the PR if it drifts.
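
Step 1 can be checked mechanically. The following Python sketch assumes the access tree is a nesting of dicts keyed by AND / OR / OPTIONAL whose leaves are permission strings, and that the permission map has been loaded into a dict keyed by permission. Both shapes are assumptions made for illustration, and `flatten_access` and `missing_permissions` are hypothetical names rather than the compiler's API.

```python
def flatten_access(node) -> set[str]:
    """Collect every permission string from a nested AND/OR/OPTIONAL tree."""
    if isinstance(node, str):
        return {node}
    if isinstance(node, list):
        return set().union(*(flatten_access(child) for child in node))
    if isinstance(node, dict):
        return set().union(*(flatten_access(child) for child in node.values()))
    raise TypeError(f"unexpected access-tree node: {node!r}")


def missing_permissions(access_tree, permission_map) -> list[str]:
    """Return permissions used by the tree that the map does not cover."""
    return sorted(flatten_access(access_tree) - set(permission_map))


# Illustrative input only; the real tree lives in the threat-model JSON.
example_tree = {
    "AND": [
        "bigquery.tables.delete",
        {"OR": ["bigquery.datasets.delete", "bigquery.tables.update"]},
    ],
    "OPTIONAL": ["bigquery.tables.list"],
}
# Placeholder value; the real map's value format is not shown in this PR text.
example_map = {"bigquery.tables.delete": "placeholder"}
```

With these inputs, `missing_permissions(example_tree, example_map)` lists the three permissions that would need new permission-map entries before compiling.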

That's the move — pattern, not volume.

Test plan for reviewers

  • Inspect detections/Bigquery.T9/bigquery_sql/detect_t9.sql and confirm the SELECT * + >1 GiB / >50 GiB scan / >=10 referenced tables rules match how you'd model T9 yourself.
  • Inspect detections/Bigquery.T6/sentinel_kql/detect_t6.kql and confirm the GCS allowlist regex pattern is the right shape.
  • Check detections/README.md (the coverage matrix) — does the prevent / existing-detect / compiled-detect tri-column read clearly?
  • Open detections/attack_navigator/bigquery_threat_coverage.json in the ATT&CK Navigator and confirm the technique mapping is defensible.
  • Run the pytest suite locally:
    python -m pip install pytest sqlglot
    PYTHONPATH=tools python -m pytest tools/threatmodel_compiler/tests -v
  • Run the compiler against the seeded threats:
    PYTHONPATH=tools python -m threatmodel_compiler --repo-root . compile --threats T1 T6 T9
    git diff --stat detections/   # should be clean

Open questions for maintainers (please weigh in)

  1. Naming: is tools/threatmodel_compiler/ the right home, or would you prefer top-level compiler/ or detections/_compiler/?
  2. Authorized-bucket allowlist in T6: should this be configurable per deployment via a YAML file alongside the SQL, or is a docstring marker fine?
  3. Per-threat templates vs algorithmic: I deliberately wrote per-threat detection templates rather than an algorithmic rule generator — detection engineering is a content discipline, not codegen. If you'd prefer a more generic dispatch, happy to refactor.
  4. Vendor validators in CI: Chronicle's gcloud chronicle rules validate and Splunk's btool would catch real vendor-syntax errors but require gated credentials. OK to ship as optional CI jobs in a follow-up?

Happy to address feedback in this PR or split into smaller PRs if preferred.

Treats the existing threat-model JSON as a first-class intermediate
representation and compiles it into multi-target detection content for
each modeled threat:

- BigQuery-native scheduled-query detections (.sql + Terraform module)
- Chronicle YARA-L 2.0 rules
- Splunk SPL searches
- Microsoft Sentinel KQL queries
- MITRE ATT&CK Navigator coverage layer (single JSON across all threats)
- Per-threat metadata.json + auto-generated coverage matrix README

Three reference threats wired end-to-end (T1 destruction, T6 GCS
exfiltration, T9 unauthorized query). Adding a new threat is then
permission-map + per-emitter template + recompile — no compiler changes.

Includes:
- tools/threatmodel_compiler/ Python package (no runtime deps; PyYAML
  avoided via a tiny inline parser scoped to the permission-map shape)
- pytest suite with 25 offline checks: IR loading, access-tree flatten,
  emitter content, sqlglot BigQuery SQL parse, KQL paren balance,
  YARA-L structure, MITRE technique resolution, ATT&CK Navigator
  well-formedness, terraform validate, and a no-destructive-SQL guard
- .github/workflows/compile-detections.yml that runs on release tags
  + PRs touching the JSON or compiler, uploads artifacts, and
  hard-fails PRs whose detections/ tree drifts from the compiler output