Manages detection content for Microsoft Sentinel and Microsoft Defender XDR from a single GitHub Enterprise repository, driven by GitHub Actions, with Git as the source of truth.
Scope: this document is a contract with future-self, not a capability statement. It describes the full ContentOps vision. What ships today is a subset:
- Implemented: §1, §2 rows 1 (sentinel_analytic), 3 (defender_custom_detection), 4 (sentinel_hunting), 5 (sentinel_watchlist), 9 (sentinel_data_connector), plus sentinel_parser. §4, §6 (single-workspace today, multi-workspace iteration partial), §7 plan/apply, §8 drift + prune, §10 identity, §13 state, §16 observability (basic), §20 partial linter set.
- Roadmap (in the contract, not yet built): §2 rows 2 (NRT), 6 (workbooks), 7 (automation rules), 8 (playbooks), 10 (Content Hub solutions), §11 lifecycle gating, §12 promotion-tier workflows, §19 upstream-sync workflows, §21 full CI matrix.
- Authoritative on what's actually shipped: see
docs/reference/gap-assessment.mdanddocs/reference/generated-catalog.md.When this doc and the gap-assessment disagree, the gap-assessment wins.
Microsoft already ships a "Repositories" feature in Sentinel that natively connects a GitHub or Azure DevOps repo to a workspace and deploys analytic rules, automation rules, hunting queries, parsers, playbooks, and workbooks via a managed runner with "smart deployments" (skip-if-unchanged).
Before building anything, justify why we are not just using that. Honest answers — any one of these is a valid reason:
| Limitation of native Repositories | Our requirement |
|---|---|
| Workspace-scoped, one repo ↔ one workspace. No multi-tenant fan-out. | We want one repo, many envs (dev/test/prod). |
| Does not manage Defender XDR custom detections at all. | We want unified Sentinel + Defender XDR. |
| Does not manage data connectors, Content Hub solutions, or watchlists. | We want full CRUD over those. |
| Deletion is not handled — orphans accumulate forever. | We want explicit drift detection + prune. |
| ARM/Bicep only. No abstraction layer. Reviewers diff giant JSON. | We want lean YAML for analysts to PR. |
| No environment promotion model (no dev → prod gating). | We want enterprise change control. |
| No PR-time validation of KQL or schema beyond ARM template parse. | We want strict pre-merge validation. |
If none of those are blocking for you, stop reading and use Sentinel Repositories. Building a custom platform is only justified by those gaps.
The rest of this document assumes you are building the custom platform.
- Single GitHub Enterprise repo manages detection content for one or more Sentinel workspaces and one Defender XDR tenant.
- Lean, human-authored YAML for everything analysts touch (analytics rules, custom detections, watchlists, automation rules).
- Pass-through ARM/JSON for things humans don't author (workbooks, playbook logic apps).
- Declarative state for "knobs" (data connectors enabled, solutions installed).
- Full lifecycle: create / update / disable / delete, with drift detection.
- Multi-environment promotion (dev → test → prod) with environment-level approvals.
- Auditable deploys: every change traceable to a commit, PR, and reviewer.
- Replacing the Sentinel UI for ad-hoc investigation.
- Managing Log Analytics workspace creation, tables, retention, or DCRs (use separate IaC repo).
- Managing RBAC role assignments (separate IaC repo / privileged identity).
- Managing the App Registration that the pipeline uses (chicken-and-egg — bootstrap manually or with a separate one-time IaC).
- Replacing playbook authoring (Logic Apps Designer stays the source of truth for playbook internals; we only manage deployment).
| # | Asset | Platform | API / resource | Source format | Notes |
|---|---|---|---|---|---|
| 1 | Analytic rules (Scheduled) | Sentinel | Microsoft.SecurityInsights/alertRules (ARM 2025-07-01-preview) |
YAML (lean envelope) | Already in v1. |
| 2 | Analytic rules (NRT) | Sentinel | same, kind: NRT |
YAML | Already in v1. |
| 3 | Custom detections | Defender XDR | Graph beta /security/rules/detectionRules |
YAML | Already in v1. Tenant-wide, ~100-rule limit by default. |
| 4 | Hunting queries | Sentinel | savedSearches on the LA workspace (Microsoft.OperationalInsights) |
YAML | Cheap to manage; pure KQL. |
| 5 | Watchlists | Sentinel | Microsoft.SecurityInsights/watchlists + watchlistItems |
YAML metadata + CSV | Bulk path: PUT watchlist with rawContent (small) or SAS-URI to CSV blob (large >3.8 MB). |
| 6 | Workbooks | Sentinel | Microsoft.Insights/workbooks (ARM 2023-06-01, kind: shared) |
JSON file (serializedData) + small YAML manifest |
Authored in UI, exported as ARM. Not human-diffable. |
| 7 | Automation rules | Sentinel | Microsoft.SecurityInsights/automationRules |
YAML | Order matters (order field). |
| 8 | Playbooks (Logic Apps) | Sentinel | Microsoft.Logic/workflows (Consumption) or Microsoft.Web/sites (Standard) |
ARM JSON | Often in a separate RG with separate RBAC. Authored in Designer. |
| 9 | Data connectors | Sentinel | Microsoft.SecurityInsights/dataConnectors (legacy) and dataConnectorDefinitions (CCP) |
YAML state file | Many connectors require manual UI consent — partial automation only. |
| 10 | Content Hub solutions | Sentinel | Microsoft.SecurityInsights/contentPackages (ARM 2025-09-01) |
YAML state file | Lists which solutions+versions are installed in each workspace. |
| 11 | Threat intel indicators | Sentinel/Defender | Graph /security/threatIntelligence/... or Sentinel TI ingest API |
(out of scope v2) | Recommend dedicated TI platform (MISP/Sentinel TI), not Git. |
Things deliberately not managed via Git in v2: incidents, bookmarks, entities, MTH huntings (live state, not config), workspace settings (UEBA, diagnostic settings — separate IaC).
SIEMContent/ # GitHub Enterprise repo
├── README.md
├── DESIGN.md
├── pyproject.toml
├── requirements.txt
│
├── content/ # All deployable content (renamed from detections/)
│ ├── analytics/ # (1)(2) Sentinel scheduled + NRT
│ │ └── *.yml
│ ├── detections/ # (3) Defender XDR custom detections
│ │ └── *.yml
│ ├── hunting/ # (4) Sentinel saved KQL hunting queries
│ │ └── *.yml
│ ├── watchlists/ # (5)
│ │ ├── tor-exit-nodes/
│ │ │ ├── watchlist.yml # metadata
│ │ │ └── items.csv # data
│ │ └── ...
│ ├── workbooks/ # (6)
│ │ ├── identity-overview/
│ │ │ ├── workbook.yml # metadata (displayName, category, etc.)
│ │ │ └── workbook.json # exported serializedData
│ │ └── ...
│ ├── automation/ # (7) automation rules
│ │ └── *.yml
│ ├── playbooks/ # (8) ARM templates
│ │ ├── isolate-device/
│ │ │ ├── playbook.yml # metadata + parameter values
│ │ │ └── azuredeploy.json # ARM template (from Designer export)
│ │ └── ...
│ ├── connectors/ # (9) declarative state
│ │ └── connectors.yml # one file lists desired enabled/disabled
│ └── solutions/ # (10) declarative state
│ └── solutions.yml # one file lists installed packages + versions
│
├── environments/ # Per-environment configuration
│ ├── dev.yml
│ ├── test.yml
│ └── prod.yml
│
├── pipeline/ # Python package (extends current)
│ ├── cli.py
│ ├── config.py # loads environments/*.yml
│ ├── models/ # split per asset type
│ │ ├── analytics.py
│ │ ├── detections.py
│ │ ├── hunting.py
│ │ ├── watchlists.py
│ │ ├── workbooks.py
│ │ ├── automation.py
│ │ ├── playbooks.py
│ │ ├── connectors.py
│ │ └── solutions.py
│ ├── providers/ # one per backend API
│ │ ├── arm.py # generic ARM PUT/GET/DELETE
│ │ ├── graph.py # generic Graph beta wrapper
│ │ ├── sentinel.py # Sentinel-specific helpers (composes ARM)
│ │ └── defender.py # Defender-specific helpers (composes Graph)
│ ├── handlers/ # one per asset type — implements plan/apply/collect
│ │ ├── analytics.py
│ │ ├── detections.py
│ │ ├── ...
│ ├── core/
│ │ ├── auth.py # DefaultAzureCredential, scope helpers
│ │ ├── diff.py # git diff + content-hash diff
│ │ ├── plan.py # produce a typed Plan{create,update,delete,skip}
│ │ ├── apply.py # execute a Plan
│ │ ├── state.py # local + remote state correlation
│ │ ├── yaml_io.py # block scalar dumper, envelope load/dump
│ │ └── ids.py # slug, hash, GUID derivation rules
│ └── cli/ # subcommand modules
│ ├── validate.py
│ ├── plan.py
│ ├── apply.py
│ ├── collect.py
│ ├── diff.py
│ ├── prune.py
│ └── lint.py
│
├── tests/
│ ├── unit/
│ ├── integration/ # against a sandbox tenant
│ └── fixtures/
│
└── .github/
├── copilot-instructions.md
├── CODEOWNERS # require security team review
└── workflows/
├── validate.yml # PR: schema + KQL lint + plan against dev
├── deploy-dev.yml # push to main → deploy to dev
├── promote-test.yml # tag v*-rc.* → deploy to test (env approval)
├── promote-prod.yml # tag v* → deploy to prod (env approval)
├── collect.yml # nightly → drift report PR
└── prune.yml # manual → delete orphans (env approval)
Every YAML asset shares the same envelope and adds a single typed payload block. This is the contract that lets one validator/loader handle all asset types.
# Required envelope
id: <kebab-slug> # unique within asset type, e.g. "brute-force-ssh-001"
version: "1.2.0" # semver, bumped manually or by CI
asset: analytic | detection | hunting | watchlist | workbook | automation | playbook | connector | solution
status: experimental | test | production | deprecated
owner: "team-soc" # CODEOWNERS group; used in audit logs
references: # optional — links to docs, MITRE, ticketing
- "https://attack.mitre.org/techniques/T1110/"
# Exactly one of these blocks, matching `asset`
analytic: { ... }
detection: { ... }
hunting: { ... }
# etc.Pipeline guarantees:
idis unique within asset type (validator).- For human-authored YAML,
idis the deterministic remote ARM resource name (so an upsert by id works across replays). - For collected envelopes,
idis the slugifieddisplayName(lowercase, non-alphanumerics →-, capped at 80 chars). The original ARM resource name lives on undermetadata.arm_name.applyresolves the remote resource usingmetadata.arm_namefirst and falls back toidwhen absent (matches v1 behaviour for legacy envelopes). - If two envelopes of the same asset kind would slug to the same
id, both get suffixed with-<arm8>(first 8 alphanumeric chars of the ARM name) so neither steps on the other. The non-colliding case stays bare. - For Defender (Graph generates its own GUID), the Graph id is
metadata.arm_name; upsert is bydisplayName(already true in v1).
Collected envelopes are written to
<path>/<asset_kind>/<slug>.yml. The slug matches the envelope id.
Human-readable filenames let analysts find a rule by its title in git ls
output and PRs.
contentops collect --rename-existing walks the existing on-disk tree once
and renames any envelope whose filename does not already match its slug.
Idempotent. Off by default — opt in once after this contract lands so the
detections/sentinel/ legacy tree follows the same convention.
Sentinel exposes two kinds of remote state under the same API:
- Configuration — declarative content the team owns. Detection rules,
hunting queries, watchlists, parsers, automation rules, workbooks, data
connectors, summary rules. Round-trips cleanly through
collect → drift. - Operational — runtime state. Incidents, incident tasks, watchlist items, workspace manager assignments/configurations/groups/members. New items appear continuously, so including them creates permanent apparent drift.
Operational kinds are skipped by default in contentops collect and
contentops drift. Pass --include-operational to opt in. The set is
defined in contentops.core.drift.OPERATIONAL_ASSETS. See
docs/operations/collect.md for guidance.
CLI subcommands (collect, apply, drift, prune) run at a default
verbosity that demotes the chatty SDK loggers (azure.identity,
azure.core, httpx, urllib3, msal) to WARNING. The top-level
pipeline logger stays at INFO so per-asset progress lines still
surface. Pass -v to promote the SDK loggers to INFO (~10× output)
or -vv for DEBUG (~30× output). Default output for collect is
~25 lines: tenant banner + summary table.
- Authored as lean YAML envelopes under
detections/sentinel_analytic/anddetections/defender_custom_detection/. - Use the universal envelope shape (
asset,id,version,status,metadata,payload). - Optional
dryRunQuery: trueflag — the pipeline calls Sentinel's KQL validation API in PR builds (preview, rate-limited; treat as advisory not blocking).
ARM resource: Microsoft.OperationalInsights/workspaces/savedSearches.
id: hunting-anomalous-svc-account-001
version: "1.0.0"
asset: hunting
status: production
hunting:
displayName: "Anomalous service account logon"
category: "Hunting Queries"
query: |
SecurityEvent | where ...
tactics: [CredentialAccess]
techniques: [T1078]Caveat: hunting queries collide on name (the ARM segment). Use
hash(displayName) for the resource name to avoid collisions across files.
Two failure modes the design must address explicitly:
- Size:
rawContent(inline CSV) is fine ≤ ~3.8 MB. Above that, you must upload to a SAS-protected blob and PUT withsasUri. Pipeline picks automatically based on file size. - Item churn: PUT-replace replaces the whole list (no row diff). For
high-frequency append-only feeds (TI IPs, etc.), this is fine. For lists
analysts edit by hand in the portal, this pipeline will overwrite their
changes — call this out loudly in onboarding. Either:
- declare watchlists as Git-only (portal edits forbidden), OR
- exclude specific watchlists via
managed: falseand let collect surface them as drift.
id: tor-exit-nodes
version: "2025.04.25" # date-versioned for feeds
asset: watchlist
status: production
watchlist:
displayName: "Tor Exit Nodes"
description: "Daily refresh from torproject.org"
itemsSearchKey: "IPAddress" # column used to JOIN in KQL
source: "items.csv" # relative to YAML file
contentType: text/csv
numberOfLinesToSkip: 0Workbooks are the worst offender for "JSON in JSON" — the entire workbook is
a string-encoded JSON blob inside properties.serializedData. PR diffs are
unreviewable. Mitigations:
- Store the workbook as two files: small YAML manifest + pretty-printed
workbook.json. Pipeline serializesworkbook.jsoninto the ARM body at deploy time. - Pre-commit hook auto-formats
workbook.jsonwith stable key ordering so diffs are minimal. - Workbook
name(ARM segment) must be a GUID. Pipeline derives it deterministically fromid(UUIDv5 with a fixed namespace) so the sameidalways produces the same GUID across environments. sourceIdmust be the workspace resource ID — pipeline injects from environment config.
ARM Microsoft.SecurityInsights/automationRules. Key gotcha: order is a
required integer, must be unique across active rules in the workspace.
Validator enforces uniqueness within the repo; collect pulls remote orders
into the diff to prevent unintentional reordering.
This is where the design must be most conservative:
- Playbooks are typically authored visually in the Logic Apps Designer. Hand- editing the ARM JSON is painful. The pipeline does not try to be the authoring tool.
- Workflow: develop in a sandbox, "Export template" from the portal,
parameterize connections, commit
azuredeploy.jsonand a YAML manifest. - Playbooks live in their own resource group with their own RBAC. The
pipeline needs
Logic App Contributoron that RG and the Microsoft Sentinel service account needsMicrosoft Sentinel Automation Contributoron it (one-time bootstrap, document in README). - Connections (e.g., to Microsoft Teams) cannot be fully OIDC-deployed — most connectors require an interactive consent the first time. Document which connectors are needed and require manual consent post-deploy.
YAML manifest:
id: isolate-device
version: "1.4.0"
asset: playbook
status: production
playbook:
resourceGroup: "rg-sentinel-playbooks"
template: "azuredeploy.json"
parameters:
PlaybookName: "Isolate-Device"
UserName:
reference: "@parameters('UserName')" # passthrough; resolved at deploy
connections: # documented, not deployed
- "azuremonitorlogs"
- "wdatp"There are ~100+ connector kinds. They split roughly into:
- Fully automatable (REST API connectors, Entra ID diagnostic-style toggles,
syslog-AMA via DCR): can PUT
dataConnectorswith credentials in env vars. - Requires UI consent (Office 365, MDE-via-MDC integrations, anything with a "Connect" button that triggers OAuth): the API call succeeds but the connector remains "not connected" until a human clicks once.
- Codeless Connector Platform (CCP):
dataConnectorDefinitions+dataConnectorsof kindRestApiPoller/GCP/CCP. Fully scriptable, but Microsoft is steering everything new to CCP and the old API surface will keep changing.
Design decision: model connectors as declarative desired state, not imperative. One file:
# content/connectors/connectors.yml
connectors:
- kind: AzureActiveDirectory
enabled: true
dataTypes:
signinLogs: { state: enabled }
auditLogs: { state: enabled }
- kind: Office365
enabled: true
dataTypes:
exchange: { state: enabled }
sharePoint: { state: enabled }
teams: { state: enabled }
requiresManualConsent: true # pipeline warns, doesn't fail
- kind: MicrosoftDefenderAdvancedThreatProtection
enabled: falseapply reconciles to desired state. For connectors flagged
requiresManualConsent, after PUT the pipeline polls connection status; if
not connected within N minutes, it emits a warning annotation on the PR
("manual consent needed in portal") rather than failing the build.
ARM resource: Microsoft.SecurityInsights/contentPackages (latest API
version 2025-09-01). Installs an entire solution (e.g., "Microsoft Entra
ID", "Cloud Identity Threat Protection Essentials") with a specific version.
# content/solutions/solutions.yml
solutions:
- contentId: "azuresentinel.azure-sentinel-solution-azureactivedirectory"
version: "3.0.6"
enabled: true
- contentId: "azuresentinel.azure-sentinel-solution-mdc"
version: "3.0.3"
enabled: trueCritical caveat: installing a solution dumps a pile of templates (analytics, hunting, workbooks, playbooks) as templates into the workspace — they are not active until separately enabled. Don't conflate "solution installed" with "rules running". The pipeline should:
- Reconcile installed solutions and versions.
- Not auto-enable templates from solutions (that's what custom rules files are for — copy from template, commit, manage explicitly).
- Surface installed-but-unmanaged content in
collectas informational.
The tenant file declares 0 to N Sentinel workspaces under
sentinelWorkspaces:, each tagged with a role. Defender XDR is
tenant-level (0 or 1 per Entra ID tenant).
A tenant ↔ Entra ID tenant. Within one Entra ID tenant you can have:
- 0 or 1 Defender XDR instance (Defender is tenant-scoped — no workspace concept; either present or not).
- 0 to N Sentinel workspaces, in any number of subscriptions. Each
workspace declares its own subscription / resource group / name and
carries one of these roles:
prod— receives content frommainmerges (deploy.yml).integration— receives content from PR builds (integration-deploy.yml). Used to catch broken KQL or wrong-column references before they hit prod. Defender content is not deployed to integration — Defender is prod-only by design (no integration Defender XDR exists).dev— receivesexperimental+test+productionstatuses. Useful for personal sandboxes; not auto-deployed by any workflow.
# config/tenant.yml
tenant:
name: production-tenant # human label, for logging
tenantId: "..." # Entra ID tenant GUID (must match the
# federated credential / .env
# AZURE_TENANT_ID — the loader does
# not verify, the API call will fail
# if they mismatch).
defender:
enabled: true # omit the block entirely for tenants
# that don't have Defender XDR.
sentinelWorkspaces:
- role: prod
subscriptionId: "..."
resourceGroup: "rg-sentinel"
workspaceName: "law-sentinel"
location: westeurope
- role: integration
subscriptionId: "..." # may differ from prod's; same Entra ID tenant.
resourceGroup: "rg-sentinel-int"
workspaceName: "law-sentinel-int"
location: westeuropeWorkflows / operators select workspaces by role (the common case) or by name (when one role has multiple workspaces and only one is the target). Selection happens at command invocation:
contentops apply --role prod # every prod workspace
contentops apply --role integration # every integration workspace
contentops apply --workspace law-sentinel # exactly one, by name
contentops apply # implicit when the tenant
# has exactly one Sentinel
# workspace; error otherwise.
Implementation: contentops.config.select_workspaces(cfg, role=..., workspace=...)
resolves a list of SentinelWorkspaceConfig. The CLI orchestrator
exports PIPELINE_WORKSPACE_NAME per iteration; handler factories
in contentops.cli.bootstrap read that env var to instantiate clients
against the active workspace. This keeps handlers workspace-scoped
without N-way registry rewiring.
Iteration follow-up. Today
apply_cmdresolves--roleto a single workspace and errors when N>1 match. Multi-workspace iteration (deploy to every--role prodworkspace independently, aggregate exit codes, group output by workspace) is the immediate follow-up. The schema and selector are in place; the iteration loop is mechanical work that lands in a follow-up PR.
Status gating now keys off the active workspace's role, not the tenant name:
dev→experimental+test+productionintegration(wastest) →test+productionprod→productiononly (anddeprecateddeploys as disabled)
deprecated is never auto-deployed by any role; it lives on the
explicit deprecate / prune flow.
Identity (AZURE_CLIENT_ID / AZURE_TENANT_ID / AZURE_CLIENT_SECRET)
is never in tenant.yml. It comes from environment variables —
local .env or GitHub Actions Variables (not Secrets — these IDs are
public). Auth flows:
- Local dev —
.envpopulates env vars;DefaultAzureCredentialpicks up theClientSecretCredentialchain. - CI —
azure/login@v2uses OIDC federated credentials (no client secret in CI). Federated credential subject is bound torepo:KustoKing/SIEMContent:environment:<production|integration>on the App Registration.
GitHub Variables (not Secrets) hold AZURE_CLIENT_ID and
AZURE_TENANT_ID. No AZURE_SUBSCRIPTION_ID variable is needed at
the workflow level — workflows read subscriptions from tenant.yml
and pass them to azure/login's subscription-id input.
Multi-tenant is explicitly out of scope. One Entra ID tenant per repo, full stop.
Inspired by Terraform; non-negotiable for safety at this scale.
contentops plan --env prod [--asset analytic,detection]
→ reads YAML, fetches remote state, prints typed Plan to stdout + JSON
to artifact. Does not call any write APIs.
contentops apply --env prod --plan plan.json
→ executes the previously-produced plan. Idempotent on retry.
contentops collect --env prod
→ pulls remote state to YAML in a scratch directory; emits diff as PR.
contentops prune --env prod --confirm
→ deletes remote items not present locally (gated; prod requires
2-reviewer approval via GitHub Environment).
Why a separate plan step: PR builds run plan against prod and post
the summary as a PR comment. Reviewers see exactly what will change before
merging. apply then runs in the deploy workflow with the merged plan.
Three sources of truth get out of sync:
- Git (what we want).
- Remote API (what's actually deployed).
- State file (what we last successfully deployed — optional but
recommended; stored as a workflow artifact + checked into a
state/branch, NOT main).
Drift cases:
| Local | Remote | State | Meaning | Action |
|---|---|---|---|---|
| ✅ | ✅ | ✅ | in sync | nothing |
| ✅ | ❌ | ❌ | new | create |
| ✅ | ✅ | ✅(diff) | locally modified | update |
| ✅ | ✅(diff) | ✅ | remote drift | warn; require explicit re-apply |
| ❌ | ✅ | ✅ | deleted from Git | mark for prune (separate flow) |
| ❌ | ✅ | ❌ | unmanaged (never ours) | leave alone, surface in report |
Default deploy never deletes. Deletion is a separate prune workflow
that requires reviewer approval — silent deletion of a detection rule is a
catastrophic failure mode you do not want to have to explain to an auditor.
The CLI command is contentops prune and the workflow is .github/workflows/prune.yml.
Handler contract. Every handler implements delete(remote_id) -> ActionResult.
For write-capable handlers, this calls the appropriate DELETE endpoint
on the remote API. For read-only / collect-only handlers (workspace
manager, source controls, incidents, incident tasks, watchlist items)
the method raises NotSupportedError. For singleton handlers
(sentinel_settings, sentinel_onboarding) it also raises
NotSupportedError — those resources represent workspace-wide
capabilities that are toggled via status: deprecated, not pruned.
404 is success. If the resource is already gone, the prune is a
no-op success — the operation is idempotent. Anything else maps to
error-<status_code>.
Safety rails:
--dry-rundefaults to true. The CLI lists the orphan plan and exits without touching anything.--yesis required to actually delete (alongside--no-dry-run). Two flags, not one — defends against shell-history accidents.--max-deletes N(default 25) is fail-closed. Exceeding it is an exit-1 error, not a "ok let's keep going" warning.- Locked envelopes (top-level
localCustomization: trueon the YAML) are NEVER pruned by default.--include-lockedis required to override, and the workflow exposes that as an explicit boolean input. NotSupportedErrorfrom any handler is caught and surfaced as a per-asset SKIP — never a batch failure.- Every actual deletion writes an
AuditRecordonto the same JSONL chainapplyuses, so the hash chain spans both create/update and delete operations.
Workflow: prune.yml is workflow_dispatch only. Inputs:
env, asset (optional), dry_run (default true), max_deletes
(default 25), include_locked (default false), confirm (must
equal the literal string CONFIRM when dry_run=false). Uses the
GitHub Environment matching the env input so production prune runs
require reviewer approval. Posts the prune plan as a step summary;
uploads the audit JSONL as an artefact (90-day retention).
Live-tested: every write-capable handler has a tests/integration/
test that creates a temp resource on the live tenant, deletes it via
handler.delete(), and asserts a subsequent GET returns 404.
This is GitHub Enterprise (likely GHEC, possibly GHES). Practical implications:
- OIDC with Azure: GHEC supports federated credentials to Azure normally. GHES requires the GHES instance's OIDC issuer to be reachable by Azure (public DNS) and the federated credential subject claim format may differ. Verify before designing around OIDC; fall back to a managed-identity-on- self-hosted-runner if OIDC isn't viable.
- Federated credential subject limit: an Azure App Registration accepts
at most 20 federated credentials. With multiple environments × branches ×
workflows you can hit this. Mitigation: use the
repo:org/repo:environment:<env>subject pattern (one credential per environment, not per workflow). - GitHub Environments: required for reviewer-gated deploys. One per
Azure environment (
dev,test,prod); secrets/variables scoped per environment. - CODEOWNERS in the security project: split ownership — analysts own
content/analytics/andcontent/detections/; platform team ownscontent/connectors/,content/solutions/,pipeline/,.github/. - Branch protection on
main: required reviews from CODEOWNERS, status checks (validate,plan-prod), no force-push, no admin bypass. - GitHub Advanced Security: enable secret scanning + push protection so someone can't commit a workspace key into a sample.
| Component | Identity | Scope |
|---|---|---|
| Pipeline → Sentinel ARM | App Reg + OIDC | Microsoft Sentinel Contributor on each workspace |
| Pipeline → Workbooks | same | Monitoring Contributor on workbook RG (workbooks live there) |
| Pipeline → Playbooks | same | Logic App Contributor on playbook RG |
| Sentinel service → Playbooks | Sentinel managed identity (auto-created) | Microsoft Sentinel Automation Contributor on playbook RG |
| Pipeline → Defender XDR | same App Reg, app permission | Graph CustomDetection.ReadWrite.All (admin consent) |
| Pipeline → Content Hub | same | Microsoft Sentinel Contributor already covers it |
| KQL dry-run validation | same | Log Analytics Reader on workspace |
One App Registration per environment is cleaner than one shared App Reg with multiple subjects — easier to revoke a single env without affecting others, fits the "blast radius" principle. Trade-off: more App Regs to manage.
PR-time, in order, fail-fast on first hard error:
- YAML well-formed, file extensions correct.
- Envelope schema valid (id pattern, semver version, known asset/status).
- Per-asset payload schema valid (Pydantic models per asset type).
- Cross-file invariants:
idunique within asset type.- Defender
displayNameunique across detection files. - Automation rule
orderunique within workspace scope. - Watchlist CSV columns match
itemsSearchKey. - Workbook
serializedDataparses as JSON; matches declaredversion.
- Reference checks:
- Playbook references in automation rules resolve to a known playbook.
- Connector references in solutions warn if the solution provides a
connector you've also declared in
content/connectors/(potential conflict).
- KQL lint (advisory, soft-fail):
- Static parse via
Kusto.Language(.NET) shelled, orpykusto. - Optional
dryRunQueryagainst workspace (preview API, rate-limited, advisory only).
- Static parse via
- Plan against target env(s) — requires Azure auth, so runs after
azure/login; if creds unavailable (fork PR), skip with notice.
| Workflow | Trigger | Permissions | Action |
|---|---|---|---|
validate.yml |
PR opened/updated | id-token: write |
steps 1–6 above; plan against dev; comment on PR |
deploy-dev.yml |
push to main |
id-token: write |
apply diff to dev automatically |
promote-test.yml |
tag v*-rc.* |
id-token: write + env approval |
apply to test |
promote-prod.yml |
tag v* (no -rc) |
id-token: write + env approval |
apply to prod (2 reviewers required) |
collect.yml |
nightly cron + manual | id-token: write + pr: write |
collect each env, open PR with drift |
prune.yml |
manual workflow_dispatch |
id-token: write + env approval |
delete orphans (env chosen as input) |
kql-lint.yml |
PR (paths: KQL-bearing files) | none | static KQL parse only (no Azure auth needed) |
solutions-catalog.yml |
daily cron + manual | id-token: write + pr: write |
upstream PR for new Content Hub versions (§19.2) |
templates-catalog.yml |
weekly cron + manual | id-token: write + pr: write |
upstream PR for new built-in template versions (§19.3) |
models-catalog.yml |
weekly cron + manual | pr: write |
upstream PR for ARM/Graph schema diffs (§19.4) |
tables-catalog.yml |
weekly cron + manual | id-token: write + pr: write |
upstream PR for new advanced-hunting tables (§19.5) |
community-watch.yml |
weekly cron + manual | pr: write |
informational PR for Azure-Sentinel community (§19.6) |
The five *-catalog.yml / community-watch.yml workflows are the
"upstream → repo" PR producers detailed in §19. They share the PR
machinery in §19.7 and run the full validation matrix (§21) on every PR
they open.
Concurrency: every deploy/promote workflow has
concurrency: { group: deploy-${{ env }}, cancel-in-progress: false } so
two simultaneous merges don't race.
Self-hosted runners: required if GHES + private Sentinel workspace + no public ingress. Use ephemeral runners with managed identity instead of OIDC in that case.
- All upserts are idempotent by construction (PUT with deterministic name for ARM; displayName-keyed PATCH for Defender).
- Resumable applies:
applywrites a per-item progress JSON; on retry, items already markedsuccessfor that plan ID are skipped. - State file (Chunk 5, implemented): per-env JSON, one file per env,
conventionally written to a separate
state/<env>orphan branch on CI runs. Local-dev path isstate/state.json. Used to detect "Git removed it but remote still has it → flag for prune". Without state, drift / prune still work via the local YAML index (the collect-as-truth fallback), but you can't distinguish "removed intentionally" from "never managed".
{
"schema_version": "1.0",
"env": "prod",
"last_apply_sha": "abc123...",
"last_apply_at": "2026-05-06T12:00:00.000000Z",
"managed_assets": {
"sentinel_analytic": {
"brute-force-001": {
"remote_id": "brute-force-001",
"last_applied_at": "2026-05-06T12:00:00.000000Z",
"last_applied_sha": "abc123...",
"status": "success"
}
}
}
}applyupdates the file after a successful batch: onemanaged_assets[<asset>][<envelope_id>]entry per non-skipped result, plus the run'slast_apply_sha/last_apply_at.classify_remote(state, asset, envelope_id, in_local)returns one ofin-sync,new-local,orphan,unmanaged— the contract drift / prune use to decide what to do with each remote item.- The state file is advisory. A corrupt or missing file falls back to "treat everything as unmanaged" (no prune candidates surfaced via state, but git-history fallback still works). The pipeline never fails because of state.
| Command | Effect |
|---|---|
contentops state show |
Print the per-env state. JSON via --format json. |
contentops state forget <id> --asset <kind> |
Drop one envelope from state (e.g. after a manual portal cleanup). |
contentops apply |
After every successful batch, merges results into state and writes it. Failures and skips are recorded too (failed entries get re-tried by contentops retry-failed). |
- Connector credentials (e.g., third-party API keys for CCP connectors): never
in YAML. Reference by name; pipeline resolves from GitHub environment
secrets at apply time and POSTs to the connector's
authblock. - Watchlist contents: typically not sensitive, but TI feeds can be licensed.
Use
.gitattributesfilterfor CSVs that must stay encrypted, or store outside the repo and download at deploy time. - Workspace and tenant IDs: not secrets; commit them in
environments/*.yml. - App Registration client IDs: not secrets; commit them.
- Anything referenced by a
${{ secrets.* }}pattern is mandatory to lock via push protection / secret scanning.
| Layer | What | Where |
|---|---|---|
| Unit | Pydantic models, YAML round-trip, ARM body builders, plan diff | tests/unit/ (current tests/ extended) |
| Contract | HTTP responses recorded with respx per provider |
tests/unit/providers/ |
| Integration | Real plan + apply against a sandbox tenant, then prune everything | tests/integration/ (only on schedule + manual) |
| Smoke (CI) | After each deploy, GET 5 random rules and verify they exist | inside deploy-*.yml |
Sandbox tenant is non-negotiable for an enterprise rollout — a separate
Sentinel workspace with isolated RBAC, used by dev and integration tests.
- Every CLI run emits a structured JSON log line per item; workflows upload the run log as an artifact (90-day retention).
- A summary table is posted as a PR comment / workflow summary
(
$GITHUB_STEP_SUMMARY). - Optional: ship logs to the very Sentinel workspace they manage (close the
loop) via Log Ingestion API → custom table
SIEMContentDeploys_CL. Then build an analytic rule: "deploy failure rate > N in 1h". - Notifications on prod failure: GitHub Issue auto-opened + Teams webhook.
Honest list:
- Defender XDR API is
beta. Microsoft can and does change beta shapes. PinUser-Agentso support can identify you, and budget for recurring schema fixups. There is no GA detection-rule API as of this writing. - Workbook serializedData is a moving target. Microsoft adds fields to the workbook JSON over time; an older committed workbook may "downgrade" features when re-applied. Mitigation: collect-then-PR workflow keeps the committed JSON close to portal reality.
- Solutions auto-update in the portal. If "Auto-update" is on, the
pipeline's "installed version: 3.0.6" will fight the portal. Turn auto-
update off for managed solutions, or accept that
collectwill keep producing version-bump PRs. - Data connector kinds change. New connectors (CCP-based) replace old
ones (e.g., Office365 connector is being replaced). Treat
content/connectors/connectors.ymlas a convention, not as schema- validated; use a permissive Pydantic model withextra='allow'. - Microsoft Sentinel is moving from Azure Portal to Defender Portal (deadline 2027-03-31). The ARM API surface stays, but the semantics around unified incidents, automation, etc., are shifting. Don't tightly couple to Azure-portal-only concepts.
git diff HEAD~1..HEADis wrong for squash merges with multiple logical changes. Prefer comparingmainto the previous successful deploy SHA stored in the state file. Falling back to "deploy everything in main" is safer than a half-applied subset.- Rate limits: ARM throttles around 12K req/h per subscription; Graph beta is more aggressive. Batch + parallelize cautiously (httpx with bounded semaphore, ~5 concurrent).
- Logic Apps connections cannot be fully automated. Some require interactive OAuth. The pipeline should clearly document which connectors need manual consent and not pretend otherwise.
- Two App Regs with the same Graph permission can both manage Defender detections. There is no per-rule ownership tag. If another tool also pushes Defender detections, you will fight it. Solve organizationally, not technically.
- Watchlist updates are not transactional. Mid-update the watchlist can be empty for seconds. KQL detections joining it will miss alerts. Schedule watchlist updates outside detection trigger windows.
| Phase | Scope | Why |
|---|---|---|
| 1 | Refactor v1 to envelope+handlers+providers; add plan/apply; multi-env config. |
Foundation; nothing new functionally. |
| 2 | Hunting queries, automation rules, watchlists. | Highest ROI, lowest API risk. |
| 3 | Workbooks (with two-file layout) and Content Hub solutions. | Useful but JSON-heavy; needs UX care. |
| 4 | Data connectors (CCP first, classic last). | Highest API risk, partial automation. |
| 5 | Playbooks (deploy-only; authoring stays in Designer). | Crosses RG/RBAC boundary; high blast radius. |
| 6 | KQL dry-run validation, drift dashboard workbook, deploy telemetry to Sentinel. | Polish. |
Each phase is independently shippable; phases 4–5 are optional if Sentinel Repositories already covers your playbook/workbook flow well enough.
Requirement: when reality upstream changes (live workspace, Microsoft
catalogs, API schemas), the pipeline must open a PR against this repo with
the proposed change. Humans always merge; the pipeline never pushes to
main directly.
There are six distinct upstream streams. Each is a separate scheduled workflow with its own label, branch name pattern, and CODEOWNERS routing. They share one underlying "open-or-update PR" library.
| # | Stream | Workflow | Trigger | Label | Branch | Owner team |
|---|---|---|---|---|---|---|
| 1 | Live-state drift (portal hand-edits) | collect.yml (§12) |
nightly | upstream:drift |
upstream/drift/<env>/<date> |
SOC analysts |
| 2 | Content Hub solution catalog | solutions-catalog.yml |
daily | upstream:solutions |
upstream/solutions/<contentId> |
Platform |
| 3 | Built-in analytic rule templates | templates-catalog.yml |
weekly | upstream:templates |
upstream/templates/<rule-id> |
SOC analytics |
| 4 | ARM / Graph API schema (Pydantic) | models-catalog.yml |
weekly | upstream:schema |
upstream/schema/<api>-<date> |
Platform |
| 5 | Advanced hunting / table schema | tables-catalog.yml |
weekly | upstream:tables |
upstream/tables/<date> |
Platform |
| 6 | Azure-Sentinel community repo (opt-in) | community-watch.yml |
weekly | upstream:community |
upstream/community/<date> |
SOC analytics |
Already specified in §12. Strengthen:
- Group changes by asset type and produce one PR per asset type per env (not one giant PR), to keep CODEOWNERS routing clean.
- For modified items, the PR body contains a 3-way diff: git HEAD, last applied state (§13), current remote. This distinguishes "we forgot to re-deploy" from "someone hand-edited in the portal".
- Source:
GET /providers/Microsoft.SecurityInsights/contentProductPackages(workspace-scoped) and the MarketplaceproductPackageslisting. - Compare each
contentIdincontent/solutions/solutions.ymlagainst the latest publishedversion. Open / update a PR bumping the version. - PR body includes the package's published changelog URL and a list of templates the new version adds/removes (so reviewers see if a managed rule is impacted).
- Opt-out per solution via
pinVersion: trueinsolutions.yml(and document why — usually "incompatible breaking change in 4.x").
- Source:
GET .../alertRuleTemplatesandMicrosoft.SecurityInsights/contentTemplates. - For every analytic rule whose YAML carries
alertRuleTemplateNameandtemplateVersion, detect a newertemplateVersionand open a PR. - The PR shows the KQL and property diff between the two template versions, plus the diff against the customer's local override. Reviewer decides what to keep.
- Never auto-merge — these almost always need analyst judgement (false- positive rate, environment-specific thresholds).
- Source: snapshot the relevant slices of the
Azure/azure-rest-api-specsSentinel swagger and Graph beta$metadataintocontentops/schemas/(committed JSON). - Weekly job re-fetches and diffs.
- For added optional fields → auto-regenerate Pydantic models in a PR with matching test fixtures (additive, non-breaking).
- For renamed/removed fields → open a PR with failing test stubs and TODOs in handler code; never auto-fix. This is how the team discovers Microsoft removed something before prod blows up.
- Bumps to
api-version(preview → GA) require a manual decision; the PR proposes the new version but leaves the call sites unchanged.
- Source: Defender XDR advanced hunting schema (via Graph
/security/runHuntingQuerymetadata or the public schema doc). - Diff against
contentops/schemas/defender_tables.json. - New columns / tables → PR updates the snapshot. The KQL linter (§21 + §22) automatically gains awareness on merge.
- Source:
Azure/Azure-Sentinelreleases / new files underDetections/,Hunting Queries/. - Filter by configured MITRE techniques, data sources, and severity.
- PR is informational only — lists candidate detections with summaries,
links, and a copy/paste YAML stub (status
experimental). Never imports KQL into the repo automatically.
All six streams share one library (contentops/upstream/pr.py):
- Stable branch name per stream + key. Re-runs force-update the branch
rather than stacking PRs (
git push --force-with-lease). - Content-hash dedupe: if the new branch's tree hash matches the open PR's head, the run is a no-op (no comment spam).
- Title format:
[upstream:<stream>] <key> <vfrom>→<vto>(parseable). - Body: structured Markdown — summary table, collapsible per-item details, links to upstream sources, instructions for reviewers.
- Labels:
auto-generated,upstream:<stream>, plus risk labels (risk:low|medium|high) computed from diff size. - CODEOWNERS auto-request: derived from path +
.github/upstream-owners.yml. - Bot identity: a dedicated GitHub App (preferred) or
github-actions[bot]withpull-requests: write,contents: writeon a protected branch namespaceupstream/*— and no push permission tomain. - Conflict policy: if a human pushed commits to the upstream branch since last run, the bot does NOT force-overwrite — it opens a follow-up PR with the new upstream change, against the existing branch, and lets the human merge/rebase. Never lose human work silently.
- Every upstream PR runs the full
validate.ymlmatrix (§22). No special fast-path — these PRs are reviewed like any human PR.
- Microsoft Sentinel incidents, alerts, entities (operational, not config).
- Workspace-level settings (UEBA, EntityAnalytics, diagnostic settings — separate IaC repo).
- Connector credentials rotated upstream (handled via secret rotation, not Git).
- Defender XDR live machine state (devices, users) — not config.
Big systems fail at integration seams. This table is the contract that CI
enforces; each row is a dedicated linter under contentops/lint/.
| # | Invariant | Source | Target | Enforced in | Severity |
|---|---|---|---|---|---|
| 1 | Every _GetWatchlist("X") in any KQL resolves to a content/watchlists/X/ directory |
analytics + detections + hunting KQL | content/watchlists/ |
lint-watchlist-refs |
error |
| 2 | Every table referenced in KQL exists in the target workspace (built-in or from connector) | KQL | LA built-in tables ∪ tables provided by enabled conns | lint-table-refs |
warn |
| 3 | Every playbookId in an automation rule resolves to a content/playbooks/ entry |
automation YAML | content/playbooks/ |
lint-playbook-refs |
error |
| 4 | Solution-provided rule that is locally customized declares matching alertRuleTemplateName |
analytics YAML | solutions.yml |
lint-solution-overrides |
warn |
| 5 | Analytics rule's alertRuleTemplateName + templateVersion exists upstream |
analytics YAML | live alertRuleTemplates |
plan step |
warn |
| 6 | Defender displayName unique across all detection files |
detections YAML | each other | already (v1) | error |
| 7 | Workbook sourceId resolves to the target env's workspace |
workbook YAML | environments/*.yml |
injected at plan time | error |
| 8 | Required connector enabled before a detection that depends on it (requires: field) |
detection / analytics YAML | connectors.yml |
lint-connector-deps |
error |
| 9 | A production rule does not reference an experimental/test playbook |
rule status | playbook status | lint-status-coherence |
error |
| 10 | Automation rule order unique within workspace |
automation YAML | each other | already (§5.5) | error |
| 11 | id slug regex consistent across all asset types |
every YAML | regex | envelope (§4) | error |
| 12 | Pipeline version pinned in workflows matches pyproject.toml |
.github/workflows/* |
pyproject.toml |
lint-self-version |
error |
| 13 | Every committed workbook.json parses + matches its declared version |
workbook JSON | manifest YAML | lint-workbook-shape |
error |
| 14 | Every YAML file referenced by references: URL pattern is reachable (HEAD 200) |
envelope references |
the internet | lint-references (advisory) |
warn |
| 15 | Watchlist CSV columns include the declared itemsSearchKey |
watchlist YAML + CSV | each other | lint-watchlist-shape |
error |
| 16 | No two analytics rules share displayName per workspace (Sentinel UI assumes uniqueness) |
analytics YAML | each other | lint-displayname-unique |
warn |
| 17 | Hunting query name (ARM segment) collisions resolved via deterministic hash |
hunting YAML | derivation rule (§5.2) | computed at build | error |
Linter output is GitHub-annotation-formatted (::error file=,line=::) so
findings render inline on the PR diff.
Every PR runs all of this. Matrices fan out per asset type and per env so failures are localized. Required vs. advisory is called out per check.
| Job | What | Required to merge? |
|---|---|---|
plan-dev |
Plan against dev. Comments diff on PR. | advisory |
plan-test |
Plan against test. Comments diff on PR. | advisory |
plan-prod |
Plan against prod. Comments diff on PR. | required |
try-sandbox |
Apply to throwaway RG rg-sentinel-pr-${{ pr_number }}; teardown on PR close. Opt-in via label try:sandbox. |
advisory |
A red plan-prod blocks merge — that's the safety net for "this PR would
break prod even if it looks fine in dev".
| Job | What | Required |
|---|---|---|
build-yaml |
Schema validation (Pydantic) of every YAML. | req |
build-arm |
az deployment group validate for every workbook + playbook against sandbox RG. |
req |
build-kql |
Kusto.Language static parse of every KQL block. | req |
build-models |
Re-run code generators from committed schemas; fail if git status non-empty (ensures regen-up-to-date). |
req |
build-docs |
Regenerate content/INDEX.md (catalog, MITRE coverage). Fail on drift. |
advisory |
build-bicep |
If any file under content/playbooks/**/*.bicep, transpile with bicep build and validate. |
conditional |
| Job | What | Where it runs |
|---|---|---|
test-unit |
pytest matrix Python 3.12 / 3.13. | every PR |
test-contract |
respx-replay HTTP fixtures per provider; fail if recorded responses contradict snapshots. | every PR |
test-integration |
Real plan + apply + GET assert + prune against sandbox tenant. | nightly + on prod-tag |
test-smoke |
Post-deploy: GET 5 random items per asset type; assert exists + enabled-as-expected. | inside deploy-*.yml |
test-rollback |
Apply, then re-apply previous SHA; assert state matches. | nightly |
test-permissions |
New SP with documented role set is sufficient to run a full plan+apply. | nightly |
| Job | What | Required |
|---|---|---|
review-codeowners |
Native GitHub: .github/CODEOWNERS requires the right team(s) per touched path. |
req |
review-plan-comment |
Bot posts plan summary table on PR; updates on each push. | bot |
review-diff-render |
For workbooks: render before/after JSON to PNG via headless workbook renderer; attach to PR. | advisory |
review-kql-render |
For KQL: side-by-side syntax-highlighted diff in PR comment. | advisory |
review-policy |
OPA / Conftest policies (e.g., "Severity High ⇒ createIncident=true"; "no * in customDetails"). |
req |
review-checklist |
PR template enforces analyst checklist (tested in dev, MITRE mapping, FP rate measured, runbook link). | req |
review-ai |
Optional Copilot/code-review agent posts inline suggestions. | advisory |
Required status checks (must all be green to merge):
build-yaml,build-arm,build-kql,build-models,build-bicep- all linters from §20 (errors only)
test-unit,test-contractplan-prodreview-codeowners,review-policy,review-checklist
Plus: linear history, no force-push, no admin bypass, dismiss stale reviews on push, signed commits required.
A .pre-commit-config.yaml runs the cheap subset locally:
yaml-lint, json-lint, KQL formatter, schema regen check,
linters #1, #3, #6, #11, #15 from §20. This is the same code paths CI
uses (no duplication), invoked via python -m contentops lint --fast.
To make §19 concrete, here is the end-to-end flow for an upstream template update (stream #3):
- Tue 02:00 UTC —
templates-catalog.ymlruns on schedule (workflow_dispatchalso enabled). - Pipeline lists
alertRuleTemplatesfrom each managed workspace; finds templateBruteForceAttackadvanced from1.4.0(what we have) to2.1.0(upstream). - Pipeline computes structural diff (KQL, severity, tactics) between the
two template versions and against our local overrides in
content/analytics/aad-brute-force-001.yml. - Pipeline checks branch
upstream/templates/aad-brute-force-001.- If the branch exists and its tree hash equals the freshly-computed hash → no-op (already proposed, awaiting review).
- If the branch exists with a different hash and no human commits
since the last bot push → force-update with
--force-with-lease. - If the branch exists with human commits → open a follow-up PR against the existing branch with the new upstream delta. Never overwrite human work.
- Pipeline opens / updates the PR
[upstream:templates] aad-brute-force-001 1.4.0→2.1.0with:- structured diff table,
- rendered KQL side-by-side,
- link to the upstream template + Microsoft changelog,
- labels
auto-generated,upstream:templates,risk:medium, - CODEOWNERS auto-request
@org/soc-analytics.
- PR triggers the full §21 matrix:
build-yaml✓ — envelope still valid.build-kql✓ — new KQL parses.lint-table-refs(warn) — new template referencesSecurityEventwhich our connector enumeration shows is enabled. Pass.plan-devposts: "1 analytic rule will UPDATE; threshold 5→3, query body changed by 14 lines." ✓plan-prod✓ (read-only).review-policy✓ — Severity unchanged (High), createIncident still true.
- Analyst reviews, possibly tweaks the threshold (their FP-rate data shows 3 is too noisy; bumps to 4) and pushes a commit to the bot's branch.
- On the next Tuesday run, pipeline detects upstream still at 2.1.0 → diff unchanged → no-op (the human commit is preserved).
- Analyst merges →
deploy-dev.ymlapplies → SOC validates over the week → tagsv2026.04.30-rc.1→promote-test.yml(env approval) → tagsv2026.04.30→promote-prod.yml(env approval, 2 reviewers). - State file (§13) records
aad-brute-force-001last-applied template version is now2.1.0. Nexttemplates-catalog.ymlrun sees no diff.
Same loop, parameterized, runs for solutions, schemas, tables, drift, and community streams. The pipeline opens PRs; humans merge. Always.
- One repo or many? I assume one repo (
SIEMContent) covering all asset types and all environments. Alternative: one repo per asset type (siem-analytics,siem-playbooks). One-repo is simpler for promotion and CODEOWNERS; many-repo isolates blast radius and makes per-team ownership cleaner. Default: one repo. - One Defender XDR tenant or many? v2 assumes one. Multi-tenant
Defender requires a foreach over tenants and per-tenant App Regs (Graph
has no cross-tenant app permissions for
CustomDetection.ReadWrite.All). - Sentinel Repositories side-by-side? Some teams may want to keep workbooks/playbooks in Sentinel Repositories (because the native flow is simpler) and only this pipeline for analytics + Defender. This is fine and possibly the pragmatic answer.
- State file: where? Orphan Git branch (auditable but noisy), Azure Storage blob (clean but extra resource), or no state at all (rely on collect-as-truth). I lean Azure Storage blob per env.
- Versioning of the platform itself: this repo also hosts the Python pipeline. Pin pipeline version per env, or always-latest? Recommended: pipeline is versioned with the repo; workflows install from the same commit they're triggered by — one fewer thing to drift.