Audience: anyone hitting an error and looking for the fix. Junior or senior; first-week or veteran. Each entry tells you (a) what the error looks like, (b) why it happens, (c) the exact fix.
Index:
- Authentication errors
- Configuration errors
- Lint + validate errors
- Apply errors
- Drift + collect errors
- CI gate failures
- Fork PR limitations
- Workspace + tenant data errors
For known issues with no available fix yet, see
docs/reference/gap-assessment.md.
For full audit-chain repair, see
docs/operations/audit-recovery.md.
Looks like:
EnvironmentCredential: Authentication failed: AADSTS7000215:
Invalid client secret provided. Ensure the secret being sent in
the request is the client secret value, not the client secret ID,
for a secret added to app '<guid>'.
Why: the most common Entra ID papercut. In the Azure Portal under App Registrations → your app → Certificates & secrets, the table shows two columns that both look copyable:
| Column | What it is | Goes in .env? |
|---|---|---|
| Value | The actual secret (a random ~40-char string) | ✅ Yes — this is AZURE_CLIENT_SECRET |
| Secret ID | A GUID identifying the secret entry | ❌ No |
The Value is shown once at creation. If you clicked away, it's gone — you have to create a new client secret.
Fix:
- Azure Portal → App Registrations → your app → Certificates & secrets.
- New client secret → name + expiry.
- Immediately copy the Value column (not Secret ID).
- Paste into .env as AZURE_CLIENT_SECRET=<value>.
- Re-run contentops doctor --matrix.
Verify it loaded:
$s = $env:AZURE_CLIENT_SECRET
"len=$($s.Length), prefix=$($s.Substring(0,3))..."
A real Value is ~40 characters. A Secret ID is exactly 36 characters (a GUID,
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx); if your length is 36,
that's the smoking gun.
Don't paste the actual secret in chat / issue trackers / PRs.
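If you'd rather script the length check, here is a minimal Python sketch (not part of contentops). Like the PowerShell one-liner, it assumes .env has already been loaded into the environment, and it only reports whether the value has the 36-character GUID shape of a Secret ID. It never prints the secret.

```python
import os
import re

# A Secret ID is a GUID: 8-4-4-4-12 hex digits, 36 characters in total.
_GUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def classify_client_secret(raw: str) -> str:
    """Guess what was pasted into AZURE_CLIENT_SECRET without revealing it.

    Returns "missing", "secret-id" (GUID shape: the wrong column) or
    "probably-value" (anything else, e.g. the ~40-char Value).
    """
    value = raw.strip()
    if not value:
        return "missing"
    if _GUID_RE.fullmatch(value):
        return "secret-id"
    return "probably-value"


if __name__ == "__main__":
    secret = os.environ.get("AZURE_CLIENT_SECRET", "")
    # Print only the length and the verdict, never the secret itself.
    print(f"len={len(secret.strip())}, looks like: {classify_client_secret(secret)}")
```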
Looks like (in CI logs):
AADSTS700213: No matching federated identity record found for
presented assertion subject 'repo:ORG/REPO:environment:production'.
Why: the App Registration's federated credential subject
doesn't match the GitHub OIDC token's subject. GitHub mints the
token with subject repo:<org>/<repo>:environment:<env-name> (or
...:ref:refs/heads/main, or ...:pull_request); your App Reg
must have a federated credential entry that exactly matches.
Fix:
- Azure Portal → App Registrations → your app → Certificates & secrets → Federated credentials.
- Add credential → "GitHub Actions deploying Azure resources".
- Fill in: org, repo, "Environment" (or "Branch" / "Pull request" depending on which workflow triggered the failure).
- Save. The subject is auto-derived from your inputs — it should match the error message verbatim.
- Re-run the workflow.
Common pitfalls: typos in org/repo, a wrong environment name (case
matters), and a missing entry for either environment:production or
ref:refs/heads/main (workflows that fire on both triggers need an
entry for each subject).
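To see which federated credentials a workflow needs, it can help to spell out the subjects explicitly. A minimal Python sketch, assuming GitHub's default subject format (repositories and orgs can customise the subject claim template, in which case these strings won't match); the helper names are hypothetical:

```python
def expected_subjects(org, repo, environment=None, branch=None, pull_request=False):
    """Build the default OIDC 'sub' claims GitHub mints for each trigger type.

    Every string returned needs its own federated credential on the App Reg.
    """
    prefix = f"repo:{org}/{repo}"
    subjects = []
    if environment:
        subjects.append(f"{prefix}:environment:{environment}")
    if branch:
        subjects.append(f"{prefix}:ref:refs/heads/{branch}")
    if pull_request:
        subjects.append(f"{prefix}:pull_request")
    return subjects


def missing_subjects(required, configured):
    """Required subjects with no exact, case-sensitive match in Entra."""
    configured_set = set(configured)
    return [subject for subject in required if subject not in configured_set]


required = expected_subjects("ORG", "REPO", environment="production", branch="main")
# A credential saved as "Production" does not match "production", so both
# subjects are reported missing here.
print(missing_subjects(required, ["repo:ORG/REPO:environment:Production"]))
```

Compare the output against the subject quoted in the AADSTS700213 message; it should match one of the generated strings character for character.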
Looks like (in contentops doctor --matrix):
[FAIL] workspace_reachable — GET alertRules returned 401 —
token rejected as unauthenticated. Check `az account show` tenant
context matches tenant.yml.
Why: the token was acquired successfully but ARM/LA rejected
it. Either (a) the identity is from a different Entra tenant than
the workspace, or (b) DefaultAzureCredential returned a stale
cached identity (VSCode account, SharedTokenCache) instead of the
one you authenticated with.
Fix:
- Check the tenant context:
az account show --query tenantId # compare against config/tenant.yml `tenantId:` field
- If they match but the error persists, force the dev-credential chain:
$env:AZURE_TOKEN_CREDENTIALS = "dev"
contentops doctor --matrix
- If it still fails, run az logout and then az login --tenant <id> explicitly.
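The tenant comparison in the first step can be scripted. A minimal Python sketch, assuming config/tenant.yml contains a line of the form tenantId: <guid> (a real YAML parser is more robust; the regex just keeps this dependency-free). Feed it the raw output of az account show --query tenantId, which prints the value as a quoted JSON string:

```python
import re

# Assumption: tenant.yml has a `tenantId: <guid>` line (indented and/or quoted).
_TENANT_RE = re.compile(
    r"""^[ \t]*tenantId:[ \t]*["']?([0-9a-fA-F-]{36})["']?[ \t]*(?:#.*)?$""",
    re.MULTILINE,
)


def tenant_from_yml(text):
    """First tenantId found in the YAML text, lower-cased, or None."""
    match = _TENANT_RE.search(text)
    return match.group(1).lower() if match else None


def tenants_match(az_output, yml_text):
    """Compare `az account show --query tenantId` output with tenant.yml.

    az prints the value as a JSON string, so surrounding quotes are stripped.
    """
    expected = tenant_from_yml(yml_text)
    actual = az_output.strip().strip('"').lower()
    return expected is not None and actual == expected
```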
Looks like:
[FAIL] workspace_reachable — GET alertRules returned 403 —
authenticated but lacks RBAC on this workspace.
Why: the identity is reaching the workspace correctly but doesn't have permission to read alert rules.
Fix: grant Microsoft Sentinel Contributor on the workspace's
resource group to whichever identity is active:
- Path A (az login as a user): grant the role to your user account.
- Path B (.env with App Reg secret): grant the role to the App Reg's service principal.
# Path B example
az role assignment create `
--role "Microsoft Sentinel Contributor" `
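--assignee "<client-id-or-service-principal-object-id>" `
--scope "/subscriptions/<sub>/resourceGroups/<rg>"
Use the application (client) ID or the object ID of the service principal (Enterprise applications blade), not the App Registration's own object ID, which identifies a different directory object.
Wait 1–2 minutes for RBAC to propagate, then re-run doctor.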
Looks like:
GET /security/rules/detectionRules returned 403
Why: the App Registration lacks the CustomDetection.Read.All
(or ReadWrite.All) Graph application permission, or admin consent
hasn't been granted.
Fix:
- Azure Portal → App Registrations → your app → API permissions.
- Add a permission → Microsoft Graph → Application
permissions → search "CustomDetection" → check
CustomDetection.ReadWrite.All.
- Click Grant admin consent for <your tenant>.
- Wait 1–2 minutes; re-run doctor.
For the navigator / auto-disabled-rules paths that also hit the
Log Analytics Query API, Microsoft Sentinel Reader (or
Contributor) on the workspace is enough — those use a different
token audience (api.loganalytics.io), not Graph.
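To confirm the permission actually landed in the token, you can inspect its roles claim: application permissions appear there in app-only tokens, while delegated (user) tokens carry scp instead. A minimal Python sketch that decodes the JWT payload without verifying it (diagnostic use only). It assumes you obtained an app-only Graph token, for example with az account get-access-token --resource https://graph.microsoft.com --query accessToken -o tsv after az login --service-principal. Treat the token like a secret.

```python
import base64
import json

ACCEPTED_ROLES = {"CustomDetection.Read.All", "CustomDetection.ReadWrite.All"}


def token_claims(access_token):
    """Decode a JWT's payload WITHOUT verifying it. Diagnostic use only."""
    payload = access_token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # JWTs strip base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def has_custom_detection_role(access_token):
    """True if the token's `roles` claim grants CustomDetection access.

    A user token from a plain `az login` won't list these application
    permissions, so expect False for it.
    """
    roles = set(token_claims(access_token).get("roles", []))
    return bool(roles & ACCEPTED_ROLES)
```

If roles is missing the permission even after consent, acquire a fresh token: existing tokens keep the claims they were issued with.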
Looks like:
error: Tenant has 2 Sentinel workspaces; specify --role or
--workspace. Available: [law-sentinel (prod), SIT-Workspace
(integration)]
Why: your config/tenant.yml lists multiple workspaces; the
command needs to know which one to target.
Fix: pass --role or --workspace:
contentops apply --role prod
contentops apply --workspace SIT-Workspace
For commands that should iterate over every workspace of a role (e.g.
deploy.yml), the workflow already passes --role prod;
single-workspace local runs are usually the only place you hit
this.
Looks like:
[FAIL] detections_dir — not a directory: detections
[FAIL] detections_parse — detections/ missing
Why: the doctor check is cwd-relative. You're running from
somewhere other than the repo root (commonly the config/
subdirectory).
Fix:
cd <repo-root> # e.g. cd C:\git\SIEMContent
contentops doctor --matrix
Both FAILs become PASSes.
Why: you ran contentops collect --path detections from
config/. The --path argument is cwd-relative, so collect
created config/detections/. The real corpus is at the repo root,
not under config/.
Fix:
cd <repo-root>
Remove-Item -Recurse -Force .\config\detections\
contentops collect --role prod # uses default --path detections
Run all CLI commands from the repo root. The only files inside
config/ should be tenant.yml, lifecycle.yml, kql_lint_allowlist.yml, etc.; there should be no detections/ directory there.
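Relative paths resolve against the current directory, which is exactly how --path detections run from config/ ends up as config/detections/. If you wrap contentops in your own scripts, a guard like the following can locate the repo root first. A minimal Python sketch; find_repo_root is a hypothetical helper, not part of contentops, and it assumes the repo root is the nearest ancestor containing a .git entry.

```python
from pathlib import Path


def find_repo_root(start=None):
    """Nearest directory at or above `start` (default: cwd) that contains .git.

    `.git` may be a directory (normal clone) or a file (worktree/submodule);
    `exists()` accepts both. Returns None outside a checkout.
    """
    here = Path(start if start is not None else Path.cwd()).resolve()
    for candidate in (here, *here.parents):
        if (candidate / ".git").exists():
            return candidate
    return None


root = find_repo_root()
if root is not None:
    # Anchor the corpus path to the root instead of the current directory.
    print("detections corpus:", root / "detections")
```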
Why: config/tenant.yml is gitignored on purpose; it carries
your tenant identifiers.
Fix: copy the example and edit:
Copy-Item config\tenant.yml.example config\tenant.yml
# Edit: subscriptionId, resourceGroup, workspaceName, tenantId
See docs/operations/tenant-config-modes.md
for the supported modes (single-workspace, multi-workspace,
Defender-only).
Looks like:
Lint summary: 152 files scanned, 152 with findings, 1176 finding(s) total.
META rules in strict mode (tenant.policy.scaffoldStrict=true).
608 finding(s) at-or-above severity 'error'.
Why: you set policy.scaffoldStrict: true (explicitly or
through an older example file). META002–005 — the authoring fields
description / attackDescription / references /
falsePositives — are CI-blocking in strict mode. The G24
authoring backlog (51 production rules without these fields)
hasn't been written yet.
Fix (default, lenient): since PR #241 the default is lenient. Set scaffoldStrict to false (or remove the policy block entirely):
# config/tenant.yml
policy:
  scaffoldStrict: false # or omit the policy block; both behave the same
Re-run: META002–005 become warnings, exit 0.
Fix (you actually want strict): keep scaffoldStrict: true and
fill in the missing fields. Each envelope needs all four authoring
fields populated (description, attackDescription, references[],
falsePositives[]); see
docs/reference/envelope-schema.md
for the schema.
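The difference between the two modes comes down to how META002–005 are graded before the "at-or-above severity 'error'" threshold is applied. The following is a hypothetical Python model of that policy, written only to illustrate the behaviour described above; it is not contentops' lint code, and the rule ID KQL001 is made up.

```python
# Hypothetical model of the lint policy described above; not contentops' code.
META_AUTHORING_RULES = {"META002", "META003", "META004", "META005"}
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2}


def effective_severity(rule_id, base_severity, scaffold_strict):
    """META002-005 are errors only in strict mode; other rules keep their own."""
    if rule_id in META_AUTHORING_RULES:
        return "error" if scaffold_strict else "warning"
    return base_severity


def lint_exit_code(findings, scaffold_strict, fail_at="error"):
    """1 if any finding is at or above `fail_at`, else 0."""
    threshold = SEVERITY_RANK[fail_at]
    for rule_id, base_severity in findings:
        severity = effective_severity(rule_id, base_severity, scaffold_strict)
        if SEVERITY_RANK[severity] >= threshold:
            return 1
    return 0


findings = [("META002", "warning"), ("META005", "warning"), ("KQL001", "warning")]
print(lint_exit_code(findings, scaffold_strict=True))   # strict: blocks CI
print(lint_exit_code(findings, scaffold_strict=False))  # lenient: passes
```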
Looks like (in CI):
detections/sentinel_analytic/my-rule.yml:
before: 0.1.0
after: 0.1.0
diff: present
error: file changed without a version bump
Why: the scripts/check_version_bump.py CI gate refuses any
PR that changes an envelope's content without bumping the
version: field — silent overwrites would defeat the audit trail.
Fix: bump the version in your envelope:
# detections/sentinel_analytic/my-rule.yml
id: my-rule
version: 0.1.1 # was 0.1.0
Whitespace-only diffs still trip the check. That's intentional: if the diff doesn't matter, revert the cosmetic edit; otherwise bump.
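The rule the gate enforces can be sketched in a few lines. This is a minimal Python illustration assuming strict MAJOR.MINOR.PATCH versions with no pre-release tags; scripts/check_version_bump.py itself may be implemented differently.

```python
def parse_version(text):
    """'MAJOR.MINOR.PATCH' -> (major, minor, patch); pre-release tags unsupported."""
    major, minor, patch = (int(part) for part in text.strip().split("."))
    return (major, minor, patch)


def version_bump_error(before_text, after_text, before_version, after_version):
    """Error message if the envelope changed without a higher version, else None."""
    if before_text == after_text:
        return None  # byte-identical: nothing to bump
    if parse_version(after_version) <= parse_version(before_version):
        return "file changed without a version bump"
    return None
```

Comparing integer tuples rather than strings matters: as strings, "0.1.10" sorts before "0.1.9", but as tuples (0, 1, 10) is correctly greater. The byte-for-byte comparison is also why whitespace-only edits count as changes.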
Looks like (in CI):
catalog drift: committed docs/reference/generated-catalog.md
disagrees with the regenerated output.
Why: you added a new CLI command, lint rule, workflow, or
handler without re-running contentops catalog regenerate. The
generated catalog is drift-gated to prevent stale references in
docs.
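A drift gate boils down to: regenerate, compare with what's committed, fail on any difference. A minimal Python sketch of the comparison step, using difflib to show what changed; the real gate's implementation and messages may differ.

```python
import difflib


def drift_report(committed, regenerated, path):
    """Unified diff if the committed file is stale, else an empty string."""
    if committed == regenerated:
        return ""
    diff = difflib.unified_diff(
        committed.splitlines(keepends=True),
        regenerated.splitlines(keepends=True),
        fromfile=f"{path} (committed)",
        tofile=f"{path} (regenerated)",
    )
    return "".join(diff)
```

A non-empty report means the fix is always the same: run the regenerate command and commit its output rather than hand-editing the generated file.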
Fix:
contentops catalog regenerate
git add docs/reference/generated-catalog.md
git commit --amend --no-edit --signoff # or make a new commit instead
git push --force-with-lease # only needed after --amend; a new commit needs a plain git push
Looks like:
docs/detections/sentinel_analytic/my-rule.md is out of sync with
the envelope.
Why: you changed an envelope without re-running
contentops detection-docs regenerate. Same drift-gate pattern as
the catalog above.
Fix:
contentops detection-docs regenerate
git add docs/detections/
git commit -m "chore(docs): regenerate detection docs" --signoff
git push
Looks like (in CI pytest output):
FAILED tests/e2e/test_capability_drift_guard.py::test_registry_matches_click_tree
New CLI commands found without an e2e capability entry:
my-new-command
Why: you added a Click command but didn't register it in the
e2e capability matrix at tests/e2e/_capabilities.py. The matrix
exists to keep new commands from silently bypassing the live
integration test suite.
Fix: add the command to INTENTIONALLY_UNCOVERED (with a
justification referencing where it's tested instead) OR add a
Capability(...) entry to CAPABILITIES and append the path to
COVERED_LEAVES. See existing entries in the file for the shape.
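Conceptually, the guard is a set difference between the leaf commands in the CLI tree and the two registries. A hypothetical Python sketch: the command tree is modelled as a plain nested dict (a real guard would walk click.Group.commands), and only the registry names come from tests/e2e/_capabilities.py.

```python
def leaf_paths(tree, prefix=()):
    """Yield space-joined paths of leaf commands in a nested command tree.

    A group maps to a sub-dict; a command maps to None.
    """
    for name, sub in sorted(tree.items()):
        path = prefix + (name,)
        if sub is None:
            yield " ".join(path)
        else:
            yield from leaf_paths(sub, path)


def unregistered(click_leaves, covered_leaves, intentionally_uncovered):
    """Commands in the CLI tree that appear in neither registry."""
    registered = set(covered_leaves) | set(intentionally_uncovered)
    return sorted(set(click_leaves) - registered)


tree = {"apply": None, "catalog": {"regenerate": None}, "my-new-command": None}
print(unregistered(leaf_paths(tree), ["apply", "catalog regenerate"], {}))
# ['my-new-command']
```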
Looks like:
ERROR contentops.handlers.sentinel_analytic: Failed to deploy
sentinel rule aa-azure-vm-...: 400 {"error":{"code":"BadRequest",
"message":"Failed to run the analytics rule query. One of the
tables does not exist."}}
Why: the rule's KQL references a table that's available in one workspace (e.g. prod) but not in another (e.g. integration). Common when a data connector is enabled in prod but not in integration.
Fix (three options):
- Enable the connector on the workspace that's missing it (e.g. AzureActivity, SecurityEvent). Permanent fix.
- Mark the rule status: experimental locally so integration skips it (env-status gate). Prod keeps deploying it.
- Leave it. The rule's existing tenant-side state isn't
touched by the failed update; drift will report it as
in-sync against the OLD payload. Acceptable for known-bad single failures; track it in your runbook.
Looks like:
my-detection defender_custom_detection update success MISMATCH
Why: apply succeeded (200 OK on PUT) but the round-trip hash check failed. The local envelope and the immediate post-PUT GET produce different content hashes — usually because Defender's server adds or normalises a field (timestamps, computed IDs).
Fix (diagnose first):
contentops defender-roundtrip-diff <envelope_id>
This shows you exactly which fields differ. If the difference is a
server-managed field that should be stripped, file an issue
against contentops/handlers/defender_custom_detection.py's
_SERVER_FIELDS set. The --raw flag skips the strip step so you
can see new server-managed fields the codebase doesn't know about
yet.
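Why a server-added field causes MISMATCH is easiest to see with a content hash. A minimal Python sketch, assuming the hash is SHA-256 over canonical JSON with server-managed top-level fields removed. The field names in SERVER_FIELDS are illustrative guesses; the real _SERVER_FIELDS set and hashing scheme in contentops may differ.

```python
import hashlib
import json

# Illustrative only: the real _SERVER_FIELDS set may contain other names.
SERVER_FIELDS = {"id", "createdDateTime", "lastModifiedDateTime"}


def content_hash(payload, strip=SERVER_FIELDS):
    """SHA-256 over canonical JSON, ignoring top-level server-managed fields."""
    kept = {k: v for k, v in payload.items() if k not in strip}
    canonical = json.dumps(kept, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def differing_fields(local, remote):
    """Top-level keys whose values differ (or exist on only one side)."""
    keys = set(local) | set(remote)
    return sorted(k for k in keys if local.get(k) != remote.get(k))
```

With an empty strip set (the equivalent of --raw), every server-added field shows up in differing_fields; any name there that isn't already stripped is a candidate for the issue.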
Looks like:
[summary table — every row PASS except one]
1 error(s).
Why: the apply summary at the end is per-row; the actual error
text was printed inline during the apply loop. Scroll up in the
output to find the ERROR contentops.handlers... line for the
failed row.
Fix: if you missed it, the audit log has it:
contentops audit query failures --since 1h
Why: prune only flags rules in the tenant that don't
appear in your local YAML. If your local detections/ already
matches the tenant (e.g. you just ran collect), there's nothing to
prune.
If you wanted to clear the tenant of rules NOT in detections/,
that's what prune does. If you wanted the opposite — clear local
YAMLs that aren't in the tenant — that's contentops clean or
manual rm.
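The two directions are just opposite set differences over rule IDs. A minimal Python sketch with hypothetical helper names:

```python
def prune_candidates(tenant_ids, local_ids):
    """Rules in the tenant with no local YAML: what prune flags."""
    return sorted(set(tenant_ids) - set(local_ids))


def local_only(tenant_ids, local_ids):
    """The opposite direction: local YAML with no tenant counterpart."""
    return sorted(set(local_ids) - set(tenant_ids))
```

Right after a collect, the local set mirrors the tenant, so prune_candidates comes back empty.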
Why: collect returns "in-sync" when the local envelope's
hash matches the server's. If a portal user just edited a rule and
your local copy was already correct (matching the post-edit
state), no change shows.
Fix to force a refresh:
contentops clean --asset sentinel_analytic --yes
contentops collect --role prod --asset sentinel_analytic --full
--clear does both in one step:
contentops collect --role prod --clear
Looks like:
./detections/sentinel_analytic/my-rule.yml:42: tehcnique ==> technique
Why: typo in author-controlled prose. (If it's a real domain term, see below.)
Fix (real typo): fix the spelling in the YAML and recommit.
Fix (false positive — legitimate domain term): add it to
ignore-words-list in .codespellrc:
ignore-words-list = iif,te,ans,fpr,...,your-new-term-here
Looks like:
broken: 2 of 47 URL(s)
https://blog.example.com/old-post
HTTP 404
in detections/sentinel_analytic/my-rule.yml
Why: a URL in metadata.references[] or metadata.runbookUrl
returned 4xx/5xx. The PR-time check (validate.yml) flags only
URLs newly added in the diff; the weekly full scan
(references-check.yml) catches URL rot over time.
Fix: update or remove the broken URL. If it's a transient CDN
issue (e.g. login.microsoftonline.com sometimes redirects oddly),
add the substring to the workflow's --allow list.
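To reproduce a single finding locally before editing, a check along these lines can help. A minimal Python sketch using only the standard library; the ALLOW entry is an example that mirrors the idea of the workflow's --allow list, and the exact behaviour of the project's checker may differ.

```python
import urllib.error
import urllib.request

# Substrings that are never reported; the entry below is only an example.
ALLOW = ("login.microsoftonline.com",)


def is_broken(url, status, allow=ALLOW):
    """True for a 4xx/5xx status on a URL that isn't allow-listed."""
    if any(fragment in url for fragment in allow):
        return False
    return 400 <= status <= 599


def fetch_status(url, timeout=10):
    """Final HTTP status after redirects (needs network access).

    Some servers reject HEAD with 405; this function reports that as-is.
    Connection and DNS failures raise urllib.error.URLError.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

For example, is_broken(url, fetch_status(url)) for https://blog.example.com/old-post would return True while the page keeps answering 404.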
If you opened a PR from a fork (not a branch in KustoKing/SIEMContent
itself), some CI checks will skip or render degraded output. This is
intentional — GitHub doesn't mint OIDC tokens for fork PRs, so
we can't trust them with tenant credentials.
| Check | Behaviour on fork PR | Why |
|---|---|---|
| drift-pr (informational drift comment) | Skipped | OIDC unavailable; cannot query the tenant. |
| tuning-impact-preview | Posts a comment with "-" for counts | OIDC unavailable; --no-workspace-query mode used. |
| Pre-PR schema refresh in validate.yml | Falls through to committed baseline | continue-on-error: true; lint still runs against the existing schema. |
| plan --against-tenant overlay | Not exercised in CI on fork PRs | Same reason (OIDC unavailable). |
What still works on fork PRs:
- All YAML / Python / metadata lint
- Pytest, SAST (bandit + semgrep), DCO, SPDX
- Spelling check, references URL check (outbound HTTP only)
- Structural validate.yml plan (no API calls)
If you need full signal, push your branch into the base repo (any maintainer can help with this) and reopen the PR there.
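Whether a run counts as a fork PR is visible in the webhook payload that GitHub Actions writes to the file named by GITHUB_EVENT_PATH. A minimal Python sketch for checking it; is_fork_pr is a hypothetical helper, and it treats a PR as a fork PR when the head and base repositories differ (a deleted fork, where head.repo is null, also counts).

```python
import json
import os


def is_fork_pr(event):
    """True when a pull_request event comes from a different repository."""
    pr = event.get("pull_request")
    if not pr:
        return False  # push, schedule, etc.
    head_repo = (pr.get("head") or {}).get("repo") or {}
    base_repo = (pr.get("base") or {}).get("repo") or {}
    return head_repo.get("full_name") != base_repo.get("full_name")


if __name__ == "__main__" and "GITHUB_EVENT_PATH" in os.environ:
    with open(os.environ["GITHUB_EVENT_PATH"], encoding="utf-8") as fh:
        print("fork PR" if is_fork_pr(json.load(fh)) else "same-repo PR or non-PR event")
```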
Why: SentinelHealth is populated by a diagnostic setting on the
workspace that has been opt-in since approximately 2022. If it's not
turned on, the table simply has no rows, even though your alert
rules are running normally.
Verify the diagnostic:
contentops doctor --matrix
# Look for [WARN] sentinel_health — SentinelHealth returned 0 rows...
Fix: enable the diagnostic per https://learn.microsoft.com/en-us/azure/sentinel/health-audit. Azure Portal → Sentinel → Settings → Diagnostic settings → enable "SentinelHealth". Wait 15–30 minutes for the first rows to appear.
Once enabled, contentops auto-disabled-rules will surface real
data instead of empty results.
Looks like:
env-status filter (gate=integration): 46 asset(s) skipped (allowed:
['deprecated', 'production', 'test'])
- detection-of-attempts-to-disable-microsoft-defender (status=production [defender:prod-only])
...
Why: Defender XDR is tenant-wide — there's no
"integration" instance of it like there is for Sentinel. The apply
gate marks every defender_* envelope as defender:prod-only so
they only deploy on --role prod runs. By design, not a bug.
Fix: none needed. To deploy a Defender custom detection, use
contentops apply --role prod (or merge to main and let
deploy.yml run).
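The two filters at work here can be modelled roughly as follows. This is a hypothetical Python sketch to explain the log line, not the apply gate's code: the integration allow-list is copied from the log output above, while the prod allow-list and the defender_ prefix match are assumptions for illustration.

```python
# Integration list copied from the log line above; the prod list is an
# assumption (this page says prod still deploys status: experimental).
ALLOWED = {
    "integration": {"deprecated", "production", "test"},
    "prod": {"deprecated", "production", "test", "experimental"},
}


def gate_decision(asset_type, status, gate):
    """Return 'deploy' or a skip reason for one envelope on one gate."""
    if asset_type.startswith("defender_") and gate != "prod":
        # Defender XDR is tenant-wide: there is no integration instance.
        return "skip: defender:prod-only"
    if status not in ALLOWED[gate]:
        return f"skip: status={status}"
    return "deploy"
```

Note that the Defender check runs first, which is why the log reports status=production (an allowed status) alongside the defender:prod-only skip.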
Why: Sentinel rules can take 30–60 seconds to surface in the
portal after a successful PUT (caching layer). The audit record
will say success immediately; the portal UI lags.
Fix: wait, then refresh the portal. If they still don't appear after 5 minutes:
contentops drift --role integration
# Look for 'new' entries — that's what apply created.
# If drift shows 0 new + 0 changed but the portal is empty,
# you've hit a tenant-side caching issue. Try the Azure Sentinel
# REST API directly:
az rest --method GET --url "https://management.azure.com/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.OperationalInsights/workspaces/<ws>/providers/Microsoft.SecurityInsights/alertRules?api-version=2025-07-01-preview"
- Audit chain integrity: see docs/operations/audit-recovery.md
- Multi-workspace setup: see docs/operations/multi-workspace.md
- Authentication paths (A vs B): see docs/operations/authentication-setup.md
- The when-things-break decision tree (deep ops scenarios): see docs/OPERATOR_GUIDE.md
If your error isn't above, open an issue with: command run, full
output (redact secrets!), git rev-parse HEAD, and contentops doctor --matrix output.