Summary
The current destroy flow in `.github/workflows/git-ape-destroy.exampleyml` primarily deletes a single resource group (`az group delete`) plus a narrow sweep for subscription-scope `Microsoft.Authorization/*` and `Microsoft.Authorization/policyAssignments` resources discovered via `az deployment operation sub list`.
This works for the single-RG Key Vault template we ship today, but it is not idempotent once a deployment spans more than one resource group, creates subscription/MG-scope resources via nested deployments, or creates soft-deletable services (Key Vault, APIM, Log Analytics, App Configuration, Cognitive Services, Recovery Services, ML workspace, …).
Observed concretely after running `@git-ape destroy deployment deploy-20260423-092136` (single-RG Key Vault with purge protection): the RG is gone but the Key Vault remains soft-deleted at subscription scope for 90 days and cannot be purged (purge protection enabled). Re-running the exact same template will fail with `VaultAlreadyExists` until retention expires — destroy + redeploy is not idempotent.
Orphan categories a "delete the RG" strategy can leave behind
| # | Category | Example |
|---|----------|---------|
| 1 | Soft-deleted data services | Key Vault, APIM, Cognitive Services, App Configuration, Log Analytics workspace, Recovery Services vault, ML workspace |
| 2 | Purge-protected resources | Key Vault with `enablePurgeProtection: true` |
| 3 | Multiple resource groups | Template creates `rg-app` + `rg-data` — only one is tracked in `state.resourceGroup` |
| 4 | Subscription-scope role assignments created via nested deployments | Not always enumerable through `az deployment operation sub list` |
| 5 | Subscription-scope policy assignments / definitions / exemptions | Same as above |
| 6 | Management-group-scope resources | Custom policies, role assignments at MG scope |
| 7 | Cross-RG resources from nested deployments | VNet peering in a hub RG, DNS record in a shared DNS RG, secret in a shared KV |
| 8 | Cross-subscription nested deployments | Destroy runs against one subscription only |
| 9 | Tenant / Entra ID objects | App registrations, directory groups |
| 10 | Backup protected items / recovery points in cross-RG Recovery Services vaults | Survive source-RG delete |
| 11 | Subscription-level diagnostic settings | `microsoft.insights/diagnosticSettings` at sub scope |
| 12 | Subscription budgets & cost alerts | `Microsoft.Consumption/budgets` |
| 13 | Resource locks | Don't orphan, but block delete and leave partial state |
| 14 | Remote-side references | Approved Private Endpoint connections on a shared service, remote VNet peerings, DNS records in shared zones |
| 15 | Subscription deployment-history entries | Accumulate toward the 800-deployments-per-scope limit |
Proposed approach — two layers
Layer A — Azure Deployment Stacks (primary, for new deployments)
Deployment Stacks natively track every resource in a deployment regardless of scope.
- Replace `az deployment sub create` with `az stack sub create --action-on-unmanage deleteAll --deny-settings-mode denyDelete` in `git-ape-deploy.exampleyml`.
- Stack name = deployment id; store it in `state.stackId`.
- Destroy becomes a single `az stack sub delete --action-on-unmanage deleteAll`, covering multi-RG, sub-scope, and MG-scope uniformly.
- Remaining gaps to handle explicitly: soft-delete purge (1, 2) and remote-side references (14) — stacks don't handle either.
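The Layer A command pair can be sketched as argv lists (nothing is executed here; the template path and location arguments are illustrative placeholders, while the flags are the ones proposed above):

```python
# Sketch: the two CLI calls Layer A reduces deploy/destroy to.
# Template file and location are illustrative placeholders.
def stack_create_cmd(deployment_id: str, template: str, location: str) -> list[str]:
    return [
        "az", "stack", "sub", "create",
        "--name", deployment_id,               # stack name = deployment id
        "--location", location,
        "--template-file", template,
        "--action-on-unmanage", "deleteAll",   # stack delete removes every managed resource
        "--deny-settings-mode", "denyDelete",  # block out-of-band deletes while deployed
    ]

def stack_delete_cmd(deployment_id: str) -> list[str]:
    return ["az", "stack", "sub", "delete",
            "--name", deployment_id,
            "--action-on-unmanage", "deleteAll"]
```

Because the stack name equals the deployment id, the destroy workflow needs nothing from state except `stackId` to reconstruct the delete call.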
Layer B — State-driven fallback (retrofits existing + legacy deployments)
For pre-stack deployments and cases where stacks can't be used:
- Capture-at-deploy: walk the deployment-operation graph recursively (root + every nested op) and emit a flat list of every `targetResource.id` into `state.managedResources[]` with `{id, type, scope, apiVersion, softDeletable, purgeProtected}`. Also populate `state.resourceGroups[]`, `state.subscriptions[]`, `state.externalReferences[]`, `state.stackId` (nullable).
- Destroy algorithm (idempotent):
  1. If `stackId` is present → `az stack sub delete`; skip to step 7.
  2. Topologically sort `managedResources[]` (locks → role/policy assignments → children → parents → RGs).
  3. For each resource: `az resource show` → if 404, mark `already-gone`; else delete; retry transient errors.
  4. For each RG in `resourceGroups[]`: `az group delete --yes`.
  5. For each `softDeletable[]` entry: list soft-deleted → purge if `purgeProtected=false`, else record `retained-soft-deleted` with the expiry date.
  6. Probe `externalReferences[]` for remote-side leftovers (stale PE connections, peerings, DNS records).
  7. Delete the subscription deployment-history entry for the deployment.
  8. Write terminal status with per-resource outcomes. Re-runs converge to the same end state.
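The capture-at-deploy walk might look like the sketch below. The operation shape (`targetResource` with `id` and `resourceType`, nested `Microsoft.Resources/deployments`) mirrors what `az deployment operation ... list` returns, but `list_operations` is a hypothetical callback, scope handling is simplified (nested deployments can target other scopes), and soft-delete detection is reduced to a hard-coded type set:

```python
# Sketch: recursively flatten deployment operations into managedResources[].
# list_operations(scope, name) is a hypothetical callback wrapping
# `az deployment operation ... list`; only the fields used below are assumed.
SOFT_DELETABLE = {"Microsoft.KeyVault/vaults", "Microsoft.ApiManagement/service"}  # partial list

def collect_managed(list_operations, scope: str, deployment: str) -> list[dict]:
    managed = []
    for op in list_operations(scope, deployment):
        target = op.get("targetResource") or {}
        rid, rtype = target.get("id"), target.get("resourceType")
        if not rid:
            continue
        if rtype == "Microsoft.Resources/deployments":
            # Nested deployment: recurse instead of recording the deployment itself.
            managed += collect_managed(list_operations, scope, rid.rsplit("/", 1)[-1])
        else:
            managed.append({"id": rid, "type": rtype,
                            "softDeletable": rtype in SOFT_DELETABLE})
    return managed
```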
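The core of the idempotent destroy loop (steps 2, 3, and 5) can be sketched as follows. This is an illustration only: `probe` and `delete_fn` stand in for `az resource show` / `az resource delete`, the ordering weights are an assumed approximation of the topological sort, and real code would add retry/backoff:

```python
# Sketch of the idempotent fallback loop (steps 2, 3, and 5 above).
# probe/delete_fn stand in for `az resource show` / `az resource delete`;
# the ordering weights are an assumed approximation of the topological sort.
DELETE_ORDER = {
    "Microsoft.Authorization/locks": 0,            # locks first, or deletes fail
    "Microsoft.Authorization/roleAssignments": 1,
    "Microsoft.Authorization/policyAssignments": 1,
}

def delete_priority(res: dict) -> tuple:
    # Children before parents: deeper resource ids have more '/' segments.
    return (DELETE_ORDER.get(res["type"], 2), -res["id"].count("/"))

def destroy(resources: list[dict], probe, delete_fn) -> dict:
    outcomes = {}
    for res in sorted(resources, key=delete_priority):
        if not probe(res["id"]):       # 404 → already gone, still a success
            outcomes[res["id"]] = "already-gone"
        else:
            delete_fn(res["id"])       # real code would retry transient errors
            outcomes[res["id"]] = "deleted"
    return outcomes

def soft_delete_action(entry: dict) -> str:
    # Step 5: purge only when purge protection is off.
    return "purge" if not entry.get("purgeProtected") else "retained-soft-deleted"
```

Recording a 404 as `already-gone` instead of an error is what makes re-runs converge: a second destroy over the same state reaches the same terminal outcome set.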
Proposed schema changes
Extend `state.json`:
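A sketch of the extended `state.json`, using the fields populated by the capture step above (the example ids and values are placeholders):

```json
{
  "stackId": "string | null",
  "managedResources": [
    {
      "id": "/subscriptions/.../Microsoft.KeyVault/vaults/foo",
      "type": "Microsoft.KeyVault/vaults",
      "scope": "resourceGroup",
      "apiVersion": "2024-11-01",
      "softDeletable": true,
      "purgeProtected": true
    }
  ],
  "resourceGroups": ["rg-app", "rg-data"],
  "subscriptions": ["<subId>"],
  "externalReferences": [
    { "kind": "privateEndpointConnection", "targetResourceId": "..." }
  ]
}
```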
Extend `metadata.json`: `resourceGroup` (string) → `resourceGroups` (array). Add `scope` to allow `subscription` | `managementGroup`.
Add new status values to `docs/DEPLOYMENT_STATE.md`: `retained-soft-deleted`, `partially-destroyed`.
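The `metadata.json` migration (string → array) is mechanical; a minimal sketch, assuming the old and new field names proposed above and a default `scope` of `subscription`:

```python
# Sketch: migrate metadata.json from resourceGroup (string) to resourceGroups (array).
# Idempotent: re-running on already-migrated metadata is a no-op.
def migrate_metadata(meta: dict) -> dict:
    meta = dict(meta)
    if "resourceGroups" not in meta:
        old = meta.pop("resourceGroup", None)
        meta["resourceGroups"] = [old] if old else []
    meta.setdefault("scope", "subscription")  # new field; default is an assumption
    return meta
```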
Implementation phases
- Phase 1 — Schema & state capture: extend `state.json` / `metadata.json`, update `DEPLOYMENT_STATE.md`, update `azure-template-generator.agent.md` / deploy agent to walk deployment operations after deploy and populate `managedResources[]`.
- Phase 2 — Deployment Stacks integration: add a `deployMethod` toggle (default `stack`) in requirements gathering; stack create in `git-ape-deploy.exampleyml`; stack-delete branch in `git-ape-destroy.exampleyml`.
- Phase 3 — Fallback hardening: extract destroy logic into `.github/scripts/destroy.sh` (or `.ps1`) implementing the idempotent algorithm above; add the soft-delete purge loop + remote-reference probe.
- Phase 4 — Validation: fixture deployment with 2 RGs + purge-protected KV + sub-scope role assignment + cross-RG reference; destroy → re-run destroy (must be `already-destroyed`); stack-vs-fallback parity; soft-delete replay (redeploy succeeds once retention allows).
Out of scope
- Entra ID / app-registration cleanup (requires Graph permissions; separate issue).
- Data-plane cleanup (KV secrets, blob contents — gone with control plane).
- Management-group-scope deployments (noted but deferred).
Open questions for discussion
- Stacks opt-in or default? Recommend `stack` as the default for new deployments, keeping sub-deployment as an explicit fallback. Stacks are GA.
- Auto-purge non-protected soft-deleted resources? Recommend yes on destroy (never purge protected ones); surface both in the summary. Alternative: require an explicit `--purge-soft-deleted` flag.
- Clean up deployment-history entries after destroy? Recommend yes (to stay well below the 800/scope cap).
- Scope of this work: single issue or should each phase be split into its own issue once we align on direction?
Reproduction
- Deploy the included Key Vault + private endpoint template (`.azure/deployments/deploy-20260423-092136`).
- Run `@git-ape destroy deployment deploy-20260423-092136`.
- Observe: RG is deleted; Key Vault remains soft-deleted at subscription scope; purge protection prevents purge; redeploying with the same name fails until retention expires.
Happy to open a draft PR for Phase 1 (schema + capture) as the foundation once we align on the two-layer direction.