
Day-2 operations framework: TTL reaper, monitoring skill, and runbook generation #23

@arnaudlh

Description


The manifesto describes AI in operations: "incident detection, root cause analysis, and remediation suggestions." Git-ape currently stops at post-deployment health checks. This issue establishes Day-2 operational foundations.

Related: #18 (drift detection — another Day-2 capability)

Scope

  1. TTL Reaper workflow: git-ape-ttl-reaper.yml (or a gh-aw agentic workflow) that checks each deployment's TTL (set in metadata.json) and auto-destroys expired resources after sending a notification.
  2. Monitoring setup: during deployment, auto-configure Azure Monitor alerts for key metrics (availability, errors, latency).
  3. Post-deploy monitoring skill: /azure-monitor-checker, which queries Azure Monitor for resource health status and enables @git-ape status <deployment-id>.
  4. Runbook generation: auto-generate operational runbooks from the deployment architecture (what to check, how to restart, escalation paths).
  5. Azure SRE Agent compatibility: ensure deployment artifacts (architecture diagrams, runbooks) are consumable by Azure SRE Agent.
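
To make the TTL reaper in item 1 concrete, here is a minimal Python sketch of the expiry check and the notify-then-destroy ordering. The issue does not define the metadata.json schema, so the field names createdAt, ttlHours, and notified are assumptions made for illustration, as are the function names. The sketch only decides what to do; the actual notification and resource teardown are left out.

```python
from datetime import datetime, timedelta, timezone


def is_expired(metadata: dict, now: datetime) -> bool:
    """Return True when the deployment's TTL has elapsed.

    Assumed (hypothetical) metadata.json fields:
      createdAt: ISO 8601 timestamp with a UTC offset
      ttlHours:  lifetime in hours; when absent, the deployment never expires
    """
    ttl_hours = metadata.get("ttlHours")
    if ttl_hours is None:
        return False
    created = datetime.fromisoformat(metadata["createdAt"])
    return now >= created + timedelta(hours=ttl_hours)


def plan_reaper_actions(deployments: dict, now: datetime) -> list:
    """Return (deployment_id, action) pairs for expired deployments.

    An expired deployment is first scheduled for "notify"; only one that
    already carries the hypothetical "notified" flag is scheduled for "destroy".
    """
    actions = []
    for deployment_id, metadata in sorted(deployments.items()):
        if not is_expired(metadata, now):
            continue
        action = "destroy" if metadata.get("notified") else "notify"
        actions.append((deployment_id, action))
    return actions


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = {
        "demo-001": {"createdAt": "2025-01-01T00:00:00+00:00", "ttlHours": 24},
    }
    for deployment_id, action in plan_reaper_actions(sample, now):
        print(f"{deployment_id}: {action}")
```

Splitting the decision (which deployments to notify or destroy) from the side effects keeps the expiry logic testable without Azure credentials, whichever workflow engine ends up running it.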

Acceptance Criteria

  • TTL reaper workflow auto-destroys expired deployments.
  • Deployments include Azure Monitor alert configurations.
  • @git-ape status <deployment-id> shows resource health.
  • Operational runbooks are generated after deployment.
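
As one possible shape for the runbook criterion, the Python sketch below renders a plain-text runbook from a list of deployed resources, covering the three elements named in the scope (what to check, how to restart, escalation paths). The resource list format, the GUIDANCE table, and the function name are all hypothetical; the issue does not specify how the deployment architecture is represented.

```python
# Hypothetical per-resource-type guidance; a real table would be derived
# from the deployment architecture rather than hard-coded.
GUIDANCE = {
    "Microsoft.Web/sites": {
        "check": "HTTP availability, error rate, and latency alerts",
        "restart": "az webapp restart --name <name> --resource-group <rg>",
    },
    "Microsoft.Storage/storageAccounts": {
        "check": "availability and error alerts",
        "restart": "not applicable (managed service)",
    },
}

DEFAULT_GUIDANCE = {
    "check": "resource health status in Azure Monitor",
    "restart": "no restart procedure recorded",
}


def generate_runbook(deployment_id: str, resources: list, escalation: str) -> str:
    """Render a plain-text runbook for one deployment.

    resources is assumed to be a list of dicts with "name" and "type" keys.
    """
    lines = [f"Runbook for deployment {deployment_id}", ""]
    for resource in resources:
        guidance = GUIDANCE.get(resource["type"], DEFAULT_GUIDANCE)
        lines.append(f"{resource['name']} ({resource['type']})")
        lines.append(f"  What to check: {guidance['check']}")
        lines.append(f"  How to restart: {guidance['restart']}")
        lines.append("")
    lines.append(f"Escalation: {escalation}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        generate_runbook(
            "demo-001",
            [{"name": "my-app", "type": "Microsoft.Web/sites"}],
            escalation="on-call channel (placeholder)",
        )
    )
```

Unknown resource types fall back to a generic entry, so the runbook is always produced even when the guidance table is incomplete.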

Metadata

Labels: enhancement
