Operational runbooks for SQL Server database administration: step-by-step procedures for incident response, disaster recovery, backup and recovery, maintenance, and performance troubleshooting, plus standardized templates.
- database-incident-response.md - General incident response framework with severity classification
- RUNBOOK-DatabaseOffline.md - Database in SUSPECT, RECOVERY_PENDING, or OFFLINE state
- RUNBOOK-DiskSpaceCritical.md - Disk space triage: identify consumers, emergency recovery, root cause
- RUNBOOK-HighCPU.md - CPU diagnosis: query identification, plan analysis, immediate mitigation
- RUNBOOK-BlockingChain.md - Head blocker identification, kill decision criteria, root cause analysis
- RUNBOOK-CorruptionDetected.md - DBCC CHECKDB errors: severity assessment, page restore, emergency repair
- RUNBOOK-FullDRFailover.md - Complete DR failover: database activation, DNS updates, app validation, failback
- RUNBOOK-DRTest.md - Quarterly DR test procedure with RTO/RPO measurement and sign-off
- DR-ContactSheet.md - Emergency contacts, escalation matrix, network info template
- backup-recovery-procedure.md - Full, differential, and log backup procedures with point-in-time and page-level restore
- patching-checklist.md - Pre-patch, patching, and post-patch verification for SQL Server cumulative updates
- performance-troubleshooting.md - Systematic bottleneck diagnosis: wait stats, I/O, CPU, memory, locking
- TSHOOT-SlowApplication.md - Decision tree: SQL vs application vs network performance issues
- TSHOOT-ConnectivityIssues.md - SQL Server connectivity diagnosis: network, auth, firewall, SSL/TLS
- TSHOOT-JobFailures.md - Common SQL Agent job failure patterns and resolution
- TEMPLATE-Runbook.md - Standard runbook format with severity, procedure, escalation, rollback
- TEMPLATE-ChangeRequest.md - Database change request with approvals and rollback plan
- TEMPLATE-PostIncidentReview.md - Post-mortem template with timeline, root cause, and action items
Each runbook follows a consistent format:
- Summary - What the procedure covers
- Severity - Classification and response time
- Prerequisites - Required access and tools
- Procedure - Step-by-step instructions with embedded SQL
- Escalation - When to escalate and to whom
- Rollback - Recovery steps if something goes wrong
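
The format above lends itself to a quick automated check before a new runbook is merged. The following is a minimal sketch, not part of the repository: it assumes each section is written as a Markdown heading (`#` through `######`) whose text is exactly the section name, which may not match how the runbooks here are laid out. The function name `missing_sections` and the sample text are hypothetical.

```python
import re

# Section names come from the format list above.
# ASSUMPTION: each section appears as a Markdown ATX heading, e.g. "## Summary".
REQUIRED_SECTIONS = [
    "Summary",
    "Severity",
    "Prerequisites",
    "Procedure",
    "Escalation",
    "Rollback",
]

# Matches one heading line and captures its text without trailing spaces.
HEADING_RE = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def missing_sections(markdown_text):
    """Return the required section names that have no matching heading.

    Comparison is case-insensitive; results keep the order of the format list.
    """
    found = {m.group(1).lower() for m in HEADING_RE.finditer(markdown_text)}
    return [name for name in REQUIRED_SECTIONS if name.lower() not in found]


# Hypothetical runbook fragment with three sections left out.
sample = """# RUNBOOK-Example
## Summary
Short description of the procedure.
## Severity
## Procedure
"""

print(missing_sections(sample))  # ['Prerequisites', 'Escalation', 'Rollback']
```

To check a real file, read it with `pathlib.Path(path).read_text(encoding="utf-8")` and pass the result to `missing_sections`. If the runbooks mark sections differently (bold labels, numbered headings), the regular expression would need adjusting.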
These runbooks are designed to be followed during real incidents and maintenance windows. They assume the reader has SQL Server administrative access and basic T-SQL proficiency.
License: MIT