feat(observability): Alertmanager Discord receiver + healthchecks.io Watchdog #206
Merged
jvcorredor merged 4 commits on May 16, 2026
…Watchdog

Route Alertmanager's default route to a Discord channel webhook and the chart's always-firing `Watchdog` alert to a healthchecks.io ping URL on a 1m repeat_interval: the dead-man's switch that pages on a fully-dark cluster. `InfoInhibitor`, the chart's non-actionable inhibition helper, is dropped into an empty `blackhole` receiver so the now-`discord` default route does not spam it.

Both endpoint URLs are sourced from files mounted out of ESO-synced K8s Secrets (`alertmanagerSpec.secrets`), so neither URL enters Git or Terraform state. This is the same out-of-band credential pattern as the Grafana admin password.

- terraform/gcp/: two empty GSM containers, `discord-alertmanager-webhook` and `healthchecks-watchdog-url`, populated by the operator out of band.
- kube-prometheus-stack/manifests/: one ExternalSecret per container, syncing into the `observability` namespace.
- kube-prometheus-stack/helm-values.yaml: Alertmanager `config` with the discord/watchdog/blackhole receivers and routes; `inhibit_rules` restated verbatim from the chart default since supplying `config` replaces it wholesale.
- kube-prometheus-stack/README.md: out-of-band operator steps (Discord webhook, healthchecks.io check) and two smoke-test steps.

Closes: #181
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
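For readers unfamiliar with the ESO pattern described in this commit, here is a minimal sketch of what one of the two ExternalSecret manifests might look like. It is not the merged manifest. The `apiVersion`, the store name `gcp-secret-manager`, the refresh interval, the target Secret name, and the key name `url` are all assumptions made for illustration. Only the GSM container name and the `observability` namespace come from the commit message.

```yaml
# Illustrative sketch only; not copied from the merged PR.
apiVersion: external-secrets.io/v1beta1   # assumption: depends on the installed ESO version
kind: ExternalSecret
metadata:
  name: discord-alertmanager-webhook
  namespace: observability
spec:
  refreshInterval: 1h                     # assumption: any interval works here
  secretStoreRef:
    kind: ClusterSecretStore
    name: gcp-secret-manager              # hypothetical store name
  target:
    name: discord-alertmanager-webhook    # assumed name of the resulting K8s Secret
  data:
    - secretKey: url                      # hypothetical key; becomes the mounted file name
      remoteRef:
        key: discord-alertmanager-webhook # GSM container created in terraform/gcp/
```

Under these assumptions the second manifest would be identical apart from the names, pointing at `healthchecks-watchdog-url`.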
…alertmanager-discord-watchdog # Conflicts: # kubernetes/apps/kube-prometheus-stack/README.md
…r-discord-watchdog' into worktree-homelab-181-alertmanager-discord-watchdog
What
Wires Alertmanager's two delivery paths for the ADR-0007 observability stack:

| Route | Receiver | Destination |
|---|---|---|
| default | `discord` | Discord channel webhook |
| `alertname = "Watchdog"` | `watchdog` | healthchecks.io ping URL (`repeat_interval: 1m`) |
| `alertname = "InfoInhibitor"` | `blackhole` | dropped (empty receiver) |

Both endpoint URLs are sourced from files mounted out of ESO-synced K8s Secrets (`alertmanagerSpec.secrets`), so neither URL enters Git or Terraform state. This is the same out-of-band credential pattern as `grafana-admin-password`.

Changes
- `terraform/gcp/main.tf`: two empty GSM containers, `discord-alertmanager-webhook` and `healthchecks-watchdog-url`, populated by the operator out of band.
- `kube-prometheus-stack/manifests/external-secret-{discord,healthchecks}.yaml`: one `ExternalSecret` per container → K8s Secrets in `observability`.
- `kube-prometheus-stack/helm-values.yaml`: Alertmanager `config` with the discord/watchdog/blackhole receivers and routes. `inhibit_rules` are restated verbatim from the chart default, since supplying `config` replaces it wholesale.
- `kube-prometheus-stack/README.md`: out-of-band operator steps (create Discord webhook, create healthchecks.io check, populate both GSM containers) and two new smoke-test steps.

Verification
- `tofu validate` (terraform/gcp): passes; `tofu fmt -check`: clean.
- `scripts/lint-apps.sh kubernetes/apps/kube-prometheus-stack`: passes (123 valid resources, 5 manifests valid).
- `helm template` of the chart with these values renders the Alertmanager `config` secret exactly as designed: `webhook_url_file` / `url_file` resolve against the `alertmanagerSpec.secrets` mount paths.

In-cluster acceptance (Discord test alert, healthchecks.io Up/Down) requires the operator to populate the two GSM containers first. This is documented in the README's "Out-of-band operator steps".
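To make the file-based wiring concrete, here is a rough sketch of how the routes, receivers, and secret mounts could fit together in `helm-values.yaml`. It is an approximation, not the merged values file. The Secret names, the key name `url`, and `send_resolved: false` are assumptions. The `/etc/alertmanager/secrets/<secret-name>/` mount path is the prometheus-operator convention for `alertmanagerSpec.secrets`.

```yaml
# Illustrative sketch only; not the merged helm-values.yaml.
alertmanager:
  alertmanagerSpec:
    # Each Secret listed here is mounted at /etc/alertmanager/secrets/<secret-name>/
    secrets:
      - discord-alertmanager-webhook   # assumed Secret name
      - healthchecks-watchdog-url      # assumed Secret name
  config:
    route:
      receiver: discord                # default route
      routes:
        - receiver: watchdog
          matchers:
            - alertname = "Watchdog"
          repeat_interval: 1m
        - receiver: blackhole
          matchers:
            - alertname = "InfoInhibitor"
    receivers:
      - name: discord
        discord_configs:
          # file name "url" is a hypothetical Secret key
          - webhook_url_file: /etc/alertmanager/secrets/discord-alertmanager-webhook/url
      - name: watchdog
        webhook_configs:
          - url_file: /etc/alertmanager/secrets/healthchecks-watchdog-url/url
            send_resolved: false       # assumption: only firing notifications should ping
      - name: blackhole                # no *_configs, so matching alerts go nowhere
    # inhibit_rules: restate the chart defaults here, because supplying
    # `config` replaces the chart's default config wholesale.
```

The sketch leaves `group_wait` and `group_interval` at their defaults. The actual ping cadence for the `watchdog` route may depend on those settings as well as on `repeat_interval`.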
Out of scope
Per the issue: application-level alert rules, and routing to Slack/PagerDuty/email.
Closes: #181
🤖 Generated with Claude Code