Symptom
PR #41 (`approver: silence ⚠️ timeout for lobby rooms nobody joined`) merged on 2026-05-25, was auto-tagged v0.7.03, and kicked off a deploy that failed at the heads-up step (stale token, see related issue). The fix sat undeployed for 2 days, and I noticed only because the very spam the PR was meant to silence was still landing.
Nothing in `deploy.yml` posts to Matrix, opens a GitHub issue, or pages anyone when the workflow fails.
Proposed fix
Add an `if: failure()` step at the end of the deploy job (where it posts depends on the heads-up channel decision; see related issue). Skeleton:
    - name: notify on failure
      if: failure()
      env:
        KNOCK_APPROVER_TOKEN: ${{ secrets.KNOCK_APPROVER_TOKEN }}
        ADMIN_COMMAND_ROOM: ${{ secrets.ADMIN_COMMAND_ROOM }}
      run: |
        # post "❌ deploy {{ref}} ({{sha}}) failed, see {{run_url}}"
        # whichever channel issue #TBD picks
Caveat: if `KNOCK_APPROVER_TOKEN` itself is the thing that's stale, this notification step will also fail. The pre-flight validation issue is the more reliable backstop for that specific class of failure.
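For illustration only, here is one way the `run:` body might look if the channel decision lands on Matrix. This is a sketch under several assumptions, not the agreed design: that `KNOCK_APPROVER_TOKEN` is a Matrix access token, that `ADMIN_COMMAND_ROOM` holds a room ID, that the runner has `curl` and `jq` available, and that a homeserver base URL is stored in a secret named `MATRIX_HOMESERVER` (a hypothetical name, not something that exists in the repo today). The stale-token caveat applies to it unchanged.

```yaml
- name: notify on failure
  if: failure()
  env:
    KNOCK_APPROVER_TOKEN: ${{ secrets.KNOCK_APPROVER_TOKEN }}
    ADMIN_COMMAND_ROOM: ${{ secrets.ADMIN_COMMAND_ROOM }}
    # Hypothetical secret: homeserver base URL, e.g. https://matrix.example.org
    MATRIX_HOMESERVER: ${{ secrets.MATRIX_HOMESERVER }}
    REF_NAME: ${{ github.ref_name }}
    SHA: ${{ github.sha }}
    RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
  run: |
    # Build the JSON body with jq so the message text is escaped correctly.
    body=$(jq -n --arg msg "❌ deploy ${REF_NAME} (${SHA}) failed, see ${RUN_URL}" \
      '{msgtype: "m.text", body: $msg}')
    # Room IDs contain "!" and ":", so percent-encode before using in the path.
    room=$(jq -rn --arg r "$ADMIN_COMMAND_ROOM" '$r | @uri')
    # The transaction ID only has to be unique per access token.
    txn="deploy-fail-${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}"
    # --fail makes curl exit non-zero on an HTTP error such as 401 (stale token).
    curl --fail --silent --show-error -X PUT \
      -H "Authorization: Bearer ${KNOCK_APPROVER_TOKEN}" \
      -H "Content-Type: application/json" \
      -d "$body" \
      "${MATRIX_HOMESERVER}/_matrix/client/v3/rooms/${room}/send/m.room.message/${txn}"
```

Passing the GitHub context values through `env:` rather than interpolating `${{ ... }}` directly into the script keeps ref names out of the shell source, which avoids quoting surprises if a tag or branch name contains unusual characters.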
Worth considering: GitHub Actions already sends workflow-failure emails, and the mobile app pushes notifications for failed runs. These may be enough on their own once the team confirms they actually receive them.