The Cloudflare Pages build for npsnav.in got stuck in the “Building…” state for several hours.
During this time, all subsequent deployments were queued and never executed.
GitHub Actions successfully updated data.json in the repository, but the site did not update because Cloudflare never completed the deployment.
The issue was resolved only after manually deleting the stuck deployment from the Cloudflare Pages dashboard.
This incident also coincided with a Cloudflare-wide outage, which may have contributed to the build becoming stuck.
Impact
- Website served stale NAV data even though the repo contained the latest updates
- Automated daily updates silently failed
- All new deployments were blocked behind the stuck one
- Required manual intervention
- High chance of recurrence without a safeguard
Proposed Fix
Introduce a “watchdog” mechanism to detect and recover from stuck Cloudflare Pages builds.
Rather than running on a fixed schedule, the watchdog can be triggered only when a new deployment is expected to have started (i.e., after our GitHub Action pushes changes).
Option A — Trigger Watchdog After Each Push / Deployment Attempt
After the NAV update workflow completes:
- Query Cloudflare Pages API for the latest deployment
- If the deployment has been queued or building for too long (e.g., >10 minutes) → automatically cancel it
- Trigger a new deployment
- Optional: send alerts (Slack/Discord/etc.)
This approach runs only when needed and avoids unnecessary scheduled jobs.
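A minimal Python sketch of the Option A watchdog follows. It assumes the Cloudflare v4 Pages deployment endpoints (list, delete, create) and that each deployment object carries `created_on` and `latest_stage.status`, with "idle"/"active" meaning unfinished. Verify those field names and status values against the Cloudflare API reference before relying on them. It deletes the stuck deployment (the manual fix that worked in this incident) instead of cancelling it, and the environment variable names are hypothetical.

```python
from __future__ import annotations

import json
import os
import urllib.request
from datetime import datetime, timedelta, timezone

API_BASE = "https://api.cloudflare.com/client/v4"
# Assumption: a deployment whose latest stage status is "idle" or "active" has not finished.
IN_PROGRESS_STATUSES = {"idle", "active"}
MAX_BUILD_AGE = timedelta(minutes=10)  # the ">10 minutes" threshold from Option A


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 UTC timestamp such as '2024-01-01T09:00:00Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_stuck(deployments: list[dict], now: datetime,
               max_age: timedelta = MAX_BUILD_AGE) -> list[dict]:
    """Return unfinished deployments older than max_age, oldest first (no network calls)."""
    stuck = [
        d for d in deployments
        if (d.get("latest_stage") or {}).get("status") in IN_PROGRESS_STATUSES
        and now - parse_ts(d["created_on"]) > max_age
    ]
    return sorted(stuck, key=lambda d: parse_ts(d["created_on"]))


def cf_request(method: str, path: str, token: str) -> dict:
    """Call the Cloudflare v4 API; HTTP errors raise and fail the workflow step."""
    req = urllib.request.Request(
        API_BASE + path,
        method=method,
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def main() -> None:
    # Hypothetical environment variable names; provide them as GitHub Actions secrets.
    account = os.environ["CF_ACCOUNT_ID"]
    project = os.environ["CF_PAGES_PROJECT"]
    token = os.environ["CF_API_TOKEN"]
    base = f"/accounts/{account}/pages/projects/{project}/deployments"

    deployments = cf_request("GET", base, token)["result"]
    stuck = find_stuck(deployments, datetime.now(timezone.utc))
    if not stuck:
        print("No stuck deployments found.")
        return

    for d in stuck:
        # Mirrors the manual fix: remove the stuck deployment so the queue can drain.
        print(f"Deleting stuck deployment {d['id']} (created {d['created_on']})")
        cf_request("DELETE", f"{base}/{d['id']}", token)

    # Assumption: for a Git-connected project, POSTing to /deployments starts a fresh
    # build of the production branch. A Pages deploy hook URL is an alternative trigger.
    cf_request("POST", base, token)
    print("Triggered a new deployment.")
```

A final workflow step could call `main()` (for example `python -c "from watchdog import main; main()"` with the file saved as a hypothetical `watchdog.py`). Run right after the push, it catches an older stuck build that blocks the queue; catching the new build itself getting stuck needs a second run after the threshold. Keeping `find_stuck` free of network calls makes the threshold logic easy to test on its own.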
Option B — Health Check After Deployment
After GitHub Actions finishes pushing updates:
- Fetch https://npsnav.in/data.json
- Validate the timestamp
- If the deployed data is stale → retry or alert
This ensures correctness even if Cloudflare marks a build as “successful” but the live site didn’t update.
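A minimal Python sketch of the Option B health check follows. It assumes `data.json` is a JSON object with a top-level timestamp field; the key name `updated_at` is hypothetical. It compares the live copy with the copy just committed in the workflow's checkout and retries for a while, since Cloudflare needs time to build and deploy.

```python
from __future__ import annotations

import json
import time
import urllib.request

LIVE_URL = "https://npsnav.in/data.json"
TIMESTAMP_KEY = "updated_at"  # hypothetical: replace with the real timestamp field in data.json


def fetch_live(url: str = LIVE_URL) -> dict:
    """Download the deployed data.json; the query string tries to avoid a cached copy."""
    req = urllib.request.Request(
        f"{url}?t={int(time.time())}",
        headers={"Cache-Control": "no-cache"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def is_current(live: dict, local: dict, key: str = TIMESTAMP_KEY) -> bool:
    """True if the live file carries the same timestamp as the committed copy."""
    if key not in live or key not in local:
        return False
    return live[key] == local[key]


def wait_for_deploy(local_path: str = "data.json", attempts: int = 10,
                    delay_s: float = 60.0) -> bool:
    """Poll the live site until it matches the committed data.json or attempts run out."""
    with open(local_path, encoding="utf-8") as f:
        local = json.load(f)
    for attempt in range(1, attempts + 1):
        try:
            if is_current(fetch_live(), local):
                return True
        except (OSError, ValueError) as exc:  # network errors or invalid JSON: retry
            print(f"Attempt {attempt}: could not check live data.json: {exc}")
        if attempt < attempts:
            time.sleep(delay_s)
    return False
```

A workflow step could call `wait_for_deploy()` and fail the job (for example with `raise SystemExit(1)`) when it returns `False`, so a stale site shows up as a failed run rather than failing silently.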
Option C — Optional Pipeline Change
Switch to building the site inside GitHub Actions and let Cloudflare Pages only handle hosting (Direct Upload).
This avoids Cloudflare’s build system entirely, but requires modifying the current deployment setup.
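With Option C, the site would be built in GitHub Actions and the output uploaded with Wrangler's `pages deploy` command (Direct Upload). A minimal Python wrapper follows, assuming Wrangler is available through `npx` and that `CLOUDFLARE_API_TOKEN` and `CLOUDFLARE_ACCOUNT_ID` are set in the environment. The build directory `dist`, project name `npsnav`, and branch `main` are hypothetical placeholders.

```python
from __future__ import annotations

import subprocess


def wrangler_deploy_args(build_dir: str, project: str, branch: str = "main") -> list[str]:
    """Build the Wrangler CLI invocation for a Cloudflare Pages Direct Upload."""
    return [
        "npx", "wrangler", "pages", "deploy", build_dir,
        "--project-name", project,
        "--branch", branch,
    ]


def deploy(build_dir: str = "dist", project: str = "npsnav") -> None:
    # Wrangler reads CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID from the environment.
    # check=True makes a failed upload fail the workflow step instead of passing silently.
    subprocess.run(wrangler_deploy_args(build_dir, project), check=True)
```

If the workflow does not otherwise use Python, the official `cloudflare/wrangler-action` GitHub Action can run the same `pages deploy` command directly.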
Next Steps
- Decide which recovery approach to implement
- If Option A is chosen: implement watchdog logic triggered after the NAV update workflow
- If Option B is chosen: add health-check logic
- Optional: add fallback scheduled watchdog for resilience during Cloudflare outages