fix(scheduler): seed recurring_jobs via migration + correct worker healthcheck #388
Merged
…althcheck Fixes #383. The adaptive schedulers (host monitoring, compliance scanning) were silently dormant on fresh deploys because recurring_jobs was never populated. The seed_schedule module existed but was never invoked by any startup path. Migration 054 inserts the 9 baseline schedules with ON CONFLICT (name) DO NOTHING so it is idempotent against manual invocations of the seed script and safe to re-run. Downgrade removes only these 9 named rows, leaving operator-added schedules untouched. Also overrides the worker container healthcheck, which inherited the backend Dockerfile's curl-localhost-8000 probe and reported unhealthy forever. The new probe verifies DB connectivity via SQLAlchemy, which is the actual precondition for worker function.
Fixes #383.
Summary
- Adds migration `054_seed_recurring_jobs`, which idempotently inserts the 9 baseline recurring schedules into the `recurring_jobs` table on every deploy
- Overrides the worker container healthcheck in `docker-compose.yml` (it was inheriting the backend Dockerfile's `curl localhost:8000`, which hits nothing in the worker container and reports `unhealthy` forever)

Why
The PostgreSQL job queue's scheduler polls `recurring_jobs` every 10s and enqueues due entries. On a fresh deploy, the table was empty and stayed empty: no scheduler, no host monitoring, no compliance scans. Silent failure with no errors logged.

The `app/services/job_queue/seed_schedule.py` module exists and works, but nothing invoked it: not the worker entrypoint, not `docker-compose`, not a FastAPI startup hook, not a migration.

Discovered in production 2026-04-13, when the worker had been up 5 hours with zero jobs dequeued, the last host liveness ping was 5 hours stale, and the last scan was 5.5 hours overdue.
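To illustrate why an empty table fails silently, here is a minimal sketch of a polling tick. It is not the project's scheduler: it uses an in-memory SQLite database as a stand-in for PostgreSQL, and the `interval_s` and `next_run_at` columns and the `enqueue_due` helper are assumptions made for illustration.

```python
import sqlite3


def enqueue_due(conn, now):
    """Return the names of schedules due at `now` and push their next run forward.

    Hypothetical helper: the real scheduler's query and columns may differ.
    """
    rows = conn.execute(
        "SELECT name, interval_s FROM recurring_jobs "
        "WHERE next_run_at <= ? ORDER BY name",
        (now,),
    ).fetchall()
    for name, interval_s in rows:
        conn.execute(
            "UPDATE recurring_jobs SET next_run_at = ? WHERE name = ?",
            (now + interval_s, name),
        )
    return [name for name, _ in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recurring_jobs ("
    "name TEXT PRIMARY KEY, interval_s INTEGER NOT NULL, next_run_at INTEGER NOT NULL)"
)

# Empty table: the tick succeeds, finds nothing, and raises no error.
empty_tick = enqueue_due(conn, now=100)

# Once a schedule row exists, the same tick starts producing work.
conn.execute("INSERT INTO recurring_jobs VALUES ('dispatch_host_checks', 30, 0)")
seeded_tick = enqueue_due(conn, now=100)

# Ten seconds later the job is not due again yet (next run is at 130).
repeat_tick = enqueue_due(conn, now=110)
```

A query over an empty table is indistinguishable from "nothing is due right now", which is why the worker looked alive while doing no work.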
Why a migration (vs. entrypoint hook or FastAPI startup event)
Idempotency
`ON CONFLICT (name) DO NOTHING` means the migration is safe to re-run against a DB where someone manually invoked `python -m app.services.job_queue.seed_schedule`. Validated against the production DB here, which had 8 rows from yesterday's manual seed plus one missing (retention policies, added later); the migration correctly inserted only the missing row.

Future schedule changes
If a new recurring schedule is added to `SCHEDULE` in `app/services/job_queue/seed_schedule.py`, add a follow-up migration (`055_add_<name>_schedule.py`) rather than editing this one. This keeps the migration history honest about what was seeded when.

Worker healthcheck
Replaces `curl localhost:8000/health` with a SQLAlchemy `SELECT 1` against the configured DB URL. Rationale: the worker's only hard dependency is DB connectivity; without it, the worker can't dequeue or enqueue anything. A "worker is alive" probe that doesn't touch the thing the worker needs is not a healthcheck.

Test plan
- `RETURNING name` reported only the one missing row inserted, 8 conflicts silently skipped
- `black --check` passes on the migration file
- `dispatch_host_checks` + `dispatch_compliance_scans` executing every 30s/2min, `check_host_connectivity` fanout working across 7 hosts, `run_scheduled_kensa_scan` firing for overdue hosts
- `alembic upgrade head`, then verify `SELECT count(*) FROM recurring_jobs` returns 9
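The idempotency behavior checked in the test plan can be reproduced in miniature. The sketch below is not the migration itself: it runs against in-memory SQLite (which accepts the same `ON CONFLICT (name) DO NOTHING` clause from version 3.24) rather than PostgreSQL, and the `seed` helper, the `interval_s` column, and the `example_retention_job` name are hypothetical.

```python
import sqlite3

# Two names and intervals come from the test plan above; the third is a placeholder.
BASELINE = [
    ("dispatch_host_checks", 30),
    ("dispatch_compliance_scans", 120),
    ("example_retention_job", 86400),  # hypothetical name
]


def seed(conn, schedules):
    """Insert any missing schedules and return the names actually inserted."""
    before = {row[0] for row in conn.execute("SELECT name FROM recurring_jobs")}
    conn.executemany(
        "INSERT INTO recurring_jobs (name, interval_s) VALUES (?, ?) "
        "ON CONFLICT (name) DO NOTHING",
        schedules,
    )
    after = {row[0] for row in conn.execute("SELECT name FROM recurring_jobs")}
    return sorted(after - before)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recurring_jobs (name TEXT PRIMARY KEY, interval_s INTEGER NOT NULL)"
)

# Simulate an earlier manual seed that covered only part of the list,
# with one interval since adjusted by an operator.
conn.execute("INSERT INTO recurring_jobs VALUES ('dispatch_host_checks', 45)")
conn.execute("INSERT INTO recurring_jobs VALUES ('dispatch_compliance_scans', 120)")

first_run = seed(conn, BASELINE)   # only the missing row is inserted
second_run = seed(conn, BASELINE)  # re-running inserts nothing

total = conn.execute("SELECT count(*) FROM recurring_jobs").fetchone()[0]
kept_interval = conn.execute(
    "SELECT interval_s FROM recurring_jobs WHERE name = 'dispatch_host_checks'"
).fetchone()[0]
```

Because `DO NOTHING` skips conflicting rows rather than updating them, values already present in the table are left as they were, which is also why changing an existing schedule would need its own migration.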