
feat: split liveness and readiness probes with dependency checks #1289

Merged: Baskarayelu merged 3 commits into QuickLendX:main from mikkyvans0-source:Backend--Add-health/readiness/liveness-split-with-dependency-probes on Jun 3, 2026

@mikkyvans0-source
Contributor

What I did
Split the flat /health into proper liveness and readiness probes, mounted at the root of the app and left unauthenticated so orchestrators can reach them.
Closes #1065

Files changed (commit def6bce):

- backend/src/routes/health.ts (new): the probe router.
  - GET /health, GET /livez: cheap, dependency-free liveness (always 200).
  - GET /readyz: readiness. Probes DB connectivity, ingest lag, and the webhook queue, honours maintenance mode, and returns 503 when not ready. Reuses the SubStatus (ok/degraded/unavailable) pattern from monitoring.ts: degraded stays in rotation, unavailable fails readiness. Maintenance mode short-circuits to 503 maintenance before probing deps.
- backend/src/lib/database.ts: added pingDatabase() (a SELECT 1 round-trip that never throws) for the DB connectivity check.
- backend/src/app.ts: replaced the inline flat /health handler with the new router (and dropped the leaked version field).
- backend/src/tests/readiness.test.ts (new): 18 tests covering happy path, DB-down, critical lag, lag-probe-throws, maintenance, partial failure, queue saturation, queue-store-unreachable, and information-leak checks. All 18 pass; health.ts is at 100% coverage.
- backend/docs/health.md (new): documents probe semantics, sub-status table, edge cases, security notes, and a Kubernetes config example.
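
For reviewers who want the /readyz decision rule at a glance, here is a minimal sketch of it as a pure function. It is not the code in health.ts: `evaluateReadiness`, `NamedProbe`, the response field names, and the choice to treat a throwing probe as unavailable are all assumptions made for illustration, and the real probes are presumably asynchronous.

```typescript
// Illustrative sketch only: names and response shape are assumptions, not the PR's code.
type SubStatus = "ok" | "degraded" | "unavailable";

interface NamedProbe {
  name: string;
  run: () => SubStatus; // real probes are presumably async; sync keeps the sketch short
}

interface ReadinessResult {
  httpStatus: 200 | 503;
  status: "ready" | "degraded" | "not_ready" | "maintenance";
  checks: Record<string, SubStatus>;
}

function evaluateReadiness(maintenance: boolean, probes: NamedProbe[]): ReadinessResult {
  // Maintenance mode short-circuits: no dependency is probed at all.
  if (maintenance) {
    return { httpStatus: 503, status: "maintenance", checks: {} };
  }

  const checks: Record<string, SubStatus> = {};
  probes.forEach((probe) => {
    try {
      checks[probe.name] = probe.run();
    } catch {
      // Assumption: a probe that throws is reported as unavailable.
      checks[probe.name] = "unavailable";
    }
  });

  const results = probes.map((probe) => checks[probe.name]);
  if (results.some((s) => s === "unavailable")) {
    // Any unavailable dependency fails readiness.
    return { httpStatus: 503, status: "not_ready", checks };
  }
  if (results.some((s) => s === "degraded")) {
    // Degraded dependencies keep the instance in rotation.
    return { httpStatus: 200, status: "degraded", checks };
  }
  return { httpStatus: 200, status: "ready", checks };
}
```

Keeping the decision separate from the HTTP handler is one way to exercise cases such as partial failure without a running server.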

Things worth flagging

- Pre-existing breakage on the base branch (not mine, left untouched): routes/v1/index.ts imports FileSystemRawEventStore, which was renamed to FileRawEventStore; controllers/v1/invoices.ts has missing imports; there are several migration version-mismatch failures; and webhookQueue.persist.test.ts fails identically on the clean base. Because of these, any test importing the full app.ts won't compile, so I mounted the health router in isolation in my test (verifying the same wiring app.ts uses). The full suite shows 31 pre-existing failures; my suite is green and none of the failures touch files I changed. I did not fix these since you asked me to stay on task.
- The two test-rbac-*.db files were SQLite artifacts the RBAC test wrote to backend/.data/ during the full run. I deleted them per your choice; they're regenerated each run. Consider gitignoring backend/.data/*.db separately.
- Docs went in backend/docs/health.md (where all other docs live) rather than a top-level docs/health.md.
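
The information-leak checks listed for readiness.test.ts could take roughly the following shape. This is a hedged sketch, not the test code from the PR: `findLeaks` and `ALLOWED_STRINGS` are hypothetical names, and the allow-list values other than ok/degraded/unavailable are guesses at what a probe body might contain.

```typescript
// Hypothetical helper: reports every JSON path whose value is not a coarse status enum.
// The allow-list below is an assumption; only ok/degraded/unavailable come from the PR text.
const ALLOWED_STRINGS: readonly string[] = [
  "ok",
  "degraded",
  "unavailable",
  "ready",
  "not_ready",
  "maintenance",
];

function findLeaks(value: unknown, path: string = "$"): string[] {
  // null and booleans cannot carry hostnames, versions, or messages.
  if (value === null || typeof value === "boolean") {
    return [];
  }
  // Strings are acceptable only when they are a known status enum.
  if (typeof value === "string") {
    return ALLOWED_STRINGS.indexOf(value) >= 0 ? [] : [path];
  }
  if (Array.isArray(value)) {
    return value.reduce<string[]>(
      (acc, item, i) => acc.concat(findLeaks(item, `${path}[${i}]`)),
      [],
    );
  }
  if (typeof value === "object") {
    const record = value as Record<string, unknown>;
    return Object.keys(record).reduce<string[]>(
      (acc, key) => acc.concat(findLeaks(record[key], `${path}.${key}`)),
      [],
    );
  }
  // Numbers (for example ledger numbers) and anything else are flagged.
  return [path];
}
```

A test would then assert that `findLeaks(responseBody)` is empty for every probe response, including the failure cases. Flagging all numbers is deliberately strict here and would need relaxing if a probe body legitimately carried a numeric field.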

The app exposed a single flat /health that always returned status: ok, with
no distinction between liveness (process up) and readiness (dependencies
healthy). Orchestrators could therefore route traffic to instances that were
up but unable to serve.

Split the signal into two probes, mounted at the root and unauthenticated so
orchestrators can reach them:

- /health, /livez  — cheap, dependency-free liveness check (always 200).
- /readyz          — readiness check that probes DB connectivity
  (pingDatabase), ingest lag (lagMonitor), and webhook queue health, honours
  maintenance mode, and returns 503 when not ready. Reuses the SubStatus /
  degradation pattern from the monitoring route: "degraded" stays in rotation,
  "unavailable" fails readiness.

Add a pingDatabase() helper (SELECT 1 round-trip), readiness.test.ts covering
DB-down, critical lag, maintenance mode, partial failure and queue-saturation
edge cases plus information-leak checks, and docs/health.md documenting probe
semantics. Probe responses expose only coarse status enums — no hostnames,
versions, ledger numbers, or error messages.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
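
The never-throws contract of the pingDatabase() helper described in the commit message can be sketched as below. The `Queryable` interface and the boolean return type are assumptions for illustration; the helper's actual signature in database.ts is not shown in this PR text.

```typescript
// Assumed minimal database handle; the project's real client type may differ.
interface Queryable {
  query(sql: string): Promise<unknown>;
}

// Sketch of a SELECT 1 round-trip that reports failure as `false` instead of throwing.
async function pingDatabase(db: Queryable): Promise<boolean> {
  try {
    // Inside an async function, this try block catches both a rejected
    // promise and a synchronous throw from query().
    await db.query("SELECT 1");
    return true;
  } catch {
    // The error is swallowed on purpose so callers only see a coarse result.
    return false;
  }
}
```

Returning a plain boolean keeps error details out of the readiness response. A production version would probably also want a timeout so that a hung connection cannot stall the probe, but the PR text does not say whether one exists.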
@drips-wave

drips-wave Bot commented Jun 2, 2026

@mikkyvans0-source Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits.

You can now already apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀


…dency-probes' of https://github.com/mikkyvans0-source/quicklendx-protocol into Backend--Add-health/readiness/liveness-split-with-dependency-probes
@Baskarayelu merged commit 914818a into QuickLendX:main on Jun 3, 2026
2 of 4 checks passed
Linked issue: Backend: Add health/readiness/liveness split with dependency probes
