fix(backend): retry duckdb health at boot + warn on unset DUCKDB_SERVICE_URL#193
fix(backend): retry duckdb health at boot + warn on unset DUCKDB_SERVICE_URL#193schutera wants to merge 1 commit into
Conversation
|
@schutera — senior-review + full local test sweep of this PR. Verdict: mergeable, but one real availability regression and a stale doc the PR touches should be fixed first. The core fix is sound and small; the Test results — every layer I can run locally is green
Findings (ranked, independently verified against the files — not just the diff)P1 — should fix before merge1. So 2. 3. Stale doc the PR touches. 4. Zero tests for the new logic. The analogous P2 — nits / defer5. 6. The chapter-11 incident this warning-pattern guards against was actually triggered by Genuinely fine — don't change
Manual test plan (what local tests can't cover — needs the Docker stack)A — the fix: no stale "not reachable" after a normal boot docker compose up --build -d
curl.exe http://localhost:3002/api/health
curl.exe -s -H "X-Admin-Key: hf_dev_key_2026" "http://localhost:3002/api/admin/logs?service=backend" | Select-String "DuckDB service"Expect B — the new unset-URL warning fires docker compose run --rm -e DUCKDB_SERVICE_URL= --service-ports backend node dist/server.jsExpect C — confirm the P1 delay (optional) docker compose stop duckdb-service; docker compose restart backend
(Invoke-WebRequest http://localhost:3002/api/health -UseBasicParsing).StatusCode # ~4.5s stall, then 200
docker compose start duckdb-serviceBottom lineHappy-path is correct (hence green CI), but #1 (listen first, probe in background) and #3 (fix the doc) shouldn't merge as-is. #2 and #4 are strongly recommended alongside. Full local sweep above is green. 🤖 Review generated with Claude Code (senior-reviewer + local test sweep). |
…B_SERVICE_URL The API and duckdb-service start together (pm2/compose), so the one-shot health probe in bootstrap() races the service binding its port. On every restart it logs '⚠ DuckDB service not reachable', which the in-memory log ring (#171) then keeps at the top of the admin server-logs panel for the whole process lifetime — looking like an outage when the service is fine. (Prod: API + duckdb-service started 1s apart; duckdb bound :8000 just after the probe -> ECONNREFUSED, logged once, non-fatal.) bootstrap(): retry duckdbHealth() up to 10x500ms before warning; still serves regardless (advisory). duckdbClient: keep the 8002 default (documented docker host-port mapping 8002:8000) but expose duckdbUrlFromDefault so server.ts warns when DUCKDB_SERVICE_URL is unset, mirroring the existing PORT-unset warning. Trims/treats blank as unset. Type-checked clean (tsc --noEmit) for both changed files. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
9690438 to
9818abf
Compare
Symptom
The admin server-logs panel shows, seemingly as a live outage:
…even though
duckdb-serviceis up and/healthreturns200.Root cause — a startup race, not an outage
bootstrap()inserver.tsprobesduckdbHealth()once, immediately at boot. The API andduckdb-serviceare started together (pm2/compose), so the probe fires ~1s before duckdb finishes binding its port →ECONNREFUSED. It's caught and non-fatal (the API serves anyway), but:Verified on prod:
highfive-apiandduckdb-servicelast started 2026-06-26 23:00:24 / :25 UTC — 1 second apart;/healthis200now, service has not restarted since.Changes
server.ts) —bootstrap()now retriesduckdbHealth()up to 10 × 500ms before logging "not reachable". It still starts serving regardless (the check is advisory). A normal boot resolves on attempt 1–2; a genuinely-down service still surfaces the warning after ~5s.DUCKDB_SERVICE_URLis unset (duckdbClient.ts→server.ts) — mirrors the existingPORT-unset warning. Makes a real misconfiguration loud instead of silently pointing at the wrong port. Also trims / treats blank as unset.Deliberately NOT changed
http://127.0.0.1:8002default is kept —8002is the documented Docker host-port mapping (duckdb-serviceis8002:8000), correct for a backend running on the host against a composed duckdb. A bare-metal/pm2 box (duckdb directly on:8000) setsDUCKDB_SERVICE_URLexplicitly, whichecosystem.config.jsalready does. Changing the default to8000would break the compose convention; the new startup warning addresses the silent-fallback footgun instead.Verification
tsc --noEmitis clean for both changed files. (Other files in the tree report errors only from a stale@highfive/contractsbuild and a missingrotating-file-streamdep in this host'snode_modules— pre-existing environment drift, unrelated to this change; CI installs/builds both.)🤖 Generated with Claude Code