Skip to content

fix(deploy): install workspace npm + service pip deps in deploy.sh#196

Open
schutera wants to merge 2 commits into
mainfrom
fix/deploy-sh-install-deps
Open

fix(deploy): install workspace npm + service pip deps in deploy.sh#196
schutera wants to merge 2 commits into
mainfrom
fix/deploy-sh-install-deps

Conversation

@schutera

Copy link
Copy Markdown
Owner

Problem

scripts/deploy.sh rebuilds + reloads but never installs new dependencies, so a release that adds a dep fails its build and auto-rolls-back:

Changes — scripts/deploy.sh

  1. Root npm ci when the root package-lock.json or any workspace package.json changed, run before the builds; dropped the per-prefix npm --prefix backend|homepage ci. Fatal on failure (rollback) — a broken install means a broken build anyway.
  2. pip install for duckdb-service / image-service when their requirements.txt changed, into the system python3 pm2 runs them with (no venv). Non-fatal: a resolver miss (e.g. an onnxruntime with no wheel for this Python) must not block the deploy — the existing post-reload health check is the real gate (services degrade gracefully on a missing optional dep; a genuinely-required missing module crashes the reload → health fails → rollback).

Changes — docs/07-deployment-view/production-runbook.md

Rewrote the stale "Updates & Redeployment" section to match reality: deploys from main (not a production branch), root npm ci (workspaces), pip install for both Python services into system python3, the Node builds, pm2 reload of all four apps, and the four health checks. Added the Python-3.10 ceiling note (onnxruntime pinned to 1.23.2 = max cp310 wheel; ESP runs no models — ADR-028).

Verification

  • bash -n scripts/deploy.sh passes.
  • On a clean Python 3.10 venv: both Python services import/boot with deps installed; onnxruntime==1.23.2 loads and runs the real hole_detector.onnx; image-service also boots with opencv/numpy/onnxruntime all absent (graceful no-op) — which is why the pip step is non-fatal.
  • The npm gap is the documented rotating-file-stream (Turn the admin "Server Logs" panel into a real live log console #178) failure; root npm ci resolves it (the dep is in the root lockfile).

Pairs with #191 (Python 3.10 compat) — together they let main redeploy cleanly on the 3.10 host.

🤖 Generated with Claude Code

@cofade

cofade commented Jun 28, 2026

Copy link
Copy Markdown
Collaborator

Senior review — PR #196 fix(deploy): install workspace npm + service pip deps

@schutera I ran this through the senior-reviewer gate plus all the local testing I can do off the prod host. Verdict: the scripts/deploy.sh change is solid and self-contained — ship it. The production-runbook.md rewrite is not yet correct and should be fixed (or split out) before merge.

✅ What's good (and verified)

  • Root npm ci gate is right for the workspaces monorepo. Root package.json declares workspaces: [contracts, backend, homepage]; a new backend/homepage dep lands only in the root lockfile, which the old npm --prefix <pkg> ci would miss. Running it before the builds is the correct ordering. The rotating-file-stream / Turn the admin "Server Logs" panel into a real live log console #178 motivation is real.
  • The non-fatal pip design is grounded, not hand-waved. image-service/services/hole_detection.py imports onnxruntime/cv2/numpy under a try/except with _RUNTIME_AVAILABLE, and app.py's /health doesn't touch them — so a missing optional wheel leaves health green + detection degraded, while a genuinely-required missing module crashes the reload → health fails → rollback. "Health check is the real gate" holds.
  • changed_match gates fire correctly (I tested the regexes against 10 representative change-sets): requirements.txt changes correctly trigger both pip install and the reload/health gate; requirements-dev.txt correctly does not trigger a pip install.

🔴 P0 — the runbook documents a dependency strategy that doesn't exist

The new "Python 3.10 ceiling" paragraph + step-2 comment state onnxruntime is "pinned to 1.23.2 (max cp310 wheel)". That contradicts the repo on two fronts:

  1. This branch's own image-service/requirements.txt pins onnxruntime==1.27.0 (not 1.23.2). Per the repo's own comment on main, "onnxruntime cp310 ≤ 1.23.2" — i.e. 1.27.0 has no cp310 wheel, so on the very "Python 3.10 host" the doc insists on, that line would fail to install. The doc names the wrong version and the branch ships an uninstallable one.
  2. main has already abandoned the pin. Commit a0e7374 (ci: test the Python services across a 3.10–3.14 matrix (closes #192) #195 / ADR-029 — which this branch does not contain) deliberately floats numpy>=2.0.0 and onnxruntime>=1.23.2 to support a 3.10–3.14 matrix, with an in-file comment saying "floated to lower bounds (not ==)". There is no "ceiling" anymore.

Because this PR doesn't touch requirements.txt, merging takes main's floated version — so the runbook would ship contradicting the file sitting right beside it, and cite ADR-028 while the relevant rationale is ADR-029. This is the exact "trust code, not your memory of the pin" failure the CLAUDE.md rule exists to catch.

Fix: rebase onto main first, then rewrite the dependency section against the post-#195 requirements.txt + ADR-029 — drop "pinned to 1.23.2" and "Python 3.10 ceiling" entirely.

🟡 P1 / P2 — smaller items

  • P1 — Branch is behind main by ci: test the Python services across a 3.10–3.14 matrix (closes #192) #195 and has diverged (HEAD is not an ancestor of main). deploy.sh itself blocks on git pull --ff-only, and the doc you're editing is one commit behind the commit that removed the thing it documents. Rebase before merge.
  • P2 — Rollback is dependency-blind. rollback() restores build artifacts + git reset --hard, but never reinstalls old Python deps or downgrades a pip-upgraded shared package. A forward pip install (e.g. numpy → 2.x on main) is sticky across rollback. Blast radius is zero today (duckdb-service doesn't import numpy), but the runbook's confident "all pure / cp310-ok" framing should at least state that pip upgrades are forward-only and not reverted.
  • P2 — Silent pip failures have no diagnostics. python3 -m pip install ... >/dev/null 2>&1 → the WARN log carries zero detail (no "no matching distribution"). For the exact failure this feature anticipates (no cp310 wheel), the operator gets a content-free warning. Consider teeing pip stderr to the deploy log on the failure branch.
  • P2 — npm ci gate omits the root package.json. Gate is ^package-lock\.json$|^(backend|homepage|contracts)/package\.json$. A root-level dep/workspace-list change that somehow didn't touch the lockfile would skip npm ci. Theoretical (real dep changes always touch the lockfile), but conspicuous given the other three are listed.
  • P2 — No ch11 lessons-learned entry. This fixes a non-obvious deploy bug (per-prefix npm ci silently misses root-lockfile deps); per CLAUDE.md that lesson belongs in docs/11-risks-and-technical-debt/, not only a code comment.

Testing run

Check Result
bash -n scripts/deploy.sh ✅ pass
make check-citations ✅ 7 OK, 0 problems
changed_match regex matrix (10 scenarios) ✅ gates fire as intended
CI on the PR ✅ all 9 checks green
Merge state MERGEABLE / CLEAN (against main, ignoring staleness)
shellcheck ⏭️ not installed locally

Couldn't test locally (needs the prod host): the actual end-to-end deploy.sh run (pm2 / systemd timer / /var/www/highfive), a real root npm ci resolve+build, and pip install resolving an onnxruntime cp310 wheel on the real 3.10 interpreter.

Reviewed with Claude Code (senior-reviewer gate). Findings are input, not a verdict — worth a second look at the P0 against your intended requirements strategy.

@cofade

cofade commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator

Reviewed + pushed a fix — 1951ddc

@schutera We ran this through the senior-reviewer gate plus the local testing layers (bash -n, make check-citations, a changed_match regex matrix, branch-divergence analysis). The scripts/deploy.sh change reviewed as solid and left unchanged — the root npm ci gate, install-before-build ordering, and non-fatal-pip design are all correct. The one blocker was the runbook doc, which I've now corrected and pushed to this branch.

What I pushed (two commits):

  1. Merged main into the branch (f4bce7f). The branch was one commit behind main (ci: test the Python services across a 3.10–3.14 matrix (closes #192) #195 / ADR-029, the 3.10–3.14 Python matrix). Clean merge, no conflicts — this brings the real, floated requirements.txt onto the branch so the runbook can describe what actually ships.

  2. Corrected the runbook + logged the lesson (1951ddc):

    • The new section claimed onnxruntime is "pinned to 1.23.2" under a "Python 3.10 ceiling". That contradicted reality on two fronts: this branch's own requirements.txt had onnxruntime==1.27.0 (which has no cp310 wheel), and main (ADR-029) deliberately floats numpy>=2.0.0 / onnxruntime>=1.23.2 / pydantic>=2.12.5 for a 3.10–3.14 matrix — a floor, not a pin. Rewrote the step-2 comment and the "Python 3.10 floor (not a pin)" paragraph to match, citing ADR-029.
    • Added a caveat that a rollback does not downgrade a pip-upgraded shared dep (e.g. the numpy → 2.x bump is sticky), since deploy.sh's rollback() only restores the git tree + Node artifacts.
    • Added a ch11 lessons-learned entry for the workspace-lockfile npm ci miss this PR addresses (per CLAUDE.md's mandatory doc gate) — distinct from ci: test the Python services across a 3.10–3.14 matrix (closes #192) #195's version-matrix entry.

Optional follow-ups I did not push (your call — all P2 nits on deploy.sh, none blocking):

  • The pip install (>/dev/null 2>&1) gives a content-free WARN on failure — teeing pip stderr to the deploy log would help diagnose the exact "no cp310 wheel" case it anticipates.
  • The npm-ci gate lists the three workspace package.jsons but not the root package.json (theoretical — real dep changes always touch the lockfile).

CI should re-run on the push. Full original review is in the earlier comment.

Mark Schutera and others added 2 commits June 29, 2026 22:26
deploy.sh rebuilt/reloaded but never installed new deps, so a dep-adding release failed its build and auto-rolled-back. npm: it gated npm ci on backend/package-lock.json, but this is a workspaces monorepo with one ROOT lockfile, so new backend/homepage deps were missed (broke on rotating-file-stream, #178). Now a single root 'npm ci' gated on the root lockfile / any workspace package.json, before the builds; dropped the wrong per-prefix ci. pip: never ran. Now 'python3 -m pip install -r <svc>/requirements.txt' for duckdb-service/image-service when their requirements changed, into the system python3 pm2 uses; non-fatal, the post-reload health check is the real gate (graceful degradation on a missing optional dep).

Also rewrites the stale production-runbook 'Updates & Redeployment' section to match reality (main branch, root npm ci, pip into system python3, all 4 pm2 apps, health checks, Python 3.10 / onnxruntime 1.23.2 note).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…R-029)

The PR's runbook rewrite asserted onnxruntime is "pinned to 1.23.2" under a
"Python 3.10 ceiling". After folding in main (#195 / ADR-029), the real
requirements float numpy>=2.0.0 / onnxruntime>=1.23.2 / pydantic>=2.12.5 for a
3.10-3.14 matrix — a floor, not a pin. Rewrote the step-2 comment and the
"Python 3.10 floor" paragraph to match, citing ADR-029, and noted that a pip
upgrade is not reverted on rollback.

Added a ch11 lessons-learned entry for the workspace-lockfile npm-ci miss this
PR corrects (per CLAUDE.md's mandatory doc gate). Addresses the senior-reviewer
P0/P2 findings on the PR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@cofade cofade force-pushed the fix/deploy-sh-install-deps branch from 1951ddc to 135dc1f Compare June 29, 2026 20:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants