Skip to content

fix: helm chart fail-closed gates, latent DATABASE_URL bug, deploy-path docs#11

Merged
jeffyaw merged 1 commit intomasterfrom
fix/full-pass-cleanup
Apr 25, 2026
Merged

fix: helm chart fail-closed gates, latent DATABASE_URL bug, deploy-path docs#11
jeffyaw merged 1 commit intomasterfrom
fix/full-pass-cleanup

Conversation

@jeffyaw
Copy link
Copy Markdown
Member

@jeffyaw jeffyaw commented Apr 25, 2026

Summary

A /full-pass audit surfaced a mix of latent bugs and silent footguns across the Helm chart, the Fly + Cloud Run paths, and the operator-facing docs. This PR fixes them in one cohesive batch.

Worth flagging loudly

  • Latent DATABASE_URL bug. _helpers.tpl databaseUrl was rendering "pg.example.comtestpostgresql://..." for every external-database deploy because required returns its asserted value, which got concatenated before the URL string. Captured into $_ := so it works as a pure assertion.
  • Breaking-ish chart change. redis.authToken is now required at render time when redis.enabled=true. The previous chart rendered Valkey with no --requirepass and the app pod with no REDIS_AUTH_TOKEN if the value was left at its default empty string -- silently fail-open, asymmetric to the Compose path's ${REDIS_AUTH_TOKEN:?...} guard. Justifies a v0.2.1 cut.

Helm

  • postgres.enabled=true -> pgvector/pgvector:pg18 (dev-mode toggle no longer crash-loops on CREATE EXTENSION vector).
  • redis.authToken required when in-cluster Valkey is enabled.
  • DATABASE_URL render bug fixed (see above).
  • Probes use valkey-cli with --no-auth-warning to match Compose.
  • NOTES.txt redundant or; app-configmap redundant default; both cleaned up.

Cloud Run

  • Step 7 example deploy command had REDIS_TLS=true,DATABASE_SSL=require which contradicted the basic-tier Memorystore + Cloud SQL Auth Proxy unix-socket provisioning the same doc walks the operator through. Now REDIS_TLS=false,DATABASE_SSL=disable with explainer notes for when to flip them back.

Fly

  • fly.toml no longer points at the unfetchable private GHCR image; placeholder + comment direct the operator to mirror via bootstrap.sh or set --image on deploy.
  • fly postgres attach | grep -v ... || true was swallowing real failures under set -euo pipefail (auth, network, 5xx) and leaving DATABASE_URL unset. Now captures exit code and only forgives already attached.
  • whoami slugified so corp-SSO usernames don't generate invalid Fly app names.
  • fly certs add now runs for non-default domains.

Docs

  • README: stale --version 0.1.0 -> 0.2.0 to match Chart.yaml.
  • docs/upgrade.md: removed the MCP_HOSTING_IMAGE_TAG=v0.8.0 rollback recipe (no interpolation in docker-compose.yml, the env var did nothing); replaced with a docker-compose.override.yml block.
  • docs/backup-restore.md: filename references mcp-hosting-backup-* -> mcp-hosting-* to match scripts/backup.sh.

CI

  • deploy-test.yml: skip Python setup on the helm matrix entry (kubeconform is a Go binary).
  • test.sh: pass --set redis.authToken=... to helm lint and helm template so the new required gate doesn't break CI.

Test plan

  • helm lint --strict passes.
  • helm template renders cleanly with full value set (646 lines, 0 errors).
  • helm template fails closed when redis.authToken is omitted.
  • helm template fails closed when externalDatabase.password is omitted.
  • DATABASE_URL renders as a clean postgresql://user:pass@host:port/db?sslmode=require.
  • In-cluster postgres path renders pgvector/pgvector:pg18.
  • bash -n passes on all five shell scripts.
  • fly.toml parses + has required keys.
  • cloudrun/service.yaml parses + has correct Knative shape.
  • CI's helm + fly + cloudrun deploy-test matrix passes (verified by this PR's CI run).
  • CI's Validate Templates (helm-lint, yaml-lint, docker-compose-validate, shellcheck) passes (verified by this PR's CI run).

🤖 Generated with Claude Code

…th docs

helm:
- postgres.enabled=true now uses pgvector/pgvector:pg18 so the dev-mode
  toggle no longer crash-loops on the CREATE EXTENSION vector migration
- redis.authToken is required when redis.enabled=true (mirrors the
  Compose --requirepass guard so Valkey can't boot unauthenticated)
- _helpers databaseUrl: capture `required` returns into $_, otherwise
  externalDatabase deploys rendered "pg.example.comtestpostgresql://..."
  -- the assertion's return value was prepending into the URL string
- redis probes use valkey-cli with --no-auth-warning to match Compose
- NOTES.txt: drop redundant `or` wrap; app-configmap: drop redundant
  default since values.yaml already defaults sslMode

cloud platforms:
- cloudrun README step 7: REDIS_TLS=false / DATABASE_SSL=disable to
  match basic-tier Memorystore + Cloud SQL Auth Proxy unix socket
- fly.toml: replace ghcr.io/yawlabs/mcp-hosting:latest (unfetchable
  by Fly) with a registry.fly.io placeholder + comment
- fly bootstrap: stop swallowing real `fly postgres attach` failures
  (capture exit, only forgive "already attached"); slugify whoami;
  run `fly certs add` for non-default domains

docs:
- README: bump 0.1.0 -> 0.2.0 to match Chart.yaml
- upgrade.md: replace MCP_HOSTING_IMAGE_TAG rollback recipe with
  docker-compose.override.yml block (the env var did nothing)
- backup-restore.md: filenames mcp-hosting-backup-* -> mcp-hosting-*
  to match what scripts/backup.sh actually writes

ci:
- deploy-test: skip Python setup on the helm matrix entry
- test.sh: pass --set redis.authToken=... to helm lint/template so the
  new required gate doesn't break CI

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@jeffyaw jeffyaw merged commit a643007 into master Apr 25, 2026
7 checks passed
@jeffyaw jeffyaw deleted the fix/full-pass-cleanup branch April 25, 2026 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant