Rebrand quickstack → bento: full installer platform, unattended mode, handoff report#6

Open
felipefontoura wants to merge 42 commits into main from feat/bento-platform

@felipefontoura
Owner

Why

This PR turns quickstack (a curated collection of Docker stack deploys) into bento: a guided installer that takes a fresh Ubuntu/Debian VPS to "production-ready apps running" with one curl | bash command. The repository itself is renamed in the same move (gh repo rename bento has already been run).

The shape change:

  • before: clone the repo, edit each *.yml, copy-paste the YAML into the Portainer UI, paste env vars in the form, click deploy.
  • after: paste a single one-liner into a fresh VPS, answer 3 questions (domain / admin email / public IP), get a hardened host, Traefik + Portainer with TLS, the apps you picked, and an HTML handoff report — all automated.

What's in this PR

Installer platform

  • boot.sh — the curl|bash entry point. Pre-flight checks render as ⏵ → ✓ progressive lines from the first byte (Nielsen H1: visibility of system status). Apt output is captured to /tmp/bento-deps.log so the terminal stays scannable. NEEDRESTART_MODE=a is set on every apt call to silence the Ubuntu 24.04+ whiptail prompt.
  • install.sh — main menu driven by gum. Three steps (Harden → Infra → Apps) with status indicators (⏵ pending, ✓ done, 🔒 locked, ✗ failed) read from ~/.config/bento/state.json.
  • lib/ (ten modules; eight carry idempotent source guards so they survive being sourced multiple times by install.sh + child libs):
    • banner.sh + ui.sh — themed banner + gum wrappers with a consistent salmon/wasabi/rice palette
    • state.sh — schema-versioned JSON state at ~/.config/bento/state.json, migrate-on-read
    • deps.sh — apt validation + gum/jq/envsubst install with binary fallback
    • hardening.sh — adapted from ubinkaze
    • infra.sh — swarm init + overlay network + Traefik + Portainer + admin init
    • portainer.sh — REST client (auth + stacks CRUD + git redeploy)
    • stacks.sh — manifest parser + env resolution + deploy via API
    • install-helpers.sh — helpers for per-stack install.sh (postgres readiness wait, ensure_database, container discovery)
    • report.sh — handoff HTML generator

Per-stack convention

Each stack lives at stacks/<category>/<key>/ with:

  • compose.yml — parametrized Docker Compose (${VAR} placeholders for hostnames, secrets, DB creds)
  • manifest.json — env spec; declares default, generate, from_state, prompt, required, hide per variable
  • install.sh (optional) — runs after docker stack deploy to seed DBs, run migrations, etc.

Discovery in lib/stacks.sh globs stacks/*/*/manifest.json — adding a stack means adding a directory.
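
The glob-based discovery can be sketched in a few lines of bash. This is a minimal illustration, not the code in lib/stacks.sh: the function name `discover_stacks` is hypothetical, and the root directory is passed in as an argument so the sketch can run against any tree.

```bash
# Sketch: list stacks by globbing stacks/<category>/<key>/manifest.json.
# discover_stacks is a hypothetical name. The body runs in a subshell
# (parentheses instead of braces) so nullglob does not leak to the caller.
discover_stacks() (
  root="$1"
  shopt -s nullglob                      # an empty tree yields no iterations
  for manifest in "$root"/stacks/*/*/manifest.json; do
    dir="${manifest%/manifest.json}"     # drop the file name
    printf '%s\n' "${dir#"$root"/stacks/}"   # print "<category>/<key>"
  done
)
```

Directories without a manifest.json are skipped, which matches the "adding a stack means adding a directory" rule only once the manifest is in place.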

Unattended mode (BENTO_UNATTENDED=1)

End-to-end zero-prompt run for "paste into Hetzner Cloud config or SSH and walk away":

    BENTO_UNATTENDED=1 \
    BENTO_BASE_DOMAIN=mydomain.com \
    BENTO_ADMIN_EMAIL=admin@mydomain.com \
    BENTO_APPS=n8n,plunk,evolution-api \
    bash install.sh

Reads inputs from env vars, skips every ui_confirm, polls DNS for 120s instead of asking for confirmation, walks the depends_on graph depth-first, and generates the handoff at the end. Pre-populates "already deployed" from state.stacks.*.stack_id so running the same command twice only deploys new entries.

After the post-hardening reboot, a one-shot bento-resume.service systemd unit picks the install back up from the same env file and runs Step 2 → 3 → report.

Bento ↔ Portainer ownership

This PR codifies the contract:

| Concern | bento | Portainer |
|---|---|---|
| Declarative state (what should run, with which envs) | owner | viewer |
| First deploy + git-backed updates | owner (via API) | executor |
| Day-to-day ops (logs, restart, scale, exec) | redirect | owner |
| Stacks created outside bento (no BENTO_MANAGED label) | ignored | full owner |

Every bento-deployed stack carries BENTO_MANAGED=true + its source commit. Same model as Helm + kubectl.

Handoff HTML report

lib/report.sh writes a self-contained HTML file with the VPS overview, Traefik + Portainer credentials, and every deployed stack's URL + resolved envs. Secrets are masked behind click-to-reveal; a print stylesheet auto-reveals everything so PDF export is a complete record. Lives at ~/.local/share/bento/reports/handoff-<ts>.html (chmod 600). Auto-generated at end of Step 3 + on demand via menu.

Docs

  • README.md — rebranded with a themed bento ASCII banner PNG (.assets/bento-banner.png), Best-README-Template structure, TL;DR command at the very top, recommended VPS (Hetzner primary + Hostinger BR secondary), DNS guide with Cloudflare deep link, Hetzner Cloud Firewall ruleset, ownership table.
  • CLAUDE.md — maintainer guide loaded by Claude Code. Architecture in 60 seconds, manifest schema, env resolution order, code style, recipe for adding new app stacks with n8n called out as the gold-standard quality bar.
  • .claude/skills/add-app-stack/SKILL.md — playbook for AI-assisted stack additions; requires fetching upstream's docker-compose before writing anything.
  • LICENSE (MIT), CONTRIBUTING.md ("Don't be an asshole.").
  • .github/ISSUE_TEMPLATE/{bug,feature,stack-request,config}.yml + PULL_REQUEST_TEMPLATE.md.

Stack changes

  • Reorganized 11 stacks into per-directory layout.
  • Parametrized all hostnames and secrets.
  • Postgres postgres-init service removed — each app's install.sh creates its own DB via ensure_database.
  • Dropped Portainer agent (single-node Swarm doesn't need it; the agent's tasks.<self-service> lookup fails with one peer).
  • Portainer publishes 9000 on host loopback so bento's external wait_ready + admin init can reach it without going through Traefik.
  • Internal services (postgres 5432, redis 6379, rabbitmq 5672) no longer publish to the host — apps reach them via overlay DNS.
  • Redis DB indexes allocated cleanly: /1 n8n, /2 plunk, /3 chatwoot, /4 evolution-api. plunk and evolution were both writing to /2 before.
  • Default hostnames moved to single-label form so a wildcard *.<base> covers everything: typebot-builder (was builder.typebot), n8n-mcp (was mcp.n8n).
  • Secret generation switched from openssl rand -base64 N | tr ... | head -c X (variable length, broke typebot's 32-char validation) to openssl rand -hex N (exact length, safe characters).

Test results — Hetzner CX22 (4 GB, Ubuntu 26.04, base25.so)

Unattended end-to-end run with BENTO_APPS=n8n,plunk,n8n-mcp,evolution-api,rabbitmq,cli-proxy-api,typebot:

| Stack | Status | URL |
|---|---|---|
| Traefik | ✓ 1/1 | (internal) |
| Portainer | ✓ 1/1 | https://portainer.base25.so → 200 |
| Postgres | ✓ 1/1 | overlay only |
| Redis | ✓ 1/1 | overlay only |
| n8n (editor + worker + webhook) | ✓ 3/3 | https://n8n.base25.so → 200 |
| Plunk | ✓ 1/1 | https://plunk.base25.so → 200 |
| n8n-mcp | ✓ 1/1 | https://n8n-mcp.base25.so → 200 |
| Evolution API | ✓ 1/1 | https://evolution.base25.so → 200 |
| RabbitMQ | ✓ 1/1 | https://rabbitmq.base25.so → 200 |
| CLI Proxy API | ✓ 1/1 | overlay only (no Traefik route by design) |
| Typebot | ✗ 0/1 (OOM) | needs CX32+; see Known limitations |

8/9 application services healthy after a single end-to-end run on a 4 GB box.

Handoff HTML generated at ~/.local/share/bento/reports/handoff-*.html. The test VPS will be torn down; nothing in the credentials is reusable.

Known limitations

  • Typebot crashes with V8 SIGABRT on CX22 — even with memory: 512M per service. Recommend CX32 (8 GB) when typebot is in the mix.
  • Chatwoot wasn't included in the test run; its 2 Rails services (web + sidekiq) typically need 1.5+ GB each. Skipping until we validate on CX32.
  • Paperclip depends on a custom image built from a local Dockerfile (paperclip-custom). Swarm's docker stack deploy does not build — the image needs to land in a registry first (or be built on the node before deploy). Not part of the unattended path yet.
  • Cloudflare proxy + HTTP-01: with the orange-cloud proxy on, Let's Encrypt HTTP-01 fails because Cloudflare intercepts the challenge. The test run worked around this by switching the wildcard record to "DNS only". Documented in the README.
  • Hardening sentinel race: a long hardening run can finish without writing /var/lib/bento/reboot-required if the systemctl phase is interrupted. The unattended Step 1 has a fallback that re-marks done when docker is up + a sentinel is present from a previous run.

Files

  • 37 commits, ~6.5k LOC added across boot.sh, install.sh, lib/, stacks/ reorg, README.md rewrite, CLAUDE.md, .claude/, .github/, .assets/.

Test plan

  • Spin a fresh CX32 with the latest Ubuntu LTS + a Cloudflare-managed domain with *.<domain> set to DNS only.
  • Paste BENTO_REF=stable bash <(curl -sSL https://raw.githubusercontent.com/felipefontoura/bento/stable/boot.sh) once main has caught up. For now use BENTO_REF=feat/bento-platform + the feat URL.
  • Answer the 3 bootstrap prompts (or set BENTO_UNATTENDED=1 + the BENTO_* env vars).
  • After the post-hardening reboot, re-paste; bento detects Step 1 is done and proceeds.
  • Sign into Portainer at the URL printed by Step 2.
  • Open each app's URL — verify the login screen renders cleanly.
  • Read the handoff HTML report; every URL, username, and (revealed) password should match what's on disk.

…ized envs

Each stack moves from a flat <name>.yml file to <category>/<name>/ with
compose.yml, manifest.json, and optional install.sh. Hostnames, secrets,
and DB connections all use ${VAR} placeholders so the bento installer can
substitute them via Portainer's API at deploy time.

Postgres no longer creates per-app databases automatically — each app's
install.sh now calls ensure_database via lib/install-helpers.sh. Removes
the postgres-init service. Chatwoot's rails db:chatwoot_prepare migration
moves into chatwoot/install.sh.

paperclip.Dockerfile moves into paperclip/ and its OCI source label is
updated to felipefontoura/bento.

boot.sh is the curl|bash entry point: validates apt-get exists, installs
git, clones the repo into ~/.local/share/bento, and sources install.sh.

install.sh is the gum-driven main menu. After a one-time bootstrap that
captures BASE_DOMAIN + ADMIN_EMAIL + ADVERTISE_ADDR, it walks the user
through Step 1 (Harden), Step 2 (Infra), and Step 3 (Apps), with extra
Settings, Status, and Update menu items. Status indicators come from
~/.config/bento/state.json.

lib/ holds the stateless modules:
- banner.sh + ui.sh — themed bento banner and gum wrappers.
- state.sh — schema-versioned JSON state with migrate-on-read.
- deps.sh — apt validation + gum/jq/envsubst install with binary fallback.
- hardening.sh — adapted from felipefontoura/ubinkaze.
- infra.sh — swarm init + network + Traefik + Portainer deploy + admin init.
- portainer.sh — REST API client (auth, stacks CRUD, redeploy).
- stacks.sh — manifest discovery, env resolution, deploy-via-API, hooks.
- install-helpers.sh — helpers (ensure_database, wait_for_service) for
  per-stack install.sh scripts.

CLAUDE.md is the canonical maintainer guide loaded automatically by Claude
Code (and equally usable by humans): explains the bento mental model, the
Bento ↔ Portainer ownership boundary, the per-stack directory convention,
manifest schema, env resolution order, code style, and a step-by-step
"how to add a new application stack" recipe with n8n called out as the
gold-standard quality bar.

.claude/skills/add-app-stack/SKILL.md operationalizes the recipe for AI
assistants: requires fetching the upstream project's docker-compose.yml,
.env.example, and latest release tag via gh api BEFORE writing anything,
so envs and image tags come from the project's own truth instead of model
training data.

…EADME structure

Wraps the README in the Best-README-Template pattern: centered transparent
PNG logo, tagline, quick badges (License/Bash/Docker/distro), table of
contents, and a back-to-top link. All existing content stays — only the
presentation changes.

Adds a prominent "Get a VPS (recommended: Hetzner)" section right before
Quickstart. The Hetzner link is an affiliate referral with a clear,
upfront disclosure that the commission funds new stacks and that the
installer works identically on any apt-based VPS for those who'd rather
skip it.

Generalizes Ubuntu version references throughout (README + lib/hardening.sh
comment): "latest Ubuntu LTS" instead of pinning to a specific dot-release,
so the docs stay correct as new LTSs land.

…RL flow

lib/cloudflare.sh wraps the Cloudflare API for the records bento needs
(token verify, zone lookup, idempotent ensure-A-record for *. and root).
install.sh's bootstrap now asks the user to set up Cloudflare DNS and,
when they accept, opens a Cloudflare template URL that pre-fills the
token creation form with DNS:Edit already selected — so the user lands on
the review screen instead of hunting through menus, clicks Create, and
pastes the token back.

Step 2 (lib/infra.sh) now runs an infra_ensure_dns check before deploying
Traefik. With a Cloudflare token, the wildcard and root A records are
synced via API. Without one, bento prints the records the user must
create manually and won't proceed until they confirm DNS is in place,
since Let's Encrypt would otherwise fail at deploy time.

This is the closest equivalent to an "OAuth consent flow" that Cloudflare
exposes for third-party tools today; see
https://developers.cloudflare.com/fundamentals/api/how-to/account-owned-token-template/

Switches `ufw allow ssh` to `ufw limit ssh`, which drops brute-force
attempts at 6 connections/30s at the OS firewall — complementing fail2ban
(longer window, more aggressive). Adds `ufw allow proto icmp` so `ping`
keeps working for connectivity debugging from outside; the kernel sysctl
net.ipv4.icmp_echo_ignore_broadcasts already blocks broadcast pings.

Pairs with the Hetzner Cloud Firewall documentation in the README — these
are layered defenses: Hetzner blocks at the edge, UFW + fail2ban handle
anything that gets through.

README gets two new sections after "Get a VPS":

1. DNS (recommended: Cloudflare) — explains why we recommend it (free
   tier, robust API), explicitly notes there is NO affiliate program for
   individuals so this is a pure technical recommendation, walks through
   the one-click token template flow as Option A and a manual DNS table
   as Option B.

2. Network firewall (Hetzner Cloud Firewall) — explains the layered model
   (Hetzner edge + UFW + fail2ban), documents the recommended Hetzner
   panel ruleset, and lists the UFW rules bento already applies.

CLAUDE.md updates the architecture diagram and the lib/ tree to mention
lib/cloudflare.sh.

Per follow-up: the Cloudflare token flow (even with the template URL)
still pushed friction onto a beginner — sign in to Cloudflare, click
through, copy a value, paste it. The manual path is simpler and works
for every DNS host, not just Cloudflare.

Bento now only prints the wildcard + root A records the user must create
in their DNS provider, waits for explicit confirmation, then proceeds.
No tokens, no API calls, no stored credentials.

Removes:
- lib/cloudflare.sh
- install.sh's bootstrap_prompt_cloudflare + the Cloudflare source line
- The Cloudflare branch in lib/infra.sh's infra_ensure_dns
- README's "Option A / Option B" split — collapses into a single
  records-to-create table
- CLAUDE.md references to lib/cloudflare.sh

Cloudflare is still the recommended DNS host on technical merits; the
removal is only about how bento talks to it.

For users on Cloudflare, jumping to the DNS records page still takes 3-4
clicks of navigation in the dashboard. Cloudflare exposes a generic deep
link pattern (`dash.cloudflare.com/?to=/:account/:zone/dns`) that routes
the user through account + zone pickers and lands them on the DNS records
page for the selected zone.

It does not pre-fill the record form (Cloudflare reserves that for
official partners like Microsoft 365 via Domain Connect), so the wildcard
+ root A records still come from the table bento prints. The deep link
just trims navigation.

lib/infra.sh's Step 2 DNS prompt now shows the link alongside the records
table. README adds the same link with a short note explaining why it
isn't a full one-click flow.

After Step 3 finishes, bento auto-generates a self-contained HTML report
at ~/.local/share/bento/reports/handoff-<ts>.html (chmod 600) that the
operator can hand to the client. The report covers:

- VPS overview (public IP, domain, admin email, SSH hint)
- Traefik (ACME email, exposed ports)
- Portainer (URL, admin user, masked password)
- One card per bento-deployed application stack with URL + every resolved
  env, with secrets masked behind a click-to-reveal toggle (read from
  each manifest's `hide: true` flag)

The HTML inlines its own CSS + a tiny JS toggle; no external assets, so
the file works offline and prints cleanly. A print stylesheet
auto-reveals all secrets so the PDF is a complete record.

A new "Report — handoff HTML" item in the main menu lets the operator
regenerate the file at any time, e.g. after rotating credentials.

Adds a "Handoff report (HTML)" section to README explaining what the file
contains, where it lands, how to move it off the VPS with scp, and the
security caveat that it carries live credentials and should be delivered
over an encrypted channel.

CLAUDE.md adds lib/report.sh to the architecture diagram and the lib/
tree so contributors see the module at a glance.

Hetzner remains the primary recommendation (we validate every release
against it). Hostinger is added as a secondary option specifically for
Brazilian users:

- BRL billing avoids FX volatility for BR-based operators.
- Lower latency to Brazilian end users than Hetzner's European DCs.
- Same affiliate disclosure pattern as the Hetzner section — explicit
  that the link is referral, that it helps fund bento, and that signing
  up directly at hostinger.com is a fine alternative.

Affiliate link path: hostinger.com/br/smartdev

Rewrites the README to compress everything that was repeating itself
(notably the affiliate disclosures, the DNS automation explanation, and
the Hetzner firewall ruleset) behind <details> collapsibles so the
top-level page is scannable. Devs skim tables; beginners click to
expand. The full information is still there, just one click away.

Specific changes:

- Tagline below the logo is now a single em line; the banner PNG no
  longer carries text (just the bento tray + BENTO letters), so the
  tagline can be edited without regenerating the image.
- Affiliate sections for Hetzner and Hostinger collapse from ~30 lines
  to a single row each in a partner table, plus one shared <details>
  with the disclosure that applies to both.
- DNS section: drops the "why no token" rationale, keeps the records
  table and the Cloudflare deep link.
- Firewall section: 2-row summary table; full Hetzner ruleset behind
  <details>.
- New "What is bento" pitch leads with one bold sentence summarising
  the value proposition, then immediately shows the curl|bash command.
- Stacks listed as one compact table instead of three bulleted lists.
- Added GitHub Stars + Last commit badges for "alive" signals.

No information removed, only re-tiered. CLAUDE.md remains the deep
maintainer reference; the README is now the marketing surface.

Puts the curl|bash command on the very first content line below the
hero, so the experienced reader can copy and bounce without scrolling.
The deeper "What is bento" pitch and the explanatory sections follow
underneath for anyone evaluating bento for the first time.

The Table of contents is now inside a <details> collapsed by default so
it doesn't interrupt the TL;DR → context flow but stays one click away
for navigators.

LICENSE is the standard MIT text; the README and image labels have
been declaring MIT since the rebrand; the file itself was missing.
Copyright held by Felipe Fontoura.

CONTRIBUTING.md is one line: "Don't be an asshole." That's the rule.
GitHub auto-links to it from new issues and PRs so the bar is set
the moment someone shows up.

…AUDE.md

Adds standard GitHub templates:

- .github/ISSUE_TEMPLATE/bug.yml — modern form with required fields for
  "which step broke", distro/version, bento commit hash, and a logs box.
- .github/ISSUE_TEMPLATE/feature.yml — problem / proposal / alternatives.
- .github/ISSUE_TEMPLATE/stack-request.yml — upstream repo, dependencies
  checkbox, pitch for why bento should adopt it.
- .github/ISSUE_TEMPLATE/config.yml — blank issues off, questions routed
  to Discussions.
- .github/PULL_REQUEST_TEMPLATE.md — description / why / type / test plan
  / conventions checklist plus a tongue-in-cheek "not being an asshole"
  line that points to CONTRIBUTING.md.

CLAUDE.md gets a new "Repo meta" section that links LICENSE and
CONTRIBUTING.md via the `@` syntax so Claude Code auto-loads them as
context, and enumerates the four templates so the maintainer guide
stays in sync with what's in .github/.

…rything

Two manifests were defaulting to two-level subdomains, which a plain
`*.${BASE_DOMAIN}` wildcard does NOT cover (wildcards match exactly one
label per RFC 4592):

- typebot: TYPEBOT_BUILDER_HOST `builder.typebot.${BASE_DOMAIN}`
  → `typebot-builder.${BASE_DOMAIN}`
- n8n-mcp: N8N_MCP_HOST `mcp.n8n.${BASE_DOMAIN}`
  → `n8n-mcp.${BASE_DOMAIN}`

Operators who prefer the nested style can still override at the
prompt, but the default no longer requires extra `*.typebot.<domain>`
or `*.n8n.<domain>` wildcards to make Let's Encrypt succeed.
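
The single-label rule described above can be checked mechanically. The following is a small sketch with a hypothetical function name (`covered_by_wildcard`); it only tests whether a hostname sits exactly one label below the base domain and makes no DNS queries.

```bash
# Sketch: is HOST exactly one label below BASE, i.e. covered by a plain
# *.BASE wildcard in the sense used above? Function name is hypothetical.
covered_by_wildcard() {
  local host="$1" base="$2" label
  [[ "$host" == *".$base" ]] || return 1   # must end in ".<base>"
  label="${host%".$base"}"                  # the part left of the base
  [[ -n "$label" && "$label" != *.* ]]      # non-empty and no further dots
}

# Usage:
# covered_by_wildcard "typebot-builder.example.com" "example.com"   # succeeds
# covered_by_wildcard "builder.typebot.example.com" "example.com"   # fails
```
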
Bento never deploys anything at the bare `${BASE_DOMAIN}` — only at
subdomains under it. Requiring an A record at the root was confusing for
operators who already have a website there: following the README would
shadow their existing setup.

README and lib/infra.sh's Step 2 DNS check now ask only for the
wildcard, with an explicit note that the root domain is left untouched
so an existing site keeps working.

The old check forced the user to create a sudo user before pasting the
curl|bash command. On a brand-new Hetzner/Ubuntu VPS there's only root,
which made bento's quickstart involve `adduser felipe / usermod -aG sudo
/ rsync .ssh / exit / ssh again` before even starting — exactly the kind
of friction bento exists to remove.

In practice bento needs root throughout (kernel sysctl, package install,
docker swarm init), so blocking root was paranoia. The new check just
requires *some* path to privilege: root, or a user with sudo available.

README's Requirements section is reworded to match.
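
A relaxed privilege check along these lines might look like the sketch below. The function name is hypothetical, and treating "sudo is on PATH" as "sudo is available" is a simplifying assumption; the real check may verify more.

```bash
# Sketch: accept root, or any user that has a sudo binary on PATH.
# The uid is an optional argument (defaulting to the current user) so the
# function can be exercised without switching users.
have_privilege_path() {
  local uid="${1:-$(id -u)}"
  [[ "$uid" -eq 0 ]] && return 0          # root needs nothing else
  command -v sudo >/dev/null 2>&1         # otherwise sudo must exist
}
```
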
…start

User-visible problem: pasting the curl|bash command on a fresh VPS
produced ~20 seconds of apt-get output with no banner, no progress, and
no confirmation that the right URL had landed. On Ubuntu 24.04+ the run
also hung on needrestart's whiptail "which services to restart?" prompt
because we never set NEEDRESTART_MODE.

Applies Nielsen's heuristics to the bootstrap surface:

- Visibility (H1): boot.sh prints a tiny "▸ bento bootstrap  ref: X"
  prebanner immediately, before any apt-get call. Each pre-flight
  check (distro, privileges, network, disk, git, clone) shows a
  ⏵ line that flips to ✓ on success — the user sees forward motion
  from the first second.
- Error prevention (H5): network and disk are validated before any
  install attempt.
- Recovery (H9): every failure points at /tmp/bento-deps.log instead
  of swallowing apt's stderr.
- Aesthetic / minimalism (H8): apt-get output is redirected to the log;
  the terminal carries one step line per operation, overwritten with
  ✓ when done.
- Consistency (H4): boot.sh + lib/deps.sh share the same salmon/wasabi
  ANSI palette that the gum banner uses later, so the visual reads as
  one continuous experience.

Functional fix beneath the UX: every apt-get invocation in boot.sh,
lib/deps.sh, and lib/hardening.sh now sets NEEDRESTART_MODE=a so
needrestart auto-restarts services without prompting, and
DEBIAN_FRONTEND=noninteractive prevents any other dialog from blocking
the run.

lib/deps.sh's apt output is captured to /tmp/bento-deps.log; the
terminal shows "Installing core packages…" → "Core packages ready"
and "Installing gum…" → "gum installed" instead of streamed dpkg
output.
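
The "one status line per operation, full output in the log" behavior can be sketched as a small wrapper. `run_step` and `BENTO_LOG` are hypothetical names for illustration; only the log path and the two environment variables come from the commit message.

```bash
# Sketch: run a command with its output appended to a log file while the
# terminal shows a single status line. run_step and BENTO_LOG are hypothetical.
BENTO_LOG="${BENTO_LOG:-/tmp/bento-deps.log}"

run_step() {
  local label="$1"; shift
  printf '⏵ %s' "$label"
  if "$@" >>"$BENTO_LOG" 2>&1; then
    printf '\r✓ %s\n' "$label"                      # overwrite the pending line
  else
    printf '\r✗ %s (see %s)\n' "$label" "$BENTO_LOG"
    return 1
  fi
}

# Usage (needs root, so left commented out):
# run_step "Installing core packages" \
#   env NEEDRESTART_MODE=a DEBIAN_FRONTEND=noninteractive apt-get install -y git jq
```

Passing the variables through `env` keeps them scoped to the one apt-get call instead of exporting them for the whole shell.
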
Adds an explicit "English only" callout to CLAUDE.md and the
add-app-stack skill so every contributor (human or AI) knows the rule
without having to infer it from existing files. Fixes the one PT-BR
leak that was hiding in lib/ui.sh as a comment.

Every user-facing string — prompts, log lines, error output, READMEs,
in-code comments, commit messages going forward — must be in English.
Bento has users outside Brazil; mixed-language strings break docs
tooling and confuse contributors.

Real symptom: pasting the curl|bash on a fresh VPS got past the
prebanner and the core-deps install, then crashed with

  /root/.local/share/bento/lib/ui.sh: line 10: BENTO_COLOR_SALMON: readonly variable

Cause: install.sh sources lib/ui.sh, then later sources lib/banner.sh,
which independently sources lib/ui.sh again. The second pass tried to
re-declare the `readonly` palette constants and aborted under `set -e`
inherited from boot.sh. The same pattern affects every lib/ module that
has `readonly` globals — they were just lucky not to trip in earlier
runs because deps.sh failed first on Ubuntu 26.04's needrestart prompt.

Fix: add an idempotent source guard at the top of every lib/*.sh:

    [[ -n "${_BENTO_<MOD>_LOADED:-}" ]] && return 0
    _BENTO_<MOD>_LOADED=1

This is the standard bash library pattern (akin to header include
guards in C). Second `source` of any lib short-circuits at the guard,
so `readonly` runs exactly once per shell process.

Applied to: ui.sh, state.sh, deps.sh, banner.sh, portainer.sh,
infra.sh, stacks.sh, report.sh. lib/install-helpers.sh is excluded
because it runs in a fresh process per per-stack install.sh anyway.
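
To see the guard in action, the sketch below writes a throwaway library to a temp file and sources it twice. The file and the names `_BENTO_DEMO_LOADED` and `DEMO_COLOR` are made up for the demonstration; only the two guard lines mirror the commit.

```bash
# Sketch: a throwaway library with a readonly constant and a source guard.
# _BENTO_DEMO_LOADED and DEMO_COLOR are made-up names.
lib="$(mktemp)"
cat >"$lib" <<'EOF'
[[ -n "${_BENTO_DEMO_LOADED:-}" ]] && return 0
_BENTO_DEMO_LOADED=1
readonly DEMO_COLOR="salmon"
EOF

source "$lib"   # first pass: declares the readonly constant
source "$lib"   # second pass: returns at the guard, so readonly is not re-run
echo "DEMO_COLOR=$DEMO_COLOR"
```

Removing the first two lines of the temp file would make the second `source` fail with the "readonly variable" error quoted above.
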
install.sh sources both lib/infra.sh and lib/stacks.sh. Both files were
independently declaring `readonly BENTO_REPO_ROOT="$(...pwd)"`, so the
second source aborted with "readonly variable" — the source guards from
the previous commit prevented each *file* from being sourced twice but
didn't stop two *files* from claiming the same readonly name.

install.sh already exports BENTO_REPO_ROOT at the top. The libs now
fall back to that with `: "${BENTO_REPO_ROOT:=$(...pwd)}"`, which
assigns only when unset — safe on every re-source and still works when
a single lib is sourced standalone (smoke tests).

Same treatment for BENTO_REPO_URL, BENTO_REPO_REF, and
BENTO_INFRA_STACK_NAME — single-owner constants stop fighting each
other.
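
The assign-only-when-unset expansion is easy to verify in isolation. In this sketch `DEMO_REPO_ROOT` is a stand-in for BENTO_REPO_ROOT, and the two paths are placeholders.

```bash
# Sketch: assign a shared constant only when nothing has set it yet.
# DEMO_REPO_ROOT stands in for BENTO_REPO_ROOT; the paths are placeholders.
unset DEMO_REPO_ROOT
: "${DEMO_REPO_ROOT:=/first/owner}"    # unset, so this assigns
: "${DEMO_REPO_ROOT:=/second/owner}"   # already set, so this is a no-op
echo "$DEMO_REPO_ROOT"                 # prints /first/owner
```

The trade-off is that the value is no longer `readonly`, so nothing stops later code from reassigning it.
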
Running boot.sh through `ssh host 'bash <(curl ...)'` (no PTY) leaves
TERM unset, which breaks any apt postinst hook that calls tput, and
later breaks `clear` in banner_render. boot.sh now exports a sane default
early so the whole chain works whether the command is pasted
interactively or run from a script.
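
A fallback of this kind is a one-liner. The commit does not name the default value, so `xterm-256color` below is an assumption, as is the helper name `ensure_term`.

```bash
# Sketch: give TERM a fallback when the session has no PTY.
# "xterm-256color" is an assumed default; the commit does not state the value.
ensure_term() {
  export TERM="${TERM:-xterm-256color}"   # keeps any existing non-empty value
}
ensure_term
```
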
End-to-end zero-prompt run for the "paste into Hetzner Cloud config or
SSH and walk away" scenario. Inputs come from env vars instead of gum:

  BENTO_UNATTENDED=1          # toggle
  BENTO_BASE_DOMAIN=...       # required
  BENTO_ADMIN_EMAIL=...       # default admin@<domain>
  BENTO_ADVERTISE_ADDR=...    # default auto-detect via ifconfig.me
  BENTO_APPS=postgres,redis,n8n,plunk   # comma-separated app stack keys
  BENTO_ENV_<STACK>_<VAR>=...           # per-stack override for prompts

Flow:
- bootstrap_from_env writes state.json from BENTO_* envs.
- Step 1: auto-detects if hardening already ran (docker present + reboot
  sentinel), marks done and skips. Otherwise runs hardening, installs a
  one-shot bento-resume.service before sudo reboot. After reboot
  systemd re-runs install.sh with the same env file and picks up at
  Step 2.
- Step 2: infra_ensure_dns polls portainer.<domain> for 120s instead of
  asking ui_confirm.
- Step 3: depends_on graph is walked depth-first, each stack deployed
  via Portainer API. Per-prompt envs resolve in this order — existing
  state → from_state → generated → manifest default → BENTO_ENV_*
  override → empty (fail if required).
- Report HTML auto-generated at the end.
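
The resolution order in Step 3 can be read as "first non-empty source wins". That reading is an assumption about the commit's intent, and `resolve_env` with its positional layout is a hypothetical helper, not the function in lib/stacks.sh.

```bash
# Sketch: return the first non-empty candidate, in the order listed above.
# resolve_env is hypothetical; "first non-empty wins" is an assumed reading.
resolve_env() {
  local name="$1" required="$2"; shift 2
  local candidate
  # Remaining args, in order: existing state, from_state, generated,
  # manifest default, BENTO_ENV_* override (pass "" where a source is empty).
  for candidate in "$@"; do
    if [[ -n "$candidate" ]]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  if [[ "$required" == "true" ]]; then
    echo "missing required env: $name" >&2
    return 1
  fi
  printf '\n'   # optional variable resolves to empty
}

# Usage: resolve_env POSTGRES_PASSWORD true "" "value-from-postgres-stack" "" "" ""
```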

Side fixes that landed with this:
- `from_state` lookup actually searches every deployed stack's envs for
  the named key, instead of only `.envs.global.X` and `.bootstrap.X`
  (neither of which existed). This made n8n, plunk, etc. silently fail
  to pick up POSTGRES_PASSWORD from the postgres stack.
- Hardening invocation now passes NEEDRESTART_MODE=a +
  DEBIAN_FRONTEND=noninteractive even when triggered through bento
  (was only set in the user-side shell before).

…eady

Two bugs that would have blown up every Step 3 deploy that depended on
postgres (n8n, plunk, typebot, evolution-api, chatwoot):

1. postgres_container searched for "db_postgres" in container names, but
   Swarm names containers as <stack>_<service>.<task-id>. The postgres
   stack uses key "postgres" and service name "postgres", so containers
   are "postgres_postgres.xxxxx". The lookup found nothing.

2. ensure_database / psql_exec ran immediately after the postgres stack
   was created via Portainer's API, which returns before containers are
   actually up. The first psql call would race the boot of postgres.
   Adds _wait_for_postgres that polls `pg_isready` for up to 120s before
   running any SQL, so installers like plunk/n8n actually find the DB
   server alive.
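
The readiness wait boils down to polling a probe until it succeeds or a deadline passes. The sketch below is a generic version of that idea; `wait_until` is a hypothetical helper, not the `_wait_for_postgres` from the commit, and the one-second interval is an assumption.

```bash
# Sketch: retry a probe once per second until it succeeds or time runs out.
# wait_until is hypothetical; the 1s polling interval is an assumption.
wait_until() {
  local timeout="$1"; shift
  local deadline=$(( SECONDS + timeout ))
  until "$@" >/dev/null 2>&1; do
    (( SECONDS >= deadline )) && return 1   # give up after the deadline
    sleep 1
  done
}

# Usage ($container would follow Swarm's <stack>_<service>.<task-id> naming):
# wait_until 120 docker exec "$container" pg_isready -U postgres
```
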
… host

Two cascading failures on single-node Swarm:

1. Portainer agent tries to do cluster discovery via tasks.<service>
   DNS at startup, before any peer exists, and dies with
   "lookup tasks.infra_agent on 127.0.0.11:53: no such host" — the
   service then crash-loops, Portainer waits forever for an
   unreachable upstream. On single-node setups Portainer can talk to
   docker.sock directly, so the agent service is removed entirely and
   Portainer is reconfigured with -H unix:///var/run/docker.sock.

2. portainer/portainer-ce ships without a shell, so the
   CMD-SHELL wget healthcheck couldn't even be exec'd ("sh: executable
   file not found"). Swarm flipped the container to unhealthy and
   killed it, even though the HTTP server was already listening. The
   healthcheck is dropped; bento's portainer_wait_ready (lib/portainer.sh)
   does the real readiness probe externally.

For bento to do that external probe, Portainer 9000 is now published
to the host via `mode: host` ports. lib/portainer.sh adds
portainer_local_url() returning http://127.0.0.1:9000 and every API
call switches from portainer_base_url (which tracks the public HTTPS
URL for reports) to the local one. The cert-protected public URL is
still what bento writes into state for the handoff HTML.

stacks.sh: BENTO_REPO_REF default was refs/heads/main, but Portainer
clones whatever ref bento sends, and main is behind feat/bento-platform
where the per-stack directories actually live. Every Step 3 deploy
returned HTTP 500 with "open /data/compose/N/stacks/db/postgres/compose.yml:
no such file or directory" because main has no such file yet. Now
defaults to the branch the local clone is on (via git symbolic-ref),
falling back to main when git isn't available.
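
Deriving the default ref from the local clone can be sketched as follows. `default_repo_ref` is a hypothetical name, and the exact fallback string is assumed to be `refs/heads/main` based on the previous default.

```bash
# Sketch: use the checked-out branch's ref, or refs/heads/main as a fallback.
# default_repo_ref is a hypothetical name.
default_repo_ref() {
  local repo_dir="${1:-.}" ref
  # symbolic-ref prints e.g. "refs/heads/feat/bento-platform". It fails on a
  # detached HEAD, outside a repository, or when git is not installed.
  if ref="$(git -C "$repo_dir" symbolic-ref --quiet HEAD 2>/dev/null)"; then
    printf '%s\n' "$ref"
  else
    printf 'refs/heads/main\n'
  fi
}
```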

portainer.sh: added portainer_invalidate_token() so retry paths can
drop a stale cached JWT and force a fresh /api/auth. Previously, when
Portainer's session timeout or rate-limit recovery rejected the cached
token, bento kept reusing it and failed every subsequent call.

evolution-api and plunk were both reading/writing Redis DB 2, so they
silently stepped on each other's keys. Reallocates per stack:

  /0 free  /1 n8n  /2 plunk  /3 chatwoot  /4 evolution-api  /5+ free

The header comment in the compose now lists the full allocation so
future stacks land on a free index without re-auditing the tree.

Adding a stack to BENTO_APPS on a second unattended run was trying to
re-deploy everything in the list, including the postgres/redis/n8n that
were already up. Portainer rejected duplicates and the dependency walk
got confused.

Pre-populates the deploy_with_deps "seen" set from state.stacks.* — so
the second run only touches stacks that don't have a stack_id recorded
yet. Letting the user grow the app list incrementally is a basic
expectation; this makes it work.
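
The idea can be sketched as a filter over the requested keys. `filter_pending` is a hypothetical helper, not the real deploy_with_deps; it takes the already-deployed keys as a space-separated string and prints only the keys that still need a deploy.

```bash
# $1 = space-separated keys that already have a stack_id recorded
# remaining args = keys requested for this run (for example from BENTO_APPS)
filter_pending() {
  local seen=" $1 " key
  shift
  for key in "$@"; do
    case "$seen" in
      *" $key "*) ;;                 # already deployed or already listed
      *) printf '%s\n' "$key"
         seen="$seen$key " ;;        # remember it so duplicates are skipped
    esac
  done
}
```

The first argument could be produced with something like `jq -r '.stacks | to_entries[] | select(.value.stack_id != null) | .key' ~/.config/bento/state.json | tr '\n' ' '`, assuming stack_id sits directly under each `stacks.<key>` entry.
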

The n8n editor was capped at 256M, but n8n runs 100+ TypeORM migrations
on first boot. The container was killed with exit 137 mid-migration,
Swarm restarted it, the migrations started over, and the service never
reached a ready state.

editor: 256M → 768M (cpus 0.5 → 1)
webhook: 256M → 512M (cpus stays at 0.5)
worker stays at 768M (already enough)

Total n8n peak memory now ~2GB. Combined with postgres/redis/plunk,
fits a CX22 with breathing room; heavier sets (chatwoot, paperclip)
would justify a CX32.
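
The ~2GB figure follows directly from the three limits listed above. A quick check, with variable names chosen only for this example:

```bash
# Memory limits in MiB after this change (values from the list above).
editor=768
webhook=512
worker=768

n8n_total=$((editor + webhook + worker))
echo "n8n memory limits total: ${n8n_total}M"
```
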
…crets

`openssl rand -base64 N | tr -d '\n=/+' | head -c 32` has a hidden bug:
tr strips characters first, which can leave fewer than 32, and head -c 32
just takes whatever exists. Typebot validation rejects anything below
32 chars, so it refused to start with "ENCRYPTION_SECRET: Too small:
expected string to have >=32 characters".

Switches to `openssl rand -hex N` everywhere, which always emits exactly
2N chars from [0-9a-f] — no quoting hazards for postgres/Rails/Next.js
either:

  postgres POSTGRES_PASSWORD         -> -hex 16 (32 chars)
  typebot  TYPEBOT_ENCRYPTION_SECRET -> -hex 16 (32 chars)
  rabbitmq RABBITMQ_DEFAULT_PASS     -> -hex 12 (24 chars)
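
A small sketch of the fixed generator. The wrapper name `gen_secret` is hypothetical; `openssl rand -hex N` is the call named in the commit, and the length check mirrors the 2N guarantee described above.

```bash
# Print a secret of exactly 2*N hex characters for N random bytes.
gen_secret() {
  local bytes="$1" secret
  secret="$(openssl rand -hex "$bytes")" || return 1
  # Defensive check: fail loudly instead of emitting a short secret.
  if [ "${#secret}" -ne $((bytes * 2)) ]; then
    echo "unexpected secret length: ${#secret}" >&2
    return 1
  fi
  printf '%s\n' "$secret"
}
```

With this, `gen_secret 16` would feed POSTGRES_PASSWORD and TYPEBOT_ENCRYPTION_SECRET, and `gen_secret 12` would feed RABBITMQ_DEFAULT_PASS.
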

Three changes in one stack:

- The seed config.yaml was written with a `cat <<EOF` heredoc whose
  body was indented with 4 spaces. The file kept those spaces, so the
  `server` key never landed at column 0 and CLIProxyAPI fell back to a
  random ephemeral port ("API server started successfully on: :0" in
  the logs). The wget healthcheck on 127.0.0.1:8317 then failed forever
  and Swarm churned the task. Rewritten with
  `printf 'server:\n  port: 8317\n' >`, which sidesteps heredoc
  indentation entirely.

- The host port publish (8317:8317) wasn't needed — Traefik fronts the
  service via the overlay network. Dropped, since nothing else
  consumes the host-side port.

- Volumes were `external: true` for no reason on a fresh install;
  switched to `driver: local` so the stack works without prior `docker
  volume create` ceremony.
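
The shell behaviour behind the first item can be reproduced in isolation. The function names are made up for this example, and the keys mirror the seed as it stood in this commit; the sketch only shows that an unquoted `<<EOF` body keeps its leading spaces while printf emits exactly what is written.

```bash
# Heredoc body indented along with the surrounding code:
# every output line keeps the 4 leading spaces.
seed_with_indented_heredoc() {
    cat <<EOF
    server:
      port: 8317
EOF
}

# printf writes the bytes as given, so the first key starts at column 0.
seed_with_printf() {
  printf 'server:\n  port: 8317\n'
}
```
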

postgres 5432, redis 6379, and rabbitmq 5672 were all bound to 0.0.0.0
on the host, so on a freshly-bento'd VPS with UFW inactive (the default
until full hardening is rerun) those services were exposed to the
public internet. Apps already reach all three via the overlay network
by service name (`postgres`, `redis`, `rabbitmq`), so the host publish
was contributing nothing but attack surface.

Drops the `ports:` block from each. The rabbitmq management UI keeps
its HTTPS exposure via Traefik (15672 -> the public host). Comments
explain the SSH-tunnel pattern for dev access without re-opening
ports.

Upstream's config.example.yaml uses a flat `port: 8317` at the document
root, not `server.port`. The previous seed wrote the wrong shape, so
CLIProxyAPI silently fell back to an ephemeral port (logged as ":0"),
the wget healthcheck on :8317 found nothing, and Swarm crash-looped
the task. Switches the seed to `printf 'port: 8317\n'`.
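
A sketch of the corrected seed step. The function name and the destination argument are hypothetical, and skipping an existing non-empty file is an assumption about what "seed" means here; the `printf 'port: 8317\n'` call is the one named above.

```bash
# Write the default config only when none exists yet.
# $1 = destination path (hypothetical; the real location is not shown here)
seed_cliproxy_config() {
  local dest="$1"
  [ -s "$dest" ] && return 0          # keep a config the user already has
  printf 'port: 8317\n' > "$dest"     # flat key at the document root
}
```
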

Refs:
  https://github.com/router-for-me/CLIProxyAPI/blob/main/config.example.yaml
  https://help.router-for.me/configuration/basic

stacks.sh: when a stack's compose declares `build:`, bento now runs
`docker compose build --pull` locally before calling Portainer's create
stack endpoint. Swarm's stack deploy ignores `build:` and tries to pull
the image, so the build step is what makes paperclip-custom actually
exist on the daemon. The build output goes to /tmp/bento-build-<key>.log
so the terminal stays quiet; the path is printed on failure for tail.
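
The pre-build step might look roughly like this. Both function names are hypothetical and the grep-based detection is a simplification (it would also match an indented `build:` key in an unrelated mapping); the log path and `docker compose build --pull` follow the commit message.

```bash
# True when the compose file has an indented `build:` key.
compose_has_build() {
  grep -Eq '^[[:space:]]+build:' "$1"
}

# $1 = stack key, $2 = path to compose file
build_stack_image() {
  local key="$1" compose="$2"
  local log="/tmp/bento-build-$key.log"
  compose_has_build "$compose" || return 0     # nothing to build
  if ! docker compose -f "$compose" build --pull >"$log" 2>&1; then
    echo "build failed; inspect with: tail $log" >&2
    return 1
  fi
}
```
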

typebot/compose.yml: bumps builder + viewer to 1 GB and pins
NODE_OPTIONS=--max-old-space-size=768 inside the typebot-common
environment block. Without the V8 cap the runtime hit assertion failures
and SIGABRT'd at startup well before the 1 GB container limit; capping
the old-space lets Node abort gracefully on its own bookkeeping. 768 MB
of heap inside a 1 GB container leaves ~256 MB for sockets/buffers.
…ndalone

- typebot: drop Swarm healthcheck on builder + viewer. Next.js renders /
  through a /signin redirect whose SSR compile can exceed the 10s probe
  timeout on a small VPS, causing Swarm to SIGKILL healthy containers.
  Traefik does its own backend health checking externally.
- chatwoot install.sh: stop waiting for chatwoot_web to become healthy —
  it never will until migrations run, so we were deadlocking. Run
  db:chatwoot_prepare in a one-shot docker container instead.
- lib/stacks.sh: export every resolved env for the stack (not just
  POSTGRES_PASSWORD) so install scripts can read CHATWOOT_HOST,
  CHATWOOT_SECRET_KEY_BASE, etc. without cracking open state.json.
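
One way to export every resolved variable, sketched under the assumption that the resolved env is available as a file of `KEY=VALUE` lines. The function name and the file format are illustrative and not taken from lib/stacks.sh.

```bash
# Export each KEY=VALUE line from the file in $1 into the current shell.
export_stack_env() {
  local line key value
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;   # skip blanks and comments
      *=*) ;;                # looks like an assignment
      *) continue ;;
    esac
    key="${line%%=*}"        # text before the first '='
    value="${line#*=}"       # everything after it, later '=' included
    export "$key=$value"
  done < "$1"
}
```

Splitting on the first `=` with parameter expansion keeps values that themselves contain `=` intact, which matters for base64-style secrets.
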