Add Docker Compose deployment workflow #65

Draft
wangwanjie wants to merge 7 commits into nexu-io:main from wangwanjie:feature/docker-compose-deploy

Conversation

@wangwanjie

Summary

  • Add a single Alpine-based Docker runtime image for Open Design.
  • Add Docker Compose deployment with persistent /app/.od storage, health checks, memory limits, and .env.example defaults.
  • Add Docker Hub publishing and verification scripts with multi-arch support, proxy detection, skopeo push support, and manifest checks.

Test Plan

  • bash -n deploy/scripts/publish-images.sh && bash -n deploy/scripts/verify-image.sh && bash -n deploy/scripts/verify-image-manifest.sh
  • docker compose --env-file deploy/.env.example -f deploy/docker-compose.yml config
  • deploy/scripts/publish-images.sh --dry_run --arch arm64 --image_tag pr-check
  • git diff --check
  • docker buildx build --platform linux/arm64 -t open-design:pr-check -f deploy/Dockerfile --load .
  • deploy/scripts/verify-image.sh open-design:pr-check

Published Image

  • docker.io/vanjayak/open-design:latest
  • Manifest includes linux/amd64 and linux/arm64.

@lefarcen lefarcen self-requested a review April 29, 2026 10:38
@lefarcen lefarcen added the enhancement New feature or request label Apr 29, 2026
@lefarcen
Contributor

Hi @wangwanjie! 🎉

Thanks for this comprehensive Docker deployment contribution — great to see a production-ready Alpine runtime with multi-arch support, health checks, and memory limits built in!

I will run a deep review across code correctness + deployment design and get back to you within 24h.

Thanks for making open-design better!
— open-design team

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cc7d4ee173

image_sources=()

for platform in "${platform_list[@]}"; do
ensure_base_images_preloaded "$platform"

P1: Gate base-image preloading by push strategy

The main loop always calls ensure_base_images_preloaded before each build, and that helper always invokes skopeo copy when PRELOAD_BASE_IMAGES=1. This makes --push_strategy buildx fail on hosts without skopeo, even though the script advertises buildx as an alternative push path. In practice, users selecting buildx to avoid skopeo still hit a hard failure before any build starts.
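
One way to express the gating described above, as a minimal sketch: the helper returns early unless preloading is enabled and the push strategy is skopeo. PRELOAD_BASE_IMAGES and PUSH_STRATEGY are named in the review; the function body and the preload_with_skopeo stub are assumptions for illustration, not the script's actual code.

```shell
#!/usr/bin/env bash
# Sketch only: PRELOAD_BASE_IMAGES and PUSH_STRATEGY are named in the review;
# preload_with_skopeo is a hypothetical stand-in for the real skopeo copy step.
PRELOAD_BASE_IMAGES="${PRELOAD_BASE_IMAGES:-1}"
PUSH_STRATEGY="${PUSH_STRATEGY:-skopeo}"

preload_with_skopeo() {
  # The real script would run a skopeo copy here; this stub only reports.
  echo "preloading base images for $1"
}

ensure_base_images_preloaded() {
  local platform="$1"
  # Skip unless preloading is enabled AND the skopeo path is in use, so that
  # selecting the buildx strategy never requires skopeo to be installed.
  [[ "$PRELOAD_BASE_IMAGES" == "1" ]] || return 0
  [[ "$PUSH_STRATEGY" == "skopeo" ]] || return 0
  preload_with_skopeo "$platform"
}
```

With PUSH_STRATEGY=buildx the helper is a no-op, so the main loop can keep calling it unconditionally.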

Comment thread deploy/scripts/publish-images.sh Outdated
[[ "$INSPECT_AFTER_PUSH" == "1" ]] || return 0

if [[ "$PUSH_STRATEGY" == "skopeo" ]]; then
skopeo inspect --raw "docker://${image}" >/dev/null

P2: Reuse authfile for post-push skopeo inspection

After a successful skopeo push, the script validates the remote image with skopeo inspect --raw but does not pass --authfile. When credentials come from credsStore (and are materialized into EFFECTIVE_SKOPEO_AUTHFILE earlier), private repositories can push successfully and then fail at inspection with an auth error, causing false-negative publish failures.
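
A minimal sketch of the suggested fix, assuming the authfile path is held in EFFECTIVE_SKOPEO_AUTHFILE as the review describes. The helper name inspect_remote_image and the SKOPEO override (used here only to make the command swappable) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: EFFECTIVE_SKOPEO_AUTHFILE is named in the review; the helper
# name and the SKOPEO override variable are hypothetical.
inspect_remote_image() {
  local image="$1"
  local -a args=(inspect --raw)
  # Reuse the same credentials that the push used, when they exist.
  if [[ -n "${EFFECTIVE_SKOPEO_AUTHFILE:-}" ]]; then
    args+=(--authfile "$EFFECTIVE_SKOPEO_AUTHFILE")
  fi
  "${SKOPEO:-skopeo}" "${args[@]}" "docker://${image}" >/dev/null
}
```

Building the arguments in an array keeps the authfile path correctly quoted even if it contains spaces.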

Comment thread deploy/scripts/verify-image-manifest.sh Outdated
os="${platform%/*}"
arch="${platform#*/}"
if ! jq -e --arg os "$os" --arg arch "$arch" '
.mediaType == "application/vnd.docker.distribution.manifest.list.v2+json" and

P2: Accept OCI index manifests in platform verifier

The manifest verifier hard-requires Docker manifest-list media type before checking platform entries. Valid multi-arch images published as OCI image indexes are therefore reported as missing platforms even when the manifests[] entries are correct, which can block release verification for compliant OCI images.
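
A sketch of a platform check that accepts either media type, assuming the raw manifest arrives on stdin (for example from skopeo inspect --raw) and that jq is installed. The has_platform helper is hypothetical; the os/arch splitting mirrors the excerpt above.

```shell
#!/usr/bin/env bash
# Sketch only: has_platform is a hypothetical helper that reads a raw
# manifest document on stdin and checks for one os/arch pair.
has_platform() {
  local platform="$1"
  local os="${platform%/*}"
  local arch="${platform#*/}"
  jq -e --arg os "$os" --arg arch "$arch" '
    # Accept both the Docker manifest list and the OCI image index.
    (.mediaType == "application/vnd.docker.distribution.manifest.list.v2+json"
      or .mediaType == "application/vnd.oci.image.index.v1+json")
    and any(.manifests[]?; .platform.os == $os and .platform.architecture == $arch)
  ' >/dev/null
}
```

Because jq -e sets a non-zero exit status when the filter yields false, the function can be used directly in an if condition.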

Contributor

@lefarcen lefarcen left a comment

Hey @wangwanjie, thanks for the PR! 🎉 This is a solid foundation for Docker deployment — Alpine multi-stage build, multi-arch support, memory limits, health checks, and thorough verification scripts. The structure is clean and the test plan is comprehensive.

Found 5 P1 security/correctness issues that need fixes before merge, plus several P2/P3 improvements. The core concerns:

  1. Shell injection risk in publish-images.sh (unquoted user-supplied variables)
  2. Docker credential exposure via credential helpers (cleartext secrets in temp file)
  3. TOCTOU race in temp file handling
  4. Missing input validation on critical paths
  5. Container escape risk via --add-host in untrusted environments

P1 Issues (must fix)

deploy/scripts/publish-images.sh:218 - Credential exposure via cleartext temp file

When Docker uses a credential helper, this code extracts the secret and writes it to a temp file in cleartext. If the process crashes before cleanup, the temp file persists with credentials readable by any user.

Fix: Use mktemp --tmpdir=/dev/shm/... (memory-backed) + chmod 600 immediately.
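
The suggestion could look roughly like the following sketch: prefer a memory-backed directory when /dev/shm is available, fall back to the regular temp directory otherwise, and create the authfile with owner-only permissions before any secret is written. The function names and the od-publish prefix are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the od-publish prefix are hypothetical.
make_private_tmp_root() {
  local base="${TMPDIR:-/tmp}"
  # Prefer a memory-backed filesystem when one is available and writable.
  if [[ -d /dev/shm && -w /dev/shm ]]; then
    base=/dev/shm
  fi
  local dir
  dir="$(mktemp -d "${base%/}/od-publish.XXXXXX")" || return 1
  chmod 700 "$dir"   # owner-only access to the directory
  printf '%s\n' "$dir"
}

write_authfile() {
  local root="$1" content="$2"
  local file="$root/auth.json"
  # Create the empty file under a restrictive umask first, then fill it.
  ( umask 077 && : > "$file" ) || return 1
  chmod 600 "$file"
  printf '%s' "$content" > "$file"
  printf '%s\n' "$file"
}
```

A caller would typically pair this with a cleanup trap on EXIT. A trap cannot run after SIGKILL, which is the case where a memory-backed location helps most, since its contents do not survive a reboot.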

deploy/scripts/publish-images.sh:99 - Shell injection via unquoted variable

proxy_url (from HTTP_PROXY env) is passed unquoted into Perl. Add input validation: [[ ! "$proxy_url" =~ ^https?://[a-zA-Z0-9._-]+(:[0-9]+)?$ ]]
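
As a small sketch, the check above can be wrapped in a function that uses the pattern from the review unchanged. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the pattern is the one suggested in the review; the function
# name is hypothetical.
is_valid_proxy_url() {
  local url="$1"
  # Keep the pattern in a variable so bash treats it as a regex on the
  # right-hand side of =~ without extra quoting issues.
  local re='^https?://[a-zA-Z0-9._-]+(:[0-9]+)?$'
  [[ "$url" =~ $re ]]
}
```

The script could then refuse to continue when is_valid_proxy_url fails for the value taken from HTTP_PROXY. Note that this pattern also rejects proxy URLs with embedded credentials or a trailing slash, which may or may not be desirable.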

deploy/scripts/publish-images.sh:240 - TOCTOU race in temp directory

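Verify that the path is still the directory the script created, and not a symlink, before deleting it. Note that rm -P (a BSD option that overwrites file contents before unlinking) does not by itself protect against symlink swaps.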
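
A sketch of the "verify before deletion" option, assuming the script keeps all of its temporary files under one private root directory. The helper name is hypothetical. These checks narrow the window for a swap but do not remove it; the stronger guarantee comes from the parent directory being writable only by the current user.

```shell
#!/usr/bin/env bash
# Sketch only: safe_remove_tmp_dir is a hypothetical helper.
safe_remove_tmp_dir() {
  local dir="$1" root="$2"
  [[ -n "$dir" && -n "$root" ]] || return 1
  # Refuse symlinks and anything that is not a real directory.
  [[ -d "$dir" && ! -L "$dir" ]] || return 1
  # Refuse paths that try to climb out with "..".
  [[ "$dir" != *"/.."* ]] || return 1
  # Refuse paths outside the private root the script created.
  case "$dir" in
    "$root"/*) ;;
    *) return 1 ;;
  esac
  rm -rf -- "$dir"
}
```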

deploy/scripts/publish-images.sh:322 - Missing input validation on --image_namespace/--image_repository

Validate: [[ ! "$IMAGE_NAMESPACE" =~ ^[a-zA-Z0-9_-]+$ ]] to prevent path traversal in image refs.
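
A sketch of how that validation might be applied to both options with one helper. The pattern is the reviewer's; the function name and the error message format are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: validate_image_component is a hypothetical helper that uses
# the pattern from the review.
validate_image_component() {
  local name="$1" value="$2"
  local re='^[a-zA-Z0-9_-]+$'
  if [[ ! "$value" =~ $re ]]; then
    echo "invalid ${name}: ${value}" >&2
    return 1
  fi
}
```

Slashes and dots are rejected, so values such as ../evil cannot alter the image reference. The pattern does allow uppercase letters, which Docker repository names do not, so a stricter lowercase-only pattern may be preferable.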

deploy/scripts/publish-images.sh:319 - Container escape risk via --add-host

--add-host host.docker.internal=host-gateway exposes host network. Make opt-in or document security assumption.
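
One possible shape for the opt-in, as a sketch: the host mapping is added to the build arguments only when a flag is set. ALLOW_HOST_GATEWAY and build_extra_args are hypothetical names; the --add-host value is written as it appears in the review.

```shell
#!/usr/bin/env bash
# Sketch only: ALLOW_HOST_GATEWAY and build_extra_args are hypothetical names.
build_extra_args() {
  local -a args=()
  # Map host.docker.internal only when the caller explicitly opts in.
  if [[ "${ALLOW_HOST_GATEWAY:-0}" == "1" ]]; then
    args+=(--add-host "host.docker.internal=host-gateway")
  fi
  # Print one argument per line; print nothing when the list is empty.
  if (( ${#args[@]} > 0 )); then
    printf '%s\n' "${args[@]}"
  fi
}
```

By default the function prints nothing, so the build runs without any mapping to the host.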

P2 Issues (recommended)

  • Dockerfile:34 - Native module stripping may break better-sqlite3. Either test after stripping or skip .node files.
  • docker-compose.yml:20 - Add security_opt: no-new-privileges + read_only: true for defense in depth.
  • .env.example:9 - Document tested memory workloads (384m may OOM on large exports).

P3 Improvements

  • server.js:1087 - IPv6 :: bind shows misleading 'localhost' URL
  • README.md:23 - Add guidance on agent CLI installation (volume mount vs derived image)

Most P1s can be fixed with input validation + proper quoting. Let me know if you need clarification on any of these!

Harden publishing inputs and temporary credential handling, and tighten Docker runtime defaults requested by the PR review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@wangwanjie
Author

Thanks for the detailed review. I've pushed fixes for the reported security/correctness issues:

  • Added proxy URL and image namespace/repository validation in publish-images.sh.
  • Removed unconditional host.docker.internal=host-gateway; it is now only added when a local proxy rewrite is needed.
  • Hardened temporary credential/authfile handling with private temp roots, chmod 600, and centralized cleanup.
  • Reused the effective skopeo authfile for post-push inspection.
  • Gated base-image preloading to the skopeo strategy so --push_strategy buildx no longer requires skopeo preloading.
  • Accepted OCI image indexes in the manifest verifier.
  • Added Compose hardening with read_only, /tmp tmpfs, and no-new-privileges.
  • Removed native .node stripping from the Dockerfile and fixed IPv6 listen URL formatting.

Validation run:

  • bash -n deploy/scripts/publish-images.sh && bash -n deploy/scripts/verify-image-manifest.sh
  • git diff --check
  • dry-run checks for skopeo/buildx, proxy validation, image namespace/repository validation
  • docker compose --env-file deploy/.env.example -f deploy/docker-compose.yml config
  • node --check daemon/server.js

I could not run pnpm typecheck locally because this checkout has no node_modules, and the current local Node version is v25 while the project engine requires >=20 <23.

Contributor

@lefarcen lefarcen left a comment

Excellent work, @wangwanjie! 🎉 All P1 security issues have been fixed comprehensively:

Shell injection - Proxy URL validation with regex checks
Credential exposure - Memory-backed temp roots (/dev/shm or 700-perms fallback) + chmod 600 on authfiles
TOCTOU race - Centralized temp dir management with cleanup validation
Input validation - Namespace/repository validation with Docker name component regex
Container escape - --add-host now conditional (only when proxy rewrite needed)

P2/P3 fixes:

  • Native .node stripping removed from Dockerfile
  • IPv6 listen host formatting for :: bind
  • Compose hardening: read_only, tmpfs, no-new-privileges
  • Skopeo auth reused for post-push inspection
  • Base-image preloading gated to skopeo strategy

The implementation is solid — temp file handling is now defense-in-depth (private roots + restrictive perms + centralized cleanup), input validation is thorough, and the Compose runtime is properly hardened.

Remaining optional improvements (P3, non-blocking):

  1. deploy/.env.example memory docs — Consider adding a workload note (idle ~20 MB; single 10 MB export peaks ~180 MB; concurrent agent +50 MB each → 384m supports ~3 concurrent + moderate exports).
  2. deploy/README.md agent CLI installation — Add the options subsection I suggested (volume-mount vs derived image).
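
The arithmetic behind the suggested workload note can be checked with a short sketch. The figures are the reviewer's estimates; treating them as additive (idle, plus one export in flight, plus each agent) is an assumption made here.

```shell
#!/usr/bin/env bash
# Figures from the review note, in MB. Treating them as additive is an assumption.
idle_mb=20
export_peak_mb=180
per_agent_mb=50
limit_mb=384

# Estimated peak with N concurrent agents plus one export in flight.
estimated_peak_mb() {
  echo $(( idle_mb + export_peak_mb + $1 * per_agent_mb ))
}

# Largest agent count whose estimate still fits under the limit.
max_agents() {
  echo $(( (limit_mb - idle_mb - export_peak_mb) / per_agent_mb ))
}

echo "3 agents: $(estimated_peak_mb 3) MB of ${limit_mb} MB"   # 350 MB
echo "max agents under limit: $(max_agents)"                    # 3
```

Under these assumptions three agents fit with about 34 MB of headroom, while a fourth would push the estimate to 400 MB, above the 384m limit.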

But these are polish — the core security/correctness is now excellent. Approving! 🚀

Contributor

@lefarcen lefarcen left a comment

All security issues resolved! This is ready to merge from a security/correctness perspective. The Docker deployment is production-ready. 🎉

@lefarcen
Contributor

Hi @wangwanjie! 👋

The PR has a merge conflict that needs to be resolved before we can merge. Could you please:

git checkout main
git pull origin main
git checkout feature/docker-compose-deploy
git rebase main
# Resolve any conflicts, then continue with: git rebase --continue
git push --force-with-lease origin feature/docker-compose-deploy

Once the conflicts are resolved, we'll be ready to merge — the code review is already approved! ✅

@lefarcen
Contributor

Hi @wangwanjie, your PR has a merge conflict with main. Could you rebase and resolve it? Once the conflict is fixed, we can merge immediately — all security issues have been addressed! 🚀

wangwanjie and others added 2 commits April 30, 2026 12:12
Resolve the daemon monorepo migration conflict and adapt Docker deployment files to the new workspace layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Set CI=true during the image build so pnpm prune can run non-interactively inside Docker.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread deploy/Dockerfile Outdated

FROM ${RUNTIME_IMAGE}

RUN apk add --no-cache nodejs tini && \
Contributor

@wangwanjie Thanks for putting this Docker workflow together — the overall direction is really useful. One thing I’d tighten before merge: the build stage uses Node 24, but the runtime stage installs nodejs from Alpine 3.22, which may not be Node 24. Since better-sqlite3 is native and built during install, running it under a different Node major/ABI can cause startup failures. Could we use a Node 24 runtime image too, or otherwise pin/install exactly Node 24 in the final stage? 🙏
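
A small sketch of how a build or verification step might guard against this mismatch, assuming it can obtain the version string printed by node --version in each stage. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical. Inputs are version strings as
# printed by `node --version`, for example "v24.1.0".
node_major() {
  local v="${1#v}"            # drop the leading "v"
  printf '%s\n' "${v%%.*}"    # keep everything before the first dot
}

same_node_major() {
  [[ "$(node_major "$1")" == "$(node_major "$2")" ]]
}
```

For a stricter check, node -p process.versions.modules prints the ABI version that native addons such as better-sqlite3 are compiled against, and that value could be compared between stages instead.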

Comment thread deploy/scripts/verify-image.sh Outdated
for required_path in \
"app/apps/daemon/server.js" \
"app/apps/web/out/index.html" \
"app/node_modules/better-sqlite3" \
Contributor

Small verification follow-up: in this pnpm workspace, better-sqlite3 is a daemon package dependency, so the direct workspace symlink should be under app/apps/daemon/node_modules/better-sqlite3 rather than app/node_modules/better-sqlite3. Updating this check would make the verifier match the actual runtime layout and avoid false failures. 😊
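
A sketch of the path check with the corrected location, assuming the verifier inspects an extracted copy of the image filesystem under some root directory. How verify-image.sh actually obtains the file list is not shown in this thread, so check_required_paths and that layout are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: check_required_paths and the extracted-rootfs layout are
# assumptions; the paths are the ones discussed in the review.
check_required_paths() {
  local root="$1"; shift
  local missing=0 p
  for p in "$@"; do
    # -L also accepts a symlink, which is how pnpm links workspace dependencies.
    if [[ ! -e "$root/$p" && ! -L "$root/$p" ]]; then
      echo "missing: $p" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A call would pass the root followed by app/apps/daemon/server.js, app/apps/web/out/index.html, and app/apps/daemon/node_modules/better-sqlite3.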

wangwanjie and others added 3 commits April 30, 2026 13:15
Use pnpm deploy for the daemon package so the runtime image includes production dependencies where Node resolves them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Allow pnpm v10 deploy to package the daemon workspace without requiring injected workspace packages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Use Node 24 for both build and runtime stages and update image verification for the workspace daemon dependency layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@wangwanjie wangwanjie marked this pull request as draft April 30, 2026 07:13
@lefarcen
Contributor

lefarcen commented May 2, 2026

Hi @wangwanjie!

The bot review on this PR is positive overall, but there are now merge conflicts with main (about 44h since the last update).

Could you rebase against latest main? If you'd like a hand walking through the rebase, just reply here. Once conflicts are resolved, a maintainer can take a final look and merge.

If you've moved on, no worries — feel free to close.

Thanks for the contribution!
— open-design team

Labels

enhancement New feature or request

3 participants