
ci: publish multi-arch (amd64+arm64) Docker images via native runners #39

Merged
KE7 merged 1 commit into main from ci/publish-multiarch-images on May 30, 2026

Conversation

@KE7
Owner

@KE7 KE7 commented May 21, 2026

Summary

The publish-runners workflow previously built and pushed amd64-only images: it had no docker/setup-qemu-action or docker/setup-buildx-action step, and no docker/build-push-action step set the platforms: key. As a result, Apple Silicon users on macOS pulled the amd64 image and ran it under Rosetta 2 / QEMU emulation.

This PR refactors the workflow to publish true multi-arch (linux/amd64 + linux/arm64) manifest lists using a native-runner build+merge pattern. Native runners (ubuntu-latest for amd64, ubuntu-24.04-arm for arm64) avoid the ~3-5× QEMU slowdown that would otherwise plague the backend images' npm install steps.

Job graph

build-base       (2 jobs:  amd64 on ubuntu-latest, arm64 on ubuntu-24.04-arm)
     |              each builds helix-evo-runner-base, pushes by digest
     v
merge-base       (1 job:   combines the 2 digests into a multi-arch manifest list)
     |
     v
build-backends   (10 jobs: 5 backends × 2 arches; builds against the merged base)
     |
     v
merge-backends   (5 jobs:  one per backend; combines its 2 arch digests)
     |
     v
verify           (smoke-tests every published image, unchanged)
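The merge stages in the graph above can be pictured with a small shell sketch. It is not the workflow's actual step: it assumes each build job left an empty file named after the hex part of its digest in a shared directory (a hypothetical layout), and the function name `build_merge_command` is made up for illustration. It only prints the `docker buildx imagetools create` command so the result can be reviewed before anything is pushed.

```shell
# Sketch: assemble the merge command from per-arch digests.
# Assumption: DIGEST_DIR holds one empty file per arch, named after the
# hex part of that arch's image digest (hypothetical layout).

# build_merge_command IMAGE DIGEST_DIR TAG...
# Prints the command instead of running it.
build_merge_command() {
  local image="$1" digest_dir="$2"
  shift 2
  local cmd="docker buildx imagetools create"
  local tag file
  # One -t flag per tag that the manifest list should receive.
  for tag in "$@"; do
    cmd="$cmd -t ${image}:${tag}"
  done
  # Glob expansion is sorted, so the output order is deterministic.
  for file in "$digest_dir"/*; do
    [ -e "$file" ] || continue   # skip the literal pattern if the dir is empty
    cmd="$cmd ${image}@sha256:$(basename "$file")"
  done
  printf '%s\n' "$cmd"
}
```

Called as `build_merge_command ghcr.io/ke7/helix-evo-runner-base ./digests latest`, it prints a single command line that references both per-arch digests; running that line is what produces the multi-arch manifest list.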

Verification after release tag

After the next v* tag triggers this workflow:

docker buildx imagetools inspect ghcr.io/ke7/helix-evo-runner-base:latest
# Must show:
#   MediaType: application/vnd.oci.image.index.v1+json
#   Platform:  linux/amd64
#   Platform:  linux/arm64

# From Apple Silicon (was returning x86_64 under Rosetta before):
docker pull ghcr.io/ke7/helix-evo-runner-claude:latest
docker run --rm ghcr.io/ke7/helix-evo-runner-claude:latest uname -m
# Expected: aarch64
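The manual check above could also be scripted. The following is a minimal sketch, assuming the inspect output contains `Platform:` lines in the form quoted above; the helper names `has_platform` and `check_multiarch` are hypothetical and not part of the workflow.

```shell
# has_platform PLATFORM
# Succeeds if stdin contains a "Platform: PLATFORM" line (an optional
# variant suffix such as /v8 is tolerated).
has_platform() {
  grep -Eq "^[[:space:]]*Platform:[[:space:]]*$1(/|[[:space:]]|\$)"
}

# check_multiarch
# Reads inspect output on stdin and succeeds only if both linux/amd64
# and linux/arm64 are listed.
check_multiarch() {
  local report
  report=$(cat)
  printf '%s\n' "$report" | has_platform 'linux/amd64' &&
    printf '%s\n' "$report" | has_platform 'linux/arm64'
}
```

Usage would look like `docker buildx imagetools inspect ghcr.io/ke7/helix-evo-runner-base:latest | check_multiarch && echo multi-arch`, which exits non-zero if either architecture is missing.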

Reviewer pass (16 criteria, all PASS)

  • YAML syntactic correctness, on: trigger unchanged, env: block preserved, permissions preserved
  • Job graph needs: chain correct, ubuntu-24.04-arm runner name correct
  • build-backends 2D matrix (image × platform + include for runner pinning) expands to exactly 10 jobs
  • push-by-digest=true,name-canonical=true,push=true on build steps (no tags: key on build)
  • Digest artifact upload/download naming consistent across all build/merge pairs
  • docker buildx imagetools create jq invocation correct; tag scheme preserved (latest/ref/tag/semver)
  • verify job preserved + retargeted at merge-backends
  • fail-fast: false on both build matrices
  • No accidental load: true
  • Per-Dockerfile arm64 sanity checked (Cursor installer detects arm64; claude/codex/gemini/opencode Dockerfiles have no x86-only hard-codings)
  • Backward compat: amd64 pullers still get amd64 transparently (multi-arch manifest list serves both arches)
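To make the 10-job expansion in the checklist concrete, here is a rough shell enumeration of the image × platform matrix with its runner pinning. The backend names are taken from the Dockerfile bullet above; the `helix-evo-runner-<backend>` naming for every backend is an assumption extrapolated from `helix-evo-runner-claude`, and the function names are illustrative only.

```shell
# Backends named in the review list. Assumption: every image follows the
# helix-evo-runner-<backend> naming seen for the claude image.
BACKENDS="claude codex cursor gemini opencode"

# runner_for PLATFORM
# Maps a platform to the native runner label used in this PR.
runner_for() {
  case "$1" in
    linux/amd64) echo ubuntu-latest ;;
    linux/arm64) echo ubuntu-24.04-arm ;;
    *) echo "unsupported platform: $1" >&2; return 1 ;;
  esac
}

# expand_matrix
# Prints one "image platform runner" line per build-backends job.
expand_matrix() {
  local backend platform
  for backend in $BACKENDS; do
    for platform in linux/amd64 linux/arm64; do
      printf 'helix-evo-runner-%s %s %s\n' \
        "$backend" "$platform" "$(runner_for "$platform")"
    done
  done
}
```

Piping `expand_matrix` through `wc -l` gives the job count for the stage, one line per backend and architecture pair.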

S1 polish applied (removed unused actions/checkout@v6 from merge-base and merge-backends — saves ~10-15s per merge run). S2 deferred (adding OCI labels via metadata-action in build jobs is cosmetic; images are functional without).

Root-cause report

See /Users/ke/helix-arm64-image-debug.md §6 for the design rationale (native-arm64-runner vs QEMU emulation trade-off) and full diagnostic evidence (manifest inspect output, config blob arch=amd64 confirmation).

The publish-runners workflow previously built and pushed amd64-only images
because docker/setup-qemu-action, docker/setup-buildx-action, and the
platforms: key were all absent from every docker/build-push-action step.
Apple Silicon users on macOS pulled the amd64 image and ran it under
Rosetta 2 / QEMU emulation.

Refactor to a build+merge pattern using native runners (ubuntu-latest for
amd64, ubuntu-24.04-arm for arm64), which avoids QEMU's 3-5x slowdown on the
backend images' npm install steps. Five-stage job graph:

  build-base       (2 jobs: per-arch builds of helix-evo-runner-base)
       |
       v
  merge-base       (1 job: combines 2 arch digests into manifest list)
       |
       v
  build-backends   (10 jobs: 5 backends x 2 arches; each builds against
                    the merged multi-arch base)
       |
       v
  merge-backends   (5 jobs: one per backend; combines its 2 arch digests
                    into a multi-arch manifest list)
       |
       v
  verify           (smoke-tests pulled images, unchanged)

Each build job pushes by digest (push-by-digest=true, name-canonical=true)
and uploads the digest as an artifact.  Each merge job downloads the
relevant digests and runs `docker buildx imagetools create` to combine them
into a multi-arch manifest list at the canonical tag (latest / version /
ref) computed by docker/metadata-action@v5.

Verification (after the next v* tag triggers this workflow):

  docker buildx imagetools inspect ghcr.io/ke7/helix-evo-runner-base:latest
  # Must show MediaType: application/vnd.oci.image.index.v1+json
  # plus TWO Platform: entries (linux/amd64, linux/arm64).

  docker pull ghcr.io/ke7/helix-evo-runner-claude:latest
  docker run --rm ghcr.io/ke7/helix-evo-runner-claude:latest uname -m
  # On Apple Silicon: aarch64 (was: x86_64).

Reviewer (sibling agent) approved all 16 review criteria. Polish S1
applied (removed extraneous actions/checkout@v6 from merge-base and
merge-backends — no repo files used there).  Polish S2 (adding OCI
labels to per-arch builds via metadata-action duplication) deferred
as cosmetic.

Refs: helix-arm64-image-debug.md, helix-multiarch-publish-review.md
@KE7 KE7 merged commit 48c5926 into main May 30, 2026
2 checks passed
@KE7 KE7 deleted the ci/publish-multiarch-images branch May 30, 2026 22:19