feat(sidecar): build pipeline for embedded rapid-mlx artifact #568
Merged
Codifies the Phase 2 spike recipe (rapid-desktop `docs/plans/sidecar-bundling-phase-2-spike.md`) into a reproducible artifact factory. Output: a signed `tar.gz` uploaded to GitHub Releases of raullenchai/Rapid-MLX as the asset the desktop release workflow pulls into `Rapid.app/Contents/Resources/rapid-mlx/`.

## scripts/build-sidecar.sh

Driver script (~200 LOC). Steps:

1. Verify the runner is arm64 (mlx is Apple Silicon only).
2. Download python-build-standalone 3.12.13 (pinned tag 20260610) to `/tmp` and extract it into `$STAGE/python/`.
3. `pip install` rapid-mlx and its runtime deps into `$STAGE/site-packages`. This is driven by the host python, because the bundled interpreter has `ensurepip` stripped.
4. Strip dev/unused artifacts (`ensurepip`, `idlelib`, `tkinter`, `test`, `mlx/include`, `mlx/lib/cmake`, `__pycache__`).
5. Install the shim entrypoint at `$STAGE/bin/rapid-mlx`.
6. Enumerate Mach-Os. Baseline is 77; tolerance is ±3. Small wheel drift is OK; a bigger change means a new dependency, so the build blocks and requires spike re-validation.
7. Codesign every Mach-O with the entitlements (`--options runtime --timestamp`). Skippable via `--skip-codesign` for unsigned PR runs.
8. Package as `tar.gz` and write a SHA-256 sidecar file.
9. Smoke test: env-stripped `rapid-mlx --version`, plus the bundled python importing mlx and evaluating a zero matmul (proves the Metal JIT path).

Knobs (env vars): `OUT_DIR`, `DEVELOPER_ID`, `PBS_TAG`, `PBS_VERSION`, `MACHO_BASELINE_COUNT`, `MACHO_TOLERANCE`

Exit codes: 0 success; 1 generic; 2 Mach-O count drift; 3 smoke failure (signing fine, runtime broken).

## scripts/sidecar-shim.sh

Tiny `/bin/sh` entrypoint installed as `$STAGE/bin/rapid-mlx`. Pins `PYTHONHOME`/`PYTHONPATH`/`PYTHONNOUSERSITE` so a host `pip install --user mlx==<other>` can't leak a different `mlx.so`. Resolves one level of symlink (covering the design doc's plan for a runtime-override symlink at user scope). BSD-readlink-friendly: no `readlink -f`.
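The shim's two jobs (one-level symlink resolution without `readlink -f`, and pinning the three Python env vars) can be sketched in POSIX sh roughly as follows. This is a hypothetical illustration, not the contents of `scripts/sidecar-shim.sh`: the function names are made up, and the `python/bin/python3.12` path inside the bundle is an assumption taken from the smoke-test commits.

```shell
#!/bin/sh
# Hypothetical sketch of a sidecar shim; NOT the real scripts/sidecar-shim.sh.
# Assumed layout (from the PR text): $ROOT/bin/rapid-mlx, $ROOT/python/,
# $ROOT/site-packages/.

# Follow at most ONE level of symlink. Uses plain `readlink FILE`, which
# BSD readlink on macOS supports, instead of GNU-only `readlink -f`.
resolve_one_link() {
    target=$1
    if [ -L "$target" ]; then
        link=$(readlink "$target")
        case $link in
            /*) target=$link ;;                        # absolute link target
            *)  target=$(dirname "$target")/$link ;;   # relative to the link's dir
        esac
    fi
    printf '%s\n' "$target"
}

# Bundle root = parent of the directory that holds bin/rapid-mlx.
bundle_root() {
    bin_dir=$(cd "$(dirname "$1")" && pwd) || return 1
    dirname "$bin_dir"
}

# Export the pinned interpreter environment for a bundle root, so a host
# `pip install --user` cannot shadow the bundled packages.
pin_python_env() {
    PYTHONHOME="$1/python"
    PYTHONPATH="$1/site-packages"
    PYTHONNOUSERSITE=1
    export PYTHONHOME PYTHONPATH PYTHONNOUSERSITE
}

# Hypothetical entry point: replace the shell with the bundled python.
# The PR does not name the CLI's entry module, so callers pass all args.
run_bundled() {
    root=$(bundle_root "$(resolve_one_link "$0")") || exit 1
    pin_python_env "$root"
    exec "$root/python/bin/python3.12" "$@"
}
```

Exporting the variables (rather than relying on `VAR=value exec ...` prefixes) keeps the behavior the same across sh implementations; a real shim would end by calling something like `run_bundled` with the CLI's arguments.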
## scripts/sidecar-entitlements.plist

Three entitlements, all empirically required by the Phase 2 spike:

- `com.apple.security.cs.allow-jit`
- `com.apple.security.cs.disable-library-validation`
- `com.apple.security.cs.allow-unsigned-executable-memory`

Same shape as rapid-desktop `Resources/Rapid.entitlements` (Phase 1). The library-validation entitlement is mandatory: without it, `dlopen` of every wheel `.so` fails with "different Team IDs".

## .github/workflows/sidecar-build.yml

Triggered by a `sidecar-v*` tag push or `workflow_dispatch`. Runs on a macos-15 arm64 runner with a 30-minute timeout.

Secret-gated codesigning: if all of `APPLE_DEVELOPER_ID_APP` / `APPLE_SIGNING_CERTIFICATE` / `APPLE_SIGNING_PASSWORD` are configured, the workflow codesigns and uploads to a GitHub Release whose release notes carry the SHA-256. Otherwise it runs with `--skip-codesign` for unsigned PR/test runs (the artifact is still uploaded as a workflow artifact, just not released).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
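The all-or-nothing secret gate can be sketched as a small shell decision. The real workflow expresses this in YAML; this version assumes, hypothetically, that the three secrets are exposed to the step as env vars with the same names. `--out` and `--skip-codesign` are the flags named in the PR, and `build/sidecar-stage` mirrors the CI invocation.

```shell
#!/bin/sh
# Hypothetical sketch of the secret gate (the real logic lives in the
# workflow YAML). Assumes secrets are exported as same-named env vars.

# Print "signed" only when ALL three signing secrets are non-empty.
signing_mode() {
    if [ -n "${APPLE_DEVELOPER_ID_APP:-}" ] &&
       [ -n "${APPLE_SIGNING_CERTIFICATE:-}" ] &&
       [ -n "${APPLE_SIGNING_PASSWORD:-}" ]; then
        echo signed
    else
        echo unsigned
    fi
}

# Arguments for scripts/build-sidecar.sh in each mode.
build_args() {
    case $(signing_mode) in
        signed)   echo "--out build/sidecar-stage" ;;
        unsigned) echo "--out build/sidecar-stage --skip-codesign" ;;
    esac
}
```

Because the gate requires all three values, a partially configured repo (say, a certificate but no password) falls back to an unsigned run instead of failing midway through codesigning.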
PR #568 validation scorecard

Title: feat(sidecar): build pipeline for embedded rapid-mlx artifact

Verdict: MERGE-SAFE
…in cleanup

Round 1 codex review on PR #568 returned 4 BLOCKING findings. One (B3, entitlement mismatch) was already resolved by merged rapid-desktop PR #38, which added the third entitlement to `Rapid.entitlements`. This commit addresses the remaining three:

* **B1: smoke runs BEFORE packaging.** The old order produced the tarball and then smoke-tested the staged bundle. A smoke failure aborted the script (via `set -e`) before the Release upload, but only after burning CI minutes producing an artifact we'd immediately throw away. Reorder so smoke fails fast.
* **B2: Mach-O floor guard before the drift check.** A partial pip install could leave 30-50 Mach-Os instead of 77, and we'd report "drift", pointing the operator at re-baselining when the real fix is re-reading the pip log. Add a hard floor at half the baseline, below which we exit with a clear "check pip logs, do NOT bump baseline" message.
* **B4: keychain cleanup with `if: always()`.** GitHub-hosted runners are wiped between jobs, but this is defense in depth and protects future self-hosted workflows. The cleanup step deletes the keychain and removes any leftover `cert.p12`, even when codesign or smoke failed earlier.

Also folded in two codex r1 NITs while I was here:

* `trap` now catches INT/TERM (Ctrl-C in interactive runs no longer leaks the tmpfile).
* The workflow now triggers on `pull_request` with a paths filter for the sidecar scripts/workflow, so we catch breakage pre-tag.

Verified `bash -n scripts/build-sidecar.sh` clean.
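A minimal sketch of the cleanup pattern this NIT describes: one cleanup function on `EXIT`, plus `INT`/`TERM` handlers that simply exit so the `EXIT` trap runs. Listing the signals explicitly avoids depending on how a given shell treats `EXIT` traps on signal death, which differs between shells. `TMPFILE` is an illustrative name, not necessarily the script's variable.

```shell
#!/bin/sh
# Minimal sketch of tmpfile cleanup that also survives Ctrl-C / kill.
# TMPFILE is an illustrative name, not necessarily the script's variable.
TMPFILE=$(mktemp)

cleanup() {
    rm -f "$TMPFILE"
}

# EXIT covers normal completion and set -e aborts. The INT/TERM handlers
# just exit with the conventional 128+N status, which fires the EXIT trap.
trap cleanup EXIT
trap 'exit 130' INT    # 128 + SIGINT (2)
trap 'exit 143' TERM   # 128 + SIGTERM (15)
```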
First CI run on GitHub-hosted macos-15 (run 27472544784) produced 51 Mach-Os, not the 77 measured in the Phase 2 spike on my M3 Ultra. The local 77 almost certainly included build-time artifacts that the strip step removes on a fresh runner; 51 is the authoritative "what actually ships" number, so it becomes the new baseline.

Also widen tolerance from 3 to 5 (proportionally similar sensitivity at the smaller baseline), and dump the sorted Mach-O list to stderr when drift fires, so the operator can diff against the previous run without re-executing the build locally.

The new floor guard (codex r1 B2) correctly let 51 through (above the floor of 38 = 77/2), and the drift check pinpointed the discrepancy. This shows both layers of the count-protection logic work as designed.
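The two-layer count protection (floor first, then drift) can be sketched as one shell function. This is a hypothetical reconstruction, not the script's code: exit code 2 for drift comes from the PR, but the PR does not say which code the floor guard uses, so returning 1 (the "generic" code) is an assumption.

```shell
#!/bin/sh
# Hypothetical sketch of the two-layer Mach-O count guard.
# Returns 0 OK, 2 drift (the PR's drift exit code), and 1 below floor
# (assumption: the PR does not state the floor guard's exit code).
check_macho_count() {
    count=$1 baseline=$2 tolerance=$3
    floor=$((baseline / 2))            # integer division: 77 -> 38

    # Layer 1: far too few Mach-Os means a broken install, not drift.
    if [ "$count" -lt "$floor" ]; then
        echo "only $count Mach-Os (floor $floor): check pip logs, do NOT bump baseline" >&2
        return 1
    fi

    # Layer 2: outside baseline +/- tolerance means the dependency set changed.
    if [ "$count" -gt "$baseline" ]; then
        diff=$((count - baseline))
    else
        diff=$((baseline - count))
    fi
    if [ "$diff" -gt "$tolerance" ]; then
        echo "Mach-O drift: $count vs baseline $baseline (+/-$tolerance)" >&2
        return 2
    fi
    return 0
}
```

With the spike numbers, `check_macho_count 51 77 3` clears the floor of 38 and returns 2, which matches the sequence described above; after re-baselining, `check_macho_count 51 51 5` returns 0.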
…olation

Round 2 codex review caught two new BLOCKING items, both stemming from the `pull_request` trigger I added in the r1 fixes:

* **B5: artifact upload gated on signed builds only.** The PR-trigger build runs against the PR branch's code, which on a fork could be malicious. Uploading a downloadable `rapid-mlx-sidecar.tar.gz` named exactly like the real release asset would be a supply-chain confusion hazard even when unsigned: the bundle's entitlements plist ships the JIT and library-validation-disabled keys that the host Rapid.app honors at dlopen time. PR runs now validate the build path without publishing an artifact.
* **B6: Release step adds a `pull_request` veto.** Today codesign secrets are never injected into fork PR runs, so the codesign gate alone blocks Release. But a future maintainer adding environment-protected secrets could silently enable Release on PR runs. An explicit `pull_request != true` belt-and-suspenders check makes the invariant impossible to bypass by accident.

Folded in two r2 NITs while here:

* **N1: smoke runs with `HOME=$(mktemp -d)`** so the mlx Metal JIT cache doesn't pollute the developer's real `~/Library/Caches/mlx` during interactive runs. Trap extended to clean it up.
* **N4: `MACHO_FLOOR` clamps to `BASELINE-2` for very small baselines** so a test override of e.g. 5 doesn't produce a useless floor of 2.

Verified `bash -n` clean.
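N1 amounts to running the smoke commands with `HOME` pointed at a throwaway directory that a trap removes. A minimal sketch, assuming (as the commit states) that mlx derives its cache location from `HOME`; `SMOKE_HOME` and `with_smoke_home` are illustrative names.

```shell
#!/bin/sh
# Sketch: give the smoke run a throwaway HOME so cache writes (per the
# commit, mlx's Metal JIT cache under ~/Library/Caches/mlx) never touch
# the developer's real home. SMOKE_HOME is an illustrative name.
SMOKE_HOME=$(mktemp -d)
trap 'rm -rf "$SMOKE_HOME"' EXIT

# Run one command with HOME overridden; the caller's HOME is untouched.
with_smoke_home() {
    HOME="$SMOKE_HOME" "$@"
}
```

Usage would look like `with_smoke_home "$STAGE/bin/rapid-mlx" --version`. The prefix assignment scopes the override to that single command.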
GitHub-hosted macos-15 runners are virtualized and may not expose a working Metal device. The old smoke wrapped import + eval in one suppressed call, so when `mx.eval` failed we couldn't tell whether the bundle was actually broken or it was just the runner environment. Split into two stages:

* **Hard:** `import mlx.core` must succeed. If this fails, the bundle is genuinely broken; exit 3.
* **Soft on CI / hard locally:** `mx.eval(mx.zeros((4,4)))`. On CI (`$CI` is set), a failure logs the actual error and continues; the real Metal exercise happens in rapid-desktop's post-notary smoke (Phase 5) on a notarized Mac. Locally, a failure still aborts, so a developer running `scripts/build-sidecar.sh` on their M3 catches regressions.

Also surface the actual error output instead of redirecting it to `/dev/null`, so future failures point at the real cause.
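The hard/soft split can be sketched as two shell functions. The interpreter command is passed as arguments, so the functions can be exercised without mlx installed; the function names are hypothetical. The sketch captures the failure status inside the `else` of an `if`, which keeps the soft-skip branch reachable under `set -e`.

```shell
#!/bin/sh
# Hypothetical sketch of the hard/soft smoke split. "$@" is the interpreter
# command (e.g. the bundled python with the shim's env vars); exit code 3
# is the PR's "smoke failure" code.

# Stage 1 (always hard): the bundle must be able to import mlx.core.
smoke_import() {
    if ! out=$("$@" -c 'import mlx.core' 2>&1); then
        printf 'import mlx.core failed:\n%s\n' "$out" >&2
        return 3
    fi
}

# Stage 2 (soft when $CI is set, hard otherwise): exercise Metal.
smoke_metal() {
    if out=$("$@" -c 'import mlx.core as mx; mx.eval(mx.zeros((4, 4)))' 2>&1); then
        echo "Metal eval OK"
    else
        rc=$?
        printf 'Metal eval failed (rc=%s):\n%s\n' "$rc" "$out" >&2   # surface the real error
        if [ -n "${CI:-}" ]; then
            echo "CI: continuing; Metal is exercised again post-notarization" >&2
            return 0
        fi
        return 3
    fi
}
```

A call against the bundle might look like `smoke_import env PYTHONHOME="$STAGE/python" PYTHONPATH="$STAGE/site-packages" PYTHONNOUSERSITE=1 "$STAGE/python/bin/python3.12"`.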
The previous split-smoke commit invoked `$STAGE/python/bin/python3.12` directly without setting the env vars the shim normally sets. Because the install uses `pip install --target site-packages/`, the bundled python's default module search path doesn't include `site-packages`, so `import mlx` fails with `ModuleNotFoundError` on a perfectly healthy bundle. The `rapid-mlx --version` part of the smoke worked because it routes through `bin/rapid-mlx` (the shim), which sets `PYTHONHOME` + `PYTHONPATH` + `PYTHONNOUSERSITE` before exec'ing python.

Add the same three env vars to both the import-only and Metal-eval smoke commands. This makes the smoke test exercise what the *bundle* does at runtime, not what a raw python invocation does. This was a smoke-test bug, not a bundling bug: PR #568 still bundles mlx correctly; we just weren't verifying it correctly.
The CI invocation passes `--out build/sidecar-stage` (a path relative to the workflow's working directory). The script then runs `( cd "$OUT_DIR" && tar -czf "$TARBALL" rapid-mlx )`. Inside that subshell, `$TARBALL` still holds the relative path `build/sidecar-stage/rapid-mlx-sidecar.tar.gz`, so tar tries to create the archive at `build/sidecar-stage/build/sidecar-stage/rapid-mlx-sidecar.tar.gz`, whose parent directory doesn't exist, and exits with "Failed to open".

Smoke now passes on CI (mlx import and Metal JIT both OK on macos-15 runners), so this was the last gate. Resolve `OUT_DIR` and `STAGE` to absolute paths via `cd && pwd` right after `mkdir`, so every downstream reference is portable across cwd changes.
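The failure and the fix can be reproduced in a few lines of POSIX sh. The function names are hypothetical; `package_stage` mirrors the `cd`-then-`tar` shape quoted above, and `absolutize_dir` also includes the empty-path check that the round 3 commit adds as codex N5.

```shell
#!/bin/sh
# Sketch of the fix: absolutize OUT_DIR right after mkdir so a later `cd`
# in a subshell cannot change what the path refers to.
absolutize_dir() {
    mkdir -p "$1" || return 1
    abs=$(cd "$1" && pwd) || return 1
    # Defensive: never let an empty result reach rm -rf "$abs/...".
    if [ -z "$abs" ]; then
        echo "could not absolutize '$1'" >&2
        return 1
    fi
    printf '%s\n' "$abs"
}

# Same packaging shape as the script: cd into OUT_DIR, then tar the stage.
# Only correct when $1 is absolute; a relative $1 reproduces the CI bug.
package_stage() {
    out_dir=$1
    tarball="$out_dir/rapid-mlx-sidecar.tar.gz"
    ( cd "$out_dir" && tar -czf "$tarball" rapid-mlx )
}
```

Calling `package_stage build/sidecar-stage` fails because the archive path is resolved from inside the stage directory. Calling `package_stage "$(absolutize_dir build/sidecar-stage)"` writes the tarball where intended.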
Round 3 codex review caught that the supposed "soft-fail Metal on CI" logic added in commit 0daf9fc never actually executed. Under `set -euo pipefail`, the pattern `METAL_OUT="$(... Metal eval ...)"` followed by `METAL_RC=$?` aborts the script the moment the command substitution returns non-zero: `set -e` triggers BEFORE the `$?` capture runs, so the `elif` soft-skip branch was unreachable.

CI is currently green only because Metal eval *succeeds* on the macos-15 runners. As soon as a future mlx wheel breaks Metal under GHA's virtualized macOS, the script would hard-fail instead of warning and continuing, the exact opposite of what the commit message promised.

Fix: restructure as `if METAL_OUT="$(...)"; then ...; else METAL_RC=$?; fi`. Running the assignment as an `if` condition exempts it from `set -e` on both the success and failure paths, so the soft-skip `elif` runs as documented.

Also folded in codex r3 N5 (defensive `OUT_DIR` validation): after `OUT_DIR="$(cd "$OUT_DIR" && pwd)"`, if absolutization ever silently produced an empty path (impossible for realistic failure modes, but the blast radius would be `rm -rf "/rapid-mlx"` on line 124, a catastrophic-rm class bug), we now bail out with an explicit error.

Verified `bash -n` clean.
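A minimal reproduction of the `set -e` pitfall, reusing the commit's variable names. Here `(exit 7)` stands in for a failing Metal eval, and each snippet runs in a fresh `sh` so the caller's own errexit state cannot mask the behavior.

```shell
#!/bin/sh
# Minimal reproduction of the set -e pitfall. `(exit 7)` stands in for a
# failing Metal eval; each snippet runs in a fresh `sh` so the caller's own
# errexit state cannot mask the behavior.

# Buggy: the failing assignment aborts the shell; METAL_RC is never set.
BUGGY='set -e
METAL_OUT=$(echo partial; exit 7)
METAL_RC=$?
echo "soft-skip: rc=$METAL_RC"'

# Fixed: as an `if` condition the assignment is exempt from set -e, and $?
# at the top of the else branch is the substitution's exit status.
FIXED='set -e
if METAL_OUT=$(echo partial; exit 7); then
    echo "ok: $METAL_OUT"
else
    METAL_RC=$?
    echo "soft-skip: rc=$METAL_RC out=$METAL_OUT"
fi'

buggy_output() { sh -c "$BUGGY"; }
fixed_output() { sh -c "$FIXED"; }
```

`buggy_output` prints nothing (the shell exits with status 7 at the assignment), while `fixed_output` reaches the soft-skip branch with both the status and the captured output intact.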
## Summary
Codifies the Phase 2 spike recipe into a reproducible artifact factory. Output: a signed `tar.gz` uploaded to GitHub Releases as the asset rapid-desktop's release workflow pulls into `Rapid.app/Contents/Resources/rapid-mlx/`.
Spike report (validated 2026-06-13 on M3 Ultra): 184 MB compressed DMG, 77 Mach-Os to sign, Metal JIT works, mlx-lm dynamic loading works. Tool chosen: `astral-sh/python-build-standalone` 3.12.13 tag 20260610.
## What ships

## Mach-O baseline

Baseline = 77 (spike measurement). Tolerance = ±3 (small wheel drift OK). Bigger drift fails the build with exit code 2, forcing a re-validation spike before the baseline is lifted.
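A count baseline only means something if "Mach-O" is detected the same way every run. One way to do that is to check each file's first four bytes against the magic numbers from `<mach-o/loader.h>` and `<mach-o/fat.h>`, as sketched below. This is a hypothetical sketch: the PR does not say how `build-sidecar.sh` enumerates Mach-Os (`file` would be another option), and the function names are made up.

```shell
#!/bin/sh
# Hypothetical sketch: decide "is this a Mach-O?" from the first four bytes.
# Not necessarily how build-sidecar.sh enumerates them.
is_macho() {
    [ -f "$1" ] || return 1
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    case $magic in
        cffaedfe|cefaedfe) return 0 ;;  # thin 64/32-bit, little-endian (arm64, x86_64)
        feedfacf|feedface) return 0 ;;  # thin 64/32-bit, big-endian byte order
        cafebabe|cafebabf) return 0 ;;  # universal ("fat"); Java .class files share cafebabe
        *) return 1 ;;
    esac
}

# Sorted list of Mach-O paths under a directory (stable for diffing runs).
list_machos() {
    find "$1" -type f | while IFS= read -r f; do
        if is_macho "$f"; then
            printf '%s\n' "$f"
        fi
    done | LC_ALL=C sort
}

# Count helper; tr strips the padding that BSD wc adds.
count_machos() {
    list_machos "$1" | wc -l | tr -d ' '
}
```

Sorting with `LC_ALL=C` keeps the list byte-ordered regardless of the runner's locale, so two runs' lists can be diffed directly.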
## Required secrets (rapid-mlx repo)

Without these, the workflow still runs but skips codesigning and the release upload. PRs from forks complete cleanly.
## Test plan

## Post-merge follow-ups (require secrets configuration)
These can't run inside this PR because the codesigning secrets aren't yet configured on raullenchai/Rapid-MLX. Tracked here for the sidecar rollout sweep:
## Out of scope (next phases)