ci(main): run Phase A (local gating) in parallel with Phase B (staging deploy) #191

Draft
deucalioncodes wants to merge 1 commit into main from cursor/ci-main-speedup-3a2a

@deucalioncodes
Member

What

Drop the needs: layered-e2e-local on bootstrap-infra-staging so Phase A and Phase B run in parallel.
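
A minimal sketch of the resulting job graph, to make the removed edge concrete. The job names are taken from the timing table in this PR, and `_bootstrap-infra.yml` and `_install-mundus.yml` are the reusable workflows mentioned below. Everything else is an assumption for illustration: the other file names are hypothetical, and so is the `needs:` ordering inside Phase B. This is not a copy of the real ci-main workflow.

```yaml
# Sketch only: reusable-workflow paths marked "hypothetical" and every `needs:`
# edge marked "assumed" are illustrative, not taken from the real workflow file.
jobs:
  layered-e2e-local:                       # Phase A: local gating
    uses: ./.github/workflows/_layered-e2e-local.yml    # hypothetical file name

  bootstrap-infra-staging:                 # Phase B: first staging job
    # needs: layered-e2e-local             # <- the edge this PR removes
    uses: ./.github/workflows/_bootstrap-infra.yml

  publish-artifacts-staging:
    needs: bootstrap-infra-staging         # assumed ordering
    uses: ./.github/workflows/_publish-artifacts.yml    # hypothetical file name

  install-mundus-staging:
    needs: publish-artifacts-staging       # assumed ordering
    uses: ./.github/workflows/_install-mundus.yml

  verify-mundus-staging:
    needs: install-mundus-staging          # assumed ordering
    uses: ./.github/workflows/_verify-mundus.yml        # hypothetical file name
```

Jobs without a `needs:` entry have no dependencies, so GitHub Actions starts `layered-e2e-local` and `bootstrap-infra-staging` at the same time, and the rest of Phase B follows its own chain.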

Why

The recent ~34m22s ci-main run breaks down as:

| Job | Duration |
| --- | --- |
| layered-e2e-local (Phase A) | 10m31s |
| bootstrap-infra-staging | 35s |
| publish-artifacts-staging | 8m34s |
| install-mundus-staging | 13m37s |
| verify-mundus-staging (parse + e2e) | ~55s |

Phase A and Phase B test different things (correctness against an ephemeral local replica vs. a real staging deploy), and waiting for Phase A before kicking off Phase B currently costs ~10 minutes of pure wall-clock with no signal benefit. If Phase A fails, the commit is still bad; the staging deploy will either succeed (in which case the bug isn't deploy-time) or surface its own failure independently.

Expected wall-clock after this change: ~24m, i.e. the max of Phase A's 10m31s and Phase B's ~24m (35s + 8m34s + 13m37s + ~55s ≈ 23m41s), compared to ~34m today. Phase B itself is unchanged.

The concurrency: ci-main group is unchanged so two staging deploys still never race each other.
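
For reference, a sketch of what the workflow-level setting could look like. The group name `ci-main` comes from this PR; the expanded `group:` form and the explicit `cancel-in-progress` line are assumptions about how the real file is written.

```yaml
# Sketch only: assumed form of the existing setting, which this PR does not touch.
concurrency:
  group: ci-main              # one run per group at a time, so staging deploys never overlap
  cancel-in-progress: false   # the default; an in-progress run is allowed to finish
```

With this setting a newer push to main waits for the in-progress run rather than racing it. As GitHub documents concurrency groups, an older run that is still pending in the same group is cancelled in favor of the newest one.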

Notes on rejected ideas

While analyzing this I considered three other optimizations and dropped them after discussion:

  • Per-realm matrix on _install-mundus.yml. All install messages go through the single realm_installer canister, so WASM installs serialize there anyway. Only the canister→canister extension installs would actually parallelize — net win ~4–5 min, not worth the YAML complexity for now.
  • Skip the dfx install in _bootstrap-infra.yml when all infra ids are pinned. The staging descriptor already pins all three, so stage 0 is metadata-only, but the conditional adds enough complexity that saving ~35s isn't worth it.
  • cache: pip / cache: npm on the reusable workflows. Explicitly rejected — caches have been a source of flakes in the past.

Risk

Low. The change only removes a needs: edge in the job graph; both phases already work standalone (e.g. layered-deploy-dominion.yml runs Phase B without Phase A).

…g deploy)

Phase B used to wait for layered-e2e-local to finish. The two phases
test different things — Phase A is correctness against an ephemeral
local replica, Phase B is the real staging deploy — so making them
sequential added ~10 min of pure wall-clock with no signal benefit.

The 'concurrency: ci-main' group still serializes main pushes so two
staging deploys never race.

Co-authored-by: Jose Perez <deucalioncodes@users.noreply.github.com>