perf(ci): Cache OpenTofu providers + Control-Plane node_modules in spin-up #644

Merged
stefanko-ch merged 2 commits into main from perf/spin-up-cache-and-parallel-redeploy
Jun 4, 2026

@stefanko-ch stefanko-ch commented Jun 4, 2026


Summary

Adds two actions/cache@v4 steps in spin-up.yml for the two biggest steady-state download costs:

| Cache | Path | Saves | Invalidated by |
| --- | --- | --- | --- |
| OpenTofu providers | tofu/{stack,control-plane}/.terraform + ~/.terraform.d/plugin-cache | 30-45s | providers.tf or .terraform.lock.hcl changes |
| Control-Plane node_modules | control-plane/node_modules | 15-20s | package-lock.json changes |

Combined steady-state saving: ~45-65 seconds per spin-up run — brings a typical ~7-minute run under 6 minutes from the second deploy onward.
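
For readers without the diff open, here is a minimal sketch of what two such steps could look like. The step names, paths, and hashed files follow this description and the review walkthrough; the key prefixes and the `restore-keys` value are illustrative assumptions, not the merged workflow.

```yaml
# Sketch only: key prefixes and restore-keys are assumed for illustration.
- name: Cache OpenTofu providers
  uses: actions/cache@v4
  with:
    # Per-stack working directories plus the shared plugin cache.
    path: |
      tofu/stack/.terraform
      tofu/control-plane/.terraform
      ~/.terraform.d/plugin-cache
    # A provider bump changes providers.tf or the lockfile, and therefore the key.
    key: tofu-providers-${{ runner.os }}-${{ hashFiles('tofu/**/providers.tf', 'tofu/**/.terraform.lock.hcl') }}
    restore-keys: |
      tofu-providers-${{ runner.os }}-

- name: Cache Control-Plane node_modules
  uses: actions/cache@v4
  with:
    path: control-plane/node_modules
    # package-lock.json is committed, so its hash identifies the dependency set.
    key: cp-node-modules-${{ runner.os }}-${{ hashFiles('control-plane/package-lock.json') }}
```

Note that OpenTofu only reads `~/.terraform.d/plugin-cache` when a plugin cache directory is configured (for example through `TF_PLUGIN_CACHE_DIR`); whether spin-up.yml sets this is not shown here.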

Partially addresses #489: specifically Parts B and C of the speedup plan there.

What this PR does NOT do

  • Part A (deploy.sh CONFIG_JOBS parallelisation): scripts/deploy.sh was replaced by python -m nexus_deploy run-pipeline in Phase 4c (#505, "feat(deploy): Migrate scripts/deploy.sh (5,539 lines bash) to Python package nexus_deploy with full test suite"). The Python services-configure phase already batches every service hook into one SSH round-trip, so the bash-level fan-out described in the issue no longer matches the current design. Parallelising on the Python side would be a much bigger refactor of services.py / orchestrator.py and is out of scope for a cache-only change.
  • Part D (parallel CP redeploy): a closer reading of the workflow shows the redeploy step transitively depends on Deploy stacks: Store credentials (the predecessor of Redeploy) reads /tmp/infisical-token, which the Python pipeline writes during Deploy stacks. The issue body's claim that "redeploy depends only on tofu output" misses this hop. Parallelising would require restructuring how the infisical token is transferred from the server to the runner, which is its own piece of work.

Doing only Parts B and C here keeps the PR risk-free (cache-add is purely additive — first run is identical to today; subsequent runs are faster).

Test plan

  • First run after merge: caches are empty, behaviour is identical to today (no regression risk).
  • Second run after merge: the Cache OpenTofu providers and Cache Control-Plane node_modules steps report "Cache restored from key", and the subsequent tofu init and npm ci steps complete noticeably faster.
  • When a real provider bump or package-lock.json change lands: cache key changes, fresh download, subsequent runs cache the new revision.
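
One way to make the second check visible in the run log is to give a cache step an `id` and print its `cache-hit` output, which `actions/cache` sets to `true` on an exact key match. The sketch below is an assumption-laden illustration: the `id`, the key prefix, the reporting step, and the gated install step are hypothetical and not part of this PR.

```yaml
# Sketch only: the id, key prefix, and both follow-up steps are hypothetical.
- name: Cache Control-Plane node_modules
  id: cp-node-modules-cache
  uses: actions/cache@v4
  with:
    path: control-plane/node_modules
    key: cp-node-modules-${{ hashFiles('control-plane/package-lock.json') }}

- name: Report cache status
  # Prints "true" when the exact key was restored.
  run: echo "node_modules exact cache hit: ${{ steps.cp-node-modules-cache.outputs.cache-hit }}"

- name: Install Control-Plane dependencies
  # Assumed gating: run the install only when the exact key was not restored.
  if: steps.cp-node-modules-cache.outputs.cache-hit != 'true'
  working-directory: control-plane
  run: npm ci
```

The gate is shown because `npm ci` removes an existing node_modules directory before installing, so a restored directory is only reused if the install step is skipped or replaced; how spin-up.yml handles this is not stated in the description.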

Summary by CodeRabbit

  • Chores
    • Improved CI spin-up performance by adding targeted caching of initialization artifacts and control-plane dependencies to significantly reduce setup time and speed up iterations.

…in-up

Two added cache steps that target the two biggest steady-state
download costs in the spin-up workflow:

  * tofu providers (~50 MB across hcloud, cloudflare, random, tls)
    saved 30-45s on cache hit, keyed on providers.tf + the lockfile
    so a real version bump invalidates cleanly.
  * control-plane/node_modules saved 15-20s on cache hit, keyed on
    package-lock.json (which is committed).

Combined steady-state saving: ~45-65 seconds per spin-up run. Roughly
matches the Parts B + C estimate from the speedup plan; on a typical
~7-minute run this brings the workflow under 6 minutes for steady-
state operators.

What this PR does NOT do:

  * Part A (deploy.sh CONFIG_JOBS parallelisation): the script was
    replaced by `python -m nexus_deploy run-pipeline` in Phase 4c
    (#505). The Python services-configure phase already dispatches
    every service hook in one SSH round-trip, so the bash-level
    fan-out the issue described is now a different shape entirely.
    Would be a much larger refactor in services.py / orchestrator.py
    and is out of scope for a cache-only change.

  * Part D (parallel CP redeploy): closer reading of the workflow
    shows the redeploy step transitively depends on the Deploy-stacks
    step — Store-credentials reads /tmp/infisical-token written by
    the Python pipeline, then Redeploy uses Store-credentials' output.
    The "redeploy depends only on tofu output" claim in the issue body
    misses this hop. Parallelising would require restructuring the
    infisical-token transfer path, which is its own piece of work.

Caches start empty on first deploy after this lands; the saving
applies from the second deploy onward (and every deploy after, since
provider versions and package-lock.json are stable for weeks at a
time).

Addresses #489 (Parts B + C only).

coderabbitai Bot commented Jun 4, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 0e721dbf-362b-4c20-be74-c527d691af9f

📥 Commits

Reviewing files that changed from the base of the PR and between d1e8dfa and 4dac68b.

📒 Files selected for processing (1)
  • .github/workflows/spin-up.yml

📝 Walkthrough


Adds two cache steps to .github/workflows/spin-up.yml (lines 126–152): one caches OpenTofu/Terraform directories and the Terraform plugin cache keyed by provider and lockfile hashes; the other caches control-plane/node_modules keyed by control-plane/package-lock.json.

Changes

CI Workflow Caching

| Layer / File(s) | Summary |
| --- | --- |
| Provider and dependency cache steps: .github/workflows/spin-up.yml | Adds two actions/cache@v4 steps (inserted at lines 126–152) that cache OpenTofu .terraform directories for tofu/stack and tofu/control-plane, ~/.terraform.d/plugin-cache (keyed by hashes of tofu/**/providers.tf and tofu/**/.terraform.lock.hcl), and control-plane/node_modules (keyed by hash of control-plane/package-lock.json). |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

  • #489: Matches the same caching updates for OpenTofu providers and Control Plane node_modules in the spin-up workflow.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and specifically describes the main changes: adding caching for OpenTofu providers and Control-Plane node_modules in the spin-up workflow. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/spin-up.yml:
- Around line 142-152: The workflow step using the cache action is pinned to a
floating tag (`uses: actions/cache@v4`); replace that with the commit SHA for
the latest stable v4.x release (use the same commit SHA used for the OpenTofu
cache step for consistency) so the step that declares `uses: actions/cache@v4`
is changed to `uses: actions/cache@<commit-sha>` and update any other
`actions/cache` occurrences likewise; keep the rest of the step (path, key,
restore-keys, and comments) unchanged.
- Around line 126-141: Replace the floating tag in both cache steps that
currently use "uses: actions/cache@v4" with the pinned commit SHA "uses:
actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830" (optionally add a
comment like "# v4.3.0") so both occurrences (the step named "Cache OpenTofu
providers" and the other cache step that also references actions/cache@v4) are
pinned to the v4.x commit SHA.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 71038720-c7a6-4b95-8c37-5fa187990cc8

📥 Commits

Reviewing files that changed from the base of the PR and between 7a57ef7 and d1e8dfa.

📒 Files selected for processing (1)
  • .github/workflows/spin-up.yml

Both cache steps added in the previous commit used the floating
`actions/cache@v4` tag, left over from when this PR was originally
drafted, before the SHA-pinning convention was established by PR #634
(release-please-action). Pinning to commit
`0057852bfaa89a56745cba8c7296529d2fc39830` (= v4.3.0, the latest
v4.x stable as of 2025-09-24) closes the floating-tag supply-chain
gap for these two specific uses.

Wider cleanup of the remaining ~20 floating-tag action uses across
the rest of the workflows is the scope of a separate (planned)
audit-finding-H45 follow-up PR — out of scope here.
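
As an illustration of the pinning described in this commit message, the fragment below contrasts the floating tag with the SHA-pinned form. The SHA and the version comment are the ones quoted in the review; the surrounding step is abbreviated and is a sketch rather than the merged file.

```yaml
# Before: a floating major tag, which can later point at a different commit.
# - name: Cache OpenTofu providers
#   uses: actions/cache@v4

# After: pinned to an immutable commit, with the release noted for readers.
- name: Cache OpenTofu providers
  uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0
  # The with: block (path, key, restore-keys) stays unchanged and is omitted here.
```

The trailing version comment keeps the pin readable, since the SHA alone does not say which release it corresponds to.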
@stefanko-ch stefanko-ch merged commit 396ff70 into main Jun 4, 2026
7 checks passed
@stefanko-ch stefanko-ch deleted the perf/spin-up-cache-and-parallel-redeploy branch June 4, 2026 15:18