perf(ci): Cache OpenTofu providers + Control-Plane node_modules in spin-up #644
Add two cache steps that target the two biggest steady-state
download costs in the spin-up workflow:
* tofu providers (~50 MB across hcloud, cloudflare, random, tls):
  saves 30-45s on cache hit; keyed on providers.tf + the lockfile
  so a real version bump invalidates cleanly.
* control-plane/node_modules: saves 15-20s on cache hit; keyed on
  package-lock.json (which is committed).
Combined steady-state saving: ~45-65 seconds per spin-up run. This
roughly matches the Parts B + C estimate from the speedup plan; on a
typical ~7-minute run it brings the workflow under 6 minutes in
steady state.
What this PR does NOT do:
* Part A (deploy.sh CONFIG_JOBS parallelisation): the script was
replaced by `python -m nexus_deploy run-pipeline` in Phase 4c
(#505). The Python services-configure phase already dispatches
every service hook in one SSH round-trip, so the bash-level
fan-out the issue described is now a different shape entirely.
Would be a much larger refactor in services.py / orchestrator.py
and is out of scope for a cache-only change.
* Part D (parallel CP redeploy): closer reading of the workflow
shows the redeploy step transitively depends on the Deploy-stacks
step — Store-credentials reads /tmp/infisical-token written by
the Python pipeline, then Redeploy uses Store-credentials' output.
The "redeploy depends only on tofu output" claim in the issue body
misses this hop. Parallelising would require restructuring the
infisical-token transfer path, which is its own piece of work.
Caches start empty on first deploy after this lands; the saving
applies from the second deploy onward (and every deploy after, since
provider versions and package-lock.json are stable for weeks at a
time).
Addresses #489 (Parts B + C only).
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/spin-up.yml:
- Around line 142-152: The workflow step using the cache action is pinned to a
floating tag (`uses: actions/cache@v4`); replace that with the commit SHA for
the latest stable v4.x release (use the same commit SHA used for the OpenTofu
cache step for consistency) so the step that declares `uses: actions/cache@v4`
is changed to `uses: actions/cache@<commit-sha>` and update any other
`actions/cache` occurrences likewise; keep the rest of the step (path, key,
restore-keys, and comments) unchanged.
- Around line 126-141: Replace the floating tag in both cache steps that
currently use "uses: actions/cache@v4" with the pinned commit SHA "uses:
actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830" (optionally add a
comment like "# v4.3.0") so both occurrences (the step named "Cache OpenTofu
providers" and the other cache step that also references actions/cache@v4) are
pinned to the v4.x commit SHA.
Both cache steps added in the previous commit used the floating `actions/cache@v4` tag, left over from when this PR was originally drafted, before the SHA-pinning convention was established by PR #634 (release-please-action). Pinning to commit `0057852bfaa89a56745cba8c7296529d2fc39830` (= v4.3.0, the latest v4.x stable as of 2025-09-24) closes the floating-tag supply-chain gap for these two specific uses. Wider cleanup of the remaining ~20 floating-tag action uses across the rest of the workflows is the scope of a separate (planned) audit-finding-H45 follow-up PR and is out of scope here.
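For illustration, the change to each of the two steps amounts to roughly the following. The trailing version comment is the optional annotation suggested in the review, and the surrounding step fields are omitted here.

```yaml
# Before: floating tag, resolves to whatever commit the v4 tag points at when the job runs.
# uses: actions/cache@v4

# After: immutable commit SHA, annotated with the release it corresponds to.
uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0
```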
Summary
Adds two `actions/cache@v4` steps in `spin-up.yml` for the two biggest steady-state download costs:

| Cached paths | Invalidated when |
|---|---|
| `tofu/{stack,control-plane}/.terraform` + `~/.terraform.d/plugin-cache` | `providers.tf` or `.terraform.lock.hcl` changes |
| `control-plane/node_modules` | `package-lock.json` changes |

Combined steady-state saving: ~45-65 seconds per spin-up run, which brings a typical ~7-minute run under 6 minutes from the second deploy onward.
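A minimal sketch of what the two steps might look like in `.github/workflows/spin-up.yml`. The step names, cached paths, and pinned `actions/cache` commit come from this PR; the exact `key` strings, the `restore-keys` prefixes, and the `hashFiles` globs are assumptions for illustration, not the merged workflow.

```yaml
# Sketch only: key names and hashFiles globs are assumed, not copied from the PR.
- name: Cache OpenTofu providers
  uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0
  with:
    path: |
      tofu/stack/.terraform
      tofu/control-plane/.terraform
      ~/.terraform.d/plugin-cache
    # A provider version bump changes providers.tf or the lockfile,
    # which changes the hash, so the old cache entry no longer matches exactly.
    key: tofu-providers-${{ runner.os }}-${{ hashFiles('tofu/**/providers.tf', 'tofu/**/.terraform.lock.hcl') }}
    restore-keys: |
      tofu-providers-${{ runner.os }}-

- name: Cache Control-Plane node_modules
  uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0
  with:
    path: control-plane/node_modules
    # Assumes the lockfile lives at control-plane/package-lock.json.
    key: cp-node-modules-${{ runner.os }}-${{ hashFiles('control-plane/package-lock.json') }}
    restore-keys: |
      cp-node-modules-${{ runner.os }}-
```

Keying on a hash of the files that pin versions means the cache is reused for as long as those files are unchanged, which is what makes the saving apply to every run after the first.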
Partially addresses #489, specifically Parts B and C of the speedup plan there.
What this PR does NOT do
* Part A (`deploy.sh` `CONFIG_JOBS` parallelisation): `scripts/deploy.sh` was replaced by `python -m nexus_deploy run-pipeline` in Phase 4c (#505). The Python `services-configure` phase already batches every service hook into one SSH round-trip, so the bash-level fan-out described in the issue is a different shape entirely now. Doing it on the Python side would be a much bigger refactor of `services.py` / `orchestrator.py` and is out of scope for a cache-only change.
* Part D (parallel CP redeploy): the redeploy step transitively depends on `Deploy stacks`. `Store credentials` (the predecessor of Redeploy) reads `/tmp/infisical-token`, which is written by the Python pipeline during `Deploy stacks`. The issue body's claim that "redeploy depends only on tofu output" misses this hop. Parallelising would require restructuring how the infisical token is transferred from the server to the runner, which is its own piece of work.

Doing only Parts B and C here keeps the PR risk-free: the cache-add is purely additive, so the first run is identical to today and subsequent runs are faster.
Test plan
* The `Cache OpenTofu providers` and `Cache Control-Plane node_modules` steps report "Cache restored from key", and the subsequent `tofu init` + `npm ci` steps complete noticeably faster.
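One way to make this check easier to read off the job log is to print the `cache-hit` output that `actions/cache` sets. The sketch below assumes the cache step has been given a hypothetical `id: tofu-cache`, which is not part of this PR.

```yaml
# Assumes the "Cache OpenTofu providers" step declares `id: tofu-cache` (hypothetical).
- name: Report OpenTofu cache outcome
  run: echo "OpenTofu provider cache exact hit: ${{ steps.tofu-cache.outputs.cache-hit }}"
```

Note that `cache-hit` is `true` only for an exact match on `key`, so a restore that falls back to a `restore-keys` prefix is not reported as a hit.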