What went wrong?
Each earthly invocation hashes its buildkit settings (buildkitd/settings.go). On mismatch with the running daemon's dev.earthly.settingshash label, maybeRestart (buildkitd/buildkitd.go:326) stops and recreates the shared earthly-buildkitd daemon, cancelling every in-flight build on the host. There is no check for active builds, and the log names no culprit, only "Settings do not match. Restarting buildkit daemon with updated settings...".
Field incident (2026-06-12, midnightntwrk/midnight-node#1693): two bot workflows lacked the tls_enabled: false config their CI siblings used, which caused 4 daemon recreations in 45s on a shared runner and killed 3 unrelated jobs per trigger (unlazy force execution: Canceled: context canceled). Prior art: earthly#900 (closed as "known behavior", 2021).
What should have happened?
Heterogeneous settings on one host shouldn't cancel neighbours' builds.
Preferred: per-settings-hash daemon instances. Name, volume, and port are already namespaceable via the hidden --installation-name flag, so the instance can be derived from the settings hash; the old daemon is reaped when idle. Cost: one cache per variant.
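A minimal sketch of deriving the instance from the settings hash. instanceName is a hypothetical helper, not an existing earthly function, and the assumption that the daemon container is named <installation-name>-buildkitd should be checked against the code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// instanceName derives an installation name from a settings hash.
// Hypothetical helper: neither the function nor the naming scheme exists in earthly.
func instanceName(base, settingsHash string) string {
	sum := sha256.Sum256([]byte(settingsHash))
	// 8 hex characters keep container and volume names short.
	return base + "-" + hex.EncodeToString(sum[:4])
}

func main() {
	// The input strings stand in for real settings hashes.
	tlsOff := instanceName("earthly", "hash-of-settings-with-tls-disabled")
	tlsOn := instanceName("earthly", "hash-of-settings-with-tls-enabled")
	// Assumed: the daemon container is named <installation-name>-buildkitd.
	fmt.Println(tlsOff+"-buildkitd", tlsOn+"-buildkitd")
}
```

Invocations with identical settings land on the same daemon, while a differing config gets its own daemon instead of restarting the shared one.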
Alternative: block-on-drain. The restart waits for in-flight builds to finish (shared/exclusive flock; StartUpLockPath is a precedent), with writer-intent to avoid starvation. Not preferred: it serializes the runner under config flip-flop.
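A rough sketch of the shared/exclusive flock idea, Unix-only and using the standard syscall.Flock. beginBuild, tryBeginRestart, and the lock path are hypothetical names. Writer-intent is left out, and the restart side uses a non-blocking attempt here only to keep the demo from hanging; a real drain would block.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// lockFile opens path (creating it if needed) and applies flock in the given mode.
// Closing the returned file releases the lock.
func lockFile(path string, how int) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return nil, err
	}
	if err := syscall.Flock(int(f.Fd()), how); err != nil {
		f.Close()
		return nil, err
	}
	return f, nil
}

// beginBuild takes a shared lock; any number of builds may hold it at once.
func beginBuild(path string) (*os.File, error) {
	return lockFile(path, syscall.LOCK_SH)
}

// tryBeginRestart takes the exclusive lock without blocking.
// It fails while any build still holds the shared lock.
func tryBeginRestart(path string) (*os.File, error) {
	return lockFile(path, syscall.LOCK_EX|syscall.LOCK_NB)
}

func main() {
	path := filepath.Join(os.TempDir(), "buildkitd-drain-demo.lock")

	build, err := beginBuild(path)
	if err != nil {
		panic(err)
	}
	if _, err := tryBeginRestart(path); err != nil {
		fmt.Println("restart deferred, build in flight:", err)
	}

	build.Close() // build finished, shared lock released
	restart, err := tryBeginRestart(path)
	if err != nil {
		panic(err)
	}
	defer restart.Close()
	fmt.Println("no builds in flight, safe to restart")
}
```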
Either way: name the differing fields in the restart log (e.g. UseTLS daemon=false requested=true) by storing per-field hashes as labels.
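One way the per-field labels could look, as a sketch only. The Settings struct here is a cut-down stand-in for the real one in buildkitd/settings.go, and the dev.earthly.settingshash.<Field> label namespace is an assumption. Hashes identify which fields differ; printing the daemon-side value as well would need the value stored, not just its hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"reflect"
	"sort"
	"strings"
)

// Settings is a hypothetical stand-in for the struct in buildkitd/settings.go.
type Settings struct {
	UseTLS      bool
	CacheSizeMb int
	Debug       bool
}

// labelPrefix is an assumed label namespace, not an existing earthly label.
const labelPrefix = "dev.earthly.settingshash."

// fieldHashLabels returns one label per struct field, holding a short hash of its value.
func fieldHashLabels(s Settings) map[string]string {
	labels := make(map[string]string)
	v := reflect.ValueOf(s)
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		sum := sha256.Sum256([]byte(fmt.Sprintf("%v", v.Field(i).Interface())))
		labels[labelPrefix+t.Field(i).Name] = hex.EncodeToString(sum[:8])
	}
	return labels
}

// differingFields lists, in sorted order, the fields whose hashes differ.
func differingFields(daemon, requested map[string]string) []string {
	var out []string
	for k, want := range requested {
		if daemon[k] != want {
			out = append(out, strings.TrimPrefix(k, labelPrefix))
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	daemon := fieldHashLabels(Settings{UseTLS: false, CacheSizeMb: 10240})
	requested := fieldHashLabels(Settings{UseTLS: true, CacheSizeMb: 10240})
	fmt.Println("settings differ in:", differingFields(daemon, requested))
	// prints: settings differ in: [UseTLS]
}
```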
Done when:
… (--force-buildkit-restart or equivalent)
Note: earthly#4090 (registry port bind with --buildkit-container-name) must be fixed for the multi-daemon route.
What earthly version?
main (c21f828)