feat: surface BuildKit root-cause on cancellation #574
Signed-off-by: Giles Cope <gilescope@gmail.com>
➖ Are we earthbuild yet?
No change in "earthly" occurrences.
📈 Overall Progress
Keep up the great work migrating from Earthly to Earthbuild! 🚀
💡 Tips for finding more occurrences
Run locally to see detailed breakdown: ./.github/scripts/count-earthly.sh
Note that the goal is not to reach 0.
Code Review
This pull request enhances error reporting and cancellation handling in Earthly by tracking the first fatal vertex failure, first cancellation, and active operations during a build. It introduces detailed error types (FirstFailureError, FirstCancellationError, and CancellationDetailsError) and reports the origin of cancellations. Additionally, it improves the robustness of the stats stream parser by allowing it to recover from malformed frames instead of failing the build. The review feedback suggests avoiding if statements with initializers in builder/solver.go to comply with Go linting preferences.
Depends on #572 (CI memory telemetry): land this after #572 merges. It calls `statsstreamparser.Parser.Reset()`, introduced there. This branch bundles `parser.go`/`parser_test.go` so it builds standalone; once #572 is in `main`, those two files drop out as identical (no conflict).

Extracted from #442 (the buildkit upgrade) so the failure-visibility work can land independently of the bump.
What
Surfaces the real root cause of a failed build instead of a bare `context canceled`. When BuildKit cancels or loses the solve session after a vertex has already failed, earth now reports the original failing target/command and BuildKit error, and distinguishes client-side from daemon-side cancellation.

- `logbus/solvermon/first_failure.go`: captures the first fatal BuildKit vertex failure (scrubbed), preserved across the cancellation fan-out.
- `logbus/solvermon/{solvermon,vertexmon}.go`: record per-vertex failures + logs; reset the stats parser on a desynced stream.
- `cmd/earthly/app/run.go`: `printCancellationOrigin` reports whether cancellation began locally (signal / dead build context) or in BuildKit/the session layer; wires `AsFirstFailureError` into the fatal-error path.
- `builder/solver.go`: `withBuildkitFailureContext` attaches target/command context to the returned error.

Deliberately excluded (stays in #442)
The buildkit-API entitlements change in `builder/solver.go`/`image_solver.go` (`AllowedEntitlements: s.enttlmnts` → `entitlementsToStrings(...)` `[]string`); it won't compile against current buildkit.

Verification
- `go build ./...`: green.
- `go test ./logbus/solvermon/ ./builder/`: green.