build(compat): make Dockerfile.local go mod download resilient to proxy.golang.org HTTP/2 flakes #709
Merged
…xy.golang.org HTTP/2 flakes

The three compatibility harnesses (prom/loki/tempo) all build cerberus from Dockerfile.local on every CI run. The `RUN go mod download` step has no retry logic and no module cache mount, so a single transient `proxy.golang.org` HTTP/2 `stream error ... INTERNAL_ERROR; received from peer` mid-stream takes the whole compat job down with it.

Observed on PR #708 / run 26306912141, compatibility/loki job 77445902857:

`go: github.com/grpc-ecosystem/grpc-gateway/v2@v2.29.0: read "https://proxy.golang.org/.../v2.29.0.zip": stream error; INTERNAL_ERROR; received from peer`

The mandate is no-retry-rerun: fix the underlying fragility instead of bandaiding.

Two structural changes to Dockerfile.local:

1. Wrap `go mod download` in a 5-attempt retry loop with linear backoff (3/6/9/12s). The Go module resolver does not retry past a bad HTTP/2 frame, so the wrapper is needed at the shell layer.
2. Add BuildKit `--mount=type=cache` for /go/pkg/mod and /root/.cache/go-build (sharing=locked because the three compat harnesses build this Dockerfile in parallel on the same runner). With warm caches, subsequent builds no longer need the proxy for modules that are already cached, so the proxy hit surface narrows to first-build only.

This is a fix to a flake class, not a single point; the same outage would have hit prom or tempo if the unlucky frame had landed there first.
Summary
PR #708's `compatibility/loki` job (run 26306912141, job 77445902857) failed before the harness ever ran:

`go: github.com/grpc-ecosystem/grpc-gateway/v2@v2.29.0: read "https://proxy.golang.org/.../v2.29.0.zip": stream error; INTERNAL_ERROR; received from peer`

This was not a seed-settle failure (prior fixes #66 / #123 / #136 covered that). The cerberus image build inside the compat compose stack tripped on a transient `proxy.golang.org` HTTP/2 stream error during `RUN go mod download` in `Dockerfile.local`. The Go module resolver does not retry past a bad HTTP/2 frame, so the error fails the whole compat job.
All three compatibility harnesses (prom/loki/tempo) build cerberus from this same Dockerfile via their `docker-compose.yml`. The fix is structural and applies to every head; loki was just the unlucky one this run.
Fix
Two changes to `Dockerfile.local`:

1. Retry loop around `go mod download`: 5 attempts with linear backoff (3/6/9/12s). Surfaces the failure only if all 5 attempts fail.
2. BuildKit `--mount=type=cache` for `/go/pkg/mod` and `/root/.cache/go-build` (`sharing=locked` because the three compat harnesses build this Dockerfile in parallel on the same runner). Warm runners skip the proxy entirely on subsequent builds, so the first-build surface is the only one a future flake can hit.
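
For illustration, the two changes might look roughly like the fragment below. This is a sketch, not the actual diff: the base image tag, stage name, `WORKDIR`, `COPY` lines, and log messages are assumptions. Only the cache targets, `sharing=locked`, the attempt count, and the 3/6/9/12s backoff come from the description above.

```dockerfile
# syntax=docker/dockerfile:1
# Sketch only: base image, stage name, and paths below are assumptions.
FROM golang:1 AS build
WORKDIR /src
COPY go.mod go.sum ./

# Cache mounts keep the module cache and build cache across builds.
# sharing=locked makes concurrent builds take turns on the same cache.
# The loop retries up to 5 times, sleeping 3/6/9/12s between attempts,
# and fails the build only if every attempt fails.
RUN --mount=type=cache,target=/go/pkg/mod,sharing=locked \
    --mount=type=cache,target=/root/.cache/go-build,sharing=locked \
    for attempt in 1 2 3 4 5; do \
        go mod download && exit 0; \
        [ "$attempt" -eq 5 ] && break; \
        delay=$((attempt * 3)); \
        echo "go mod download failed (attempt $attempt/5); retrying in ${delay}s" >&2; \
        sleep "$delay"; \
    done; \
    echo "go mod download failed after 5 attempts" >&2; \
    exit 1
```

Cache mounts require BuildKit, and the mounted directories are not part of the resulting image layer, so any later `go build` step that needs the modules would mount the same caches.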
Mandate compliance: no timeout bump, no rerun-and-pray. The retry is
inside the build at the network layer (where the flake actually is),
not inside the harness at the seed-settle layer.
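
The retry policy itself can be exercised outside the image build. The helper below is a hypothetical, minimal POSIX shell sketch of linear-backoff retry; `retry_linear` and its argument order are illustrative names, not something taken from the PR.

```shell
#!/bin/sh
# retry_linear ATTEMPTS STEP CMD [ARGS...]
# Hypothetical helper: runs CMD up to ATTEMPTS times. After the n-th
# failure it sleeps n*STEP seconds, so STEP=3 and ATTEMPTS=5 gives the
# 3/6/9/12s schedule. Returns 0 on the first success and 1 if every
# attempt fails.
retry_linear() {
    max=$1
    step=$2
    shift 2
    n=1
    while :; do
        "$@" && return 0
        [ "$n" -ge "$max" ] && return 1
        sleep $((n * step))
        n=$((n + 1))
    done
}
```

A call such as `retry_linear 5 3 go mod download` would match the schedule described in the Fix section; passing a step of `0` makes the helper quick to test with a deliberately flaky command.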
Test plan
- `compatibility/loki` green
- `compatibility/prometheus` green (same Dockerfile, confirms no regression)
- `compatibility/tempo` green (ditto)
- `compose-smoke` green (also builds Dockerfile.local indirectly)