feat(releases): retry fetchReleases on transient failures with backoff #111
Closed
tmchow wants to merge 1 commit into opentofu:main
fetchReleases() calls get.opentofu.org/tofu/api.json once and throws on any failure -- network error, 429 rate-limit, or 5xx. The issue (opentofu#110) reports that this surfaces as flaky action runs whenever the API hiccups.

Wrap the fetch in an exponential-backoff retry loop (500ms, 1s, 2s, capped at 8s, with up to 250ms jitter to avoid concurrent runners re-syncing). Retry on:
- Network-level errors thrown from http.get
- HTTP 408 / 429 / 500 / 502 / 503 / 504

Fail fast on permanent errors (other 4xx) and on malformed JSON -- retrying wouldn't help those.

The options object (maxAttempts, sleepFn, backoffFn) is test-only so the retry harness can drive deterministic scenarios without real sleeps. Default behavior matches the maintainer's "accepted" direction in the issue thread.

Fixes opentofu#110

Signed-off-by: Trevin Chow <trevin@trevinchow.com>
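The backoff schedule and the retryable status codes described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the names `backoffDelayMs` and `isTransientStatus`, the constants, and the injectable `randomFn` are all hypothetical, and applying the 8s cap before adding jitter is an assumption, since the description does not say which comes first.

```javascript
// Hypothetical constants; the values mirror the schedule in the description.
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 8000;
const MAX_JITTER_MS = 250;

// Status codes the description lists as retryable.
const TRANSIENT_STATUS = new Set([408, 429, 500, 502, 503, 504]);

// attempt is zero-based: 0 -> 500ms, 1 -> 1s, 2 -> 2s, ... capped at 8s,
// plus a random jitter below 250ms.
// randomFn is injectable so tests can pin the jitter (an assumption of this
// sketch, not something the PR states).
function backoffDelayMs(attempt, randomFn = Math.random) {
  const exponential = Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
  const jitter = Math.floor(randomFn() * MAX_JITTER_MS);
  return exponential + jitter;
}

// True for the status codes treated as transient; every other code,
// including other 4xx responses, is treated as permanent.
function isTransientStatus(statusCode) {
  return TRANSIENT_STATUS.has(statusCode);
}
```

Keeping the jitter small relative to the base delay preserves the overall schedule while still spreading out runners that failed at the same moment.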
d1083de to 7013747
Member
Thanks for your contribution @tmchow, but per our policy, we don't accept LLM-generated PRs: https://github.com/opentofu/opentofu/blob/main/contributing/DEVELOPING.md#a-note-on-copyright
Summary
fetchReleases() in lib/releases.js made a single GET to https://get.opentofu.org/tofu/api.json and threw on any failure -- network error, HTTP 429, or any 5xx. Action runs would fail whenever the releases endpoint hiccuped. Wrap the fetch in an exponential-backoff retry loop so transient failures self-heal. Addresses #110.

Why
Per the issue thread, @diofeher marked this accepted and said:

Today the fetch path has three failure modes, all treated as permanent:
- http.get(...) throws (network reset, DNS, TLS) -> rethrown with no retry.
- resp.readBody() throws mid-stream -> rethrown with no retry.

On shared CI, all three are frequently transient. A single retry with jitter would paper over the vast majority without changing steady-state behavior for callers.
Changes
Only lib/releases.js grows logic; interfaces are unchanged.
- _sleep(ms) -- promisified setTimeout.
- _backoffDelayMs(attempt) -- exponential (500ms, 1s, 2s, ...) capped at 8s with up to 250ms jitter so concurrent runners don't resync on retry boundaries.
- fetchReleases now loops up to maxAttempts (default 4):
  - http.get -> sleep + retry until exhausted.
  - readBody throws -> sleep + retry (treated as transient stream failure).
  - JSON.parse throws -> fail fast (retrying won't help malformed JSON).
- maxAttempts, sleepFn, backoffFn are second-arg options and default to the production values. Tests inject a no-op sleepFn to keep the suite instant.
- fetchReleases added to the named exports so tests can drive it directly.

The retry path is gated on status codes that the HTTP spec and GitHub's guidance call out as transient. Every other 4xx still fails on the first attempt; callers see the same error as before.

Testing
New lib/test/releases-fetch.test.js covers:
- attempt values (0, 1).
- http.get -> retry -> success.
- maxAttempts=3 exhausted on three 503s -> throws last error, sleeps 2 times (between attempts 1-2 and 2-3).

Existing suites untouched: npm test reports 5 passed, 5 total / 41 tests. npm run build produces dist/bundle; committed alongside the source change since this action ships prebuilt.

Fixes #110
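For illustration, the retry loop and its test-only options might look roughly like the sketch below. It is not the PR's implementation: `fetchWithRetry` and `httpGet` are hypothetical names, and `httpGet` is assumed to resolve to `{ statusCode, body }` with the body already read, so a mid-stream read failure shows up as a thrown error just like a network failure. Only the option names (`maxAttempts`, `sleepFn`, `backoffFn`), the default of 4 attempts, and the retryable status codes come from the description.

```javascript
// Status codes the description treats as transient.
const TRANSIENT = new Set([408, 429, 500, 502, 503, 504]);

// httpGet is a hypothetical stand-in for the action's HTTP client.
// It is assumed to resolve to { statusCode, body } and to throw on
// network-level or body-read failures.
async function fetchWithRetry(httpGet, url, options = {}) {
  const {
    maxAttempts = 4,
    sleepFn = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
    backoffFn = (attempt) =>
      Math.min(500 * 2 ** attempt, 8000) + Math.floor(Math.random() * 250),
  } = options;

  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Sleep only between attempts, never before the first or after the last.
    if (attempt > 0) await sleepFn(backoffFn(attempt - 1));

    let response;
    try {
      response = await httpGet(url);
    } catch (err) {
      lastError = err; // thrown by the client: treated as transient
      continue;
    }

    if (response.statusCode === 200) {
      // Malformed JSON throws here and is deliberately not retried.
      return JSON.parse(response.body);
    }

    lastError = new Error(`request failed with status ${response.statusCode}`);
    if (!TRANSIENT.has(response.statusCode)) throw lastError; // permanent: fail fast
  }
  throw lastError;
}
```

Under these assumptions, a test can pass a recording `sleepFn` and a stub `httpGet` that always returns 503; with `maxAttempts: 3` the stub is called three times and `sleepFn` twice, which matches the exhaustion case listed above.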