fix(orchestrator): retry transient GitHub API failures; drop fragile pre-check #427
Merged
…pre-check

A real `--with-demos` run aborted modelviewer_demo deterministically with "versions.json pins modelviewer_demo to 'v0.7.0', but gh release view … failed — bump the pin", even though that exact `gh release view` succeeds when run by hand and the release and asset plainly exist.

Root cause: the install loop fires a burst of authenticated gh calls plus large downloads (runtime 60 MB, gauss 30 MB), GitHub throttles mid-burst (one run showed a literal `write: broken pipe`), and the one non-resilient call (a redundant strict `gh release view` pre-check) hard-aborts the component and masks the real error behind a "bump the pin" message.

- Add gh_retry(): 4 attempts with exponential backoff (2/4/8 s) around gh calls, surfacing the final attempt's real stderr so a genuine 404 is distinguishable from a transient 403 or broken pipe.
- Remove the redundant strict release-view pre-check (the asset probe already tolerates transient failure, and the download is the real gate).
- Retry `gh release download` (with --clobber for clean re-attempts); on final failure print the actual gh error and whether it's transient (re-run) or a 404 (bump the pin).

Validated: bash -n clean; gh_retry unit-tested (success/flaky/always-fail); `--with-demos --dry-run` downloads all four components (runtime + 3 demos, incl. modelviewer) and routes each to install.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
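For illustration, a wrapper with the behavior described above (4 attempts, 2/4/8 s backoff, stdout suppressed, only the final attempt's stderr surfaced) might look roughly like the sketch below. This is not the PR's actual `gh_retry`; the body is an assumption reconstructed from the description, and `GH_RETRY_BASE_DELAY` is a hypothetical knob added here so the sleeps can be shortened in tests.

```shell
#!/usr/bin/env bash
# Sketch only: an assumed shape for a retry wrapper, not the PR's actual gh_retry.
# GH_RETRY_BASE_DELAY is a hypothetical knob (not in the PR) so tests can skip the sleeps.

gh_retry() {
  local max_attempts=4
  local delay="${GH_RETRY_BASE_DELAY:-2}"
  local attempt=1
  local rc=0
  local errfile
  errfile="$(mktemp)" || return 1

  while [ "$attempt" -le "$max_attempts" ]; do
    # Discard stdout; keep only this attempt's stderr (overwriting earlier attempts).
    if "$@" >/dev/null 2>"$errfile"; then
      rm -f "$errfile"
      return 0
    else
      rc=$?
    fi
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
      delay=$((delay * 2))   # 2 -> 4 -> 8 with the default base
    fi
    attempt=$((attempt + 1))
  done

  # All attempts failed: surface the last attempt's real stderr and exit code.
  cat "$errfile" >&2
  rm -f "$errfile"
  return "$rc"
}

# Example call (commented out because it needs gh and network access):
# gh_retry gh release download "$tag" --repo "$repo" --pattern "$asset" --dir "$dest" --clobber
```

Writing stderr to a temp file that each attempt overwrites is one way to ensure that only the last failure is reported; earlier transient errors are dropped rather than accumulated.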
Symptom

A real macOS `./scripts/setup-displayxr.sh --with-demos` aborted `modelviewer_demo` deterministically (twice) with:

…

yet that exact `gh release view` returns exit 0 when run by hand, the release + `.pkg` plainly exist, and there's no `GH_TOKEN` env override. gauss installed fine; one run also showed a literal `write: broken pipe` on the mediaplayer API call.

Root cause

The install loop fires a burst of authenticated `gh` calls + large downloads (runtime 60 MB, gauss 30 MB). GitHub throttles mid-burst (secondary rate limit / dropped connection), and the one non-resilient call, a redundant strict `gh release view` pre-check, hard-aborts the component and masks the real error behind a misleading "bump the pin." (The rest of `install_component` already tolerates transient failure by design; see the existing broken-pipe comment on the asset probe. This pre-check predated that.)

Fix

- Add `gh_retry()`: 4 attempts, exponential backoff (2/4/8 s), around `gh` calls; surfaces the final attempt's real stderr so a genuine 404 is distinguishable from a transient 403/broken-pipe.
- Remove the redundant strict `gh release view` pre-check: the asset probe already tolerates transient failure, and the retried download is the real gate.
- Retry `gh release download` (`--clobber` for clean re-attempts); on final failure print the actual gh error + whether it's transient (re-run) vs 404 (bump the pin).

Validation

- `bash -n` clean.
- `gh_retry` unit-tested: success suppresses stdout / returns 0; flaky-then-OK returns 0 after retries; always-fail returns the real rc and prints the real stderr (`HTTP 403: secondary rate limit`).
- `--with-demos --dry-run` downloads all four components (runtime + gauss + modelviewer + mediaplayer) and routes each to install.

Follow-up to #339 / #426 (the `components.sh` modelviewer macOS entry).

🤖 Generated with Claude Code
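The final-failure reporting described under Fix (print the real gh error, then say whether to re-run or bump the pin) could be sketched along the following lines. The function names `classify_gh_failure` and `report_download_failure` are hypothetical, and the matched substrings are assumptions about what gh's stderr contains; the PR's actual matching logic is not shown here.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, and the matched strings are
# assumptions about gh's error text, not taken from the PR.

classify_gh_failure() {
  # $1: captured stderr of the final failed attempt
  case "$1" in
    *"HTTP 404"*|*"release not found"*)
      echo "not-found" ;;
    *"HTTP 403"*|*"rate limit"*|*"broken pipe"*|*"connection reset"*)
      echo "transient" ;;
    *)
      echo "unknown" ;;
  esac
}

report_download_failure() {
  # $1: component name, $2: pinned tag, $3: captured stderr
  local component="$1"
  local tag="$2"
  local err="$3"
  # Always show the real error first, so the hint never masks it.
  echo "download of $component ($tag) failed: $err" >&2
  case "$(classify_gh_failure "$err")" in
    not-found) echo "hint: release or asset missing; bump the pin in versions.json" >&2 ;;
    transient) echo "hint: looks transient; re-run the installer" >&2 ;;
    *)         echo "hint: unrecognized error; inspect the message above" >&2 ;;
  esac
}
```

Printing the raw error before any hint keeps the original failure visible even when the classification guesses wrong, which is the failure mode the removed pre-check suffered from.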