fix(orchestrator): retry transient GitHub API failures; drop fragile pre-check #427

dfattal merged 1 commit into main from fix/orchestrator-gh-retry on Jun 5, 2026

dfattal (Collaborator) commented Jun 4, 2026

Symptom

A real macOS run of ./scripts/setup-displayxr.sh --with-demos aborted modelviewer_demo deterministically (twice) with:

ERROR versions.json pins modelviewer_demo to 'v0.7.0', but
ERROR   gh release view v0.7.0 --repo DisplayXR/displayxr-demo-modelviewer
ERROR failed. Bump the pin in versions.json, or verify the repo is accessible.

…yet that exact gh release view returns exit 0 when run by hand, the release + .pkg plainly exist, and there's no GH_TOKEN env override. gauss installed fine; one run also showed a literal write: broken pipe on the mediaplayer API call.

Root cause

The install loop fires a burst of authenticated gh calls + large downloads (runtime 60 MB, gauss 30 MB). GitHub throttles mid-burst (secondary rate limit / dropped connection), and the one non-resilient call — a redundant strict gh release view pre-check — hard-aborts the component and masks the real error behind a misleading "bump the pin." (The rest of install_component already tolerates transient failure by design — see the existing broken-pipe comment on the asset probe — this pre-check predated that.)

Fix

  • gh_retry() — 4 attempts, exponential backoff (2/4/8 s), around gh calls; surfaces the final attempt's real stderr so a genuine 404 is distinguishable from a transient 403/broken-pipe.
  • Removed the redundant strict gh release view pre-check — the asset probe already tolerates transient failure, and the retried download is the real gate.
  • Retry gh release download (--clobber for clean re-attempts); on final failure print the actual gh error + whether it's transient (re-run) vs 404 (bump the pin).
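
For readers who have not opened the diff, the retry wrapper and the transient-vs-404 hint described in the bullets above might look roughly like the sketch below. It is not the merged code: the argument convention (the full command is passed in), the GH_RETRY_ATTEMPTS and GH_RETRY_BASE_DELAY overrides, and the gh_error_kind helper with its grep pattern are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only; the real gh_retry in the orchestrator may differ.

# gh_retry CMD [ARGS...]
# Runs CMD up to 4 times with exponential backoff (2/4/8 s by default).
# stdout is discarded; only the final attempt's stderr is printed.
# GH_RETRY_ATTEMPTS / GH_RETRY_BASE_DELAY are hypothetical test overrides.
gh_retry() {
  local max="${GH_RETRY_ATTEMPTS:-4}" delay="${GH_RETRY_BASE_DELAY:-2}"
  local attempt rc=0 errfile
  errfile="$(mktemp)" || return 1
  for ((attempt = 1; attempt <= max; attempt++)); do
    rc=0
    # Each attempt overwrites errfile, so only the last error survives.
    "$@" >/dev/null 2>"$errfile" || rc=$?
    if ((rc == 0)); then
      rm -f "$errfile"
      return 0
    fi
    if ((attempt < max)); then
      sleep "$delay"
      delay=$((delay * 2))
    fi
  done
  cat "$errfile" >&2   # surface the real error from the final attempt
  rm -f "$errfile"
  return "$rc"
}

# gh_error_kind TEXT
# Heuristic (assumed, not from the PR): treat "404" / "not found" as a bad
# pin, everything else (403, broken pipe, ...) as transient.
gh_error_kind() {
  if printf '%s\n' "$1" | grep -qiE '404|not found'; then
    echo not_found
  else
    echo transient
  fi
}

# Example (tag, repo, and dest are placeholders):
#   err="$(gh_retry gh release download "$tag" --repo "$repo" --dir "$dest" --clobber 2>&1)" \
#     || echo "download failed ($(gh_error_kind "$err")): $err" >&2
```

Capturing stderr to a temporary file rather than a variable keeps stdout and stderr separate without a subshell, which also works on the bash 3.2 that ships with macOS.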

Validation

  • bash -n clean.
  • gh_retry unit-tested: success suppresses stdout / returns 0; flaky-then-OK returns 0 after retries; always-fail returns the real rc and prints the real stderr (HTTP 403: secondary rate limit).
  • --with-demos --dry-run downloads all four components (runtime + gauss + modelviewer + mediaplayer) and routes each to install.
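
The three gh_retry cases listed above can be exercised without touching the network by shadowing gh with a shell function. The harness below is a sketch under stated assumptions: retry_under_test is a stripped-down stand-in (no backoff) to be replaced by the real function, and the fake gh, check, and run_case helpers are hypothetical scaffolding, not part of this PR.

```shell
#!/usr/bin/env bash
# Hypothetical harness; swap retry_under_test for the real gh_retry.

# Minimal stand-in: 4 attempts, no sleep, prints only the last stderr.
retry_under_test() {
  local i rc=0 err
  err="$(mktemp)" || return 1
  for i in 1 2 3 4; do
    rc=0
    "$@" >/dev/null 2>"$err" || rc=$?
    ((rc == 0)) && break
  done
  ((rc == 0)) || cat "$err" >&2
  rm -f "$err"
  return "$rc"
}

# Fake gh: fails FAKE_GH_FAILS times, then succeeds and writes to stdout.
FAKE_GH_FAILS=0
fake_gh_calls=0
gh() {
  fake_gh_calls=$((fake_gh_calls + 1))
  if ((fake_gh_calls <= FAKE_GH_FAILS)); then
    echo "HTTP 403: secondary rate limit" >&2
    return 1
  fi
  echo "release metadata"
}

check() {  # check <description> <expected> <actual>
  if [ "$2" = "$3" ]; then
    echo "PASS: $1"
  else
    echo "FAIL: $1 (want '$2', got '$3')"
    return 1
  fi
}

run_case() {  # run_case <failures-before-success>; sets case_rc/out/err
  FAKE_GH_FAILS=$1
  fake_gh_calls=0
  local errf
  errf="$(mktemp)" || return 1
  case_rc=0
  case_out="$(retry_under_test gh release view v0.7.0 2>"$errf")" || case_rc=$?
  case_err="$(cat "$errf")"
  rm -f "$errf"
}

run_tests() {
  local failed=0
  run_case 0
  check "success returns 0" 0 "$case_rc" || failed=1
  check "success suppresses stdout" "" "$case_out" || failed=1
  run_case 2
  check "flaky-then-OK returns 0" 0 "$case_rc" || failed=1
  run_case 99
  check "always-fail returns the real rc" 1 "$case_rc" || failed=1
  check "always-fail prints the real stderr" \
    "HTTP 403: secondary rate limit" "$case_err" || failed=1
  return "$failed"
}

run_tests
```

Because gh is a shell function here, it takes precedence over any gh binary on PATH for the duration of the script.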

Follow-up to #339 / #426 (the components.sh modelviewer macOS entry).

🤖 Generated with Claude Code

…pre-check

A real `--with-demos` run aborted modelviewer_demo deterministically with
"versions.json pins modelviewer_demo to 'v0.7.0', but gh release view … failed
— bump the pin", even though that exact `gh release view` succeeds when run by
hand and the release/asset plainly exist. Root cause: the install loop fires a
burst of authenticated gh calls + large downloads (runtime 60MB, gauss 30MB),
GitHub throttles mid-burst (one run showed a literal `write: broken pipe`), and
the one non-resilient call — a redundant strict `gh release view` pre-check —
hard-aborts the component and masks the real error behind a "bump the pin"
message.

- Add gh_retry(): 4 attempts with exponential backoff (2/4/8s) around gh calls,
  surfacing the final attempt's real stderr so a genuine 404 is distinguishable
  from a transient 403/broken-pipe.
- Remove the redundant strict release-view pre-check (the asset probe already
  tolerates transient failure, and the download is the real gate).
- Retry `gh release download` (with --clobber for clean re-attempts); on final
  failure print the actual gh error + whether it's transient (re-run) vs 404
  (bump the pin).

Validated: bash -n clean; gh_retry unit-tested (success/flaky/always-fail);
`--with-demos --dry-run` downloads all four components (runtime + 3 demos,
incl. modelviewer) and routes each to install.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@dfattal dfattal merged commit d426805 into main Jun 5, 2026
21 checks passed
@dfattal dfattal deleted the fix/orchestrator-gh-retry branch June 5, 2026 00:09