## Summary
The Daily Docs workflow (run #733) failed during the agent job because the GitHub Copilot CLI installer could not download from GitHub Releases: all 4 retry attempts received HTTP 500 errors from GitHub's servers.
## Failure Details
- Run: 23201056596
- Commit: 207f55e
- Trigger: push (dependabot merge via merge queue; vsce npm deps bump)
- Failed Job: `agent` (job ID 67423368905)
## Root Cause Analysis
Category: Infrastructure (transient GitHub Releases HTTP 500)
The `install_copilot_cli.sh` script downloads a checksums file from:
```
https://github.com/github/copilot-cli/releases/latest/download/SHA256SUMS.txt
```
All 4 retry attempts failed with:
```
curl: (22) The requested URL returned error: 500
##[error]Process completed with exit code 22.
```
The retries ran at ~5-second intervals (15:07:19 → 15:07:21 → 15:07:26 → 15:07:31 → 15:07:36), indicating that GitHub's release asset CDN was experiencing a transient outage during the ~16-second window the job ran.
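The installer's retry loop isn't shown in the logs, but the observed cadence can be sketched as a fixed-interval loop. This is a hypothetical reconstruction, not the installer's actual code: `fetch` stands in for its curl call, and the 1-second delay is shortened from the ~5s the timestamps suggest.

```shell
#!/bin/sh
# Hypothetical reconstruction of the observed pattern: one initial attempt
# plus 4 retries at a fixed short interval, then give up.
# fetch() stands in for the installer's curl call; here it always fails
# with 22 (curl's exit code for an HTTP error under --fail), the way the
# CDN did during the outage window.
fetch() { return 22; }

max_attempts=5   # 1 initial try + 4 retries, matching the log timestamps
delay=1          # the log suggests ~5s; shortened here
attempt=1
ok=0
while [ "$attempt" -le "$max_attempts" ]; do
  if fetch; then
    ok=1
    break
  fi
  echo "attempt $attempt failed (curl exit 22)"
  [ "$attempt" -lt "$max_attempts" ] && sleep "$delay"
  attempt=$((attempt + 1))
done
[ "$ok" -eq 1 ] || echo "all $max_attempts attempts failed; giving up"
```

With all attempts packed into a ~16-second window, any CDN outage longer than that is fatal regardless of how many retries run inside it.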
This is not a code bug: the commit that triggered the run was a routine dependabot bump of vsce TypeScript ESLint dependencies and is unrelated to the failure.
## Failed Jobs and Errors
| Job | Conclusion | Root Cause |
|---|---|---|
| `agent` | ❌ failure | `curl: (22)` HTTP 500 from github.com/github/copilot-cli/releases |
All other jobs (pre_activation, activation, update_cache_memory, safe_outputs, conclusion) completed successfully.
## Investigation Findings
- The Copilot CLI installer retried 4 times with ~5s delays before giving up
- No MCP gateway logs were produced (agent never started)
- No agent output/patch was generated
- This is a transient GitHub infrastructure issue, not a recurring pattern in this repo
## Recommended Actions
- No code changes needed; this is a transient upstream failure
- Re-run the workflow if the Daily Docs update is time-sensitive
- Consider whether the Copilot CLI installer should back off more aggressively (e.g., 30–60s retry intervals) to survive brief CDN hiccups
## Prevention Strategies
The current 4-retry, ~5s-interval strategy is insufficient to survive GitHub CDN outages that can last 10–30 seconds. Increasing retry intervals (e.g., exponential backoff up to 60s) would improve resilience against brief 500 errors.
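As a back-of-the-envelope check on that suggestion (assuming a 5s initial delay that doubles per retry and caps at 60s; these numbers are illustrative, not a proposed patch), the cumulative wait across the same 4 retries grows from ~20s to 75s:

```shell
#!/bin/sh
# Sum the waits for a doubling backoff schedule: 5s, 10s, 20s, 40s.
# (The 60s cap would only kick in from the 5th retry onward.)
delay=5
cap=60
total=0
for retry in 1 2 3 4; do
  total=$((total + delay))
  echo "retry $retry after waiting ${delay}s (total waited: ${total}s)"
  delay=$((delay * 2))
  [ "$delay" -gt "$cap" ] && delay="$cap"
done
echo "4 retries span ${total}s of waiting, vs ~20s with fixed 5s intervals"
```

A 75-second retry window comfortably covers the 10–30-second outages described above, at the cost of a slower failure when the outage is genuinely long.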
## AI Team Self-Improvement
### Transient GitHub Infrastructure Failures
When investigating a CI failure where the agent job never started (no MCP logs, no agent output, no patch), immediately check for infrastructure failures in the install steps (e.g., curl HTTP 500). These are transient upstream issues; do NOT file code bugs or change application code in response. Simply note the transient failure and recommend a re-run. Only escalate to a code fix if the same infrastructure error pattern recurs on multiple consecutive runs.
## Historical Context
No prior issues matched this specific pattern (HTTP 500 during `install_copilot_cli.sh`). The closest prior issue (#2563) was a different failure mode (Copilot CLI installed successfully but exited silently during detection).
To install this agentic workflow, run:
```
gh aw add githubnext/agentics/workflows/ci-doctor.md@1ef9dbe65e8265b57fe2ffa76098457cf3ae2b32
```