-
Notifications
You must be signed in to change notification settings - Fork 31
feat(checkout): harden slow-retry tier 2 against hangs and false positives #539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
skipi
wants to merge
1
commit into
master
Choose a base branch
from
skipi/github-resiliency-followups
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Some comments aren't visible on the classic Files Changed page.
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -381,6 +381,7 @@ function checkout::clone_with_speed_check() { | |
| # Returns: 0 = success, 1 = killed (slow), 2 = git error | ||
| local threshold="${SEMAPHORE_GIT_CLONE_SLOW_THRESHOLD:-20000}" | ||
| local timeout="${SEMAPHORE_GIT_CLONE_SLOW_TIMEOUT:-15}" | ||
| local grace="${SEMAPHORE_GIT_CLONE_SLOW_GRACE:-30}" | ||
| local check_interval=5 | ||
| local target_dir="${SEMAPHORE_GIT_DIR}" | ||
|
|
||
|
|
@@ -389,15 +390,26 @@ function checkout::clone_with_speed_check() { | |
|
|
||
| local prev_size=0 | ||
| local slow_seconds=0 | ||
| local elapsed=0 | ||
|
|
||
| while kill -0 "$pid" 2>/dev/null; do | ||
| sleep "$check_interval" | ||
| kill -0 "$pid" 2>/dev/null || break | ||
| elapsed=$((elapsed + check_interval)) | ||
|
|
||
| local cur_size | ||
| cur_size=$(du -sk "${target_dir}" 2>/dev/null | awk '{print $1}') | ||
| cur_size=${cur_size:-0} | ||
|
|
||
| # Don't arm slow-detection during the startup grace window. Server-side | ||
| # object counting/compression and connection setup can run for seconds | ||
| # with little or no on-disk growth before the receive phase ramps up; | ||
| # penalizing it would kill healthy clones of large repos. | ||
| if [ "$elapsed" -lt "$grace" ]; then | ||
| prev_size=$cur_size | ||
| continue | ||
| fi | ||
|
|
||
| if [ "$cur_size" -gt 0 ]; then | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @skipi Slow accounting still requires cur_size > 0, so pre-write stalls (DNS/TCP/TLS/auth) are never detected by this watchdog. Please add a no-progress timeout and a hard wall-clock timeout independent of directory size. |
||
| local speed=$(( (cur_size - prev_size) * 1024 / check_interval )) | ||
| prev_size=$cur_size | ||
|
|
@@ -429,9 +441,15 @@ function checkout::resolve_alt_ips() { | |
| local regions | ||
| IFS=',' read -ra regions <<< "${SEMAPHORE_GIT_CLONE_ALT_REGIONS:-74.0.0.0/8,177.0.0.0/8,110.0.0.0/8}" | ||
|
|
||
| local connect_timeout="${SEMAPHORE_GIT_CLONE_DOH_CONNECT_TIMEOUT:-5}" | ||
| local max_time="${SEMAPHORE_GIT_CLONE_DOH_MAX_TIME:-10}" | ||
|
|
||
| for region in "${regions[@]}"; do | ||
| local ip | ||
| ip=$(curl -sf "https://dns.google/resolve?name=${git_host}&type=A&edns_client_subnet=${region}" | \ | ||
| # --connect-timeout / --max-time keep tier 2 from hanging when dns.google | ||
| # is blocked or unreachable (common on locked-down self-hosted runners). | ||
| ip=$(curl -sf --connect-timeout "${connect_timeout}" --max-time "${max_time}" \ | ||
| "https://dns.google/resolve?name=${git_host}&type=A&edns_client_subnet=${region}" | \ | ||
| grep -o '"data":"[^"]*"' | sed 's/"data":"//;s/"//' | head -1) | ||
|
|
||
| if [ -n "$ip" ] && [ "$ip" != "$current_ip" ]; then | ||
|
|
@@ -451,7 +469,11 @@ function checkout::clone_with_alt_ip() { | |
| # HTTPS: inject curloptResolve between 'git' and 'clone' | ||
| checkout::clone_with_speed_check git -c "http.curloptResolve=${git_host}:${git_port}:${alt_ip}" "${@:2}" | ||
| else | ||
| # SSH: route through alternative IP via ProxyCommand | ||
| # SSH: route through alternative IP via ProxyCommand (needs nc) | ||
| if ! command -v nc >/dev/null 2>&1; then | ||
| echo "[checkout] 'nc' not available; cannot route SSH clone through alternative endpoint" | ||
| return 1 | ||
| fi | ||
| local orig_ssh_command="${GIT_SSH_COMMAND:-}" | ||
| export GIT_SSH_COMMAND="${orig_ssh_command:-ssh} -o ProxyCommand='nc ${alt_ip} ${git_port}'" | ||
| checkout::clone_with_speed_check "$@" | ||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@skipi Disk-size delta is not a reliable network-throughput signal. Healthy clone phases can have flat/negative growth and be killed as "slow". Grace only delays arming; it does not fix the signal. Please replace this with transport-level checks (e.g. git/curl low-speed controls for HTTPS) plus an absolute timeout.