
feat(zebra): gate resilient git checkout behind git_clone_slow_retry feature flag#1046

Merged

skipi merged 4 commits into main from skipi/zebra/git-clone-slow-retry-flag on May 29, 2026


@skipi (Collaborator) commented May 29, 2026

What

Adds a per-organization feature flag, git_clone_slow_retry, that, when enabled, injects SEMAPHORE_GIT_CLONE_SLOW_RETRY=true into the job environment built by JobRequestFactory.

This is the producer side of the toolbox checkout-resiliency work (semaphoreci/toolbox#538 + follow-up semaphoreci/toolbox#539). The toolbox checkout reads SEMAPHORE_GIT_CLONE_SLOW_RETRY to opt into slow-clone detection and resilient retry (speed monitoring, retries, alternative-endpoint fallback); when the variable is absent, checkout behaves exactly as before. The rollout is therefore: flip the flag for an org → zebra injects the var → toolbox enables the behavior. The flag defaults to off, so behavior is unchanged until it is turned on.

How

Mirrors the existing TestResults / :test_results_no_trim pattern exactly:

  • New Zebra.Workers.JobRequestFactory.GitCheckout module with env_vars/1, gated on FeatureProvider.feature_enabled?(:git_clone_slow_retry, param: org_id).
  • Appended to the env-var list in JobRequestFactory alongside TestResults.env_vars(org_id).
  • Registered in the test StubbedProvider (hidden by default, enabled for a dedicated test org id).

This deliberately does not thread org_id through Repository.*, since that would churn the exact-match repository tests for no benefit. Only the on/off switch is injected; the toolbox owns the tuning defaults (threshold / timeout / grace / retries).
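For reviewers less familiar with the TestResults pattern, here is a minimal sketch of the shape described above. It is not the merged code. `FakeFeatureProvider`, `GitCheckoutSketch`, `EnvAssemblySketch`, the injectable `provider` argument, and the `%{"name" => ..., "value" => ...}` env-var shape are all hypothetical stand-ins; the real module calls `FeatureProvider.feature_enabled?/2` directly, and the job request may encode values differently.

```elixir
# Hypothetical stand-in for zebra's FeatureProvider. Like the test
# StubbedProvider, the flag is off by default and on for one dedicated org.
defmodule FakeFeatureProvider do
  @enabled [{:git_clone_slow_retry, "org-with-slow-retry"}]

  def feature_enabled?(feature, opts) do
    org_id = Keyword.fetch!(opts, :param)
    {feature, org_id} in @enabled
  end
end

defmodule GitCheckoutSketch do
  @env_var "SEMAPHORE_GIT_CLONE_SLOW_RETRY"

  # Returns one env var when the org has :git_clone_slow_retry enabled,
  # otherwise an empty list. The provider argument is an assumption that
  # keeps the sketch self-contained and testable.
  def env_vars(org_id, provider \\ FakeFeatureProvider) do
    if provider.feature_enabled?(:git_clone_slow_retry, param: org_id) do
      # Assumed env-var representation; the real one may differ.
      [%{"name" => @env_var, "value" => "true"}]
    else
      []
    end
  end
end

# Hypothetical assembly step: the gated list is simply appended to the
# job's env vars, the way the PR appends it next to TestResults.env_vars/1.
defmodule EnvAssemblySketch do
  def job_env_vars(base_env, org_id) do
    base_env ++ GitCheckoutSketch.env_vars(org_id)
  end
end
```

Because the function always returns a list, possibly empty, the call site stays a plain `++` with no conditional, and orgs without the flag get exactly the env they had before.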

Tests

zebra/test/zebra/workers/job_request_factory/git_checkout_test.exs covers both cases: feature disabled → []; feature enabled → SEMAPHORE_GIT_CLONE_SLOW_RETRY=true.

Follow-up (not in this PR)

The git_clone_slow_retry feature must be registered in the feature management backend before it can be toggled per-org in production.

🤖 Generated with Claude Code

…feature

Injects SEMAPHORE_GIT_CLONE_SLOW_RETRY=true into the job environment when
the :git_clone_slow_retry feature is enabled for the organization. This is
the producer side of the toolbox checkout-resiliency work
(semaphoreci/toolbox#538, semaphoreci/toolbox#539): the toolbox `checkout`
reads this env var to opt into slow-clone detection and resilient retry,
and is a no-op when the var is absent.

Mirrors the existing TestResults / :test_results_no_trim pattern: a small
feature-gated module appended to the job env var list in
JobRequestFactory, rather than threading org_id through Repository.* (which
would churn the exact-match repository tests). Only the on/off switch is
injected; the toolbox keeps sensible defaults for the tuning knobs.

The feature still needs registering in the feature management backend
before it can be toggled per-org in production.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread zebra/lib/zebra/workers/job_request_factory/git_checkout.ex
skipi and others added 2 commits May 29, 2026 10:13
…etry

feature_provider_invalidator_worker_test asserts the length of the full
StubbedProvider feature list; adding :git_clone_slow_retry took it from 8
to 9.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per review: the resilient checkout (GeoDNS alternative-endpoint fallback,
DoH lookups) targets GitHub.com reachability from Semaphore's cloud egress.
On self-hosted agents the network is the customer's own and the DoH
endpoint may be blocked, so gate the injection on cloud agents
(not Job.self_hosted?/1) in addition to the feature flag.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
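The combined condition from this last commit can be sketched as a pure decision function. `SlowRetryGateSketch`, `inject?/2`, and the env-var map shape are hypothetical; in zebra the self-hosted input would come from `Job.self_hosted?/1` and the flag input from the feature provider, whereas here both are plain booleans so the four cases are easy to see.

```elixir
# Hypothetical sketch: inject the variable only when the job runs on a
# cloud agent AND the org has :git_clone_slow_retry enabled.
defmodule SlowRetryGateSketch do
  @env_var "SEMAPHORE_GIT_CLONE_SLOW_RETRY"

  def inject?(self_hosted, flag_enabled)
      when is_boolean(self_hosted) and is_boolean(flag_enabled) do
    not self_hosted and flag_enabled
  end

  def env_vars(self_hosted, flag_enabled) do
    if inject?(self_hosted, flag_enabled) do
      # Assumed env-var representation, as in the earlier sketch.
      [%{"name" => @env_var, "value" => "true"}]
    else
      []
    end
  end
end
```

Under this gate a self-hosted job gets no variable even when its org has the flag, so the toolbox checkout keeps its default behavior on customer networks where the DoH endpoint may be blocked.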
@dexyk dexyk self-requested a review May 29, 2026 09:37
@skipi skipi merged commit 5c7f687 into main May 29, 2026
2 checks passed
@skipi skipi deleted the skipi/zebra/git-clone-slow-retry-flag branch May 29, 2026 09:46

Labels

None yet

Projects

Status: Backlog


2 participants