run-sweep: gate full-sweep PRs behind a sequential canary by Oseltamivir · Pull Request #1503 · SemiAnalysisAI/InferenceX

Oseltamivir · 2026-05-18T17:44:10Z

Summary

When a PR carries full-sweep-enabled (and not evals-only), pick the lowest-conc single-node benchmark entry as a canary and run it before fanning out the full sweep. If the canary fails, the eight fan-out jobs are skipped to save cluster time on shared failures (bad image tag, removed CLI flag, etc.).

Design choices:

Canary candidacy is restricted to single_node['1k1k' | '8k1k'] and excludes entries with run-eval: true, so the canary is always a pure benchmark smoke test using the existing single-node template.
The canary entry is removed from the regular fan-out's matrix (via remaining-search-space-config) only when the canary actually succeeded. On canary skip / cancel / canary-select failure, the regular fan-out falls back to the full search-space-config so coverage is preserved.
The fan-out gate proceeds only on canary-sweep.result == 'success' || == 'skipped' (allowlist, not != 'failure'), so a cancelled canary also blocks fan-out — preventing the case where in-flight fan-out jobs continue after the user explicitly cancels the canary.
A new non-canary-full-sweep-enabled label runs the full sweep without the canary gate (escape hatch when the canary is flaky or unrepresentative). The three sweep labels are now mutually exclusive.
Non-full-sweep PRs, draft PRs, pushes to main, and the reuse path all behave identically to before via existing gates.

The aggregated results_bmk artifact picks up both the canary's row and the regular fan-out's rows via the existing bmk_* glob — each entry appears exactly once.

Docker-tag-monitor research block split out to #1538.

Test plan

Each test ran sequentially on this branch via PR synchronize events. gptoss-fp4-h100-vllm was used as the changelog target for T1/T2/T4 (h100 has shorter CI queue than b200/b300); kimik2.5-int4-h100-vllm (agentic-only) was used for T3.

T1 — Canary success → fan-out runs with dedup (run 26467367124)
- Result: canary-sweep ran the lowest-conc 1k1k entry (tp=2, conc=4) and succeeded. Downstream single-node-1k1k ran against remaining-search-space-config (the canary entry de-duped). Final tally: 41 success, 7 skipped, 0 failures.
- Catches: jq remove_one correctness and the (canary == success && remaining) || full matrix ternary.
T2 — Canary failure → fan-out skipped (run 26471353341)
- Setup: image override to vllm/vllm-openai:nonexistent-tag-T2-canary-failure-test.
- Result: canary-sweep.result == 'failure'; all 8 fan-out jobs (single-node 1k1k/8k1k, single-node agentic/eval, all multi-node variants) and collect-results correctly skipped via the success || skipped allowlist.
- Catches: the value-prop of this PR — gate actually blocks fan-out on shared failure.
T3 — No canary candidate → fan-out on full search space (run 26471597565)
- Setup: switched changelog target to kimik2.5-int4-h100-vllm (agentic-only config; no fixed-seq-len entries → empty single_node['1k1k'] and ['8k1k'] arrays).
- Result: canary-select succeeded but returned []; canary-sweep skipped via the != '' && != '[]' guard; agentic fan-out ran on the full search space (ternary fell through canary-sweep.result == 'success'); 1k1k/8k1k/eval/multi-node all correctly skipped because their matrices are empty for this config.
- Catches: the fallback branch of the ternary — broken matrices wouldn't silently skip single-node sweeps.
T4 — full-sweep-enabled + evals-only → canary skipped via label gate (run 26475589608)
- Result: canary-select skipped by its !contains(... 'evals-only') guard despite a valid canary candidate being available on gptoss-fp4-h100-vllm; canary-sweep skipped; fan-out matrices populated with the full search space.
- Catches: the evals-only carve-out, which neither T1/T2 (canary runs) nor T3 (canary has no candidate) exercise.
T5 — Push to main → canary never fires
- Method: workflow-logic inspection only (no live run).
- Expected: canary-select.if requires github.event_name == 'pull_request', so on a push event it evaluates false → canary-sweep skipped → every sweep-* job runs via the success || skipped allowlist on the full search-space-config. Identical to pre-PR push-to-main behavior.
T8 — non-canary-full-sweep-enabled label → canary skipped, full fan-out (run 26471525429)
- Setup: swap PR label from full-sweep-enabled to non-canary-full-sweep-enabled.
- Result: canary-select skipped (its gate requires exact-match on full-sweep-enabled); canary-sweep skipped; fan-out matrices populated with the full search space (no dedup since canary-sweep.result != 'success').
- Catches: the new label correctly takes the no-canary path AND the matrix ternary correctly falls through to search-space-config when canary is skipped.

+        needs: canary-select
+        if: ${{ needs.canary-select.outputs.canary-config != '' && needs.canary-select.outputs.canary-config != '[]' }}
+        uses: ./.github/workflows/benchmark-tmpl.yml
+        name: canary /
+        strategy:
+            fail-fast: false
+            matrix:
+                config: ${{ fromJson(needs.canary-select.outputs.canary-config) }}
+        secrets: inherit
+        with:
+            exp-name: ${{ matrix.config.exp-name }}
+            isl: ${{ matrix.config.isl }}
+            osl: ${{ matrix.config.osl }}
+            max-model-len: ${{ matrix.config.max-model-len }}
+            runner: ${{ matrix.config.runner }}
+            image: ${{ matrix.config.image }}
+            model: ${{ matrix.config.model }}
+            model-prefix: ${{ matrix.config.model-prefix }}
+            framework: ${{ matrix.config.framework }}
+            precision: ${{ matrix.config.precision }}
+            tp: ${{ matrix.config.tp }}
+            ep: ${{ matrix.config.ep }}
+            dp-attn: ${{ matrix.config.dp-attn }}
+            conc: ${{ matrix.config.conc }}
+            spec-decoding: ${{ matrix.config.spec-decoding }}
+            disagg: ${{ matrix.config.disagg }}
+            run-eval: false
+
+    sweep-multi-node-1k1k:


Per PR #1503 test plan (T1 — canary success → fan-out runs with dedup). The minimaxm2.5-fp8-b200-vllm config has both single_node 1k1k and 8k1k scenarios, so canary-select has lowest-conc candidates to pick from. The master config image (vllm/vllm-openai:v0.21.0) is known-good, so the canary is expected to succeed and the fan-out should run against remaining-search-space-config (canary entry removed → no duplicate run).

cursor · 2026-05-26T18:36:38Z

+              always() &&
+              needs.setup.result == 'success' &&
+              needs.setup.outputs.reuse-enabled != 'true' &&
+              (needs.canary-sweep.result == 'success' || needs.canary-sweep.result == 'skipped') &&


Canary gate doesn't handle cancelled state as documented

Low Severity

The PR description states the fan-out gate "blocks only on canary-sweep.result == 'failure' — every other state (success, skipped, cancelled) proceeds, so a bug in the canary mechanism never blocks the rest of the sweep." However, the condition (needs.canary-sweep.result == 'success' || needs.canary-sweep.result == 'skipped') does not allow cancelled through. Using needs.canary-sweep.result != 'failure' would match the documented design intent and be more resilient.

Additional Locations (2)

.github/workflows/run-sweep.yml#L287-L288

.github/workflows/run-sweep.yml#L324-L325

^{Reviewed by Cursor Bugbot for commit 0ba1cf7. Configure here.}

github-actions · 2026-05-26T18:56:37Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26467367124
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26467367124

Per PR #1503 test plan (T1 — canary success → fan-out runs with dedup). The minimaxm2.5-fp8-b200-vllm config has both single_node 1k1k and 8k1k scenarios, so canary-select has lowest-conc candidates to pick from. The master config image (vllm/vllm-openai:v0.21.0) is known-good, so the canary is expected to succeed and the fan-out should run against remaining-search-space-config (canary entry removed → no duplicate run).

github-actions · 2026-05-26T19:50:34Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26471353341
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26471353341

github-actions · 2026-05-26T19:53:10Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26471519153
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26471519153

github-actions · 2026-05-26T20:08:33Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26471525429
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26471525429

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 759a193. Configure here.}

cursor · 2026-05-26T20:51:53Z

+    - gptoss-fp4-h100-vllm
+  description:
+    - "[T4 / PR #1503 canary-gate test] Touch gptoss-fp4-h100-vllm (which has 1k1k/8k1k entries → valid canary candidate exists) and add `evals-only` label alongside `full-sweep-enabled`. Expected: canary-select SKIPPED via the `!contains(..., 'evals-only')` filter on line 151 — the canary suppression is label-driven, not absence-of-candidate. (Retrigger #2.)"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1503


Test-only changelog entry committed to production file

Low Severity

The entry at the bottom of perf-changelog.yaml is clearly test scaffolding — its description contains "[T4 / PR #1503 canary-gate test]" and explains the expected test behavior. This will ship to main on merge and trigger an unnecessary benchmark sweep for gptoss-fp4-h100-vllm on any future workflow invocation that reads the changelog.

^{Reviewed by Cursor Bugbot for commit 759a193. Configure here.}

github-actions · 2026-05-26T21:13:06Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26471597565
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26471597565

When a PR carries `full-sweep-enabled` (and not `evals-only`), pick the lowest-conc single-node benchmark entry as a canary and run it before fanning out the full sweep. If the canary fails, the eight fan-out jobs are skipped to save cluster time on shared failures (bad image tag, removed CLI flag, etc.). Design choices: - Canary candidacy is restricted to single_node['1k1k' | '8k1k'] and excludes entries with run-eval: true, so the canary is always a pure benchmark smoke test using the existing single-node template. - The canary entry is removed from the regular fan-out's matrix (via remaining-search-space-config) only when the canary actually succeeded. On canary skip / cancel / canary-select failure, the regular fan-out falls back to the full search-space-config so coverage is preserved. - The fan-out gate blocks only on `canary-sweep.result == 'failure'` -- every other state (success, skipped, cancelled) proceeds, so a bug in the canary mechanism never blocks the rest of the sweep. - Non-full-sweep PRs, draft PRs, pushes to main, and the reuse path all behave identically to before via existing gates. The aggregated results_bmk artifact picks up both the canary's row and the regular fan-out's rows via the existing bmk_* glob -- each entry appears exactly once.

Previously the fan-out gate was `needs.canary-sweep.result != 'failure'`, which let `cancelled` (and any unknown future result) fall through. A cancelled canary then ran the FULL fan-out matrix — including the canary's own entry — without canary validation, the worst-case outcome. Replace with an explicit allowlist: (result == 'success' || result == 'skipped') - success: canary passed → fan-out runs with deduped (remaining) matrix - skipped: no canary candidate (multi-node-only / evals-only) → fan-out runs with full matrix (no canary to dedup against) - failure / cancelled / anything else: fan-out blocked The matrix-ternary `result == 'success' && remaining-search-space-config || full-search-space-config` already had the right shape and is untouched. Applies to all 8 fan-out jobs (single/multi-node 1k1k+8k1k, agentic, multi-node-agentic, evals, multi-node-evals).

…nary) New label that triggers the full sweep without running the canary gate. Acts as an escape hatch when: - The canary is known to be flaky / unreliable for this config - The user wants the full sweep without the canary delay - The canary's chosen entry is not a representative smoke test Behavior matrix: sweep-enabled - trims to max(conc) per parallelism, with canary full-sweep-enabled - full intermediate conc sweep, with canary non-canary-full-sweep-enabled - full intermediate conc sweep, NO canary How it works (no changes to existing label semantics): - Setup gate triggers on any of the 3 labels (was: 2) - canary-select gate still requires `full-sweep-enabled` (exact array match, so `non-canary-full-sweep-enabled` does NOT match) → canary skips → all fan-out jobs run on full search space via the `== 'skipped'` allowlist - TRIM_CONC env is unchanged — only `sweep-enabled` enables trim, so the new label correctly behaves as "full sweep" - The reject-conflicting-labels step is now a 3-way exclusion: at most one of {sweep-enabled, full-sweep-enabled, non-canary-full-sweep-enabled} - The same gate updates apply to the comment-visualizer job - Concurrency-group filter excludes the new label too so toggling it uses the same `'active'` group key as the other sweep labels

github-actions · 2026-05-26T21:29:58Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26475589608
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26475589608

Oseltamivir requested a review from a team May 18, 2026 17:44

github-project-automation Bot added this to InferenceMAX Board May 18, 2026

github-advanced-security AI found potential problems May 18, 2026

View reviewed changes

Oseltamivir mentioned this pull request May 20, 2026

docker-tag-monitor: ask @claude to read upstream release notes #1538

Merged

2 tasks

Oseltamivir force-pushed the sweep-canary-gate branch from 28ecd9a to 78b51e3 Compare May 20, 2026 23:02

Oseltamivir changed the title ~~Sequential canary for full-sweep PRs + pre-flight research in docker-tag-monitor~~ run-sweep: gate full-sweep PRs behind a sequential canary May 20, 2026

Oseltamivir added the full-sweep-enabled label May 26, 2026

cursor Bot reviewed May 26, 2026

View reviewed changes

Comment thread perf-changelog.yaml Outdated

Oseltamivir force-pushed the sweep-canary-gate branch from 65d7856 to 0c1a94d Compare May 26, 2026 17:35

cursor Bot reviewed May 26, 2026

View reviewed changes

Oseltamivir requested review from jgangani and kedarpotdar-nv as code owners May 26, 2026 19:44

Oseltamivir force-pushed the sweep-canary-gate branch from 9286ade to de73181 Compare May 26, 2026 19:49

Oseltamivir added non-canary-full-sweep-enabled Run the full sweep without the canary gate (full search space, no trim) and removed full-sweep-enabled labels May 26, 2026

Oseltamivir added full-sweep-enabled and removed non-canary-full-sweep-enabled Run the full sweep without the canary gate (full search space, no trim) labels May 26, 2026

Oseltamivir added the evals-only Restrict full-sweep to eval matrices only; bypass canary smoke-test label May 26, 2026

cursor Bot reviewed May 26, 2026

View reviewed changes

Oseltamivir removed the evals-only Restrict full-sweep to eval matrices only; bypass canary smoke-test label May 26, 2026

Oseltamivir added 2 commits May 26, 2026 14:13

Oseltamivir added the evals-only Restrict full-sweep to eval matrices only; bypass canary smoke-test label May 26, 2026

Oseltamivir force-pushed the sweep-canary-gate branch from 2f6aa0c to afb408b Compare May 26, 2026 21:16

Oseltamivir removed the evals-only Restrict full-sweep to eval matrices only; bypass canary smoke-test label May 26, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

run-sweep: gate full-sweep PRs behind a sequential canary#1503

run-sweep: gate full-sweep PRs behind a sequential canary#1503
Oseltamivir wants to merge 3 commits into
mainfrom
sweep-canary-gate

Oseltamivir commented May 18, 2026 •

edited

Loading

Uh oh!

Uh oh!

cursor Bot May 26, 2026

Uh oh!

github-actions Bot commented May 26, 2026

Uh oh!

github-actions Bot commented May 26, 2026

Uh oh!

github-actions Bot commented May 26, 2026

Uh oh!

github-actions Bot commented May 26, 2026

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot May 26, 2026

Uh oh!

github-actions Bot commented May 26, 2026

Uh oh!

github-actions Bot commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Oseltamivir commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

Uh oh!

cursor Bot May 26, 2026

Choose a reason for hiding this comment

Canary gate doesn't handle cancelled state as documented

Uh oh!

github-actions Bot commented May 26, 2026

Uh oh!

github-actions Bot commented May 26, 2026

Uh oh!

github-actions Bot commented May 26, 2026

Uh oh!

github-actions Bot commented May 26, 2026

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 26, 2026

Choose a reason for hiding this comment

Test-only changelog entry committed to production file

Uh oh!

github-actions Bot commented May 26, 2026

Uh oh!

github-actions Bot commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Oseltamivir commented May 18, 2026 •

edited

Loading