feat(ci): consolidate KWOK tier workflows into single reusable runner #1515
base: main
Changes from all commits: 8de6e8f, f5d8fdb, f2a8c8f, bd126d4, d51b964, b0d8ef0, 83820ec, 79bd45c, dcd8f12, e234dfd
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,4 @@ | ||
| # Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
Member
🟡 Minor: UTF-8 BOM accidentally prepended to kwok-recipes.yaml
Line 1 now begins with the bytes `EF BB BF`.
Blast radius: None functional. GitHub Actions and yamllint 1.38.0 both tolerate a leading BOM (lint passes), so CI won't fail. The cost is hygiene: git-blame noise, an encoding that differs from every sibling file, and a latent trap for any tooling that keys on the first byte.
Fix: Strip it so the first hunk disappears.
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
|
|
@@ -32,7 +32,7 @@ on: | |
| - 'go.sum' | ||
| - 'vendor/**' | ||
| - '.github/workflows/kwok-recipes.yaml' | ||
| - '.github/workflows/kwok-tier3-shard.yaml' | ||
| - '.github/workflows/kwok-test-run.yaml' | ||
| - '.github/actions/kwok-test/**' | ||
| - '!**.md' | ||
| pull_request: | ||
|
|
@@ -52,7 +52,7 @@ on: | |
| - 'go.sum' | ||
| - 'vendor/**' | ||
| - '.github/workflows/kwok-recipes.yaml' | ||
| - '.github/workflows/kwok-tier3-shard.yaml' | ||
| - '.github/workflows/kwok-test-run.yaml' | ||
| - '.github/actions/kwok-test/**' | ||
| - '!**.md' | ||
| schedule: | ||
|
|
@@ -84,6 +84,8 @@ jobs: | |
| tier2: ${{ steps.classify.outputs.tier2 }} | ||
| tier3: ${{ steps.classify.outputs.tier3 }} | ||
| tier3_batches: ${{ steps.classify.outputs.tier3_batches }} | ||
| tier1_pairs: ${{ steps.classify.outputs.tier1_pairs }} | ||
| tier2_pairs: ${{ steps.classify.outputs.tier2_pairs }} | ||
| steps: | ||
| - name: Checkout | ||
| uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7.0.0 | ||
|
|
@@ -100,13 +102,25 @@ jobs: | |
| run: | | ||
| set -euo pipefail | ||
|
|
||
| # Deployer list — single source of truth, consumed by all tiers and by | ||
| # the workflow_dispatch early-exit below. To add or remove a deployer, | ||
| # change this one line; Tier 1, Tier 2, and Tier 3 all derive from it. | ||
| readonly DEPLOYERS='["helm","argocd-oci","argocd-helm-oci","argocd-git","flux-oci","flux-git"]' | ||
|
|
||
| # --- workflow_dispatch: test exactly the requested recipe --- | ||
| if [[ -n "${DISPATCH_RECIPE}" ]]; then | ||
| single=$(jq -nc '[$r]' --arg r "${DISPATCH_RECIPE}") | ||
| echo "tier1=${single}" >> "$GITHUB_OUTPUT" | ||
| echo "tier2=[]" >> "$GITHUB_OUTPUT" | ||
| echo "tier3=[]" >> "$GITHUB_OUTPUT" | ||
| echo "tier3_batches=[]" >> "$GITHUB_OUTPUT" | ||
| single_pairs=$(jq -cn \ | ||
| --arg r "${DISPATCH_RECIPE}" \ | ||
| --argjson deployers "${DEPLOYERS}" ' | ||
| [ $deployers[] | {recipe: $r, deployer: .} ] | ||
| ') | ||
| echo "tier1=${single}" >> "$GITHUB_OUTPUT" | ||
| echo "tier2=[]" >> "$GITHUB_OUTPUT" | ||
| echo "tier3=[]" >> "$GITHUB_OUTPUT" | ||
| echo "tier3_batches=[]" >> "$GITHUB_OUTPUT" | ||
| echo "tier1_pairs=${single_pairs}" >> "$GITHUB_OUTPUT" | ||
| echo "tier2_pairs=[]" >> "$GITHUB_OUTPUT" | ||
| echo "Manual dispatch: ${DISPATCH_RECIPE}" | ||
| exit 0 | ||
| fi | ||
|
|
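The discover step builds `{recipe, deployer}` pairs as a cross-product of a recipe list and the single `DEPLOYERS` list: all six deployers for the dispatch path and Tier 1, helm only for Tier 2. A minimal pure-bash sketch of that expansion follows. The workflow itself does this with jq; the `make_pairs` helper and the recipe names here are hypothetical, and the sketch assumes recipe names are plain identifiers that need no JSON escaping.

```shell
#!/usr/bin/env bash
# Pure-bash sketch of the recipe x deployer cross-product (the workflow uses jq).
# make_pairs and the recipe names below are hypothetical.
set -euo pipefail

# Same six deployers as the DEPLOYERS JSON array in the discover step.
DEPLOYERS=(helm argocd-oci argocd-helm-oci argocd-git flux-oci flux-git)

# make_pairs "DEPLOYER..." RECIPE...
# Emits one {"recipe","deployer"} JSON object per line, recipes in the outer
# loop and deployers in the inner loop. Assumes names need no JSON escaping.
make_pairs() {
  local deployers=$1 r d
  shift
  for r in "$@"; do
    for d in $deployers; do  # word-split the space-separated deployer list
      printf '{"recipe":"%s","deployer":"%s"}\n' "$r" "$d"
    done
  done
}

# Usage (recipe names are placeholders):
#   make_pairs "${DEPLOYERS[*]}" my-recipe   # 6 pairs, like the dispatch path
#   make_pairs "helm" my-recipe              # 1 pair, like Tier 2
```

Each pair becomes one matrix configuration, so the pair count is recipes × deployers, and that product is what the 256-configuration cap is measured against.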
@@ -237,16 +251,42 @@ jobs: | |
| # --- Tier 3: full matrix (all testable overlays) --- | ||
| tier3="$all" | ||
|
|
||
| # Local alias so jq --argjson calls below can use $deployers | ||
Member
🔵 Nitpick: Dead `deployers` alias.
Blast radius: None functional, but maintainer-confusing: a future reader trusts the "intentional / unused" comment and may keep propagating an alias that could simply be deleted.
Fix: Either drop the alias and use
| # (DEPLOYERS is readonly; this assignment is intentional). | ||
| # shellcheck disable=SC2034 | ||
| deployers="${DEPLOYERS}" | ||
|
Comment on lines +256 to +257
Member
nit: this alias isn't needed.
|
|
||
| # --- Pre-build Tier 1 pairs: generic recipes × all deployers --- | ||
| tier1_pairs=$(jq -cn \ | ||
| --argjson recipes "$tier1" \ | ||
| --argjson deployers "$deployers" ' | ||
| [ $recipes[] as $r | $deployers[] as $d | {recipe: $r, deployer: $d} ] | ||
| ') | ||
|
|
||
| # Guard: Tier 1 is expected to stay well under 256 (no batching needed). | ||
| # Warn early if organic growth is approaching the limit. | ||
| tier1_pair_count=$(echo "$tier1_pairs" | jq 'length') | ||
| if (( tier1_pair_count > 256 )); then | ||
| echo "::error::Tier 1 has ${tier1_pair_count} pairs (>256) — add batching before passing this to kwok-test-run.yaml" | ||
| exit 1 | ||
| elif (( tier1_pair_count > 200 )); then | ||
| echo "::warning::Tier 1 has ${tier1_pair_count} pairs (>200) — consider adding batching before it reaches 256" | ||
| fi | ||
|
|
||
| # --- Pre-build Tier 2 pairs: diff-affected recipes, helm-only --- | ||
| # Coverage-policy decision: Tier 2 uses helm only to keep PR wall-clock | ||
| # time proportional to the change scope. Full deployer coverage runs in | ||
| # Tier 3 on every push to main and on the nightly schedule. See ADR-003 | ||
| # §"Tier 2 deployer coverage" for rationale and how to revisit this. | ||
|
mohityadav8 marked this conversation as resolved.
Member
🟠 Major: New workflow comments cite a nonexistent ADR-003 §"Tier 2 deployer coverage"
This comment (and the duplicate at L337-338 on the test-tier2 job) directs readers to
Blast radius: A contributor revisiting the helm-only trade-off follows the pointer and finds nothing, defeating the comment's stated "how to revisit this" intent. The same dead pointer recurs at L337-338.
Fix: Add a
||
| tier2_pairs=$(echo "$tier2" | jq -c '[.[] | {recipe: ., deployer: "helm"}]') | ||
|
|
||
| # --- Tier 3 batching --- | ||
| # GitHub caps a single job's matrix at 256 configurations. Tier 3 | ||
| # crosses every testable recipe with every deployer, so the raw | ||
| # cross-product (recipes × deployers) outgrew the cap. Split the | ||
| # cross-product (recipes × deployers) can outgrow the cap. Split the | ||
| # {recipe, deployer} pairs into batches of <= TIER3_BATCH_SIZE; the | ||
| # caller fans each batch out to kwok-tier3-shard.yaml, keeping every | ||
| # shard's matrix under the limit. Keep this deployer list in sync | ||
| # with the test-tier1 matrix above and the input doc in | ||
| # .github/actions/kwok-test/action.yml. | ||
| deployers='["helm","argocd-oci","argocd-helm-oci","argocd-git","flux-oci","flux-git"]' | ||
| # caller fans each batch out to kwok-test-run.yaml, keeping every | ||
| # shard's matrix under the limit. | ||
| readonly TIER3_BATCH_SIZE=200 # headroom under GitHub's 256 cap | ||
|
|
||
| # Fail closed if the batch size is ever raised past the hard limit — | ||
|
|
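The Tier 1 guard compares the pair count against two thresholds: it fails above 256 (the matrix cap cited in the diff) and warns above 200. A sketch of the same decision as a reusable bash function follows; the name `check_pair_count` is hypothetical. Note that a count of exactly 256 still passes, because the diff tests `> 256`.

```shell
#!/usr/bin/env bash
# Sketch of the Tier 1 size guard from the discover step.
# Thresholds are copied from the diff; the function name is hypothetical.
set -euo pipefail

readonly MATRIX_CAP=256  # per-job matrix limit, as stated in the diff
readonly WARN_AT=200     # early-warning threshold used by the guard

# check_pair_count COUNT
# Prints ok, warning, or error; returns non-zero only for error so a
# caller running under `set -e` fails closed, like the workflow's exit 1.
check_pair_count() {
  local count=$1
  if (( count > MATRIX_CAP )); then
    echo "error"
    return 1
  elif (( count > WARN_AT )); then
    echo "warning"
  else
    echo "ok"
  fi
}

# Usage (placeholder count):
#   check_pair_count "$(( 20 * 6 ))"   # 120 pairs -> ok
```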
@@ -259,7 +299,7 @@ jobs: | |
|
|
||
| tier3_batches=$(jq -cn \ | ||
| --argjson recipes "$tier3" \ | ||
| --argjson deployers "$deployers" \ | ||
| --argjson deployers "${DEPLOYERS}" \ | ||
| --argjson size "$TIER3_BATCH_SIZE" ' | ||
| [ $recipes[] as $r | $deployers[] as $d | {recipe: $r, deployer: $d} ] | ||
| | [ range(0; length; $size) as $i | ||
|
|
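Tier 3 batching walks the pair list in steps of `TIER3_BATCH_SIZE`, as jq's `range(0; length; $size)` does, so the number of batches is the pair count divided by 200, rounded up, and only the last batch can be short. A pure-bash sketch of the resulting batch sizes follows; `batch_sizes` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch of Tier 3 batching: chunk TOTAL pairs into batches of at most SIZE,
# stepping the same way as jq's range(0; length; $size).
# batch_sizes is a hypothetical helper, not part of the workflow.
set -euo pipefail

readonly TIER3_BATCH_SIZE=200  # value from the diff: headroom under 256

# batch_sizes TOTAL SIZE: prints one line per batch with that batch's length
batch_sizes() {
  local total=$1 size=$2 start
  (( size > 0 )) || { echo "size must be positive" >&2; return 1; }
  for (( start = 0; start < total; start += size )); do
    if (( total - start < size )); then
      echo $(( total - start ))  # last batch holds the remainder
    else
      echo "$size"
    fi
  done
}

# Usage (illustrative count):
#   batch_sizes 300 "$TIER3_BATCH_SIZE"   # prints 200, then 100
```

For example, a hypothetical 50 recipes × 6 deployers = 300 pairs would produce two shards of 200 and 100 pairs, each under the 256 cap; the figure is illustrative, not the repository's real recipe count.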
@@ -268,106 +308,50 @@ jobs: | |
| ') | ||
|
|
||
| # --- Output --- | ||
| echo "tier1=${tier1}" >> "$GITHUB_OUTPUT" | ||
| echo "tier2=${tier2}" >> "$GITHUB_OUTPUT" | ||
| echo "tier3=${tier3}" >> "$GITHUB_OUTPUT" | ||
| echo "tier1=${tier1}" >> "$GITHUB_OUTPUT" | ||
| echo "tier2=${tier2}" >> "$GITHUB_OUTPUT" | ||
| echo "tier3=${tier3}" >> "$GITHUB_OUTPUT" | ||
| echo "tier3_batches=${tier3_batches}" >> "$GITHUB_OUTPUT" | ||
| echo "tier1_pairs=${tier1_pairs}" >> "$GITHUB_OUTPUT" | ||
| echo "tier2_pairs=${tier2_pairs}" >> "$GITHUB_OUTPUT" | ||
|
|
||
| deployer_count=$(echo "$deployers" | jq 'length') | ||
| tier3_pairs=$(echo "$tier3_batches" | jq '[.[].pairs[]] | length') | ||
| tier3_batch_count=$(echo "$tier3_batches" | jq 'length') | ||
| echo "Tier 1 (generic): $(echo "$tier1" | jq 'length') recipe(s)" | ||
| echo "Tier 2 (diff-aware): $(echo "$tier2" | jq 'length') recipe(s)" | ||
| echo "Tier 3 (full matrix): $(echo "$tier3" | jq 'length') recipe(s) × $(echo "$deployers" | jq 'length') deployer(s) = ${tier3_pairs} pair(s) in ${tier3_batch_count} batch(es)" | ||
| echo "Tier 1 (generic): $(echo "$tier1" | jq 'length') recipe(s) × ${deployer_count} deployer(s) = ${tier1_pair_count} pair(s)" | ||
| echo "Tier 2 (diff-aware): $(echo "$tier2" | jq 'length') recipe(s) × 1 deployer (helm) = $(echo "$tier2_pairs" | jq 'length') pair(s)" | ||
| echo "Tier 3 (full matrix): $(echo "$tier3" | jq 'length') recipe(s) × ${deployer_count} deployer(s) = ${tier3_pairs} pair(s) in ${tier3_batch_count} batch(es)" | ||
|
|
||
| # ── Tier 1: PR gate — generic overlays (PR + push, skip on schedule) ── | ||
| test-tier1: | ||
| name: 'Tier 1: ${{ matrix.recipe }} (${{ matrix.deployer }})' | ||
| needs: discover | ||
| if: >- | ||
| github.event_name != 'schedule' && | ||
| needs.discover.outputs.tier1 != '[]' && | ||
| needs.discover.outputs.tier1 != '' | ||
| runs-on: ubuntu-latest | ||
| timeout-minutes: 15 | ||
| strategy: | ||
| fail-fast: false | ||
| matrix: | ||
| recipe: ${{ fromJSON(needs.discover.outputs.tier1) }} | ||
| deployer: [helm, argocd-oci, argocd-helm-oci, argocd-git, flux-oci, flux-git] | ||
| steps: | ||
| - name: Checkout Code | ||
| uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7.0.0 | ||
| with: | ||
| persist-credentials: false | ||
|
|
||
| - name: Load versions | ||
| id: versions | ||
| uses: ./.github/actions/load-versions | ||
|
|
||
| - name: Run KWOK test | ||
| uses: ./.github/actions/kwok-test | ||
| with: | ||
| recipe: ${{ matrix.recipe }} | ||
| deployer: ${{ matrix.deployer }} | ||
| go_version: ${{ steps.versions.outputs.go }} | ||
| goreleaser_version: ${{ steps.versions.outputs.goreleaser }} | ||
| kind_version: ${{ steps.versions.outputs.kind }} | ||
| helm_version: ${{ steps.versions.outputs.helm }} | ||
| kwok_version: ${{ steps.versions.outputs.kwok }} | ||
| kubectl_version: ${{ steps.versions.outputs.kubectl }} | ||
| yq_version: ${{ steps.versions.outputs.yq }} | ||
| flux_version: ${{ steps.versions.outputs.flux }} | ||
| chainsaw_version: ${{ steps.versions.outputs.chainsaw }} | ||
| chainsaw_sha256: ${{ steps.versions.outputs.chainsaw_sha256_linux_amd64 }} | ||
| kind_node_image: ${{ steps.versions.outputs.kind_node_image }} | ||
| needs.discover.outputs.tier1_pairs != '[]' && | ||
| needs.discover.outputs.tier1_pairs != '' | ||
| uses: ./.github/workflows/kwok-test-run.yaml | ||
| with: | ||
| pairs: ${{ needs.discover.outputs.tier1_pairs }} | ||
|
|
||
| # ── Tier 2: diff-aware accelerator tests (PR only, conditional) ── | ||
| # Coverage-policy decision: Tier 2 uses helm only (see ADR-003 §"Tier 2 | ||
| # deployer coverage"). Full deployer coverage runs in Tier 3 on push/nightly. | ||
| test-tier2: | ||
| name: 'Tier 2: ${{ matrix.recipe }}' | ||
| needs: discover | ||
| if: >- | ||
| github.event_name == 'pull_request' && | ||
| needs.discover.outputs.tier2 != '[]' && | ||
| needs.discover.outputs.tier2 != '' | ||
| runs-on: ubuntu-latest | ||
| timeout-minutes: 15 | ||
| strategy: | ||
| fail-fast: false | ||
| matrix: | ||
| recipe: ${{ fromJSON(needs.discover.outputs.tier2) }} | ||
| steps: | ||
| - name: Checkout Code | ||
| uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7.0.0 | ||
| with: | ||
| persist-credentials: false | ||
|
|
||
| - name: Load versions | ||
| id: versions | ||
| uses: ./.github/actions/load-versions | ||
|
|
||
| - name: Run KWOK test | ||
| uses: ./.github/actions/kwok-test | ||
| with: | ||
| recipe: ${{ matrix.recipe }} | ||
| go_version: ${{ steps.versions.outputs.go }} | ||
| goreleaser_version: ${{ steps.versions.outputs.goreleaser }} | ||
| kind_version: ${{ steps.versions.outputs.kind }} | ||
| helm_version: ${{ steps.versions.outputs.helm }} | ||
| kwok_version: ${{ steps.versions.outputs.kwok }} | ||
| kubectl_version: ${{ steps.versions.outputs.kubectl }} | ||
| yq_version: ${{ steps.versions.outputs.yq }} | ||
| flux_version: ${{ steps.versions.outputs.flux }} | ||
| chainsaw_version: ${{ steps.versions.outputs.chainsaw }} | ||
| chainsaw_sha256: ${{ steps.versions.outputs.chainsaw_sha256_linux_amd64 }} | ||
| kind_node_image: ${{ steps.versions.outputs.kind_node_image }} | ||
| needs.discover.outputs.tier2_pairs != '[]' && | ||
| needs.discover.outputs.tier2_pairs != '' | ||
| uses: ./.github/workflows/kwok-test-run.yaml | ||
| with: | ||
| pairs: ${{ needs.discover.outputs.tier2_pairs }} | ||
|
|
||
| # ── Tier 3: full matrix (push to main + nightly schedule) ── | ||
| # The recipe × deployer cross-product exceeds GitHub's 256-config matrix cap, | ||
| # so discover splits it into batches and we fan each batch out to the | ||
| # kwok-tier3-shard reusable workflow (one shard per batch, each <= 256). | ||
| # Per ADR-003: the concurrency group is keyed by SHA so successive merges to | ||
| # main never cancel in-flight Tier 3 runs; the batch id keeps every shard of a | ||
| # single run in its own group so they all run in parallel. | ||
| # The recipe × deployer cross-product can exceed GitHub's 256-config cap, so | ||
| # discover batches the pairs and we fan each batch out to kwok-test-run.yaml | ||
| # (one shard per batch, each <= 256 pairs). Per ADR-003: concurrency is keyed | ||
| # by SHA so successive merges never cancel in-flight Tier 3 runs; the batch id | ||
| # keeps every shard of a single run in its own group so they all run in parallel. | ||
| test-tier3: | ||
| needs: discover | ||
| concurrency: | ||
|
|
@@ -381,7 +365,7 @@ jobs: | |
| fail-fast: false | ||
| matrix: | ||
| batch: ${{ fromJSON(needs.discover.outputs.tier3_batches) }} | ||
| uses: ./.github/workflows/kwok-tier3-shard.yaml | ||
| uses: ./.github/workflows/kwok-test-run.yaml | ||
| with: | ||
| pairs: ${{ toJSON(matrix.batch.pairs) }} | ||
|
|
||
|
|
@@ -434,3 +418,4 @@ jobs: | |
| fi | ||
|
|
||
| echo "All recipe validations passed" >> $GITHUB_STEP_SUMMARY | ||
|
|
||
This line now starts with a UTF-8 BOM (`EF BB BF`); `main` doesn't have it, so it crept in via an editor. Strip it: yamllint can flag a BOM, and it's just noise on a workflow file.
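Both reviewers ask for the BOM to be removed. Below is a small sketch of detecting and stripping a leading UTF-8 BOM (`EF BB BF`) with standard tools (`head`, `od`, `tail`). The function names are hypothetical. The file is rewritten in place through a temporary copy, so its original permissions are kept.

```shell
#!/usr/bin/env bash
# Sketch: detect and strip a leading UTF-8 BOM (EF BB BF) from a file.
# has_bom and strip_bom are hypothetical helper names.
set -euo pipefail

# has_bom FILE: succeeds if the first three bytes are EF BB BF
has_bom() {
  [[ $(head -c 3 "$1" | od -An -tx1 | tr -d ' \n') == "efbbbf" ]]
}

# strip_bom FILE: rewrite FILE without its BOM; no-op if there is none
strip_bom() {
  local f=$1 tmp
  has_bom "$f" || return 0
  tmp=$(mktemp)
  tail -c +4 "$f" > "$tmp"   # everything from byte 4 on, i.e. after the BOM
  cat "$tmp" > "$f"          # write back in place, keeping f's permissions
  rm -f "$tmp"
}

# Usage:
#   strip_bom .github/workflows/kwok-recipes.yaml
```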