From 43febf10fb55c63250e64375a232ed79c64214c3 Mon Sep 17 00:00:00 2001 From: danielmeppiel Date: Thu, 23 Apr 2026 13:21:46 +0200 Subject: [PATCH 1/2] docs(ci): document merge-gate architecture and ruleset context gotcha - Document merge-gate.yml as the single-authority PR-time aggregator - Mark ci-integration-pr-stub.yml as DEPRECATED (slated for deletion) - Renumber workflow list (now 6 entries, was misnumbered with two #3s) - New section: Branch Protection & Required Checks - Ruleset 'context' field MUST match check-run name ('gate'), not the UI display string ('Merge Gate / gate'). Storing the display string causes permanent 'Expected - Waiting for status to be reported' that blocked PR #860 today - Adding new required checks goes through EXPECTED_CHECKS in merge-gate.yml, not the ruleset Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .github/instructions/cicd.instructions.md | 41 ++++++++++++++--------- CHANGELOG.md | 4 +++ 2 files changed, 30 insertions(+), 15 deletions(-) diff --git a/.github/instructions/cicd.instructions.md b/.github/instructions/cicd.instructions.md index 8c60cbda5..1c2c7681f 100644 --- a/.github/instructions/cicd.instructions.md +++ b/.github/instructions/cicd.instructions.md @@ -24,27 +24,33 @@ integration suite runs only at merge time via GitHub Merge Queue cross-workflow artifact plumbing across triggers. - **Never add a `pull_request` or `pull_request_target` trigger here.** This file holds production secrets (`GH_CLI_PAT`, `ADO_APM_PAT`). - Required-check satisfaction at PR time is handled by the inert stub - `ci-integration-pr-stub.yml` instead. -3. **`ci-integration-pr-stub.yml`** - inert PR-time stub for required checks - - Triggers on `pull_request_target` so the YAML is read from `main` - (admin-controlled) regardless of PR head contents - applies retroactively - to existing fork PRs without rebase. - - `permissions: {}`, no secrets, no checkout, four no-op `echo` jobs whose - names match the four Tier 2 required checks. Reports success in seconds. - - Concurrency group keyed on PR number cancels in-flight stub runs on - subsequent pushes. - - Activity types include `labeled/unlabeled/edited` so maintainers can - re-trigger the stub without forcing contributors to push commits. + Required-check satisfaction at PR time is handled by `merge-gate.yml`, + which aggregates all required signals into a single `gate` check. +3. **`merge-gate.yml`** - single-authority PR-time aggregator + - Triggers on `pull_request` only (single trigger - dual-trigger with + `pull_request_target` produces SUCCESS+CANCELLED check-run twins via + `cancel-in-progress` and poisons branch protection's rollup). + - One job named `gate`. Polls the Checks API for all entries in the + workflow's `EXPECTED_CHECKS` env var; aggregates pass/fail into a + single check-run. + - Branch protection requires ONLY this one check (`gate`). Adding, + renaming, or removing an underlying check is a `merge-gate.yml` edit, + never a ruleset edit. Tide / bors single-authority pattern. + - Recovery if the `pull_request` webhook is dropped: empty commit, + `gh workflow run merge-gate.yml -f pr_number=NNN`, or close+reopen. +4. **`ci-integration-pr-stub.yml`** - legacy PR-time stubs (DEPRECATED) + - Holdover from the pre-merge-gate model where branch protection + required all four Tier 2 names directly. Now redundant - kept only + to avoid disturbing in-flight PRs. Slated for deletion. - `.github/CODEOWNERS` requires Lead Maintainer review for any change to `.github/workflows/**` to prevent inadvertent additions of secrets, - checkout, or PR-data interpolation to this file. -3. **`build-release.yml`** - `push` to main, tags, schedule, `workflow_dispatch` + checkout, or PR-data interpolation. +5. **`build-release.yml`** - `push` to main, tags, schedule, `workflow_dispatch` - **Linux + Windows** run combined `build-and-test` (unit tests + binary build in one job). - **macOS Intel** uses `build-and-validate-macos-intel` (root node, runs own unit tests - no dependency on `build-and-test`). Builds the binary on every push for early regression feedback; integration + release-validation phases conditional on tag/schedule/dispatch. - **macOS ARM** uses `build-and-validate-macos-arm` (root node, tag/schedule/dispatch only - ARM runners are extremely scarce with 2-4h+ queue waits). Only requested when the binary is actually needed for a release. - Secrets always available. Full 5-platform binary output (linux x86_64/arm64, darwin x86_64/arm64, windows x86_64). -4. **`ci-runtime.yml`** - nightly schedule, manual dispatch, path-filtered push +6. **`ci-runtime.yml`** - nightly schedule, manual dispatch, path-filtered push - **Linux x86_64 only**. Live inference smoke tests (`apm run`) isolated from release pipeline. - Uses `GH_MODELS_PAT` for GitHub Models API access. - Failures do not block releases - annotated as warnings. @@ -86,6 +92,11 @@ integration suite runs only at merge time via GitHub Merge Queue - **Artifact Retention**: 30 days for debugging failed releases - **Cross-workflow artifacts**: ci-integration.yml builds the binary inline (no cross-workflow artifact transfer); build-release.yml jobs share artifacts within the same workflow run. +## Branch Protection & Required Checks +- **Single required check**: branch protection (`main-protection` ruleset id 9294522) requires only one status check: `gate` from `merge-gate.yml`. All other PR-time signals are aggregated by that workflow's poll loop. +- **CRITICAL ruleset gotcha**: the `context` field stored in the ruleset must match the **check-run name** (the job key, e.g. `gate`), NOT the GitHub UI's display string (`Merge Gate / gate`). Storing the display string causes permanent "Expected - Waiting for status to be reported" because no check-run with that literal name ever posts. The integration_id (`15368` = github-actions) plus the check-run name is what GitHub matches. +- **Adding a new required check**: add it to `EXPECTED_CHECKS` in `merge-gate.yml`. Do NOT touch the ruleset. + ## Trust Model - **PR push (any contributor, including forks)**: Runs Tier 1 only. No CI secrets exposed. PR code is checked out and tested in an unprivileged context. - **merge_group (write access required)**: Runs Tier 1 + Tier 2. Tier 2 sees secrets. The `gh-readonly-queue/main/*` ref is created by GitHub from the PR merged into main; only users with write access can trigger this by enqueueing a PR. diff --git a/CHANGELOG.md b/CHANGELOG.md index 74cb7af54..32e4f3fb7 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +### Changed + +- CI docs: clarify that branch-protection ruleset must store the check-run name (`gate`), not the workflow display string (`Merge Gate / gate`); document the merge-gate aggregator in `cicd.instructions.md` and mark the legacy stub workflow as deprecated. + ## [0.9.2] - 2026-04-23 ### Added From 361d7783d3a307e8e298d15d35669bfe391ca41c Mon Sep 17 00:00:00 2001 From: danielmeppiel Date: Thu, 23 Apr 2026 13:32:20 +0200 Subject: [PATCH 2/2] chore(ci): remove deprecated PR-time stub workflow The four PR stubs (Build/Smoke/Integration/Release Validation - Linux) were a holdover from the pre-merge-gate model where branch protection required each Tier 2 check name directly. After #867, branch protection requires only the single 'gate' check from merge-gate.yml, so the stubs are dead weight that fire on every PR for no reason. Changes: - Delete .github/workflows/ci-integration-pr-stub.yml - Reduce EXPECTED_CHECKS in merge-gate.yml to just 'Build & Test (Linux)' (the only PR-time check we still emit) - Update merge-gate.yml + ci-integration.yml header comments - Update cicd.instructions.md (drop DEPRECATED entry, renumber to 5 workflows) - Drop stale CODEOWNERS reference to the deleted file - CHANGELOG entry under [Unreleased] > Removed Stacked on #874 (docs). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .github/CODEOWNERS | 1 - .github/instructions/cicd.instructions.md | 13 ++-- .github/workflows/ci-integration-pr-stub.yml | 74 -------------------- .github/workflows/ci-integration.yml | 12 ++-- .github/workflows/merge-gate.yml | 23 +++--- CHANGELOG.md | 4 ++ 6 files changed, 25 insertions(+), 102 deletions(-) delete mode 100644 .github/workflows/ci-integration-pr-stub.yml diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 9090c321e..2f2ca7c4e 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -5,5 +5,4 @@ README.md @danielmeppiel # Workflow files - Lead Maintainer review required for any change to CI # definitions, especially anything touching `pull_request_target` or secrets. -# See .github/workflows/ci-integration-pr-stub.yml for the security rationale. .github/workflows/** @danielmeppiel diff --git a/.github/instructions/cicd.instructions.md b/.github/instructions/cicd.instructions.md index 1c2c7681f..49e396282 100644 --- a/.github/instructions/cicd.instructions.md +++ b/.github/instructions/cicd.instructions.md @@ -6,7 +6,7 @@ description: "CI/CD Pipeline configuration for PyInstaller binary packaging and # CI/CD Pipeline Instructions ## Workflow Architecture (Tiered + Merge Queue) -Four workflows split by trigger and tier. PRs get fast feedback; the heavy +Five workflows split by trigger and tier. PRs get fast feedback; the heavy integration suite runs only at merge time via GitHub Merge Queue (microsoft/apm#770). @@ -38,19 +38,14 @@ integration suite runs only at merge time via GitHub Merge Queue never a ruleset edit. Tide / bors single-authority pattern. - Recovery if the `pull_request` webhook is dropped: empty commit, `gh workflow run merge-gate.yml -f pr_number=NNN`, or close+reopen. -4. **`ci-integration-pr-stub.yml`** - legacy PR-time stubs (DEPRECATED) - - Holdover from the pre-merge-gate model where branch protection - required all four Tier 2 names directly. Now redundant - kept only - to avoid disturbing in-flight PRs. Slated for deletion. - `.github/CODEOWNERS` requires Lead Maintainer review for any change - to `.github/workflows/**` to prevent inadvertent additions of secrets, - checkout, or PR-data interpolation. -5. **`build-release.yml`** - `push` to main, tags, schedule, `workflow_dispatch` + to `.github/workflows/**`. +4. **`build-release.yml`** - `push` to main, tags, schedule, `workflow_dispatch` - **Linux + Windows** run combined `build-and-test` (unit tests + binary build in one job). - **macOS Intel** uses `build-and-validate-macos-intel` (root node, runs own unit tests - no dependency on `build-and-test`). Builds the binary on every push for early regression feedback; integration + release-validation phases conditional on tag/schedule/dispatch. - **macOS ARM** uses `build-and-validate-macos-arm` (root node, tag/schedule/dispatch only - ARM runners are extremely scarce with 2-4h+ queue waits). Only requested when the binary is actually needed for a release. - Secrets always available. Full 5-platform binary output (linux x86_64/arm64, darwin x86_64/arm64, windows x86_64). -6. **`ci-runtime.yml`** - nightly schedule, manual dispatch, path-filtered push +5. **`ci-runtime.yml`** - nightly schedule, manual dispatch, path-filtered push - **Linux x86_64 only**. Live inference smoke tests (`apm run`) isolated from release pipeline. - Uses `GH_MODELS_PAT` for GitHub Models API access. - Failures do not block releases - annotated as warnings. diff --git a/.github/workflows/ci-integration-pr-stub.yml b/.github/workflows/ci-integration-pr-stub.yml deleted file mode 100644 index 8c22e33d9..000000000 --- a/.github/workflows/ci-integration-pr-stub.yml +++ /dev/null @@ -1,74 +0,0 @@ -# Inert PR-time stub for Tier 2 required checks (microsoft/apm#770). -# -# Why this file exists: -# The merge-queue ruleset requires four Tier 2 checks (Build (Linux), -# Smoke Test (Linux), Integration Tests (Linux), Release Validation -# (Linux)). Those checks are produced for real by `ci-integration.yml`, -# which only triggers on `merge_group` events. Without this stub, the -# required checks would never report on PR head SHAs and every PR would -# be permanently BLOCKED in branch protection. -# -# This workflow uses `pull_request_target` so the YAML is read from the -# base branch (`main`) regardless of the PR head's contents. That means: -# - the trigger applies retroactively to existing fork PRs -# - a fork cannot bypass or modify the no-op behaviour by editing this -# file in their PR head -# -# Security model (READ BEFORE EDITING): -# This file is allowed to use `pull_request_target` ONLY because it is -# inert: no secrets, no checkout, no PR data interpolation into `run:`, -# no GITHUB_TOKEN write permissions. The classic `pull_request_target` -# exploit (checkout fork code, then run it with secrets) is impossible -# here by construction. -# -# DO NOT add any of the following without a thorough security review: -# - actions/checkout (especially with `ref: ${{ github.event.pull_request.head.sha }}`) -# - any `run:` step that interpolates `${{ github.event.pull_request.* }}` -# - any `env:` block referencing `secrets.*` -# - any `permissions:` block elevating GITHUB_TOKEN beyond `{}` -# - any new job (each new check name must remain a no-op) -# -# If you need to add real secret-using or checkout work, do it in -# `ci-integration.yml` under the `merge_group` trigger - never here. -# -# Stale-run cleanup: -# The activity-type list intentionally includes labeled/unlabeled/edited -# so maintainers can re-trigger the stub without forcing contributors to -# push commits or close+reopen. -name: Integration Tests (PR Stub) - -on: - pull_request_target: - branches: [ main ] - types: [ opened, synchronize, reopened, ready_for_review, labeled, unlabeled, edited ] - -permissions: {} - -concurrency: - group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }} - cancel-in-progress: true - -jobs: - build: - name: Build (Linux) - runs-on: ubuntu-24.04 - steps: - - run: echo "PR-time stub for required check 'Build (Linux)'. Real run happens in ci-integration.yml on merge_group events." - - smoke-test: - name: Smoke Test (Linux) - runs-on: ubuntu-24.04 - steps: - - run: echo "PR-time stub for required check 'Smoke Test (Linux)'. Real run happens in ci-integration.yml on merge_group events." - - integration-tests: - name: Integration Tests (Linux) - runs-on: ubuntu-24.04 - steps: - - run: echo "PR-time stub for required check 'Integration Tests (Linux)'. Real run happens in ci-integration.yml on merge_group events." - - release-validation: - name: Release Validation (Linux) - runs-on: ubuntu-24.04 - steps: - - run: echo "PR-time stub for required check 'Release Validation (Linux)'. Real run happens in ci-integration.yml on merge_group events." diff --git a/.github/workflows/ci-integration.yml b/.github/workflows/ci-integration.yml index 66217e15f..78fc55a64 100644 --- a/.github/workflows/ci-integration.yml +++ b/.github/workflows/ci-integration.yml @@ -6,12 +6,12 @@ # - Tier 2 (this workflow) runs smoke + integration + release-validation # against the tentative merge commit that the merge queue creates. # -# Required-check satisfaction on PR commits: -# - The four required check names produced here (Build (Linux), Smoke Test -# (Linux), Integration Tests (Linux), Release Validation (Linux)) are also -# produced by `ci-integration-pr-stub.yml` on `pull_request_target` events -# so branch protection's required-check gate is satisfied at PR time -# without burning runner minutes. +# Required-check satisfaction at PR time: +# - Branch protection requires only `gate` (from `merge-gate.yml`), which +# aggregates all PR-time signals. The four check names produced by THIS +# workflow (Build/Smoke/Integration/Release Validation - all Linux) are +# not required at PR time; they only run on the `gh-readonly-queue/main/*` +# SHA the merge queue creates, against the tentative merge commit. # - This workflow intentionally does NOT trigger on pull_request events. # Doing so would let a fork PR check out and run code with the secrets # declared below, which is a supply-chain attack vector. Keep this file diff --git a/.github/workflows/merge-gate.yml b/.github/workflows/merge-gate.yml index a85d711db..9861a6524 100644 --- a/.github/workflows/merge-gate.yml +++ b/.github/workflows/merge-gate.yml @@ -4,13 +4,11 @@ # # Why this file exists: # GitHub's required-status-checks model is name-based, not workflow-based. -# Our CI is tiered: ci.yml emits 'Build & Test (Linux)' and -# ci-integration-pr-stub.yml emits four stubs that hold positions for -# merge-queue jobs in ci-integration.yml. Without this gate, branch -# protection had to require all 5 checks individually -- adding or -# renaming a check meant a ruleset edit. With this gate, branch -# protection requires only 'Merge Gate / gate' and the gate aggregates. -# Tide / bors pattern. +# Without this gate, branch protection had to require each PR-time check +# individually -- adding or renaming a check meant a ruleset edit. With +# this gate, branch protection requires only 'gate' (the check-run name +# of the job below) and the gate aggregates whatever underlying checks +# we declare in EXPECTED_CHECKS. Tide / bors single-authority pattern. # # Why a single trigger (not dual pull_request + pull_request_target): # We tried dual-trigger redundancy in PR #865 to harden against rare @@ -91,11 +89,12 @@ jobs: REPO: ${{ github.repository }} SHA: ${{ steps.sha.outputs.sha }} # All PR-time checks the gate aggregates. Keep this in sync with - # the underlying workflows: ci.yml emits Build & Test (Linux), - # ci-integration-pr-stub.yml emits the other four. - # NOTE: 'Merge Gate / gate' itself MUST NOT appear here -- it - # would deadlock waiting for itself. - EXPECTED_CHECKS: 'Build & Test (Linux),Build (Linux),Smoke Test (Linux),Integration Tests (Linux),Release Validation (Linux)' + # the underlying workflows. Currently only ci.yml emits PR-time + # checks ('Build & Test (Linux)'); ci-integration.yml is + # merge_group-only and is NOT polled here. + # NOTE: 'gate' (this job) MUST NOT appear here -- it would + # deadlock waiting for itself. + EXPECTED_CHECKS: 'Build & Test (Linux)' TIMEOUT_MIN: '30' POLL_SEC: '30' run: | diff --git a/CHANGELOG.md b/CHANGELOG.md index 32e4f3fb7..30d8ef985 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -12,6 +12,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - CI docs: clarify that branch-protection ruleset must store the check-run name (`gate`), not the workflow display string (`Merge Gate / gate`); document the merge-gate aggregator in `cicd.instructions.md` and mark the legacy stub workflow as deprecated. +### Removed + +- CI: deleted `ci-integration-pr-stub.yml`. The four stubs were a holdover from the pre-merge-gate model where branch protection required each Tier 2 check name directly. After #867, branch protection requires only `gate`, so the stubs are dead weight. Reduced `EXPECTED_CHECKS` in `merge-gate.yml` to just `Build & Test (Linux)`. + ## [0.9.2] - 2026-04-23 ### Added