ci: dogfood apm audit --ci and integration-drift gate (closes #883) by danielmeppiel · Pull Request #885 · microsoft/apm

danielmeppiel · 2026-04-23T22:10:58Z

ci: dogfood `apm audit --ci` and integration-drift gate

TL;DR

microsoft/apm ships two CI gates as features and documents them publicly, but never ran them on its own pipeline — the embarrassment that PRs #874, #875, and #878 made unavoidable. This PR adds an APM Self-Check job to ci.yml that runs both via microsoft/apm-action@v1, wires it into merge-gate.yml's EXPECTED_CHECKS, and bundles the precursor regeneration the gate would otherwise have flagged on first run. Closes #883.

Note

The bundled regen is exactly the drift inventory #883 called out: .github/agents/auth-expert.agent.md and .github/skills/auth/SKILL.md, both stale vs .apm/ since #856's ADO bearer-token work. The third drift item (pr-description-skill) is owned by PR #884 and intentionally untouched here.

Problem (WHY)

We document apm audit --ci and the integration-drift recipe at integrations/ci-cd.md "Verify Deployed Primitives" and run neither on this repo. Self-inflicted credibility hole for a project whose value prop is "treat governance as code".
The producer-side drift mode caught us three times in a row: PRs #874 (docs(ci): improve Branch Protection section per review feedback), #875 (chore(ci): remove deprecated PR-time stub workflow), and the initial commit of #878 (ci(build-release): gate smoke to tag/schedule/dispatch only) each silently desynced .apm/instructions/cicd.instructions.md from .github/instructions/cicd.instructions.md because the maintainer edited the regenerated output directly. The next apm install would have overwritten the new content with the stale source.
[!] Live drift on main today: .github/agents/auth-expert.agent.md and .github/skills/auth/SKILL.md are stale vs canonical .apm/ (provenance: #856, feat(auth): Azure DevOps authentication via Entra ID (AAD) bearer tokens, updated .apm/ only). Any honest first run of the gate fails on these, which is precisely why the issue called for a precursor PR.

Why these matter: shipping a CI gate we don't run ourselves violates "agents pattern-match well against concrete structures" — when the canonical reference repo doesn't match what we tell users to do, the recommendation stops being credible. And accepting silent producer-side drift violates "Grounding outputs in deterministic tool execution transforms probabilistic generation into verifiable action." — apm install is the deterministic step; bypassing it makes the regenerated tree probabilistic relative to source.

Approach (WHAT)

| # | Fix | Principle | Source |
|---|-----|-----------|--------|
| 1 | Single PR-time job runs both gates via `microsoft/apm-action@v1` (no secrets needed → fits Tier 1). | "Favor small, chainable primitives over monolithic frameworks." | PROSE |
| 2 | Bundle the precursor `.github/` regen into this PR rather than ship a separate sequencing PR first. | "Add what the agent lacks, omit what it knows"; a separate PR adds nothing the bundled commit doesn't already prove. | Agent Skills |
| 3 | Add `APM Self-Check` to `merge-gate.yml`'s `EXPECTED_CHECKS` so the single-authority gate aggregator waits on it. | Reuses the existing aggregator pattern documented in the file header. | `merge-gate.yml:7-11` |
| 4 | Add a "We dogfood this" callout in `integrations/ci-cd.md` pointing at our own `ci.yml`. | Dogfooding becomes evidence, not just intent. | issue #883 acceptance criteria |

Implementation (HOW)

.github/workflows/ci.yml: new top-level apm-self-check job mirroring the issue's spec verbatim. Uses microsoft/apm-action@v1 (auto-detects target from existing .github/), then runs apm audit --ci (Gate A) and a git status --porcelain -- .github/ .claude/ .cursor/ .opencode/ check (Gate B). Permissions scoped to contents: read. Comments preserved from the issue spec to keep the rationale on-disk for future readers.
.github/workflows/merge-gate.yml: append APM Self-Check to the comma-separated EXPECTED_CHECKS env var (now 'Build & Test (Linux),APM Self-Check'); update the inline comment to list both checks. Branch protection requires only gate, so no ruleset edit is needed.
.github/agents/auth-expert.agent.md, .github/skills/auth/SKILL.md: pure regenerated output of apm install against current canonical .apm/. Not hand-edited. Diff is exactly the ADO bearer-token content from #856 (feat(auth): Azure DevOps authentication via Entra ID (AAD) bearer tokens) propagating into the deployed tree.
docs/src/content/docs/integrations/ci-cd.md: three-line :::tip[We dogfood this] callout right after the existing "Verify Deployed Primitives" snippet, pointing at microsoft/apm's own ci.yml.
CHANGELOG.md: single Unreleased entry under ### Added summarising the gate + bundled regen and citing #883 (ci: dogfood apm audit --ci and integration-drift checks in microsoft/apm pipeline).
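
To make the EXPECTED_CHECKS wiring concrete, here is a minimal Python sketch of how an aggregator could treat a comma-separated list of check-run names. It is not the code in merge-gate.yml: the function names `parse_expected_checks` and `gate_verdict` and the shape of the `conclusions` mapping are assumptions made for illustration. Only the env var value comes from this PR.

```python
from typing import Mapping, Optional


def parse_expected_checks(raw: str) -> list[str]:
    """Split a comma-separated EXPECTED_CHECKS value into check-run names."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def gate_verdict(expected: list[str], conclusions: Mapping[str, Optional[str]]) -> str:
    """Return 'failure', 'pending', or 'success' for the aggregated gate.

    `conclusions` (hypothetical shape) maps a rendered check-run name to its
    conclusion; a missing key or None means the check has not finished yet.
    """
    results = [conclusions.get(name) for name in expected]
    if any(r is not None and r != "success" for r in results):
        return "failure"
    if any(r is None for r in results):
        return "pending"
    return "success"


expected = parse_expected_checks("Build & Test (Linux),APM Self-Check")
# Keyed by the YAML job key instead of the display name: the gate keeps waiting.
print(gate_verdict(expected, {"Build & Test (Linux)": "success", "apm-self-check": "success"}))
```

The last line prints `pending`, which is why the entry added to EXPECTED_CHECKS has to be the display name (APM Self-Check) rather than the job key.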

Diagrams

Legend: the dogfooding loop — canonical authoring lives in .apm/, apm install regenerates the integration tree, and the self-check job re-runs install in CI to detect any hand-edit that bypassed the canonical source.

flowchart LR
    A[".apm/ canonical source"] -->|"author edits here"| B["apm install (local)"]
    B -->|"regenerates"| C[".github/ integration tree"]
    C -->|"committed to PR"| D["PR opened"]
    D --> E["APM Self-Check (CI)"]
    E -->|"microsoft/apm-action@v1<br/>re-runs apm install"| F{"git status --porcelain<br/>.github/ .claude/ .cursor/ .opencode/"}
    F -->|"clean"| G["Gate B passes"]
    F -->|"dirty"| H["Gate B fails:<br/>'Run apm install and commit'"]
    E --> I["apm audit --ci"]
    I -->|"6/6 baseline checks"| J["Gate A passes"]
    I -->|"any check fails"| K["Gate A fails"]
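
The loop in this diagram can be modelled in a few lines of Python. This is a toy sketch, not how apm install works internally: `regenerate` is a hypothetical stand-in that simply copies each .apm/ file to the same relative path under .github/, and file trees are plain dictionaries of path to content.

```python
def regenerate(source: dict[str, str]) -> dict[str, str]:
    """Toy stand-in for `apm install`: mirror .apm/<path> to .github/<path>."""
    return {".github/" + path.removeprefix(".apm/"): text for path, text in source.items()}


def drift(source: dict[str, str], committed: dict[str, str]) -> list[str]:
    """List regenerated files whose committed content differs from a fresh regen.

    Files the regen never produces (for example workflows) are ignored,
    just as a re-run of install would leave them untouched.
    """
    expected = regenerate(source)
    return sorted(path for path, text in expected.items() if committed.get(path) != text)


source = {".apm/instructions/cicd.instructions.md": "canonical text"}
committed = {
    ".github/instructions/cicd.instructions.md": "hand-edited text",
    ".github/workflows/ci.yml": "not generated",
}
print(drift(source, committed))  # the hand-edited instruction file is reported
```

Because the regen is a pure function of the canonical source, any non-empty result means someone edited the output instead of the input, which is the condition Gate B checks.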

Legend: producer-side vs consumer-side gates catch different failure modes; both are needed because they have disjoint blast radii.

flowchart TB
    subgraph CONSUMER["Gate A — apm audit --ci (consumer-side)"]
        A1["lockfile-exists"]
        A2["ref-consistency"]
        A3["deployed-files-present"]
        A4["no-orphaned-packages"]
        A5["config-consistency"]
        A6["content-integrity (Unicode)"]
    end
    subgraph PRODUCER["Gate B — regeneration drift (producer-side)"]
        B1["Run apm install"]
        B2["git status --porcelain<br/>on integration dirs"]
        B1 --> B2
    end
    X1["Edited apm.yml<br/>without re-install"] --> A2
    X2["Deleted a deployed file"] --> A3
    X3["Hidden Unicode in package"] --> A6
    Y1["Hand-edit to .github/<br/>without updating .apm/"] --> B2
    Y2["Stale .github/ from older<br/>upstream update"] --> B2

Trade-offs

Bundled precursor regen vs separate precursor PR. Issue #883 (ci: dogfood apm audit --ci and integration-drift checks in microsoft/apm pipeline) originally proposed shipping the regen as a precursor PR, then wiring the gate in a follow-up. Chose to bundle: a separate PR would add a merge cycle and leave the gate self-DOSing in the gap (CI on the precursor PR runs against pre-gate main so the gate isn't enforced; opening the gate-wiring PR right after is the first time it fires, and it has nothing to assert against beyond what we already manually verified). Bundling keeps the change atomic and verifiable from a single diff. Risk: the regen diff is reviewed alongside CI plumbing instead of in isolation; this is mitigated by the diff being two files, both pure mechanical regen output.
microsoft/apm-action@v1 unpinned vs SHA-pinned. Chose the floating major. Per the issue spec: "apm-version is intentionally unpinned. If a future change requires a specific CLI behavior, pin then; for now we want to catch breakage on latest stable as part of release readiness." Trade-off: a malicious retag of v1 on microsoft/apm-action would execute in CI; mitigated because the action is in our own org and Tier 1 has no secrets to exfiltrate.
Job name APM Self-Check (with space) vs apm-self-check (kebab). Chose the spaced display name to match the existing convention in EXPECTED_CHECKS ('Build & Test (Linux)' already uses spaces and capitals). Branch protection matches on the rendered check-run name, not the job key, so this is consistent with the documented pattern in merge-gate.yml's header comment. The kebab form survives as the YAML job key (apm-self-check).
Tier 1 (PR-time, no secrets) vs Tier 2 (merge-queue only). Both gates are pure-public — they need no GH_CLI_PAT, ADO_APM_PAT, or GH_MODELS_PAT — so Tier 1 is the right home. Catching producer-side drift at PR-time is also where the signal is highest; surfacing it only in the merge queue would let drift accumulate on open PRs.

Benefits

Three known prior incidents would have been caught at PR-time. PRs #874 (docs(ci): improve Branch Protection section per review feedback), #875 (chore(ci): remove deprecated PR-time stub workflow), and the initial commit of #878 (ci(build-release): gate smoke to tag/schedule/dispatch only) all desynced .apm/ and .github/ for the same file; Gate B fails any of them on first push.
apm audit --ci runs on every PR, exercising the documented 6/6 baseline (lockfile-exists, ref-consistency, deployed-files-present, no-orphaned-packages, config-consistency, content-integrity).
Reference implementation users can copy verbatim. The dogfood callout makes our own ci.yml the canonical example for microsoft/apm-action@v1 consumers — no separate template to maintain.
Atomic commit: the precursor drift fix and the gate that would have caught it ship together; reviewer can verify "yes this was the right fix" by reading both changes in one pass.
No new secrets, no new runners. Tier 1 ubuntu-24.04, no permissions: escalation beyond contents: read.

Validation

apm audit --ci (run locally on the branch — this is the gate validating itself):

                                [>] APM Policy Compliance
+----------+------------------------+---------------------------------------------------+
| Status   | Check                  | Message                                           |
+----------+------------------------+---------------------------------------------------+
| [+]      | lockfile-exists        | No dependencies declared -- lockfile not required |
| [+]      | ref-consistency        | All dependency refs match lockfile                |
| [+]      | deployed-files-present | All deployed files present on disk                |
| [+]      | no-orphaned-packages   | No orphaned packages in lockfile                  |
| [+]      | config-consistency     | No MCP configs to check                           |
| [+]      | content-integrity      | No critical hidden Unicode characters detected    |
+----------+------------------------+---------------------------------------------------+

[*] All 6 check(s) passed
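
When reading a saved log of this report, the table can be checked mechanically. The Python sketch below assumes the layout shown above (pipe-delimited rows with three columns) and assumes `[+]` is the only passing marker; other status markers are not shown in this PR, so anything else is treated as not passed. `parse_audit_rows` and `failed_checks` are illustrative names, not part of the apm CLI.

```python
SAMPLE = """\
+----------+------------------------+---------------------------------------------------+
| Status   | Check                  | Message                                           |
+----------+------------------------+---------------------------------------------------+
| [+]      | lockfile-exists        | No dependencies declared -- lockfile not required |
| [+]      | ref-consistency        | All dependency refs match lockfile                |
+----------+------------------------+---------------------------------------------------+
"""


def parse_audit_rows(report: str) -> list[tuple[str, str, str]]:
    """Extract (status, check, message) from the pipe-delimited data rows."""
    rows = []
    for line in report.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # border lines and surrounding log text
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 3 or cells[0] == "Status":
            continue  # header row or an unexpected shape
        rows.append((cells[0], cells[1], cells[2]))
    return rows


def failed_checks(report: str) -> list[str]:
    """Names of checks whose status marker is anything other than [+]."""
    return [check for status, check, _ in parse_audit_rows(report) if status != "[+]"]


print(failed_checks(SAMPLE))  # [] for the two passing rows above
```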

apm install (verifies Gate B is clean post-regen):

[>] Installing dependencies from apm.yml...
  [+] <project root> (local)
  |-- 8 instruction(s) integrated -> .github/instructions/
  |-- 10 agents integrated -> .github/agents/
  |-- 8 skill(s) integrated -> .github/skills/
[*] Installed 1 APM dependency.

After install: git status --porcelain -- .github/ .claude/ .cursor/ .opencode/ returns empty (gate-green).

Full diff stat against origin/main

.github/agents/auth-expert.agent.md         |  1 +
.github/skills/auth/SKILL.md                | 31 +++++++++++++++++++++++++++++++
.github/workflows/ci.yml                    | 42 ++++++++++++++++++++++++++++++++++++++++++
.github/workflows/merge-gate.yml            |  6 +++---
docs/src/content/docs/integrations/ci-cd.md |  4 ++++
CHANGELOG.md                                |  6 +++++-
6 files changed, 89 insertions(+), 4 deletions(-)

How to test

Open this PR and observe APM Self-Check reports green alongside Build & Test (Linux).
Confirm merge-gate.yml's gate job waits on APM Self-Check (poll log line [merge-gate] mentions both checks).
Negative test (Gate B): push a commit that edits only .github/instructions/cicd.instructions.md without touching .apm/. APM Self-Check should fail with the diff visible in the log and the message APM integration files are out of date.
Negative test (Gate A): bump a version in apm.yml without running apm install. APM Self-Check should fail with a ref-consistency violation from apm audit --ci.
Render docs/src/content/docs/integrations/ci-cd.md locally; the new :::tip[We dogfood this] callout appears under "Verify Deployed Primitives" linking to microsoft/apm's ci.yml.
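
For the Gate B negative test, the expected behaviour can be sketched in Python. The job itself uses a shell step; this sketch only illustrates the decision it makes from `git status --porcelain` output. The helper names and the exact combined wording of the failure message are assumptions; the two phrases in it are taken from the PR text and diagram. Quoted paths (git's escaping of unusual filenames) are not handled.

```python
from typing import Optional

INTEGRATION_DIRS = (".github/", ".claude/", ".cursor/", ".opencode/")
# Command whose stdout would be fed to the helpers below, e.g. via
# subprocess.run(GIT_COMMAND, capture_output=True, text=True).stdout
GIT_COMMAND = ["git", "status", "--porcelain", "--", *INTEGRATION_DIRS]


def drifted_paths(porcelain: str) -> list[str]:
    """Extract paths from porcelain v1 lines of the form 'XY <path>'."""
    paths = []
    for line in porcelain.splitlines():
        if not line.strip():
            continue
        path = line[3:]  # two status characters, then one space
        if " -> " in path:  # renames are reported as 'old -> new'
            path = path.split(" -> ", 1)[1]
        paths.append(path)
    return paths


def gate_b_failure(porcelain: str) -> Optional[str]:
    """Return None when the tree is clean, else a failure message."""
    paths = drifted_paths(porcelain)
    if not paths:
        return None
    listing = "\n".join(f"  {p}" for p in paths)
    return f"APM integration files are out of date. Run apm install and commit:\n{listing}"


print(gate_b_failure(" M .github/instructions/cicd.instructions.md\n"))
```

An empty porcelain output yields None (gate green); the hand-edited instruction file from the negative test yields the failure message with that path listed.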

References

Closes #883 (ci: dogfood apm audit --ci and integration-drift checks in microsoft/apm pipeline)
Drift incidents this gate would have caught at PR-time: #874 (docs(ci): improve Branch Protection section per review feedback), #875 (chore(ci): remove deprecated PR-time stub workflow), #878 (ci(build-release): gate smoke to tag/schedule/dispatch only)
Provenance of the bundled regen: #856, feat(auth): Azure DevOps authentication via Entra ID (AAD) bearer tokens (ADO AAD bearer-token auth)
Documented "Verify Deployed Primitives" pattern: integrations/ci-cd.md
Related future work: #684, audit --ci: verify deployed file content, not just existence (per-deployed-file content drift; out of scope here)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds an APM Self-Check job to ci.yml that runs both CI gates we ship to users via microsoft/apm-action@v1:

- Gate A (consumer-side): apm audit --ci -- 6 baseline lockfile / install fidelity checks.
- Gate B (producer-side): regeneration drift -- fails if anyone edited a regenerated file under .github/ without updating the canonical .apm/ source.

Wires APM Self-Check into merge-gate.yml's EXPECTED_CHECKS so the single-authority gate aggregator waits on it before merge.

Includes the precursor regeneration that #883 called out: .github/agents/auth-expert.agent.md and .github/skills/auth/SKILL.md were stale vs canonical .apm/ since #856. Bundling the regen into this PR makes the gate green on first run; a separate precursor PR would have shipped a self-DOSing gate for one merge cycle.

Adds a 'We dogfood this' callout in integrations/ci-cd.md pointing at our own ci.yml as the reference implementation of the documented 'Verify deployed primitives' pattern.

Closes #883.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Adds PR-time dogfooding for APM’s published CI gates by introducing an APM Self-Check job (via microsoft/apm-action@v1) and making merge-gate.yml wait on it, plus regenerates a couple of .github/ auth primitives and updates docs/changelog to reflect the new gate.

Changes:

Add APM Self-Check job to .github/workflows/ci.yml to run apm audit --ci plus an integration-drift git status --porcelain gate.
Update .github/workflows/merge-gate.yml to include APM Self-Check in EXPECTED_CHECKS.
Update docs + regenerated auth agent/skill content, and add a changelog entry for the new CI gate.

Show a summary per file

| File | Description |
|------|-------------|
| `.github/workflows/ci.yml` | Adds `APM Self-Check` job that runs `apm audit --ci` and a drift check after `apm install` via `microsoft/apm-action@v1`. |
| `.github/workflows/merge-gate.yml` | Extends `EXPECTED_CHECKS` so the gate aggregator waits for the new job. |
| `docs/src/content/docs/integrations/ci-cd.md` | Adds a “We dogfood this” callout pointing to the repo’s own CI job. |
| `CHANGELOG.md` | Adds an Unreleased entry describing the new CI self-check gate. |
| `.github/skills/auth/SKILL.md` | Regenerated auth skill content describing ADO bearer token auth (and related diagnostics). |
| `.github/agents/auth-expert.agent.md` | Regenerated auth agent content noting ADO AAD bearer-token support and precedence. |

Copilot's findings

Files reviewed: 6/6 changed files
Comments generated: 4

Four issues raised by Copilot reviewer; the auth-primitive fixes are applied to canonical .apm/ sources so they survive future 'apm install --target copilot' regeneration cycles (the previous deployed-only fixes would have been clobbered by the new APM Self-Check drift gate this PR introduces).

- .apm/agents/auth-expert.agent.md: replace em dashes ('--') and arrows ('->') with ASCII; .github/ mirror regenerated.
- .apm/skills/auth/SKILL.md: correct the token_manager.py constant reference. ADO_APM_PAT is an env var name (string), not a module constant; only ADO_BEARER_SOURCE = 'AAD_BEARER_AZ_CLI' is defined on GitHubTokenManager.
- CHANGELOG.md: compress to single bullet ending with (#885) per Keep-a-Changelog convention; issue ref Closes #883 stays in PR body.
- docs/integrations/ci-cd.md: reword 'we dogfood this' callout to reference-implementation framing; previous wording overclaimed ('on every PR', 'shown above') given paths-ignore: ['docs/**'] excludes docs-only PRs and the page only shows the drift snippet.

apm audit --ci passes 6/6 locally. .apm/ <-> .github/ mirror parity verified via diff -q.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings April 23, 2026 22:10

danielmeppiel requested a review from sergio-sisternes-epam as a code owner April 23, 2026 22:10

Copilot started reviewing on behalf of danielmeppiel April 23, 2026 22:11

Copilot AI reviewed Apr 23, 2026

Comment thread docs/src/content/docs/integrations/ci-cd.md Outdated

Comment thread .github/skills/auth/SKILL.md Outdated

Comment thread CHANGELOG.md Outdated

Comment thread .github/agents/auth-expert.agent.md Outdated

danielmeppiel merged commit 8665f4b into main Apr 23, 2026
33 checks passed

danielmeppiel deleted the ci/dogfood-self-check branch April 23, 2026 22:25

This was referenced Apr 23, 2026

Windows: _get_cache_dir flakes on tmpdir 8.3 short names (PathTraversalError) #886

Open

[aw] No-Op Runs #833

Open

danielmeppiel merged 2 commits into main from ci/dogfood-self-check
