ci: dogfood apm audit --ci and integration-drift gate (closes #883)#885

Merged
danielmeppiel merged 2 commits into main from ci/dogfood-self-check
Apr 23, 2026
@danielmeppiel
Collaborator

ci: dogfood apm audit --ci and integration-drift gate

TL;DR

microsoft/apm ships two CI gates as features and documents them publicly, but never ran them on its own pipeline — the embarrassment that PRs #874, #875, and #878 made unavoidable. This PR adds an APM Self-Check job to ci.yml that runs both via microsoft/apm-action@v1, wires it into merge-gate.yml's EXPECTED_CHECKS, and bundles the precursor regeneration the gate would otherwise have flagged on first run. Closes #883.

Note

The bundled regen is exactly the drift inventory #883 called out: .github/agents/auth-expert.agent.md and .github/skills/auth/SKILL.md, both stale vs .apm/ since #856's ADO bearer-token work. The third drift item (pr-description-skill) is owned by PR #884 and intentionally untouched here.

Problem (WHY)

Why these matter: shipping a CI gate we don't run ourselves violates "agents pattern-match well against concrete structures": when the canonical reference repo doesn't match what we tell users to do, the recommendation stops being credible. And accepting silent producer-side drift violates "Grounding outputs in deterministic tool execution transforms probabilistic generation into verifiable action." apm install is the deterministic step; bypassing it makes the regenerated tree probabilistic relative to source.

Approach (WHAT)

| # | Fix | Principle | Source |
|---|-----|-----------|--------|
| 1 | Single PR-time job runs both gates via microsoft/apm-action@v1 (no secrets needed → fits Tier 1). | "Favor small, chainable primitives over monolithic frameworks." | PROSE |
| 2 | Bundle the precursor .github/ regen into this PR rather than ship a separate sequencing PR first. | "Add what the agent lacks, omit what it knows": a separate PR adds nothing the bundled commit doesn't already prove. | Agent Skills |
| 3 | Add APM Self-Check to merge-gate.yml's EXPECTED_CHECKS so the single-authority gate aggregator waits on it. | Reuses the existing aggregator pattern documented in the file header. | merge-gate.yml:7-11 |
| 4 | Add a "We dogfood this" callout in integrations/ci-cd.md pointing at our own ci.yml. | Dogfooding becomes evidence, not just intent. | issue #883 acceptance criteria |

Implementation (HOW)

  • .github/workflows/ci.yml — new top-level apm-self-check job mirroring the issue's spec verbatim. Uses microsoft/apm-action@v1 (auto-detects target from existing .github/), then runs apm audit --ci (Gate A) and a git status --porcelain -- .github/ .claude/ .cursor/ .opencode/ check (Gate B). Permissions scoped to contents: read. Comments preserved from the issue spec to keep the rationale on-disk for future readers.
  • .github/workflows/merge-gate.yml — append APM Self-Check to the comma-separated EXPECTED_CHECKS env var (now 'Build & Test (Linux),APM Self-Check'); update the inline comment to list both checks. Branch protection requires only gate, so no ruleset edit is needed.
  • .github/agents/auth-expert.agent.md, .github/skills/auth/SKILL.md — pure regenerated output of apm install against current canonical .apm/. Not hand-edited. The diff is exactly the ADO bearer-token content from #856 (feat(auth): Azure DevOps authentication via Entra ID (AAD) bearer tokens) propagating into the deployed tree.
  • docs/src/content/docs/integrations/ci-cd.md — three-line :::tip[We dogfood this] callout right after the existing "Verify Deployed Primitives" snippet, pointing at microsoft/apm's own ci.yml.
  • CHANGELOG.md — single Unreleased entry under ### Added summarising the gate + bundled regen and citing #883 (ci: dogfood apm audit --ci and integration-drift checks in microsoft/apm pipeline).
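
Gate B reduces to one question: once apm install has run, does git status --porcelain report anything under the integration directories? The following is a minimal Python sketch of that decision, not the job's actual implementation (the real job runs the git command directly in a workflow step). It operates on already-captured porcelain output so it can run without a repository. The names `drifted_paths` and `INTEGRATION_DIRS` are illustrative and do not appear in ci.yml.

```python
# Directories named in the Gate B command line of this PR.
INTEGRATION_DIRS = (".github/", ".claude/", ".cursor/", ".opencode/")


def drifted_paths(porcelain_output: str, dirs=INTEGRATION_DIRS) -> list[str]:
    """Return changed paths that live under one of the integration dirs.

    Expects `git status --porcelain` (v1) lines: two status characters,
    one space, then the path. For renames ("old -> new") the new path is
    used. Quoted paths are only unwrapped naively (hypothetical shortcut).
    """
    drifted = []
    for line in porcelain_output.splitlines():
        if len(line) < 4:
            continue  # too short to hold "XY path"
        path = line[3:]
        if " -> " in path:
            path = path.split(" -> ", 1)[1]
        path = path.strip('"')
        if path.startswith(dirs):
            drifted.append(path)
    return drifted


sample = " M .github/skills/auth/SKILL.md\n?? notes.txt\n M src/app.py\n"
print(drifted_paths(sample))
```

An empty result corresponds to the gate-green case; any entry means a regenerated file differs from what is committed.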

Diagrams

Legend: the dogfooding loop — canonical authoring lives in .apm/, apm install regenerates the integration tree, and the self-check job re-runs install in CI to detect any hand-edit that bypassed the canonical source.

flowchart LR
    A[".apm/ canonical source"] -->|"author edits here"| B["apm install (local)"]
    B -->|"regenerates"| C[".github/ integration tree"]
    C -->|"committed to PR"| D["PR opened"]
    D --> E["APM Self-Check (CI)"]
    E -->|"microsoft/apm-action@v1<br/>re-runs apm install"| F{"git status --porcelain<br/>.github/ .claude/ .cursor/ .opencode/"}
    F -->|"clean"| G["Gate B passes"]
    F -->|"dirty"| H["Gate B fails:<br/>'Run apm install and commit'"]
    E --> I["apm audit --ci"]
    I -->|"6/6 baseline checks"| J["Gate A passes"]
    I -->|"any check fails"| K["Gate A fails"]

Legend: producer-side vs consumer-side gates catch different failure modes; both are needed because they have disjoint blast radii.

flowchart TB
    subgraph CONSUMER["Gate A — apm audit --ci (consumer-side)"]
        A1["lockfile-exists"]
        A2["ref-consistency"]
        A3["deployed-files-present"]
        A4["no-orphaned-packages"]
        A5["config-consistency"]
        A6["content-integrity (Unicode)"]
    end
    subgraph PRODUCER["Gate B — regeneration drift (producer-side)"]
        B1["Run apm install"]
        B2["git status --porcelain<br/>on integration dirs"]
        B1 --> B2
    end
    X1["Edited apm.yml<br/>without re-install"] --> A2
    X2["Deleted a deployed file"] --> A3
    X3["Hidden Unicode in package"] --> A6
    Y1["Hand-edit to .github/<br/>without updating .apm/"] --> B2
    Y2["Stale .github/ from older<br/>upstream update"] --> B2
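
To make the "Hidden Unicode in package" failure mode concrete, here is a hedged Python sketch of one way such characters can be detected. It flags every code point in Unicode category Cf (format characters such as zero-width spaces and bidirectional overrides). Which characters the content-integrity check itself treats as critical is not stated in this PR, so the rule below is an assumption, and `hidden_characters` is a hypothetical helper name.

```python
import unicodedata


def hidden_characters(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for each format-category (Cf) character.

    Cf includes zero-width spaces/joiners and bidi overrides. This rule is
    an illustrative assumption, not APM's documented check.
    """
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


print(hidden_characters("safe text"))
print(hidden_characters("a\u200bb"))
```

A check like this belongs on the consumer side because the offending bytes arrive inside a package, independent of whether the local integration tree was regenerated.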

Trade-offs

  • Bundled precursor regen vs separate precursor PR. Issue #883 originally proposed shipping the regen as a precursor PR, then wiring the gate in a follow-up. Chose to bundle. A separate PR would add a merge cycle and leave the gate self-DOSing in the gap: CI on the precursor PR runs against pre-gate main, so the gate isn't enforced there, and the follow-up gate-wiring PR is the first time it fires, with nothing to assert beyond what we already verified manually. Bundling keeps the change atomic and verifiable from a single diff. Risk: the regen diff is reviewed alongside CI plumbing instead of in isolation; mitigated by the diff being two files, both pure mechanical regen output.
  • microsoft/apm-action@v1 unpinned vs SHA-pinned. Chose the floating major. Per the issue spec: "apm-version is intentionally unpinned. If a future change requires a specific CLI behavior, pin then; for now we want to catch breakage on latest stable as part of release readiness." Trade-off: a malicious retag of the v1 tag on microsoft/apm-action would execute in CI; mitigated because the action is in our own org and Tier 1 has no secrets to exfiltrate.
  • Job name APM Self-Check (with space) vs apm-self-check (kebab). Chose the spaced display name to match the existing convention in EXPECTED_CHECKS ('Build & Test (Linux)' already uses spaces and capitals). Branch protection matches on the rendered check-run name, not the job key, so this is consistent with the documented pattern in merge-gate.yml's header comment. The kebab form survives as the YAML job key (apm-self-check).
  • Tier 1 (PR-time, no secrets) vs Tier 2 (merge-queue only). Both gates are pure-public — they need no GH_CLI_PAT, ADO_APM_PAT, or GH_MODELS_PAT — so Tier 1 is the right home. Catching producer-side drift at PR-time is also where the signal is highest; surfacing it only in the merge queue would let drift accumulate on open PRs.
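
The job-name trade-off above matters because the aggregator compares names from EXPECTED_CHECKS against rendered check-run names. The Python sketch below illustrates that comparison under stated assumptions: merge-gate.yml's real logic is not shown in this PR, `parse_expected_checks` and `pending_checks` are hypothetical names, and the `conclusions` mapping stands in for whatever the gate reads from the checks API.

```python
def parse_expected_checks(raw: str) -> list[str]:
    """Split the comma-separated EXPECTED_CHECKS value into check names."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def pending_checks(expected: list[str], conclusions: dict[str, str]) -> list[str]:
    """Return expected check names that have not concluded with 'success'."""
    return [name for name in expected if conclusions.get(name) != "success"]


expected = parse_expected_checks("Build & Test (Linux),APM Self-Check")

# Reporting under the YAML job key instead of the display name leaves the
# expected check unmatched, so the gate would keep waiting.
print(pending_checks(expected, {
    "Build & Test (Linux)": "success",
    "apm-self-check": "success",
}))
```

With the display name reported as `APM Self-Check`, the pending list is empty and the gate can pass.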

Benefits

  1. Three known prior incidents would have been caught at PR-time. PRs #874 (docs(ci): improve Branch Protection section per review feedback), #875 (chore(ci): remove deprecated PR-time stub workflow), and the initial commit of #878 (ci(build-release): gate smoke to tag/schedule/dispatch only) all desynced .apm/ and .github/ for the same file; Gate B fails any of them on first push.
  2. apm audit --ci runs on every PR, exercising the documented 6/6 baseline (lockfile-exists, ref-consistency, deployed-files-present, no-orphaned-packages, config-consistency, content-integrity).
  3. Reference implementation users can copy verbatim. The dogfood callout makes our own ci.yml the canonical example for microsoft/apm-action@v1 consumers — no separate template to maintain.
  4. Atomic commit: the precursor drift fix and the gate that would have caught it ship together; reviewer can verify "yes this was the right fix" by reading both changes in one pass.
  5. No new secrets, no new runners. Tier 1 ubuntu-24.04, no permissions: escalation beyond contents: read.

Validation

apm audit --ci (run locally on the branch — this is the gate validating itself):

                                [>] APM Policy Compliance
+----------+------------------------+---------------------------------------------------+
| Status   | Check                  | Message                                           |
+----------+------------------------+---------------------------------------------------+
| [+]      | lockfile-exists        | No dependencies declared -- lockfile not required |
| [+]      | ref-consistency        | All dependency refs match lockfile                |
| [+]      | deployed-files-present | All deployed files present on disk                |
| [+]      | no-orphaned-packages   | No orphaned packages in lockfile                  |
| [+]      | config-consistency     | No MCP configs to check                           |
| [+]      | content-integrity      | No critical hidden Unicode characters detected    |
+----------+------------------------+---------------------------------------------------+

[*] All 6 check(s) passed
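
For readers who want to assert on this output in a script, the table above can be reduced to a pass/fail mapping. The Python sketch below assumes the layout shown here and treats `[+]` as the only passing marker; the marker used for a failing row is not shown in this PR, so `[x]` in the sample is a placeholder. `parse_audit_table` is a hypothetical helper, and scraping human-readable output is brittle compared with relying on the command's exit status.

```python
def parse_audit_table(output: str) -> dict[str, bool]:
    """Map each check name to True when its status cell is '[+]'."""
    results = {}
    for line in output.splitlines():
        stripped = line.strip()
        if not stripped.startswith("|"):
            continue  # skip borders, banners and blank lines
        cells = [cell.strip() for cell in stripped.strip("|").split("|")]
        if len(cells) != 3:
            continue
        status, check, _message = cells
        if status == "Status":
            continue  # header row
        results[check] = status == "[+]"
    return results


sample = """
+----------+-----------------+------------------------------------+
| Status   | Check           | Message                            |
+----------+-----------------+------------------------------------+
| [+]      | lockfile-exists | No dependencies declared           |
| [x]      | ref-consistency | placeholder failing row            |
+----------+-----------------+------------------------------------+
"""
print(parse_audit_table(sample))
```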

apm install (verifies Gate B is clean post-regen):

[>] Installing dependencies from apm.yml...
  [+] <project root> (local)
  |-- 8 instruction(s) integrated -> .github/instructions/
  |-- 10 agents integrated -> .github/agents/
  |-- 8 skill(s) integrated -> .github/skills/
[*] Installed 1 APM dependency.

After install: git status --porcelain -- .github/ .claude/ .cursor/ .opencode/ returns empty (gate-green).

Full diff stat against origin/main
.github/agents/auth-expert.agent.md         |  1 +
.github/skills/auth/SKILL.md                | 31 +++++++++++++++++++++++++++++++
.github/workflows/ci.yml                    | 42 ++++++++++++++++++++++++++++++++++++++++++
.github/workflows/merge-gate.yml            |  6 +++---
docs/src/content/docs/integrations/ci-cd.md |  4 ++++
CHANGELOG.md                                |  6 +++++-
6 files changed, 89 insertions(+), 4 deletions(-)

How to test

  • Open this PR and observe APM Self-Check reports green alongside Build & Test (Linux).
  • Confirm merge-gate.yml's gate job waits on APM Self-Check (poll log line [merge-gate] mentions both checks).
  • Negative test (Gate B): push a commit that edits only .github/instructions/cicd.instructions.md without touching .apm/. APM Self-Check should fail with the diff visible in the log and the message APM integration files are out of date.
  • Negative test (Gate A): bump a version in apm.yml without running apm install. APM Self-Check should fail with a ref-consistency violation from apm audit --ci.
  • Render docs/src/content/docs/integrations/ci-cd.md locally; the new :::tip[We dogfood this] callout appears under "Verify Deployed Primitives" linking to microsoft/apm's ci.yml.

References

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds an APM Self-Check job to ci.yml that runs both CI gates we ship
to users via microsoft/apm-action@v1:

  - Gate A (consumer-side): apm audit --ci -- 6 baseline lockfile /
    install fidelity checks.
  - Gate B (producer-side): regeneration drift -- fails if anyone
    edited a regenerated file under .github/ without updating the
    canonical .apm/ source.

Wires APM Self-Check into merge-gate.yml's EXPECTED_CHECKS so the
single-authority gate aggregator waits on it before merge.

Includes the precursor regeneration that #883 called out:
.github/agents/auth-expert.agent.md and .github/skills/auth/SKILL.md
were stale vs canonical .apm/ since #856. Bundling the regen into
this PR makes the gate green on first run; a separate precursor PR
would have shipped a self-DOSing gate for one merge cycle.

Adds a 'We dogfood this' callout in integrations/ci-cd.md pointing
at our own ci.yml as the reference implementation of the documented
'Verify deployed primitives' pattern.

Closes #883.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 23, 2026 22:10
Contributor

Copilot AI left a comment

Pull request overview

Adds PR-time dogfooding for APM’s published CI gates by introducing an APM Self-Check job (via microsoft/apm-action@v1) and making merge-gate.yml wait on it, plus regenerates a couple of .github/ auth primitives and updates docs/changelog to reflect the new gate.

Changes:

  • Add APM Self-Check job to .github/workflows/ci.yml to run apm audit --ci plus an integration-drift git status --porcelain gate.
  • Update .github/workflows/merge-gate.yml to include APM Self-Check in EXPECTED_CHECKS.
  • Update docs + regenerated auth agent/skill content, and add a changelog entry for the new CI gate.
| File | Description |
|------|-------------|
| .github/workflows/ci.yml | Adds APM Self-Check job that runs apm audit --ci and a drift check after apm install via microsoft/apm-action@v1. |
| .github/workflows/merge-gate.yml | Extends EXPECTED_CHECKS so the gate aggregator waits for the new job. |
| docs/src/content/docs/integrations/ci-cd.md | Adds a “We dogfood this” callout pointing to the repo’s own CI job. |
| CHANGELOG.md | Adds an Unreleased entry describing the new CI self-check gate. |
| .github/skills/auth/SKILL.md | Regenerated auth skill content describing ADO bearer token auth (and related diagnostics). |
| .github/agents/auth-expert.agent.md | Regenerated auth agent content noting ADO AAD bearer-token support and precedence. |

Copilot's findings

  • Files reviewed: 6/6 changed files
  • Comments generated: 4

Comment thread docs/src/content/docs/integrations/ci-cd.md Outdated
Comment thread .github/skills/auth/SKILL.md Outdated
Comment thread CHANGELOG.md Outdated
Comment thread .github/agents/auth-expert.agent.md Outdated
Four issues raised by Copilot reviewer; the auth-primitive fixes are
applied to canonical .apm/ sources so they survive future
'apm install --target copilot' regeneration cycles (the previous
deployed-only fixes would have been clobbered by the new
APM Self-Check drift gate this PR introduces).

- .apm/agents/auth-expert.agent.md: replace em dashes ('--') and
  arrows ('->') with ASCII; .github/ mirror regenerated.
- .apm/skills/auth/SKILL.md: correct the token_manager.py constant
  reference. ADO_APM_PAT is an env var name (string), not a module
  constant; only ADO_BEARER_SOURCE = 'AAD_BEARER_AZ_CLI' is defined
  on GitHubTokenManager.
- CHANGELOG.md: compress to single bullet ending with (#885) per
  Keep-a-Changelog convention; issue ref Closes #883 stays in PR body.
- docs/integrations/ci-cd.md: reword 'we dogfood this' callout to
  reference-implementation framing; previous wording overclaimed
  ('on every PR', 'shown above') given paths-ignore: ['docs/**']
  excludes docs-only PRs and the page only shows the drift snippet.

apm audit --ci passes 6/6 locally.
.apm/ <-> .github/ mirror parity verified via diff -q.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel merged commit 8665f4b into main Apr 23, 2026
33 checks passed
@danielmeppiel danielmeppiel deleted the ci/dogfood-self-check branch April 23, 2026 22:25
Development

Successfully merging this pull request may close these issues.

ci: dogfood apm audit --ci and integration-drift checks in microsoft/apm pipeline

2 participants