diff --git a/.apm/agents/auth-expert.agent.md b/.apm/agents/auth-expert.agent.md
index ed6c90456..cd3331860 100644
--- a/.apm/agents/auth-expert.agent.md
+++ b/.apm/agents/auth-expert.agent.md
@@ -23,8 +23,8 @@ If a code change contradicts the mermaid diagram, the diagram (and matching doc
 
 ## Core Knowledge
 
-- **Token prefixes**: Fine-grained PATs (`github_pat_`), classic PATs (`ghp_`), OAuth user-to-server (`ghu_` — e.g. `gh auth login`), OAuth app (`gho_`), GitHub App install (`ghs_`), GitHub App refresh (`ghr_`)
-- **EMU (Enterprise Managed Users)**: Use standard PAT prefixes (`ghp_`, `github_pat_`). There is NO special prefix for EMU — it's a property of the account, not the token. EMU tokens are enterprise-scoped and cannot access public github.com repos. EMU orgs can exist on github.com or *.ghe.com.
+- **Token prefixes**: Fine-grained PATs (`github_pat_`), classic PATs (`ghp_`), OAuth user-to-server (`ghu_` -- e.g. `gh auth login`), OAuth app (`gho_`), GitHub App install (`ghs_`), GitHub App refresh (`ghr_`)
+- **EMU (Enterprise Managed Users)**: Use standard PAT prefixes (`ghp_`, `github_pat_`). There is NO special prefix for EMU -- it's a property of the account, not the token. EMU tokens are enterprise-scoped and cannot access public github.com repos. EMU orgs can exist on github.com or *.ghe.com.
 - **Host classification**: github.com (public), *.ghe.com (no public repos), GHES (`GITHUB_HOST`), ADO
 - **Git credential helpers**: macOS Keychain, Windows Credential Manager, `gh auth`, `git credential fill`
 - **Rate limiting**: 60/hr unauthenticated, 5000/hr authenticated, primary (403) vs secondary (429)
@@ -32,7 +32,7 @@ If a code change contradicts the mermaid diagram, the diagram (and matching doc
 ## APM Architecture
 
 - **AuthResolver** (`src/apm_cli/core/auth.py`): Single source of truth. Per-(host, org) resolution. Frozen `AuthContext` for thread safety.
-- **Token precedence**: `GITHUB_APM_PAT_{ORG}` → `GITHUB_APM_PAT` → `GITHUB_TOKEN` → `GH_TOKEN` → `git credential fill`
+- **Token precedence**: `GITHUB_APM_PAT_{ORG}` -> `GITHUB_APM_PAT` -> `GITHUB_TOKEN` -> `GH_TOKEN` -> `git credential fill`
 - **Fallback chains**: unauth-first for validation (save rate limits), auth-first for download
 - **GitHubTokenManager** (`src/apm_cli/core/token_manager.py`): Low-level token lookup, wrapped by AuthResolver
 
@@ -40,18 +40,18 @@ If a code change contradicts the mermaid diagram, the diagram (and matching doc
 
 When reviewing or writing auth code:
 
-1. **Every remote operation** must go through AuthResolver — no direct `os.getenv()` for tokens
+1. **Every remote operation** must go through AuthResolver -- no direct `os.getenv()` for tokens
 2. **Per-dep resolution**: Use `resolve_for_dep(dep_ref)`, never `self.github_token` instance vars
 3. **Host awareness**: Global env vars are checked for all hosts (no host-gating). `try_with_fallback()` retries with `git credential fill` if the token is rejected. HTTPS is the transport security boundary. *.ghe.com and ADO always require auth (no unauthenticated fallback).
-4. **Error messages**: Always use `build_error_context()` — never hardcode env var names
+4. **Error messages**: Always use `build_error_context()` -- never hardcode env var names
 5. **Thread safety**: AuthContext is resolved before `executor.submit()`, passed per-worker
 
 ## Common Pitfalls
 
-- EMU PATs on public github.com repos → will fail silently (you cannot detect EMU from prefix)
+- EMU PATs on public github.com repos -> will fail silently (you cannot detect EMU from prefix)
 - `git credential fill` only resolves per-host, not per-org
 - `_build_repo_url` must accept token param, not use instance var
 - Windows: `GIT_ASKPASS` must be `'echo'` not empty string
-- Classic PATs (`ghp_`) work cross-org but are being deprecated — prefer fine-grained
-- ADO uses Basic auth with base64-encoded `:PAT` — different from GitHub bearer token flow
+- Classic PATs (`ghp_`) work cross-org but are being deprecated -- prefer fine-grained
+- ADO uses Basic auth with base64-encoded `:PAT` -- different from GitHub bearer token flow
 - ADO also supports AAD bearer tokens via `az account get-access-token` (resource `499b84ac-1321-427f-aa17-267ca6975798`); precedence is `ADO_APM_PAT` -> az bearer -> fail. Stale PATs (401) silently fall back to the bearer with a `[!]` warning. See the auth skill for the four diagnostic cases.
diff --git a/.apm/skills/auth/SKILL.md b/.apm/skills/auth/SKILL.md
index 7fb2513f3..9ba894486 100644
--- a/.apm/skills/auth/SKILL.md
+++ b/.apm/skills/auth/SKILL.md
@@ -3,7 +3,7 @@ name: auth
 description: >
   Activate when code touches token management, credential resolution, git auth
   flows, GITHUB_APM_PAT, ADO_APM_PAT, AuthResolver, HostInfo, AuthContext, or
-  any remote host authentication — even if 'auth' isn't mentioned explicitly.
+  any remote host authentication -- even if 'auth' isn't mentioned explicitly.
 ---
 
 # Auth Skill
@@ -35,7 +35,7 @@ ADO hosts (`dev.azure.com`, `*.visualstudio.com`) resolve auth in this order:
 2. AAD bearer via `az account get-access-token --resource 499b84ac-1321-427f-aa17-267ca6975798` if `az` is installed and `az account show` succeeds
 3. Otherwise: auth-failed error from `build_error_context`
 
-Token source constants live in `src/apm_cli/core/token_manager.py`: `ADO_APM_PAT = "ADO_APM_PAT"`, `ADO_BEARER_SOURCE = "AAD_BEARER_AZ_CLI"`.
+`ADO_APM_PAT` is the env var name used by the auth flow. The AAD bearer source constant lives in `src/apm_cli/core/token_manager.py` as `GitHubTokenManager.ADO_BEARER_SOURCE = "AAD_BEARER_AZ_CLI"`.
 
 **Stale-PAT silent fallback:** if `ADO_APM_PAT` is rejected with HTTP 401, APM retries with the az bearer and emits:
 
diff --git a/.github/agents/auth-expert.agent.md b/.github/agents/auth-expert.agent.md
index 3d0498439..cd3331860 100644
--- a/.github/agents/auth-expert.agent.md
+++ b/.github/agents/auth-expert.agent.md
@@ -23,8 +23,8 @@ If a code change contradicts the mermaid diagram, the diagram (and matching doc
 
 ## Core Knowledge
 
-- **Token prefixes**: Fine-grained PATs (`github_pat_`), classic PATs (`ghp_`), OAuth user-to-server (`ghu_` — e.g. `gh auth login`), OAuth app (`gho_`), GitHub App install (`ghs_`), GitHub App refresh (`ghr_`)
-- **EMU (Enterprise Managed Users)**: Use standard PAT prefixes (`ghp_`, `github_pat_`). There is NO special prefix for EMU — it's a property of the account, not the token. EMU tokens are enterprise-scoped and cannot access public github.com repos. EMU orgs can exist on github.com or *.ghe.com.
+- **Token prefixes**: Fine-grained PATs (`github_pat_`), classic PATs (`ghp_`), OAuth user-to-server (`ghu_` -- e.g. `gh auth login`), OAuth app (`gho_`), GitHub App install (`ghs_`), GitHub App refresh (`ghr_`)
+- **EMU (Enterprise Managed Users)**: Use standard PAT prefixes (`ghp_`, `github_pat_`). There is NO special prefix for EMU -- it's a property of the account, not the token. EMU tokens are enterprise-scoped and cannot access public github.com repos. EMU orgs can exist on github.com or *.ghe.com.
 - **Host classification**: github.com (public), *.ghe.com (no public repos), GHES (`GITHUB_HOST`), ADO
 - **Git credential helpers**: macOS Keychain, Windows Credential Manager, `gh auth`, `git credential fill`
 - **Rate limiting**: 60/hr unauthenticated, 5000/hr authenticated, primary (403) vs secondary (429)
@@ -32,7 +32,7 @@ If a code change contradicts the mermaid diagram, the diagram (and matching doc
 ## APM Architecture
 
 - **AuthResolver** (`src/apm_cli/core/auth.py`): Single source of truth. Per-(host, org) resolution. Frozen `AuthContext` for thread safety.
-- **Token precedence**: `GITHUB_APM_PAT_{ORG}` → `GITHUB_APM_PAT` → `GITHUB_TOKEN` → `GH_TOKEN` → `git credential fill`
+- **Token precedence**: `GITHUB_APM_PAT_{ORG}` -> `GITHUB_APM_PAT` -> `GITHUB_TOKEN` -> `GH_TOKEN` -> `git credential fill`
 - **Fallback chains**: unauth-first for validation (save rate limits), auth-first for download
 - **GitHubTokenManager** (`src/apm_cli/core/token_manager.py`): Low-level token lookup, wrapped by AuthResolver
 
@@ -40,17 +40,18 @@ If a code change contradicts the mermaid diagram, the diagram (and matching doc
 
 When reviewing or writing auth code:
 
-1. **Every remote operation** must go through AuthResolver — no direct `os.getenv()` for tokens
+1. **Every remote operation** must go through AuthResolver -- no direct `os.getenv()` for tokens
 2. **Per-dep resolution**: Use `resolve_for_dep(dep_ref)`, never `self.github_token` instance vars
 3. **Host awareness**: Global env vars are checked for all hosts (no host-gating). `try_with_fallback()` retries with `git credential fill` if the token is rejected. HTTPS is the transport security boundary. *.ghe.com and ADO always require auth (no unauthenticated fallback).
-4. **Error messages**: Always use `build_error_context()` — never hardcode env var names
+4. **Error messages**: Always use `build_error_context()` -- never hardcode env var names
 5. **Thread safety**: AuthContext is resolved before `executor.submit()`, passed per-worker
 
 ## Common Pitfalls
 
-- EMU PATs on public github.com repos → will fail silently (you cannot detect EMU from prefix)
+- EMU PATs on public github.com repos -> will fail silently (you cannot detect EMU from prefix)
 - `git credential fill` only resolves per-host, not per-org
 - `_build_repo_url` must accept token param, not use instance var
 - Windows: `GIT_ASKPASS` must be `'echo'` not empty string
-- Classic PATs (`ghp_`) work cross-org but are being deprecated — prefer fine-grained
-- ADO uses Basic auth with base64-encoded `:PAT` — different from GitHub bearer token flow
+- Classic PATs (`ghp_`) work cross-org but are being deprecated -- prefer fine-grained
+- ADO uses Basic auth with base64-encoded `:PAT` -- different from GitHub bearer token flow
+- ADO also supports AAD bearer tokens via `az account get-access-token` (resource `499b84ac-1321-427f-aa17-267ca6975798`); precedence is `ADO_APM_PAT` -> az bearer -> fail. Stale PATs (401) silently fall back to the bearer with a `[!]` warning. See the auth skill for the four diagnostic cases.
diff --git a/.github/skills/auth/SKILL.md b/.github/skills/auth/SKILL.md
index 28f67342a..9ba894486 100644
--- a/.github/skills/auth/SKILL.md
+++ b/.github/skills/auth/SKILL.md
@@ -3,7 +3,7 @@ name: auth
 description: >
   Activate when code touches token management, credential resolution, git auth
   flows, GITHUB_APM_PAT, ADO_APM_PAT, AuthResolver, HostInfo, AuthContext, or
-  any remote host authentication — even if 'auth' isn't mentioned explicitly.
+  any remote host authentication -- even if 'auth' isn't mentioned explicitly.
 ---
 
 # Auth Skill
@@ -26,3 +26,34 @@ All auth flows MUST go through `AuthResolver`. No direct `os.getenv()` for token
 ## Canonical reference
 
 The full per-org -> global -> credential-fill -> fallback resolution flow is in [`docs/src/content/docs/getting-started/authentication.md`](../../../docs/src/content/docs/getting-started/authentication.md) (mermaid flowchart). Treat it as the single source of truth; if behavior diverges, fix the diagram in the same PR.
+
+## Bearer-token authentication for ADO
+
+ADO hosts (`dev.azure.com`, `*.visualstudio.com`) resolve auth in this order:
+
+1. `ADO_APM_PAT` env var if set
+2. AAD bearer via `az account get-access-token --resource 499b84ac-1321-427f-aa17-267ca6975798` if `az` is installed and `az account show` succeeds
+3. Otherwise: auth-failed error from `build_error_context`
+
+`ADO_APM_PAT` is the env var name used by the auth flow. The AAD bearer source constant lives in `src/apm_cli/core/token_manager.py` as `GitHubTokenManager.ADO_BEARER_SOURCE = "AAD_BEARER_AZ_CLI"`.
+
+**Stale-PAT silent fallback:** if `ADO_APM_PAT` is rejected with HTTP 401, APM retries with the az bearer and emits:
+
+```
+[!] ADO_APM_PAT was rejected for {host} (HTTP 401); fell back to az cli bearer.
+[!] Consider unsetting the stale variable.
+```
+
+**Verbose source line** (one per host, emitted under `--verbose`):
+
+```
+[i] dev.azure.com -- using bearer from az cli (source: AAD_BEARER_AZ_CLI)
+[i] dev.azure.com -- token from ADO_APM_PAT
+```
+
+**Diagnostic cases** (`_emit_stale_pat_diagnostic` + `build_error_context` in `src/apm_cli/core/auth.py`):
+
+1. No PAT, no `az`: `No ADO_APM_PAT was set and az CLI is not installed.` -> install `az`, run `az login --tenant `, or set `ADO_APM_PAT`.
+2. No PAT, `az` not signed in: `az CLI is installed but no active session was found.` -> run `az login --tenant ` against the tenant that owns the org, or set `ADO_APM_PAT`.
+3. No PAT, wrong tenant: `az CLI returned a token but the org does not accept it (likely a tenant mismatch).` -> run `az login --tenant `, or set `ADO_APM_PAT`.
+4. PAT 401, no `az` fallback: `ADO_APM_PAT was rejected (HTTP 401) and no az cli fallback was available.` -> rotate the PAT, or install `az` and run `az login --tenant `.
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index d76d9d73c..5cb128b8c 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -96,3 +96,45 @@ jobs:
           include-hidden-files: true
           retention-days: 30
           if-no-files-found: error
+
+  # Dogfood the two CI gates we ship and document to users:
+  #   - Gate A (consumer-side): `apm audit --ci` -- lockfile / install fidelity.
+  #   - Gate B (producer-side): regeneration drift -- did someone hand-edit
+  #     a regenerated file under .github/ without updating canonical .apm/?
+  # See microsoft/apm#883 for context. Tier 1 (no secrets needed).
+  apm-self-check:
+    name: APM Self-Check
+    runs-on: ubuntu-24.04
+    permissions:
+      contents: read
+    steps:
+      - uses: actions/checkout@v4
+
+      # Installs the APM CLI (latest stable) and runs `apm install` against
+      # this repo's apm.yml. Auto-detects target from the existing .github/
+      # directory and re-integrates local .apm/ content, regenerating
+      # .github/instructions/, .github/agents/, .github/skills/, etc.
+      # Adds `apm` to PATH for subsequent steps.
+      - uses: microsoft/apm-action@v1
+
+      # Gate A: lockfile / install fidelity (consumer-side).
+      # Verifies every file in lockfile.deployed_files exists, ref consistency
+      # between apm.yml and apm.lock.yaml, no orphan packages, and
+      # content-integrity (hidden Unicode) on deployed package content.
+      # Does NOT verify deployed-file content vs lockfile (see #684).
+      - name: apm audit --ci
+        run: apm audit --ci
+
+      # Gate B: regeneration drift (producer-side).
+      # The action's `apm install` step re-integrated local .apm/ into
+      # .github/ via target auto-detection. If anything in the governed
+      # integration directories changed, someone edited the regenerated
+      # output without updating the canonical .apm/ source.
+      - name: Check APM integration drift
+        run: |
+          if [ -n "$(git status --porcelain -- .github/ .claude/ .cursor/ .opencode/)" ]; then
+            echo "::error::APM integration files are out of date."
+            echo "Run 'apm install' locally (with .github/ present) and commit the result."
+            git --no-pager diff -- .github/ .claude/ .cursor/ .opencode/
+            exit 1
+          fi
diff --git a/.github/workflows/merge-gate.yml b/.github/workflows/merge-gate.yml
index 9861a6524..aaa6d85e8 100644
--- a/.github/workflows/merge-gate.yml
+++ b/.github/workflows/merge-gate.yml
@@ -90,11 +90,11 @@ jobs:
           SHA: ${{ steps.sha.outputs.sha }}
           # All PR-time checks the gate aggregates. Keep this in sync with
           # the underlying workflows. Currently only ci.yml emits PR-time
-          # checks ('Build & Test (Linux)'); ci-integration.yml is
-          # merge_group-only and is NOT polled here.
+          # checks ('Build & Test (Linux)', 'APM Self-Check');
+          # ci-integration.yml is merge_group-only and is NOT polled here.
           # NOTE: 'gate' (this job) MUST NOT appear here -- it would
           # deadlock waiting for itself.
-          EXPECTED_CHECKS: 'Build & Test (Linux)'
+          EXPECTED_CHECKS: 'Build & Test (Linux),APM Self-Check'
           TIMEOUT_MIN: '30'
           POLL_SEC: '30'
         run: |
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7b9ddc62b..0e3496861 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,6 +8,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added
+
+- CI: add `APM Self-Check` to `ci.yml` for `apm audit --ci`, regeneration-drift validation, and `merge-gate.yml` `EXPECTED_CHECKS` coverage. (#885)
+
 ### Changed
 
 - CI: smoke tests in `build-release.yml`'s `build-and-test` job (Linux x86_64, Linux arm64, Windows) are now gated to promotion boundaries (tag/schedule/dispatch) instead of running on every push to main. Push-time smoke duplicated the merge-time smoke gate in `ci-integration.yml` and burned ~15 redundant codex-binary downloads/day. Tag-cut releases still run smoke as a pre-ship gate; nightly catches upstream codex URL drift; merge-time still gates merges into main. (#878)
diff --git a/docs/src/content/docs/integrations/ci-cd.md b/docs/src/content/docs/integrations/ci-cd.md
index cc7a6a29f..021305b51 100644
--- a/docs/src/content/docs/integrations/ci-cd.md
+++ b/docs/src/content/docs/integrations/ci-cd.md
@@ -74,6 +74,10 @@ To ensure `.github/`, `.claude/`, `.cursor/`, and `.opencode/` integration files
 
 This catches cases where a developer updates `apm.yml` but forgets to re-run `apm install`.
 
+:::tip[We dogfood this]
+APM's own repo uses the `APM Self-Check` job in [`microsoft/apm`'s `ci.yml`](https://github.com/microsoft/apm/blob/main/.github/workflows/ci.yml) as a reference implementation for installing APM, running CI validation commands such as `apm audit --ci`, and checking for drift with `git status --porcelain`. Use it as a practical example when wiring these checks into your own workflow.
+:::
+
 ## Azure Pipelines
 
 ```yaml