From 704720df5ba57c738debfe1271e20843c5edab2d Mon Sep 17 00:00:00 2001
From: Your Name
Date: Sat, 25 Apr 2026 10:47:30 -0500
Subject: [PATCH] Add Tier A runner policy docs

---
 README.md                                 |   2 +
 docs/omt-global-ci-cd-audit-2026-04-14.md | 193 ++++++++++++++++++++++
 docs/tier-a-rollout-ledger.md             |  21 +++
 docs/tier-a-runner-policy.md              |  43 +++++
 4 files changed, 259 insertions(+)
 create mode 100644 docs/omt-global-ci-cd-audit-2026-04-14.md
 create mode 100644 docs/tier-a-rollout-ledger.md
 create mode 100644 docs/tier-a-runner-policy.md

diff --git a/README.md b/README.md
index 040f7c9..c0d8802 100644
--- a/README.md
+++ b/README.md
@@ -14,6 +14,8 @@ Self-hosted GitHub runner infrastructure for Synology shell-only pools, Linux Do
 - [Workflow cookbook](docs/workflow-cookbook.md)
 - [Linux Docker examples](docs/linux-docker-pool.md)
 - [Private-repo parity guide](docs/private-repo-parity.md)
+- [Tier A runner policy](docs/tier-a-runner-policy.md)
+- [Tier A rollout ledger](docs/tier-a-rollout-ledger.md)
 - [Release image flow](#publishing-a-release-image)
 - [Roadmap](ROADMAP.md)
 
diff --git a/docs/omt-global-ci-cd-audit-2026-04-14.md b/docs/omt-global-ci-cd-audit-2026-04-14.md
new file mode 100644
index 0000000..8b07c0b
--- /dev/null
+++ b/docs/omt-global-ci-cd-audit-2026-04-14.md
@@ -0,0 +1,193 @@
+# OMT-Global CI/CD Audit
+
+Date: 2026-04-14
+Scope baseline: active `OMT-Global` repos visible through the GitHub connector, with local workflow inspection where available and `/Users/johnteneyckjr./src/github-runner-fleet` treated as the intended runner-platform reference.
+
+## Executive Summary
+
+This org does not have a runner-capacity problem. It has a CI shape and trust-separation consistency problem.
+
+The best repos already follow the right pattern: cheap PR gates, separate extended validation, and explicit runner compatibility rules. The weaker repos are not failing because they lack CI. They are failing because they either:
+
+1. collapse PR, push, and privileged execution into one broad workflow,
+2. rely on legacy hosted-only pipelines with no split between fast and deep validation, or
+3. use self-hosted labels without an explicit security boundary for release/security-only work.
+
+Highest-risk org-wide findings:
+
+1. There is no org-wide `rg-security` or `rg-release` boundary yet. Even the better repos mostly distinguish only between shell-safe self-hosted lanes and GitHub-hosted lanes; they do not yet implement the stricter trust split described in the target state.
+2. Several active repos still use legacy single-workflow CI (`Why-fi`, `fix-your-life-app`, `homenet`) or imported pipelines (`synology-api`), which means no standardized PR gate, no nightly/deep validation split, and uneven security coverage.
+3. Bootstrap-aligned repos are directionally correct, but the current template still stops at `fast-checks` plus `validate-secrets`; it does not yet centralize dependency review, CodeQL, Semgrep, or OSV in a reusable security workflow.
+4. Release isolation is inconsistent. `github-runner-fleet` keeps release work on GitHub-hosted runners, which is good, but that pattern is not yet generalized across the org and is not enforced with selected-workflow access to a separate privileged runner group.
+5. Tier B imported/mirrored repos are still visible in the org and can dilute governance if treated as first-class without first normalizing ownership and workflow policy.
+
+## Method
+
+Tiering and scoring rules used here:
+
+- `Tier A`: active first-party, non-archived repos with enough evidence to treat them as current org delivery surfaces.
+- `Tier B`: archived repos, imported/mirrored repos, or repos where first-party ownership/pipeline intent is unclear.
+
+Each Tier A repo is scored on:
+
+- Trust boundary
+- Runner discipline
+- Pipeline shape
+- Security coverage
+- Reusability/governance
+- Operability
+
+Final rating:
+
+- `Green`: structurally sound, incremental hardening only
+- `Yellow`: workable but inconsistent, medium-priority cleanup
+- `Red`: trust-boundary or pipeline-shape issues that should be fixed before wider policy rollout
+
+## Tier A Ranked Audit
+
+| Rank | Repo | Class | Current runner model | Current pipeline model | Rating | Why it landed here | Next action |
+| --- | --- | --- | --- | --- | --- | --- | --- |
+| 1 | `bootstrap` | Bootstrap-aligned split CI | Shell-safe self-hosted for PR/main; hosted for Claude | `pr-fast-ci` + `extended-validation` + `CI Gate` | Green | Clean reference shape and closest thing to org policy template | Use as the source repo for reusable PR/security/release workflows |
+| 2 | `github-runner-fleet` | Bootstrap-aligned split CI with release specialization | Shell-safe self-hosted for shell-safe jobs; hosted for Docker/macOS contract and release image work | Split PR/main plus dedicated release workflow | Green | Best concrete runner compatibility discipline in the org | Extend its policy into explicit `rg-ci` / `rg-security` / `rg-release` governance |
+| 3 | `openclaw-ouro` | Mixed/custom but intentional | Dynamic routing between self-hosted private and hosted fallback depending on trust | PR fast gate + automerge + extended validation | Green | Strongest trust-aware custom routing; justified repo-specific shape | Preserve custom routing but move security scans into a central reusable security workflow |
+| 4 | `axiom` | Mixed/custom but intentional | Public shell-safe self-hosted for trusted repo work; hosted for matrix and Rust stage1 | Custom CI plus split PR/main templates | Yellow | Thoughtful hybrid design, but not yet standardized on org split-CI for the main workload | Normalize around the shared gate model and add central security scans |
+| 5 | `lattice` | Bootstrap-aligned split CI, macOS-heavy | Private shell-safe self-hosted plus dedicated self-hosted macOS/Xcode lanes | Split PR/main plus separate Swift CI | Yellow | Good direction, but CI shape is fragmented between template and repo-specific Swift workflow | Consolidate app-specific tests under the split template and isolate deeper macOS validation |
+| 6 | `mypersonalbanker` | Bootstrap-aligned split CI, mixed mobile/backend | Private shell-safe self-hosted plus self-hosted macOS/Xcode | Split PR/main plus broad custom CI | Yellow | Strong self-hosted adoption, but broad CI remains separate from template governance and has no explicit security/release boundary | Collapse backend/mobile coverage into the shared fast/deep model and add reusable security checks |
+| 7 | `Screensaver` | Bootstrap-aligned split CI | Public shell-safe self-hosted | Split PR/main only | Yellow | Template adoption is good, but public-repo self-hosted use still needs stricter trust proof and separate security/release policy | Keep PR gates cheap, add hosted security scans, and verify no privileged secrets/network are reachable from public runners |
+| 8 | `homenet` | Special-case release/docs repo | Hosted only | Single CI + digest/report/release workflows | Yellow | Safe from self-hosted trust issues, but no PR/main split and release/security concerns are mixed with repo-specific automation | Keep hosted execution, but split fast PR validation from release/reporting and add security scanning |
+| 9 | `Why-fi` | Legacy/single-workflow | Hosted macOS only | Single CI workflow | Yellow | Safe runner choice, but thin pipeline shape with no split gates, no nightly/deep lane, and no reusable governance hooks | Move to org split CI and add separate extended validation plus secrets/security checks |
+| 10 | `fix-your-life-app` | Legacy/single-workflow | Hosted macOS only | Single iOS test workflow | Yellow | Similar to `Why-fi`: low trust risk, weak pipeline shape | Add fast PR gate, extended validation, and shared security checks before scaling contributor volume |
+| 11 | `fireworks-game` | Workflow-light / unconfirmed | No local workflow evidence found | No confirmed CI in inspected checkout | Red | Active first-party app repo with no confirmed CI evidence in this pass | Confirm workflow absence, then add minimum PR gate and platform-specific validation immediately |
+| 12 | `home-tv-channel-list` | Workflow-light / connector-only | No `.github/workflows/ci.yml` found via contents API; no local clone inspected | No confirmed CI evidence in this pass | Red | Active public repo with no confirmed CI baseline from the evidence gathered | Verify workflow inventory directly and add hosted PR validation at minimum |
+| 13 | `omt-corner-cave` | Workflow-light / connector-only | No `.github/workflows/ci.yml` found via contents API; local checkout not wired to org remote in this pass | No confirmed CI evidence in this pass | Red | Infra repo without confirmed guardrails is a governance gap by default | Confirm actual workflow set and add hosted validation plus deployment/release separation |
+| 14 | `mac-cksum` | Workflow-light / connector-only | No `.github/workflows/ci.yml` found via contents API; no local clone inspected | No confirmed CI evidence in this pass | Red | Active first-party utility repo with no confirmed baseline | Add minimum hosted CI and secret/dependency checks |
+| 15 | `gh-attest` | Special-case release/security repo | No `.github/workflows/ci.yml` found via contents API | No confirmed CI evidence in this pass | Red | Security-sensitive reusable workflow repo should not be effectively ungoverned | Add hosted validation and release integrity checks before wider reuse |
+| 16 | `acme-aws` | Connector-only infra repo | No `.github/workflows/ci.yml` found via contents API | No confirmed CI evidence in this pass | Red | Infra repo with no confirmed CI is a direct governance miss | Confirm whether Terraform/security workflows exist; if absent, add them first |
+
+## Tier A Findings By Severity
+
+### Critical
+
+- No repo in scope currently demonstrates the full target-state split of `rg-ci`, `rg-security`, and `rg-release`. The org is still mostly operating with a two-way distinction: shell-safe self-hosted versus GitHub-hosted.
+- Several active repos still have no confirmed baseline CI from the evidence gathered: `fireworks-game`, `home-tv-channel-list`, `omt-corner-cave`, `mac-cksum`, `gh-attest`, and `acme-aws`. Those remain red until confirmed otherwise.
+- `Screensaver` is public and routes its split-CI template to `[self-hosted, synology, shell-only, public]`. That can be acceptable only if those runners have no privileged network or secrets exposure; otherwise the public trust boundary is weaker than it should be.
+
+### Important
+
+- `Why-fi` and `fix-your-life-app` are low-risk from a runner perspective because they stay on hosted macOS, but they are behind the org standard in pipeline shape. They need PR gates, deeper scheduled validation, and shared security checks.
+- `lattice` and `mypersonalbanker` both show useful self-hosted private-runner adoption, but each still has overlapping custom CI alongside the split template. That increases maintenance and weakens central policy enforcement.
+- `axiom` has a thoughtful custom hybrid model, including hosted fallback for fork PRs and hosted matrix coverage, but it still lives outside the common governance lane and does not yet consume a central security workflow.
+- `homenet` is operationally safe because it stays hosted, but it mixes CI, release packaging, digest reporting, and autopatch automation without a common fast/deep split or reusable security posture.
+
+### Cleanup
+
+- Template drift exists across otherwise aligned repos: action versions differ (`@v4`, `@v6`), path filters vary, and app-specific CI remains partially duplicated outside the shared gate shape.
+- `github-runner-fleet` already models correct compatibility rules for Docker-heavy and macOS contract jobs, but those lessons are not yet promoted into a reusable org-wide workflow contract.
+- Imported pipeline history remains present in `synology-api`, which is still on `jmcte/synology-api` remote locally and uses legacy Pages plus pre-commit workflows instead of org policy.
+
+## Evidence Notes For Yellow/Red Repos
+
+- `axiom`: custom `ci.yml` mixes trusted self-hosted public jobs and hosted matrix/stage1 jobs instead of using the shared split template; this is intentional but increases policy drift.
+- `lattice`: `ci.yml` runs a broad self-hosted macOS lane while `pr-fast-ci.yml` and `extended-validation.yml` also exist, so the repo has overlapping governance surfaces.
+- `mypersonalbanker`: custom `ci.yml` drives backend plus iOS/macOS builds on self-hosted runners while the split template also exists; no dedicated security workflow is present.
+- `Screensaver`: split template exists, but all fast and extended jobs still run on public self-hosted labels, so trust depends entirely on runner hardening rather than runner-group separation.
+- `homenet`: single hosted `ci.yml` plus digest/report/release workflows means no standard PR fast gate and no explicit deep-validation or security-only lane.
+- `Why-fi`: only a single hosted macOS workflow was found.
+- `fix-your-life-app`: only a single hosted iOS test workflow was found.
+- `fireworks-game`: no local workflow files were present in the inspected checkout.
+- `home-tv-channel-list`, `omt-corner-cave`, `mac-cksum`, `gh-attest`, `acme-aws`: `.github/workflows/ci.yml` was not found through the GitHub contents API, and code search indexing was unavailable, so these remain red pending direct workflow inventory confirmation.
+
+## Tier B: Excluded From Primary Ranking
+
+Archived or likely imported/mirrored repos should not drive the main rollout order:
+
+- Archived: `personal-knowledge-graph`, `acme-agents`, `company-os`, `demo-repository`
+- Likely imported/mirrored from description or local remote evidence: `acme-core` (explicit clone), `synology-api` (local remote points to `jmcte/synology-api`), `machete` (local remote points to `johnmteneyckjr/mac-setup`), `glacier-utilities` (local remote points to `johnmteneyckjr/glacier-utilities`)
+
+Recommendation: keep these visible for reference, but do not spend runner-policy migration effort on them until ownership and desired end-state are explicit.
+
+## Target-State Platform Standard
+
+### Runner Groups
+
+- `rg-ci`
+  - For fast PR gates, lint, unit tests, and shell-safe validation
+  - No deploy credentials
+  - No prod network reachability
+  - Public repos may use this only if the underlying runners are truly public-safe and ephemeral
+
+- `rg-security`
+  - For CodeQL, Semgrep, dependency review, OSV, SonarQube, and deep security checks
+  - Selected-workflow access only
+  - No deploy credentials
+  - No prod network reachability
+
+- `rg-release`
+  - For deploy/release only
+  - Selected repositories only
+  - Selected pinned workflows only
+  - OIDC only; no long-lived cloud keys in repo secrets
+
+### Workflow Contract
+
+- `pr-fast-ci`
+  - Cheap, deterministic, repo-compatible checks
+  - Required status check remains `CI Gate`
+  - No deploy or release steps
+
+- `security-pr`
+  - Reusable workflow
+  - Runs dependency review, Semgrep diff scan, OSV PR scan, and repo-appropriate CodeQL where warranted
+  - Hosted or isolated `rg-security`, never general-purpose persistent runners
+
+- `extended-validation`
+  - Runs on `main`, nightly, and manual dispatch
+  - Includes slower integration, smoke, release-readiness, and deep validation
+
+- `release`
+  - Separate workflow family
+  - Pinned reusable workflow reference
+  - `rg-release` only if self-hosted is required; otherwise hosted
+
+## Migration Waves
+
+### Wave 1: Stop the Red Gaps
+
+- Confirm workflow inventory for `fireworks-game`, `home-tv-channel-list`, `omt-corner-cave`, `mac-cksum`, `gh-attest`, and `acme-aws`.
+- Add a minimum hosted PR validation baseline where CI is absent.
+- For infra/security-sensitive repos, add Terraform or release-specific validation before any self-hosted expansion.
+
+### Wave 2: Normalize the Legacy Yellow Repos
+
+- Move `Why-fi` and `fix-your-life-app` onto the split PR/extended model.
+- Keep them hosted for now; do not force self-hosted use where hosted macOS is the safer operating point.
+- Add shared secret scanning and reusable security workflows.
+
+### Wave 3: Simplify the Mixed Custom Repos
+
+- For `lattice` and `mypersonalbanker`, reduce overlap between custom broad CI and the split template.
+- Keep repo-specific macOS/iOS lanes, but hang them off the common fast/deep contract instead of parallel governance.
+- For `axiom`, preserve the justified hosted matrix and stage1 coverage while moving security work into shared reusable workflows.
+
+### Wave 4: Add Real Trust Separation
+
+- Stand up `rg-security` and `rg-release`.
+- Restrict high-trust groups by selected workflows pinned to branch/tag/SHA.
+- Move security scanning out of the general CI lane and out of repo-by-repo bespoke YAML.
+
+### Wave 5: Harden Release and Public-Repo Policy
+
+- Generalize the `github-runner-fleet` release isolation pattern.
+- Re-evaluate `Screensaver` public self-hosted execution and either:
+  - prove those runners are public-safe and ephemeral, or
+  - move public PR execution to hosted runners and reserve self-hosted for trusted same-repo work only.
+
+## Honest Bottom Line
+
+The org already has a viable pattern in `bootstrap`, `github-runner-fleet`, and `openclaw-ouro`. That is enough to standardize from. The weak point is not technical feasibility; it is uneven adoption and incomplete trust separation.
+
+If you do only three things next, do these:
+
+1. Inventory and fix the red repos with missing or unconfirmed CI.
+2. Move `Why-fi`, `fix-your-life-app`, `lattice`, and `mypersonalbanker` onto one shared PR/deep-validation contract.
+3. Introduce real `rg-security` and `rg-release` boundaries with reusable workflows, because that is the actual blocker to using the runner fleet more aggressively without increasing risk.
diff --git a/docs/tier-a-rollout-ledger.md b/docs/tier-a-rollout-ledger.md
new file mode 100644
index 0000000..70780ca
--- /dev/null
+++ b/docs/tier-a-rollout-ledger.md
@@ -0,0 +1,21 @@
+# Tier A Rollout Ledger
+
+Use this ledger as the central coordination artifact for multi-repo rollout work.
+
+| Repo | Cluster | Current class | Target class | Exception needed | Status | Blocker | Required checks target | Runner target | Security workflow | Release workflow | Notes |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| bootstrap | A | Bootstrap-aligned split CI | Control-plane reference | No | In progress | Define reusable workflow contract | `CI Gate` | `rg-ci` + hosted | Add reusable caller examples | Add reusable release caller example | Source of truth for default contract |
+| github-runner-fleet | A | Bootstrap-aligned split CI with release specialization | Control-plane reference | No | In progress | Finalize runner-group policy docs | `CI Gate` | `rg-ci` + hosted release | Consume shared security contract selectively | Keep hosted release-image until `rg-release` is ready | Source of truth for runner classes |
+| lattice | B | Split CI plus bespoke Swift CI | Split CI with repo-specific app hooks | Yes | Planned | Overlapping CI surfaces | `CI Gate` | `rg-ci` + private macOS | Planned | Planned | macOS/Xcode exception |
+| mypersonalbanker | B | Split CI plus bespoke backend/mobile CI | Split CI with repo-specific app hooks | Yes | Planned | Overlapping CI surfaces | `CI Gate` | `rg-ci` + private macOS | Planned | Planned | mixed backend + Apple app |
+| Why-fi | C | Legacy hosted single workflow | Hosted split CI | Yes | In progress | Shared bootstrap security caller not published yet | `CI Gate` | hosted macOS | Pending control-plane publish | N/A | split CI added locally; keep hosted |
+| fix-your-life-app | C | Legacy hosted single workflow | Hosted split CI | Yes | Planned | No split gate yet | `CI Gate` | hosted macOS | Planned | N/A | hosted is safer default |
+| openclaw-ouro | D | Custom trust-aware routing | Custom exception on shared contract | Yes | Planned | Preserve repo-specific routing | `CI Gate` | custom private routing | Planned | Planned | reference exception repo |
+| axiom | D | Custom hybrid public/private CI | Hybrid exception on shared contract | Yes | Planned | Preserve matrix + stage1 flow | `CI Gate` or repo-specific gate | `rg-ci` + hosted matrix | Planned | N/A | language-specific exception |
+| homenet | D | Hosted special-case CI/release/docs | Hosted split CI with special automation lanes | Yes | Planned | CI and release/report mixed together | `CI Gate` | hosted | Planned | Planned | hosted special-case repo |
+| fireworks-game | E | No CI confirmed | Minimum viable hosted baseline | Yes | In progress | Shared bootstrap security caller not published yet | `CI Gate` | hosted macOS | Pending control-plane publish | N/A | local inventory confirmed; baseline CI added |
+| home-tv-channel-list | E | Unconfirmed | Minimum viable repo-specific baseline | Unknown | Needs inventory | Confirm workflow set | TBD | TBD | TBD | TBD | no confirmed CI in audit pass |
+| omt-corner-cave | E | Unconfirmed | Minimum viable infra baseline | Unknown | Needs inventory | Confirm workflow set | TBD | TBD | TBD | TBD | infra repo |
+| mac-cksum | E | Unconfirmed | Minimum viable repo-specific baseline | Unknown | Needs inventory | Confirm workflow set | TBD | TBD | TBD | TBD | utility repo |
+| gh-attest | E | Unconfirmed | Hosted release/security baseline | Unknown | Needs inventory | Confirm workflow set | TBD | TBD | TBD | TBD | security-sensitive reusable repo |
+| acme-aws | E | Unconfirmed | Hosted infra baseline | Unknown | Needs inventory | Confirm workflow set | TBD | TBD | TBD | TBD | Terraform/security-sensitive repo |
diff --git a/docs/tier-a-runner-policy.md b/docs/tier-a-runner-policy.md
new file mode 100644
index 0000000..fb96a74
--- /dev/null
+++ b/docs/tier-a-runner-policy.md
@@ -0,0 +1,43 @@
+# Tier A Runner Policy
+
+This document is the runner-boundary control plane for Tier A `OMT-Global` repos.
+
+## Runner Groups
+
+### `rg-ci`
+
+- Fast PR gates and deterministic shell-safe validation
+- No deploy credentials
+- No production network reachability
+- Public repos may use this only if the runner class is ephemeral and public-safe
+
+### `rg-security`
+
+- CodeQL, Semgrep, dependency review, OSV, SonarQube, and other security-only jobs
+- Selected repositories only
+- Selected workflows only
+- No deploy credentials
+- No production network reachability
+
+### `rg-release`
+
+- Publish, deploy, and release verification jobs only
+- Selected repositories only
+- Selected workflows pinned to a branch, tag, or SHA
+- OIDC preferred over long-lived secrets
+
+## Workload Rules
+
+- Synology shell-only runners are for shell-safe jobs only.
+- Linux Docker runners are for `container:`, service-container, Docker daemon, Buildx, and similar workloads.
+- Lume macOS runners are for native macOS/Xcode jobs.
+- GitHub-hosted remains the default for public untrusted PRs unless the self-hosted class is explicitly hardened for that trust level.
+
+## Tier A Defaults
+
+- `bootstrap`, `github-runner-fleet`, and other bootstrap-aligned repos should route:
+  - `pr-fast-ci` to `rg-ci`
+  - `security-pr` to hosted or `rg-security`
+  - `extended-validation` to repo-compatible runner classes
+  - `release` to hosted or `rg-release`
+- Repos with justified exceptions must record them explicitly rather than silently forking policy in workflow YAML.
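A minimal sketch of how the Tier A routing defaults in `docs/tier-a-runner-policy.md` could be encoded and unit-tested. It is hypothetical: `Job`, `route`, `needs_self_hosted`, and `rg_ci_public_safe` are illustrative names that do not exist in this repo, and whether a runner class really is ephemeral and public-safe is passed in as an assumption rather than verified. For public repos it follows the policy doc's stricter `rg-ci` rule.

```python
from dataclasses import dataclass

# Workflow families named in the Tier A policy; anything else is rejected.
WORKFLOW_FAMILIES = {"pr-fast-ci", "security-pr", "extended-validation", "release"}


@dataclass(frozen=True)
class Job:
    workflow: str                    # workflow family, e.g. "pr-fast-ci"
    public_repo: bool                # repository visibility
    needs_self_hosted: bool = False  # hypothetical flag: repo recorded a justified self-hosted need


def route(job: Job, rg_ci_public_safe: bool = False) -> str:
    """Return the runner target for a job under the Tier A defaults (sketch only)."""
    if job.workflow not in WORKFLOW_FAMILIES:
        raise ValueError(f"unknown workflow family: {job.workflow!r}")
    if job.workflow == "pr-fast-ci":
        # Public repos may use rg-ci only if that runner class is ephemeral
        # and public-safe; otherwise fall back to GitHub-hosted.
        if job.public_repo and not rg_ci_public_safe:
            return "github-hosted"
        return "rg-ci"
    if job.workflow == "security-pr":
        # Hosted or isolated rg-security, never general-purpose persistent runners.
        return "rg-security" if job.needs_self_hosted else "github-hosted"
    if job.workflow == "release":
        # rg-release only if self-hosted is required; otherwise hosted.
        return "rg-release" if job.needs_self_hosted else "github-hosted"
    # extended-validation goes to "repo-compatible runner classes", which the
    # policy leaves per-repo, so the sketch only marks it for lookup.
    return "repo-compatible"
```

For example, `route(Job("pr-fast-ci", public_repo=True))` returns `"github-hosted"` until the public runner class is proven safe, which lines up with the `Screensaver` concern in the audit.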
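The workload rules lend themselves to a lint over workflow files. This sketch assumes each workflow has already been parsed into a dict (for example with a YAML loader, which is out of scope here). `violations`, `needs_docker`, and `runs_on_labels` are hypothetical helpers. The Docker detection is a heuristic (`container:`, `services:`, `docker/*` actions such as Buildx, or a `docker ` command in a `run` step), not a complete check. Only the `synology`/`shell-only` labels come from the audit; the sketch does not guess the Linux Docker or Lume pool labels.

```python
from __future__ import annotations

from typing import Any

# Labels the audit shows on Synology shell-only runners (e.g. `Screensaver`).
SHELL_ONLY_LABELS = {"synology", "shell-only"}


def runs_on_labels(job: dict[str, Any]) -> set[str]:
    """Normalize `runs-on` (string, list, or {group, labels} mapping) to a label set."""
    runs_on = job.get("runs-on", [])
    if isinstance(runs_on, dict):
        runs_on = runs_on.get("labels", [])
    if isinstance(runs_on, str):
        return {runs_on}
    return set(runs_on)


def needs_docker(job: dict[str, Any]) -> bool:
    """Heuristic: does this job need a Docker-capable (Linux Docker) runner?"""
    if "container" in job or "services" in job:
        return True
    for step in job.get("steps", []):
        if step.get("uses", "").startswith("docker/"):  # e.g. Buildx actions
            return True
        if "docker " in step.get("run", ""):  # crude Docker CLI detection
            return True
    return False


def violations(workflow: dict[str, Any]) -> list[str]:
    """List jobs routed to Synology shell-only runners that need Docker."""
    problems = []
    for name, job in workflow.get("jobs", {}).items():
        labels = runs_on_labels(job)
        if SHELL_ONLY_LABELS <= labels and needs_docker(job):
            problems.append(f"{name}: needs Docker but runs on {sorted(labels)}")
    return problems
```

The check deliberately flags only the one mismatch the rules make unambiguous (Docker work on shell-only runners) instead of guessing which label the job should move to.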
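Because the rollout ledger is meant to be the central coordination artifact, a small parser can make status roll-ups scriptable. This is a hypothetical helper, not an existing tool in this repo. It assumes a single pipe table with no escaped `|` characters inside cells, which holds for the ledger as written.

```python
from __future__ import annotations

from collections import defaultdict


def parse_ledger(markdown: str) -> list[dict[str, str]]:
    """Parse the ledger's pipe table into one dict per repo row."""
    rows = [line.strip() for line in markdown.splitlines() if line.strip().startswith("|")]
    header = [cell.strip() for cell in rows[0].strip("|").split("|")]
    records = []
    for line in rows[2:]:  # skip the header row and the | --- | separator
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        records.append(dict(zip(header, cells)))
    return records


def repos_by_status(records: list[dict[str, str]]) -> dict[str, list[str]]:
    """Group repo names by the ledger's Status column, preserving row order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for record in records:
        groups[record["Status"]].append(record["Repo"])
    return dict(groups)
```

Pointed at the contents of `docs/tier-a-rollout-ledger.md`, `repos_by_status` would group the 15 rows under `In progress`, `Planned`, and `Needs inventory`.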