asabaylus · asabaylus · May 3, 2026 · May 3, 2026 · May 3, 2026 · May 11, 2026
diff --git a/.env.schema b/.env.schema
@@ -42,6 +42,9 @@ VISIBLE_WINS_AI_MODEL=
 AI_DISCREPANCY_ANALYSIS_MODEL=
 # @type=string
 AI_TECHNICAL_WINS_MODEL=
+# Override AI model for the Agent Maturity Assessment scorer (falls back to AI_MODEL).
+# @type=string
+MATURITY_AI_MODEL=
 
 # Custom OpenAI-compatible endpoint
 # @sensitive @type=url
@@ -130,6 +133,11 @@ TECHNICAL_WINS_AUDIENCE=
 # @type=string
 TEAMHERO_TUI_PATH=
 
+# Set to "1" when a GitHub MCP server is connected so `teamhero assess` chooses
+# Tier 2 evidence fidelity instead of falling back to git-only.
+# @type=enum(1,)
+TEAMHERO_GITHUB_MCP=
+
 # Override GitHub OAuth App client ID (development/testing only)
 # @type=string
 GITHUB_OAUTH_CLIENT_ID=
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -103,6 +103,29 @@ teamhero report --headless --foreground --flush-cache loc  # Force re-fetch LOC
 - Scan for leaked secrets: `npx varlock scan` (or `npx varlock scan --staged` in pre-commit)
 - The `.env.schema` is safe for AI tools — it contains types and descriptions but never secret values
 
+## Maturity Assessment (`teamhero assess`)
+
+- The 12-item rubric is **hardcoded** in `src/services/maturity/rubric.ts`.
+  `RUBRIC_VERSION` is part of the cache key — bump it when rubric text or
+  scoring math changes so audits don't surface stale results.
+- The 7 Phase-1 interview questions in `src/services/maturity/interview.ts`
+  are **verbatim from references/interview.md** — the wording is calibrated,
+  do not paraphrase. The skill rule "ask one question at a time, wait for
+  the answer" is enforced by the bidirectional JSON-lines protocol — don't
+  batch them.
+- Tier-3 (git-only) audits **must cap items 2, 3, 9, 11 at 0.5** even when the
+  AI awards 1.0. `ai-scorer.ts::applyTier3Caps` enforces this post-hoc.
+- `scripts/run-assess.ts` is bidirectional: stdin stays open after the initial
+  config line so the Go TUI can write `interview-answer` events.
+  `RunAssessServiceRunner` in `tui/assess_runner.go` keeps the stdin pipe
+  open intentionally — do not close it.
+- The `MaturityProvider`, `InterviewTransport`, `AuditStore` ports live in
+  `src/core/types.ts` (same rule as every other port). Concrete value types
+  live in `src/services/maturity/types.ts`.
+- `docs/maturity-skill-ref/` is a **reference copy** of the upstream skill
+  (extracted from the original zip) — kept for human readers. The canonical
+  rubric is the TS code, not these files.
+
 ## Landing Changes
 
 - Always use `/land` to commit, push, and open PRs — never do these steps manually.

diff --git a/README.md b/README.md
@@ -129,9 +129,108 @@ Or set `OPENAI_SERVICE_TIER=flex` in `~/.config/teamhero/.env`.
 
 ---
 
+## Run a maturity assessment
+
+Score an engineering organization (or a single repo) against the 12-criterion
+**Agent Maturity Assessment** — reproducible dev environments, integration
+cadence, testability, observability, design discipline, deep modules,
+repo-local agent context, sanctioned AI tooling, human review, evals,
+blast-radius controls, and judgment under AI augmentation.
+
+The audit produces a weighted percentage, a raw `/12` score, item-level
+evidence sentences, the top-3 fixes, and strengths to preserve. Output lands
+in the current directory as `teamhero-maturity-<scope>-<date>.md` plus a
+JSON sidecar with the full data.
+
+**Bands:** **Excellent** (90%+) · **Healthy** (75–89%) · **Functional but
+slow** (60–74%) · **Significant dysfunction** (40–59%) · **Triage** (<40%).
+
+### Interactive TUI
+
+```bash
+teamhero assess
+```
+
+The wizard asks for scope (local repo / GitHub org / both), then walks you
+through the 7 Phase-1 interview questions one at a time (AI tooling, hiring,
+DORA visibility, design discipline, evals, blast-radius red-teaming, adjacent
+repos). Each question has a small set of pre-written answer options plus a
+free-text "Other" choice; "I don't know" maps the linked criterion to `n/a`.
+
+### Headless / scripted
+
+```bash
+# Audit the current repo (no interview — uses CONFIG.md or "unknown")
+teamhero assess --headless --path .
+
+# Audit with pre-supplied interview answers
+teamhero assess --headless --path . \
+  --interview-answers ./answers.json
+
+# Org-wide audit
+teamhero assess --headless --target-org acme \
+  --interview-answers ./answers.json
+
+# Smoke test without an OpenAI call (placeholder scores)
+teamhero assess --headless --path . --dry-run
+```
+
+`answers.json` shape — keys map to question IDs, value is verbatim text or
+`"unknown"`:
+
+```json
+{
+  "q1": "Company-paid Claude with policy",
+  "q2": "AI allowed; interviewers trained",
+  "q3": "DORA via Grafana",
+  "q4": "Consistent ADR step before agent code",
+  "q5": "LLMs in dev loop, retro-tracked",
+  "q6": "unknown",
+  "q7": "No"
+}
+```
+
+### Useful flags
+
+| Flag | Purpose |
+|------|---------|
+| `--scope-mode {org\|local-repo\|both}` | Override scope (auto-inferred from other flags) |
+| `--evidence-tier {auto\|gh\|github-mcp\|git-only}` | Pin the evidence tier; default auto-detects |
+| `--audit-output <path>` | Override the markdown output path |
+| `--audit-output-format {markdown\|json\|both}` | Default: `both` |
+| `--dry-run` | Skip the AI scorer; emit a placeholder audit |
+| `--show-assess-config` | Print saved configuration as JSON and exit |
+
+Run `teamhero assess --help` for the full list, or read the [**full maturity assessment reference**](docs/MATURITY_ASSESSMENT.md) for everything — every flag, all 7 interview questions, evidence tiers, output format, troubleshooting.
+
+### How the score is built
+
+1. **Preflight** — auto-detects evidence tier (`gh` CLI authed → Tier 1,
+   GitHub MCP available → Tier 2, otherwise → Tier 3 git+filesystem only).
+2. **Adjacent repos** — scans the local repo for workflow `uses:`, Terraform
+   module sources, submodules, and README cross-refs to find sibling repos
+   that should be in scope.
+3. **Interview** — captures the 7 Phase-1 answers (interactively, from
+   `--interview-answers`, or from `docs/audits/CONFIG.md` if it exists in
+   the repo). Persists the confirmed answers back to `CONFIG.md` after
+   every successful run so re-audits can confirm-or-refresh.
+4. **Evidence** — 12 deterministic detectors run against the local repo
+   (test files, CI workflows, dependency manifests, ADRs, agent context
+   files, CODEOWNERS, OIDC vs. long-lived secrets, Terraform IaC, etc.).
+5. **AI scoring** — OpenAI Responses API with a strict JSON schema returns
+   per-item scores, ≤25-word evidence sentences, top-3 fixes, and
+   strengths. Tier-3 audits cap items 2/3/9/11 at 0.5 because the
+   GitHub-side evidence isn't observable.
+6. **Output** — markdown rendered against the canonical template +
+   matching `.json` with the full artifact (rubric version, evidence
+   facts, category subtotals).
+
+---
+
 ## Learn more
 
 - [Configuration Reference](docs/CONFIG_FORMAT.md) — all settings, credentials, and user identity mapping
+- [Maturity Assessment Reference](docs/MATURITY_ASSESSMENT.md) — full `teamhero assess` docs: rubric, interview, tiers, output, troubleshooting
 - [Architecture Overview](docs/ARCHITECTURE.md) — how the system works under the hood
 
 ---
@@ -153,6 +252,7 @@ just                  # List all available recipes
 | `just test-all` | Run all tests (TypeScript + Go) |
 | `just lint` | Format and lint (Biome) |
 | `just report` | Run a report |
+| `just assess` | Run a maturity assessment |
 | `just reset` | Clean all build artifacts |
 
 ### Secure credential setup with varlock
@@ -176,6 +276,16 @@ claude plugin install teamhero-scripts@teamhero
 
 In Cowork, the plugin uses MCP connectors for GitHub and Asana (OAuth-based, no API tokens to manage).
 
+### Shareable maturity-assessment skill
+
+The Agent Maturity Assessment is also packaged as a **standalone, harness-agnostic skill** you can drop into any Claude environment (Claude Code, Cowork, Workbench, custom SDK harness) without installing TeamHero itself:
+
+```
+share/skills/agent-maturity-assessment/   ← copy this folder to ~/.claude/skills/
+```
+
+See [`share/skills/agent-maturity-assessment/INSTALL.md`](share/skills/agent-maturity-assessment/INSTALL.md) for installation steps for each harness. The skill works in pure-Claude mode by default and uses the `teamhero` binary as an optional accelerator when it's installed.
+
 ### Further reading
 
 - [Distribution & Release Process](docs/DISTRIBUTION.md)

diff --git a/claude-plugin/skills/agent-maturity-assessment/SKILL.md b/claude-plugin/skills/agent-maturity-assessment/SKILL.md
@@ -0,0 +1,145 @@
+---
+name: agent-maturity-assessment
+description: Run the Agent Maturity Assessment via Team Hero — a 12-criterion diagnostic for engineering organization readiness in the AI-agentic coding era. Items score 0/0.5/1 across four weighted categories (engineering basics 1.0×, knowledge & context 1.5×, AI governance & quality 1.25×, hiring 1.0×), producing a weighted percentage and a raw /12. Use whenever the user wants to audit, diagnose, or score an engineering organization, team, repo, or recently acquired company for AI readiness. Trigger on phrases like "agent maturity", "agent readiness", "AI maturity", "engineering org health", "engineering maturity", "audit the team", "score this repo", "diagnose dev experience", "is this team ready for AI", "is this team modern", "how healthy is this org", or any onboarding-era assessment. Produces a scored audit with item-level evidence, category subtotals, weighted overall score, top fixes, and strengths to preserve.
+---
+
+# Run an Agent Maturity Assessment via Team Hero
+
+Team Hero ships a first-class implementation of the Agent Maturity Assessment.
+It scores a 12-criterion diagnostic across four weighted categories using a
+hybrid pipeline: deterministic detectors gather evidence from the local repo /
+GitHub / Asana, a Phase-1 interview captures the org-level signals that aren't
+visible in code, and an AI scorer (OpenAI Responses API + strict JSON schema)
+produces the final scores, evidence sentences (≤25 words each), top-3 fixes,
+and strengths to preserve.
+
+## Detect runtime mode
+
+1. **Binary mode (preferred when available)** — If `teamhero` (or `teamhero-tui`)
+   is installed and the user has `OPENAI_API_KEY` configured, use Path A.
+2. **Pure-Claude fallback** — Otherwise fall back to the standalone
+   `anthropic-skills:agent-maturity-assessment` skill if available, or to
+   running the rubric manually using the references below.
+
+```bash
+teamhero --version 2>/dev/null || teamhero-tui --version 2>/dev/null
+```
+
+## Path A — Team Hero binary mode
+
+### Step 1: Ensure credentials
+
+```bash
+teamhero doctor                    # confirms ~/.config/teamhero/.env is healthy
+```
+
+If `OPENAI_API_KEY` is missing, ask the user to run `teamhero setup` (or write
+the key into `~/.config/teamhero/.env`).
+
+### Step 2: Pick the scope
+
+Ask the user one of:
+- A **local repo path** they want audited (default: `cwd`).
+- A **GitHub org** name — for an org-wide audit.
+- **Both** — when the user wants to assess an org and a representative checkout.
+
+### Step 3: Run the assessment
+
+Headless invocation (preferred when running on behalf of the user):
+
+```bash
+# Local repo audit, no interview, dry-run for a quick smoke test
+teamhero assess --headless --path . --dry-run
+
+# Real audit against a local repo with interview answers in a JSON file
+teamhero assess --headless --path . \
+  --interview-answers /path/to/answers.json \
+  --audit-output ./audit.md
+
+# Org-wide audit
+teamhero assess --headless --target-org acme \
+  --interview-answers /path/to/answers.json
+```
+
+Interactive (the user walks through scope + the 7 Phase-1 questions one at a
+time in the TUI):
+
+```bash
+teamhero assess
+```
+
+### Step 4: Surface the result
+
+The runner emits two files:
+- `<audit-output>.md` — full audit using the canonical template (per-category
+  tables, summary, maturity-scale row marker, top-3 fixes, strengths,
+  adjacent repos consulted, notes for re-audit).
+- `<audit-output>.json` — full data including item scores, evidence facts,
+  rubric version, and tier.
+
+Read the markdown back to the user — do not just say "done." Highlight the
+band (Excellent / Healthy / Functional but slow / Significant dysfunction /
+Triage), the weighted percentage, and the top-3 fixes.
+
+### Notes on the interview
+
+Phase-1 has 7 questions about org-level facts the repo can't answer (AI
+tooling, hiring, DORA visibility, design discipline, evals, blast-radius
+red-teaming, adjacent repos). The skill's invariant is **one question at a
+time** — do not pre-answer or batch them. In `--headless` mode, supply
+`--interview-answers <file.json>` with shape:
+
+```json
+{
+  "q1": "Company-paid Claude with policy",
+  "q2": "AI allowed in interviews, interviewers trained",
+  "q3": "DORA tracked via Grafana",
+  "q4": "Consistent ADR step before agent code",
+  "q5": "LLMs in dev loop, tracked in retro metrics",
+  "q6": "Worst-case red-teamed, rollbacks documented",
+  "q7": "unknown"
+}
+```
+
+Use `"unknown"` (or `"I don't know"`) to mark a question as unanswered — the
+linked criterion will be scored `n/a` and excluded from numerator and max.
+
+### Tier behavior
+
+The runner auto-detects the evidence tier (`gh` CLI authenticated → Tier 1,
+GitHub MCP available → Tier 2, git+filesystem only → Tier 3). At Tier 3,
+items 2, 3, 9, and 11 are capped at 0.5 because GitHub-side evidence is
+needed to award 1.0 confidently.
+
+Override with `--evidence-tier {auto|gh|github-mcp|git-only}` when needed.
+
+## Path B — Pure-Claude fallback
+
+If the binary is not available, defer to the standalone skill bundle (e.g.,
+`anthropic-skills:agent-maturity-assessment`) which contains the same rubric,
+interview, output template, and preflight references but runs entirely from
+within Claude.
+
+## Reference: the rubric
+
+The Team Hero implementation hardcodes the rubric at
+`src/services/maturity/rubric.ts` (RUBRIC_VERSION export). The 12 items map
+to 4 categories:
+
+| # | Item | Category | Weight |
+|---|------|----------|--------|
+| 1 | Reproducible dev environments | A. Engineering basics | 1.0× |
+| 2 | Sub-day integration cadence with measured outcomes | A. Engineering basics | 1.0× |
+| 3 | Testability and the agent inner loop | A. Engineering basics | 1.0× |
+| 4 | Observability before features | A. Engineering basics | 1.0× |
+| 5 | Design discipline as a first-class practice | B. Knowledge & context | 1.5× |
+| 6 | Codebase composed of deep modules | B. Knowledge & context | 1.5× |
+| 7 | Repo-local agent context | B. Knowledge & context | 1.5× |
+| 8 | Sanctioned, governed AI tooling | C. AI governance & quality | 1.25× |
+| 9 | Human review on every PR | C. AI governance & quality | 1.25× |
+| 10 | Evals for AI-touched code paths | C. AI governance & quality | 1.25× |
+| 11 | Blast-radius controls for agent actions | C. AI governance & quality | 1.25× |
+| 12 | Interviews assess judgment under AI augmentation | D. Hiring | 1.0× |
+
+Maximum weighted score: 14.5. Bands: 90%+ Excellent · 75–89% Healthy ·
+60–74% Functional but slow · 40–59% Significant dysfunction · <40% Triage.