Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions .env.schema
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,9 @@ VISIBLE_WINS_AI_MODEL=
AI_DISCREPANCY_ANALYSIS_MODEL=
# @type=string
AI_TECHNICAL_WINS_MODEL=
# Override AI model for the Agent Maturity Assessment scorer (falls back to AI_MODEL).
# @type=string
MATURITY_AI_MODEL=

# Custom OpenAI-compatible endpoint
# @sensitive @type=url
Expand Down Expand Up @@ -130,6 +133,11 @@ TECHNICAL_WINS_AUDIENCE=
# @type=string
TEAMHERO_TUI_PATH=

# Set to "1" when a GitHub MCP server is connected so `teamhero assess` chooses
# Tier 2 evidence fidelity instead of falling back to git-only.
# @type=enum(1,)
TEAMHERO_GITHUB_MCP=

# Override GitHub OAuth App client ID (development/testing only)
# @type=string
GITHUB_OAUTH_CLIENT_ID=
23 changes: 23 additions & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -103,6 +103,29 @@ teamhero report --headless --foreground --flush-cache loc # Force re-fetch LOC
- Scan for leaked secrets: `npx varlock scan` (or `npx varlock scan --staged` in pre-commit)
- The `.env.schema` is safe for AI tools — it contains types and descriptions but never secret values

## Maturity Assessment (`teamhero assess`)

- The 12-item rubric is **hardcoded** in `src/services/maturity/rubric.ts`.
`RUBRIC_VERSION` is part of the cache key — bump it when rubric text or
scoring math changes so audits don't surface stale results.
- The 7 Phase-1 interview questions in `src/services/maturity/interview.ts`
are **verbatim from references/interview.md** — the wording is calibrated,
do not paraphrase. The skill rule "ask one question at a time, wait for
the answer" is enforced by the bidirectional JSON-lines protocol — don't
batch them.
- Tier-3 (git-only) audits **must cap items 2, 3, 9, 11 at 0.5** even when the
AI awards 1.0. `ai-scorer.ts::applyTier3Caps` enforces this post-hoc.
- `scripts/run-assess.ts` is bidirectional: stdin stays open after the initial
config line so the Go TUI can write `interview-answer` events.
`RunAssessServiceRunner` in `tui/assess_runner.go` keeps the stdin pipe
open intentionally — do not close it.
- The `MaturityProvider`, `InterviewTransport`, `AuditStore` ports live in
`src/core/types.ts` (same rule as every other port). Concrete value types
live in `src/services/maturity/types.ts`.
- `docs/maturity-skill-ref/` is a **reference copy** of the upstream skill
(extracted from the original zip) — kept for human readers. The canonical
rubric is the TS code, not these files.

## Landing Changes

- Always use `/land` to commit, push, and open PRs — never do these steps manually.
Expand Down
110 changes: 110 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -129,9 +129,108 @@ Or set `OPENAI_SERVICE_TIER=flex` in `~/.config/teamhero/.env`.

---

## Run a maturity assessment

Score an engineering organization (or a single repo) against the 12-criterion
**Agent Maturity Assessment** — reproducible dev environments, integration
cadence, testability, observability, design discipline, deep modules,
repo-local agent context, sanctioned AI tooling, human review, evals,
blast-radius controls, and judgment under AI augmentation.

The audit produces a weighted percentage, a raw `/12` score, item-level
evidence sentences, the top-3 fixes, and strengths to preserve. Output lands
in the current directory as `teamhero-maturity-<scope>-<date>.md` plus a
JSON sidecar with the full data.

**Bands:** **Excellent** (90%+) · **Healthy** (75–89%) · **Functional but
slow** (60–74%) · **Significant dysfunction** (40–59%) · **Triage** (<40%).

### Interactive TUI

```bash
teamhero assess
```

The wizard asks for scope (local repo / GitHub org / both), then walks you
through the 7 Phase-1 interview questions one at a time (AI tooling, hiring,
DORA visibility, design discipline, evals, blast-radius red-teaming, adjacent
repos). Each question has a small set of pre-written answer options plus a
free-text "Other" choice; "I don't know" maps the linked criterion to `n/a`.

### Headless / scripted

```bash
# Audit the current repo (no interview — uses CONFIG.md or "unknown")
teamhero assess --headless --path .

# Audit with pre-supplied interview answers
teamhero assess --headless --path . \
--interview-answers ./answers.json

# Org-wide audit
teamhero assess --headless --target-org acme \
--interview-answers ./answers.json

# Smoke test without an OpenAI call (placeholder scores)
teamhero assess --headless --path . --dry-run
```

`answers.json` shape — keys map to question IDs, value is verbatim text or
`"unknown"`:

```json
{
"q1": "Company-paid Claude with policy",
"q2": "AI allowed; interviewers trained",
"q3": "DORA via Grafana",
"q4": "Consistent ADR step before agent code",
"q5": "LLMs in dev loop, retro-tracked",
"q6": "unknown",
"q7": "No"
}
```

### Useful flags

| Flag | Purpose |
|------|---------|
| `--scope-mode {org\|local-repo\|both}` | Override scope (auto-inferred from other flags) |
| `--evidence-tier {auto\|gh\|github-mcp\|git-only}` | Pin the evidence tier; default auto-detects |
| `--audit-output <path>` | Override the markdown output path |
| `--audit-output-format {markdown\|json\|both}` | Default: `both` |
| `--dry-run` | Skip the AI scorer; emit a placeholder audit |
| `--show-assess-config` | Print saved configuration as JSON and exit |

Run `teamhero assess --help` for the full list, or read the [**full maturity assessment reference**](docs/MATURITY_ASSESSMENT.md) for everything — every flag, all 7 interview questions, evidence tiers, output format, troubleshooting.

### How the score is built

1. **Preflight** — auto-detects evidence tier (`gh` CLI authed → Tier 1,
GitHub MCP available → Tier 2, otherwise → Tier 3 git+filesystem only).
2. **Adjacent repos** — scans the local repo for workflow `uses:`, Terraform
module sources, submodules, and README cross-refs to find sibling repos
that should be in scope.
3. **Interview** — captures the 7 Phase-1 answers (interactively, from
`--interview-answers`, or from `docs/audits/CONFIG.md` if it exists in
the repo). Persists the confirmed answers back to `CONFIG.md` after
every successful run so re-audits can confirm-or-refresh.
4. **Evidence** — 12 deterministic detectors run against the local repo
(test files, CI workflows, dependency manifests, ADRs, agent context
files, CODEOWNERS, OIDC vs. long-lived secrets, Terraform IaC, etc.).
5. **AI scoring** — OpenAI Responses API with a strict JSON schema returns
per-item scores, ≤25-word evidence sentences, top-3 fixes, and
strengths. Tier-3 audits cap items 2/3/9/11 at 0.5 because the
GitHub-side evidence isn't observable.
6. **Output** — markdown rendered against the canonical template +
matching `.json` with the full artifact (rubric version, evidence
facts, category subtotals).

---

## Learn more

- [Configuration Reference](docs/CONFIG_FORMAT.md) — all settings, credentials, and user identity mapping
- [Maturity Assessment Reference](docs/MATURITY_ASSESSMENT.md) — full `teamhero assess` docs: rubric, interview, tiers, output, troubleshooting
- [Architecture Overview](docs/ARCHITECTURE.md) — how the system works under the hood

---
Expand All @@ -153,6 +252,7 @@ just # List all available recipes
| `just test-all` | Run all tests (TypeScript + Go) |
| `just lint` | Format and lint (Biome) |
| `just report` | Run a report |
| `just assess` | Run a maturity assessment |
| `just reset` | Clean all build artifacts |

### Secure credential setup with varlock
Expand All @@ -176,6 +276,16 @@ claude plugin install teamhero-scripts@teamhero

In Cowork, the plugin uses MCP connectors for GitHub and Asana (OAuth-based, no API tokens to manage).

### Shareable maturity-assessment skill

The Agent Maturity Assessment is also packaged as a **standalone, harness-agnostic skill** you can drop into any Claude environment (Claude Code, Cowork, Workbench, custom SDK harness) without installing TeamHero itself:

```
share/skills/agent-maturity-assessment/ ← copy this folder to ~/.claude/skills/
```

See [`share/skills/agent-maturity-assessment/INSTALL.md`](share/skills/agent-maturity-assessment/INSTALL.md) for installation steps for each harness. The skill works in pure-Claude mode by default and uses the `teamhero` binary as an optional accelerator when it's installed.

### Further reading

- [Distribution & Release Process](docs/DISTRIBUTION.md)
Expand Down
145 changes: 145 additions & 0 deletions claude-plugin/skills/agent-maturity-assessment/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,145 @@
---
name: agent-maturity-assessment
description: Run the Agent Maturity Assessment via Team Hero — a 12-criterion diagnostic for engineering organization readiness in the AI-agentic coding era. Items score 0/0.5/1 across four weighted categories (engineering basics 1.0×, knowledge & context 1.5×, AI governance & quality 1.25×, hiring 1.0×), producing a weighted percentage and a raw /12. Use whenever the user wants to audit, diagnose, or score an engineering organization, team, repo, or recently acquired company for AI readiness. Trigger on phrases like "agent maturity", "agent readiness", "AI maturity", "engineering org health", "engineering maturity", "audit the team", "score this repo", "diagnose dev experience", "is this team ready for AI", "is this team modern", "how healthy is this org", or any onboarding-era assessment. Produces a scored audit with item-level evidence, category subtotals, weighted overall score, top fixes, and strengths to preserve.
---

# Run an Agent Maturity Assessment via Team Hero

Team Hero ships a first-class implementation of the Agent Maturity Assessment.
It scores a 12-criterion diagnostic across four weighted categories using a
hybrid pipeline: deterministic detectors gather evidence from the local repo /
GitHub / Asana, a Phase-1 interview captures the org-level signals that aren't
visible in code, and an AI scorer (OpenAI Responses API + strict JSON schema)
produces the final scores, evidence sentences (≤25 words each), top-3 fixes,
and strengths to preserve.

## Detect runtime mode

1. **Binary mode (preferred when available)** — If `teamhero` (or `teamhero-tui`)
is installed and the user has `OPENAI_API_KEY` configured, use Path A.
2. **Pure-Claude fallback** — Otherwise fall back to the standalone
`anthropic-skills:agent-maturity-assessment` skill if available, or to
running the rubric manually using the references below.

```bash
teamhero --version 2>/dev/null || teamhero-tui --version 2>/dev/null
```

## Path A — Team Hero binary mode

### Step 1: Ensure credentials

```bash
teamhero doctor # confirms ~/.config/teamhero/.env is healthy
```

If `OPENAI_API_KEY` is missing, ask the user to run `teamhero setup` (or write
the key into `~/.config/teamhero/.env`).

### Step 2: Pick the scope

Ask the user one of:
- A **local repo path** they want audited (default: `cwd`).
- A **GitHub org** name — for an org-wide audit.
- **Both** — when the user wants to assess an org and a representative checkout.

### Step 3: Run the assessment

Headless invocation (preferred when running on behalf of the user):

```bash
# Local repo audit, no interview, dry-run for a quick smoke test
teamhero assess --headless --path . --dry-run

# Real audit against a local repo with interview answers in a JSON file
teamhero assess --headless --path . \
--interview-answers /path/to/answers.json \
--audit-output ./audit.md

# Org-wide audit
teamhero assess --headless --target-org acme \
--interview-answers /path/to/answers.json
```

Interactive (the user walks through scope + the 7 Phase-1 questions one at a
time in the TUI):

```bash
teamhero assess
```

### Step 4: Surface the result

The runner emits two files:
- `<audit-output>.md` — full audit using the canonical template (per-category
tables, summary, maturity-scale row marker, top-3 fixes, strengths,
adjacent repos consulted, notes for re-audit).
- `<audit-output>.json` — full data including item scores, evidence facts,
rubric version, and tier.

Read the markdown back to the user — do not just say "done." Highlight the
band (Excellent / Healthy / Functional but slow / Significant dysfunction /
Triage), the weighted percentage, and the top-3 fixes.

### Notes on the interview

Phase-1 has 7 questions about org-level facts the repo can't answer (AI
tooling, hiring, DORA visibility, design discipline, evals, blast-radius
red-teaming, adjacent repos). The skill's invariant is **one question at a
time** — do not pre-answer or batch them. In `--headless` mode, supply
`--interview-answers <file.json>` with shape:

```json
{
"q1": "Company-paid Claude with policy",
"q2": "AI allowed in interviews, interviewers trained",
"q3": "DORA tracked via Grafana",
"q4": "Consistent ADR step before agent code",
"q5": "LLMs in dev loop, tracked in retro metrics",
"q6": "Worst-case red-teamed, rollbacks documented",
"q7": "unknown"
}
```

Use `"unknown"` (or `"I don't know"`) to mark a question as unanswered — the
linked criterion will be scored `n/a` and excluded from numerator and max.

### Tier behavior

The runner auto-detects the evidence tier (`gh` CLI authenticated → Tier 1,
GitHub MCP available → Tier 2, git+filesystem only → Tier 3). At Tier 3,
items 2, 3, 9, and 11 are capped at 0.5 because GitHub-side evidence is
needed to award 1.0 confidently.

Override with `--evidence-tier {auto|gh|github-mcp|git-only}` when needed.

## Path B — Pure-Claude fallback

If the binary is not available, defer to the standalone skill bundle (e.g.,
`anthropic-skills:agent-maturity-assessment`) which contains the same rubric,
interview, output template, and preflight references but runs entirely from
within Claude.

## Reference: the rubric

The Team Hero implementation hardcodes the rubric at
`src/services/maturity/rubric.ts` (RUBRIC_VERSION export). The 12 items map
to 4 categories:

| # | Item | Category | Weight |
|---|------|----------|--------|
| 1 | Reproducible dev environments | A. Engineering basics | 1.0× |
| 2 | Sub-day integration cadence with measured outcomes | A. Engineering basics | 1.0× |
| 3 | Testability and the agent inner loop | A. Engineering basics | 1.0× |
| 4 | Observability before features | A. Engineering basics | 1.0× |
| 5 | Design discipline as a first-class practice | B. Knowledge & context | 1.5× |
| 6 | Codebase composed of deep modules | B. Knowledge & context | 1.5× |
| 7 | Repo-local agent context | B. Knowledge & context | 1.5× |
| 8 | Sanctioned, governed AI tooling | C. AI governance & quality | 1.25× |
| 9 | Human review on every PR | C. AI governance & quality | 1.25× |
| 10 | Evals for AI-touched code paths | C. AI governance & quality | 1.25× |
| 11 | Blast-radius controls for agent actions | C. AI governance & quality | 1.25× |
| 12 | Interviews assess judgment under AI augmentation | D. Hiring | 1.0× |

Maximum weighted score: 14.5. Bands: 90%+ Excellent · 75–89% Healthy ·
60–74% Functional but slow · 40–59% Significant dysfunction · <40% Triage.
Loading
Loading