77 changes: 77 additions & 0 deletions diagnostics/bias-scan.md
@@ -0,0 +1,77 @@
# Bias Scan Diagnostic

Load when motivated reasoning, deadline pressure, unusual confidence, political pressure, or high-stakes downside could distort the analysis.

---

## Purpose

AZIMUTH already has inline circuit-breakers for sycophancy, availability, domain calibration, and verdict softening. This diagnostic centralizes the broader bias pass so the core skill does not bloat while still catching common decision distortions.

Run the scan after the initial risk register exists. Bias scanning before risk generation can prematurely narrow the search space.

---

## Triggers

Load in STANDARD when any of these fire:

- user is unusually certain but evidence is thin
- user asks for a fast check on a high-stakes decision
- Module 4 returns YELLOW or RED
- plan contains sunk cost, public commitment, deadline pressure, or authority pressure
- all risks look generic or conveniently manageable
- proposed verdict is PROCEED or PROCEED WITH SAFEGUARDS despite unsupported critical assumptions

Load unconditionally in DEEP.

---

## Fast Bias Pass

For each triggered bias, answer: **signal, counter-question, effect on register**.

| Bias | Signal | Counter-question | Register effect |
|---|---|---|---|
| Sycophancy | User confidence appears mirrored by the model | Would this classification be the same if a stranger proposed it? | Re-check strongest user claim as first UNSUPPORTED candidate |
| Optimism / Planning fallacy | Timeline assumes clean path | What does the outside view say for similar work? | Raise likelihood or add evidence gate |
| Anchoring | Estimate inherits first number mentioned | What estimate would emerge from bottom-up constraints? | Re-score timing risks |
| Availability | Risks mirror generic examples | Which failure trigger is specific to this plan? | Discard generic chains without anchors |
| Confirmation | Only supportive evidence is cited | What would change the verdict? | Add falsifier or DELAY gate |
| Sunk cost | Past spend affects forward decision | Would this be approved fresh today? | Raise governance / reversibility severity |
| Commitment consistency | Public prior statement narrows options | Is consistency being valued over correctness? | Add elephant or incentive entry |
| Authority bias | Senior sponsor/vendor/board pressure dominates | Does their authority apply to this domain? | Escalate Module 4 conflict if needed |
| Social proof / bandwagon | Others doing it is treated as evidence | Do those others share our constraints and failure costs? | Require reference-class fit |
| FOMO | Urgency based on scarce opportunity | What is the cost of waiting, and will similar chances recur? | Test RAPID vs artificial urgency |
| Loss aversion | Avoiding loss dominates upside/downside balance | What opportunity cost is created by avoiding loss? | Add opportunity-cost lens if material |
| Status quo bias | Current path preserved without analysis | Is doing nothing being evaluated as an active option? | Add baseline scenario |
| Shiny object | Novelty substitutes for fit | Is this better or just newer? | Add alternative-path comparison |
| Survivorship bias | Examples include only winners | What failed examples are missing? | Load base rates / failure references |
| Endowment effect | Existing asset/team/idea is overvalued | Would we buy/build this today from scratch? | Re-check cost/benefit assumptions |

---

## Output Shape

Only include biases that changed the analysis.

```markdown
## Bias Scan
- [Bias]: [signal]. Counter-question: [question]. Register impact: [risk score / evidence class / verdict changed how].
```

If no bias changed the register, omit the section and state internally: bias scan completed, no output-relevant change.
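
The omit-when-empty rule can be sketched in Python. This is a minimal illustration, not part of AZIMUTH: `BiasFinding` and `render_bias_scan` are hypothetical names, and an empty `register_impact` is assumed to mean "this bias changed nothing".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BiasFinding:
    bias: str
    signal: str
    counter_question: str
    register_impact: str  # assumption: empty string means no change to the register


def render_bias_scan(findings: List[BiasFinding]) -> str:
    """Return the Bias Scan section, or "" when no bias changed the register."""
    changed = [f for f in findings if f.register_impact.strip()]
    if not changed:
        return ""  # section omitted entirely
    lines = ["## Bias Scan"]
    for f in changed:
        lines.append(
            f"- {f.bias}: {f.signal}. "
            f"Counter-question: {f.counter_question} "
            f"Register impact: {f.register_impact}."
        )
    return "\n".join(lines)
```

Filtering before rendering keeps the section from listing every bias that was merely checked.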

---

## Verdict Hooks

- If bias scan changes the top assumption to UNSUPPORTED, Module 10 confidence ceiling applies.
- If bias scan reveals artificial urgency, RAPID may still run, but output must state the urgency appears incentive-driven.
- If bias scan reveals authority, social proof, or sunk-cost pressure, add or escalate an Elephant / incentive conflict entry.

---

## Provenance

Synthesizes AZIMUTH's existing circuit-breakers with decision-bias counter-question patterns from decision-toolkit-style references. No external text copied.
119 changes: 119 additions & 0 deletions diagnostics/likelihood-impact-matrix.md
@@ -0,0 +1,119 @@
# Likelihood / Impact Matrix Diagnostic

Load when STANDARD, DEEP, or any advanced lens produces a non-trivial risk register. This diagnostic quantifies risk without claiming precision the evidence does not support.

---

## Purpose

AZIMUTH already classifies evidence quality and structural fragility. This file adds a small quantitative layer so the output can distinguish:

- highly likely moderate failures from rare catastrophic failures
- scary but unsupported speculation from evidence-backed launch blockers
- mitigations that change risk from mitigations that merely sound responsible

The score is a triage aid, not a verdict by itself. Verdicts still come from the full register, incentive tier, reversibility, and evidence gates.

---

## When to Run

Run automatically in:

- **STANDARD** when the register has 3+ material risks, any potential PILOT FIRST verdict, or any potential PROCEED / PROCEED WITH SAFEGUARDS verdict
- **DEEP** whenever Critical Risks are produced
- **RESIDUAL-RISK-REGISTER** outputs when ranking the 3-5 remaining controllable risks
- **MULTI-LENS** or **FOOL** drill-downs when they add risks back into the main register

Do not run in FAST unless the user explicitly asks for scoring. FAST should stay low-friction.

---

## Scale

Use a 5x5 matrix unless the user provides a native risk scale.

### Likelihood (L)

| Score | Probability anchor | Use when |
|---|---:|---|
| 1 | <5% | Possible, but no direct evidence and weak fit to base rates |
| 2 | 5-20% | Plausible but not the modal path |
| 3 | 20-50% | Credible risk with partial evidence or relevant base-rate support |
| 4 | 50-80% | More likely than not without mitigation |
| 5 | >80% | Already happening, structurally forced, or strongly supported by base rates |

### Impact (I)

| Score | Severity anchor | Use when |
|---|---|---|
| 1 | Minor | Local rework; no decision-level impact |
| 2 | Moderate | Schedule slip or bounded cost; recoverable inside current scope |
| 3 | Major | Meaningful delay, reputation hit, missed launch, or stakeholder escalation |
| 4 | Severe | Budget/headcount/customer/contract damage; reversal costly |
| 5 | Existential / decision-killing | Objective fails, public trust loss, irreversible commitment, or strategic dead end |

---

## Base-Rate Adjustment

Before finalizing L, compare the inside-view estimate to the closest usable reference class.

1. Identify the closest reference class from `references/base-rates.md` or domain references.
2. If the user's estimate is materially more optimistic than the reference class, raise L by 1 unless they supplied strong differentiating evidence.
3. If the reference class is adjacent but not exact, label it directional and cap its adjustment at +1.
4. If no reference class fits, leave L evidence-based and state uncertainty.

Never present an adjusted score as actuarial precision. Use language like "directional L=4" when the reference class is imperfect.
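
The four steps above can be sketched as a small Python function. This is an illustration under stated assumptions, not AZIMUTH's implementation: `adjust_likelihood` is a hypothetical name, the reference class is assumed to be expressed on the same 1-5 scale, and "materially more optimistic" is read as "inside-view L is lower than the reference-class L".

```python
from typing import Optional, Tuple


def adjust_likelihood(
    inside_l: int,
    reference_l: Optional[int],
    exact_fit: bool = True,
    strong_evidence: bool = False,
) -> Tuple[int, str]:
    """Return (adjusted L, label) following the base-rate steps."""
    if not 1 <= inside_l <= 5:
        raise ValueError("L must be between 1 and 5")
    if reference_l is None:
        # Step 4: no usable reference class, keep the evidence-based L.
        return inside_l, f"L={inside_l} (no reference class; uncertainty stated)"
    # Step 2: assumption - "more optimistic" means a lower L than the base rate.
    optimistic = inside_l < reference_l
    adjusted = inside_l
    if optimistic and not strong_evidence:
        adjusted = min(inside_l + 1, 5)  # never more than +1, never above 5
    # Step 3: an adjacent (non-exact) reference class is labeled directional.
    prefix = "" if exact_fit else "directional "
    return adjusted, f"{prefix}L={adjusted}"
```

For example, `adjust_likelihood(3, 5, exact_fit=False)` yields `(4, "directional L=4")`, matching the wording recommended above.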

---

## Score and Bands

`Risk Score = Likelihood × Impact`

| Score | Band | Action posture |
|---:|---|---|
| 1-4 | Green | Track only; do not let it drive the verdict |
| 5-9 | Yellow | Mitigate if cheap; include only if decision-relevant |
| 10-14 | Orange | Requires mitigation or explicit acceptance |
| 15-19 | Red | Decision blocker unless structurally mitigated or tested first |
| 20-25 | Black | Strong bias toward DELAY, REDUCE SCOPE, PILOT FIRST, or REJECT |

Tie-breaker: when two risks have equal scores, rank later-detectable and less-reversible risks higher.
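
A minimal Python sketch of the score, bands, and tie-breaker follows. The names (`risk_score`, `band`, `ScoredRisk`, `rank`) and the ordinal encodings for detectability and reversibility are assumptions made for illustration. The file does not say which tie-break criterion wins, so detection lateness is arbitrarily checked first here.

```python
from dataclasses import dataclass
from typing import List

# Upper bound of each band, taken from the table above.
BANDS = [(4, "Green"), (9, "Yellow"), (14, "Orange"), (19, "Red"), (25, "Black")]


def risk_score(likelihood: int, impact: int) -> int:
    for value in (likelihood, impact):
        if not 1 <= value <= 5:
            raise ValueError("likelihood and impact must be integers from 1 to 5")
    return likelihood * impact


def band(score: int) -> str:
    for upper, name in BANDS:
        if score <= upper:
            return name
    raise ValueError("score exceeds the 5x5 matrix maximum of 25")


@dataclass
class ScoredRisk:
    name: str
    likelihood: int
    impact: int
    detection_lateness: int  # hypothetical ordinal: 0 = early, 2 = late
    irreversibility: int     # hypothetical ordinal: 0 = easy to reverse, 2 = hard


def rank(risks: List[ScoredRisk]) -> List[ScoredRisk]:
    """Highest score first; ties go to later-detectable, less-reversible risks."""
    return sorted(
        risks,
        key=lambda r: (
            risk_score(r.likelihood, r.impact),
            r.detection_lateness,
            r.irreversibility,
        ),
        reverse=True,
    )
```

With these definitions, L=4 and I=5 gives a score of 20, which falls in the Black band.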

---

## Output Shape

Use only for the top 1-5 risks. Do not score every hypothetical.

| Risk | L | I | L×I | Base-rate adjustment | Detectability | Reversibility | Action |
|---|---:|---:|---:|---|---|---|---|
| [risk] | 4 | 5 | 20 | +1 from reference class | Late | Low | Pilot before full commitment |

---

## Verdict Hooks

- **PROCEED** is unavailable if any unmitigated register risk scores 15+.
- **PROCEED WITH SAFEGUARDS** requires every 15+ risk to have a named structural safeguard, owner, leading indicator, and review date.
- **PILOT FIRST** requires the pilot to target the highest L×I unsupported assumption or dependency.
- **REDUCE SCOPE** fires when the score is high mainly because scope amplifies impact.
- **DELAY PENDING EVIDENCE** fires when one narrow evidence gate could move L by 2+ points.
- **REJECT** is favored when multiple 15+ risks remain untestable or structurally unmitigated.
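
The first two hooks are mechanical enough to sketch in Python; the remaining four depend on judgment and are not modeled. `RegisterRisk`, `proceed_available`, and `safeguards_complete` are hypothetical names, and the `mitigated` flag is an assumed way of recording that a risk has been structurally mitigated.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RegisterRisk:
    name: str
    likelihood: int
    impact: int
    mitigated: bool = False            # assumption: set by the analyst
    safeguard: Optional[str] = None
    owner: Optional[str] = None
    leading_indicator: Optional[str] = None
    review_date: Optional[str] = None

    @property
    def score(self) -> int:
        return self.likelihood * self.impact


def proceed_available(risks: List[RegisterRisk]) -> bool:
    """PROCEED is unavailable if any unmitigated risk scores 15+."""
    return not any(r.score >= 15 and not r.mitigated for r in risks)


def safeguards_complete(risks: List[RegisterRisk]) -> bool:
    """PROCEED WITH SAFEGUARDS needs all four fields on every 15+ risk."""
    return all(
        all([r.safeguard, r.owner, r.leading_indicator, r.review_date])
        for r in risks
        if r.score >= 15
    )
```

Both checks are gates, not verdicts: passing them only means the verdict is still available to the full register analysis.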

---

## Anti-Patterns

- Do not use scoring to launder weak evidence into confidence.
- Do not assign false precision to unknown probabilities.
- Do not let one dramatic low-likelihood scenario crowd out the most likely failure path.
- Do not average scores. One Black risk can kill the decision.

---

## Provenance

Adapted conceptually from common likelihood × impact risk matrices and the L/I gap noted in DasClown/premortem-skill `AUDIT.md`. Implementation is AZIMUTH-native: base-rate adjusted, register-bound, and verdict-gated.
103 changes: 103 additions & 0 deletions diagnostics/risk-triage.md
@@ -0,0 +1,103 @@
# Risk Triage Diagnostic

Load when the register has more risks than the output can responsibly carry, when a user or team is conflating anxiety with evidence, or when unspoken organizational risks may be suppressing the true failure mode.

---

## Purpose

Separate real launch-blocking risks from scary but weak speculation and from political / unspoken concerns. This keeps AZIMUTH sharp: fewer generic risks, more decision-grade triage.

---

## Categories

Use the taxonomy in `references/risk-categories.md`.

- **Tiger** — real, evidence-backed, capable of material harm
- **Paper Tiger** — sounds scary but is unsupported, low-likelihood, or low-impact after inspection
- **Elephant** — unspoken, political, incentive-laden, or socially costly to name

For Tigers, assign urgency:

- **Launch-Blocking** — must resolve before commitment / launch
- **Fast-Follow** — can proceed only with a dated post-decision owner and review point
- **Track** — monitor with a leading indicator; does not drive verdict alone

---

## Triage Steps

1. **Evidence check** — What makes this risk real?
2. **Impact check** — What breaks if it fires?
3. **Timing check** — Must it be resolved before commitment, or can it be watched?
4. **Politics check** — Did the room get quieter around this risk? Is someone punished for naming it?
5. **Actionability check** — Is there a structural action, owner, and review date?

---

## Register Fields

Use these fields when triage is active:

| Field | Required? | Notes |
|---|---|---|
| Risk | Yes | Plain-English risk name |
| Category | Yes | Tiger / Paper Tiger / Elephant |
| Urgency | For Tigers | Launch-Blocking / Fast-Follow / Track |
| Evidence | Yes | Data, observation, prior incident, or explicit lack of evidence |
| L/I Score | When scoring active | From `diagnostics/likelihood-impact-matrix.md` |
| Owner | For Launch-Blocking / Fast-Follow / Elephant | Single accountable role/person |
| Leading Indicator | For all output risks | Observable early warning sign |
| Review Date | For accepted / residual risks | Date or decision checkpoint |

---

## Output Shape

```markdown
## Risk Triage
| Risk | Category | Urgency | Evidence | Owner | Leading Indicator | Review Date |
|---|---|---|---|---|---|---|
```

Omit Paper Tigers from Critical Risks unless explaining why a scary objection should not drive the verdict.

---

## Verdict Hooks

- Any unmitigated **Launch-Blocking Tiger** blocks PROCEED.
- Two or more Launch-Blocking Tigers usually imply DELAY, REDUCE SCOPE, PILOT FIRST, or REJECT.
- Any unresolved **Elephant** involving authority, incentives, dissent suppression, or hidden ownership raises governance severity and may cap confidence.
- A register dominated by Paper Tigers is evidence that anxiety is high but decision risk may be lower than it feels; do not inflate the verdict.
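
As a rough Python sketch, the hooks above could be evaluated like this. Everything here is illustrative: `TriagedRisk` and `triage_hooks` are hypothetical names, every unresolved Elephant is treated as qualifying for the third hook, and "dominated" is assumed to mean more than half of the register.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TriagedRisk:
    name: str
    category: str           # "tiger", "paper_tiger", or "elephant" (assumed spellings)
    urgency: str = ""       # "launch_blocking", "fast_follow", "track" (Tigers only)
    resolved: bool = False  # mitigated (Tiger) or resolved (Elephant)


def triage_hooks(risks: List[TriagedRisk]) -> List[str]:
    """Return the verdict notes implied by the triaged register."""
    notes = []
    blocking = [
        r for r in risks
        if r.category == "tiger" and r.urgency == "launch_blocking"
    ]
    if any(not r.resolved for r in blocking):
        notes.append("PROCEED blocked by an unmitigated Launch-Blocking Tiger")
    if len(blocking) >= 2:
        notes.append("Consider DELAY, REDUCE SCOPE, PILOT FIRST, or REJECT")
    if any(r.category == "elephant" and not r.resolved for r in risks):
        notes.append("Raise governance severity; confidence may be capped")
    paper = sum(1 for r in risks if r.category == "paper_tiger")
    if risks and paper > len(risks) / 2:  # assumption: "dominated" = majority
        notes.append("Register dominated by Paper Tigers; do not inflate the verdict")
    return notes
```

The function returns notes and leaves the verdict to the analyst, since the second hook says "usually" and the third says "may".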

---

## Automation Reference

A user may implement simple automation with this schema:

```json
{
  "risks": [
    {
      "description": "Authentication service had 3 P1 outages in 30 days",
      "category": "tiger",
      "evidence": "Incident reports",
      "urgency": "launch_blocking",
      "owner": "Platform lead",
      "leading_indicator": "P1 recurrence or error budget breach",
      "review_date": "YYYY-MM-DD"
    }
  ]
}
```

AZIMUTH does not require code execution. The schema is enough to improve reasoning and handoff quality.
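
For teams that do want automation, here is a minimal Python sketch of a validator for this schema, using only the standard library. `validate_register` is a hypothetical helper. The `paper_tiger`, `elephant`, `fast_follow`, and `track` spellings are assumed by analogy with the `tiger` and `launch_blocking` values in the sample, and the required-field rules are taken from the Register Fields table.

```python
import json
from typing import List

REQUIRED = ("description", "category", "evidence", "leading_indicator")
CATEGORIES = {"tiger", "paper_tiger", "elephant"}        # assumed spellings
URGENCIES = {"launch_blocking", "fast_follow", "track"}  # assumed spellings


def validate_register(text: str) -> List[str]:
    """Return a list of problems; an empty list means the register passes."""
    problems = []
    data = json.loads(text)
    for idx, risk in enumerate(data.get("risks", [])):
        for key in REQUIRED:
            if not risk.get(key):
                problems.append(f"risk {idx}: {key} is required")
        category = risk.get("category")
        urgency = risk.get("urgency")
        if category and category not in CATEGORIES:
            problems.append(f"risk {idx}: unknown category {category!r}")
        # Urgency is required for Tigers only.
        if category == "tiger" and urgency not in URGENCIES:
            problems.append(f"risk {idx}: tiger needs a valid urgency")
        # Owner is required for Launch-Blocking, Fast-Follow, and Elephants.
        needs_owner = category == "elephant" or urgency in {
            "launch_blocking",
            "fast_follow",
        }
        if needs_owner and not risk.get("owner"):
            problems.append(f"risk {idx}: owner is required")
    return problems
```

The sample register above passes with no problems; an Elephant entry without an owner would be reported as `risk 0: owner is required`.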

---

## Provenance

Adapted conceptually from the Tiger / Paper Tiger / Elephant framing in borghei/Claude-Skills pre-mortem. This file is original AZIMUTH integration and avoids copying the Commons Clause script.
54 changes: 54 additions & 0 deletions learnings/outcome-tracking.md
@@ -0,0 +1,54 @@
# Outcome Tracking Loop

Use after a decision has had time to produce signal. This is not part of the core run unless the user asks for a learning review.

---

## Purpose

Feed real outcomes back into AZIMUTH's base rates, gotchas, and diagnostic calibration. The skill improves only if predictions are compared against reality.

---

## Review Cadence

At the review date from `templates/commitment-lock.md` or `templates/decision-record.md`, capture:

| Field | Prompt |
|---|---|
| Original verdict | What did AZIMUTH recommend? |
| Actual decision | What did the user/team do? |
| Outcome | What happened? |
| Prediction hit | Which risks occurred? |
| Prediction miss | What happened that AZIMUTH missed? |
| False alarm | Which risks were over-weighted? |
| Leading indicator quality | Did indicators fire early enough? |
| Calibration change | What should change in base rates, gotchas, or diagnostics? |

---

## Learning Classes

- **Base-rate update** — observed outcome changes reference-class expectations
- **Gotcha candidate** — recurring cross-domain pattern not currently captured
- **Diagnostic weakness** — existing diagnostic missed or over-weighted a signal
- **Commitment failure** — risk was identified but no owned action happened
- **Verdict calibration** — verdict was too soft or too severe for the evidence

---

## Output

```markdown
## AZIMUTH Outcome Review
- Original verdict: ...
- Actual outcome: ...
- Calibration finding: ...
- Repository update candidate: ...
```

---

## Provenance

Implements AZIMUTH's self-improvement loop: outcome evidence should feed future base-rate, gotcha, and commitment calibration rather than remain anecdotal.