MrBinnacle · May 15, 2026
diff --git a/diagnostics/bias-scan.md b/diagnostics/bias-scan.md
@@ -0,0 +1,77 @@
+# Bias Scan Diagnostic
+
+Load when motivated reasoning, deadline pressure, unusual confidence, political pressure, or high-stakes downside could distort the analysis.
+
+---
+
+## Purpose
+
+AZIMUTH already has inline circuit-breakers for sycophancy, availability, domain calibration, and verdict softening. This diagnostic centralizes the broader bias pass so the core skill does not bloat while still catching common decision distortions.
+
+Run the scan after the initial risk register exists. Bias scanning before risk generation can prematurely narrow the search space.
+
+---
+
+## Triggers
+
+Load in STANDARD when any of these fire:
+
+- user is unusually certain but evidence is thin
+- user asks for a fast check on a high-stakes decision
+- Module 4 returns YELLOW or RED
+- plan contains sunk cost, public commitment, deadline pressure, or authority pressure
+- all risks look generic or conveniently manageable
+- proposed verdict is PROCEED or PROCEED WITH SAFEGUARDS despite unsupported critical assumptions
+
+Load unconditionally in DEEP.
+
+---
+
+## Fast Bias Pass
+
+For each triggered bias, answer: **signal, counter-question, effect on register**.
+
+| Bias | Signal | Counter-question | Register effect |
+|---|---|---|---|
+| Sycophancy | User confidence appears mirrored by the model | Would this classification be the same if a stranger proposed it? | Re-check strongest user claim as first UNSUPPORTED candidate |
+| Optimism / Planning fallacy | Timeline assumes clean path | What does the outside view say for similar work? | Raise likelihood or add evidence gate |
+| Anchoring | Estimate inherits first number mentioned | What estimate would emerge from bottom-up constraints? | Re-score timing risks |
+| Availability | Risks mirror generic examples | Which failure trigger is specific to this plan? | Discard generic chains without anchors |
+| Confirmation | Only supportive evidence is cited | What would change the verdict? | Add falsifier or DELAY gate |
+| Sunk cost | Past spend affects forward decision | Would this be approved fresh today? | Raise governance / reversibility severity |
+| Commitment consistency | Public prior statement narrows options | Is consistency being valued over correctness? | Add elephant or incentive entry |
+| Authority bias | Senior sponsor/vendor/board pressure dominates | Does their authority apply to this domain? | Escalate Module 4 conflict if needed |
+| Social proof / bandwagon | Others doing it is treated as evidence | Do those others share our constraints and failure costs? | Require reference-class fit |
+| FOMO | Urgency based on scarce opportunity | What is the cost of waiting, and will similar chances recur? | Test RAPID vs artificial urgency |
+| Loss aversion | Avoiding loss dominates upside/downside balance | What opportunity cost is created by avoiding loss? | Add opportunity-cost lens if material |
+| Status quo bias | Current path preserved without analysis | Is doing nothing being evaluated as an active option? | Add baseline scenario |
+| Shiny object | Novelty substitutes for fit | Is this better or just newer? | Add alternative-path comparison |
+| Survivorship bias | Examples include only winners | What failed examples are missing? | Load base rates / failure references |
+| Endowment effect | Existing asset/team/idea is overvalued | Would we buy/build this today from scratch? | Re-check cost/benefit assumptions |
+
+---
+
+## Output Shape
+
+Only include biases that changed the analysis.
+
+```markdown
+## Bias Scan
+- [Bias]: [signal]. Counter-question: [question]. Register impact: [risk score / evidence class / verdict changed how].
+```
+
+If no bias changed the register, omit the section and state internally: bias scan completed, no output-relevant change.
+
+---
+
+## Verdict Hooks
+
+- If bias scan changes the top assumption to UNSUPPORTED, Module 10 confidence ceiling applies.
+- If bias scan reveals artificial urgency, RAPID may still run, but output must state the urgency appears incentive-driven.
+- If bias scan reveals authority, social proof, or sunk-cost pressure, add or escalate an Elephant / incentive conflict entry.
+
+---
+
+## Provenance
+
+Synthesizes AZIMUTH's existing circuit-breakers with decision-bias counter-question patterns from decision-toolkit-style references. No external text copied.
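
The trigger rules in `diagnostics/bias-scan.md` above can be sketched as a small predicate. This is a rough illustration only, not part of the commit: the flag names, the `GREEN` value for a non-firing Module 4, and the choice to return `False` for modes other than STANDARD and DEEP are all assumptions, since the file lists its triggers in prose.

```python
from dataclasses import dataclass


@dataclass
class BiasTriggers:
    """Hypothetical flags, one per trigger bullet in bias-scan.md."""
    certain_but_thin_evidence: bool = False
    fast_check_on_high_stakes: bool = False
    module4_status: str = "GREEN"  # assumed values: GREEN / YELLOW / RED
    pressure_present: bool = False  # sunk cost, public commitment, deadline, authority
    risks_look_generic: bool = False
    proceed_despite_unsupported: bool = False


def should_load_bias_scan(mode: str, t: BiasTriggers) -> bool:
    """Return True when the bias scan should load for this run."""
    if mode == "DEEP":
        return True  # DEEP loads the scan unconditionally
    if mode != "STANDARD":
        return False  # assumption: the file defines no triggers for other modes
    return any([
        t.certain_but_thin_evidence,
        t.fast_check_on_high_stakes,
        t.module4_status in {"YELLOW", "RED"},
        t.pressure_present,
        t.risks_look_generic,
        t.proceed_despite_unsupported,
    ])
```

Keeping the flags as explicit booleans makes it easy to record which trigger fired, which the output shape needs when a bias changes the register.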
diff --git a/diagnostics/likelihood-impact-matrix.md b/diagnostics/likelihood-impact-matrix.md
@@ -0,0 +1,119 @@
+# Likelihood / Impact Matrix Diagnostic
+
+Load when STANDARD, DEEP, or any advanced lens produces a non-trivial risk register. This diagnostic quantifies risk without claiming precision the evidence does not support.
+
+---
+
+## Purpose
+
+AZIMUTH already classifies evidence quality and structural fragility. This file adds a small quantitative layer so the output can distinguish:
+
+- highly likely moderate failures from rare catastrophic failures
+- scary but unsupported speculation from evidence-backed launch blockers
+- mitigations that change risk from mitigations that merely sound responsible
+
+The score is a triage aid, not a verdict by itself. Verdicts still come from the full register, incentive tier, reversibility, and evidence gates.
+
+---
+
+## When to Run
+
+Run automatically in:
+
+- **STANDARD** when the register has 3+ material risks, any potential PILOT FIRST verdict, or any potential PROCEED / PROCEED WITH SAFEGUARDS verdict
+- **DEEP** whenever Critical Risks are produced
+- **RESIDUAL-RISK-REGISTER** outputs when ranking the 3-5 remaining controllable risks
+- **MULTI-LENS** or **FOOL** drill-downs when they add risks back into the main register
+
+Do not run in FAST unless the user explicitly asks for scoring. FAST should stay low-friction.
+
+---
+
+## Scale
+
+Use a 5x5 matrix unless the user provides a native risk scale.
+
+### Likelihood (L)
+
+| Score | Probability anchor | Use when |
+|---|---:|---|
+| 1 | <5% | Possible, but no direct evidence and weak fit to base rates |
+| 2 | 5-20% | Plausible but not the modal path |
+| 3 | 20-50% | Credible risk with partial evidence or relevant base-rate support |
+| 4 | 50-80% | More likely than not without mitigation |
+| 5 | >80% | Already happening, structurally forced, or strongly supported by base rates |
+
+### Impact (I)
+
+| Score | Severity anchor | Use when |
+|---|---|---|
+| 1 | Minor | Local rework; no decision-level impact |
+| 2 | Moderate | Schedule slip or bounded cost; recoverable inside current scope |
+| 3 | Major | Meaningful delay, reputation hit, missed launch, or stakeholder escalation |
+| 4 | Severe | Budget/headcount/customer/contract damage; reversal costly |
+| 5 | Existential / decision-killing | Objective fails, public trust loss, irreversible commitment, or strategic dead end |
+
+---
+
+## Base-Rate Adjustment
+
+Before finalizing L, compare the inside-view estimate to the closest usable reference class.
+
+1. Identify the closest reference class from `references/base-rates.md` or domain references.
+2. If the user's estimate is materially more optimistic than the reference class, raise L by 1 unless they supplied strong differentiating evidence.
+3. If the reference class is adjacent but not exact, label it directional and cap its adjustment at +1.
+4. If no reference class fits, leave L evidence-based and state uncertainty.
+
+Never present an adjusted score as actuarially precise. Use language like "directional L=4" when the reference class is imperfect.
+
+---
+
+## Score and Bands
+
+`Risk Score = Likelihood × Impact`
+
+| Score | Band | Action posture |
+|---:|---|---|
+| 1-4 | Green | Track only; do not let it drive the verdict |
+| 5-9 | Yellow | Mitigate if cheap; include only if decision-relevant |
+| 10-14 | Orange | Requires mitigation or explicit acceptance |
+| 15-19 | Red | Decision blocker unless structurally mitigated or tested first |
+| 20-25 | Black | Strong bias toward DELAY, REDUCE SCOPE, PILOT FIRST, or REJECT |
+
+Tie-breaker: when two risks have equal scores, rank later-detectable and less-reversible risks higher.
+
+---
+
+## Output Shape
+
+Use only for the top 1-5 risks. Do not score every hypothetical.
+
+| Risk | L | I | L×I | Base-rate adjustment | Detectability | Reversibility | Action |
+|---|---:|---:|---:|---|---|---|---|
+| [risk] | 4 | 5 | 20 | +1 from reference class | Late | Low | Pilot before full commitment |
+
+---
+
+## Verdict Hooks
+
+- **PROCEED** is unavailable if any unmitigated register risk scores 15+.
+- **PROCEED WITH SAFEGUARDS** requires every 15+ risk to have a named structural safeguard, owner, leading indicator, and review date.
+- **PILOT FIRST** requires the pilot to target the highest L×I unsupported assumption or dependency.
+- **REDUCE SCOPE** fires when the score is high mainly because scope amplifies impact.
+- **DELAY PENDING EVIDENCE** fires when one narrow evidence gate could move L by 2+ points.
+- **REJECT** is favored when multiple 15+ risks remain untestable or structurally unmitigated.
+
+---
+
+## Anti-Patterns
+
+- Do not use scoring to launder weak evidence into confidence.
+- Do not assign false precision to unknown probabilities.
+- Do not let one dramatic low-likelihood scenario crowd out the most likely failure path.
+- Do not average scores. One Black risk can kill the decision.
+
+---
+
+## Provenance
+
+Adapted conceptually from common likelihood × impact risk matrices and the L/I gap noted in DasClown/premortem-skill `AUDIT.md`. Implementation is AZIMUTH-native: base-rate adjusted, register-bound, and verdict-gated.
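
The scoring rules in `diagnostics/likelihood-impact-matrix.md` above (base-rate adjustment, `Likelihood × Impact`, bands, and the PROCEED gate) can be sketched as follows. This is an illustrative sketch rather than the skill's implementation: the class and field names are hypothetical, and capping the adjusted likelihood at 5 is an assumption based on the 5x5 scale.

```python
from dataclasses import dataclass

# Band floors taken from the "Score and Bands" table, highest first.
BANDS = [(20, "Black"), (15, "Red"), (10, "Orange"), (5, "Yellow"), (1, "Green")]


@dataclass
class ScoredRisk:
    """Hypothetical record for one register risk."""
    name: str
    likelihood: int  # inside-view L, 1-5
    impact: int      # I, 1-5
    mitigated: bool = False
    optimistic_vs_reference: bool = False          # estimate beats the reference class
    strong_differentiating_evidence: bool = False  # evidence that justifies the gap

    def __post_init__(self) -> None:
        for label, value in (("likelihood", self.likelihood), ("impact", self.impact)):
            if not 1 <= value <= 5:
                raise ValueError(f"{label} must be between 1 and 5")


def adjusted_likelihood(risk: ScoredRisk) -> int:
    """Raise L by 1 when the estimate is more optimistic than the reference class."""
    bump = risk.optimistic_vs_reference and not risk.strong_differentiating_evidence
    return min(5, risk.likelihood + 1) if bump else risk.likelihood  # cap at 5 is assumed


def score(risk: ScoredRisk) -> int:
    """Risk Score = Likelihood x Impact, using the adjusted likelihood."""
    return adjusted_likelihood(risk) * risk.impact


def band(value: int) -> str:
    """Map a 1-25 score to its band name."""
    for floor, name in BANDS:
        if value >= floor:
            return name
    raise ValueError("score must be at least 1")


def proceed_available(risks: list[ScoredRisk]) -> bool:
    """PROCEED is unavailable if any unmitigated risk scores 15 or more."""
    return not any(score(r) >= 15 and not r.mitigated for r in risks)
```

A risk entered as L=3, I=5 with an optimistic estimate becomes L=4 and scores 20, which matches the example row in the output table. The functions never average scores, in line with the anti-pattern list.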
diff --git a/diagnostics/risk-triage.md b/diagnostics/risk-triage.md
@@ -0,0 +1,103 @@
+# Risk Triage Diagnostic
+
+Load when the register has more risks than the output can responsibly carry, when a user or team is conflating anxiety with evidence, or when unspoken organizational risks may be suppressing the true failure mode.
+
+---
+
+## Purpose
+
+Separate real launch-blocking risks from scary but weak speculation and from political / unspoken concerns. This keeps AZIMUTH sharp: fewer generic risks, more decision-grade triage.
+
+---
+
+## Categories
+
+Use the taxonomy in `references/risk-categories.md`.
+
+- **Tiger** — real, evidence-backed, capable of material harm
+- **Paper Tiger** — sounds scary but is unsupported, low-likelihood, or low-impact after inspection
+- **Elephant** — unspoken, political, incentive-laden, or socially costly to name
+
+For Tigers, assign urgency:
+
+- **Launch-Blocking** — must resolve before commitment / launch
+- **Fast-Follow** — can proceed only with a dated post-decision owner and review point
+- **Track** — monitor with a leading indicator; does not drive verdict alone
+
+---
+
+## Triage Steps
+
+1. **Evidence check** — What makes this risk real?
+2. **Impact check** — What breaks if it fires?
+3. **Timing check** — Must it be resolved before commitment, or can it be watched?
+4. **Politics check** — Did the room get quieter around this risk? Is someone punished for naming it?
+5. **Actionability check** — Is there a structural action, owner, and review date?
+
+---
+
+## Register Fields
+
+Use these fields when triage is active:
+
+| Field | Required? | Notes |
+|---|---|---|
+| Risk | Yes | Plain-English risk name |
+| Category | Yes | Tiger / Paper Tiger / Elephant |
+| Urgency | For Tigers | Launch-Blocking / Fast-Follow / Track |
+| Evidence | Yes | Data, observation, prior incident, or explicit lack of evidence |
+| L/I Score | When scoring active | From `diagnostics/likelihood-impact-matrix.md` |
+| Owner | For Launch-Blocking / Fast-Follow / Elephant | Single accountable role/person |
+| Leading Indicator | For all output risks | Observable early warning sign |
+| Review Date | For accepted / residual risks | Date or decision checkpoint |
+
+---
+
+## Output Shape
+
+```markdown
+## Risk Triage
+| Risk | Category | Urgency | Evidence | Owner | Leading Indicator | Review Date |
+|---|---|---|---|---|---|---|
+```
+
+Omit Paper Tigers from Critical Risks unless explaining why a scary objection should not drive the verdict.
+
+---
+
+## Verdict Hooks
+
+- Any unmitigated **Launch-Blocking Tiger** blocks PROCEED.
+- Two or more Launch-Blocking Tigers usually imply DELAY, REDUCE SCOPE, PILOT FIRST, or REJECT.
+- Any unresolved **Elephant** involving authority, incentives, dissent suppression, or hidden ownership raises governance severity and may cap confidence.
+- A register dominated by Paper Tigers is evidence that anxiety is high but decision risk may be lower than it feels; do not inflate the verdict.
+
+---
+
+## Automation Reference
+
+A user may implement simple automation with this schema:
+
+```json
+{
+  "risks": [
+    {
+      "description": "Authentication service had 3 P1 outages in 30 days",
+      "category": "tiger",
+      "evidence": "Incident reports",
+      "urgency": "launch_blocking",
+      "owner": "Platform lead",
+      "leading_indicator": "P1 recurrence or error budget breach",
+      "review_date": "YYYY-MM-DD"
+    }
+  ]
+}
+```
+
+AZIMUTH does not require code execution. The schema is enough to improve reasoning and handoff quality.
+
+---
+
+## Provenance
+
+Adapted conceptually from the Tiger / Paper Tiger / Elephant framing in borghei/Claude-Skills pre-mortem. This file is an original AZIMUTH integration and avoids copying the Commons Clause script.
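
The automation schema in `diagnostics/risk-triage.md` above could be checked against the Register Fields rules with a small validator. This is a sketch under assumptions: only `tiger` and `launch_blocking` appear in the file's JSON, so the spellings `paper_tiger`, `elephant`, `fast_follow`, and `track` are guesses, and the function names are hypothetical. The review-date rule is left out because the schema has no field marking a risk as accepted or residual.

```python
import json

# Assumed JSON spellings for the categories and urgencies named in the file.
CATEGORIES = {"tiger", "paper_tiger", "elephant"}
URGENCIES = {"launch_blocking", "fast_follow", "track"}


def validate_risk(risk: dict) -> list[str]:
    """Return a list of problems for one risk entry (empty when it passes)."""
    problems = []
    # Fields the register table marks as required for every output risk.
    for field in ("description", "category", "evidence", "leading_indicator"):
        if not risk.get(field):
            problems.append(f"missing {field}")

    category = risk.get("category")
    urgency = risk.get("urgency")
    if category and category not in CATEGORIES:
        problems.append(f"unknown category: {category}")

    # Urgency applies to Tigers only.
    if category == "tiger" and urgency not in URGENCIES:
        problems.append("tiger needs a valid urgency")

    # Owner is required for Launch-Blocking and Fast-Follow Tigers and for Elephants.
    needs_owner = category == "elephant" or (
        category == "tiger" and urgency in {"launch_blocking", "fast_follow"}
    )
    if needs_owner and not risk.get("owner"):
        problems.append("missing owner")
    return problems


def validate_register(text: str) -> dict[int, list[str]]:
    """Parse a register JSON document and map risk index to its problems."""
    risks = json.loads(text)["risks"]
    return {i: p for i, r in enumerate(risks) if (p := validate_risk(r))}
```

Returning problems as plain strings keeps the check usable as a handoff aid without turning triage into a hard gate.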
diff --git a/learnings/outcome-tracking.md b/learnings/outcome-tracking.md
@@ -0,0 +1,54 @@
+# Outcome Tracking Loop
+
+Use after a decision has had time to produce signal. This is not part of the core run unless the user asks for a learning review.
+
+---
+
+## Purpose
+
+Feed real outcomes back into AZIMUTH's base rates, gotchas, and diagnostic calibration. The skill improves only if predictions are compared against reality.
+
+---
+
+## Review Cadence
+
+At the review date from `templates/commitment-lock.md` or `templates/decision-record.md`, capture:
+
+| Field | Prompt |
+|---|---|
+| Original verdict | What did AZIMUTH recommend? |
+| Actual decision | What did the user/team do? |
+| Outcome | What happened? |
+| Prediction hit | Which risks occurred? |
+| Prediction miss | What happened that AZIMUTH missed? |
+| False alarm | Which risks were over-weighted? |
+| Leading indicator quality | Did indicators fire early enough? |
+| Calibration change | What should change in base rates, gotchas, or diagnostics? |
+
+---
+
+## Learning Classes
+
+- **Base-rate update** — observed outcome changes reference-class expectations
+- **Gotcha candidate** — recurring cross-domain pattern not currently captured
+- **Diagnostic weakness** — existing diagnostic missed or over-weighted a signal
+- **Commitment failure** — risk was identified but no owned action happened
+- **Verdict calibration** — verdict was too soft or too severe for the evidence
+
+---
+
+## Output
+
+```markdown
+## AZIMUTH Outcome Review
+- Original verdict: ...
+- Actual outcome: ...
+- Calibration finding: ...
+- Repository update candidate: ...
+```
+
+---
+
+## Provenance
+
+Implements AZIMUTH's self-improvement loop: outcome evidence should feed future base-rate, gotcha, and commitment calibration rather than remain anecdotal.
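
The review record in `learnings/outcome-tracking.md` above might be captured with a structure like the one below. This is only a sketch: the class and field names are hypothetical, and appending the learning class in parentheses to the calibration line is an assumption, since the file's output template does not say where the class should appear.

```python
from dataclasses import dataclass
from enum import Enum


class LearningClass(Enum):
    """The five learning classes listed in outcome-tracking.md."""
    BASE_RATE_UPDATE = "Base-rate update"
    GOTCHA_CANDIDATE = "Gotcha candidate"
    DIAGNOSTIC_WEAKNESS = "Diagnostic weakness"
    COMMITMENT_FAILURE = "Commitment failure"
    VERDICT_CALIBRATION = "Verdict calibration"


@dataclass
class OutcomeReview:
    """Hypothetical record for one review, mirroring the output template."""
    original_verdict: str
    actual_outcome: str
    calibration_finding: str
    learning_class: LearningClass
    update_candidate: str

    def to_markdown(self) -> str:
        """Render the review in the shape of the file's Output section."""
        return "\n".join([
            "## AZIMUTH Outcome Review",
            f"- Original verdict: {self.original_verdict}",
            f"- Actual outcome: {self.actual_outcome}",
            # Assumption: the learning class is shown next to the finding.
            f"- Calibration finding: {self.calibration_finding} ({self.learning_class.value})",
            f"- Repository update candidate: {self.update_candidate}",
        ])
```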