Synthesize Training Intensity Distribution (TID): time-in-zone + Polarization Index — the engine knows how MUCH you train but is blind to the easy/hard MIX, the single most actionable lever in endurance science

## The one-line gap

WattWise measures **how much** you train (TSS, CTL/ATL/TSB) and **how hard a single session was** (`intensity_class`, IF, `hr_load_zonal`). It has **no number for how your training time is split across easy / moderate / hard** over a week or a block. That split, the *Training Intensity Distribution* (TID), is the most-studied and most-coachable lever in endurance science, and right now the coach cannot answer one of the most common coaching questions there is: *"Is my easy/hard balance right?"*

## Why this is the critical missing metric (not just another nice-to-have)

Training load is **intensity × duration collapsed into one scalar**. That collapse is the whole point of TSS, and also its blind spot. Two athletes with an **identical CTL of 70** can be at opposite ends of adaptation and injury risk:

- Athlete A: 80% easy aerobic, 20% hard intervals. This is *polarized*, the distribution that has repeatedly come out ahead for VO2max gains and in highly trained athletes.
- Athlete B: most sessions in the moderate "grey zone" just under threshold. This is *one of the most common mistakes in amateur endurance training* ("black-hole" / junk-miles training): too hard to recover from, too easy to drive top-end adaptation.

Same load. Same PMC chart. Same form. **Completely different coaching verdict.** Today WattWise renders A and B as indistinguishable. The PMC tells you the *size* of the stimulus; TID tells you its *quality* — and quality is where the real advice lives.

This also unlocks **forecasting that works**: the research is now specific about *which* distribution produces *which* adaptation (polarized: the best VO2max gains, especially in shorter blocks and in trained athletes; pyramidal: most effective for many runners and in base phases; roughly 75–80% low-intensity plus 15–20% high-intensity as the productive band). With TID measured, the coach can say *"to peak your VO2max for the event in 8 weeks, your current threshold-heavy mix should shift toward polarized"*, which is a real, testable prescription instead of a description of past load.
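A minimal sketch of what checking a measured split against that band could look like. The function names and band edges are assumptions for illustration: the edges simply transcribe the approximate ~75–80% / ~15–20% figures above, "low-intensity" is taken as the fraction of time below the first threshold and "high-intensity" as the fraction above the second, and `slack` exists only because the literature figures are approximate.

```python
# Hypothetical band edges transcribed from the approximate figures quoted above;
# not a validated standard.
LIT_BAND = (0.75, 0.80)  # fraction of time below the first threshold
HIT_BAND = (0.15, 0.20)  # fraction of time above the second threshold


def in_band(value: float, band: tuple[float, float], slack: float = 0.0) -> bool:
    """True if value lies inside [low - slack, high + slack]."""
    low, high = band
    return low - slack <= value <= high + slack


def in_productive_band(lit: float, hit: float, slack: float = 0.0) -> bool:
    """Check whether a low/high-intensity split sits in the quoted productive band.

    lit, hit: fractions of total training time (0..1); whatever is left over
    (1 - lit - hit) is the moderate zone between the two thresholds.
    """
    if not (0.0 <= lit <= 1.0 and 0.0 <= hit <= 1.0 and lit + hit <= 1.0):
        raise ValueError("lit and hit must be fractions whose sum is at most 1")
    return in_band(lit, LIT_BAND, slack) and in_band(hit, HIT_BAND, slack)
```

For example, a 78% / 17% split passes, while a grey-zone-heavy 55% / 5% split does not.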

## The metric set (one logical bundle)

A rolling-window distribution, computed per sport, anchored to the athlete's existing thresholds:

1. **Time-in-zone fractions** over a rolling 7-day and 28-day window, and per training block, using the standard **3-zone physiological model**:
   - **Z1** — below the first threshold (LT1/VT1 / aerobic threshold)
   - **Z2** — between the two thresholds ("threshold / grey zone")
   - **Z3** — above the second threshold (LT2/VT2 / FTP / CP)
2. **`polarization_index`** — Treff et al. (2019): `PI = log10( (Z1 / Z2) * Z3 * 100 )`, with Z1–Z3 as fractions of total time (0 to 1); a TID is **polarized when PI > 2.00**, and higher means more polarized. One honest number that integrates all three zones.
3. **`lit_hit_ratio`** — the low-intensity vs high-intensity split (the "80/20" coaches actually talk about).
4. **`tid_archetype`** — a typed classification, `polarized` / `pyramidal` / `threshold` / `grey_zone`, computed from the zone fractions by explicit rules rather than guessed. (Polarized condition, Treff: `0 <= z2 < z3 < z1`.)
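The three derived numbers can be sketched as plain functions over zone fractions. This is a minimal Python sketch, not the engine's implementation: the function names are hypothetical, zones are taken as fractions of total time (which is what makes the 2.00 cutoff meaningful), and the handling of zero-time zones is an assumption to check against Treff et al. The issue does not say where `threshold` ends and `grey_zone` begins, so the sketch uses a placeholder split: `threshold` when Z2 is the largest share, `grey_zone` for any mix that matches no other rule. `pyramidal` uses the common Z1 > Z2 > Z3 ordering.

```python
import math
from typing import Literal, Optional

Archetype = Literal["polarized", "pyramidal", "threshold", "grey_zone"]

PI_POLARIZED_CUTOFF = 2.00  # Treff et al. (2019): a TID is polarized when PI > 2.00
Z2_ZERO_SUBSTITUTE = 0.01   # assumption: stand-in for Z2 == 0 so the ratio stays finite


def zone_fractions(z1: float, z2: float, z3: float) -> tuple[float, float, float]:
    """Convert time in each zone (any one unit, e.g. seconds) into fractions of the total."""
    if min(z1, z2, z3) < 0:
        raise ValueError("zone durations must be non-negative")
    total = z1 + z2 + z3
    if total == 0:
        raise ValueError("no time recorded in any zone")
    return z1 / total, z2 / total, z3 / total


def polarization_index(f1: float, f2: float, f3: float) -> Optional[float]:
    """PI = log10((Z1 / Z2) * Z3 * 100), zones given as fractions of total time.

    Assumption: returns None (undefined) when Z1 or Z3 is zero, since log10(0)
    has no finite value.
    """
    if f1 <= 0 or f3 <= 0:
        return None
    z2 = f2 if f2 > 0 else Z2_ZERO_SUBSTITUTE
    return math.log10((f1 / z2) * f3 * 100)


def lit_hit_ratio(f1: float, f3: float) -> Optional[float]:
    """Low- vs high-intensity ratio, assuming HIT means Z3 only (80% vs 20% gives 4.0)."""
    return None if f3 == 0 else f1 / f3


def tid_archetype(f1: float, f2: float, f3: float) -> Archetype:
    """Classify zone fractions into one of the four archetypes named in the issue."""
    pi = polarization_index(f1, f2, f3)
    # Treff's ordering condition (0 <= z2 < z3 < z1) plus the PI cutoff.
    if f2 < f3 < f1 and pi is not None and pi > PI_POLARIZED_CUTOFF:
        return "polarized"
    # Placeholder rules: the issue names these classes but not their boundaries.
    if f1 > f2 > f3:
        return "pyramidal"
    if f2 >= f1 and f2 >= f3:
        return "threshold"
    return "grey_zone"
```

For Athlete A's 80/5/15 split, `polarization_index(*zone_fractions(80, 5, 15))` is log10(240), about 2.38, and `tid_archetype` returns `"polarized"`; a 70/20/10 split returns `"pyramidal"`. Counting "80/20" as Z1 versus Z2 + Z3 instead of Z1 versus Z3 is a common alternative for `lit_hit_ratio` and would need its own definition in `docs/METRICS.md`.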

## Why it is derivable from data we already store — today

This is not a new sensor; it is a synthesis of streams already in the record, and most of the machinery exists:

- `hr_load_zonal` **already bins per-second HR samples against athlete zone boundaries** (`analytics/trimp.py`) — time-in-zone is the same binning, kept as durations instead of collapsed into a weighted scalar.
- `intensity_class` **already cut-points IF** against fixed thresholds (`analytics/np_if_tss.py`).
- The fitness signature **already stores the anchors**: `cp_w` / `ftp_w`, `threshold_hr_bpm`, `max_hr_bpm`, zone boundaries + weights are already a first-class concept.
- **Sport-agnostic by construction**, matching the existing applicability rules: power zones for **cycling**, pace **or** HR zones for **running**, HR for **swimming/rowing/xc-ski**, and — degrading honestly — an RPE/zone fallback for **strength** and sensor-less sessions (reusing the `srpe_load` path from #27). Pace/HR three-zone TID is exactly how the running and skiing literature reports it, so we stay comparable to the papers.
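The binning step can be sketched as below, assuming evenly spaced samples and one fixed pair of thresholds per window. This is a hypothetical stand-in, not the code in `analytics/trimp.py`: the names are illustrative, and where the first-threshold anchor comes from (a dual-threshold model, or values derived from CP/FTP or threshold HR) is left to the caller. The only change from a weighted-load binning is that durations are kept per zone instead of being multiplied by zone weights and summed.

```python
from typing import Iterable, Optional, Sequence


def time_in_zones(
    samples: Iterable[Optional[float]],
    lower: float,
    upper: float,
    sample_seconds: float = 1.0,
) -> tuple[float, float, float]:
    """Seconds spent in Z1/Z2/Z3 for one activity's evenly spaced samples.

    lower/upper are the first and second thresholds in the stream's own unit
    (watts, bpm, or speed). Pace must be converted to speed first so that a
    larger value always means harder. None marks a dropout and is skipped.
    """
    if not lower < upper:
        raise ValueError("first threshold must be below the second")
    z1 = z2 = z3 = 0.0
    for x in samples:
        if x is None:
            continue  # no sample, no guess
        if x < lower:
            z1 += sample_seconds
        elif x <= upper:  # assumption: values exactly on a threshold count as Z2
            z2 += sample_seconds
        else:
            z3 += sample_seconds
    return z1, z2, z3


def window_time_in_zones(
    activities: Iterable[Sequence[Optional[float]]],
    lower: float,
    upper: float,
) -> tuple[float, float, float]:
    """Sum per-activity time-in-zone over every activity in a window."""
    z1 = z2 = z3 = 0.0
    for samples in activities:
        a1, a2, a3 = time_in_zones(samples, lower, upper)
        z1, z2, z3 = z1 + a1, z2 + a2, z3 + a3
    return z1, z2, z3
```

If anchors change inside a window (for example after a new FTP test), each activity would need to be binned against the anchors valid on its date; the sketch deliberately ignores that.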

## Fits the project's honesty contract

- **No anchors, no number.** TID requires zone anchors (CP/FTP, or LTHR/threshold-HR, or threshold pace). When the athlete has none, return *Unavailable* with the typed reason — never a fabricated split.
- **Fidelity tag.** A power-derived TID is `raw_stream`; an HR-derived one is lower fidelity; an RPE-derived one is `SUBSTITUTED`. Same fidelity ladder the rest of the engine already uses.
- **Groundable & citable.** Each fraction / index is a deterministic function of (streams, zone anchors), so it earns a normal citation — the grounder can verify a coach claim like *"68% of last month was Z1"* against the record, instead of scrubbing it as an unknown metric.
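The fail-closed and fidelity rules can be made concrete with a small source-selection sketch. Everything here is hypothetical: only `raw_stream` and `SUBSTITUTED` are names from the issue, while `HR_DERIVED`, `UnavailableReason.NO_ZONE_ANCHORS`, and the 0.5-point grounding tolerance are placeholders. The sketch picks the highest-fidelity stream that has anchors, returns a typed `Unavailable` when none does, and checks a percentage claim against a measured fraction.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class Fidelity(Enum):
    RAW_STREAM = "raw_stream"    # named in the issue (power-derived TID)
    HR_DERIVED = "hr_derived"    # hypothetical name for the lower-fidelity HR tier
    SUBSTITUTED = "SUBSTITUTED"  # named in the issue (RPE-derived TID)


class UnavailableReason(Enum):
    NO_ZONE_ANCHORS = "no_zone_anchors"  # hypothetical typed reason


@dataclass(frozen=True)
class Unavailable:
    reason: UnavailableReason


@dataclass(frozen=True)
class TidSource:
    stream: str                      # "power", "hr" or "rpe"
    thresholds: tuple[float, float]  # (first, second) threshold in the stream's unit
    fidelity: Fidelity


def select_source(
    power_w: Optional[tuple[float, float]],
    hr_bpm: Optional[tuple[float, float]],
    rpe: Optional[tuple[float, float]],
) -> Union[TidSource, Unavailable]:
    """Pick the highest-fidelity stream that has zone anchors; fail closed otherwise.

    Each argument is the (first, second) threshold pair for that stream, or None
    when the athlete has no anchor or the activity has no such stream.
    """
    if power_w is not None:
        return TidSource("power", power_w, Fidelity.RAW_STREAM)
    if hr_bpm is not None:
        return TidSource("hr", hr_bpm, Fidelity.HR_DERIVED)
    if rpe is not None:
        return TidSource("rpe", rpe, Fidelity.SUBSTITUTED)
    return Unavailable(UnavailableReason.NO_ZONE_ANCHORS)


def zone_claim_is_grounded(
    claimed_pct: float, measured_fraction: float, tolerance_pct: float = 0.5
) -> bool:
    """Check a coach claim such as '68% of last month was Z1' against the record."""
    return abs(claimed_pct - 100.0 * measured_fraction) <= tolerance_pct
```

A measured Z1 fraction of 0.683 grounds the "68%" claim; a measured 0.60 does not, so the grounder would reject it rather than let it through.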

## Distinct from the existing record (checked)

- **#26 durability / fatigue-resistance** — *how power decays within a long effort* (the fatigue axis). TID is *how intensity is distributed across sessions*. Orthogonal axes.
- **#27 sRPE / perception:output** — adds a load *source* for sensor-less work; TID consumes it as one optional input but is about distribution, not load magnitude.
- **#25 prescription-by-simulation** — a different *kind* of thing, and **not a dependency in either direction** (this caused confusion on first read; see [the note on #25](https://github.com/bepcyc/wattwise-core/issues/25#issuecomment-4704884879)). #25 is a verification *mechanism* for *future plans* — it grounds a prescription by forward-simulating it, instead of "correcting" it against the past. #76 is a *descriptive metric* of the easy/hard mix you have *already done*. They **compose but ship independently**: #76 would give #25's simulation envelope one more thing to check a plan against — extending its per-session "intensity anchoring" leg to a whole-*block* distribution target (e.g. a build block grounds only if its prescribed sessions add up to a polarized PI). #25 is not blocked on #76; #76 is not part of #25.
- **#47 per-ride TSS** — a single session's load scalar. TID is a multi-session distribution.
- **`hr_load_zonal` / `intensity_class`** — a zone-*weighted load scalar* and a per-activity *label*. Neither exposes the distribution, a window-level index, or an archetype.

No prior open or closed issue proposes a time-in-zone distribution, a polarization index, or a TID archetype.
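To illustrate the #25 composition point (a block-level distribution target rather than a per-session one), here is a standalone check of whether a prescribed block sums to a polarized TID. `PlannedSession` and `block_is_polarized` are hypothetical and say nothing about #25's actual simulation envelope; the PI math follows Treff et al. with zones as fractions, and the Z2 = 0 substitute is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedSession:
    """Hypothetical shape of a prescribed session: planned minutes per zone."""
    z1_min: float
    z2_min: float
    z3_min: float


def block_is_polarized(sessions: list[PlannedSession], cutoff: float = 2.00) -> bool:
    """Does the whole block, summed over its sessions, meet Treff's polarized test?"""
    z1 = sum(s.z1_min for s in sessions)
    z2 = sum(s.z2_min for s in sessions)
    z3 = sum(s.z3_min for s in sessions)
    total = z1 + z2 + z3
    if total == 0 or z1 == 0 or z3 == 0:
        return False  # PI undefined: no time, or no easy or no hard work at all
    f1, f2, f3 = z1 / total, z2 / total, z3 / total
    if not f2 < f3 < f1:  # Treff's ordering condition
        return False
    f2_safe = f2 if f2 > 0 else 0.01  # assumption: stand-in for Z2 == 0
    return math.log10((f1 / f2_safe) * f3 * 100) > cutoff
```

Four easy 60-minute sessions plus two interval sessions of 20/5/25 minutes in Z1/Z2/Z3 sum to 280/10/50 minutes, which passes (PI about 2.61), even though a single interval session on its own (40/10/50%) would not. That gap between per-session and per-block checks is what a block-level target adds.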

## Proposed slice (no code in this issue)

1. `AnalyticsService.intensity_distribution(athlete, sport, window)` -> time-in-zone fractions + `polarization_index` + `lit_hit_ratio` + `tid_archetype`, same purity / fail-closed envelope as siblings, reusing the `hr_load_zonal` binning and the signature's zone anchors.
2. `MetricName` members + capability resolvers + metric aliases ("intensity distribution", "easy/hard split", "polarization", "80/20") so retrieval can name them and grounding can cite them.
3. `docs/METRICS.md` entries (definition, formula, honest ranges with sources, when Unavailable).
4. Golden + property tests at the service seam (a known polarized vs grey-zone fixture must classify correctly; PI math must match Treff worked examples; fail-closed without anchors).
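Item 2 amounts to a lookup from the phrases people actually type to typed metric names. A minimal sketch follows; the `MetricName` members shown are hypothetical names for the new bundle (the real enum and its resolvers may look different), and which metric each alias should resolve to is a judgment call (for instance, "80/20" is mapped to `lit_hit_ratio` here).

```python
from enum import Enum
from typing import Optional


class MetricName(str, Enum):
    # Hypothetical members for the TID bundle; the real enum may differ.
    TIME_IN_ZONE = "time_in_zone"
    POLARIZATION_INDEX = "polarization_index"
    LIT_HIT_RATIO = "lit_hit_ratio"
    TID_ARCHETYPE = "tid_archetype"


# Aliases proposed in the issue, mapped to the metric a query most plausibly means.
METRIC_ALIASES: dict[str, MetricName] = {
    "intensity distribution": MetricName.TIME_IN_ZONE,
    "easy/hard split": MetricName.LIT_HIT_RATIO,
    "polarization": MetricName.POLARIZATION_INDEX,
    "80/20": MetricName.LIT_HIT_RATIO,
}


def resolve_metric(text: str) -> Optional[MetricName]:
    """Resolve free text to a metric: alias first, then the canonical value, else None."""
    key = " ".join(text.lower().split())  # case- and whitespace-insensitive
    if key in METRIC_ALIASES:
        return METRIC_ALIASES[key]
    try:
        return MetricName(key)
    except ValueError:
        return None
```

Returning `None` for an unknown phrase keeps retrieval fail-closed in the same spirit as the metric itself.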

## Selected sources

- Treff et al. (2019), *The Polarization-Index*, Front. Physiol. — https://pmc.ncbi.nlm.nih.gov/articles/PMC6582670/ (PI formula, PI > 2.00 cutoff)
- Comparison of Polarized vs other TID, systematic review + meta-analysis, *Sports Medicine* (2024) — https://link.springer.com/article/10.1007/s40279-024-02034-z
- Polarized TID effect on VO2max and work economy, systematic review (2024) — https://pmc.ncbi.nlm.nih.gov/articles/PMC11679080/
- Recent advances in TID theory for cyclic endurance sports (2025), Front. Physiol. — https://www.frontiersin.org/journals/physiology/articles/10.3389/fphys.2025.1657892/full

---

@bepcyc — flagging this as the highest-leverage metric gap I can find, and I'd value your read before anyone builds it. Two calls I'd want your opinion on: **(a)** scope — ship the 3-zone TID + PI first and treat per-session "black-hole" detection as a follow-up, or build both together? **(b)** zone model — anchor zones to CP/FTP and threshold-HR we already store (pragmatic, available today), or hold out for a proper dual-threshold (LT1/LT2) model where data allows? *(Audit + proposal only — no code changed for this issue.)*
