Epic: pluggable feasibility-model registry — falsify prescriptions per (sport, goal) instead of predicting outcomes

Derivative of #25. That issue establishes the type split (`NUMBER` vs `PRESCRIPTION`) and the near-term safety slice (decision-aware gate + no in-place rewrite of prescriptions). This issue is the research-gated **epic**: how we actually *verify* a prescription once we stop rewriting it.

The framing decided in #25 ([discussion](https://github.com/bepcyc/wattwise-core/issues/25#issuecomment-4704873260)): **falsify, don't predict.** The science says a single point-prediction simulator is not achievable: the fitness-fatigue model is statistically ill-conditioned, with poorly identifiable parameters and fatigue terms that overfit ([*Sci Rep* 2025](https://www.nature.com/articles/s41598-025-88153-7)), and Banister himself recommended re-fitting every 60–90 days. So the verifier's job is **refutation against a typed feasibility envelope**, with stated residual uncertainty, never a promise of outcome. This keeps the "model never self-certifies" invariant and breaks the recursion in which the verifier would have to match the planner: a sound *critic* need not be as capable as the *planner* ([LLM-Modulo, Kambhampati et al., ICML 2024](https://arxiv.org/abs/2402.01817)).

## Goals

1. **Feasibility-model registry**, keyed by `(sport, goal_type)`, each model implementing one interface:
   ```
   check(prescription, canonical_state, goal) -> {feasible | infeasible | uncertain,
                                                  binding_critique, citation}
   ```
   - **Declared default per domain** + a **rule-based floor** when no fitted model applies.
   - Unknown / unsupported `(sport, goal_type)` → **abstain/degrade honestly** ("can't verify a plan for this yet"), never flatten to 7×CTL maintenance.
   - `GoalType` is a closed enum defined at `domain/enums.py:339` (`EVENT`, `TARGET_METRIC`, `DISTANCE`, `PROCESS`, `OTHER`); `sport` is the data-driven registry (`persistence/models/athlete.py`). Together they form the registry key.

2. **Banded endurance verifier** (first concrete model). Reuse the existing deterministic, seedable `AnalyticsService.pmc(...)` (`analytics/pmc.py`, `service.py:315`) as the forward integrator. Simulate CTL/ATL/TSB across a **plausible parameter band**, not a point estimate. Return `CONTRADICTED` only when the prescription is infeasible **across the whole band**; in-band ambiguity yields `DEGRADE` with the uncertainty stated.

3. **Safety envelope as review flags, not injury predictions** — and explicitly **NOT ACWR**. ACWR is discredited as a causal/predictive injury metric ([Impellizzeri et al. 2020, PMID 32502973](https://pubmed.ncbi.nlm.nih.gov/32502973/); [2024 review](https://www.mdpi.com/2076-3417/14/11/4449)); the persisted `daily_wellness.acwr` column (`persistence/models/wellness.py:126`, read by nothing) stays orphaned. Use instead:
   - bounded weekly ramp rate (coaching guardrail);
   - Foster monotony & strain ([Foster 1998, MSSE, PMID 9662690](https://pubmed.ncbi.nlm.nih.gov/9662690/));
   - internal consistency (weekly total = Σ prescribed sessions — subsumes 7×CTL as one feasible point);
   - timeline coherence (room for a taper before `target_date`).

4. **Intensity anchoring** off canonical thresholds — we already have `CRITICAL_POWER_W`, `W_PRIME_J`, `FitnessSignature.ftp_w`. Verify per-session targets as deterministic functions of CP/FTP (zone derivations), extending the closed derivation table the 7×CTL precedent legitimized.

5. **Strength / hypertrophy is a different verifier, not the same simulator.** It is *not* an impulse-response phenomenon — model it as volume landmarks (MEV→MAV→MRV) + progressive overload per the [ACSM progression-model position stand 2009](https://tourniquets.org/wp-content/uploads/PDFs/ACSM-Progression-models-in-resistance-training-for-healthy-adults-2009.pdf) and the RP volume-landmark framework. This is the clearest reason the registry must support **heterogeneous** models behind one interface rather than one universal simulator.

6. **Close the generate-test loop.** Per-claim grounding critiques are computed in `ground` and discarded before `compose` (`graph_state.py` `render_context`), so REGENERATE is a blind re-roll that hits the same wall again. Rendering machine-readable `binding_critique`s back into the redraft context turns the existing loop into a real LLM-Modulo loop with no graph-topology change. A natural extension: deterministic optimal control over the FF model proposes a *feasible load scaffold* that the LLM only narrates ([Busso & Thomas 2006](https://www.nature.com/articles/srep40422) lineage).

7. **New citation kind `feasibility_envelope`** (preferred over `simulated_trajectory`) — records `model_id`, `parameter_band`, `binding_critique`; reproducible from `(canonical_state, prescription, model_id, band)`. Add alongside existing kinds `metric` / `user_request` / `name` / `url`.
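
To make the registry contract from goal 1 concrete, here is a minimal sketch, not a proposed implementation. It assumes plain dicts for `prescription`, `canonical_state`, and `goal`, a hypothetical `"cycling"` sport key, and hypothetical names (`Verdict`, `CheckResult`, `FeasibilityRegistry`, `weekly_total_floor`) that do not exist in the codebase. It shows lookup by `(sport, goal_type)`, a rule-based floor built on the internal-consistency check from goal 3, and honest abstention for unknown keys.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    FEASIBLE = "feasible"
    INFEASIBLE = "infeasible"
    UNCERTAIN = "uncertain"


@dataclass(frozen=True)
class CheckResult:
    verdict: Verdict
    binding_critique: Optional[str] = None  # machine-readable reason; None when feasible
    citation: Optional[str] = None          # e.g. id of a feasibility_envelope citation


# Hypothetical model signature: (prescription, canonical_state, goal) -> CheckResult.
FeasibilityModel = Callable[[dict, dict, dict], CheckResult]


def weekly_total_floor(prescription: dict, canonical_state: dict, goal: dict) -> CheckResult:
    """Rule-based floor: the stated weekly total must equal the sum of its sessions."""
    total = sum(prescription["session_loads"])
    if abs(total - prescription["weekly_total"]) > 1e-6:
        return CheckResult(Verdict.INFEASIBLE, "weekly_total_mismatch", "rule:internal_consistency")
    return CheckResult(Verdict.FEASIBLE, None, "rule:internal_consistency")


class FeasibilityRegistry:
    def __init__(self) -> None:
        self._models: dict[tuple[str, str], FeasibilityModel] = {}

    def register(self, sport: str, goal_type: str, model: FeasibilityModel) -> None:
        self._models[(sport, goal_type)] = model

    def check(self, sport: str, goal_type: str, prescription: dict,
              canonical_state: dict, goal: dict) -> CheckResult:
        model = self._models.get((sport, goal_type))
        if model is None:
            # Abstain honestly instead of flattening to a maintenance default.
            return CheckResult(Verdict.UNCERTAIN, "no_model_for_sport_goal")
        return model(prescription, canonical_state, goal)


registry = FeasibilityRegistry()
registry.register("cycling", "EVENT", weekly_total_floor)
```

Because every model shares one callable signature, a strength verifier based on volume landmarks could be registered next to the endurance one without either knowing about the other.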

## Coordination with #10

#10's entailment verifier makes prescriptions **worse** if applied naively — nothing future is entailed by a record of the past, so a stronger *descriptive* verifier scrubs progressive plans harder. #10 and this epic must share the `PRESCRIPTION` claim type as the seam so they compose. Prescriptions need *different* verification semantics, not more descriptive rigor.

## Verification principle

Verify **adherence to coaching principles** (progressive overload, recovery/supercompensation, specificity, timeline) — **not predicted outcomes**. A taper week is feasible *because* the recovery principle demands it before an event, not "wrong because it is below 7×CTL."
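
The safety-envelope checks listed under Goals are principle-level review flags of this kind. As a rough sketch, Foster monotony is the mean daily load divided by the standard deviation of daily load over a week, and strain is the weekly load multiplied by monotony. The function names, the choice of population standard deviation, and the default thresholds (`max_monotony=2.0`, `max_ramp=0.10`) are illustrative assumptions, not values decided in this issue.

```python
from statistics import mean, pstdev


def foster_monotony_strain(daily_loads: list[float]) -> tuple[float, float]:
    """Return (monotony, strain) for one week of daily session loads."""
    if len(daily_loads) != 7:
        raise ValueError("expected exactly 7 daily loads")
    weekly_load = sum(daily_loads)
    spread = pstdev(daily_loads)  # assumption: population SD over the 7 days
    if spread == 0:
        # Identical days: monotony is unbounded unless the week is empty.
        monotony = float("inf") if weekly_load > 0 else 0.0
    else:
        monotony = mean(daily_loads) / spread
    return monotony, weekly_load * monotony


def review_flags(this_week: list[float], last_week: list[float],
                 max_monotony: float = 2.0, max_ramp: float = 0.10) -> list[str]:
    """Return review flags for a prescribed week; flags are prompts, not verdicts."""
    flags = []
    monotony, _ = foster_monotony_strain(this_week)
    if monotony > max_monotony:
        flags.append("high_monotony")
    previous = sum(last_week)
    if previous > 0 and (sum(this_week) - previous) / previous > max_ramp:
        flags.append("ramp_rate_exceeded")
    return flags
```

Under these assumptions a taper week such as `[80, 0, 60, 0, 40, 0, 20]` following a harder week raises no flag: its load drops and its days vary, which matches the principle that a taper is feasible even though it sits below 7×CTL.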

## Eval (plan goldens the current suites can't express)

- A build-plan case asserting simulation-feasible progressive targets **survive** grounding.
- A taper case asserting a prescribed load is **never rewritten upward**.
- A strength-goal case asserting the agent **degrades honestly** when no fitted model applies, rather than flattening to maintenance.
- An ambiguity case asserting in-band uncertainty yields `DEGRADE` (stated), not `CONTRADICTED`.
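
The ambiguity case depends on the band rule from goal 2, which can be sketched as follows. `simulate_tsb` is a stand-in daily exponential update, not `AnalyticsService.pmc`; the time-constant bands, the `tsb_floor` feasibility criterion, and the `SUPPORTED` verdict name are hypothetical choices made only for illustration (`CONTRADICTED` and `DEGRADE` come from this issue).

```python
from itertools import product


def simulate_tsb(loads, ctl0, atl0, tau_ctl, tau_atl):
    """Stand-in forward integrator: returns the TSB seen at the start of each day."""
    ctl, atl = ctl0, atl0
    tsb = []
    for load in loads:
        tsb.append(ctl - atl)           # form before today's session
        ctl += (load - ctl) / tau_ctl   # slow (fitness) response
        atl += (load - atl) / tau_atl   # fast (fatigue) response
    return tsb


def banded_verdict(loads, ctl0, atl0, tau_ctl_band, tau_atl_band, tsb_floor):
    """Refute only when the plan breaks the floor for every parameter pair."""
    feasible = [
        min(simulate_tsb(loads, ctl0, atl0, tau_c, tau_a)) >= tsb_floor
        for tau_c, tau_a in product(tau_ctl_band, tau_atl_band)
    ]
    if all(feasible):
        return "SUPPORTED"      # hypothetical name for the passing verdict
    if not any(feasible):
        return "CONTRADICTED"   # infeasible across the whole band
    return "DEGRADE"            # in-band ambiguity: state the uncertainty


TAU_CTL_BAND = (35, 42, 49)  # illustrative band, days
TAU_ATL_BAND = (5, 7, 9)     # illustrative band, days
```

With these illustrative bands, a starting CTL and ATL of 50, and a floor of -25, a week of daily load 100 is feasible for slow-fatigue parameters and infeasible for fast-fatigue ones, so `banded_verdict([100] * 7, 50, 50, TAU_CTL_BAND, TAU_ATL_BAND, -25)` returns `DEGRADE` rather than `CONTRADICTED`.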

## Spec-first

Define the `PRESCRIPTION` claim type, the registry interface, and the `feasibility_envelope` citation contract in the grounding + forward-model specs **before** code, so the new citation kind is normative and the agent can cite it.
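
As a starting point for that spec discussion, the citation contract could look like the sketch below. The field names `model_id`, `parameter_band`, and `binding_critique` come from goal 7; the class name, the `fingerprint` field, and the SHA-256 over canonical JSON are assumptions showing one way to make "reproducible from `(canonical_state, prescription, model_id, band)`" checkable, assuming all four inputs are JSON-serializable.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FeasibilityEnvelopeCitation:
    model_id: str
    parameter_band: dict
    binding_critique: str
    fingerprint: str  # hypothetical field: hash of the inputs that produced the verdict
    kind: str = "feasibility_envelope"


def envelope_fingerprint(canonical_state: dict, prescription: dict,
                         model_id: str, band: dict) -> str:
    """Hash the four reproducibility inputs in a key-order-independent way."""
    payload = json.dumps(
        {
            "canonical_state": canonical_state,
            "prescription": prescription,
            "model_id": model_id,
            "band": band,
        },
        sort_keys=True,             # canonical key order
        separators=(",", ":"),      # no whitespace variation
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

A verifier re-run that produces a different fingerprint for the same citation would indicate that the state, the prescription, the model, or the band changed since the citation was issued.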

## Open research questions (carried from #25 debate)

- Which parameter priors/bands per sport, and how to narrow them from sparse individual history without overfitting (the identifiability trap).
- Minimum viable model set for v1 (proposal: endurance-event default + rule-based floor; strength = explicit "not yet verifiable").
- How to surface residual uncertainty to the athlete in the approval surface without eroding trust.
