Two-layer coach answer: verifiable technical/evidence layer (read by grounding) + warm visible prose (shown to athlete), fail-closed split

## Problem this solves

Two of the coach's hard requirements pull in opposite directions today:

1. **Grounding needs numbers in the text.** The grounder binds citations by matching numeric spans in the answer against canonical data (GROUND-R2). No number in the prose → nothing to bind → `citations=0`.
2. **The voice rules push numbers OUT.** The coach prompt says "FIRST sentence never a number", "never write ctl/atl/...", "keep numbers in the background". A faithful model then writes warm, number-free prose — and grounding has nothing to verify.

The result (observed live, deepseek-v4-flash): a data-rich athlete asks about training load and gets confident prose with **zero cited numbers** — the engine's own `e2e_smoke` grounded-answer oracle only passes intermittently, and the false-`degraded` of #85 rides on the same collision. The single answer string is being asked to be BOTH the machine-verifiable artifact AND the gentle human message, and it cannot be both well.

## Proposed solution: separate the verifiable layer from the shown layer

The model emits a **technical/evidence layer** (numbers + reasoning, read by the grounder and every downstream checker) and a **visible layer** (warm prose for the athlete). The technical layer is stripped before the answer reaches the user.

Owner's framing — an inline tagged block:

```
<technical_proof>fitness is 5.7 (ctl, as_of 2026-06-15); fatigue 4.8 — basis for the "building steadily" read</technical_proof>
You've been building steadily — your fitness is climbing while fatigue stays manageable.
```

Everything inside `<technical_proof>` is consumed by grounding/binding/observability and **cut** at the athlete-facing surface.

## Design constraints (must be in the spec before code)

1. **Grounding covers BOTH layers.** Any number shown to the athlete in the visible prose must itself be grounded against canonical — the proof block does not get to "vouch" for an unverified visible number. A proof saying 42 while the visible text says 48 is a grounding failure, not a feature. The visible layer is what the athlete believes; it cannot be less verified than the technical one.
2. **Fail-closed stripping (VOICE-R2).** The technical block is full of exactly the internal jargon VOICE-R2 forbids in athlete output (ctl/atl/grounding/as_of/...). Stripping MUST be fail-closed: a malformed/unclosed/nested tag, or any residual tag-or-jargon after the cut, MUST scrub or refuse — never leak the raw technical layer. A hidden channel in LLM output is a classic leak vector.
3. **Structured, not in-band, is safer.** Raw in-text tags are fragile (model forgets to close, nests, mislabels, switches language). The robust form of the same idea is a **structured output** — e.g. `{visible_answer, evidence_claims[]}` — validated at the tool-call layer (STRUCT-R5 already mandates structured claim extraction). 2026 reasoning models already expose a separate reasoning channel and MODEL-R5a(b) already tells the engine to read the ANSWER, not the reasoning trace — so the infra for a two-layer answer is partly here. Tags-in-text is the owner's intuition; structured fields are its hardened implementation. (Spec discussion should pick one; structured is recommended.)

## Plan (order matters — spec first)

1. **Spec first.** Edit doc 50 where the coach prompt says "don't lead with a number / never write codes" → reframe to "numbers + reasoning go INSIDE the technical layer; the visible layer is warm prose", and teach the grounding/voice requirements (GROUND-R2/R3, VOICE-R2, STATUS-R1) how the technical layer is read and how the visible layer is still fully grounded and fail-closed-stripped. The graders/checkers must be spec'd to read the tags and know why they exist.
2. **Then code** to the amended spec: compose emits two layers; grounder verifies the visible layer's numbers against canonical using the technical layer as the candidate-claim source; a fail-closed stripper removes the technical layer at the API boundary.
3. **Fold in #85**: with the new contract, `terminal_status` couples to real grounded substance (visible, verified numbers), and the eval grounder fixtures become realistic (survivors only when claims actually ground), so the false-`degraded` disappears naturally.

## Acceptance

- `e2e_smoke` grounded-answer oracle reliably GREEN: a data-rich athlete gets warm prose AND verified numbers, `citations>0`, every shown number canonical-matched.
- No technical-layer text or internal jargon ever appears in athlete-facing output (fail-closed strip test).
- #85 false-`degraded` resolved as a consequence.

Raised by Opus 4.8 from the live onboarding loop; blocks #85.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Two-layer coach answer: verifiable technical/evidence layer (read by grounding) + warm visible prose (shown to athlete), fail-closed split #87

Problem this solves

Proposed solution: separate the verifiable layer from the shown layer

Design constraints (must be in the spec before code)

Plan (order matters — spec first)

Acceptance

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Two-layer coach answer: verifiable technical/evidence layer (read by grounding) + warm visible prose (shown to athlete), fail-closed split #87

Description

Problem this solves

Proposed solution: separate the verifiable layer from the shown layer

Design constraints (must be in the spec before code)

Plan (order matters — spec first)

Acceptance

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions