Teach the model to author <technical_proof> claims so per-ride TSS (and all metrics) ground via the evidence layer (#87 slice) #105

@bepcyc

Per-ride TSS does not ground end-to-end yet: the model is never taught to author <technical_proof> claims (the evidence-layer authoring slice of #87)

Found by the independent Opus review of #95 (PR for feat/issue-95-per-ride-tss-factsheet). #95 correctly delivers its scoped half: the compose fact sheet now NAMES the canonical activity_id and renders the per-ride load figures under the canonical code activity_tss (COMPOSE-R1a). But surfacing the id is necessary, not sufficient, for a per-ride TSS claim to actually ground.

The gap (verified by reading the code, not assumed)

  1. The model is never instructed to write <technical_proof> tags. Every reference to technical_proof in src/ is a PARSE or STRIP regex (compose_contracts.parse_tagged_answer, voice._strip_technical_proof). The compose authoring prompt (config/defaults.toml system_prompt, lines ~413-448) teaches plain coach prose only — no tag syntax, no (metric, as_of <date>) / (activity_tss, activity <id>) parenthetical form.
  2. So the evidence layer relies on the fallback extractor. Per graph_state.evidence_claims ("falls back to draft extraction … the transitional bridge until compose always populates it"), when the model emits no tags the grounder extracts candidate claims from the visible draft via the claim_system LLM extractor (defaults.toml:520).
  3. The fallback extractor cannot capture a per-ride ref. It copies "the metric label exactly as the SENTENCE states it" + an ISO date only when the sentence states one. The visible coach sentence says "that ride", never the canonical activity_id — so the extracted claim has ref=None.
  4. activity_tss with ref=None is scrubbed. CanonicalEvidence._activity_tss(None) returns None (capabilities_evidence.py), and the #99 binding guard EXEMPTS activity_tss from date-rebind (grounding_binding.py), so there is no sentence-level recovery of the id.
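
The chain in steps 3 and 4 can be illustrated with a minimal sketch. Everything here is hypothetical (the `Claim` shape, the `RIDE_TSS` table, the `ground` helper, the 0.5 tolerance and the id `a1`); it only mirrors the behaviour described above, where a lookup with ref=None yields None and the claim is scrubbed. It is not the code in capabilities_evidence.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    metric: str
    value: float
    ref: Optional[str] = None  # canonical activity_id for per-ride claims


# Hypothetical stand-in for the canonical per-ride TSS source.
RIDE_TSS = {"a1": 82.0}


def activity_tss(ref: Optional[str]) -> Optional[float]:
    """Mirror the described behaviour: no ref means no canonical value."""
    if ref is None:
        return None
    return RIDE_TSS.get(ref)


def ground(claim: Claim) -> str:
    """Return 'grounded' when the claim matches the canonical value, else 'scrubbed'."""
    truth = activity_tss(claim.ref)
    if truth is None:
        return "scrubbed"
    # Assumed tolerance, chosen only for this illustration.
    return "grounded" if abs(truth - claim.value) < 0.5 else "scrubbed"


# What the fallback extractor produces from "that ride": no activity id.
fallback_claim = Claim("activity_tss", 82.0, ref=None)
# What a tagged claim would carry: the id named in the fact sheet.
tagged_claim = Claim("activity_tss", 82.0, ref="a1")
```

With the same stated value, `ground(fallback_claim)` is scrubbed and `ground(tagged_claim)` survives; the only difference is whether the ref reached the claim.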

Net: a per-ride TSS claim grounds ONLY if the model emits <technical_proof>… (activity_tss, activity <id>)</technical_proof> — the only form _ACTIVITY_REF_RE extracts into claim.ref (compose_contracts.py). Nothing teaches that form (or tag authoring at all).
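
For illustration, a rough sketch of what extracting the two parenthetical forms from a tagged answer could look like. The patterns below are assumptions written for this issue, not the actual `_ACTIVITY_REF_RE` or `parse_tagged_answer` in compose_contracts.py, and the id `a1` and the date in the usage example are made up.

```python
import re
from typing import List, Tuple

# Assumed patterns; the real ones live in compose_contracts.py and may differ.
PROOF_RE = re.compile(r"<technical_proof>(.*?)</technical_proof>", re.DOTALL)
ACTIVITY_REF_RE = re.compile(r"\(\s*activity_tss\s*,\s*activity\s+([^\s)]+)\s*\)")
AS_OF_RE = re.compile(r"\(\s*([a-z_]+)\s*,\s*as_of\s+(\d{4}-\d{2}-\d{2})\s*\)")


def extract_refs(answer: str) -> List[Tuple[str, str, str]]:
    """Return (metric, ref_kind, ref) triples found inside proof tags only."""
    refs: List[Tuple[str, str, str]] = []
    for body in PROOF_RE.findall(answer):
        for m in ACTIVITY_REF_RE.finditer(body):
            refs.append(("activity_tss", "activity", m.group(1)))
        for m in AS_OF_RE.finditer(body):
            refs.append((m.group(1), "as_of", m.group(2)))
    return refs


tagged = (
    "That ride was a solid effort. "
    "<technical_proof>TSS 82 (activity_tss, activity a1)</technical_proof> "
    "Your fitness is trending up. "
    "<technical_proof>CTL 61 (ctl, as_of 2024-05-01)</technical_proof>"
)
untagged = "That ride was a solid effort, about 82 TSS."
```

`extract_refs(tagged)` yields one per-ride ref and one dated ref, while `extract_refs(untagged)` yields nothing, which is the case that currently falls through to the fallback extractor.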

Why this is #87, not #95

Teaching the model to author the two-layer answer's <technical_proof> evidence is a cross-cutting concern over EVERY metric (ctl/atl/tsb/tss/cp/…), not per-ride TSS alone. It is the "compose always populates the evidence layer" slice the evidence_claims docstring names as the remaining bridge — i.e. the #87 evidence-layer-population / reveal slice. #95 (fact-sheet surfacing) is a prerequisite that is now in place.

Scope of the fix

  • Teach <technical_proof> authoring in the compose system_prompt (or a dedicated loaded fragment, CFG-R1a): the inline two-layer form, the (canonical_metric, as_of <ISO>) paren for dated metrics, AND the (activity_tss, activity <id>) paren for per-ride claims, with one example each. The fact sheet already gives the model the id (#95).
  • Extend the spec (COMPOSE-R3) to require the authoring instruction, keeping spec↔code correspondence.
  • Add an END-TO-END test through the real model seam: a per-ride TSS question → the model authors a parseable (activity_tss, activity <id>) claim → it grounds against svc.coggan(activity_id).value.tss.
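
A possible shape for the end-to-end check, as a sketch only. The `fake_model` callable stands in for the real model seam, `tss_by_activity` stands in for the canonical lookup via svc.coggan(activity_id).value.tss, and the regex, the 0.5 tolerance and the id `a1` are assumptions; a real test would go through the actual seam and parser.

```python
import re
from typing import Callable, Dict

# Assumed pattern: a stated number followed by the per-ride paren, inside a proof tag.
PER_RIDE_CLAIM_RE = re.compile(
    r"<technical_proof>[^<]*?(\d+(?:\.\d+)?)\s*"
    r"\(\s*activity_tss\s*,\s*activity\s+([^\s)]+)\s*\)[^<]*</technical_proof>"
)


def per_ride_tss_grounds(
    model: Callable[[str], str],
    tss_by_activity: Dict[str, float],
    question: str,
) -> bool:
    """True when the answer carries a parseable per-ride claim that matches the canonical TSS."""
    answer = model(question)
    m = PER_RIDE_CLAIM_RE.search(answer)
    if m is None:
        return False  # no tagged per-ride claim: nothing to ground by activity_id
    stated, activity_id = float(m.group(1)), m.group(2)
    truth = tss_by_activity.get(activity_id)
    # Assumed tolerance, for illustration only.
    return truth is not None and abs(truth - stated) < 0.5


def fake_model(question: str) -> str:
    # Hypothetical answer in the form the prompt would teach.
    return (
        "That ride was a solid effort. "
        "<technical_proof>TSS 82 (activity_tss, activity a1)</technical_proof>"
    )


def prose_only_model(question: str) -> str:
    # What the compose prompt teaches today: plain coach prose, no tags.
    return "That ride was a solid effort, about 82 TSS."
```

The prose-only stub fails the check and the tagged stub passes it, so the same helper can act as a regression guard once the authoring instruction is in place.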

Acceptance

A per-ride TSS question yields a coach answer whose stated TSS is a grounded survivor (cited), resolved by activity_id — not scrubbed.
