Per-ride TSS does not ground end-to-end yet: the model is never taught to author <technical_proof> claims (the evidence-layer authoring slice of #87)
Found by the independent Opus review of #95 (PR for feat/issue-95-per-ride-tss-factsheet). #95 correctly delivers its scoped half: the compose fact sheet now NAMES the canonical activity_id and renders the per-ride load figures under the canonical code activity_tss (COMPOSE-R1a). Surfacing the id, however, is necessary but not sufficient for a per-ride TSS claim to actually ground.
The gap (verified by reading the code, not assumed)
The model is never instructed to write <technical_proof> tags. Every reference to technical_proof in src/ is a PARSE or STRIP regex (compose_contracts.parse_tagged_answer, voice._strip_technical_proof). The compose authoring prompt (system_prompt in config/defaults.toml, lines ~413-448) teaches plain coach prose only: no tag syntax, and neither the (metric, as_of <date>) nor the (activity_tss, activity <id>) parenthetical form.
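The two parenthetical forms are easiest to see side by side. The sketch below shows a hypothetical two-layer answer and what a parse-style regex and a strip-style regex do with it. The tag name and the (activity_tss, activity <id>) / (metric, as_of <date>) forms come from this issue; the regexes, the Claim shape, the sample answer, and the names parse_proof_sketch and strip_proof_sketch are assumptions for illustration, not the real compose_contracts.parse_tagged_answer or voice._strip_technical_proof.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns, for illustration only; the real regexes live in
# compose_contracts.py and voice.py and may differ.
PROOF_RE = re.compile(r"<technical_proof>(.*?)</technical_proof>", re.DOTALL)
STRIP_RE = re.compile(r"\s*<technical_proof>.*?</technical_proof>", re.DOTALL)
ACTIVITY_REF_RE = re.compile(r"\((\w+),\s*activity\s+([^)\s]+)\)")  # (activity_tss, activity <id>)
AS_OF_RE = re.compile(r"\((\w+),\s*as_of\s+(\d{4}-\d{2}-\d{2})\)")  # (metric, as_of <ISO date>)


@dataclass
class Claim:
    metric: str
    value_text: str
    ref: Optional[str] = None    # activity id, per-ride claims only
    as_of: Optional[str] = None  # ISO date, dated metrics only


def parse_proof_sketch(answer: str) -> list[Claim]:
    """Turn each <technical_proof> span into a Claim (hypothetical parser)."""
    claims = []
    for body in PROOF_RE.findall(answer):
        if m := ACTIVITY_REF_RE.search(body):
            claims.append(Claim(m.group(1), body[: m.start()].strip(), ref=m.group(2)))
        elif m := AS_OF_RE.search(body):
            claims.append(Claim(m.group(1), body[: m.start()].strip(), as_of=m.group(2)))
    return claims


def strip_proof_sketch(answer: str) -> str:
    """Drop the evidence layer (and the space before it), leaving the visible prose."""
    return STRIP_RE.sub("", answer)


# A hypothetical two-layer answer: visible coach prose with inline evidence tags.
answer = (
    "That ride was a solid 87 TSS <technical_proof>TSS 87 "
    "(activity_tss, activity i123)</technical_proof>. "
    "Your fitness sits at 62 <technical_proof>CTL 62 "
    "(ctl, as_of 2024-05-01)</technical_proof>."
)
print(parse_proof_sketch(answer))
print(strip_proof_sketch(answer))
```

Both regexes only consume tags that already exist; if the prompt never asks for them, parse_proof_sketch returns an empty list and the strip is a no-op.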
So the evidence layer relies on the fallback extractor. Per graph_state.evidence_claims ("falls back to draft extraction … the transitional bridge until compose always populates it"), when the model emits no tags the grounder extracts candidate claims from the visible draft via the claim_system LLM extractor (defaults.toml:520).
The fallback extractor cannot capture a per-ride ref. It copies "the metric label exactly as the SENTENCE states it" plus an ISO date, and the date only when the sentence states one. The visible coach sentence says "that ride", never the canonical activity_id, so the extracted claim has ref=None.
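A minimal sketch of why the fallback path cannot recover a per-ride ref, assuming a keyword-matching toy in place of the claim_system LLM extractor. The real extractor is an LLM; only its documented rule (copy the label as stated, add a date only if stated) is mirrored here. Claim, toy_draft_extractor, and evidence_claims_sketch are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

ISO_DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


@dataclass
class Claim:
    metric: str
    ref: Optional[str] = None    # activity id; nothing in the prose supplies it
    as_of: Optional[str] = None  # ISO date, only if the sentence states one


def toy_draft_extractor(draft: str) -> list[Claim]:
    """Keyword stand-in for the claim_system LLM extractor (hypothetical).

    Copies the metric label as the sentence states it, plus an ISO date when
    the sentence states one. "that ride" carries no id, so ref stays None.
    """
    claims = []
    for sentence in re.split(r"(?<=[.!?])\s+", draft):
        if "TSS" in sentence:
            date = ISO_DATE_RE.search(sentence)
            claims.append(Claim("TSS", as_of=date.group(0) if date else None))
    return claims


def evidence_claims_sketch(
    tagged: list[Claim], draft: str, extract: Callable[[str], list[Claim]]
) -> list[Claim]:
    """Prefer model-authored claims; otherwise fall back to draft extraction."""
    return tagged if tagged else extract(draft)


# No tags were emitted, so the fallback runs and the per-ride claim has no ref.
claims = evidence_claims_sketch([], "That ride scored 87 TSS. Nice work.", toy_draft_extractor)
print(claims)
```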
An activity_tss claim with ref=None is scrubbed: CanonicalEvidence._activity_tss(None) returns None (capabilities_evidence.py), and the #99 binding guard EXEMPTS activity_tss from date-rebind (grounding_binding.py), so there is no sentence-level recovery of the id.
Net: a per-ride TSS claim grounds ONLY if the model emits <technical_proof>… (activity_tss, activity <id>)</technical_proof>, which is the only form _ACTIVITY_REF_RE extracts into claim.ref (compose_contracts.py). Nothing teaches that form (or tag authoring at all).
Why this is #87, not #95
Teaching the model to author the two-layer answer's <technical_proof> evidence is a cross-cutting concern over EVERY metric (ctl/atl/tsb/tss/cp/…), not per-ride TSS alone. It is the "compose always populates the evidence layer" slice that the evidence_claims docstring names as the remaining bridge, i.e. the #87 evidence-layer-population / reveal slice. #95 (fact-sheet surfacing) is a prerequisite that is now in place.
Scope of the fix
Teach <technical_proof> authoring in the compose system_prompt (or a dedicated loaded fragment, CFG-R1a): the inline two-layer form, the (canonical_metric, as_of <ISO>) paren for dated metrics, AND the (activity_tss, activity <id>) paren for per-ride claims, with one example each. The fact sheet already gives the model the id (#95).
Extend the spec (COMPOSE-R3) to require the authoring instruction, keeping spec↔code correspondence.
Add an END-TO-END test through the real model seam: a per-ride TSS question → the model authors a parseable (activity_tss, activity <id>) claim → it grounds against svc.coggan(activity_id).value.tss.
Acceptance
A per-ride TSS question yields a coach answer whose stated TSS is a grounded survivor (cited), resolved by activity_id — not scrubbed.
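The acceptance criterion can be sketched as a test, under heavy assumptions: fake_model stands in for the real model seam, FakeSvc only mimics the svc.coggan(activity_id).value.tss shape named in the scope, and the extraction regex, the 0.5 TSS tolerance, and the activity id i123 are all hypothetical. What it illustrates is the chain described in this issue: the id from the fact sheet goes into the (activity_tss, activity <id>) paren, lands in claim.ref, and resolves to the canonical value, while a claim with ref=None is scrubbed.

```python
import re
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional

# Hypothetical patterns; not the real _ACTIVITY_REF_RE.
PROOF_RE = re.compile(r"<technical_proof>(.*?)</technical_proof>", re.DOTALL)
ACTIVITY_REF_RE = re.compile(r"\(activity_tss,\s*activity\s+([^)\s]+)\)")
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


@dataclass
class Claim:
    metric: str
    value: float
    ref: Optional[str]


class FakeSvc:
    """Mimics only the svc.coggan(activity_id).value.tss shape named in the issue."""

    def __init__(self, tss_by_activity: dict[str, float]):
        self._tss = tss_by_activity

    def coggan(self, activity_id: str) -> SimpleNamespace:
        return SimpleNamespace(value=SimpleNamespace(tss=self._tss[activity_id]))


def fake_model(question: str, fact_sheet: dict) -> str:
    """Hypothetical model seam that follows the proposed authoring instruction."""
    aid, tss = fact_sheet["activity_id"], fact_sheet["activity_tss"]
    return (
        f"That ride came in at {tss:g} TSS <technical_proof>TSS {tss:g} "
        f"(activity_tss, activity {aid})</technical_proof>."
    )


def extract_claims(answer: str) -> list[Claim]:
    """Pull (value, activity id) pairs out of per-ride evidence tags."""
    claims = []
    for body in PROOF_RE.findall(answer):
        m = ACTIVITY_REF_RE.search(body)
        num = NUMBER_RE.search(body[: m.start()]) if m else None
        if m and num:
            claims.append(Claim("activity_tss", float(num.group(0)), m.group(1)))
    return claims


def ground(claims: list[Claim], svc: FakeSvc, tol: float = 0.5) -> list[Claim]:
    """Keep claims that resolve by id and match the canonical TSS; scrub the rest."""
    survivors = []
    for claim in claims:
        if claim.ref is None:
            continue  # no activity id: nothing to resolve against, so it is scrubbed
        if abs(svc.coggan(claim.ref).value.tss - claim.value) <= tol:
            survivors.append(claim)
    return survivors


def test_per_ride_tss_grounds_end_to_end() -> None:
    svc = FakeSvc({"i123": 87.0})
    fact_sheet = {"activity_id": "i123", "activity_tss": 87.0}
    answer = fake_model("What was the TSS of that ride?", fact_sheet)
    assert ground(extract_claims(answer), svc) == [Claim("activity_tss", 87.0, "i123")]
```

Under pytest the test function is collected by its test_ prefix. A real version would drive the production compose path instead of fake_model, so it fails whenever the system_prompt stops teaching the paren form.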