Scope
US7 — emit OTel spans for the eval runner itself so regressions in eval metrics correlate with deployment/observability events in the same dashboard.
Priority: P2
Acceptance
- Span names: swink.eval.run_set → swink.eval.case → swink.eval.evaluator.
- Attributes include eval_set.id, case.id, case.name, evaluator.name, prompt.version, score.value, score.threshold, verdict, duration_ms, session_id.
- An evaluator failure records an OTel error status and an exception event on its span rather than silently succeeding.
- Spans nest under any already-active parent span.
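The criteria above can be pictured with a small, dependency-free model of the span tree. This is only a sketch of the expected shape and does not use the opentelemetry crate: Recorder, SpanRecord, Status, and this run_set function are hypothetical names made up for illustration, as are the sample values "demo-set" and "demo-evaluator". Only a few of the listed attributes are set, to keep it short.

```rust
use std::collections::BTreeMap;

/// Outcome recorded on a span; loosely mirrors OTel's unset/error status.
#[derive(Debug, Clone, PartialEq)]
enum Status {
    Unset,
    Error(String),
}

/// Hypothetical in-memory stand-in for an exported span.
#[derive(Debug, Clone)]
struct SpanRecord {
    parent: Option<usize>,
    name: &'static str,
    attributes: BTreeMap<&'static str, String>,
    events: Vec<String>,
    status: Status,
}

/// Collects spans in creation order; a span's id is its index in `spans`.
#[derive(Default)]
struct Recorder {
    spans: Vec<SpanRecord>,
}

impl Recorder {
    /// Starts a span under `parent` (None means a root span) and returns its id.
    fn start(&mut self, name: &'static str, parent: Option<usize>) -> usize {
        self.spans.push(SpanRecord {
            parent,
            name,
            attributes: BTreeMap::new(),
            events: Vec::new(),
            status: Status::Unset,
        });
        self.spans.len() - 1
    }

    fn set_attr(&mut self, span: usize, key: &'static str, value: impl ToString) {
        self.spans[span].attributes.insert(key, value.to_string());
    }

    /// Records a failure as an error status plus an exception event.
    fn record_error(&mut self, span: usize, message: &str) {
        self.spans[span].events.push(format!("exception: {message}"));
        self.spans[span].status = Status::Error(message.to_string());
    }
}

/// Emits run_set → case → evaluator, nested under `active_parent` if one exists.
/// Each case carries an id and its evaluator outcome: Ok(score) or Err(message).
fn run_set(
    rec: &mut Recorder,
    active_parent: Option<usize>,
    cases: &[(&str, Result<f64, String>)],
) -> usize {
    let run = rec.start("swink.eval.run_set", active_parent);
    rec.set_attr(run, "eval_set.id", "demo-set"); // sample value (assumption)
    for (case_id, outcome) in cases {
        let case = rec.start("swink.eval.case", Some(run));
        rec.set_attr(case, "case.id", case_id);
        let eval = rec.start("swink.eval.evaluator", Some(case));
        rec.set_attr(eval, "evaluator.name", "demo-evaluator"); // sample value (assumption)
        match outcome {
            Ok(score) => rec.set_attr(eval, "score.value", score),
            // A failed evaluator must not look like a success.
            Err(msg) => rec.record_error(eval, msg),
        }
    }
    run
}
```

To exercise the sketch, start a span to act as the already-active parent, pass its id as active_parent, and inspect rec.spans: the run_set span should point at that parent, each evaluator span at its case, and a failed evaluator should carry Status::Error and one exception event. The real implementation would presumably express the same checks against spans collected by InMemorySpanExporter.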
References
- Spec FR-035, US7 scenarios 1–4
- Research R-005
Depends on
#753 (runner).
Tasks
- eval/tests/telemetry_test.rs using opentelemetry-sdk::testing::trace::InMemorySpanExporter: span tree (run_set → case → evaluator), error recording on a failed case, parent-span inheritance
- EvalsTelemetry + EvalsTelemetryBuilder in eval/src/telemetry.rs (feature telemetry)
- Wire EvalsTelemetry into EvalRunner::run_set: emit the three-level span tree with standardized attributes (FR-035); honor an existing parent span
- eval/tests/us7_end_to_end_test.rs: a full run produces the expected span tree; a failing case surfaces as an errored span (US7 scenario 3)