[043][Phase 9][US7] EvalsTelemetry + runner span emission #757

@jwesleye

Description

Scope

US7 — emit OTel spans for the eval runner itself so regressions in eval metrics correlate with deployment/observability events in the same dashboard.

Priority: P2

Tasks

  • T135 [P] [US7] Tests in eval/tests/telemetry_test.rs using opentelemetry_sdk::testing::trace::InMemorySpanExporter, covering the span tree (run_set → case → evaluator), error recording on a failed case, and parent-span inheritance
  • T136 [US7] EvalsTelemetry + EvalsTelemetryBuilder in eval/src/telemetry.rs (feature telemetry)
  • T137 [US7] Wire EvalsTelemetry into EvalRunner::run_set — emit three-level span tree with standardized attributes (FR-035); honor existing parent span
  • T138 [US7] eval/tests/us7_end_to_end_test.rs — full run produces expected span tree; failing case surfaces as errored span (US7 scenario 3)
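
As a rough illustration of what the T135/T138 tests would assert, the sketch below models the span tree with a plain in-memory recorder. It deliberately does not use the opentelemetry API: `Recorder`, `RecordedSpan`, `Status`, and this `run_set` function are hypothetical stand-ins, and propagating the evaluator error up to the case span is an assumption, not something this issue specifies.

```rust
// Hypothetical in-memory model of recorded spans; NOT the opentelemetry API.
#[derive(Debug, Clone, PartialEq)]
enum Status {
    Unset,
    Error(String),
}

#[derive(Debug, Clone)]
struct RecordedSpan {
    parent: Option<usize>,
    name: &'static str,
    status: Status,
}

#[derive(Default)]
struct Recorder {
    spans: Vec<RecordedSpan>,
}

impl Recorder {
    /// Starts a span under `parent` and returns its index in `spans`.
    fn start(&mut self, name: &'static str, parent: Option<usize>) -> usize {
        self.spans.push(RecordedSpan { parent, name, status: Status::Unset });
        self.spans.len() - 1
    }

    /// Marks the span as errored with the given message.
    fn record_error(&mut self, id: usize, message: &str) {
        self.spans[id].status = Status::Error(message.to_string());
    }

    /// Span names from the root down to span `id`.
    fn path(&self, id: usize) -> Vec<&'static str> {
        let mut names = Vec::new();
        let mut current = Some(id);
        while let Some(i) = current {
            names.push(self.spans[i].name);
            current = self.spans[i].parent;
        }
        names.reverse();
        names
    }
}

/// Runs one case with one evaluator and returns the evaluator span index.
/// `ambient_parent` models an already-active parent span, if any.
fn run_set(
    rec: &mut Recorder,
    ambient_parent: Option<usize>,
    outcome: Result<f64, String>,
) -> usize {
    let run = rec.start("swink.eval.run_set", ambient_parent);
    let case = rec.start("swink.eval.case", Some(run));
    let evaluator = rec.start("swink.eval.evaluator", Some(case));
    if let Err(message) = outcome {
        // Assumption: the failure is recorded on the evaluator span and
        // also surfaced on the enclosing case span.
        rec.record_error(evaluator, &message);
        rec.record_error(case, &message);
    }
    evaluator
}
```

In the real tests, the recorder would presumably be replaced by the in-memory exporter named in T135, with the same three checks: tree shape, error status on failure, and nesting under an existing parent.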

Acceptance

  • Span names: swink.eval.run_set → swink.eval.case → swink.eval.evaluator.
  • Attributes include eval_set.id, case.id, case.name, evaluator.name, prompt.version, score.value, score.threshold, verdict, duration_ms, session_id.
  • Evaluator failure records OTel status-error + exception event rather than silently succeeding.
  • Spans nest under any already-active parent span.
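
One way to keep the names and attribute keys above from drifting is to centralize them as constants. The following is a minimal sketch: the constant names and the `EvaluatorRecord` struct are hypothetical, and attaching every attribute to a single evaluator span is a simplification, since this issue does not say how attributes are split across the three span levels.

```rust
// Span names taken from the acceptance criteria.
const SPAN_RUN_SET: &str = "swink.eval.run_set";
const SPAN_CASE: &str = "swink.eval.case";
const SPAN_EVALUATOR: &str = "swink.eval.evaluator";

// Attribute keys taken from the acceptance criteria, in listed order.
const ATTRIBUTE_KEYS: [&str; 10] = [
    "eval_set.id",
    "case.id",
    "case.name",
    "evaluator.name",
    "prompt.version",
    "score.value",
    "score.threshold",
    "verdict",
    "duration_ms",
    "session_id",
];

/// Hypothetical carrier for the values of one evaluator span.
struct EvaluatorRecord<'a> {
    eval_set_id: &'a str,
    case_id: &'a str,
    case_name: &'a str,
    evaluator_name: &'a str,
    prompt_version: &'a str,
    score_value: f64,
    score_threshold: f64,
    verdict: &'a str,
    duration_ms: u64,
    session_id: &'a str,
}

impl<'a> EvaluatorRecord<'a> {
    /// Flattens the record into (key, value) pairs in ATTRIBUTE_KEYS order.
    fn attributes(&self) -> Vec<(&'static str, String)> {
        vec![
            (ATTRIBUTE_KEYS[0], self.eval_set_id.to_string()),
            (ATTRIBUTE_KEYS[1], self.case_id.to_string()),
            (ATTRIBUTE_KEYS[2], self.case_name.to_string()),
            (ATTRIBUTE_KEYS[3], self.evaluator_name.to_string()),
            (ATTRIBUTE_KEYS[4], self.prompt_version.to_string()),
            (ATTRIBUTE_KEYS[5], self.score_value.to_string()),
            (ATTRIBUTE_KEYS[6], self.score_threshold.to_string()),
            (ATTRIBUTE_KEYS[7], self.verdict.to_string()),
            (ATTRIBUTE_KEYS[8], self.duration_ms.to_string()),
            (ATTRIBUTE_KEYS[9], self.session_id.to_string()),
        ]
    }
}
```

A real implementation would likely emit typed attribute values rather than strings; the string form here only keeps the sketch dependency-free.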

References

  • Spec FR-035, US7 scenarios 1–4
  • Research R-005

Depends on

#753 (runner).

Metadata

Labels

enhancement (New feature or request), eval, spec (Spec-driven implementation task)
