Add OTLP tracing support and documentation for evaluation runs #56

Merged
Dongbumlee merged 3 commits into develop from feature/otlp-tracing on Apr 7, 2026

@Dongbumlee
Collaborator

This pull request introduces optional OpenTelemetry (OTLP) tracing for AgentOps evaluation runs, allowing users to emit detailed traces for evaluation pipelines, agent/model invocations, and evaluator results. The tracing is fully optional, incurs zero overhead when disabled, and is compatible with standard OTLP collectors (e.g., AI Toolkit, Azure Monitor, Jaeger). The implementation is careful to avoid unnecessary dependencies and performance impact when tracing is not enabled.

The most important changes are:

OTLP Telemetry Feature:

  • Added a new module utils/telemetry.py that implements lazy, optional OpenTelemetry tracing for evaluation runs. Tracing is activated only when the AGENTOPS_OTLP_ENDPOINT environment variable is set, and all OpenTelemetry imports are lazy to avoid unnecessary dependencies. The schema uses three semantic convention layers: CICD (cicd.pipeline.*), GenAI (gen_ai.*), and AgentOps-specific (agentops.eval.*).
  • Integrated telemetry hooks into the evaluation flow in services/runner.py. This includes initializing and shutting down tracing, creating root and item-level spans, and recording evaluator results as child spans with relevant attributes for pass/fail, scores, and thresholds. All telemetry calls are no-ops unless tracing is enabled.

Documentation and Changelog:

  • Updated .github/copilot-instructions.md and AGENTS.md to document the new OTLP telemetry feature, including design rules, environment variable usage, and compatibility with various OTLP collectors.
  • Added a detailed entry to CHANGELOG.md under [Unreleased] describing the new optional OTLP tracing capability, its schema, and runtime behavior.

- Add utils/telemetry.py with lazy OTel imports and span context managers
- Instrument runner.py with three-layer schema (CICD + GenAI + agentops.eval)
- Root span per eval run, item spans per row, evaluator child spans
- Activated via AGENTOPS_OTLP_ENDPOINT env var (opt-in, zero overhead)
- Graceful no-op when opentelemetry-sdk is not installed
- 16 unit tests covering disabled, degraded, and enabled states

Refs: #14
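
The span hierarchy listed in this commit (one root span per run, one item span per row, evaluator child spans) can be illustrated without OpenTelemetry at all. The sketch below uses a plain dataclass as a stand-in for spans. `cicd.pipeline.name` and `gen_ai.request.model` are keys from the OpenTelemetry semantic conventions, but whether the PR sets these particular keys is not stated. The span names, the `agentops.eval.score`, `agentops.eval.threshold`, and `agentops.eval.passed` keys, and the `score >= threshold` pass rule are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class SpanSketch:
    """Plain-Python stand-in for a span: a name, attributes, and child spans."""
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)
    children: List["SpanSketch"] = field(default_factory=list)

    def child(self, name: str, attributes: Dict[str, Any]) -> "SpanSketch":
        node = SpanSketch(name, attributes)
        self.children.append(node)
        return node


def build_run_tree(
    run_name: str,
    model: str,
    rows: List[Dict[str, Tuple[float, float]]],
) -> SpanSketch:
    """Build root -> item -> evaluator nodes from (score, threshold) pairs."""
    # Layer 1 (CICD): the root span describes the evaluation run as a pipeline.
    root = SpanSketch("eval.run", {"cicd.pipeline.name": run_name})
    for index, evaluators in enumerate(rows):
        # Layer 2 (GenAI): one item span per dataset row / model invocation.
        item = root.child(f"eval.item.{index}", {"gen_ai.request.model": model})
        for evaluator, (score, threshold) in evaluators.items():
            # Layer 3 (AgentOps): evaluator results as child spans.
            item.child(
                f"evaluator.{evaluator}",
                {
                    "agentops.eval.score": score,
                    "agentops.eval.threshold": threshold,
                    "agentops.eval.passed": score >= threshold,  # assumed rule
                },
            )
    return root
```

Calling `build_run_tree("nightly-eval", "example-model", [{"relevance": (0.9, 0.7)}])` yields a root node with one item child, which in turn has one evaluator child marked as passed.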
TestSpanAttributesWhenEnabled requires opentelemetry to be installed
because the code paths import SpanKind/StatusCode when tracing is
enabled. Use pytest.importorskip to skip the class in CI where
opentelemetry is not a declared dependency.
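
One way to skip just this class, rather than the whole test module, is an autouse fixture that calls `pytest.importorskip`. This is a sketch of that pattern, not the PR's actual test code: the fixture name and the test body are hypothetical, and the real tests may apply the skip differently.

```python
import pytest


class TestSpanAttributesWhenEnabled:
    @pytest.fixture(autouse=True)
    def _require_opentelemetry(self):
        # Skips every test in this class when opentelemetry is not installed;
        # other classes in the same module still run.
        pytest.importorskip("opentelemetry")

    def test_span_enums_importable(self):
        # Safe to import here: the fixture has already skipped the test
        # if the package is missing.
        from opentelemetry.trace import SpanKind, StatusCode

        assert SpanKind.INTERNAL.name == "INTERNAL"
        assert StatusCode.OK.name == "OK"
```

Calling `pytest.importorskip` at module level instead would skip the disabled-state and degraded-state tests as well, which do not need the package.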
@Dongbumlee merged commit 5b5aa6e into develop on Apr 7, 2026
11 checks passed