Add OTLP tracing support and documentation for evaluation runs #56
Merged
- Add utils/telemetry.py with lazy OTel imports and span context managers
- Instrument runner.py with three-layer schema (CICD + GenAI + agentops.eval)
- Root span per eval run, item spans per row, evaluator child spans
- Activated via AGENTOPS_OTLP_ENDPOINT env var (opt-in, zero overhead)
- Graceful no-op when opentelemetry-sdk is not installed
- 16 unit tests covering disabled, degraded, and enabled states

Refs: #14
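The opt-in behavior listed above can be illustrated with a minimal sketch. It is not the code in utils/telemetry.py: the helper names `_load_tracer` and `span` and the tracer name are hypothetical, exporter setup is omitted, and only the environment variable name comes from this PR. The idea is that opentelemetry is imported only after the endpoint variable is found, and a missing package degrades to a no-op.

```python
import os
from contextlib import contextmanager

# Variable name taken from the PR description; everything else is illustrative.
ENDPOINT_ENV = "AGENTOPS_OTLP_ENDPOINT"


def _load_tracer():
    """Return a tracer, or None when tracing is disabled or unavailable."""
    if not os.environ.get(ENDPOINT_ENV):
        return None  # disabled: opentelemetry is never imported
    try:
        from opentelemetry import trace  # lazy import, only when enabled
    except ImportError:
        return None  # degraded: package missing, fall back to a no-op
    return trace.get_tracer("agentops.eval")  # tracer name is an assumption


@contextmanager
def span(name, attributes=None):
    """Yield an active span when tracing is on, otherwise yield None."""
    tracer = _load_tracer()
    if tracer is None:
        yield None
        return
    with tracer.start_as_current_span(name, attributes=attributes or {}) as active:
        yield active
```

Callers can wrap work in `with span("eval.run"):` unconditionally; when the variable is unset, the body runs with no tracing code loaded.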
TestSpanAttributesWhenEnabled requires opentelemetry to be installed because the code paths import SpanKind/StatusCode when tracing is enabled. Use pytest.importorskip to skip the class in CI where opentelemetry is not a declared dependency.
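One way to scope the skip to a single class is sketched below. This is an assumption about the shape of the test, not the PR's actual code. Calling pytest.importorskip directly at module level or in the class body would skip the whole module during collection, so the sketch calls it from an autouse fixture instead. The test method name is hypothetical.

```python
import pytest


class TestSpanAttributesWhenEnabled:
    @pytest.fixture(autouse=True)
    def _require_opentelemetry(self):
        # Skips each test in this class when opentelemetry is not installed.
        pytest.importorskip("opentelemetry")

    def test_span_enums_are_importable(self):
        # These imports mirror what the enabled code path needs.
        from opentelemetry.trace import SpanKind, StatusCode

        assert SpanKind.INTERNAL is not None
        assert StatusCode.OK is not None
```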
This pull request introduces optional OpenTelemetry (OTLP) tracing for AgentOps evaluation runs, allowing users to emit detailed traces for evaluation pipelines, agent/model invocations, and evaluator results. The tracing is fully optional, incurs zero overhead when disabled, and is compatible with standard OTLP collectors (e.g., AI Toolkit, Azure Monitor, Jaeger). The implementation is careful to avoid unnecessary dependencies and performance impact when tracing is not enabled.
The most important changes are:
OTLP Telemetry Feature:
- Added utils/telemetry.py, which implements lazy, optional OpenTelemetry tracing for evaluation runs. Tracing is activated only when the AGENTOPS_OTLP_ENDPOINT environment variable is set, and all OpenTelemetry imports are lazy to avoid unnecessary dependencies. The schema uses three semantic convention layers: CICD (cicd.pipeline.*), GenAI (gen_ai.*), and AgentOps-specific (agentops.eval.*).
- Instrumented services/runner.py. This includes initializing and shutting down tracing, creating root and item-level spans, and recording evaluator results as child spans with relevant attributes for pass/fail, scores, and thresholds. All telemetry calls are no-ops unless tracing is enabled.

Documentation and Changelog:
- Updated .github/copilot-instructions.md and AGENTS.md to document the new OTLP telemetry feature, including design rules, environment variable usage, and compatibility with various OTLP collectors.
- Added an entry to CHANGELOG.md under [Unreleased] describing the new optional OTLP tracing capability, its schema, and runtime behavior.