Observability

Each agent run produces two artifacts, joined by run_id:

Trace — an OpenTelemetry root span with metadata and aggregate metrics (runner, model, IDs, latency via span duration, token totals, turn and tool-call counts). Written as canonical OTLP-JSON, which OTEL-aware backends such as Jaeger, Tempo, Langfuse, Grafana Cloud AI, and Honeycomb can ingest.
Trajectory — a per-run JSON file with per-turn content: turn text, tool call inputs / outputs, and (for SDK runners) per-turn token usage. Written directly by the agent runner alongside the trace.

Spans and trajectories complement each other without duplicating content: the span holds everything an observability UI needs to summarise or bill a run, and the trajectory holds the raw per-turn data needed for offline evaluation. Aggregate numbers (totals) live on the span; the per-turn numbers they are derived from live on the trajectory.
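Because the two artifacts share run_id and the totals are derived from the per-turn numbers, an offline check can join them and confirm they agree. The sketch below assumes the root-span attributes have already been flattened into a plain `{key: value}` dict and that an SDK-runner trajectory is shaped like `{"trajectory": {"turns": [...]}}` with per-turn `input_tokens` / `output_tokens` fields; `load_trajectory` and `totals_match` are hypothetical helpers, not part of the project.

```python
import json
from pathlib import Path


def load_trajectory(trajectory_dir: str, run_id: str) -> dict:
    """Read {trajectory_dir}/{run_id}.json; the file name is the join key."""
    return json.loads((Path(trajectory_dir) / f"{run_id}.json").read_text())


def totals_match(span_attrs: dict, record: dict) -> bool:
    """Check that the span's token totals equal the sum of per-turn counts.

    span_attrs: root-span attributes flattened to {key: value}.
    record: an SDK-runner trajectory, assumed shaped like
        {"trajectory": {"turns": [{"input_tokens": ..., "output_tokens": ...}, ...]}}.
    A turn that lacks a count contributes 0.
    """
    turns = record["trajectory"]["turns"]
    for kind in ("input_tokens", "output_tokens"):
        per_turn_sum = sum(turn.get(kind) or 0 for turn in turns)
        if per_turn_sum != int(span_attrs.get(f"gen_ai.usage.{kind}", 0)):
            return False
    return True
```

A mismatch usually points at a turn whose usage the SDK did not report, rather than at the span itself.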

Root span attributes

Metadata and aggregate metrics, written on the root span whenever tracing is enabled (runner coverage varies as listed below):

"SDK runners" below means claude-agent, openai-agent, deep-agent (which all expose turn/tool-call bookkeeping); plan-execute's loop is step-shaped and surfaces different attributes.

Attribute	Runner coverage	Notes
`agent.runner`	all	`plan-execute` / `claude-agent` / …
`gen_ai.system`	all	Provider family (anthropic, openai…)
`gen_ai.request.model`	all	Full model ID
`gen_ai.usage.input_tokens`	all	Sum across the run
`gen_ai.usage.output_tokens`	all	Sum across the run
`agent.question.length`	all	Character length of the question
`agent.answer.length`	all	Character length of the final answer
`agent.duration_ms`	all	Wall-clock of `run()`
`agent.run_id`	all	`--run-id` or auto-generated UUID4
`agent.scenario_id`	all	`--scenario-id` (omitted if unset)
`agent.turns`	SDK runners	Number of turns
`agent.tool_calls`	SDK runners	Total tool calls
`agent.llm_time_ms`	plan-execute	Planning + summarisation LLM time
`agent.planning_time_ms`	plan-execute	`Planner.generate_plan` wall-clock
`agent.summarization_time_ms`	plan-execute	Final summarise-LLM wall-clock
`agent.plan.steps`	plan-execute	Number of generated plan steps

For plan-execute, gen_ai.usage.* is the run-wide sum across planning, per-step arg-resolution, and summarisation LLM calls (provided the backend reports usage — LiteLLMBackend does; mocks return zero). Turn and tool-call counts have no clean mapping to the step-shaped loop and are not surfaced; per-step wall-clock lives on each StepResult.duration_ms in the trajectory.
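The run-wide sum described above can be pictured as a small accumulator fed once per LLM call. This is a minimal sketch under assumptions: `UsageTotals` is a hypothetical class (the project's own bookkeeping may differ), and each call's usage is assumed to arrive as a dict with `input_tokens` / `output_tokens`, or as nothing at all when the backend does not report usage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageTotals:
    """Hypothetical run-wide token accumulator (illustrative, not the project's class)."""

    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, usage: Optional[dict]) -> None:
        """Fold one LLM call's usage into the totals.

        A backend that reports no usage (None or {}) or a mock that reports
        zeros leaves the totals unchanged.
        """
        if not usage:
            return
        self.input_tokens += usage.get("input_tokens") or 0
        self.output_tokens += usage.get("output_tokens") or 0

    def as_span_attributes(self) -> dict:
        """The run-wide sums, keyed by the gen_ai.usage.* attribute names."""
        return {
            "gen_ai.usage.input_tokens": self.input_tokens,
            "gen_ai.usage.output_tokens": self.output_tokens,
        }
```

In this sketch `add()` would be called for the planning response, each arg-resolution response, and the summarisation response; the dict from `as_span_attributes()` is what would land on the root span.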

Per-tool timing is not captured for the three SDK runners: the PreToolUse hook this required in claude-agent broke compatibility with some @anthropic-ai/claude-code CLI versions, and the openai and deep SDKs do not expose clean per-tool callback surfaces either. It can be added in a follow-up if needed.

The HTTPXClientInstrumentor also adds automatic child spans, one per outbound HTTP request to the LiteLLM proxy (URL, status, latency). Because the root span's own duration equals the agent wall-clock time, agent.duration_ms is redundant in OTEL UIs but convenient when querying the JSONL file with jq.
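When reading the JSONL file directly, the span duration can be recovered from the span's own timestamps. A small sketch, assuming the OTLP-JSON span fields `startTimeUnixNano` / `endTimeUnixNano`; `span_duration_ms` is a hypothetical helper. For the root span the result should closely track agent.duration_ms, though the two clocks may start at slightly different points.

```python
def span_duration_ms(span: dict) -> float:
    """Duration of one OTLP-JSON span in milliseconds.

    OTLP-JSON usually writes the 64-bit nanosecond timestamps as decimal
    strings; int() accepts those as well as plain JSON numbers.
    """
    start_ns = int(span["startTimeUnixNano"])
    end_ns = int(span["endTimeUnixNano"])
    return (end_ns - start_ns) / 1_000_000
```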

Trajectory file layout

When AGENT_TRAJECTORY_DIR is set, each runner writes {AGENT_TRAJECTORY_DIR}/{run_id}.json. The trajectory field's shape depends on the runner.

Field	claude-agent	openai-agent	deep-agent	plan-execute
`Trajectory.started_at`	✓	✓	✓	(n/a)
`TurnRecord.duration_ms`	✓	✗	✗	(n/a)
`ToolCall.duration_ms`	✗	✗	✗	(n/a)
`StepResult.duration_ms`	(n/a)	(n/a)	(n/a)	✓

plan-execute's trajectory is a list of StepResult records instead of turns, each carrying its own duration_ms populated by the executor.
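Code that consumes trajectories from several runners has to branch on that shape. A sketch of one way to do it, assuming SDK runners write `{"trajectory": {"turns": [...]}}` and plan-execute writes `{"trajectory": [StepResult, ...]}`; the exact field layout is an assumption, and `unit_durations` is a hypothetical helper.

```python
from typing import List, Optional


def unit_durations(record: dict) -> List[Optional[float]]:
    """Per-turn (SDK runners) or per-step (plan-execute) durations in ms.

    Assumed layouts:
      SDK runners:  {"trajectory": {"turns": [TurnRecord, ...]}}
      plan-execute: {"trajectory": [StepResult, ...]}
    Units without duration_ms (e.g. openai-agent / deep-agent turns) give None.
    """
    trajectory = record["trajectory"]
    units = trajectory if isinstance(trajectory, list) else trajectory.get("turns", [])
    return [unit.get("duration_ms") for unit in units]
```

Returning None rather than 0 keeps "not measured" distinguishable from "took no time", which matters given the coverage table above.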

Enabling persistence

Install the optional tracing deps (trajectories need no extra deps):

uv sync --group otel

Each artifact has its own env var; set any combination of them, or none:

Env var	Effect
`AGENT_TRAJECTORY_DIR`	Directory for `{run_id}.json` trajectory records.
`OTEL_TRACES_FILE`	Append OTLP-JSON lines to this path (in-process).
`OTEL_EXPORTER_OTLP_ENDPOINT`	Ship spans over HTTP to a live collector endpoint.

When none are set, runs work normally with zero persistence overhead.
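The rules in the table, together with the requirement that a run has a run_id before its trajectory is persisted (see Troubleshooting), can be summarised as a small decision function. This is a sketch for checking a harness configuration, not the project's code; `enabled_sinks` and `trajectory_path` are hypothetical names.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Set


def enabled_sinks(env: Mapping[str, str] = os.environ) -> Set[str]:
    """Which persistence sinks the documented env vars switch on."""
    sinks = set()
    if env.get("AGENT_TRAJECTORY_DIR"):
        sinks.add("trajectory")
    if env.get("OTEL_TRACES_FILE"):
        sinks.add("traces_file")
    if env.get("OTEL_EXPORTER_OTLP_ENDPOINT"):
        sinks.add("otlp_http")
    return sinks


def trajectory_path(run_id: Optional[str], env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """{AGENT_TRAJECTORY_DIR}/{run_id}.json, or None when nothing is persisted.

    No directory means no trajectory; a run without a run_id is skipped too.
    """
    directory = env.get("AGENT_TRAJECTORY_DIR")
    if not directory or not run_id:
        return None
    return Path(directory) / f"{run_id}.json"
```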

Recommended: save both traces and trajectories

AGENT_TRAJECTORY_DIR=./traces/trajectories \
OTEL_TRACES_FILE=./traces/traces.jsonl \
  uv run deep-agent --run-id bench-001 --scenario-id 304 \
  "Calculate bearing characteristic frequencies for a 6205 bearing at 1800 RPM."

Each span batch appends one JSON line to ./traces/traces.jsonl in canonical OTLP-JSON format — the same format the OpenTelemetry Collector's file exporter produces, and ingestible by the Collector's otlpjsonfile receiver later if you want to replay into a live backend.

Query with `jq`

For metadata + aggregate metrics (run_id, runner, model, token totals, latency) read the trace alone — token totals are on the span:

jq -c '.resourceSpans[].scopeSpans[].spans[]
       | select(.name | startswith("agent.run"))
       # from_entries maps [{key, value}] to an object, so a missing attribute
       # (e.g. agent.turns on plan-execute) yields null instead of dropping the run
       | (.attributes | from_entries) as $a
       | {
           run_id: $a["agent.run_id"].stringValue,
           runner: $a["agent.runner"].stringValue,
           model: $a["gen_ai.request.model"].stringValue,
           input_tokens: $a["gen_ai.usage.input_tokens"].intValue,
           output_tokens: $a["gen_ai.usage.output_tokens"].intValue,
           turns: $a["agent.turns"].intValue
         }' traces/traces.jsonl
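Where jq is unavailable, or the results feed other Python tooling, a rough standard-library equivalent is sketched below. It assumes the same `agent.run` span-name prefix as the jq filter and only scalar attribute values; `run_summaries` and `_value` are hypothetical helpers.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def _value(any_value: Dict[str, Any]) -> Any:
    """Unwrap an OTLP-JSON AnyValue (scalar types only)."""
    for key in ("stringValue", "boolValue", "doubleValue"):
        if key in any_value:
            return any_value[key]
    if "intValue" in any_value:
        return int(any_value["intValue"])  # int64 is often serialised as a string
    return None


def run_summaries(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one summary per agent.run* span from OTLP-JSON lines."""
    for line in lines:
        if not line.strip():
            continue
        batch = json.loads(line)
        for resource_spans in batch.get("resourceSpans", []):
            for scope_spans in resource_spans.get("scopeSpans", []):
                for span in scope_spans.get("spans", []):
                    if not span.get("name", "").startswith("agent.run"):
                        continue
                    attrs = {a["key"]: _value(a["value"]) for a in span.get("attributes", [])}
                    yield {
                        "run_id": attrs.get("agent.run_id"),
                        "runner": attrs.get("agent.runner"),
                        "model": attrs.get("gen_ai.request.model"),
                        "input_tokens": attrs.get("gen_ai.usage.input_tokens"),
                        "output_tokens": attrs.get("gen_ai.usage.output_tokens"),
                        "turns": attrs.get("agent.turns"),  # None for plan-execute
                    }
```

An open file handle works as the `lines` argument, e.g. `run_summaries(open("traces/traces.jsonl"))`.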

For per-turn content (text, tool call inputs/outputs, per-turn tokens), read the matching trajectory file. This query assumes an SDK-runner trajectory, whose turns sit under .trajectory.turns:

jq '.trajectory.turns[] | {index, input_tokens, tool_calls: [.tool_calls[].name]}' \
   traces/trajectories/bench-001.json
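The same per-turn layout supports simple aggregate views, such as how often each tool was called in a run. A sketch assuming the `trajectory.turns[].tool_calls[].name` layout used by the jq query above; `tool_call_counts` is a hypothetical helper. The grand total should match the span's agent.tool_calls for the same run_id.

```python
from collections import Counter


def tool_call_counts(record: dict) -> Counter:
    """Count tool calls by name across an SDK-runner trajectory's turns.

    Turns with no tool calls may omit the list or leave it empty.
    """
    counts = Counter()
    for turn in record["trajectory"]["turns"]:
        for call in turn.get("tool_calls") or []:
            counts[call["name"]] += 1
    return counts
```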

Rotation

The built-in file exporter appends indefinitely. Each line (one per span batch) is small, but the file keeps growing over long-running benchmarks. To rotate, manage the file with logrotate, or split runs across dated files:

OTEL_TRACES_FILE="./traces/$(date +%F).jsonl" uv run deep-agent "..."

Replaying saved traces into a live backend (optional)

If you later want to visualise persisted traces, point an OpenTelemetry Collector build that includes the otlpjsonfile receiver (it ships in the contrib distribution, otelcol-contrib) at the file:

receivers:
  otlpjsonfile:
    include: ["traces/traces.jsonl"]
exporters:
  otlp:
    endpoint: jaeger:4317
    tls: {insecure: true}
service:
  pipelines:
    traces:
      receivers: [otlpjsonfile]
      exporters: [otlp]

Live debugging with Jaeger (optional, Docker)

When network access to Docker Hub is available, Jaeger all-in-one is the quickest way to inspect traces in a UI:

docker run -d --rm --name jaeger \
  -p 16686:16686 -p 4318:4318 \
  jaegertracing/all-in-one

OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \
OTEL_TRACES_FILE=./traces/traces.jsonl \
  uv run deep-agent --run-id demo "$query"

open http://localhost:16686   # macOS

With both env vars set, spans go to disk and to Jaeger simultaneously. Jaeger all-in-one keeps traces in memory by default, so they are lost when the container exits; the JSONL file stays on disk.

Troubleshooting

"OTEL SDK not installed; tracing disabled" — run uv sync --group otel.

No output file on disk — spans are exported in batches, so nothing is written until at least one runner completes a run() call and the BatchSpanProcessor flushes. For short smoke tests, make sure the CLI exits cleanly; its atexit hook flushes any buffered spans.
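A programmatic harness that exits without the CLI's atexit hook can flush explicitly. A minimal sketch: `trace.get_tracer_provider()` and the SDK `TracerProvider.force_flush(timeout_millis)` are standard OpenTelemetry Python APIs, while the assumption that init_tracing installs an SDK provider globally, and the `flush_spans` helper itself, are hypothetical.

```python
def flush_spans(timeout_millis: int = 30_000) -> bool:
    """Force-export buffered spans; returns False when there is nothing to flush."""
    try:
        from opentelemetry import trace
    except ImportError:
        return False  # OTEL SDK not installed, so tracing is disabled anyway
    provider = trace.get_tracer_provider()
    force_flush = getattr(provider, "force_flush", None)
    if force_flush is None:
        return False  # no SDK provider configured (the API default is a proxy/no-op)
    return bool(force_flush(timeout_millis))
```

Calling it once after the last run() keeps short smoke tests from exiting before the batch is written.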

Spans exist but agent.run_id is missing — you called runner.run() programmatically without going through a CLI. Seed it yourself:

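from observability import init_tracing, set_run_context

init_tracing("my-harness")
# Seed the IDs before the run so the root span and trajectory pick them up.
set_run_context(run_id="...", scenario_id="...")
# `await` needs an async context; from sync code use asyncio.run(runner.run(question)).
await runner.run(question)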

No trajectory file in AGENT_TRAJECTORY_DIR — the runner skips persistence when no run_id is set. Use the CLI (which seeds a UUID4 automatically), or call set_run_context(run_id=...) before invoking the runner programmatically.

Exporter silently failing — set OTEL_LOG_LEVEL=debug for the SDK's internal logs.
