-Trace-native evaluation
-Built on OpenTelemetry traces so you can evaluate real production-like runs without replaying agent execution.
+No reruns
+Use recorded traces to evaluate real executions after the fact.
diff --git a/layouts/index.html b/layouts/index.html
index 73a8820..67dd75c 100644
--- a/layouts/index.html
+++ b/layouts/index.html
@@ -1,141 +1,113 @@
-AgentEvals is the open-source Python framework for scoring AI agent performance and behavior from OpenTelemetry traces. Test prompts, tools, memory, and workflows without re-running your agents.
-OpenTelemetry-native agent evaluation
+agentevals turns OpenTelemetry traces into repeatable, rubric-based scores for tool use, handoffs, planning, and other agent behaviors.
+Traditional evals re-run entire workflows. agentevals scores the traces you already collect, so you can measure behavior in realistic conditions.
+Why agentevals
+Score agents against consistent rubrics using OpenTelemetry traces rather than replaying runs. Keep evaluations close to your production workflows and compare changes over time.
-Combine built-in evaluators with custom Python logic to measure correctness, tool usage, memory behavior, and more.
+Measure task completion, tool use quality, handoffs, latency, and more.
-Run locally with the CLI, automate in CI/CD, or explore results visually in the web UI.
+Plug into existing observability pipelines instead of inventing a parallel eval stack.
How it works
+Instrument your agent with OpenTelemetry and emit traces for prompts, tool calls, memory operations, and outputs.
-Choose built-in evaluators or create your own to score the behaviors that matter for your agent.
-Score trace datasets through the CLI or web UI and compare results across prompts, models, or tool strategies.
+Run evaluations locally or in CI with config files and reproducible commands.
+Open the CLI guide →
+Explore traces, inspect scores, and review rubric results in the browser.
+Open the Web guide →
+Docs
+{{ .Description }}
+Install agentevals, run your first scoring pass, and inspect the output.
+Connect agentevals with your existing tracing and observability stack.
+Define your own scoring logic and tailor rubrics to your agents.
+See how to inspect traces and scores with the browser-based interface.
+Use the CLI for fast, scriptable scoring or the Web UI for visual exploration of evaluation results.
-Run evaluations locally or in CI with straightforward commands and structured outputs.
-agentevals eval run config.yaml
- Inspect trace datasets, compare runs, and review evaluator outputs in a visual interface.
-agentevals ui
- Install AgentEvals, connect your traces, and start measuring how your agent behaves in the real world.
-
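For reviewers: the "custom Python logic" the copy refers to can be pictured as a plain function over recorded span data. This is an illustrative sketch only, not the agentevals API — the `Span` shape and the `tool_call_success_rate` evaluator are hypothetical, assuming each recorded span carries a kind attribute and a success status.

```python
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical, simplified view of one recorded OpenTelemetry span.
    name: str
    attributes: dict
    status_ok: bool


def tool_call_success_rate(spans: list[Span]) -> float:
    """Hypothetical evaluator: fraction of tool-call spans that succeeded."""
    tool_spans = [s for s in spans if s.attributes.get("span.kind") == "tool"]
    if not tool_spans:
        return 1.0  # no tool calls to judge, so nothing failed
    return sum(s.status_ok for s in tool_spans) / len(tool_spans)


# Score a small recorded trace after the fact, without re-running the agent.
trace = [
    Span("plan", {"span.kind": "llm"}, True),
    Span("search", {"span.kind": "tool"}, True),
    Span("fetch", {"span.kind": "tool"}, False),
]
print(tool_call_success_rate(trace))  # 0.5
```

The point of the sketch is the design the page advertises: the evaluator consumes already-collected spans as plain data, so the same function can score production traces and CI fixtures alike.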