From 580dd3d759cc3f900aaaaa0cbb2493229f42b00e Mon Sep 17 00:00:00 2001 From: Sebastian Maniak Date: Sun, 22 Mar 2026 15:57:20 -0400 Subject: [PATCH] docs: standardize landing page interface wording --- layouts/index.html | 210 ++++++++++++++++++++------------------------- 1 file changed, 91 insertions(+), 119 deletions(-) diff --git a/layouts/index.html b/layouts/index.html index 73a8820..67dd75c 100644 --- a/layouts/index.html +++ b/layouts/index.html @@ -1,141 +1,113 @@ - - - - - - {{ .Site.Title }} - {{ partial "head.html" . }} - - -
- -
-
-
- - Open source • Python SDK • OpenTelemetry native -
-

Score your AI agent behavior from traces.

-

- AgentEvals is the open-source Python framework for scoring AI agent performance and behavior - from OpenTelemetry traces. Test prompts, tools, memory, and workflows without re-running your agents. -

- -
- CLI - Custom Evaluators - Web UI - CI/CD +{{ define "main" }} +
+
+
+

OpenTelemetry-native agent evaluation

+

Score AI agents from traces: no reruns required

+

+ agentevals turns OpenTelemetry traces into repeatable, rubric-based scores for tool use, + handoffs, planning, and other agent behaviors. +

+
+
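The hero copy above promises rubric-based scores for behaviors like tool use. As a minimal sketch of what scoring a recorded trace can look like — note that the `Span` shape and the `tool_error_rate` rubric below are illustrative inventions for this example, not part of any published agentevals API:

```python
# Illustrative only: a tiny rubric over recorded spans.
# `Span` and `tool_error_rate` are hypothetical names, not agentevals API.
from dataclasses import dataclass

@dataclass
class Span:
    name: str          # e.g. "tool.call", "llm.generate"
    status: str        # "OK" or "ERROR"
    duration_ms: float

def tool_error_rate(spans):
    """Fraction of tool-call spans that ended in an error."""
    tool_spans = [s for s in spans if s.name == "tool.call"]
    if not tool_spans:
        return 0.0
    errors = sum(1 for s in tool_spans if s.status == "ERROR")
    return errors / len(tool_spans)

# A recorded trace: one LLM call and three tool calls, one of which failed.
trace = [
    Span("llm.generate", "OK", 812.0),
    Span("tool.call", "OK", 105.3),
    Span("tool.call", "ERROR", 98.7),
    Span("tool.call", "OK", 110.2),
]

score = 1.0 - tool_error_rate(trace)   # 1 error out of 3 tool calls
print(round(score, 2))                 # → 0.67
```

The point of the sketch is the shape of the workflow: the rubric runs over spans that were already recorded, so no agent is re-executed to produce the score.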
-
+ -
-
+
+
- -

Evaluation that matches how agents actually run.

-

Traditional evals re-run entire workflows. AgentEvals scores the traces you already collect, so you can measure behavior in realistic conditions.

+

Why agentevals

+

Evaluate behavior from the telemetry you already collect

+

+ Score agents against consistent rubrics using OpenTelemetry traces rather than replaying runs. + Keep evaluations close to your production workflows and compare changes over time. +

-
-
-

Trace-native evaluation

-

Built on OpenTelemetry traces so you can evaluate real production-like runs without replaying agent execution.

+

No reruns

+

Use recorded traces to evaluate real executions after the fact.

-
-

Flexible scoring

-

Combine built-in evaluators with custom Python logic to measure correctness, tool usage, memory behavior, and more.

+

Behavior-first scoring

+

Measure task completion, tool use quality, handoffs, latency, and more.

-
-

Works in your workflow

-

Run locally with the CLI, automate in CI/CD, or explore results visually in the web UI.

+

Built on OpenTelemetry

+

Plug into existing observability pipelines instead of inventing a parallel eval stack.

-
+ +
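The cards above mention latency as one scorable behavior. A minimal sketch of deriving it from OpenTelemetry-style nanosecond timestamps — the `Span` fields and the 2-second budget here are assumptions made up for the example, not an agentevals interface:

```python
# Illustrative only: end-to-end latency from recorded span timings.
from dataclasses import dataclass

@dataclass
class Span:
    name: str
    start_ns: int   # OpenTelemetry timestamps are nanoseconds since epoch
    end_ns: int

def total_latency_ms(spans):
    """Trace latency: earliest span start to latest span end, in ms."""
    start = min(s.start_ns for s in spans)
    end = max(s.end_ns for s in spans)
    return (end - start) / 1e6

spans = [
    Span("llm.generate", 0, 900_000_000),
    Span("tool.call", 900_000_000, 1_400_000_000),
]

latency = total_latency_ms(spans)
within_budget = latency <= 2000.0   # hypothetical 2 s latency budget
print(latency, within_budget)       # → 1400.0 True
```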
-
+
+
- -

From traces to scores in three steps.

+

How it works

+

Two ways to evaluate

- -
-
- 01 -

Collect traces

-

Instrument your agent with OpenTelemetry and emit traces for prompts, tool calls, memory operations, and outputs.

-
-
- 02 -

Define evaluators

-

Choose built-in evaluators or create your own to score the behaviors that matter for your agent.

-
-
- 03 -

Run evaluations

-

Score trace datasets through the CLI or web UI and compare results across prompts, models, or tool strategies.

-
+
+
+ 1 +

CLI workflow

+

+ Run evaluations locally or in CI with config files and reproducible commands. +

+ Open the CLI guide → +
+
+ 2 +

Web workflow

+

+ Explore traces, inspect scores, and review rubric results in the browser. +

+ Open the Web guide → +
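The CLI card above describes running evaluations in CI, and the copy earlier promises comparing changes over time. One way a CI gate over two evaluation runs might look — the metric names, scores, and `regressions` helper are hypothetical, invented for this sketch:

```python
# Illustrative only: flag metrics that regressed between two eval runs,
# e.g. before and after a prompt change. All names here are hypothetical.
baseline = {"task_completion": 0.80, "tool_use": 0.90}
candidate = {"task_completion": 0.85, "tool_use": 0.70}

def regressions(before, after, tolerance=0.05):
    """Metrics whose score dropped by more than `tolerance`."""
    return {
        metric: (before[metric], after.get(metric, 0.0))
        for metric in before
        if before[metric] - after.get(metric, 0.0) > tolerance
    }

print(regressions(baseline, candidate))   # → {'tool_use': (0.9, 0.7)}
```

In CI, a non-empty result like this would be the signal to fail the build, while an improvement (as in `task_completion` here) passes silently.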
-
+ +
-
+
+
- -
-
- -

Two ways to evaluate.

-

Use the CLI for fast, scriptable scoring or the Web UI for visual exploration of evaluation results.

-
- -
-
-

CLI

-

Run evaluations locally or in CI with straightforward commands and structured outputs.

-
agentevals eval run config.yaml
-
-
-

Web UI

-

Inspect trace datasets, compare runs, and review evaluator outputs in a visual interface.

-
agentevals ui
-
-
-
- -
-
- -

Bring evaluation into your agent development loop.

-

Install AgentEvals, connect your traces, and start measuring how your agent behaves in the real world.

- -
-
-
- - + + +{{ end }} \ No newline at end of file