Ian Parent edited this page Mar 19, 2026 · 1 revision

# Iris — The Agent Eval Standard for MCP

Iris scores every AI agent output against 12 eval rules. Catch PII leaks, hallucinations, injection attacks, and cost overruns before your users do.

Website | Playground | npm


## Quick Start

```sh
npx @iris-eval/mcp-server@latest
```

Open http://localhost:6920 to see the eval dashboard.
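To have an MCP client launch Iris automatically, a Claude Desktop–style configuration entry might look like the following. The `"iris"` key is an arbitrary label chosen for this example; check the project docs for the exact registration Iris supports.

```json
{
  "mcpServers": {
    "iris": {
      "command": "npx",
      "args": ["@iris-eval/mcp-server@latest"]
    }
  }
}
```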

## Why Agent Eval?

Traditional monitoring tells you whether your agent ran. Iris tells you how well it performed.

| What monitoring catches | What Iris catches |
| --- | --- |
| Agent crashed | Agent leaked a credit card number |
| Latency > 5s | Agent hallucinated instead of answering |
| Error rate spike | Agent repeated an injection attack in output |
| Cost per request | Cost 4.7x over budget for a simple query |

## Architecture Overview

Iris is an MCP server that intercepts agent traces and evaluates them against configurable rules:

```
MCP Client (Claude, etc.)
    ↓ traces
Iris MCP Server
    ├── 12 Eval Rules (PII, hallucination, injection, cost, ...)
    ├── SQLite Storage (traces, evals, metrics)
    └── Web Dashboard (localhost:6920)
```
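Conceptually, each intercepted trace fans out to every rule and produces one result per rule. The sketch below illustrates that flow in TypeScript; the interface and field names are assumptions for illustration, not Iris's actual schema or API.

```typescript
// Hypothetical shape of an intercepted trace (field names assumed).
interface AgentTrace {
  id: string;
  prompt: string;
  output: string;
  costUsd: number;
  latencyMs: number;
}

// One result per rule per trace.
interface EvalResult {
  traceId: string;
  rule: string; // e.g. "no_pii"
  pass: boolean;
  detail?: string;
}

// A rule is just a function from trace to result.
type EvalRule = (trace: AgentTrace) => EvalResult;

// Evaluate a trace against every configured rule.
function evaluate(trace: AgentTrace, rules: EvalRule[]): EvalResult[] {
  return rules.map((rule) => rule(trace));
}
```

This shape makes rules independently configurable: adding a thirteenth check is just appending another function to the list.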

## The 12 Eval Rules

| Rule | What It Catches |
| --- | --- |
| `topic_consistency` | Output drifts from the prompt topic |
| `expected_coverage` | Key topics missing from response |
| `response_complete` | Truncated or incomplete answers |
| `no_hallucination_markers` | "As an AI...", hedging, punt phrases |
| `no_blocklist_words` | Profanity, competitor names, banned terms |
| `no_pii` | Credit cards, SSNs, emails, phone numbers |
| `no_injection_patterns` | Prompt injection repeated in output |
| `sentiment_appropriate` | Tone mismatch for the context |
| `language_match` | Response in wrong language |
| `output_format_valid` | JSON/XML format violations |
| `cost_under_threshold` | Query cost exceeds budget |
| `latency_under_threshold` | Response time too slow |
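To make a rule like `no_pii` concrete, here is a minimal TypeScript sketch of a regex-based PII scan. The patterns and function names are illustrative assumptions, not Iris's actual implementation, and real PII detection needs far more robust patterns (e.g. Luhn validation for card numbers).

```typescript
// Illustrative PII patterns (assumed, deliberately simplistic).
const PII_PATTERNS: Record<string, RegExp> = {
  credit_card: /\b(?:\d[ -]?){13,16}\b/, // 13–16 digits, optional separators
  ssn: /\b\d{3}-\d{2}-\d{4}\b/,
  email: /\b[\w.+-]+@[\w-]+\.[\w.]+\b/,
  phone: /\b\d{3}[-.]\d{3}[-.]\d{4}\b/,
};

// Scan an agent output and report which PII categories were found.
function checkNoPii(output: string): { pass: boolean; hits: string[] } {
  const hits = Object.entries(PII_PATTERNS)
    .filter(([, re]) => re.test(output))
    .map(([name]) => name);
  return { pass: hits.length === 0, hits };
}
```

Returning the matched category names (rather than a bare boolean) lets the dashboard show *why* a trace failed without logging the sensitive value itself.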

## License

MIT License. Copyright (c) 2026 Ian Parent.