Home
Ian Parent edited this page Mar 19, 2026 · 1 revision
Iris scores every AI agent output against 12 eval rules. Catch PII leaks, hallucinations, injection attacks, and cost overruns before your users do.
Website | Playground | npm
`npx @iris-eval/mcp-server@latest`

Open http://localhost:6920 to see the eval dashboard.
- Architecture — How Iris works under the hood
- Eval Rules — The 12 built-in evaluation rules explained
- Configuration — Customizing rules, thresholds, and storage
- MCP Integration — Connecting Iris to any MCP client
- Dashboard Guide — Using the web dashboard
- Contributing — How to contribute to Iris
- FAQ — Frequently asked questions
Traditional monitoring tells you if your agent ran. Iris tells you how well it performed.
| What monitoring catches | What Iris catches |
|---|---|
| Agent crashed | Agent leaked a credit card number |
| Latency > 5s | Agent hallucinated instead of answering |
| Error rate spike | Agent repeated an injection attack in output |
| Cost per request | Cost 4.7x over budget for a simple query |
Iris is an MCP server that intercepts agent traces and evaluates them against configurable rules:
MCP Client (Claude, etc.)
↓ traces
Iris MCP Server
├── 12 Eval Rules (PII, hallucination, injection, cost, ...)
├── SQLite Storage (traces, evals, metrics)
└── Web Dashboard (localhost:6920)
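The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not Iris's actual implementation: the `Trace`, `RuleResult`, and `EvalRule` shapes, the factory functions, and `evaluateTrace` are all hypothetical names chosen for this example. Only the rule names `cost_under_threshold` and `latency_under_threshold` come from this page.

```typescript
// Hypothetical shapes: Iris's real trace and rule types are not shown on this page.
interface Trace {
  input: string;
  output: string;
  costUsd: number;
  latencyMs: number;
}

interface RuleResult {
  rule: string;
  passed: boolean;
  detail: string;
}

type EvalRule = (trace: Trace) => RuleResult;

// Each factory takes its threshold as a parameter so the rule stays configurable.
function costUnderThreshold(maxUsd: number): EvalRule {
  return (trace) => ({
    rule: "cost_under_threshold",
    passed: trace.costUsd <= maxUsd,
    detail: `cost $${trace.costUsd} vs budget $${maxUsd}`,
  });
}

function latencyUnderThreshold(maxMs: number): EvalRule {
  return (trace) => ({
    rule: "latency_under_threshold",
    passed: trace.latencyMs <= maxMs,
    detail: `latency ${trace.latencyMs}ms vs limit ${maxMs}ms`,
  });
}

// Run every rule against one trace and collect the results.
function evaluateTrace(trace: Trace, rules: EvalRule[]): RuleResult[] {
  return rules.map((rule) => rule(trace));
}

// Example trace: a simple query that cost 4.7x its assumed budget.
const trace: Trace = {
  input: "What is 2 + 2?",
  output: "4",
  costUsd: 0.047,
  latencyMs: 1200,
};

const results = evaluateTrace(trace, [
  costUnderThreshold(0.01),
  latencyUnderThreshold(5000),
]);

for (const r of results) {
  console.log(`${r.passed ? "PASS" : "FAIL"} ${r.rule}: ${r.detail}`);
}
```

Modeling each rule as a function from trace to result keeps rules independent of one another, which is one plausible way to make them individually configurable. How Iris stores results in SQLite or serves them to the dashboard is not covered by this sketch.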
| Rule | What It Catches |
|---|---|
| `topic_consistency` | Output drifts from the prompt topic |
| `expected_coverage` | Key topics missing from response |
| `response_complete` | Truncated or incomplete answers |
| `no_hallucination_markers` | "As an AI...", hedging, punt phrases |
| `no_blocklist_words` | Profanity, competitor names, banned terms |
| `no_pii` | Credit cards, SSNs, emails, phone numbers |
| `no_injection_patterns` | Prompt injection repeated in output |
| `sentiment_appropriate` | Tone mismatch for the context |
| `language_match` | Response in wrong language |
| `output_format_valid` | JSON/XML format violations |
| `cost_under_threshold` | Query cost exceeds budget |
| `latency_under_threshold` | Response time too slow |
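To make a pattern-based rule concrete, here is a rough sketch of how a check in the spirit of `no_pii` might work. The regular expressions, the `PiiFinding` type, and the `findPii` and `passesLuhn` helpers are assumptions made for illustration and are not taken from Iris's source. Phone numbers are left out to keep the sketch short.

```typescript
// Hypothetical sketch: these patterns are illustrative, not Iris's actual no_pii rule.
interface PiiFinding {
  kind: "email" | "ssn" | "credit_card";
  match: string;
}

// Luhn checksum: rejects digit runs that cannot be valid card numbers,
// which cuts down on false positives from order IDs and similar values.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let doubleNext = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // convert the ASCII digit to a number
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

function findPii(output: string): PiiFinding[] {
  const findings: PiiFinding[] = [];

  const emails = output.match(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g) || [];
  for (const match of emails) {
    findings.push({ kind: "email", match });
  }

  // US Social Security number in the common 3-2-4 dashed layout.
  const ssns = output.match(/\b\d{3}-\d{2}-\d{4}\b/g) || [];
  for (const match of ssns) {
    findings.push({ kind: "ssn", match });
  }

  // 13 to 19 digits, optionally separated by single spaces or hyphens.
  const cards = output.match(/\b\d(?:[ -]?\d){12,18}\b/g) || [];
  for (const match of cards) {
    if (passesLuhn(match.replace(/[ -]/g, ""))) {
      findings.push({ kind: "credit_card", match });
    }
  }

  return findings;
}

// A rule built on this helper would pass only when no findings are returned.
const findings = findPii("Reach me at jane@example.com, card 4111 1111 1111 1111.");
console.log(findings);
```

Regex matching is cheap and deterministic, which suits a check that runs on every output, but it will miss PII in unusual formats. A production rule would likely need broader patterns and locale-specific handling.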
MIT License. Copyright (c) 2026 Ian Parent.