AgentLens

AgentLens is an AI Agent Evaluation Platform built to test, debug, and evaluate customer-support AI agents across policy compliance, hallucination risk, tone quality, escalation handling, and security behavior.

The platform allows users to upload conversations, generate different test scenarios, simulate agent responses, and evaluate them with automated PASS/WARN/FAIL verdicts using multiple LLM providers with fallback handling.

Live Demo

🔗 https://agentlens-ujyf.onrender.com

Features

Conversation Analyzer

Upload customer-support conversations
Analyze:
- Tone quality
- Policy compliance
- Hallucination risk
- Security handling
- Resolution quality
Generate structured evaluation reports

Turn-by-Turn Debugger

Debug AI responses message-by-message
Identify:
- Incorrect responses
- Policy violations
- Escalation failures
- Unsafe behavior
Suggest corrected responses and improvements

Scenario Generator

Auto-generates categorized AI evaluation scenarios:
- Normal
- Edge Cases
- Adversarial
Includes:
- Severity tagging
- Policy area classification
- Failure modes
- Escalation indicators
- Expected behavior

Scenario Evaluation Engine

Run evaluation on generated scenarios
Generates:
- PASS / WARN / FAIL verdicts
- Policy compliance checks
- Tone evaluation
- Security analysis
- Hallucination analysis
- Escalation correctness

Multi-LLM Fallback Architecture

Supports multiple AI providers with automatic fallback handling:

Groq
Cerebras
SambaNova
OpenRouter

If one provider fails or rate-limits, the system automatically switches to another provider.

Tech Stack

Backend

Python
FastAPI

Frontend

HTML
CSS
JavaScript

APIs / AI Providers

Groq API
Cerebras API
SambaNova API
OpenRouter API

Deployment

Render

Tools

Git
GitHub

Project Structure

agentlens/
│
├── static/
│   └── index.html
│
├── sample_conversations/
│
├── api.py
├── main.py
├── debugger.py
├── scenario_generator.py
├── llm_client.py
├── test_fallback.py
├── requirements.txt
└── README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AgentLens

Live Demo

Features

Conversation Analyzer

Turn-by-Turn Debugger

Scenario Generator

Scenario Evaluation Engine

Multi-LLM Fallback Architecture

Tech Stack

Backend

Frontend

APIs / AI Providers

Deployment

Tools

Project Structure

About

Uh oh!

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
sample_conversations		sample_conversations
static		static
.gitignore		.gitignore
README.md		README.md
api.py		api.py
debugger.py		debugger.py
llm_client.py		llm_client.py
main.py		main.py
requirements.txt		requirements.txt
scenario_generator.py		scenario_generator.py
test_fallback.py		test_fallback.py

Folders and files

Latest commit

History

Repository files navigation

AgentLens

Live Demo

Features

Conversation Analyzer

Turn-by-Turn Debugger

Scenario Generator

Scenario Evaluation Engine

Multi-LLM Fallback Architecture

Tech Stack

Backend

Frontend

APIs / AI Providers

Deployment

Tools

Project Structure

About

Resources

Uh oh!

Stars

Watchers

Forks

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages