Feature Description
Run standardized performance benchmarks on agents to measure speed, cost, and quality.
Problem/Motivation
Hard to compare different agents or track performance changes over time objectively.
Proposed Solution
chat_loop myagent --benchmark
# Runs standard queries and reports:
# - Average latency: 2.3s
# - Tokens/sec: 45
# - Cost per query: $0.015
# - Total time: 23s
# - Total cost: $0.15
Benefits
- Compare agents objectively
- Track performance regressions
- Optimize for speed or cost
- Share benchmarks with team
Priority
Feature Description
Run standardized performance benchmarks on agents to measure speed, cost, and quality.
Problem/Motivation
Hard to compare different agents or track performance changes over time objectively.
Proposed Solution
Benefits
Priority