Skip to content

[FEATURE] Add --benchmark flag for performance testing #15

@jwesleye

Description

@jwesleye

Feature Description

Run standardized performance benchmarks on agents to measure speed, cost, and quality.

Problem/Motivation

Hard to compare different agents or track performance changes over time objectively.

Proposed Solution

chat_loop myagent --benchmark

# Runs standard queries and reports:
# - Average latency: 2.3s
# - Tokens/sec: 45
# - Cost per query: $0.015
# - Total time: 23s
# - Total cost: $0.15

Benefits

  • Compare agents objectively
  • Track performance regressions
  • Optimize for speed or cost
  • Share benchmarks with team

Priority

  • Critical
  • High
  • Medium
  • Low

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requeststatus: staleNo activity for extended period

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions