Define benchmark agents for leaderboard and paper reporting #397

@ChathurangiShyalika

Description

We need to finalize the set of agents that will be used for the leaderboard and paper results. This should include a review of related work to identify which agent types are commonly used and how they should be represented in our benchmark.

Why

A clear agent definition is needed for fair comparison, reproducibility, and consistent reporting across papers and leaderboard runs.

Goal

Create a fixed agent list and align it with related work and evaluation reporting.
