feat(benchmark): add scenario suite runner #390

Merged
DhavalRepo18 merged 4 commits into main from scenario_suite on Jun 18, 2026

@ChathurangiShyalika

Collaborator

Summary

Adds a generic benchmark runner for scenario-suite execution. The runner reads a simple scenario-id list, runs each scenario sequentially, resets and reloads CouchDB uniformly for every scenario, saves trajectories, and invokes the existing evaluator to generate per-scenario and aggregate reports.
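
A minimal sketch of the sequential flow described above, not the code in `src/benchmark/scenario_suite_runner.py`. The names `run_suite`, `ScenarioResult`, `reset_db`, `run_agent`, and `evaluate` are hypothetical; the database reset, agent call, and evaluator are passed in as callables so the ordering is easy to see and test.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ScenarioResult:
    """Outcome of one scenario: its ID, the saved trajectory, and its report."""
    scenario_id: str
    trajectory: dict
    report: dict


def run_suite(
    scenario_ids: Iterable[str],
    reset_db: Callable[[str], None],        # hypothetical: reset and reload CouchDB
    run_agent: Callable[[str], dict],       # hypothetical: run the agent, return a trajectory
    evaluate: Callable[[str, dict], dict],  # hypothetical: score one trajectory
) -> List[ScenarioResult]:
    """Run every scenario in order with the same reset -> run -> evaluate steps."""
    results: List[ScenarioResult] = []
    for scenario_id in scenario_ids:
        # The reset happens for every scenario, so each one starts from the same state.
        reset_db(scenario_id)
        trajectory = run_agent(scenario_id)
        report = evaluate(scenario_id, trajectory)
        results.append(ScenarioResult(scenario_id, trajectory, report))
    return results
```

Injecting the three steps keeps the loop independent of CouchDB and of any particular agent method, which is one way to unit-test the ordering without external services.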

Changes

  • Added benchmarks/scenario_suite/scenarios.txt as the scenario-id list
  • Added benchmarks/scenario_suite/scenarios.sample.txt as a public example file
  • Added benchmarks/scenario_suite/README.md
  • Added src/benchmark/scenario_suite_runner.py
  • Added src/benchmark/tests/test_scenario_suite_runner.py
  • Updated direct LLM agent files to support the benchmark flow
  • Updated evaluation report aggregation for static JSON results
  • Updated related tests
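
For reviewers unfamiliar with the list file, here is a rough sketch of how a scenario-id list could be parsed. The format is an assumption (one ID per line, blank lines skipped, `#` starting a comment, duplicates dropped); the PR only says the file is a simple scenario-id list. `parse_scenario_ids` and `load_scenario_ids` are hypothetical names.

```python
from pathlib import Path
from typing import List


def parse_scenario_ids(text: str) -> List[str]:
    """Return scenario IDs in file order, skipping blanks, comments, and repeats."""
    ids: List[str] = []
    seen = set()
    for raw_line in text.splitlines():
        # Assumption: anything after '#' is a comment.
        line = raw_line.split("#", 1)[0].strip()
        if not line or line in seen:
            continue
        seen.add(line)
        ids.append(line)
    return ids


def load_scenario_ids(path: Path) -> List[str]:
    """Read a list file such as benchmarks/scenario_suite/scenarios.txt."""
    return parse_scenario_ids(path.read_text(encoding="utf-8"))
```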

Behavior

  • Reads scenario IDs from benchmarks/scenario_suite/scenarios.txt
  • Uses SCENARIOS_DATA_DIR from the provided scenario root
  • Resets and reloads CouchDB for every scenario so the flow stays uniform
  • Runs the selected agent method sequentially
  • Saves trajectories under AGENT_TRAJECTORY_DIR
  • Runs the evaluator with --scorer-default static_json
  • Writes per-scenario reports and an aggregate report
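
The last three steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged implementation: it assumes `AGENT_TRAJECTORY_DIR` is an environment variable, that trajectories are stored as one JSON file per scenario, and that each per-scenario report carries a boolean `passed` field. The function names and the fallback directory are hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Dict


def trajectory_dir_from_env() -> Path:
    """Assumption: AGENT_TRAJECTORY_DIR is an env var; the fallback is illustrative."""
    return Path(os.environ.get("AGENT_TRAJECTORY_DIR", "trajectories"))


def save_trajectory(scenario_id: str, trajectory: dict, base_dir: Path) -> Path:
    """Write one trajectory as <base_dir>/<scenario_id>.json and return the path."""
    base_dir.mkdir(parents=True, exist_ok=True)
    out_path = base_dir / f"{scenario_id}.json"
    out_path.write_text(json.dumps(trajectory, indent=2), encoding="utf-8")
    return out_path


def aggregate_reports(reports: Dict[str, dict]) -> dict:
    """Combine per-scenario reports (keyed by scenario ID) into one summary."""
    total = len(reports)
    # Assumption: each report has a boolean "passed" field.
    passed = sum(1 for report in reports.values() if report.get("passed"))
    return {
        "total": total,
        "passed": passed,
        "pass_rate": passed / total if total else 0.0,
        "scenarios": sorted(reports),
    }
```

The evaluator call itself is left out here because its entry point is not shown in this PR; only the `--scorer-default static_json` flag is confirmed by the description.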

Notes

  • The public sample file is intentionally generic and does not expose private benchmark wording.
  • Scenario data remains external and is not committed in full.

Signed-off-by: Chathurangi Shyalika <chathurangishyalika@Chathurangis-MacBook-Pro.local>

@DhavalRepo18 left a comment

Collaborator

I cross-checked every file. We can get started.

Signed-off-by: Chathurangi Shyalika <chathurangishyalika@Chathurangis-MacBook-Pro.local>
@DhavalRepo18 merged commit 2d81e4a into main on Jun 18, 2026
5 checks passed

2 participants