feat(benchmark): add scenario suite runner by ChathurangiShyalika · Pull Request #390 · IBM/AssetOpsBench

ChathurangiShyalika · 2026-06-18T20:45:50Z

Summary

Adds a generic benchmark runner for scenario-suite execution. The runner reads a simple scenario-id list, runs each scenario sequentially, resets and reloads CouchDB uniformly for every scenario, saves trajectories, and invokes the existing evaluator to generate per-scenario and aggregate reports.

Changes

Added benchmarks/scenario_suite/scenarios.txt as the scenario-id list
Added benchmarks/scenario_suite/scenarios.sample.txt as a public example file
Added benchmarks/scenario_suite/README.md
Added src/benchmark/scenario_suite_runner.py
Added src/benchmark/tests/test_scenario_suite_runner.py
Updated direct LLM agent files to support the benchmark flow
Updated evaluation report aggregation for static JSON results
Updated related tests
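
As an illustration of the scenario-id list, here is a minimal sketch of how such a file could be read. The format is an assumption, not something this PR states: one ID per line, with blank lines and `#` comment lines ignored. The function name `read_scenario_ids` is hypothetical and is not taken from `scenario_suite_runner.py`.

```python
from pathlib import Path


def read_scenario_ids(path):
    """Return scenario IDs from a text file, one ID per line.

    Assumptions (not confirmed by the PR): blank lines and lines that
    start with '#' are skipped, and duplicate IDs are dropped while
    keeping first-seen order.
    """
    ids = []
    seen = set()
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line not in seen:
            seen.add(line)
            ids.append(line)
    return ids
```

Dropping duplicates is a design choice in this sketch so that a repeated ID does not trigger a second database reset and agent run; the actual runner may behave differently.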

Behavior

Reads scenario IDs from benchmarks/scenario_suite/scenarios.txt
Uses SCENARIOS_DATA_DIR from the provided scenario root
Resets and reloads CouchDB for every scenario so the flow stays uniform
Runs the selected agent method sequentially
Saves trajectories under AGENT_TRAJECTORY_DIR
Runs the evaluator with --scorer-default static_json
Writes per-scenario reports and an aggregate report
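
The flow above can be sketched roughly as follows. This is not the code in `scenario_suite_runner.py`: the function `run_suite` and its `reset_db`, `run_agent`, and `evaluate` hooks are hypothetical stand-ins for the CouchDB reset, the selected agent method, and the evaluator call, and the report fields (`passed`, `total`) are assumed for illustration only.

```python
import json
from pathlib import Path


def run_suite(scenario_ids, reset_db, run_agent, evaluate, trajectory_dir):
    """Run scenarios one at a time and return an aggregate summary.

    All callables are hypothetical hooks supplied by the caller:
      reset_db()                      resets and reloads the database
      run_agent(scenario_id)          returns a JSON-serializable trajectory
      evaluate(scenario_id, path)     returns a dict with a boolean "passed"
    """
    out_dir = Path(trajectory_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    reports = []
    for scenario_id in scenario_ids:
        # Same reset before every scenario, so each run starts from identical state.
        reset_db()
        trajectory = run_agent(scenario_id)

        # Persist the trajectory before evaluation so the evaluator reads it from disk.
        path = out_dir / f"{scenario_id}.json"
        path.write_text(json.dumps(trajectory, indent=2), encoding="utf-8")

        report = dict(evaluate(scenario_id, path))
        report["scenario_id"] = scenario_id
        reports.append(report)

    passed = sum(1 for r in reports if r.get("passed"))
    return {"total": len(reports), "passed": passed, "scenarios": reports}
```

Passing the reset, agent, and evaluator in as callables keeps the loop testable without a live CouchDB instance. In the real runner the evaluator is invoked with `--scorer-default static_json`, which this sketch does not reproduce.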

Notes

The public sample file is intentionally generic and does not expose private benchmark wording.
Scenario data remains external and is not committed in full.

Signed-off-by: Chathurangi Shyalika <chathurangishyalika@Chathurangis-MacBook-Pro.local>

DhavalRepo18

I cross-checked every file. We can get started.

ChathurangiShyalika requested review from DhavalRepo18 and ShuxinLin June 18, 2026 20:45

feat(benchmark): add scenario suite runner

44af3a9

Signed-off-by: Chathurangi Shyalika <chathurangishyalika@Chathurangis-MacBook-Pro.local>

ChathurangiShyalika force-pushed the scenario_suite branch from 4a69969 to 44af3a9 on June 18, 2026 20:47

Updating README.md

72d7ad0

Signed-off-by: Chathurangi Shyalika <chathurangishyalika@Chathurangis-MacBook-Pro.local>

DhavalRepo18 approved these changes Jun 18, 2026

Updating model parameters & Updating README.md

055a3c1

Signed-off-by: Chathurangi Shyalika <chathurangishyalika@Chathurangis-MacBook-Pro.local>

ChathurangiShyalika force-pushed the scenario_suite branch from bbbdbbe to 055a3c1 on June 18, 2026 23:26

Updating README.md

afb3437

Signed-off-by: Chathurangi Shyalika <chathurangishyalika@Chathurangis-MacBook-Pro.local>

ChathurangiShyalika force-pushed the scenario_suite branch from 3819b36 to afb3437 on June 18, 2026 23:30

DhavalRepo18 merged commit 2d81e4a into main Jun 18, 2026
5 checks passed

DhavalRepo18 merged 4 commits into main from scenario_suite
