Test aime max_dataset_rows by xzrderek · Pull Request #427 · eval-protocol/python-sdk

xzrderek · 2026-02-13T00:06:48Z

Remove max_dataset_rows from test_aime25.py to allow the test to use the full dataset instead of being limited to 2 rows.

Slack Thread

Note

Medium Risk
Changes evaluation runtime and cost by expanding test coverage from 2 rows to the full dataset, which may impact CI duration and resource usage.

Overview
Removes the max_dataset_rows=2 cap from the test_aime25_pointwise @evaluation_test configuration so the AIME 2025 benchmark runs against the full input datasets instead of only the first two rows.
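To illustrate the effect of the change, here is a minimal, self-contained sketch of how a row cap of this kind typically behaves. It does not use the SDK's actual implementation: `cap_rows` and the stand-in `dataset` are hypothetical names introduced only for this example, and the assumption is that leaving `max_dataset_rows` unset means "use every row".

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def cap_rows(rows: Iterable[dict], max_dataset_rows: Optional[int] = None) -> Iterator[dict]:
    """Return an iterator over at most max_dataset_rows rows.

    Hypothetical helper: None is assumed to mean "no cap".
    """
    if max_dataset_rows is None:
        return iter(rows)
    return islice(rows, max_dataset_rows)


# Stand-in dataset of arbitrary size; the real AIME 2025 inputs are not reproduced here.
dataset = [{"id": i} for i in range(10)]

capped = list(cap_rows(dataset, max_dataset_rows=2))  # analogous to the configuration before this PR
full = list(cap_rows(dataset))                        # analogous to the configuration after this PR
```

With the cap removed, every row is evaluated, which is why runtime and cost scale with the dataset size rather than staying fixed at two rows.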

Written by Cursor Bugbot for commit cab133f. This will update automatically on new commits.

Co-authored-by: Derek Xu <xzrderek@users.noreply.github.com>

cursor · 2026-02-13T00:06:50Z

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.

Remove max_dataset_rows from test_aime25.py

cab133f

Co-authored-by: Derek Xu <xzrderek@users.noreply.github.com>

xzrderek marked this pull request as ready for review February 13, 2026 00:07

xzrderek merged commit 25dff74 into main Feb 13, 2026
1 of 2 checks passed

xzrderek deleted the cursor/test-aime-max-dataset-rows-e41c branch February 13, 2026 00:07

xzrderek merged 1 commit into main from cursor/test-aime-max-dataset-rows-e41c