Test aime max_dataset_rows #427

Merged
xzrderek merged 1 commit into main from cursor/test-aime-max-dataset-rows-e41c
Feb 13, 2026

xzrderek (Contributor) commented Feb 13, 2026

Remove max_dataset_rows from test_aime25.py to allow the test to use the full dataset instead of being limited to 2 rows.
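The effect of the change can be illustrated with a minimal, self-contained sketch. It assumes that a row cap behaves like slicing the loaded dataset to its first N rows when set, and keeps every row when the parameter is omitted. `load_rows` and the sample rows are hypothetical stand-ins, not the actual `@evaluation_test` implementation.

```python
from typing import Optional, Sequence, TypeVar

Row = TypeVar("Row")


def load_rows(rows: Sequence[Row], max_dataset_rows: Optional[int] = None) -> list[Row]:
    """Hypothetical helper: apply an optional row cap to an already-loaded dataset.

    Assumption: a cap keeps only the first `max_dataset_rows` rows, and
    `None` (parameter omitted) keeps the full dataset.
    """
    if max_dataset_rows is None:
        return list(rows)
    if max_dataset_rows < 0:
        raise ValueError("max_dataset_rows must be non-negative")
    return list(rows[:max_dataset_rows])


# Hypothetical placeholder rows standing in for benchmark problems.
dataset = [{"id": i} for i in range(5)]

capped = load_rows(dataset, max_dataset_rows=2)  # before this PR: first two rows only
full = load_rows(dataset)                        # after this PR: every row
print(len(capped), len(full))  # prints "2 5"
```

Under this assumption, omitting the parameter is what lets the test see every row of its input datasets.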




Note

Medium Risk
Increases evaluation runtime and cost by expanding test coverage from 2 rows to the full dataset, which may lengthen CI runs and raise resource usage.
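To make the runtime and cost concern concrete, here is a rough back-of-the-envelope model. It assumes every row is rolled out `num_runs` times at a roughly constant average time and cost per rollout, with up to `max_concurrency` rollouts running at once. `estimate_eval_cost`, its parameters, and the example numbers are all hypothetical, not measurements from this repository's CI.

```python
import math
from dataclasses import dataclass


@dataclass
class EvalEstimate:
    rollouts: int
    wall_clock_seconds: float
    cost_usd: float


def estimate_eval_cost(
    num_rows: int,
    num_runs: int,
    seconds_per_rollout: float,
    usd_per_rollout: float,
    max_concurrency: int = 1,
) -> EvalEstimate:
    """Hypothetical linear model: every row is evaluated `num_runs` times.

    Rollouts execute in batches of `max_concurrency`, so wall-clock time grows
    with the number of batches, while cost grows with the total rollout count.
    """
    rollouts = num_rows * num_runs
    batches = math.ceil(rollouts / max_concurrency)
    return EvalEstimate(
        rollouts=rollouts,
        wall_clock_seconds=batches * seconds_per_rollout,
        cost_usd=rollouts * usd_per_rollout,
    )


# Illustrative numbers only: a 2-row cap versus a hypothetical 30-row dataset.
before = estimate_eval_cost(num_rows=2, num_runs=4, seconds_per_rollout=60.0,
                            usd_per_rollout=0.05, max_concurrency=8)
after = estimate_eval_cost(num_rows=30, num_runs=4, seconds_per_rollout=60.0,
                           usd_per_rollout=0.05, max_concurrency=8)
print(before)
print(after)
```

Under this linear model, going from 2 rows to N rows multiplies the rollout count and cost by N/2, while wall-clock time grows more slowly as long as rollouts can run concurrently.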

Overview
Removes the max_dataset_rows=2 cap from the test_aime25_pointwise @evaluation_test configuration so the AIME 2025 benchmark runs against the full input datasets instead of only the first two rows.

Written by Cursor Bugbot for commit cab133f.

Co-authored-by: Derek Xu <xzrderek@users.noreply.github.com>

cursor bot commented Feb 13, 2026

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.

xzrderek marked this pull request as ready for review February 13, 2026 00:07
xzrderek merged commit 25dff74 into main Feb 13, 2026
1 of 2 checks passed
xzrderek deleted the cursor/test-aime-max-dataset-rows-e41c branch February 13, 2026 00:07

2 participants