scheduler orchestration optimization #381
```python
extra_body["prediction"] = {"type": "content", "content": " ".join(task.history)[:max_tokens]}
cp["extra_body"] = extra_body
row_copy.input_metadata.completion_params = cp
cp["extra_body"]["prediction"] = " ".join(task.history_snapshot)[:max_tokens]
```
Bug: Shallow copy allows concurrent mutation of shared config
The comment says "Deep copy completion_params to avoid mutating shared config", but dict() only performs a shallow copy. If the base config already contains extra_body (which is common in practice), the nested dict is shared by reference. When multiple speculation tasks from different samples run concurrently, they all mutate the same shared extra_body dict, causing race conditions where one task's prediction value gets overwritten by another. The extra_body dict needs to be copied explicitly, or a true deep copy performed.
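A minimal sketch of the shallow-copy hazard and one possible fix. The config keys here (`model`, `extra_body`, `prediction`) mirror the snippet above, but the values are illustrative, not taken from the actual codebase:

```python
# Hypothetical base config shared across speculation tasks.
base_params = {"model": "gpt-4o", "extra_body": {"prediction": None}}

# dict() is a shallow copy: the nested extra_body dict is shared by reference.
cp_shallow = dict(base_params)
cp_shallow["extra_body"]["prediction"] = "task-A history"
assert base_params["extra_body"]["prediction"] == "task-A history"  # shared config mutated!

# Fix: copy the nested dict explicitly (copy.deepcopy(base_params) also works).
base_params = {"model": "gpt-4o", "extra_body": {"prediction": None}}
cp_safe = dict(base_params)
cp_safe["extra_body"] = dict(cp_safe.get("extra_body") or {})
cp_safe["extra_body"]["prediction"] = "task-A history"
assert base_params["extra_body"]["prediction"] is None  # shared config untouched
```

The explicit per-key copy is cheaper than `copy.deepcopy` when only `extra_body` is mutated per task.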
```python
Tracks state for a single dataset sample across multiple runs.

Enables streaming scheduling where each completed run immediately triggers the next.
"""
row: EvaluationRow
```
What's this row for? Do we update it? And is the lock there so that one state isn't consumed by two async tasks?
This is just a reference to keep track of the history and rollout status for one sample; the row is deep-copied during actual usage.
This SampleState only maintains the state for each sample.
```python
tc = time.perf_counter()
# print(f"run_id {row.execution_metadata.run_id} request_params: {json.dumps(request_params)}")
response = await acompletion(**request_params)
print(f"run_id {row.execution_metadata.run_id} time taken: {time.perf_counter() - tc} speculation_enabled: {request_params.get('extra_body', {}).get('prediction', None) is not None}")
```
Bug: Debug print statement left in production code
A debug print statement was left in the production code path. It prints timing information and speculation status for every non-streaming LLM completion call, which will pollute logs and stdout in production. The timing variable tc and the uncommented print at line 103 appear to be debugging/profiling code that was not removed before committing.
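One common remedy, sketched below: route the timing through a debug-level logger instead of print, so it only appears when debug logging is enabled. The logger name and `timed_call` helper are hypothetical, not from the PR:

```python
import logging
import time

logger = logging.getLogger("rollout")  # illustrative logger name

def timed_call(fn, *args, **kwargs):
    tc = time.perf_counter()
    result = fn(*args, **kwargs)
    # Emitted only when the "rollout" logger is configured at DEBUG level,
    # so production stdout stays clean.
    logger.debug("time taken: %.4fs", time.perf_counter() - tc)
    return result

value = timed_call(sum, [1, 2, 3])
assert value == 6
```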
morgendave
left a comment
Generally looks fine; unblocking. We still need to verify the safety of the lock and the queue.
Refactors the PriorityRolloutScheduler from batch-based to streaming scheduling. Each completed run now immediately schedules the next run, rather than waiting for an entire mini-batch to complete. This improves overall throughput by maximizing concurrent execution.
Key Changes

- Streaming Scheduling Architecture
- Concurrency Simplification
- Priority Queue Behavior

Benefits

- Higher throughput: New runs start immediately when a slot opens, with no waiting for batch completion
- Better resource utilization: Maintains in_group_minibatch_size concurrent runs per sample at all times
- Cleaner concurrency model: Single point of concurrency control in the rollout processor
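The streaming pattern described above can be sketched as follows: a semaphore provides the single point of concurrency control, and each finished run immediately launches the sample's next run rather than waiting for a mini-batch barrier. All names and the constants here are illustrative assumptions, not the PR's actual API:

```python
import asyncio

MAX_CONCURRENT = 2      # stand-in for max_concurrent_rollouts
RUNS_PER_SAMPLE = 3     # stand-in for in_group_minibatch_size scheduling

async def run_once(sample: int, run_index: int, sem: asyncio.Semaphore, done: list):
    async with sem:                    # single point of concurrency control
        await asyncio.sleep(0)         # stand-in for the actual rollout call
        done.append((sample, run_index))

async def stream_sample(sample: int, sem: asyncio.Semaphore, done: list):
    # Each completed run immediately triggers the next; no batch barrier.
    for run_index in range(RUNS_PER_SAMPLE):
        await run_once(sample, run_index, sem, done)

async def main():
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    done: list = []
    await asyncio.gather(*(stream_sample(s, sem, done) for s in range(4)))
    return done

completed = asyncio.run(main())
assert len(completed) == 4 * RUNS_PER_SAMPLE
```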
Note
Refactors the rollout scheduler to stream per-run tasks with high/low priorities, adds speculative history injection, and simplifies concurrency to rely on the rollout processor’s semaphore.
- Introduces `SampleState` and redefines `RolloutTask` to a single run with priority `(status, row_index, run_index)`.
- `in_group_minibatch_size` (defaults depend on `ENABLE_SPECULATION`).
- Injects `history_snapshot` into `completion_params.extra_body.prediction` per run.
- Relies on the `rollout_processor`'s semaphore; worker count is set to `max_concurrent_rollouts`.
- Updates the `RolloutTask`/`SampleState` API; adjust expectations for worker scaling, priority ordering, concurrency limits, and groupwise evaluation.
- `default_single_turn_rollout_process.py`.

Written by Cursor Bugbot for commit fb8debe. This will update automatically on new commits.
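The `(status, row_index, run_index)` priority is a plain tuple, so a standard heap orders tasks lexicographically: lower status first, then lower row index, then lower run index. A minimal sketch with illustrative values:

```python
import heapq

# Hypothetical task priorities of the form (status, row_index, run_index).
heap = []
for priority in [(1, 0, 0), (0, 2, 1), (0, 2, 0), (0, 1, 0)]:
    heapq.heappush(heap, priority)

# Pops in lexicographic tuple order: status, then row_index, then run_index.
order = [heapq.heappop(heap) for _ in range(len(heap))]
assert order == [(0, 1, 0), (0, 2, 0), (0, 2, 1), (1, 0, 0)]
```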