[trainer] fix: fail closed on incomplete agent-loop rollouts by TimurTaepov · Pull Request #6641 · verl-project/verl

TimurTaepov · 2026-06-06T10:01:30Z

What does this PR do?

This PR makes main_ppo_sync fail closed when agent-loop rollout batches finish with missing rollout sessions.

Previously, ReplayBuffer.sample() waited only for status="running" markers to disappear, collected status="success" rows, and silently ignored terminal root statuses such as finished and failure. This could let training continue on a partial rollout group after an agent-loop failure or empty output.

This PR:

records the expected rollout session count on each root prompt marker;
detects terminal prompt failures;
detects finished prompts with fewer successful rollout sessions than expected;
cleans up sampled root prompt markers together with child output rows;
raises a clear error instead of returning a partial KVBatchMeta;
adds CPU regression tests.

Checklist Before Starting

Search for similar PRs. Paste at least one query link here:
- https://github.com/verl-project/verl/pulls?q=is%3Apr+is%3Aopen+6437+in%3Abody
- https://github.com/verl-project/verl/pulls?q=is%3Apr+is%3Aopen+ReplayBuffer+partial+rollout+failure+main_ppo_sync
Format the PR title as [{modules}] {type}: {description}

Test

.venv/bin/python -m pytest tests/trainer/test_main_ppo_sync_replay_buffer_on_cpu.py -q
# 9 passed

.venv/bin/python -m pytest tests/trainer/test_multi_trajectories_advantage.py -q
# 2 passed

.venv/bin/pre-commit run ruff --files verl/trainer/main_ppo_sync.py tests/trainer/test_main_ppo_sync_replay_buffer_on_cpu.py
# Passed

.venv/bin/pre-commit run ruff-format --files verl/trainer/main_ppo_sync.py tests/trainer/test_main_ppo_sync_replay_buffer_on_cpu.py
# Passed

git diff --check
# Passed

During commit, the local check-naming-conventions pre-commit hook was skipped because it scans .venv and false-positives on Ray's installed ServeRLlibRLModule. Running the same grep check with .venv excluded found no matches in the repository files.

API and Usage Example

No public API or usage changes.

Design & Code Changes

AgentLoopManagerTQ.generate_sequences() now records expected_rollouts on each root prompt marker before dispatching agent-loop workers.

ReplayBuffer.sample() now:

waits while any matching prompt is still running;
collects successful child output rows as before;
counts unique successful rollout sessions per prompt using child keys formatted as {uid}_{session_id}_{index};
carries the sampled root prompt keys in KVBatchMeta.extra_info so cleanup removes both root markers and child rows;
raises RuntimeError if a root prompt reaches failure;
raises RuntimeError if a finished root prompt has fewer successful sessions than expected_rollouts.

The child-key parser uses rsplit("_", 2) because uid can contain underscores. The check counts unique sessions rather than output rows, because one agent-loop session can emit multiple output rows.

Checklist Before Submitting

Read the Contribute Guide.
Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Add / Update the documentation.
Add unit or end-to-end test(s) to the CI workflow to cover all the code.
Once your PR is ready for CI, send a message in the ci-request channel.
This PR is not related to the recipe submodule.

AI assistance disclosure: this PR was developed with AI assistance. I reviewed the changed lines and ran the tests above.

CLAassistant · 2026-06-06T10:01:38Z

All committers have signed the CLA.

gemini-code-assist

Code Review

This pull request introduces validation for agent-loop rollout batches in the ReplayBuffer to ensure completeness before sampling, which prevents skewed advantage computations. It adds helper functions to parse output keys, format error messages, and track expected rollouts, along with comprehensive unit tests. The feedback highlights a potential issue where terminal failures with missing expected_rollouts might be ignored, and suggests explicitly separating the handling of "finished" and "failure" statuses to guarantee that failures are always captured.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

TimurTaepov · 2026-06-06T18:11:53Z

The cpu_unit_tests.yml workflow parse error appears unrelated to this PR. This branch does not modify workflow files; the same invalid step is present on latest upstream/main at .github/workflows/cpu_unit_tests.yml:98, where Check final pip list is missing a run: key.

I’ll leave this PR scoped to the trainer fix unless maintainers prefer otherwise.

Baiyu-Su · 2026-06-07T15:45:04Z

Thanks for the PR. The fail-closed direction matches the issue, and counting unique {uid}_{session_id} sessions rather than output rows looks right for multi-output agent loops.

I think there is still one cleanup edge case from the original issue: root prompt markers are not removed after a successful sample. Both training and validation cleanup currently clear/remove only batch.keys, which are child success rows. The root uid markers remain in the replay buffer / TransferQueue metadata.

This becomes observable with the new completeness check in validation. _validate() iterates over all validation dataloader batches using the same self.global_steps. After the first val batch succeeds, cleanup removes only the child success rows. The root marker from that batch remains as finished with expected_rollouts. On the next val batch at the same global_steps, ReplayBuffer.sample(partition_id="val", global_steps=...) can see the stale finished root, count zero successful sessions for that old uid, and raise a false incomplete-rollout error.

Could we include root-marker cleanup in this PR, or at least add a regression test for two complete validation-style batches at the same global_steps with cleanup between them? I think the sampled batch needs a way to carry the root keys used for completeness accounting so cleanup can remove both root markers and child output rows.

Fail closed when agent-loop rollouts finish with missing sessions. Co-authored-by: OpenAI Codex <codex@openai.com> Signed-off-by: TimurTaepov <tim.taepov@gmail.com>

TimurTaepov · 2026-06-07T19:00:20Z

Hi @Baiyu-Su, thanks, you're right. The sampled batch only carried child success rows, so cleanup could leave finished root markers behind.

I updated ReplayBuffer.sample() to carry the sampled root prompt markers in KVBatchMeta.extra_info["rollout_root_keys"], and updated both validation and training cleanup to clear/remove child rows plus those root markers.

I also added a regression test for two complete validation-style batches at the same global_steps with cleanup between them.

TimurTaepov requested review from PeterSH6, eric-haibin-lin, tongyx361 and vermouth1992 as code owners June 6, 2026 10:01

gemini-code-assist Bot reviewed Jun 6, 2026

View reviewed changes

Comment thread verl/trainer/main_ppo_sync.py Outdated

TimurTaepov force-pushed the fix-main-ppo-sync-partial-rollouts branch from 49e47d1 to ae6ab70 Compare June 6, 2026 10:09

TimurTaepov changed the title ~~[trainer] fail closed on incomplete agent-loop rollouts~~ [trainer] fix: fail closed on incomplete agent-loop rollouts Jun 6, 2026

[trainer] fail closed on incomplete agent-loop rollouts

9e4e85d

Fail closed when agent-loop rollouts finish with missing sessions. Co-authored-by: OpenAI Codex <codex@openai.com> Signed-off-by: TimurTaepov <tim.taepov@gmail.com>

TimurTaepov force-pushed the fix-main-ppo-sync-partial-rollouts branch from ae6ab70 to 9e4e85d Compare June 7, 2026 18:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[trainer] fix: fail closed on incomplete agent-loop rollouts#6641

[trainer] fix: fail closed on incomplete agent-loop rollouts#6641
TimurTaepov wants to merge 1 commit into
verl-project:mainfrom
TimurTaepov:fix-main-ppo-sync-partial-rollouts

TimurTaepov commented Jun 6, 2026 •

edited

Loading

Uh oh!

CLAassistant commented Jun 6, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

TimurTaepov commented Jun 6, 2026

Uh oh!

Baiyu-Su commented Jun 7, 2026

Uh oh!

TimurTaepov commented Jun 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

TimurTaepov commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Checklist Before Starting

Test

API and Usage Example

Design & Code Changes

Checklist Before Submitting

Uh oh!

CLAassistant commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

TimurTaepov commented Jun 6, 2026

Uh oh!

Baiyu-Su commented Jun 7, 2026

Uh oh!

TimurTaepov commented Jun 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

TimurTaepov commented Jun 6, 2026 •

edited

Loading

CLAassistant commented Jun 6, 2026 •

edited

Loading