Skip to content

[trainer] fix: fail closed on incomplete agent-loop rollouts#6641

Open
TimurTaepov wants to merge 1 commit into
verl-project:mainfrom
TimurTaepov:fix-main-ppo-sync-partial-rollouts
Open

[trainer] fix: fail closed on incomplete agent-loop rollouts#6641
TimurTaepov wants to merge 1 commit into
verl-project:mainfrom
TimurTaepov:fix-main-ppo-sync-partial-rollouts

Conversation

@TimurTaepov

@TimurTaepov TimurTaepov commented Jun 6, 2026

Copy link
Copy Markdown

What does this PR do?

Fixes #6437.

This PR makes main_ppo_sync fail closed when agent-loop rollout batches finish with missing rollout sessions.

Previously, ReplayBuffer.sample() waited only for status="running" markers to disappear, collected status="success" rows, and silently ignored terminal root statuses such as finished and failure. This could let training continue on a partial rollout group after an agent-loop failure or empty output.

This PR:

  • records the expected rollout session count on each root prompt marker;
  • detects terminal prompt failures;
  • detects finished prompts with fewer successful rollout sessions than expected;
  • cleans up sampled root prompt markers together with child output rows;
  • raises a clear error instead of returning a partial KVBatchMeta;
  • adds CPU regression tests.

Checklist Before Starting

Test

.venv/bin/python -m pytest tests/trainer/test_main_ppo_sync_replay_buffer_on_cpu.py -q
# 9 passed

.venv/bin/python -m pytest tests/trainer/test_multi_trajectories_advantage.py -q
# 2 passed

.venv/bin/pre-commit run ruff --files verl/trainer/main_ppo_sync.py tests/trainer/test_main_ppo_sync_replay_buffer_on_cpu.py
# Passed

.venv/bin/pre-commit run ruff-format --files verl/trainer/main_ppo_sync.py tests/trainer/test_main_ppo_sync_replay_buffer_on_cpu.py
# Passed

git diff --check
# Passed

During commit, the local check-naming-conventions pre-commit hook was skipped because it scans .venv and false-positives on Ray's installed ServeRLlibRLModule. Running the same grep check with .venv excluded found no matches in the repository files.

API and Usage Example

No public API or usage changes.

Design & Code Changes

AgentLoopManagerTQ.generate_sequences() now records expected_rollouts on each root prompt marker before dispatching agent-loop workers.

ReplayBuffer.sample() now:

  • waits while any matching prompt is still running;
  • collects successful child output rows as before;
  • counts unique successful rollout sessions per prompt using child keys formatted as {uid}_{session_id}_{index};
  • carries the sampled root prompt keys in KVBatchMeta.extra_info so cleanup removes both root markers and child rows;
  • raises RuntimeError if a root prompt reaches failure;
  • raises RuntimeError if a finished root prompt has fewer successful sessions than expected_rollouts.

The child-key parser uses rsplit("_", 2) because uid can contain underscores. The check counts unique sessions rather than output rows, because one agent-loop session can emit multiple output rows.

Checklist Before Submitting

  • Read the Contribute Guide.
  • Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
  • Add / Update the documentation.
  • Add unit or end-to-end test(s) to the CI workflow to cover all the code.
  • Once your PR is ready for CI, send a message in the ci-request channel.
  • This PR is not related to the recipe submodule.

AI assistance disclosure: this PR was developed with AI assistance. I reviewed the changed lines and ran the tests above.

@CLAassistant

CLAassistant commented Jun 6, 2026

Copy link
Copy Markdown

CLA assistant check
All committers have signed the CLA.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces validation for agent-loop rollout batches in the ReplayBuffer to ensure completeness before sampling, which prevents skewed advantage computations. It adds helper functions to parse output keys, format error messages, and track expected rollouts, along with comprehensive unit tests. The feedback highlights a potential issue where terminal failures with missing expected_rollouts might be ignored, and suggests explicitly separating the handling of "finished" and "failure" statuses to guarantee that failures are always captured.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread verl/trainer/main_ppo_sync.py Outdated
@TimurTaepov TimurTaepov force-pushed the fix-main-ppo-sync-partial-rollouts branch from 49e47d1 to ae6ab70 Compare June 6, 2026 10:09
@TimurTaepov

Copy link
Copy Markdown
Author

The cpu_unit_tests.yml workflow parse error appears unrelated to this PR. This branch does not modify workflow files; the same invalid step is present on latest upstream/main at .github/workflows/cpu_unit_tests.yml:98, where Check final pip list is missing a run: key.

I’ll leave this PR scoped to the trainer fix unless maintainers prefer otherwise.

@TimurTaepov TimurTaepov changed the title [trainer] fail closed on incomplete agent-loop rollouts [trainer] fix: fail closed on incomplete agent-loop rollouts Jun 6, 2026
@Baiyu-Su

Baiyu-Su commented Jun 7, 2026

Copy link
Copy Markdown

Thanks for the PR. The fail-closed direction matches the issue, and counting unique {uid}_{session_id} sessions rather than output rows looks right for multi-output agent loops.

I think there is still one cleanup edge case from the original issue: root prompt markers are not removed after a successful sample. Both training and validation cleanup currently clear/remove only batch.keys, which are child success rows. The root uid markers remain in the replay buffer / TransferQueue metadata.

This becomes observable with the new completeness check in validation. _validate() iterates over all validation dataloader batches using the same self.global_steps. After the first val batch succeeds, cleanup removes only the child success rows. The root marker from that batch remains as finished with expected_rollouts. On the next val batch at the same global_steps, ReplayBuffer.sample(partition_id="val", global_steps=...) can see the stale finished root, count zero successful sessions for that old uid, and raise a false incomplete-rollout error.

Could we include root-marker cleanup in this PR, or at least add a regression test for two complete validation-style batches at the same global_steps with cleanup between them? I think the sampled batch needs a way to carry the root keys used for completeness accounting so cleanup can remove both root markers and child output rows.

Fail closed when agent-loop rollouts finish with missing sessions.

Co-authored-by: OpenAI Codex <codex@openai.com>
Signed-off-by: TimurTaepov <tim.taepov@gmail.com>
@TimurTaepov TimurTaepov force-pushed the fix-main-ppo-sync-partial-rollouts branch from ae6ab70 to 9e4e85d Compare June 7, 2026 18:59
@TimurTaepov

Copy link
Copy Markdown
Author

Hi @Baiyu-Su, thanks, you're right. The sampled batch only carried child success rows, so cleanup could leave finished root markers behind.

I updated ReplayBuffer.sample() to carry the sampled root prompt markers in KVBatchMeta.extra_info["rollout_root_keys"], and updated both validation and training cleanup to clear/remove child rows plus those root markers.

I also added a regression test for two complete validation-style batches at the same global_steps with cleanup between them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[trainer] main_ppo_sync can silently train on partial rollout batches after agent-loop failures

3 participants