[trainer] fix: fail closed on incomplete agent-loop rollouts#6641
[trainer] fix: fail closed on incomplete agent-loop rollouts#6641TimurTaepov wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces validation for agent-loop rollout batches in the ReplayBuffer to ensure completeness before sampling, which prevents skewed advantage computations. It adds helper functions to parse output keys, format error messages, and track expected rollouts, along with comprehensive unit tests. The feedback highlights a potential issue where terminal failures with missing expected_rollouts might be ignored, and suggests explicitly separating the handling of "finished" and "failure" statuses to guarantee that failures are always captured.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
49e47d1 to
ae6ab70
Compare
|
The I’ll leave this PR scoped to the trainer fix unless maintainers prefer otherwise. |
|
Thanks for the PR. The fail-closed direction matches the issue, and counting unique I think there is still one cleanup edge case from the original issue: root prompt markers are not removed after a successful sample. Both training and validation cleanup currently clear/remove only This becomes observable with the new completeness check in validation. Could we include root-marker cleanup in this PR, or at least add a regression test for two complete validation-style batches at the same |
Fail closed when agent-loop rollouts finish with missing sessions. Co-authored-by: OpenAI Codex <codex@openai.com> Signed-off-by: TimurTaepov <tim.taepov@gmail.com>
ae6ab70 to
9e4e85d
Compare
|
Hi @Baiyu-Su, thanks, you're right. The sampled batch only carried child success rows, so cleanup could leave finished root markers behind. I updated I also added a regression test for two complete validation-style batches at the same |
What does this PR do?
Fixes #6437.
This PR makes
main_ppo_syncfail closed when agent-loop rollout batches finish with missing rollout sessions.Previously,
ReplayBuffer.sample()waited only forstatus="running"markers to disappear, collectedstatus="success"rows, and silently ignored terminal root statuses such asfinishedandfailure. This could let training continue on a partial rollout group after an agent-loop failure or empty output.This PR:
finishedprompts with fewer successful rollout sessions than expected;KVBatchMeta;Checklist Before Starting
[{modules}] {type}: {description}Test
During commit, the local
check-naming-conventionspre-commit hook was skipped because it scans.venvand false-positives on Ray's installedServeRLlibRLModule. Running the same grep check with.venvexcluded found no matches in the repository files.API and Usage Example
No public API or usage changes.
Design & Code Changes
AgentLoopManagerTQ.generate_sequences()now recordsexpected_rolloutson each root prompt marker before dispatching agent-loop workers.ReplayBuffer.sample()now:running;{uid}_{session_id}_{index};KVBatchMeta.extra_infoso cleanup removes both root markers and child rows;RuntimeErrorif a root prompt reachesfailure;RuntimeErrorif afinishedroot prompt has fewer successful sessions thanexpected_rollouts.The child-key parser uses
rsplit("_", 2)becauseuidcan contain underscores. The check counts unique sessions rather than output rows, because one agent-loop session can emit multiple output rows.Checklist Before Submitting
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=alwaysci-requestchannel.recipesubmodule.AI assistance disclosure: this PR was developed with AI assistance. I reviewed the changed lines and ran the tests above.