Fix some rebased issues by thibautbar · Pull Request #40 · project-numina/verl

thibautbar · 2025-07-04T07:47:48Z

No description provided.

…ect#1851)

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Follow-up of verl-project#1838: make the `name_prefix` mechanism the same for `RayWorkerGroup` and `RayResourcePool`. It defaults to `None`, in which case the prefix is initialized randomly.

### Checklist Before Submitting

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.

Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
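The "default to `None`, then initialize randomly" pattern described above can be sketched as follows. This is only an illustration of the pattern: the class `NamedGroup` is a hypothetical stand-in, and the use of `uuid` and the 8-character length are assumptions, not the actual `RayWorkerGroup` or `RayResourcePool` code.

```python
import uuid
from typing import Optional


class NamedGroup:
    """Hypothetical stand-in for a class that accepts a `name_prefix` argument."""

    def __init__(self, name_prefix: Optional[str] = None) -> None:
        # None means "not specified": fall back to a short random prefix so
        # that two instances created with defaults do not share a name.
        if name_prefix is None:
            name_prefix = uuid.uuid4().hex[:8]
        self.name_prefix = name_prefix


if __name__ == "__main__":
    first, second = NamedGroup(), NamedGroup()
    print(first.name_prefix != second.name_prefix)
    print(NamedGroup("actor").name_prefix)
```

Using `None` as the sentinel, rather than a fixed default string, keeps an explicit caller-supplied prefix distinguishable from "no preference".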

### What does this PR do?

Fix an EP bug and try to add CI with a 15B model, while looking for smaller models that are more convenient to test.

### Checklist Before Submitting

- [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

ProRL is a novel training methodology that incorporates KL divergence control, reference policy resetting, and a diverse suite of tasks. The empirical analysis reveals that RL-trained models consistently outperform base models across a wide range of pass@k evaluations, including scenarios where base models fail entirely regardless of the number of attempts. It is developed based on Verl. Link: https://arxiv.org/abs/2505.24864

1. Add: Add support for FSDP2 in GRPO-LoRA
2. Format: Automatic code formatting changes initiated by the pre-commit tool
3. Add: Integrate the end-to-end (e2e) testing of GRPO-LoRA + FSDP2 into the CI pipeline.

…tate. (verl-project#1625)

Fix training crash due to missing checkpoint directory

We encountered a training crash with the error `RuntimeError: Parent directory /workspace/ckpts/global_step_20 does not exist`. It appears that `self.actor_rollout_wg.save_checkpoint`, which should create the checkpoint directory, might be running asynchronously and does not finish creating the folder in time. This change explicitly forces creation of the directory before saving the dataloader state to prevent this race condition.

### Checklist Before Starting

- [x] Search for similar PR(s).

### Additional Info.

- **Issue Number**: [1657](verl-project#1657)
- **Training**: FSDP/Megatron
- **Inference**: vLLM

### Checklist Before Submitting

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
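The idea of forcing directory creation before the save can be sketched as below. This is a minimal illustration rather than the actual verl change: the function name `save_dataloader_state`, the file name `data.pt`, and the use of `pickle` are assumptions made for the example.

```python
import os
import pickle
import tempfile


def save_dataloader_state(ckpt_dir: str, state: dict) -> str:
    """Save `state` under `ckpt_dir`, creating the directory first."""
    # exist_ok=True makes the call safe whether or not another component
    # has already created the directory.
    os.makedirs(ckpt_dir, exist_ok=True)
    path = os.path.join(ckpt_dir, "data.pt")  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(state, f)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # The step directory does not exist yet, mirroring the crash scenario.
        step_dir = os.path.join(root, "ckpts", "global_step_20")
        saved = save_dataloader_state(step_dir, {"epoch": 0, "batch_idx": 20})
        print(os.path.exists(saved))
```

Because `os.makedirs(..., exist_ok=True)` is idempotent, the save no longer depends on which writer reaches the directory first.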

### What does this PR do?

There is a tricky bug in `per_tensor_generator` when it iterates over `model.named_parameters()`: `decoder.layers[n].mlp.router.expert_bias` in `GPTModel` is not returned by `named_parameters()`, but it is present in `state_dict()`. Before this fix, the router bias, that is `model.layers.{layer_number}.mlp.gate.e_score_correction_bias`, was not transferred from m-core to the inference engine.

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

- Support training with DeepSeek-V3 671B.
- Support MTP.

This builds on verl-project#1284. It is now functionally ready for 671B, but has not yet seen much practical use.

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Add an example for DeepSeek 671B GRPO.

### Specific Changes

- Needs verl-project#1694.
- Set `torch._dynamo.config.suppress_errors = True` at the entrypoint if you hit the following error:

  ```
  ray.exceptions.RaySystemError: System error: Failed to unpickle serialized exception
  traceback: Traceback (most recent call last):
    File "/usr/local/lib/python3.10/dist-packages/ray/exceptions.py", line 46, in from_ray_exception
      return pickle.loads(ray_exception.serialized_exception)
  TypeError: BackendCompilerFailed.__init__() missing 1 required positional argument: 'inner_exception'
  ```

### Additional Info.

- vLLM is used as the backend; SGLang support is in progress (sgl-project/sglang#6762). To be merged when both backends are ready.
- For DeepSeek-V3-0324 on `gsm8k`, the reward starts at 0.8 and saturates at around 0.95 after only 3 steps.
- Memory peaks at around 90GB during the actor update (1.5k input + 2.5k output); consider using TP/ETP to lower the requirement.
- For gsm8k training using this yaml, ![image](https://github.com/user-attachments/assets/d16cf959-5845-4dd0-95af-07fc35820f18)

…DP1 (verl-project#1823)

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Mirror the CI for VeRL to run on the NPU, and fall back the SFT e2e test to FSDP1, as the NPU is not currently adapted for FSDP2.

### Specific Changes

- Add `.github/workflows/e2e_ascend.yml`
- Change `tests/e2e/sft/run_sft.sh`

### Checklist Before Submitting

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).

---------

Co-authored-by: liaochangyue <liaochangyue@bytedance.com>

…t#1867)

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

- Run on 512 GPUs with TP1PP16EP32, 2k input + 4k output
- Add some tips on memory saving

Fixed URL for ProRL in README.md

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

For PPO critic training, the value of EOS tokens should be zero and should not be fitted. However, the current implementation does not mask the EOS token values, resulting in non-zero EOS token values. Although the learning target is zero, when PPO GAE lambda < 1, this affects the advantage calculation for tokens preceding EOS, thereby impacting performance.

### Checklist Before Submitting

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.

---------

Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com>
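The effect described above can be reproduced with a small, self-contained GAE computation. This is a sketch under stated assumptions, not the verl implementation: it handles a single response whose last position is the EOS slot, and the function name `gae_advantages`, the `mask` convention (1 for response tokens, 0 for the EOS position), and the toy numbers are all made up for illustration.

```python
def gae_advantages(rewards, values, mask, gamma=1.0, lam=0.95):
    """GAE for one response; positions with mask 0 have their value forced to zero."""
    # Zero out critic outputs at masked positions (here: the EOS slot) so
    # they cannot leak into the TD errors of earlier tokens.
    values = [v * m for v, m in zip(values, mask)]
    advantages = [0.0] * len(rewards)
    next_value = 0.0
    last_adv = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD error
        last_adv = delta + gamma * lam * last_adv
        advantages[t] = last_adv * mask[t]
        next_value = values[t]
    return advantages


if __name__ == "__main__":
    rewards = [0, 0, 1, 0]           # reward on the last response token
    values = [0.5, 0.5, 0.5, 0.3]    # critic emits a non-zero value at EOS
    masked = gae_advantages(rewards, values, [1, 1, 1, 0], lam=0.5)
    unmasked = gae_advantages(rewards, values, [1, 1, 1, 1], lam=0.5)
    print(masked[:3])
    print(unmasked[:3])
```

With `lam=1.0` the EOS value cancels out of the advantages of the preceding tokens, so the two variants agree on the first three positions; with `lam < 1` they differ, which matches the condition stated in the PR description.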

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Call `ray.put` on all arguments in advance to avoid duplicate serialization cost in Megatron dispatch.

### What does this PR do?

Split the Docker images used for CI and for running DeepSeek-V3, using cuDNN 9.8 to support MLA. The new image is `whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.1-te2.3-deepseekv3`.

…ject#1768)

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Add an option to generate a Ray timeline for performance analysis.

### Usage Example

Run a job with this option. It generates the trace file at the end of training. You can view it at https://ui.perfetto.dev/

```
python3 -m verl.trainer.main_ppo \
    ray_init.timeline_json_file=/tmp/timeline.json \
    ...
```

<img width="1347" alt="Screenshot 2025-05-30 13 13 56" src="https://github.com/user-attachments/assets/ec57ef94-3ecd-467e-b33f-ae0da3a54c49" />

…t#1872)

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Fix CI failure caused by an incorrect sgl-kernel version in the Docker image:

```
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/utils.py", line 647, in assert_pkg_version
    raise Exception(
Exception: sgl-kernel is installed with version 0.1.0, which is less than the minimum required version 0.1.1. Please reinstall the latest version with `pip install sgl-kernel --force-reinstall`
```

fix: typos

Updated README for rollout-related upcoming features and changes.

…ct#1769)

Changed the sglang rollout pipeline to an async method for better performance. Resolves issue verl-project#1721.

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

In the previous version, the sglang `async_generate` was called from a sync Ray actor with many sync functions, which resulted in poor performance (GPU SM utilization was 20% with TP2).

This PR changes the whole pipeline to an async method.

Performance comparison with the previous "sglang_async" mode:

| | sglang_async (old) | async (new) | % faster |
| -- | -- | -- | -- |
| timing_s/gen | 95 | 25 | 73.68% |
| timing_s/step | 170 | 90 | 47.06% |
| perf/throughput | 2700 | 4000 | 48.15% |

### High-Level Design

See verl-project#1698. This is a follow-up task from that PR.

### Usage Example

examples/grpo_trainer/run_qwen2-7b_seq_balance.sh

### Test

.github/workflows/e2e_ppo_trainer.yml

### Additional Info.

- **Issue Number**: Fixes issue verl-project#1721

### Checklist Before Submitting

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
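The "% faster" figures in the comparison above are consistent with a relative reduction for the two timing metrics and a relative increase for throughput. That reading is an assumption (the PR does not state the formula), and the helper names below are hypothetical; the check itself is plain arithmetic on the reported numbers.

```python
def pct_reduction(old: float, new: float) -> float:
    """Relative decrease from old to new, in percent (lower is better)."""
    return (old - new) / old * 100


def pct_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent (higher is better)."""
    return (new - old) / old * 100


if __name__ == "__main__":
    print(f"timing_s/gen:    {pct_reduction(95, 25):.2f}%")
    print(f"timing_s/step:   {pct_reduction(170, 90):.2f}%")
    print(f"perf/throughput: {pct_increase(2700, 4000):.2f}%")
```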

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Support the DAPO algorithm on NPU.

### Specific Changes

1. Replace hardcoded `cuda` with `get_torch_device()`
2. Add a `device_name` parameter to `RayDAPOTrainer`

### Checklist Before Submitting

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Fix the progress bar update frequency when training with DAPO.

### Specific Changes

1. When `algorithm.filter_groups.enable=true` is set, the DAPO training process skips samples whose advantages are all 0 or 1.
2. However, the progress bar is not updated at the same time, which can confuse users.
3. This merge request addresses the issue by updating the progress bar before filtering the samples.

### Checklist Before Submitting

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.

Co-authored-by: techzhu <techzhu@tencent.com>
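The ordering issue described above can be illustrated with a toy loop. Everything here is a hypothetical sketch rather than the DAPO trainer code: `Progress` stands in for a real progress bar, samples are plain dicts with assumed `uid` and `score` keys, and filtering is assumed to drop prompt groups whose scores are all identical (for example all 0 or all 1).

```python
from collections import defaultdict


class Progress:
    """Minimal stand-in for a progress bar that only counts updates."""

    def __init__(self) -> None:
        self.n = 0

    def update(self, k: int = 1) -> None:
        self.n += k


def filter_groups(samples):
    """Keep only prompt groups whose scores are not all identical."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample["uid"]].append(sample)
    kept = []
    for group in groups.values():
        if len({s["score"] for s in group}) > 1:
            kept.extend(group)
    return kept


def train_loop(batches, progress):
    trained = []
    for batch in batches:
        progress.update(1)          # advance the bar before filtering
        kept = filter_groups(batch)
        if not kept:
            continue                # skipped batches still moved the bar
        trained.append(kept)
    return trained


if __name__ == "__main__":
    batches = [
        [{"uid": "a", "score": 0}, {"uid": "a", "score": 0}],
        [{"uid": "b", "score": 0}, {"uid": "b", "score": 1}],
    ]
    bar = Progress()
    used = train_loop(batches, bar)
    print(bar.n, len(used))
```

If the update were placed after the `continue`, the bar would only advance for batches that survive filtering, which is the confusing behaviour the PR describes.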

### What does this PR do?

- Add the retool Qwen3 dataset and an SFT script.
- The original retool dataset does not follow the standard Qwen multi-turn chat template. In this PR, we recompile the dataset and add an SFT script to train Qwen-8b.

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Update the tensordict version to resolve the following PPO training error:

    python3 -m verl.trainer.main_ppo \
        algorithm.adv_estimator=gae \
        data.train_files=/root/data/gsm8k/train.parquet \
        data.val_files=/root/data/gsm8k/test.parquet \
        data.train_batch_size=256 \
        data.max_prompt_length=512 \
        data.max_response_length=512 \
        data.return_raw_chat=True \
        actor_rollout_ref.model.path=/root/models/Qwen/Qwen2.5-0.5B \
        actor_rollout_ref.model.use_liger=True \
        actor_rollout_ref.actor.optim.lr=1e-6 \
        actor_rollout_ref.model.use_remove_padding=True \
        actor_rollout_ref.actor.optim.lr_warmup_steps_ratio=0.1 \
        actor_rollout_ref.actor.ppo_mini_batch_size=128 \
        actor_rollout_ref.actor.use_dynamic_bsz=False \
        actor_rollout_ref.actor.ppo_max_token_len_per_gpu=32768 \
        actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=2 \
        actor_rollout_ref.actor.ulysses_sequence_parallel_size=1 \
        actor_rollout_ref.actor.fsdp_config.param_offload=False \
        actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
        actor_rollout_ref.actor.use_kl_loss=False \
        actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=32768 \
        actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=2 \
        actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
        actor_rollout_ref.rollout.name=vllm \
        actor_rollout_ref.rollout.gpu_memory_utilization=0.8 \
        actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=32768 \
        actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=2 \
        critic.optim.lr=1e-5 \
        critic.ulysses_sequence_parallel_size=1 \
        critic.model.use_remove_padding=True \
        critic.optim.lr_warmup_steps_ratio=0.05 \
        critic.model.path=/root/models/Qwen/Qwen2.5-0.5B \
        critic.model.enable_gradient_checkpointing=False \
        critic.use_dynamic_bsz=False \
        critic.ppo_max_token_len_per_gpu=32768 \
        critic.ppo_micro_batch_size_per_gpu=2 \
        critic.model.fsdp_config.param_offload=False \
        critic.model.fsdp_config.optimizer_offload=False \
        reward_model.enable=True \
        reward_model.ulysses_sequence_parallel_size=1 \
        reward_model.model.path=/root/models/Qwen/Qwen2.5-0.5B \
        reward_model.model.use_remove_padding=True \
        reward_model.model.fsdp_config.param_offload=True \
        reward_model.use_dynamic_bsz=False \
        reward_model.forward_max_token_len_per_gpu=32768 \
        reward_model.micro_batch_size_per_gpu=2 \
        algorithm.use_kl_in_reward=False \
        trainer.critic_warmup=0 \
        'trainer.logger=[console]' \
        trainer.project_name=verl-test \
        trainer.experiment_name=qwen2.5-0.5b-model-reward-minimal \
        trainer.nnodes=1 \
        trainer.n_gpus_per_node=8 \
        trainer.val_before_train=False \
        trainer.test_freq=False \
        trainer.save_freq=-1 \
        trainer.resume_mode=disable \
        trainer.total_epochs=2 \
        trainer.total_training_steps=1

    Traceback (most recent call last):
      File "<frozen runpy>", line 189, in _run_module_as_main
      File "<frozen runpy>", line 112, in _get_module_details
      File "/sgl-workspace/verl/__init__.py", line 22, in <module>
        from .protocol import DataProto
      File "/sgl-workspace/verl/protocol.py", line 30, in <module>
        import tensordict
      File "/usr/local/lib/python3.12/dist-packages/tensordict/__init__.py", line 6, in <module>
        import tensordict._reductions
      File "/usr/local/lib/python3.12/dist-packages/tensordict/_reductions.py", line 11, in <module>
        from tensordict._lazy import LazyStackedTensorDict
      File "/usr/local/lib/python3.12/dist-packages/tensordict/_lazy.py", line 38, in <module>
        from tensordict.memmap import MemoryMappedTensor
      File "/usr/local/lib/python3.12/dist-packages/tensordict/memmap.py", line 25, in <module>
        from torch.multiprocessing.reductions import ForkingPickler
    ImportError: cannot import name 'ForkingPickler' from 'torch.multiprocessing.reductions' (/usr/local/lib/python3.12/dist-packages/torch/multiprocessing/reductions.py)

### Checklist Before Submitting

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.

Signed-off-by: Vicky Tsang <vtsang@amd.com>

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

In scenarios involving multiple validation sets, where the difficulty levels of these sets differ significantly and the generated content lengths vary notably, the order in which the validation sets are processed can have a substantial impact on the validation speed.

### High-Level Design

Add validation shuffle.

### Usage Example

```yaml
validation_shuffle: True
```

### Test

Validation speed increase of over 10%.

### Checklist Before Submitting

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.

hongpeng-guo and others added 30 commits June 5, 2025 11:52

[feat] Add support for FSDP2 in GRPO-LoRA (verl-project#1844)

f7f8b04

1. Add: Add support for FSDP2 in GRPO-LoRA
2. Format: Automatic code formatting changes initiated by the pre-commit tool
3. Add: Integrate the end-to-end (e2e) testing of GRPO-LoRA + FSDP2 into the CI pipeline.

Fixed URL for ProRL in README.md (verl-project#1866)

45aec85

Fixed URL for ProRL in README.md

fix: typos (verl-project#1879)

4653f82

fix: typos

Add rollout Module Development Progress & Roadmap (verl-project#1884)

22da46b

Updated README for rollout-related upcoming features and changes.

[feat] Wandb Timing: Add more detailed timing of gen_sequence and wei…

70bd3d3

…ghts resharding (verl-project#1834)

[rollout] feat: follow OpenAI tool calling schema in chat scheduler (v…

457f4d2

…erl-project#1831)

[docs] moe: add docs for deepseek 671b and qwen-236b (verl-project#1896)

043c72b

[release] chore: bump version to v0.4 (verl-project#1897)

69c2a1a

vinyesm and others added 28 commits July 1, 2025 17:11

use async server and not agent_loop

8ca58d0

debug messages in async_server.py

725d845

rm kwargs arg

2f8d145

pass full config to scheduler_cls

43e7e7f

pass full config to scheduler_cls

2c6d45e

more debug

51eb112

use default ChatCompletionScheduler in async_server, then use custom …

2fe3ad4

…callback

fix: correct args in ChatCompletionScheduler

a51ba7f

override tools to []

be07766

fix chat scheduler (from thibaut)

b24cadb

add extra_info

9172841

more debug msg

6e709e7

more debug prints

6200693

more debug prints

1490442

fix: out of bounds, add non_tensor_batch_index

9f2a482

working version with lean call

0145a40

fix np array ?

ff49a62

fix np array ?

51f91ef

fix np array ?

d3bbe21

fix np array ?

b5281b6

fix np array ?

edc2dd3

fix n epochs

fa1529f

remove debug

f8c46c8

remove debug

64f0221

remove debug

b4b6ba5

mini batch size

fa67d1a

remove prints

eca171f

remove warning

cda66c9

thibautbar changed the base branch from marina/rebase280625 to main July 7, 2025 12:05

thibautbar changed the base branch from main to numina-main July 9, 2025 06:54

Fix some rebased issues #40

thibautbar wants to merge 309 commits into numina-main from thibaut/rebase280625_fix

thibautbar commented Jul 4, 2025

20 participants
