Fix some rebased issues#40
Open
thibautbar wants to merge 309 commits into
Open
Conversation
…ect#1851) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Follow-up of verl-project#1838, make the `name_prefix` mechanism same for `RayWorkerGroup` and `RayResourcePool`, default to be `None` and will be initialized randomly. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path. Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? Fix ep bug and try to add CI with 15B model, finding smaller models which are more convenient to test. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.
ProRL is a novel training methodology that incorporates KL divergence control, reference policy resetting, and a diverse suite of tasks. The empirical analysis reveals that RL-trained models consistently outperform base models across a wide range of pass@k evaluations, including scenarios where base models fail entirely regardless of the number of attempts. It is developed based on Verl. Link: https://arxiv.org/abs/2505.24864
1. Add: Add support for FSDP2 in GRPO-LoRa 2. Format: Automatic code formatting changes initiated by the pre-commit tool 3. Add: Integrate the end-to-end (e2e) testing of GRPO-LoRA + fsdp2 into the CI pipeline.
…tate. (verl-project#1625) Fix training crash due to missing checkpoint directory We encountered a training crash with error: "RuntimeError: Parent directory /workspace/ckpts/global_step_20 does not exist". It appears that `self.actor_rollout_wg.save_checkpoint`, which should create the checkpoint directory, might be running asynchronously and doesn't complete creating the folder in time. This change explicitly forces creation of the directory before saving the dataloader state to prevent this race condition. ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - **Issue Number**: [1657](verl-project#1657) - **Training**: FSDP/Megatron - **Inference**: vLLM ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.
### Checklist Before Starting
- [ ] Search for similar PR(s).
### What does this PR do?
there is a tricky bug in per_tensor_generator with
model.named_parameter().
"decoder.layers[n].mlp.router.expert_bias" in GPTModel is not registered
in named_parameter, but in state_dict(). Before this fix, the
router_bias or
`model.layers.{layer_number}.mlp.gate.e_score_correction_bias` is not
transfered from m-core to infer engine.
> Add one-line overview of what this PR aims to achieve or accomplish.
### High-Level Design
> Demonstrate the high-level design if this PR is complex.
### Specific Changes
> List the specific changes.
### API
> Demonstrate how the API changes if any.
### Usage Example
> Provide usage example(s) for easier usage.
```python
# Add code snippet or script demonstrating how to use this
```
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluatuion results, etc.
### Additional Info.
- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]
### Checklist Before Submitting
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? support training with deepseekv3 671B support MTP on top of verl-project#1284 now it is functional ready for 671B, still lacking of practice > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Add an example for DeepSeek 671B GRPO ### Specific Changes - Need verl-project#1694 - Set `torch._dynamo.config.suppress_errors = True` at entrypoint, if ``` ray.exceptions.RaySystemError: System error: Failed to unpickle serialized exception traceback: Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/ray/exceptions.py", line 46, in from_ray_exception return pickle.loads(ray_exception.serialized_exception) TypeError: BackendCompilerFailed.__init__() missing 1 required positional argument: 'inner_exception' ``` ### Additional Info. - vllm as backend, sglang working in process (sgl-project/sglang#6762). Merged when both backends are ready. - For DeepSeek-V3-0324 at `gsm8k`, the reward starts from 0.8 and saturated at around 0.95 using only 3 steps. - Memory peaks around 90GB during actor update (1.5k input + 2.5k output), consider using TP/ETP for a lower requirement. - For gsm8k training using this yaml,  ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.
…DP1 (verl-project#1823) ### Checklist Before Starting - [done] Search for similar PR(s). ### What does this PR do? Mirror the CI for VeRL to run on the NPU and fallback the e2e test of the SFT to FSDP1, as the NPU is not currently adapted for FSDP2 ### Specific Changes Add `.github/workflows/e2e_ascend.yml` Change `tests/e2e/sft/run_sft.sh` ### Checklist Before Submitting - [ done ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ done ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). --------- Co-authored-by: liaochangyue <liaochangyue@bytedance.com>
…t#1867) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? - Run on 512 GPUs with TP1PP16EP32, 2k input + 4k output - Add some tips on memory saving ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.
Fixed URL for ProRL in README.md
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. For PPO critic training, the value of EOS tokens should be zero and should not be fitted. However, the current implementation does not mask the EOS token values, resulting in non-zero EOS token values. Although the learning target is zero, when PPO GAE lambda < 1, this affects the advantage calculation for tokens preceding EOS, thereby impacting performance. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path. --------- Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com>
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? ray put all the args in advance to avoid duplicate serialization cost for megatron dispatch. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.
### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? Split docker image used by CI and deepseek-V3 running, using cudnn 9.8 to support MLA. New Image is ``whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.1-te2.3-deepseekv3``. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.
…ject#1768) ### Checklist Before Starting - [ done ] Search for similar PR(s). ### What does this PR do? Add an option to generate ray timeline for performance analysing. ### Usage Example Run a job with this option. It can generate the trace file at the end of training. You can view it from https://ui.perfetto.dev/ ``` python3 -m verl.trainer.main_ppo \ ray_init.timeline_json_file=/tmp/timeline.json \ ... ``` <img width="1347" alt="截屏2025-05-30 13 13 56" src="https://github.com/user-attachments/assets/ec57ef94-3ecd-467e-b33f-ae0da3a54c49" />
### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.
…t#1872) ### Checklist Before Starting - [ done ] Search for similar PR(s). ### What does this PR do? Fix ci failure from incorrect sgl-kernel version in docker image: ``` File "/usr/local/lib/python3.10/dist-packages/sglang/srt/utils.py", line 647, in assert_pkg_version raise Exception( Exception: sgl-kernel is installed with version 0.1.0, which is less than the minimum required version 0.1.1. Please reinstall the latest version with `pip install sgl-kernel --force-reinstall` ```
fix: typos
Updated readme for rollout related ppcoming features and changes.
…ct#1769) Changed sglang rollout pipeline to async method to have better performance. resolved issue verl-project#1721 ### Checklist Before Starting - [ done ] Search for similar PR(s). ### What does this PR do? In previous version, the sglang async_generate is called with a sync ray actor with lots of sync functions, and resulted poor performance ( GPU SM is 20% in TP2) This PR changed the while pipeline to async method. Performance comparsion to previous "sglang_async" mode: | sglang_async (old) | async (new) | % faster -- | -- | -- | -- timing_s/gen | 95 | 25 | 73.68% timing_s/step | 170 | 90 | 47.06% perf/throughput | 2700 | 4000 | 48.15% ### High-Level Design see verl-project#1698 This is a follow up task from above PR. ### Usage Example examples/grpo_trainer/run_qwen2-7b_seq_balance.sh ### Test .github/workflows/e2e_ppo_trainer.yml ### Additional Info. - **Issue Number**: Fixes issue verl-project#1721 ### Checklist Before Submitting - [ done ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ done ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ done ] Add `[BREAKING]` to the PR title if it breaks any API. - [ done ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ done ] Add CI test(s) if necessary.
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Support DAPO algorithm on npu ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes 1. change `cuda` hardcode to get_torch_device() 2. add `device_name` parameter to RayDAPOTrainer ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] New CI unit test(s) are added to cover the code path. - [x] Rely on existing unit tests on CI that covers the code path.
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. To handle the process bar update frequency when training in DAPO. ### Specific Changes > List the specific changes. 1.When we set algorithm.filter_groups.enable=true, the DAPO training process will skip samples whose advantages are all 0 or 1. 2.However, the progress bar does not update simultaneously, which can confuse users. 3.This merge request addresses the issue by updating the progress bar before filtering the samples. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path. Co-authored-by: techzhu <techzhu@tencent.com>
### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.
### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? - Add retool qwen3 dataset and sft - The original retool doesn't follow standard qwen multiturn chat template. In this PR, we recompile the dataset and add a SFT script to train QWen-8b ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.
### Checklist Before Starting
- [x ] Search for similar PR(s).
### What does this PR do?
Update tensordict version
Resolve PPO training error
+ python3 -m verl.trainer.main_ppo algorithm.adv_estimator=gae
data.train_files=/root/data/gsm8k/train.parquet
data.val_files=/root/data/gsm8k/test.parquet data.train_batch_size=256
data.max_prompt_length=512 data.max_response_length=512
data.return_raw_chat=True
actor_rollout_ref.model.path=/root/models/Qwen/Qwen2.5-0.5B
actor_rollout_ref.model.use_liger=True
actor_rollout_ref.actor.optim.lr=1e-6
actor_rollout_ref.model.use_remove_padding=True
actor_rollout_ref.actor.optim.lr_warmup_steps_ratio=0.1
actor_rollout_ref.actor.ppo_mini_batch_size=128
actor_rollout_ref.actor.use_dynamic_bsz=False
actor_rollout_ref.actor.ppo_max_token_len_per_gpu=32768
actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=2
actor_rollout_ref.actor.ulysses_sequence_parallel_size=1
actor_rollout_ref.actor.fsdp_config.param_offload=False
actor_rollout_ref.actor.fsdp_config.optimizer_offload=False
actor_rollout_ref.actor.use_kl_loss=False
actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=32768
actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=2
actor_rollout_ref.rollout.tensor_model_parallel_size=2
actor_rollout_ref.rollout.name=vllm
actor_rollout_ref.rollout.gpu_memory_utilization=0.8
actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=32768
actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=2
critic.optim.lr=1e-5 critic.ulysses_sequence_parallel_size=1
critic.model.use_remove_padding=True
critic.optim.lr_warmup_steps_ratio=0.05
critic.model.path=/root/models/Qwen/Qwen2.5-0.5B
critic.model.enable_gradient_checkpointing=False
critic.use_dynamic_bsz=False critic.ppo_max_token_len_per_gpu=32768
critic.ppo_micro_batch_size_per_gpu=2
critic.model.fsdp_config.param_offload=False
critic.model.fsdp_config.optimizer_offload=False
reward_model.enable=True reward_model.ulysses_sequence_parallel_size=1
reward_model.model.path=/root/models/Qwen/Qwen2.5-0.5B
reward_model.model.use_remove_padding=True
reward_model.model.fsdp_config.param_offload=True
reward_model.use_dynamic_bsz=False
reward_model.forward_max_token_len_per_gpu=32768
reward_model.micro_batch_size_per_gpu=2 algorithm.use_kl_in_reward=False
trainer.critic_warmup=0 'trainer.logger=[console]'
trainer.project_name=verl-test
trainer.experiment_name=qwen2.5-0.5b-model-reward-minimal
trainer.nnodes=1 trainer.n_gpus_per_node=8
trainer.val_before_train=False trainer.test_freq=False
trainer.save_freq=-1 trainer.resume_mode=disable trainer.total_epochs=2
trainer.total_training_steps=1
Traceback (most recent call last):
File "<frozen runpy>", line 189, in _run_module_as_main
File "<frozen runpy>", line 112, in _get_module_details
File "/sgl-workspace/verl/__init__.py", line 22, in <module>
from .protocol import DataProto
File "/sgl-workspace/verl/protocol.py", line 30, in <module>
import tensordict
File "/usr/local/lib/python3.12/dist-packages/tensordict/__init__.py",
line 6, in <module>
import tensordict._reductions
File
"/usr/local/lib/python3.12/dist-packages/tensordict/_reductions.py",
line 11, in <module>
from tensordict._lazy import LazyStackedTensorDict
File "/usr/local/lib/python3.12/dist-packages/tensordict/_lazy.py", line
38, in <module>
from tensordict.memmap import MemoryMappedTensor
File "/usr/local/lib/python3.12/dist-packages/tensordict/memmap.py",
line 25, in <module>
from torch.multiprocessing.reductions import ForkingPickler
ImportError: cannot import name 'ForkingPickler' from
'torch.multiprocessing.reductions'
(/usr/local/lib/python3.12/dist-packages/torch/multiprocessing/reductions.py)
### Checklist Before Submitting
- [x ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [x ] Rely on existing unit tests on CI that covers the code path.
Signed-off-by: Vicky Tsang <vtsang@amd.com>
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? In scenarios involving multiple validation sets, where the difficulty levels of these sets differ significantly and the generated content lengths vary notably, the order in which the validation sets are processed can have a substantial impact on the validation speed. ### High-Level Design add validation shuffle ### Usage Example > Provide usage example(s) for easier usage. ```python validation_shuffle: True ``` ### Test Validation speed increase of over 10%. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] New CI unit test(s) are added to cover the code path. - [ ] Rely on existing unit tests on CI that covers the code path.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.