Fix some rebased issues #40

Open
thibautbar wants to merge 309 commits into numina-main from thibaut/rebase280625_fix

@thibautbar

No description provided.

hongpeng-guo and others added 30 commits June 5, 2025 11:52
…ect#1851)

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Follow-up to verl-project#1838: make the `name_prefix` mechanism the same for
`RayWorkerGroup` and `RayResourcePool`. It defaults to `None`, in which case
a random prefix is generated.
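
The default-to-random behavior described above can be illustrated with a minimal sketch. The class name `PrefixedPoolSketch` and the use of `uuid` are assumptions for illustration only; they are not taken from verl's `RayWorkerGroup` or `RayResourcePool`.

```python
import uuid


class PrefixedPoolSketch:
    """Hypothetical stand-in for a class that accepts an optional name_prefix."""

    def __init__(self, name_prefix=None):
        # When no prefix is supplied, generate a random one so that two
        # instances created with default arguments do not share a name.
        if name_prefix is None:
            name_prefix = uuid.uuid4().hex[:8]
        self.name_prefix = name_prefix


explicit = PrefixedPoolSketch(name_prefix="actor")
first_default = PrefixedPoolSketch()
second_default = PrefixedPoolSketch()
```

An explicit prefix is kept unchanged, while each default-constructed instance gets its own random prefix.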

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.

Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

Fix an EP bug and try to add CI with a 15B model, while looking for smaller
models that are more convenient to test.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
ProRL is a novel training methodology that incorporates KL divergence
control, reference policy resetting, and a diverse suite of tasks. The
empirical analysis reveals that RL-trained models consistently
outperform base models across a wide range of pass@k evaluations,
including scenarios where base models fail entirely regardless of the
number of attempts.

It is developed based on Verl. 

Link: https://arxiv.org/abs/2505.24864
1. Add: Support for FSDP2 in GRPO-LoRA.
2. Format: Automatic code formatting changes made by the pre-commit
tool.
3. Add: Integrate the end-to-end (e2e) test of GRPO-LoRA + FSDP2 into
the CI pipeline.
…tate. (verl-project#1625)

Fix training crash due to missing checkpoint directory

We encountered a training crash with error: "RuntimeError: Parent
directory /workspace/ckpts/global_step_20 does not exist".

It appears that `self.actor_rollout_wg.save_checkpoint`, which should
create the checkpoint directory, might be running asynchronously and
doesn't complete creating the folder in time.

This change explicitly forces creation of the directory before saving
the dataloader state to prevent this race condition.
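
A minimal sketch of the idea: create the directory explicitly before writing the state file. The function name `save_dataloader_state`, the file name `data.pt`, and the use of `pickle` are assumptions for illustration, not the actual trainer code.

```python
import os
import pickle
import tempfile


def save_dataloader_state(state, ckpt_dir, filename="data.pt"):
    """Write `state` into `ckpt_dir`, creating the directory first."""
    # exist_ok=True makes this safe even if another worker has already
    # created the directory in the meantime.
    os.makedirs(ckpt_dir, exist_ok=True)
    path = os.path.join(ckpt_dir, filename)
    with open(path, "wb") as f:
        pickle.dump(state, f)
    return path


with tempfile.TemporaryDirectory() as root:
    # This nested directory does not exist before the call.
    step_dir = os.path.join(root, "ckpts", "global_step_20")
    saved_path = save_dataloader_state({"epoch": 1, "index": 42}, step_dir)
    with open(saved_path, "rb") as f:
        restored = pickle.load(f)
```

Because the directory is created by the caller, the save no longer depends on another component having finished creating it.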

### Checklist Before Starting

- [x] Search for similar PR(s).

### Additional Info.

- **Issue Number**:
[1657](verl-project#1657)
- **Training**: FSDP/Megatron
- **Inference**: vLLM

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

There is a tricky bug in `per_tensor_generator` when it uses
`model.named_parameters()`:
`decoder.layers[n].mlp.router.expert_bias` in GPTModel is not returned by
`named_parameters()`, but it is present in `state_dict()`. Before this fix, the
router bias
(`model.layers.{layer_number}.mlp.gate.e_score_correction_bias`) was not
transferred from mcore to the inference engine.





### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

- Support training with DeepSeek-V3 671B.
- Support MTP on top of verl-project#1284.

This is now functionally ready for 671B, but has not yet been exercised much in practice.

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Add an example for DeepSeek 671B GRPO

### Specific Changes

- Requires verl-project#1694.
- Set `torch._dynamo.config.suppress_errors = True` at the entrypoint if you hit the following error:

```
ray.exceptions.RaySystemError: System error: Failed to unpickle serialized exception
traceback: Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ray/exceptions.py", line 46, in from_ray_exception
    return pickle.loads(ray_exception.serialized_exception)
TypeError: BackendCompilerFailed.__init__() missing 1 required positional argument: 'inner_exception'
```

### Additional Info.

- vLLM is the backend for now; SGLang support is in progress
(sgl-project/sglang#6762). This will be merged when both
backends are ready.
- For DeepSeek-V3-0324 on `gsm8k`, the reward starts from 0.8 and
saturates at around 0.95 after only 3 steps.
- Memory peaks at around 90GB during the actor update (1.5k input + 2.5k
output); consider using TP/ETP to lower the requirement.
- gsm8k training curve using this YAML:


![image](https://github.com/user-attachments/assets/d16cf959-5845-4dd0-95af-07fc35820f18)


### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
…DP1 (verl-project#1823)

### Checklist Before Starting

- [done] Search for similar PR(s).

### What does this PR do?

Mirror the VeRL CI so that it runs on the NPU, and fall back to FSDP1 for the
SFT e2e test, as the NPU is not yet adapted for FSDP2.

### Specific Changes

Add `.github/workflows/e2e_ascend.yml`
Change `tests/e2e/sft/run_sft.sh`

### Checklist Before Submitting

- [ done ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ done ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).

---------

Co-authored-by: liaochangyue <liaochangyue@bytedance.com>
…t#1867)

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

- Run on 512 GPUs with TP1PP16EP32, 2k input + 4k output
- Add some tips on memory saving

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

For PPO critic training, the value of EOS tokens should be zero and
should not be fitted. However, the current implementation does not mask
the EOS token values, resulting in non-zero EOS token values. Although
the learning target is zero, when PPO GAE lambda < 1, this affects the
advantage calculation for tokens preceding EOS, thereby impacting
performance.
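
The effect of a non-zero EOS value can be sketched with a plain-Python GAE. This is an illustration under assumptions, not verl's implementation: the last position stands in for the EOS token, the value after it is taken as 0, and all numbers are made up.

```python
def gae(rewards, values, gamma=1.0, lam=1.0):
    """Generalized advantage estimation over one response.

    The value after the final position is taken to be 0 (end of episode).
    """
    advantages = [0.0] * len(rewards)
    next_value = 0.0
    next_advantage = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        next_advantage = delta + gamma * lam * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]
    return advantages


# Toy numbers: the last position plays the role of the EOS token.
rewards = [0.0, 0.0, 1.0, 0.0]
values_unmasked = [0.5, 0.6, 0.7, 0.4]  # critic emits a non-zero EOS value
values_masked = [0.5, 0.6, 0.7, 0.0]    # EOS value forced to zero


def max_gap_before_eos(lam):
    """Largest advantage difference on the tokens preceding EOS."""
    unmasked = gae(rewards, values_unmasked, lam=lam)
    masked = gae(rewards, values_masked, lam=lam)
    return max(abs(u - m) for u, m in zip(unmasked[:-1], masked[:-1]))


gap_lam_one = max_gap_before_eos(1.0)
gap_lam_half = max_gap_before_eos(0.5)
```

With `lam=1.0` the intermediate values telescope away, so the tokens before EOS get the same advantages either way. With `lam=0.5` the unmasked EOS value leaks into the advantages of the preceding tokens.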

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.

---------

Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com>
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Call `ray.put` on all the arguments in advance to avoid the cost of
serializing them repeatedly during Megatron dispatch.

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

Split the Docker image used by CI from the one used to run DeepSeek-V3, using
cuDNN 9.8 to support MLA.

The new image is
``whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.1-te2.3-deepseekv3``.

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
…ject#1768)

### Checklist Before Starting

- [ done ] Search for similar PR(s).

### What does this PR do?

Add an option to generate ray timeline for performance analysing.

### Usage Example
Run a job with this option. It can generate the trace file at the end of
training. You can view it from https://ui.perfetto.dev/
```
python3 -m verl.trainer.main_ppo \
    ray_init.timeline_json_file=/tmp/timeline.json \
...
```


<img width="1347" alt="Screenshot 2025-05-30 13 13 56"
src="https://github.com/user-attachments/assets/ec57ef94-3ecd-467e-b33f-ae0da3a54c49"
/>
…t#1872)

### Checklist Before Starting

- [ done ] Search for similar PR(s).

### What does this PR do?

Fix a CI failure caused by an incorrect sgl-kernel version in the Docker image:

```
File "/usr/local/lib/python3.10/dist-packages/sglang/srt/utils.py", line 647, in assert_pkg_version
    raise Exception(
Exception: sgl-kernel is installed with version 0.1.0, which is less than the minimum required version 0.1.1. Please reinstall the latest version with `pip install sgl-kernel --force-reinstall`
```
Updated the README for rollout-related upcoming features and changes.
…ct#1769)

Changed sglang rollout pipeline to async method to have better
performance.

resolved issue verl-project#1721

### Checklist Before Starting

- [ done ] Search for similar PR(s).

### What does this PR do?

In the previous version, sglang's `async_generate` was called from a sync Ray
actor with many sync functions, which resulted in poor performance (GPU
SM utilization was 20% with TP2).

This PR changes the whole pipeline to async.

Performance comparison to the previous "sglang_async" mode:

| Metric | sglang_async (old) | async (new) | % faster |
| --- | --- | --- | --- |
| timing_s/gen | 95 | 25 | 73.68% |
| timing_s/step | 170 | 90 | 47.06% |
| perf/throughput | 2700 | 4000 | 48.15% |
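
The percentage column in the comparison above is consistent with a relative change measured against the old value. The short check below reproduces it; the helper name `percent_change` is only for illustration.

```python
def percent_change(old, new):
    """Relative change from old to new, as a percentage of the old value."""
    return abs(new - old) / old * 100


# (old, new) pairs taken from the comparison above.
rows = {
    "timing_s/gen": (95, 25),
    "timing_s/step": (170, 90),
    "perf/throughput": (2700, 4000),
}

improvements = {
    name: round(percent_change(old, new), 2) for name, (old, new) in rows.items()
}
```

For the two timing rows the figure is a reduction in seconds, and for throughput it is an increase, both relative to the old mode.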

### High-Level Design

see verl-project#1698

This is a follow up task from above PR.


### Usage Example

examples/grpo_trainer/run_qwen2-7b_seq_balance.sh

### Test

.github/workflows/e2e_ppo_trainer.yml

### Additional Info.

- **Issue Number**: Fixes issue verl-project#1721

### Checklist Before Submitting

- [ done ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ done ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ done ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ done ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ done ] Add CI test(s) if necessary.
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Support the DAPO algorithm on NPU.

### Specific Changes

1. Replace the hardcoded `cuda` device with `get_torch_device()`.
2. Add a `device_name` parameter to `RayDAPOTrainer`.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] New CI unit test(s) are added to cover the code path.
- [x] Rely on existing unit tests on CI that covers the code path.
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Fix the progress bar update frequency when training with DAPO.

### Specific Changes

1. When `algorithm.filter_groups.enable=true` is set, the DAPO training
process skips samples whose advantages are all 0 or 1.
2. However, the progress bar is not updated at the same time, which can
confuse users.
3. This merge request addresses the issue by updating the progress bar
before filtering the samples.
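
The ordering change can be sketched in a few lines. This is a simplified illustration, not the DAPO trainer loop: `run_epoch` and `has_signal` are hypothetical names, and a plain counter stands in for the progress bar.

```python
def has_signal(group_rewards):
    """Keep a group only if its rewards are not all identical."""
    return len(set(group_rewards)) > 1


def run_epoch(groups, keep_group):
    """Return (progress ticks, groups actually trained on)."""
    progress = 0
    trained = 0
    for group in groups:
        # Advance the progress counter first, so that filtered groups
        # still move the bar forward.
        progress += 1
        if not keep_group(group):
            continue
        trained += 1
    return progress, trained


# Toy reward groups: two are uniform (all 0 or all 1) and get skipped.
groups = [[0, 0], [0, 1], [1, 1]]
progress, trained = run_epoch(groups, has_signal)
```

Updating the counter before the `continue` means the bar reflects every group that was processed, not only the ones that survived filtering.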

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.

Co-authored-by: techzhu <techzhu@tencent.com>
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

- Add the retool Qwen3 dataset and an SFT script.
- The original retool dataset doesn't follow the standard Qwen multi-turn chat
template. In this PR, we recompile the dataset and add an SFT script to
train Qwen-8b.

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
### Checklist Before Starting

- [x ] Search for similar PR(s).

### What does this PR do?

Update the tensordict version to resolve the following PPO training error:

+ python3 -m verl.trainer.main_ppo algorithm.adv_estimator=gae
data.train_files=/root/data/gsm8k/train.parquet
data.val_files=/root/data/gsm8k/test.parquet data.train_batch_size=256
data.max_prompt_length=512 data.max_response_length=512
data.return_raw_chat=True
actor_rollout_ref.model.path=/root/models/Qwen/Qwen2.5-0.5B
actor_rollout_ref.model.use_liger=True
actor_rollout_ref.actor.optim.lr=1e-6
actor_rollout_ref.model.use_remove_padding=True
actor_rollout_ref.actor.optim.lr_warmup_steps_ratio=0.1
actor_rollout_ref.actor.ppo_mini_batch_size=128
actor_rollout_ref.actor.use_dynamic_bsz=False
actor_rollout_ref.actor.ppo_max_token_len_per_gpu=32768
actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=2
actor_rollout_ref.actor.ulysses_sequence_parallel_size=1
actor_rollout_ref.actor.fsdp_config.param_offload=False
actor_rollout_ref.actor.fsdp_config.optimizer_offload=False
actor_rollout_ref.actor.use_kl_loss=False
actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=32768
actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=2
actor_rollout_ref.rollout.tensor_model_parallel_size=2
actor_rollout_ref.rollout.name=vllm
actor_rollout_ref.rollout.gpu_memory_utilization=0.8
actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=32768
actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=2
critic.optim.lr=1e-5 critic.ulysses_sequence_parallel_size=1
critic.model.use_remove_padding=True
critic.optim.lr_warmup_steps_ratio=0.05
critic.model.path=/root/models/Qwen/Qwen2.5-0.5B
critic.model.enable_gradient_checkpointing=False
critic.use_dynamic_bsz=False critic.ppo_max_token_len_per_gpu=32768
critic.ppo_micro_batch_size_per_gpu=2
critic.model.fsdp_config.param_offload=False
critic.model.fsdp_config.optimizer_offload=False
reward_model.enable=True reward_model.ulysses_sequence_parallel_size=1
reward_model.model.path=/root/models/Qwen/Qwen2.5-0.5B
reward_model.model.use_remove_padding=True
reward_model.model.fsdp_config.param_offload=True
reward_model.use_dynamic_bsz=False
reward_model.forward_max_token_len_per_gpu=32768
reward_model.micro_batch_size_per_gpu=2 algorithm.use_kl_in_reward=False
trainer.critic_warmup=0 'trainer.logger=[console]'
trainer.project_name=verl-test
trainer.experiment_name=qwen2.5-0.5b-model-reward-minimal
trainer.nnodes=1 trainer.n_gpus_per_node=8
trainer.val_before_train=False trainer.test_freq=False
trainer.save_freq=-1 trainer.resume_mode=disable trainer.total_epochs=2
trainer.total_training_steps=1
Traceback (most recent call last):
  File "<frozen runpy>", line 189, in _run_module_as_main
  File "<frozen runpy>", line 112, in _get_module_details
  File "/sgl-workspace/verl/__init__.py", line 22, in <module>
    from .protocol import DataProto
  File "/sgl-workspace/verl/protocol.py", line 30, in <module>
    import tensordict
  File "/usr/local/lib/python3.12/dist-packages/tensordict/__init__.py", line 6, in <module>
    import tensordict._reductions
  File "/usr/local/lib/python3.12/dist-packages/tensordict/_reductions.py", line 11, in <module>
    from tensordict._lazy import LazyStackedTensorDict
  File "/usr/local/lib/python3.12/dist-packages/tensordict/_lazy.py", line 38, in <module>
    from tensordict.memmap import MemoryMappedTensor
  File "/usr/local/lib/python3.12/dist-packages/tensordict/memmap.py", line 25, in <module>
    from torch.multiprocessing.reductions import ForkingPickler
ImportError: cannot import name 'ForkingPickler' from 'torch.multiprocessing.reductions' (/usr/local/lib/python3.12/dist-packages/torch/multiprocessing/reductions.py)

### Checklist Before Submitting

- [x ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [x ] Rely on existing unit tests on CI that covers the code path.

Signed-off-by: Vicky Tsang <vtsang@amd.com>
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

In scenarios involving multiple validation sets, where the difficulty
levels of these sets differ significantly and the generated content
lengths vary notably, the order in which the validation sets are
processed can have a substantial impact on the validation speed.

### High-Level Design

add validation shuffle

### Usage Example

```yaml
validation_shuffle: True
```
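
One way such a shuffle could work is sketched below, assuming the validation sets are merged into a single list before batching. The function name `shuffle_validation_examples` and the `seed` parameter are hypothetical and not part of verl's configuration.

```python
import random


def shuffle_validation_examples(datasets, seed=0):
    """Merge several validation sets and shuffle them deterministically."""
    merged = [
        (name, example)
        for name, examples in datasets.items()
        for example in examples
    ]
    # A dedicated Random instance keeps the order reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(merged)
    return merged


datasets = {"easy": ["e1", "e2", "e3"], "hard": ["h1", "h2", "h3"]}
mixed = shuffle_validation_examples(datasets, seed=0)
mixed_again = shuffle_validation_examples(datasets, seed=0)
```

Mixing the sets spreads long and short generations across batches, rather than processing one set after the other.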

### Test

Validation speed increase of over 10%.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.
@thibautbar thibautbar changed the base branch from marina/rebase280625 to main July 7, 2025 12:05
@thibautbar thibautbar changed the base branch from main to numina-main July 9, 2025 06:54