Cherry-picks of numina-main commits on a refreshed verl main by desaxce · Pull Request #38 · project-numina/verl

desaxce · 2025-07-01T14:56:59Z

On a fresh main of verl, I did:

git cherry-pick $(git rev-list --reverse origin/main..numina-main)

- Resolved conflicts where needed
- Skipped all merge commits
- Allowed empty commits
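
The procedure above can be sketched as a small Python wrapper around git. This is only an illustration under assumptions: the description does not say which flags were used, so `--no-merges` (to skip merge commits) and `--allow-empty` (to permit empty commits) are one plausible way to get the stated behavior, and the helper names are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence


def rev_list_args(upstream: str, branch: str) -> List[str]:
    """Commits on `branch` that are not on `upstream`, oldest first, without merges."""
    # --no-merges is an assumption about how merge commits were skipped.
    return ["git", "rev-list", "--reverse", "--no-merges", f"{upstream}..{branch}"]


def cherry_pick_args(commits: Sequence[str]) -> List[str]:
    """Cherry-pick the given commits in order, permitting empty commits."""
    # --allow-empty is an assumption about how empty commits were allowed.
    return ["git", "cherry-pick", "--allow-empty", *commits]


def cherry_pick_range(upstream: str, branch: str,
                      run: Callable = subprocess.run) -> List[str]:
    """List the commits, then cherry-pick them; returns the commit hashes."""
    listed = run(rev_list_args(upstream, branch),
                 check=True, capture_output=True, text=True)
    commits = listed.stdout.split()
    if commits:
        # A conflict makes git exit non-zero, so check=True raises here.
        run(cherry_pick_args(commits), check=True)
    return commits
```

Calling `cherry_pick_range("origin/main", "numina-main")` inside the repository would replay the branch. On a conflict the call raises `subprocess.CalledProcessError`; after resolving by hand, `git cherry-pick --continue` resumes the sequence.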


… index.rst. (verl-project#1539)

- [x] Search for similar PR(s).

- move dev folder to scripts @ETOgaosion
- add sandbox documentation to index.rst @chenhaiq
- installation docs have been updated

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none]

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.

…ject#1513)

- [x] Search for similar PR(s).

Fix head_dim in GQA models when loading from an HF checkpoint.

> Demonstrate the high-level design if this PR is complex.

- Change the acquisition methods of q and kv head_dim to be compatible with GQA.
- Add the conversions of q_layernorm and k_layernorm in convert_megatron_model_to_transformers_model for Qwen3.

> Demonstrate how the API changes if any.

> Provide usage example(s) for easier usage.

> For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

- **Issue Number**: Fixes issue verl-project#1510

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Signed-off-by: ShareLer <ShareLe@163.com>

- [x] Search for similar PR(s).

This PR adds support for a new truncation mode, **middle**, for loading datasets. It enables data that exceed the `max_prompt_length` to retain both the beginning and the end of the prompt, instead of truncating content only from the left or only from the right.

The implementation introduces a `"middle"` option, alongside the existing truncation modes, making changes in both `rl_dataset.py` and `torch_functional.py`. When selected, the logic splits the allowed max length roughly in half and keeps the head and tail of the sequence, effectively discarding the middle section.

**In `verl/utils/dataset/rl_dataset.py`:**

- Added support for `self.truncation == "middle"` at line ~233.
- Performs symmetric truncation from both ends of the prompt:

```python
elif self.truncation == "middle":
    left_half = raw_prompt_ids[: self.max_prompt_length // 2]
    right_half = raw_prompt_ids[-self.max_prompt_length // 2 :]
    raw_prompt_ids = left_half + right_half
```

**In `verl/utils/torch_functional.py`:**

- Added support for `"middle"` truncation mode in the `postprocess_data` function.
- Updated truncation assertion to include `"middle"`:

```python
assert truncation in ["left", "right", "middle", "error"]
```

- Implemented middle truncation logic:

```python
elif truncation == "middle":
    left_half = max_length // 2
    right_half = max_length - left_half
    input_ids = torch.cat([input_ids[:, :left_half], input_ids[:, -right_half:]], dim=-1)
    attention_mask = torch.cat([attention_mask[:, :left_half], attention_mask[:, -right_half:]], dim=-1)
```

- Adds `"middle"` as a valid option to the `truncation` argument in the API.

```python
from verl.utils.dataset.rl_dataset import RLDataset

rl_dataset = RLDataset(
    ...,  # other args
    truncation="middle"
)
```

This change aligns with precedents from long-context evaluation benchmarks, where *middle truncation* is the default/preferred method for handling overly long inputs:

- [LongBench implementation](https://github.com/THUDM/LongBench/blob/2e00731f8d0bff23dc4325161044d0ed8af94c1e/LongBench/pred.py#L56) ([paper](https://arxiv.org/pdf/2308.14508))
- [InfiniteBench implementation](https://github.com/OpenBMB/InfiniteBench/blob/51d9b37b0f1790ead936df2243abbf7f0420e439/src/eval_utils.py#L413) ([paper](https://arxiv.org/pdf/2402.13718))

Both benchmarks favor middle truncation for long inputs, as it better preserves relevant context information from both the beginning and end of the sequence.

- **Issue Number**: N/A (no linked issue yet)
- **Training**: None affected
- **Inference**: None affected

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.

---------

Co-authored-by: Wang Siyuan <v-siywang@microsoft.com>
Co-authored-by: Wang Siyuan <wsy0227@sjtu.edu.cn>
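
As a standalone illustration of the middle-truncation idea described in this commit, here is a minimal sketch on plain Python lists. The function name `truncate_middle` is hypothetical and this is not the verl code; it follows the `torch_functional.py` split (left half is `max_length // 2`, right half is the remainder) and adds a guard for the case where the right half is empty, since a `[-0:]` slice would otherwise return the whole sequence.

```python
from typing import List, Sequence


def truncate_middle(token_ids: Sequence[int], max_length: int) -> List[int]:
    """Keep the head and tail of `token_ids`, dropping the middle if too long."""
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    if len(token_ids) <= max_length:
        return list(token_ids)  # nothing to truncate
    left = max_length // 2       # tokens kept from the start
    right = max_length - left    # tokens kept from the end
    head = list(token_ids[:left])
    # Slice from an explicit start index so that right == 0 yields an empty tail.
    tail = list(token_ids[len(token_ids) - right:]) if right > 0 else []
    return head + tail


print(truncate_middle(list(range(10)), 5))  # [0, 1, 7, 8, 9]
```

With an odd `max_length` the extra token goes to the tail, so the result always has exactly `max_length` elements when truncation happens.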

…with CI tests, Fix vllm resharding process (verl-project#1444)

- [x] Search for similar PR(s).

1. This PR eliminates the micro-dp group as the article says, and supports different TP sizes for training and inference.
2. Side effect: able to run Qwen3moe on Megatron aligned with FSDP.
3. CI tests have been added to check the effect.

This PR eliminates the micro-dp group as the article says: since the `generate_sequence` process only involves the inference engine, there is no need to consider the training side. The only remaining problem is that the `dispatch/collect` function cannot directly use the inference parallel size, so the current solution defines a new `MEGATRON_ALL_DP` dispatch method that treats all ranks as data-parallel ranks, the same as FSDP. The data is therefore pre- and post-processed the same way as for FSDP.

Mainly in `megatron_vllm.py`

None

```sh
actor_rollout_ref.actor.megatron.tensor_model_parallel_size=2 \
actor_rollout_ref.rollout.tensor_model_parallel_size=4 \
actor_rollout_ref.actor.megatron.tensor_model_parallel_size=4 \
actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
```

Added CI tests. For an e2e test with Qwen 2.5 7B, please refer to `examples/grpo_trainer/run_qwen2_5-7b_math_megatron_diff_tp.sh`

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: Megatron
- **Inference**: vLLM

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
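
The idea of treating every rank as a data-parallel rank can be illustrated with a toy dispatch/collect pair. This is a sketch under assumptions, not verl's `MEGATRON_ALL_DP` implementation: the real dispatch works on verl's batch objects and worker groups, while the hypothetical helpers below split and rejoin a plain list.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def dispatch_all_dp(batch: Sequence[T], world_size: int) -> List[List[T]]:
    """Split a batch into equal contiguous chunks, one per rank."""
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    if len(batch) % world_size != 0:
        raise ValueError("batch size must be divisible by world_size")
    chunk = len(batch) // world_size
    # Rank r receives items [r * chunk, (r + 1) * chunk).
    return [list(batch[r * chunk:(r + 1) * chunk]) for r in range(world_size)]


def collect_all_dp(per_rank_outputs: Sequence[Sequence[T]]) -> List[T]:
    """Concatenate per-rank outputs back into one batch, in rank order."""
    return [item for output in per_rank_outputs for item in output]


shards = dispatch_all_dp(list(range(8)), world_size=4)
print(shards)                  # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(collect_all_dp(shards))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Because no rank is grouped by tensor-parallel size in this scheme, the split does not depend on whether training and rollout use the same TP degree.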


This PR now:

- Reverts changes by @marco-dossantos in verl/verl/protocol.py.
- Includes both extra info and turn info in the batch reward manager.
- Fixes a bug in `ray_trainer.py` that prevented multi-turn conversations from being detokenized correctly.
- Sets the timeout to None in the new async server to allow long waits for long token completions.
- Fixes a bug where the overlong-prompt filter was based on raw length rather than token length (I think token length is the better design choice).
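
The last point can be made concrete with a small sketch. It assumes that "pure length" means character count, and it uses a toy whitespace tokenizer in place of a real one; the helper names are hypothetical and this is not the code from the PR.

```python
from typing import Callable, List, Sequence


def whitespace_tokenize(text: str) -> List[str]:
    """Toy stand-in for a real tokenizer: one token per whitespace-separated word."""
    return text.split()


def filter_overlong_prompts(prompts: Sequence[str],
                            tokenize: Callable[[str], Sequence],
                            max_prompt_length: int) -> List[str]:
    """Keep prompts whose token count fits in `max_prompt_length`."""
    return [p for p in prompts if len(tokenize(p)) <= max_prompt_length]


prompts = ["a b c d", "supercalifragilistic"]

# Token-based filter: the long single word is 1 token and is kept.
print(filter_overlong_prompts(prompts, whitespace_tokenize, 2))
# ['supercalifragilistic']

# Character-based filter (the buggy behavior, as assumed here) keeps the other prompt.
print([p for p in prompts if len(p) <= 10])
# ['a b c d']
```

The two filters disagree on both prompts, which is why the choice matters when `max_prompt_length` is meant as a token budget.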


paulinebourigault and others added 30 commits July 1, 2025 15:39

negative positive token length to wandb

66550dd

negative positive token length

4a32905

update ppo/ray_trainer, ppo/reward

4f56054

simplify negative positive token length

6f3bd65

length penaly in reward function

de5a1fe

change default

fc2f34a

added length penalty option

fefce88

fix conflict

d475cae

pass at k grpo

34cb91d

add option to keep extra_info in batch before inference

73ce623

Revert "Track Negative and Positive Token Length"

6158a9a

avg + - token len

2f3d834

extra_info for validation

f645286

fix

f8770e0

fix

1acb5d8

small change

9703f72

change dataset dump for robustness, message parsing, and extra info

d13ba8c

Fix reinforce_plus_plus_baseline advantage mask (verl-project#1527)

ab01d9e

[chore] refactor: clean utils code. (verl-project#1290)

78f68c5

Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>

remove test

9617371

grpo pass@k

372705c

self imitation learning (#25)

1b1148e

Small Fixes

c744271

Disable SIL database when the flag is false (#30)

ee3efd3

Balance Negative Gradient (#31)

6f6af66

* first attempt * balance negative grad

thibautbar and others added 5 commits July 1, 2025 16:48

Fix self-imitation learning memory issues (#32)

ab6c47b

add numina specific metrics

5d4dcbf

Revert "add numina specific metrics"

7bab39f

This reverts commit dbd7d0c.

Numina metrics: pass@k and others (#34)

cb004f7

* add numina metrics function * fix * add pass@k * fixes and debug prints * fix * wip * add todo * wip * wip * wip * wip * wip * fix * clean * fix: put back pass@32 in val-core * clean * clean * clean * clean

Avoid KeyError when actor_rollout_ref.rollout.n > 8 and batchsize=1024 (#35)

b7261c7

* fix * wip * wip * wip * wip * clean * use_fast tok * debug prints * debug * add LRU max_cache_size as param for rollout * clean * clean * clean

desaxce force-pushed the numina-hugues branch 3 times, most recently from 31ea404 to 38d9222 Compare July 1, 2025 15:54

fix: imports in megatron_workers

0834a3c

desaxce force-pushed the numina-hugues branch from 38d9222 to 0834a3c Compare July 1, 2025 15:59

thibautbar and others added 2 commits July 1, 2025 18:20

fix serialization (#37)

ffae939

fix: remove duplicate advantage estimator

d5f3579

desaxce wants to merge 38 commits into main from numina-hugues


13 participants
