Cherry-picks of numina-main commits on a refreshed verl main #38

Open
desaxce wants to merge 38 commits into main from numina-hugues

@desaxce desaxce commented Jul 1, 2025

On a fresh main of verl, I did:

git cherry-pick $(git rev-list --reverse origin/main..numina-main)

  • Resolved conflicts where needed
  • Skipped all merge commits
  • Allowed empty commits
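
The steps above could be scripted roughly as follows. This is only a sketch, not the exact commands that were run: `--no-merges` is an assumed way to skip merge commits up front (the description does not say how they were skipped), `--allow-empty` only preserves commits that were already empty on the source branch, and the helper name `build_cherry_pick_commands` is hypothetical. Conflicts still have to be resolved by hand and followed by `git cherry-pick --continue`.

```python
def build_cherry_pick_commands(base, source):
    """Return two git argv lists: list the commits, then replay them.

    Hypothetical helper for illustration; it only builds the commands.
    """
    # Oldest-first list of commits reachable from `source` but not `base`,
    # leaving out merge commits.
    list_commits = ["git", "rev-list", "--reverse", "--no-merges", f"{base}..{source}"]
    # Replay the commits read from stdin, keeping initially empty ones.
    replay = ["git", "cherry-pick", "--allow-empty", "--stdin"]
    return [list_commits, replay]


commands = build_cherry_pick_commands("origin/main", "numina-main")
# The first command's output is meant to be piped into the second.
print(" | ".join(" ".join(argv) for argv in commands))
```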

paulinebourigault and others added 30 commits July 1, 2025 15:39
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
… index.rst. (verl-project#1539)

- [x] Search for similar PR(s).

- move dev folder to scripts @ETOgaosion
- add sandbox documentation to index.rst @chenhaiq
- installation docs have been updated

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
…ject#1513)

- [x] Search for similar PR(s).

Fix head_dim in GQA models when loading from an HF checkpoint.

> Demonstrate the high-level design if this PR is complex.

- Change how the q and kv head_dim are obtained so that they are
compatible with GQA.
- Add the conversions of q_layernorm and k_layernorm in
convert_megatron_model_to_transformers_model for Qwen3.

> Demonstrate how the API changes if any.

> Provide usage example(s) for easier usage.

```python
```

> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

- **Issue Number**: Fixes issue verl-project#1510

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Signed-off-by: ShareLer <ShareLe@163.com>
- [x] Search for similar PR(s).

This PR adds support for a new truncation mode, **middle**, for loading
datasets. It lets prompts that exceed `max_prompt_length` retain both
their beginning and their end, instead of truncating content only from
the left or only from the right.

The implementation introduces a `"middle"` option alongside the
existing truncation modes, with changes in both `rl_dataset.py` and
`torch_functional.py`. When selected, the logic splits the allowed max
length roughly in half and keeps the head and tail of the sequence,
discarding the middle section.

**In `verl/utils/dataset/rl_dataset.py`:**
- Added support for `self.truncation == "middle"` at line ~233.
- Keeps the head and the tail of the prompt and drops the middle:
```python
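elif self.truncation == "middle":
    # Keep the first max_prompt_length // 2 tokens.
    left_half = raw_prompt_ids[: self.max_prompt_length // 2]
    # -n // 2 parses as (-n) // 2, so the tail gets the extra token when n is odd.
    right_half = raw_prompt_ids[-self.max_prompt_length // 2 :]
    raw_prompt_ids = left_half + right_half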
```

**In `verl/utils/torch_functional.py`:**
- Added support for `"middle"` truncation mode in the `postprocess_data`
function.
- Updated truncation assertion to include `"middle"`:
```python
assert truncation in ["left", "right", "middle", "error"]
```
- Implemented middle truncation logic:
```python
elif truncation == "middle":
    left_half = max_length // 2
    right_half = max_length - left_half
    input_ids = torch.cat([input_ids[:, :left_half], input_ids[:, -right_half:]], dim=-1)
    attention_mask = torch.cat([attention_mask[:, :left_half], attention_mask[:, -right_half:]], dim=-1)
```
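
To make the head/tail split concrete, here is a minimal pure-Python sketch of the same idea on a plain list. The function name `middle_truncate` is hypothetical and not part of verl; it mirrors the `left_half` / `right_half` arithmetic used in `postprocess_data` and checks that the slicing used in `rl_dataset.py` keeps the same tokens.

```python
def middle_truncate(token_ids, max_length):
    """Keep the head and tail of an over-long sequence, dropping the middle."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if len(token_ids) <= max_length:
        return list(token_ids)  # nothing to truncate
    left = max_length // 2      # tokens kept from the start
    right = max_length - left   # tokens kept from the end (one more when odd)
    return list(token_ids[:left]) + list(token_ids[-right:])


ids = list(range(10))
print(middle_truncate(ids, 5))  # [0, 1, 7, 8, 9]
print(middle_truncate(ids, 4))  # [0, 1, 8, 9]

# The rl_dataset.py slices select the same tokens, because -n // 2 is (-n) // 2.
n = 5
assert ids[: n // 2] + ids[-n // 2 :] == middle_truncate(ids, n)
```

For an odd `max_length`, the extra token goes to the tail in both variants, so the output is exactly `max_length` tokens long.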

- Adds `"middle"` as a valid option to the `truncation` argument in the
API.

```python
from verl.utils.dataset.rl_dataset import RLDataset

rl_dataset = RLDataset(
    ...,  # other args
    truncation="middle"
)
```

This change aligns with precedents from long-context evaluation
benchmarks, where *middle truncation* is the default/preferred method
for handling overly long inputs:

- [LongBench
implementation](https://github.com/THUDM/LongBench/blob/2e00731f8d0bff23dc4325161044d0ed8af94c1e/LongBench/pred.py#L56)
([paper](https://arxiv.org/pdf/2308.14508))
- [InfiniteBench
implementation](https://github.com/OpenBMB/InfiniteBench/blob/51d9b37b0f1790ead936df2243abbf7f0420e439/src/eval_utils.py#L413)
([paper](https://arxiv.org/pdf/2402.13718))

Both benchmarks favor middle truncation for long inputs, as it better
preserves relevant context information from both the beginning and end
of the sequence.

- **Issue Number**: N/A (no linked issue yet)
- **Training**: None affected
- **Inference**: None affected

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.

---------

Co-authored-by: Wang Siyuan <v-siywang@microsoft.com>
Co-authored-by: Wang Siyuan <wsy0227@sjtu.edu.cn>
…with CI tests, Fix vllm resharding process (verl-project#1444)

- [x] Search for similar PR(s).

1. This PR eliminates the micro-dp group, as the article says, and
supports different train and inference TP sizes.
2. Side effect: Qwen3moe can now run on Megatron, aligned with FSDP.
3. CI tests have been added to check the effect.

This PR eliminates the micro-dp group, as the article says. Since the
`generate_sequence` process only involves the inference engine, there is
no need to consider the training side.

The only remaining problem is that the `dispatch/collect` function cannot
directly use the inference parallel size. The current solution is to
define a new `MEGATRON_ALL_DP` dispatch method that treats every rank as
a data-parallel rank, the same as FSDP.

We therefore follow FSDP's approach to pre- and post-process the data.
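
As a rough illustration of what "treating every rank as a data-parallel rank" means for dispatch and collect, the sketch below splits a batch into equal per-rank shards and concatenates the per-rank outputs again. It is not verl's implementation: the function names `dispatch_all_dp` and `collect_all_dp` are hypothetical, the batch is a plain list, and it assumes the batch size is divisible by the world size.

```python
def dispatch_all_dp(batch, world_size):
    """Split a batch into one contiguous, equally sized shard per rank."""
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    if len(batch) % world_size != 0:
        raise ValueError("batch size must be divisible by world_size")
    chunk = len(batch) // world_size
    return [batch[rank * chunk : (rank + 1) * chunk] for rank in range(world_size)]


def collect_all_dp(per_rank_outputs):
    """Concatenate per-rank outputs back into a single batch, in rank order."""
    return [item for output in per_rank_outputs for item in output]


shards = dispatch_all_dp(list(range(8)), world_size=4)
print(shards)                  # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(collect_all_dp(shards))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Because no rank is grouped by tensor-parallel size here, the split does not depend on whether training and rollout use the same TP size.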

Mainly in `megatron_vllm.py`

None

```sh
actor_rollout_ref.actor.megatron.tensor_model_parallel_size=2 \
actor_rollout_ref.rollout.tensor_model_parallel_size=4 \

actor_rollout_ref.actor.megatron.tensor_model_parallel_size=4 \
actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
```

Added CI tests.

For e2e test with Qwen 2.5 7B, please refer to
`examples/grpo_trainer/run_qwen2_5-7b_math_megatron_diff_tp.sh`

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: Megatron
- **Inference**: vLLM

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
This PR now:
- Reverts changes by @marco-dossantos in verl/verl/protocol.py
- Includes both extra info and turn info in the batch reward manager.
- Fixes a bug that prevented multi-turn conversations from being detokenized correctly in `ray_trainer.py`
- Sets the timeout to None in the new async server to allow long waits for long token completions.
- Fixes a bug where the overlong-prompt filter was based on raw length rather than token length (I think token length is the better design choice)
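
The difference between the two filtering criteria can be sketched as follows. This is an illustration only: `filter_overlong_prompts` and `whitespace_tokenize` are hypothetical names, the whitespace tokenizer stands in for the real model tokenizer, and "raw length" is assumed here to mean the character count of the prompt string.

```python
def whitespace_tokenize(text):
    """Toy tokenizer used only for this example: split on whitespace."""
    return text.split()


def filter_overlong_prompts(prompts, tokenize, max_prompt_length):
    """Keep prompts whose token count fits within max_prompt_length."""
    return [p for p in prompts if len(tokenize(p)) <= max_prompt_length]


prompts = ["short prompt", "a much longer prompt with many words"]

# Token-based filter: 2 tokens pass, 7 tokens do not.
print(filter_overlong_prompts(prompts, whitespace_tokenize, max_prompt_length=4))
# ['short prompt']

# A character-based filter with the same limit would reject both prompts.
print([p for p in prompts if len(p) <= 4])
# []
```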
* first attempt

* balance negative grad
thibautbar and others added 5 commits July 1, 2025 16:48
* add numina metrics function

* fix

* add pass@k

* fixes and debug prints

* fix

* wip

* add todo

* wip

* wip

* wip

* wip

* wip

* fix

* clean

* fix: put back pass@32 in val-core

* clean

* clean

* clean

* clean
#35)

* fix

* wip

* wip

* wip

* wip

* clean

* use_fast tok

* debug prints

* debug

* add LRU max_cache_size as param for rollout

* clean

* clean

* clean
desaxce force-pushed the numina-hugues branch 3 times, most recently from 31ea404 to 38d9222 on July 1, 2025 15:54