Cherry-picks of numina-main commits on a refreshed verl main #38
Open
desaxce wants to merge 38 commits into
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
… index.rst. (verl-project#1539)
- [x] Search for similar PR(s).
- move dev folder to scripts @ETOgaosion
- add sandbox documentation to index.rst @chenhaiq
- installation docs have been updated
- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none]
- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
…ject#1513)
- [x] Search for similar PR(s).
Fix head_dim in GQA models when loading from an HF checkpoint:
- Change how the q and kv head_dim are obtained so that they are compatible with GQA.
- Add the conversions of q_layernorm and k_layernorm in convert_megatron_model_to_transformers_model for Qwen3.
- **Issue Number**: Fixes issue verl-project#1510
- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
---------
Signed-off-by: ShareLer <ShareLe@163.com>
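The commit above only says that the way q and kv head_dim are obtained was changed for GQA. As a hedged illustration of why this matters, the sketch below computes head sizes from a Hugging Face-style config (attribute names `hidden_size`, `num_attention_heads`, `num_key_value_heads`, plus an optional `head_dim`, which Qwen3 configs set explicitly) and splits a fused QKV weight, represented here as a list of row indices, into q, k and v. The per-group layout (each KV group holds its query heads, then one key head, then one value head) and the helper names `get_head_dim` and `split_grouped_qkv` are assumptions for illustration, not the code from verl-project#1513.

```python
from types import SimpleNamespace


def get_head_dim(config):
    """Return the per-head dimension, preferring an explicit `head_dim` attribute."""
    head_dim = getattr(config, "head_dim", None)
    if head_dim is None:
        # Fallback that is only valid when the heads exactly tile hidden_size.
        head_dim = config.hidden_size // config.num_attention_heads
    return head_dim


def split_grouped_qkv(rows, config):
    """Split fused QKV rows into (q, k, v) for a GQA model (hypothetical helper).

    Assumed layout: one block per KV group, each holding that group's query
    heads followed by one key head and one value head.
    """
    head_dim = get_head_dim(config)
    num_groups = config.num_key_value_heads
    q_per_group = config.num_attention_heads // num_groups
    q_rows = q_per_group * head_dim
    group_rows = q_rows + 2 * head_dim
    assert len(rows) == num_groups * group_rows, "unexpected fused QKV size"

    q, k, v = [], [], []
    for g in range(num_groups):
        block = rows[g * group_rows:(g + 1) * group_rows]
        q.extend(block[:q_rows])
        k.extend(block[q_rows:q_rows + head_dim])
        v.extend(block[q_rows + head_dim:])
    return q, k, v


# Hypothetical toy config in which head_dim != hidden_size // num_attention_heads.
cfg = SimpleNamespace(hidden_size=8, num_attention_heads=4, num_key_value_heads=2, head_dim=4)
q, k, v = split_grouped_qkv(list(range(32)), cfg)
print(len(q), len(k), len(v))  # 16 8 8
```

With GQA, k and v each get `num_key_value_heads * head_dim` rows while q gets `num_attention_heads * head_dim`, so reusing the query head count for k/v, or deriving `head_dim` from `hidden_size` when the config sets it independently, shifts every slice boundary.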
- [x] Search for similar PR(s).
This PR adds support for a new truncation mode, **middle**, for loading
datasets. It lets prompts that exceed `max_prompt_length` retain
both their beginning and their end, instead of truncating
content only from the left or only from the right.
The implementation introduces a `"middle"` option alongside the
existing truncation modes, with changes in both `rl_dataset.py` and
`torch_functional.py`. When selected, the logic splits the allowed maximum
length in half (the tail gets the extra token when the limit is odd),
keeps the head and tail of the sequence, and discards the middle section.
**In `verl/utils/dataset/rl_dataset.py`:**
- Added support for `self.truncation == "middle"` at line ~233.
- Keeps the head and tail of the prompt and drops the middle:
```python
elif self.truncation == "middle":
    left_half = raw_prompt_ids[: self.max_prompt_length // 2]
    right_half = raw_prompt_ids[-self.max_prompt_length // 2 :]
    raw_prompt_ids = left_half + right_half
```
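A quick check of the slicing above: Python parses `-self.max_prompt_length // 2` as `(-self.max_prompt_length) // 2`, and floor division rounds toward negative infinity, so for an odd limit the tail keeps the extra id and the result still has exactly `max_prompt_length` ids, the same split as the tensor version in `torch_functional.py`. The standalone `middle_truncate` helper below is hypothetical; its early return for short prompts and its positive-limit check are assumptions of the sketch (on a short prompt the two slices would overlap and duplicate ids, and with a limit of 0, `-0 // 2` is 0, so the tail slice would keep the whole list).

```python
def middle_truncate(ids, max_len):
    """Keep the head and tail of `ids`, dropping the middle (hypothetical helper)."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if len(ids) <= max_len:
        # Short enough: return unchanged instead of slicing overlapping halves.
        return list(ids)
    head = ids[: max_len // 2]
    # -max_len // 2 == (-max_len) // 2 == -ceil(max_len / 2)
    tail = ids[-max_len // 2 :]
    return head + tail


ids = list(range(10))
print(middle_truncate(ids, 4))  # [0, 1, 8, 9]
print(middle_truncate(ids, 5))  # [0, 1, 7, 8, 9]
```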
**In `verl/utils/torch_functional.py`:**
- Added support for `"middle"` truncation mode in the `postprocess_data`
function.
- Updated truncation assertion to include `"middle"`:
```python
assert truncation in ["left", "right", "middle", "error"]
```
- Implemented middle truncation logic:
```python
elif truncation == "middle":
    left_half = max_length // 2
    right_half = max_length - left_half
    input_ids = torch.cat([input_ids[:, :left_half], input_ids[:, -right_half:]], dim=-1)
    attention_mask = torch.cat([attention_mask[:, :left_half], attention_mask[:, -right_half:]], dim=-1)
```
- Adds `"middle"` as a valid option to the `truncation` argument in the
API.
```python
from verl.utils.dataset.rl_dataset import RLDataset
rl_dataset = RLDataset(
    ...,  # other args
    truncation="middle"
)
```
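The batch-level truncation branch can also be mirrored on plain Python lists to see what each mode keeps. The `truncate_batch` helper below is a hypothetical sketch, not verl's `postprocess_data`: it treats a batch as a list of equal-length rows, only handles sequences longer than `max_length` (padding is out of scope), and raises `ValueError` for `"error"`, which may differ from what verl raises. In this sketch `"left"` keeps the last `max_length` tokens, `"right"` keeps the first, and `"middle"` uses the same `left_half`/`right_half` split as the tensor code in `torch_functional.py`.

```python
def truncate_batch(input_ids, attention_mask, max_length, truncation="error"):
    """List-based sketch of batch truncation (hypothetical helper).

    input_ids and attention_mask are lists of equal-length rows.
    """
    assert truncation in ["left", "right", "middle", "error"]
    assert max_length > 0
    seq_len = len(input_ids[0])
    if seq_len <= max_length:
        return input_ids, attention_mask  # nothing to truncate (padding not shown)
    if truncation == "error":
        raise ValueError(f"sequence length {seq_len} exceeds max_length {max_length}")

    def cut(row):
        if truncation == "left":
            return row[-max_length:]  # drop tokens on the left, keep the tail
        if truncation == "right":
            return row[:max_length]  # drop tokens on the right, keep the head
        left_half = max_length // 2  # "middle": keep head and tail
        right_half = max_length - left_half
        return row[:left_half] + row[-right_half:]

    # Cut ids and mask identically so they stay aligned.
    return [cut(r) for r in input_ids], [cut(r) for r in attention_mask]


ids = [[1, 2, 3, 4, 5, 6, 7]]
mask = [[0, 1, 1, 1, 1, 1, 1]]
print(truncate_batch(ids, mask, 4, "middle"))  # ([[1, 2, 6, 7]], [[0, 1, 1, 1]])
```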
This change aligns with precedents from long-context evaluation
benchmarks, where *middle truncation* is the default/preferred method
for handling overly long inputs:
- [LongBench
implementation](https://github.com/THUDM/LongBench/blob/2e00731f8d0bff23dc4325161044d0ed8af94c1e/LongBench/pred.py#L56)
([paper](https://arxiv.org/pdf/2308.14508))
- [InfiniteBench
implementation](https://github.com/OpenBMB/InfiniteBench/blob/51d9b37b0f1790ead936df2243abbf7f0420e439/src/eval_utils.py#L413)
([paper](https://arxiv.org/pdf/2402.13718))
Both benchmarks favor middle truncation for long inputs, as it better
preserves relevant context information from both the beginning and end
of the sequence.
- **Issue Number**: N/A (no linked issue yet)
- **Training**: None affected
- **Inference**: None affected
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
---------
Co-authored-by: Wang Siyuan <v-siywang@microsoft.com>
Co-authored-by: Wang Siyuan <wsy0227@sjtu.edu.cn>
…with CI tests, Fix vllm resharding process (verl-project#1444)
- [x] Search for similar PR(s).
1. This PR eliminates the micro-dp group, as the article says, and supports different TP sizes for training and inference.
2. Side effect: Qwen3 MoE can run on Megatron, aligned with FSDP.
3. CI tests have been added to check the effect.
This PR eliminates the micro-dp group as the article says: since the `generate_sequence` process only involves the inference engine, there is no need to consider the training side. The remaining problem is that the `dispatch/collect` functions cannot directly use the inference parallel size, so the current solution defines a new `MEGATRON_ALL_DP` dispatch method that views all ranks as data-parallel ranks, the same as FSDP. The data is therefore pre/post-processed the same way as in FSDP.
Specific changes: mainly in `megatron_vllm.py`.
API changes: None.
Usage example (first pair: training TP 2 with rollout TP 4; second pair: training TP 4 with rollout TP 2):
```sh
actor_rollout_ref.actor.megatron.tensor_model_parallel_size=2 \
actor_rollout_ref.rollout.tensor_model_parallel_size=4 \
actor_rollout_ref.actor.megatron.tensor_model_parallel_size=4 \
actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
```
Test: added CI tests. For an e2e test with Qwen 2.5 7B, please refer to `examples/grpo_trainer/run_qwen2_5-7b_math_megatron_diff_tp.sh`.
- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: Megatron
- **Inference**: vLLM
- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
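The `MEGATRON_ALL_DP` idea can be simulated in a single process to see why training TP stops mattering. The sketch below is a hypothetical illustration, not verl's dispatch or sharding-manager API; it assumes consecutive ranks form one inference TP group, that the batch divides evenly across ranks, and that the engine returns one output per input in order. Every rank receives its own data-parallel chunk; before generation the chunks of a TP group are all-gathered so all TP ranks feed identical inputs to the engine, and afterwards each rank keeps only its own slice.

```python
def dispatch_all_dp(batch, world_size):
    """Treat every rank as a data-parallel rank: rank r gets the r-th equal chunk."""
    assert len(batch) % world_size == 0, "sketch assumes an evenly divisible batch"
    size = len(batch) // world_size
    return [batch[r * size:(r + 1) * size] for r in range(world_size)]


def tp_group(rank, tp_size):
    """Ranks sharing an inference TP group (assumed to be consecutive ranks)."""
    start = (rank // tp_size) * tp_size
    return list(range(start, start + tp_size))


def rollout_all_dp(per_rank_data, tp_size, generate):
    """Simulate pre-process (all-gather in TP group), generate, post-process (re-split)."""
    assert len(per_rank_data) % tp_size == 0
    outputs = []
    for rank in range(len(per_rank_data)):
        # Pre-process: every rank in the TP group sees the group's full input.
        gathered = [x for r in tp_group(rank, tp_size) for x in per_rank_data[r]]
        generated = generate(gathered)
        # Post-process: keep only this rank's slice of the generated outputs.
        tp_rank = rank % tp_size
        size = len(generated) // tp_size
        outputs.append(generated[tp_rank * size:(tp_rank + 1) * size])
    return outputs


def collect_all_dp(per_rank_outputs):
    """Concatenate per-rank outputs in rank order."""
    return [x for out in per_rank_outputs for x in out]


def times_ten(xs):
    return [x * 10 for x in xs]


shards = dispatch_all_dp(list(range(8)), world_size=4)
print(collect_all_dp(rollout_all_dp(shards, tp_size=2, generate=times_ten)))
# [0, 10, 20, 30, 40, 50, 60, 70]
```

Because only the inference TP size appears in the pre/post-processing, the same dispatch works whatever TP size the Megatron actor uses for training.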
This PR now:
- Reverts changes by @marco-dossantos in verl/verl/protocol.py.
- Includes both extra info and turn info in the batch reward manager.
- Fixes a bug in `ray_trainer.py` that prevented multi-turn conversations from being detokenized correctly.
- Sets the timeout to None in the new async server to allow long waits for long token completions.
- Fixes a bug where overlong prompts were filtered by raw string length rather than token length (I think token length is the better design choice).
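A small illustration of why the overlong-prompt filter should count tokens rather than characters. The `filter_overlong` helper and the whitespace "tokenizer" are hypothetical stand-ins; with a real Hugging Face tokenizer one would pass something like `tokenizer.encode` as the `tokenize` argument instead.

```python
def filter_overlong(prompts, tokenize, max_prompt_length):
    """Keep prompts whose token count fits within max_prompt_length (hypothetical helper)."""
    return [p for p in prompts if len(tokenize(p)) <= max_prompt_length]


prompts = ["a b c", "supercalifragilistic"]
# Toy whitespace tokenizer: "a b c" -> 3 tokens, "supercalifragilistic" -> 1 token.
print(filter_overlong(prompts, str.split, 3))  # ['a b c', 'supercalifragilistic']
# A character-length filter with the same limit drops both (5 and 20 characters).
print([p for p in prompts if len(p) <= 3])  # []
```

The two filters disagree in both directions, so a character-based limit can drop prompts that fit the model's context and keep prompts that do not.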
* first attempt
* balance negative grad
This reverts commit dbd7d0c.
* add numina metrics function
* fix
* add pass@k
* fixes and debug prints
* fix
* wip
* add todo
* wip
* wip
* wip
* wip
* wip
* fix
* clean
* fix: put back pass@32 in val-core
* clean
* clean
* clean
* clean
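The commits do not say how pass@k is computed. One widely used definition is the unbiased estimator from the Codex paper (Chen et al., 2021): with n samples for a problem, c of them correct, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. The sketch below implements that formula with `math.comb`; whether the numina metrics function uses this estimator is an assumption, and both function names are hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k for one problem: n samples, c correct, budget k <= n."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """Average pass@k over problems, given one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


print(pass_at_k(4, 1, 1))  # 0.25
```

For k = 1 the formula reduces to c / n; larger k (such as the pass@32 reported in val-core) needs at least k samples per problem.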
#35)
* fix
* wip
* wip
* wip
* wip
* clean
* use_fast tok
* debug prints
* debug
* add LRU max_cache_size as param for rollout
* clean
* clean
* clean
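The commit only says that an LRU `max_cache_size` was exposed as a rollout parameter; it does not say what is cached. As a hedged illustration of what such a parameter controls, here is a minimal bounded LRU cache built on `collections.OrderedDict`; the `LRUCache` class is hypothetical and not the rollout code. For memoizing a function, the standard library's `functools.lru_cache(maxsize=...)` applies the same eviction policy.

```python
from collections import OrderedDict


class LRUCache:
    """Bounded cache that evicts the least recently used entry (illustrative)."""

    def __init__(self, max_cache_size):
        if max_cache_size <= 0:
            raise ValueError("max_cache_size must be positive")
        self.max_cache_size = max_cache_size
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.max_cache_size:
            self._data.popitem(last=False)  # evict the least recently used entry


cache = LRUCache(max_cache_size=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")  # touch "a" so "b" becomes least recently used
cache.put("c", 3)  # exceeds the limit and evicts "b"
print(cache.get("b"))  # None
```

A larger `max_cache_size` trades memory for fewer recomputations; the right value depends on what the rollout actually caches.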
31ea404 to 38d9222
On a fresh `main` of verl, I ran `git cherry-pick $(git rev-list --reverse origin/main..numina-main)`, which replays, oldest first, every commit on numina-main that is not on origin/main.