
Update torch-memory-saver requirement from >=0.0.5 to >=0.0.9.post1#5

Closed
dependabot[bot] wants to merge 499 commits into master from dependabot/pip/torch-memory-saver-gte-0.0.9.post1

dependabot[bot] commented on behalf of GitHub on May 14, 2026

Updates the requirements on torch-memory-saver to permit the latest version.

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

vermouth1992 and others added 30 commits March 6, 2025 22:05
- Add allgather method to dataproto
- Add tests
- Replace existing raw allgather with this function
This PR solves the following two problems.

1. Last step skipped

Incrementing `self.global_steps += 1` before the check `if self.global_steps >= self.total_training_steps` causes the last step to be skipped.

We start from step 1 and expect `self.total_training_steps` steps in total.


https://github.com/volcengine/verl/blob/82b38e25c72e1b6de7d7d2092af6e1ed5dd2a400/verl/trainer/ppo/ray_trainer.py#L999-L1001

   When `self.global_steps == self.total_training_steps - 1`:

   * we have executed only `self.total_training_steps - 1` steps;
   * `self.global_steps` is updated to `self.total_training_steps`;
   * `self.global_steps >= self.total_training_steps` is satisfied, and training ends.

   Therefore, `self.global_steps += 1` should be moved to the end of the loop body.

2. Redundant validation and logging

If `self.total_training_steps % self.config.trainer.test_freq == 0`:

   * `self._validate()` is executed twice:
     1. https://github.com/volcengine/verl/blob/82b38e25c72e1b6de7d7d2092af6e1ed5dd2a400/verl/trainer/ppo/ray_trainer.py#L984
     2. https://github.com/volcengine/verl/blob/82b38e25c72e1b6de7d7d2092af6e1ed5dd2a400/verl/trainer/ppo/ray_trainer.py#L1005

   * logging is also executed twice:
     1. https://github.com/volcengine/verl/blob/82b38e25c72e1b6de7d7d2092af6e1ed5dd2a400/verl/trainer/ppo/ray_trainer.py#L985 and https://github.com/volcengine/verl/blob/82b38e25c72e1b6de7d7d2092af6e1ed5dd2a400/verl/trainer/ppo/ray_trainer.py#L997
     2. https://github.com/volcengine/verl/blob/82b38e25c72e1b6de7d7d2092af6e1ed5dd2a400/verl/trainer/ppo/ray_trainer.py#L1007
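The loop-ordering issues above can be reproduced with a simplified, self-contained model. This is a sketch, not the actual code in `ray_trainer.py`: `buggy_loop` and `fixed_loop` are hypothetical names, and each appended step stands in for a full PPO update. The fixed version checks for the last step first, validates at most once per step, and increments the counter last.

```python
from typing import List, Tuple


def buggy_loop(total_training_steps: int) -> List[int]:
    """Increment before the termination check, as in the original ordering."""
    executed = []
    global_steps = 1  # training starts from step 1
    while True:
        executed.append(global_steps)  # stand-in for one PPO update
        global_steps += 1
        if global_steps >= total_training_steps:
            break
    return executed


def fixed_loop(total_training_steps: int, test_freq: int) -> Tuple[List[int], List[int]]:
    """Decide whether this is the last step, validate once, then increment."""
    executed, validated = [], []
    global_steps = 1
    while True:
        executed.append(global_steps)  # stand-in for one PPO update
        is_last_step = global_steps >= total_training_steps
        # A single validation site covers both the periodic and the final
        # validation, so a last step that is a multiple of test_freq runs it once.
        if global_steps % test_freq == 0 or is_last_step:
            validated.append(global_steps)
        if is_last_step:
            break
        global_steps += 1  # moved to the end of the loop body
    return executed, validated


print(buggy_loop(4))     # [1, 2, 3]  (step 4 is never executed)
print(fixed_loop(4, 2))  # ([1, 2, 3, 4], [2, 4])
```

In this model, validating through one condition (`periodic or last step`) is what removes the duplicate call; the same idea applies to the duplicated logging.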
…llelism with multiple bug fixed (#495)

This PR combines multiple modifications.

# Qwen2.5 checkpoint saver bug fix

Thanks to @uygnef for the work contributed in #368; we use the new saver as the model loader and saver for 3D-parallelism support.

# Megatron backend 3D-parallelism test benches

We modified the scripts in `examples/ppo_trainer` and `tests/e2e`, as well as the CI workflows; all of them have been tested.

# Bug Fix for 3D-parallelism

This includes fixes for configuration bugs as well as for module packing.

The original TP `VocabParallelEntropy` could lead to CUDA OOM, so we refactored the implementation to use `torch.bmm`.

# Full migration to Megatron Core

verl now uses only Megatron Core and no longer calls other components. If any of them are needed, please integrate them into `utils/megatron`.

---------

Co-authored-by: uygnef <admin@fengyu.org>
# Background

In RLHFDataset, we filter out prompts that are too long. This requires applying `apply_chat_template` to the whole dataset, which does not scale when the dataset is large.
https://github.com/volcengine/verl/blob/main/verl/utils/dataset/rl_dataset.py#L132

Instead of filtering online, we probably want to move this process offline and either add an assertion to avoid truncation or simply perform truncation.

Reference: #502 

# Key Changes

- Add an option `data.filter_overlong_prompts=True` to enable the above data filtering. The default value is False, but we enable it in all the example scripts.
- Add an option `data.truncation` to truncate the input_ids or prompt if its length exceeds `max_prompt_length`. The default is `'error'`, which does not allow `max_prompt_length` to be exceeded; if the error is raised, increase `max_prompt_length`. You can also set it to `left` or `right`.
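The two options can be sketched on plain lists of token ids. This is a minimal illustration under stated assumptions, not the `RLHFDataset` implementation: `filter_overlong_prompts` and `truncate_prompt` are hypothetical helpers, and `left` is assumed to remove tokens from the start of the prompt (keeping the end) while `right` removes them from the end.

```python
from typing import List, Sequence


def filter_overlong_prompts(prompts: Sequence[Sequence[int]], max_prompt_length: int) -> List[List[int]]:
    """Drop prompts longer than max_prompt_length (the filtering behavior)."""
    return [list(p) for p in prompts if len(p) <= max_prompt_length]


def truncate_prompt(input_ids: Sequence[int], max_prompt_length: int, truncation: str = "error") -> List[int]:
    """Apply one truncation mode ('error', 'left', 'right') to a single prompt."""
    length = len(input_ids)
    if length <= max_prompt_length:
        return list(input_ids)
    if truncation == "left":
        # Remove tokens from the left; keep the last max_prompt_length tokens.
        return list(input_ids[length - max_prompt_length:])
    if truncation == "right":
        # Remove tokens from the right; keep the first max_prompt_length tokens.
        return list(input_ids[:max_prompt_length])
    if truncation == "error":
        raise ValueError(
            f"prompt length {length} exceeds max_prompt_length={max_prompt_length}; "
            "increase max_prompt_length or use 'left'/'right' truncation"
        )
    raise ValueError(f"unknown truncation mode: {truncation!r}")


print(filter_overlong_prompts([[1, 2], [1, 2, 3, 4]], 3))  # [[1, 2]]
print(truncate_prompt([1, 2, 3, 4, 5], 3, "left"))         # [3, 4, 5]
```

Slicing with `length - max_prompt_length` rather than `-max_prompt_length` keeps the `left` case correct even when `max_prompt_length` is 0.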

### Suggestion for large-scale datasets
For large-scale datasets, filtering overlong prompts can be time-consuming. You should set `data.filter_overlong_prompts=False` and `data.truncation='left'`. Also note that you should increase `data.max_prompt_length` to avoid over-truncating the prompts.
…469)

`sympy.simplify` may time out while searching for an appropriate simplification algorithm, and `ProcessPool` may get stuck due to excessive concurrency, so the timeout mechanism in `verl/verl/workers/reward_manager/prime.py` cannot catch the timeout. To address this, a timeout check for `sympy.simplify` is added to `verl/verl/utils/reward_score/prime_math/__init__.py`.
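One way to bound a single call such as `sympy.simplify` is a `signal.SIGALRM` guard from the standard library. This is a sketch of that general technique, not necessarily what the PR implements; `time_limit`, `call_with_timeout`, and `SimplifyTimeout` are hypothetical names. It assumes Unix and the main thread, and it can only interrupt code that returns control to the Python interpreter.

```python
import signal
import time
from contextlib import contextmanager


class SimplifyTimeout(Exception):
    """Raised when a guarded call exceeds its time budget."""


@contextmanager
def time_limit(seconds: int):
    """Raise SimplifyTimeout if the body runs longer than `seconds` (Unix, main thread only)."""

    def _on_alarm(signum, frame):
        raise SimplifyTimeout(f"call exceeded {seconds}s")

    previous_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)  # schedule SIGALRM; whole seconds only
    try:
        yield
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous_handler)


def call_with_timeout(fn, *args, seconds: int = 5, default=None):
    """Run fn(*args); return `default` instead of hanging past the time budget."""
    try:
        with time_limit(seconds):
            return fn(*args)
    except SimplifyTimeout:
        return default


# In a reward function this might look like:
#     call_with_timeout(sympy.simplify, expr, seconds=5, default=expr)
print(call_with_timeout(lambda: time.sleep(3), seconds=1, default="timed out"))  # timed out
```

Because the guard lives inside the scoring function itself, it fires even when an outer pool-level timeout never gets a chance to run.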
- [x] Add concurrency to workflows to cancel previous workflows when new
commit is pushed to the same branch.
- [ ] Cancel all workflows/jobs from the same commit if any fails? (Not
sure whether we really need it)

Note: we leave out `secrets_scan.yml` and `scorecard.yml` to avoid any possible leakage or security risk; these two workflows are also cheap to run.
Current bugs when HSDP is enabled:
- **Incorrect Division in Batch Sizes**
  - `ppo_micro_batch`, `ppo_minibatch`, etc. should be divided by `self.device_mesh.size()` instead of `self.device_mesh.shape[0]`.
- **Improper Weight Initialization** in `get_init_weight_context_manager`
  - The `get_init_weight_context_manager` function must initialize weights (rather than creating empty ones) only on `local_rank == 0` within each FSDP mesh.
  - When `sync_module_states=True`, PyTorch's FSDP first broadcasts parameters within the FSDP process group and then within the DDP process group. If weights are not initialized correctly on `local_rank == 0` of each FSDP mesh, the synchronization may fail or produce incorrect results.
    https://github.com/pytorch/pytorch/blob/3f069e7679588d5ee4b1d5b2492ca0e20f9320b5/torch/distributed/fsdp/_init_utils.py#L614-L621
  - Ensure initialization occurs only when `self.device_mesh.get_coordinate()[-1] == 0`, which corresponds to `local_rank == 0` within each FSDP mesh.
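Both fixes can be illustrated without GPUs by modeling the 2D HSDP mesh as a shape tuple. This is a pure-Python sketch, not verl or PyTorch code: the helper names are hypothetical, the `(2, 4)` mesh is an assumed example (2 replica groups by 4-way FSDP sharding), and `mesh_coordinate` mimics `DeviceMesh.get_coordinate()` only for a mesh laid out as `range(world_size)` in row-major order.

```python
from math import prod
from typing import Sequence, Tuple


def per_rank_batch_size(global_batch_size: int, mesh_shape: Sequence[int]) -> int:
    """Split a global batch over every rank in the mesh (the analogue of device_mesh.size())."""
    world_size = prod(mesh_shape)
    if global_batch_size % world_size != 0:
        raise ValueError(f"{global_batch_size} is not divisible by world size {world_size}")
    return global_batch_size // world_size


def mesh_coordinate(rank: int, mesh_shape: Sequence[int]) -> Tuple[int, ...]:
    """Row-major coordinate of `rank` in a mesh laid out as range(world_size)."""
    coord = []
    for dim in reversed(mesh_shape):
        coord.append(rank % dim)
        rank //= dim
    return tuple(reversed(coord))


def materializes_weights(rank: int, mesh_shape: Sequence[int]) -> bool:
    """True for local rank 0 of each FSDP group (last mesh dimension)."""
    return mesh_coordinate(rank, mesh_shape)[-1] == 0


mesh_shape = (2, 4)  # assumed example: 2 DDP replicas x 4-way FSDP
print(per_rank_batch_size(256, mesh_shape))  # 32; dividing by shape[0] would give 128
print([r for r in range(prod(mesh_shape)) if materializes_weights(r, mesh_shape)])  # [0, 4]
```

Ranks 0 and 4 are the first rank of each FSDP group, so they are the ones that hold real weights before `sync_module_states` broadcasts them.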
[bugfix] Fix position embedding processing for Qwen2.5-VL

In the `RLHFDataset.__getitem__` method, a bug was identified in how
multimodal position IDs (3D in Qwen2.5-VL) are determined. Previously,
the code checked for `self.image_key in row_dict` to decide whether to
use multimodal position IDs. However, since `self.image_key` is popped
from `row_dict` during image token expansion, this check incorrectly
fails for subsequent operations.

This causes the VL model to use incorrect position IDs, resulting in
significant performance degradation.



The fix introduces an explicit `is_multi_modal` flag to properly track
multimodal content throughout the processing pipeline.
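The essence of the fix is to record whether a row is multimodal before the image key is popped. The sketch below is a simplified stand-in for `RLHFDataset.__getitem__`: `build_item` is a hypothetical function, `"images"` is an assumed default key, and `num_images` is a placeholder for the real image-token expansion.

```python
from typing import Any, Dict


def build_item(row_dict: Dict[str, Any], image_key: str = "images") -> Dict[str, Any]:
    # Record multimodality once, before image_key is popped from row_dict.
    is_multi_modal = image_key in row_dict
    if is_multi_modal:
        images = row_dict.pop(image_key)  # consumed by image-token expansion
        row_dict["num_images"] = len(images)  # placeholder for the real expansion
    # A later check of `image_key in row_dict` would be False here for every
    # multimodal row; the saved flag still selects the right position IDs.
    row_dict["position_ids_kind"] = "3d" if is_multi_modal else "1d"
    return row_dict


print(build_item({"prompt": "describe", "images": ["img0"]})["position_ids_kind"])  # 3d
```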

Co-authored-by: songyifan <songyifan3@xiaomi.com>
Refactor and merge PRIME algorithm into verl/main
https://github.com/PRIME-RL/PRIME

Breaking changes:
`trainer.fsdp_config.min_num_params` has moved to `trainer.fsdp_config.wrap_policy.min_num_params`.
1. add [PRIME](https://arxiv.org/abs/2502.01456) to README.md
2. slightly change the example script to align with the paper
This PR removes several unnecessary `empty_cache` calls to improve efficiency.

Credit to @PeterSH6
Urgently update megatron core_r0.11.0 documentation.
# Description

verl-project/verl#287,
verl-project/verl#295.
This PR introduces support for
[Math-Verify](https://github.com/huggingface/Math-Verify) as a new
rule-based reward scorer, significantly improving evaluation accuracy.

# Key changes

- Added `math-verify` to the installation dependencies.
- Introduced `reward_score/math_verify.py` and updated
`reward_score/__init__.py`.

# Test

Comparison between the existing scorer in math.py and the newly added
`math_verify.py`, using Qwen2.5-Math-7B-Instruct:

```
# Use scorer in math.py (original)
{'val/test_score/DigitalLearningGmbH/MATH-lighteval': 0.803}

# Use scorer in math_verify.py (newly added)
{'val/test_score/DigitalLearningGmbH/MATH-lighteval': 0.8338}
```

Test scripts:

```bash
set -x

# Data Process
python examples/data_preprocess/math_dataset.py --local_dir /workspace/datasets/math

# Evaluation
export CUDA_VISIBLE_DEVICES=4,5,6,7
export VLLM_ATTENTION_BACKEND=XFORMERS

math_train_path=/workspace/datasets/math/train.parquet
math_test_path=/workspace/datasets/math/test.parquet

python3 -m verl.trainer.main_ppo \
    data.train_files="$math_train_path" \
    data.val_files="$math_test_path" \
    data.max_prompt_length=2048 \
    data.max_response_length=2048 \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-Math-7B-Instruct \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.n=1 \
    actor_rollout_ref.rollout.temperature=0 \
    trainer.logger=['console'] \
    trainer.project_name='test-math-verify' \
    trainer.experiment_name='test-math-verify' \
    +trainer.val_before_train=True \
    trainer.n_gpus_per_node=4 \
    trainer.nnodes=1 \
    trainer.total_epochs=0 \
    data.train_batch_size=1024 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
    algorithm.adv_estimator=grpo "$@"
```
… directly (#543)

As we're moving to vllm>=0.7.3, we should remove `verl/third_party` completely in the future.
Currently, greedy sampling is used in the validation stage. However, for some reasoning tasks we may need to generate n samples per prompt and average their scores.

In this PR, we support non-greedy sampling parameters during validation, specified via `val_kwargs` in the `actor_rollout_ref.rollout` config field.
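Averaging over n validation samples can be sketched as follows. The `val_kwargs` keys and values shown are assumptions for illustration (check the rollout config for the exact fields), `mean_score_at_n` is a hypothetical helper, and the scores are toy numbers, not results.

```python
from statistics import mean
from typing import Dict, List

# Assumed validation sampling settings; not copied from the verl config.
val_kwargs = {"do_sample": True, "temperature": 0.6, "top_p": 1.0, "n": 4}


def mean_score_at_n(scores_by_prompt: Dict[str, List[float]], n: int) -> float:
    """Average each prompt's n sampled scores, then average over prompts."""
    for prompt_id, scores in scores_by_prompt.items():
        if len(scores) != n:
            raise ValueError(f"{prompt_id}: expected {n} samples, got {len(scores)}")
    return mean(mean(scores) for scores in scores_by_prompt.values())


toy_scores = {"q1": [1.0, 0.0, 1.0, 1.0], "q2": [0.0, 0.0, 1.0, 0.0]}
print(mean_score_at_n(toy_scores, val_kwargs["n"]))  # 0.5
```

With greedy decoding all n samples would be identical, which is why sampling parameters must differ from the greedy default for this average to be meaningful.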


**Future work**
- [ ] Merge `vllm_rollout_spmd.py` and `vllm_rollout.py` into one file.
# Description
- Corrected the dummy size to avoid faulty communication.
- Fixed the batch number calculation.
- Adjusted the worker group role to reduce memory overhead.
- Added `ray.init()` to prevent worker registration failures.
… xformers (#570)

### Description
- fix the `filter_overlong_prompts` setting in PRIME

- fix the incorrect padding side for Qwen in PRIME

  - When using the PRIME recipe to train Qwen-series models, I got “*ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Qwen2. Make sure to call tokenizer.padding_side = 'left' before tokenizing the input.*” So I set `use_cache = False` when calling the model to compute output logits.

- fix a CUDA error with vllm v0.6.3

  - When running PRIME, I may get the error *CUDA error: an illegal memory access was encountered*. Following vllm-project/vllm#10389, I set `VLLM_ATTENTION_BACKEND=XFORMERS`.
#556 removed unnecessary `empty_cache` calls, but this causes a CUDA OOM at vLLM `wake_up`:
```text
  File "/opt/tiger/ray/session_2025-03-13_12-11-30_408315_2895/runtime_resources/working_dir_files/_ray_pkg_a64b690733067c5c/verl/workers/fsdp_workers.py", line 481, in generate_sequences
    with self.rollout_sharding_manager:
  File "/opt/tiger/ray/session_2025-03-13_12-11-30_408315_2895/runtime_resources/working_dir_files/_ray_pkg_a64b690733067c5c/verl/workers/sharding_manager/fsdp_vllm.py", line 82, in __enter__
    self.inference_engine.wake_up()
  File "/usr/local/lib/python3.11/dist-packages/vllm/entrypoints/llm.py", line 1244, in wake_up
    self.llm_engine.wake_up()
  File "/usr/local/lib/python3.11/dist-packages/vllm/engine/llm_engine.py", line 1859, in wake_up
    self.model_executor.wake_up()
  File "/usr/local/lib/python3.11/dist-packages/vllm/executor/executor_base.py", line 216, in wake_up
    self.collective_rpc("wake_up")
  File "/usr/local/lib/python3.11/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/vllm/utils.py", line 2196, in run_method
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/vllm/worker/worker.py", line 140, in wake_up
    allocator.wake_up()
  File "/usr/local/lib/python3.11/dist-packages/vllm/device_allocator/cumem.py", line 207, in wake_up
    create_and_map(handle)
  File "/usr/local/lib/python3.11/dist-packages/vllm/device_allocator/cumem.py", line 75, in create_and_map
    python_create_and_map(*allocation_handle)
RuntimeError: CUDA Error: out of memory at /workspace/csrc/cumem_allocator.cpp:62
```
This PR removes all redundant `torch.cuda.empty_cache()` calls in the FSDP worker and only empties the cache before vLLM `wake_up` and after vLLM `sleep`, since vLLM has its own caching memory allocator, [CuMemAllocator](https://github.com/vllm-project/vllm/blob/v0.7.3/vllm/device_allocator/cumem.py#L103). Outside of vLLM, we avoid emptying the cache so that PyTorch can reuse its cached memory to speed up allocations.
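The placement of `empty_cache` can be shown as a context manager that wraps rollout. This is a hypothetical sketch of the ordering only, not verl's `fsdp_vllm.py`: the engine is assumed to expose `wake_up()` and `sleep()` (as the vLLM engine in the traceback above does for `wake_up`), and `empty_cache` would be `torch.cuda.empty_cache` in a real worker. A fake engine records the call order so the sketch runs without a GPU.

```python
from typing import Callable, List


class RolloutShardingManager:
    """Hypothetical sketch: empty PyTorch's cache only around wake_up/sleep."""

    def __init__(self, engine, empty_cache: Callable[[], None]):
        self.engine = engine  # assumed to expose wake_up() and sleep()
        self.empty_cache = empty_cache  # e.g. torch.cuda.empty_cache in a real worker

    def __enter__(self):
        # Return PyTorch's cached-but-unused blocks first so the engine's own
        # allocator has room to map its memory during wake_up.
        self.empty_cache()
        self.engine.wake_up()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.engine.sleep()
        self.empty_cache()  # the only other place the cache is emptied
        return False  # do not swallow exceptions


class FakeEngine:
    def __init__(self, log: List[str]):
        self.log = log

    def wake_up(self):
        self.log.append("wake_up")

    def sleep(self):
        self.log.append("sleep")


log: List[str] = []
with RolloutShardingManager(FakeEngine(log), lambda: log.append("empty_cache")):
    log.append("generate")
print(log)  # ['empty_cache', 'wake_up', 'generate', 'sleep', 'empty_cache']
```

Everything outside the `with` block (the training step) never calls `empty_cache`, leaving PyTorch's caching allocator free to reuse blocks.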

- [x] Cleanup FSDP worker torch.cuda.empty_cache()
- [ ] Cleanup Megatron worker torch.cuda.empty_cache()
Remove the redundant broadcast in the FSDP-vLLM postprocessing, since the vLLM output on each TP rank should be identical.
langfengQ and others added 22 commits July 16, 2025 16:20
* Update README.md
* update README.md

* fix some tiny bugs
…ch env worker (#148)

* add 'resources_per_worker' config for easily managing cpus/gpus of each env worker
…arch-r1 experiments & similarity-based GiGPO (#159)

* add search-r1 experiment (tool-calling); add similarity-based GiGPO

* update script

* use env_kwargs

* add the results of Search-R1 experiments

* add wandb of search

* add copyright

* fix some bug & add tool use count
* add search-r1 experiment (tool-calling); add similarity-based GiGPO

* update script

* use env_kwargs

* add the results of Search-R1 experiments

* add wandb of search

* add copyright

* fix some bug & add tool use count

* GiGPO @ NeurIPS 2025

* GiGPO @ NeurIPS 2025

* adjust space

* adjust space
* fix bugs in adjust_batch of VLM
* update readme

* clarify data prepare
Co-authored-by: Markus Junttila <markus.1.junttila@nokia.com>
fix(ray): ensure Ray version < 2.50.0 to avoid incompatibility with latest release
* My initial modifications

* Add gspo to verl-agent

* Add an example file for gspo

---------

Co-authored-by: Markus Junttila <markus.1.junttila@nokia.com>
* add qwen3

* fix some issues in qwen3-vl pr

* address the issues of tensor model parallel

* update scripts

* update readme

---------

Co-authored-by: langfeng <langfeng.cs@gmail.com>
Co-authored-by: langfeng <1371441151@qq.com>
* Add recipe/hgpo (HGPO trainer, configs and run scripts)

* docs: update README files for HGPO recipe

* docs: update recipe/hgpo README

* docs: update README
Updates the requirements on torch-memory-saver to permit the latest version.

---
updated-dependencies:
- dependency-name: torch-memory-saver
  dependency-version: 0.0.9.post1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] added the dependencies and python labels on May 14, 2026
lll6gg closed this on May 14, 2026

dependabot[bot] commented on behalf of GitHub on May 14, 2026

OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting @dependabot ignore this major version or @dependabot ignore this minor version. You can also ignore all major, minor, or patch releases for a dependency by adding an ignore condition with the desired update_types to your config file.

If you change your mind, just re-open this PR and I'll resolve any conflicts on it.

dependabot[bot] deleted the dependabot/pip/torch-memory-saver-gte-0.0.9.post1 branch on May 14, 2026 at 18:13