Update torch-memory-saver requirement from >=0.0.5 to >=0.0.9.post1 by dependabot[bot] · Pull Request #5 · ZJU-REAL/SDAR

dependabot · 2026-05-14T18:12:10Z

Updates the requirements on torch-memory-saver to permit the latest version.

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

This PR solves the following two problems.

1. Last step skipped

`self.global_steps += 1` is executed before the check `if self.global_steps >= self.total_training_steps`, so the last step is skipped. We start from step 1 and expect `self.total_training_steps` steps in total.

https://github.com/volcengine/verl/blob/82b38e25c72e1b6de7d7d2092af6e1ed5dd2a400/verl/trainer/ppo/ray_trainer.py#L999-L1001

When `self.global_steps == self.total_training_steps - 1`:

* we have only executed `self.total_training_steps - 1` steps
* `self.global_steps` is updated to `self.total_training_steps`
* `self.global_steps >= self.total_training_steps` is satisfied, and the training ends.

Therefore, `self.global_steps += 1` should be moved to the end of the loop body.

2. Redundant validation and logging

If `self.total_training_steps % self.config.trainer.test_freq == 0`:

* `self._validate()` will be executed twice
  1. https://github.com/volcengine/verl/blob/82b38e25c72e1b6de7d7d2092af6e1ed5dd2a400/verl/trainer/ppo/ray_trainer.py#L984
  2. https://github.com/volcengine/verl/blob/82b38e25c72e1b6de7d7d2092af6e1ed5dd2a400/verl/trainer/ppo/ray_trainer.py#L1005
* logging will also be executed twice
  1. https://github.com/volcengine/verl/blob/82b38e25c72e1b6de7d7d2092af6e1ed5dd2a400/verl/trainer/ppo/ray_trainer.py#L985 and https://github.com/volcengine/verl/blob/82b38e25c72e1b6de7d7d2092af6e1ed5dd2a400/verl/trainer/ppo/ray_trainer.py#L997
  2. https://github.com/volcengine/verl/blob/82b38e25c72e1b6de7d7d2092af6e1ed5dd2a400/verl/trainer/ppo/ray_trainer.py#L1007
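The off-by-one in the first problem can be reproduced with a toy loop. This is only a minimal sketch: `run_buggy` and `run_fixed` are hypothetical names, and appending to a list stands in for one training step; it is not the trainer code itself.

```python
def run_buggy(total_training_steps):
    """Increment the counter before the termination check."""
    executed = []
    global_steps = 1  # training starts from step 1
    while True:
        executed.append(global_steps)  # stands in for one training step
        global_steps += 1
        if global_steps >= total_training_steps:
            break
    return executed


def run_fixed(total_training_steps):
    """Check for termination first, increment the counter last."""
    executed = []
    global_steps = 1
    while True:
        executed.append(global_steps)
        if global_steps >= total_training_steps:
            break
        global_steps += 1
    return executed


print(run_buggy(5))  # the last step is never executed
print(run_fixed(5))  # all five steps are executed
```

With `total_training_steps = 5`, the first variant stops after step 4, while the second one also runs step 5.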

@uygnef

…llelism with multiple bug fixed (#495)

This PR combines multiple modifications.

# QWen2.5 checkpoint saver bug fix

Thanks to the effort @uygnef contributed in #368, we use the new saver as the model loader and saver for 3D-parallelism support.

# Megatron backend 3D-parallelism test benches

We modified the scripts in `examples/ppo_trainer` and `tests/e2e`, as well as the CI workflows; all are tested.

# Bug fix for 3D-parallelism

This includes configuration bugs as well as module packing. The original TP VocabParallelEntropy can lead to CUDA OOM, so we refactored the implementation with `torch.bmm`.

# Full migration to Megatron Core

verl now uses only Megatron Core and no longer calls other components. If they are needed, please integrate them into `utils/megatron`.

---------

Co-authored-by: uygnef <admin@fengyu.org>

# Background

In RLHFDataset, we filter out prompts that are too long. This requires applying `apply_chat_template` to the whole dataset, which does not scale when the dataset is large.

https://github.com/volcengine/verl/blob/main/verl/utils/dataset/rl_dataset.py#L132

Instead of performing filtering online, we probably want to move this process offline and add an assertion to avoid truncation, or simply perform truncation.

Reference: #502

# Key Changes

- Add an option `data.filter_overlong_prompts=True` to enable the above data filtering. The default value is False, but we enable it for all the example scripts.
- Add an option `data.truncation` to truncate the input_ids or prompt if they exceed max_prompt_length. The default is 'error', which does not allow max_prompt_length to be exceeded; users should increase max_prompt_length if this error is raised. You can also set `left` or `right`.

### Suggestion for large-scale datasets

For large-scale datasets, filtering overlong prompts could be time-consuming. You should set `data.filter_overlong_prompts=False` and `data.truncation='left'`. Also, please note that you should increase `data.max_prompt_length` to avoid over-truncation of the prompts.
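For illustration, the three `data.truncation` modes could behave roughly as in the sketch below. `truncate_prompt` is a hypothetical helper, not the function in `rl_dataset.py`, and it assumes that `'left'` drops tokens from the left (keeping the end of the prompt) while `'right'` drops tokens from the right.

```python
def truncate_prompt(input_ids, max_prompt_length, truncation="error"):
    """Return input_ids limited to max_prompt_length tokens (hypothetical helper)."""
    if max_prompt_length <= 0:
        raise ValueError("max_prompt_length must be positive")
    if len(input_ids) <= max_prompt_length:
        return list(input_ids)
    if truncation == "left":
        # Assumption: drop the oldest tokens and keep the end of the prompt.
        return list(input_ids[-max_prompt_length:])
    if truncation == "right":
        # Assumption: keep the beginning of the prompt and drop the tail.
        return list(input_ids[:max_prompt_length])
    if truncation == "error":
        raise ValueError(
            f"prompt length {len(input_ids)} exceeds max_prompt_length {max_prompt_length}"
        )
    raise ValueError(f"unknown truncation mode: {truncation!r}")


tokens = [1, 2, 3, 4, 5, 6]
print(truncate_prompt(tokens, 4, "left"))
print(truncate_prompt(tokens, 4, "right"))
```

Under these assumptions, the default `'error'` mode raises instead of silently shortening a prompt.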

…469) Since searching for an appropriate `simplify` algorithm may cause `sympy.simplify` to time out, and `ProcessPool` may get stuck due to excessive concurrency, the timeout mechanism in `verl/verl/workers/reward_manager/prime.py` cannot catch the timeout. To address this issue, a timeout detection mechanism for `sympy.simplify` is added to `verl/verl/utils/reward_score/prime_math/__init__.py`.
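One way to bound a single call such as `sympy.simplify` is an alarm-based timeout. The sketch below is an assumption about how this could look, not the code added in `prime_math/__init__.py`: `call_with_timeout` and `CallTimedOut` are hypothetical names, and the approach only works on Unix in the main thread.

```python
import signal
import time


class CallTimedOut(Exception):
    """Raised inside the running call when the alarm fires."""


def call_with_timeout(func, seconds, *args, default=None, **kwargs):
    """Run func(*args, **kwargs); return `default` if it exceeds `seconds`."""

    def _on_alarm(signum, frame):
        raise CallTimedOut()

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)  # supports fractional seconds
    try:
        return func(*args, **kwargs)
    except CallTimedOut:
        return default
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


# A slow call is cut short, a fast one returns normally.
print(call_with_timeout(time.sleep, 0.1, 2, default="timed out"))
print(call_with_timeout(pow, 1.0, 2, 10))
```

In a scorer, the wrapped function would be the expensive symbolic call, and the default would be whatever result marks the comparison as undecided.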

- [x] Add concurrency to workflows to cancel previous workflows when a new commit is pushed to the same branch.
- [ ] Cancel all workflows/jobs from the same commit if any fails? (Not sure whether we really need it)

Note: we leave out `secrets_scan.yml` and `scorecard.yml` to avoid any possible leakage or security risk; they also cost little.

Current bugs when enabling HSDP:

- **Incorrect Division in Batch Sizes**
  - `ppo_micro_batch`, `ppo_minibatch`, etc. should be divided by `self.device_mesh.size()` instead of `self.device_mesh.shape[0]`.
- **Improper Weight Initialization** in `get_init_weight_context_manager`
  - The `get_init_weight_context_manager` function must initialize weights on `local_rank == 0` within every fsdp mesh.
  - When `sync_module_states=True`, PyTorch's FSDP first broadcasts parameters within the fsdp process group and then within the ddp process group. If weights are not initialized correctly on `local_rank == 0` of each fsdp mesh, the synchronization process may fail or produce incorrect results. https://github.com/pytorch/pytorch/blob/3f069e7679588d5ee4b1d5b2492ca0e20f9320b5/torch/distributed/fsdp/_init_utils.py#L614-L621
  - Ensure initialization occurs only when `self.device_mesh.get_coordinate()[-1] == 0`, which corresponds to `local_rank == 0` within each fsdp mesh.
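Both points can be checked on paper with a small torch-free sketch. It assumes a 2D mesh shaped `(ddp, fsdp)` with ranks laid out in row-major order; `per_rank_batch_size` and `initializing_ranks` are hypothetical helpers that only mirror the arithmetic, not the worker code.

```python
import math
from itertools import product


def per_rank_batch_size(global_batch_size, mesh_shape):
    """Divide by the total number of ranks in the mesh, not by the first dimension."""
    world_size = math.prod(mesh_shape)  # plays the role of device_mesh.size()
    assert global_batch_size % world_size == 0
    return global_batch_size // world_size


def initializing_ranks(mesh_shape):
    """Ranks whose last mesh coordinate is 0, i.e. local rank 0 of each fsdp group."""
    coords = list(product(*(range(n) for n in mesh_shape)))  # row-major layout (assumed)
    return [rank for rank, coord in enumerate(coords) if coord[-1] == 0]


mesh_shape = (2, 4)  # 2 ddp replicas, each sharded over 4 fsdp ranks
print(per_rank_batch_size(64, mesh_shape))  # divides by 8 ranks
print(64 // mesh_shape[0])                  # dividing by shape[0] only divides by 2
print(initializing_ranks(mesh_shape))
```

For this mesh, dividing by `shape[0]` leaves each rank with a batch four times too large, and the ranks that need initialized weights are 0 and 4 rather than global rank 0 alone.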

[bugfix] Fix position embedding processing for Qwen2.5-VL

In the `RLHFDataset.__getitem__` method, a bug was identified in how multimodal position IDs (3D in Qwen2.5-VL) are determined. Previously, the code checked for `self.image_key in row_dict` to decide whether to use multimodal position IDs. However, since `self.image_key` is popped from `row_dict` during image token expansion, this check incorrectly fails for subsequent operations. This causes the VL model to use incorrect position IDs, resulting in significant performance degradation.

<img width="349" alt="image" src="https://github.com/user-attachments/assets/79790bbf-239e-4667-a2c5-d63d91d63165" />

The fix introduces an explicit `is_multi_modal` flag to properly track multimodal content throughout the processing pipeline.

Co-authored-by: songyifan <songyifan3@xiaomi.com>
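The failure mode is easy to reproduce with a plain dictionary. The sketch below is a simplified stand-in for `RLHFDataset.__getitem__`: the function names, the `"images"` key, and the `"3d"`/`"1d"` return values are hypothetical and only mark which position-ID branch would be taken.

```python
def position_id_kind_buggy(row_dict, image_key="images"):
    """Check the key after it has already been popped."""
    row = dict(row_dict)
    if image_key in row:
        row.pop(image_key)  # image token expansion consumes the entry
    # The key is gone by now, so this test is always False.
    return "3d" if image_key in row else "1d"


def position_id_kind_fixed(row_dict, image_key="images"):
    """Record the multimodal flag before popping the key."""
    row = dict(row_dict)
    is_multi_modal = image_key in row
    if is_multi_modal:
        row.pop(image_key)
    return "3d" if is_multi_modal else "1d"


sample = {"prompt": "describe the figure", "images": ["<image bytes>"]}
print(position_id_kind_buggy(sample))  # falls back to the text-only branch
print(position_id_kind_fixed(sample))  # keeps the multimodal branch
```

Capturing the flag once, before any mutation of `row_dict`, keeps every later decision consistent.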

@PeterSH6

# Description

verl-project/verl#287, verl-project/verl#295.

This PR introduces support for [Math-Verify](https://github.com/huggingface/Math-Verify) as a new rule-based reward scorer, significantly improving evaluation accuracy.

# Key changes

- Added `math-verify` to the installation dependencies.
- Introduced `reward_score/math_verify.py` and updated `reward_score/__init__.py`.

# Test

Comparison between the existing scorer in `math.py` and the newly added `math_verify.py`, using Qwen2.5-Math-7B-Instruct:

```
# Use scorer in math.py (original)
{'val/test_score/DigitalLearningGmbH/MATH-lighteval': 0.803}

# Use scorer in math_verify.py (newly added)
{'val/test_score/DigitalLearningGmbH/MATH-lighteval': 0.8338}
```

Test scripts:

```bash
set -x

# Data Process
python examples/data_preprocess/math_dataset.py --local_dir /workspace/datasets/math

# Evaluation
export CUDA_VISIBLE_DEVICES=4,5,6,7
export VLLM_ATTENTION_BACKEND=XFORMERS

math_train_path=/workspace/datasets/math/train.parquet
math_test_path=/workspace/datasets/math/test.parquet

python3 -m verl.trainer.main_ppo \
    data.train_files="$math_train_path" \
    data.val_files="$math_test_path" \
    data.max_prompt_length=2048 \
    data.max_response_length=2048 \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-Math-7B-Instruct \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.n=1 \
    actor_rollout_ref.rollout.temperature=0 \
    trainer.logger=['console'] \
    trainer.project_name='test-math-verify' \
    trainer.experiment_name='test-math-verify' \
    +trainer.val_before_train=True \
    trainer.n_gpus_per_node=4 \
    trainer.nnodes=1 \
    trainer.total_epochs=0 \
    data.train_batch_size=1024 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
    algorithm.adv_estimator=grpo $@
```

Currently, eager mode is applied in the validation stage. However, in some reasoning tasks, we may need to generate n times and average the scores. This PR supports using non-eager sampling parameters during validation by specifying `val_kwargs` in the `actor_rollout_ref.rollout` config field.

**Future work**

- [ ] Merge `vllm_rollout_spmd.py` and `vllm_rollout.py` into one file.

… xformers (#570)

### Description

- fix filter_overlong_prompts setting in PRIME
- fix incorrect padding side for Qwen in PRIME
  - When I use the PRIME recipe to train Qwen series models, I get “*ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Qwen2. Make sure to call tokenizer.padding_side = 'left' before tokenizing the input.*” So I set `use_cache = False` when calling the model to calculate output logits.
- fix CUDA error with vllm v0.6.3
  - When I run PRIME, I may get an error: *CUDA error: an illegal memory access was encountered*. According to vllm-project/vllm#10389, I set `VLLM_ATTENTION_BACKEND=XFORMERS`.

#556 made an effort to remove unnecessary empty_cache calls, but this causes CUDA OOM at vllm wake_up.

```text
  File "/opt/tiger/ray/session_2025-03-13_12-11-30_408315_2895/runtime_resources/working_dir_files/_ray_pkg_a64b690733067c5c/verl/workers/fsdp_workers.py", line 481, in generate_sequences
    with self.rollout_sharding_manager:
  File "/opt/tiger/ray/session_2025-03-13_12-11-30_408315_2895/runtime_resources/working_dir_files/_ray_pkg_a64b690733067c5c/verl/workers/sharding_manager/fsdp_vllm.py", line 82, in __enter__
    self.inference_engine.wake_up()
  File "/usr/local/lib/python3.11/dist-packages/vllm/entrypoints/llm.py", line 1244, in wake_up
    self.llm_engine.wake_up()
  File "/usr/local/lib/python3.11/dist-packages/vllm/engine/llm_engine.py", line 1859, in wake_up
    self.model_executor.wake_up()
  File "/usr/local/lib/python3.11/dist-packages/vllm/executor/executor_base.py", line 216, in wake_up
    self.collective_rpc("wake_up")
  File "/usr/local/lib/python3.11/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/vllm/utils.py", line 2196, in run_method
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/vllm/worker/worker.py", line 140, in wake_up
    allocator.wake_up()
  File "/usr/local/lib/python3.11/dist-packages/vllm/device_allocator/cumem.py", line 207, in wake_up
    create_and_map(handle)
  File "/usr/local/lib/python3.11/dist-packages/vllm/device_allocator/cumem.py", line 75, in create_and_map
    python_create_and_map(*allocation_handle)
RuntimeError: CUDA Error: out of memory at /workspace/csrc/cumem_allocator.cpp:62
```

This PR removes all redundant `torch.cuda.empty_cache()` calls in the FSDP worker and only empties the cache before vllm wake_up and after vllm sleep, since vllm has its own caching memory allocator [CuMemAllocator](https://github.com/vllm-project/vllm/blob/v0.7.3/vllm/device_allocator/cumem.py#L103). Outside the vllm scope, we should avoid emptying the cache so that PyTorch can use its caching allocator to speed up memory allocations.

- [x] Cleanup FSDP worker torch.cuda.empty_cache()
- [ ] Cleanup Megatron worker torch.cuda.empty_cache()
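The intended ordering (empty the cache only right before wake_up and right after sleep) can be sketched as a context manager. Everything here is illustrative: `rollout_scope` and `FakeEngine` are hypothetical names, and `empty_cache` is passed in as a callable so the sketch runs without a GPU; in a real worker it would presumably be `torch.cuda.empty_cache`.

```python
from contextlib import contextmanager


class FakeEngine:
    """Hypothetical stand-in for an inference engine exposing wake_up() and sleep()."""

    def __init__(self, log):
        self.log = log

    def wake_up(self):
        self.log.append("wake_up")

    def sleep(self):
        self.log.append("sleep")


@contextmanager
def rollout_scope(engine, empty_cache):
    empty_cache()      # free cached blocks so the engine's allocator can claim them
    engine.wake_up()
    try:
        yield engine
    finally:
        engine.sleep()
        empty_cache()  # hand the released memory back before training resumes


def demo():
    log = []
    with rollout_scope(FakeEngine(log), lambda: log.append("empty_cache")):
        log.append("generate")
    return log


print(demo())
```

No other `empty_cache` call appears inside or outside the scope, which matches the goal of leaving PyTorch's caching allocator alone during training.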

…arch-r1 experiments & similarity-based GiGPO (#159)

* add search-r1 experiment (tool-calling); add similarity-based GiGPO
* update script
* use env_kwargs
* add the results of Search-R1 experiments
* add wandb of search
* add copyright
* fix some bug & add tool use count

* add search-r1 experiment (tool-calling); add similarity-based GiGPO
* update script
* use env_kwargs
* add the results of Search-R1 experiments
* add wandb of search
* add copyright
* fix some bug & add tool use count
* GiGPO @ NeurIPS 2025
* GiGPO @ NeurIPS 2025
* adjust space
* adjust space

Updates the requirements on torch-memory-saver to permit the latest version.

---
updated-dependencies:
- dependency-name: torch-memory-saver
  dependency-version: 0.0.9.post1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

dependabot · 2026-05-14T18:13:10Z

OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting @dependabot ignore this major version or @dependabot ignore this minor version. You can also ignore all major, minor, or patch releases for a dependency by adding an ignore condition with the desired update_types to your config file.

If you change your mind, just re-open this PR and I'll resolve any conflicts on it.

vermouth1992 and others added 30 commits March 6, 2025 22:05

[misc] feat: add allgather method to dataproto (#497)

0cc2bda

- Add allgather method to dataproto
- Add tests
- Replace existing raw allgather with this function

[ckpt] sort pgs by node ip to make RANK consistent across nodes (#500)

becf7cb

test: Added the permission setting on the workflow (#504)

cb97d07

misc: precheck resource pool available to prevent pg hang (#505)

0f0bc5a

close #503

fix missing raise keyword in NotImplementedError for hdfs loading (#507)

fbad52e

feat: support loading reward function from an external file (#452)

13a87c7

fix _build_model_optimizer when role is rollout, whose `optim_confi…

f8acd90

…g` is None (#322)

recipe: PRIME algorithm (#362)

f0e7f9f

Refactor and merge PRIME algorithm into verl/main: https://github.com/PRIME-RL/PRIME

Breaking changes: `trainer.fsdp_config.min_num_params` is now moved to `trainer.fsdp_config.wrap_policy.min_num_params`.

update README.md (#534)

b14299c

1. add [PRIME](https://arxiv.org/abs/2502.01456) to README.md
2. slightly change the example script to align with the paper

[misc] feat: support vllm>0.7 world size 1 generation (#520)

be2c705

[Efficiency] feat: remove unnecessary empty_cache (#556)

51a5ff9

This PR removes several unnecessary `empty_cache` to improve efficiency. Credit to @PeterSH6

Update e2e_vlm_geo3k.yml (#563)

0a11fc6

[doc] update megatron core_r0.11.0 documentation (#562)

1d12fe3

Urgently update megatron core_r0.11.0 documentation.

refactor: remove custom vllm weight loader and use model.load_weights…

6680185

… directly (#543) As we're moving to vllm>=0.7.3, we should remove `verl/third_party` completely in the future.

[fix] Fix config param issue (#558)

329dcfe

[misc] add assertion for normalized ppo_mini_batch_size (#552)

39b008d

[bugfix] fix: generation script (#542)

79e072f

# Description

- Corrected dummy size to avoid faulty communication.
- Fixed batch number calculation.
- Adjusted worker group role to alleviate memory overhead.
- Added ray.init() to prevent failure to register the worker.

fix: remove redundant broadcast in fsdp vllm postprocess (#577)

f7e183e

Remove redundant broadcast in fsdp vllm postprocess since vllm output in each tp rank should be identical.

fix bug #544 that 'left' and 'right' config for truncation don't work…

e7c40b3

… (#583)

langfengQ and others added 22 commits July 16, 2025 16:20

Add copyright (#110)

e03bd50

Update README.md (#134)

8589da9

* Update README.md

fix truncate bug (#135)

158ac23

update README.md (#137)

aea4b90

fix some tiny bugs (#145)

f688a80

* update README.md
* fix some tiny bugs

add 'resources_per_worker' config for easily managing cpus/gpus of ea…

574dd5b

…ch env worker (#148)

* add 'resources_per_worker' config for easily managing cpus/gpus of each env worker

fix bugs in visual position_ids & update readme (#169)

75520c2

fix bugs in adjust_batch of VLM (#170)

35b3da3

* fix bugs in adjust_batch of VLM

Update README and Add FAQ (#173)

7b68c51

* update readme
* clarify data prepare

Fixed typos in ALFWorld and Sokoban environment prompts. (#174)

35b026c

Co-authored-by: Markus Junttila <markus.1.junttila@nokia.com>

fix(ray): ensure Ray < 2.50.0 to avoid incompatibility (#180)

521fb00

fix(ray): ensure Ray version < 2.50.0 to avoid incompatibility with latest release

Add GSPO to verl-agent (#179)

d3a8f33

* My initial modifications
* Add gspo to verl-agent
* Add an example file for gspo

---------

Co-authored-by: Markus Junttila <markus.1.junttila@nokia.com>

code adjustment & fix prompt agent bug (#183)

d0b9724

fix: ensure peft == 0.17.0 in webshop task (#207)

080965f

add Qwen3-VL (#196)

373e6b2

* add qwen3
* fix some issues in qwen3-vl pr
* address the issues of tensor model parallel
* update scripts
* update readme

---------

Co-authored-by: langfeng <langfeng.cs@gmail.com>
Co-authored-by: langfeng <1371441151@qq.com>

update readme (#225)

11b8fbd

Update README (#226)

2716b30

Add recipe/hgpo (HGPO trainer, configs and run scripts) (#232)

b4918cc

Update HGPO readme (#233)

796ed31

* Add recipe/hgpo (HGPO trainer, configs and run scripts)
* docs: update README files for HGPO recipe
* docs: update recipe/hgpo README
* docs: update README

dependabot Bot added the `dependencies` (Pull requests that update a dependency file) and `python` (Pull requests that update python code) labels May 14, 2026

lll6gg closed this May 14, 2026

lll6gg force-pushed the master branch from 796ed31 to 33ad5e6 Compare May 14, 2026 18:13

dependabot Bot deleted the dependabot/pip/torch-memory-saver-gte-0.0.9.post1 branch May 14, 2026 18:13

dependabot[bot] wants to merge 499 commits into master from dependabot/pip/torch-memory-saver-gte-0.0.9.post1

20 participants
