sync: gitlab/main -> github/main #38
Merged
…eyes script
# 🔩 Chore
## Align launch argument order in deepeyes run script
- Remove duplicate `--rollout-num-gpus-per-engine` definition in `examples/deepeyes/run_deepeyes.sh`.
- Keep the effective `--rollout-num-gpus-per-engine 2` from the SGLang config section and remove the earlier conflicting value.
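
Why the later definition was the "effective" one can be illustrated with a small sketch. It assumes the launch arguments end up in an `argparse`-style parser, which the commit message does not state; the first value (`4`) is a placeholder, since the earlier conflicting value is not given.

```python
import argparse

# Hypothetical parser: only the flag name comes from the commit message.
parser = argparse.ArgumentParser()
parser.add_argument("--rollout-num-gpus-per-engine", type=int, default=1)

# The first value (4) stands in for the unspecified earlier definition.
argv = [
    "--rollout-num-gpus-per-engine", "4",
    "--rollout-num-gpus-per-engine", "2",
]
args = parser.parse_args(argv)

# With the default "store" action, the last occurrence overwrites earlier ones.
effective = args.rollout_num_gpus_per_engine
print(effective)
```

Under that assumption the duplicate was harmless at runtime but misleading to read, which fits the change being filed as a chore rather than a bug fix.
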
# 🐛 Bug Fix
## Preserve DeepEyes multimodal state across partial resume
- Keep `sample.multimodal_inputs` as the read-only dataset input instead of appending observation images into the shared shallow-copy reference
- Make `_prepare_initial_inputs()` return the initial processor output as local `init_mm_train` without overwriting `sample.multimodal_train_inputs`
- Keep observation images in `current_image_data` and append observation processor chunks to `multimodal_train_inputs_buffer`
- Route the budget-exhausted early return through `_finalize_sample()` so resumed samples leave `generate()` with merged multimodal train inputs
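
The shallow-copy hazard named in the first bullet can be reproduced in isolation. This is a minimal sketch, not the DeepEyes code: the `Sample` class, the `{"images": [...]}` layout, and both helper functions are assumptions made for illustration; only the names `multimodal_inputs` and `current_image_data` come from the commit message.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Sample:
    # Assumed shape: a dict holding a list of images.
    multimodal_inputs: dict = field(default_factory=dict)


def buggy_step(sample: Sample, observation: str) -> list:
    """Append through a shallow copy: the inner list is still shared."""
    inputs = copy.copy(sample.multimodal_inputs)
    inputs["images"].append(observation)  # also mutates the dataset input
    return inputs["images"]


def fixed_step(sample: Sample, current_image_data: list, observation: str) -> list:
    """Leave the dataset input untouched and grow a local list instead."""
    current_image_data.append(observation)
    return list(sample.multimodal_inputs["images"]) + current_image_data


original = Sample({"images": ["img0"]})
buggy_step(original, "obs1")
leaked = list(original.multimodal_inputs["images"])

clean = Sample({"images": ["img0"]})
local_images: list = []
merged = fixed_step(clean, local_images, "obs1")
```

In the buggy variant the observation image leaks into the dataset input, so any later pass over the same sample would start from inputs that already contain earlier observations. The fixed variant keeps the dataset input read-only.
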
# ⭐ Feature
## Auto-enable true-on-policy mode in fully-async
- Auto-enable `--true-on-policy-mode` in `slime_validate_args` when `fully_async` and `rollout_batch_size * n_samples_per_prompt == global_batch_size`, since the train forward log_probs equal what actor_fwd would produce
- Add `ROLES_FULLY_ASYNC_ON_POLICY` (no actor_fwd) and route to it from `process_role` when the mode is on
- Recompute `old_log_probs` inline in `policy_loss_function` via `log_probs.detach()`, recovering vanilla PG (ratio ≡ 1) while keeping TIS valid against rollout_log_probs
- Drop `log_probs` from required data fields in `MegatronTrainRayActor` and `Advantages` when actor_fwd is absent; treat `rollout_log_probs` as kl-zero template in advantages compute
- Make `Controller` fully-async weight-sync skip `actor_fwd` recv when the role is not registered, and skip the actor_fwd HTTP probe in the actor health check
- Fall back to `rollout_log_probs` for entropy logging in `log_rollout_data` when `log_probs` is unavailable
---
# 🔩 Chore
## Tune training scripts and runtime env
- Add `NVSHMEM_BOOTSTRAP_UID_SOCK_IFNAME` default in `scripts/entrypoint/local.sh`
- Drop `actor_fwd`/`reference` from resource specs in async scripts (35B/9B text, 9B openr1mm-mm) now that true-on-policy mode handles them
- Add `--log-probs-max-tokens-per-gpu` and adjust parallelism/recompute/MoE-dispatcher knobs across qwen3/qwen35/qwen36 scripts to fit larger micro-batches
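
The auto-enable condition and the "ratio ≡ 1" claim can be sketched in plain Python. The function names below are hypothetical and the batch sizes are made-up numbers; only the condition itself and the idea of using a detached copy of `log_probs` as `old_log_probs` are taken from the commit message.

```python
import math


def should_enable_true_on_policy(
    fully_async: bool,
    rollout_batch_size: int,
    n_samples_per_prompt: int,
    global_batch_size: int,
) -> bool:
    """Condition quoted in the commit message (function name is hypothetical)."""
    return fully_async and rollout_batch_size * n_samples_per_prompt == global_batch_size


def importance_ratio(log_prob: float, old_log_prob: float) -> float:
    """Per-token policy ratio exp(log_prob - old_log_prob)."""
    return math.exp(log_prob - old_log_prob)


# One rollout batch (32 prompts x 8 samples) fills exactly one global batch.
enabled = should_enable_true_on_policy(True, 32, 8, 256)
# Here a rollout batch spans two optimizer steps, so the mode stays off.
disabled = should_enable_true_on_policy(True, 32, 8, 128)

# With old_log_probs taken as a detached copy of log_probs, both values match.
ratio = importance_ratio(-1.37, -1.37)
```

Because the ratio is numerically 1, clipping never activates and the loss gradient reduces to the vanilla policy-gradient form, which appears to be what the commit means by "recovering vanilla PG".
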
# 🐛 Bug Fix
## PR #1889: validate MoE HF config when dense layers exist
- Add `_has_dense_moe_layers` / `_is_moe_config` helpers in `relax/backends/megatron/arguments.py`
- Validate `moe_intermediate_size` and `shared_expert_intermediate_size`
- Skip `intermediate_size` check when model is pure MoE with no dense layer
- Ref: THUDM/slime#1889
## PR #1880: fix per-actor HTTP POST concurrency split
- Replace node-count-only divisor with `len(nodes) * num_gpus_per_node`
- Use ceiling division `(c + n - 1) // n` to avoid losing concurrency budget
- File: `relax/utils/http_utils.py`
- Ref: THUDM/slime#1880
## PR #1873: avoid blocking the asyncio event loop on ray.get
- Replace `asyncio.to_thread(ray.get, obj_ref)` with direct `await obj_ref`
- Removes the hard ThreadPoolExecutor cap on parallel POSTs
- File: `relax/utils/http_utils.py`
- Ref: THUDM/slime#1873
## PR #1888: actor save_model must wake_up / sleep, not just reload PG
- `save_model` was calling `reload_process_groups()` without resuming `torch_memory_saver`, leaving GPU tensors unreachable when NCCL is triggered during checkpoint save
- Use `self.wake_up()` / `self.sleep()` instead so PG and TMS stay in sync
- File: `relax/backends/megatron/actor.py`
- Ref: THUDM/slime#1888
## PR #1882: disaggregate PPO must reconnect rollout NCCL across sleep
- Add `disconnect_rollout_engines` to `UpdateWeightFromDistributed`
- In `actor.sleep()`, tear down weight-sync NCCL group for disaggregate PPO (`use_critic` + not `colocate`) before `destroy_process_groups()`
- In `actor.update_weights()`, force `wake_up` + `connect_rollout_engines` + `sleep` when the disconnect path was taken
- Drop the buggy `args.use_critic` GPU-offset branch in `sglang_engine.get_base_gpu_id`
- Auto-enable `offload_train` when `use_critic` is on
- Files: `relax/backends/megatron/actor.py`, `relax/backends/megatron/weight_update/update_weight_from_distributed.py`, `relax/backends/sglang/sglang_engine.py`, `relax/utils/arguments.py`
- Ref: THUDM/slime#1882
## PR #1878: reinitialize critic output_layer when ckpt shape mismatches
- Detect missing or shape-mismatched `output_layer.{weight,bias}` in the critic checkpoint metadata before / after `load_checkpoint`
- Reinitialize with `normal_(0, 0.02)` for weight and zero for bias, plus `optimizer.reload_model_params()` for fp16/bf16 master sync
- Gated on `role == "critic"`, no effect on actor or non-PPO algorithms
- File: `relax/backends/megatron/model.py`
- Ref: THUDM/slime#1878
---
# ⭐ Feature
## PR #1890: add missing spec / prefix-cache rollout metrics
- Wire `_compute_spec_metrics` and `_compute_prefix_cache_metrics` into `compute_metrics_from_samples`
- File: `relax/distributed/ray/rollout.py`
- Ref: THUDM/slime#1890
---
# 🔩 Chore
## PR #1862: improve `slice_log_prob_with_cp` assert message
- Include `len(log_prob)`, `response_length`, `total_length` in the assert so failures are diagnosable
- File: `relax/backends/megatron/cp_utils.py`
- Ref: THUDM/slime#1862

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
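
The concurrency split from PR #1880 can be sketched as follows. The helper name, its signature, and the numbers are hypothetical, and the sketch assumes one actor per GPU; only the divisor `len(nodes) * num_gpus_per_node` and the formula `(c + n - 1) // n` come from the commit message.

```python
def per_actor_concurrency(total_concurrency: int, num_nodes: int, num_gpus_per_node: int) -> int:
    """Split a global POST concurrency budget across actors (hypothetical helper)."""
    n = num_nodes * num_gpus_per_node
    # Ceiling division: round up so the shares never sum to less than the budget.
    return (total_concurrency + n - 1) // n


# Example numbers (made up): a budget of 100 over 2 nodes x 8 GPUs.
floor_share = 100 // (2 * 8)
ceil_share = per_actor_concurrency(100, 2, 8)
```

With floor division each of the 16 actors would get 6 slots (96 in total, 4 lost); ceiling division gives 7 each, so the full budget of 100 stays usable.
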
# ⭐ Feature
## Add --custom-prompt-path for prompt transformation hook
- Add CLI argument in add_data_arguments (arguments.py)
- Load custom function via load_function in data_source.py
- Thread custom_prompt_func through build_messages, process_raw_sample, BaseDataset, Dataset, and StreamingDataset
- Custom function is called after prompt extraction, before conversation/multimodal processing
## Add --image-resize-scale-factor for image dimension alignment control
- Add CLI argument in add_data_arguments (arguments.py)
- Add image_resize_scale_factor field to MultimodalConfig with from_args propagation and getter function
- Update fetch_image with 3-way logic: None uses default patch_factor, 0 disables alignment, positive int uses custom value
---
# 📝 Documentation
## Update configuration reference docs
- Add --custom-prompt-path to Dataset table (EN + ZH)
- Add --image-resize-scale-factor to Multimodal Data table (EN + ZH)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
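
The 3-way logic for `--image-resize-scale-factor` might look like the sketch below. Both function names are hypothetical, the default patch factor of `28` used in the usage lines is a placeholder, and rounding down to a multiple is an assumed alignment rule; the commit message only specifies the three cases (None, 0, positive int).

```python
from typing import Optional


def resolve_resize_factor(scale_factor: Optional[int], default_patch_factor: int) -> Optional[int]:
    """Return the factor image sides should align to, or None for no alignment."""
    if scale_factor is None:
        return default_patch_factor  # flag not set: keep the default
    if scale_factor == 0:
        return None  # explicit 0: alignment disabled
    if scale_factor > 0:
        return scale_factor  # positive int: custom factor
    # Negative values are not covered by the commit message; rejecting is an assumption.
    raise ValueError("image_resize_scale_factor must be None, 0, or a positive int")


def align_down(size: int, factor: Optional[int]) -> int:
    """Round a side length down to a multiple of factor (assumed rounding rule)."""
    if factor is None:
        return size
    return max(factor, (size // factor) * factor)


default_case = resolve_resize_factor(None, 28)
disabled_case = resolve_resize_factor(0, 28)
custom_case = resolve_resize_factor(14, 28)
aligned = align_down(100, default_case)
unaligned = align_down(100, disabled_case)
```

Treating `None` and `0` differently lets "flag not passed" keep the existing behavior while still giving users an explicit way to switch alignment off.
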
# ⭐ Feature
## Add hybrid training mode combining async data pipeline with colocate weight sharing
- Introduce `--hybrid` flag that sets `fully_async=True` and `colocate=True` so actor/ref/actor_fwd share GPUs via TensorBackuper+_switch_model while rollout runs on a separate GPU placement group with streaming transfer queue
- Add `train_hybrid()` method in MegatronTrainRayActor: collects sub-batches from transfer queue, runs ref/teacher/actor forward per sub-batch, merges all sub-batches, computes advantages with correct global normalization, then trains on the full merged batch
- Register hybrid mode in `process_role()` to use ROLES_COLOCATE (actor + rollout only, no separate reference/actor_fwd services)
- Update controller to skip shared placement groups and fully-async DCS weight sync setup when hybrid is active
- Skip actor_fwd health probe in `_check_services_health` for hybrid mode to avoid spurious warnings
- Add `train_hybrid()` dispatch in RayTrainGroup and Actor component
- Validate argument combinations: `--hybrid` is the supported way to combine async pipeline with colocate weight sharing; bare `--fully-async --colocate` now raises ValueError
- Set `offload_train=False`, `offload_rollout=False`, and `compute_advantages_and_returns=True` for hybrid mode
- Add Qwen3-4B 8xGPU hybrid-async training launch script
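
The argument rules for `--hybrid` listed above can be sketched as a small validator. The function names, the error text, and the defaults in `make_args` are hypothetical; the implied settings and the `ValueError` on bare `--fully-async --colocate` follow the commit message.

```python
from types import SimpleNamespace


def validate_hybrid_args(args: SimpleNamespace) -> SimpleNamespace:
    """Sketch of the hybrid argument rules (function name is hypothetical)."""
    if args.hybrid:
        # --hybrid implies both modes and pins the offload/advantage settings.
        args.fully_async = True
        args.colocate = True
        args.offload_train = False
        args.offload_rollout = False
        args.compute_advantages_and_returns = True
    elif args.fully_async and args.colocate:
        raise ValueError("use --hybrid to combine --fully-async with --colocate")
    return args


def make_args(**overrides) -> SimpleNamespace:
    # Placeholder defaults, chosen only so the overrides are visible.
    base = dict(
        hybrid=False, fully_async=False, colocate=False,
        offload_train=True, offload_rollout=True,
        compute_advantages_and_returns=False,
    )
    base.update(overrides)
    return SimpleNamespace(**base)


hybrid_args = validate_hybrid_args(make_args(hybrid=True))

try:
    validate_hybrid_args(make_args(fully_async=True, colocate=True))
    rejected = False
except ValueError:
    rejected = True
```

Routing the combination through a single flag keeps the dependent settings in one place instead of relying on users to set five options consistently.
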
# ⭐ Feature
## Add `relax/utils/visualize` rollout result viewer
- Add web viewer adapted from rlsp/utils/visualize: FastAPI + single-page UI
for browsing `<save>/rollout_result/{train,eval}/{step}.jsonl`, with
step dropdown, sample nav, sort by reward / response_length, sample-info
card, and prompt / response / label rendering with chat-template / tool-call
/ `<think>` highlighting
- Auto-discover `train/` and `eval/` subdirs and render a tab toggle when both
exist; fall back to a single anonymous bucket for flat dirs
- Add terminal UI mode (`--tui`) adapted from redaccel/verl reward_viewer_v2:
sync-load the first step then stream remaining steps via a daemon thread
(newest first), with step / sample / dataset / sort dropdowns, field
filter, fuzzy search (`f`/`enter`/`esc`), vim-style page nav, and
text/table render toggle
- Default theme switched to dark; header shows the Relax wordmark linking
to the GitHub repo plus a GitHub icon
- Mask multimodal pad tokens (`<|image_pad|>`, etc.) by default in the TUI
via `--mask-str`
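
The `train/` / `eval/` auto-discovery with a flat-directory fallback could be sketched like this. The function name, the empty-string key for the anonymous bucket, and the assumption that step files are named with an integer stem are all illustrative; only the directory layout comes from the commit message.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def discover_buckets(root: Path) -> Dict[str, List[Path]]:
    """Map bucket name -> step files sorted by numeric step (hypothetical helper)."""

    def step_files(directory: Path) -> List[Path]:
        files = [p for p in directory.glob("*.jsonl") if p.stem.isdigit()]
        return sorted(files, key=lambda p: int(p.stem))

    buckets = {
        name: step_files(root / name)
        for name in ("train", "eval")
        if (root / name).is_dir()
    }
    # Flat layout: no train/ or eval/ subdirectory, so use one unnamed bucket.
    return buckets or {"": step_files(root)}


# Nested layout with only a train/ subdirectory.
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp)
    (nested / "train").mkdir()
    for step in (10, 2):
        (nested / "train" / f"{step}.jsonl").write_text("{}\n")
    found = discover_buckets(nested)
    bucket_names = list(found)
    train_steps = [p.stem for p in found["train"]]

# Flat layout: step files sit directly in the root.
with tempfile.TemporaryDirectory() as tmp:
    flat = Path(tmp)
    (flat / "3.jsonl").write_text("{}\n")
    flat_names = list(discover_buckets(flat))
```

Sorting by the integer value of the stem keeps step `2` ahead of step `10`, which plain string sorting would not.
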
## Add `relax/entrypoints/visualize` thin wrapper
- One command for both modes: `python -m relax.entrypoints.visualize <dir>`
(web) and `... --tui` (terminal)
- `DATA_DIR` is a required positional argument
- TUI dependencies (`textual`, `rich`) are lazy-imported; clear error
message if missing
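
Lazy-importing optional dependencies with a readable failure message is a common pattern; a minimal sketch follows. The helper name, the message wording, and the use of `SystemExit` are assumptions, not the wrapper's actual code.

```python
import importlib
from types import ModuleType
from typing import Dict, Iterable


def require_modules(names: Iterable[str], hint: str) -> Dict[str, ModuleType]:
    """Import optional dependencies on demand, failing with one readable message."""
    loaded: Dict[str, ModuleType] = {}
    missing = []
    for name in names:
        try:
            loaded[name] = importlib.import_module(name)
        except ImportError:
            missing.append(name)
    if missing:
        raise SystemExit(f"Missing optional dependencies: {', '.join(missing)}. {hint}")
    return loaded


# A module that exists is returned as usual.
available = require_modules(("json",), "Install it first.")

# A module that does not exist produces a single clear message.
try:
    require_modules(("no_such_module_for_demo",), "Install it first.")
    message = ""
except SystemExit as exc:
    message = str(exc)
```

In the TUI path such a helper would be called with `("textual", "rich")` only after `--tui` is parsed, so the web mode never needs those packages installed.
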
---
# 📝 Documentation
## Add bilingual rollout result viewer guide
- Add `docs/{en,zh}/guide/rollout-result-viewer.md` covering data layout,
launch command, flags, page features, terminal UI key bindings, and
reverse-proxy notes
- Register the new pages under the existing "Operations & Debugging" /
"运维与调试" sidebar group in `docs/.vitepress/config.mts`
- Add `docs/public/relax-viewer.png` screenshot (palette-compressed PNG
to stay under the 500 KB pre-commit limit)
NINGBENZHE approved these changes on May 22, 2026
Routine internal -> external sync.