sync: gitlab/main -> github/main by Yangruipis · Pull Request #38 · redai-infra/Relax

Yangruipis · 2026-05-22T08:57:31Z

Routine internal -> external sync.

…eyes script

# 🔩 Chore

## Align launch argument order in deepeyes run script

- Remove duplicate `--rollout-num-gpus-per-engine` definition in `examples/deepeyes/run_deepeyes.sh`.
- Keep the effective `--rollout-num-gpus-per-engine 2` from the SGLang config section and remove the earlier conflicting value.
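The later `--rollout-num-gpus-per-engine 2` is the effective value because a repeated flag is normally resolved last-wins. Below is a minimal sketch of that behavior. It assumes the script's flags end up in a Python `argparse` parser that uses the default `store` action, which this PR does not show. The earlier value `4` is hypothetical.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--rollout-num-gpus-per-engine", type=int, default=1)

# The flag appears twice, as in the run script before the fix.
# The earlier value (4) is hypothetical; the later one is the SGLang-section value.
argv = ["--rollout-num-gpus-per-engine", "4", "--rollout-num-gpus-per-engine", "2"]
args = parser.parse_args(argv)

# With the default "store" action, each occurrence overwrites the previous one.
print(args.rollout_num_gpus_per_engine)  # 2
```

Because no error is raised, a stale earlier definition is easy to miss. That is why removing the duplicate is safer than relying on ordering.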

# 🐛 Bug Fix

## Preserve DeepEyes multimodal state across partial resume

- Keep `sample.multimodal_inputs` as the read-only dataset input instead of appending observation images into the shared shallow-copy reference
- Make `_prepare_initial_inputs()` return the initial processor output as local `init_mm_train` without overwriting `sample.multimodal_train_inputs`
- Keep observation images in `current_image_data` and append observation processor chunks to `multimodal_train_inputs_buffer`
- Route the budget-exhausted early return through `_finalize_sample()` so resumed samples leave `generate()` with merged multimodal train inputs
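A minimal sketch of the aliasing problem behind the first bullet. `Sample` is a hypothetical stand-in with a single dict field, not the repo's sample class. The point is that `copy.copy` shares the inner list. Appending observation images through the copy therefore also changes the dataset input that a resumed sample later starts from.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sample:
    # Hypothetical stand-in: only the field relevant to the aliasing bug.
    multimodal_inputs: Dict[str, List[str]] = field(default_factory=dict)


# Buggy pattern: append observations through a shallow copy.
dataset_row = Sample(multimodal_inputs={"images": ["img0"]})
attempt = copy.copy(dataset_row)  # new Sample object, but the same inner dict and list
attempt.multimodal_inputs["images"].append("obs1")
print(dataset_row.multimodal_inputs["images"])  # ['img0', 'obs1']; the dataset input is polluted

# Fixed pattern: treat the dataset input as read-only and collect observations locally.
clean_row = Sample(multimodal_inputs={"images": ["img0"]})
resumed = copy.copy(clean_row)
current_image_data = list(resumed.multimodal_inputs["images"])  # private list
current_image_data.append("obs1")
print(clean_row.multimodal_inputs["images"])  # ['img0']
```

Keeping the per-attempt state in local variables (here `current_image_data`) means a partial resume always starts from the untouched dataset input.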

# ⭐ Feature

## Auto-enable true-on-policy mode in fully-async

- Auto-enable `--true-on-policy-mode` in `slime_validate_args` when `fully_async` is set and `rollout_batch_size * n_samples_per_prompt == global_batch_size`. Each rollout batch then fills exactly one global batch, so the log_probs from the training forward pass equal what actor_fwd would produce
- Add `ROLES_FULLY_ASYNC_ON_POLICY` (no actor_fwd) and route to it from `process_role` when the mode is on
- Recompute `old_log_probs` inline in `policy_loss_function` via `log_probs.detach()`. This recovers vanilla policy gradient (PG, ratio ≡ 1) while keeping truncated importance sampling (TIS) against `rollout_log_probs` valid
- Drop `log_probs` from the required data fields in `MegatronTrainRayActor` and `Advantages` when actor_fwd is absent; use `rollout_log_probs` as the shape template for the all-zero KL when computing advantages
- Make the `Controller` fully-async weight sync skip the `actor_fwd` recv when that role is not registered, and skip the actor_fwd HTTP probe in the actor health check
- Fall back to `rollout_log_probs` for entropy logging in `log_rollout_data` when `log_probs` is unavailable

---

# 🔩 Chore

## Tune training scripts and runtime env

- Add a default for `NVSHMEM_BOOTSTRAP_UID_SOCK_IFNAME` in `scripts/entrypoint/local.sh`
- Drop `actor_fwd`/`reference` from the resource specs in the async scripts (35B/9B text, 9B openr1mm-mm) now that true-on-policy mode handles them
- Add `--log-probs-max-tokens-per-gpu` and adjust parallelism/recompute/MoE-dispatcher knobs across the qwen3/qwen35/qwen36 scripts to fit larger micro-batches
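The claim that `log_probs.detach()` recovers vanilla PG can be checked on a toy example. With a detached copy, the ratio `exp(log_probs - old_log_probs)` is exactly 1 in value. Its gradient, however, equals the gradient of `log_probs`, which is the plain policy-gradient term. The sketch below makes several assumptions for illustration. It uses a two-action softmax policy. Central finite differences stand in for autograd. The TIS cap of `2.0` and the rollout log-prob offset are arbitrary values. This is not the repo's `policy_loss_function`.

```python
import math


def log_prob(theta: float, action: int) -> float:
    """Log-probability of `action` under a toy 2-action softmax policy with logits [theta, 0]."""
    logits = [theta, 0.0]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[action] - log_z


def ratio(theta: float, theta_detached: float, action: int) -> float:
    # old_log_probs comes from the same parameters but is held constant,
    # mirroring `log_probs.detach()`: only the numerator varies with theta.
    return math.exp(log_prob(theta, action) - log_prob(theta_detached, action))


def tis_weight(old_log_prob: float, rollout_log_prob: float, cap: float = 2.0) -> float:
    # Truncated importance sampling weight against the rollout engine's log-probs.
    return min(math.exp(old_log_prob - rollout_log_prob), cap)


theta0, action, eps = 0.3, 0, 1e-6
r = ratio(theta0, theta0, action)  # exactly 1.0

# Central finite differences stand in for autograd; the detached copy stays at theta0.
d_ratio = (ratio(theta0 + eps, theta0, action) - ratio(theta0 - eps, theta0, action)) / (2 * eps)
d_log_prob = (log_prob(theta0 + eps, action) - log_prob(theta0 - eps, action)) / (2 * eps)

# Hypothetical rollout-engine log-prob, slightly below the trainer's value.
w = tis_weight(log_prob(theta0, action), log_prob(theta0, action) - 0.05)
print(r, round(d_ratio, 6), round(d_log_prob, 6), round(w, 6))
```

`d_ratio` matches `d_log_prob`, so the PPO surrogate reduces to the vanilla PG gradient. The TIS weight still reflects the gap between the trainer and the rollout engine.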

# 🐛 Bug Fix

## PR #1889: validate MoE HF config when dense layers exist

- Add `_has_dense_moe_layers` / `_is_moe_config` helpers in `relax/backends/megatron/arguments.py`
- Validate `moe_intermediate_size` and `shared_expert_intermediate_size`
- Skip the `intermediate_size` check when the model is pure MoE with no dense layer
- Ref: THUDM/slime#1889

## PR #1880: fix per-actor HTTP POST concurrency split

- Replace the node-count-only divisor with `len(nodes) * num_gpus_per_node`
- Use ceiling division `(c + n - 1) // n` to avoid losing concurrency budget
- File: `relax/utils/http_utils.py`
- Ref: THUDM/slime#1880

## PR #1873: avoid blocking the asyncio event loop on ray.get

- Replace `asyncio.to_thread(ray.get, obj_ref)` with a direct `await obj_ref`
- Removes the cap that the event loop's default `ThreadPoolExecutor` (used by `asyncio.to_thread`) placed on parallel POSTs
- File: `relax/utils/http_utils.py`
- Ref: THUDM/slime#1873

## PR #1888: actor save_model must wake_up / sleep, not just reload PG

- `save_model` was calling `reload_process_groups()` without resuming `torch_memory_saver`, leaving GPU tensors unreachable when NCCL is triggered during checkpoint save
- Use `self.wake_up()` / `self.sleep()` instead so the process-group (PG) and torch_memory_saver (TMS) states stay in sync
- File: `relax/backends/megatron/actor.py`
- Ref: THUDM/slime#1888

## PR #1882: disaggregated PPO must reconnect rollout NCCL across sleep

- Add `disconnect_rollout_engines` to `UpdateWeightFromDistributed`
- In `actor.sleep()`, tear down the weight-sync NCCL group for disaggregated PPO (`use_critic` and not `colocate`) before `destroy_process_groups()`
- In `actor.update_weights()`, force `wake_up` + `connect_rollout_engines` + `sleep` when the disconnect path was taken
- Drop the buggy `args.use_critic` GPU-offset branch in `sglang_engine.get_base_gpu_id`
- Auto-enable `offload_train` when `use_critic` is on
- Files: `relax/backends/megatron/actor.py`, `relax/backends/megatron/weight_update/update_weight_from_distributed.py`, `relax/backends/sglang/sglang_engine.py`, `relax/utils/arguments.py`
- Ref: THUDM/slime#1882

## PR #1878: reinitialize critic output_layer when ckpt shape mismatches

- Detect a missing or shape-mismatched `output_layer.{weight,bias}` in the critic checkpoint metadata before / after `load_checkpoint`
- Reinitialize with `normal_(0, 0.02)` for the weight and zeros for the bias, plus `optimizer.reload_model_params()` for fp16/bf16 master-weight sync
- Gated on `role == "critic"`; no effect on the actor or non-PPO algorithms
- File: `relax/backends/megatron/model.py`
- Ref: THUDM/slime#1878

---

# ⭐ Feature

## PR #1890: add missing spec / prefix-cache rollout metrics

- Wire `_compute_spec_metrics` and `_compute_prefix_cache_metrics` into `compute_metrics_from_samples`
- File: `relax/distributed/ray/rollout.py`
- Ref: THUDM/slime#1890

---

# 🔩 Chore

## PR #1862: improve `slice_log_prob_with_cp` assert message

- Include `len(log_prob)`, `response_length`, and `total_length` in the assert so failures are diagnosable
- File: `relax/backends/megatron/cp_utils.py`
- Ref: THUDM/slime#1862

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
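The concurrency fix in PR #1880 comes down to simple arithmetic. The sketch below uses a hypothetical helper, `per_actor_concurrency`. It assumes one HTTP-posting actor per GPU, which is why the divisor counts GPUs rather than nodes. The `node_only` line only approximates the old node-count divisor.

```python
def per_actor_concurrency(total: int, num_nodes: int, gpus_per_node: int) -> int:
    """Split a global POST concurrency budget across actors (hypothetical helper).

    Assumes one actor per GPU, so the divisor counts GPUs rather than nodes.
    Ceiling division, (c + n - 1) // n, rounds each share up so that the
    shares summed over all actors never fall below the global budget.
    """
    n = num_nodes * gpus_per_node
    return (total + n - 1) // n


total, num_nodes, gpus_per_node = 100, 2, 8
actors = num_nodes * gpus_per_node  # 16

node_only = total // num_nodes  # 50 per actor -> 800 in flight across 16 actors
floor_split = total // actors  # 6 per actor -> 96, loses 4 from the budget
ceil_split = per_actor_concurrency(total, num_nodes, gpus_per_node)  # 7 per actor -> 112
print(node_only * actors, floor_split * actors, ceil_split * actors)  # 800 96 112
```

Rounding up can overshoot the budget slightly (112 vs 100). Unlike floor division, though, it never starves the pool.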

…r timeout

# ⭐ Feature

## Add --custom-prompt-path for prompt transformation hook

- Add the CLI argument in add_data_arguments (arguments.py)
- Load the custom function via load_function in data_source.py
- Thread custom_prompt_func through build_messages, process_raw_sample, BaseDataset, Dataset, and StreamingDataset
- The custom function is called after prompt extraction and before conversation/multimodal processing

## Add --image-resize-scale-factor for image dimension alignment control

- Add the CLI argument in add_data_arguments (arguments.py)
- Add an image_resize_scale_factor field to MultimodalConfig, with from_args propagation and a getter function
- Update fetch_image with 3-way logic: None uses the default patch_factor, 0 disables alignment, and a positive int is used as the alignment factor

---

# 📝 Documentation

## Update configuration reference docs

- Add --custom-prompt-path to the Dataset table (EN + ZH)
- Add --image-resize-scale-factor to the Multimodal Data table (EN + ZH)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
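Here is a minimal sketch of the 3-way `--image-resize-scale-factor` logic described above. Several details are assumptions rather than facts from the PR: the helper name `aligned_size`, the default factor of 28, rounding to the nearest multiple, and rejecting negative values. The real `fetch_image` may compute the target size differently.

```python
from typing import Optional, Tuple

# Hypothetical default; the real patch_factor comes from the multimodal config.
DEFAULT_PATCH_FACTOR = 28


def aligned_size(width: int, height: int, scale_factor: Optional[int]) -> Tuple[int, int]:
    """Round image dimensions to a multiple of the alignment factor.

    None -> use the default patch factor, 0 -> leave dimensions untouched,
    positive int -> use that value as the factor.
    """
    if scale_factor is None:
        factor = DEFAULT_PATCH_FACTOR
    elif scale_factor == 0:
        return width, height  # alignment disabled
    elif scale_factor > 0:
        factor = scale_factor
    else:
        raise ValueError(f"image resize scale factor must be >= 0, got {scale_factor}")
    # Round to the nearest multiple, but never below one factor in either dimension.
    return (
        max(factor, round(width / factor) * factor),
        max(factor, round(height / factor) * factor),
    )


print(aligned_size(500, 333, None))  # (504, 336)
print(aligned_size(500, 333, 0))  # (500, 333)
print(aligned_size(500, 333, 32))  # (512, 320)
```

Using `None` rather than `0` as the "use default" sentinel leaves `0` free to mean "disable alignment".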

# ⭐ Feature

## Add hybrid training mode combining async data pipeline with colocate weight sharing

- Introduce a `--hybrid` flag that sets `fully_async=True` and `colocate=True`, so actor/ref/actor_fwd share GPUs via TensorBackuper+_switch_model while rollout runs on a separate GPU placement group with a streaming transfer queue
- Add a `train_hybrid()` method in MegatronTrainRayActor. It collects sub-batches from the transfer queue, runs ref/teacher/actor forward per sub-batch, merges all sub-batches, computes advantages over the merged batch so that normalization is global rather than per sub-batch, then trains on the full merged batch
- Register hybrid mode in `process_role()` to use ROLES_COLOCATE (actor + rollout only, no separate reference/actor_fwd services)
- Update the controller to skip shared placement groups and the fully-async DCS weight sync setup when hybrid is active
- Skip the actor_fwd health probe in `_check_services_health` for hybrid mode to avoid spurious warnings
- Add `train_hybrid()` dispatch in RayTrainGroup and the Actor component
- Validate argument combinations: `--hybrid` is the supported way to combine the async pipeline with colocate weight sharing, and bare `--fully-async --colocate` now raises ValueError
- Set `offload_train=False`, `offload_rollout=False`, and `compute_advantages_and_returns=True` for hybrid mode
- Add a Qwen3-4B 8xGPU hybrid-async training launch script
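A minimal sketch of why `train_hybrid()` merges sub-batches before computing advantages. It compares whitening each sub-batch separately with whitening once after merging. Several details are hypothetical: the `whiten` helper, the use of population standard deviation, the `eps` value, and the sample numbers. The PR does not say which normalization formula Relax uses.

```python
import statistics
from typing import List


def whiten(values: List[float], eps: float = 1e-8) -> List[float]:
    """Zero-mean, unit-variance normalization (population std is an assumption)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / (std + eps) for v in values]


# Two hypothetical sub-batches of raw advantages pulled from the transfer queue.
sub_batches = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]

# Normalizing each sub-batch on its own erases the gap between them:
# both become roughly [-1.22, 0.0, 1.22].
per_sub_batch = [a for sb in sub_batches for a in whiten(sb)]

# Merging first, then normalizing once, keeps the batch-wide ordering:
# every sample in the first sub-batch ends up below every sample in the second.
merged = [x for sb in sub_batches for x in sb]
global_adv = whiten(merged)
print([round(a, 3) for a in per_sub_batch])
print([round(a, 3) for a in global_adv])
```

Per-sub-batch statistics depend on how the queue happened to chunk the data. Normalizing the merged batch makes the result independent of that chunking.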

# ⭐ Feature

## Add `relax/utils/visualize` rollout result viewer

- Add a web viewer adapted from rlsp/utils/visualize: FastAPI + single-page UI for browsing `<save>/rollout_result/{train,eval}/{step}.jsonl`, with a step dropdown, sample navigation, sorting by reward / response_length, a sample-info card, and prompt / response / label rendering with chat-template / tool-call / `<think>` highlighting
- Auto-discover `train/` and `eval/` subdirs and render a tab toggle when both exist; fall back to a single anonymous bucket for flat dirs
- Add a terminal UI mode (`--tui`) adapted from redaccel/verl reward_viewer_v2. It sync-loads the first step, then streams the remaining steps via a daemon thread (newest first). It provides step / sample / dataset / sort dropdowns, a field filter, fuzzy search (`f`/`enter`/`esc`), vim-style page navigation, and a text/table render toggle
- Switch the default theme to dark; the header shows the Relax wordmark linking to the GitHub repo, plus a GitHub icon
- Mask multimodal pad tokens (`<|image_pad|>`, etc.) by default in the TUI via `--mask-str`

## Add `relax/entrypoints/visualize` thin wrapper

- One command for both modes: `python -m relax.entrypoints.visualize <dir>` (web) and `... --tui` (terminal)
- `DATA_DIR` is a required positional argument
- TUI dependencies (`textual`, `rich`) are lazy-imported, with a clear error message if they are missing

---

# 📝 Documentation

## Add bilingual rollout result viewer guide

- Add `docs/{en,zh}/guide/rollout-result-viewer.md` covering data layout, launch command, flags, page features, terminal UI key bindings, and reverse-proxy notes
- Register the new pages under the existing "Operations & Debugging" / "运维与调试" sidebar group in `docs/.vitepress/config.mts`
- Add a `docs/public/relax-viewer.png` screenshot (palette-compressed PNG to stay under the 500 KB pre-commit limit)
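A minimal sketch of the directory auto-discovery described above. It looks for `train/` and `eval/` step files, falls back to one anonymous bucket for a flat directory, and shows tabs only when more than one bucket exists. The names `discover_buckets` and `step_files` are hypothetical and not the viewer's actual code. Sorting by integer step is an assumption made so that `10.jsonl` sorts after `2.jsonl`.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def step_files(directory: Path) -> List[Path]:
    # `{step}.jsonl` files sorted numerically, so step 10 comes after step 2.
    return sorted(
        (p for p in directory.glob("*.jsonl") if p.stem.isdigit()),
        key=lambda p: int(p.stem),
    )


def discover_buckets(root: Path) -> Dict[str, List[Path]]:
    """Map bucket name -> step files; '' is the anonymous bucket for flat dirs."""
    buckets = {name: step_files(root / name) for name in ("train", "eval") if (root / name).is_dir()}
    return buckets or {"": step_files(root)}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for split in ("train", "eval"):
        (root / split).mkdir()
        for step in (2, 10, 1):
            (root / split / f"{step}.jsonl").write_text("{}\n")
    buckets = discover_buckets(root)
    show_tabs = len(buckets) > 1  # tab toggle only when both train/ and eval/ exist
    print(sorted(buckets), [p.name for p in buckets["train"]], show_tabs)
```

For the layout created above, this prints `['eval', 'train'] ['1.jsonl', '2.jsonl', '10.jsonl'] True`.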

dirtyDan0 and others added 10 commits May 22, 2026 16:50

fix: resume bug if num_rollout changed

8a8e4af

fix: add qwen3.6 async image example and rollout healthcheck retry for timeout

df15b81

feat: add eval for multimodal

90b5b90

Yangruipis requested review from Aurelius84, NINGBENZHE and yxyOo as code owners May 22, 2026 08:57

NINGBENZHE approved these changes May 22, 2026

Yangruipis merged commit 5c5fad1 into main May 22, 2026
5 checks passed

Yangruipis deleted the sync/from-gitlab branch May 22, 2026 09:06

Yangruipis merged 10 commits into main from sync/from-gitlab

Yangruipis commented May 22, 2026

6 participants
