
sync: gitlab/main -> github/main #38

Merged
Yangruipis merged 10 commits into `main` from `sync/from-gitlab` on May 22, 2026

@Yangruipis

Collaborator

Routine internal -> external sync.

dirtyDan0 and others added 10 commits May 22, 2026 16:50
…eyes script

# 🔩 Chore

## Align launch argument order in deepeyes run script

- Remove duplicate `--rollout-num-gpus-per-engine` definition in `examples/deepeyes/run_deepeyes.sh`.
- Keep the effective `--rollout-num-gpus-per-engine 2` from the SGLang config section and remove the earlier conflicting value.

# 🐛 Bug Fix

## Preserve DeepEyes multimodal state across partial resume

- Keep `sample.multimodal_inputs` as the read-only dataset input instead of appending observation images into the shared shallow-copy reference
- Make `_prepare_initial_inputs()` return the initial processor output as local `init_mm_train` without overwriting `sample.multimodal_train_inputs`
- Keep observation images in `current_image_data` and append observation processor chunks to `multimodal_train_inputs_buffer`
- Route the budget-exhausted early return through `_finalize_sample()` so resumed samples leave `generate()` with merged multimodal train inputs

# ⭐ Feature

## Auto-enable true-on-policy mode in fully-async

- Auto-enable `--true-on-policy-mode` in `slime_validate_args` when `fully_async` and `rollout_batch_size * n_samples_per_prompt == global_batch_size`, since the train forward log_probs equal what actor_fwd would produce
- Add `ROLES_FULLY_ASYNC_ON_POLICY` (no actor_fwd) and route to it from `process_role` when the mode is on
- Recompute `old_log_probs` inline in `policy_loss_function` via `log_probs.detach()`, recovering vanilla PG (ratio ≡ 1) while keeping TIS valid against rollout_log_probs
- Drop `log_probs` from required data fields in `MegatronTrainRayActor` and `Advantages` when actor_fwd is absent; treat `rollout_log_probs` as kl-zero template in advantages compute
- Make `Controller` fully-async weight-sync skip `actor_fwd` recv when the role is not registered, and skip the actor_fwd HTTP probe in the actor health check
- Fall back to `rollout_log_probs` for entropy logging in `log_rollout_data` when `log_probs` is unavailable
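
The "ratio ≡ 1" claim in the third bullet follows directly from the stop-gradient. As a short derivation, write $\mathrm{sg}[\cdot]$ for `.detach()` and $A_t$ for the advantage (this notation is assumed here for illustration, it is not taken from the code):

$$
r_t(\theta) = \exp\bigl(\log \pi_\theta(a_t \mid s_t) - \mathrm{sg}[\log \pi_\theta(a_t \mid s_t)]\bigr) = 1,
\qquad
\nabla_\theta \bigl(r_t(\theta)\, A_t\bigr) = A_t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t).
$$

The value of the ratio is always 1, so a PPO-style clip range around 1 never binds, while the gradient is the plain policy-gradient term. Any correction against `rollout_log_probs` (such as TIS) is a separate weight and is not derived here.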

---

# 🔩 Chore

## Tune training scripts and runtime env

- Add `NVSHMEM_BOOTSTRAP_UID_SOCK_IFNAME` default in `scripts/entrypoint/local.sh`
- Drop `actor_fwd`/`reference` from resource specs in async scripts (35B/9B text, 9B openr1mm-mm) now that true-on-policy mode handles them
- Add `--log-probs-max-tokens-per-gpu` and adjust parallelism/recompute/MoE-dispatcher knobs across qwen3/qwen35/qwen36 scripts to fit larger micro-batches

# 🐛 Bug Fix

## PR #1889: validate MoE HF config when dense layers exist

- Add `_has_dense_moe_layers` / `_is_moe_config` helpers in
  `relax/backends/megatron/arguments.py`
- Validate `moe_intermediate_size` and `shared_expert_intermediate_size`
- Skip `intermediate_size` check when model is pure MoE with no dense layer
- Ref: THUDM/slime#1889

## PR #1880: fix per-actor HTTP POST concurrency split

- Replace node-count-only divisor with `len(nodes) * num_gpus_per_node`
- Use ceiling division `(c + n - 1) // n` to avoid losing concurrency budget
- File: `relax/utils/http_utils.py`
- Ref: THUDM/slime#1880
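
As a minimal sketch of the split described above (the function name and the one-actor-per-GPU assumption are hypothetical, not taken from `relax/utils/http_utils.py`), the ceiling division can be written as:

```python
def per_actor_concurrency(total_concurrency: int, num_nodes: int, num_gpus_per_node: int) -> int:
    """Split a global HTTP POST concurrency budget across all actors.

    Hypothetical helper: assumes one actor per GPU, so the divisor is
    num_nodes * num_gpus_per_node rather than the node count alone.
    """
    num_actors = num_nodes * num_gpus_per_node
    if num_actors <= 0:
        raise ValueError("need at least one actor")
    # Ceiling division (c + n - 1) // n: never rounds a positive budget down to 0.
    return (total_concurrency + num_actors - 1) // num_actors


# 2 nodes x 8 GPUs = 16 actors. Floor division would give 100 // 16 = 6
# (only 96 slots in total); ceiling division gives 7 per actor.
share = per_actor_concurrency(100, num_nodes=2, num_gpus_per_node=8)
```

With floor division, a budget smaller than the actor count would leave every actor with zero concurrency. The ceiling form guarantees at least one slot per actor whenever the budget is positive.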

## PR #1873: avoid blocking the asyncio event loop on ray.get

- Replace `asyncio.to_thread(ray.get, obj_ref)` with direct `await obj_ref`
- Removes the hard ThreadPoolExecutor cap on parallel POSTs
- File: `relax/utils/http_utils.py`
- Ref: THUDM/slime#1873
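
The executor cap mentioned above can be illustrated without Ray. In the sketch below, `asyncio.sleep` stands in for awaiting a Ray `ObjectRef` (which is awaitable inside an asyncio loop) and `time.sleep` stands in for a blocking `ray.get`. All names are hypothetical, and the pool size is set artificially low to make the cap visible:

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class PeakCounter:
    """Tracks the maximum number of waits that were in flight at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def enter(self):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)

    def exit(self):
        with self._lock:
            self._active -= 1


def blocking_wait(counter, seconds):
    # Stand-in for a blocking call such as ray.get(obj_ref).
    counter.enter()
    time.sleep(seconds)
    counter.exit()


async def awaitable_wait(counter, seconds):
    # Stand-in for `await obj_ref`: the wait itself occupies no thread.
    counter.enter()
    await asyncio.sleep(seconds)
    counter.exit()


async def measure(num_requests=8, pool_size=2, seconds=0.05):
    loop = asyncio.get_running_loop()
    # asyncio.to_thread runs on the loop's default executor, so its size caps parallelism.
    loop.set_default_executor(ThreadPoolExecutor(max_workers=pool_size))

    threaded = PeakCounter()
    await asyncio.gather(
        *(asyncio.to_thread(blocking_wait, threaded, seconds) for _ in range(num_requests))
    )

    direct = PeakCounter()
    await asyncio.gather(*(awaitable_wait(direct, seconds) for _ in range(num_requests)))
    return threaded.peak, direct.peak


threaded_peak, direct_peak = asyncio.run(measure())
```

Here `threaded_peak` cannot exceed the pool size, while `direct_peak` reaches the number of requests. That is the effect the change is after for parallel POSTs.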

## PR #1888: actor save_model must wake_up / sleep, not just reload PG

- `save_model` was calling `reload_process_groups()` without resuming
  `torch_memory_saver`, leaving GPU tensors unreachable when NCCL is
  triggered during checkpoint save
- Use `self.wake_up()` / `self.sleep()` instead so PG and TMS stay in sync
- File: `relax/backends/megatron/actor.py`
- Ref: THUDM/slime#1888

## PR #1882: disaggregate PPO must reconnect rollout NCCL across sleep

- Add `disconnect_rollout_engines` to `UpdateWeightFromDistributed`
- In `actor.sleep()`, tear down weight-sync NCCL group for disaggregate
  PPO (`use_critic` + not `colocate`) before `destroy_process_groups()`
- In `actor.update_weights()`, force `wake_up` + `connect_rollout_engines`
  + `sleep` when the disconnect path was taken
- Drop the buggy `args.use_critic` GPU-offset branch in
  `sglang_engine.get_base_gpu_id`
- Auto-enable `offload_train` when `use_critic` is on
- Files: `relax/backends/megatron/actor.py`,
  `relax/backends/megatron/weight_update/update_weight_from_distributed.py`,
  `relax/backends/sglang/sglang_engine.py`, `relax/utils/arguments.py`
- Ref: THUDM/slime#1882

## PR #1878: reinitialize critic output_layer when ckpt shape mismatches

- Detect missing or shape-mismatched `output_layer.{weight,bias}` in the
  critic checkpoint metadata before / after `load_checkpoint`
- Reinitialize with `normal_(0, 0.02)` for weight and zero for bias, plus
  `optimizer.reload_model_params()` for fp16/bf16 master sync
- Gated on `role == "critic"`, no effect on actor or non-PPO algorithms
- File: `relax/backends/megatron/model.py`
- Ref: THUDM/slime#1878
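
The detection step can be sketched on plain shape metadata. This is an illustration only: the helper name and the dict-of-shapes representation are assumptions, not the code in `relax/backends/megatron/model.py`, and the actual reinitialization (`normal_(0, 0.02)` for the weight, zeros for the bias) is left out because it needs real tensors:

```python
def keys_needing_reinit(ckpt_shapes, model_shapes, prefix="output_layer."):
    """Return output-layer parameter names that are missing from the
    checkpoint or whose checkpoint shape differs from the model's shape.

    Both arguments are hypothetical {param_name: shape_tuple} mappings.
    """
    stale = []
    for name, shape in model_shapes.items():
        if not name.startswith(prefix):
            continue  # only the critic head is considered
        if ckpt_shapes.get(name) != tuple(shape):
            stale.append(name)  # covers both "missing" and "shape mismatch"
    return sorted(stale)


# Hypothetical case: the checkpoint carries a vocab-sized head and no bias,
# while the critic model expects a scalar value head.
model_shapes = {
    "output_layer.weight": (1, 4096),
    "output_layer.bias": (1,),
    "decoder.layers.0.weight": (4096, 4096),
}
ckpt_shapes = {
    "output_layer.weight": (151936, 4096),
    "decoder.layers.0.weight": (4096, 4096),
}
stale_keys = keys_needing_reinit(ckpt_shapes, model_shapes)
```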

---

# ⭐ Feature

## PR #1890: add missing spec / prefix-cache rollout metrics

- Wire `_compute_spec_metrics` and `_compute_prefix_cache_metrics` into
  `compute_metrics_from_samples`
- File: `relax/distributed/ray/rollout.py`
- Ref: THUDM/slime#1890

---

# 🔩 Chore

## PR #1862: improve `slice_log_prob_with_cp` assert message

- Include `len(log_prob)`, `response_length`, `total_length` in the assert
  so failures are diagnosable
- File: `relax/backends/megatron/cp_utils.py`
- Ref: THUDM/slime#1862

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

# ⭐ Feature

## Add --custom-prompt-path for prompt transformation hook

- Add CLI argument in add_data_arguments (arguments.py)
- Load custom function via load_function in data_source.py
- Thread custom_prompt_func through build_messages, process_raw_sample,
  BaseDataset, Dataset, and StreamingDataset
- Custom function is called after prompt extraction, before
  conversation/multimodal processing

## Add --image-resize-scale-factor for image dimension alignment control

- Add CLI argument in add_data_arguments (arguments.py)
- Add image_resize_scale_factor field to MultimodalConfig with
  from_args propagation and getter function
- Update fetch_image with 3-way logic: None uses default patch_factor,
  0 disables alignment, positive int uses custom value
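
The 3-way logic can be sketched as follows. The helper names, the round-to-nearest-multiple alignment, and the rejection of negative values are assumptions made for illustration. The real `fetch_image` may align dimensions differently:

```python
from typing import Optional


def resolve_alignment_factor(
    image_resize_scale_factor: Optional[int], default_patch_factor: int
) -> Optional[int]:
    """Map the CLI value to the factor used for alignment (None = no alignment)."""
    if image_resize_scale_factor is None:
        return default_patch_factor  # flag not set: keep the default patch factor
    if image_resize_scale_factor == 0:
        return None  # 0 disables alignment
    if image_resize_scale_factor > 0:
        return image_resize_scale_factor  # positive int: custom factor
    # Assumption of this sketch: negative values are rejected.
    raise ValueError("image_resize_scale_factor must be None, 0, or a positive int")


def align_dimension(size: int, factor: Optional[int]) -> int:
    """Round a dimension to the nearest multiple of factor (at least one multiple)."""
    if factor is None:
        return size
    return max(factor, (size + factor // 2) // factor * factor)


default_width = align_dimension(500, resolve_alignment_factor(None, 28))
custom_width = align_dimension(500, resolve_alignment_factor(32, 28))
unaligned_width = align_dimension(500, resolve_alignment_factor(0, 28))
```

The default of 28 used here is only a placeholder value for the example.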

---

# 📝 Documentation

## Update configuration reference docs

- Add --custom-prompt-path to Dataset table (EN + ZH)
- Add --image-resize-scale-factor to Multimodal Data table (EN + ZH)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

# ⭐ Feature

## Add hybrid training mode combining async data pipeline with colocate weight sharing

- Introduce `--hybrid` flag that sets `fully_async=True` and `colocate=True`
  so actor/ref/actor_fwd share GPUs via TensorBackuper+_switch_model while
  rollout runs on a separate GPU placement group with streaming transfer queue
- Add `train_hybrid()` method in MegatronTrainRayActor: collects sub-batches
  from transfer queue, runs ref/teacher/actor forward per sub-batch, merges
  all sub-batches, computes advantages with correct global normalization,
  then trains on the full merged batch
- Register hybrid mode in `process_role()` to use ROLES_COLOCATE (actor +
  rollout only, no separate reference/actor_fwd services)
- Update controller to skip shared placement groups and fully-async DCS
  weight sync setup when hybrid is active
- Skip actor_fwd health probe in `_check_services_health` for hybrid mode
  to avoid spurious warnings
- Add `train_hybrid()` dispatch in RayTrainGroup and Actor component
- Validate argument combinations: `--hybrid` is the supported way to combine
  async pipeline with colocate weight sharing; bare `--fully-async --colocate`
  now raises ValueError
- Set `offload_train=False`, `offload_rollout=False`, and
  `compute_advantages_and_returns=True` for hybrid mode
- Add Qwen3-4B 8xGPU hybrid-async training launch script
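
Why `train_hybrid()` merges sub-batches before computing advantages can be seen in a small sketch. It assumes that "global normalization" means whitening over the full merged batch. The function names and the toy values are hypothetical:

```python
from statistics import fmean, pstdev


def normalize(values, eps=1e-8):
    """Whiten a list of values: subtract the mean, divide by the std."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / (std + eps) for v in values]


def normalize_merged(sub_batches):
    """Merge all sub-batches first, then normalize once over the full batch."""
    merged = [v for batch in sub_batches for v in batch]
    return normalize(merged)


def normalize_per_sub_batch(sub_batches):
    """Normalize each sub-batch on its own (what merging first avoids)."""
    return [x for batch in sub_batches for x in normalize(batch)]


# Two sub-batches with very different reward levels.
sub_batches = [[0.0, 1.0], [10.0, 11.0]]
global_adv = normalize_merged(sub_batches)
local_adv = normalize_per_sub_batch(sub_batches)
```

Per-sub-batch statistics make the second sample of the low-reward sub-batch look above average (`local_adv[1] > 0`), while the merged statistics correctly place it below the batch mean (`global_adv[1] < 0`).
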

# ⭐ Feature

## Add `relax/utils/visualize` rollout result viewer

- Add web viewer adapted from rlsp/utils/visualize: FastAPI + single-page UI
  for browsing `<save>/rollout_result/{train,eval}/{step}.jsonl`, with
  step dropdown, sample nav, sort by reward / response_length, sample-info
  card, and prompt / response / label rendering with chat-template / tool-call
  / `<think>` highlighting
- Auto-discover `train/` and `eval/` subdirs and render a tab toggle when both
  exist; fall back to a single anonymous bucket for flat dirs
- Add terminal UI mode (`--tui`) adapted from redaccel/verl reward_viewer_v2:
  sync-load the first step then stream remaining steps via a daemon thread
  (newest first), with step / sample / dataset / sort dropdowns, field
  filter, fuzzy search (`f`/`enter`/`esc`), vim-style page nav, and
  text/table render toggle
- Default theme switched to dark; header shows the Relax wordmark linking
  to the GitHub repo plus a GitHub icon
- Mask multimodal pad tokens (`<|image_pad|>`, etc.) by default in the TUI
  via `--mask-str`
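
The directory auto-discovery described above might look roughly like this. It is a sketch, not the viewer's code: the function names are hypothetical, and it assumes step files are named with an integer stem such as `12.jsonl`:

```python
from pathlib import Path


def list_steps(directory):
    """Return the step files in a directory, sorted numerically by step."""
    files = [p for p in Path(directory).glob("*.jsonl") if p.stem.isdigit()]
    return sorted(files, key=lambda p: int(p.stem))


def discover_buckets(data_dir):
    """Map bucket name -> step files.

    Uses `train/` and `eval/` subdirectories when they contain step files,
    and falls back to a single anonymous bucket ("") for flat directories.
    """
    root = Path(data_dir)
    buckets = {}
    for name in ("train", "eval"):
        sub = root / name
        if sub.is_dir():
            steps = list_steps(sub)
            if steps:
                buckets[name] = steps
    if not buckets:
        buckets[""] = list_steps(root)
    return buckets
```

A caller would render the tab toggle only when both `"train"` and `"eval"` appear as keys in the result.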

## Add `relax/entrypoints/visualize` thin wrapper

- One command for both modes: `python -m relax.entrypoints.visualize <dir>`
  (web) and `... --tui` (terminal)
- `DATA_DIR` is a required positional argument
- TUI dependencies (`textual`, `rich`) are lazy-imported; clear error
  message if missing

---

# 📝 Documentation

## Add bilingual rollout result viewer guide

- Add `docs/{en,zh}/guide/rollout-result-viewer.md` covering data layout,
  launch command, flags, page features, terminal UI key bindings, and
  reverse-proxy notes
- Register the new pages under the existing "Operations & Debugging" /
  "运维与调试" sidebar group in `docs/.vitepress/config.mts`
- Add `docs/public/relax-viewer.png` screenshot (palette-compressed PNG
  to stay under the 500 KB pre-commit limit)

@Yangruipis merged commit 5c5fad1 into main on May 22, 2026
5 checks passed
@Yangruipis deleted the sync/from-gitlab branch on May 22, 2026 09:06

6 participants