feat(models): support dots.mocr training #42

Open
Yangruipis wants to merge 10 commits into redai-infra:main from Yangruipis:feat/wuhuan/support_dotsocr

Yangruipis (Collaborator) commented May 30, 2026

What

Add first-class support for dots.mocr, including Megatron training, SGLang rollout, launch scripts, alignment tools, and docs.

Why

dots.mocr needs model-specific handling across both training and rollout:

  • custom multimodal token rules: <|img|><|imgpad|><|endofimg|>
  • Megatron Bridge mappings for HF ↔ Megatron weights
  • SGLang external model / processor registration
  • packed sequence and CP-compatible vision embedding flow

This PR also makes the dots.mocr path a reference example for adding future external multimodal models.
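
The token rule listed above can be illustrated with a minimal sketch. It assumes each image is wrapped as `<|img|>`, then a run of `<|imgpad|>` placeholders, then `<|endofimg|>`, with one placeholder per vision embedding. The helper name `expand_image_tokens`, the `<image>` marker, and the per-image counts are hypothetical and are not taken from this PR.

```python
# Token strings come from the PR description; everything else is illustrative.
IMG_START = "<|img|>"
IMG_PAD = "<|imgpad|>"
IMG_END = "<|endofimg|>"


def expand_image_tokens(prompt, num_embeds_per_image, marker="<image>"):
    """Replace each `marker` with a start token, N pad tokens, and an end token.

    `num_embeds_per_image[i]` is the assumed number of vision embeddings
    produced for the i-th image in the prompt.
    """
    parts = prompt.split(marker)
    if len(parts) - 1 != len(num_embeds_per_image):
        raise ValueError("number of image markers does not match number of images")
    out = [parts[0]]
    for count, tail in zip(num_embeds_per_image, parts[1:]):
        out.append(IMG_START + IMG_PAD * count + IMG_END)
        out.append(tail)
    return "".join(out)
```

Keeping the pad count equal to the embedding count is what would let a later step overwrite the placeholder positions one-to-one with vision features.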

How

  • Add relax.models.dots_ocr for config, vision tower, Megatron model/provider/Bridge, SGLang model, and SGLang processor.
  • Add --sglang-external-model-package to register custom SGLang model packages before server spawn.
  • Wire vision_dp_when_cp into Megatron provider overrides.
  • Add DotsOCR2 launch scripts:
    • scripts/training/multimodal/run-dotsocr2-8xgpu.sh
    • scripts/training/multimodal/run-dotsocr2-8xgpu-hybrid.sh
  • Add HF / SGLang / Megatron alignment scripts for single-sample and packed/CP paths.
  • Document the external model integration workflow in EN/ZH docs.
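
As a rough sketch of how a flag such as `--sglang-external-model-package` might be handled before the server is spawned: the flag name comes from this PR, but the helper `register_external_package`, the fail-fast import, and the environment variable name `SGLANG_EXTERNAL_MODEL_PACKAGE` are assumptions made for illustration and may differ from the actual implementation.

```python
import argparse
import importlib
from typing import Dict, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--sglang-external-model-package",
        default=None,
        help="Importable package that provides custom SGLang model classes.",
    )
    return parser


def register_external_package(package: Optional[str], env: Dict[str, str]) -> bool:
    """Check that the package imports, then export its name for child processes."""
    if not package:
        return False
    # Fail fast in the launcher rather than inside a spawned server process.
    importlib.import_module(package)
    # Assumed variable name; the real one may differ.
    env["SGLANG_EXTERNAL_MODEL_PACKAGE"] = package
    return True
```

In a launcher this would be called with `os.environ` as `env` before any server process is started, so the spawned processes inherit the setting.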

Start

[Images omitted]

Testing

  • pre-commit run --all-files passes
  • Tests pass (pytest tests/)
  • New tests added (if applicable)
  • Documentation updated (if applicable)

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • CI/CD or build changes

Yangruipis added 10 commits May 31, 2026 03:34
# ⭐ Feature

## Add DotsOCR2 backend integration

- Register a Megatron bridge for DotsOCRForCausalLM with Qwen2 language weight mapping and replicated custom ViT weights
- Add DotsOCR2 Megatron provider/model wiring with image embedding scatter support
- Add SGLang external model and multimodal processor package for rollout inference
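
The image embedding scatter mentioned above can be sketched in plain Python. This is an illustration of the general technique under stated assumptions, not the code in this PR: embeddings are nested lists rather than tensors, and `scatter_image_embeddings` and `pad_id` are hypothetical names.

```python
def scatter_image_embeddings(token_ids, text_embeds, image_embeds, pad_id):
    """Return a copy of `text_embeds` with pad-token rows replaced by image rows.

    token_ids:    token ids, one per sequence position
    text_embeds:  one embedding row per position
    image_embeds: one embedding row per image pad token, in order
    pad_id:       id of the image pad token (assumed to map to <|imgpad|>)
    """
    positions = [i for i, tok in enumerate(token_ids) if tok == pad_id]
    if len(positions) != len(image_embeds):
        raise ValueError(
            f"{len(positions)} pad tokens but {len(image_embeds)} image embeddings"
        )
    out = [list(row) for row in text_embeds]  # copy so the input is not mutated
    for pos, emb in zip(positions, image_embeds):
        out[pos] = list(emb)
    return out
```

The count check matters most for packed sequences, where a mismatch would otherwise shift every later image in the pack.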

## Add training launch scripts

- Add dotsocr2 model argument preset
- Add 8xGPU multimodal GRPO launch script with async and sync modes
- Propagate SGLang external package environment variables through Ray runtime env
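
A minimal sketch of the environment propagation described in the last bullet, assuming the relevant variables share a common prefix. The prefix `SGLANG_EXTERNAL_` and the helper `build_runtime_env` are hypothetical; only the `{"env_vars": {...}}` shape of a Ray runtime environment is standard.

```python
import os


def build_runtime_env(prefix="SGLANG_EXTERNAL_", environ=None):
    """Collect variables starting with `prefix` into a Ray-style runtime_env dict."""
    source = os.environ if environ is None else environ
    env_vars = {key: value for key, value in source.items() if key.startswith(prefix)}
    return {"env_vars": env_vars}
```

The resulting dict can be passed as `ray.init(runtime_env=build_runtime_env())` so that workers on other nodes see the same settings as the driver.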