feat(models): support dots.mocr training #42
Open
Yangruipis wants to merge 10 commits into
# ⭐ Feature

## Add DotsOCR2 backend integration

- Register a Megatron bridge for DotsOCRForCausalLM with Qwen2 language weight mapping and replicated custom ViT weights
- Add DotsOCR2 Megatron provider/model wiring with image embedding scatter support
- Add SGLang external model and multimodal processor package for rollout inference

## Add training launch scripts

- Add dotsocr2 model argument preset
- Add 8xGPU multimodal GRPO launch script with async and sync modes
- Propagate SGLang external package environment variables through Ray runtime env
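
The last item, propagating the SGLang external package environment variables through the Ray runtime env, could look roughly like the sketch below. It is not the PR's code: the helper name `build_runtime_env` and the `SGLANG_EXTERNAL_` prefix filter are assumptions made for illustration. The only Ray behavior relied on is that a `runtime_env` dict may carry an `env_vars` mapping of strings.

```python
import os
from typing import Dict, Mapping, Optional

# Assumption: every SGLang external-package setting shares this prefix.
ENV_PREFIX = "SGLANG_EXTERNAL_"


def build_runtime_env(
    environ: Optional[Mapping[str, str]] = None,
    base: Optional[dict] = None,
) -> Dict:
    """Copy matching variables from the driver environment into a Ray runtime_env dict.

    `build_runtime_env` is a hypothetical helper, not a function from the PR.
    """
    environ = os.environ if environ is None else environ
    # Forward only non-empty variables that carry the assumed prefix.
    forwarded = {
        key: value
        for key, value in environ.items()
        if key.startswith(ENV_PREFIX) and value
    }
    # Copy the caller's runtime_env so the input dict is not mutated.
    runtime_env = dict(base or {})
    env_vars = dict(runtime_env.get("env_vars", {}))
    env_vars.update(forwarded)
    runtime_env["env_vars"] = env_vars
    return runtime_env
```

The resulting dict would then be passed as `ray.init(runtime_env=build_runtime_env())`, so that rollout workers spawned by Ray see the same package settings as the driver process.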
What
Add first-class support for dots.mocr, including Megatron training, SGLang rollout, launch scripts, alignment
tools, and docs.
Why
dots.mocr needs model-specific handling across both training and rollout:
- `<|img|>`, `<|imgpad|>`, `<|endofimg|>`

This PR also makes the dots.mocr path a reference example for adding future external multimodal models.
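
To make the role of the image tokens concrete, here is a minimal pure-Python sketch of placeholder expansion followed by an image embedding scatter. It is an illustration under assumptions, not the PR's implementation: the function names are hypothetical, the start/pad/end roles are inferred from the token names, and "one `<|imgpad|>` slot per image embedding" is assumed. A real model would do this on tensors rather than lists.

```python
from typing import List, Sequence

IMG_START = "<|img|>"
IMG_PAD = "<|imgpad|>"
IMG_END = "<|endofimg|>"


def expand_image_placeholder(num_slots: int) -> List[str]:
    """Return the token run for one image: start, num_slots pad tokens, end."""
    if num_slots <= 0:
        raise ValueError("num_slots must be positive")
    return [IMG_START] + [IMG_PAD] * num_slots + [IMG_END]


def scatter_image_embeddings(
    tokens: Sequence[str],
    text_embeds: Sequence[Sequence[float]],
    image_embeds: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Replace the embedding at each pad position with the next image embedding."""
    if len(tokens) != len(text_embeds):
        raise ValueError("tokens and text_embeds must have the same length")
    pad_positions = [i for i, tok in enumerate(tokens) if tok == IMG_PAD]
    if len(pad_positions) != len(image_embeds):
        raise ValueError(
            f"{len(pad_positions)} pad slots but {len(image_embeds)} image embeddings"
        )
    # Copy the text embeddings so the inputs are left untouched.
    merged = [list(vec) for vec in text_embeds]
    for pos, vec in zip(pad_positions, image_embeds):
        merged[pos] = list(vec)
    return merged


# Usage: one image occupying two pad slots inside a short prompt.
tokens = ["Read", *expand_image_placeholder(2), "please"]
text_embeds = [[0.0, 0.0] for _ in tokens]
image_embeds = [[1.0, 1.0], [2.0, 2.0]]
merged = scatter_image_embeddings(tokens, text_embeds, image_embeds)
```

The length check matters in practice: if the processor on the rollout side and the model on the training side disagree about how many pad slots an image expands to, the scatter would silently misalign without it.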
How
- `relax.models.dots_ocr` for config, vision tower, Megatron model/provider/Bridge, SGLang model, and SGLang processor.
- `--sglang-external-model-package` to register custom SGLang model packages before server spawn.
- `vision_dp_when_cp` into Megatron provider overrides.
- `scripts/training/multimodal/run-dotsocr2-8xgpu.sh`
- `scripts/training/multimodal/run-dotsocr2-8xgpu-hybrid.sh`

Start
- `relax/models/dots_ocr/chat_template.jinja`
- `openr1-multi-modal` dataset, see https://redai-infra.github.io/Relax/en/guide/quick-start.html#task-2-open-r1-vision-language
- `MODEL_DIR` and `DATA_DIR`, then run the script `scripts/training/multimodal/run-dotsocr2-8xgpu-hybrid.sh`
Testing
- `pre-commit run --all-files` passes
- (`pytest tests/`)

Type of Change