
[Roadmap] nnScaler Roadmap Q3 2025 #43

@zyeric

Description


Large Scale MoE Support

Starting from DeepSeek V3, SOTA open-source models are all MoE models, with parameter counts ranging from 100B to 1T. We want to refine the existing implementation to automatically and efficiently generate high-performance distributed plans for these models.

Tracer & Parser

  • trace large models in less than 10 minutes

Schedule

  • integrate zero-bubble schedule
  • integrate dual pipeline
  • implement and test computation and communication overlap
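
The zero-bubble and dual-pipeline schedules above refine classic pipeline schedules. As a point of reference, here is a minimal pure-Python sketch of the standard 1F1B schedule for one stage (zero-bubble schedules go further by splitting each `B` into separate input-grad and weight-grad steps); the function name and op encoding are illustrative, not nnScaler's API:

```python
def one_f_one_b(num_stages: int, stage: int, num_microbatches: int):
    """Return the op sequence for one stage under a 1F1B schedule.

    Warmup with (num_stages - 1 - stage) forwards, alternate F/B in the
    steady state, then drain the remaining backwards.
    """
    warmup = min(num_stages - 1 - stage, num_microbatches)
    ops = []
    f = b = 0
    for _ in range(warmup):            # warmup: forwards only
        ops.append(("F", f)); f += 1
    while f < num_microbatches:        # steady state: one F, one B
        ops.append(("F", f)); f += 1
        ops.append(("B", b)); b += 1
    while b < num_microbatches:        # cooldown: drain backwards
        ops.append(("B", b)); b += 1
    return ops
```

The last stage has zero warmup forwards, so it strictly alternates F/B; earlier stages buffer more in-flight microbatches, which is exactly the memory pressure that interleaved and dual-pipeline variants try to rebalance.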

AutoDist

  • support partitioning along multiple dimensions
  • support profiling operators with communication, like ring-attention
  • add interleaved pipeline parallelism to the search space
  • refine partition constraint interface: add constraints by torch's fully qualified names
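
A fully-qualified-name constraint interface could, for instance, match parameter FQNs (e.g. `layers.0.attn.qkv.weight`) against glob patterns. A hypothetical sketch using stdlib `fnmatch` — the `PartitionConstraint` record and field names are assumptions for illustration, not nnScaler's actual interface:

```python
from dataclasses import dataclass
from fnmatch import fnmatch

@dataclass
class PartitionConstraint:
    # Hypothetical constraint record (not nnScaler's API):
    pattern: str         # glob over torch fully qualified names
    allowed_dims: tuple  # dims the planner is allowed to split

def match_constraints(param_names, constraints):
    """Map each parameter FQN to the first constraint whose pattern matches."""
    out = {}
    for name in param_names:
        for c in constraints:
            if fnmatch(name, c.pattern):
                out[name] = c
                break  # first matching rule wins
    return out
```

First-match-wins ordering lets specific rules (e.g. for expert weights) be listed before broad fallbacks, mirroring how most pattern-based config interfaces resolve conflicts.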

Codegen

  • reduce code generation time when the scale unit is large, e.g., 128 devices
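
One common way to keep generation time flat as the scale unit grows is to generate code once per distinct device role and reuse it across symmetric ranks. A hypothetical sketch, assuming ranks in the same pipeline stage run identical code (the role key and codegen function are illustrative, not nnScaler's implementation):

```python
from functools import lru_cache

NUM_STAGES = 4  # assumed pipeline depth for illustration

def device_role(rank: int) -> tuple:
    # Ranks that execute identical code share a role. Here the role is
    # just the pipeline stage; a real planner would also fold in
    # TP/EP coordinates.
    return ("stage", rank % NUM_STAGES)

@lru_cache(maxsize=None)
def generate_code(role: tuple) -> str:
    # Expensive per-role code generation, executed once per unique role.
    return f"# generated module for {role}\n"

def codegen(world_size: int) -> dict:
    """Emit code for every rank with O(#roles) generation calls."""
    return {rank: generate_code(device_role(rank))
            for rank in range(world_size)}
```

With 128 devices and 4 roles, `generate_code` runs 4 times instead of 128; the index from rank to role is what still scales with world size, and it is cheap.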

Runtime

  • improve checkpoint saving
  • support parameters in bf16 with gradients accumulated in fp32 in the reducer
  • support multiple parameter groups, as needed by e.g. the Muon optimizer
  • support dynamic sequence lengths and forbid certain dims from being partitioned
  • deduplicate checkpoints for non-parallelized modules
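
Checkpoint dedup for non-parallelized modules can be done by content-hashing each rank's state and storing one copy per unique hash; replicated modules hash identically on every rank. A minimal stdlib sketch (plain Python objects stand in for tensors, and the function shape is an assumption, not nnScaler's checkpoint format):

```python
import hashlib
import pickle

def dedup_checkpoint(rank_states: dict) -> tuple:
    """Collapse per-rank state dicts into unique blobs plus an index.

    Non-parallelized (replicated) modules serialize identically on every
    rank, so they are stored once; partitioned modules remain per-rank.
    """
    blobs, index = {}, {}
    for rank, state in rank_states.items():
        blob = pickle.dumps(state)
        digest = hashlib.sha256(blob).hexdigest()
        blobs.setdefault(digest, blob)  # keep each unique blob once
        index[rank] = digest            # rank -> blob to load on restore
    return blobs, index
```

On restore, each rank looks up its digest in the index and unpickles the shared blob, so the on-disk footprint scales with the number of distinct shards rather than the number of ranks.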

User experience

  • add examples for hooks, e.g., logging router logits in MoE
  • integrate nnScaler into RL training frameworks, like veRL
  • bump transformers version in example folder
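
A logging hook for router logits would typically follow torch's forward-hook convention, where a registered callback receives `(module, input, output)` after each forward pass. The pattern, stripped down to pure Python here — the `Router` class is a toy stand-in for a `torch.nn.Module`, not real MoE code:

```python
class Router:
    """Toy MoE router illustrating the forward-hook pattern."""
    def __init__(self):
        self._hooks = []

    def register_forward_hook(self, fn):
        # Same (module, input, output) convention as torch forward hooks.
        self._hooks.append(fn)

    def __call__(self, x):
        logits = [v * 0.5 for v in x]  # placeholder routing computation
        for fn in self._hooks:
            fn(self, x, logits)        # fire hooks after forward
        return logits

# Usage: capture router logits without touching the model's forward code.
captured = []
router = Router()
router.register_forward_hook(lambda mod, inp, out: captured.append(out))
router([2.0, 4.0])
```

Because the hook is attached from outside, logging can be added or removed per experiment without editing the model definition, which is the property the example in this roadmap item would demonstrate.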
