Large Scale MoE Support
Starting from DeepSeek-V3, state-of-the-art open-source models are all MoE models, with parameter counts ranging from 100B to 1T. We want to refine the existing implementation to automatically and efficiently generate high-performance distributed plans for these models.
Tracer & Parser
- trace a large model in less than 10 minutes
Schedule
- integrate the zero-bubble pipeline schedule
- integrate the dual-pipeline schedule
- implement and test computation and communication overlap
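The overlap item above can be sketched as follows. This is a minimal illustration of the pattern, not nnScaler's implementation: a collective runs concurrently with compute, and the schedule only blocks when the communication result is actually needed. In a real pipeline schedule the collective would be issued on a separate CUDA stream rather than a Python thread; the thread here is just a stand-in for that asynchrony.

```python
import threading

def overlap_compute_and_comm(compute_fn, comm_fn):
    """Run a (simulated) communication call on a background thread
    while compute proceeds, then join before the result is consumed.

    Sketch only: real schedules overlap a collective on a dedicated
    CUDA stream with kernels on the compute stream.
    """
    result = {}

    def comm_worker():
        result["comm"] = comm_fn()

    t = threading.Thread(target=comm_worker)
    t.start()                         # communication starts asynchronously
    result["compute"] = compute_fn()  # compute overlaps with communication
    t.join()                          # block only when the result is needed
    return result["compute"], result["comm"]
```

The key property to test is that neither side observes a partially finished peer: the join point is the only synchronization.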
AutoDist
- support partitioning along multiple dimensions
- support profiling operators that include communication, such as ring attention
- add interleaved pipeline parallelism to the search space
- refine the partition constraint interface: allow constraints keyed by a module's fully qualified torch name
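The last item could look like the sketch below. All names here are hypothetical (the `PartitionConstraints` class and its rule format are not nnScaler API); it only illustrates matching constraints against fully qualified module names, as produced by `torch.nn.Module.named_modules()`, with glob-style patterns.

```python
import fnmatch

class PartitionConstraints:
    """Hypothetical sketch: map fully qualified module names
    (e.g. 'model.layers.3.mlp.experts.0') to partition constraints."""

    def __init__(self):
        self._rules = []  # ordered (fqn_pattern, constraint) pairs

    def add(self, fqn_pattern, constraint):
        # Patterns use fnmatch globs, so '*' spans dotted segments too.
        self._rules.append((fqn_pattern, constraint))

    def lookup(self, fqn):
        # Later, more specific rules override earlier ones;
        # None means the module is unconstrained.
        matched = None
        for pattern, constraint in self._rules:
            if fnmatch.fnmatch(fqn, pattern):
                matched = constraint
        return matched
```

Keying rules by FQN lets users pin down one layer (e.g. a router gate) without touching the rest of the model.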
Codegen
- reduce code generation time when the scale unit is large, e.g., 128 devices
Runtime
- improve checkpoint saving
- support parameters stored in bf16 with gradients accumulated in fp32 in the reducer
- support multiple parameter groups, e.g., for the Muon optimizer
- support dynamic sequence lengths and forbid certain dims from being partitioned
- deduplicate checkpoints for modules that are not parallelized
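The bf16/fp32 reducer item is about numerics: summing many bf16 gradient shards in bf16 silently drops contributions that are small relative to the running sum, while accumulating in a wider type and casting once at the end keeps them. A minimal sketch, simulating bf16 by truncating the float32 mantissa (Python's 64-bit float stands in for the fp32 accumulator):

```python
import struct

def to_bf16(x):
    """Round a float to bfloat16 precision by keeping only the top
    16 bits of its float32 representation (7 mantissa bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def reduce_grads_fp32(bf16_grads):
    """Reduce bf16 gradient shards in a high-precision accumulator.

    Each addend stays representable relative to the accumulator, so
    small gradients are not rounded away; the cast back to bf16
    happens once, after the reduction.
    """
    acc = 0.0
    for g in bf16_grads:
        acc += g            # high-precision accumulation
    return to_bf16(acc)     # single cast at the end
```

For example, adding 256 gradients of ~1e-3 to a running sum of 1.0 leaves a pure-bf16 reduction stuck at 1.0 (each addend falls below bf16's ~2^-8 relative precision), while the fp32-accumulated result correctly lands near 1.25.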
User experience
- add examples for hooks, e.g., logging router logits in MoE layers
- integrate nnScaler into RL training frameworks such as veRL
- bump the transformers version in the `example` folder
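The hook example in mind is the forward-hook pattern: an observer registered on the router module records its logits without modifying the forward pass. In PyTorch this would be `module.register_forward_hook`; the sketch below mimics that pattern in plain Python with a toy router (both the `MoERouter` class and its top-1 routing are hypothetical, for illustration only).

```python
class MoERouter:
    """Toy stand-in for an MoE router module (hypothetical)."""

    def __init__(self):
        self._hooks = []

    def register_forward_hook(self, fn):
        # Mirrors torch's forward-hook pattern: fn(module, output).
        self._hooks.append(fn)

    def forward(self, scores):
        logits = scores            # the "router logits" to be observed
        for hook in self._hooks:
            hook(self, logits)     # hooks observe; they do not modify
        # Top-1 routing: return the index of the highest-scoring expert.
        return max(range(len(logits)), key=logits.__getitem__)

# Usage: collect router logits for later analysis of expert load balance.
logged = []
router = MoERouter()
router.register_forward_hook(lambda module, logits: logged.append(logits))
chosen_expert = router.forward([0.1, 0.7, 0.2])
```

Keeping the logging in a hook, rather than inside the router itself, means the example generalizes to any module nnScaler has parallelized.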