[WIP] Enable DP+EP for MoE inference in vLLM wrapper by wwwjn · Pull Request #3236 · pytorch/torchtitan

wwwjn · 2026-05-06T02:54:51Z

Stack from ghstack (oldest at bottom):


… and fix combine() shape mismatch (#3595)

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.15.0) (oldest at bottom):

* #3236
* #3142
* __->__ #3595

## What's the problem

- [Currently] combine() wrongly assumes that tokens are evenly sharded across ranks, i.e. it infers global SPMD inside a local SPMD region: https://github.com/pytorch/torchtitan/blob/c0428bb186f5c97e1d1b4ed89febd22916eee302/torchtitan/models/common/token_dispatcher.py#L439-L456
- If tokens are unevenly sharded, out_TD has different shapes across SP ranks.
- We should reject input batches whose number of tokens cannot be evenly sharded across SP ranks.
- [Future] The router will soon use spmd_types, and it runs per SP rank, so each SP rank should receive an even shard.
- [Future] We want to avoid dispatching and load-balancing the padded tokens. This should be possible by adding a metadata field that records the actual number of local tokens on each SP rank.

## What does this PR do?

This PR implements "virtual padding" and passes the resulting metadata through the MoE path:

- Compute num_local_tokens_after_padding = (T + pad_tokens) // sp_size in the MoE module.
- Pass num_local_tokens_after_padding to the GroupedExperts module, and from there to combine().
- combine() returns a tensor of shape (num_local_tokens_after_padding * sp_size, ...).
- Slice the combined tensor back to (T, ...) in the MoE module.
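A minimal pure-Python sketch of the virtual-padding arithmetic follows. It is an illustration under stated assumptions, not the torchtitan code: plain lists stand in for tensors, the helper names `virtual_pad`, `shard_with_padding` and `combine_and_unpad` are hypothetical, `pad_tokens` is assumed to be the smallest count that makes T divisible by `sp_size`, each SP rank is assumed to own a contiguous slice of the padded token range, and combine() is modeled as a plain concatenation of per-rank outputs in rank order.

```python
from typing import List, Sequence, Tuple


def virtual_pad(num_tokens: int, sp_size: int) -> Tuple[int, int]:
    """Return (pad_tokens, num_local_tokens_after_padding).

    Hypothetical helper. Assumption: pad_tokens is the smallest count that
    makes num_tokens + pad_tokens divisible by sp_size.
    """
    if sp_size <= 0:
        raise ValueError("sp_size must be positive")
    pad_tokens = (-num_tokens) % sp_size  # 0 when T is already divisible
    num_local_tokens_after_padding = (num_tokens + pad_tokens) // sp_size
    return pad_tokens, num_local_tokens_after_padding


def shard_with_padding(
    tokens: Sequence[str], sp_size: int, pad_value: str = "<pad>"
) -> List[List[str]]:
    """Split tokens into sp_size equal-length contiguous shards, padding the tail."""
    pad_tokens, n_local = virtual_pad(len(tokens), sp_size)
    padded = list(tokens) + [pad_value] * pad_tokens
    return [padded[r * n_local:(r + 1) * n_local] for r in range(sp_size)]


def combine_and_unpad(
    shards: Sequence[Sequence[str]], num_tokens: int, n_local: int
) -> List[str]:
    """Concatenate per-rank outputs, then slice back to the original T tokens."""
    # With virtual padding every rank contributes exactly n_local tokens,
    # so the combined length is n_local * sp_size on every rank.
    if any(len(s) != n_local for s in shards):
        raise ValueError("uneven shards: per-rank shapes would mismatch")
    combined = [tok for shard in shards for tok in shard]
    return combined[:num_tokens]  # drop the padded tail


if __name__ == "__main__":
    tokens = [f"t{i}" for i in range(10)]  # T = 10
    sp_size = 4
    pad_tokens, n_local = virtual_pad(len(tokens), sp_size)
    shards = shard_with_padding(tokens, sp_size)
    out = combine_and_unpad(shards, len(tokens), n_local)
    print(pad_tokens, n_local, [len(s) for s in shards], out == tokens)
    # prints: 2 3 [3, 3, 3, 3] True
```

With T = 10 and sp_size = 4, a ceil-sized split without padding would give shards of 3, 3, 3 and 1 tokens, which is the per-rank shape mismatch described in the problem statement. Padding by 2 tokens gives every rank 3 tokens, and the final slice removes the padding.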


Enable DP+EP for MoE inference in vLLM wrapper

ca11ce5

[ghstack-poisoned]

wwwjn requested review from fegin, tianyu-l and wconstab as code owners May 6, 2026 02:54

wwwjn mentioned this pull request May 6, 2026

[Not ready][rl] Enable TP2EP for unified MoE model in vLLM wrapper #3142

Open

2 tasks

pytorch-bot added the ciflow/8gpu label May 6, 2026

meta-cla added the CLA Signed label May 6, 2026

Update on "Enable DP+EP for MoE inference in vLLM wrapper"

ef41fa0

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request May 6, 2026

Enable DP+EP for MoE inference in vLLM wrapper

54bc159

ghstack-source-id: ddb3bcb Pull Request resolved: #3236

Update on "Enable DP+EP for MoE inference in vLLM wrapper"

88d4f42

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request May 6, 2026

Enable DP+EP for MoE inference in vLLM wrapper

7b0bc08

ghstack-source-id: c12da6e Pull Request resolved: #3236

Update on "Enable DP+EP for MoE inference in vLLM wrapper"

e388701

[ghstack-poisoned]

wwwjn mentioned this pull request May 6, 2026

[rl] Register customized config parser to vllm + less vllm config dependency #3242

Merged

wwwjn added a commit that referenced this pull request May 6, 2026

Enable DP+EP for MoE inference in vLLM wrapper

dc45ede

ghstack-source-id: e3a8bb8 Pull Request resolved: #3236

Update on "Enable DP+EP for MoE inference in vLLM wrapper"

da506cf

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request May 6, 2026

Enable DP+EP for MoE inference in vLLM wrapper

f2b577d

ghstack-source-id: 1eae488 Pull Request resolved: #3236

Update on "Enable DP+EP for MoE inference in vLLM wrapper"

71fcbf3

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request May 6, 2026

Enable DP+EP for MoE inference in vLLM wrapper

16ff5b3

ghstack-source-id: 3dac9e4 Pull Request resolved: #3236

wwwjn changed the title ~~Enable DP+EP for MoE inference in vLLM wrapper~~ [WIP] Enable DP+EP for MoE inference in vLLM wrapper May 6, 2026

Update on "[WIP] Enable DP+EP for MoE inference in vLLM wrapper"

916c02c

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request May 6, 2026

Enable DP+EP for MoE inference in vLLM wrapper

0027252

ghstack-source-id: 85ac9a8 Pull Request resolved: #3236

Update on "[WIP] Enable DP+EP for MoE inference in vLLM wrapper"

0ce00f2

[ghstack-poisoned]

pytorch-bot added the ciflow/rl label May 7, 2026

wwwjn added a commit that referenced this pull request May 7, 2026

Enable DP+EP for MoE inference in vLLM wrapper

348fbfc

ghstack-source-id: ae92888 Pull Request resolved: #3236

Update on "[WIP] Enable DP+EP for MoE inference in vLLM wrapper"

132d1ad

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request May 7, 2026

Enable DP+EP for MoE inference in vLLM wrapper

4292bdf

ghstack-source-id: 9ede8f8 Pull Request resolved: #3236

Update on "[WIP] Enable DP+EP for MoE inference in vLLM wrapper"

50f97a7

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request May 7, 2026

Enable DP+EP for MoE inference in vLLM wrapper

b9c9cfa

ghstack-source-id: 7d76787 Pull Request resolved: #3236

Update on "[WIP] Enable DP+EP for MoE inference in vLLM wrapper"

ed80447

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request May 8, 2026

Enable DP+EP for MoE inference in vLLM wrapper

ca0e15c

ghstack-source-id: 39c4906 Pull Request resolved: #3236

Update

bfa0986

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Jun 5, 2026

Enable DP+EP for MoE inference in vLLM wrapper

c629e2f

ghstack-source-id: d6f7065 Pull Request resolved: #3236

wwwjn added a commit that referenced this pull request Jun 5, 2026

Enable DP+EP for MoE inference in vLLM wrapper

53b3005

ghstack-source-id: d6f7065 Pull Request resolved: #3236

Update

e00fd25

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Jun 5, 2026

Enable DP+EP for MoE inference in vLLM wrapper

2f6fa4b

ghstack-source-id: d24f239 Pull Request resolved: #3236

Update

bfd88de

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Jun 9, 2026

Enable DP+EP for MoE inference in vLLM wrapper

9e37886

ghstack-source-id: ecd4868 Pull Request resolved: #3236

wwwjn mentioned this pull request Jun 9, 2026

Using "virtual padding" to calculate number_local_tokens per SP rank, and fix combine() shape mismatch #3595

Merged

wwwjn added a commit that referenced this pull request Jun 9, 2026

Enable DP+EP for MoE inference in vLLM wrapper

adf53b2

ghstack-source-id: ecd4868 Pull Request resolved: #3236

wwwjn added a commit that referenced this pull request Jun 10, 2026

Enable DP+EP for MoE inference in vLLM wrapper

0e0114f

ghstack-source-id: ecd4868 Pull Request resolved: #3236

Update

5f9b3e8

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Jun 10, 2026

Enable DP+EP for MoE inference in vLLM wrapper

1b7b24e

ghstack-source-id: 7fbc42c Pull Request resolved: #3236

Update

7b14114

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Jun 10, 2026

Enable DP+EP for MoE inference in vLLM wrapper

16b4d2c

ghstack-source-id: 0ba8734 Pull Request resolved: #3236

Update

df33445

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Jun 10, 2026

Enable DP+EP for MoE inference in vLLM wrapper

349b242

ghstack-source-id: 8a8da02 Pull Request resolved: #3236

Update

ca1cff6

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Jun 10, 2026

Enable DP+EP for MoE inference in vLLM wrapper

78e785d

ghstack-source-id: 23946a9 Pull Request resolved: #3236

wwwjn added a commit that referenced this pull request Jun 10, 2026

Enable DP+EP for MoE inference in vLLM wrapper

0757c79

ghstack-source-id: 23946a9 Pull Request resolved: #3236

Update

9dd7819

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Jun 10, 2026

Enable DP+EP for MoE inference in vLLM wrapper

71603e1

ghstack-source-id: 0197654 Pull Request resolved: #3236

Update

a1b55a7

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Jun 10, 2026

Enable DP+EP for MoE inference in vLLM wrapper

3cd8c5b

ghstack-source-id: a0640fa Pull Request resolved: #3236

Update

5f41ff6

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Jun 10, 2026

Enable DP+EP for MoE inference in vLLM wrapper

d94a3c3

ghstack-source-id: 225e6a1 Pull Request resolved: #3236

Update

e8ad8f9

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Jun 10, 2026

Enable DP+EP for MoE inference in vLLM wrapper

1c94190

ghstack-source-id: f5f275c Pull Request resolved: #3236

Update

d94b9bf

[ghstack-poisoned]

wwwjn added a commit that referenced this pull request Jun 10, 2026

Enable DP+EP for MoE inference in vLLM wrapper

adcef73

ghstack-source-id: bbe3f3c Pull Request resolved: #3236

wwwjn wants to merge 27 commits into gh/wwwjn/19/base from gh/wwwjn/19/head

wwwjn commented May 6, 2026 (edited)
