[WIP] Enable DP+EP for MoE inference in vLLM wrapper#3236

Open
wwwjn wants to merge 27 commits into gh/wwwjn/19/base from gh/wwwjn/19/head

Conversation

@wwwjn

@wwwjn wwwjn commented May 6, 2026

Contributor

@wwwjn wwwjn requested review from fegin, tianyu-l and wconstab as code owners May 6, 2026 02:54
@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label May 6, 2026
wwwjn added a commit that referenced this pull request May 6, 2026
ghstack-source-id: ddb3bcb
Pull Request resolved: #3236
wwwjn added a commit that referenced this pull request May 6, 2026
ghstack-source-id: c12da6e
Pull Request resolved: #3236
wwwjn added a commit that referenced this pull request May 6, 2026
ghstack-source-id: e3a8bb8
Pull Request resolved: #3236
wwwjn added a commit that referenced this pull request May 6, 2026
ghstack-source-id: 1eae488
Pull Request resolved: #3236
wwwjn added a commit that referenced this pull request May 6, 2026
ghstack-source-id: 3dac9e4
Pull Request resolved: #3236
@wwwjn wwwjn changed the title Enable DP+EP for MoE inference in vLLM wrapper [WIP] Enable DP+EP for MoE inference in vLLM wrapper May 6, 2026
wwwjn added a commit that referenced this pull request May 6, 2026
ghstack-source-id: 85ac9a8
Pull Request resolved: #3236
@pytorch-bot pytorch-bot Bot added the ciflow/rl label May 7, 2026
wwwjn added a commit that referenced this pull request May 7, 2026
ghstack-source-id: ae92888
Pull Request resolved: #3236
wwwjn added a commit that referenced this pull request May 7, 2026
ghstack-source-id: 9ede8f8
Pull Request resolved: #3236
wwwjn added a commit that referenced this pull request May 7, 2026
ghstack-source-id: 7d76787
Pull Request resolved: #3236
wwwjn added a commit that referenced this pull request May 8, 2026
ghstack-source-id: 39c4906
Pull Request resolved: #3236
[ghstack-poisoned]
wwwjn added a commit that referenced this pull request Jun 5, 2026
ghstack-source-id: d6f7065
Pull Request resolved: #3236
wwwjn added a commit that referenced this pull request Jun 5, 2026
ghstack-source-id: d6f7065
Pull Request resolved: #3236
[ghstack-poisoned]
wwwjn added a commit that referenced this pull request Jun 5, 2026
ghstack-source-id: d24f239
Pull Request resolved: #3236
[ghstack-poisoned]
wwwjn added a commit that referenced this pull request Jun 9, 2026
ghstack-source-id: ecd4868
Pull Request resolved: #3236
wwwjn added a commit that referenced this pull request Jun 9, 2026
ghstack-source-id: ecd4868
Pull Request resolved: #3236
wwwjn added a commit that referenced this pull request Jun 10, 2026
ghstack-source-id: ecd4868
Pull Request resolved: #3236
[ghstack-poisoned]
wwwjn added a commit that referenced this pull request Jun 10, 2026
ghstack-source-id: 7fbc42c
Pull Request resolved: #3236
[ghstack-poisoned]
wwwjn added a commit that referenced this pull request Jun 10, 2026
ghstack-source-id: 0ba8734
Pull Request resolved: #3236
[ghstack-poisoned]
wwwjn added a commit that referenced this pull request Jun 10, 2026
ghstack-source-id: 8a8da02
Pull Request resolved: #3236
[ghstack-poisoned]
wwwjn added a commit that referenced this pull request Jun 10, 2026
ghstack-source-id: 23946a9
Pull Request resolved: #3236
wwwjn added a commit that referenced this pull request Jun 10, 2026
ghstack-source-id: 23946a9
Pull Request resolved: #3236
[ghstack-poisoned]
wwwjn added a commit that referenced this pull request Jun 10, 2026
ghstack-source-id: 0197654
Pull Request resolved: #3236
[ghstack-poisoned]
wwwjn added a commit that referenced this pull request Jun 10, 2026
ghstack-source-id: a0640fa
Pull Request resolved: #3236
[ghstack-poisoned]
wwwjn added a commit that referenced this pull request Jun 10, 2026
ghstack-source-id: 225e6a1
Pull Request resolved: #3236
wwwjn added a commit that referenced this pull request Jun 10, 2026
… and fix combine() shape mismatch (#3595)

Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.15.0)
(oldest at bottom):
* #3236
* #3142
* __->__ #3595

## What's the problem
- [Currently] combine() wrongly assumes that tokens are evenly sharded
across ranks
https://github.com/pytorch/torchtitan/blob/c0428bb186f5c97e1d1b4ed89febd22916eee302/torchtitan/models/common/token_dispatcher.py#L439-L456
(it infers global SPMD inside a local SPMD region)
- If the tokens are unevenly sharded, out_TD will have different shapes
across SP ranks.
- We should reject the input outright if the number of tokens in the input
batch cannot be evenly sharded across SP ranks

- [Future] The router will use spmd_types soon, and the router runs per SP
rank, so each SP rank should have even sharding
- [Future] We want to avoid dispatching and load-balancing padded tokens; we
should be able to do that by adding a metadata field that records the
actual number of local tokens on each SP rank

## What does this PR do?

This PR does "virtual padding" and passes metadata around:
- Calculate num_local_tokens_after_padding = (T + pad_tokens) // sp_size
in the MoE module
- Pass num_local_tokens_after_padding to the GroupedExperts module, then to
combine()
- combine() returns a tensor with shape (num_local_tokens_after_padding
* sp_size, ...)
- Slice the combined tensor to (T, ...) in MoE
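
The arithmetic in the steps above can be sketched in plain Python. This is only an illustration of the shape bookkeeping, not the torchtitan implementation: the helper names `padded_local_token_count`, `shard_with_padding`, and `combine_and_slice` are hypothetical, tokens are modeled as list entries rather than tensors, and the sketch assumes that `pad_tokens` is the smallest count that makes `T + pad_tokens` divisible by `sp_size` and that the combined length is the per-rank count times the number of SP ranks.

```python
from typing import List, Optional, Tuple


def padded_local_token_count(num_tokens: int, sp_size: int) -> Tuple[int, int]:
    """Return (pad_tokens, num_local_tokens_after_padding).

    Assumption: pad_tokens is the smallest padding that makes the total
    divisible by sp_size.
    """
    if sp_size <= 0:
        raise ValueError("sp_size must be positive")
    pad_tokens = (-num_tokens) % sp_size
    return pad_tokens, (num_tokens + pad_tokens) // sp_size


def shard_with_padding(
    tokens: List[int], sp_size: int
) -> List[List[Optional[int]]]:
    """Split tokens into sp_size equal shards; None marks a padded slot."""
    pad_tokens, n_local = padded_local_token_count(len(tokens), sp_size)
    padded: List[Optional[int]] = list(tokens) + [None] * pad_tokens
    return [padded[r * n_local:(r + 1) * n_local] for r in range(sp_size)]


def combine_and_slice(
    shards: List[List[Optional[int]]], num_tokens: int
) -> List[Optional[int]]:
    """Concatenate equal-sized shards, then drop the padded tail."""
    combined = [tok for shard in shards for tok in shard]
    return combined[:num_tokens]


if __name__ == "__main__":
    tokens = list(range(10))                 # T = 10
    shards = shard_with_padding(tokens, 4)   # sp_size = 4
    print([len(s) for s in shards])          # every rank holds the same count
    print(combine_and_slice(shards, len(tokens)) == tokens)
```

Because every shard has the same length after padding, each rank sees an identical output shape, and the final slice back to `T` removes only the padded slots.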
[ghstack-poisoned]
wwwjn added a commit that referenced this pull request Jun 10, 2026
ghstack-source-id: f5f275c
Pull Request resolved: #3236
[ghstack-poisoned]
wwwjn added a commit that referenced this pull request Jun 10, 2026
ghstack-source-id: bbe3f3c
Pull Request resolved: #3236

Labels

ciflow/rl, ciflow/8gpu, CLA Signed

1 participant