[WIP] Enable DP-to-EP for MoE inference by wwwjn · Pull Request #3143 · pytorch/torchtitan

wwwjn · 2026-04-28T21:00:25Z

Stack from ghstack (oldest at bottom):

In development. Weight loading does not work yet.

Map vLLM's data_parallel_size to dp_shard in TorchTitan's mesh math, enabling EP to span both DP and TP ranks (ep = dp * tp). For inference, the skip_dp path computes 2D meshes (fsdp + dp_replicate), then calls apply_fsdp, which uses shard_placement_fn to route expert params to the efsdp mesh and dense params to the fsdp mesh.
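To make the routing idea concrete, here is a minimal pure-Python sketch of "expert params go to efsdp, everything else goes to fsdp". It is illustrative only: `pick_mesh_name`, the parameter names, and the `".experts."` naming convention are assumptions, and it does not reproduce the actual shard_placement_fn signature or the code in this PR.

```python
def pick_mesh_name(param_name: str) -> str:
    """Return the name of the mesh a parameter would be sharded on.

    Hypothetical helper: assumes routed-expert weights live under a
    submodule whose qualified name contains ".experts.".
    """
    if ".experts." in param_name:
        return "efsdp"  # expert params: sharded on the expert-FSDP mesh
    return "fsdp"       # dense params (attention, router, norms, ...)


# Hypothetical parameter names, chosen only for illustration.
names = [
    "layers.0.attention.wq.weight",
    "layers.0.moe.experts.w1",
    "layers.0.moe.router.gate.weight",
]
routing = {name: pick_mesh_name(name) for name in names}
```

Note that the router weight is treated as a dense param here, since only the expert weights themselves are distributed across EP ranks in this sketch.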

Changes:

- parallel_dims: dp_replicate mesh always exists (needed for 2D mesh)
- vllm_wrapper: ep_size = dp_size * tp_size, dp_shard = dp_size
- qwen3/parallelize: inference path computes 2D meshes for apply_fsdp
- llama4/parallelize: clarify shard_placement_fn comments
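The degree arithmetic from the vllm_wrapper change can be sketched as follows. This is a hedged illustration, not the PR's code: `InferenceDegrees` and `derive_degrees` are hypothetical names, and the world-size line assumes dp_replicate = 1 with no pipeline or context parallelism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceDegrees:
    dp_shard: int
    tp: int
    ep: int
    world_size: int


def derive_degrees(data_parallel_size: int, tensor_parallel_size: int) -> InferenceDegrees:
    """Map vLLM-style DP/TP sizes to the degrees described above."""
    if data_parallel_size < 1 or tensor_parallel_size < 1:
        raise ValueError("parallel sizes must be positive integers")
    dp_shard = data_parallel_size   # dp_shard = dp_size
    tp = tensor_parallel_size
    ep = dp_shard * tp              # ep_size = dp_size * tp_size
    # Assumption: dp_replicate = 1 and no PP/CP, so all ranks are DP x TP.
    world_size = dp_shard * tp
    return InferenceDegrees(dp_shard=dp_shard, tp=tp, ep=ep, world_size=world_size)


degrees = derive_degrees(data_parallel_size=2, tensor_parallel_size=4)
```

With 2-way DP and 4-way TP, EP spans all 8 ranks under these assumptions, which is the "EP spans both DP and TP ranks" case from the description.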

wwwjn · 2026-04-28T22:10:56Z

            # Always keep fsdp mesh with real backend so fully_shard()
            # can apply MixedPrecisionPolicy even at degree 1.
            return True
+        if name == "dp_replicate":


We need dp_replicate to always exist because applying DDP via fully_shard requires a 2D mesh.
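As a rough illustration of the "always exists" rule, the sketch below keeps the fsdp and dp_replicate dimensions even at degree 1 and drops other degree-1 dimensions. The names `ALWAYS_KEEP`, `should_keep_dim`, and `mesh_dims` are hypothetical, and dropping degree-1 dimensions is a simplifying assumption rather than a description of what parallel_dims actually does.

```python
# Dimensions kept even when their degree is 1 (assumption for this sketch).
ALWAYS_KEEP = {"fsdp", "dp_replicate"}


def should_keep_dim(name: str, degree: int) -> bool:
    """Decide whether a mesh dimension appears in the built mesh."""
    if name in ALWAYS_KEEP:
        # Kept unconditionally so a 2D (dp_replicate, fsdp) mesh is available.
        return True
    return degree > 1


def mesh_dims(degrees: dict) -> list:
    """Return the names of the dimensions that survive, in input order."""
    return [name for name, degree in degrees.items() if should_keep_dim(name, degree)]


kept = mesh_dims({"dp_replicate": 1, "fsdp": 2, "tp": 1})
```

Here `kept` still contains both dp_replicate and fsdp, so a 2D mesh can be formed even though dp_replicate has degree 1.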

wwwjn requested review from fegin, tianyu-l and wconstab as code owners April 28, 2026 21:00

wwwjn mentioned this pull request Apr 28, 2026

[Not ready][rl] Enable TP2EP for unified MoE model in vLLM wrapper #3142 (Open, 2 tasks)

pytorch-bot Bot added the ciflow/8gpu label Apr 28, 2026

meta-cla Bot added the CLA Signed label Apr 28, 2026

wwwjn changed the title ~~Enable DP-to-EP for MoE inference~~ [WIP] Enable DP-to-EP for MoE inference Apr 28, 2026

wwwjn closed this Apr 30, 2026

wwwjn wants to merge 4 commits into gh/wwwjn/15/base from gh/wwwjn/15/head
