support expert parallel with deepep #9

Open
GHtyt wants to merge 3 commits into Dao-AILab:main from GHtyt:expert_parallel

GHtyt commented Dec 26, 2025

Description

We integrated expert parallelism (EP) via DeepEP, using a forward/backward approach similar to Megatron's, while keeping the router in high precision to prevent discrepancies in the backward gradients. We also added tests that compare the outputs and gradients of the EP and non-EP implementations.
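To illustrate why router precision can matter, here is a minimal sketch in plain Python. It is not the code from this PR: the helper `to_bf16` is hypothetical, emulates bfloat16 rounding (round-to-nearest-even) and assumes finite inputs. It shows that two router logits that are distinct in float32 can collapse to the same bfloat16 value, which is one plausible way low-precision routing could lead to mismatched expert scores and gradients.

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (hypothetical helper, finite inputs only)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round-to-nearest-even on the 16 low bits that bfloat16 drops.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (((bits + bias) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative router logits for three experts (made-up values).
logits = [2.001, 2.002, 0.5]
rounded = [to_bf16(v) for v in logits]  # [2.0, 2.0, 0.5]

probs_high = softmax(logits)    # expert 1 scores slightly higher than expert 0
probs_low = softmax(rounded)    # experts 0 and 1 are now exactly tied
```

In the high-precision case expert 1 is preferred over expert 0, while after rounding the two are indistinguishable, so any top-k tie-breaking or gradient computed from these scores can differ.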

Usage

import torch
from sonicmoe import MoE, KernelBackendMoE
from sonicmoe.enums import ActivationType
import deep_ep

# Initialize the default process group first (torch.distributed.init_process_group)
...

rank = torch.distributed.get_rank()
world_size = torch.distributed.get_world_size()

# The DeepEP buffer must be created before the MoE module
ep_group = torch.distributed.new_group(list(range(world_size)))
ep_buffer = deep_ep.Buffer(ep_group, int(1e9), 0, low_latency_mode=False,
                           num_qps_per_rank=1)
ep_config = deep_ep.Config(24, 8, 256)

# hidden_size and intermediate_size are chosen by the caller
moe = MoE(
        num_experts=32,
        num_experts_per_tok=8,
        hidden_size=hidden_size,
        intermediate_size=intermediate_size,
        activation_function=ActivationType.SWIGLU,
        add_bias=False,
        std=0.02,
        rank=rank,
        ep_size=world_size,
        ep_group=ep_group,
        ep_buffer=ep_buffer,
        ep_config=ep_config
).to(device=rank, dtype=torch.bfloat16)
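As a rough picture of what `num_experts=32` with a given `ep_size` implies, the sketch below shards experts evenly across ranks. The contiguous layout and the helper names `experts_on_rank` and `owner_rank` are assumptions made for illustration; the PR does not state how experts are actually assigned to ranks.

```python
def experts_on_rank(num_experts: int, ep_size: int, rank: int) -> range:
    """Expert ids held by `rank`, assuming an even, contiguous split (an assumption)."""
    if num_experts % ep_size != 0:
        raise ValueError("num_experts must be divisible by ep_size")
    per_rank = num_experts // ep_size
    return range(rank * per_rank, (rank + 1) * per_rank)


def owner_rank(expert_id: int, num_experts: int, ep_size: int) -> int:
    """Rank that holds `expert_id` under the same assumed layout."""
    return expert_id // (num_experts // ep_size)


# Example with 32 experts and a hypothetical ep_size of 8: four experts per rank.
local_experts = list(experts_on_rank(32, 8, rank=0))
last_owner = owner_rank(31, 32, 8)
```

Under this layout each token's selected experts determine which ranks it must be dispatched to, which is the communication step that DeepEP handles.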
