support expert parallel with deepep by GHtyt · Pull Request #9 · Dao-AILab/sonic-moe

GHtyt · 2025-12-26T05:42:49Z

Description

We integrated expert parallelism (EP) via DeepEP, using a forward/backward (f/b) approach similar to Megatron's, while keeping the router in high precision to prevent discrepancies in the backward gradients. We also added tests that compare the outputs and gradients of the EP and non-EP implementations.
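To make the output and gradient comparison concrete, here is a minimal single-process sketch. It is not the test added in this PR and uses neither sonicmoe nor DeepEP: `toy_moe` and `run` are hypothetical helpers, the "EP" run is simulated by splitting the experts into two shards and summing their contributions, and the tolerances are illustrative assumptions. The toy router is evaluated in float32, mirroring the high-precision router mentioned above.

```python
import torch


def toy_moe(x, router_w, expert_w, expert_ids):
    """Hypothetical dense toy MoE restricted to the experts in `expert_ids`.

    The router is evaluated in float32 regardless of the dtype of `x`.
    """
    logits = x.float() @ router_w.float()        # (tokens, num_experts)
    probs = torch.softmax(logits, dim=-1)        # softmax over ALL experts
    out = torch.zeros_like(x, dtype=torch.float32)
    for e in expert_ids:
        out = out + probs[:, e:e + 1] * (x.float() @ expert_w[e].float())
    return out


def run(shards, x0, router_w0, expert_w0):
    """Forward + backward, summing the contribution of each expert shard."""
    x = x0.clone().requires_grad_(True)
    router_w = router_w0.clone().requires_grad_(True)
    expert_w = expert_w0.clone().requires_grad_(True)
    out = sum(toy_moe(x, router_w, expert_w, ids) for ids in shards)
    out.square().sum().backward()
    return out.detach(), x.grad, router_w.grad, expert_w.grad


torch.manual_seed(0)
T, H, E = 16, 8, 4                               # tokens, hidden size, experts
x0 = torch.randn(T, H)
router_w0 = torch.randn(H, E)
expert_w0 = torch.randn(E, H, H)

# Reference: one shard holding every expert (stand-in for "no EP").
ref = run([range(E)], x0, router_w0, expert_w0)
# Simulated EP: two shards with two experts each, combined by summation.
ep = run([range(0, 2), range(2, 4)], x0, router_w0, expert_w0)

# Outputs and all gradients should agree up to floating-point reordering.
for a, b in zip(ref, ep):
    torch.testing.assert_close(a, b, rtol=1e-4, atol=1e-4)
```

A real EP test would run under multiple processes and compare the sharded module against a non-EP module on each rank; the sketch only shows the shape of the check.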

Usage

import torch
from sonicmoe import MoE, KernelBackendMoE
from sonicmoe.enums import ActivationType
import deep_ep

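# Initialize the default process group first (torch.distributed.init_process_group)
...
rank = torch.distributed.get_rank()
world_size = torch.distributed.get_world_size()

# The DeepEP buffer must be created before the MoE module
ep_group = torch.distributed.new_group(list(range(world_size)))
# int(1e9): NVLink buffer size in bytes; 0: RDMA buffer size in bytes
ep_buffer = deep_ep.Buffer(ep_group, int(1e9), 0, low_latency_mode=False,
                           num_qps_per_rank=1)
ep_config = deep_ep.Config(24, 8, 256)

# hidden_size and intermediate_size are model dimensions chosen by the caller
moe = MoE(
        num_experts=32,
        num_experts_per_tok=8,
        hidden_size=hidden_size,
        intermediate_size=intermediate_size,
        activation_function=ActivationType.SWIGLU,
        add_bias=False,
        std=0.02,
        rank=rank,
        ep_size=world_size,
        ep_group=ep_group,
        ep_buffer=ep_buffer,
        ep_config=ep_config
).to(device=rank, dtype=torch.bfloat16)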
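As a rough illustration of what `num_experts=32` together with `ep_size` implies, the sketch below computes which experts a given rank would own. It assumes experts are split evenly and contiguously across EP ranks; the PR description does not state the actual layout, and `local_expert_range` is a hypothetical helper, not part of sonicmoe or DeepEP.

```python
def local_expert_range(num_experts: int, ep_size: int, rank: int) -> range:
    """Return the expert indices owned by `rank`.

    Assumption: an even, contiguous split of experts across EP ranks.
    """
    if num_experts % ep_size != 0:
        raise ValueError("num_experts must be divisible by ep_size")
    if not 0 <= rank < ep_size:
        raise ValueError("rank must be in [0, ep_size)")
    per_rank = num_experts // ep_size
    return range(rank * per_rank, (rank + 1) * per_rank)


# With 32 experts on 8 ranks, each rank would hold 4 experts.
experts_on_rank_3 = list(local_expert_range(32, 8, 3))
print(experts_on_rank_3)  # [12, 13, 14, 15]
```

Under this assumption, a token routed to 8 experts (`num_experts_per_tok=8`) may need to be dispatched to several ranks, which is the communication step DeepEP handles.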

support expert parallel with deepep

3e89fb6

GarlGuo requested review from GarlGuo and mayank31398 December 29, 2025 06:10

hemildesai mentioned this pull request Dec 31, 2025

Integrate SonicMoE kernels NVIDIA-NeMo/Automodel#1001

Open

GHtyt added 2 commits January 5, 2026 10:52

remove extra y1 z usage

eadb34a

rm debug codes

e5b8c31

GHtyt wants to merge 3 commits into Dao-AILab:main from GHtyt:expert_parallel
