Support EP sentinels #51

Open
IlyasMoutawwakil wants to merge 12 commits into Dao-AILab:main from IlyasMoutawwakil:sentinel-experts

@IlyasMoutawwakil IlyasMoutawwakil commented Apr 26, 2026

This PR adds support for expert parallelism (EP) through sentinel experts (experts_ids >= E). See #50.

@IlyasMoutawwakil IlyasMoutawwakil marked this pull request as ready for review April 26, 2026 09:50
IlyasMoutawwakil commented Apr 29, 2026

@GarlGuo we have some pretty good numbers with this PR

[Benchmark plots: bench_grouped_mm_ep_sentinel, bench_sonicmoe_ep_sentinel]

Compared to grouped_mm, we noticed that sonicmoe scales pretty well with naive EP: all tokens are passed to every rank, but only a subset is routed on each rank. The non-routed ones get a sentinel value and are skipped by the grouped GEMM computation.
This naive EP approach lets us shard experts and reduce compute, but not token hidden-state memory (all-to-all would reduce both). The nice thing is that it doesn't add any CPU-GPU syncs, because the shapes are not routing-dependent, so it stays compatible with torch.compile, fullgraph, and CUDA graphs.
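
For readers unfamiliar with the idea, here is a minimal pure-Python sketch of the sentinel scheme described above. It is not the code in this PR. The helper names (`to_local_expert_ids`, `local_expert_outputs`), the choice of `E` as the number of experts per rank, the toy "expert = scalar multiply" stand-in for the grouped GEMM, and the final sum across ranks are all assumptions made for illustration.

```python
from typing import List


def to_local_expert_ids(expert_ids: List[int], rank: int, experts_per_rank: int) -> List[int]:
    """Map global expert ids to rank-local ids; non-local ids become a sentinel."""
    # Assumption: the sentinel is E (experts per rank), so any id >= E means "skip here".
    sentinel = experts_per_rank
    lo = rank * experts_per_rank
    hi = lo + experts_per_rank
    return [e - lo if lo <= e < hi else sentinel for e in expert_ids]


def local_expert_outputs(tokens: List[float], local_ids: List[int],
                         expert_scales: List[float]) -> List[float]:
    """Toy stand-in for a grouped GEMM: each local expert just scales its tokens."""
    num_local = len(expert_scales)
    # The output always has one slot per input token, regardless of routing,
    # so no shape on this rank depends on the routing decision.
    out = [0.0] * len(tokens)
    for i, (x, e) in enumerate(zip(tokens, local_ids)):
        if e >= num_local:  # sentinel expert: token is not handled on this rank
            continue
        out[i] = x * expert_scales[e]
    return out


# Hypothetical setup: 4 experts sharded over 2 ranks, top-1 routing.
tokens = [1.0, 2.0, 3.0, 4.0]
global_ids = [0, 3, 1, 2]
scales_per_rank = [[10.0, 20.0], [30.0, 40.0]]

local_ids_per_rank = [to_local_expert_ids(global_ids, r, 2) for r in range(2)]
partials = [
    local_expert_outputs(tokens, ids, scales_per_rank[r])
    for r, ids in enumerate(local_ids_per_rank)
]
# Assumption: per-rank partial outputs are summed (as an all-reduce would do).
combined = [sum(vals) for vals in zip(*partials)]
```

Every rank sees all four tokens and produces a four-element output, which is why the memory for token hidden states is not reduced, while the per-rank work only covers the tokens routed to its own experts.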

tridao commented Apr 29, 2026

woah this scaling is pretty good!
