[graph_trainer] Add DeepSeek V3 16B SDPA config by sanketpurandare · Pull Request #3361 · pytorch/torchtitan

sanketpurandare · 2026-05-15T03:03:27Z

Stacked PRs:

[graph_trainer] Add DeepSeek V3 16B SDPA config

Add a graph_trainer DeepSeek V3 16B config that selects the SDPA attention backend. This complements the existing 16B FlexAttention graph_trainer config and gives performance and validation runs an explicit SDPA entry point.

Test Plan:
- Not run; registry-only config addition.
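As a rough illustration of what a "registry-only" addition can look like, here is a minimal sketch of registering an SDPA variant alongside an existing FlexAttention config. Every name in it (`ModelConfig`, `CONFIG_REGISTRY`, `register`, the registry keys, and the `attn_backend` field) is hypothetical and chosen for illustration; none of it is taken from torchtitan's actual code.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical stand-in for a model config (not torchtitan's real class)."""
    name: str
    attn_backend: str  # assumed values: "flex" or "sdpa"


# Hypothetical registry: config name -> factory that builds the config.
CONFIG_REGISTRY: Dict[str, Callable[[], ModelConfig]] = {}


def register(name: str):
    """Register a config factory under `name`, rejecting duplicate names."""
    def decorator(fn: Callable[[], ModelConfig]) -> Callable[[], ModelConfig]:
        if name in CONFIG_REGISTRY:
            raise ValueError(f"config {name!r} is already registered")
        CONFIG_REGISTRY[name] = fn
        return fn
    return decorator


@register("deepseek_v3_16b_flex")
def deepseek_v3_16b_flex() -> ModelConfig:
    # Stand-in for the existing FlexAttention entry.
    return ModelConfig(name="deepseek_v3_16b", attn_backend="flex")


@register("deepseek_v3_16b_sdpa")
def deepseek_v3_16b_sdpa() -> ModelConfig:
    # Same config as the FlexAttention entry; only the attention backend differs.
    return replace(deepseek_v3_16b_flex(), attn_backend="sdpa")
```

Deriving the SDPA entry from the FlexAttention one keeps the two configs identical apart from the backend, which is what makes a side-by-side performance or validation comparison meaningful.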

stack-info: PR: #3361, branch: sanketpurandare/stack/15

… routing

Merged 7 upstream commits (19c567f..af33f76). Documents which ones needed ezpz replays:
- PR pytorch#3398 (Module subclass refactor): 3 import paths replayed in b052f29; pure import-path swap, class API unchanged.
- PR pytorch#3146 (deterministic MoE routing): inherits transitively; this is the upstream fix for the _histc_xpu non-determinism blocker we hit on 2026-05-21. --debug.deterministic on MoE+XPU should now work.
- PR pytorch#3423 (MoE [7/n] 3D tensors): inherits transitively; doesn't touch deepseek_v3 callsites.
- PR pytorch#3105 (FSDP symm_mem): skipped; ezpz has its own apply_fsdp and symm_mem is an optional optimization XPU CCL likely doesn't support.
- PRs pytorch#3331/pytorch#3369/pytorch#3361: graph_trainer-only no-ops.

Captures two action items: smoke-test before next production push, and re-try --debug.deterministic on MoE+XPU.

sanketpurandare requested review from SherlockNoMad, aditvenk, tianyu-l, xmfan and yiming0416 as code owners May 15, 2026 03:03

pytorch-bot Bot added the ciflow/8gpu label May 15, 2026

sanketpurandare force-pushed the sanketpurandare/stack/15 branch from 07f3629 to 1ee37b9 Compare May 15, 2026 03:03

sanketpurandare mentioned this pull request May 15, 2026

[graph_trainer] Support hinted symbolic input dims in tracing #3362

Open

meta-cla Bot added the CLA Signed label May 15, 2026

This was referenced May 15, 2026

[graph_trainer] Add EP overlap eager chunking scaffolding #3363

Open

[graph_trainer] Add graph EP chunking pass #3325

Open

[graph_trainer] Add EP overlap scheduling pass #3328

Open

aditvenk approved these changes May 15, 2026

sanketpurandare marked this pull request as draft May 15, 2026 09:36

sanketpurandare mentioned this pull request May 15, 2026

[graph_trainer] Use separate EP process groups for overlap #3369

Merged

sanketpurandare marked this pull request as ready for review May 15, 2026 09:36

sanketpurandare marked this pull request as draft May 22, 2026 19:02

sanketpurandare marked this pull request as ready for review May 22, 2026 19:03

sanketpurandare merged commit 78b08dd into main May 26, 2026
20 of 26 checks passed

sanketpurandare merged 1 commit into main from sanketpurandare/stack/15
