
[graph_trainer] Add DeepSeek V3 16B SDPA config #3361

Merged
sanketpurandare merged 1 commit into main from sanketpurandare/stack/15 on May 26, 2026

sanketpurandare commented May 15, 2026

Stacked PRs:

- #3361 [graph_trainer] Add DeepSeek V3 16B SDPA config

Add a graph_trainer DeepSeek V3 16B config that selects the SDPA attention backend. This complements the existing 16B FlexAttention graph_trainer config and gives performance and validation runs an explicit SDPA entry point.

Test Plan:

- Not run; registry-only config addition.

stack-info: PR: #3361, branch: sanketpurandare/stack/15
sanketpurandare marked this pull request as draft May 15, 2026 09:36
sanketpurandare marked this pull request as ready for review May 15, 2026 09:36
sanketpurandare marked this pull request as draft May 22, 2026 19:02
sanketpurandare marked this pull request as ready for review May 22, 2026 19:03
sanketpurandare merged commit 78b08dd into main May 26, 2026
20 of 26 checks passed
saforem2 added a commit to saforem2/torchtitan that referenced this pull request May 27, 2026
… routing

Merged 7 upstream commits (19c567f..af33f76). Documents which
ones needed ezpz replays:

- PR pytorch#3398 (Module subclass refactor): 3 import paths replayed in
  b052f29 — pure import-path swap, class API unchanged.
- PR pytorch#3146 (deterministic MoE routing): inherits transitively; this
  is the upstream fix for the _histc_xpu non-determinism blocker
  we hit on 2026-05-21. --debug.deterministic on MoE+XPU should now
  work.
- PR pytorch#3423 (MoE [7/n] 3D tensors): inherits transitively; doesn't
  touch deepseek_v3 callsites.
- PR pytorch#3105 (FSDP symm_mem): skipped — ezpz has its own apply_fsdp
  and symm_mem is an optional optimization XPU CCL likely doesn't
  support.
- PRs pytorch#3331/pytorch#3369/pytorch#3361: graph_trainer-only no-ops.

Captures two action items: smoke-test before next production push,
and re-try --debug.deterministic on MoE+XPU.

Labels

ciflow/8gpu, CLA Signed


2 participants