[graph_trainer] Add DeepSeek V3 16B SDPA config #3361
Merged
Add a graph_trainer DeepSeek V3 16B config that selects the SDPA attention backend. This complements the existing 16B FlexAttention graph_trainer config and gives performance and validation runs an explicit SDPA entry point.

Test Plan:
- Not run; registry-only config addition.

stack-info: PR: #3361, branch: sanketpurandare/stack/15
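For readers unfamiliar with registry-only config additions, the following is a minimal sketch of the pattern the description implies: a second registered entry that differs from the existing FlexAttention one only in its attention backend. Everything here is an assumption made for illustration. `ModelConfig`, `CONFIG_REGISTRY`, `register`, the registry keys, and the `attn_backend` field are hypothetical names, not torchtitan's actual API, and the real config carries many more fields.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical stand-in for a trainer config; not the real torchtitan class."""
    name: str
    attn_backend: str  # assumed values: "flex" or "sdpa"


# Hypothetical registry mapping a config name to a zero-argument factory.
CONFIG_REGISTRY: Dict[str, Callable[[], ModelConfig]] = {}


def register(key: str):
    """Register a config factory under `key`, rejecting duplicate keys."""
    def wrap(fn: Callable[[], ModelConfig]) -> Callable[[], ModelConfig]:
        if key in CONFIG_REGISTRY:
            raise ValueError(f"config {key!r} is already registered")
        CONFIG_REGISTRY[key] = fn
        return fn
    return wrap


@register("graph_trainer_deepseek_v3_16b_flex")  # hypothetical key
def deepseek_v3_16b_flex() -> ModelConfig:
    # Stands in for the existing FlexAttention entry.
    return ModelConfig(name="deepseek_v3_16b", attn_backend="flex")


@register("graph_trainer_deepseek_v3_16b_sdpa")  # hypothetical key
def deepseek_v3_16b_sdpa() -> ModelConfig:
    # Same model as the FlexAttention entry; only the attention backend changes.
    return replace(deepseek_v3_16b_flex(), attn_backend="sdpa")


if __name__ == "__main__":
    cfg = CONFIG_REGISTRY["graph_trainer_deepseek_v3_16b_sdpa"]()
    print(cfg)
```

Deriving the SDPA entry from the FlexAttention one keeps the two from drifting apart, which matters if the entries are meant to be compared in performance and validation runs.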
07f3629 to 1ee37b9
This was referenced May 15, 2026
aditvenk approved these changes May 15, 2026
saforem2 added a commit to saforem2/torchtitan that referenced this pull request May 27, 2026
… routing

Merged 7 upstream commits (19c567f..af33f76). Documents which ones needed ezpz replays:
- PR pytorch#3398 (Module subclass refactor): 3 import paths replayed in b052f29; pure import-path swap, class API unchanged.
- PR pytorch#3146 (deterministic MoE routing): inherits transitively; this is the upstream fix for the _histc_xpu non-determinism blocker we hit on 2026-05-21. --debug.deterministic on MoE+XPU should now work.
- PR pytorch#3423 (MoE [7/n] 3D tensors): inherits transitively; doesn't touch deepseek_v3 callsites.
- PR pytorch#3105 (FSDP symm_mem): skipped; ezpz has its own apply_fsdp and symm_mem is an optional optimization XPU CCL likely doesn't support.
- PRs pytorch#3331/pytorch#3369/pytorch#3361: graph_trainer-only no-ops.

Captures two action items: smoke-test before next production push, and re-try --debug.deterministic on MoE+XPU.