[graph_trainer] Support hinted symbolic input dims in tracing by sanketpurandare · Pull Request #3362 · pytorch/torchtitan

sanketpurandare · 2026-05-15T03:03:29Z

Stacked PRs:

[graph_trainer] Support hinted symbolic input dims in tracing

Add the graph_trainer tracing prerequisites needed by later EP chunking work, without introducing the chunking pass itself. minimal_fx_tracer now accepts both mark_unbacked and mark_dynamic metadata on plain tensor inputs, builds the corresponding symbolic context during fakeification, and keeps wrapper-subclass inputs rejected so DTensor-style layouts do not silently lose their metadata.

Make RoPE shape checks symbolic-shape friendly by replacing Python shape asserts with compact torch._check-based validation. This keeps the runtime checks but avoids specializing symbolic batch or sequence dimensions during tracing.

Make full-Inductor FX-to-FX canonicalization tolerate fresh intermediate unbacked scalar symbols, and keep MoE split-size CPU copies synchronous under compile/non-strict tracing so traced split sizes cannot race stale CPU reads.

Test Plan:

pytest -q torchtitan/experiments/graph_trainer/tests/test_trace_module.py -k 'mark_dynamic_batch_and_seq_dims_with_rope or dtensor_mark_unbacked_rejected'

aditvenk · 2026-05-15T19:28:19Z



-def _wrapper_subclass_has_mark_unbacked(tensor: torch.Tensor) -> bool:
+def _tensor_has_mark_dynamic(tensor: torch.Tensor) -> bool:


Should this also consider _dynamo_dynamic_range? Previously, the minimal tracer seems to have treated this as "mark_dynamic" too.

Added support for both mark_dynamic and mark_unbacked metadata paths in the minimal tracer. The current helper imports the Dynamo dynamic/unbacked annotations into the StatelessSymbolicContext, including the unbacked bounds used by hinted symbolic dimensions.

aditvenk · 2026-05-15T19:28:20Z

            dynamic_sizes[dim] = DimDynamic.UNBACKED
            constraint_sizes[dim] = RelaxedUnspecConstraint(warn_only=False)
+        elif dim in marked_dynamic_indices:
+            dynamic_sizes[dim] = DimDynamic.DYNAMIC


Should we update constraint_sizes if the dynamic dim has min/max hints?

The implementation keeps strict unbacked dimensions constrained with RelaxedUnspecConstraint and passes the Dynamo-provided unbacked bounds into the symbolic context. The min/max hint path is preserved through unbacked_bounds rather than specializing the traced graph to concrete sizes.
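The branch order in the diff above can be illustrated with a small, torch-free sketch. `DimPolicy` and `classify_dims` are hypothetical stand-ins (not names from the PR) for `DimDynamic` and the per-dimension loop; the `STATIC` fallback is an assumption, and the "constrained" flag only mirrors the lines visible in the diff, where the unbacked branch sets a constraint and the dynamic branch sets the policy.

```python
from enum import Enum, auto


class DimPolicy(Enum):
    """Hypothetical stand-in for torch's DimDynamic policy enum."""
    STATIC = auto()
    DYNAMIC = auto()
    UNBACKED = auto()


def classify_dims(rank, unbacked, dynamic):
    """Return one (policy, constrained) pair per dimension.

    Unbacked is checked first, so a dim marked both ways is treated as
    unbacked, matching the if/elif order shown in the diff.
    """
    plan = []
    for dim in range(rank):
        if dim in unbacked:
            plan.append((DimPolicy.UNBACKED, True))
        elif dim in dynamic:
            plan.append((DimPolicy.DYNAMIC, False))
        else:
            # Assumption: unmarked dims stay static.
            plan.append((DimPolicy.STATIC, False))
    return plan


plan = classify_dims(3, unbacked={1}, dynamic={0, 1})
for dim, (policy, constrained) in enumerate(plan):
    print(dim, policy.name, constrained)
```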

aditvenk · 2026-05-15T19:28:20Z

+        traced = minimal_fx_tracer(forward)(
+            x, xq, xk, freqs_cis, rope_cache, positions
+        )
+        self.assertTrue(


Test with a shape different than what was traced?

Added tracing coverage for marked dynamic inputs and runtime assertion materialization. The traced graph keeps the input dimension symbolic and emits ShapeEnv runtime checks, so the test now verifies behavior beyond the exact concrete size used during capture.
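As a rough illustration of the reviewer's request, the sketch below traces with one batch size and replays with another. It uses `make_fx` with `tracing_mode="symbolic"` as a stand-in, since `minimal_fx_tracer` is specific to this repository; the function and sizes are made up for the example and are not the PR's test.

```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx


def forward(x):
    # Uses the batch size as a value, so a specialized graph would bake in
    # the traced size instead of reading it from the input at run time.
    return x * x.shape[0]


# Trace with batch size 4 (sizes 0 and 1 are avoided on purpose, since
# those are commonly specialized).
traced = make_fx(forward, tracing_mode="symbolic")(torch.ones(4, 3))

# Replay with a different batch size than the one used at capture.
y = torch.ones(6, 3)
out = traced(y)
```

If the batch dimension had been specialized during capture, `out` would be scaled by 4 rather than 6, so comparing against `y * 6` is a cheap way to detect accidental specialization.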

aditvenk · 2026-05-15T19:28:20Z

 ]


+def _check_shape_equal(actual, expected, context: str) -> None:


Can we restrict this change to graph_trainer only in some way? It seems necessary only for the minimal fx tracer?

Restricted the behavior to the graph-trainer/minimal-tracer use case by keeping the symbolic shape support in the tracer path. The RoPE change is limited to replacing Python shape equality assertions with torch._check, which is the PyTorch-native way to let symbolic sizes flow without weakening eager/runtime validation.

Add the graph_trainer tracing prerequisites needed by later EP chunking work, without introducing the chunking pass itself. minimal_fx_tracer now accepts both mark_unbacked and mark_dynamic metadata on plain tensor inputs, builds the corresponding symbolic context during fakeification, and keeps wrapper-subclass inputs rejected so DTensor-style layouts do not silently lose their metadata. Make RoPE shape checks symbolic-shape friendly by replacing Python shape asserts with compact torch._check-based validation. This keeps the runtime checks but avoids specializing symbolic batch or sequence dimensions during tracing. Make full-Inductor FX-to-FX canonicalization tolerate fresh intermediate unbacked scalar symbols, and keep MoE split-size CPU copies synchronous under compile/non-strict tracing so traced split sizes cannot race stale CPU reads. Test Plan: - pytest -q torchtitan/experiments/graph_trainer/tests/test_trace_module.py -k 'mark_dynamic_batch_and_seq_dims_with_rope or dtensor_mark_unbacked_rejected' stack-info: PR: #3362, branch: sanketpurandare/stack/16

SherlockNoMad · 2026-06-09T21:03:31Z

There is some coupling between the tracer and the EP_overlap passes...

This needs to be called out in readme.md / claude.md, together with clear instructions on how to use EP overlap and documentation of its composability and limitations.

sanketpurandare requested review from SherlockNoMad, aditvenk, fegin, tianyu-l, wconstab, wwwjn, xmfan and yiming0416 as code owners May 15, 2026 03:03

pytorch-bot Bot added the ciflow/8gpu label May 15, 2026

sanketpurandare force-pushed the sanketpurandare/stack/15 branch from 07f3629 to 1ee37b9 Compare May 15, 2026 03:03

sanketpurandare force-pushed the sanketpurandare/stack/16 branch from 60ac042 to 3178797 Compare May 15, 2026 03:03

This was referenced May 15, 2026

[graph_trainer] Add DeepSeek V3 16B SDPA config #3361

Merged

[graph_trainer] Add EP overlap eager chunking scaffolding #3363

Open

[graph_trainer] Add graph EP chunking pass #3325

Open

[graph_trainer] Add EP overlap scheduling pass #3328

Open

meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label May 15, 2026

sanketpurandare marked this pull request as draft May 15, 2026 09:36

sanketpurandare changed the base branch from sanketpurandare/stack/15 to main May 15, 2026 09:36

sanketpurandare force-pushed the sanketpurandare/stack/16 branch 2 times, most recently from d7fa96e to 0fb48ed Compare May 15, 2026 09:36

sanketpurandare mentioned this pull request May 15, 2026

[graph_trainer] Use separate EP process groups for overlap #3369

Merged

sanketpurandare changed the base branch from main to sanketpurandare/stack/18 May 15, 2026 09:36

sanketpurandare marked this pull request as ready for review May 15, 2026 09:37

SherlockNoMad mentioned this pull request May 15, 2026

[graph_trainer] Nightly scout tracking issue #2856

Open

aditvenk reviewed May 15, 2026

sanketpurandare marked this pull request as draft May 22, 2026 19:02

sanketpurandare changed the base branch from sanketpurandare/stack/18 to main May 22, 2026 19:02

sanketpurandare force-pushed the sanketpurandare/stack/16 branch from 0fb48ed to b243510 Compare May 22, 2026 19:03

sanketpurandare changed the base branch from main to sanketpurandare/stack/18 May 22, 2026 19:03

sanketpurandare marked this pull request as ready for review May 22, 2026 19:03

sanketpurandare marked this pull request as draft May 26, 2026 05:28

sanketpurandare changed the base branch from sanketpurandare/stack/18 to main May 26, 2026 05:28

sanketpurandare force-pushed the sanketpurandare/stack/16 branch from b243510 to bd80c98 Compare May 26, 2026 05:29

sanketpurandare changed the base branch from main to sanketpurandare/stack/18 May 26, 2026 05:29

sanketpurandare marked this pull request as ready for review May 26, 2026 05:29

sanketpurandare marked this pull request as draft May 26, 2026 05:30

sanketpurandare changed the base branch from sanketpurandare/stack/18 to main May 26, 2026 05:30

sanketpurandare changed the base branch from main to sanketpurandare/stack/18 May 26, 2026 05:31

sanketpurandare marked this pull request as ready for review May 26, 2026 05:31

sanketpurandare marked this pull request as draft May 27, 2026 03:51

sanketpurandare changed the base branch from sanketpurandare/stack/18 to main May 27, 2026 03:51

sanketpurandare force-pushed the sanketpurandare/stack/16 branch from bd80c98 to 7a084ea Compare May 27, 2026 03:51

sanketpurandare changed the base branch from main to sanketpurandare/stack/18 May 27, 2026 03:51

sanketpurandare marked this pull request as ready for review May 27, 2026 03:52

sanketpurandare force-pushed the sanketpurandare/stack/16 branch from 7a084ea to dd9fd88 Compare May 27, 2026 04:41

sanketpurandare changed the base branch from sanketpurandare/stack/18 to main May 27, 2026 04:41

sanketpurandare marked this pull request as draft May 27, 2026 15:28

sanketpurandare force-pushed the sanketpurandare/stack/16 branch from dd9fd88 to 1a4e569 Compare May 27, 2026 15:28

sanketpurandare marked this pull request as ready for review May 27, 2026 15:29

sanketpurandare marked this pull request as draft June 4, 2026 21:36

sanketpurandare force-pushed the sanketpurandare/stack/16 branch from 1a4e569 to adf04c0 Compare June 4, 2026 21:36

sanketpurandare marked this pull request as ready for review June 4, 2026 21:36

sanketpurandare requested a review from IvanKobzarev as a code owner June 4, 2026 21:36

sanketpurandare marked this pull request as draft June 10, 2026 23:49

sanketpurandare force-pushed the sanketpurandare/stack/16 branch from adf04c0 to 4bec890 Compare June 10, 2026 23:50

sanketpurandare marked this pull request as ready for review June 10, 2026 23:50

sanketpurandare mentioned this pull request Jun 10, 2026

[MoE] Use CPU split-size sum for EP permute output size #3627

Open

sanketpurandare wants to merge 1 commit into main from sanketpurandare/stack/16


3 participants


