
Re-enable example_ds3_local_map in CI after PyTorch OpStrategy.__str__ fix #436

@aditvenk

Description


example_ds3_local_map.py is skipped in CI (test_cuda.yml) because OpStrategy.__str__ in PyTorch crashes when a PlacementStrategy has None in its input_specs or output_specs.

Root Cause

After #432 enabled repeated_subgraphs=True by default, graph clustering now calls str(op_strategy) on every node's strategy to build hash keys. Two kinds of nodes have None specs:

  • call_local_map nodes — opaque to sharding propagation, so the propagator assigns strategies with input_specs=(None,) and output_specs=None.
  • getitem nodes — extracting non-tensor outputs from multi-return ops (e.g., SDPA).

Calling str() on such a strategy follows the chain OpStrategy.__str__ → mesh_shape → strategies[0].mesh → input_specs[0].mesh, which crashes with AttributeError: 'NoneType' object has no attribute 'mesh'.
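The failure can be reproduced without PyTorch in a minimal sketch. `Mesh`, `Spec`, `Strategy`, `StrategyList`, and `subgraph_key` are hypothetical stand-ins for DeviceMesh, DTensorSpec, PlacementStrategy, OpStrategy, and the clustering hash-key builder. They mirror only the call chain described above; the real PyTorch and clustering code may differ in detail.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Mesh:
    """Hypothetical stand-in for DeviceMesh; only the shape matters here."""
    shape: Tuple[int, ...]


@dataclass
class Spec:
    """Hypothetical stand-in for DTensorSpec."""
    mesh: Mesh


@dataclass
class Strategy:
    """Hypothetical stand-in for PlacementStrategy."""
    output_specs: Optional[Spec]
    input_specs: Optional[Sequence[Optional[Spec]]] = None

    @property
    def mesh(self) -> Mesh:
        # Assumption: mirrors the chain in this issue by reading the mesh
        # from input_specs[0] with no None check.
        return self.input_specs[0].mesh


class StrategyList:
    """Hypothetical stand-in for OpStrategy."""

    def __init__(self, strategies):
        self.strategies = strategies

    @property
    def mesh_shape(self):
        return self.strategies[0].mesh.shape

    def __str__(self):
        return f"OpStrategy(n={len(self.strategies)}) @ mesh: {self.mesh_shape}"


def subgraph_key(node_strategies):
    # Hypothetical clustering key: str() of every node's strategy, so a single
    # node with None specs makes the whole key computation raise.
    return hash(tuple(str(s) for s in node_strategies))


mesh = Mesh((2, 4))
sharded = StrategyList([Strategy(output_specs=Spec(mesh), input_specs=(Spec(mesh),))])
# call_local_map-style node: input_specs=(None,), output_specs=None
local_map = StrategyList([Strategy(output_specs=None, input_specs=(None,))])

try:
    subgraph_key([sharded, local_map])
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)  # 'NoneType' object has no attribute 'mesh'
```

The sharded node alone hashes fine; adding one opaque node is enough to break key construction for the whole subgraph.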

Fix

The fix belongs in PyTorch's torch/distributed/tensor/_op_schema.py: OpStrategy.__str__ (and the PlacementStrategy.mesh property it calls) should handle None specs, since PyTorch's own sharding propagator produces them.
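A minimal sketch of the kind of None-tolerant handling the fix calls for, again using hypothetical stand-ins rather than the real `_op_schema.py` classes. Falling back to the first non-None spec, and printing `None` for the mesh shape when every spec is None, are assumptions chosen for illustration, not the upstream patch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Mesh:
    """Hypothetical stand-in for DeviceMesh."""
    shape: Tuple[int, ...]


@dataclass
class Spec:
    """Hypothetical stand-in for DTensorSpec."""
    mesh: Mesh


@dataclass
class Strategy:
    """Hypothetical stand-in for PlacementStrategy with a None-tolerant mesh."""
    output_specs: Optional[Spec]
    input_specs: Optional[Sequence[Optional[Spec]]] = None

    @property
    def mesh(self) -> Optional[Mesh]:
        # Use the first spec that is not None; return None when every spec is
        # None (e.g. call_local_map or non-tensor getitem nodes).
        for spec in (self.output_specs, *(self.input_specs or ())):
            if spec is not None:
                return spec.mesh
        return None


class StrategyList:
    """Hypothetical stand-in for OpStrategy with a None-safe __str__."""

    def __init__(self, strategies):
        self.strategies = strategies

    @property
    def mesh_shape(self) -> Optional[Tuple[int, ...]]:
        mesh = self.strategies[0].mesh
        return mesh.shape if mesh is not None else None

    def __str__(self):
        specs = ", ".join(f"{s.input_specs} -> {s.output_specs}" for s in self.strategies)
        return f"[{specs}] @ mesh: {self.mesh_shape}"


def subgraph_key(node_strategies):
    # Hypothetical clustering key built from str() of every node's strategy.
    return hash(tuple(str(s) for s in node_strategies))


mesh = Mesh((2, 4))
sharded = StrategyList([Strategy(output_specs=Spec(mesh), input_specs=(Spec(mesh),))])
local_map = StrategyList([Strategy(output_specs=None, input_specs=(None,))])

print(local_map)           # [(None,) -> None] @ mesh: None
print(sharded.mesh_shape)  # (2, 4)
key = subgraph_key([sharded, local_map])  # no longer raises
```

Because clustering uses the string as a hash key, the None case still produces a deterministic representation rather than being skipped or raising.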

Action Items

  • Fix OpStrategy.__str__ in PyTorch to handle None specs
  • Remove the skip in .github/workflows/test_cuda.yml and re-enable example_ds3_local_map.py

Metadata


Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
