Use CapabilityBasedPartitioner in AotiPartitioner (#20384) by shoumikhin · Pull Request #20384 · pytorch/executorch

shoumikhin · 2026-06-18T17:29:31Z

Summary:

AotiPartitioner (the base for the CUDA and Metal backends) groups the ops it
delegates into one partition, by hand. Every other ExecuTorch backend (XNNPACK,
Vulkan, CoreML) uses the shared CapabilityBasedPartitioner helper instead. This
switches AotiPartitioner to that helper too.

Why:

1. Consistency -- same partitioning path as the other backends, and a real
   OperatorSupport hook instead of a hand-rolled tagging loop.
2. The hand-rolled single partition can break. A delegate has to be one
   connected chunk of the graph. If the ops being delegated aren't all next to
   each other (some other node sits in between), putting them all in one
   partition is invalid and lowering crashes with
   "AssertionError: Invalid partition, found dependency cycles".
   CapabilityBasedPartitioner returns several maximal convex partitions
   instead, each of which fuses cleanly.
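
To make the failure mode concrete, here is a minimal, library-free sketch of the dependency-cycle check. It is not the ExecuTorch or torch.fx implementation: the function name `has_cycle_after_merge`, the edge-list graph encoding, and the node names are all assumptions made up for illustration. It collapses a candidate partition into one super-node and reports whether the collapsed graph has a cycle, which is what happens when a non-delegated node sits between two delegated ones.

```python
from collections import defaultdict


def has_cycle_after_merge(edges, partition):
    """Collapse `partition` into one super-node and report whether the
    collapsed graph contains a cycle (hypothetical helper, toy graph only)."""
    merged = "<delegate>"
    graph = defaultdict(set)
    for src, dst in edges:
        s = merged if src in partition else src
        d = merged if dst in partition else dst
        if s != d:  # edges fully inside the partition disappear
            graph[s].add(d)

    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every node starts WHITE (unvisited)

    def visit(node):
        color[node] = GREY  # node is on the current DFS path
        for nxt in graph[node]:
            if color[nxt] == GREY:  # back edge: cycle found
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK  # fully explored, no cycle through it
        return False

    return any(color[n] == WHITE and visit(n) for n in list(graph))


# Toy chain a -> b -> c, where only `a` and `c` can be delegated.
edges = [("a", "b"), ("b", "c")]
print(has_cycle_after_merge(edges, {"a", "c"}))  # True: one partition is invalid
print(has_cycle_after_merge(edges, {"a"}))       # False: each island is fine
print(has_cycle_after_merge(edges, {"c"}))       # False
```

Merging `a` and `c` makes the delegate both a producer and a consumer of `b`, so the collapsed graph has a cycle; keeping them as two partitions does not.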

No change for the common case: if every op can be delegated, you still get
exactly one partition (no extra delegate boundaries). When a non-delegated node
splits the delegated ops, this emits one partition (and one delegate boundary)
per island, which is the cost of producing a valid program. Control-flow ops
(cond/map/while_loop/scan) keep their branch get_attr operands in the same
partition, and constant/buffer tagging is unchanged.
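
The "one partition per island" behaviour can be illustrated on a deliberately simplified model. The sketch below assumes the graph is a plain linear chain of ops, which real graphs are not, and the helper name `islands` and the op names are hypothetical. It only shows the counting argument: zero non-delegated ops gives exactly one partition, and each non-delegated op in the middle adds one more boundary.

```python
from itertools import groupby


def islands(chain, is_supported):
    """Group consecutive supported ops of a linear chain into partitions
    (hypothetical helper; a real partitioner works on a DAG, not a list)."""
    return [
        list(group)
        for supported, group in groupby(chain, key=is_supported)
        if supported
    ]


def supported(op):
    # Assumption for this example: only "custom_op" cannot be delegated.
    return op != "custom_op"


print(islands(["linear", "relu", "softmax"], supported))
# [['linear', 'relu', 'softmax']]
print(islands(["linear", "relu", "custom_op", "softmax"], supported))
# [['linear', 'relu'], ['softmax']]
```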

Reviewed By: Gasoonjia

Differential Revision: D109040727

pytorch-bot · 2026-06-18T17:29:36Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20384

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 389 Pending

As of commit a67fd35 with merge base c9ef423:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync · 2026-06-18T17:29:41Z

@shoumikhin has exported this pull request. If you are a Meta employee, you can view the originating Diff in D109040727.

github-actions · 2026-06-18T17:30:23Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copilot

Pull request overview

Switches the AOTInductor-based partitioning used by the CUDA/Metal backends (via AotiPartitioner) from a single hand-rolled “tag everything” partition to PyTorch’s shared CapabilityBasedPartitioner, so delegated regions are emitted as valid convex partitions when prior lowered regions or unsupported nodes split the graph.

Changes:

Refactors AotiPartitioner.partition() to use CapabilityBasedPartitioner over non-lowered nodes, producing one or more convex partitions instead of a single global tag.
Ensures control-flow branch get_attr nodes (for cond/map/while_loop/scan) inherit the control-flow op’s partition tag so they lower into the same submodule.
Extends CUDA partitioner tests to validate multi-partition behavior, control-flow tagging, and shared-constant handling across partitions.
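
The control-flow tag propagation described in the list above might look roughly like the following. This is a sketch on toy objects, not the code in backends/aoti/aoti_partitioner.py: the `Node` dataclass, the `CONTROL_FLOW` set, and `propagate_tags` are hypothetical stand-ins for torch.fx nodes and the real pass. Only the `delegation_tag` meta key and the op names cond/map/while_loop/scan come from the PR.

```python
from dataclasses import dataclass, field

# Assumption: control-flow ops are identified by these target names.
CONTROL_FLOW = {"cond", "map", "while_loop", "scan"}


@dataclass
class Node:
    """Toy stand-in for a graph node (hypothetical, not torch.fx.Node)."""
    name: str
    op: str  # e.g. "placeholder", "get_attr", "call_function"
    target: str = ""
    args: tuple = ()
    meta: dict = field(default_factory=dict)


def propagate_tags(nodes):
    """Copy a control-flow op's delegation tag onto its get_attr operands."""
    for node in nodes:
        tag = node.meta.get("delegation_tag")
        if node.op != "call_function" or node.target not in CONTROL_FLOW:
            continue
        if tag is None:  # the control-flow op itself was not delegated
            continue
        for arg in node.args:
            if isinstance(arg, Node) and arg.op == "get_attr":
                arg.meta["delegation_tag"] = tag


pred = Node("pred", "placeholder")
true_branch = Node("true_graph_0", "get_attr")
false_branch = Node("false_graph_0", "get_attr")
cond = Node(
    "cond_0",
    "call_function",
    target="cond",
    args=(pred, true_branch, false_branch),
    meta={"delegation_tag": "tag0"},
)

propagate_tags([pred, true_branch, false_branch, cond])
print(true_branch.meta, false_branch.meta, pred.meta)
```

After the call, both branch nodes carry the same tag as `cond_0`, while the `pred` placeholder is left untagged.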

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| backends/aoti/aoti_partitioner.py | Replaces single-tag partitioning with `CapabilityBasedPartitioner`-driven convex partitions; preserves constant/mutated-buffer tagging and adds control-flow `get_attr` tag propagation. |
| backends/cuda/tests/test_cuda_partitioner.py | Updates assumptions about partition tags and adds targeted tests for split-graph multi-partitions, control-flow `get_attr` tagging, and shared constants across partitions. |

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

        tag_constant_data(exported_program)
        tag_mutated_buffer(exported_program)

-        # A constant that still has users feeds only a prior delegate; tagging it
-        # would fail backend lowering's same-tag check (its user keeps the prior
-        # tag). tag_constant_data already claimed the ones this partition uses, so
-        # tag only the genuinely unused constants here.
-        for node in exported_program.graph.nodes:
-            if (
-                node.op == "placeholder"
-                and not node.users
-                and "delegation_tag" not in node.meta
-                and (
-                    is_param(exported_program, node)
-                    or is_buffer(exported_program, node)
-                    or is_lifted_tensor_constant(exported_program, node)
-                )
-            ):
-                node.meta["delegation_tag"] = tag
+        # tag_constant_data only tags constants that have users; tag the
+        # genuinely unused ones too so none are left dangling.


Copilot AI review requested due to automatic review settings June 18, 2026 17:29

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 18, 2026

meta-codesync Bot added the meta-exported label Jun 18, 2026

meta-codesync Bot had a problem deploying to cadence June 18, 2026 17:29 Error

Copilot started reviewing on behalf of shoumikhin June 18, 2026 17:30 View session

Copilot AI reviewed Jun 18, 2026

shoumikhin added ciflow/trunk ciflow/metal ciflow/cuda ciflow/mlx labels Jun 18, 2026

shoumikhin temporarily deployed to cadence June 18, 2026 17:50 — with GitHub Actions Inactive

meta-codesync Bot changed the title ~~Use CapabilityBasedPartitioner in AotiPartitioner~~ Use CapabilityBasedPartitioner in AotiPartitioner (#20384) Jun 18, 2026

shoumikhin force-pushed the export-D109040727 branch from 2e0db98 to 4ccff6d Compare June 18, 2026 17:57

shoumikhin force-pushed the export-D109040727 branch from 4ccff6d to e3b2b11 Compare June 18, 2026 19:39

Copilot AI review requested due to automatic review settings June 18, 2026 19:39

Copilot started reviewing on behalf of shoumikhin June 18, 2026 19:39 View session

Copilot AI reviewed Jun 18, 2026

Gasoonjia approved these changes Jun 18, 2026

shoumikhin force-pushed the export-D109040727 branch from e3b2b11 to a67fd35 Compare June 18, 2026 20:29

shoumikhin requested review from JacobSzwejbka and larryliu0820 as code owners June 18, 2026 20:29

shoumikhin merged commit 5241b4e into pytorch:main Jun 18, 2026
546 of 572 checks passed
