Skip to content

Fix unsafe copies in EXIR and permute transforms#19873

Merged
oscarandersson8218 merged 5 commits into
pytorch:mainfrom
oscarandersson8218:fix_unsafe_deep_copies
Jun 22, 2026
Merged

Fix unsafe copies in EXIR and permute transforms#19873
oscarandersson8218 merged 5 commits into
pytorch:mainfrom
oscarandersson8218:fix_unsafe_deep_copies

Conversation

@oscarandersson8218

@oscarandersson8218 oscarandersson8218 commented May 29, 2026

Copy link
Copy Markdown
Collaborator

Replace deepcopy-based copies of shape lists with shallow container copies in paths that can carry torch.SymInt values from fake tensor metadata. This is safe as torch.Size is immutable, so creating a shallow list copy of it is safe: we can insert/delete elements in the new list without affecting the original shape.

This fixes symbolic-shape failures in exir.tensor.stride_from_dim_order() and in PostponePermuteOpBelowSqueezeOrUnsqueezeLikeView, where deepcopy() could crash while handling SymInt-backed shapes. The error that was produced: "RuntimeError: Cannot access data pointer of Tensor (e.g. FakeTensor, FunctionalTensor). If you're using torch.compile/export/fx, it is likely that we are erroneously tracing into a custom kernel. To fix this, please wrap the custom kernel into an opaque custom op. Please see the following for details:
https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html"

Add focused regression coverage for both cases: one test for stride computation with symbolic sizes, and one dynamic-shape edge-graph test for the permute/view transform pass.

cc @digantdesai @freddan80 @per @zingo @mansnils @Sebastian-Larsson @robell @rascani

Replace deepcopy-based copies of shape lists with shallow container
copies in paths that can carry torch.SymInt values from fake tensor
metadata. This is safe as torch.Size is immutable, so creating a shallow
list copy of it is safe: we can insert/delete elements in the new list
without affecting the original shape.

This fixes symbolic-shape failures in
exir.tensor.stride_from_dim_order() and in
PostponePermuteOpBelowSqueezeOrUnsqueezeLikeView, where deepcopy() could
crash while handling SymInt-backed shapes. The error that was produced:
"RuntimeError: Cannot access data pointer of Tensor (e.g.
FakeTensor, FunctionalTensor). If you're using torch.compile/export/fx,
it is likely that we are erroneously tracing into a custom kernel. To
fix this, please wrap the custom kernel into an opaque custom op. Please
see the following for details:
https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html"

Add focused regression coverage for both cases: one test for stride
computation with symbolic sizes, and one dynamic-shape edge-graph test
for the permute/view transform pass.

Signed-off-by: Oscar Andersson <oscar.andersson@arm.com>
Change-Id: I4eb2cae7b05f988edee570decb1307a671af4c2a
@pytorch-bot

pytorch-bot Bot commented May 29, 2026

Copy link
Copy Markdown

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19873

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 1 New Failure, 1 Cancelled Job, 3 Unrelated Failures, 2 Unclassified Failures

As of commit 8807bdc with merge base 1a78804 (image):

NEW FAILURE - The following job has failed:

UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:

CANCELLED JOB - The following job was cancelled. Please retry:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 29, 2026
@oscarandersson8218 oscarandersson8218 added help wanted Extra attention is needed partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm ciflow/trunk release notes: none Do not include this in the release notes labels May 29, 2026
@mergennachin

Copy link
Copy Markdown
Contributor

@Gasoonjia can you review this pls

@digantdesai

Copy link
Copy Markdown
Contributor

@claude review this PR.

@claude

claude Bot commented Jun 3, 2026

Copy link
Copy Markdown

Claude finished @digantdesai's task in 2m 22s —— View job


Review of PR #19873: Fix unsafe copies in EXIR and permute transforms

  • Read changed files and understand the fix
  • Review exir/tensor.py changes
  • Review backends/transforms/postpone_permute_below_squeeze_view.py changes
  • Review test files
  • Post review feedback

Overall: This PR looks good. The fix is correct, minimal, and well-tested.

Summary

The PR replaces copy.deepcopy(...) with list(...) in two places where shape lists containing torch.SymInt values are copied before mutation. deepcopy crashes on SymInts because it tries to access the underlying tensor data pointer of the symbolic node. A shallow list copy is the right fix.

Correctness Analysis

exir/tensor.py (line 113): strides = list(sizes) — This creates a new list to hold stride values. Every element in strides is overwritten by the loop below (it never reads the initial values from sizes through strides), so a shallow copy is strictly more than needed, but correct and idiomatic.

backends/transforms/postpone_permute_below_squeeze_view.py (lines 111, 135): new_view_shape = list(pred_shape) — Here pred_shape is already a shallow list copy created at line 92 via [*cast(list, get_shape(...))]. The list() call creates another copy because new_view_shape gets mutated (.insert() / del) and pred_shape is needed intact for the other branches. The elements themselves (ints or SymInts) are never mutated — they're only read or replaced positionally — so shallow copy is semantically equivalent to the original deepcopy while being SymInt-safe.

Minor Notes

  1. The import copy removal is clean — no other usages remain in either file.
  2. The CI failure (Apple / build-demo-ios) appears unrelated to this change (it's a build script exit code, not a Python/shape issue).
  3. Both tests are well-constructed regression tests that would have caught the original bug — they exercise torch.SymInt values flowing through the affected code paths.

Nit (non-blocking)

In test_strides_from_dim_order_with_symbolic_sizes (line 409), the expected strides [12, 4, 1] are correct because dims 1 and 2 are concrete (3, 4) even though dim 0 is symbolic. A brief comment explaining why the strides are concrete despite symbolic input could aid future readers, though this is optional per the repo's minimal-comment style.

LGTM — no blocking issues found.


@oscarandersson8218

Copy link
Copy Markdown
Collaborator Author

Hi @digantdesai @Gasoonjia , this has been stale for 3 weeks. Any progress on this?

@zingo zingo left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure other backends test still work.

@oscarandersson8218

Copy link
Copy Markdown
Collaborator Author

Make sure other backends test still work.

Done. All failures appear unrelated.

@oscarandersson8218 oscarandersson8218 merged commit 47f046d into pytorch:main Jun 22, 2026
484 of 492 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. help wanted Extra attention is needed partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm release notes: none Do not include this in the release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants