Fix unsafe copies in EXIR and permute transforms #19873
Replace deepcopy-based copies of shape lists with shallow container copies in paths that can carry torch.SymInt values from fake tensor metadata. This is safe because torch.Size is immutable: a shallow list copy lets us insert or delete elements in the new list without affecting the original shape.

This fixes symbolic-shape failures in exir.tensor.stride_from_dim_order() and in PostponePermuteOpBelowSqueezeOrUnsqueezeLikeView, where deepcopy() could crash while handling SymInt-backed shapes. The error that was produced:

"RuntimeError: Cannot access data pointer of Tensor (e.g. FakeTensor, FunctionalTensor). If you're using torch.compile/export/fx, it is likely that we are erroneously tracing into a custom kernel. To fix this, please wrap the custom kernel into an opaque custom op. Please see the following for details: https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html"

Add focused regression coverage for both cases: one test for stride computation with symbolic sizes, and one dynamic-shape edge-graph test for the permute/view transform pass.

Signed-off-by: Oscar Andersson <oscar.andersson@arm.com>
Change-Id: I4eb2cae7b05f988edee570decb1307a671af4c2a
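For illustration, the difference between the two copy strategies can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the code from this PR: `SymStandIn` is a hypothetical stand-in for a symbolic size (it is not torch.SymInt and only simulates an element that refuses to be deep-copied), a tuple stands in for the immutable torch.Size, and `insert_unit_dim` is a made-up helper name.

```python
import copy


class SymStandIn:
    """Hypothetical stand-in for a symbolic size; this is NOT torch.SymInt."""

    def __init__(self, name):
        self.name = name

    def __deepcopy__(self, memo):
        # Assumption for the sketch: the element cannot be deep-copied.
        raise RuntimeError(f"cannot deepcopy {self.name}")


def insert_unit_dim(shape, index):
    """Return a new list with a size-1 dim inserted; `shape` is left untouched."""
    new_shape = list(shape)  # shallow copy: new container, shared elements
    new_shape.insert(index, 1)
    return new_shape


# A tuple mimics an immutable shape container.
shape = (SymStandIn("s0"), 3, 4)

# deepcopy recurses into every element and trips over the symbolic stand-in.
try:
    copy.deepcopy(shape)
    deepcopy_failed = False
except RuntimeError:
    deepcopy_failed = True

# The shallow copy never touches the elements, so it succeeds.
result = insert_unit_dim(shape, 1)
```

The shallow copy shares the element objects with the original shape but owns its own container, so inserting or deleting entries in the copy leaves the original unchanged.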
Helpful Links
See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19873
Note: Links to docs will display an error until the docs builds have been completed.
1 Active SEV: There is 1 currently active SEV. If your PR is affected, please view it below.
1 New Failure, 1 Cancelled Job, 3 Unrelated Failures, 2 Unclassified Failures
As of commit 8807bdc with merge base 1a78804:
NEW FAILURE - The following job has failed:
UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:
CANCELLED JOB - The following job was cancelled. Please retry:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
BROKEN TRUNK - The following job failed but was also failing on the merge base:
Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@Gasoonjia can you review this pls
@claude review this PR.
Claude finished @digantdesai's task in 2m 22s
Review of PR #19873: Fix unsafe copies in EXIR and permute transforms
Overall: This PR looks good. The fix is correct, minimal, and well-tested.
Summary
The PR replaces
Correctness Analysis
Minor Notes
Nit (non-blocking)
In
LGTM: no blocking issues found.
Hi @digantdesai @Gasoonjia, this has been stale for 3 weeks. Any progress on this?
zingo left a comment
Make sure the other backends' tests still work.
Done. All failures appear unrelated.
cc @digantdesai @freddan80 @per @zingo @mansnils @Sebastian-Larsson @robell @rascani