Quantize moveaxis/movedim so they delegate to Ethos-U (#20314)
apullin wants to merge 1 commit into
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20314
Note: Links to docs will display an error until the docs builds have been completed.
❌ 4 New Failures, 6 Unrelated Failures, 1 Unclassified Failure as of commit 2658037 with merge base 5241b4e.
UNCLASSIFIED FAILURE: DrCI could not classify one job because the workflow did not run on the merge base. The failure may be pre-existing on trunk or introduced by this PR.
FLAKY: one job failed, likely due to flakiness present on trunk.
BROKEN TRUNK: some jobs failed, but the failures were also present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@apullin has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108478011.
2a0bc8c to a71d81b
Summary:
The ARM PT2 quantizer's pass-through shared-qspec set in quantization_annotator.py (_one_to_one_shared_input_qspec) covers permute/permute_copy/transpose/view/squeeze etc., but omits aten.moveaxis/aten.movedim. A model that uses torch.moveaxis therefore leaves those ops unquantized: the quantizer brackets each one with dequantize -> moveaxis(float) -> quantize.
On lowering, moveaxis decomposes to a float permute_copy. The Ethos-U55 operator-support check (operator_support/ethos_u55_support.py) only delegates permute_copy for int8/int16/int32, so it rejects the float one. Each rejected permute is stranded on the host, splitting the model into many delegated partitions (one NPU island per permute), which bloats the .pte with per-partition delegate overhead and host round-trips.
Add aten.moveaxis.int / aten.movedim.int to _one_to_one_shared_input_qspec (guarded with getattr for torch-build variance, mirroring the existing transpose.Dimname handling) so they share the input quantization spec exactly like transpose/permute. They then stay int8, decompose to int8 permute_copy, and delegate to the NPU -- eliminating the host float islands.
Impact: a quantized example ensemble (ConvNeXt-style blocks that use torch.moveaxis) that previously lowered into 9 Ethos-U55 partitions now lowers into a single delegate, with zero host permutes and ~24% smaller .pte, with no model changes. Generalizes to any moveaxis/movedim-using model on the Ethos-U backend.
Differential Revision: D108478011
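The partition-splitting effect described in the summary can be illustrated with a toy model. This is only a sketch under stated assumptions: `Node`, `is_delegated`, `count_delegated_partitions`, and `build_chain` are hypothetical names invented here, the graph is a simple linear chain, and the support check is a stand-in that mirrors only the dtype rule quoted in the summary, not the real logic in operator_support/ethos_u55_support.py.

```python
from dataclasses import dataclass

# Integer dtypes for which the summary says permute_copy is delegated.
DELEGATED_PERMUTE_DTYPES = {"int8", "int16", "int32"}


@dataclass(frozen=True)
class Node:
    """Hypothetical graph node: an op name plus the dtype it runs in."""
    op: str
    dtype: str


def is_delegated(node: Node) -> bool:
    """Toy support check: only permute_copy is dtype-restricted here."""
    if node.op == "permute_copy":
        return node.dtype in DELEGATED_PERMUTE_DTYPES
    return True  # assumption: every other op in this toy chain is supported


def count_delegated_partitions(chain: list[Node]) -> int:
    """Count maximal runs of consecutive delegated nodes in a linear chain."""
    partitions = 0
    in_partition = False
    for node in chain:
        delegated = is_delegated(node)
        if delegated and not in_partition:
            partitions += 1  # a new delegated island starts here
        in_partition = delegated
    return partitions


def build_chain(permute_dtype: str) -> list[Node]:
    """Three int8 conv blocks separated by two permutes of the given dtype."""
    conv = Node("conv2d", "int8")
    permute = Node("permute_copy", permute_dtype)
    return [conv, permute, conv, permute, conv]


before = build_chain("float32")  # moveaxis left unquantized
after = build_chain("int8")      # moveaxis shares its input qspec

print(count_delegated_partitions(before))  # 3
print(count_delegated_partitions(after))   # 1
```

In this toy chain each float permute that fails the check ends one delegated run and starts another, so N stranded permutes between supported ops yield N + 1 partitions. Keeping the permutes in an integer dtype leaves a single run.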
JakeStevens left a comment
Please add a test for the new annotation.
It would be nice to show that this results in a lowerable chain for a simple model. I believe there are model-level tests in the backend as well.
a71d81b to c822247
Added a model level test in
assert expected_ops <= quantization_annotator._one_to_one_shared_input_qspec

@common.XfailIfNoCorstone300
Do we need this? We are explicitly saying run_on_fvp=False, so why fail if there is no Corstone?
torch.ops.aten.movedim.intlist,
}

assert expected_ops <= quantization_annotator._one_to_one_shared_input_qspec
nit: logically this should go somewhere like:
arm/test/quantizer/test_generic_annotater.py
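For readers unfamiliar with the two idioms in this thread, the getattr guard mentioned in the summary and the `<=` subset assertion quoted above can be sketched without torch. Everything here is a stand-in: `fake_aten` is a hypothetical substitute for torch.ops.aten, `optional_overload` is a helper invented for this sketch, and the string values only imitate op overload objects. It is not the code in quantization_annotator.py.

```python
from types import SimpleNamespace

# Hypothetical stand-in for torch.ops.aten: each op "packet" exposes its
# overloads as attributes. This fake build has moveaxis.int but no movedim.
fake_aten = SimpleNamespace(
    permute=SimpleNamespace(default="aten.permute.default"),
    moveaxis=SimpleNamespace(int="aten.moveaxis.int"),
)


def optional_overload(namespace, packet_name, overload_name):
    """Return namespace.<packet>.<overload>, or None if either part is missing."""
    packet = getattr(namespace, packet_name, None)
    # getattr on None with a default simply returns the default.
    return getattr(packet, overload_name, None)


# Start from ops that are always present, then add the optional ones.
shared_input_qspec_ops = {fake_aten.permute.default}
for packet_name in ("moveaxis", "movedim"):
    overload = optional_overload(fake_aten, packet_name, "int")
    if overload is not None:
        shared_input_qspec_ops.add(overload)

# On sets, `<=` is the subset test: every expected op must be in the set.
expected_ops = {"aten.moveaxis.int"}
assert expected_ops <= shared_input_qspec_ops
```

The guard lets the set be built on a build that lacks one of the overloads, while the subset assertion fails only when an expected op is absent, regardless of what else the set contains.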
c822247 to 2658037
Let's see what CI looks like once the branch is no longer out of date before landing. The TOSA errors are giving me pause, so make sure they are resolved.