
[XNNPACK Quantizer] Select between TConvs and Convs#11863

Closed
GregoryComer wants to merge 80 commits into gh/mcr229/32/orig from gh/mcr229/33/orig

Conversation

@GregoryComer

Member

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #11732 by @mcr229
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/mcr229/33/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/mcr229/33/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/mcr229/32/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/mcr229/33/orig
@diff-train-skip-merge

leafs1 and others added 30 commits June 16, 2025 14:13
…uple outputs (#11647)

### Summary
This PR fixes the `channels_last_tagged_reshape_pass.py` to properly
handle tuple outputs with mixed memory formats. Previously, the pass
only checked and converted the first element of tuple outputs, which
could lead to incorrect memory formats for other elements in the tuple.
This fix is important for models that return multiple outputs with
different memory format requirements, such as a mix of convolution
outputs (which should be in NHWC format) and linear outputs (which
should be in standard format).
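
The difference between checking only the first tuple element and checking every element can be sketched in plain Python. This is only an illustration, not the code of the pass: the helper names `mismatched_outputs` and `mismatched_first_only` are hypothetical, and dim orders are modeled as plain tuples rather than real tensors.

```python
# Dim orders modeled as tuples (assumption: 4-D tensors only).
NCHW = (0, 1, 2, 3)  # contiguous dim order
NHWC = (0, 2, 3, 1)  # channels-last dim order


def mismatched_outputs(actual, expected):
    """Return the index of every output whose dim order differs from the expected one."""
    return [
        i
        for i, (have, want) in enumerate(zip(actual, expected))
        if tuple(have) != tuple(want)
    ]


def mismatched_first_only(actual, expected):
    """Model of the old behavior: only the first tuple element is inspected."""
    return [0] if tuple(actual[0]) != tuple(expected[0]) else []


# Three outputs; only the last one is in the wrong format.
actual = [NHWC, NCHW, NHWC]
expected = [NHWC, NCHW, NCHW]
print(mismatched_outputs(actual, expected))     # every element is checked
print(mismatched_first_only(actual, expected))  # the third output is missed
```

With a first-element-only check, the third output would be returned in the wrong memory format without any conversion being inserted.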

### Test plan
I added a new test class `ThreeOutputsModel` that has three outputs with
different memory format requirements, and verified that it evaluates
correctly for both NCHW and NHWC inputs. I also added a simpler
two-input class `ConvAddConvOutput`, which operates on different inputs
and returns two outputs with different dim orders.
Differential Revision: D76737404

Pull Request resolved: #11727
Differential Revision: D76469624

Pull Request resolved: #11577
### Summary
Fixed linter error.

### Test plan
CI

Co-authored-by: Guang Yang <guangyang@fb.com>
#11745)

### Summary
Running `install_dev.py` for `optimum-executorch` will force overriding
installed `executorch` and torch deps to the pinned nightly in
`optimum-executorch`. In ExecuTorch CI including the benchmark, we would
want to always run the optimum-executorch models with ExecuTorch from
source to catch issues/regressions.

### Test plan
Verified the installed deps in the CI and benchmark jobs

Co-authored-by: Guang Yang <guangyang@fb.com>
### Summary
1. Update MediaTek backend documents for the decoupled buffer allocator.
2. Follow backend template.
3. Remove unnecessary instructions.

Fixes #8532 

@pytorchbot label "partner: mediatek"
Differential Revision: D76745314

Pull Request resolved: #11739
As titled, this API allows us to support multi-turn conversation by
passing in a `start_pos` argument to `generate_from_pos`.

This pull request introduces a new feature to support text generation
from a specific starting position (`generate_from_pos`) and includes
updates to ensure proper error handling and functionality when
`max_new_tokens` is negative. The changes primarily focus on extending
the `TextLLMRunner` class and its associated methods to accommodate this
new feature while maintaining backward compatibility.

### New Feature: Text Generation from a Specific Starting Position

* **Added `generate_from_pos` Method**: Introduced a new method
`generate_from_pos` in `TextLLMRunner` to allow text generation starting
from a specified position in the KV cache. This includes updates to the
method signature, logic, and error handling.
(`extension/llm/runner/text_llm_runner.cpp`
[[1]](diffhunk://#diff-9b3bd38c0b1ad81b18afab15784634e2b394fda448f5e2dae03de58870751440L76-R78)
[[2]](diffhunk://#diff-9b3bd38c0b1ad81b18afab15784634e2b394fda448f5e2dae03de58870751440R129-R156)
[[3]](diffhunk://#diff-9b3bd38c0b1ad81b18afab15784634e2b394fda448f5e2dae03de58870751440L150-R165)
[[4]](diffhunk://#diff-9b3bd38c0b1ad81b18afab15784634e2b394fda448f5e2dae03de58870751440R219-R225);
`extension/llm/runner/text_llm_runner.h`
[[5]](diffhunk://#diff-d1aa44a87ea9b7ec51250c2002466cb9bd57db153c1c8b58ffdf73e8f231a89bR98-R122)

* **Updated Documentation**: Enhanced method documentation in
`TextLLMRunner` to describe the new functionality, including parameters
like `start_pos` and the expected behavior.
(`extension/llm/runner/text_llm_runner.h`
[[1]](diffhunk://#diff-d1aa44a87ea9b7ec51250c2002466cb9bd57db153c1c8b58ffdf73e8f231a89bL81-R83)
[[2]](diffhunk://#diff-d1aa44a87ea9b7ec51250c2002466cb9bd57db153c1c8b58ffdf73e8f231a89bR98-R122)

### Error Handling Improvements

* **Validation for `max_new_tokens`**: Added checks to ensure
`max_new_tokens` is positive. If it is not, an `InvalidArgument` error
is returned. This prevents invalid configurations during text
generation. (`extension/llm/runner/text_llm_runner.cpp`
[extension/llm/runner/text_llm_runner.cppR129-R156](diffhunk://#diff-9b3bd38c0b1ad81b18afab15784634e2b394fda448f5e2dae03de58870751440R129-R156))

* **Unit Test for Negative `max_new_tokens`**: Created a new test case
(`GenerateFromPosErrorsWithNegativeMaxNewTokens`) to verify that the
`generate_from_pos` method correctly handles scenarios where
`max_new_tokens` is negative.
(`extension/llm/runner/test/test_text_llm_runner.cpp`
[extension/llm/runner/test/test_text_llm_runner.cppR325-R379](diffhunk://#diff-0a1e69b4182878ccad887c4f4ba3929ef24082a26623e26a871d73f4e6cea503R325-R379))
…1724)

Arm backend: Added decomposition for MaxPool2D operator with dilation >
0

Signed-off-by: Elena Zhelezina <elena.zhelezina@arm.com>
- Adds support for per-channel quantization in TosaQuantizer and
TosaBackend
- Enables per-channel quantization for MobileNetV2 test cases


cc @digantdesai @freddan80 @per @zingo

---------

Signed-off-by: Oscar Andersson <oscar.andersson@arm.com>
The introduction of the decomposition for linalg vector norm revealed a
bug: when dim is None, all dimensions should be reduced.
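
The expected semantics can be sketched in plain Python. This is a reference model under stated assumptions, not the Arm backend decomposition: the function `vector_norm` below is a hypothetical helper that works on a 2-D nested list and only handles `dim` values of `None`, `0`, and `1`.

```python
import math


def vector_norm(x, p=2.0, dim=None):
    """p-norm of a 2-D nested list; dim=None reduces over all dimensions."""

    def norm(values):
        return math.fsum(abs(v) ** p for v in values) ** (1.0 / p)

    if dim is None:
        # Flatten first, so every dimension takes part in the reduction.
        return norm([v for row in x for v in row])
    if dim == 0:
        return [norm(col) for col in zip(*x)]
    if dim == 1:
        return [norm(row) for row in x]
    raise ValueError(f"unsupported dim for a 2-D input: {dim}")


x = [[3.0, 0.0], [0.0, 4.0]]
print(vector_norm(x))         # single scalar over all elements
print(vector_norm(x, dim=1))  # one value per row
```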

Signed-off-by: Elena Zhelezina <elena.zhelezina@arm.com>
Differential Revision: D76746854

Pull Request resolved: #11751
Differential Revision: D76791781

Pull Request resolved: #11750
### Summary
This PR uses `xnn_define_binary` and `xnn_define_unary` to define
XNNPack ops, instead of separately calling the individual definitions.

Further changes:
1. Removes individual node definitions for unary and binary ops
2. Creates a wrapper macro to generate function defs for individual ops
using `xnn_define_binary` and `xnn_define_unary` inside.

Fixes #11584

### Test plan
```
## Build steps
cmake -DEXECUTORCH_BUILD_XNNPACK=ON ..
cmake --build cmake-out -j9

Tests ran:
./test/run_oss_cpp_tests.sh
.
.
.
100% tests passed, 0 tests failed out of 86
```
…1546)

### Summary
This PR consists of 4 Encoder-Only models.
Following stats are based on SM8750.
1. Albert (16a16w)
- Accuracy: ~22% (NOTE: nn.Module accuracy is around 24%, so the
similarity between QNN and nn.Module is around 92%)
- Speed: 11ms/inf
- Script: `python examples/qualcomm/oss_scripts/albert.py -b
build-android -s $DEVICE -m SM8750 --dataset
../wikipedia-sentences/wikisent2.txt`
2. Bert (16a8w)
-  Accuracy: ~60%
- Speed: 9ms/inf
- Script: `python examples/qualcomm/oss_scripts/bert.py -b build-android
-s $DEVICE -m SM8750 --dataset ../wikipedia-sentences/wikisent2.txt`
3. Distilbert (16a8w)
-  Accuracy: ~59%
- Speed: 8ms/inf
- Script: `python examples/qualcomm/oss_scripts/distilbert.py -b
build-android -s $DEVICE -m SM8750 --dataset
../wikipedia-sentences/wikisent2.txt`
4. Eurobert (16a16w)
-  Accuracy: ~54%
- Speed: 40ms/inf
- Script: `python examples/qualcomm/oss_scripts/eurobert.py -b
build-android -s $DEVICE -m SM8750 --dataset
../wikipedia-sentences/wikisent2.txt`



### Test plan

- E2E Scripts under `test_qnn_delegate.py`
- Example script: `python backends/qualcomm/tests/test_qnn_delegate.py
-k TestExampleOssScript.test_{BERT_MODEL} --model SM8750 -s $DEVICE
--build_folder build-android/ -r ./ -a ./test --sentence_dataset
../wikipedia-sentences/wikisent2.txt`
- Mainline CI

Author: @haowhsu-quic, @chunit-quic, @winskuo-quic
)

### Summary
- delete convert_bmm_to_matmul pass
- add torch.ops.aten.matmul.default in skip_decomp_table

### Test plan
General CI
Differential Revision: D76781331

Pull Request resolved: #11759
#11596)

### Summary
Refactor the XNNPACK tester to split out reusable base components from
XNNPACK-specific parts. I've relocated the base classes to
backends/test/harness.

I've kept the tester structure pretty much unchanged, except for
replacing stage names with an enum.
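
As a rough illustration of why an enum helps here, the sketch below keys stage transitions on enum members instead of strings. The class name `StageType`, its members, and the `VALID_NEXT` table are assumptions made for this example and are not taken from the refactored harness.

```python
from enum import Enum


class StageType(Enum):
    # Hypothetical stage names, for illustration only.
    QUANTIZE = "quantize"
    EXPORT = "export"
    TO_EDGE = "to_edge"
    PARTITION = "partition"


# Allowed transitions keyed by enum member rather than by free-form string.
VALID_NEXT = {
    StageType.QUANTIZE: [StageType.EXPORT],
    StageType.EXPORT: [StageType.TO_EDGE],
    StageType.TO_EDGE: [StageType.PARTITION],
    StageType.PARTITION: [],
}


def can_follow(current, nxt):
    """Return True if stage `nxt` may run directly after stage `current`."""
    return nxt in VALID_NEXT[current]


print(can_follow(StageType.EXPORT, StageType.TO_EDGE))
```

A misspelled stage name now fails at lookup time (`StageType("exprot")` raises `ValueError`) instead of silently missing a dictionary key.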

It looks like Arm tests currently import directly from XNNPACK's tester.
Ideally, we'll want to refactor them to have their own stage
implementations, but I've left that as a follow-up to minimize changes
for the initial refactor.

### Test plan
CI
… fbsource sleef (#11261)" (#11765)

This PR was created by the merge bot to help merge the original PR into
the main branch.
ghstack PR number: #11657 by
@swolchok
^ Please use this as the source of truth for the PR details, comments,
and reviews
ghstack PR base:
https://github.com/pytorch/executorch/tree/gh/swolchok/458/base
ghstack PR head:
https://github.com/pytorch/executorch/tree/gh/swolchok/458/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head:
https://github.com/pytorch/executorch/tree/gh/swolchok/458/orig
@diff-train-skip-merge

Co-authored-by: Scott Wolchok <swolchok@meta.com>
This PR was created by the merge bot to help merge the original PR into
the main branch.
ghstack PR number: #11369 by
@ahmtox
^ Please use this as the source of truth for the PR details, comments,
and reviews
ghstack PR base:
https://github.com/pytorch/executorch/tree/gh/ahmtox/11/base
ghstack PR head:
https://github.com/pytorch/executorch/tree/gh/ahmtox/11/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head:
https://github.com/pytorch/executorch/tree/gh/ahmtox/11/orig
@diff-train-skip-merge

Co-authored-by: morelos <morelos@devvm4573.ash0.facebook.com>
Creating the dequantize_per_tensor and dequantize_per_token logic shaders and impl which are linked with the testing framework.

Differential Revision: [D76267107](https://our.internmc.facebook.com/intern/diff/D76267107/)

[ghstack-poisoned]
Creating the choose_qparams per_tensor and per_token logic shaders and impl which are linked with the testing framework

Differential Revision: [D76436933](https://our.internmc.facebook.com/intern/diff/D76436933/)

[ghstack-poisoned]
Differential Revision: D76842266

Pull Request resolved: #11764
Differential Revision: D76483572

Pull Request resolved: #11592
…hapes

Differential Revision: D76530379

Pull Request resolved: #11611
…11778)

This PR was created by the merge bot to help merge the original PR into
the main branch.
ghstack PR number: #11757 by
@cccclai
^ Please use this as the source of truth for the PR details, comments,
and reviews
ghstack PR base:
https://github.com/pytorch/executorch/tree/gh/cccclai/28/base
ghstack PR head:
https://github.com/pytorch/executorch/tree/gh/cccclai/28/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head:
https://github.com/pytorch/executorch/tree/gh/cccclai/28/orig
@diff-train-skip-merge

Co-authored-by: Chen Lai <chenlai@fb.com>
Differential Revision: D76781745

Pull Request resolved: #11746
)

- Constant placeholders with same values but different data types, such
as int32 and fp32, shouldn't be fused into a single placeholder.
Otherwise, some operators will have operands with mismatched dtypes.
- Fix the bug by adding a dtype check to fuse only constants with
matching types and same values.
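
A minimal sketch of the dtype-aware grouping, assuming constants are described as a mapping from name to a `(dtype, values)` pair. The function `group_fusable_constants` and this data layout are hypothetical and do not mirror the Arm backend pass.

```python
from collections import defaultdict


def group_fusable_constants(constants):
    """Group constant names that may share one placeholder.

    `constants` maps a name to (dtype, values). Two constants are fusable
    only if both the dtype and the values match.
    """
    groups = defaultdict(list)
    for name, (dtype, values) in constants.items():
        # Including the dtype in the key is the fix: in Python, (1, 2) and
        # (1.0, 2.0) compare equal, so a values-only key would merge them.
        groups[(dtype, tuple(values))].append(name)
    return [names for names in groups.values() if len(names) > 1]


constants = {
    "a": ("int32", (1, 2)),
    "b": ("float32", (1.0, 2.0)),
    "c": ("int32", (1, 2)),
}
print(group_fusable_constants(constants))
```

Here `a` and `c` are fused, while `b` keeps its own placeholder even though its values compare equal.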

Signed-off-by: Yufeng Shi <yufeng.shi@arm.com>
pytorchbot and others added 3 commits June 23, 2025 14:50
…ups==1 (#11774)

This PR was created by the merge bot to help merge the original PR into
the main branch.
ghstack PR number: #11730 by
@mcr229
^ Please use this as the source of truth for the PR details, comments,
and reviews
ghstack PR base:
https://github.com/pytorch/executorch/tree/gh/mcr229/31/base
ghstack PR head:
https://github.com/pytorch/executorch/tree/gh/mcr229/31/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head:
https://github.com/pytorch/executorch/tree/gh/mcr229/31/orig
@diff-train-skip-merge

---------

Co-authored-by: Max Ren <maxren@meta.com>
Co-authored-by: Gregory Comer <gjcomer@meta.com>
Fixes some bugs with how enum fields are used.
Update documentation to use the new `export_llm` instead of the old
`export_llama`.
@pytorch-bot

pytorch-bot Bot commented Jun 23, 2025


🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11863

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b7572d0 with merge base 0c12dcd (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Jun 23, 2025
leafs1 and others added 4 commits June 23, 2025 16:36
### Summary
This PR adds support for the tanh operator in ExecuTorch via XNNPACK,
enabling optimized execution of torch.tanh on the XNNPACK backend. The
implementation includes updates to operator configuration,
serialization, and runtime handling. The tanh operator is now properly
registered in the XNNPACK partition config and mapped to XNNPACK's
xnn_create_tanh_operator API in the compiler.

### Test plan
I added a new test class TestTanh that is a simple torch model with a
tanh op. It then asserts that the XNNPACK delegate was called while
executing the tanh op instead of the torch default tanh op.
…ups==1

Pull Request resolved: #11730

Supporting Quantized Transposed Convs with Groups being 1.

Previously, some support was added for Quantized Transposed Convolutions, but only when the channel axis is 1 and groups is 1. The current Quantizer didn't support this because it only allows quantizing along dim 0, which is generally the output channels. However, for TransposedConvs, the dimensions of the weights are:
```
[in_channels, out_channels/groups, h, w]
```

Since we want to keep quantization along the output channels, we now need to quantize along axis = 1.

The reason we require groups to be one is because XNNPACK takes in filters of the dimension:
```
[out_channels, H, W, in_channels/groups]
```

Since we are quantizing along the output channels, in PyTorch we expect to have out_channels/groups scales, but in XNNPACK we have out_channels scales. Realistically, we would need to support this with some affine quantization, where we provide a scale for every group and every out_channel. For now, however, we just enforce the constraint that groups == 1.
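
The axis choice and the groups constraint described above can be sketched as follows. This is an illustration under assumptions, not the quantizer's code: `per_channel_quant_params` is a hypothetical helper, and it relies on the regular conv weight layout `[out_channels, in_channels/groups, h, w]` used by PyTorch.

```python
def per_channel_quant_params(weight_shape, is_transposed, groups=1):
    """Return (quantization axis, number of scales) for a conv weight.

    Regular conv weights:    [out_channels, in_channels/groups, h, w] -> axis 0
    Transposed conv weights: [in_channels, out_channels/groups, h, w] -> axis 1
    """
    if is_transposed:
        if groups != 1:
            # With groups > 1, axis 1 holds only out_channels/groups entries,
            # which does not match the out_channels scales XNNPACK expects.
            raise ValueError("quantized transposed conv requires groups == 1")
        axis = 1
    else:
        axis = 0
    return axis, weight_shape[axis]


# 8 input channels, 16 output channels, 3x3 kernel.
print(per_channel_quant_params((16, 8, 3, 3), is_transposed=False))
print(per_channel_quant_params((8, 16, 3, 3), is_transposed=True))
```

In both cases the number of scales equals the number of output channels, which is what keeps the two layouts compatible when groups == 1.
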
ghstack-source-id: 291033630
@exported-using-ghexport

Differential Revision: [D76631781](https://our.internmc.facebook.com/intern/diff/D76631781/)
…groups ==1

Pull Request resolved: #11731

Here we support dynamically quantized Deconvolutions.

There is some refactoring of the previous diff, but in general, we just remove the constraint in the Dynamism check that the convolution isn't transposed. For the same reasons as before, this only supports channel_axis = 1 and groups = 1.
ghstack-source-id: 291033632
@exported-using-ghexport

Differential Revision: [D76638904](https://our.internmc.facebook.com/intern/diff/D76638904/)
Pull Request resolved: #11732

Allow selecting between transposed convs and regular convs. Previously, we grouped all conv targets together (transposed and regular convs), but now we enable finer per-operator selection.
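
The idea of splitting one combined conv group into two separately selectable groups can be sketched as below. Everything here is hypothetical: the set names, the `select_conv_nodes` helper, and the use of plain strings as operator targets are assumptions for illustration and do not reflect the XNNPACK quantizer's actual API.

```python
# Hypothetical target groups; previously these would have been one combined set.
REGULAR_CONVS = frozenset({"conv1d", "conv2d"})
TRANSPOSED_CONVS = frozenset({"conv_transpose2d"})


def select_conv_nodes(node_targets, include_regular=True, include_transposed=True):
    """Keep only the conv targets from the groups the caller opted into."""
    wanted = set()
    if include_regular:
        wanted |= REGULAR_CONVS
    if include_transposed:
        wanted |= TRANSPOSED_CONVS
    return [target for target in node_targets if target in wanted]


targets = ["conv2d", "relu", "conv_transpose2d", "conv1d"]
print(select_conv_nodes(targets, include_transposed=False))
print(select_conv_nodes(targets, include_regular=False))
```
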
ghstack-source-id: 291033631

Differential Revision: [D76641838](https://our.internmc.facebook.com/intern/diff/D76641838/)
@github-actions

github-actions Bot commented Sep 2, 2025


Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions added the Stale label Sep 2, 2025
@github-actions

github-actions Bot commented Nov 1, 2025


Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions

github-actions Bot commented Jan 1, 2026


Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions

github-actions Bot commented Mar 2, 2026


Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions Bot closed this Apr 28, 2026

Labels

CLA Signed, Stale
