[ET-VK] De vectorise conv2d pw shader to improve perf. #11108

Merged

facebook-github-bot merged 7 commits into gh/trivedivivek/89/base from gh/trivedivivek/89/head on May 28, 2025

trviv commented May 23, 2025

Contributor

Stack from ghstack (oldest at bottom):

This diff optimizes the performance of the `conv2d_pw` shader by de-vectorizing its implementation.

*   The original vectorized implementation of the `conv2d_pw` shader has been replaced with a de-vectorized one.
*   The `sum` array has been redefined to hold `float` values instead of `vec4` to accommodate the de-vectorized computation.

These changes seem to allow the shader compiler to better optimize operations within the shader, hence improving performance.
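As a rough illustration of the change described above, the sketch below accumulates one output texel (4 output channels at one pixel) of a pointwise convolution in two ways: once into a 4-wide value standing in for a `vec4`, and once into an array of plain floats. This is not the shader code, which is GLSL and is not shown in this thread. The function names, the argument layout, and the packing of 4 channels per texel are assumptions made for illustration only.

```python
# Hypothetical sketch, not the actual conv2d_pw shader.
# in_texels[k]        : 4 consecutive input channels at one pixel
# weight_texels[k][j] : 4-wide weight column applied to component j of in_texels[k]


def pw_vectorized(in_texels, weight_texels):
    """Accumulate into one 4-wide value, mimicking a vec4 accumulator."""
    acc = [0.0, 0.0, 0.0, 0.0]
    for k, texel in enumerate(in_texels):
        for j in range(4):
            col = weight_texels[k][j]
            # vec4-style update: acc += col * texel[j]
            acc = [a + c * texel[j] for a, c in zip(acc, col)]
    return acc


def pw_devectorized(in_texels, weight_texels):
    """Accumulate into an array of scalar floats, one explicit update each."""
    sum_f = [0.0] * 4  # float accumulators instead of a single vec4
    for k, texel in enumerate(in_texels):
        for j in range(4):
            col = weight_texels[k][j]
            x = texel[j]
            sum_f[0] += col[0] * x
            sum_f[1] += col[1] * x
            sum_f[2] += col[2] * x
            sum_f[3] += col[3] * x
    return sum_f
```

Both functions perform the same multiply-adds in the same order, so they return the same values. The sketch only shows that the two formulations are equivalent. It says nothing about speed, since the reported gain comes from how the shader compiler treats the scalar form.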

Differential Revision: D75307267


[ET-VK] De vectorise conv2d pw shader to improve perf.

f2fdfb0

Differential Revision: [D75307267](https://our.internmc.facebook.com/intern/diff/D75307267/)

[ghstack-poisoned]

trviv requested a review from SS-JIA as a code owner

May 23, 2025 20:07

trviv added a commit that referenced this pull request


[ET-VK] De vectorise conv2d pw shader to improve perf.

6561d33

ghstack-source-id: 285942441
Pull Request resolved: #11108

pytorch-bot Bot commented May 23, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11108

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit a57f7bf with merge base 380eb5f:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

*   pull / unittest / linux / linux-job (gh) (trunk failure): `examples/models/llama/tests/test_export_llama_lib.py::ExportLlamaLibTest::test_has_expected_ops_and_op_counts`
*   pull / unittest-editable / linux / linux-job (gh) (trunk failure): `examples/models/llama/tests/test_export_llama_lib.py::ExportLlamaLibTest::test_has_expected_ops_and_op_counts`

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label

facebook-github-bot commented May 23, 2025

Contributor

This pull request was exported from Phabricator. Differential Revision: D75307267

facebook-github-bot added the fb-exported label

This was referenced May 23, 2025

[ET-VK] Remove the use of shared memory in conv2d pw to improve perf. #11110

Merged

[ET-VK] Tuning conv 2d pw op tile size to improve perf. #11112

Merged

[ET-VK] Minor tuning for conv2d pw op to improve performance. #11113

Merged

trviv added the topic: not user facing label

trviv mentioned this pull request

[ET-VK] De vectorise positions in conv2d pw shader to improve perf. #11122

Merged


Update on "[ET-VK] De vectorise conv2d pw shader to improve perf."

396f82d

This was referenced May 27, 2025

[ET-VK] Minor unroll tuning to improve conv2d pw perf. #11134

Merged

[ET-VK] Tuning local workgroup size calculation for conv2d pw to improve performance. #11135

Merged

[ET-VK] De vectorise all vectors in conv2d pw shader to improve perf. #11136

Merged

[ET-VK] Creating specialized version of conv2d pw shader for X and Y stride = 1 and padding = 0. #11137

Merged

[ET-VK] Storing positions in uint16 to instead of int in conv2d pw shader. #11138

Merged

[ET-VK] Reducing precision of some in members in conv2d pw to improved performance. #11139

Merged


Update on "[ET-VK] De vectorise conv2d pw shader to improve perf."

a12f1f4

trviv mentioned this pull request

[ET-VK] Applying bias after sum calculation in conv2d pw shader to improve performance. #11150

Merged

SS-JIA approved these changes


Update on "[ET-VK] De vectorise conv2d pw shader to improve perf."

bef4195

trviv added the release notes: none label


Update on "[ET-VK] De vectorise conv2d pw shader to improve perf."

0bff100

trviv mentioned this pull request

[ET-VK] Modifying should_squeeze function in SqueezeUnsqueezeInputs to not squeeze if significant axis are all 1 and trailing axis are all > 1. #11177

Merged

trviv mentioned this pull request

[ET-VK] Removed shared memory usage and simplied conv2d dw op shader to improve performance. #11178

Merged


Update on "[ET-VK] De vectorise conv2d pw shader to improve perf."

0ce79f3


Update on "[ET-VK] De vectorise conv2d pw shader to improve perf."

a57f7bf

facebook-github-bot merged commit cecc15d into gh/trivedivivek/89/base

96 of 98 checks passed

facebook-github-bot deleted the gh/trivedivivek/89/head branch

May 28, 2025 15:52

facebook-github-bot temporarily deployed to cherry-pick-bot with GitHub Actions (Inactive) on May 28, 2025 15:52

pytorchbot mentioned this pull request

[ET-VK] De vectorise conv2d pw shader to improve perf. #11182

Merged

trviv added a commit that referenced this pull request


[ET-VK] De vectorise conv2d pw shader to improve perf. (#11182)

0c00b49

This PR was created by the merge bot to help merge the original PR into the main branch.

*   ghstack PR number: #11108 by @trivedivivek (please use this as the source of truth for the PR details, comments, and reviews)
*   ghstack PR base: https://github.com/pytorch/executorch/tree/gh/trivedivivek/89/base
*   ghstack PR head: https://github.com/pytorch/executorch/tree/gh/trivedivivek/89/head
*   Merge bot PR base: https://github.com/pytorch/executorch/tree/main
*   Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/trivedivivek/89/orig

@diff-train-skip-merge

Co-authored-by: Vivek Trivedi <5340687+trivedivivek@users.noreply.github.com>

Labels

CLA Signed, fb-exported, release notes: none, topic: not user facing