[ET-VK] De-vectorize conv2d pw shader to improve perf. #11108
This diff optimizes the performance of the `conv2d_pw` shader by de-vectorizing its implementation.

* The original vectorized implementation of the `conv2d_pw` shader has been replaced with a de-vectorized approach.
* The `sum` array has been redefined to hold `float` values instead of `vec4` to accommodate the de-vectorized computation.

These changes appear to let the shader compiler better optimize operations within the shader, which improves performance.

Differential Revision: [D75307267](https://our.internmc.facebook.com/intern/diff/D75307267/)
ghstack-source-id: 285942441
Pull Request resolved: #11108
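The accumulator change described above can be modeled outside the shader. The following is a minimal Python sketch, not the actual `conv2d_pw` GLSL: the tile size `TILE`, the function names, and the data layout are all hypothetical and chosen only for illustration. It shows that storing the partial sums as one 4-wide vector per output texel (the `vec4` layout) and as a flat array of scalars (the `float` layout) performs the same arithmetic, so only the storage and the shape of the inner loop differ.

```python
# Hypothetical reference model of the accumulator layouts.
# This is NOT the conv2d_pw shader; TILE and all names are assumptions.
TILE = 2  # assumed number of output texels handled per invocation


def pointwise_vectorized(inputs, weights):
    """Accumulate with one 4-wide vector per output texel (vec4-style sums).

    inputs:  TILE texels, each a list of channel groups (4 floats per group)
    weights: per group, a 4x4 block where weights[g][i][o] maps
             input lane i to output lane o
    """
    sums = [[0.0, 0.0, 0.0, 0.0] for _ in range(TILE)]
    for g in range(len(weights)):
        for t in range(TILE):
            texel = inputs[t][g]
            for i in range(4):
                # One vector multiply-add: sums[t] += texel[i] * weights[g][i]
                sums[t] = [s + texel[i] * w for s, w in zip(sums[t], weights[g][i])]
    return sums


def pointwise_devectorized(inputs, weights):
    """Same computation with a flat array of scalar sums (float-style sums)."""
    sums = [0.0] * (TILE * 4)
    for g in range(len(weights)):
        for t in range(TILE):
            texel = inputs[t][g]
            for i in range(4):
                for o in range(4):
                    # One scalar multiply-add per output lane
                    sums[t * 4 + o] += texel[i] * weights[g][i][o]
    # Regroup into 4-lane texels so both variants return the same shape
    return [sums[t * 4:(t + 1) * 4] for t in range(TILE)]


# Two texels, two channel groups each
example_inputs = [
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
    [[0.0, 1.0, 0.0, 1.0], [2.0, 0.0, 2.0, 0.0]],
]
identity = [[1.0 if i == o else 0.0 for o in range(4)] for i in range(4)]
all_ones = [[1.0] * 4 for _ in range(4)]
example_weights = [identity, all_ones]

vectorized_result = pointwise_vectorized(example_inputs, example_weights)
devectorized_result = pointwise_devectorized(example_inputs, example_weights)
```

Both functions apply the multiply-adds for each output lane in the same order, so they return identical results. Any speedup in the real shader would therefore come from how the shader compiler schedules the scalar form, which is consistent with the PR's hedged explanation but is not something this model can demonstrate.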
🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11108

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit a57f7bf with merge base 380eb5f:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D75307267
Merged commit cecc15d into gh/trivedivivek/89/base.
This PR was created by the merge bot to help merge the original PR into the main branch.

ghstack PR number: #11108 by @trivedivivek
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/trivedivivek/89/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/trivedivivek/89/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/trivedivivek/89/orig
@diff-train-skip-merge

Co-authored-by: Vivek Trivedi <5340687+trivedivivek@users.noreply.github.com>