[ET-VK] enabling specifying input-specific storage type and memory layout optimizations by ahmtox · Pull Request #12616 · pytorch/executorch

ahmtox · 2025-07-17T22:19:04Z

Stack from ghstack (oldest at bottom):

This diff introduces support for optimal input storage specifications in the Vulkan backend, enabling input-specific storage type and memory layout optimizations for improved performance.

Key Changes:

(1). Modified `tag_memory_meta_pass.py`: Updated the memory metadata tagging pass to use the `propose_input_storage_type()` and `propose_input_memory_layout()` methods instead of relying solely on output preferences. This allows operators to specify different optimal storage types for individual input tensors.

(2). Extended `op_registry.py`: Added comprehensive input-specific optimization support:

- `optimal_input_storage`: Allows operators to specify preferred storage types for input tensors (can be a single type or a list for per-input specification)
- `optimal_input_layout`: Allows operators to specify preferred memory layouts for input tensors
- `propose_input_storage_type()` and `propose_input_memory_layout()` methods to query input-specific preferences

(3). Enhanced quantization operator configurations: Updated quantization operators to leverage input-specific storage preferences:

- `quantize_per_channel` and related ops now specify `TEXTURE_3D` for input tensors and `BUFFER` for scale/zero_point parameters
- `choose_qparams` operators optimized for `TEXTURE_3D` input storage for better performance
- `choose_qparams_affine` configured for `BUFFER` storage due to current implementation limitations
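
As a rough illustration of the per-input specification described above, the sketch below models an operator-features record whose `optimal_input_storage` is either a single storage type or a per-input list. It is not the code in `op_registry.py`: the `StorageType` enum, the `OpFeaturesSketch` class, the `REGISTRY_SKETCH` table, the `arg_index` parameter, and the argument order assumed for `quantize_per_channel` (input, scale, zero_point) are all assumptions made for the example. Only the field name, the method name, and the `TEXTURE_3D`/`BUFFER` choices come from the description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Union


class StorageType(Enum):
    """Hypothetical stand-in for the backend's storage-type enum."""
    BUFFER = "buffer"
    TEXTURE_3D = "texture_3d"


@dataclass
class OpFeaturesSketch:
    """Hypothetical, simplified model of an operator's registered features."""
    # Either one type applied to every input, or one entry per input.
    optimal_input_storage: Union[StorageType, List[StorageType], None] = None

    def propose_input_storage_type(self, arg_index: int) -> Optional[StorageType]:
        pref = self.optimal_input_storage
        if isinstance(pref, list):
            # Inputs not covered by the list have no stated preference.
            if 0 <= arg_index < len(pref):
                return pref[arg_index]
            return None
        # A single StorageType applies to all inputs; None means no preference.
        return pref


# Assumed argument order for quantize_per_channel: input, scale, zero_point.
REGISTRY_SKETCH = {
    "quantize_per_channel": OpFeaturesSketch(
        [StorageType.TEXTURE_3D, StorageType.BUFFER, StorageType.BUFFER]
    ),
    "choose_qparams": OpFeaturesSketch(StorageType.TEXTURE_3D),
    "choose_qparams_affine": OpFeaturesSketch(StorageType.BUFFER),
}

for name, features in REGISTRY_SKETCH.items():
    proposals = [features.propose_input_storage_type(i) for i in range(3)]
    print(name, [p.name if p else None for p in proposals])
```

A memory-layout preference (`optimal_input_layout` with `propose_input_memory_layout()`) could presumably follow the same single-value-or-list shape, though the description does not spell that out.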

Technical Implementation:

The `set_or_transition_arg_node()` method now follows a two-step preference resolution:

1. First checks for input-specific preferences using the new `propose_input_*` methods
2. Falls back to output preferences if no input-specific preferences are defined

This approach maintains backward compatibility while enabling fine-grained optimization for operators that benefit from different storage types for different inputs.
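
The two-step resolution can be sketched as a small helper. This is an illustration only, not the body of `set_or_transition_arg_node()`: the `FeaturesSketch` class, its `output_storage` attribute, and the `resolve_arg_storage` function are hypothetical names, and storage types are plain strings for brevity.

```python
from typing import Dict, Optional


class FeaturesSketch:
    """Hypothetical operator features: one output preference, optional per-input ones."""

    def __init__(
        self,
        output_storage: Optional[str] = None,
        input_storage: Optional[Dict[int, str]] = None,
    ) -> None:
        self.output_storage = output_storage
        self.input_storage = input_storage or {}

    def propose_input_storage_type(self, arg_index: int) -> Optional[str]:
        # None signals "no input-specific preference for this argument".
        return self.input_storage.get(arg_index)


def resolve_arg_storage(features: FeaturesSketch, arg_index: int) -> Optional[str]:
    # Step 1: prefer an input-specific choice when the operator defines one.
    preferred = features.propose_input_storage_type(arg_index)
    if preferred is not None:
        return preferred
    # Step 2: otherwise fall back to the operator's output preference.
    return features.output_storage


# An operator that only declares an output preference behaves as before.
legacy_op = FeaturesSketch(output_storage="TEXTURE_3D")
# An operator that overrides storage for arguments 1 and 2 only.
quant_op = FeaturesSketch(
    output_storage="TEXTURE_3D",
    input_storage={1: "BUFFER", 2: "BUFFER"},
)

print(resolve_arg_storage(legacy_op, 0))
print(resolve_arg_storage(quant_op, 1))
```

An operator with no input-specific preference resolves to its output preference for every argument, which is the backward-compatibility property described above.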

Performance Impact:

This change enables more efficient memory layouts for quantization operations, where input tensors benefit from texture storage (faster compute) while scale/zero_point parameters must stay in buffer storage (a requirement of the current shader implementations).

Differential Revision: D78513192


pytorch-bot · 2025-07-17T22:19:07Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12616

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 6ab7524 with merge base f57633b:

NEW FAILURE - The following job has failed:

Propose to merge ghstack orig PRs to main / Try to create a PR with ghstack /orig branch (gh)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Differential Revision: [D78513192](https://our.internmc.facebook.com/intern/diff/D78513192/)
ghstack-source-id: 296946728
Pull Request resolved: #12616

github-actions · 2025-07-17T22:19:37Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

facebook-github-bot · 2025-07-17T22:20:19Z

This pull request was exported from Phabricator. Differential Revision: D78513192

ahmtox requested a review from SS-JIA as a code owner July 17, 2025 22:19

This was referenced Jul 17, 2025

[ET-VK] Migrate off of xnnpack_quantizer_utils #12572

Merged

[ET-VK] Creating get_symmetric_quantization_config #12573

Merged

meta-cla Bot added the CLA Signed label Jul 17, 2025

facebook-github-bot added the fb-exported label Jul 17, 2025

ahmtox closed this Aug 4, 2025

ahmtox had a problem deploying to cherry-pick-bot August 4, 2025 23:00 — with GitHub Actions Failure


ahmtox wants to merge 1 commit into gh/ahmtox/46/base from gh/ahmtox/46/head


3 participants
