[ET-VK] enabling specifying input-specific storage type and memory layout optimizations #12616
ahmtox wants to merge 1 commit into
…yout optimizations

This diff introduces support for optimal input storage specifications in the Vulkan backend, enabling input-specific storage type and memory layout optimizations for improved performance.

**Key Changes:**

(1). **Modified `tag_memory_meta_pass.py`**: Updated the memory metadata tagging pass to use `propose_input_storage_type()` and `propose_input_memory_layout()` methods instead of relying solely on output preferences. This allows operators to specify different optimal storage types for individual input tensors.

(2). **Extended `op_registry.py`**: Added comprehensive input-specific optimization support:
- `optimal_input_storage`: Allows operators to specify preferred storage types for input tensors (can be a single type or list for per-input specification)
- `optimal_input_layout`: Allows operators to specify preferred memory layouts for input tensors
- `propose_input_storage_type()` and `propose_input_memory_layout()` methods to query input-specific preferences

(3). **Enhanced quantization operator configurations**: Updated quantization operators to leverage input-specific storage preferences:
- `quantize_per_channel` and related ops now specify `TEXTURE_3D` for input tensors and `BUFFER` for scale/zero_point parameters
- `choose_qparams` operators optimized for `TEXTURE_3D` input storage for better performance
- `choose_qparams_affine` configured for `BUFFER` storage due to current implementation limitations

**Technical Implementation:**

The `set_or_transition_arg_node()` method now follows a two-step preference resolution:
1. First checks for input-specific preferences using the new `propose_input_*` methods
2. Falls back to output preferences if no input-specific preferences are defined

This approach maintains backward compatibility while enabling fine-grained optimization for operators that benefit from different storage types for different inputs.
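The two-step resolution described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this diff: the `StorageType` enum, the `OpFeatures` container, the `optimal_storage` field, `propose_storage_type()`, and `resolve_input_storage()` are hypothetical stand-ins. Only the names `optimal_input_storage`, `propose_input_storage_type()`, `TEXTURE_3D`, and `BUFFER` come from the description, and their real signatures may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Union


class StorageType(Enum):
    # Stand-in enum; only the two storage types named in this PR are modeled.
    BUFFER = "BUFFER"
    TEXTURE_3D = "TEXTURE_3D"


@dataclass
class OpFeatures:
    # Hypothetical container for an operator's preferences.
    # `optimal_storage` (assumed name) models the pre-existing output preference.
    optimal_storage: Optional[StorageType] = None
    # Either one type for all inputs, or a list with one entry per input.
    optimal_input_storage: Union[StorageType, List[StorageType], None] = None

    def propose_input_storage_type(self, arg_idx: int) -> Optional[StorageType]:
        """Return the preference for input `arg_idx`, or None if unspecified."""
        pref = self.optimal_input_storage
        if isinstance(pref, list):
            # Inputs past the end of the list have no input-specific preference.
            return pref[arg_idx] if arg_idx < len(pref) else None
        return pref

    def propose_storage_type(self) -> Optional[StorageType]:
        """Return the output preference (assumed method name)."""
        return self.optimal_storage


def resolve_input_storage(features: OpFeatures, arg_idx: int) -> Optional[StorageType]:
    # Step 1: prefer an input-specific proposal when the operator defines one.
    proposed = features.propose_input_storage_type(arg_idx)
    if proposed is not None:
        return proposed
    # Step 2: otherwise fall back to the operator's output preference.
    return features.propose_storage_type()


# A quantize_per_channel-style operator: texture input, buffer scale/zero_point.
quantize_like = OpFeatures(
    optimal_storage=StorageType.TEXTURE_3D,
    optimal_input_storage=[
        StorageType.TEXTURE_3D,
        StorageType.BUFFER,
        StorageType.BUFFER,
    ],
)

# An operator registered before this change: output preference only.
legacy_op = OpFeatures(optimal_storage=StorageType.BUFFER)
```

Under these assumptions, `resolve_input_storage(quantize_like, 1)` yields `StorageType.BUFFER` for the scale argument, while every input of `legacy_op` resolves to its output preference, which is the backward-compatible path.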
**Performance Impact:**

This change enables more efficient memory layouts for quantization operations: input tensors can use texture storage (faster compute), while scale/zero_point parameters stay in buffer storage, which the current shader implementations require.

Differential Revision: [D78513192](https://our.internmc.facebook.com/intern/diff/D78513192/)

[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12616

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure
As of commit 6ab7524 with merge base f57633b:
NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: 296946728
Pull Request resolved: #12616
This pull request was exported from Phabricator. Differential Revision: D78513192