[Executorch][quant] Optimize per channel dequantize #5670
When using a quantized KV cache, the dequantization routine takes a significant amount of time. This diff vectorizes per-channel dequantization for the common case. Differential Revision: [D63338858](https://our.internmc.facebook.com/intern/diff/D63338858/)
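To illustrate what vectorizing per-channel dequantization can look like, here is a minimal sketch in portable C++. It is not the code from this diff. It assumes, purely for illustration, that the "common case" means int8 input, float output, and a contiguous 2D tensor quantized along axis 0, so every element in a row shares one scale and zero point. The names `dequantize_row` and `dequantize_per_channel` are hypothetical. The per-element formula `(q - zero_point) * scale` is the standard affine dequantization.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the ExecuTorch kernel): dequantize one contiguous
// run of int8 values that all share a single scale and zero point.
// Because scale and zero_point are loop-invariant and the data is contiguous,
// this inner loop is a natural target for compiler auto-vectorization or
// hand-written SIMD.
inline void dequantize_row(
    const std::int8_t* in,
    float* out,
    std::size_t n,
    float scale,
    std::int32_t zero_point) {
  for (std::size_t i = 0; i < n; ++i) {
    out[i] =
        static_cast<float>(static_cast<std::int32_t>(in[i]) - zero_point) *
        scale;
  }
}

// Assumed layout: [channels x channel_size], row-major, quantized on axis 0.
// The number of channels is taken from scales.size().
std::vector<float> dequantize_per_channel(
    const std::vector<std::int8_t>& in,
    const std::vector<float>& scales,
    const std::vector<std::int32_t>& zero_points,
    std::size_t channel_size) {
  std::vector<float> out(in.size());
  for (std::size_t c = 0; c < scales.size(); ++c) {
    // Look up the quantization parameters once per channel, then process
    // the whole row with the tight loop above.
    dequantize_row(
        in.data() + c * channel_size,
        out.data() + c * channel_size,
        channel_size,
        scales[c],
        zero_points[c]);
  }
  return out;
}
```

The point of the sketch is the loop structure: a generic per-element path has to recompute the channel index for every element, whereas hoisting the parameter lookup out of the inner loop leaves a simple multiply-subtract over contiguous memory.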
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5670
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 0775033 with merge base c726a9b. This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D63338858
ghstack-source-id: 244715672. Pull Request resolved: #5670
Pull Request resolved: #5670. ghstack-source-id: 245231655 @exported-using-ghexport
Pull Request resolved: #5670. ghstack-source-id: 255730818 @exported-using-ghexport Co-authored-by: Kimish Patel <kimishpatel@fb.com>
Pull Request resolved: pytorch/executorch#5670. ghstack-source-id: 254871262 @exported-using-ghexport