[Executorch][quant] Optimize per channel dequantize #5670

Merged

facebook-github-bot merged 11 commits into gh/kimishpatel/113/base from gh/kimishpatel/113/head on Dec 2, 2024

kimishpatel commented Sep 25, 2024

Contributor

Stack from ghstack (oldest at bottom):

When using a quantized KV cache, the dequantization routine takes a significant amount of time.
This diff vectorizes per-channel dequantization for the common case.
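
For readers unfamiliar with the operation, here is a minimal sketch of per-channel dequantization, assuming a row-major int8 buffer of shape [channels x inner] quantized along axis 0, with one scale and one zero point per channel. The function name `dequantize_per_channel` and its signature are hypothetical and are not the ExecuTorch kernel API. The PR does not say how the vectorization is done, so this sketch only hoists the per-channel parameters out of a contiguous inner loop, which is a form that compilers can usually auto-vectorize; the actual diff may use explicit SIMD instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the ExecuTorch kernel): dequantize a row-major
// [channels x inner] int8 buffer using one scale and zero point per channel.
std::vector<float> dequantize_per_channel(
    const std::vector<int8_t>& input,
    const std::vector<float>& scales,
    const std::vector<int32_t>& zero_points,
    std::size_t channels,
    std::size_t inner) {
  assert(input.size() == channels * inner);
  assert(scales.size() == channels && zero_points.size() == channels);

  std::vector<float> out(input.size());
  for (std::size_t c = 0; c < channels; ++c) {
    // Load the per-channel parameters once, outside the hot loop.
    const float scale = scales[c];
    const int32_t zp = zero_points[c];
    const int8_t* in_row = input.data() + c * inner;
    float* out_row = out.data() + c * inner;
    // Contiguous, branch-free loop: out = (q - zero_point) * scale.
    for (std::size_t i = 0; i < inner; ++i) {
      out_row[i] =
          static_cast<float>(static_cast<int32_t>(in_row[i]) - zp) * scale;
    }
  }
  return out;
}
```

The "common case" in this sketch is the assumption that each channel's elements are contiguous in memory; other layouts or quantization axes would need a strided or fallback path.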

Differential Revision: D63338858


          [Executorch][quant] Optimize per channel dequantize

bdc1a33

When using quantized kv cache, dequantization routine takes significantly long.
This diff just vectorizes dequant per channel for common case.

Differential Revision: [D63338858](https://our.internmc.facebook.com/intern/diff/D63338858/)

[ghstack-poisoned]

pytorch-bot Bot commented Sep 25, 2024

✅ No Failures

As of commit 0775033 with merge base c726a9b:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label

kimishpatel mentioned this pull request

[ExecuTorch] Some updated to kv cache #5663

Closed

facebook-github-bot commented Sep 25, 2024

Contributor

This pull request was exported from Phabricator. Differential Revision: D63338858

kimishpatel mentioned this pull request

Fix dequantize per channel to handle double scale type #5524

Closed

facebook-github-bot added the fb-exported label

This was referenced Sep 25, 2024

[ExecuTorch] Add quantized kv cache to llama #5664

Closed

Refactor custom SDPA op to separate kv cache update from the custom sdpa op #5665

Closed

Add update_quantized_cache op #5527

Closed

[Executorch][llama] Update SDPA op to use quantized kv cache #5666

Closed

[Executorch][llama] Refactoring sdpa #5667

Closed

[Executorch] Update EXECUTORCH_LIBRARY macro #5668

Closed

[Executorch][llama] Add custom_sdpa and use that instead of sdpa_with_kv_cache #5669

Closed

kimishpatel added a commit that referenced this pull request


          [Executorch][quant] Optimize per channel dequantize

When using quantized kv cache, dequantization routine takes significantly long.
This diff just vectorizes dequant per channel for common case.

Differential Revision: [D63338858](https://our.internmc.facebook.com/intern/diff/D63338858/)

ghstack-source-id: 244715672
Pull Request resolved: #5670


          Update on "[Executorch][quant] Optimize per channel dequantize"

c8b3e00

kimishpatel mentioned this pull request

Dont quantize the current token for attention #5715

Merged


          Update on "[Executorch][quant] Optimize per channel dequantize"

45530bb

kimishpatel added a commit that referenced this pull request


          [Executorch][quant] Optimize per channel dequantize

9c3d846

Pull Request resolved: #5670

When using quantized kv cache, dequantization routine takes significantly long.
This diff just vectorizes dequant per channel for common case.
ghstack-source-id: 245231655
@exported-using-ghexport

Differential Revision: [D63338858](https://our.internmc.facebook.com/intern/diff/D63338858/)


          Update on "[Executorch][quant] Optimize per channel dequantize"

5d799e8

digantdesai approved these changes


          Update on "[Executorch][quant] Optimize per channel dequantize"

f63c0f3


          Update on "[Executorch][quant] Optimize per channel dequantize"

e505c03


          Update on "[Executorch][quant] Optimize per channel dequantize"

917597e


          Update on "[Executorch][quant] Optimize per channel dequantize"

c1658eb


          Update on "[Executorch][quant] Optimize per channel dequantize"

cc55a2a

This was referenced Nov 16, 2024

[Executorch][llama] Rename update_quantized_cache to update_cache #6914

Merged

[Executorch][BE] Rename sdpa_with_kv_cache.py to custom_ops.py #6996

Merged

[Executorch] Add quantized kv cache to oss ci #6997

Merged


          Update on "[Executorch][quant] Optimize per channel dequantize"

e3cefc5

kimishpatel added the topic: not user facing label

This was referenced Nov 22, 2024

[Executorch][custom ops] Change lib loading logic to account for package dir #7038

Merged

[Executorch][CI] Fix qnn runner ci job scripts #7049

Closed


          Update on "[Executorch][quant] Optimize per channel dequantize"

facebook-github-bot merged commit 3046412 into gh/kimishpatel/113/base

facebook-github-bot deleted the gh/kimishpatel/113/head branch on December 2, 2024 16:19

facebook-github-bot temporarily deployed to cherry-pick-bot with GitHub Actions on December 2, 2024 16:19 (Inactive)

pytorchbot mentioned this pull request

[Executorch][quant] Optimize per channel dequantize #7139

Merged

kirklandsign pushed a commit that referenced this pull request


          [Executorch][quant] Optimize per channel dequantize

ddec0c7

Pull Request resolved: #5670

When using quantized kv cache, dequantization routine takes significantly long.
This diff just vectorizes dequant per channel for common case.
ghstack-source-id: 255730818
@exported-using-ghexport

Differential Revision: [D63338858](https://our.internmc.facebook.com/intern/diff/D63338858/)

Co-authored-by: Kimish Patel <kimishpatel@fb.com>

kedarnath03 pushed a commit to kedarnath03/executorch that referenced this pull request


          [Executorch][quant] Optimize per channel dequantize

887fadf

Pull Request resolved: pytorch/executorch#5670

When using quantized kv cache, dequantization routine takes significantly long.
This diff just vectorizes dequant per channel for common case.
ghstack-source-id: 254871262
@exported-using-ghexport

Differential Revision: [D63338858](https://our.internmc.facebook.com/intern/diff/D63338858/)

Labels: CLA Signed, fb-exported, topic: not user facing