[Executorch][llm] Fix ring kv cache when used with quantized kv cache and sdpa#12132
… and sdpa

When using the quantized KV cache together with SDPA, there were two bugs:

1. return_float_values was not reset on QuantizedRingKVCache. As a result, QuantizedKVCache returned float values after dequantization.
2. For the quantized KV cache, the SDPA module stores a reference to the kv_cache that is owned by the attention module. When the KV cache in Attention is replaced, the reference held by SDPA must be updated as well.

Differential Revision: [D77516823](https://our.internmc.facebook.com/intern/diff/D77516823/)
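Both bugs come down to state that is lost when one cache object is swapped for another. The following is a minimal sketch of that pattern in plain Python, not the ExecuTorch code: every class and function name here (`ToyQuantizedKVCache`, `ToyQuantizedRingKVCache`, `ToySDPA`, `ToyAttention`, the `sdpa` attribute, and both `replace_*` helpers) is hypothetical, and the default of `return_float_values=True` is an assumption made for illustration.

```python
class ToyQuantizedKVCache:
    """Hypothetical stand-in for a quantized KV cache."""

    def __init__(self, return_float_values=True):
        # Assumed default: dequantize and hand back float tensors.
        self.return_float_values = return_float_values


class ToyQuantizedRingKVCache(ToyQuantizedKVCache):
    """Hypothetical stand-in for the ring-buffer variant."""


class ToySDPA:
    """Keeps its own reference to a cache owned by the attention module."""

    def __init__(self, kv_cache):
        self.kv_cache = kv_cache


class ToyAttention:
    def __init__(self):
        # The quantized SDPA path is assumed to want quantized values back.
        self.kv_cache = ToyQuantizedKVCache(return_float_values=False)
        self.sdpa = ToySDPA(self.kv_cache)


def replace_with_ring_cache_buggy(attn):
    # Bug 1: the new cache keeps the default return_float_values=True.
    # Bug 2: attn.sdpa.kv_cache still points at the old cache object.
    attn.kv_cache = ToyQuantizedRingKVCache()


def replace_with_ring_cache_fixed(attn):
    ring = ToyQuantizedRingKVCache()
    # Fix 1: carry the flag over from the cache being replaced.
    ring.return_float_values = attn.kv_cache.return_float_values
    attn.kv_cache = ring
    # Fix 2: update the reference that the SDPA module holds.
    attn.sdpa.kv_cache = ring


buggy = ToyAttention()
replace_with_ring_cache_buggy(buggy)
print(buggy.sdpa.kv_cache is buggy.kv_cache)   # False: stale reference
print(buggy.kv_cache.return_float_values)      # True: flag was not reset

fixed = ToyAttention()
replace_with_ring_cache_fixed(fixed)
print(fixed.sdpa.kv_cache is fixed.kv_cache)   # True
print(fixed.kv_cache.return_float_values)      # False
```

The point of the sketch is that assigning a new object to `attn.kv_cache` does not rebind any other attribute that referenced the old object, so every holder of the reference has to be updated explicitly.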
This pull request was exported from Phabricator. Differential Revision: D77516823
Merged commit 2881e7d into gh/kimishpatel/196/base
… and sdpa (#12143)

This PR was created by the merge bot to help merge the original PR into the main branch.

ghstack PR number: #12132 by @kimishpatel
^ Please use this as the source of truth for the PR details, comments, and reviews

ghstack PR base: https://github.com/pytorch/executorch/tree/gh/kimishpatel/196/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/kimishpatel/196/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/kimishpatel/195/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/kimishpatel/196/orig

@diff-train-skip-merge

---------

Co-authored-by: Kimish Patel <kimishpatel@fb.com>