[Executorch][llama] Update SDPA op to use quantized kv cache#5666
kimishpatel wants to merge 7 commits into
Using a quantized KV cache, we cannot rely on SDPA to update the original cache, so we insert a cache update op. Differential Revision: [D62301841](https://our.internmc.facebook.com/intern/diff/D62301841/) [ghstack-poisoned]
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5666
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit 4e1b354 with merge base ee32848. This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D62301841
This pull request has been merged in bca3ad6.
Pull Request resolved: pytorch/executorch#5666
Using a quantized KV cache, we cannot rely on SDPA to update the original cache, so we insert a cache update op.
ghstack-source-id: 245751546 @exported-using-ghexport Differential Revision: [D62301841](https://our.internmc.facebook.com/intern/diff/D62301841/)
Stack from ghstack (oldest at bottom):
Using a quantized KV cache, we cannot rely on SDPA to update the original cache, so we insert a cache update op.
Differential Revision: D62301841
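A rough way to picture the change, assuming the SDPA op only ever sees a dequantized floating-point copy of the cache: any in-place write SDPA makes lands in that temporary copy, not in the quantized storage, so the new token has to be quantized and written by a separate update step first. The sketch below is a hypothetical, pure-Python illustration of that ordering. `QuantizedKVCache`, `attention_step`, and the symmetric per-token int8 scheme are illustrative assumptions, not the ExecuTorch ops or quantization scheme this PR uses.

```python
# Hypothetical illustration (not ExecuTorch APIs): explicit cache update, then SDPA.
import math
from typing import List, Tuple

Vector = List[float]


def quantize_int8(v: Vector) -> Tuple[List[int], float]:
    """Assumed symmetric per-token int8 quantization: returns (int values, scale)."""
    max_abs = max(abs(x) for x in v)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-128, min(127, round(x / scale))) for x in v]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> Vector:
    return [x * scale for x in q]


class QuantizedKVCache:
    """Hypothetical cache: int8 keys/values plus one scale per position."""

    def __init__(self, max_seq_len: int, head_dim: int):
        self.k = [[0] * head_dim for _ in range(max_seq_len)]
        self.v = [[0] * head_dim for _ in range(max_seq_len)]
        self.k_scale = [1.0] * max_seq_len
        self.v_scale = [1.0] * max_seq_len

    def update(self, pos: int, k_new: Vector, v_new: Vector) -> None:
        """The separate cache-update step: quantize the new token and store it at `pos`."""
        self.k[pos], self.k_scale[pos] = quantize_int8(k_new)
        self.v[pos], self.v_scale[pos] = quantize_int8(v_new)

    def dequantized(self, length: int) -> Tuple[List[Vector], List[Vector]]:
        """Floating-point copies of the first `length` positions (writes to these are lost)."""
        ks = [dequantize_int8(self.k[i], self.k_scale[i]) for i in range(length)]
        vs = [dequantize_int8(self.v[i], self.v_scale[i]) for i in range(length)]
        return ks, vs


def sdpa(q: Vector, ks: List[Vector], vs: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query over floating-point K/V."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in ks]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[j] for w, v in zip(weights, vs)) for j in range(len(vs[0]))]


def attention_step(cache: QuantizedKVCache, pos: int, q: Vector,
                   k_new: Vector, v_new: Vector) -> Vector:
    # 1) Update the quantized cache explicitly; SDPA never touches the int8 storage.
    cache.update(pos, k_new, v_new)
    # 2) Dequantize the valid prefix [0, pos] and attend over it.
    ks, vs = cache.dequantized(pos + 1)
    return sdpa(q, ks, vs)


if __name__ == "__main__":
    cache = QuantizedKVCache(max_seq_len=4, head_dim=2)
    print(attention_step(cache, 0, [1.0, 0.0], [1.0, 0.0], [0.5, -0.5]))
```

The design point the sketch tries to capture is ordering: because the quantized cache is the source of truth, the write must happen on the quantized tensors before attention reads a dequantized view of them.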