[Executorch][llm] Enable leveraging ring kv cache via module swap #10611
This allows some of the attention modules to use a sliding-window (ring) KV cache, swapped in via module swap. This will help enable models like gemma3.

Differential Revision: [D73891426](https://our.internmc.facebook.com/intern/diff/D73891426/)
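For readers unfamiliar with the idea, here is a minimal pure-Python sketch of what "ring KV cache via module swap" could look like. It is not the code from this PR: `RingKVCache`, `FullKVCache`, `Attention`, `use_sliding_window`, and `swap_in_ring_cache` are all hypothetical names, and strings stand in for key/value tensors. The ring cache writes position `pos` to slot `pos % window`, so only the most recent `window` positions survive, and the swap replaces the cache only on layers flagged as sliding-window.

```python
from typing import Dict, List, Optional, Tuple


class FullKVCache:
    """Hypothetical baseline cache: keeps every position ever written."""

    def __init__(self) -> None:
        self.entries: Dict[int, Tuple[str, str]] = {}

    def update(self, pos: int, key: str, value: str) -> None:
        self.entries[pos] = (key, value)

    def visible_positions(self) -> List[int]:
        return sorted(self.entries)


class RingKVCache:
    """Hypothetical sliding-window cache: keeps only the last `window` positions."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.slots: List[Optional[Tuple[str, str]]] = [None] * window
        self.positions: List[int] = [-1] * window  # -1 marks an empty slot

    def update(self, pos: int, key: str, value: str) -> None:
        slot = pos % self.window  # wrap around, overwriting the oldest entry
        self.slots[slot] = (key, value)
        self.positions[slot] = pos

    def visible_positions(self) -> List[int]:
        return sorted(p for p in self.positions if p >= 0)


class Attention:
    """Stand-in for an attention module that owns a KV cache (assumption)."""

    def __init__(self, use_sliding_window: bool) -> None:
        self.use_sliding_window = use_sliding_window
        self.kv_cache = FullKVCache()


def swap_in_ring_cache(layers: List[Attention], window: int) -> List[Attention]:
    """Replace the cache only on layers marked as sliding-window attention."""
    for layer in layers:
        if layer.use_sliding_window:
            layer.kv_cache = RingKVCache(window)
    return layers


# Usage: one local (sliding-window) layer and one global layer.
layers = swap_in_ring_cache([Attention(True), Attention(False)], window=4)
for pos in range(6):
    for layer in layers:
        layer.kv_cache.update(pos, f"k{pos}", f"v{pos}")

local_positions = layers[0].kv_cache.visible_positions()
global_positions = layers[1].kv_cache.visible_positions()
```

After six updates with a window of four, the swapped layer holds only positions 2 through 5 while the untouched layer still holds all six. A real implementation would also need an attention mask that accounts for the wrapped slot order, which this sketch leaves out.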
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10611
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 664a638 with merge base bf50527.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D73891426
Merged commit bc03bd1 into gh/kimishpatel/188/base
Pull Request resolved: #10611

This allows some of the attention modules to use a sliding-window KV cache. This will help enable models like gemma3.

ghstack-source-id: 283404677
Differential Revision: [D73891426](https://our.internmc.facebook.com/intern/diff/D73891426/)
…0835)

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #10611 by @kimishpatel
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/kimishpatel/188/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/kimishpatel/188/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/kimishpatel/187/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/kimishpatel/188/orig
@diff-train-skip-merge

Co-authored-by: Kimish Patel <kimishpatel@fb.com>