[llm] Support different shape of input_pos #11869

Merged

facebook-github-bot merged 9 commits into gh/larryliu0820/67/base from gh/larryliu0820/67/head on Jun 25, 2025

larryliu0820 (Contributor) commented Jun 24, 2025 (edited)

Stack from ghstack (oldest at bottom):

For Hugging Face models, `forward()` takes `tokens` as well as `cache_position`, a list of cache indices (one per input token). This differs from the .pte files produced by `export_llama`, which take `tokens` and `input_pos`, where `input_pos` is a scalar tensor.

This PR adds support in `text_decoder_runner.cpp` for both shapes of `input_pos`/`cache_position`.

To keep the logic generic without relying on extra metadata, the runner inspects the method meta and the input tensor info to decide whether to feed in `input_pos` or `cache_position`.

Differential Revision: D77203700
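A minimal standalone sketch of the decision described above, assuming the rule is "if the declared position input holds more than one element, treat it as `cache_position` and fill one index per token; otherwise pass `start_pos` as a single-element `input_pos`". It uses no ExecuTorch API: `declared_sizes` stands in for the sizes one would read from the method meta, and `make_position_input` is a hypothetical helper, not the function in `text_decoder_runner.cpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the actual text_decoder_runner.cpp code): builds the
// values for the position input of a decoder forward() call.
//   declared_sizes: sizes of the position input as declared by the exported
//                   method (assumed to come from the method meta).
//   start_pos:      cache index of the first token in this chunk.
//   num_tokens:     number of tokens fed in this call.
std::vector<std::int64_t> make_position_input(
    const std::vector<std::int64_t>& declared_sizes,
    std::int64_t start_pos,
    std::int64_t num_tokens) {
  assert(num_tokens > 0 && "need at least one token");

  // Element count of the declared input; an empty sizes list is a 0-dim
  // scalar tensor, whose element count is 1.
  std::int64_t declared_numel = 1;
  for (std::int64_t s : declared_sizes) {
    declared_numel *= s;
  }

  std::vector<std::int64_t> positions;
  if (declared_numel > 1) {
    // cache_position style: one cache index per token,
    // start_pos, start_pos + 1, ..., start_pos + num_tokens - 1.
    positions.reserve(static_cast<std::size_t>(num_tokens));
    for (std::int64_t i = 0; i < num_tokens; ++i) {
      positions.push_back(start_pos + i);
    }
  } else {
    // input_pos style: a single element holding the start position.
    positions.push_back(start_pos);
  }
  return positions;
}
```

One consequence of keying on the declared size: a Hugging Face export with a static sequence length of 1 falls into the scalar branch, which still yields the right value because a one-token `cache_position` is just `[start_pos]`.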


[llm] Support different shape of input_pos (58e2792)


larryliu0820 requested review from JacobSzwejbka, jackzhxng, kirklandsign, lucylq, mergennachin and swolchok as code owners on June 24, 2025 05:07

larryliu0820 mentioned this pull request in [llm] Add arange() tensor maker API #11861 (Merged)

pytorch-bot commented Jun 24, 2025 (edited)

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11869

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9481d79 with merge base 222d9e3:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

larryliu0820 added a commit that referenced this pull request: [llm] Support different shape of input_pos (84ef77a)


ghstack-source-id: 292265667
Pull Request resolved: #11869

facebook-github-bot added the CLA Signed label

facebook-github-bot (Contributor) commented Jun 24, 2025

This pull request was exported from Phabricator. Differential Revision: D77203700

facebook-github-bot added the fb-exported label

larryliu0820 added the release notes: llm label


Update on "[llm] Support different shape of input_pos" (1ac0a14)


larryliu0820 requested a review from manuelcandales as a code owner on June 24, 2025 22:35

larryliu0820 added a commit that referenced this pull request: [llm] Support different shape of input_pos (db8a437)

Pull Request resolved: #11869

ghstack-source-id: 292466463
@exported-using-ghexport




Update on "[llm] Support different shape of input_pos" (e1744fe)


larryliu0820 added a commit that referenced this pull request: [llm] Support different shape of input_pos (2189e3d)

Pull Request resolved: #11869

ghstack-source-id: 292469061
@exported-using-ghexport



kimishpatel approved these changes


guangy10 approved these changes



Update on "[llm] Support different shape of input_pos" (2272fc5)


larryliu0820 added a commit that referenced this pull request: [llm] Support different shape of input_pos (bb6bddb)

Pull Request resolved: #11869

ghstack-source-id: 292517675
@exported-using-ghexport




Update on "[llm] Support different shape of input_pos" (9a698d7)


larryliu0820 added a commit that referenced this pull request: [llm] Support different shape of input_pos (cfba98c)

Pull Request resolved: #11869

ghstack-source-id: 292523573
@exported-using-ghexport




Update on "[llm] Support different shape of input_pos" (6f07be3)


larryliu0820 added a commit that referenced this pull request: [llm] Support different shape of input_pos (0a46fd5)

Pull Request resolved: #11869

ghstack-source-id: 292529628
@exported-using-ghexport




Update on "[llm] Support different shape of input_pos" (3b95ef7)


larryliu0820 added a commit that referenced this pull request: [llm] Support different shape of input_pos (b792fc9)

Pull Request resolved: #11869

ghstack-source-id: 292529864
@exported-using-ghexport




Update on "[llm] Support different shape of input_pos" (64aab38)


larryliu0820 added a commit that referenced this pull request: [llm] Support different shape of input_pos (174bf10)

Pull Request resolved: #11869

ghstack-source-id: 292546578
@exported-using-ghexport




Update on "[llm] Support different shape of input_pos" (9481d79)


larryliu0820 added a commit that referenced this pull request: [llm] Support different shape of input_pos (6ae4290)

Pull Request resolved: #11869

ghstack-source-id: 292560636
@exported-using-ghexport



facebook-github-bot merged commit 84c41a8 into gh/larryliu0820/67/base

102 of 104 checks passed

facebook-github-bot deleted the gh/larryliu0820/67/head branch on June 25, 2025 12:11

facebook-github-bot temporarily deployed to cherry-pick-bot with GitHub Actions on June 25, 2025 12:11 (Inactive)

pytorchbot mentioned this pull request in [llm] Support different shape of input_pos #11966 (Merged)

larryliu0820 added a commit that referenced this pull request: [llm] Support different shape of input_pos (#11966) (c5ecea6)

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #11869 by @larryliu0820
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/larryliu0820/67/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/larryliu0820/67/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/larryliu0820/66/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/larryliu0820/67/orig
@diff-train-skip-merge

---------

Co-authored-by: Mengwei Liu <larryliu@meta.com>
Co-authored-by: Mengwei Liu <larryliu0820@users.noreply.github.com>


Reviewers

kimishpatel approved these changes

guangy10 approved these changes

kirklandsign: Awaiting requested review from kirklandsign (code owner)

JacobSzwejbka: Awaiting requested review from JacobSzwejbka

lucylq: Awaiting requested review from lucylq

swolchok: Awaiting requested review from swolchok

jackzhxng: Awaiting requested review from jackzhxng

mergennachin: Awaiting requested review from mergennachin (code owner)

manuelcandales: Awaiting requested review from manuelcandales (code owner)

Labels

CLA Signed, fb-exported, release notes: llm