[llm] Add a generic text only LLM runner by larryliu0820 · Pull Request #11343 · pytorch/executorch

larryliu0820 · 2025-06-03T23:08:23Z

Introducing text_llm_runner. It can run any text-only, decoder-only LLM supported by ExecuTorch.

Metadata is read from the .pte file and used to construct the runner object.
examples/models/llama/runner.h[.cpp] only contains a simple wrapper around text_llm_runner.h[.cpp].
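As a rough illustration of building the runner from metadata, here is a minimal C++ sketch. It assumes the exported model exposes integer metadata entries under names such as `get_max_seq_len`, `get_max_context_len`, `use_kv_cache` and `enable_dynamic_shape`. Those names, the `MetadataSource` map and the `RunnerConfig` struct are illustrative assumptions, not the actual text_llm_runner API; a real runner would query the loaded .pte module rather than a `std::unordered_map`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the metadata stored in a .pte file.
// A real runner would read these values from the loaded module.
using MetadataSource = std::unordered_map<std::string, int64_t>;

// Hypothetical runner configuration derived from the metadata.
struct RunnerConfig {
  int64_t max_seq_len = 128;
  int64_t max_context_len = 128;
  bool use_kv_cache = true;
  bool enable_dynamic_shape = false;
};

// Returns the value recorded under `key`, or std::nullopt if the model
// was exported without that metadata entry.
std::optional<int64_t> read_metadata(const MetadataSource& src,
                                     const std::string& key) {
  auto it = src.find(key);
  if (it == src.end()) {
    return std::nullopt;
  }
  return it->second;
}

// Builds the config; each field keeps its default when the
// corresponding metadata entry is missing.
RunnerConfig config_from_metadata(const MetadataSource& src) {
  RunnerConfig cfg;
  cfg.max_seq_len =
      read_metadata(src, "get_max_seq_len").value_or(cfg.max_seq_len);
  // Assumption: context length falls back to the sequence length.
  cfg.max_context_len =
      read_metadata(src, "get_max_context_len").value_or(cfg.max_seq_len);
  cfg.use_kv_cache =
      read_metadata(src, "use_kv_cache").value_or(cfg.use_kv_cache ? 1 : 0) != 0;
  cfg.enable_dynamic_shape =
      read_metadata(src, "enable_dynamic_shape")
          .value_or(cfg.enable_dynamic_shape ? 1 : 0) != 0;
  return cfg;
}
```

Falling back to defaults for missing entries is one way to let a single generic runner serve models exported with different sets of metadata.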

In the next PRs, I will move examples/models/phi-3-mini/runner to use the generic runner.

I will look into the QNN and MediaTek runners as well.

Differential Revision: D75910889

ghstack-source-id: 288004703
Pull Request resolved: #11342


pytorch-bot · 2025-06-03T23:08:26Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11343

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Merge Blocking SEV

There is 1 active merge-blocking SEV. Please view it below:

(merge blocking) Long CI Queue Times: VolumeLimitExceeded

If you must merge, use @pytorchbot merge -f.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

mergennachin

See inline

mergennachin

See inline @larryliu0820 and thank you!

mergennachin · 2025-06-05T15:40:16Z

+ */
+ET_EXPERIMENTAL std::unique_ptr<tokenizers::Tokenizer> load_tokenizer(
+    const std::string& tokenizer_path,
+    std::unique_ptr<std::vector<std::string>> special_tokens = nullptr,


Instead of using a smart pointer, how about this?

const std::optional<std::vector<std::string>>& tokens

That way, you're passing by const reference and not using any pointers.

Wouldn't that trigger a copy? I think the tokenizer will have to hold the memory of these tokens.
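To make the trade-off in this thread concrete, here is a minimal C++ sketch with a hypothetical `Tokenizer` that owns its special tokens. It compares the `std::unique_ptr` parameter from the diff with the suggested `const std::optional<...>&` parameter, plus a third option not raised in the thread: taking `std::optional` by value and moving from it. All names are illustrative; none of this is the real `tokenizers::Tokenizer` or `load_tokenizer`.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tokenizer that owns its list of special tokens.
class Tokenizer {
 public:
  explicit Tokenizer(std::vector<std::string> special_tokens)
      : special_tokens_(std::move(special_tokens)) {}
  const std::vector<std::string>& special_tokens() const {
    return special_tokens_;
  }

 private:
  std::vector<std::string> special_tokens_;
};

// Option A (shape of the diff): ownership moves through a unique_ptr,
// so the vector is never copied; nullptr means "no special tokens".
std::unique_ptr<Tokenizer> load_with_unique_ptr(
    std::unique_ptr<std::vector<std::string>> tokens = nullptr) {
  if (!tokens) {
    return std::make_unique<Tokenizer>(std::vector<std::string>{});
  }
  return std::make_unique<Tokenizer>(std::move(*tokens));
}

// Option B (review suggestion): a const reference cannot be moved from,
// so value_or() copies the vector into the owning tokenizer.
std::unique_ptr<Tokenizer> load_with_const_ref(
    const std::optional<std::vector<std::string>>& tokens = std::nullopt) {
  return std::make_unique<Tokenizer>(
      tokens.value_or(std::vector<std::string>{}));
}

// Option C (not from the thread): take the optional by value and move
// out of it; no pointers, and no copy if the caller passes std::move(...).
std::unique_ptr<Tokenizer> load_with_value(
    std::optional<std::vector<std::string>> tokens = std::nullopt) {
  return std::make_unique<Tokenizer>(
      std::move(tokens).value_or(std::vector<std::string>{}));
}
```

Because the tokenizer has to own the tokens, Option B makes at least one copy, which supports the concern raised above. Option C avoids both the pointer and the copy when the caller moves its vector in.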

mergennachin · 2025-06-05T15:42:42Z

+      std::unique_ptr<TextPrefiller> text_prefiller,
+      std::unique_ptr<TextTokenGenerator> text_token_generator,
+      std::unique_ptr<Stats> stats,
+      float temperature = -1.0f);


If it's deprecated, why include this as part of the API?

If there is user code that relies on it, they can always just use create_text_llm_runner. Or can we migrate that code to use GenerationConfig directly?

This is for JNI, which doesn't expose a temperature argument in its generate() API. I think @kirklandsign is going to refactor the JNI API (and the demo app) so that passing temperature to the constructor can be fully deprecated.
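A minimal sketch of how a deprecated constructor temperature can coexist with a per-call `GenerationConfig` while the JNI layer is migrated. The `GenerationConfig` and `Runner` types here are simplified, hypothetical stand-ins, and the precedence rule (an explicitly set constructor value overrides the per-call value, with `-1.0f` meaning "unset") is an assumption for illustration, not confirmed from the PR.

```cpp
#include <cassert>

// Hypothetical, simplified per-call generation config
// (other fields omitted).
struct GenerationConfig {
  float temperature = 0.8f;
};

// Hypothetical runner that still accepts a deprecated constructor
// temperature for callers (e.g. a JNI layer) whose generate() has no
// temperature argument.
class Runner {
 public:
  // -1.0f is the "not set" sentinel, matching the default value in
  // the constructor signature shown in the diff.
  explicit Runner(float temperature = -1.0f) : temperature_(temperature) {}

  // Assumed precedence: the constructor value wins only if it was
  // explicitly set; otherwise the per-call config decides.
  float effective_temperature(const GenerationConfig& config) const {
    return temperature_ == -1.0f ? config.temperature : temperature_;
  }

 private:
  float temperature_;
};
```

Once every caller passes temperature through the config, the constructor parameter and the sentinel check can be deleted without changing call sites that already use `GenerationConfig`.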

mergennachin · 2025-06-05T15:55:55Z

-          text_token_generator,
-      std::unique_ptr<::executorch::extension::llm::Stats> stats,
-      float temperature = -1.0f);
+std::unique_ptr<llm::TextLLMRunner> create_llama_runner(


Do you think we should keep the example::create_llama_runner method at all?

Should we just replace all current call sites of example::create_llama_runner with llm::load_tokenizer and llm::create_llm_runner?

It seems like an unnecessary indirection. We would like to showcase these low-level APIs in our examples rather than add yet another layer (i.e., example::create_llama_runner).

I'm fine if you want to do this in a follow-up PR, but I think we should do it very soon.

Yeah, I don't have a strong opinion on this. I can do it in a follow-up.
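For comparison, a minimal sketch of the two call-site styles under discussion: calling the generic tokenizer and runner factories directly versus going through a model-specific wrapper. Everything lives in a hypothetical `sketch` namespace with simplified types and signatures; the real `llm::load_tokenizer` and runner factory take more parameters (for example, special tokens) and return richer types.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical, simplified stand-ins for the real tokenizer and runner.
struct Tokenizer {
  std::string path;
};

struct TextLLMRunner {
  std::string model_path;
  std::unique_ptr<Tokenizer> tokenizer;
};

// Generic step 1: load a tokenizer from a file path.
std::unique_ptr<Tokenizer> load_tokenizer(const std::string& tokenizer_path) {
  return std::make_unique<Tokenizer>(Tokenizer{tokenizer_path});
}

// Generic step 2: build a runner that takes ownership of the tokenizer.
std::unique_ptr<TextLLMRunner> create_text_llm_runner(
    const std::string& model_path, std::unique_ptr<Tokenizer> tokenizer) {
  if (!tokenizer) {
    return nullptr;  // No tokenizer, no runner.
  }
  return std::make_unique<TextLLMRunner>(
      TextLLMRunner{model_path, std::move(tokenizer)});
}

// The indirection in question: a model-specific wrapper that only
// chains the two generic calls.
std::unique_ptr<TextLLMRunner> create_llama_runner(
    const std::string& model_path, const std::string& tokenizer_path) {
  return create_text_llm_runner(model_path, load_tokenizer(tokenizer_path));
}

}  // namespace sketch
```

The wrapper adds no behavior of its own, which is the reviewer's point: examples that call the two generic functions directly show the lower-level API at no extra cost.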

Pull Request resolved: #11342

Introducing `text_llm_runner`. It can run any text-only, decoder-only LLM supported by ExecuTorch.

* Metadata is read from the .pte file and used to construct the runner object.
* examples/models/llama/runner.h[.cpp] only contains a simple wrapper around `text_llm_runner.h[.cpp]`.

In the next PRs, I will move examples/models/phi-3-mini/runner to use the generic runner.

I will look into the QNN and MediaTek runners as well.

ghstack-source-id: 288571542
@exported-using-ghexport

Differential Revision: [D75910889](https://our.internmc.facebook.com/intern/diff/D75910889/)

larryliu0820 · 2025-06-05T23:59:47Z

Closing as a duplicate of #11342.

larryliu0820 requested review from iseeyuan, jackzhxng, kirklandsign, lucylq, shoumikhin and swolchok as code owners June 3, 2025 23:08

facebook-github-bot added the CLA Signed label Jun 3, 2025

larryliu0820 added the release notes: llm label and removed the CLA Signed label Jun 3, 2025

larryliu0820 force-pushed the gh/larryliu0820/65/orig branch from 4393dd1 to dc4de71 Compare June 3, 2025 23:21

facebook-github-bot added the CLA Signed label Jun 3, 2025

swolchok mentioned this pull request Jun 3, 2025

[llm] Add a generic text only LLM runner #11342

Merged

mergennachin self-requested a review June 4, 2025 00:14

mergennachin reviewed Jun 4, 2025


larryliu0820 force-pushed the gh/larryliu0820/65/orig branch 2 times, most recently from 4070bba to 85fb703 Compare June 5, 2025 06:00

larryliu0820 requested a review from jathu as a code owner June 5, 2025 06:00

mergennachin approved these changes Jun 5, 2025


larryliu0820 force-pushed the gh/larryliu0820/65/orig branch from 85fb703 to f721f3f Compare June 5, 2025 23:44

larryliu0820 force-pushed the gh/larryliu0820/65/orig branch from f721f3f to 3fda3a4 Compare June 5, 2025 23:56

larryliu0820 closed this Jun 5, 2025
