[llm] Add a generic text only LLM runner#11343
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11343
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Merge Blocking SEVsThere is 1 active merge blocking SEVs. Please view them below:
If you must merge, use This comment was automatically generated by Dr. CI and updates every 15 minutes. |
4393dd1 to
dc4de71
Compare
4070bba to
85fb703
Compare
mergennachin
left a comment
There was a problem hiding this comment.
See inline @larryliu0820 and thank you!
| */ | ||
| ET_EXPERIMENTAL std::unique_ptr<tokenizers::Tokenizer> load_tokenizer( | ||
| const std::string& tokenizer_path, | ||
| std::unique_ptr<std::vector<std::string>> special_tokens = nullptr, |
There was a problem hiding this comment.
instead of using smart pointer, how about this?
const std::optional<std::vector<std::string>>& tokens
There was a problem hiding this comment.
That way, you're passing by const reference and not using any pointers
There was a problem hiding this comment.
Wouldn't that trigger a copy? I think the tokenizer will have to hold the memory of these tokens.
| std::unique_ptr<TextPrefiller> text_prefiller, | ||
| std::unique_ptr<TextTokenGenerator> text_token_generator, | ||
| std::unique_ptr<Stats> stats, | ||
| float temperature = -1.0f); |
There was a problem hiding this comment.
if it's deprecated, why include this as part of the API?
If there's a user code, they can always just use create_text_llm_runner, or can we migrate the user code to use GenerationConfig directly?
There was a problem hiding this comment.
This is for JNI where it doesn't expose a temperature argument in generate() API. I think @kirklandsign is going to refactor JNI API (and demo app) to fully deprecate temperature to the constructor.
| text_token_generator, | ||
| std::unique_ptr<::executorch::extension::llm::Stats> stats, | ||
| float temperature = -1.0f); | ||
| std::unique_ptr<llm::TextLLMRunner> create_llama_runner( |
There was a problem hiding this comment.
do you think we should keep example::create_llama_runner method at all?
should we just replace with llm::load_tokenizer and llm::create_llm_runner for all the current callsites of example::create_llama_runner
It seems unnecessary indirection, and we would like to showcase these low-level APIs in our examples, as opposed to having yet another indirection (i.e., example::create_llama_runner)
i'm fine if you want to this in a follow-up PR, but I think we should do it very soon.
There was a problem hiding this comment.
Yeah don't have a strong opinion on this. I can do a follow up.
85fb703 to
f721f3f
Compare
Pull Request resolved: #11342 Introducing `text_llm_runner`. This can be used to run all text only decoder only LLM models supported by ExecuTorch. * Metadata is being read out from the .pte file and being used to construct the runner object. * examples/models/llama/runner.h[.cpp] only contains a simple wrapper around `text_llm_runner.h[.cpp]`. In next PRs I will move examples/models/phi-3-mini/runner to use the generic runner. Will look into QNN and MediaTek runners as well. ghstack-source-id: 288571542 @exported-using-ghexport Differential Revision: [D75910889](https://our.internmc.facebook.com/intern/diff/D75910889/)
f721f3f to
3fda3a4
Compare
|
Close, duplicate of #11342 |
Introducing
text_llm_runner. This can be used to run all text only decoder only LLM models supported by ExecuTorch.text_llm_runner.h[.cpp].In next PRs I will move examples/models/phi-3-mini/runner to use the generic runner.
Will look into QNN and MediaTek runners as well.
Differential Revision: D75910889
ghstack-source-id: 288004703
Pull Request resolved: #11342
Summary
[PLEASE REMOVE] See CONTRIBUTING.md's Pull Requests for ExecuTorch PR guidelines.
[PLEASE REMOVE] If this PR closes an issue, please add a
Fixes #<issue-id>line.[PLEASE REMOVE] If this PR introduces a fix or feature that should be the upcoming release notes, please add a "Release notes: " label. For a list of available release notes labels, check out CONTRIBUTING.md's Pull Requests.
Test plan
[PLEASE REMOVE] How did you test this PR? Please write down any manual commands you used and note down tests that you have written if applicable.