Enable compiler optimizations to improve inference speed #4
Open
imatrisciano wants to merge 3 commits into
Just because I was curious about the compiler flag changes, I asked GPT-4o for more details, which you can find at https://chatgpt.com/share/67343ef3-8ae4-8003-9c41-82ffa7cf7f5a. Thanks for working on LM Playground! ❤️
andriydruk added a commit that referenced this pull request on Mar 23, 2026
When resetMessages() clears the message list while a generation callback is still delivering tokens, updateLastMessage() and markThinkingStarted() call _messages.last() on an empty list, which throws NoSuchElementException. This was the #4 crash on Google Play for v1.4.0, with 2 reports across 2 users. Added early-return guards for an empty _messages in both updateLastMessage() and markThinkingStarted().
This PR introduces a couple of simple compiler flags that can greatly improve inference speed.
As described in section 5.1.2 of the paper by Jie Xiao, Qianyi Huang, Xu Chen, and Chen Tian, "Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation" (arXiv:2410.03613), the `i8mm` flag has been added to the architecture description for arm64-v8a processors. This flag enables the Arm Int8 matrix-multiply extension and supposedly lets the compiler generate machine instructions optimised for int8 math.

The `-Ofast` flag has been specified in CMakeLists.txt to enable aggressive compiler optimisations for every architecture. Because `-Ofast` implies `-ffinite-math-only`, this change also requires `-fno-finite-math-only`, which disables the optimisations that assume floating-point math never produces infinities or NaNs.

With these changes, I observed large performance improvements on my device (Motorola Edge 20) when running Llama3.2-1B-Q4K_M: