Script to track performance regression #36
Draft
Mikolaj-A-Kowalski wants to merge 4 commits into main from
Conversation
Adapted from pyrealm. Allows comparing two profiled runs of the code and highlights functions whose performance dropped below a threshold. Note that at the moment DEMENTpy exhibits high variance in the runtime of individual functions (observed up to 15%), so the check with default settings (5% tolerance) will likely fail even for identical code versions.
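As a rough illustration of what such a check involves (a minimal sketch, not the adapted script itself; the file names and the use of the undocumented `pstats.Stats.stats` attribute are assumptions), comparing two saved cProfile dumps might look like:

```python
import pstats

TOLERANCE = 0.05  # flag functions whose cumulative time grew by more than 5%


def load_cumtime(path):
    """Map (file, line, function) -> cumulative time from a cProfile dump."""
    # .stats is an undocumented but long-stable attribute of pstats.Stats
    return {key: cum for key, (cc, nc, tt, cum, callers)
            in pstats.Stats(path).stats.items()}


def compare_profiles(reference, candidate, tolerance=TOLERANCE):
    """Return functions whose cumulative runtime regressed beyond tolerance."""
    old, new = load_cumtime(reference), load_cumtime(candidate)
    regressions = []
    for key, new_cum in new.items():
        old_cum = old.get(key, 0.0)
        if old_cum > 0 and (new_cum - old_cum) / old_cum > tolerance:
            regressions.append((key, old_cum, new_cum))
    # worst absolute slowdowns first
    return sorted(regressions, key=lambda r: r[2] - r[1], reverse=True)


if __name__ == "__main__":
    for (fname, line, func), old_t, new_t in compare_profiles(
            "reference.prof", "candidate.prof"):
        print(f"{func} ({fname}:{line}): {old_t:.3f}s -> {new_t:.3f}s")
```

Given two `.prof` files produced from the two code versions, this prints every function whose cumulative time grew by more than the tolerance.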
Builds on top of #34
Related to #22
Adapts the performance regression inspection script of pyrealm. Some modifications were necessary to make it run with DEMENTpy, but with those in place the script runs and is able to generate the report. No documentation is provided yet; to run it, one just needs to follow the instructions from the pyrealm docs.
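For context, the comparison takes two saved profile dumps as input; something along these lines would produce them (the `run_dementpy()` driver is a placeholder, not DEMENTpy's actual entry point):

```python
import cProfile


def run_dementpy():
    """Placeholder for a full DEMENTpy simulation run."""
    ...


# Save the run's statistics to disk; the regression script then compares
# the two .prof files. Repeat on the candidate branch with a different
# output name, e.g. "candidate.prof".
cProfile.runctx("run_dementpy()", globals(), locals(),
                filename="reference.prof")
```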
At the moment its usefulness seems to be limited, I'm afraid. In tests on my machine the profile entries show very high variance, in the range of 15%, so two profiling runs of identical code produce disagreements at the default threshold of 5%. Below is an example result of comparing two consecutive runs without any code modification:
At the moment I am not sure where the variance comes from. I need to double-check whether the runs are seeded correctly and are reproducible. (EDIT: They are.)
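For what it's worth, seeding can be sanity-checked independently of the profiler; a minimal sketch (the `run_model` function is a stand-in for a real DEMENTpy run) also illustrates why reproducible outputs do not imply reproducible timings:

```python
import numpy as np


def run_model(seed):
    """Stand-in for a seeded model run; the real driver differs."""
    rng = np.random.default_rng(seed)
    return rng.random(1_000)  # placeholder for the model output


# Identical seeds give bitwise-identical outputs, confirming reproducibility.
# Wall-clock timings, which the profiler records, can still vary freely.
assert np.array_equal(run_model(42), run_model(42))
print("seeded runs are bit-identical")
```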