tests: filter out outliers in performance tests #1788
Open
peaBerberian wants to merge 1 commit into dev from
Conversation
Force-pushed from bc991db to 27a936c
✅ Automated performance checks have passed on commit.
Performance tests 1st run output: No significant change in performance for tests:
Force-pushed from 0142e34 to 1fd9df3
Force-pushed from 27a936c to 563ebe7
✅ Automated performance checks have passed on commit.
Performance tests 1st run output: No significant change in performance for tests:
For multiple years now, we have been running performance tests on each PR, to detect performance regressions in some key scenarios (load, seek, track switching).
They should be able to catch truly large regressions, but it bothers me that they sometimes seem to detect, with high confidence, a very minor regression in the "cold loading multithread" scenario.
That scenario could be particularly sensitive to ordering effects and to optimizations made by the browser's cache.
So here I'm experimenting with some strategies to limit the possibility of bias in our performance tests:

- Run more test iterations. We previously hit what seemed to be a CI limitation when running the browser 128 times; I want to check whether that is still the case, as it is limiting.
- Remove the 10% outliers from all samples, both for the previous state and for the current state. This may be enough to remove the difference for our cold-loading test (a minimal sketch of the idea follows this list).
- Add a function that tries to detect ordering bias (see the second sketch below).
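
To illustrate the outlier-filtering point: below is a minimal sketch, assuming "remove the 10% outliers" means dropping the extreme 5% of sorted samples on each side before comparing runs. The `trimOutliers` and `trimmedMean` helpers are hypothetical names for illustration, not the actual implementation in this PR.

```ts
/**
 * Drop the most extreme samples before comparing runs.
 * Assumption: "10% outliers" = 5% of samples cut from each tail.
 */
function trimOutliers(samples: number[], trimmedRatio = 0.1): number[] {
  const sorted = [...samples].sort((a, b) => a - b);
  // Number of samples dropped on EACH side: half of the total ratio.
  const cut = Math.floor((sorted.length * trimmedRatio) / 2);
  return sorted.slice(cut, sorted.length - cut);
}

/** Mean of the samples that survive the trimming. */
function trimmedMean(samples: number[]): number {
  const trimmed = trimOutliers(samples);
  return trimmed.reduce((acc, v) => acc + v, 0) / trimmed.length;
}

// With 20 samples and the default 10% ratio, the single lowest and the
// single highest measurements are discarded before averaging, so one
// anomalous 250ms sample no longer skews a ~100ms baseline:
const previousMs = [
  99, 101, 98, 102, 100, 97, 103, 100, 99, 101,
  250, 98, 102, 100, 99, 101, 100, 97, 103, 100,
];
console.log(trimmedMean(previousMs)); // ≈ 100, the 250ms outlier is gone
```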
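
And for the ordering-bias point, here is one plausible sketch of such a check; the PR only says the function "tries to detect ordering bias", so this Pearson-correlation approach is an assumption, not necessarily the heuristic actually used. It measures how strongly sample values correlate with the iteration at which they were taken: later runs being consistently faster would hint at cache or JIT warm-up skewing the comparison.

```ts
/**
 * Pearson correlation between each sample's value and its iteration
 * index. A strong correlation means measurements drift over time,
 * which could bias a previous-vs-current comparison.
 */
function orderingCorrelation(samples: number[]): number {
  const n = samples.length;
  const meanIdx = (n - 1) / 2;
  const meanVal = samples.reduce((a, v) => a + v, 0) / n;
  let cov = 0;
  let varIdx = 0;
  let varVal = 0;
  for (let i = 0; i < n; i++) {
    cov += (i - meanIdx) * (samples[i] - meanVal);
    varIdx += (i - meanIdx) ** 2;
    varVal += (samples[i] - meanVal) ** 2;
  }
  // Guard against constant inputs (zero variance).
  return varIdx === 0 || varVal === 0 ? 0 : cov / Math.sqrt(varIdx * varVal);
}

// Samples that keep getting faster as iterations go on produce a strong
// negative correlation; warning when |r| exceeds some threshold (say
// 0.5) would be one way to flag a likely ordering bias.
const warming = [120, 115, 112, 108, 105, 103, 101, 100, 100, 99];
console.log(orderingCorrelation(warming).toFixed(2)); // close to -1
```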