
tests: filter out outliers in performance tests #1788

Open

peaBerberian wants to merge 1 commit into dev from perf-tests-improv

Conversation

@peaBerberian (Collaborator)

For multiple years now, we have run performance tests on each PR to detect performance regressions in some key scenarios (load, seek, track switching).

The suite should be able to catch genuinely large regressions, but it bothers me that it sometimes detects, with high confidence, a very minor regression in the "cold loading multithread" scenario.

That scenario in particular could be sensitive to test ordering and to optimizations made by the browser's cache.

So I'm experimenting here with some strategies to limit the possibility of bias in our performance tests:

  • I run more test iterations. We previously hit what seemed to be a CI limitation when launching the browser 128 times; I want to check whether that is still the case, as it is limiting.

  • I remove the 10% most extreme samples (outliers), for both the previous state and the current state (see the trimming sketch after this list). It may be enough to remove the difference seen in our cold-loading test.

  • I added a function that tries to detect ordering bias (see the second sketch after this list).
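To illustrate the outlier-removal point, here is a minimal sketch of percentile-based trimming. Everything here is an assumption for illustration: `trimOutliers` is a hypothetical name, and the real implementation may split the 10% differently (e.g. dropping only the slowest samples).

```ts
/**
 * Return `samples` with the most extreme values removed.
 * Illustrative sketch only: sorts the samples, then drops `ratio`
 * of them in total, split between the fast end and the slow end.
 */
function trimOutliers(samples: number[], ratio: number = 0.1): number[] {
  const sorted = [...samples].sort((a, b) => a - b);
  const toDrop = Math.floor(sorted.length * ratio);
  const low = Math.floor(toDrop / 2);
  const high = toDrop - low;
  return sorted.slice(low, sorted.length - high);
}

// Example: with `ratio = 0.2`, the two extremes (5.1 and 30.5) are dropped
// before means are compared between the base branch and the PR:
console.log(
  trimOutliers([12.1, 12.3, 30.5, 12.2, 12.4, 5.1, 12.3, 12.2, 12.5, 12.2], 0.2),
);
```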
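For the ordering-bias point, one simple heuristic is to compare the first half of the iterations against the second half: if they differ consistently, the results likely depend on execution order (warm caches, JIT warm-up, etc.). This is a minimal sketch under that assumption; `detectOrderingBias` and the 5% relative threshold are illustrative guesses, not the function actually added by this PR.

```ts
/**
 * Rough ordering-bias check: returns `true` when the mean of the first
 * half of the samples differs from the mean of the second half by more
 * than `threshold`, relative to the overall mean.
 * Illustrative only: both the heuristic and the threshold are guesses.
 */
function detectOrderingBias(samples: number[], threshold: number = 0.05): boolean {
  const mean = (arr: number[]): number =>
    arr.reduce((acc, x) => acc + x, 0) / arr.length;
  const half = Math.floor(samples.length / 2);
  const firstMean = mean(samples.slice(0, half));
  const secondMean = mean(samples.slice(half));
  return Math.abs(firstMean - secondMean) / mean(samples) > threshold;
}

// e.g. a run that keeps getting faster as caches warm up:
console.log(detectOrderingBias([20, 19, 18, 17, 13, 12, 11, 10])); // true
```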

@github-actions

✅ Automated performance checks have passed on commit a47e24954b0abaae4d47db41236a51a422add009 with the base branch dev.


Performance tests 1st run output

No significant change in performance for these tests:

| Name | Mean | Median |
| --- | --- | --- |
| loading | 19.56ms -> 19.56ms (-0.009ms, z: 1.12718) | 29.25ms -> 29.25ms |
| seeking | 8.25ms -> 8.29ms (-0.042ms, z: 1.70640) | 12.15ms -> 12.15ms |
| audio-track-reload | 27.62ms -> 27.65ms (-0.029ms, z: 1.45846) | 41.25ms -> 41.25ms |
| cold loading multithread | 45.80ms -> 45.07ms (0.725ms, z: 29.15405) | 68.40ms -> 67.35ms |
| seeking multithread | 79.62ms -> 69.56ms (10.055ms, z: 1.33196) | 10.35ms -> 10.35ms |
| audio-track-reload multithread | 26.89ms -> 26.77ms (0.121ms, z: 3.92740) | 40.05ms -> 39.95ms |
| hot loading multithread | 15.03ms -> 14.92ms (0.113ms, z: 8.96326) | 22.35ms -> 22.20ms |
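For context, the `z` values above express how many standard errors separate the base mean from the current mean. A generic two-sample z statistic has the shape below; this is a sketch under the assumption that the check computes something along these lines, and `zStatistic` is a hypothetical helper name, not the project's actual code.

```ts
/**
 * Generic two-sample z statistic: the absolute difference between the
 * two means, divided by the standard error of that difference. Large
 * values mean the difference is unlikely to be mere noise.
 */
function zStatistic(base: number[], current: number[]): number {
  const mean = (arr: number[]): number =>
    arr.reduce((acc, x) => acc + x, 0) / arr.length;
  const variance = (arr: number[]): number => {
    const m = mean(arr);
    return arr.reduce((acc, x) => acc + (x - m) ** 2, 0) / (arr.length - 1);
  };
  const stdErr = Math.sqrt(
    variance(base) / base.length + variance(current) / current.length,
  );
  return Math.abs(mean(base) - mean(current)) / stdErr;
}
```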

@peaBerberian force-pushed the dev branch 9 times, most recently from 0142e34 to 1fd9df3, on January 27, 2026 at 11:59
@github-actions

✅ Automated performance checks have passed on commit c975e4c726bfa3fa52b85e7b999a87784be17eb2 with the base branch dev.


Performance tests 1st run output

No significant change in performance for these tests:

| Name | Mean | Median |
| --- | --- | --- |
| loading | 23.54ms -> 23.52ms (0.012ms, z: 0.22502) | 35.10ms -> 35.10ms |
| seeking | 408.47ms -> 398.72ms (9.749ms, z: 0.83294) | 1513.50ms -> 1513.35ms |
| audio-track-reload | 30.95ms -> 30.95ms (-0.004ms, z: 0.13141) | 46.35ms -> 46.35ms |
| cold loading multithread | 49.83ms -> 49.07ms (0.760ms, z: 24.19302) | 74.55ms -> 73.35ms |
| seeking multithread | 12.87ms -> 12.82ms (0.056ms, z: 1.83277) | 19.20ms -> 19.05ms |
| audio-track-reload multithread | 29.08ms -> 28.93ms (0.157ms, z: 6.06846) | 43.35ms -> 43.10ms |
| hot loading multithread | 19.26ms -> 19.07ms (0.191ms, z: 9.85295) | 28.80ms -> 28.35ms |
