
tests: filter out outliers in performance tests #1788

Open

peaBerberian wants to merge 1 commit into dev from perf-tests-improv

Conversation

@peaBerberian (Collaborator)

For multiple years now, we have run performance tests on each PR to detect performance regressions in some key scenarios (load, seek, track switching).

The suite should be able to catch genuinely large regressions, but it bothers me that it sometimes detects, with high confidence, a very minor regression in the "cold loading multithread" scenario.

That scenario in particular could be sensitive to test ordering and to optimizations made by the browser's cache.

So I'm experimenting here with some strategies to limit the possibility of bias in our performance tests:

  • I run more test iterations. We previously hit what seemed to be a CI limitation when launching the browser 128 times; I want to check whether that is still the case, as it is limiting.

  • I remove the 10% most extreme samples (outliers), for both the previous state and the current state (see the trimming sketch after this list). It may be enough to remove the difference seen in our cold-loading test.

  • I added a function that tries to detect ordering bias (see the second sketch after this list).
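To illustrate the outlier-removal point, here is a minimal sketch of percentile-based trimming. Everything here is an assumption for illustration: `trimOutliers` is a hypothetical name, and the real implementation may split the 10% differently (e.g. dropping only the slowest samples).

```ts
/**
 * Return `samples` with the most extreme values removed.
 * Illustrative sketch only: sorts the samples, then drops `ratio`
 * of them in total, split between the fast end and the slow end.
 */
function trimOutliers(samples: number[], ratio: number = 0.1): number[] {
  const sorted = [...samples].sort((a, b) => a - b);
  const toDrop = Math.floor(sorted.length * ratio);
  const low = Math.floor(toDrop / 2);
  const high = toDrop - low;
  return sorted.slice(low, sorted.length - high);
}

// Example: with `ratio = 0.2`, the two extremes (5.1 and 30.5) are dropped
// before means are compared between the base branch and the PR:
console.log(
  trimOutliers([12.1, 12.3, 30.5, 12.2, 12.4, 5.1, 12.3, 12.2, 12.5, 12.2], 0.2),
);
```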
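For the ordering-bias point, one simple heuristic is to compare the first half of the iterations against the second half: if they differ consistently, the results likely depend on execution order (warm caches, JIT warm-up, etc.). This is a minimal sketch under that assumption; `detectOrderingBias` and the 5% relative threshold are illustrative guesses, not the function actually added by this PR.

```ts
/**
 * Rough ordering-bias check: returns `true` when the mean of the first
 * half of the samples differs from the mean of the second half by more
 * than `threshold`, relative to the overall mean.
 * Illustrative only: both the heuristic and the threshold are guesses.
 */
function detectOrderingBias(samples: number[], threshold: number = 0.05): boolean {
  const mean = (arr: number[]): number =>
    arr.reduce((acc, x) => acc + x, 0) / arr.length;
  const half = Math.floor(samples.length / 2);
  const firstMean = mean(samples.slice(0, half));
  const secondMean = mean(samples.slice(half));
  return Math.abs(firstMean - secondMean) / mean(samples) > threshold;
}

// e.g. a run that keeps getting faster as caches warm up:
console.log(detectOrderingBias([20, 19, 18, 17, 13, 12, 11, 10])); // true
```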

@github-actions

✅ Automated performance checks have passed on commit a47e24954b0abaae4d47db41236a51a422add009 with the base branch dev.


Performance tests 1st run output

No significant change in performance for these tests:

| Name | Mean | Median |
| --- | --- | --- |
| loading | 19.56ms -> 19.56ms (-0.009ms, z: 1.12718) | 29.25ms -> 29.25ms |
| seeking | 8.25ms -> 8.29ms (-0.042ms, z: 1.70640) | 12.15ms -> 12.15ms |
| audio-track-reload | 27.62ms -> 27.65ms (-0.029ms, z: 1.45846) | 41.25ms -> 41.25ms |
| cold loading multithread | 45.80ms -> 45.07ms (0.725ms, z: 29.15405) | 68.40ms -> 67.35ms |
| seeking multithread | 79.62ms -> 69.56ms (10.055ms, z: 1.33196) | 10.35ms -> 10.35ms |
| audio-track-reload multithread | 26.89ms -> 26.77ms (0.121ms, z: 3.92740) | 40.05ms -> 39.95ms |
| hot loading multithread | 15.03ms -> 14.92ms (0.113ms, z: 8.96326) | 22.35ms -> 22.20ms |
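For context, the `z` values above express how many standard errors separate the base mean from the current mean. A generic two-sample z statistic has the shape below; this is a sketch under the assumption that the check computes something along these lines, and `zStatistic` is a hypothetical helper name, not the project's actual code.

```ts
/**
 * Generic two-sample z statistic: the absolute difference between the
 * two means, divided by the standard error of that difference. Large
 * values mean the difference is unlikely to be mere noise.
 */
function zStatistic(base: number[], current: number[]): number {
  const mean = (arr: number[]): number =>
    arr.reduce((acc, x) => acc + x, 0) / arr.length;
  const variance = (arr: number[]): number => {
    const m = mean(arr);
    return arr.reduce((acc, x) => acc + (x - m) ** 2, 0) / (arr.length - 1);
  };
  const stdErr = Math.sqrt(
    variance(base) / base.length + variance(current) / current.length,
  );
  return Math.abs(mean(base) - mean(current)) / stdErr;
}
```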

@peaBerberian force-pushed the dev branch 9 times, most recently from 0142e34 to 1fd9df3, on January 27, 2026 at 11:59
@github-actions

✅ Automated performance checks have passed on commit c975e4c726bfa3fa52b85e7b999a87784be17eb2 with the base branch dev.


Performance tests 1st run output

No significant change in performance for these tests:

| Name | Mean | Median |
| --- | --- | --- |
| loading | 23.54ms -> 23.52ms (0.012ms, z: 0.22502) | 35.10ms -> 35.10ms |
| seeking | 408.47ms -> 398.72ms (9.749ms, z: 0.83294) | 1513.50ms -> 1513.35ms |
| audio-track-reload | 30.95ms -> 30.95ms (-0.004ms, z: 0.13141) | 46.35ms -> 46.35ms |
| cold loading multithread | 49.83ms -> 49.07ms (0.760ms, z: 24.19302) | 74.55ms -> 73.35ms |
| seeking multithread | 12.87ms -> 12.82ms (0.056ms, z: 1.83277) | 19.20ms -> 19.05ms |
| audio-track-reload multithread | 29.08ms -> 28.93ms (0.157ms, z: 6.06846) | 43.35ms -> 43.10ms |
| hot loading multithread | 19.26ms -> 19.07ms (0.191ms, z: 9.85295) | 28.80ms -> 28.35ms |
