feat: add benchmarks for awkward#3548
* debug test 1
* fix output piping for benchmark comparisons
* try handling multiline outputs
* hopefully fix paths
* another try to pass multiline output
* try fix json file names
* fix directory creation for benchmark results
* please precommit
* prettify table headers
* style: pre-commit fixes
* go back to original commit for comparison
* prettify branch name and SHA in table header
* style: pre-commit fixes
* try self-hosted runner
* another try with self-hosted runner
* another try

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Hi @ianna, the question is: which things do we want to benchmark? Depending on how many functions we benchmark, the CI job may become rather expensive. Also, there may be false positives if the runner is under additional load that affects the measured runtime. Of course, we can adjust the threshold at which the bot posts a comment in the future.
ianna
left a comment
@pfackeldey - looks great! As discussed, it merits a separate repo. That would give us more flexibility in what we can profile. Thanks!
Yes, I'll close this for now and use it as a starting point for a separate repo!
This PR adds benchmarks to awkward.
For every PR, it runs a suite of benchmarks and compares performance metrics between main and the feature branch (merged with main). It reports any improvements and regressions of more than 10% in cpu_time.

It would be nice to extend this to run over all tags so we can track how performance evolves across all awkward releases.
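The comparison rule described above can be sketched roughly as follows. This is a minimal illustration rather than the code from this PR: the names `compare_cpu_time`, `reportable`, and `Comparison` are hypothetical, and it assumes the relative difference is computed as `(feature - main) / main`. Only the 10% threshold on cpu_time comes from the description.

```python
from dataclasses import dataclass

# From the PR description: report changes of more than 10% in cpu_time.
THRESHOLD_PERCENT = 10.0


@dataclass
class Comparison:
    """Result of comparing one benchmark between main and the feature branch."""

    name: str
    relative_diff_percent: float
    verdict: str  # "regression", "improvement", or "unchanged"


def compare_cpu_time(name, main_ms, feature_ms, threshold=THRESHOLD_PERCENT):
    """Classify a benchmark by its relative cpu_time difference (hypothetical helper)."""
    if main_ms <= 0:
        raise ValueError("cpu_time measured on main must be positive")
    # Assumed formula: positive values mean the feature branch is slower.
    diff = (feature_ms - main_ms) / main_ms * 100.0
    if diff > threshold:
        verdict = "regression"
    elif diff < -threshold:
        verdict = "improvement"
    else:
        verdict = "unchanged"
    return Comparison(name, diff, verdict)


def reportable(comparisons):
    """Keep only the benchmarks that would trigger a notification."""
    return [c for c in comparisons if c.verdict != "unchanged"]
```

With a rule like this, a benchmark whose cpu_time moves by 5% in either direction stays silent, so raising or lowering `threshold` is the knob for trading noise against sensitivity.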
An example notification for a performance regression of the ak.all operation (axis=None) running on a Jagged array with 65536 elements and dtype="float64" would look like this:
"""
🔹 ak.all(Jagged<65536,f64>, axis=None)
Relative CPU Time Difference: 692.1% - 🔴 Regression
[Show full comparison: table with columns cpu_time (ms), real_time (ms), elements/s (Hz)]
"""
This does not yet include all "number-crunching" operations. What we want to benchmark (e.g. which high-level ops) is up for discussion.
Opening this PR now to debug the CI.