Conversation
(force-pushed from 3484164 to 815a462)
Please also consider documenting/explaining the results in the markdown table. Is it possible to mark the workflow check as failed if the performance diff exceeds a threshold, e.g. 5%?
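A check like the one requested above could be sketched as a small shell step. This is a hypothetical example, not the PR's actual implementation: the input format is loosely modeled on benchstat's delta column, and the file name, benchmark names, and numbers are all made up.

```shell
#!/bin/sh
# Hypothetical sketch: fail the check when a benchmark's delta exceeds a threshold.
THRESHOLD=5.0

# Made-up delta data, loosely shaped like benchstat's delta column.
cat > /tmp/bench-delta.txt <<'EOF'
BenchmarkWrite +6.30%
BenchmarkRead +1.20%
EOF

fail=0
while read -r name delta; do
  pct=$(printf '%s' "$delta" | tr -d '+%')   # "+6.30%" -> "6.30"
  # Flag any change above the threshold (a sketch; a real check might
  # only flag regressions, i.e. deltas in one direction).
  if awk -v p="$pct" -v t="$THRESHOLD" 'BEGIN { exit !(p > t) }'; then
    echo "FAIL: $name changed by $delta (> ${THRESHOLD}%)"
    fail=1
  fi
done < /tmp/bench-delta.txt

echo "$fail" > /tmp/bench-check-status
[ "$fail" -eq 0 ] && echo "benchmark check passed" || echo "benchmark check failed"
```

In a workflow, exiting nonzero when `fail=1` would mark the job as failed.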
Hi @ahrtr, sorry, I forgot to say that this was in a PoC state; I was just looking to get the desired output. I'll clean this up once we decide what direction we want to follow.
(force-pushed from 7f35340 to ad4a6ae)
I've cleaned up the code and squashed commits. It's very similar to the original implementation from #691 but adds some configuration and a timeout so the job won't fail. However, what doesn't convince me is that, under the same conditions, the op/sec seems to be highly variable; for example, in the latest run (https://github.com/etcd-io/bbolt/actions/runs/9122610992), without changes to the code, the difference is 4.51%.
I could scan the results, check them against a threshold, and make the check fail. However, see my previous paragraph: I don't know whether this benchmark (or at least its op/sec number) is accurate or representative. I noticed locally that more runs give more accurate results, but Go benchmarks in this repository are slow. Running with a count of 5 (what benchstat suggests for 95% confidence) takes about 45 minutes per run (and it runs twice, to check both the base and the head of a PR). Would you say this is good enough for a first approach at benchmarking, @ahrtr? Document the markdown table, add a threshold, and get it going?
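For reference, a relative difference like the 4.51% mentioned above is just (base − head) / base. A minimal shell sketch of that arithmetic, with made-up ops/sec numbers (not taken from the linked run):

```shell
#!/bin/sh
# Hypothetical ops/sec readings; the real values come from the bench runs.
base=10000   # ops/sec on the base branch (made up)
head=9549    # ops/sec on the head branch (made up)

# Relative difference as a percentage of the base reading.
awk -v b="$base" -v h="$head" \
  'BEGIN { printf "diff: %.2f%%\n", (b - h) / b * 100 }' \
  | tee /tmp/bench-diff.txt
```

With these inputs the sketch prints `diff: 4.51%`, which is the kind of run-to-run noise being discussed.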
Thanks for the work. Overall, it looks good to me. We can continue to enhance it, or revert the change if we see any big problem in the future.
@ahrtr, do you have any suggestion on where to document the table? I was about to add it as part of the output, but I'm not sure whether you think it would be better just in the code.
It works for me. |
(force-pushed from fe29042 to 1250765)
@ahrtr, could you PTAL at this? I've set it as ready for review. I worked on what we agreed, and the results for this PR are here: https://github.com/etcd-io/bbolt/actions/runs/9229526551?pr=750
that's neat, can we get it to post the resulting MD table to a comment as well? |
@tjungblu, I was researching a GitHub Action that can comment (and update the original comment on new runs). I think it would be even better if we added a prow command (or a GitHub label) and only ran these benchmarks when the label exists... However, I first want to get this PR merged, then add the next round of improvements :)
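One possible shape for the comment-posting idea, sketched with the GitHub CLI rather than a marketplace Action. The PR number and file name are hypothetical, and a real workflow would need `gh` to be authenticated (e.g. via `GH_TOKEN`):

```shell
#!/bin/sh
# Hypothetical sketch: post (or update) the results table as a PR comment.
PR=750                 # made-up PR number
BODY=/tmp/results.md   # made-up file holding the markdown table
printf '## Benchmark results\n\n(benchstat table goes here)\n' > "$BODY"

if command -v gh >/dev/null 2>&1; then
  # --edit-last updates the previous comment instead of adding a new one;
  # fall back to creating a fresh comment if there is none yet.
  gh pr comment "$PR" --body-file "$BODY" --edit-last \
    || gh pr comment "$PR" --body-file "$BODY" \
    || echo "posting comment failed (is gh authenticated?)"
else
  echo "gh CLI not available; would post $BODY to PR #$PR"
fi
```

Updating the last comment keeps the PR thread from filling up with one comment per push.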
Please hold off on reviewing until I address the build failure.
also quickly cobbled together YCSB support in pingcap/go-ycsb#300 - also really interesting to see the perf differences between boltdb and bbolt :) Might be another alternative to the bench cmd, because the workloads are fairly well defined.
I like that YCSB has a defined set of benchmarks to run. However, I would still prefer to use our own `bench` command. That said, if you guys feel YCSB is a better direction, we can pursue that path :)
I don't mind tbh, in the end we should choose something representative for etcd and k8s as a benchmark profile. |
I think you have also just exposed that we don't benchmark deletes 😅. |
I tend to use our own `bench` command.
I did a PoC with our `bench` command. Running 10 iterations with no changes in the source code, the results are: […]. Running 10 iterations against the commit before merging #741: […]
Looks good. Regarding the parameters, please refer to #739 (comment). For example: […]
@ahrtr, I updated the pull request with these changes and to use our `bench` command.
This adds benchmarking using cmd/bbolt's bench, inspired by what is used in kube-state-metrics.

Co-authored-by: Manuel Rüger <manuel@rueg.eu>
Signed-off-by: Ivan Valdes <ivan@vald.es>
/lgtm |
@ivanvc Thank you. It looks good to me.
The code comes from @ivanvc in etcd-io/bbolt#750
This PR introduces benchmarking for pull requests, comparing the results against the base of the PR. It currently displays the results in the job summary. If the performance difference is greater than 5%, the job fails.
Supersedes #691.
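A minimal sketch of how results can land in the job summary: GitHub Actions exposes `GITHUB_STEP_SUMMARY` as a file path, and markdown appended to it is rendered on the run's summary page. Here a fixed temp path stands in for that variable so the snippet runs anywhere, and the table values are made up, not the PR's real output.

```shell
#!/bin/sh
# In a real workflow step you would append to "$GITHUB_STEP_SUMMARY";
# a fixed path is used here so the sketch is runnable locally.
SUMMARY=/tmp/step-summary.md

# Hypothetical results table; the real table comes from the bench runs.
{
  echo "## Benchmark results"
  echo ""
  echo "| benchmark | base ops/sec | head ops/sec | delta |"
  echo "|-----------|--------------|--------------|--------|"
  echo "| write     | 10000        | 9549         | -4.51% |"
} > "$SUMMARY"

cat "$SUMMARY"
```

Because the summary is just a file of markdown, the same table text could later be reused for the PR-comment idea discussed above.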