Expand cross-similarity benchmark to larger sizes and trim variants #137
scal444 merged 4 commits into NVIDIA-Digital-Bio:main
Replace the 100/1000/2000 molecule scan with a 2k–32k sweep driven by a shared fingerprint set generated once up front. Drop the multiprocess rdkit and nvmolkit CPU-collect benchmarks, keeping only the serial rdkit baseline and the nvmolkit GPU-only path. Cosine similarity is now opt-in via a `--cosine` flag, so default runs only time Tanimoto.
| Filename | Overview |
|---|---|
| benchmarks/cross_similarity_bench.py | Benchmark refactored to generate one shared fingerprint set up front, sweep 2k–32k sizes, drop the multiprocess and CPU-collect variants, and add a `--cosine` opt-in; the previous infinite-loop bug is fixed with a `not mols` guard. |
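The structure summarized above (one shared fingerprint set, a size sweep over prefixes of it, and an opt-in cosine metric) can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code in `benchmarks/cross_similarity_bench.py`: the fingerprints are random bit vectors rather than molecule fingerprints, the similarity functions are pure-Python stand-ins for the rdkit and nvmolkit paths, and every name here (`make_fingerprints`, `run_sweep`, `select_metrics`, the `--sizes` flag and its small defaults) is hypothetical. Only the `--cosine` flag name comes from the PR description.

```python
import argparse
import math
import random
import time


def popcount(x):
    """Number of set bits in a non-negative int."""
    return bin(x).count("1")


def make_fingerprints(count, n_bits=256, seed=0):
    """Stand-in data: random bit vectors stored as Python ints (assumption)."""
    rng = random.Random(seed)
    return [rng.getrandbits(n_bits) for _ in range(count)]


def tanimoto(a, b):
    """Tanimoto similarity: |a AND b| / |a OR b|."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def cosine(a, b):
    """Cosine similarity for bit vectors: |a AND b| / sqrt(|a| * |b|)."""
    denom = math.sqrt(popcount(a) * popcount(b))
    return popcount(a & b) / denom if denom else 0.0


def cross_similarity(fps, metric):
    """All-pairs similarity matrix (serial, O(n^2))."""
    return [[metric(a, b) for b in fps] for a in fps]


def select_metrics(include_cosine):
    """Tanimoto is always timed; cosine only when explicitly requested."""
    metrics = {"tanimoto": tanimoto}
    if include_cosine:
        metrics["cosine"] = cosine
    return metrics


def run_sweep(sizes, metrics, n_bits=256):
    """Generate the largest set once, then time each size on a prefix of it."""
    shared = make_fingerprints(max(sizes), n_bits)
    results = []
    for n in sizes:
        subset = shared[:n]  # reuse the shared set instead of regenerating
        for name, metric in metrics.items():
            start = time.perf_counter()
            cross_similarity(subset, metric)
            results.append((n, name, time.perf_counter() - start))
    return results


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Cross-similarity sweep (sketch)")
    parser.add_argument("--cosine", action="store_true",
                        help="also time cosine similarity (off by default)")
    # Hypothetical flag with small defaults so the pure-Python demo stays fast.
    parser.add_argument("--sizes", type=int, nargs="+", default=[64, 128, 256])
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    for n, name, seconds in run_sweep(args.sizes, select_metrics(args.cosine)):
        print(f"{n:>6} {name:<9} {seconds:.4f}s")
```

Calling `main([])` times Tanimoto only, while `main(["--cosine"])` adds the cosine rows. Slicing prefixes of one shared set keeps fingerprint generation out of the per-size cost, which is presumably the motivation for generating the set once up front.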
Reviews (3): Last reviewed commit: "Make slow CPU path singlular"
evasnow1992 left a comment
Changes look good to me. Thank you for cleaning up this benchmark script!
Replace the 100/1000/2000 molecule scan with a 2k–32k sweep driven by a shared fingerprint set generated once up front. Drop the multiprocess rdkit and nvmolkit CPU-collect benchmarks, keeping only the serial rdkit baseline and the nvmolkit GPU-only path (the others were for reference and experimentation). Cosine similarity is now opt-in via a `--cosine` flag, so default runs only time Tanimoto.