Skip to content

Expand cross-similarity benchmark to larger sizes and trim variants#137

Merged
scal444 merged 4 commits intoNVIDIA-Digital-Bio:mainfrom
scal444:benchmark_cleanup
Apr 27, 2026
Merged

Expand cross-similarity benchmark to larger sizes and trim variants#137
scal444 merged 4 commits intoNVIDIA-Digital-Bio:mainfrom
scal444:benchmark_cleanup

Conversation

@scal444
Copy link
Copy Markdown
Collaborator

@scal444 scal444 commented Apr 24, 2026

Replace the 100/1000/2000 molecule scan with a 2k-32k sweep driven by a shared fingerprint set generated once up front. Drop the multiprocess rdkit and nvmolkit CPU-collect benchmarks, keeping only the serial rdkit baseline and the nvmolkit GPU-only path (the others were for reference/experiment). Cosine similarity is now opt-in via a --cosine flag so default runs only time Tanimoto.

scal444 added 2 commits April 24, 2026 14:20
Replace the 100/1000/2000 molecule scan with a 2k-32k sweep driven by a
shared fingerprint set generated once up front. Drop the multiprocess
rdkit and nvmolkit CPU-collect benchmarks, keeping only the serial rdkit
baseline and the nvmolkit GPU-only path. Cosine similarity is now
opt-in via a --cosine flag so default runs only time Tanimoto.
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 24, 2026

Greptile Summary

This PR refactors the cross-similarity benchmark to generate one shared fingerprint set upfront for up to 32k molecules, then slice it per size — avoiding repeated fingerprint computation. The multiprocess rdkit and nvmolkit CPU-collect paths are removed, cosine similarity is now opt-in via --cosine, and the previously flagged infinite-loop risk is addressed with an explicit empty-mol guard and ValueError.

Confidence Score: 5/5

Safe to merge — clean benchmark-only refactor with no runtime logic or data-path changes.

No P0 or P1 findings. The infinite-loop concern from the previous review is resolved. The runner.args.values mutation pattern is synchronous with respect to bench_func calls, so per-benchmark value counts behave as intended. All SIZES values are ≤ max_size, so tensor slices are always in bounds.

No files require special attention.

Important Files Changed

Filename Overview
benchmarks/cross_similarity_bench.py Benchmark refactored to generate one shared fingerprint set up front, sweep 2k–32k sizes, drop multiprocess/cpu-collect variants, and add --cosine opt-in; previous infinite-loop bug is fixed with a not mols guard.

Reviews (3): Last reviewed commit: "Make slow CPU path singlular" | Re-trigger Greptile

Comment thread benchmarks/cross_similarity_bench.py
@scal444 scal444 requested a review from evasnow1992 April 24, 2026 19:47
Copy link
Copy Markdown
Collaborator

@evasnow1992 evasnow1992 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me. Thank you for cleaning up this benchmark script!

@scal444 scal444 merged commit 78e0042 into NVIDIA-Digital-Bio:main Apr 27, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants