Expand cross-similarity benchmark to larger sizes and trim variants by scal444 · Pull Request #137 · NVIDIA-Digital-Bio/nvMolKit

scal444 · 2026-04-24T18:23:47Z

Replace the 100/1000/2000 molecule scan with a 2k-32k sweep driven by a shared fingerprint set generated once up front. Drop the multiprocess rdkit and nvmolkit CPU-collect benchmarks, keeping only the serial rdkit baseline and the nvmolkit GPU-only path (the others were for reference/experiment). Cosine similarity is now opt-in via a --cosine flag so default runs only time Tanimoto.
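For readers unfamiliar with the two metrics, here is a minimal sketch of how an opt-in `--cosine` flag could gate which similarities get timed. It is not the code from `benchmarks/cross_similarity_bench.py`: the function names, the use of Python ints as bit fingerprints, and the 0.0 convention for empty fingerprints are all assumptions made for illustration.

```python
import argparse
import math


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity: |a & b| / |a | b| on bit fingerprints."""
    union = popcount(a | b)
    # Returning 0.0 for two empty fingerprints is a convention chosen for this sketch.
    return popcount(a & b) / union if union else 0.0


def cosine(a: int, b: int) -> float:
    """Cosine similarity: |a & b| / sqrt(|a| * |b|) on bit fingerprints."""
    denom = math.sqrt(popcount(a) * popcount(b))
    return popcount(a & b) / denom if denom else 0.0


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="cross-similarity benchmark sketch")
    # Off by default, so a plain run only times Tanimoto.
    parser.add_argument("--cosine", action="store_true",
                        help="also time cosine similarity")
    return parser.parse_args(argv)


def selected_metrics(args):
    """Map metric name to function for the metrics this run should time."""
    metrics = {"tanimoto": tanimoto}
    if args.cosine:
        metrics["cosine"] = cosine
    return metrics
```

With this layout, `selected_metrics(parse_args([]))` contains only Tanimoto, and passing `["--cosine"]` adds the second metric without changing the default timing set.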

greptile-apps · 2026-04-24T18:25:24Z

Greptile Summary

This PR refactors the cross-similarity benchmark to generate one shared fingerprint set up front for up to 32k molecules, then slice it per size, which avoids repeated fingerprint computation. The multiprocess rdkit and nvmolkit CPU-collect paths are removed, cosine similarity is now opt-in via --cosine, and the previously flagged infinite-loop risk is addressed with an explicit empty-mol guard that raises a ValueError.
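The generate-once, slice-per-size pattern described above can be sketched as follows. This is an illustration only: the exact `SIZES` steps, the random stand-in fingerprints, and the helper names are assumptions, since the PR text states only the 2k-32k range and that the real benchmark slices tensors.

```python
import random

# Assumed steps; the PR only states that the sweep covers 2k to 32k molecules.
SIZES = [2_000, 4_000, 8_000, 16_000, 32_000]


def generate_fingerprints(n, n_bits=1024, seed=42):
    """Stand-in for the expensive step: n random bit fingerprints as ints."""
    rng = random.Random(seed)
    return [rng.getrandbits(n_bits) for _ in range(n)]


def sweep(fingerprints, sizes):
    """Yield (size, prefix) pairs that all reuse the same fingerprint set."""
    for n in sizes:
        yield n, fingerprints[:n]


# Pay the generation cost once, for the largest size in the sweep.
shared = generate_fingerprints(max(SIZES))
subsets = dict(sweep(shared, SIZES))
```

Because every subset is a prefix of the same list, each smaller run measures a strict subset of the data used by the larger runs, which keeps timings comparable across sizes.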

Confidence Score: 5/5

Safe to merge: a clean, benchmark-only refactor with no runtime logic or data-path changes.

No P0 or P1 findings. The infinite-loop concern from the previous review is resolved. The runner.args.values mutation pattern is synchronous with respect to bench_func calls, so per-benchmark value counts behave as intended. All SIZES values are ≤ max_size, so tensor slices are always in bounds.

No files require special attention.
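The two safety points in this review can be illustrated with a small sketch. The loop shape is an assumption: the PR page does not show the original code, so `fill_to_size` is a hypothetical helper that repeats a molecule list until it reaches a target length, which is one way an empty input could loop forever. `checked_prefix` is likewise hypothetical and shows why it matters that every size is at most the generated maximum.

```python
def fill_to_size(mols, target):
    """Repeat `mols` until the result holds exactly `target` items."""
    # Without this guard, an empty `mols` would never grow `filled`,
    # and the while loop below would spin forever.
    if not mols:
        raise ValueError("no valid molecules to build the benchmark set from")
    filled = []
    while len(filled) < target:
        filled.extend(mols[: target - len(filled)])
    return filled


def checked_prefix(items, n):
    """Return the first n items, refusing sizes larger than the shared set."""
    # Slicing past the end silently returns fewer items, which would make
    # a benchmark report timings for the wrong problem size.
    if n > len(items):
        raise ValueError(f"requested {n} items but only {len(items)} exist")
    return items[:n]
```

Failing loudly in both cases is a design choice of this sketch: a benchmark that hangs or quietly shrinks its input is harder to diagnose than one that raises.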

Important Files Changed

Filename	Overview
benchmarks/cross_similarity_bench.py	Benchmark refactored to generate one shared fingerprint set up front, sweep 2k–32k sizes, drop multiprocess/cpu-collect variants, and add --cosine opt-in; previous infinite-loop bug is fixed with a `not mols` guard.

Reviews (3): Last reviewed commit: "Make slow CPU path singlular"

evasnow1992

Changes look good to me. Thank you for cleaning up this benchmark script!

scal444 added 2 commits April 24, 2026 14:20

Formatting

7ab3cec

greptile-apps Bot reviewed Apr 24, 2026

Comment thread benchmarks/cross_similarity_bench.py

scal444 added 2 commits April 24, 2026 14:30

Address greptile comment

70023da

Make slow CPU path singlular

1d59817

scal444 requested a review from evasnow1992 April 24, 2026 19:47

evasnow1992 approved these changes Apr 24, 2026

scal444 merged commit 78e0042 into NVIDIA-Digital-Bio:main Apr 27, 2026
10 checks passed

scal444 merged 4 commits into NVIDIA-Digital-Bio:main from scal444:benchmark_cleanup

2 participants
