Further support for Blackwell and L-class GPUs by scal444 · Pull Request #136 · NVIDIA-Digital-Bio/nvMolKit

scal444 · 2026-04-24T18:22:49Z

CC 89 GPUs (L40S) were left out of the fat build. They were originally supported through the 75 PTX, but we removed that at some point, so these GPUs were not working.

BMMA is now enabled for CC 10 and 12; I checked that it works.

greptile-apps · 2026-04-24T18:26:54Z

Greptile Summary

This PR adds 89-real to the fat-build arch list to restore L-class GPU (L40S/CC 89) support, enables BMMA tensor ops at runtime and compile time for Blackwell (sm_100, sm_120), removes the __CUDA_ARCH__ < 1000 upper bound in the similarity kernel guards, and fixes the getMaxThreadsPerSM value for consumer Blackwell (sm_120: 1536, not 2048). Open concerns flagged in the previous review round — the minor == 0 guard being too narrow for future Blackwell steppings and the PTX forward-compat gap for CUDA 12.8-only builds — remain unaddressed.

Confidence Score: 4/5

Safe to merge for current hardware; two concerns from the prior review round (minor-version guard and CUDA 12.8 PTX gap) remain open but do not affect production sm_100/sm_120 hardware today.

No new P0/P1 issues found in this pass. The logic for all three changed files is correct for the currently shipping Blackwell SKUs. The two previously flagged concerns (minor==0 narrowness and PTX forward-compat gap for CUDA 12.8 builds) are unresolved but are speculative/forward-looking rather than current breakage, keeping the score at the P1-ceiling of 4.

cmake/cuda_targets.cmake and src/similarity_kernels.cu carry the open forward-compat concerns from the prior review round.
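To make the minor-version concern concrete, here is a minimal host-side sketch. It is not the code from `src/similarity_kernels.cu`: the function names are hypothetical, and it assumes the runtime check receives the device's compute-capability major and minor numbers. It only contrasts a guard that requires `minor == 0` with one that keys on the major version alone.

```cpp
#include <cassert>

// Hypothetical sketch, not the actual nvMolKit implementation.
// Narrow form: only the x.0 steppings of majors 10 and 12 pass.
constexpr bool blackwellTensorOpsNarrow(int major, int minor) {
  return (major == 10 || major == 12) && minor == 0;
}

// Wider form: any minor revision of majors 10 and 12 passes.
constexpr bool blackwellTensorOpsByMajor(int major, int /*minor*/) {
  return major == 10 || major == 12;
}

// Both forms agree on the hardware named in the summary (10.0 and 12.0).
static_assert(blackwellTensorOpsNarrow(10, 0) && blackwellTensorOpsByMajor(10, 0));
static_assert(blackwellTensorOpsNarrow(12, 0) && blackwellTensorOpsByMajor(12, 0));
```

The two forms differ only for a device reporting a nonzero minor on major 10 or 12, which is the forward-looking case the review describes. Whether the wider form is safe also depends on which binaries the fat build ships, so this is an illustration of the concern rather than a recommended fix.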

Important Files Changed

| Filename | Overview |
| --- | --- |
| cmake/cuda_targets.cmake | Adds `89-real` to fix L-class GPU support and conditionally appends `100-real` (CUDA ≥ 12.8) and `120` with PTX (CUDA ≥ 12.9); also extends the cc loop for `100`/`120` preprocessor defines. PTX forward-compat gap for CUDA 12.8-only builds and the minor-version guard remain open from the previous review round. |
| src/similarity_kernels.cu | Extends `supportsTensorOps` to accept `major == 10` and `major == 12`, adds compile-time guards for CC_100/CC_120, and removes the `__CUDA_ARCH__ < 1000` upper bound in the Tanimoto/Cosine kernel preprocessor guards to enable BMMA on Blackwell. |
| src/substruct/substruct_kernels.cu | Correctly adds explicit sm_100 (2048 threads/SM) and sm_120 (1536 threads/SM) cases in `getMaxThreadsPerSM` before the `sm >= 90` catch-all, fixing the previously incorrect 2048 value that would have been returned for consumer Blackwell. |
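The ordering point in the `getMaxThreadsPerSM` row can be sketched as follows. This is a hedged illustration, not the function from `src/substruct/substruct_kernels.cu`: the name `getMaxThreadsPerSMSketch`, the `major * 10 + minor` encoding of `sm`, and the `-1` return for architectures below 90 are all assumptions made for the example. Only the 2048 and 1536 values and the `sm >= 90` catch-all come from the summary.

```cpp
#include <cassert>

// Hypothetical sketch. `sm` is assumed to be major * 10 + minor,
// e.g. 120 for compute capability 12.0.
constexpr int getMaxThreadsPerSMSketch(int sm) {
  if (sm == 120) return 1536;  // consumer Blackwell, per the summary
  if (sm == 100) return 2048;  // sm_100, per the summary
  if (sm >= 90) return 2048;   // catch-all described in the summary
  return -1;                   // older architectures are not covered by this sketch
}

// The explicit 120 case has to come before the catch-all; otherwise
// sm == 120 would satisfy `sm >= 90` and report 2048.
static_assert(getMaxThreadsPerSMSketch(120) == 1536);
static_assert(getMaxThreadsPerSMSketch(100) == 2048);
```

Placing the specific cases first keeps the catch-all as a default for anything not listed, which is why the fix is described as adding cases before it rather than changing the catch-all itself.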

Reviews (4): Last reviewed commit: "Remove the extra brace, I blame greptile"

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

evasnow1992

Changes look good to me. Thanks!

scal444 added 2 commits April 24, 2026 12:27

More blackwell support

ec5dca8

Cmake format

d38ef7b

greptile-apps Bot reviewed Apr 24, 2026

Comment thread src/similarity_kernels.cu Outdated

Comment thread cmake/cuda_targets.cmake

Update src/similarity_kernels.cu

a356f6a

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

greptile-apps Bot reviewed Apr 24, 2026

Comment thread src/similarity_kernels.cu Outdated

scal444 added 2 commits April 24, 2026 14:39

Clang-format

026a639

Remove the extra brace, I blame greptile

0f9d7ee

scal444 requested a review from evasnow1992 April 24, 2026 21:49

evasnow1992 approved these changes Apr 24, 2026

scal444 merged commit 3f1e221 into NVIDIA-Digital-Bio:main Apr 27, 2026
10 checks passed

scal444 merged 5 commits into NVIDIA-Digital-Bio:main from scal444:blackwell_offical
