Further support for Blackwell and L-class GPUs #136
scal444 merged 5 commits into NVIDIA-Digital-Bio:main
| Filename | Overview |
|---|---|
| cmake/cuda_targets.cmake | Adds 89-real to fix L-class GPU support and conditionally appends 100-real (CUDA ≥ 12.8) and 120 with PTX (CUDA ≥ 12.9); also extends the cc loop for 100/120 preprocessor defines. PTX forward-compat gap for CUDA 12.8-only builds and the minor-version guard remain open from the previous review round. |
| src/similarity_kernels.cu | Extends supportsTensorOps to accept major == 10 and major == 12, adds compile-time guards for CC_100/CC_120, and removes the __CUDA_ARCH__ < 1000 upper bound in the Tanimoto/Cosine kernel preprocessor guards to enable BMMA on Blackwell. |
| src/substruct/substruct_kernels.cu | Correctly adds explicit sm_100 (2048 threads/SM) and sm_120 (1536 threads/SM) cases in getMaxThreadsPerSM before the sm >= 90 catch-all, fixing the previously incorrect 2048 value that would have been returned for consumer Blackwell. |
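
The ordering issue described in the last table row can be illustrated with a minimal sketch. The function name, the signature, and the `major * 10 + minor` encoding of `sm` are assumptions made for illustration; the actual `getMaxThreadsPerSM` code is not shown in this summary. Only the 2048 and 1536 values for sm_100 and sm_120 and the `sm >= 90` catch-all come from the table.

```cpp
// Hypothetical sketch, not the code from src/substruct/substruct_kernels.cu.
// `sm` is assumed to be major * 10 + minor (e.g. 120 for CC 12.0).
constexpr int maxThreadsPerSmSketch(int sm) {
  // Exact matches must come before the range check below; otherwise
  // sm_120 would fall into the `sm >= 90` branch and report 2048.
  if (sm == 120) return 1536;  // consumer Blackwell
  if (sm == 100) return 2048;  // datacenter Blackwell
  if (sm >= 90) return 2048;   // catch-all for Hopper and newer
  // Older architectures are omitted in this sketch; the conservative
  // fallback value is an assumption, not taken from the PR.
  return 1024;
}
```

If the explicit `sm == 120` case were placed after the catch-all, it would never be reached, which is presumably how the incorrect 2048 value arose.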
Reviews (4): Last reviewed commit: "Remove the extra brace, I blame greptile"
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
evasnow1992 left a comment
Changes look good to me. Thanks!
CC 89 GPUs (L40s) were left out of the fat build. They were originally covered by the 75 PTX, but that was removed at some point, so these GPUs stopped working.
BMMA is now enabled for CC 10 and 12; I checked that it works.
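
As a rough illustration of the runtime side of this change, the sketch below shows a capability check that accepts major versions 10 and 12. The function name and signature are hypothetical, and the assumption that majors 8 and 9 were already accepted is a guess; the summary only states that `supportsTensorOps` was extended to accept `major == 10` and `major == 12`.

```cpp
// Hypothetical sketch, not the code from src/similarity_kernels.cu.
// `major` is the compute capability major version of the device.
constexpr bool supportsTensorOpsSketch(int major) {
  // Assumption: 8 and 9 were accepted before this PR.
  // 10 and 12 are the Blackwell majors added by this PR.
  return major == 8 || major == 9 || major == 10 || major == 12;
}
```

Listing the accepted majors explicitly, rather than using a range such as `major >= 8`, keeps an untested future architecture from silently taking the BMMA path.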