fix(gpu): support CUDA 13.x toolkits (CUmemLocation anon-union) #28
Merged
Building ferrotorch-gpu with CUDARC_CUDA_VERSION=13020 fails with `error[E0609]: no field 'id' on type CUmemLocation_st` at ferrotorch-gpu/src/graph.rs. In CUDA 13.x, cudarc's CUmemLocation moved `id` into an anonymous union (`__bindgen_anon_1.id`); the value semantics are identical. This is the only blocker -- the crate otherwise compiles cleanly at 13020.

The assignment is cfg-gated on a new `ferrotorch_cuda13` flag emitted by build.rs when the resolved CUDARC_CUDA_VERSION is >= 13000 (env var, else an `nvcc --version` probe). The default 12080 build is byte-for-byte unchanged.

Verified: compiles at both 12080 and 13020; runs natively on a CUDA 13.2 / RTX 2070 Super host with no CUDA-12 runtime shim -- init_cuda_backend() + cuBLAS GEMM, plus LayerNorm / attention / softmax / GELU, all match a PyTorch reference.

cuSOLVER / cuFFT / cuSPARSE under a 13.x pin still require their .so.13 libs (orthogonal to this change).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
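The cfg-gated assignment described in this commit message can be illustrated with a minimal sketch. It does not use cudarc: `sys`, `MemLocation`, `MemLocationAnon`, `zeroed`, `read_id`, and `set_device_id` are hypothetical stand-ins that only mimic the two field layouts, and the real graph.rs code may look different.

```rust
// Illustrative stand-ins only: the real structs are bindgen output inside
// cudarc. Build with `--cfg ferrotorch_cuda13` to select the 13.x branch.
// Without a matching `rustc-check-cfg`, rustc warns that the cfg name is
// unexpected.

#[cfg(not(ferrotorch_cuda13))]
mod sys {
    // CUDA 12.x shape: `id` is a plain field.
    #[repr(C)]
    pub struct MemLocation {
        pub kind: u32,
        pub id: i32,
    }

    pub fn zeroed() -> MemLocation {
        MemLocation { kind: 0, id: 0 }
    }

    pub fn read_id(loc: &MemLocation) -> i32 {
        loc.id
    }
}

#[cfg(ferrotorch_cuda13)]
mod sys {
    // CUDA 13.x shape: `id` lives inside an anonymous union.
    #[repr(C)]
    #[derive(Clone, Copy)]
    pub union MemLocationAnon {
        pub id: i32,
    }

    #[repr(C)]
    pub struct MemLocation {
        pub kind: u32,
        pub __bindgen_anon_1: MemLocationAnon,
    }

    pub fn zeroed() -> MemLocation {
        MemLocation { kind: 0, __bindgen_anon_1: MemLocationAnon { id: 0 } }
    }

    pub fn read_id(loc: &MemLocation) -> i32 {
        // Reading a union field is unsafe; `id` is the only variant here.
        unsafe { loc.__bindgen_anon_1.id }
    }
}

// The single assignment, gated on the cfg. Exactly one block is compiled.
fn set_device_id(loc: &mut sys::MemLocation, device: i32) {
    #[cfg(not(ferrotorch_cuda13))]
    {
        loc.id = device;
    }
    #[cfg(ferrotorch_cuda13)]
    {
        // Writing a `Copy` union field is safe in Rust.
        loc.__bindgen_anon_1.id = device;
    }
}
```

Because both branches store the same integer, callers see identical behavior whichever layout is compiled in, which is what "the value semantics are identical" amounts to in practice.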
Follow-up to PR forecast-bio#28. cuda_cusolver_compat::ensure() forces the CUDA-12.x libcusolver.so.11 to be resolved at runtime to supply the legacy cusolverDn* symbols the 12080-pinned cudarc dlopens. On a deliberate CUDA-13 build (ferrotorch_cuda13) that soname mismatch does not apply, the shim can never find a .so.11, and it only emits a misleading warning predicting a cusolverDnGeqrf panic. Gate the call on !cuda_version_at_least_13(), reusing the predicate PR forecast-bio#28 already added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
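The gating in this follow-up can be sketched as below. Everything here is a hypothetical stand-in: the real predicate `cuda_version_at_least_13()` takes no argument, the real shim is `cuda_cusolver_compat::ensure()`, and treating an unresolved version as 12.x is an assumption of this sketch rather than something the commit states.

```rust
// All names and signatures are illustrative, not the crate's actual API.

/// True when the resolved toolkit version is 13.0 or newer, using the
/// cudarc-style encoding (13020 means CUDA 13.2).
/// Assumption: an unresolved version (`None`) is treated as 12.x.
fn cuda_version_at_least_13(resolved: Option<u32>) -> bool {
    resolved.map_or(false, |v| v >= 13000)
}

/// Runs `shim` only for pre-13 builds and reports whether it ran.
fn run_cusolver_compat_if_needed(resolved: Option<u32>, shim: impl FnOnce()) -> bool {
    if cuda_version_at_least_13(resolved) {
        // CUDA 13 build: there is no libcusolver.so.11 to find, so skip
        // the shim and the misleading warning it would print.
        return false;
    }
    shim();
    true
}
```

Passing the shim as a closure keeps the sketch testable without a GPU; in the crate itself the call is presumably made directly.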
What
Building `ferrotorch-gpu` against a CUDA 13.x toolkit (`CUDARC_CUDA_VERSION=13020`) fails to compile with `error[E0609]: no field 'id' on type CUmemLocation_st` in ferrotorch-gpu/src/graph.rs.
In CUDA 13.x, cudarc's `CUmemLocation` moved `id` into an anonymous union (`__bindgen_anon_1.id`). The value semantics are identical. This is the only thing blocking a CUDA-13 build; the rest of the crate compiles cleanly at `13020`.

Fix
- graph.rs: the single assignment is cfg-gated: `props.location.id` on CUDA 12.x, `props.location.__bindgen_anon_1.id` on CUDA 13.x.
- build.rs: emits a new `ferrotorch_cuda13` cfg when the resolved CUDA version is >= 13000 (reads `CUDARC_CUDA_VERSION`, falling back to an `nvcc --version` probe), plus the matching `rustc-check-cfg`.
The default `12080` build (pinned in `.cargo/config.toml`) never sets the cfg, so the 12.x path is byte-for-byte unchanged.

Verification
On a CUDA 13.2 / RTX 2070 Super host (Arch, driver 595.71, nvidia-open):
- `cargo build -p ferrotorch-gpu --features cuda`: clean at both `CUDARC_CUDA_VERSION=12080` and `13020`.
- The `13020` build runs natively with no CUDA-12 runtime shim (system `/opt/cuda` cuBLAS/cudart `.so.13`): `init_cuda_backend()` succeeds, and a small MLP plus a GPT-2-small-shaped transformer (cuBLAS GEMM, LayerNorm, attention/softmax, GELU) produce outputs matching a PyTorch fp32 reference to `max_abs ~= 8e-6`.

Note / scope
This covers the cuBLAS + elementwise/attention path. cuSOLVER / cuFFT / cuSPARSE under a 13.x pin still expect their `.so.13` libraries to be present. That is orthogonal to this change and is called out only to make the scope clear.
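For reference, the build.rs behavior described under Fix (read `CUDARC_CUDA_VERSION`, fall back to an `nvcc --version` probe, emit the cfg at >= 13000) might look roughly like the sketch below. The function names are hypothetical and the parsing is an assumption about nvcc's usual `release X.Y` banner; the actual build.rs in this PR may be structured differently.

```rust
use std::env;
use std::process::Command;

/// Converts the "release 13.2" fragment of `nvcc --version` output into the
/// cudarc-style number (13.2 becomes 13020). Assumes the usual banner format.
fn parse_nvcc_release(output: &str) -> Option<u32> {
    let rest = output.split("release ").nth(1)?;
    let ver = rest.split(|c: char| c == ',' || c.is_whitespace()).next()?;
    let mut parts = ver.split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next().unwrap_or("0").parse().ok()?;
    Some(major * 1000 + minor * 10)
}

/// The env var value wins when it parses; otherwise use the nvcc banner.
fn resolve_cuda_version(env_value: Option<&str>, nvcc_output: Option<&str>) -> Option<u32> {
    env_value
        .and_then(|v| v.trim().parse::<u32>().ok())
        .or_else(|| nvcc_output.and_then(parse_nvcc_release))
}

/// Runs `nvcc --version`; returns None if nvcc is missing or fails to start.
fn probe_nvcc() -> Option<String> {
    let out = Command::new("nvcc").arg("--version").output().ok()?;
    String::from_utf8(out.stdout).ok()
}

/// Cargo directives to print: always declare the cfg name so rustc does not
/// warn about it, and set it only for CUDA 13.0 or newer.
fn cfg_directives(version: Option<u32>) -> Vec<String> {
    let mut lines = vec!["cargo:rustc-check-cfg=cfg(ferrotorch_cuda13)".to_string()];
    if version.map_or(false, |v| v >= 13000) {
        lines.push("cargo:rustc-cfg=ferrotorch_cuda13".to_string());
    }
    lines
}

/// Intended to be called from `main` in build.rs.
fn emit_cuda_cfg() {
    let env_value = env::var("CUDARC_CUDA_VERSION").ok();
    // Only probe nvcc when the env var is absent.
    let nvcc = if env_value.is_none() { probe_nvcc() } else { None };
    let version = resolve_cuda_version(env_value.as_deref(), nvcc.as_deref());
    for line in cfg_directives(version) {
        println!("{line}");
    }
}
```

Keeping the parsing and the directive list as pure functions means the version logic can be unit-tested without a CUDA toolkit installed; only `probe_nvcc` and `emit_cuda_cfg` touch the environment.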