fix(gpu): support CUDA 13.x toolkits (CUmemLocation anon-union)#28

Merged: dollspace-gay merged 2 commits into forecast-bio:main from cynd22:fix/cuda13-support on May 31, 2026

cynd22 (Contributor) commented May 30, 2026

What

Building ferrotorch-gpu against a CUDA 13.x toolkit (CUDARC_CUDA_VERSION=13020) fails to compile:

error[E0609]: no field `id` on type `CUmemLocation_st`
   --> ferrotorch-gpu/src/graph.rs:149
        props.location.id = device;

In CUDA 13.x, cudarc's CUmemLocation moved id into an anonymous union (__bindgen_anon_1.id). The value semantics are identical. This is the only thing blocking a CUDA-13 build — the rest of the crate compiles cleanly at 13020.

Fix

  • graph.rs: the single assignment is cfg-gated — props.location.id on CUDA 12.x, props.location.__bindgen_anon_1.id on CUDA 13.x.
  • build.rs: emits a new ferrotorch_cuda13 cfg when the resolved CUDA version is >= 13000 (reads CUDARC_CUDA_VERSION, falling back to an nvcc --version probe), plus the matching rustc-check-cfg.
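
As a rough illustration of the cfg gate described in the graph.rs bullet, the sketch below uses stand-in types that only mimic where `id` lives in each layout. `MemLocation`, `MemLocationAnon`, `set_location_id`, and `location_id` are hypothetical names for this example, not the cudarc definitions or the actual ferrotorch-gpu code.

```rust
// Stand-ins for the two bindgen layouts; NOT the real cudarc types.

// CUDA 12.x shape: `id` is a direct field.
#[cfg(not(ferrotorch_cuda13))]
struct MemLocation {
    id: i32,
}

// CUDA 13.x shape: `id` sits inside an anonymous union.
#[cfg(ferrotorch_cuda13)]
#[derive(Clone, Copy)]
union MemLocationAnon {
    id: i32,
}

#[cfg(ferrotorch_cuda13)]
struct MemLocation {
    __bindgen_anon_1: MemLocationAnon,
}

#[cfg(not(ferrotorch_cuda13))]
fn new_location() -> MemLocation {
    MemLocation { id: 0 }
}

#[cfg(ferrotorch_cuda13)]
fn new_location() -> MemLocation {
    MemLocation { __bindgen_anon_1: MemLocationAnon { id: 0 } }
}

/// Write the device ordinal through whichever path the active layout exposes.
fn set_location_id(loc: &mut MemLocation, device: i32) {
    #[cfg(not(ferrotorch_cuda13))]
    {
        loc.id = device;
    }
    #[cfg(ferrotorch_cuda13)]
    {
        // Writing a `Copy` union field is safe; only reads need `unsafe`.
        loc.__bindgen_anon_1.id = device;
    }
}

#[cfg(not(ferrotorch_cuda13))]
fn location_id(loc: &MemLocation) -> i32 {
    loc.id
}

#[cfg(ferrotorch_cuda13)]
fn location_id(loc: &MemLocation) -> i32 {
    // Reading a union field requires `unsafe`.
    unsafe { loc.__bindgen_anon_1.id }
}
```

Compiled without the cfg, only the 12.x items exist, which mirrors the claim that the default build is untouched. Passing `--cfg ferrotorch_cuda13` to rustc selects the union path instead.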

The default 12080 build (pinned in .cargo/config.toml) never sets the cfg, so the 12.x path is byte-for-byte unchanged.
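
The version resolution and cfg emission can be sketched roughly as follows. This is an assumption-laden outline, not the PR's build.rs: the helper names (`parse_nvcc_release`, `resolve_cuda_version`, `cfg_directives`) are hypothetical, the `cargo::` directive syntax assumes a reasonably recent Cargo, and the `major * 1000 + minor * 10` encoding is inferred from 12080 and 13020.

```rust
/// Parse `nvcc --version` text containing "release <major>.<minor>," into the
/// CUDARC_CUDA_VERSION-style encoding (major * 1000 + minor * 10).
fn parse_nvcc_release(output: &str) -> Option<u32> {
    let rest = output.split("release ").nth(1)?;
    let version = rest.split(',').next()?.trim();
    let (major, minor) = version.split_once('.')?;
    let major: u32 = major.trim().parse().ok()?;
    let minor: u32 = minor.trim().parse().ok()?;
    Some(major * 1000 + minor * 10)
}

/// Prefer the env var value; fall back to the nvcc probe if it is absent
/// or does not parse as an integer (the fallback on parse failure is an
/// assumption of this sketch).
fn resolve_cuda_version(env_value: Option<&str>, nvcc_output: Option<&str>) -> Option<u32> {
    env_value
        .and_then(|v| v.trim().parse::<u32>().ok())
        .or_else(|| nvcc_output.and_then(parse_nvcc_release))
}

/// Lines a build script would print: always declare the cfg for check-cfg,
/// and set it only when the resolved version is >= 13000.
fn cfg_directives(version: Option<u32>) -> Vec<String> {
    let mut lines = vec!["cargo::rustc-check-cfg=cfg(ferrotorch_cuda13)".to_string()];
    if matches!(version, Some(v) if v >= 13000) {
        lines.push("cargo::rustc-cfg=ferrotorch_cuda13".to_string());
    }
    lines
}
```

In a real build script the inputs would presumably come from `std::env::var("CUDARC_CUDA_VERSION")` and the stdout of `std::process::Command::new("nvcc").arg("--version")`, with each returned line passed to `println!`. With 12080 resolved, only the check-cfg line is emitted, so the cfg is never set.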

Verification

On a CUDA 13.2 / RTX 2070 Super host (Arch, driver 595.71, nvidia-open):

  • cargo build -p ferrotorch-gpu --features cuda — clean at both CUDARC_CUDA_VERSION=12080 and 13020.
  • At 13020, runs natively with no CUDA-12 runtime shim (system /opt/cuda cuBLAS/cudart .so.13): init_cuda_backend() succeeds, and a small MLP plus a GPT-2-small-shaped transformer (cuBLAS GEMM, LayerNorm, attention/softmax, GELU) produce outputs matching a PyTorch fp32 reference to max_abs ~= 8e-6.
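
For readers who want to reproduce the max_abs comparison, a minimal sketch of the metric is below. `max_abs_diff` is a hypothetical helper, not part of ferrotorch-gpu, and it assumes both outputs have already been copied to host memory as flat `f32` slices.

```rust
/// Largest element-wise absolute difference between two equal-length slices.
/// Returns `None` when the lengths differ. NaN differences are skipped,
/// because `f32::max` returns the non-NaN operand.
fn max_abs_diff(actual: &[f32], reference: &[f32]) -> Option<f32> {
    if actual.len() != reference.len() {
        return None;
    }
    Some(
        actual
            .iter()
            .zip(reference)
            .map(|(a, r)| (a - r).abs())
            .fold(0.0_f32, f32::max),
    )
}
```

A check in the spirit of the figure quoted above would compare the returned value against a small tolerance. Because NaNs are ignored here, a separate finiteness check on the outputs is advisable.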

Note / scope

This covers the cuBLAS + elementwise/attention path. cuSOLVER / cuFFT / cuSPARSE under a 13.x pin still expect their .so.13 libraries to be present — orthogonal to this change, just calling it out so the scope is clear.

Building ferrotorch-gpu with CUDARC_CUDA_VERSION=13020 fails with
`error[E0609]: no field 'id' on type CUmemLocation_st` at
ferrotorch-gpu/src/graph.rs. In CUDA 13.x, cudarc's CUmemLocation moved
`id` into an anonymous union (`__bindgen_anon_1.id`); the value semantics
are identical. This is the only blocker -- the crate otherwise compiles
cleanly at 13020.

The assignment is cfg-gated on a new `ferrotorch_cuda13` flag emitted by
build.rs when the resolved CUDARC_CUDA_VERSION is >= 13000 (env var, else
an `nvcc --version` probe). The default 12080 build is byte-for-byte
unchanged.

Verified: compiles at both 12080 and 13020; runs natively on a CUDA 13.2
/ RTX 2070 Super host with no CUDA-12 runtime shim -- init_cuda_backend()
+ cuBLAS GEMM, plus LayerNorm / attention / softmax / GELU, all match a
PyTorch reference. cuSOLVER / cuFFT / cuSPARSE under a 13.x pin still
require their .so.13 libs (orthogonal to this change).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

dollspace-gay self-assigned this May 31, 2026

Follow-up to PR forecast-bio#28. cuda_cusolver_compat::ensure() forces the CUDA-12.x
libcusolver.so.11 to be resolved at runtime to supply the legacy
cusolverDn* symbols the 12080-pinned cudarc dlopens. On a deliberate
CUDA-13 build (ferrotorch_cuda13) that soname mismatch does not apply,
the shim can never find a .so.11, and it only emits a misleading warning
predicting a cusolverDnGeqrf panic. Gate the call on
!cuda_version_at_least_13() — reusing the predicate PR forecast-bio#28 already added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
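
The gating in the follow-up commit might look roughly like the sketch below. The real `cuda_version_at_least_13()` signature and the way it is wired to `cuda_cusolver_compat::ensure()` are not shown in this thread, so the argument-taking predicate and `maybe_run_cusolver_shim` are hypothetical. Treating an unresolved version as pre-13 is also an assumption.

```rust
/// Hypothetical form of the predicate: true when the resolved version
/// (CUDARC_CUDA_VERSION-style integer) is 13000 or higher.
fn cuda_version_at_least_13(resolved: Option<u32>) -> bool {
    matches!(resolved, Some(v) if v >= 13000)
}

/// Run the compatibility shim only for pre-13 builds.
/// Returns whether the shim was invoked.
fn maybe_run_cusolver_shim(resolved: Option<u32>, ensure: impl FnOnce()) -> bool {
    if cuda_version_at_least_13(resolved) {
        // CUDA 13 build: the .so.11 lookup cannot succeed, so skip it.
        return false;
    }
    ensure();
    true
}
```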

dollspace-gay merged commit b3a4320 into forecast-bio:main on May 31, 2026
1 check passed