Add cuda-13 build feature; make --force-gpu truly force the GPU #133

Merged: raphaelsty merged 2 commits into main from feat/cuda-13-build-option on Jun 16, 2026

@raphaelsty

Collaborator

Depends on lightonai/fastkmeans-rs#9 — merge & publish fastkmeans-rs 1.0.7 to crates.io before this (CI here can't resolve the dep until then).

1. cuda-13 build option

cudarc binds exactly one CUDA major version at compile time (cuBLAS/NVRTC SONAMEs .so.12 vs .so.13 + bindings differ), so one binary can't serve both CUDA 12 and 13. We expose the choice as a feature instead of hard-pinning cuda-11080:

| feature | CUDA | notes |
|---|---|---|
| cuda | 11.8 / 12.x | unchanged; existing users/CI unaffected |
| cuda-13 | 13.x | new; hosts where only .so.13 exists (driver 580 / toolkit 13.x) |
| _cuda | | internal gate, compiles the CUDA code; don't enable directly |

  • cudarc's dependency line no longer pins a version; cuda / cuda-13 select it plus the matching fastkmeans-rs feature.
  • All next-plaid + colgrep #[cfg(feature = "cuda")] gates now key off _cuda, so they build under either major (next-plaid-onnx is unchanged — the ONNX EP is version-agnostic at build time).
  • Wired through next-plaid, colgrep and next-plaid-api. Enabling both cuda and cuda-13 fails to compile by design.

Build matrix validated: cargo build for default / cuda / cuda-13 all pass; cuda,cuda-13 fails as expected.
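
The feature layout above could be sketched in Rust roughly as follows. This is an illustration under assumptions, not the PR's code: the feature names come from the table, but `compiled_backend` and `cuda_code_compiled` are hypothetical helpers, and the `compile_error!` guard is only one common way to make a `cuda,cuda-13` build fail (the PR does not say which mechanism it relies on).

```rust
// Illustrative sketch only: feature names follow the PR description, but the
// function names are hypothetical and this is not the PR's actual code.

// Refuse to build when both public features are enabled at once.
#[cfg(all(feature = "cuda", feature = "cuda-13"))]
compile_error!("features `cuda` and `cuda-13` are mutually exclusive; enable exactly one");

/// Names the CUDA major this binary was compiled against, or "cpu" if none.
pub fn compiled_backend() -> &'static str {
    if cfg!(feature = "cuda-13") {
        "cuda 13.x"
    } else if cfg!(feature = "cuda") {
        "cuda 11.8 / 12.x"
    } else {
        "cpu"
    }
}

/// GPU-only code keys off the internal `_cuda` gate, so it builds under
/// either public feature without naming a CUDA major.
#[cfg(feature = "_cuda")]
pub fn cuda_code_compiled() -> bool {
    true
}

/// Fallback used when neither public CUDA feature is enabled.
#[cfg(not(feature = "_cuda"))]
pub fn cuda_code_compiled() -> bool {
    false
}
```

Built with no features, `compiled_backend()` returns `"cpu"` and `cuda_code_compiled()` returns `false`. In a real crate, `cuda` and `cuda-13` would each enable `_cuda` in `Cargo.toml`, which is what lets the gated code stay agnostic about the CUDA major.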

2. --force-gpu must not silently run on CPU

ORT registers execution providers best-effort — a CUDA EP that fails to initialize is silently dropped and the session quietly runs on CPU, so --force-gpu could still encode on the CPU. Now, when the GPU is forced:

  • the CUDA EP is registered with error_on_failure() (ORT errors instead of falling back), and
  • the CUDA-init panic branch hard-errors instead of returning a CPU builder.
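
The intended behaviour can be summarised as a small decision function. This is a minimal sketch under assumptions: `Backend` and `select_backend` are hypothetical names, the CUDA initialization result is passed in as a plain `Result` instead of calling ORT, and only the error text is taken from the validation output in this PR.

```rust
// Illustrative sketch only: models the fallback policy, not the ORT calls.

/// Where encoding ends up running (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Backend {
    Cuda,
    Cpu,
}

/// Picks a backend from the outcome of CUDA execution-provider init.
/// `cuda_init` stands in for whatever the real initialization returns.
pub fn select_backend(
    force_gpu: bool,
    cuda_init: Result<(), String>,
) -> Result<Backend, String> {
    match (cuda_init, force_gpu) {
        // CUDA came up: use it whether or not it was forced.
        (Ok(()), _) => Ok(Backend::Cuda),
        // CUDA failed and the GPU was forced: surface a hard error.
        (Err(cause), true) => Err(format!(
            "FORCE_GPU is set, but the CUDA execution provider was not initialized: {cause}"
        )),
        // CUDA failed without forcing: best-effort CPU fallback is kept.
        (Err(_), false) => Ok(Backend::Cpu),
    }
}
```

Only the forced-and-failed case changes: without `--force-gpu` the best-effort CPU fallback is still allowed, which keeps existing non-forced runs working on hosts without a usable GPU.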

Validation (CUDA-13 H100, driver 580.126.09)

  • colgrep init … --force-gpu (built --features cuda-13) → 🤖 Model: … (CUDA), indexes on GPU, no fallback.
  • CUDA_VISIBLE_DEVICES="" … --force-gpu → hard error (FORCE_GPU is set, but the CUDA execution provider was not initialized), no silent CPU run.

Follow-ups (not in this PR)

  • next-plaid-api/Dockerfile CUDA image still targets CUDA 12; a cuda-13 image would use the new feature.
  • Release CI could ship a second colgrep artifact built with cuda-13.

🤖 Generated with Claude Code

Raphael Sourty and others added 2 commits June 16, 2026 19:43
## cuda-13 build option

cudarc binds exactly one CUDA major version at compile time, so a single binary
cannot serve both CUDA 12 and CUDA 13 hosts. Expose the choice as a feature
instead of hard-pinning it:

  * `cuda`    -> CUDA 11.8 / 12.x  (default, unchanged behaviour)
  * `cuda-13` -> CUDA 13.x         (hosts where only .so.13 libraries exist)
  * `_cuda`   -> internal gate that compiles the CUDA code; reached via either
                 public feature (do not enable directly)

cudarc no longer pins a version in its dependency line; `cuda` / `cuda-13`
select it (and the matching `fastkmeans-rs` feature). All `next-plaid` and
`colgrep` `#[cfg(feature = "cuda")]` gates now key off `_cuda` so they compile
under either major. Enabling both `cuda` and `cuda-13` fails by design. Applied
across next-plaid, colgrep and next-plaid-api; requires fastkmeans-rs >= 1.0.7
(companion PR lightonai/fastkmeans-rs#9).

## --force-gpu must not silently fall back to CPU

ORT registers execution providers best-effort: a CUDA EP that fails to
initialize is silently dropped and the session quietly runs on the CPU, so
`--force-gpu` could still encode on the CPU. Mark the CUDA EP
`error_on_failure()` when the GPU is forced, and turn the CUDA-init panic branch
into a hard error under force-gpu instead of returning a CPU builder.

Verified on a CUDA-13 H100 host: `--force-gpu` indexes on CUDA, and hard-errors
(no CPU fallback) when the GPU is hidden.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@raphaelsty raphaelsty merged commit b043c63 into main Jun 16, 2026
20 checks passed
raphaelsty pushed a commit that referenced this pull request Jun 16, 2026
Patch release. Includes the `cuda-13` build feature and the `--force-gpu`
hardening (#133). The default/curl build remains CUDA 11.8/12.x compatible;
CUDA 13 is opt-in via `--features cuda-13`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>