Add cuda-13 build feature; make --force-gpu truly force the GPU#133
Merged
Conversation
## cuda-13 build option
cudarc binds exactly one CUDA major version at compile time, so a single binary
cannot serve both CUDA 12 and CUDA 13 hosts. Expose the choice as a feature
instead of hard-pinning it:
* `cuda` -> CUDA 11.8 / 12.x (default, unchanged behaviour)
* `cuda-13` -> CUDA 13.x (hosts where only .so.13 libraries exist)
* `_cuda` -> internal gate that compiles the CUDA code; reached via either
public feature (do not enable directly)
cudarc no longer pins a version in its dependency line; `cuda` / `cuda-13`
select it (and the matching `fastkmeans-rs` feature). All `next-plaid` and
`colgrep` `#[cfg(feature = "cuda")]` gates now key off `_cuda` so they compile
under either major. Enabling both `cuda` and `cuda-13` fails by design. Applied
across next-plaid, colgrep and next-plaid-api; requires fastkmeans-rs >= 1.0.7
(companion PR lightonai/fastkmeans-rs#9).
## --force-gpu must not silently fall back to CPU
ORT registers execution providers best-effort: a CUDA EP that fails to
initialize is silently dropped and the session quietly runs on the CPU, so
`--force-gpu` could still encode on the CPU. Mark the CUDA EP
`error_on_failure()` when the GPU is forced, and turn the CUDA-init panic branch
into a hard error under force-gpu instead of returning a CPU builder.
Verified on a CUDA-13 H100 host: `--force-gpu` indexes on CUDA, and hard-errors
(no CPU fallback) when the GPU is hidden.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
raphaelsty
pushed a commit
that referenced
this pull request
Jun 16, 2026
Patch release. Includes the `cuda-13` build feature and the `--force-gpu` hardening (#133). The default/curl build remains CUDA 11.8/12.x compatible; CUDA 13 is opt-in via `--features cuda-13`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Depends on lightonai/fastkmeans-rs#9 — merge & publish
fastkmeans-rs 1.0.7to crates.io before this (CI here can't resolve the dep until then).1.
cuda-13build optioncudarcbinds exactly one CUDA major version at compile time (cuBLAS/NVRTC SONAMEs.so.12vs.so.13+ bindings differ), so one binary can't serve both CUDA 12 and 13. We expose the choice as a feature instead of hard-pinningcuda-11080:cudacuda-13.so.13exists (driver 580 / toolkit 13.x)_cudacuda/cuda-13select it plus the matchingfastkmeans-rsfeature.next-plaid+colgrep#[cfg(feature = "cuda")]gates now key off_cuda, so they build under either major (next-plaid-onnxis unchanged — the ONNX EP is version-agnostic at build time).next-plaid,colgrepandnext-plaid-api. Enabling bothcudaandcuda-13fails to compile by design.Build matrix validated:
cargo buildfor default /cuda/cuda-13all pass;cuda,cuda-13fails as expected.2.
--force-gpumust not silently run on CPUORT registers execution providers best-effort — a CUDA EP that fails to initialize is silently dropped and the session quietly runs on CPU, so
--force-gpucould still encode on the CPU. Now, when the GPU is forced:error_on_failure()(ORT errors instead of falling back), andValidation (CUDA-13 H100, driver 580.126.09)
colgrep init … --force-gpu(built--features cuda-13) →🤖 Model: … (CUDA), indexes on GPU, no fallback.CUDA_VISIBLE_DEVICES="" … --force-gpu→ hard error (FORCE_GPU is set, but the CUDA execution provider was not initialized), no silent CPU run.Follow-ups (not in this PR)
next-plaid-api/DockerfileCUDA image still targets CUDA 12; acuda-13image would use the new feature.colgrepartifact built withcuda-13.🤖 Generated with Claude Code