Add `cuda-13` build feature; make `--force-gpu` truly force the GPU by raphaelsty · Pull Request #133 · lightonai/next-plaid

raphaelsty · 2026-06-16T19:44:17Z

Depends on lightonai/fastkmeans-rs#9 — merge & publish fastkmeans-rs 1.0.7 to crates.io before this (CI here can't resolve the dep until then).

1. `cuda-13` build option

cudarc binds exactly one CUDA major version at compile time (cuBLAS/NVRTC SONAMEs .so.12 vs .so.13 + bindings differ), so one binary can't serve both CUDA 12 and 13. We expose the choice as a feature instead of hard-pinning cuda-11080:

| feature | CUDA | notes |
|---|---|---|
| `cuda` | 11.8 / 12.x | unchanged; existing users/CI unaffected |
| `cuda-13` | 13.x | new; for hosts where only `.so.13` exists (driver 580 / toolkit 13.x) |
| `_cuda` | n/a | internal gate, compiles the CUDA code; don't enable directly |
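As a rough illustration of how the table maps onto a host, the sketch below suggests a feature from the cuBLAS library file names found on a machine. `suggest_feature` is a hypothetical helper, not part of next-plaid, and it assumes the usual `libcublas.so.<major>` SONAMEs; preferring the newest major when several are present is also an assumption.

```rust
/// Suggests which public feature to build with, given the cuBLAS library
/// file names found on a host. Hypothetical helper, not part of next-plaid.
pub fn suggest_feature(lib_names: &[&str]) -> Option<&'static str> {
    // True when the exact SONAME appears in the provided list.
    let has = |soname: &str| lib_names.iter().any(|name| *name == soname);

    if has("libcublas.so.13") {
        // Only the `cuda-13` feature binds against the .so.13 libraries.
        Some("cuda-13")
    } else if has("libcublas.so.12") || has("libcublas.so.11") {
        // The existing `cuda` feature covers CUDA 11.8 and 12.x.
        Some("cuda")
    } else {
        // No known cuBLAS library: a CPU-only build is the safe choice.
        None
    }
}
```

A caller would collect the file names from its library directories and pass them in; the function itself does no filesystem access.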

- cudarc's dependency line no longer pins a version; `cuda` / `cuda-13` select it plus the matching fastkmeans-rs feature.
- All next-plaid + colgrep `#[cfg(feature = "cuda")]` gates now key off `_cuda`, so they build under either major (next-plaid-onnx is unchanged; the ONNX EP is version-agnostic at build time).
- Wired through next-plaid, colgrep and next-plaid-api. Enabling both `cuda` and `cuda-13` fails to compile by design.
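The gating described above can be pictured with a minimal sketch. It assumes both public features enable the internal `_cuda` feature in `Cargo.toml`; the `compile_error!` guard and the `compiled_backend` function are illustrative only, and the PR may reject the `cuda,cuda-13` combination by a different mechanism.

```rust
// Hypothetical guard: refuse to build when both public CUDA features are on.
#[cfg(all(feature = "cuda", feature = "cuda-13"))]
compile_error!("features `cuda` and `cuda-13` are mutually exclusive; enable only one");

/// Compiled only when the internal `_cuda` gate is active, i.e. under
/// either `cuda` or `cuda-13`.
#[cfg(feature = "_cuda")]
pub fn compiled_backend() -> &'static str {
    "cuda"
}

/// CPU-only variant compiled when neither CUDA feature is enabled.
#[cfg(not(feature = "_cuda"))]
pub fn compiled_backend() -> &'static str {
    "cpu"
}
```

Keying the code off a single internal gate means call sites never need to know which CUDA major was selected; only the dependency features differ.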

Build matrix validated: cargo build for default / cuda / cuda-13 all pass; cuda,cuda-13 fails as expected.

2. `--force-gpu` must not silently run on CPU

ORT registers execution providers best-effort — a CUDA EP that fails to initialize is silently dropped and the session quietly runs on CPU, so --force-gpu could still encode on the CPU. Now, when the GPU is forced:

- the CUDA EP is registered with `error_on_failure()` (ORT errors instead of falling back), and
- the CUDA-init panic branch hard-errors instead of returning a CPU builder.
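The two rules above can be modelled in a standalone sketch. It does not use the ORT API: `Provider`, `try_init_cuda` and `select_provider` are hypothetical names, and the CUDA initialization is stood in for by a closure so the fallback logic can be exercised without a GPU.

```rust
use std::panic;

/// Execution provider the session ends up using (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Provider {
    Cuda,
    Cpu,
}

/// Runs a CUDA init closure, turning a panic into an ordinary error.
pub fn try_init_cuda<F>(init: F) -> Result<(), String>
where
    F: FnOnce() -> Result<(), String> + panic::UnwindSafe,
{
    match panic::catch_unwind(init) {
        Ok(result) => result,
        Err(_) => Err("CUDA initialization panicked".to_string()),
    }
}

/// Picks a provider: CPU fallback is allowed only when the GPU is not forced.
pub fn select_provider(
    force_gpu: bool,
    cuda_init: Result<(), String>,
) -> Result<Provider, String> {
    match (cuda_init, force_gpu) {
        // CUDA came up: use it regardless of the flag.
        (Ok(()), _) => Ok(Provider::Cuda),
        // CUDA failed and the GPU was forced: surface a hard error.
        (Err(cause), true) => Err(format!(
            "FORCE_GPU is set, but the CUDA execution provider was not initialized: {cause}"
        )),
        // CUDA failed but was optional: fall back to the CPU.
        (Err(_), false) => Ok(Provider::Cpu),
    }
}
```

For example, `select_provider(true, try_init_cuda(|| panic!("driver missing")))` returns an `Err`, while the same failure with `force_gpu = false` yields `Ok(Provider::Cpu)`.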

Validation (CUDA-13 H100, driver 580.126.09)

- `colgrep init … --force-gpu` (built with `--features cuda-13`) → `🤖 Model: … (CUDA)`, indexes on GPU, no fallback.
- `CUDA_VISIBLE_DEVICES="" … --force-gpu` → hard error (`FORCE_GPU is set, but the CUDA execution provider was not initialized`), no silent CPU run.

Follow-ups (not in this PR)

- The `next-plaid-api/Dockerfile` CUDA image still targets CUDA 12; a cuda-13 image would use the new feature.
- Release CI could ship a second colgrep artifact built with `cuda-13`.

🤖 Generated with Claude Code

## cuda-13 build option

cudarc binds exactly one CUDA major version at compile time, so a single binary cannot serve both CUDA 12 and CUDA 13 hosts. Expose the choice as a feature instead of hard-pinning it:

* `cuda` -> CUDA 11.8 / 12.x (default, unchanged behaviour)
* `cuda-13` -> CUDA 13.x (hosts where only .so.13 libraries exist)
* `_cuda` -> internal gate that compiles the CUDA code; reached via either public feature (do not enable directly)

cudarc no longer pins a version in its dependency line; `cuda` / `cuda-13` select it (and the matching `fastkmeans-rs` feature). All `next-plaid` and `colgrep` `#[cfg(feature = "cuda")]` gates now key off `_cuda` so they compile under either major. Enabling both `cuda` and `cuda-13` fails by design.

Applied across next-plaid, colgrep and next-plaid-api; requires fastkmeans-rs >= 1.0.7 (companion PR lightonai/fastkmeans-rs#9).

## --force-gpu must not silently fall back to CPU

ORT registers execution providers best-effort: a CUDA EP that fails to initialize is silently dropped and the session quietly runs on the CPU, so `--force-gpu` could still encode on the CPU. Mark the CUDA EP `error_on_failure()` when the GPU is forced, and turn the CUDA-init panic branch into a hard error under force-gpu instead of returning a CPU builder.

Verified on a CUDA-13 H100 host: `--force-gpu` indexes on CUDA, and hard-errors (no CPU fallback) when the GPU is hidden.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Patch release. Includes the `cuda-13` build feature and the `--force-gpu` hardening (#133). The default/curl build remains CUDA 11.8/12.x compatible; CUDA 13 is opt-in via `--features cuda-13`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Raphael Sourty and others added 2 commits June 16, 2026 19:43

Lockfile: fastkmeans-rs 1.0.7

021ebd3

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

raphaelsty merged commit b043c63 into main Jun 16, 2026
20 checks passed

raphaelsty merged 2 commits into main from feat/cuda-13-build-option
