Migrate pynvml to cuda.core.system #640

Open
mdboom wants to merge 5 commits into rapidsai:main from mdboom:pynvml-to-cuda.core.system

@mdboom mdboom commented Apr 28, 2026

This migrates from pynvml.py to the new Cython/cybind-based cuda.core.system.

I was unable to get my local development set up for testing this one, so it's untested.

@copy-pr-bot

copy-pr-bot Bot commented Apr 28, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@pentschev
Member

/ok to test 3b9b740

@mdboom mdboom marked this pull request as ready for review May 19, 2026 13:43
@mdboom mdboom requested review from a team as code owners May 19, 2026 13:43
@mdboom mdboom requested a review from msarahan May 19, 2026 13:43
@pentschev pentschev added the improvement (Improves an existing functionality), non-breaking (Introduces a non-breaking change), and ucxx labels May 19, 2026
@pentschev
Member

/ok to test 408fe66

@pentschev
Member

@mdboom there seem to be some style issues; could you try fixing them automatically with pre-commit install / pre-commit run -a?

@mdboom
Author

mdboom commented May 19, 2026

/ok to test 4e248fc

@pentschev
Member

/ok to test 4e248fc

@pentschev
Member

Unfortunately, you're not part of the rapidsai org, so I think you don't have permission to trigger testing.

@pentschev
Member

/ok to test 9dfabb5

@pentschev
Member

@mdboom there are errors occurring that I think are legitimate issues; see here for example:

 │ │ Traceback (most recent call last):
 │ │   File "$SRC_DIR/conda_build_script.py", line 1, in <module>
 │ │     import ucxx
 │ │   File "$PREFIX/lib/python3.14/site-packages/ucxx/__init__.py", line 63, in <module>
 │ │     device_count = system.Device.get_device_count()
 │ │   File "cuda/core/system/_device.pyx", line 410, in cuda.core.system._device.Device.get_device_count
 │ │   File "cuda/core/system/_nvml_context.pxd", line 43, in cuda.core.system._nvml_context.initialize
 │ │   File "cuda/core/system/_nvml_context.pyx", line 32, in cuda.core.system._nvml_context._initialize
 │ │   File "cuda/core/system/_nvml_context.pyx", line 42, in cuda.core.system._nvml_context._initialize
 │ │   File "cuda/bindings/nvml.pyx", line 20860, in cuda.bindings.nvml.init_v2
 │ │   File "cuda/bindings/nvml.pyx", line 20866, in cuda.bindings.nvml.init_v2
 │ │     __status__ = nvmlInit_v2()
 │ │   File "cuda/bindings/cynvml.pyx", line 15, in cuda.bindings.cynvml.nvmlInit_v2
 │ │   File "cuda/bindings/_internal/nvml.pyx", line 3892, in cuda.bindings._internal.nvml._nvmlInit_v2
 │ │   File "cuda/bindings/_internal/nvml.pyx", line 2832, in cuda.bindings._internal.nvml._check_or_init_nvml
 │ │     return _init_nvml()
 │ │   File "cuda/bindings/_internal/nvml.pyx", line 417, in cuda.bindings._internal.nvml._init_nvml
 │ │     with gil, __symbol_lock:
 │ │   File "cuda/bindings/_internal/nvml.pyx", line 427, in cuda.bindings._internal.nvml._init_nvml
 │ │     handle = load_library()
 │ │   File "cuda/bindings/_internal/nvml.pyx", line 409, in cuda.bindings._internal.nvml.load_library
 │ │     cdef uintptr_t handle = load_nvidia_dynamic_lib("nvml")._handle_uint
 │ │   File "$PREFIX/lib/python3.14/site-packages/cuda/pathfinder/_dynamic_libs/load_nvidia_dynamic_lib.py", line 303, in load_nvidia_dynamic_lib
 │ │     return _load_lib_no_cache(libname)
 │ │   File "$PREFIX/lib/python3.14/site-packages/cuda/pathfinder/_dynamic_libs/load_nvidia_dynamic_lib.py", line 158, in _load_lib_no_cache
 │ │     return _load_driver_lib_no_cache(desc)
 │ │   File "$PREFIX/lib/python3.14/site-packages/cuda/pathfinder/_dynamic_libs/load_nvidia_dynamic_lib.py", line 71, in _load_driver_lib_no_cache
 │ │     raise DynamicLibNotFoundError(
 │ │     ...<2 lines>...
 │ │     )
 │ │ cuda.pathfinder._dynamic_libs.load_dl_common.DynamicLibNotFoundError: "nvml" is an NVIDIA driver library and can only be found via system search. Ensure the NVIDIA display driver is installed.

Are we supposed to bring NVML from conda to satisfy this?
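
For illustration only (not part of the PR): the traceback shows `import ucxx` failing because `system.Device.get_device_count()` runs at module import time on a machine with no NVIDIA driver. One possible way around this, independent of how NVML is provided, is to make the query tolerant of a missing driver. The helper names `safe_device_count` and `_query_nvml_device_count` below are hypothetical, and the `from cuda.core import system` import path is an assumption inferred from the module names in the traceback.

```python
from typing import Callable


def safe_device_count(get_count: Callable[[], int], default: int = 0) -> int:
    """Return get_count(), or `default` if the query fails for any reason.

    Hypothetical helper: the broad `except Exception` is deliberate so that
    a missing driver library (such as the DynamicLibNotFoundError in the
    traceback) or a missing package does not break a plain import.
    """
    try:
        return int(get_count())
    except Exception:
        return default


def _query_nvml_device_count() -> int:
    # Deferred import: merely importing this module never touches NVML.
    # Assumed import path, inferred from the traceback's module names.
    from cuda.core import system

    return system.Device.get_device_count()


# Usage sketch: yields `default` (0) on machines without the NVIDIA driver,
# for example a conda build environment.
device_count = safe_device_count(_query_nvml_device_count)
```

Whether silently falling back to zero devices is acceptable here is a design decision for the maintainers; raising a clearer error lazily, at first use rather than at import, would be another option.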



3 participants