Installation issue #24

@jayjk13

Description

Hello,
Installing the grouped-gemm package fails while building a Docker image. The error reports that no NVIDIA driver was found, even though the build uses a CUDA-enabled base image.

Environment

  • Base Image: nvidia/cuda:12.6.2-cudnn-devel-ubuntu22.04
  • Python Version: 3.10
  • Pip Version: 23.3.1
  • Operating System Inside Docker: Ubuntu 22.04
  • grouped-gemm Version: Attempting to install grouped-gemm==0.1.6

Dockerfile Snippet

FROM nvidia/cuda:12.6.2-cudnn-devel-ubuntu22.04

ENV CUDA_VISIBLE_DEVICES="0"

# Install system dependencies
RUN apt-get update && \
    apt-get install -y \
    git \
    python3 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Upgrade pip
RUN pip3 install --upgrade pip

# Install necessary Python packages
RUN pip3 install transformers==4.45.0 \
    accelerate==0.34.1 \
    sentencepiece==0.2.0 \
    torchvision \
    requests \
    torch \
    Pillow \
    grouped-gemm

Error Message

During the pip install step, I receive the following error:

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

Full error log:

#16 83.25 /tmp/pip-build-env-a17rx2c3/overlay/local/lib/python3.10/dist-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
#16 83.25 cpu = _conversion_method_template(device=torch.device("cpu"))
#16 83.25 No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
#16 83.25 Traceback (most recent call last):
#16 83.25   File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
#16 83.25     main()
#16 83.25   File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
#16 83.25     json_out['return_val'] = hook(**hook_input['kwargs'])
#16 83.25   File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
#16 83.25     return hook(config_settings)
#16 83.25   File "/tmp/pip-build-env-a17rx2c3/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
#16 83.25     return self._get_build_requires(config_settings, requirements=['wheel'])
#16 83.25   File "/tmp/pip-build-env-a17rx2c3/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 295, in _get_build_requires
#16 83.25     self.run_setup()
#16 83.25   File "/tmp/pip-build-env-a17rx2c3/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 311, in run_setup
#16 83.25     exec(code, locals())
#16 83.25   File "<string>", line 16, in <module>
#16 83.25   File "/tmp/pip-build-env-a17rx2c3/overlay/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 509, in get_device_capability
#16 83.25     prop = get_device_properties(device)
#16 83.25   File "/tmp/pip-build-env-a17rx2c3/overlay/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 523, in get_device_properties
#16 83.25     _lazy_init() # will define _get_device_properties
#16 83.25   File "/tmp/pip-build-env-a17rx2c3/overlay/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 319, in _lazy_init
#16 83.25     torch._C._cuda_init()
#16 83.25 RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
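
For context, the traceback shows the package's setup script (line 16) calling torch.cuda.get_device_capability, which needs a working driver, and docker build does not expose a GPU by default. Below is a minimal sketch of how a build script could avoid probing the GPU when the target architecture is supplied through the TORCH_CUDA_ARCH_LIST environment variable, which PyTorch's extension builder reads. The function name resolve_cuda_arch and its parameters are hypothetical and are not taken from grouped-gemm; whether grouped-gemm 0.1.6 behaves this way is an assumption.

```python
import os


def resolve_cuda_arch(env=None, probe=None):
    """Return a compute-capability suffix such as "80".

    Returns "" when TORCH_CUDA_ARCH_LIST is set, leaving the choice of
    target architectures to PyTorch's extension builder.
    Hypothetical helper: illustrative only, not grouped-gemm's code.
    """
    env = os.environ if env is None else env

    # If the user states the target architectures explicitly, no GPU
    # (and therefore no driver) is needed at build time.
    if env.get("TORCH_CUDA_ARCH_LIST"):
        return ""

    if probe is None:
        # Imported lazily so the environment check above works without torch.
        import torch
        probe = torch.cuda.get_device_capability

    try:
        major, minor = probe()
    except RuntimeError as exc:
        # This is the path hit in the log above: no driver during docker build.
        raise RuntimeError(
            "No GPU is visible at build time; set TORCH_CUDA_ARCH_LIST "
            "(for example to 8.0) to build without one."
        ) from exc
    return f"{major}{minor}"
```

With a guard like this, adding a line such as ENV TORCH_CUDA_ARCH_LIST="8.0" before the pip3 install step would let the build proceed without a driver. The value 8.0 is only an example and should match the GPU the image will run on.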

Steps to Reproduce

  1. Create a Dockerfile with the contents provided above.
  2. Run docker build -t test-image . to build the Docker image.
  3. Observe that the build fails during the pip install grouped-gemm step with the error about missing NVIDIA drivers.
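
To confirm that the driver really is absent during the build, rather than misconfigured in the image, a small check like the following could be run from a RUN step before the pip3 install. It is a heuristic sketch: it assumes that a visible driver shows up either as an nvidia-smi binary on PATH or as /proc/driver/nvidia/version, and the helper name nvidia_driver_visible is hypothetical.

```python
import os
import shutil


def nvidia_driver_visible(which=shutil.which, exists=os.path.exists):
    """Heuristically report whether an NVIDIA driver is visible.

    Assumption: with a GPU exposed to the container, either nvidia-smi is
    on PATH or the kernel module's /proc entry exists. Neither check
    proves that CUDA initialization will succeed.
    """
    has_smi = which("nvidia-smi") is not None
    has_proc = exists("/proc/driver/nvidia/version")
    return has_smi or has_proc


if __name__ == "__main__":
    print("NVIDIA driver visible:", nvidia_driver_visible())
```

If this prints False during docker build but True inside a container started with GPU access, the failure comes from the build environment rather than from the base image.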
