
Installation failure: flash-attn cannot build in HPC environments without CUDA #2

@seansica

Description

Environment

  • OS: Linux x86_64 (HPC login node)
  • Python: 3.10
  • PyTorch: 2.7.0+cu126
  • uv version: 0.7.8
  • HPC Environment: Login nodes without direct GPU/CUDA access (jobs dispatched to GPU nodes via Slurm)

Problem Description

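When running uv sync or uv sync --no-build-isolation, the installation fails because flash-attn attempts to build from source, and its build prerequisites are missing on the HPC login node. With --no-build-isolation, the build runs inside the freshly created .venv, so the first error is simply that setuptools is not installed there: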

$ uv sync --no-build-isolation
Using CPython 3.10.12 interpreter at: /usr/bin/python3.10
Creating virtual environment at: .venv
Resolved 106 packages in 3ms
  × Failed to build `flash-attn==2.7.4.post1`
  ├─▶ The build backend returned an error
  ╰─▶ Call to `setuptools.build_meta:__legacy__.build_wheel` failed (exit status: 1)

      [stderr]
      Traceback (most recent call last):
        File "<string>", line 8, in <module>
      ModuleNotFoundError: No module named 'setuptools'

      hint: This usually indicates a problem with the package or the build environment.
  help: `flash-attn` (v2.7.4.post1) was included because `activault` (v0.1.0) depends on `flash-attn`
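Before retrying, it can help to see which build dependencies the current environment is missing. A minimal preflight sketch; the helper name `missing_build_modules` is hypothetical, and the module list is an assumption taken only from the two errors in this issue (setuptools, torch), not flash-attn's full build requirements:

```python
import importlib.util

# Assumed minimum, based only on the errors in this issue; not exhaustive.
REQUIRED_MODULES = ("setuptools", "torch")


def missing_build_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be imported in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_build_modules()
    if missing:
        print("Missing build dependencies:", ", ".join(missing))
    else:
        print("All listed build dependencies are importable.")
```

Run it with the project's .venv activated, since that is the environment a non-isolated build uses.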

The output hinted that I should install torch first (uv pip install torch) before running uv sync. That installed torch, but building flash-attn still failed, this time because no CUDA toolkit could be found:

Resolved 106 packages in 3ms
  × Failed to build `flash-attn==2.7.4.post1`
  ├─▶ The build backend returned an error
  ╰─▶ Call to `setuptools.build_meta:__legacy__.build_wheel` failed (exit status: 1)

      [stdout]


      torch.__version__  = 2.7.0+cu126



      [stderr]
      /home/ssica/activault/.venv/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:276: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:81.)
        cpu = _conversion_method_template(device=torch.device("cpu"))
      <string>:106: UserWarning: flash_attn was requested, but nvcc was not found.  Are you sure your environment has nvcc available?  If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.
      Traceback (most recent call last):
        File "<string>", line 11, in <module>
        File "/home/ssica/activault/.venv/lib/python3.10/site-packages/setuptools/build_meta.py", line 432, in build_wheel
          return _build(['bdist_wheel'])
        File "/home/ssica/activault/.venv/lib/python3.10/site-packages/setuptools/build_meta.py", line 423, in _build
          return self._build_with_temp_dir(
        File "/home/ssica/activault/.venv/lib/python3.10/site-packages/setuptools/build_meta.py", line 404, in _build_with_temp_dir
          self.run_setup()
        File "/home/ssica/activault/.venv/lib/python3.10/site-packages/setuptools/build_meta.py", line 512, in run_setup
          super().run_setup(setup_script=setup_script)
        File "/home/ssica/activault/.venv/lib/python3.10/site-packages/setuptools/build_meta.py", line 317, in run_setup
          exec(code, locals())
        File "<string>", line 198, in <module>
        File "/home/ssica/activault/.venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1279, in CUDAExtension
          library_dirs += library_paths(device_type="cuda")
        File "/home/ssica/activault/.venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1490, in library_paths
          if (not os.path.exists(_join_cuda_home(lib_dir)) and
        File "/home/ssica/activault/.venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2889, in _join_cuda_home
          raise OSError('CUDA_HOME environment variable is not set. '
      OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

      hint: This usually indicates a problem with the package or the build environment.
  help: `flash-attn` (v2.7.4.post1) was included because `activault` (v0.1.0) depends on `flash-attn`

Root Cause

HPC environments typically:

  1. Have login nodes without GPUs
  2. Dispatch GPU jobs to separate compute nodes via job schedulers (Slurm, PBS, etc.)
  3. Don't provide the CUDA development toolkit (nvcc, CUDA_HOME) on login nodes by default

This prevents flash-attn from being built on the login node during uv sync, even though the code will ultimately run on GPU compute nodes.
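Compiling flash-attn needs the CUDA toolkit (nvcc) rather than a GPU, so what matters on a login node is whether PyTorch can locate a CUDA install root. The sketch below is a hypothetical standalone helper (`find_cuda_home`, not a public torch API) that approximates, from my reading of torch.utils.cpp_extension, the order in which PyTorch looks: CUDA_HOME or CUDA_PATH, then nvcc on PATH, then /usr/local/cuda. Verify it against the torch version you have installed:

```python
import os
import shutil


def find_cuda_home(environ=os.environ, which=shutil.which, exists=os.path.exists):
    """Approximate PyTorch's CUDA toolkit lookup (hypothetical helper).

    Returns the CUDA install root, or None when no toolkit is found, which is
    the situation behind "CUDA_HOME environment variable is not set".
    """
    # 1. An explicit environment variable wins.
    cuda_home = environ.get("CUDA_HOME") or environ.get("CUDA_PATH")
    if cuda_home:
        return cuda_home
    # 2. Otherwise derive the root from nvcc on PATH (<root>/bin/nvcc).
    nvcc = which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))
    # 3. Fall back to the conventional Linux install location, if it exists.
    default = "/usr/local/cuda"
    return default if exists(default) else None


if __name__ == "__main__":
    print("CUDA root:", find_cuda_home() or "not found (flash-attn cannot be built here)")
```

Running it on a login node and inside a Slurm job shows whether either environment can build flash-attn from source.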

Workaround

Install a pre-compiled flash-attn wheel manually before running uv sync (this is what worked for me):

  1. Identify which pre-built flash-attn wheel matches your environment:

    # In your virtual environment
    python -c "import sys, platform; print(f'Python: {sys.version_info.major}.{sys.version_info.minor}'); print(f'Platform: {platform.machine()}')"
    python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}'); print('CUDA version:', torch.version.cuda if torch.cuda.is_available() else 'N/A'); print(f'CXX11 ABI: {torch._C._GLIBCXX_USE_CXX11_ABI}')"
    
    # EXAMPLE OUTPUT (login node):
    # Python: 3.10
    # Platform: x86_64
    # PyTorch: 2.7.0+cu126
    # CUDA available: False
    # CUDA version: N/A
    # CXX11 ABI: True

    Pre-built wheels for the latest flash-attn release at the time of writing (v2.7.4.post1, the version uv resolved for activault) are listed here: https://github.com/Dao-AILab/flash-attention/releases/tag/v2.7.4.post1

  2. Install compatible pre-built wheel:

    # Example for Python 3.10 (cp310), CUDA 12 (cu12), CXX11 ABI True.
    # This wheel is built against torch 2.6; it installed and worked with PyTorch 2.7.0 in my environment.
    uv pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiTRUE-cp310-cp310-linux_x86_64.whl

  3. Run uv sync --no-build-isolation; it should now succeed.
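The wheel file name in step 2 encodes each value gathered in step 1. A small sketch that assembles a candidate URL from those values, assuming every asset in the release follows the same naming pattern as the wheel above (the pattern is inferred from that one file name, and `flash_attn_wheel_url` is a hypothetical helper). Check the result against the release page before installing:

```python
import importlib.util
import platform
import sys

# Base URL taken from the wheel link above; the file-name pattern is inferred from it.
RELEASE_URL = "https://github.com/Dao-AILab/flash-attention/releases/download"


def flash_attn_wheel_url(version, cuda_major, torch_version, cxx11_abi,
                         py=(3, 10), machine="x86_64"):
    """Return a candidate wheel URL (hypothetical helper; verify on the release page).

    torch_version may be "2.6.0" or "2.7.0+cu126"; only major.minor is used.
    """
    torch_mm = ".".join(torch_version.split("+")[0].split(".")[:2])
    abi = "TRUE" if cxx11_abi else "FALSE"
    py_tag = f"cp{py[0]}{py[1]}"
    name = (f"flash_attn-{version}+cu{cuda_major}torch{torch_mm}cxx11abi{abi}"
            f"-{py_tag}-{py_tag}-linux_{machine}.whl")
    return f"{RELEASE_URL}/v{version}/{name}"


if __name__ == "__main__":
    if importlib.util.find_spec("torch") is None:
        print("Run this inside the project venv, where torch is installed.")
    else:
        import torch

        if torch.version.cuda is None:  # CPU-only torch build
            print("This torch build has no CUDA support; no flash-attn wheel applies.")
        else:
            print(flash_attn_wheel_url(
                version="2.7.4.post1",
                cuda_major=torch.version.cuda.split(".")[0],  # e.g. "12.6" -> "12"
                torch_version=str(torch.__version__),
                cxx11_abi=torch._C._GLIBCXX_USE_CXX11_ABI,
                py=sys.version_info[:2],
                machine=platform.machine(),
            ))
```

With PyTorch 2.7.0 this produces a torch2.7 file name; if that exact file is not listed for the release, pick the closest available build, as in step 2.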

Another Approach

Make flash-attn optional for GPU environments

Another option is to make flash-attn an optional dependency that is installed only in GPU environments. Users would then run the install command inside their Slurm job before activault runs: once the job is dispatched to a compute node, it should have CUDA access, which should make it possible to build flash-attn from source.

Note: I have not tested this approach. In pyproject.toml, flash-attn would be removed from the main dependencies and declared as an extra:

[project.optional-dependencies]
gpu = [
    "flash-attn>=2.7.0.post2",
]
dev = [
    "black",
    "isort", 
    "mypy",
    "pytest",
    "pytest-cov",
]

Usage:

  • Development (login nodes): uv sync --extra dev
  • GPU jobs: uv sync --extra gpu (run inside a Slurm job on a compute node with CUDA)
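Making flash-attn an extra also means the code has to import cleanly on nodes where it is absent. A minimal sketch of a guarded import, assuming activault could fall back to PyTorch's built-in scaled-dot-product attention when flash-attn is missing; `HAS_FLASH_ATTN`, `select_attention_backend`, and the backend names are hypothetical, not activault's actual code:

```python
import importlib.util

# True when the optional "gpu" extra (flash-attn) is installed in this environment.
HAS_FLASH_ATTN = importlib.util.find_spec("flash_attn") is not None


def select_attention_backend(prefer_flash=True, has_flash=HAS_FLASH_ATTN):
    """Pick an attention implementation name (names are hypothetical).

    Returns "flash_attn" only when it is both preferred and installed;
    otherwise falls back to "sdpa" (PyTorch scaled-dot-product attention),
    e.g. on an HPC login node without the gpu extra.
    """
    if prefer_flash and has_flash:
        return "flash_attn"
    return "sdpa"
```

Checking for the module with find_spec avoids importing flash_attn itself, so nothing CUDA-related is loaded just to test for it.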

Impact

This affects users in:

  • HPC clusters whose login nodes lack CUDA (it is only available on compute nodes once a job is dispatched)
  • CI/CD pipelines without GPU access
