Installation failure: flash-attn cannot build in HPC environments without CUDA

## Environment
- **OS**: Linux x86_64 (HPC login node)
- **Python**: 3.10
- **PyTorch**: 2.7.0+cu126
- **uv version**: 0.7.8
- **HPC Environment**: Login nodes without direct GPU/CUDA access (jobs dispatched to GPU nodes via Slurm)

## Problem Description

Running `uv sync` or `uv sync --no-build-isolation` on an HPC login node fails: `flash-attn` is built from source, and the build cannot find its prerequisites. The first attempt fails because `setuptools` is missing from the build environment:

```
$ uv sync --no-build-isolation
Using CPython 3.10.12 interpreter at: /usr/bin/python3.10
Creating virtual environment at: .venv
Resolved 106 packages in 3ms
  × Failed to build `flash-attn==2.7.4.post1`
  ├─▶ The build backend returned an error
  ╰─▶ Call to `setuptools.build_meta:__legacy__.build_wheel` failed (exit status: 1)

      [stderr]
      Traceback (most recent call last):
        File "<string>", line 8, in <module>
      ModuleNotFoundError: No module named 'setuptools'

      hint: This usually indicates a problem with the package or the build environment.
  help: `flash-attn` (v2.7.4.post1) was included because `activault` (v0.1.0) depends on `flash-attn`
```

The output hinted that I should run `uv pip install torch` before `uv sync`. That installed `torch`, but building `flash-attn` then failed because `nvcc` was not found and `CUDA_HOME` was not set:

```
Resolved 106 packages in 3ms
  × Failed to build `flash-attn==2.7.4.post1`
  ├─▶ The build backend returned an error
  ╰─▶ Call to `setuptools.build_meta:__legacy__.build_wheel` failed (exit status: 1)

      [stdout]


      torch.__version__  = 2.7.0+cu126



      [stderr]
      /home/ssica/activault/.venv/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:276: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:81.)
        cpu = _conversion_method_template(device=torch.device("cpu"))
      <string>:106: UserWarning: flash_attn was requested, but nvcc was not found.  Are you sure your environment has nvcc available?  If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.
      Traceback (most recent call last):
        File "<string>", line 11, in <module>
        File "/home/ssica/activault/.venv/lib/python3.10/site-packages/setuptools/build_meta.py", line 432, in build_wheel
          return _build(['bdist_wheel'])
        File "/home/ssica/activault/.venv/lib/python3.10/site-packages/setuptools/build_meta.py", line 423, in _build
          return self._build_with_temp_dir(
        File "/home/ssica/activault/.venv/lib/python3.10/site-packages/setuptools/build_meta.py", line 404, in _build_with_temp_dir
          self.run_setup()
        File "/home/ssica/activault/.venv/lib/python3.10/site-packages/setuptools/build_meta.py", line 512, in run_setup
          super().run_setup(setup_script=setup_script)
        File "/home/ssica/activault/.venv/lib/python3.10/site-packages/setuptools/build_meta.py", line 317, in run_setup
          exec(code, locals())
        File "<string>", line 198, in <module>
        File "/home/ssica/activault/.venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1279, in CUDAExtension
          library_dirs += library_paths(device_type="cuda")
        File "/home/ssica/activault/.venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1490, in library_paths
          if (not os.path.exists(_join_cuda_home(lib_dir)) and
        File "/home/ssica/activault/.venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2889, in _join_cuda_home
          raise OSError('CUDA_HOME environment variable is not set. '
      OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

      hint: This usually indicates a problem with the package or the build environment.
  help: `flash-attn` (v2.7.4.post1) was included because `activault` (v0.1.0) depends on `flash-attn`
```

## Root Cause

HPC environments typically:
1. Have login nodes without GPU access or CUDA toolkit
2. Dispatch GPU jobs to separate compute nodes via job schedulers (Slurm, PBS, etc.)
3. Don't have CUDA development tools available by default on login nodes

As a result, `flash-attn` cannot be built from source during `uv sync` on a login node, even though the final code will run on GPU nodes.
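
A quick preflight check can show whether a node exposes a CUDA toolkit at all. The sketch below only looks for `nvcc` on `PATH` and for the `CUDA_HOME` variable named in the traceback above. The helper name `cuda_build_prereqs` is hypothetical, and the result is a rough heuristic: it does not verify the toolkit version or that a build would succeed.

```python
import os
import shutil


def cuda_build_prereqs(env=None):
    """Report whether nvcc or CUDA_HOME is visible (hypothetical helper)."""
    env = os.environ if env is None else env
    # Search only the PATH from the given environment; an empty PATH finds nothing.
    nvcc = shutil.which("nvcc", path=env.get("PATH", ""))
    cuda_home = env.get("CUDA_HOME") or None
    return {
        "nvcc": nvcc,
        "cuda_home": cuda_home,
        # Heuristic only: either signal suggests a toolkit may be present.
        "toolkit_visible": nvcc is not None or cuda_home is not None,
    }


if __name__ == "__main__":
    for key, value in cuda_build_prereqs().items():
        print(f"{key}: {value}")
```

On a login node like the one described here, both values would be expected to come back empty; inside a GPU job they may differ, depending on which modules the cluster loads.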

## Workaround

(This is what worked for me).

Install a prebuilt `flash-attn` wheel manually before running `uv sync`:

1. **Identify which version of `flash-attn` to install for the environment**:
   ```bash
   # In your virtual environment
   python -c "import sys, platform; print(f'Python: {sys.version_info.major}.{sys.version_info.minor}'); print(f'Platform: {platform.machine()}')"
   python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}'); print(f'CXX11 ABI: {torch._C._GLIBCXX_USE_CXX11_ABI}')"
   
   # EXAMPLE OUTPUT (login node):
   # Python: 3.10
   # Platform: x86_64
   # PyTorch: 2.7.0+cu126
   # CUDA available: False
   # CXX11 ABI: True
   ```
   Prebuilt `flash-attn` packages for the latest release can be found here: https://github.com/Dao-AILab/flash-attention/releases/tag/v2.7.4.post1

2. **Install compatible pre-built wheel**:
   ```bash
   # Example for Python 3.10, PyTorch 2.6/2.7, CUDA 12, CXX11 ABI True
   uv pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiTRUE-cp310-cp310-linux_x86_64.whl
   ```

3. **Run `uv sync --no-build-isolation` again**; it should now succeed.
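
The wheel choice in step 2 follows the filename pattern of the example URL: CUDA major version, PyTorch major.minor version, CXX11 ABI flag, CPython tag, and platform. The helper below is a sketch that assembles that name from explicit inputs. `flash_attn_wheel_name` is a hypothetical function, and the pattern is inferred from the single filename shown in this issue, so check the result against the release page before installing.

```python
def flash_attn_wheel_name(flash_version, cuda_major, torch_mm, cxx11_abi,
                          py_tag, platform_tag="linux_x86_64"):
    """Build a flash-attn wheel filename (pattern assumed from the example above)."""
    abi = "TRUE" if cxx11_abi else "FALSE"
    # Local version segment, e.g. "cu12torch2.6cxx11abiTRUE".
    local = f"cu{cuda_major}torch{torch_mm}cxx11abi{abi}"
    return f"flash_attn-{flash_version}+{local}-{py_tag}-{py_tag}-{platform_tag}.whl"


if __name__ == "__main__":
    # Values taken from the environment described in this issue.
    print(flash_attn_wheel_name("2.7.4.post1", 12, "2.6", True, "cp310"))
```

With the inputs shown, the function reproduces the filename used in step 2. Whether a wheel with a given combination actually exists still has to be confirmed on the release page.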

## Another Approach

### Make flash-attn optional for GPU environments

Another option is to make `flash-attn` an optional dependency. Users would then install it from inside their Slurm jobs before `activault` runs. Once a job is dispatched to a compute node, it should have CUDA access, which should make it possible to build `flash-attn` from source.

Note: I have not tested this approach. A possible `pyproject.toml` layout:

```toml
[project.optional-dependencies]
gpu = [
    "flash-attn>=2.7.0.post2",
]
dev = [
    "black",
    "isort", 
    "mypy",
    "pytest",
    "pytest-cov",
]
```

**Usage:**
- Development (login nodes): `uv sync --extra dev`
- GPU jobs: `uv sync --extra gpu` (run within Slurm job on compute nodes with CUDA)
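
If `flash-attn` becomes an optional extra, any code that imports it has to tolerate its absence on login nodes. A minimal, untested sketch of such a guard follows. `has_flash_attn` and `attention_backend` are hypothetical names, and the `"default"` label is a placeholder, since this issue does not show how `activault` selects its attention implementation.

```python
import importlib.util


def has_flash_attn():
    """Return True if the flash_attn package can be found, without importing it."""
    return importlib.util.find_spec("flash_attn") is not None


def attention_backend(flash_available=None):
    """Pick a backend label; fall back when flash-attn is not installed."""
    if flash_available is None:
        flash_available = has_flash_attn()
    # "default" is a placeholder for whatever non-flash path the project uses.
    return "flash_attn" if flash_available else "default"


if __name__ == "__main__":
    print(f"flash-attn installed: {has_flash_attn()}")
    print(f"backend: {attention_backend()}")
```

Checking with `find_spec` avoids importing the package just to see whether it is installed, which keeps the check cheap on nodes without a GPU.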

## Impact

This affects users in:
- HPC clusters in which login nodes lack CUDA access (CUDA only available on dispatch)
- CI/CD pipelines without GPU access
