Environment
- OS: Linux x86_64 (HPC login node)
- Python: 3.10
- PyTorch: 2.7.0+cu126
- uv version: 0.7.8
- HPC Environment: Login nodes without direct GPU/CUDA access (jobs dispatched to GPU nodes via Slurm)
Problem Description
When running uv sync or uv sync --no-build-isolation, the installation fails because flash-attn attempts to build from source but cannot find required CUDA dependencies on HPC login nodes.
$ uv sync --no-build-isolation
Using CPython 3.10.12 interpreter at: /usr/bin/python3.10
Creating virtual environment at: .venv
Resolved 106 packages in 3ms
× Failed to build `flash-attn==2.7.4.post1`
├─▶ The build backend returned an error
╰─▶ Call to `setuptools.build_meta:__legacy__.build_wheel` failed (exit status: 1)
[stderr]
Traceback (most recent call last):
File "<string>", line 8, in <module>
ModuleNotFoundError: No module named 'setuptools'
hint: This usually indicates a problem with the package or the build environment.
help: `flash-attn` (v2.7.4.post1) was included because `activault` (v0.1.0) depends on `flash-attn`
The output hinted that I should try uv pip install torch first before running uv sync. That worked for installing torch, but failed when trying to build/install flash-attn:
Resolved 106 packages in 3ms
× Failed to build `flash-attn==2.7.4.post1`
├─▶ The build backend returned an error
╰─▶ Call to `setuptools.build_meta:__legacy__.build_wheel` failed (exit status: 1)
[stdout]
torch.__version__ = 2.7.0+cu126
[stderr]
/home/ssica/activault/.venv/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:276: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:81.)
cpu = _conversion_method_template(device=torch.device("cpu"))
<string>:106: UserWarning: flash_attn was requested, but nvcc was not found. Are you sure your environment has nvcc available? If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.
Traceback (most recent call last):
File "<string>", line 11, in <module>
File "/home/ssica/activault/.venv/lib/python3.10/site-packages/setuptools/build_meta.py", line 432, in build_wheel
return _build(['bdist_wheel'])
File "/home/ssica/activault/.venv/lib/python3.10/site-packages/setuptools/build_meta.py", line 423, in _build
return self._build_with_temp_dir(
File "/home/ssica/activault/.venv/lib/python3.10/site-packages/setuptools/build_meta.py", line 404, in _build_with_temp_dir
self.run_setup()
File "/home/ssica/activault/.venv/lib/python3.10/site-packages/setuptools/build_meta.py", line 512, in run_setup
super().run_setup(setup_script=setup_script)
File "/home/ssica/activault/.venv/lib/python3.10/site-packages/setuptools/build_meta.py", line 317, in run_setup
exec(code, locals())
File "<string>", line 198, in <module>
File "/home/ssica/activault/.venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1279, in CUDAExtension
library_dirs += library_paths(device_type="cuda")
File "/home/ssica/activault/.venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1490, in library_paths
if (not os.path.exists(_join_cuda_home(lib_dir)) and
File "/home/ssica/activault/.venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2889, in _join_cuda_home
raise OSError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
hint: This usually indicates a problem with the package or the build environment.
help: `flash-attn` (v2.7.4.post1) was included because `activault` (v0.1.0) depends on `flash-attn`
Root Cause
HPC environments typically:
- Have login nodes without GPU access or CUDA toolkit
- Dispatch GPU jobs to separate compute nodes via job schedulers (Slurm, PBS, etc.)
- Don't have CUDA development tools available by default on login nodes
This prevents flash-attn from being built when uv sync installs the project's dependencies on a login node, even though the final code will run on GPU nodes.
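Whether a given node has what a source build needs can be checked before running uv sync. The following is a minimal sketch, not part of activault or uv: the helper name `cuda_build_status` is hypothetical, and it only looks at the two things the errors above complain about (the CUDA_HOME variable and nvcc on PATH). Treating CUDA_PATH as an alternative to CUDA_HOME is an assumption about how the toolkit is usually located.

```python
import os
import shutil


def cuda_build_status(env=None, which=shutil.which):
    """Report whether CUDA_HOME is set and nvcc is on PATH.

    Hypothetical helper for illustration. `env` and `which` are injectable
    so the check can be exercised without a real CUDA install.
    """
    env = os.environ if env is None else env
    # Assumption: CUDA_PATH is accepted as an alternative to CUDA_HOME.
    cuda_home = env.get("CUDA_HOME") or env.get("CUDA_PATH")
    nvcc = which("nvcc")
    return {
        "cuda_home": cuda_home,
        "nvcc": nvcc,
        # Heuristic only: a toolkit is "detected" if either signal is present.
        "toolkit_detected": bool(cuda_home or nvcc),
    }


if __name__ == "__main__":
    status = cuda_build_status()
    print(status)
    if not status["toolkit_detected"]:
        print("No CUDA toolkit detected; a source build of flash-attn will likely fail here.")
```

On a login node like the one described in this issue, both values would be expected to come back empty.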
Workaround
(This is what worked for me).
Install a pre-compiled flash-attn wheel manually before running uv sync:
- Identify which version of flash-attn to install for the environment:
# In your virtual environment
python -c "import sys, platform; print(f'Python: {sys.version_info.major}.{sys.version_info.minor}'); print(f'Platform: {platform.machine()}')"
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}'); print('CUDA version:', torch.version.cuda if torch.cuda.is_available() else 'N/A'); print(f'CXX11 ABI: {torch._C._GLIBCXX_USE_CXX11_ABI}')"
# EXAMPLE OUTPUT:
# Python: 3.10
# Platform: x86_64
# PyTorch: 2.7.0+cu126
# CUDA available: False
# CUDA version: N/A
# CXX11 ABI: True
Prebuilt flash-attn packages for the latest release can be found here: https://github.com/Dao-AILab/flash-attention/releases/tag/v2.7.4.post1
- Install a compatible pre-built wheel:
# Example for Python 3.10, PyTorch 2.6/2.7, CUDA 12, CXX11 ABI True
uv pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiTRUE-cp310-cp310-linux_x86_64.whl
- Run uv sync --no-build-isolation again; it should work now.
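The wheel filename in the install step encodes the flash-attn version, CUDA major version, PyTorch major.minor version, C++11 ABI flag, Python tag, and platform. As a rough sketch, a filename and download URL following that pattern can be assembled from the values printed by the environment check. The helper names `flash_attn_wheel_name` and `flash_attn_wheel_url` are hypothetical, and the pattern is inferred only from the single v2.7.4.post1 filename above, so confirm that the asset actually exists on the release page before installing.

```python
RELEASE_BASE = "https://github.com/Dao-AILab/flash-attention/releases/download"


def flash_attn_wheel_name(version, cuda_major, torch_mm, cxx11_abi, py_tag,
                          platform_tag="linux_x86_64"):
    """Build a wheel filename following the pattern seen in the release above.

    Hypothetical helper; the naming scheme is inferred from one example.
    """
    abi = "TRUE" if cxx11_abi else "FALSE"
    return (
        f"flash_attn-{version}+cu{cuda_major}torch{torch_mm}"
        f"cxx11abi{abi}-{py_tag}-{py_tag}-{platform_tag}.whl"
    )


def flash_attn_wheel_url(version, **kwargs):
    """Join the release download base with the assembled filename."""
    return f"{RELEASE_BASE}/v{version}/{flash_attn_wheel_name(version, **kwargs)}"


if __name__ == "__main__":
    # Values taken from the example in this issue.
    print(flash_attn_wheel_url(
        "2.7.4.post1", cuda_major=12, torch_mm="2.6", cxx11_abi=True, py_tag="cp310"
    ))
```

With the values from this issue, the sketch reproduces the URL passed to uv pip install above.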
Another Approach
Make flash-attn optional for GPU environments
Another option could be to make flash-attn optional for GPU environments. This would require users to run the install command in their Slurm jobs before activault runs. Once the job is dispatched to a compute node, the script should have CUDA access, which should make it possible to install flash-attn from source.
Note: I have not tested this solution.
[project.optional-dependencies]
gpu = [
    "flash-attn>=2.7.0.post2",
]
dev = [
    "black",
    "isort",
    "mypy",
    "pytest",
    "pytest-cov",
]
Usage:
- Development (login nodes): uv sync --extra dev
- GPU jobs: uv sync --extra gpu (run within a Slurm job on a compute node with CUDA)
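If flash-attn becomes an optional extra, any code that also runs on login nodes has to tolerate its absence. One possible way to handle that, sketched here under the assumption that activault can fall back to another attention implementation (this issue does not confirm that), is to detect the package at runtime instead of importing it unconditionally. The function names and the returned labels are hypothetical placeholders.

```python
import importlib.util


def flash_attn_available(find_spec=importlib.util.find_spec):
    """Return True if the flash_attn package is importable in this environment.

    `find_spec` is injectable so the check can be tested without the package.
    """
    return find_spec("flash_attn") is not None


def choose_attention_backend(find_spec=importlib.util.find_spec):
    # Labels are illustrative placeholders, not names used by activault.
    return "flash-attn" if flash_attn_available(find_spec) else "fallback"


if __name__ == "__main__":
    print(f"Attention backend: {choose_attention_backend()}")
```

The check uses importlib.util.find_spec rather than a try/except around the import, so it does not load the compiled extension just to find out whether it is installed.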
Impact
This affects users in:
- HPC clusters in which login nodes lack CUDA access (CUDA only available on dispatch)
- CI/CD pipelines without GPU access