Modal deployment challenges with PyTorch 2.4+ and flash-attention compatibility

## Problem Description

Deploying H-Net on Modal Labs fails with C++ ABI compatibility errors between PyTorch 2.4+ (required for `set_wrap_triton_enabled`) and flash-attention builds.

## Root Cause

H-Net requires PyTorch 2.4+ for the `set_wrap_triton_enabled` function, but pre-built flash-attention wheels are incompatible with newer PyTorch versions due to C++ ABI mismatches.

## Error Details

```
/usr/local/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationESs
```

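This indicates that the flash-attention extension was compiled against a different libstdc++ ABI setting or PyTorch build than the one installed: it references a `c10::Error` constructor that the installed PyTorch libraries do not export.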
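The missing symbol itself hints at which ABI the flash-attention binary expects. In Itanium name mangling, `Ss` abbreviates the pre-C++11 `std::string`, while the newer ABI spells it out under `std::__cxx11`. The helper below is a rough substring heuristic written for this issue (the function name is hypothetical, and a real demangler such as `c++filt` is more reliable):

```python
def guess_string_abi(mangled: str) -> str:
    """Heuristically classify which libstdc++ std::string ABI a mangled symbol uses."""
    if "__cxx11" in mangled:
        # std::__cxx11::basic_string appears when _GLIBCXX_USE_CXX11_ABI=1
        return "cxx11"
    if "Ss" in mangled:
        # 'Ss' is the Itanium abbreviation for the old-ABI std::string
        return "pre-cxx11"
    return "unknown"


symbol = "_ZN3c105ErrorC2ENS_14SourceLocationESs"
print(guess_string_abi(symbol))  # pre-cxx11
```

The result can be compared with `torch.compiled_with_cxx11_abi()`, which reports the ABI the installed PyTorch was built with. If both sides agree on the ABI and the symbol is still missing, the extension was probably built against a different PyTorch release.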

## Attempted Solutions

1. **Building flash-attention from source**: Fails due to compiler environment issues (clang++ vs g++ mismatches)
2. **CUDA version alignment**: Tried matching CUDA 12.1 with PyTorch 2.4.1+cu121 but pip installs different versions
3. **C++ ABI flags**: Setting `_GLIBCXX_USE_CXX11_ABI=0` during build still results in symbol errors
4. **Various Python versions**: Tested 3.10 and 3.11 with similar results
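One way to narrow down attempts 2 and 3 is to check a candidate pre-built wheel against the target environment before installing it. The sketch below assumes that flash-attention release wheels encode the CUDA version, PyTorch minor version, ABI flag, and Python tag in the filename. The regex, the helper names, and the example filename are illustrative assumptions, not taken from the flash-attention project:

```python
import re
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    cuda: str        # e.g. "123"
    torch: str       # e.g. "2.4"
    cxx11_abi: bool  # True if built with _GLIBCXX_USE_CXX11_ABI=1
    python: str      # e.g. "cp310"


# Assumed naming scheme: ...+cu<cuda>torch<major.minor>cxx11abi<TRUE|FALSE>-<pytag>-...
_PATTERN = re.compile(
    r"\+cu(?P<cuda>\d+)torch(?P<torch>\d+\.\d+)"
    r"cxx11abi(?P<abi>TRUE|FALSE)-(?P<py>cp\d+)-"
)


def parse_wheel_tags(filename: str) -> Optional[WheelTags]:
    """Extract build tags from a wheel filename, or None if it does not match."""
    m = _PATTERN.search(filename)
    if m is None:
        return None
    return WheelTags(m["cuda"], m["torch"], m["abi"] == "TRUE", m["py"])


def matches_env(tags: WheelTags, torch_version: str,
                cxx11_abi: bool, python_tag: str) -> bool:
    """Compare wheel tags with the environment's torch version, ABI, and Python tag."""
    # "2.4.1+cu121" -> "2.4"
    torch_minor = ".".join(torch_version.split("+")[0].split(".")[:2])
    return (tags.torch == torch_minor
            and tags.cxx11_abi == cxx11_abi
            and tags.python == python_tag)


# Hypothetical filename, used only to exercise the parser.
name = "flash_attn-2.6.3+cu123torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"
tags = parse_wheel_tags(name)
print(tags)
print(matches_env(tags, "2.4.1+cu121", False, "cp310"))  # True
```

In a real container, the three environment arguments would come from `torch.__version__`, `torch.compiled_with_cxx11_abi()`, and the interpreter version.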

## Working Configuration (Limited)

- PyTorch 2.3.1 with `pip install flash-attn` installs and imports without the symbol error
- However, PyTorch 2.3.1 lacks `set_wrap_triton_enabled`, which H-Net requires
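A quick probe at image build time can confirm whether a given PyTorch install exposes the function before the full model is loaded. This is a minimal sketch: `has_symbol` is a hypothetical helper, and the module path in the usage comment is an assumption about where PyTorch places `set_wrap_triton_enabled`, not something confirmed here.

```python
import importlib


def has_symbol(module_name: str, attr: str) -> bool:
    """Return True if the module imports cleanly and exposes the attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module missing entirely (wrong version or package not installed)
        return False
    return hasattr(module, attr)


# Assumed location; adjust to wherever H-Net imports the function from:
# has_symbol("torch._library.triton", "set_wrap_triton_enabled")
print(has_symbol("math", "sqrt"))  # True
```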

## Environment

- Modal Labs container deployment
- CUDA 12.1.1-devel base image
- Python 3.10/3.11

## Request

Could the H-Net team provide guidance on:
1. Recommended PyTorch/flash-attention version combinations
2. Alternative approaches for Triton operations if flash-attention compatibility is problematic
3. Docker/container-specific build instructions for Modal/similar platforms

This affects cloud deployment scenarios where pre-built wheels are preferred over source compilation.
