FlashAttention 3

FlashAttention 3 exists, and surprisingly few people know about it. It's mainly advertised as being fast on sm90 (H100), but in my benchmarks it's also a bit faster than FA2 on sm86 (RTX 3080), so it helps on consumer GPUs too.

Some FA3 wheels are available at https://download.pytorch.org/whl/flash-attn-3/ . See the discussion in https://github.com/Dao-AILab/flash-attention/pull/2223 . The only thing missing is a wheel for Windows + cu130, which they could not build. That has been fixed by the patch in https://github.com/windreamer/flash-attention3-wheels . This is worth tracking for us.

Once we have the wheels, we can replace FA2 with FA3 on supported GPUs in ComfyUI.
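A rough sketch of how the fallback could look, just to make the idea concrete. This is not ComfyUI's actual code: `pick_attention_backend` and the `"sdpa"` fallback label are hypothetical, and the module names (`flash_attn_interface` for FA3, `flash_attn` for FA2) are my assumption about what the installed wheels expose. It only checks whether a package is importable; checking the GPU's compute capability is left out.

```python
import importlib.util

# Candidate backends in order of preference.
# ASSUMPTION: FA3 wheels expose `flash_attn_interface`, FA2 exposes `flash_attn`.
BACKENDS = [
    ("fa3", "flash_attn_interface"),
    ("fa2", "flash_attn"),
]


def pick_attention_backend(is_available=None):
    """Return the label of the first importable backend, else "sdpa".

    `is_available` is a hypothetical hook (module name -> bool) so the
    selection logic can be exercised without installing anything.
    """
    if is_available is None:
        # Default check: is the top-level module findable on this system?
        def is_available(name):
            return importlib.util.find_spec(name) is not None

    for label, module in BACKENDS:
        if is_available(module):
            return label
    # Neither FlashAttention package is installed; fall back to a
    # generic attention path (label is a placeholder).
    return "sdpa"


if __name__ == "__main__":
    print(pick_attention_backend())
```

Preferring FA3 and falling back to FA2 keeps things working on machines where no FA3 wheel is available yet, such as the Windows + cu130 case above.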
