FlashAttention 3 #10

@woct0rdho

FlashAttention 3 (FA3) exists, yet surprisingly few people know about it. It is mainly advertised as fast on sm90 (H100), but in my benchmarks it is also a bit faster than FA2 on sm86 (RTX 3080), so it helps on consumer GPUs too.

Some FA3 wheels are available at https://download.pytorch.org/whl/flash-attn-3/ (see the discussion in Dao-AILab/flash-attention#2223). The only missing piece is a Windows + cu130 wheel, which they could not build; that has been fixed by the patch in https://github.com/windreamer/flash-attention3-wheels . This is worth tracking for us.

Once we have the wheels, we can replace FA2 with FA3 on supported GPUs in ComfyUI.
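A rough sketch of what the selection logic could look like, not ComfyUI's actual code. The module names (`flash_attn_interface` for FA3, `flash_attn` for FA2), the function `pick_attention_backend`, and the `(8, 0)` minimum capability are all assumptions for illustration; the real FA3 support matrix should be checked against the wheels before relying on any threshold.

```python
import importlib.util

# Assumed import names; verify them against the installed wheels.
FA3_MODULE = "flash_attn_interface"
FA2_MODULE = "flash_attn"


def module_available(name):
    """Return True if a top-level module can be found without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def pick_attention_backend(capability, min_fa3_capability=(8, 0),
                           is_available=module_available):
    """Choose an attention backend label: "fa3", "fa2", or "default".

    capability: (major, minor) compute capability, or None when no CUDA
        device is present.
    min_fa3_capability: placeholder threshold (an assumption, not a
        verified support matrix).
    is_available: callable used to probe for installed modules; it is
        injectable so the logic can be exercised without a GPU.
    """
    if capability is None:
        return "default"
    if capability >= min_fa3_capability and is_available(FA3_MODULE):
        return "fa3"
    if is_available(FA2_MODULE):
        return "fa2"
    return "default"
```

In a real integration, `capability` would likely come from `torch.cuda.get_device_capability()` when `torch.cuda.is_available()` is true, and `None` otherwise. Keeping the probe injectable means the fallback order (FA3, then FA2, then the default attention path) can be checked on a machine with neither package installed.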
