Refactor attention functions and add autotuning support in Triton#288

Merged
LoserCheems merged 5 commits into main from quant on May 12, 2026
@LoserCheems
Collaborator

No description provided.

- Introduced an `is_autotune` parameter across various functions in `flash_sparse_attn` to enable kernel autotuning.
- Updated backward, forward, and decode functions to conditionally use autotuned kernels or default kernels based on the `is_autotune` flag.
- Adjusted launch configuration parameters for autotuning, setting default tile sizes and warp configurations when autotuning is enabled.
- Enhanced function signatures and documentation to reflect the new `is_autotune` and `skip_checks` parameters for improved performance tuning.
@LoserCheems LoserCheems merged commit 52a6f4e into main May 12, 2026
1 check passed

1 participant