Enable split-kv for blocksparse tensors #2536

Merged
drisspg merged 1 commit into main from drisspg/stack/38 on May 19, 2026

drisspg (Collaborator) commented May 4, 2026

Stacked PRs:


Enable split-kv for blocksparse tensors
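
For readers new to the term, here is a minimal sketch of the general split-KV idea: attend over each KV chunk independently, keep the per-chunk log-sum-exp (LSE), then merge the partial outputs by their LSE weights. This is not the kernel code from this PR. The names `attend` and `split_kv_attend` are hypothetical, values are scalars for brevity, and block sparsity is not modeled (a split with no unmasked blocks would need an LSE of negative infinity, which this sketch does not handle).

```python
import math


def attend(scores, values):
    """Softmax-weighted sum over one KV chunk; returns (output, logsumexp)."""
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    out = sum(w * v for w, v in zip(weights, values)) / denom
    return out, m + math.log(denom)


def split_kv_attend(scores, values, num_splits):
    """Hypothetical helper: attend per KV chunk, then merge partials by LSE."""
    n = len(scores)
    chunk = -(-n // num_splits)  # ceiling division
    partials = [
        attend(scores[i:i + chunk], values[i:i + chunk])
        for i in range(0, n, chunk)
    ]
    # Each partial output is normalized by its own softmax denominator,
    # exp(lse). Re-weighting by exp(lse - lse_max) restores the global
    # normalization without overflowing.
    lse_max = max(lse for _, lse in partials)
    scales = [math.exp(lse - lse_max) for _, lse in partials]
    total = sum(scales)
    out = sum(s * o for s, (o, _) in zip(scales, partials)) / total
    return out, lse_max + math.log(total)


# Toy single-query example: 16 keys with scalar values.
scores = [0.1 * i - 1.0 for i in range(16)]
values = [float(i % 5) for i in range(16)]
reference = attend(scores, values)
merged = split_kv_attend(scores, values, num_splits=4)
```

In this sketch `merged` agrees with `reference` up to floating-point rounding, so the split count only changes how the work is partitioned, not the result.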

Perf:

Final results:
|   Sq |     Sk |   splits |   median µs | speedup   |   mask blocks/qtile |   full blocks/qtile |
|------|--------|----------|-------------|-----------|---------------------|---------------------|
|  128 |  32768 |        1 |       342   | 1.00x     |                 256 |                   0 |
|  128 |  32768 |        2 |       221   | 1.55x     |                 256 |                   0 |
|  128 |  32768 |        4 |       115.1 | 2.97x     |                 256 |                   0 |
|  128 |  32768 |        8 |        63.2 | 5.41x     |                 256 |                   0 |
|  128 |  32768 |       16 |        37.6 | 9.10x     |                 256 |                   0 |
|  128 |  65536 |        1 |       678.1 | 1.00x     |                 512 |                   0 |
|  128 |  65536 |        2 |       433.8 | 1.56x     |                 512 |                   0 |
|  128 |  65536 |        4 |       221.8 | 3.06x     |                 512 |                   0 |
|  128 |  65536 |        8 |       117.3 | 5.78x     |                 512 |                   0 |
|  128 |  65536 |       16 |        64.8 | 10.46x    |                 512 |                   0 |
|  128 | 131072 |        1 |      1351.5 | 1.00x     |                1024 |                   0 |
|  128 | 131072 |        2 |       859.9 | 1.57x     |                1024 |                   0 |
|  128 | 131072 |        4 |       435.2 | 3.11x     |                1024 |                   0 |
|  128 | 131072 |        8 |       225   | 6.01x     |                1024 |                   0 |
|  128 | 131072 |       16 |       118.6 | 11.39x    |                1024 |                   0 |
|  256 |  32768 |        1 |       347.5 | 1.00x     |                   2 |                 254 |
|  256 |  32768 |        2 |       177.7 | 1.96x     |                   2 |                 254 |
|  256 |  32768 |        4 |        95.7 | 3.63x     |                   2 |                 254 |
|  256 |  32768 |        8 |        54.9 | 6.33x     |                   2 |                 254 |
|  256 |  32768 |       16 |        35.2 | 9.87x     |                   2 |                 254 |
|  256 |  65536 |        1 |       686   | 1.00x     |                   2 |                 510 |
|  256 |  65536 |        2 |       345.6 | 1.99x     |                   2 |                 510 |
|  256 |  65536 |        4 |       179.9 | 3.81x     |                   2 |                 510 |
|  256 |  65536 |        8 |        96.9 | 7.08x     |                   2 |                 510 |
|  256 |  65536 |       16 |        57.1 | 12.01x    |                   2 |                 510 |
|  256 | 131072 |        1 |      1363.3 | 1.00x     |                   2 |                1022 |
|  256 | 131072 |        2 |       680.8 | 2.00x     |                   2 |                1022 |
|  256 | 131072 |        4 |       347.6 | 3.92x     |                   2 |                1022 |
|  256 | 131072 |        8 |       181.5 | 7.51x     |                   2 |                1022 |
|  256 | 131072 |       16 |        99.1 | 13.76x    |                   2 |                1022 |
|  512 |  32768 |        1 |       347.5 | 1.00x     |                   2 |                 253 |
|  512 |  32768 |        2 |       178.7 | 1.94x     |                   2 |                 253 |
|  512 |  32768 |        4 |        96.7 | 3.59x     |                   2 |                 253 |
|  512 |  32768 |        8 |        55.5 | 6.26x     |                   2 |                 253 |
|  512 |  32768 |       16 |        36.1 | 9.62x     |                   2 |                 253 |
|  512 |  65536 |        1 |       686.3 | 1.00x     |                   2 |                 509 |
|  512 |  65536 |        2 |       346.5 | 1.98x     |                   2 |                 509 |
|  512 |  65536 |        4 |       180.7 | 3.80x     |                   2 |                 509 |
|  512 |  65536 |        8 |        98.4 | 6.98x     |                   2 |                 509 |
|  512 |  65536 |       16 |        58.5 | 11.74x    |                   2 |                 509 |
|  512 | 131072 |        1 |      1363.6 | 1.00x     |                   2 |                1021 |
|  512 | 131072 |        2 |       683   | 2.00x     |                   2 |                1021 |
|  512 | 131072 |        4 |       350.8 | 3.89x     |                   2 |                1021 |
|  512 | 131072 |        8 |       182.9 | 7.45x     |                   2 |                1021 |
|  512 | 131072 |       16 |       101.5 | 13.44x    |                   2 |                1021 |
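
The speedup column appears to be the `splits = 1` median divided by each row's median within the same (Sq, Sk) group. A small sketch that recomputes it for the Sq=256, Sk=131072 rows is below. The medians are copied from the table; the per-split "efficiency" (speedup divided by split count) is an extra figure added here for illustration and is not reported in the PR, and `speedup_table` is a hypothetical helper name.

```python
# Median latencies in microseconds, copied from the Sq=256, Sk=131072 rows.
medians_us = {1: 1363.3, 2: 680.8, 4: 347.6, 8: 181.5, 16: 99.1}


def speedup_table(medians):
    """Return {splits: (speedup, efficiency)} relative to the splits=1 row."""
    base = medians[1]
    rows = {}
    for splits, t in sorted(medians.items()):
        speedup = base / t
        # Efficiency is an illustrative extra: 1.0 would mean perfect scaling.
        rows[splits] = (round(speedup, 2), round(speedup / splits, 2))
    return rows


for splits, (speedup, eff) in speedup_table(medians_us).items():
    print(f"splits={splits:2d}  speedup={speedup:.2f}x  efficiency={eff:.2f}")
```

Recomputing this way reproduces the 3.92x, 7.51x, and 13.76x entries for this group.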

drisspg added a commit that referenced this pull request on May 4, 2026
stack-info: PR: #2536, branch: drisspg/stack/38
drisspg force-pushed the drisspg/stack/38 branch from d61a157 to 525657d on May 4, 2026, 22:54
drisspg marked this pull request as draft on May 4, 2026, 23:26
drisspg added a commit that referenced this pull request on May 4, 2026
stack-info: PR: #2536, branch: drisspg/stack/38
drisspg force-pushed the drisspg/stack/38 branch from 525657d to d507197 on May 4, 2026, 23:26
drisspg marked this pull request as ready for review on May 4, 2026, 23:26
drisspg marked this pull request as draft on May 5, 2026, 00:00
drisspg marked this pull request as ready for review on May 5, 2026, 00:00
drisspg mentioned this pull request on May 5, 2026
drisspg requested review from Johnsonms, tridao and v0i0 on May 5, 2026, 00:01
stack-info: PR: #2536, branch: drisspg/stack/38
drisspg marked this pull request as draft on May 7, 2026, 18:17
drisspg force-pushed the drisspg/stack/38 branch from d507197 to 5540d0d on May 7, 2026, 18:17
drisspg marked this pull request as ready for review on May 7, 2026, 18:17
drisspg merged commit 4178915 into main on May 19, 2026
ussoewwin added a commit to ussoewwin/flash-attention that referenced this pull request on May 21, 2026