[cuda backend] skip fully-masked KV blocks calculation in SDPA #14046
| Job | Run time |
|---|---|
| | 30s |
| | 5s |
| | 53m 21s |
| | 47m 34s |
| | 18m 40s |
| | 17m 10s |
| | 24m 33s |
| | 25m 34s |
| | 52m 26s |
| | 20m 39s |
| | 16m 1s |
| | 17m 55s |
| | 17m 20s |
| | 21m 18s |
| | 31m 27s |
| | 23m 35s |
| | 18m 0s |
| | 26m 5s |
| | 25m 16s |
| | 44m 32s |
| | 24m 20s |
| | 28m 27s |
| | 29m 20s |
| | 3s |
| | 20m 25s |
| | 20m 34s |
| | 22m 34s |
| | 8m 13s |
| | 8m 59s |
| | 26m 42s |
| | 15m 35s |
| | 16m 40s |
| | 24m 50s |
| | 10m 54s |
| | 9m 2s |
| | 10m 18s |
| | 24m 3s |
| | 8m 52s |
| | 10m 59s |
| | 24m 40s |
| | 9m 12s |
| | 11m 15s |
| Total | 14h 27m 58s |