[cuda backend] skip fully-masked KV blocks calculation in SDPA #14048
| Job | Run time |
|---|---|
| | 2s |
| | 31s |
| | 54m 13s |
| | 48m 19s |
| | 20m 7s |
| | 18m 6s |
| | 18m 4s |
| | 24m 37s |
| | 23m 57s |
| | 26m 44s |
| | 20m 45s |
| | 18m 53s |
| | 29m 6s |
| | 26m 20s |
| | 25m 11s |
| | 50m 33s |
| | 43m 4s |
| | 13m 45s |
| | 31m 30s |
| | 17m 23s |
| | 27m 14s |
| | 17m 2s |
| | 29m 3s |
| | 3s |
| | 22m 38s |
| | 20m 57s |
| | 20m 31s |
| | 24m 41s |
| | 26m 33s |
| | 10m 9s |
| | 10m 37s |
| | 9m 36s |
| | 11m 0s |
| | 25m 6s |
| | 15m 8s |
| | 15m 34s |
| | 10m 7s |
| | 25m 11s |
| | 8m 11s |
| | 12m 3s |
| | 8m 52s |
| | 10m 9s |
| Total | 14h 31m 35s |