[cuda backend] optimized L_kv threshold for sdpa implementation selection. #6933
| Job | Run time |
|---|---|
| | 4s |
| | 32s |
| | 25m 40s |
| | 29m 59s |
| | 16m 4s |
| | 29m 45s |
| | 26m 6s |
| | 29m 21s |
| | 34m 5s |
| | 36m 1s |
| | 34m 38s |
| | 35m 21s |
| | 35m 58s |
| | 33m 39s |
| Total | 6h 7m 13s |