[cuda backend] optimized L_kv threshold for sdpa implementation selection. #35822
| Job | Run time |
|---|---|
| | 20m 0s |
| | 7m 32s |
| | 8m 40s |
| | 7m 29s |
| | 8m 10s |
| | 8m 38s |
| | 10m 42s |
| | 16m 39s |
| | 8m 33s |
| | 6m 8s |
| | 6m 40s |
| | 8m 21s |
| | 7m 36s |
| | 7m 50s |
| | 8m 22s |
| Total | 2h 21m 20s |