[cuda backend] reduce memory consumption on gemma4_31b by running embedding in int8 #1670
| Job | Run time |
|---|---|
| | 1m 37s |
| | 15m 0s |
| | 42m 15s |
| | 10m 3s |
| | 35m 30s |
| | 38m 54s |
| | 48m 1s |
| | 48m 1s |
| | 10m 45s |
| | 13m 48s |
| | 10m 50s |
| | 11m 12s |
| | 11m 25s |
| | 17m 6s |
| | 15m 35s |
| | 11m 31s |
| | 11m 35s |
| | 11m 55s |
| | 12m 11s |
| | 11m 52s |
| | 11m 23s |
| | 10m 55s |
| | 11m 46s |
| | 11m 39s |
| Total | 7h 14m 49s |