[cuda backend] reduce memory consumption on gemma4_31b by running embedding in int8 #1723
| Job | Run time |
|---|---|
| | 30s |
| | 40m 17s |
| | 34m 6s |
| | 9m 47s |
| | 9m 51s |
| | 36m 46s |
| | 13m 49s |
| | 41m 20s |
| | 11m 29s |
| | 12m 20s |
| | 10m 59s |
| | 11m 5s |
| | 10m 58s |
| | 10m 22s |
| | 10m 50s |
| | 18m 0s |
| | 10m 45s |
| | 10m 29s |
| | 11m 18s |
| | 18m 0s |
| | 10m 35s |
| | 11m 1s |
| | 10m 19s |
| | 11m 42s |
| Total | 6h 16m 38s |