[cuda backend] reduce memory consumption on gemma4_31b by running embedding in int8 #1723

Job Run time
30s
40m 17s
34m 6s
9m 47s
9m 51s
36m 46s
13m 49s
41m 20s
11m 29s
12m 20s
10m 59s
11m 5s
10m 58s
10m 22s
10m 50s
18m 0s
10m 45s
10m 29s
11m 18s
18m 0s
10m 35s
11m 1s
10m 19s
11m 42s
6h 16m 38s