[cuda backend] reduce memory consumption on gemma4_31b by running embedding in int8 #1670

Job Run time
1m 37s
15m 0s
42m 15s
10m 3s
35m 30s
38m 54s
48m 1s
48m 1s
10m 45s
13m 48s
10m 50s
11m 12s
11m 25s
17m 6s
15m 35s
11m 31s
11m 35s
11m 55s
12m 11s
11m 52s
11m 23s
10m 55s
11m 46s
11m 39s
Total: 7h 14m 49s