MLX

[cuda backend] reduce memory consumption on gemma4_31b by running embedding in int8 #1723

Run time

Job	Run time
CI run decision / decide	30s
test-mlx-voxtral-realtime / test-mlx-voxtral-realtime	40m 17s
test-mlx-parakeet / test-mlx-parakeet	34m 6s
test-mlx-whisper / test-mlx-whisper	9m 47s
test-mlx-qwen35-moe / test-mlx-qwen35-moe	9m 51s
test-mlx-stories110m / test-mlx-stories110m	36m 46s
test-mlx / test-mlx	13m 49s
test-mlx-voxtral / test-mlx-voxtral	41m 20s
backend-tester (operators) / test-mlx-backend-operators	11m 29s
backend-tester (models) / test-mlx-backend-models	12m 20s
test-mlx-llm (unsloth/Llama-3.2-1B-Instruct, llama-1b, false, 4w, macos-14-xlarge) / test-mlx-llm-llama-1b-4w	10m 59s
test-mlx-llm (unsloth/Llama-3.2-1B-Instruct, llama-1b, true, 4w, macos-14-xlarge) / test-mlx-llm-llama-1b-custom-4w	11m 5s
test-mlx-llm (unsloth/gemma-3-1b-it, gemma3-1b, true, nvfp4, macos-14-xlarge) / test-mlx-llm-gemma3-1b-custom-nvfp4	10m 58s
test-mlx-llm (unsloth/Qwen3-0.6B, qwen3-0.6b, true, 4w, macos-14-xlarge) / test-mlx-llm-qwen3-0.6b-custom-4w	10m 22s
test-mlx-llm (unsloth/gemma-3-1b-it, gemma3-1b, true, 4w, macos-14-xlarge) / test-mlx-llm-gemma3-1b-custom-4w	10m 50s
test-mlx-llm (google/gemma-4-E2B-it, gemma4-e2b, true, 4w, macos-15-xlarge) / test-mlx-llm-gemma4-e2b-custom-4w	18m 0s
test-mlx-llm (unsloth/Qwen3-0.6B, qwen3-0.6b, true, nvfp4, macos-14-xlarge) / test-mlx-llm-qwen3-0.6b-custom-nvfp4	10m 45s
test-mlx-llm (unsloth/Llama-3.2-1B-Instruct, llama-1b, false, nvfp4, macos-14-xlarge) / test-mlx-llm-llama-1b-nvfp4	10m 29s
test-mlx-llm (unsloth/Llama-3.2-1B-Instruct, llama-1b, true, nvfp4, macos-14-xlarge) / test-mlx-llm-llama-1b-custom-nvfp4	11m 18s
test-mlx-llm (google/gemma-4-E2B-it, gemma4-e2b, false, 4w, macos-15-xlarge) / test-mlx-llm-gemma4-e2b-4w	18m 0s
test-mlx-llm (unsloth/Qwen3-0.6B, qwen3-0.6b, false, 4w, macos-14-xlarge) / test-mlx-llm-qwen3-0.6b-4w	10m 35s
test-mlx-llm (unsloth/gemma-3-1b-it, gemma3-1b, false, nvfp4, macos-14-xlarge) / test-mlx-llm-gemma3-1b-nvfp4	11m 1s
test-mlx-llm (unsloth/Qwen3-0.6B, qwen3-0.6b, false, nvfp4, macos-14-xlarge) / test-mlx-llm-qwen3-0.6b-nvfp4	10m 19s
test-mlx-llm (unsloth/gemma-3-1b-it, gemma3-1b, false, 4w, macos-14-xlarge) / test-mlx-llm-gemma3-1b-4w	11m 42s
Total	6h 16m 38s
