MLX

[cuda backend] reduce memory consumption on gemma4_31b by running embedding in int8 #1670

Run time

Job	Run time
CI run decision / decide	1m 37s
test-mlx / test-mlx	15m 0s
test-mlx-voxtral-realtime / test-mlx-voxtral-realtime	42m 15s
test-mlx-qwen35-moe / test-mlx-qwen35-moe	10m 3s
test-mlx-parakeet / test-mlx-parakeet	35m 30s
test-mlx-voxtral / test-mlx-voxtral	38m 54s
test-mlx-stories110m / test-mlx-stories110m	48m 1s
test-mlx-whisper / test-mlx-whisper	48m 1s
backend-tester (operators) / test-mlx-backend-operators	10m 45s
backend-tester (models) / test-mlx-backend-models	13m 48s
test-mlx-llm (unsloth/Qwen3-0.6B, qwen3-0.6b, true, nvfp4, macos-14-xlarge) / test-mlx-llm-qwen3-0.6b-custom-nvfp4	10m 50s
test-mlx-llm (unsloth/Llama-3.2-1B-Instruct, llama-1b, true, nvfp4, macos-14-xlarge) / test-mlx-llm-llama-1b-custom-nvfp4	11m 12s
test-mlx-llm (unsloth/Qwen3-0.6B, qwen3-0.6b, false, nvfp4, macos-14-xlarge) / test-mlx-llm-qwen3-0.6b-nvfp4	11m 25s
test-mlx-llm (google/gemma-4-E2B-it, gemma4-e2b, false, 4w, macos-15-xlarge) / test-mlx-llm-gemma4-e2b-4w	17m 6s
test-mlx-llm (google/gemma-4-E2B-it, gemma4-e2b, true, 4w, macos-15-xlarge) / test-mlx-llm-gemma4-e2b-custom-4w	15m 35s
test-mlx-llm (unsloth/gemma-3-1b-it, gemma3-1b, false, nvfp4, macos-14-xlarge) / test-mlx-llm-gemma3-1b-nvfp4	11m 31s
test-mlx-llm (unsloth/gemma-3-1b-it, gemma3-1b, true, nvfp4, macos-14-xlarge) / test-mlx-llm-gemma3-1b-custom-nvfp4	11m 35s
test-mlx-llm (unsloth/Qwen3-0.6B, qwen3-0.6b, true, 4w, macos-14-xlarge) / test-mlx-llm-qwen3-0.6b-custom-4w	11m 55s
test-mlx-llm (unsloth/gemma-3-1b-it, gemma3-1b, true, 4w, macos-14-xlarge) / test-mlx-llm-gemma3-1b-custom-4w	12m 11s
test-mlx-llm (unsloth/Llama-3.2-1B-Instruct, llama-1b, false, 4w, macos-14-xlarge) / test-mlx-llm-llama-1b-4w	11m 52s
test-mlx-llm (unsloth/Llama-3.2-1B-Instruct, llama-1b, false, nvfp4, macos-14-xlarge) / test-mlx-llm-llama-1b-nvfp4	11m 23s
test-mlx-llm (unsloth/Llama-3.2-1B-Instruct, llama-1b, true, 4w, macos-14-xlarge) / test-mlx-llm-llama-1b-custom-4w	10m 55s
test-mlx-llm (unsloth/gemma-3-1b-it, gemma3-1b, false, 4w, macos-14-xlarge) / test-mlx-llm-gemma3-1b-4w	11m 46s
test-mlx-llm (unsloth/Qwen3-0.6B, qwen3-0.6b, false, 4w, macos-14-xlarge) / test-mlx-llm-qwen3-0.6b-4w	11m 39s
Total	7h 14m 49s
