feat: Add Qwen3.6 MoE (qwen3_5_moe) GGUF export support by DeadByDawn101 · Pull Request #1397 · ml-explore/mlx-lm

DeadByDawn101 · 2026-06-12T06:06:29Z

Summary

Adds GGUF export support for Qwen3.6 35B-A3B and other Qwen3.5/3.6 Mixture-of-Experts models via mlx_lm.fuse --export-gguf.

Currently, --export-gguf only supports llama, mixtral, and mistral model types. This PR adds qwen3_5_moe support.

Problem

python -m mlx_lm fuse --model qwen3.6-model --export-gguf
ValueError: Model type qwen3_5_moe not supported for GGUF conversion.

Qwen3.6 MoE uses a unique hybrid architecture (Gated DeltaNet + softmax attention + MoE with shared experts) with tensor naming conventions that differ from Mixtral-style MoE:

switch_mlp instead of block_sparse_moe.experts.{n}
Merged 3D expert tensors instead of per-expert 2D tensors
Separate gate_proj and up_proj that need pre-fusion for GGUF
language_model. prefix from the ConditionalGeneration wrapper
Linear attention (Gated DeltaNet) tensors, mapped here as Mamba-style SSM tensors, alongside standard attention
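To make the first two differences concrete, here is a small shape-only sketch. The sizes are toy values, the Mixtral key follows the Hugging Face checkpoint naming, and the (experts, out, in) axis order of the merged tensor is an assumption, not something this PR states.

```python
import math

# Toy sizes for illustration only; not the real model dimensions.
NUM_EXPERTS, HIDDEN, INTERMEDIATE = 4, 8, 16

# Mixtral-style layout: one 2D weight per expert, indexed by {n} in the key.
per_expert_shapes = {
    f"model.layers.0.block_sparse_moe.experts.{n}.w1.weight": (INTERMEDIATE, HIDDEN)
    for n in range(NUM_EXPERTS)
}

# switch_mlp-style layout: a single 3D weight with a leading expert axis.
# The axis order (experts, out, in) is assumed here.
merged_shapes = {
    "language_model.model.layers.0.mlp.switch_mlp.gate_proj.weight": (
        NUM_EXPERTS,
        INTERMEDIATE,
        HIDDEN,
    )
}

# Both layouts hold the same number of parameters; only the packaging differs.
per_expert_total = sum(math.prod(s) for s in per_expert_shapes.values())
merged_total = sum(math.prod(s) for s in merged_shapes.values())
```

Because the packaging differs, a converter written for per-expert keys cannot find the tensors it expects in a switch_mlp checkpoint, which is why separate name mappings are needed.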

Changes

`fuse.py`

Add qwen3_5_moe to the supported model types whitelist
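The whitelist change can be pictured roughly as follows. This is a minimal sketch, not the actual code in fuse.py: the constant and function names are hypothetical, and only the error message is taken from the traceback quoted in the Problem section.

```python
# Hypothetical names; the real check in fuse.py may be structured differently.
GGUF_SUPPORTED_MODEL_TYPES = ["llama", "mixtral", "mistral", "qwen3_5_moe"]


def check_gguf_supported(model_type: str) -> None:
    """Raise if the model type cannot be exported to GGUF."""
    if model_type not in GGUF_SUPPORTED_MODEL_TYPES:
        raise ValueError(
            f"Model type {model_type} not supported for GGUF conversion."
        )
```

Before this PR the list would lack `qwen3_5_moe`, so the check fails for Qwen3.6 MoE models before any tensor is converted.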

`gguf.py`

Tensor name mappings: Added mappings for all Qwen3.6 MoE tensor patterns (switch_mlp, shared experts, MoE router, linear attention/SSM)
language_model. prefix stripping: Handles the ConditionalGeneration wrapper prefix
gate_proj + up_proj fusion: Pre-processing step that concatenates separate gate and up projections into gate_up_proj before name translation (GGUF expects these fused)
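The fusion step can be sketched as below. This is an illustration under stated assumptions rather than the PR's implementation: the function name `fuse_gate_up` is hypothetical, NumPy arrays stand in for MLX arrays, expert weights are assumed to be laid out as (experts, intermediate_size, hidden_size), and gate is assumed to come before up in the fused tensor.

```python
import numpy as np


def fuse_gate_up(weights: dict) -> dict:
    """Merge switch_mlp gate_proj and up_proj into one gate_up_proj tensor."""
    fused = {}
    for name, w in weights.items():
        if ".switch_mlp.gate_proj." in name:
            up_name = name.replace(".gate_proj.", ".up_proj.")
            out_name = name.replace(".gate_proj.", ".gate_up_proj.")
            # Concatenate along the intermediate_size axis (second to last
            # under the assumed layout); gate first, then up.
            fused[out_name] = np.concatenate([w, weights[up_name]], axis=-2)
        elif ".switch_mlp.up_proj." in name:
            # Already consumed together with its matching gate_proj.
            continue
        else:
            fused[name] = w
    return fused
```

Running the fusion before name translation means the translation table only needs an entry for gate_up_proj, not for the two separate projections.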

Testing

Tested with fine-tuned Qwen3.6-35B-A3B models:

mlx_lm.fuse --export-gguf produces valid F16 GGUF
Quantized to Q4_K_M via llama-quantize
Verified loading in llama.cpp and Ollama
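For a quick sanity check before handing the file to llama.cpp, the fixed-size GGUF header can be read with the standard library. This sketch is not part of the PR; the function name `read_gguf_header` is hypothetical, and it assumes a little-endian GGUF file of version 2 or later, where the 4-byte magic is followed by a uint32 version and two uint64 counts.

```python
import struct


def read_gguf_header(path: str) -> dict:
    """Read the magic, version, tensor count, and metadata count of a GGUF file."""
    with open(path, "rb") as f:
        raw = f.read(24)
    if len(raw) < 24 or raw[:4] != b"GGUF":
        raise ValueError(f"{path} does not start with a GGUF header")
    # Little-endian: uint32 version, uint64 tensor count, uint64 metadata KV count.
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw[4:24])
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```

A header that parses only shows the file is structurally a GGUF file; loading it in llama.cpp or Ollama, as done above, is still the real test.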

Applies To

Qwen3.6-35B-A3B (all variants and fine-tunes)
Any model using Qwen3_5MoeForConditionalGeneration architecture
Any model with model_type: qwen3_5_moe

Discovered and implemented by Gabriel Garcia / RavenX LLC

Adds GGUF export support for Qwen3.6 35B-A3B and other Qwen3.5/3.6 Mixture-of-Experts models.

Changes:
- fuse.py: Add 'qwen3_5_moe' to supported model types for --export-gguf
- gguf.py: Add tensor name mappings for Qwen3.6 MoE architecture:
  - Strip 'language_model.' prefix (ConditionalGeneration wrapper)
  - Map switch_mlp.{gate_up,down}_proj → ffn_{gate_up,down}_exps
  - Map shared_expert.{gate,down,up}_proj → ffn_{gate,down,up}_shexp
  - Map shared_expert_gate → ffn_gate_inp_shexp
  - Map mlp.gate → ffn_gate_inp (MoE router)
  - Map linear_attn (Mamba-style SSM) tensor names
- gguf.py: Pre-process gate_proj + up_proj fusion into gate_up_proj before name translation (Qwen3.6 stores these separately but GGUF expects them concatenated along the intermediate_size dimension)

Background: Qwen3.6 MoE uses a hybrid architecture (Gated DeltaNet + softmax attention + MoE with shared experts) whose tensor naming conventions differ from those of Mixtral-style MoE models. The key differences are:
1. 'switch_mlp' instead of 'block_sparse_moe.experts.{n}'
2. Merged 3D expert tensors instead of per-expert 2D tensors
3. Separate gate_proj and up_proj that need pre-fusion
4. 'language_model.' prefix from the ConditionalGeneration wrapper
5. Linear attention (Mamba SSM) tensors alongside standard attention

Tested with: Qwen3.6-35B-A3B fine-tuned models fused via mlx_lm.fuse

Co-authored-by: Claude (Anthropic)
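The name mappings listed in this commit message can be sketched as a small rule table. This is an illustration, not the code in gguf.py: `translate_name` and `MOE_RULES` are hypothetical names, the `model.layers.` to `blk.` rewrite is assumed from the usual GGUF block naming, and the linear_attn mappings are omitted because the commit message does not spell out their target names.

```python
# (source substring, GGUF substring); more specific patterns come first.
MOE_RULES = [
    ("mlp.switch_mlp.gate_up_proj", "ffn_gate_up_exps"),
    ("mlp.switch_mlp.down_proj", "ffn_down_exps"),
    ("mlp.shared_expert_gate", "ffn_gate_inp_shexp"),
    ("mlp.shared_expert.gate_proj", "ffn_gate_shexp"),
    ("mlp.shared_expert.up_proj", "ffn_up_shexp"),
    ("mlp.shared_expert.down_proj", "ffn_down_shexp"),
    # Trailing dot keeps the router from matching gate_proj or gate_up_proj.
    ("mlp.gate.", "ffn_gate_inp."),
]

PREFIX = "language_model."


def translate_name(name: str) -> str:
    """Translate one MLX weight name to a GGUF-style tensor name."""
    # Drop the ConditionalGeneration wrapper prefix if present.
    if name.startswith(PREFIX):
        name = name[len(PREFIX):]
    # Assumed block naming: model.layers.N.* becomes blk.N.*
    name = name.replace("model.layers.", "blk.")
    for old, new in MOE_RULES:
        if old in name:
            return name.replace(old, new)
    return name
```

For example, `translate_name("language_model.model.layers.3.mlp.switch_mlp.down_proj.weight")` yields `blk.3.ffn_down_exps.weight` under these assumptions.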

Tests added:
- translate_weight_names strips language_model. prefix
- translate_weight_names maps switch_mlp → ffn_*_exps
- translate_weight_names maps shared_expert → ffn_*_shexp
- translate_weight_names maps MoE router (mlp.gate)
- translate_weight_names maps linear_attn (SSM) tensors
- gate_proj + up_proj fusion produces correct gate_up_proj shape

DeadByDawn101 added 2 commits June 12, 2026 06:06

DeadByDawn101 wants to merge 2 commits into ml-explore:main from DeadByDawn101:feat/qwen3_5_moe-gguf-export
