feat: Add Qwen3.6 MoE (qwen3_5_moe) GGUF export support #1397

Open
DeadByDawn101 wants to merge 2 commits into ml-explore:main from DeadByDawn101:feat/qwen3_5_moe-gguf-export
Summary

Adds GGUF export support for Qwen3.6 35B-A3B and other Qwen3.5/3.6 Mixture-of-Experts models via mlx_lm.fuse --export-gguf.

Currently, --export-gguf only supports llama, mixtral, and mistral model types. This PR adds qwen3_5_moe support.

Problem

python -m mlx_lm fuse --model qwen3.6-model --export-gguf
ValueError: Model type qwen3_5_moe not supported for GGUF conversion.

Qwen3.6 MoE uses a unique hybrid architecture (Gated DeltaNet + softmax attention + MoE with shared experts) with tensor naming conventions that differ from Mixtral-style MoE:

  1. switch_mlp instead of block_sparse_moe.experts.{n}
  2. Merged 3D expert tensors instead of per-expert 2D tensors
  3. Separate gate_proj and up_proj that need pre-fusion for GGUF
  4. language_model. prefix from the ConditionalGeneration wrapper
  5. Linear attention (Gated DeltaNet, handled as Mamba-style SSM) tensors alongside standard attention

Changes

fuse.py

  • Add qwen3_5_moe to the supported model types whitelist
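
For illustration, the gate that produces the error shown under Problem can be sketched as a simple allow-list check. The names `GGUF_SUPPORTED_MODEL_TYPES` and `check_gguf_supported` are hypothetical and the actual check in fuse.py may be structured differently; only the model type strings and the error message come from this PR.

```python
# Hypothetical allow-list: the real check in fuse.py may look different.
# The model type strings and the error text are taken from the PR description.
GGUF_SUPPORTED_MODEL_TYPES = {"llama", "mixtral", "mistral", "qwen3_5_moe"}


def check_gguf_supported(model_type: str) -> None:
    """Raise ValueError when the model type cannot be exported to GGUF."""
    if model_type not in GGUF_SUPPORTED_MODEL_TYPES:
        raise ValueError(
            f"Model type {model_type} not supported for GGUF conversion."
        )


# Passes silently once "qwen3_5_moe" is in the allow-list.
check_gguf_supported("qwen3_5_moe")
```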

gguf.py

  • Tensor name mappings: Added mappings for all Qwen3.6 MoE tensor patterns (switch_mlp, shared experts, MoE router, linear attention/SSM)
  • language_model. prefix stripping: Handles the ConditionalGeneration wrapper prefix
  • gate_proj + up_proj fusion: Pre-processing step that concatenates separate gate and up projections into gate_up_proj before name translation (GGUF expects these fused)
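
A minimal sketch of the fusion step, using nested Python lists so the axis handling is explicit. It assumes merged expert weights laid out as [num_experts][intermediate_size][hidden_size] and a gate-before-up ordering; both are assumptions for illustration, and `fuse_gate_up` and `shape3d` are hypothetical helpers, not functions from gguf.py. Real code would more likely call an array library's concatenate on the matching axis.

```python
def fuse_gate_up(gate, up):
    """Concatenate gate and up expert weights along the intermediate axis.

    Both inputs are nested lists shaped
    [num_experts][intermediate_size][hidden_size] (assumed layout).
    The result is shaped [num_experts][2 * intermediate_size][hidden_size],
    with the gate rows placed before the up rows (assumed order).
    """
    if len(gate) != len(up):
        raise ValueError("gate and up must have the same number of experts")
    # Joining the row lists of each expert extends the intermediate axis.
    return [gate_rows + up_rows for gate_rows, up_rows in zip(gate, up)]


def shape3d(tensor):
    """Return the (experts, rows, cols) shape of a 3-level nested list."""
    return (len(tensor), len(tensor[0]), len(tensor[0][0]))


num_experts, intermediate_size, hidden_size = 2, 3, 4
gate = [[[1.0] * hidden_size for _ in range(intermediate_size)]
        for _ in range(num_experts)]
up = [[[2.0] * hidden_size for _ in range(intermediate_size)]
      for _ in range(num_experts)]

fused = fuse_gate_up(gate, up)
print(shape3d(fused))  # (2, 6, 4)
```

The intermediate dimension doubles while the expert and hidden dimensions are unchanged, which is the shape property the fusion test in this PR checks.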

Testing

Tested with fine-tuned Qwen3.6-35B-A3B models:

  • mlx_lm.fuse --export-gguf produces valid F16 GGUF
  • Quantized to Q4_K_M via llama-quantize
  • Verified loading in llama.cpp and Ollama

Applies To

  • Qwen3.6-35B-A3B (all variants and fine-tunes)
  • Any model using Qwen3_5MoeForConditionalGeneration architecture
  • Any model with model_type: qwen3_5_moe

Discovered and implemented by Gabriel Garcia / RavenX LLC

Adds GGUF export support for Qwen3.6 35B-A3B and other Qwen3.5/3.6
Mixture-of-Experts models.

Changes:
- fuse.py: Add 'qwen3_5_moe' to supported model types for --export-gguf
- gguf.py: Add tensor name mappings for Qwen3.6 MoE architecture:
  - Strip 'language_model.' prefix (ConditionalGeneration wrapper)
  - Map switch_mlp.{gate_up,down}_proj → ffn_{gate_up,down}_exps
  - Map shared_expert.{gate,down,up}_proj → ffn_{gate,down,up}_shexp
  - Map shared_expert_gate → ffn_gate_inp_shexp
  - Map mlp.gate → ffn_gate_inp (MoE router)
  - Map linear_attn (Mamba-style SSM) tensor names
- gguf.py: Pre-process gate_proj + up_proj fusion into gate_up_proj
  before name translation (Qwen3.6 stores these separately but GGUF
  expects them concatenated along the intermediate_size dimension)
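
The name mappings listed above could be sketched roughly as follows. The GGUF-side names (ffn_*_exps, ffn_*_shexp, ffn_gate_inp, ffn_gate_inp_shexp) are the ones given in this commit message; the regular expressions, the `model.layers.N.mlp.` source layout, the `blk.` layer prefix, and the function name `translate_name` are assumptions for illustration and are not taken from gguf.py. Linear-attention tensors are left out because their target names are not spelled out here.

```python
import re

# (pattern, replacement) pairs. Target names follow the commit message;
# the patterns themselves are illustrative, not copied from gguf.py.
RULES = [
    (r"\.mlp\.switch_mlp\.(gate_up|down)_proj\.", r".ffn_\1_exps."),
    (r"\.mlp\.shared_expert\.(gate|down|up)_proj\.", r".ffn_\1_shexp."),
    (r"\.mlp\.shared_expert_gate\.", r".ffn_gate_inp_shexp."),
    (r"\.mlp\.gate\.", r".ffn_gate_inp."),  # MoE router
]

PREFIX = "language_model."  # added by the ConditionalGeneration wrapper


def translate_name(name: str) -> str:
    """Translate one MLX-style tensor name to a GGUF-style name (sketch)."""
    if name.startswith(PREFIX):
        name = name[len(PREFIX):]
    # Assumed layer prefix convention: model.layers.N -> blk.N
    name = name.replace("model.layers.", "blk.")
    for pattern, replacement in RULES:
        name = re.sub(pattern, replacement, name)
    return name


print(translate_name(
    "language_model.model.layers.3.mlp.switch_mlp.gate_up_proj.weight"
))  # blk.3.ffn_gate_up_exps.weight
```

The router rule matches only `.mlp.gate.` with a literal dot on both sides, so it does not fire on `gate_proj` or `shared_expert_gate`.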

Background:
Qwen3.6 MoE uses a hybrid architecture (Gated DeltaNet + softmax
attention + MoE with shared experts) that has different tensor naming
conventions than Mixtral-style MoE models. The key differences are:
1. 'switch_mlp' instead of 'block_sparse_moe.experts.{n}'
2. Merged 3D expert tensors instead of per-expert 2D tensors
3. Separate gate_proj and up_proj that need pre-fusion
4. 'language_model.' prefix from the ConditionalGeneration wrapper
5. Linear attention (Mamba SSM) tensors alongside standard attention

Tested with: Qwen3.6-35B-A3B fine-tuned models fused via mlx_lm.fuse

Co-authored-by: Claude (Anthropic)

Tests added:
- translate_weight_names strips language_model. prefix
- translate_weight_names maps switch_mlp → ffn_*_exps
- translate_weight_names maps shared_expert → ffn_*_shexp
- translate_weight_names maps MoE router (mlp.gate)
- translate_weight_names maps linear_attn (SSM) tensors
- gate_proj + up_proj fusion produces correct gate_up_proj shape