
fix: qwen3.6 moe mtp fused expert layout #145

Merged: ISEEKYAN merged 1 commit into ISEEKYAN:main from LiuXTao:itao/fix_qwen3.6_mtp_fused_expert on May 20, 2026.



LiuXTao commented on May 20, 2026

Problem

When loading Qwen3.6 MoE checkpoints that include MTP (Multi-Token Prediction) layers, the bridge raises a KeyError. Qwen3.6 stores MTP expert weights in a fused/stacked layout: a single key, mtp.layers.0.mlp.experts.gate_up_proj, whose tensor has shape [num_experts, ...]. The existing code only handles the Qwen3.5 per-expert layout, which has one key per expert (mtp.layers.0.mlp.experts.{i}.gate_proj.weight), so the per-expert keys it looks up do not exist in a Qwen3.6 checkpoint.
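To make the mismatch concrete, here is a minimal sketch of the two key schemes, using toy weight maps in the style of a safetensors index (tensor name to shard file). The helper names, the four-expert count, and the shard filenames are hypothetical; only the key patterns come from the description above.

```python
from typing import Dict, List


def qwen35_mtp_gate_key(layer: int, expert: int) -> str:
    # Qwen3.5 per-expert layout: one key per expert.
    return f"mtp.layers.{layer}.mlp.experts.{expert}.gate_proj.weight"


def qwen36_mtp_fused_key(layer: int) -> str:
    # Qwen3.6 fused layout: one stacked key whose dim 0 indexes the experts.
    return f"mtp.layers.{layer}.mlp.experts.gate_up_proj"


def missing_per_expert_keys(
    weight_map: Dict[str, str], layer: int, num_experts: int
) -> List[str]:
    # Keys a per-expert-only loader would request that the checkpoint lacks;
    # indexing weight_map with any of them raises KeyError.
    wanted = [qwen35_mtp_gate_key(layer, e) for e in range(num_experts)]
    return [k for k in wanted if k not in weight_map]


# Toy weight maps (tensor name -> shard file) with 4 experts; shard names are made up.
NUM_EXPERTS = 4
per_expert_weight_map = {
    qwen35_mtp_gate_key(0, e): "model-00001.safetensors" for e in range(NUM_EXPERTS)
}
fused_weight_map = {qwen36_mtp_fused_key(0): "model-00001.safetensors"}

print(missing_per_expert_keys(per_expert_weight_map, 0, NUM_EXPERTS))  # []
print(len(missing_per_expert_keys(fused_weight_map, 0, NUM_EXPERTS)))  # 4
```

Every per-expert key is absent from the fused map, which is exactly where a per-expert-only loader fails.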

[Image: qwen3.6-35B-A3B]

[Image: qwen3.5-35B-A3B]

Solution

Auto-detect the MTP expert layout at checkpoint load time by checking whether the fused key mtp.layers.0.mlp.experts.gate_up_proj appears in the safetensors index. When the fused layout is detected, MTP expert parameters are mapped to the single stacked HF key, and the existing decoder MoE slice/stack logic is reused in both directions: on import, each expert's weights are sliced out of the fused tensor by expert index; on export, the per-expert weights are collected and combined with torch.stack. The Qwen3.5 per-expert MTP path is unchanged, so existing checkpoints keep loading as before.
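The detection and key mapping can be sketched roughly as follows. This is not the code merged in this PR: the function names are hypothetical, the per-expert up_proj key is assumed by analogy with gate_proj, and the index is assumed to follow the standard sharded safetensors format, where "weight_map" maps tensor names to shard files.

```python
import json
from typing import Dict, Optional, Tuple

# Fused-layout key named in this PR; its presence marks a Qwen3.6-style checkpoint.
FUSED_MTP_KEY = "mtp.layers.0.mlp.experts.gate_up_proj"


def load_weight_map(index_path: str) -> Dict[str, str]:
    # Sharded safetensors checkpoints ship an index JSON whose "weight_map"
    # maps each tensor name to the shard file that stores it.
    with open(index_path) as f:
        return json.load(f)["weight_map"]


def detect_mtp_expert_layout(weight_map: Dict[str, str]) -> str:
    # "fused" for the stacked Qwen3.6 layout, "per_expert" for Qwen3.5.
    return "fused" if FUSED_MTP_KEY in weight_map else "per_expert"


def mtp_expert_hf_key(
    layout: str, layer: int, expert: int, proj: str
) -> Tuple[str, Optional[int]]:
    """Map one MTP expert projection to (HF key, expert index into that tensor).

    proj is "gate_proj" or "up_proj" (the up_proj per-expert key is assumed).
    In the fused layout every expert shares one stacked gate_up_proj tensor,
    so the expert index selects the slice along dim 0. In the per-expert
    layout each expert has its own key and no index is needed.
    """
    prefix = f"mtp.layers.{layer}.mlp.experts"
    if layout == "fused":
        return f"{prefix}.gate_up_proj", expert
    return f"{prefix}.{expert}.{proj}.weight", None
```

On import, a returned index would select `fused[expert]` from the loaded tensor; on export, per-expert weights would be gathered in expert order and combined with `torch.stack(tensors, dim=0)`, mirroring the description above. Checking only layer 0 matches the single probe key the PR describes.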

ISEEKYAN merged commit 08975e6 into ISEEKYAN:main on May 20, 2026 (0 of 2 checks passed).
