fix: qwen3.6 moe mtp fused expert layout by LiuXTao · Pull Request #145 · ISEEKYAN/mbridge

LiuXTao · 2026-05-20T08:33:53Z

Problem

When loading Qwen3.6 MoE checkpoints with MTP (Multi-Token Prediction) layers, the bridge raises a KeyError because Qwen3.6 stores MTP expert weights in a fused/stacked layout (mtp.layers.0.mlp.experts.gate_up_proj with shape [num_experts, ...]), while the existing code only handles the Qwen3.5 per-expert layout (mtp.layers.0.mlp.experts.{i}.gate_proj.weight).
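To make the mismatch concrete, here is a toy reproduction. It assumes the safetensors index has been parsed into its `weight_map` dict, which maps each tensor key to a shard file. The two key patterns come from the description above. The helper names and the shard filename are hypothetical.

```python
# Key patterns taken from the PR description; helper names are hypothetical.
PER_EXPERT_KEY = "mtp.layers.{layer}.mlp.experts.{expert}.gate_proj.weight"
FUSED_KEY = "mtp.layers.{layer}.mlp.experts.gate_up_proj"


def per_expert_keys(num_experts: int, layer: int = 0) -> list[str]:
    """Keys a per-expert (Qwen3.5-style) loader would request from the index."""
    return [PER_EXPERT_KEY.format(layer=layer, expert=i) for i in range(num_experts)]


def load_per_expert(weight_map: dict[str, str], num_experts: int) -> list[str]:
    """Resolve each per-expert key to its shard; raises KeyError on a fused index."""
    return [weight_map[key] for key in per_expert_keys(num_experts)]


# Toy "weight_map" sections for each layout (key -> shard file).
qwen35_index = {k: "model-00001.safetensors" for k in per_expert_keys(4)}
qwen36_index = {FUSED_KEY.format(layer=0): "model-00001.safetensors"}

print(len(load_per_expert(qwen35_index, 4)))  # 4
try:
    load_per_expert(qwen36_index, 4)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'mtp.layers.0.mlp.experts.0.gate_proj.weight'
```

The per-expert lookup succeeds against the Qwen3.5-style index. Against the Qwen3.6-style index it fails on the very first expert, because only the single stacked key exists.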

qwen3.6-35B-A3B

qwen3.5-35B-A3B

Solution

Auto-detect the MTP expert layout at checkpoint load time by checking for the fused key mtp.layers.0.mlp.experts.gate_up_proj in the safetensors index. When the fused layout is detected, map MTP expert parameters to the single stacked HF key and reuse the existing decoder MoE slice/stack logic for both import (slice by expert index from the fused tensor) and export (collect per-expert weights then torch.stack). The Qwen3.5 per-expert MTP path remains unchanged, ensuring backward compatibility.
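A minimal sketch of the detect / map / slice / stack flow described above follows. It assumes the index is available as its `weight_map` dict. Only the two key names come from this PR. The function names, the toy tensor shape, and the decision to show only `gate_up_proj` are illustrative assumptions, not mbridge's actual API. Any transpose or gate/up reordering the real bridge may apply is not shown.

```python
from typing import Optional

import torch

# From the PR: the presence of this key signals the Qwen3.6 fused MTP layout.
FUSED_GATE_UP = "mtp.layers.0.mlp.experts.gate_up_proj"


def mtp_uses_fused_experts(weight_map: dict) -> bool:
    """Auto-detect: the fused layout is in use iff the stacked key is in the index."""
    return FUSED_GATE_UP in weight_map


def hf_key_for_mtp_expert(expert_idx: int, fused: bool) -> tuple[str, Optional[int]]:
    """Map an MTP expert to its HF key; fused layouts also return the slice index."""
    if fused:
        return FUSED_GATE_UP, expert_idx  # one stacked key shared by all experts
    # Qwen3.5 per-expert path: one key per expert, no slicing needed.
    return f"mtp.layers.0.mlp.experts.{expert_idx}.gate_proj.weight", None


def import_expert(stacked: torch.Tensor, expert_idx: int) -> torch.Tensor:
    """Import: slice one expert out of the stacked [num_experts, ...] tensor."""
    return stacked[expert_idx]


def export_experts(per_expert: list[torch.Tensor]) -> torch.Tensor:
    """Export: collect per-expert weights in index order, then torch.stack on dim 0."""
    return torch.stack(per_expert, dim=0)


num_experts = 4
stacked = torch.randn(num_experts, 16, 8)  # toy shape; real dims are model-specific
weight_map = {FUSED_GATE_UP: "model-00001.safetensors"}

fused = mtp_uses_fused_experts(weight_map)
print(hf_key_for_mtp_expert(2, fused))  # ('mtp.layers.0.mlp.experts.gate_up_proj', 2)
experts = [import_expert(stacked, i) for i in range(num_experts)]
print(torch.equal(export_experts(experts), stacked))  # True
```

Slicing on import and stacking on export are inverses. A checkpoint therefore round-trips unchanged, and an index without the fused key falls through to the untouched per-expert path.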

fix: qwen3.6 moe mtp fused expert layout

1070152

ISEEKYAN approved these changes May 20, 2026

ISEEKYAN merged commit 08975e6 into ISEEKYAN:main May 20, 2026
0 of 2 checks passed

ISEEKYAN merged 1 commit into ISEEKYAN:main from LiuXTao:itao/fix_qwen3.6_mtp_fused_expert