Bug: MiniMax-M2.7 JANG MoE model fails to load
Model: MiniMax-M2.7-JANG_2L (JANG format v2.0, MoE, 256 local experts)
oMLX Version: latest (via App / Homebrew)
vMLX: loads the same model successfully
Error
Expected shape (256, 1536, 768) but received shape (256, 1536, 192)
for parameter model.layers.0.block_sparse_moe.switch_mlp.gate_proj.weight
Root Cause Analysis
The config.json declares switch_mlp layer quantization as:
"model.layers.0.block_sparse_moe.switch_mlp.gate_proj": {"bits": 8, "group_size": 32, "mode": "affine"}
But the tensors actually stored in the safetensors are (shapes reproduced by the snippet after this list):
weight: shape=(256, 1536, 192), dtype=uint32
scales: shape=(256, 1536, 24), dtype=float16
biases: shape=(256, 1536, 24), dtype=float16
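For reference, the stored shapes can be read straight from the safetensors header without materializing the tensors, with a short script like the one below (the shard filename is a placeholder; check model.safetensors.index.json for the shard that actually holds layer 0):

```python
# Print the shapes of the layer-0 switch_mlp gate_proj tensors directly from
# the safetensors header, without loading the tensor data.
from safetensors import safe_open

prefix = "model.layers.0.block_sparse_moe.switch_mlp.gate_proj"
shard = "model-00001-of-000NN.safetensors"  # placeholder shard name

with safe_open(shard, framework="numpy") as f:
    for suffix in ("weight", "scales", "biases"):
        print(suffix, f.get_slice(f"{prefix}.{suffix}").get_shape())
```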
The config claims 8-bit quantization with group_size=32, but the stored shapes are only consistent with a smaller bit width and a larger group size:

| Parameter | Claimed (config) | Actual (from stored shapes) |
| --- | --- | --- |
| bits | 8 | 2 |
| group_size | 32 | 128 |
| packed weight shape | (256, 1536, 768) | (256, 1536, 192) |
| scales shape | (256, 1536, 96) | (256, 1536, 24) |
The dequantize logic in oMLX reads bits: 8 from the config and computes an expected packed shape of (256, 1536, 768), which doesn't match the tensor on disk.
Meanwhile, the stored tensor is consistent with 2-bit affine packing. Working backwards from the shapes, with in_features = hidden_size = 3072 (see the quick check below):
- weight: 192 uint32 × 32 bits = 6144 bits per row, and 6144 / 3072 values = 2 bits per value (16 values packed into each uint32)
- scales / biases: 3072 values / 24 entries = one (scale, bias) pair per 128 values, i.e. group_size = 128
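A quick sanity check of that arithmetic (plain Python, nothing oMLX-specific; the 24 below is the last dimension of the stored scales tensor):

```python
# Packed width per output row is in_features * bits / 32 uint32.
# Only 2-bit reproduces the 192 seen on disk; 8-bit gives the 768 that
# config.json leads oMLX to expect.
in_features = 3072  # hidden_size, the input dimension of gate_proj/up_proj
for bits in (8, 4, 2):
    print(f"{bits}-bit -> {in_features * bits // 32} uint32 per row")
# 8-bit -> 768 uint32 per row
# 4-bit -> 384 uint32 per row
# 2-bit -> 192 uint32 per row

# Group size implied by the scales tensor (one scale/bias pair per group):
print("group_size =", in_features // 24)  # -> 128
```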
The config.json quantization metadata is wrong for the switch_mlp layers: it reports bits: 8 with group_size: 32, but the safetensors are packed at a smaller bit width with a larger group size. oMLX should either:
- Auto-detect the actual bits from the tensor shape vs. config shape, or
- Allow the JANG format to store the correct per-tensor bits in a separate field
Reproduction Steps
- Download MiniMax-M2.7-JANG_2L model
- Load in oMLX → fails with shape mismatch error
- Load in vMLX → works fine
Model Config Snippet
{
  "model_type": "minimax_m2",
  "hidden_size": 3072,
  "intermediate_size": 1536,
  "num_hidden_layers": 62,
  "num_local_experts": 256,
  "num_experts_per_tok": 8,
  "head_dim": 128,
  "quantization": {
    "bits": 8,
    "group_size": 128,
    "mode": "affine",
    "model.layers.0.block_sparse_moe.switch_mlp.gate_proj": {"bits": 8, "group_size": 32, "mode": "affine"},
    "model.layers.0.block_sparse_moe.switch_mlp.up_proj": {"bits": 8, "group_size": 32, "mode": "affine"},
    "model.layers.0.block_sparse_moe.switch_mlp.down_proj": {"bits": 8, "group_size": 32, "mode": "affine"},
    ...
  }
}
Suggested Fix
In the MoE expert dequantization path, before applying the config's bits value, verify the actual tensor shape against what the config claims. If there's a mismatch, auto-detect the correct bit width from the packed weight's last dimension (and the group size from the scales shape).
Alternatively, store the correct per-tensor bits value in the JANG format metadata so oMLX reads the right quantization parameters.