
Bug: MiniMax-M2.7 JANG MoE model fails to load — switch_mlp.gate_proj shape mismatch (config says 8-bit, actual is 4-bit) #967

@Keenni

Description


Bug: MiniMax-M2.7 JANG MoE model fails to load

Model: MiniMax-M2.7-JANG_2L (JANG format v2.0, MoE, 256 local experts)
oMLX Version: latest (via App / Homebrew)
vMLX: loads the same model successfully

Error

Expected shape (256, 1536, 768) but received shape (256, 1536, 192)
for parameter model.layers.0.block_sparse_moe.switch_mlp.gate_proj.weight

Root Cause Analysis

The config.json declares switch_mlp layer quantization as:

"model.layers.0.block_sparse_moe.switch_mlp.gate_proj": {"bits": 8, "group_size": 32, "mode": "affine"}

But the actual stored tensor in safetensors is:

weight: shape=(256, 1536, 192), dtype=uint32
scales: shape=(256, 1536, 24), dtype=float16
biases: shape=(256, 1536, 24), dtype=float16
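
These shapes can be read straight from the safetensors shards; a minimal sketch using MLX (the file name is a placeholder for whichever shard actually holds the layer-0 expert tensors):

import mlx.core as mx

# Placeholder file name; substitute the shard that contains the layer-0 expert weights.
weights = mx.load("model.safetensors")

prefix = "model.layers.0.block_sparse_moe.switch_mlp.gate_proj"
for suffix in ("weight", "scales", "biases"):
    arr = weights[f"{prefix}.{suffix}"]
    print(suffix, arr.shape, arr.dtype)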

The config claims 8-bit quantization with group_size=32, but the actual storage is 4-bit:

Parameter            Claimed (config)     Actual (stored)
bits                 8                    4
group_size           32                   32
packed weight shape  (256, 1536, 768)     (256, 1536, 192)

The dequantization logic in oMLX reads bits: 8 from the config and computes an expected shape that does not match the stored tensor.

Meanwhile, the tensor on disk is packed at a lower bit width than the config claims, which is why its last dimension is so much smaller: the 3072 input features (hidden_size) occupy only 192 uint32 values per row, instead of the 768 uint32 per row that 8-bit packing (4 values per uint32) would produce.
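
For reference, this is how the expected shapes fall out of the config values, assuming MLX-style affine packing where each uint32 holds 32 / bits values along the input dimension; the helper below is illustrative, not oMLX's actual code:

def expected_quantized_shapes(num_experts, out_features, in_features, bits, group_size):
    # Packed weight row: in_features values at `bits` bits each, stored as uint32 words.
    packed = in_features * bits // 32
    # One scale and one bias per group of `group_size` input values.
    groups = in_features // group_size
    weight = (num_experts, out_features, packed)
    scales = (num_experts, out_features, groups)
    return weight, scales

# The config values for gate_proj (bits=8, group_size=32, hidden_size=3072) give
# weight (256, 1536, 768) and scales (256, 1536, 96), which is the shape oMLX
# expects and the stored tensors do not match.
print(expected_quantized_shapes(256, 1536, 3072, bits=8, group_size=32))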

The config.json quantization metadata is wrong for switch_mlp layers — it reports bits: 8 but the safetensors are actually 4-bit packed. oMLX should either:

  1. Auto-detect the actual bits from the tensor shape vs. config shape, or
  2. Allow the JANG format to store the correct per-tensor bits in a separate field

Reproduction Steps

  1. Download MiniMax-M2.7-JANG_2L model
  2. Load in oMLX → fails with shape mismatch error
  3. Load in vMLX → works fine

Model Config Snippet

{
  "model_type": "minimax_m2",
  "hidden_size": 3072,
  "intermediate_size": 1536,
  "num_hidden_layers": 62,
  "num_local_experts": 256,
  "num_experts_per_tok": 8,
  "head_dim": 128,
  "quantization": {
    "bits": 8,
    "group_size": 128,
    "mode": "affine",
    "model.layers.0.block_sparse_moe.switch_mlp.gate_proj": {"bits": 8, "group_size": 32, "mode": "affine"},
    "model.layers.0.block_sparse_moe.switch_mlp.up_proj": {"bits": 8, "group_size": 32, "mode": "affine"},
    "model.layers.0.block_sparse_moe.switch_mlp.down_proj": {"bits": 8, "group_size": 32, "mode": "affine"},
    ...
  }
}

Suggested Fix

In the MoE expert dequantization path, before applying the config's bits value, verify the actual tensor shape against what the config claims. If there's a mismatch, auto-detect the correct bit width from the tensor dimensions.
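
A minimal sketch of that check, assuming MLX-style affine packing (32 / bits values per uint32 along the input dimension); the function names are illustrative, not oMLX's actual API:

def infer_bits(packed_width, in_features):
    # The packed width is in_features * bits / 32, so the bit width can be
    # recovered directly from the stored shape.
    total_bits = packed_width * 32
    if total_bits % in_features != 0:
        raise ValueError(f"packed width {packed_width} is inconsistent with {in_features} input features")
    return total_bits // in_features

def infer_group_size(scales_width, in_features):
    # One scale (and bias) per group, so the group size follows from the scales shape.
    if in_features % scales_width != 0:
        raise ValueError(f"{scales_width} groups do not divide {in_features} input features")
    return in_features // scales_width

def resolve_quant_params(config_bits, config_group_size, weight, scales, in_features):
    # Trust the stored tensors over config.json when the two disagree.
    bits = infer_bits(weight.shape[-1], in_features)
    group_size = infer_group_size(scales.shape[-1], in_features)
    if (bits, group_size) != (config_bits, config_group_size):
        print(f"quantization metadata mismatch: config says bits={config_bits}, "
              f"group_size={config_group_size}; stored tensors imply bits={bits}, "
              f"group_size={group_size}")
    return bits, group_size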

Alternatively, store the correct per-tensor bits value in the JANG format metadata so oMLX reads the right quantization parameters.
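
If the per-tensor route is taken, the lookup on the loader side can stay close to what config.json already encodes; a sketch of resolving per-tensor overrides against the global defaults (again illustrative, not oMLX's actual loader):

def quant_params_for(quant_cfg, tensor_path):
    # Global defaults are the scalar entries ("bits", "group_size", "mode");
    # per-tensor overrides are the nested dicts keyed by the full parameter path.
    defaults = {k: v for k, v in quant_cfg.items() if not isinstance(v, dict)}
    override = quant_cfg.get(tensor_path, {})
    return {**defaults, **override}

# With the config snippet above,
# quant_params_for(config["quantization"],
#                  "model.layers.0.block_sparse_moe.switch_mlp.gate_proj")
# currently returns {"bits": 8, "group_size": 32, "mode": "affine"}. The proposal
# is for the JANG exporter to write the bits value that matches the stored tensor
# into that per-tensor entry, so this lookup no longer disagrees with the file.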
