Text-only PARO MoE checkpoints ship a vision_config → servers mis-route to VLM engine (load failure / SIGKILL)

## Text-only PARO MoE checkpoints ship a `vision_config`, causing inference servers to mis-route them to a VLM/multimodal engine

**Summary:** The published text-only PARO checkpoints for the Qwen3.6 / Gemma-4 *MoE* families include a `vision_config` block (and a `…ForConditionalGeneration` architecture) in `config.json`. Inference servers that pick the engine based on the presence of `vision_config` therefore route these checkpoints to a vision-language engine, which then fails to load the MoE expert tensors. The `vision_config` block is vestigial: the checkpoints are text-only.

**Confirmed example:** `z-lab/Qwen3.6-35B-A3B-PARO`
```
architectures: ["Qwen3_5MoeForConditionalGeneration"]
model_type:    "qwen3_5_moe"
vision_config: present
quantization_config.quant_method: "paroquant"
```

**Observed behavior (oMLX 0.3.9.dev2, Apple Silicon / M1 Max):**
- oMLX classifies the model as VLM (because `vision_config` is present) and loads it on the VLM engine.
- The VLM path cannot load the MoE expert tensors and throws on `model.language_model.layers.0.mlp.experts.gate_up_proj`.
- For `Qwen3.6-35B-A3B-PARO` it then falls back to the LLM (batched) engine, but only after the failed VLM attempt, which contributes to a very slow cold load (~8 min).
- For `gemma-4-26B-A4B-it-PARO` there is no MoE→LLM fallback for that architecture, and the server is **hard-killed (SIGKILL)** during load.

**Root cause:** these are text-only quantizations with no usable vision weights, but the `vision_config` is retained, so any server that routes on `vision_config` (a common heuristic) sends them down the multimodal path.
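To illustrate the heuristic in question, here is a minimal sketch of a router that keys on `vision_config`. It is not oMLX's actual code: `pick_engine` is a hypothetical name, and the config dict only mirrors the fields listed above, with the contents of `vision_config` left empty because only its presence matters to such a check.

```python
def pick_engine(config: dict) -> str:
    """Hypothetical router: treat any config carrying `vision_config` as a VLM."""
    return "vlm" if "vision_config" in config else "llm"


# Mirrors the fields reported for z-lab/Qwen3.6-35B-A3B-PARO.
# The `vision_config` body is a placeholder; the real block is not reproduced here.
paro_config = {
    "architectures": ["Qwen3_5MoeForConditionalGeneration"],
    "model_type": "qwen3_5_moe",
    "vision_config": {},
    "quantization_config": {"quant_method": "paroquant"},
}

# The same config with the vestigial block removed.
stripped_config = {k: v for k, v in paro_config.items() if k != "vision_config"}

print(pick_engine(paro_config))      # routed to the multimodal path
print(pick_engine(stripped_config))  # routed to the text-only path
```

Under this kind of check, an empty or unused `vision_config` is enough to select the multimodal path, since the router never inspects whether vision weights exist.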

**Suggested fix:** strip `vision_config` (and vision-related top-level keys like `image_token_id`) from the published text-only PARO MoE checkpoints. This is the established pattern for text-only quants of multimodal-arch models (e.g. Unsloth's text-only Gemma quants drop `vision_config`), and it makes these load correctly as LLMs across servers without per-user workarounds.
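A rough sketch of what the stripping step could look like, whether it is applied before publishing or to a local copy. The helper names (`strip_vision_keys`, `rewrite_config`) and the `.bak` backup are illustrative choices, and the key list contains only the two keys named in this issue; other vision-related keys may exist in a given checkpoint and would need to be added.

```python
import json
from pathlib import Path

# Assumption: only the keys named in this issue. Extend for other vision-related keys.
VISION_KEYS = ("vision_config", "image_token_id")


def strip_vision_keys(config: dict, keys=VISION_KEYS) -> dict:
    """Return a copy of `config` without the given top-level keys."""
    return {k: v for k, v in config.items() if k not in keys}


def rewrite_config(path: Path, keys=VISION_KEYS) -> list:
    """Strip vision keys from a config.json in place; return the keys removed.

    The original file is saved next to it with a `.bak` suffix before rewriting.
    """
    config = json.loads(path.read_text(encoding="utf-8"))
    removed = [k for k in keys if k in config]
    if removed:
        backup = path.with_name(path.name + ".bak")
        backup.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
        cleaned = strip_vision_keys(config, keys)
        path.write_text(json.dumps(cleaned, indent=2) + "\n", encoding="utf-8")
    return removed


if __name__ == "__main__":
    import sys

    # Usage: python strip_vision.py /path/to/checkpoint/config.json
    if len(sys.argv) == 2:
        print("removed:", rewrite_config(Path(sys.argv[1])))
```

This sketch leaves `architectures` and `model_type` untouched. Whether the `…ForConditionalGeneration` architecture name also needs changing for a given server is not something it addresses.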

**Per-user workaround (for reference):** on oMLX, either set `model_type_override: "llm"` in `model_settings.json`, or remove `vision_config` from the local `config.json`. Both force the LLM engine and avoid the failed VLM attempt.

Happy to provide full logs if useful.

