Text-only PARO MoE checkpoints ship a vision_config → servers mis-route to VLM engine (load failure / SIGKILL) #49

@sangemaru

Text-only PARO MoE checkpoints ship a vision_config, causing inference servers to mis-route them to a VLM/multimodal engine

Summary: The published text-only PARO checkpoints for the Qwen3.6 / Gemma-4 MoE families include a vision_config block (and a …ForConditionalGeneration architecture) in config.json. Inference servers that select the engine based on the presence of vision_config therefore route these checkpoints to a vision-language engine, which then fails to load the MoE expert tensors. The vision_config block is purely vestigial: the checkpoints are text-only.

Confirmed example: z-lab/Qwen3.6-35B-A3B-PARO

architectures: ["Qwen3_5MoeForConditionalGeneration"]
model_type:    "qwen3_5_moe"
vision_config: present
quantization_config.quant_method: "paroquant"
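
To check whether another local checkpoint has the same shape, the four fields above can be read straight from its config.json. This is a minimal sketch: the helper names `summarize_config` and `summarize_config_file` are hypothetical, and the example dict only mirrors the values listed above (the real contents of vision_config are not reproduced here).

```python
import json
from pathlib import Path


def summarize_config(config: dict) -> dict:
    """Pull out the config.json fields that matter for engine routing."""
    quant = config.get("quantization_config") or {}
    return {
        "architectures": config.get("architectures"),
        "model_type": config.get("model_type"),
        "has_vision_config": "vision_config" in config,
        "quant_method": quant.get("quant_method"),
    }


def summarize_config_file(path) -> dict:
    """Same summary, read from a config.json on disk."""
    text = Path(path).read_text(encoding="utf-8")
    return summarize_config(json.loads(text))


# Example dict mirroring the fields reported above.
# The empty vision_config is a placeholder, not the real block.
example = {
    "architectures": ["Qwen3_5MoeForConditionalGeneration"],
    "model_type": "qwen3_5_moe",
    "vision_config": {},
    "quantization_config": {"quant_method": "paroquant"},
}
summary = summarize_config(example)
```

A checkpoint matches the pattern in this report when `has_vision_config` is true and `quant_method` is "paroquant".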

Observed behavior (oMLX 0.3.9.dev2, Apple Silicon / M1 Max):

  • oMLX classifies the model as VLM (because vision_config is present) and loads it on the VLM engine.
  • The VLM path cannot load the MoE expert tensors and throws on model.language_model.layers.0.mlp.experts.gate_up_proj.
  • For Qwen3.6-35B-A3B-PARO it then falls back to the LLM (batched) engine, but only after the failed VLM attempt, which contributes to a very slow cold load (~8 min).
  • For gemma-4-26B-A4B-it-PARO there is no MoE→LLM fallback for that arch, and the server is hard-killed (SIGKILL) during load.

Root cause: these are text-only quantizations with no usable vision weights, but the vision_config is retained, so any server that routes on vision_config (a common heuristic) sends them down the multimodal path.
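
The routing heuristic can be illustrated with a toy function. This is not oMLX's actual code: `pick_engine` is a hypothetical stand-in for any server that decides on key presence alone, and it assumes nothing else in the config is consulted.

```python
def pick_engine(config: dict) -> str:
    """Hypothetical router: any vision_config key means the VLM engine."""
    return "vlm" if "vision_config" in config else "llm"


# Config shaped like the affected checkpoint (vision_config body is a placeholder).
affected = {
    "model_type": "qwen3_5_moe",
    "vision_config": {},
    "quantization_config": {"quant_method": "paroquant"},
}

# The same config with the vestigial block removed.
stripped = {k: v for k, v in affected.items() if k != "vision_config"}

engine_before = pick_engine(affected)  # "vlm"
engine_after = pick_engine(stripped)   # "llm"
```

Under this heuristic the server has no way to tell a text-only quant from a real multimodal checkpoint, which is why the fix belongs in the published config.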

Suggested fix: strip vision_config (and vision-related top-level keys like image_token_id) from the published text-only PARO MoE checkpoints. This is the established pattern for text-only quants of multimodal-arch models (e.g. Unsloth's text-only Gemma quants drop vision_config), and it makes these load correctly as LLMs across servers without per-user workarounds.

Per-user workaround (for reference): on oMLX, either set model_type_override: "llm" in model_settings.json, or remove vision_config from the local config.json. Both force the LLM engine and avoid the failed VLM attempt.
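
Removing the keys from a config.json, whether before publishing or as a local workaround, could look like the sketch below. The key list is an assumption: only vision_config and image_token_id are named in this report, and a given checkpoint may carry other vision-related keys. The function names and the .bak backup are illustrative choices, and the architectures field is deliberately left untouched.

```python
import json
import shutil
from pathlib import Path

# Assumption: only the two keys named in this report; extend as needed.
VISION_KEYS = ("vision_config", "image_token_id")


def strip_vision_keys(config: dict, keys=VISION_KEYS) -> dict:
    """Return a copy of the config without the listed top-level keys."""
    return {k: v for k, v in config.items() if k not in keys}


def rewrite_config(path) -> bool:
    """Strip vision keys from a config.json in place, keeping a .bak copy.

    Returns True if the file was changed, False if nothing needed stripping.
    """
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8"))
    stripped = strip_vision_keys(config)
    if stripped == config:
        return False
    # Keep the original next to the rewritten file (config.json.bak).
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(json.dumps(stripped, indent=2) + "\n", encoding="utf-8")
    return True
```

Calling `rewrite_config` on the config.json inside the model directory edits it once; a second call returns False and leaves both files alone.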

Happy to provide full logs if useful.
