
fix(discovery): fall back to directory name for audio model type detection#849

Open
ryancee wants to merge 1 commit into jundot:main from ryancee:fix/audio-model-discovery-dirname-fallback

Conversation

@ryancee

@ryancee ryancee commented Apr 18, 2026

Summary

Models whose `config.json` omits a top-level `model_type` field were being classified as `llm` by `detect_model_type()`, causing a `KeyError('model_type')` crash at inference time.

Root Cause

`detect_model_type()` in `model_discovery.py` identifies audio models exclusively by matching `config["model_type"]` and `config["architectures"]` against the mlx-audio model-type sets. Models using NeMo-format config files — notably parakeet-tdt models — do not include a top-level `model_type` field. This causes:

  1. `detect_model_type()` falls through all audio checks and returns `"llm"`
  2. A `BatchedEngine` (LLM engine) is loaded for the model
  3. The LLM loading path calls `config["model_type"]` without a `.get()` fallback
  4. `KeyError: 'model_type'` is raised — reported to the client as HTTP 500:

```
POST /v1/audio/transcriptions → 500: 'model_type'
```
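The failure mode is easy to reproduce in isolation. A minimal sketch (the config dicts below are illustrative stand-ins, not the real file contents): a bare `config["model_type"]` lookup raises `KeyError` on a NeMo-style config, which is exactly what surfaced to the client as the HTTP 500 above.

```python
# Illustrative configs: a typical HF-style config vs. a NeMo-style
# config that has no top-level "model_type" key (hypothetical contents).
hf_config = {"model_type": "llama", "architectures": ["LlamaForCausalLM"]}
nemo_config = {"encoder": {}, "decoder": {}}  # no "model_type" key

def load_llm(config: dict) -> str:
    # The crashing path: a bare key lookup with no .get() fallback.
    return config["model_type"]

load_llm(hf_config)  # "llama"
try:
    load_llm(nemo_config)
except KeyError as exc:
    # Reported to the client as HTTP 500: 'model_type'
    print(f"500: {exc}")
```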

The model itself works fine when loaded directly via `mlx_audio.stt.utils.load_model`, which already has its own directory-name fallback in `base_load_model`.

Fix

After all config-based checks, extract the first hyphen-separated segment of the model directory name and match it against `AUDIO_STT/TTS/STS_MODEL_TYPES` — the same sets already used for config-based detection. The segment is guarded against `_LLM_TYPE_COLLISIONS` to prevent false positives on directories named `llama-...`, `qwen3-...`, etc.

```python
# e.g. "parakeet-tdt-0.6b-v2" → stem = "parakeet" → matches AUDIO_STT_MODEL_TYPES → "audio_stt"
dir_stem = model_path.name.lower().split("-")[0]
if dir_stem and dir_stem not in _LLM_TYPE_COLLISIONS:
    if dir_stem in AUDIO_STT_MODEL_TYPES:
        return "audio_stt"
    ...
```

This mirrors the existing logic in `mlx_audio.utils.base_load_model`, which already resolves the model type from the directory name when `config.json` is missing the field.

Affected Models

Any audio model with a NeMo-format config, including:

  • `parakeet-tdt-0.6b-v2` (confirmed: returned HTTP 500 before the fix, now correctly detected as `audio_stt`)
  • Any future parakeet or NeMo-converted model added to `~/.omlx/models/`

Note

A companion PR, Blaizzy/mlx-audio#657, adds `model_type: "parakeet"` injection in `ModelConfig.from_dict()`, which addresses the root cause at the mlx-audio level. This omlx fix provides defense in depth for any other audio models that may have the same omission.

…ction

Models whose config.json omits a top-level model_type field (e.g.
parakeet-tdt models, which use NeMo-format config files) were being
classified as 'llm' because detect_model_type() only inspects the
architectures[] and model_type fields from config.json.

This caused a KeyError('model_type') at inference time: the wrong
engine (BatchedEngine/LLM) was loaded for the model, and that engine's
LLM loading path called config['model_type'] without a .get() fallback.

Fix: after all config-based checks, extract the first hyphen-separated
segment of the model directory name and match it against the same
mlx-audio AUDIO_STT/TTS/STS_MODEL_TYPES sets used for config-based
detection.  The stem is checked against _LLM_TYPE_COLLISIONS to avoid
false positives (e.g. a directory named 'llama-...' should not be
detected as audio).

This matches how mlx_audio.utils.base_load_model already resolves
model type from path when config is missing the field.
@ryancee ryancee force-pushed the fix/audio-model-discovery-dirname-fallback branch from 90ca033 to 32d6ec7 Compare April 18, 2026 16:53
@jundot jundot force-pushed the main branch 2 times, most recently from 7844f15 to b078330 Compare April 28, 2026 02:11
