fix(discovery): fall back to directory name for audio model type detection #849
Open
ryancee wants to merge 1 commit into jundot:main from
Models whose config.json omits a top-level model_type field (e.g.
parakeet-tdt models, which use NeMo-format config files) were being
classified as 'llm' because detect_model_type() only inspects the
architectures[] and model_type fields from config.json.
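To illustrate the fall-through, here is a minimal sketch of config-only detection. It is not the actual `detect_model_type()` code: the type sets are small hypothetical stand-ins for the mlx-audio sets, the substring match on `architectures[]` is an assumption, and the NeMo-style config keys are illustrative only.

```python
from typing import Any

# Hypothetical stand-ins for mlx-audio's type sets; the real sets differ.
AUDIO_STT_MODEL_TYPES = {"whisper", "parakeet"}
AUDIO_TTS_MODEL_TYPES = {"kokoro"}


def detect_from_config(config: dict[str, Any]) -> str:
    """Simplified config-only detection using model_type and architectures[]."""
    model_type = str(config.get("model_type", "")).lower()
    # Assumption: architectures are matched by substring, e.g. "KokoroModel".
    architectures = [str(a).lower() for a in config.get("architectures", [])]

    def matches(types: set[str]) -> bool:
        return model_type in types or any(t in a for a in architectures for t in types)

    if matches(AUDIO_STT_MODEL_TYPES):
        return "audio_stt"
    if matches(AUDIO_TTS_MODEL_TYPES):
        return "audio_tts"
    # Nothing matched, so the model falls through to the LLM default.
    return "llm"


# Illustrative NeMo-style config: no top-level model_type, no architectures[].
nemo_style_config = {"encoder": {"d_model": 1024}, "decoder": {}}
print(detect_from_config(nemo_style_config))          # llm
print(detect_from_config({"model_type": "whisper"}))  # audio_stt
```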
This caused a KeyError('model_type') at inference time: the wrong
engine (BatchedEngine/LLM) was loaded for the model, and that engine's
LLM loading path called config['model_type'] without a .get() fallback.
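The crash itself comes down to plain dict indexing. A minimal sketch, using an illustrative config with no top-level `model_type` (the keys are assumed), contrasts direct indexing with a `.get()` lookup:

```python
# Illustrative config without a top-level model_type (keys are assumed).
config = {"encoder": {}, "decoder": {}}

# Direct indexing, as in the LLM loading path described above, raises KeyError.
try:
    model_type = config["model_type"]
except KeyError as exc:
    print(f"KeyError: {exc}")  # KeyError: 'model_type'

# A .get() lookup returns None instead, leaving the caller to decide what to do.
print(config.get("model_type"))  # None
```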
Fix: after all config-based checks, extract the first hyphen-separated
segment of the model directory name and match it against the same
mlx-audio AUDIO_STT/TTS/STS_MODEL_TYPES sets used for config-based
detection. A stem that appears in _LLM_TYPE_COLLISIONS is ignored to
avoid false positives (e.g. a directory named 'llama-...' should not be
detected as audio).
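The fallback described above can be sketched as a small standalone helper. This is a hedged illustration rather than the code in this PR: `detect_from_dir_name` is a hypothetical name, the set contents are stand-ins for the real mlx-audio and omlx sets, and lowercasing the stem is an assumption.

```python
from pathlib import Path
from typing import Optional

# Hypothetical stand-ins; the real sets come from mlx-audio and omlx.
AUDIO_STT_MODEL_TYPES = {"whisper", "parakeet"}
AUDIO_TTS_MODEL_TYPES = {"kokoro", "llama", "qwen3"}
AUDIO_STS_MODEL_TYPES: set[str] = set()
_LLM_TYPE_COLLISIONS = {"llama", "qwen3"}


def detect_from_dir_name(model_path: str) -> Optional[str]:
    """Match the first hyphen-separated segment of the directory name."""
    # Assumption: the stem is lowercased before matching.
    stem = Path(model_path).name.split("-", 1)[0].lower()
    # Stems shared with LLM families would cause false positives, so skip them.
    if stem in _LLM_TYPE_COLLISIONS:
        return None
    if stem in AUDIO_STT_MODEL_TYPES:
        return "audio_stt"
    if stem in AUDIO_TTS_MODEL_TYPES:
        return "audio_tts"
    if stem in AUDIO_STS_MODEL_TYPES:
        return "audio_sts"
    return None


print(detect_from_dir_name("/models/parakeet-tdt-0.6b-v2"))  # audio_stt
print(detect_from_dir_name("/models/llama-3.2-1b"))         # None
```

In this sketch the helper runs only after the config-based checks find nothing; a `None` result leaves the existing `"llm"` default in place.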
This matches how mlx_audio.utils.base_load_model already resolves
model type from path when config is missing the field.
Summary
Models whose `config.json` omits a top-level `model_type` field were being classified as `llm` by `detect_model_type()`, causing a `KeyError('model_type')` crash at inference time.

Root Cause
`detect_model_type()` in `model_discovery.py` identifies audio models exclusively by matching `config["model_type"]` and `config["architectures"]` against the mlx-audio model-type sets. Models using NeMo-format config files, notably `parakeet-tdt` models, do not include a top-level `model_type` field. This causes the following chain of failures:

1. `detect_model_type()` falls through all audio checks and returns `"llm"`.
2. `BatchedEngine` (the LLM engine) is loaded for the model.
3. `config["model_type"]` is accessed without a `.get()` fallback.
4. `KeyError: 'model_type'` is raised and reported to the client as HTTP 500.

The model itself works fine when loaded directly via `mlx_audio.stt.utils.load_model`, which already has its own directory-name fallback in `base_load_model`.

Fix
After all config-based checks, extract the first hyphen-separated segment of the model directory name and match it against `AUDIO_STT/TTS/STS_MODEL_TYPES`, the same sets already used for config-based detection. The segment is guarded against `_LLM_TYPE_COLLISIONS` to prevent false positives on directories named `llama-...`, `qwen3-...`, etc.

This mirrors the existing logic in `mlx_audio.utils.base_load_model`, which already resolves the model type from the directory name when `config.json` is missing the field.

Affected Models
Any audio model with a NeMo-format config, including:

- `parakeet-tdt-0.6b-v2` (confirmed: returned HTTP 500 before the fix, now correctly detected as `audio_stt`)
- `~/.omlx/models/`

Note
A companion PR to Blaizzy/mlx-audio#657 adds `model_type: "parakeet"` injection in `ModelConfig.from_dict()`, which addresses the root cause at the mlx-audio level. This omlx fix provides defense-in-depth for any other audio models that may have the same omission.