fix: fall back to openai-harmony rendering for gpt-oss models without chat_template#852
FaisalFehad wants to merge 1 commit into jundot:main from
Conversation
… chat_template
gpt-oss models whose packaged tokenizer does not ship with a
``chat_template`` (several community MLX quants of the context-1
family, for example) fail every chat-style endpoint
(``/v1/chat/completions``, ``/v1/messages``, ``/v1/responses``) with:
Chat template error: Cannot use chat template functions because
tokenizer.chat_template is not set ...
even though oMLX has full Harmony infrastructure (adapter, streaming
parser, anthropic→harmony converter, structured tool-call extraction).
The only reachable endpoint is ``/v1/completions``, which drops the
parser's ``tool_calls`` on the floor — so tool use on those models is
effectively unusable.
Render the prompt directly via ``openai-harmony`` (already a
dependency) when ``model_type == "gpt_oss"`` and the tokenizer has no
``chat_template``. Models that ship a template continue to use it,
so nothing changes on the happy path.
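The guard described above might look like the following standalone sketch. The function name and parameters here are hypothetical stand-ins for the actual oMLX method (`omlx/engine/batched.py::_apply_chat_template`); the point is only the shape of the fallback condition:

```python
def apply_chat_template_with_fallback(tokenizer, model_type, messages,
                                      render_harmony_prompt, tools=None):
    """Sketch: gpt-oss models whose tokenizer lacks a chat_template
    fall back to direct Harmony rendering; everything else keeps
    using the tokenizer's own template (the happy path)."""
    if model_type == "gpt_oss" and getattr(tokenizer, "chat_template", None) is None:
        # Fallback: render via openai-harmony instead of the (missing) template.
        return render_harmony_prompt(messages, tools)
    # Happy path: models that ship a template are unaffected.
    return tokenizer.apply_chat_template(
        messages, tools=tools, tokenize=False, add_generation_prompt=True)
```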
- ``omlx/adapter/harmony.py``: add ``render_harmony_prompt()``.
Translates OpenAI messages + tools into a Harmony prompt string;
handles content blocks, assistant ``tool_calls``, ``tool`` responses,
``reasoning_effort`` and ``conversation_start_date`` via
``chat_template_kwargs``. Resolves tool-response function names
via ``tool_call_id`` when the ``tool`` message omits ``name``.
- ``omlx/engine/batched.py::_apply_chat_template``: branch to the new
renderer only when the fallback is needed.
- ``tests/test_harmony_render.py``: 19 unit tests covering basic
rendering, content-as-list, tool rendering, tool-call roundtrip
(incl. name resolution via tool_call_id), template kwargs, and
malformed input.
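For orientation, the Harmony wire format that such a renderer targets wraps each turn in role-delimited special tokens and leaves an open assistant turn for the model to complete. The actual PR uses the `openai-harmony` package for this; the sketch below is only an illustration of the text shape, covering the content-as-list case the tests mention:

```python
def render_messages(messages):
    """Illustrative sketch of Harmony-style prompt text (not the
    openai-harmony API): <|start|>{role}<|message|>{content}<|end|>
    per turn, plus a trailing open assistant turn."""
    parts = []
    for msg in messages:
        role, content = msg["role"], msg["content"]
        # Content may arrive as a list of blocks: [{"type": "text", "text": ...}]
        if isinstance(content, list):
            content = "".join(
                b.get("text", "") for b in content if b.get("type") == "text")
        parts.append(f"<|start|>{role}<|message|>{content}<|end|>")
    # The prompt ends with an open assistant turn for the model to complete.
    parts.append("<|start|>assistant")
    return "".join(parts)
```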
Verified end-to-end on a gpt-oss model without ``chat_template``:
``/v1/chat/completions`` and ``/v1/messages`` now return structured
``tool_calls``; a model that does ship a template is unaffected.
Force-pushed from 7844f15 to b078330
Summary
gpt-oss models whose packaged tokenizer does not ship with a `chat_template` (several community MLX quants of the context-1 family, for example) fail every chat-style endpoint (`/v1/chat/completions`, `/v1/messages`, `/v1/responses`) with:

    Chat template error: Cannot use chat template functions because
    tokenizer.chat_template is not set ...

even though oMLX already has full Harmony infrastructure (adapter, streaming parser, anthropic→harmony converter, structured tool-call extraction). The only endpoint that responds at all is `/v1/completions`, and that path drops the parser's structured `tool_calls` on the floor, which makes tool use on these models effectively unusable.

Fix
When `model_type == "gpt_oss"` and the tokenizer has no `chat_template`, render the prompt directly via `openai-harmony` (already a dependency) instead of falling into `tokenizer.apply_chat_template`. Models that ship a template continue to use it; nothing changes on the happy path.

Files
- `omlx/adapter/harmony.py`: new `render_harmony_prompt()`. Translates OpenAI-style messages + tools into a Harmony prompt string. Handles:
  - content as a list of blocks (`[{"type": "text", "text": "..."}]`)
  - assistant `tool_calls` → commentary channel with `to=functions.NAME`
  - `role=tool` responses, resolving the function name via `tool_call_id` lookup when the message omits `name` (orphan tool messages are skipped with a warning rather than given a fabricated name)
  - `reasoning_effort` and `conversation_start_date` via `chat_template_kwargs`
- `omlx/engine/batched.py::_apply_chat_template`: branches to the new renderer only when the fallback is needed. A single guarded `if` at the top of the method; the existing path is untouched.
- `tests/test_harmony_render.py`: 19 unit tests covering basic rendering, content-as-list, tool rendering, tool-call roundtrip (incl. name resolution via `tool_call_id` and orphan-skip), template kwargs, and malformed input.
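The `tool_call_id` lookup and orphan-skip behaviour described in the files list can be sketched roughly as follows (a hypothetical helper, not the PR's actual code):

```python
def resolve_tool_names(messages):
    """Sketch: resolve the function name for `tool` messages that omit
    `name`, using the id -> name mapping from earlier assistant
    tool_calls. Orphan tool messages (unknown id) are skipped rather
    than given a fabricated name."""
    id_to_name = {}
    resolved = []
    for msg in messages:
        if msg.get("role") == "assistant":
            for call in msg.get("tool_calls") or []:
                # Remember which id maps to which function name.
                id_to_name[call["id"]] = call["function"]["name"]
        if msg.get("role") == "tool":
            name = msg.get("name") or id_to_name.get(msg.get("tool_call_id"))
            if name is None:
                continue  # orphan: skip (the real code also logs a warning)
            msg = {**msg, "name": name}
        resolved.append(msg)
    return resolved
```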
Verified end-to-end on a gpt-oss model without a `chat_template`:
- `/v1/chat/completions`: returns structured `tool_calls`.
- `/v1/messages`: returns structured `tool_use` blocks.
- `gemma-4-e4b-it-8bit` (ships a `chat_template`): unchanged behaviour, new branch not taken.

Test plan
- `pytest tests/test_harmony_render.py`: 19 passed
- `pytest tests/test_harmony.py tests/test_harmony_parser.py tests/test_harmony_render.py`: 67 passed
- `/v1/chat/completions` call on gpt-oss-no-template model → structured `tool_calls`
- `/v1/messages` call on gpt-oss-no-template model → structured `tool_use`
- `/v1/chat/completions` call on model with template → unchanged