fix: fall back to openai-harmony rendering for gpt-oss models without chat_template #852

Open

FaisalFehad wants to merge 1 commit into jundot:main from FaisalFehad:fix/gpt-oss-chat-template-fallback

Conversation

@FaisalFehad
Contributor

Summary

gpt-oss models whose packaged tokenizer does not ship a chat_template (several community MLX quants of the context-1 family, for example) fail on every chat-style endpoint (/v1/chat/completions, /v1/messages, /v1/responses) with:

Chat template error: Cannot use chat template functions because
tokenizer.chat_template is not set and no template argument was passed! ...

even though oMLX already has full Harmony infrastructure (adapter, streaming parser, anthropic→harmony converter, structured tool-call extraction). The only endpoint that responds at all is /v1/completions — and that path drops the parser's structured tool_calls on the floor, which makes tool use on these models effectively unusable.
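The failure comes from the standard chat-template guard inside the tokenizer: every chat-style endpoint dies before oMLX's Harmony machinery is ever reached. A minimal stand-in for that guard (an emulation for illustration, not the actual transformers code) shows the mechanism:

```python
# Sketch: the guard inside apply_chat_template that produces the error
# quoted above. This emulates the behaviour for illustration; the real
# transformers implementation differs in detail.
class TemplatelessTokenizer:
    chat_template = None  # some community quants ship no template at all

    def apply_chat_template(self, messages, chat_template=None, **kwargs):
        template = chat_template or self.chat_template
        if template is None:
            raise ValueError(
                "Cannot use chat template functions because "
                "tokenizer.chat_template is not set and no template "
                "argument was passed!"
            )
        return template  # the real code would render the Jinja template here
```

Because the exception is raised before any model-specific logic runs, having a full Harmony stack in the codebase does not help unless something intercepts this path first.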

Fix

When model_type == "gpt_oss" and the tokenizer has no chat_template, render the prompt directly via openai-harmony (already a dependency) instead of falling through to tokenizer.apply_chat_template. Models that ship a template continue to use it; nothing changes on the happy path.
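The fallback amounts to a single guarded branch at the top of the template-application path. A sketch of that shape (function and argument names here are illustrative, not the actual signatures in `omlx/engine/batched.py`):

```python
def apply_template_or_harmony(tokenizer, messages, *, model_type,
                              render_harmony_prompt, **kwargs):
    """Sketch of the fallback branch; names are illustrative."""
    # Fallback only for gpt-oss tokenizers that ship no chat_template.
    if model_type == "gpt_oss" and getattr(tokenizer, "chat_template", None) is None:
        return render_harmony_prompt(messages, **kwargs)
    # Happy path: models that ship a template keep using it, unchanged.
    return tokenizer.apply_chat_template(messages, tokenize=False, **kwargs)
```

Keeping the check this narrow is what guarantees the regression result below: for any model with a template, the new branch is simply never taken.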

Files

  • omlx/adapter/harmony.py — new render_harmony_prompt(). Translates OpenAI-style messages + tools into a Harmony prompt string. Handles:
    • content blocks ([{"type":"text","text":"..."}])
    • assistant tool_calls → commentary channel with to=functions.NAME
    • role=tool responses, resolving the function name via tool_call_id lookup when the message omits name (orphan tool messages are skipped with a warning rather than given a fabricated name)
    • reasoning_effort and conversation_start_date via chat_template_kwargs
    • Encoding cached at module scope (one load per process, not per call)
  • omlx/engine/batched.py::_apply_chat_template — branches to the new renderer only when the fallback is needed. Single guarded if at the top of the method; existing path untouched.
  • tests/test_harmony_render.py — 19 unit tests covering basic rendering, content-as-list, tool rendering, tool-call roundtrip (incl. name resolution via tool_call_id and orphan-skip), template kwargs, and malformed input.
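The interesting part of the renderer is the tool-call bookkeeping: resolving a tool response's function name via tool_call_id and skipping orphans instead of fabricating a name. The logic can be sketched in pure Python with the Harmony special tokens written out literally (the real `render_harmony_prompt()` presumably builds the prompt through the openai-harmony encoding and also handles system/developer headers, tools, and chat_template_kwargs):

```python
import logging

log = logging.getLogger("harmony_render_sketch")

def render_harmony_messages(messages):
    """Illustrative sketch: OpenAI-style messages -> Harmony prompt string."""
    # First pass: map tool_call_id -> function name from assistant tool_calls.
    names_by_id = {}
    for msg in messages:
        for call in msg.get("tool_calls") or []:
            names_by_id[call["id"]] = call["function"]["name"]

    parts = []
    for msg in messages:
        role, content = msg["role"], msg.get("content")
        # Content may arrive as a list of {"type": "text", ...} blocks.
        if isinstance(content, list):
            content = "".join(
                b.get("text", "") for b in content if b.get("type") == "text")
        if role == "assistant" and msg.get("tool_calls"):
            for call in msg["tool_calls"]:
                fn = call["function"]
                # Tool calls go on the commentary channel, addressed to the function.
                parts.append(
                    f"<|start|>assistant<|channel|>commentary"
                    f" to=functions.{fn['name']}<|message|>{fn['arguments']}<|call|>")
        elif role == "tool":
            name = msg.get("name") or names_by_id.get(msg.get("tool_call_id"))
            if name is None:
                # Orphan tool message: skip with a warning, never invent a name.
                log.warning("skipping tool message with unresolvable function name")
                continue
            parts.append(
                f"<|start|>functions.{name} to=assistant<|channel|>commentary"
                f"<|message|>{content}<|end|>")
        else:
            parts.append(f"<|start|>{role}<|message|>{content}<|end|>")
    return "".join(parts)
```

In the real adapter the Harmony encoding is loaded once at module scope rather than per call, so repeated requests pay the load cost only once per process.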

Verification

  • 19/19 new tests pass.
  • Full harmony test suite (67/67) passes unchanged.
  • End-to-end against a local gpt-oss model without chat_template:
    • /v1/chat/completions — returns structured tool_calls.
    • /v1/messages — returns structured tool_use blocks.
  • Regression-checked against gemma-4-e4b-it-8bit (ships a chat_template) — unchanged behaviour, new branch not taken.

Test plan

  • pytest tests/test_harmony_render.py — 19 passed
  • pytest tests/test_harmony.py tests/test_harmony_parser.py tests/test_harmony_render.py — 67 passed
  • Live /v1/chat/completions call on gpt-oss-no-template model → structured tool_calls
  • Live /v1/messages call on gpt-oss-no-template model → structured tool_use
  • Live /v1/chat/completions call on model with template → unchanged
