fix: fall back to openai-harmony rendering for gpt-oss models without chat_template #852

Open

FaisalFehad wants to merge 1 commit into jundot:main from FaisalFehad:fix/gpt-oss-chat-template-fallback

Conversation

@FaisalFehad
Contributor

Summary

gpt-oss models whose packaged tokenizer does not ship a chat_template (several community MLX quants of the context-1 family, for example) fail on every chat-style endpoint (/v1/chat/completions, /v1/messages, /v1/responses) with:

Chat template error: Cannot use chat template functions because
tokenizer.chat_template is not set and no template argument was passed! ...

even though oMLX already has full Harmony infrastructure (adapter, streaming parser, anthropic→harmony converter, structured tool-call extraction). The only endpoint that responds at all is /v1/completions — and that path drops the parser's structured tool_calls on the floor, which makes tool use on these models effectively unusable.
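The failure comes from the standard chat-template guard inside the tokenizer: every chat-style endpoint dies before oMLX's Harmony machinery is ever reached. A minimal stand-in for that guard (an emulation for illustration, not the actual transformers code) shows the mechanism:

```python
# Sketch: the guard inside apply_chat_template that produces the error
# quoted above. This emulates the behaviour for illustration; the real
# transformers implementation differs in detail.
class TemplatelessTokenizer:
    chat_template = None  # some community quants ship no template at all

    def apply_chat_template(self, messages, chat_template=None, **kwargs):
        template = chat_template or self.chat_template
        if template is None:
            raise ValueError(
                "Cannot use chat template functions because "
                "tokenizer.chat_template is not set and no template "
                "argument was passed!"
            )
        return template  # the real code would render the Jinja template here
```

Because the exception is raised before any model-specific logic runs, having a full Harmony stack in the codebase does not help unless something intercepts this path first.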

Fix

When model_type == "gpt_oss" and the tokenizer has no chat_template, render the prompt directly via openai-harmony (already a dependency) instead of falling through to tokenizer.apply_chat_template. Models that ship a template continue to use it; nothing changes on the happy path.
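The fallback amounts to a single guarded branch at the top of the template-application path. A sketch of that shape (function and argument names here are illustrative, not the actual signatures in `omlx/engine/batched.py`):

```python
def apply_template_or_harmony(tokenizer, messages, *, model_type,
                              render_harmony_prompt, **kwargs):
    """Sketch of the fallback branch; names are illustrative."""
    # Fallback only for gpt-oss tokenizers that ship no chat_template.
    if model_type == "gpt_oss" and getattr(tokenizer, "chat_template", None) is None:
        return render_harmony_prompt(messages, **kwargs)
    # Happy path: models that ship a template keep using it, unchanged.
    return tokenizer.apply_chat_template(messages, tokenize=False, **kwargs)
```

Keeping the check this narrow is what guarantees the regression result below: for any model with a template, the new branch is simply never taken.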

Files

  • omlx/adapter/harmony.py — new render_harmony_prompt(). Translates OpenAI-style messages + tools into a Harmony prompt string. Handles:
    • content blocks ([{"type":"text","text":"..."}])
    • assistant tool_calls → commentary channel with to=functions.NAME
    • role=tool responses, resolving the function name via tool_call_id lookup when the message omits name (orphan tool messages are skipped with a warning rather than given a fabricated name)
    • reasoning_effort and conversation_start_date via chat_template_kwargs
    • Encoding cached at module scope (one load per process, not per call)
  • omlx/engine/batched.py::_apply_chat_template — branches to the new renderer only when the fallback is needed. Single guarded if at the top of the method; existing path untouched.
  • tests/test_harmony_render.py — 19 unit tests covering basic rendering, content-as-list, tool rendering, tool-call roundtrip (incl. name resolution via tool_call_id and orphan-skip), template kwargs, and malformed input.
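The interesting part of the renderer is the tool-call bookkeeping: resolving a tool response's function name via tool_call_id and skipping orphans instead of fabricating a name. The logic can be sketched in pure Python with the Harmony special tokens written out literally (the real `render_harmony_prompt()` presumably builds the prompt through the openai-harmony encoding and also handles system/developer headers, tools, and chat_template_kwargs):

```python
import logging

log = logging.getLogger("harmony_render_sketch")

def render_harmony_messages(messages):
    """Illustrative sketch: OpenAI-style messages -> Harmony prompt string."""
    # First pass: map tool_call_id -> function name from assistant tool_calls.
    names_by_id = {}
    for msg in messages:
        for call in msg.get("tool_calls") or []:
            names_by_id[call["id"]] = call["function"]["name"]

    parts = []
    for msg in messages:
        role, content = msg["role"], msg.get("content")
        # Content may arrive as a list of {"type": "text", ...} blocks.
        if isinstance(content, list):
            content = "".join(
                b.get("text", "") for b in content if b.get("type") == "text")
        if role == "assistant" and msg.get("tool_calls"):
            for call in msg["tool_calls"]:
                fn = call["function"]
                # Tool calls go on the commentary channel, addressed to the function.
                parts.append(
                    f"<|start|>assistant<|channel|>commentary"
                    f" to=functions.{fn['name']}<|message|>{fn['arguments']}<|call|>")
        elif role == "tool":
            name = msg.get("name") or names_by_id.get(msg.get("tool_call_id"))
            if name is None:
                # Orphan tool message: skip with a warning, never invent a name.
                log.warning("skipping tool message with unresolvable function name")
                continue
            parts.append(
                f"<|start|>functions.{name} to=assistant<|channel|>commentary"
                f"<|message|>{content}<|end|>")
        else:
            parts.append(f"<|start|>{role}<|message|>{content}<|end|>")
    return "".join(parts)
```

In the real adapter the Harmony encoding is loaded once at module scope rather than per call, so repeated requests pay the load cost only once per process.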

Verification

  • 19/19 new tests pass.
  • Full harmony test suite (67/67) passes unchanged.
  • End-to-end against a local gpt-oss model without chat_template:
    • /v1/chat/completions — returns structured tool_calls.
    • /v1/messages — returns structured tool_use blocks.
  • Regression-checked against gemma-4-e4b-it-8bit (ships a chat_template) — unchanged behaviour, new branch not taken.

Test plan

  • pytest tests/test_harmony_render.py — 19 passed
  • pytest tests/test_harmony.py tests/test_harmony_parser.py tests/test_harmony_render.py — 67 passed
  • Live /v1/chat/completions call on gpt-oss-no-template model → structured tool_calls
  • Live /v1/messages call on gpt-oss-no-template model → structured tool_use
  • Live /v1/chat/completions call on model with template → unchanged
