
chat : add MiniMax M2 specialized tool-call handler #8

Open: doctorjei wants to merge 1 commit into domvox:main from doctorjei:tq-hip-minimax-pr

Conversation

@doctorjei
Contributor

You can also find this as a PR for the mainline here: ggml-org/llama.cpp#22106

Overview

Adds a specialized tool-call handler for the MiniMax M2.7 template (and probably later versions, though that is a guess). Without it, M2.7 output with tools crashes llama-server with a GGML_ABORT at src/llama-grammar.cpp:1435 (EOG with non-empty stack) when <invoke> is emitted.

Why? (Reproducing the Issue)

Reproducible in-tree via tests/test-chat.cpp on current master (b8840): parallel <invoke> elements inside a <minimax:tool_call> wrapper crash the peg_tester (Invalid diff: now finding less tool calls!).

The autoparser (peg-native) infers grammar structure from the template via differential rendering. MiniMax's template uses XML with repeatable invoke elements for parallel calls. The parser correctly infers per-invoke structure but mis-specifies the repetition rule, so any second invoke is lost.
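To make the repetition problem concrete, here is a minimal, self-contained sketch. It is not the autoparser or any llama.cpp API: `invoke_names`, `sample_parallel_output`, the tool names, and the closing tags `</invoke>` and `</minimax:tool_call>` are assumptions made for illustration. It only contrasts a rule that accepts a single invoke with one that repeats.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of llama.cpp: collect the name attribute of
// each <invoke name="..."> ... </invoke> element found in `text`.
// With repeat == false the scan stops after the first element, mimicking a
// rule that accepts only a single invoke.
std::vector<std::string> invoke_names(const std::string & text, bool repeat) {
    const std::string open  = "<invoke name=\"";
    const std::string close = "</invoke>";
    std::vector<std::string> names;
    std::size_t pos = 0;
    for (;;) {
        std::size_t start = text.find(open, pos);
        if (start == std::string::npos) break;
        start += open.size();
        const std::size_t quote = text.find('"', start);
        if (quote == std::string::npos) break;
        const std::size_t end = text.find(close, quote);
        if (end == std::string::npos) break;
        names.push_back(text.substr(start, quote - start));
        if (!repeat) break;
        pos = end + close.size();
    }
    return names;
}

// Assumed output shape with two parallel calls; tool and parameter names are
// made up for this example.
std::string sample_parallel_output() {
    return
        "<minimax:tool_call>\n"
        "<invoke name=\"get_weather\">\n"
        "<parameter name=\"city\">Paris</parameter>\n"
        "</invoke>\n"
        "<invoke name=\"get_time\">\n"
        "<parameter name=\"zone\">UTC</parameter>\n"
        "</invoke>\n"
        "</minimax:tool_call>";
}

void example_parallel_invokes() {
    const std::string out = sample_parallel_output();
    assert(invoke_names(out, true).size() == 2);   // repetition keeps both calls
    assert(invoke_names(out, false).size() == 1);  // single-match rule drops the second
}
```

Calling `example_parallel_invokes()` exercises both modes on the same input. The single-match mode is only an analogy for the mis-specified repetition rule described above, not a model of how the autoparser actually fails.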

This is a regression. An earlier working version, built on a generalized XML tool-call parser, was in the mainline (#16932, 1920345), but the autoparser refactoring (#18675) replaced it. This PR restores specialized handling for MiniMax M2.7 (and likely other M2 versions) without reverting the broader refactor.

Implementation

This implementation follows the Kimi K2 / DeepSeek V3.2 pattern for templates the autoparser cannot handle.

  • common_chat_params_init_minimax builds the PEG grammar for the wrapper, invoke, and parameter elements, including parallel calls.
  • Reasoning is extracted (<think>…</think> blocks) ahead of tool calls.
  • String-typed parameters are captured verbatim (tool_arg_string_value) to preserve embedded XML-style content; non-strings are reconstructed through JSON.
  • Dispatch in common_chat_try_specialized_template requires three MiniMax-specific literals in the template source (<minimax:tool_call>, <invoke name=, <parameter name=).
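The dispatch condition in the last bullet can be sketched as a plain substring check. This is a hedged illustration only: `template_looks_like_minimax` is a hypothetical name, and the real check inside common_chat_try_specialized_template may be structured differently. The three literals are the ones listed above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the dispatch test: returns true only when all
// three MiniMax-specific literals occur somewhere in the template source.
bool template_looks_like_minimax(const std::string & tmpl_src) {
    static const char * const literals[] = {
        "<minimax:tool_call>",
        "<invoke name=",
        "<parameter name=",
    };
    for (const char * literal : literals) {
        if (tmpl_src.find(literal) == std::string::npos) {
            return false;  // one missing literal is enough to skip this handler
        }
    }
    return true;
}

void example_dispatch() {
    // Made-up template fragments, used only to exercise the check.
    assert(template_looks_like_minimax(
        "<minimax:tool_call><invoke name=\"f\"><parameter name=\"x\">"));
    assert(!template_looks_like_minimax(
        "<tool_call><invoke name=\"f\"><parameter name=\"x\">"));
}
```

Requiring all three literals, rather than any one of them, keeps the check from matching other XML-style templates that happen to share a single tag.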

Testing

Extends the existing MiniMax block in tests/test-chat.cpp with five test cases.

| Test Case | Purpose | Master (no fix) |
| --- | --- | --- |
| Parallel <invoke> elements; two different tools | Reproduces crash pattern | Crashes (Invalid diff: now finding less tool calls!) |
| Parallel <invoke> elements; same tool twice | Additional variant of crash | Crashes (same) |
| String parameter with embedded <div><script>…</script></div> | Verifies tool_arg_string_value is verbatim | Passes |
| Multi-line string parameter (Python code with \n) | Verifies until("</parameter>") boundary on multi-line content | Passes |
| Two integer parameters in one <invoke> | Verifies zero_or_more over parameter list + non-string JSON reconstruction | Passes |

The passing test cases are also focused on repetition (vs content shape) to provide additional regression coverage.
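The verbatim string capture that the third and fourth cases exercise can be illustrated with a small standalone sketch. It does not use the PEG builder from the PR. `first_parameter_body` is a hypothetical helper that imitates the idea of reading everything up to the first `</parameter>` without interpreting it, and the sample inputs are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copy the body of the first <parameter name="...">
// element into `out`, cut at the first "</parameter>" and otherwise left
// untouched (no trimming, no unescaping). Returns false if no complete
// element is found.
bool first_parameter_body(const std::string & text, std::string & out) {
    const std::string open_prefix = "<parameter name=\"";
    const std::string close_tag   = "</parameter>";
    const std::size_t open = text.find(open_prefix);
    if (open == std::string::npos) return false;
    // The opening tag ends at the first `">` after the attribute name.
    std::size_t body = text.find("\">", open + open_prefix.size());
    if (body == std::string::npos) return false;
    body += 2;
    const std::size_t end = text.find(close_tag, body);
    if (end == std::string::npos) return false;
    out = text.substr(body, end - body);
    return true;
}

void example_verbatim_capture() {
    std::string value;
    // Embedded XML-style content survives unchanged.
    assert(first_parameter_body(
        "<parameter name=\"html\"><div><script>x</script></div></parameter>", value));
    assert(value == "<div><script>x</script></div>");
    // Newlines inside the body are kept as-is.
    assert(first_parameter_body(
        "<parameter name=\"code\">def f():\n    return 1\n</parameter>", value));
    assert(value == "def f():\n    return 1\n");
}
```

One consequence of cutting at the first `</parameter>`, at least in this sketch, is that a string value containing that exact closing tag would be truncated. Whether the PR's grammar shares that limitation is not stated in the description.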

Additional information

  • Precedent: #21785 (DeepSeek V3.2 dedicated parser)
  • Earlier working implementation: #16932 (superseded by #18675)
  • Crash assertion: src/llama-grammar.cpp:1435 (GGML_ABORT("fatal error") when EOG token is accepted with non-empty grammar stacks).

Requirements

AI was used to identify the appropriate strategy, draft a harness, and draft initial code snippets. Every line was reviewed, edited as appropriate, and included manually in commits.

The autoparser (peg-native) cannot parse MiniMax's XML-based tool-call
format, causing GGML_ABORT crashes when tools are present. Add a
specialized handler following the Kimi K2 pattern with XML parameter
parsing via tool_arg_name/tool_arg_value tags.