chat : add MiniMax M2 specialized tool-call handler #8
Open
doctorjei wants to merge 1 commit into
The autoparser (peg-native) cannot parse MiniMax's XML-based tool-call format, causing GGML_ABORT crashes when tools are present. Add a specialized handler following the Kimi K2 pattern with XML parameter parsing via tool_arg_name/tool_arg_value tags.
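A minimal sketch of the parameter-tagging idea, assuming parameters appear as <parameter name="KEY">VALUE</parameter> with a double-quoted name. The PR only names the <parameter name= literal, so the quoting is an assumption. TaggedSpan and tag_parameters are hypothetical stand-ins for the PEG tool_arg_name/tool_arg_value tags, not the handler's actual code. The value is captured up to the next </parameter>, so newlines and embedded markup survive verbatim.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a PEG tag (tool_arg_name / tool_arg_value).
struct TaggedSpan {
    std::string tag;   // "tool_arg_name" or "tool_arg_value"
    std::string text;  // raw text captured for that tag
};

// Scans every <parameter name="...">...</parameter> element in invoke_body.
// The value is captured up to the next "</parameter>", so newlines and
// embedded markup are kept verbatim. Returns an empty vector on malformed input.
std::vector<TaggedSpan> tag_parameters(const std::string& invoke_body) {
    const std::string open  = "<parameter name=\"";
    const std::string close = "</parameter>";
    std::vector<TaggedSpan> spans;
    std::size_t pos = 0;
    while ((pos = invoke_body.find(open, pos)) != std::string::npos) {
        const std::size_t name_begin = pos + open.size();
        const std::size_t name_end   = invoke_body.find("\">", name_begin);
        if (name_end == std::string::npos) return {};
        const std::size_t value_begin = name_end + 2;  // skip the closing quote and '>'
        const std::size_t value_end   = invoke_body.find(close, value_begin);
        if (value_end == std::string::npos) return {};
        spans.push_back({"tool_arg_name", invoke_body.substr(name_begin, name_end - name_begin)});
        spans.push_back({"tool_arg_value", invoke_body.substr(value_begin, value_end - value_begin)});
        pos = value_end + close.size();
    }
    return spans;
}
```

Because the closing-tag scan is a plain substring search, a value that itself contains the literal text </parameter> would be cut short; the sketch accepts that limitation.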
This is also available as a PR against mainline: ggml-org/llama.cpp#22106
Overview
Adds a specialized tool-call handler for the MiniMax M2.7 template (and, I'm guessing, later versions too). Currently, without it, M2.7 output with tools crashes llama-server (GGML_ABORT at src/llama-grammar.cpp:1435, EOG with non-empty stack) when <invoke> is emitted.
Why? (Reproducing the Issue)
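The abort condition mentioned above (EOG accepted while the grammar stack is non-empty) can be pictured with a toy pushdown model. This is not llama.cpp's grammar engine: EogResult and accept_until_eog are hypothetical names, and tags stand in for grammar rules. Opening tags push the closing tag still owed, closing tags must pop a match, and end-of-generation is only acceptable on an empty stack.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Result of feeding a tag sequence and then end-of-generation (EOG).
enum class EogResult { Ok, NonEmptyStack, Mismatch };

// Toy pushdown check, NOT llama.cpp's grammar engine: every opening tag pushes
// the closing tag that is still owed, every closing tag must pop a match.
// EOG is only acceptable once the stack is empty.
EogResult accept_until_eog(const std::vector<std::string>& tags) {
    std::vector<std::string> stack;
    for (const std::string& t : tags) {
        if (t.empty()) continue;                     // ignore empty entries
        if (t.rfind("</", 0) == 0) {                 // closing tag
            if (stack.empty() || stack.back() != t) {
                return EogResult::Mismatch;
            }
            stack.pop_back();
        } else {                                     // opening tag like "<invoke>"
            stack.push_back("</" + t.substr(1));
        }
    }
    // llama.cpp aborts in the NonEmptyStack case; the sketch just reports it.
    return stack.empty() ? EogResult::Ok : EogResult::NonEmptyStack;
}
```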
Reproducible in-tree via tests/test-chat.cpp on current master (b8840): parallel <invoke> elements inside a <minimax:tool_call> wrapper confuse the peg_tester.
The autoparser (peg-native) infers grammar structure from the template via differential rendering. MiniMax's template uses XML with repeatable invoke elements for parallel calls. The parser correctly infers the per-invoke structure but mis-specifies the repetition rule, so any second invoke is lost.
This is a regression: an earlier working version (a generalized XML tool-call parser) was in mainline (#16932, 1920345), but the autoparser refactoring (#18675) replaced it. This PR restores specialized handling for MiniMax M2.7 (and likely other M2 versions) without reverting the broader refactor.
Implementation
This implementation follows the Kimi K2 / DeepSeek V3.2 pattern for templates the autoparser cannot handle.
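The repetition problem behind the crash can be reduced to a toy scanner. It is hypothetical and not the PEG grammar this handler builds; invoke_names is an illustrative name, and double-quoted name attributes are assumed. With a rule that admits exactly one <invoke> inside the wrapper, the second call in a parallel block is silently dropped; with a zero-or-more rule, both survive.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of the repetition rule: collect the tool names of <invoke>
// elements inside a <minimax:tool_call> wrapper. With repeat == false the
// scanner behaves like a rule admitting exactly one invoke, so every invoke
// after the first is dropped; with repeat == true it acts as zero-or-more.
std::vector<std::string> invoke_names(const std::string& text, bool repeat) {
    const std::string wrap_open    = "<minimax:tool_call>";
    const std::string wrap_close   = "</minimax:tool_call>";
    const std::string invoke_open  = "<invoke name=\"";
    const std::string invoke_close = "</invoke>";

    std::vector<std::string> names;
    std::size_t begin = text.find(wrap_open);
    if (begin == std::string::npos) return names;
    begin += wrap_open.size();
    const std::size_t end = text.find(wrap_close, begin);
    if (end == std::string::npos) return names;

    std::size_t pos = begin;
    while (pos < end) {
        const std::size_t start = text.find(invoke_open, pos);
        if (start == std::string::npos || start >= end) break;
        const std::size_t name_begin = start + invoke_open.size();
        const std::size_t name_end   = text.find('"', name_begin);
        const std::size_t stop       = text.find(invoke_close, name_begin);
        if (name_end == std::string::npos || stop == std::string::npos || stop > end) break;
        names.push_back(text.substr(name_begin, name_end - name_begin));
        if (!repeat) break;                 // "exactly one" rule: stop here
        pos = stop + invoke_close.size();   // "zero or more" rule: keep going
    }
    return names;
}
```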
- common_chat_params_init_minimax prepares PEG for the wrapper/invoke/param grammar (parallel calls).
- (… <think>…</think> blocks) ahead of tool calls.
- (… tool_arg_string_value) to preserve embedded XML-style content; non-strings are reconstructed through JSON.
- common_chat_try_specialized_template requires three MiniMax-specific literals in the template source (<minimax:tool_call>, <invoke name=, <parameter name=).
Testing
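The detection rule in the last implementation bullet can be sketched and checked in isolation. looks_like_minimax_template is a hypothetical helper, not the signature of common_chat_try_specialized_template; it only shows the "all three literals must be present" test.

```cpp
#include <array>
#include <cassert>
#include <string_view>

// Hypothetical helper: a chat template is treated as MiniMax only if its
// source contains all three MiniMax-specific literals.
bool looks_like_minimax_template(std::string_view src) {
    constexpr std::array<std::string_view, 3> required = {
        "<minimax:tool_call>",
        "<invoke name=",
        "<parameter name=",
    };
    for (std::string_view lit : required) {
        if (src.find(lit) == std::string_view::npos) {
            return false;  // one missing literal is enough to reject
        }
    }
    return true;
}
```

Requiring all three literals, rather than <invoke name= alone, presumably keeps other XML-style templates from being routed to this handler.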
Extends the existing MiniMax block in tests/test-chat.cpp with five test cases:
- … <invoke> elements; two different tools (… Invalid diff: now finding less tool calls!)
- … <invoke> elements; same tool twice
- … <div><script>…</script></div> … tool_arg_string_value is verbatim
- … (… \n) … until("</parameter>") boundary on multi-line content
- … <invoke> … zero_or_more over parameter list + non-string JSON reconstruction
The passing test cases also focus on repetition (rather than content shape) to provide additional regression coverage.
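The verbatim-string and JSON-reconstruction behaviour these tests exercise can be sketched as below. RawParam, json_escape, trim and build_arguments are hypothetical names for illustration. The sketch assumes the tool schema says whether each parameter is a string, and that non-string values already arrive as valid JSON literals (it trims whitespace around them but does not validate them).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical parameter record: raw text between <parameter ...> and
// </parameter>, plus whether the tool schema declares it as a string.
struct RawParam {
    std::string name;
    std::string raw;
    bool is_string;
};

// Escapes text so it can be embedded in a JSON string literal.
std::string json_escape(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {  // other control characters become \u00XX
                    char buf[7];
                    std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
                break;
        }
    }
    return out;
}

// Trims surrounding whitespace (applied to non-string values only).
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// String values are copied verbatim (only JSON-escaped); non-string values
// are assumed to be JSON literals and are inserted as-is after trimming.
std::string build_arguments(const std::vector<RawParam>& params) {
    std::string json = "{";
    for (std::size_t i = 0; i < params.size(); ++i) {
        if (i > 0) json += ",";
        json += "\"" + json_escape(params[i].name) + "\":";
        if (params[i].is_string) {
            json += "\"" + json_escape(params[i].raw) + "\"";
        } else {
            json += trim(params[i].raw);
        }
    }
    return json + "}";
}
```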
Additional information
src/llama-grammar.cpp:1435 (GGML_ABORT("fatal error") when an EOG token is accepted with non-empty grammar stacks).
Requirements
AI was used to identify the appropriate strategy, draft a harness, and draft initial code snippets. Every line was reviewed, edited as appropriate, and included manually in commits.