Skip to content

[Feature]: Add nemotron_json as built-in tool parser (NVIDIA Nemotron-Nano-9B-v2 plugin breaks against v0.20.x module reorg) #42065

@adonig

Description

@adonig

🚀 The feature, motivation and pitch

Context

nvidia/NVIDIA-Nemotron-Nano-9B-v2 ships an out-of-tree tool-call parser plugin (nemotron_toolcall_parser_no_streaming.py) that NVIDIA's own vLLM cookbook tells users to load via:

--enable-auto-tool-choice
--tool-parser-plugin "<repo>/nemotron_toolcall_parser_no_streaming.py"
--tool-call-parser nemotron_json

The cookbook pins vLLM to commit 75531a6c… (2025-08-15). The plugin file in NVIDIA's HF model repo has not been updated since.

What breaks on v0.20.x

Three import paths in the plugin no longer resolve, plus the ToolParser.__init__ calling convention changed:

Symbol / surface Old (Aug-2025 vLLM) v0.20.1
ChatCompletionRequest vllm.entrypoints.openai.protocol vllm.entrypoints.openai.chat_completion.protocol
FunctionCall, ToolCall, DeltaFunctionCall, DeltaToolCall, DeltaMessage, ExtractedToolCallInformation vllm.entrypoints.openai.protocol vllm.entrypoints.openai.engine.protocol
ToolParser, ToolParserManager vllm.entrypoints.openai.tool_parsers.abstract_tool_parser vllm.tool_parsers.abstract_tool_parser
AnyTokenizer vllm.transformers_utils.tokenizer renamed to TokenizerLike in vllm.tokenizers.protocol
ToolParser.__init__(tokenizer) one positional arg now called as tool_parser(tokenizer, request.tools) (see vllm/entrypoints/serve/render/serving.py) — subclasses must accept the second arg

Result against current vLLM: server fails to start with KeyError: 'invalid tool call parser: nemotron_json' (plugin can't be imported), and even after fixing imports the parser raises TypeError: __init__() takes 2 positional arguments but 3 were given on the first request that carries tools=[…].

Patched plugin (works against v0.20.1)

Only imports + AnyTokenizer -> TokenizerLike rename + __init__ accepts tools; parsing logic is identical to NVIDIA's upstream.

nemotron_parser.py
# SPDX-License-Identifier: Apache-2.0

import json
import re
from typing import Union

from vllm.entrypoints.openai.chat_completion.protocol import ChatCompletionRequest
from vllm.entrypoints.openai.engine.protocol import (
    DeltaMessage,
    ExtractedToolCallInformation,
    FunctionCall,
    ToolCall,
)
from vllm.tool_parsers.abstract_tool_parser import ToolParser, ToolParserManager
from vllm.logger import init_logger
from vllm.tokenizers.protocol import TokenizerLike

logger = init_logger(__name__)


@ToolParserManager.register_module("nemotron_json")
class NemotronJSONToolParser(ToolParser):
    def __init__(self, tokenizer: TokenizerLike, tools=None):
        super().__init__(tokenizer, tools)
        self.tool_call_start_token = "<TOOLCALL>"
        self.tool_call_end_token = "</TOOLCALL>"
        self.tool_call_regex = re.compile(r"<TOOLCALL>(.*?)</TOOLCALL>", re.DOTALL)

    def extract_tool_calls(
        self, model_output: str, request: ChatCompletionRequest
    ) -> ExtractedToolCallInformation:
        if self.tool_call_start_token not in model_output:
            return ExtractedToolCallInformation(
                tools_called=False, tool_calls=[], content=model_output
            )
        try:
            str_calls = self.tool_call_regex.findall(model_output)[0].strip()
            if not str_calls.startswith("["):
                str_calls = "[" + str_calls
            if not str_calls.endswith("]"):
                str_calls = str_calls + "]"
            tool_calls = []
            for tc in json.loads(str_calls):
                try:
                    args = tc["arguments"]
                    tool_calls.append(ToolCall(
                        type="function",
                        function=FunctionCall(
                            name=tc["name"],
                            arguments=json.dumps(args, ensure_ascii=False)
                                if isinstance(args, dict) else args,
                        ),
                    ))
                except Exception:
                    continue
            content = model_output[:model_output.rfind(self.tool_call_start_token)]
            return ExtractedToolCallInformation(
                tools_called=True, tool_calls=tool_calls,
                content=content if content else None,
            )
        except Exception:
            logger.exception("Error extracting tool call from: %s", model_output)
            return ExtractedToolCallInformation(
                tools_called=False, tool_calls=[], content=model_output
            )

    def extract_tool_calls_streaming(self, *_args, **_kwargs) -> Union[DeltaMessage, None]:
        raise NotImplementedError("Streaming not supported")

Proposal

Either

  • accept this as a built-in nemotron_json parser under vllm/tool_parsers/ (the format <TOOLCALL>[{"name": ..., "arguments": ...}, ...]</TOOLCALL> is baked into the model's chat template, so it's a stable target), or
  • coordinate with NVIDIA to refresh the plugin in their HF model repo.

Happy with whichever. Flagging because the current state is silently broken for anyone following NVIDIA's official cookbook against current vLLM.

Reproduction

vLLM 0.20.1 + vllm serve nvidia/NVIDIA-Nemotron-Nano-9B-v2-NVFP4 --enable-auto-tool-choice --tool-parser-plugin <upstream-plugin> --tool-call-parser nemotron_json with the upstream plugin file → ImportError chain ending in KeyError: 'invalid tool call parser: nemotron_json'. After patching imports, first request with tools=[…] raises TypeError: NemotronJSONToolParser.__init__() takes 2 positional arguments but 3 were given.

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions