[Tool Parser]Add tool parser for NVIDIA-Nemotron-Nano-9B-v2 model by sniper35 · Pull Request #42255 · vllm-project/vllm

sniper35 · 2026-05-11T00:14:33Z

Purpose

Closes #42065
Nemotron cookbook updated to align with vllm codebase: NVIDIA-NeMo/Nemotron#196
Related: HF repo update: https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2/discussions/37

Test Plan

Start the server with the newly registered tool parser:

  export VLLM_VENV_BIN=${VLLM_VENV_BIN:-/dev-env/repos/vllm/.venv/bin}
  export MODEL=${MODEL:-/dev-env/repos/NVIDIA-Nemotron-Nano-9B-v2}
  export PORT=${PORT:-8017}

  "$VLLM_VENV_BIN/python" -m vllm.entrypoints.openai.api_server \
    --model "$MODEL" \
    --served-model-name nemotron-smoke \
    --trust-remote-code \
    --dtype bfloat16 \
    --mamba-ssm-cache-dtype float32 \
    --enable-auto-tool-choice \
    --tool-call-parser nemotron_nano_v2 \
    --chat-template examples/tool_chat_template_nemotron_nano_v2.jinja \
    --max-model-len 2048 \
    --gpu-memory-utilization 0.75 \
    --host 127.0.0.1 \
    --port "$PORT" \
    --no-enable-log-requests

Non-streaming:

  import os

  from openai import OpenAI

  client = OpenAI(base_url=f"http://127.0.0.1:{os.environ.get('PORT', '8017')}/v1", api_key="dummy")

  resp = client.chat.completions.create(
      model="nemotron-smoke",
      messages=[{"role": "user", "content": "What is an 18% tip on a $100 bill? Use the tool."}],
      tools=[{
          "type": "function",
          "function": {
              "name": "calculate_tip",
              "description": "Calculate a tip amount.",
              "parameters": {
                  "type": "object",
                  "properties": {
                      "bill_total": {"type": "number"},
                      "tip_percentage": {"type": "number"},
                  },
                  "required": ["bill_total", "tip_percentage"],
              },
          },
      }],
      tool_choice="auto",
      temperature=0,
  )

  msg = resp.choices[0].message
  print("content:", msg.content)
  print("tool_calls:", msg.tool_calls)
  assert msg.tool_calls
  assert msg.tool_calls[0].function.name == "calculate_tip"

Streaming:

  from openai import OpenAI
  import os

  client = OpenAI(base_url=f"http://127.0.0.1:{os.environ.get('PORT', '8017')}/v1", api_key="dummy")

  chunks = client.chat.completions.create(
      model="nemotron-smoke",
      messages=[{"role": "user", "content": "What is an 18% tip on a $100 bill? Use the tool."}],
      tools=[{
          "type": "function",
          "function": {
              "name": "calculate_tip",
              "description": "Calculate a tip amount.",
              "parameters": {
                  "type": "object",
                  "properties": {
                      "bill_total": {"type": "number"},
                      "tip_percentage": {"type": "number"},
                  },
                  "required": ["bill_total", "tip_percentage"],
              },
          },
      }],
      tool_choice="auto",
      temperature=0,
      stream=True,
  )

  content = []
  tool_calls = {}

  for chunk in chunks:
      delta = chunk.choices[0].delta
      if delta.content:
          content.append(delta.content)
      for tc in delta.tool_calls or []:
          state = tool_calls.setdefault(tc.index, {"name": "", "arguments": ""})
          if tc.function and tc.function.name:
              state["name"] += tc.function.name
          if tc.function and tc.function.arguments:
              state["arguments"] += tc.function.arguments

  print("content:", "".join(content))
  print("tool_calls:", tool_calls)
  assert tool_calls
  assert any(tc["name"] == "calculate_tip" for tc in tool_calls.values())

Test Result

Non-streaming:

content: Okay, the user is asking for an 18% tip on a $100 bill and wants me to use the tool. Let me check the available functions. There's a calculate_tip function that takes bill_total and tip_percentage. The parameters are both required. The bill here is $100, and the tip percentage is 18. I need to call the function with these values. Let me make sure the parameters are numbers. Yes, 100 and 18 are numbers. So the tool call should be calculate_tip with bill_total 100 and tip_percentage 18. That should give the tip amount. I don't see any missing info here, so I can proceed to call the tool.
</think>


tool_calls: [ChatCompletionMessageFunctionToolCall(id='chatcmpl-tool-bc97c3bce8f6591e', function=Function(arguments='{"bill_total": 100, "tip_percentage": 18}', name='calculate_tip'), type='function')]

Streaming:

content: Okay, the user is asking for an 18% tip on a $100 bill and wants me to use the tool. Let me check the available functions. There's a calculate_tip function that takes bill_total and tip_percentage. The parameters are both required. The bill here is $100, and the tip percentage is 18. I need to call the function with these values. Let me make sure the parameters are numbers. Yes, 100 and 18 are numbers. So the tool call should be calculate_tip with bill_total 100 and tip_percentage 18. That should give the tip amount. I don't see any missing info here, so I can proceed to call the tool.
</think>


tool_calls: {0: {'name': 'calculate_tip', 'arguments': '{"bill_total": 100, "tip_percentage": 18}'}}
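The accumulated `arguments` string above is plain JSON, so a client can decode it and dispatch to a local function. The following is a minimal sketch of that step, assuming a hypothetical local `calculate_tip` implementation and a hypothetical `registry` dict; neither is part of this PR.

```python
import json


def calculate_tip(bill_total: float, tip_percentage: float) -> float:
    """Hypothetical local implementation of the tool declared in the request."""
    return bill_total * tip_percentage / 100


# Accumulated streaming state, copied from the streaming result above.
tool_calls = {
    0: {
        "name": "calculate_tip",
        "arguments": '{"bill_total": 100, "tip_percentage": 18}',
    }
}

# Hypothetical mapping from tool names to local callables.
registry = {"calculate_tip": calculate_tip}

results = {}
for index, call in sorted(tool_calls.items()):
    # json.loads raises if the streamed arguments were cut off mid-object.
    kwargs = json.loads(call["arguments"])
    results[index] = registry[call["name"]](**kwargs)

print(results)
```

Decoding only after the stream ends avoids trying to parse a partial JSON object.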

test_nanotron_streaming.py
test_nanotron_non_streaming.py

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Register a built-in parser for Nemotron <TOOLCALL> JSON payloads, add a matching chat template example, and cover streaming extraction for content-plus-tool and parallel-call chunk boundaries. Signed-off-by: Dong Wang <dongw2019@gmail.com>
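To illustrate what extracting a `<TOOLCALL>` JSON payload from a complete response involves, here is a rough sketch. It is not the parser added in this PR. It assumes the model emits a JSON list of `{"name": ..., "arguments": {...}}` objects between `<TOOLCALL>` and `</TOOLCALL>`, and the helper name `extract_tool_calls` is hypothetical.

```python
import json
import re

# Assumed wire format: <TOOLCALL>[{"name": ..., "arguments": {...}}, ...]</TOOLCALL>
TOOLCALL_RE = re.compile(r"<TOOLCALL>(.*?)</TOOLCALL>", re.DOTALL)


def extract_tool_calls(text: str):
    """Return (content, calls); calls is a list of (name, arguments_json)."""
    match = TOOLCALL_RE.search(text)
    if match is None:
        return text, []
    try:
        payload = json.loads(match.group(1))
    except json.JSONDecodeError:
        # Malformed payload: fall back to treating everything as content.
        return text, []
    if not isinstance(payload, list):
        return text, []
    calls = []
    for item in payload:
        if not isinstance(item, dict) or "name" not in item:
            return text, []
        # Arguments are re-serialized because the OpenAI API carries them as a string.
        calls.append((item["name"], json.dumps(item.get("arguments", {}))))
    # Text before the tag is ordinary assistant content.
    return text[: match.start()], calls


sample = (
    "Let me compute."
    '<TOOLCALL>[{"name": "calculate_tip", '
    '"arguments": {"bill_total": 100, "tip_percentage": 18}}]</TOOLCALL>'
)
print(extract_tool_calls(sample))
```

Because the payload is a list, several parallel calls in one response come out as several entries in `calls`.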

…unk boundaries Signed-off-by: Dong Wang <dongw2019@gmail.com>

mergify · 2026-05-11T00:15:23Z

Documentation preview: https://vllm--42255.org.readthedocs.build/en/42255/

gemini-code-assist

Code Review

This PR adds the NemotronJSONToolParser and a corresponding Jinja2 chat template to support tool calling for Nemotron models. It includes documentation and tests for streaming and non-streaming tool extraction. Feedback points out a potential JSON escaping issue in the Jinja2 template, recommending the tojson filter for function names instead of manual quoting.
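The escaping concern can be reproduced outside the template. The sketch below uses Python's `json.dumps` as a stand-in for Jinja's `tojson` filter (the two are not identical, since `tojson` also escapes some HTML-sensitive characters), and the function name containing a double quote is a made-up example.

```python
import json

# Hypothetical function name containing a double quote.
name = 'say "hi"'

# Manual quoting, as a template would do with "{{ name }}".
manual = '{"name": "' + name + '"}'

# JSON-aware quoting, analogous to {{ name | tojson }}.
escaped = '{"name": ' + json.dumps(name) + "}"


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(manual))   # False: the inner quotes end the string early
print(is_valid_json(escaped))  # True
```

The manually quoted string is invalid JSON, while the escaped one round-trips to the original name.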

Signed-off-by: Dong Wang <dongw2019@gmail.com>

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

sniper35 · 2026-05-11T01:24:33Z

    class ExampleToolParser(ToolParser):
-        def __init__(self, tokenizer: TokenizerLike):
-            super().__init__(tokenizer)
+        def __init__(self, tokenizer: TokenizerLike, tools=None):


This aligns the example with the latest code base.

ameyasm1154 · 2026-05-20T15:29:56Z

I am from the Nemotron team and had worked on this model. The tool-parser implementation looks correct and the tests show it functions correctly.

sniper35 added 2 commits May 11, 2026 00:05

tool_parsers: add Nemotron JSON tool parser

4b6abc6

Register a built-in parser for Nemotron <TOOLCALL> JSON payloads, add a matching chat template example, and cover streaming extraction for content-plus-tool and parallel-call chunk boundaries. Signed-off-by: Dong Wang <dongw2019@gmail.com>

cover streaming extraction for content-plus-tool and parallel-call ch…

32367d4

…unk boundaries Signed-off-by: Dong Wang <dongw2019@gmail.com>
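The chunk-boundary case arises because a streamed delta can end partway through the `<TOOLCALL>` tag. The sketch below shows one way a streaming parser might hold back a possible tag prefix so it is never emitted as content. It is an assumption-laden illustration, not the code in these commits, and the class name `ContentGate` is hypothetical.

```python
START = "<TOOLCALL>"


class ContentGate:
    """Emit content incrementally without leaking a partial start tag."""

    def __init__(self) -> None:
        self.buffer = ""
        self.in_tool_call = False

    def feed(self, chunk: str) -> str:
        """Return the text that is safe to emit as content after this chunk."""
        self.buffer += chunk
        if self.in_tool_call:
            # Everything after the start tag belongs to the tool payload.
            return ""
        idx = self.buffer.find(START)
        if idx != -1:
            out, self.buffer = self.buffer[:idx], self.buffer[idx:]
            self.in_tool_call = True
            return out
        # Hold back the longest suffix that could still grow into START.
        hold = 0
        for n in range(min(len(START) - 1, len(self.buffer)), 0, -1):
            if START.startswith(self.buffer[-n:]):
                hold = n
                break
        cut = len(self.buffer) - hold
        out, self.buffer = self.buffer[:cut], self.buffer[cut:]
        return out


gate = ContentGate()
chunks = ["Sure. <TOOL", 'CALL>[{"name"', ': "f"}]</TOOLCALL>']
emitted = "".join(gate.feed(c) for c in chunks)
print(repr(emitted))      # 'Sure. '
print(gate.buffer)        # the full <TOOLCALL>...</TOOLCALL> payload
```

Text such as `a < b` is still emitted in full once the following chunk shows that the `<` does not start a tag.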

mergify Bot added the documentation, nvidia, and tool-calling labels May 11, 2026

github-project-automation Bot added this to NVIDIA and Tool Calling May 11, 2026

gemini-code-assist Bot reviewed May 11, 2026

Comment thread examples/tool_chat_template_nemotron_json.jinja Outdated

sniper35 changed the title ~~Add tool parser for Nvidia Nemotron family models~~ Add tool parser for Nvidia NVIDIA-Nemotron-Nano-9B-v2 model May 11, 2026

sniper35 added 2 commits May 11, 2026 00:51

cover the case the too_name needs to be escaped

ce7222b

Signed-off-by: Dong Wang <dongw2019@gmail.com>

refactored parser name to be more specific toward Nano-V2

644451f

Signed-off-by: Dong Wang <dongw2019@gmail.com>

sniper35 marked this pull request as ready for review May 11, 2026 00:58

sniper35 requested review from aarnphm, bbrowning, chaunceyjiang, sfeng33 and tomeras91 as code owners May 11, 2026 00:58

claude Bot reviewed May 11, 2026

sniper35 changed the title ~~Add tool parser for Nvidia NVIDIA-Nemotron-Nano-9B-v2 model~~ Add tool parser for NVIDIA-Nemotron-Nano-9B-v2 model May 11, 2026

sniper35 changed the title ~~Add tool parser for NVIDIA-Nemotron-Nano-9B-v2 model~~ [Tool Parser]Add tool parser for NVIDIA-Nemotron-Nano-9B-v2 model May 11, 2026

sniper35 commented May 11, 2026

This was referenced May 11, 2026

[Tool parser Cookbook]Update the tool parser cookbook to align with the latest vllm code base NVIDIA-NeMo/Nemotron#196

Open

[Feature]: Add nemotron_json as built-in tool parser (NVIDIA Nemotron-Nano-9B-v2 plugin breaks against v0.20.x module reorg) #42065

Open

sniper35 wants to merge 4 commits into vllm-project:main from sniper35:add-nemotron-tool-parser

2 participants
