[Tool Parser]Add tool parser for NVIDIA-Nemotron-Nano-9B-v2 model #42255

Open
sniper35 wants to merge 4 commits into vllm-project:main from sniper35:add-nemotron-tool-parser

sniper35 commented May 11, 2026

Purpose

Closes #42065
Nemotron cookbook updated to align with vllm codebase: NVIDIA-NeMo/Nemotron#196
Related: HF repo update: https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2/discussions/37

Test Plan

Start the server with the newly registered tool parser:

  export VLLM_VENV_BIN=${VLLM_VENV_BIN:-/dev-env/repos/vllm/.venv/bin}
  export MODEL=${MODEL:-/dev-env/repos/NVIDIA-Nemotron-Nano-9B-v2}
  export PORT=${PORT:-8017}

  "$VLLM_VENV_BIN/python" -m vllm.entrypoints.openai.api_server \
    --model "$MODEL" \
    --served-model-name nemotron-smoke \
    --trust-remote-code \
    --dtype bfloat16 \
    --mamba-ssm-cache-dtype float32 \
    --enable-auto-tool-choice \
    --tool-call-parser nemotron_nano_v2 \
    --chat-template examples/tool_chat_template_nemotron_nano_v2.jinja \
    --max-model-len 2048 \
    --gpu-memory-utilization 0.75 \
    --host 127.0.0.1 \
    --port "$PORT" \
    --no-enable-log-requests

Non-streaming:

  import os

  from openai import OpenAI

  client = OpenAI(base_url=f"http://127.0.0.1:{os.environ.get('PORT', '8017')}/v1", api_key="dummy")

  resp = client.chat.completions.create(
      model="nemotron-smoke",
      messages=[{"role": "user", "content": "What is an 18% tip on a $100 bill? Use the tool."}],
      tools=[{
          "type": "function",
          "function": {
              "name": "calculate_tip",
              "description": "Calculate a tip amount.",
              "parameters": {
                  "type": "object",
                  "properties": {
                      "bill_total": {"type": "number"},
                      "tip_percentage": {"type": "number"},
                  },
                  "required": ["bill_total", "tip_percentage"],
              },
          },
      }],
      tool_choice="auto",
      temperature=0,
  )

  msg = resp.choices[0].message
  print("content:", msg.content)
  print("tool_calls:", msg.tool_calls)
  assert msg.tool_calls
  assert msg.tool_calls[0].function.name == "calculate_tip"

Streaming:

  from openai import OpenAI
  import os

  client = OpenAI(base_url=f"http://127.0.0.1:{os.environ.get('PORT', '8017')}/v1", api_key="dummy")

  chunks = client.chat.completions.create(
      model="nemotron-smoke",
      messages=[{"role": "user", "content": "What is an 18% tip on a $100 bill? Use the tool."}],
      tools=[{
          "type": "function",
          "function": {
              "name": "calculate_tip",
              "description": "Calculate a tip amount.",
              "parameters": {
                  "type": "object",
                  "properties": {
                      "bill_total": {"type": "number"},
                      "tip_percentage": {"type": "number"},
                  },
                  "required": ["bill_total", "tip_percentage"],
              },
          },
      }],
      tool_choice="auto",
      temperature=0,
      stream=True,
  )

  content = []
  tool_calls = {}

  for chunk in chunks:
      delta = chunk.choices[0].delta
      if delta.content:
          content.append(delta.content)
      for tc in delta.tool_calls or []:
          state = tool_calls.setdefault(tc.index, {"name": "", "arguments": ""})
          if tc.function and tc.function.name:
              state["name"] += tc.function.name
          if tc.function and tc.function.arguments:
              state["arguments"] += tc.function.arguments

  print("content:", "".join(content))
  print("tool_calls:", tool_calls)
  assert tool_calls
  assert any(tc["name"] == "calculate_tip" for tc in tool_calls.values())

Test Result

Non-streaming:

content: Okay, the user is asking for an 18% tip on a $100 bill and wants me to use the tool. Let me check the available functions. There's a calculate_tip function that takes bill_total and tip_percentage. The parameters are both required. The bill here is $100, and the tip percentage is 18. I need to call the function with these values. Let me make sure the parameters are numbers. Yes, 100 and 18 are numbers. So the tool call should be calculate_tip with bill_total 100 and tip_percentage 18. That should give the tip amount. I don't see any missing info here, so I can proceed to call the tool.
</think>


tool_calls: [ChatCompletionMessageFunctionToolCall(id='chatcmpl-tool-bc97c3bce8f6591e', function=Function(arguments='{"bill_total": 100, "tip_percentage": 18}', name='calculate_tip'), type='function')]

Streaming:

content: Okay, the user is asking for an 18% tip on a $100 bill and wants me to use the tool. Let me check the available functions. There's a calculate_tip function that takes bill_total and tip_percentage. The parameters are both required. The bill here is $100, and the tip percentage is 18. I need to call the function with these values. Let me make sure the parameters are numbers. Yes, 100 and 18 are numbers. So the tool call should be calculate_tip with bill_total 100 and tip_percentage 18. That should give the tip amount. I don't see any missing info here, so I can proceed to call the tool.
</think>


tool_calls: {0: {'name': 'calculate_tip', 'arguments': '{"bill_total": 100, "tip_percentage": 18}'}}
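
As a minimal sketch of what a client might do next with the returned call, the `arguments` string can be decoded and dispatched to a local function. The `calculate_tip` implementation and the `TOOLS` registry below are hypothetical; the PR defines only the tool schema, not its implementation.

```python
import json


# Hypothetical local implementation of the tool declared in the request schema.
def calculate_tip(bill_total: float, tip_percentage: float) -> float:
    return bill_total * tip_percentage / 100


# Hypothetical registry mapping tool names to callables.
TOOLS = {"calculate_tip": calculate_tip}

# Name and arguments string as they appear in the test result above.
name = "calculate_tip"
arguments = '{"bill_total": 100, "tip_percentage": 18}'

# Decode the JSON arguments and call the matching function.
result = TOOLS[name](**json.loads(arguments))
print(result)  # 18.0
```

In a full round trip the result would typically be sent back to the model as a `tool` role message; that step is outside what the test plan covers.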

test_nanotron_streaming.py
test_nanotron_non_streaming.py

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

sniper35 added 2 commits May 11, 2026 00:05
  Register a built-in parser for Nemotron <TOOLCALL> JSON payloads,
  add a matching chat template example, and cover streaming extraction
  for content-plus-tool and parallel-call chunk boundaries.

Signed-off-by: Dong Wang <dongw2019@gmail.com>
…unk boundaries

Signed-off-by: Dong Wang <dongw2019@gmail.com>
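
The extraction described in the first commit message can be sketched roughly as follows. This is not the PR's parser: it assumes the model wraps a JSON list of objects with `name` and `arguments` keys in `<TOOLCALL>...</TOOLCALL>`, and `extract_tool_calls` is a hypothetical helper that handles only complete, non-streaming output.

```python
import json
import re

# Assumed wire format: <TOOLCALL>[{"name": ..., "arguments": {...}}, ...]</TOOLCALL>
TOOLCALL_RE = re.compile(r"<TOOLCALL>(.*?)</TOOLCALL>", re.DOTALL)


def extract_tool_calls(text: str) -> tuple[str, list[dict]]:
    """Split model output into plain content and a list of tool calls."""
    calls = []
    for payload in TOOLCALL_RE.findall(text):
        parsed = json.loads(payload)
        # Accept a single object as well as a list of objects.
        if isinstance(parsed, dict):
            parsed = [parsed]
        for item in parsed:
            calls.append({
                "name": item["name"],
                # Re-serialize arguments as a JSON string, the shape
                # OpenAI-style clients receive.
                "arguments": json.dumps(item.get("arguments", {})),
            })
    # Everything outside the tags is treated as ordinary content.
    content = TOOLCALL_RE.sub("", text).strip()
    return content, calls


sample = (
    "Let me compute that."
    '<TOOLCALL>[{"name": "calculate_tip", '
    '"arguments": {"bill_total": 100, "tip_percentage": 18}}]</TOOLCALL>'
)
content, calls = extract_tool_calls(sample)
print(content)
print(calls)
```

A streaming parser also has to buffer partial tags and partial JSON that are split across chunk boundaries, which this sketch does not attempt.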

mergify Bot commented May 11, 2026

Documentation preview: https://vllm--42255.org.readthedocs.build/en/42255/

mergify Bot added the documentation, nvidia, and tool-calling labels May 11, 2026

gemini-code-assist Bot left a comment

Code Review

This PR adds the NemotronJSONToolParser and a corresponding Jinja2 chat template to support tool calling for Nemotron models. It includes documentation and tests for streaming and non-streaming tool extraction. Feedback points out a potential JSON escaping issue in the Jinja2 template, recommending the tojson filter for function names instead of manual quoting.
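
The escaping concern can be illustrated with plain Python, on the understanding that Jinja2's `tojson` filter serializes a value as JSON rather than wrapping it in quotes by hand. The function name below is a contrived example chosen to contain a quote character; it is not taken from the PR.

```python
import json

# Contrived function name containing double quotes.
name = 'get_"weather"'

# Manual quoting, analogous to writing "{{ name }}" in a template.
manual = '{"name": "' + name + '"}'

# JSON serialization, analogous to {{ name | tojson }}.
escaped = '{"name": ' + json.dumps(name) + "}"

# The manually quoted string is not valid JSON.
try:
    json.loads(manual)
    manual_ok = True
except json.JSONDecodeError:
    manual_ok = False

print(manual_ok)                    # False
print(json.loads(escaped)["name"])  # get_"weather"
```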

Comment thread examples/tool_chat_template_nemotron_json.jinja Outdated
sniper35 changed the title from "Add tool parser for Nvidia Nemotron family models" to "Add tool parser for Nvidia NVIDIA-Nemotron-Nano-9B-v2 model" May 11, 2026
sniper35 added 2 commits May 11, 2026 00:51
Signed-off-by: Dong Wang <dongw2019@gmail.com>
Signed-off-by: Dong Wang <dongw2019@gmail.com>
sniper35 marked this pull request as ready for review May 11, 2026 00:58

claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

sniper35 changed the title from "Add tool parser for Nvidia NVIDIA-Nemotron-Nano-9B-v2 model" to "Add tool parser for NVIDIA-Nemotron-Nano-9B-v2 model" May 11, 2026
sniper35 changed the title from "Add tool parser for NVIDIA-Nemotron-Nano-9B-v2 model" to "[Tool Parser]Add tool parser for NVIDIA-Nemotron-Nano-9B-v2 model" May 11, 2026

 class ExampleToolParser(ToolParser):
-    def __init__(self, tokenizer: TokenizerLike):
-        super().__init__(tokenizer)
+    def __init__(self, tokenizer: TokenizerLike, tools=None):

Contributor Author

This is to align with the latest codebase.

@ameyasm1154

I am from the Nemotron team and worked on this model. The tool-parser implementation looks correct, and the tests show that it functions correctly.

Development

Successfully merging this pull request may close these issues.

[Feature]: Add nemotron_json as built-in tool parser (NVIDIA Nemotron-Nano-9B-v2 plugin breaks against v0.20.x module reorg)