Skip to content

fix(nemotron3): faithful Nano/Super/Ultra rendering + per-variant config split#84

Open
hallerite wants to merge 2 commits into
mainfrom
fix/nemotron3-template-faithfulness
Open

fix(nemotron3): faithful Nano/Super/Ultra rendering + per-variant config split#84
hallerite wants to merge 2 commits into
mainfrom
fix/nemotron3-template-faithfulness

Conversation

@hallerite

@hallerite hallerite commented Jun 10, 2026

Copy link
Copy Markdown
Member

Summary

Makes the Nemotron-3 renderer byte-for-byte faithful to the Nano, Super, and Ultra chat templates, and splits the typed config per template variant so each exposes only the kwargs it actually supports.

The existing 18-case shared barrage passed, but several untested branches diverged from apply_chat_template. Verified against the real cached templates for all three checkpoints.

Faithfulness fixes

Root cause was hand-assembled newline glue + premature .strip(). The assistant body is now built as a string that mirrors the template's own algebra (assemble <think>…</think>{content}, trim, append exactly one separator), then tokenized in one pass — so it matches apply_chat_template by construction.

  1. reason → tool-call / empty content emitted a stray blank line (</think>\n\n<tool_call> instead of </think>\n<tool_call>); same for the no-tool empty-content case. The most common agentic shape. ✅
  2. History-truncation boundary was last_plain_assistant_idx; the templates (all three) use loop.index0 < last_user_idx. The renderer was dropping in-flight tool-cycle reasoning the template keeps. ✅
  3. Inline <think>…</think> in content was parsed and reformatted; the template emits it verbatim (same contract enforced for Kimi). ✅
  4. Whitespace: user/system/tool/reasoning_content were .strip()-ed; the template emits them verbatim (assistant is trimmed only where the template trims). ✅

Variant split (config honesty)

low_effort and medium_effort are real per-variant Jinja kwargs (Super and Ultra respectively; Nano has neither). Rather than expose both on one config with silent no-ops, the config is split:

  • nemotron-3 — Nano / Super. Fields: enable_thinking, truncate_history_thinking, low_effort.
  • nemotron-3-ultra (new discriminator) — Ultra. Fields: enable_thinking, truncate_history_thinking, medium_effort.

Both route to one shared Nemotron3Renderer, which now selects its variant from config.name. This deletes the ultra flag, _default_ultra, and _ULTRA_DEFAULTS; _is_super (name-based) is kept solely to no-op low_effort on Nano (Nano/Super share a config). Bad combinations now fail at config-load — matching the discriminated-union philosophy in configs.py.

⚠️ API change

Nemotron3RendererConfig(ultra=True)Nemotron3UltraRendererConfig(). "auto" resolution is unaffected (resolves by model name). Downstream configs (prime-rl / verifiers) that pin ultra= need updating.

Tests

  • New tests/test_nemotron3_parity.py: exhaustive token-for-token parity across Nano/Super/Ultra (reasoning±tool-calls, truncation boundary, inline-think, verbatim whitespace, effort kwargs, gen-prompt/thinking toggles).
  • Effort kwargs wired into the auto-derived config-parity matrix (low_effort exercised on both Nano (no-op) and Super (active); medium_effort on Ultra).
  • test_nemotron3_ultra.py rewritten for the two-config wiring.
  • Full suite green.

🤖 Generated with Claude Code


Note

Medium Risk
Token streams and typed config shape change for Nemotron-3 (especially assistant reasoning, tool calls, and ultra= callers); auto model resolution is unchanged but explicit configs must migrate.

Overview
Nemotron-3 rendering is brought in line with apply_chat_template and Ultra is split into its own typed config/renderer entry.

Assistant turns are built as a single body string (think/content/tool XML) and tokenized in one pass, fixing newline glue around reason → tool calls, historical thinking truncation (now last_user_idx on all variants, not “last plain assistant”), verbatim inline `` in content, and unstripped user/system/tool/reasoning_content bytes. `nemotron-3-ultra` is a new discriminator with `Nemotron3UltraRenderer` / `Nemotron3UltraRendererConfig` (`medium_effort`); Nano/Super stay on `nemotron-3` with `low_effort` (Super-only via `_is_super`). The internal `ultra` selector and name-based Ultra defaults are removed—Ultra checkpoints map in `MODEL_RENDERER_MAP` instead. `bridge_to_next_turn` returns `None` when an effort hint is active so the last-user hint cannot go stale.

Breaking: Nemotron3RendererConfig(ultra=True)Nemotron3UltraRendererConfig(); token ids change for affected Nemotron-3 conversations.

New tests/test_nemotron3_parity.py plus wiring/parity matrix updates for effort kwargs.

Reviewed by Cursor Bugbot for commit f70ddb7. Bugbot is set up for automated code reviews on this repo. Configure here.

Note

Split Nemotron-3 Ultra into a dedicated renderer and config separate from Nano/Super

  • Introduces Nemotron3UltraRenderer and Nemotron3UltraRendererConfig as distinct classes; Ultra checkpoints now route to 'nemotron-3-ultra' in MODEL_RENDERER_MAP and the registry, while Nano/Super remain on 'nemotron-3'.
  • Removes the ultra selector field from Nemotron3RendererConfig and replaces per-model defaults with explicit subclass hooks; Ultra uses medium_effort and Nano/Super use low_effort as template kwargs.
  • Fixes whitespace and </think> glue rendering to be faithful per variant, and extracts tool-call XML emission into a shared _format_tool_call static method.
  • Adds a parity test suite in tests/test_nemotron3_parity.py covering reasoning boundaries, thinking truncation, whitespace fidelity, and effort kwargs for all three variants.
  • Behavioral Change: passing ultra=True or cross-variant kwargs (e.g. medium_effort to the Nano/Super config) now raises a validation error.

Macroscope summarized f70ddb7.

…fig split

Make the renderer byte-for-byte match apply_chat_template on branches the
shared barrage didn't cover, verified against the real cached templates.

Faithfulness (assistant body now mirrors the template's string algebra —
assemble <think>…</think>{content}, trim, append one separator — and is
tokenized in one pass):
- reason → tool-call / empty content no longer emits a stray blank line
  (</think>\n<tool_call>, not </think>\n\n<tool_call>); same for the
  no-tool empty-content case.
- history-truncation boundary is last_user_idx (was last_plain_assistant_idx)
  for every variant, so in-flight tool-cycle reasoning is kept.
- inline <think>…</think> in content renders verbatim (no reformat).
- user / system / tool / reasoning_content emitted unstripped.

Variant split (low_effort / medium_effort are real per-variant Jinja kwargs):
- nemotron-3 (Nano/Super): enable_thinking, truncate_history_thinking, low_effort.
- nemotron-3-ultra (new discriminator): + medium_effort.
- one shared Nemotron3Renderer selects the variant from config.name; drops the
  ultra flag, _default_ultra, and _ULTRA_DEFAULTS. _is_super kept to no-op
  low_effort on Nano. Bad combos now fail at config-load.

BREAKING: Nemotron3RendererConfig(ultra=True) → Nemotron3UltraRendererConfig().
"auto" resolution is unaffected.

Tests: new tests/test_nemotron3_parity.py (exhaustive Nano/Super/Ultra parity);
effort kwargs wired into the config-parity matrix; test_nemotron3_ultra.py
rewritten for the two-config wiring.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@macroscopeapp

macroscopeapp Bot commented Jun 10, 2026

Copy link
Copy Markdown

Approvability

Verdict: Needs human review

This PR refactors the Nemotron-3 renderer by splitting configs into two variants and introduces new runtime behavior (reasoning-effort hints via low_effort/medium_effort kwargs). The changes to rendering logic and model routing warrant human review to verify the behavioral changes are correct.

You can customize Macroscope's approvability policy. Learn more.

…lass under two names

Match the house style (GLM5Renderer/GLM51Renderer, Qwen35Renderer/Qwen36Renderer):
each registered renderer name gets its own class. nemotron-3-ultra now maps to a
Nemotron3UltraRenderer(Nemotron3Renderer) sibling that flips the _ultra / _config_cls
class hooks, rather than registering one class under two names and branching on
config.name. No behavior change; full suite green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant