fix(nemotron3): faithful Nano/Super/Ultra rendering + per-variant config split#84
Open
hallerite wants to merge 2 commits into
Open
fix(nemotron3): faithful Nano/Super/Ultra rendering + per-variant config split#84hallerite wants to merge 2 commits into
hallerite wants to merge 2 commits into
Conversation
…fig split
Make the renderer byte-for-byte match apply_chat_template on branches the
shared barrage didn't cover, verified against the real cached templates.
Faithfulness (assistant body now mirrors the template's string algebra —
assemble <think>…</think>{content}, trim, append one separator — and is
tokenized in one pass):
- reason → tool-call / empty content no longer emits a stray blank line
(</think>\n<tool_call>, not </think>\n\n<tool_call>); same for the
no-tool empty-content case.
- history-truncation boundary is last_user_idx (was last_plain_assistant_idx)
for every variant, so in-flight tool-cycle reasoning is kept.
- inline <think>…</think> in content renders verbatim (no reformat).
- user / system / tool / reasoning_content emitted unstripped.
Variant split (low_effort / medium_effort are real per-variant Jinja kwargs):
- nemotron-3 (Nano/Super): enable_thinking, truncate_history_thinking, low_effort.
- nemotron-3-ultra (new discriminator): + medium_effort.
- one shared Nemotron3Renderer selects the variant from config.name; drops the
ultra flag, _default_ultra, and _ULTRA_DEFAULTS. _is_super kept to no-op
low_effort on Nano. Bad combos now fail at config-load.
BREAKING: Nemotron3RendererConfig(ultra=True) → Nemotron3UltraRendererConfig().
"auto" resolution is unaffected.
Tests: new tests/test_nemotron3_parity.py (exhaustive Nano/Super/Ultra parity);
effort kwargs wired into the config-parity matrix; test_nemotron3_ultra.py
rewritten for the two-config wiring.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ApprovabilityVerdict: Needs human review This PR refactors the Nemotron-3 renderer by splitting configs into two variants and introduces new runtime behavior (reasoning-effort hints via You can customize Macroscope's approvability policy. Learn more. |
…lass under two names Match the house style (GLM5Renderer/GLM51Renderer, Qwen35Renderer/Qwen36Renderer): each registered renderer name gets its own class. nemotron-3-ultra now maps to a Nemotron3UltraRenderer(Nemotron3Renderer) sibling that flips the _ultra / _config_cls class hooks, rather than registering one class under two names and branching on config.name. No behavior change; full suite green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Makes the Nemotron-3 renderer byte-for-byte faithful to the Nano, Super, and Ultra chat templates, and splits the typed config per template variant so each exposes only the kwargs it actually supports.
The existing 18-case shared barrage passed, but several untested branches diverged from
apply_chat_template. Verified against the real cached templates for all three checkpoints.Faithfulness fixes
Root cause was hand-assembled newline glue + premature
.strip(). The assistant body is now built as a string that mirrors the template's own algebra (assemble<think>…</think>{content}, trim, append exactly one separator), then tokenized in one pass — so it matchesapply_chat_templateby construction.</think>\n\n<tool_call>instead of</think>\n<tool_call>); same for the no-tool empty-content case. The most common agentic shape. ✅last_plain_assistant_idx; the templates (all three) useloop.index0 < last_user_idx. The renderer was dropping in-flight tool-cycle reasoning the template keeps. ✅<think>…</think>incontentwas parsed and reformatted; the template emits it verbatim (same contract enforced for Kimi). ✅reasoning_contentwere.strip()-ed; the template emits them verbatim (assistant is trimmed only where the template trims). ✅Variant split (config honesty)
low_effortandmedium_effortare real per-variant Jinja kwargs (Super and Ultra respectively; Nano has neither). Rather than expose both on one config with silent no-ops, the config is split:nemotron-3— Nano / Super. Fields:enable_thinking,truncate_history_thinking,low_effort.nemotron-3-ultra(new discriminator) — Ultra. Fields:enable_thinking,truncate_history_thinking,medium_effort.Both route to one shared
Nemotron3Renderer, which now selects its variant fromconfig.name. This deletes theultraflag,_default_ultra, and_ULTRA_DEFAULTS;_is_super(name-based) is kept solely to no-oplow_efforton Nano (Nano/Super share a config). Bad combinations now fail at config-load — matching the discriminated-union philosophy inconfigs.py.Nemotron3RendererConfig(ultra=True)→Nemotron3UltraRendererConfig()."auto"resolution is unaffected (resolves by model name). Downstream configs (prime-rl / verifiers) that pinultra=need updating.Tests
tests/test_nemotron3_parity.py: exhaustive token-for-token parity across Nano/Super/Ultra (reasoning±tool-calls, truncation boundary, inline-think, verbatim whitespace, effort kwargs, gen-prompt/thinking toggles).low_effortexercised on both Nano (no-op) and Super (active);medium_efforton Ultra).test_nemotron3_ultra.pyrewritten for the two-config wiring.🤖 Generated with Claude Code
Note
Medium Risk
Token streams and typed config shape change for Nemotron-3 (especially assistant reasoning, tool calls, and
ultra=callers); auto model resolution is unchanged but explicit configs must migrate.Overview
Nemotron-3 rendering is brought in line with
apply_chat_templateand Ultra is split into its own typed config/renderer entry.Assistant turns are built as a single body string (think/content/tool XML) and tokenized in one pass, fixing newline glue around reason → tool calls, historical thinking truncation (now
last_user_idxon all variants, not “last plain assistant”), verbatim inline `` in content, and unstripped user/system/tool/reasoning_contentbytes. `nemotron-3-ultra` is a new discriminator with `Nemotron3UltraRenderer` / `Nemotron3UltraRendererConfig` (`medium_effort`); Nano/Super stay on `nemotron-3` with `low_effort` (Super-only via `_is_super`). The internal `ultra` selector and name-based Ultra defaults are removed—Ultra checkpoints map in `MODEL_RENDERER_MAP` instead. `bridge_to_next_turn` returns `None` when an effort hint is active so the last-user hint cannot go stale.Breaking:
Nemotron3RendererConfig(ultra=True)→Nemotron3UltraRendererConfig(); token ids change for affected Nemotron-3 conversations.New
tests/test_nemotron3_parity.pyplus wiring/parity matrix updates for effort kwargs.Reviewed by Cursor Bugbot for commit f70ddb7. Bugbot is set up for automated code reviews on this repo. Configure here.
Note
Split Nemotron-3 Ultra into a dedicated renderer and config separate from Nano/Super
Nemotron3UltraRendererandNemotron3UltraRendererConfigas distinct classes; Ultra checkpoints now route to'nemotron-3-ultra'inMODEL_RENDERER_MAPand the registry, while Nano/Super remain on'nemotron-3'.ultraselector field fromNemotron3RendererConfigand replaces per-model defaults with explicit subclass hooks; Ultra usesmedium_effortand Nano/Super uselow_effortas template kwargs.</think>glue rendering to be faithful per variant, and extracts tool-call XML emission into a shared_format_tool_callstatic method.tests/test_nemotron3_parity.pycovering reasoning boundaries, thinking truncation, whitespace fidelity, and effort kwargs for all three variants.ultra=Trueor cross-variant kwargs (e.g.medium_effortto the Nano/Super config) now raises a validation error.Macroscope summarized f70ddb7.