fix(nemotron3): faithful Nano/Super/Ultra rendering + per-variant config split by hallerite · Pull Request #84 · PrimeIntellect-ai/renderers

hallerite · 2026-06-10T21:46:07Z

Summary

Makes the Nemotron-3 renderer byte-for-byte faithful to the Nano, Super, and Ultra chat templates, and splits the typed config per template variant so each exposes only the kwargs it actually supports.

The existing 18-case shared barrage passed, but several untested branches diverged from apply_chat_template. Verified against the real cached templates for all three checkpoints.

Faithfulness fixes

Root cause was hand-assembled newline glue + premature .strip(). The assistant body is now built as a string that mirrors the template's own algebra (assemble <think>…</think>{content}, trim, append exactly one separator), then tokenized in one pass — so it matches apply_chat_template by construction.

reason → tool-call / empty content emitted a stray blank line (</think>\n\n<tool_call> instead of </think>\n<tool_call>); same for the no-tool empty-content case. The most common agentic shape. ✅
History-truncation boundary was last_plain_assistant_idx; the templates (all three) use loop.index0 < last_user_idx. The renderer was dropping in-flight tool-cycle reasoning the template keeps. ✅
Inline <think>…</think> in content was parsed and reformatted; the template emits it verbatim (same contract enforced for Kimi). ✅
Whitespace: user/system/tool/reasoning_content were .strip()-ed; the template emits them verbatim (assistant is trimmed only where the template trims). ✅

Variant split (config honesty)

low_effort and medium_effort are real per-variant Jinja kwargs (Super and Ultra respectively; Nano has neither). Rather than expose both on one config with silent no-ops, the config is split:

nemotron-3 — Nano / Super. Fields: enable_thinking, truncate_history_thinking, low_effort.
nemotron-3-ultra (new discriminator) — Ultra. Fields: enable_thinking, truncate_history_thinking, medium_effort.

Both route to one shared Nemotron3Renderer, which now selects its variant from config.name. This deletes the ultra flag, _default_ultra, and _ULTRA_DEFAULTS; _is_super (name-based) is kept solely to no-op low_effort on Nano (Nano/Super share a config). Bad combinations now fail at config-load — matching the discriminated-union philosophy in configs.py.

⚠️ API change

Nemotron3RendererConfig(ultra=True) → Nemotron3UltraRendererConfig(). "auto" resolution is unaffected (resolves by model name). Downstream configs (prime-rl / verifiers) that pin ultra= need updating.

Tests

New tests/test_nemotron3_parity.py: exhaustive token-for-token parity across Nano/Super/Ultra (reasoning±tool-calls, truncation boundary, inline-think, verbatim whitespace, effort kwargs, gen-prompt/thinking toggles).
Effort kwargs wired into the auto-derived config-parity matrix (low_effort exercised on both Nano (no-op) and Super (active); medium_effort on Ultra).
test_nemotron3_ultra.py rewritten for the two-config wiring.
Full suite green.

🤖 Generated with Claude Code

Note

Medium Risk
Token streams and typed config shape change for Nemotron-3 (especially assistant reasoning, tool calls, and ultra= callers); auto model resolution is unchanged but explicit configs must migrate.

Overview
Nemotron-3 rendering is brought in line with apply_chat_template and Ultra is split into its own typed config/renderer entry.

Assistant turns are built as a single body string (think/content/tool XML) and tokenized in one pass, fixing newline glue around reason → tool calls, historical thinking truncation (now last_user_idx on all variants, not “last plain assistant”), verbatim inline `` in content, and unstripped user/system/tool/reasoning_content bytes. `nemotron-3-ultra` is a new discriminator with `Nemotron3UltraRenderer` / `Nemotron3UltraRendererConfig` (`medium_effort`); Nano/Super stay on `nemotron-3` with `low_effort` (Super-only via `_is_super`). The internal `ultra` selector and name-based Ultra defaults are removed—Ultra checkpoints map in `MODEL_RENDERER_MAP` instead. `bridge_to_next_turn` returns `None` when an effort hint is active so the last-user hint cannot go stale.

Breaking: Nemotron3RendererConfig(ultra=True) → Nemotron3UltraRendererConfig(); token ids change for affected Nemotron-3 conversations.

New tests/test_nemotron3_parity.py plus wiring/parity matrix updates for effort kwargs.

^{Reviewed by Cursor Bugbot for commit f70ddb7. Bugbot is set up for automated code reviews on this repo. Configure here.}

Note

Split Nemotron-3 Ultra into a dedicated renderer and config separate from Nano/Super

Introduces Nemotron3UltraRenderer and Nemotron3UltraRendererConfig as distinct classes; Ultra checkpoints now route to 'nemotron-3-ultra' in MODEL_RENDERER_MAP and the registry, while Nano/Super remain on 'nemotron-3'.
Removes the ultra selector field from Nemotron3RendererConfig and replaces per-model defaults with explicit subclass hooks; Ultra uses medium_effort and Nano/Super use low_effort as template kwargs.
Fixes whitespace and </think> glue rendering to be faithful per variant, and extracts tool-call XML emission into a shared _format_tool_call static method.
Adds a parity test suite in tests/test_nemotron3_parity.py covering reasoning boundaries, thinking truncation, whitespace fidelity, and effort kwargs for all three variants.
Behavioral Change: passing ultra=True or cross-variant kwargs (e.g. medium_effort to the Nano/Super config) now raises a validation error.

^{Macroscope summarized f70ddb7.}

…fig split Make the renderer byte-for-byte match apply_chat_template on branches the shared barrage didn't cover, verified against the real cached templates. Faithfulness (assistant body now mirrors the template's string algebra — assemble <think>…</think>{content}, trim, append one separator — and is tokenized in one pass): - reason → tool-call / empty content no longer emits a stray blank line (</think>\n<tool_call>, not </think>\n\n<tool_call>); same for the no-tool empty-content case. - history-truncation boundary is last_user_idx (was last_plain_assistant_idx) for every variant, so in-flight tool-cycle reasoning is kept. - inline <think>…</think> in content renders verbatim (no reformat). - user / system / tool / reasoning_content emitted unstripped. Variant split (low_effort / medium_effort are real per-variant Jinja kwargs): - nemotron-3 (Nano/Super): enable_thinking, truncate_history_thinking, low_effort. - nemotron-3-ultra (new discriminator): + medium_effort. - one shared Nemotron3Renderer selects the variant from config.name; drops the ultra flag, _default_ultra, and _ULTRA_DEFAULTS. _is_super kept to no-op low_effort on Nano. Bad combos now fail at config-load. BREAKING: Nemotron3RendererConfig(ultra=True) → Nemotron3UltraRendererConfig(). "auto" resolution is unaffected. Tests: new tests/test_nemotron3_parity.py (exhaustive Nano/Super/Ultra parity); effort kwargs wired into the config-parity matrix; test_nemotron3_ultra.py rewritten for the two-config wiring. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

macroscopeapp · 2026-06-10T21:48:00Z

Approvability

Verdict: Needs human review

This PR refactors the Nemotron-3 renderer by splitting configs into two variants and introduces new runtime behavior (reasoning-effort hints via low_effort/medium_effort kwargs). The changes to rendering logic and model routing warrant human review to verify the behavioral changes are correct.

^{You can customize Macroscope's approvability policy. Learn more.}

…lass under two names Match the house style (GLM5Renderer/GLM51Renderer, Qwen35Renderer/Qwen36Renderer): each registered renderer name gets its own class. nemotron-3-ultra now maps to a Nemotron3UltraRenderer(Nemotron3Renderer) sibling that flips the _ultra / _config_cls class hooks, rather than registering one class under two names and branching on config.name. No behavior change; full suite green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(nemotron3): faithful Nano/Super/Ultra rendering + per-variant config split#84

fix(nemotron3): faithful Nano/Super/Ultra rendering + per-variant config split#84
hallerite wants to merge 2 commits into
mainfrom
fix/nemotron3-template-faithfulness

hallerite commented Jun 10, 2026 •

edited by macroscopeapp Bot

Loading

Uh oh!

macroscopeapp Bot commented Jun 10, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hallerite commented Jun 10, 2026 • edited by macroscopeapp Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Faithfulness fixes

Variant split (config honesty)

⚠️ API change

Tests

Split Nemotron-3 Ultra into a dedicated renderer and config separate from Nano/Super

Uh oh!

macroscopeapp Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Approvability

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

hallerite commented Jun 10, 2026 •

edited by macroscopeapp Bot

Loading

macroscopeapp Bot commented Jun 10, 2026 •

edited

Loading