feat(clients): add dynamo_chat renderer transport (TITO over Dynamo) by biswapanda · Pull Request #1574 · PrimeIntellect-ai/verifiers

biswapanda · 2026-06-09T00:16:37Z

Description

Adds a dynamo_chat renderer transport so the verifiers TITO (tokens-in/tokens-out) client can run multi-turn against NVIDIA Dynamo, alongside the existing vLLM TITO path. Previously the TITO client only spoke vLLM's surface (POST /v1/chat/completions/tokens + /tokenize); Dynamo serves neither route, so multi-turn TITO against Dynamo silently degraded to MITO from turn 2 onward.

Changes

types: add RendererTransport = Literal["vllm_generate", "dynamo_chat"] and ClientConfig.renderer_transport (default vllm_generate — the new path is opt-in).
renderer_client / token client: thread renderer_transport through to renderers.generate() and route by transport.
- vllm_generate (default): unchanged — POST /v1/chat/completions/tokens, bridge tokens via /tokenize.
- dynamo_chat: POST /v1/chat/completions with placeholder messages + nvext.token_data=prompt_ids; bridge tokens computed locally via the model's HF fast tokenizer (no /tokenize round-trip). Engine token IDs + logprobs come back under nvext.engine_data.
chat completions client: graft nvext.engine_data (engine token IDs + per-token logprobs) onto the OpenAI-shaped response when present and the vLLM-native fields are absent, keeping the rest of the pipeline transport-agnostic.
routed_experts contract: RoutedExpertsPayload gains dtype: NotRequired[Literal["uint8", "uint16", "int32"]] so the routed-experts buffer is self-describing (≤256 experts → uint8, larger → uint16/int32) instead of consumers assuming a fixed width; the JSON-gate sidecar stripper is bounded to the routed_experts object and made key-order robust.
Fix a normalize_for_comparison asymmetry so get_prompt_ids matches vf.Message-shaped input (drops None-valued keys).

Type of Change

New feature (non-breaking change which adds functionality)

Review

Codex adversarial review: SIGN-OFF (head ea53210). All review threads resolved.

Notes

Default behavior is unchanged (renderer_transport defaults to vllm_generate). Companion to PrimeIntellect-ai/renderers#79 and PrimeIntellect-ai/prime-rl#2737.

Note

Medium Risk
Changes multi-turn inference wiring and token/logprob parsing on a new opt-in path; default vllm_generate is unchanged, but dynamo_chat depends on local tokenizer parity with the engine and correct engine_data grafting for training tokens.

Overview
Adds opt-in dynamo_chat transport so multi-turn TITO works against NVIDIA Dynamo (which lacks vLLM’s /chat/completions/tokens and /tokenize), via ClientConfig.renderer_transport shared with RendererClient.

OpenAIChatCompletionsTokenClient: when dynamo_chat is set, stitched prompts go to POST /v1/chat/completions with nvext.token_data, requests opt into nvext.extra_fields=["engine_data"], vLLM-only sampling keys are scrubbed, and bridge tokenization uses a cached local HF AutoTokenizer (transformers required) instead of /tokenize.

OpenAIChatCompletionsClient: new _graft_engine_data maps nvext.engine_data (and related shapes) onto choices[0].token_ids, prompt_token_ids, and synthesized logprobs when content is missing; parse_tokens drops tokens if logprob length mismatches completion IDs.

TITO stitching fix: normalize_for_comparison drops None keys so vf.Message-shaped prompts match trajectory prefixes (avoids MITO fallback from turn 2+).

Routed experts: optional dtype on RoutedExpertsPayload; strip_routed_experts_data finds data inside the routed_experts object regardless of key order and ignores unrelated data fields elsewhere.

^{Reviewed by Cursor Bugbot for commit b31ff2d. Bugbot is set up for automated code reviews on this repo. Configure here.}

Note

Add `dynamo_chat` renderer transport (TITO over Dynamo) to `OpenAIChatCompletionsTokenClient`

Adds a renderer_transport field to ClientConfig (default "vllm_generate") and a RendererTransport type alias in types.py to select between "vllm_generate" and "dynamo_chat" wire shapes.
Adds _post_dynamo_chat to openai_chat_completions_token_client.py, which sends pre-tokenized prompts via nvext.token_data, requests engine_data via nvext.extra_fields, and scrubs vLLM-only keys (return_token_ids, spaces_between_special_tokens, priority).
Adds local HF tokenizer fallback (_get_local_tokenizer) so get_prompt_ids avoids the /tokenize HTTP call when using Dynamo transport.
Adds _graft_engine_data to openai_chat_completions_client.py to populate choice.token_ids, response.prompt_token_ids, and synthesize choice.logprobs from nvext.engine_data on responses.
Rewrites strip_routed_experts_data in response_utils.py to locate the routed_experts data key in a key-order-agnostic way, fixing false positives from sibling objects.

^{Macroscope summarized b31ff2d.}

…ansport

…tokens Dynamo's vLLM and SGLang backends emit engine-emitted token IDs and per-token logprobs under `response.nvext.engine_data` when the client opts in via `nvext.extra_fields=["engine_data"]` (PR #8119). The vLLM-native path uses non-standard top-level fields (`choices[0].token_ids`, `response.prompt_token_ids`). Add a small graft inside `from_native_response.parse_tokens` that copies the engine_data fields onto the OpenAI-shaped response when present and the top-level fields are absent. The rest of parse_tokens then reads via the standard SDK attribute path regardless of backend.

The verifiers TITO client previously only spoke vLLM's TITO surface (POST /v1/chat/completions/tokens with tokens=prompt_ids; bridge tokens via /tokenize). Dynamo serves neither route, so multi-turn TITO against Dynamo silently degraded to MITO every turn-2+. This teaches OpenAIChatCompletionsTokenClient to read ClientConfig.renderer_transport and route accordingly: * prime_vllm_generate (default): unchanged. POST /v1/chat/completions/tokens with tokens=prompt_ids; bridge tokens via /tokenize HTTP. Requires vLLM >= 0.20. * dynamo_chat_nvext: POST /v1/chat/completions with placeholder messages + nvext.token_data=prompt_ids. Bridge tokens are computed locally via the model's HF fast tokenizer (no /tokenize HTTP round-trip). Server returns engine-side token IDs and logprobs under nvext.engine_data (PR #8119 channel), parsed by the OpenAIChatCompletionsClient.from_native_response graft so the rest of the pipeline is transport-agnostic. Also fix the normalize_for_comparison asymmetry that caused get_prompt_ids to never match for vf.Message-shaped input (the form MultiTurnEnv produces after maybe_normalize_messages). Drop None-valued keys so model_dump's exhaustive view is equivalent to to_native_prompt's slimmer view.

…ken_ids (plan B3)

…rs.generate()

…ChatCompletion, scrub return_token_ids, forward sampling args, graft engine_data logprobs) + rename to dynamo_chat

… content-less; trim test comments

…p fixed allowlist) for vLLM-path parity

…prob length, tokenizer override, drop dead renderer field

…all paths)

…route dynamo TITO through routed-experts sidecar helper

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 4aa48a4. Configure here.}

…_pretrained must not block the event loop)

…er key-order robust

…ect; document dtype field

…chat comments

cursor Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread verifiers/clients/openai_chat_completions_token_client.py Outdated

Comment thread verifiers/clients/openai_chat_completions_token_client.py Outdated

Comment thread verifiers/types.py Outdated

This was referenced Jun 9, 2026

feat(client): add dynamo_chat transport + routed_experts to renderer generate PrimeIntellect-ai/renderers#79

Open

feat: dynamo inference backend integration PrimeIntellect-ai/prime-rl#2737

Open

biswapanda added 5 commits June 8, 2026 19:11

feat(types): add RendererTransport literal + ClientConfig.renderer_tr…

230384a

…ansport

feat(clients): graft top-level nvext.completion_token_ids + prompt_to…

f12bf63

…ken_ids (plan B3)

feat(clients): thread renderer_transport from ClientConfig to rendere…

ee3482a

…rs.generate()

biswapanda force-pushed the rl-sdk-4 branch from 68d8f48 to ee3482a Compare June 9, 2026 02:13

fix(clients): address PR review R1-R5 (guard transport kwarg, import …

3b58bf9

…ChatCompletion, scrub return_token_ids, forward sampling args, graft engine_data logprobs) + rename to dynamo_chat

cursor Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread verifiers/clients/openai_chat_completions_token_client.py Outdated

Comment thread verifiers/clients/openai_chat_completions_client.py

biswapanda added 2 commits June 9, 2026 00:41

fix(clients): graft engine_data logprobs even when choice logprobs is…

7a85b84

… content-less; trim test comments

fix(clients): dynamo_chat forwards full normalized sampling_args (dro…

7cbb603

…p fixed allowlist) for vLLM-path parity

cursor Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread verifiers/clients/openai_chat_completions_token_client.py

fix(clients): centralize Dynamo denylist scrub (MITO+TITO), guard log…

6b2dfbb

…prob length, tokenizer override, drop dead renderer field

cursor Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread verifiers/clients/openai_chat_completions_token_client.py

biswapanda added 2 commits June 9, 2026 01:31

fix(clients): enforce logprobs/ids length invariant in parse_tokens (…

9d260d3

…all paths)

fix(clients): centralize tokenizer override in _get_local_tokenizer; …

4aa48a4

…route dynamo TITO through routed-experts sidecar helper

cursor Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread verifiers/clients/openai_chat_completions_token_client.py

biswapanda added 4 commits June 9, 2026 03:11

fix(clients): load HF tokenizer inside worker thread (cache-miss from…

d713edc

…_pretrained must not block the event loop)

feat(types): add dtype to RoutedExpertsPayload contract

193c549

fix(routed_experts): tighten dtype to Literal and make sidecar stripp…

c30dad2

…er key-order robust

fix(routed_experts): bound sidecar stripper to the routed_experts obj…

ea53210

…ect; document dtype field

biswapanda changed the title ~~feat(clients): add dynamo_chat_nvext renderer transport for multi-turn TITO~~ feat(clients): add dynamo_chat renderer transport (TITO over Dynamo) Jun 10, 2026

docs(clients): drop PR-number and branch/plan references from dynamo_…

b31ff2d

…chat comments

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(clients): add dynamo_chat renderer transport (TITO over Dynamo)#1574

feat(clients): add dynamo_chat renderer transport (TITO over Dynamo)#1574
biswapanda wants to merge 16 commits into
PrimeIntellect-ai:mainfrom
biswapanda:rl-sdk-4

biswapanda commented Jun 9, 2026 •

edited by macroscopeapp Bot

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

biswapanda commented Jun 9, 2026 • edited by macroscopeapp Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Changes

Type of Change

Review

Notes

Add dynamo_chat renderer transport (TITO over Dynamo) to OpenAIChatCompletionsTokenClient

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

biswapanda commented Jun 9, 2026 •

edited by macroscopeapp Bot

Loading

Add `dynamo_chat` renderer transport (TITO over Dynamo) to `OpenAIChatCompletionsTokenClient`