Skip to content

feat(clients): add dynamo_chat renderer transport (TITO over Dynamo)#1574

Open
biswapanda wants to merge 16 commits into
PrimeIntellect-ai:mainfrom
biswapanda:rl-sdk-4
Open

feat(clients): add dynamo_chat renderer transport (TITO over Dynamo)#1574
biswapanda wants to merge 16 commits into
PrimeIntellect-ai:mainfrom
biswapanda:rl-sdk-4

Conversation

@biswapanda

@biswapanda biswapanda commented Jun 9, 2026

Copy link
Copy Markdown

Description

Adds a dynamo_chat renderer transport so the verifiers TITO (tokens-in/tokens-out) client can run multi-turn against NVIDIA Dynamo, alongside the existing vLLM TITO path. Previously the TITO client only spoke vLLM's surface (POST /v1/chat/completions/tokens + /tokenize); Dynamo serves neither route, so multi-turn TITO against Dynamo silently degraded to MITO from turn 2 onward.

Changes

  • types: add RendererTransport = Literal["vllm_generate", "dynamo_chat"] and ClientConfig.renderer_transport (default vllm_generate — the new path is opt-in).
  • renderer_client / token client: thread renderer_transport through to renderers.generate() and route by transport.
    • vllm_generate (default): unchanged — POST /v1/chat/completions/tokens, bridge tokens via /tokenize.
    • dynamo_chat: POST /v1/chat/completions with placeholder messages + nvext.token_data=prompt_ids; bridge tokens computed locally via the model's HF fast tokenizer (no /tokenize round-trip). Engine token IDs + logprobs come back under nvext.engine_data.
  • chat completions client: graft nvext.engine_data (engine token IDs + per-token logprobs) onto the OpenAI-shaped response when present and the vLLM-native fields are absent, keeping the rest of the pipeline transport-agnostic.
  • routed_experts contract: RoutedExpertsPayload gains dtype: NotRequired[Literal["uint8", "uint16", "int32"]] so the routed-experts buffer is self-describing (≤256 experts → uint8, larger → uint16/int32) instead of consumers assuming a fixed width; the JSON-gate sidecar stripper is bounded to the routed_experts object and made key-order robust.
  • Fix a normalize_for_comparison asymmetry so get_prompt_ids matches vf.Message-shaped input (drops None-valued keys).

Type of Change

  • New feature (non-breaking change which adds functionality)

Review

Codex adversarial review: SIGN-OFF (head ea53210). All review threads resolved.

Notes

Default behavior is unchanged (renderer_transport defaults to vllm_generate). Companion to PrimeIntellect-ai/renderers#79 and PrimeIntellect-ai/prime-rl#2737.


Note

Medium Risk
Changes multi-turn inference wiring and token/logprob parsing on a new opt-in path; default vllm_generate is unchanged, but dynamo_chat depends on local tokenizer parity with the engine and correct engine_data grafting for training tokens.

Overview
Adds opt-in dynamo_chat transport so multi-turn TITO works against NVIDIA Dynamo (which lacks vLLM’s /chat/completions/tokens and /tokenize), via ClientConfig.renderer_transport shared with RendererClient.

OpenAIChatCompletionsTokenClient: when dynamo_chat is set, stitched prompts go to POST /v1/chat/completions with nvext.token_data, requests opt into nvext.extra_fields=["engine_data"], vLLM-only sampling keys are scrubbed, and bridge tokenization uses a cached local HF AutoTokenizer (transformers required) instead of /tokenize.

OpenAIChatCompletionsClient: new _graft_engine_data maps nvext.engine_data (and related shapes) onto choices[0].token_ids, prompt_token_ids, and synthesized logprobs when content is missing; parse_tokens drops tokens if logprob length mismatches completion IDs.

TITO stitching fix: normalize_for_comparison drops None keys so vf.Message-shaped prompts match trajectory prefixes (avoids MITO fallback from turn 2+).

Routed experts: optional dtype on RoutedExpertsPayload; strip_routed_experts_data finds data inside the routed_experts object regardless of key order and ignores unrelated data fields elsewhere.

Reviewed by Cursor Bugbot for commit b31ff2d. Bugbot is set up for automated code reviews on this repo. Configure here.

Note

Add dynamo_chat renderer transport (TITO over Dynamo) to OpenAIChatCompletionsTokenClient

  • Adds a renderer_transport field to ClientConfig (default "vllm_generate") and a RendererTransport type alias in types.py to select between "vllm_generate" and "dynamo_chat" wire shapes.
  • Adds _post_dynamo_chat to openai_chat_completions_token_client.py, which sends pre-tokenized prompts via nvext.token_data, requests engine_data via nvext.extra_fields, and scrubs vLLM-only keys (return_token_ids, spaces_between_special_tokens, priority).
  • Adds local HF tokenizer fallback (_get_local_tokenizer) so get_prompt_ids avoids the /tokenize HTTP call when using Dynamo transport.
  • Adds _graft_engine_data to openai_chat_completions_client.py to populate choice.token_ids, response.prompt_token_ids, and synthesize choice.logprobs from nvext.engine_data on responses.
  • Rewrites strip_routed_experts_data in response_utils.py to locate the routed_experts data key in a key-order-agnostic way, fixing false positives from sibling objects.

Macroscope summarized b31ff2d.

Comment thread verifiers/clients/openai_chat_completions_token_client.py Outdated
Comment thread verifiers/clients/openai_chat_completions_token_client.py Outdated
Comment thread verifiers/types.py Outdated
…tokens

Dynamo's vLLM and SGLang backends emit engine-emitted token IDs and per-token
logprobs under `response.nvext.engine_data` when the client opts in via
`nvext.extra_fields=["engine_data"]` (PR #8119). The vLLM-native path uses
non-standard top-level fields (`choices[0].token_ids`, `response.prompt_token_ids`).

Add a small graft inside `from_native_response.parse_tokens` that copies the
engine_data fields onto the OpenAI-shaped response when present and the
top-level fields are absent. The rest of parse_tokens then reads via the
standard SDK attribute path regardless of backend.
The verifiers TITO client previously only spoke vLLM's TITO surface
(POST /v1/chat/completions/tokens with tokens=prompt_ids; bridge tokens
via /tokenize). Dynamo serves neither route, so multi-turn TITO against
Dynamo silently degraded to MITO every turn-2+.

This teaches OpenAIChatCompletionsTokenClient to read
ClientConfig.renderer_transport and route accordingly:

  * prime_vllm_generate (default): unchanged. POST /v1/chat/completions/tokens
    with tokens=prompt_ids; bridge tokens via /tokenize HTTP. Requires vLLM
    >= 0.20.

  * dynamo_chat_nvext: POST /v1/chat/completions with placeholder messages +
    nvext.token_data=prompt_ids. Bridge tokens are computed locally via the
    model's HF fast tokenizer (no /tokenize HTTP round-trip). Server returns
    engine-side token IDs and logprobs under nvext.engine_data (PR #8119
    channel), parsed by the OpenAIChatCompletionsClient.from_native_response
    graft so the rest of the pipeline is transport-agnostic.

Also fix the normalize_for_comparison asymmetry that caused get_prompt_ids
to never match for vf.Message-shaped input (the form MultiTurnEnv produces
after maybe_normalize_messages). Drop None-valued keys so model_dump's
exhaustive view is equivalent to to_native_prompt's slimmer view.
…ChatCompletion, scrub return_token_ids, forward sampling args, graft engine_data logprobs) + rename to dynamo_chat
Comment thread verifiers/clients/openai_chat_completions_token_client.py Outdated
Comment thread verifiers/clients/openai_chat_completions_client.py
Comment thread verifiers/clients/openai_chat_completions_token_client.py
…prob length, tokenizer override, drop dead renderer field
Comment thread verifiers/clients/openai_chat_completions_token_client.py

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 4aa48a4. Configure here.

Comment thread verifiers/clients/openai_chat_completions_token_client.py
@biswapanda biswapanda changed the title feat(clients): add dynamo_chat_nvext renderer transport for multi-turn TITO feat(clients): add dynamo_chat renderer transport (TITO over Dynamo) Jun 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant