feat(clients): add dynamo_chat renderer transport (TITO over Dynamo)#1574
Open
biswapanda wants to merge 16 commits into
Open
feat(clients): add dynamo_chat renderer transport (TITO over Dynamo)#1574biswapanda wants to merge 16 commits into
biswapanda wants to merge 16 commits into
Conversation
…tokens Dynamo's vLLM and SGLang backends emit engine-emitted token IDs and per-token logprobs under `response.nvext.engine_data` when the client opts in via `nvext.extra_fields=["engine_data"]` (PR #8119). The vLLM-native path uses non-standard top-level fields (`choices[0].token_ids`, `response.prompt_token_ids`). Add a small graft inside `from_native_response.parse_tokens` that copies the engine_data fields onto the OpenAI-shaped response when present and the top-level fields are absent. The rest of parse_tokens then reads via the standard SDK attribute path regardless of backend.
The verifiers TITO client previously only spoke vLLM's TITO surface
(POST /v1/chat/completions/tokens with tokens=prompt_ids; bridge tokens
via /tokenize). Dynamo serves neither route, so multi-turn TITO against
Dynamo silently degraded to MITO every turn-2+.
This teaches OpenAIChatCompletionsTokenClient to read
ClientConfig.renderer_transport and route accordingly:
* prime_vllm_generate (default): unchanged. POST /v1/chat/completions/tokens
with tokens=prompt_ids; bridge tokens via /tokenize HTTP. Requires vLLM
>= 0.20.
* dynamo_chat_nvext: POST /v1/chat/completions with placeholder messages +
nvext.token_data=prompt_ids. Bridge tokens are computed locally via the
model's HF fast tokenizer (no /tokenize HTTP round-trip). Server returns
engine-side token IDs and logprobs under nvext.engine_data (PR #8119
channel), parsed by the OpenAIChatCompletionsClient.from_native_response
graft so the rest of the pipeline is transport-agnostic.
Also fix the normalize_for_comparison asymmetry that caused get_prompt_ids
to never match for vf.Message-shaped input (the form MultiTurnEnv produces
after maybe_normalize_messages). Drop None-valued keys so model_dump's
exhaustive view is equivalent to to_native_prompt's slimmer view.
…ken_ids (plan B3)
…ChatCompletion, scrub return_token_ids, forward sampling args, graft engine_data logprobs) + rename to dynamo_chat
… content-less; trim test comments
…p fixed allowlist) for vLLM-path parity
…prob length, tokenizer override, drop dead renderer field
…route dynamo TITO through routed-experts sidecar helper
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 4aa48a4. Configure here.
…_pretrained must not block the event loop)
…er key-order robust
…ect; document dtype field
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Description
Adds a
dynamo_chatrenderer transport so the verifiers TITO (tokens-in/tokens-out) client can run multi-turn against NVIDIA Dynamo, alongside the existing vLLM TITO path. Previously the TITO client only spoke vLLM's surface (POST /v1/chat/completions/tokens+/tokenize); Dynamo serves neither route, so multi-turn TITO against Dynamo silently degraded to MITO from turn 2 onward.Changes
RendererTransport = Literal["vllm_generate", "dynamo_chat"]andClientConfig.renderer_transport(defaultvllm_generate— the new path is opt-in).renderer_transportthrough torenderers.generate()and route by transport.vllm_generate(default): unchanged —POST /v1/chat/completions/tokens, bridge tokens via/tokenize.dynamo_chat:POST /v1/chat/completionswith placeholder messages +nvext.token_data=prompt_ids; bridge tokens computed locally via the model's HF fast tokenizer (no/tokenizeround-trip). Engine token IDs + logprobs come back undernvext.engine_data.nvext.engine_data(engine token IDs + per-token logprobs) onto the OpenAI-shaped response when present and the vLLM-native fields are absent, keeping the rest of the pipeline transport-agnostic.RoutedExpertsPayloadgainsdtype: NotRequired[Literal["uint8", "uint16", "int32"]]so the routed-experts buffer is self-describing (≤256 experts → uint8, larger → uint16/int32) instead of consumers assuming a fixed width; the JSON-gate sidecar stripper is bounded to the routed_experts object and made key-order robust.normalize_for_comparisonasymmetry soget_prompt_idsmatchesvf.Message-shaped input (dropsNone-valued keys).Type of Change
Review
Codex adversarial review: SIGN-OFF (head
ea53210). All review threads resolved.Notes
Default behavior is unchanged (
renderer_transportdefaults tovllm_generate). Companion to PrimeIntellect-ai/renderers#79 and PrimeIntellect-ai/prime-rl#2737.Note
Medium Risk
Changes multi-turn inference wiring and token/logprob parsing on a new opt-in path; default
vllm_generateis unchanged, but dynamo_chat depends on local tokenizer parity with the engine and correctengine_datagrafting for training tokens.Overview
Adds opt-in
dynamo_chattransport so multi-turn TITO works against NVIDIA Dynamo (which lacks vLLM’s/chat/completions/tokensand/tokenize), viaClientConfig.renderer_transportshared withRendererClient.OpenAIChatCompletionsTokenClient: whendynamo_chatis set, stitched prompts go toPOST /v1/chat/completionswithnvext.token_data, requests opt intonvext.extra_fields=["engine_data"], vLLM-only sampling keys are scrubbed, and bridge tokenization uses a cached local HFAutoTokenizer(transformersrequired) instead of/tokenize.OpenAIChatCompletionsClient: new_graft_engine_datamapsnvext.engine_data(and related shapes) ontochoices[0].token_ids,prompt_token_ids, and synthesizedlogprobswhen content is missing;parse_tokensdrops tokens if logprob length mismatches completion IDs.TITO stitching fix:
normalize_for_comparisondropsNonekeys sovf.Message-shaped prompts match trajectory prefixes (avoids MITO fallback from turn 2+).Routed experts: optional
dtypeonRoutedExpertsPayload;strip_routed_experts_datafindsdatainside therouted_expertsobject regardless of key order and ignores unrelateddatafields elsewhere.Reviewed by Cursor Bugbot for commit b31ff2d. Bugbot is set up for automated code reviews on this repo. Configure here.
Note
Add
dynamo_chatrenderer transport (TITO over Dynamo) toOpenAIChatCompletionsTokenClientrenderer_transportfield toClientConfig(default"vllm_generate") and aRendererTransporttype alias in types.py to select between"vllm_generate"and"dynamo_chat"wire shapes._post_dynamo_chatto openai_chat_completions_token_client.py, which sends pre-tokenized prompts vianvext.token_data, requestsengine_datavianvext.extra_fields, and scrubs vLLM-only keys (return_token_ids,spaces_between_special_tokens,priority)._get_local_tokenizer) soget_prompt_idsavoids the/tokenizeHTTP call when using Dynamo transport._graft_engine_datato openai_chat_completions_client.py to populatechoice.token_ids,response.prompt_token_ids, and synthesizechoice.logprobsfromnvext.engine_dataon responses.strip_routed_experts_datain response_utils.py to locate therouted_expertsdata key in a key-order-agnostic way, fixing false positives from sibling objects.Macroscope summarized b31ff2d.