feat(client): add dynamo_chat transport + routed_experts to renderer generate by biswapanda · Pull Request #79 · PrimeIntellect-ai/renderers

biswapanda · 2026-06-09T00:19:26Z

Description

Adds a dynamo_chat transport to the renderer-based generate() client so it can run against NVIDIA Dynamo, which serves no /inference/v1/generate route. Selected per-call via transport=; defaults to the existing vLLM path, so behavior is unchanged unless opted in.

Two transports:

vllm_generate (default): unchanged — messages → render_ids() → POST /inference/v1/generate → parse_response() (vLLM TITO surface).
dynamo_chat: messages → render_ids() → POST /v1/chat/completions with nvext.token_data (pre-tokenized prompt) + nvext.extra_fields=["engine_data"]. Completion token IDs and logprobs are read back from nvext.engine_data.

Dynamo wire shape (`_post_dynamo_chat`)

Mirrors the verifiers token client so the payload is identical whether a rollout goes through the token client or the renderer client. nvext.token_data (Dynamo skips tokenization when present); cache_salt → nvext.cache_salt, priority → nvext.agent_hints.priority; a single placeholder user message; sampling remap (max_tokens → max_completion_tokens, logprobs=N → logprobs=true + top_logprobs=N); passthrough fields ride the Dynamo allowlist. Tools are baked into token_data by the renderer (not sent on the wire).

routed_experts (MoE expert replay) — now surfaced on dynamo_chat

(Supersedes the earlier "routed_experts intentionally NOT surfaced" note — it now is.) parse reads routed_experts from nvext.routed_experts (or nvext.engine_data.routed_experts) and maps it to the downstream RoutedExpertsPayload {data, shape, start, dtype}. The Dynamo worker returns full-sequence routing with start=0; the renderer row-trims the leading prompt rows only when the caller explicitly sets routed_experts_prompt_start — a first-turn request with no caller start stays full-sequence with start=0 (no phantom prefix). Completion logprobs prefer nvext.engine_data.completion_logprobs (the same authoritative source as the engine token IDs) over the chat echo; a present-but-empty engine list is authoritative and does not fall back to chat.

Other

Public RendererTransport = Literal["vllm_generate", "dynamo_chat"] alias. A present-but-empty completion_token_ids is a valid zero-token completion; only a fully absent field raises. Multimodal renderers raise NotImplementedError on dynamo_chat (vLLM path / token-client TITO remain available for VLMs).

Type of Change

New feature (non-breaking change which adds functionality)

Review

Codex adversarial review: SIGN-OFF (F1/F2/F3 + the N1 logprob-presence finding resolved; head 5f2a914). All review threads resolved.

Testing

tests/test_client.py covers the Dynamo request body shape (priority/detokenize/sampling remap), routed_experts parse + row-trim (explicit prompt_start vs first-turn full-sequence), engine-logprob preference incl. present-but-empty, and missing/empty completion IDs.

Note

Medium Risk
New inference path affects rollout token IDs, logprobs, and MoE expert replay for Dynamo users; default transport is unchanged but parsing/validation errors are now explicit on misconfigured Dynamo responses.

Overview
Adds a dynamo_chat backend to renderer generate(), selected per call via transport= (default vllm_generate leaves existing vLLM /inference/v1/generate behavior intact).

generate() now delegates HTTP to a _Transport strategy: vLLM posts token IDs to the cached absolute generate URL; Dynamo posts to /chat/completions with nvext.token_data, merged caller nvext, sampling remaps (max_completion_tokens, logprobs/top_logprobs), and denylisted vLLM-only keys routed into nvext (cache_salt, priority, routed_experts_prompt_start). Responses are normalized through _WireResult; Dynamo prefers nvext.engine_data for completion IDs and logprobs, validates routed_experts to {data, shape, start, dtype}, and can client-trim MoE routing when the worker returns full-sequence data with start=0. Multimodal on dynamo_chat raises NotImplementedError.

Tests cover Dynamo wire shape, nvext merging, missing/misaligned completions and logprobs, routed-experts trimming, and shared POST error propagation for both transports.

^{Reviewed by Cursor Bugbot for commit f5c480d. Bugbot is set up for automated code reviews on this repo. Configure here.}

Note

Add `dynamo_chat` transport and `routed_experts` support to `generate()`

Introduces a transport parameter to generate() in renderers/client.py (default "vllm_generate"), routing requests to either /inference/v1/generate (vLLM) or /chat/completions (Dynamo); unknown values raise ValueError.
Adds _DynamoChatTransport which builds an OpenAI-style chat body with nvext fields (token IDs, engine data, cache salt, priority, routed_experts_prompt_start), forwards sampling params while dropping vLLM-internal keys, and parses nvext.engine_data for completion token IDs and logprobs.
Adds client-side trimming of routed_experts prompt rows via _trim_dynamo_routed_experts when the worker has not already applied the offset.
Extracts the existing vLLM logic into _VllmGenerateTransport with no behavioral change to that path.
Risk: Dynamo transport raises NotImplementedError for multimodal inputs and RuntimeError when completion_token_ids is absent or logprob length mismatches — these are new hard failures with no fallback.

^{Macroscope summarized f5c480d.}

…ols from dynamo body, raise on missing ids; rename transport to dynamo_chat

…d, drop routed_experts on dynamo (codex round 2)

…ake); docstring fix

…ached endpoints

…erge nvext, canonical completion-ids, logprobs alignment)

… on dynamo path

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 30c01b6. Configure here.}

…payload to contract

…only

…fset

…ing only)

…first-turn stays full)

…ith engine ids

… chat fallback)

…trim is now a back-compat fallback

…omments

cursor Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread renderers/client.py Outdated

Comment thread renderers/client.py Outdated

Comment thread renderers/client.py Outdated

biswapanda mentioned this pull request Jun 9, 2026

feat: dynamo inference backend integration PrimeIntellect-ai/prime-rl#2737

Open

1 task

biswapanda changed the title ~~feat(client): add dynamo_chat_nvext transport to renderer generate()~~ feat(client): add dynamo_chat_nvext transport to renderer Jun 9, 2026

biswapanda added 2 commits June 8, 2026 19:11

feat(client): add transport selector + dynamo_chat_nvext branch

334e496

feat: forward Dynamo nvext TITO fields

6a21574

biswapanda force-pushed the rl-sdk-4 branch from 268e16b to 6a21574 Compare June 9, 2026 02:13

fix(client): address codex review — revert default vLLM path, drop to…

a35e023

…ols from dynamo body, raise on missing ids; rename transport to dynamo_chat

biswapanda mentioned this pull request Jun 9, 2026

feat(clients): add dynamo_chat renderer transport (TITO over Dynamo) PrimeIntellect-ai/verifiers#1574

Open

1 task

biswapanda added 2 commits June 9, 2026 01:21

fix(client): gate nvext fallbacks to dynamo path, fix zero-token guar…

b6f50d0

…d, drop routed_experts on dynamo (codex round 2)

test(client): prove routed_experts dropped on dynamo (Dynamo-shaped f…

5dbf494

…ake); docstring fix

biswapanda changed the title ~~feat(client): add dynamo_chat_nvext transport to renderer~~ feat(client): add dynamo_chat transport to renderer generate() Jun 9, 2026

biswapanda added 4 commits June 9, 2026 15:31

style: apply ruff format to client + tests (fix CI)

287871c

docs(client): trim verbose comments in dynamo_chat path

6041134

refactor(client): replace transport if/else with strategy classes + c…

503846c

…ached endpoints

fix(client): address codex F1-F4 on dynamo_chat (denylist sampling, m…

ed03eaa

…erge nvext, canonical completion-ids, logprobs alignment)

cursor Bot reviewed Jun 10, 2026

View reviewed changes

Comment thread renderers/client.py

biswapanda changed the title ~~feat(client): add dynamo_chat transport to renderer generate()~~ feat(client): add dynamo_chat transport to renderer generate Jun 10, 2026

biswapanda added 2 commits June 9, 2026 20:05

fix(client): route sampling_params cache_salt and priority into nvext…

eb0bdb2

… on dynamo path

feat(client): surface routed_experts on dynamo_chat transport

30c01b6

cursor Bot reviewed Jun 10, 2026

View reviewed changes

Comment thread renderers/client.py Outdated

biswapanda added 8 commits June 10, 2026 11:30

fix(client): drop duplicate routed_experts request; normalize parsed …

28e3d02

…payload to contract

test(client): update dynamo extra_fields expectations to engine_data …

c31854f

…only

fix(client): stamp routed_experts.start on dynamo_chat from prompt of…

59553da

…fset

test(client): expect stamped routed_experts.start on dynamo_chat

b7927f8

fix(client): trim dynamo_chat routed_experts rows to start (was stamp…

b554520

…ing only)

fix(client): only trim routed_experts when caller sets prompt_start (…

010c894

…first-turn stays full)

fix(client): prefer engine_data.completion_logprobs to stay aligned w…

51d9154

…ith engine ids

fix(client): treat present-empty engine logprobs as authoritative (no…

5f2a914

… chat fallback)

biswapanda changed the title ~~feat(client): add dynamo_chat transport to renderer generate~~ feat(client): add dynamo_chat transport + routed_experts to renderer generate Jun 10, 2026

feat(client): send routed_experts_prompt_start in nvext; client-side …

7567377

…trim is now a back-compat fallback

biswapanda mentioned this pull request Jun 11, 2026

feat(rl): forward routed_experts_prompt_start via nvext (engine-side routing trim) ai-dynamo/dynamo#10562

Open

3 tasks

docs(client): drop PR-number references and stale vLLM version from c…

f5c480d

…omments

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(client): add dynamo_chat transport + routed_experts to renderer generate#79

feat(client): add dynamo_chat transport + routed_experts to renderer generate#79
biswapanda wants to merge 21 commits into
PrimeIntellect-ai:mainfrom
biswapanda:rl-sdk-4

biswapanda commented Jun 9, 2026 •

edited by macroscopeapp Bot

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

biswapanda commented Jun 9, 2026 • edited by macroscopeapp Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Dynamo wire shape (_post_dynamo_chat)

routed_experts (MoE expert replay) — now surfaced on dynamo_chat

Other

Type of Change

Review

Testing

Add dynamo_chat transport and routed_experts support to generate()

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

biswapanda commented Jun 9, 2026 •

edited by macroscopeapp Bot

Loading

Dynamo wire shape (`_post_dynamo_chat`)

Add `dynamo_chat` transport and `routed_experts` support to `generate()`