[Feature] Add agent gateway#25
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive agent framework and gateway system designed to facilitate agentic workflows within a training environment. Key components include a factory for constructing frameworks, an OpenAI-compatible framework implementation that manages sequence generation and trajectory logging, and a gateway system that provides an OpenAI-compatible API for agent interactions. The gateway handles session lifecycle, trajectory buffering, and multimodal data processing. Feedback identifies a critical issue with the incremental token encoding logic in the gateway, which may produce malformed sequences due to assumptions about tokenizer stability and turn separators. Further recommendations include parallelizing reward calculations to improve performance and replacing blocking ray.get calls with asynchronous operations to avoid event loop starvation.
| def _encode_incremental( | ||
| self, | ||
| messages: list[dict[str, Any]], | ||
| image_data: list[Any] | None = None, | ||
| video_data: list[Any] | None = None, | ||
| ) -> list[int]: | ||
| """Encode incremental messages (tool results, user follow-ups) for a continuation turn. | ||
|
|
||
| Uses the remove_system_prompt pattern from ToolAgentLoop: encode the new messages | ||
| alone (which prepends a system prompt), then strip the known system_prompt prefix. | ||
| No tools parameter — tool schema is already in the initial prompt_ids. | ||
| """ | ||
| if self._processor is not None: | ||
| raw_prompt = _apply_chat_template( | ||
| self._processor, | ||
| messages, | ||
| add_generation_prompt=True, | ||
| tokenize=False, | ||
| **self._apply_chat_template_kwargs, | ||
| ) | ||
| videos = video_data | ||
| video_metadata = None | ||
| if videos is not None: | ||
| videos, video_metadata = zip(*videos, strict=False) | ||
| videos, video_metadata = list(videos), list(video_metadata) | ||
| model_inputs = self._processor( | ||
| text=[raw_prompt], | ||
| images=image_data, | ||
| videos=videos, | ||
| video_metadata=video_metadata, | ||
| return_tensors="pt", | ||
| do_sample_frames=False, | ||
| ) | ||
| ids = normalize_token_ids(model_inputs["input_ids"]) | ||
| else: | ||
| ids = normalize_token_ids( | ||
| _apply_chat_template( | ||
| self._tokenizer, messages, add_generation_prompt=True, | ||
| **self._apply_chat_template_kwargs, | ||
| ) | ||
| ) | ||
| return ids[len(self._system_prompt):] |
There was a problem hiding this comment.
The incremental encoding logic is fragile and likely to produce malformed token sequences. Slicing tokens based on the length of a pre-encoded system prompt assumes that the tokenizer is prefix-stable and that the chat template doesn't insert turn separators or special tokens between the system prompt and the first message. Furthermore, concatenating these incremental IDs to the previous turn's response IDs (at line 542) will miss the necessary turn separators (e.g., <|im_end|> and <|im_start|>user) required by most chat templates. It is safer to re-encode the full message history and identify the delta, or simply rely on the backend's prefix caching by sending the full prompt.
| gateway_actor_kwargs["backend"] = self | ||
|
|
||
| self.owned_gateway_actors = [GatewayActor.remote(**gateway_actor_kwargs) for _ in range(gateway_count)] | ||
| ray.get([gateway.start.remote() for gateway in self.owned_gateway_actors]) |
There was a problem hiding this comment.
Using ray.get inside an async context (called via build_agent_framework) will block the event loop, preventing other concurrent tasks from making progress. Since a helper _await_ray_ref is already defined in this file, you should consider moving the gateway startup logic to an async initialization method that can be awaited, rather than performing blocking calls in the constructor.
|
为什么放在trainer目录下?我觉得这是黑盒调用训推通用的流程。我偏向往上提一级,直接放uni_agent/framework和uni_agent/gateway. |
bee0d08 to
ef46265
Compare
2da5be1 to
825b7f3
Compare
825b7f3 to
9c7c97a
Compare
a7e392b to
9677db1
Compare
|
The current entry point binds a single runner via We may introduce an AgentRunner abstract base with a minimal run() contract: Each sample carries the runner name; config mounts a name → runner map. The config could be in the following format: Then the framework resolves the runner per-session by sample["agent_runner_name"], like: This could be similar to verl's existing |
|
Hi, I would like to propose using Prefix Trie for multi-trajectory storage for Agentgateway. My RFC is here:#51
For detailed explanation, please also refer to this comment: verl-project/verl#6299 (comment) |
…nt) (#52) ### What does this PR do? Adds `examples/swe_agent/` — an end-to-end recipe for training a SWE-bench coding agent with fully-async RL (Megatron actors + vLLM rollout on separate nodes) and Modal swe-rex sandboxes. It stitches the existing building blocks into something runnable, mirroring `examples/search_agent/`: - data: `examples/data_preprocess/swe_rebench.py` + `swe_bench_verified.py` - reward: `uni_agent.reward.swe_rebench` / `swe_bench` - rollout: `uni_agent.agent_loop.UniAgentLoop` (Modal swe-rex) Reference config trains Qwen3-235B-A22B-Instruct-2507 with GRPO on a 12-node (8 train + 4 rollout) × 4-GPU topology; everything is env-overridable to scale down. ### Checklist Before Starting - [x] Search for similar PRs/issues: - `gh pr list --repo verl-project/uni-agent --state open` → no SWE-bench training example (PR #25 is an unrelated agent-framework/gateway) - no existing `examples/swe*` dir - [x] Format the PR title as `[examples] feat: ...` ### Test This is a recipe (scripts + configs + docs), not library code: - `bash -n train_qwen3_235b_swebench.sh` — OK - `python -c "import yaml; yaml.safe_load(...)"` on both YAMLs — OK - `pre-commit run --files examples/swe_agent/*` — pass (compile-all; ruff/mypy skip non-py) - `shellcheck` — clean except style-only SC2206 on the hydra arg-array append, consistent with the repo's other launch scripts Full end-to-end training was run internally on the reference topology; the committed files are the scrubbed/generalized form of that setup (no secrets or site-specific paths — `runtime_env.yaml` ships placeholders only). ### Files | File | Purpose | |---|---| | `train_qwen3_235b_swebench.sh` | `ray job submit` + full GRPO / Megatron / vLLM config; topology & paths are env vars | | `agent_config.yaml` | UniAgentLoop config: tools, Modal deployment, rollout concurrency, reward | | `runtime_env.yaml` | Ray runtime-env **template** (placeholders for Modal / W&B tokens + checkout paths) | | `README.md` | dataset → runtime_env → launch → monitor + tuning notes | ### Notes captured for reproducibility Non-obvious settings learned running this at scale (documented in the script header / README): - `max_response_length=128K` — SWE-bench trajectories are long (mean ~70K tokens, ~90 turns); 32K truncates ~half - `tool_parser: hermes` for Qwen3-235B (wrong parser silently breaks tool calls) - `moe_token_dispatcher_type=alltoall` — portable MoE dispatch - `VLLM_USE_DEEP_GEMM=0` — vLLM 0.21 EP/CUTLASS init workaround - do **not** set `expandable_segments:True` (incompatible with vLLM sleep-mode CuMemAllocator, pytorch#147851) ### Checklist Before Submitting - [x] Read the Contribute Guide - [x] `pre-commit run --files examples/swe_agent/*` passed - [x] No new library code → no unit tests; recipe validated via syntax/lint + internal end-to-end run - [x] AI assistance was used (Claude Code); the submitting human (@aoshen02) reviewed every line - [x] No secrets / site-specific paths committed --------- Signed-off-by: aoshen02 <aoshen@inferact.ai> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: yuyangding <yuyangding@bytedance.com>
|
Hi, I noticed that, in the original verl Suggestion: Refer the original verl AgentLoopManager pattern — introduce multiple AgentLoopWorker Ray actors, partition the batch tasks across them. # refer AgentLoopManager._init_agent_loop_workers()
for i in range(num_workers):
worker = AgentLoopWorker.options(
scheduling_strategy=NodeAffinitySchedulingStrategy(node_id=..., soft=True)
).remote(config, ...)
self.workers.append(worker)
async def generate_sequences(self, prompts):
chunks = prompts.chunk(len(self.workers))
await asyncio.gather(*[
w.generate_sequences.remote(chunk)
for w, chunk in zip(self.workers, chunks)
])This separates two independent concurrency axes: gateway_count for LLM serving throughput, num_workers for agent execution parallelism — consistent with the original verl design. |
|
| from verl.workers.rollout.utils import run_uvicorn | ||
|
|
||
|
|
||
| class _GatewayActor: |
There was a problem hiding this comment.
Why not just decorate it with @ray.remote?
@ray.remote
class GatewayActor:
...Split from PR verl-project#25 per maintainer request: gateway is the first independently-reviewable PR. Owns SessionHandle/Trajectory (moved from framework.types). No framework dependency. Spec: cxb_dev/docs/plans/2026-06-03-pr25-split-gateway-framework-deepeyes-design.md Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
P0 follow-up to PR verl-project#25 review: docstring every public class, method, field, and function in the gateway package. Pure documentation; zero behavior change. Full regression 50 passed unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
a300294 to
a17e1ed
Compare
Introduce uni_agent/trainer/gateway/protocol.py with OpenAI-compatible ChatCompletionRequest / ChatCompletionResponse TypedDicts. _handle_chat_completions now annotates its payload as ChatCompletionRequest and constructs the response via ChatCompletionResponse local instead of an anonymous dict. Response gains the OpenAI-standard `created` (unix ts) and `model` fields; `model` falls back to "unknown" when the request omits it to avoid breaking direct-call test payloads. MessageCodec runtime validation, GatewaySession envelope, GenerationOutcome contract, Trajectory token-truth all unchanged. No pydantic, no openai SDK runtime dependency. Spec: cxb_dev/docs/plans/2026-06-04-gateway-openai-sdk-typed-io-design.md Addresses PR verl-project#25 wuxibin89 review: typed request/response. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
a17e1ed to
d0ad4af
Compare
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
d0ad4af to
348f520
Compare
Thanks for the comment, this is a valid point. The current framework path fans out all Since I have split the current PR into three separate PRs (gateway, framework, and examples), I will treat this as a framework-layer follow-up and address it in the next PR instead of this one. |
Add docstrings to all 41 gateway tests describing their behavior contract. Delete five tests whose risk hypotheses do not hold up under the pr-ready-test-review skill rubric (real-risk vs fictional-risk): - test_gateway_actor_abort_session_does_not_wait_for_backend_generate (guards against someone adding await to a 5-line zero-await method — fictional risk caught by code review) - test_gateway_actor_finalizes_without_complete (guards against someone actively forbidding a legal code path — fictional risk) - test_backend_stop_reason_mapping_returns_openai_finish_reason (parametrized dict-copy of _FINISH_REASON_MAP — fragile mirroring) - test_gateway_manager_wait_for_completion_delegates_to_session_owner (tests a mock's delegation; real routing covered by sticky-routing test) Merge two normalization tests into a single parametrized test_message_normalization_tool_call_arguments. Remove orphan SlowBackend fake. Scope ray_runtime fixtures to session level in all three test files (310s → 172s wall-clock, ~27% improvement). 41 passed, 6 warnings, 172s. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
835b8f7 to
3599cf4
Compare
What does this PR do?
This PR adds
uni_agent.gateway— an OpenAI-compatible session gateway runtime for multi-turn agent-style rollout in uni-agent, as a downstream integration of verl RFC #5790 and the upstream agent framework PR verl#6299.Specifically:
_GatewayActor(~200 lines) — thin FastAPI actor:/v1/chat/completionsover sticky sessions; OpenAI-compatible error envelopes; capability gates (n>1/response_format/tool_choice=required|dict→ 400); typed I/O viaChatCompletionRequest/ChatCompletionResponseprotocol types; lifecycle endpoints (/complete,finalize,abort).GatewaySession— per-session state +run_generationenvelope (encode → generate → decode+commit). Commit-on-success isolation: failed backend/decode calls never pollute session state. Session fields are written in a single point underrequest_lockwith noawait/ IO / parsing inside the commit segment.MessageCodec— model-scoped encode/decode/tool-parse/multimodal/sampling-param, stateless, shared across all sessions.GatewayManager— session-id → gateway-actor routing with least-active load balancing.GatewayServingRuntime— owns gateway actor lifecycle, injects theLLMServerClientas a duck-typedbackendinto the actor.GatewayActorConfig— frozen dataclass carrying model-/length-scoped knobs from the rollout yaml. Backend is separate so the codec/session boundary has no view of the LLM client lifecycle.Response bodies follow the OpenAI Chat Completions spec (
id/object/created/model/choices/usage). Request/response shapes are defined as TypedDicts ingateway/protocol.py— no openai Python SDK runtime dependency.Token-truth (the RL correctness contract) is preserved inline: backend-sampled token IDs accumulate in
TrajectoryBufferwith aresponse_maskthat distinguishes generated tokens (mask=1, trainable) from chat-template interstitials (mask=0, masked). Response logprobs are copied directly from the backendTokenOutput— no decode→re-encode.PR scope
Per maintainer request this PR has been split into three stacked PRs:
uni_agent.gatewayuni_agent.framework(follow-up, stacked on this PR)examples/agent_train/deepeyes_gateway/(follow-up, stacked on framework)Only the gateway portion is reviewed here. Shared types (
SessionHandle,Trajectory) moved from the oldframework.typesintogateway/types.pyso the gateway is fully self-contained. There is zerouni_agent.frameworkimport in the gateway package.Checklist Before Starting
[gateway] feat: add OpenAI-compatible session gateway runtimeTest
41 passed, 6 warnings (gateway actor / manager / session-runtime / typed-I/O response shape, ~170s wall-clock).
Critical regression gates included:
test_gateway_actor_backend_failure_does_not_commit_partial_state(commit-on-success isolation)test_gateway_actor_context_change_splits_trajectory(branch-c materialized trajectory recovery)test_gateway_actor_continuation_budget_exhausted_materializes_length_stop(length-budget exhaust)test_gateway_serving_runtime_owns_gateway_lifecycle_and_session_runtime(runtime wiring)All pass.
API and Usage
Public API:
uni_agent.gateway—GatewayServingRuntime,GatewayManager,GatewayActor,GatewayActorConfig,SessionHandle,TrajectoryMinimum wiring (framework PR will provide a
build_agent_framework()helper):The runtime exposes a
SessionRuntime-shaped surface (create_session/wait_for_completion/finalize_session/abort_session) consumed by the framework. Sessions communicate with the gateway via standard/v1/chat/completionsrequests:Design & Code Changes
High-level structure:
_GatewayActor— FastAPI routes, OpenAI error envelopes, capability gates, chat-completion JSON serialization (~200 lines).MessageCodec(codec.py) — model-scoped encode/decode/normalize/multimodal/sampling-param, stateless across sessions.GatewaySession(session.py) — per-session state,run_generationenvelope, lifecycle methods. ReturnsGenerationOutcomebusiness objects; never constructs HTTP responses.GatewayManager— session-to-actor routing via least-active count.GatewayServingRuntime— Ray actor lifecycle, backend injection, session-lifecycle delegation.Request flow:
HTTP request → _GatewayActor (capability gate) → GatewaySession.run_generation (encode → backend.generate → decode+commit) → GenerationOutcome → actor serializes ChatCompletionResponse JSON.Key invariants:
generation_lockserializes in-flight generation (implementation detail of the single-active trajectory model, not part of the public contract).WIP / Follow-up
GatewayServingRuntime+GatewayManagermerge (runtime's thin session-method delegates can fold into the runtime)/v1/messages) as a sibling route + protocol type — not yet neededChecklist Before Submitting
*_on_cpu.pynaming convention.pre-commit install && pre-commit run --all-files