[gui] feat: add GUI agent loop (sandbox-as-environment VLM training) by aoshen02 · Pull Request #49 · verl-project/uni-agent

aoshen02 · 2026-05-28T14:57:23Z

Summary

Adds a new agent loop for VLM GUI agent RL training that treats a desktop sandbox as the environment: the model emits raw CoT + pyautogui-style actions, the sandbox executes them, and returns the next screenshot. The loop runs until DONE / FAIL / max_steps with retry on recoverable sandbox errors, optional trajectory splitting, and a FlexAttention heterogeneous-context mode that keeps all text tokens while sliding-windowing images.
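As a rough illustration of the control flow described above, the following minimal Python sketch runs a policy against a stand-in sandbox until it emits DONE or FAIL, or until the step budget runs out. FakeSandbox, run_episode, and the string "screenshots" are hypothetical names invented for this sketch. The actual GUIAgentLoop also handles retries, trajectory splitting, and tokenization, none of which is shown here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FakeSandbox:
    """Hypothetical stand-in for a desktop sandbox: returns a fake screenshot label."""

    steps: int = 0

    def execute(self, action: str) -> str:
        self.steps += 1
        return f"screenshot_{self.steps}"


def run_episode(
    policy: Callable[[str], str], sandbox: FakeSandbox, max_steps: int
) -> tuple[str, list[str]]:
    """Alternate policy output and sandbox execution until a terminal action or max_steps."""
    observation = "screenshot_0"
    trajectory: list[str] = []
    for _ in range(max_steps):
        action = policy(observation)
        trajectory.append(action)
        if action in ("DONE", "FAIL"):
            return action, trajectory
        # The sandbox executes the action and returns the next observation.
        observation = sandbox.execute(action)
    return "MAX_STEPS", trajectory


def click_twice_then_done(observation: str) -> str:
    """Toy policy: click until the second screenshot arrives, then stop."""
    return "DONE" if observation == "screenshot_2" else "pyautogui.click(10, 20)"
```

Calling run_episode(click_twice_then_done, FakeSandbox(), max_steps=5) ends with the DONE status after three policy calls, while a policy that never terminates ends with MAX_STEPS.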

Everything new lives under uni_agent/gui/:

gui_agent_loop.py — @register("gui_agent") GUIAgentLoop, subclass of verl.experimental.agent_loop.AgentLoopBase.
os_sandbox_tool.py — OSSandboxTool (verl.tools.BaseTool interface) + DummySandboxTool + SandboxClient (aiohttp) + TaskUnrecoverableError / SystemUnavailableError.
gui_utils.py — apply_sliding_window_to_images, PyautoguiActionConvertor (OAGI action → pyautogui command), CapsLockManager, key validation tables.

Following the existing uni-agent convention, the verl base classes (AgentLoopBase / AgentLoopMetrics / AgentLoopOutput / register / simple_timer / rollout_trace_op / TokenOutput) are still imported from the verl submodule; only the GUI-specific helpers and the sandbox tool live under uni_agent.gui.

Test plan

ruff check uni_agent/gui/ → clean
ruff format --check uni_agent/gui/ → clean
pre-commit run --files uni_agent/gui/* (ruff + ruff-format + mypy + compileall) → all pass
Smoke-train against a small GUI sandbox dataset (run on a follow-up branch — needs oagi, aiohttp, and a reachable sandbox URL)

Notes

Imports oagi.utils.output_parser.parse_raw_output and oagi.types.ActionType; these are expected to be installed in the runtime environment alongside verl (same as the upstream working version).
Requires verl submodule to expose verl.experimental.agent_loop.{agent_loop, utils}, verl.tools.{base_tool, schemas}, verl.utils.{profiler, rollout_trace}, verl.workers.rollout.replica.TokenOutput. All present at the currently-pinned submodule commit.
AI-assisted (Claude Code, Opus 4.7).

🤖 Generated with Claude Code

gemini-code-assist

Code Review

This pull request introduces the GUI Agent components for VLM GUI agent training, including the agent loop, sandbox tool integration, and PyAutoGUI action conversion utilities. The review feedback highlights several critical issues: potential hanging in the sandbox client due to misplaced timeouts on response parsing instead of connection establishment, inefficient VM recreations caused by raising unrecoverable errors on format validation failures, potential runtime crashes in action conversion when returning empty lists for capslock or user call actions, and a strict assertion in the sliding window utility that could crash the training process if an image block ends exactly at the end of the prompt.

gemini-code-assist · 2026-05-28T14:59:26Z

+            async with session.post(f"{self.sandbox_url}/api/v1/sandbox/create", json=request_body) as response:
+                if response.status != 200:
+                    text = await response.text()
+                    logger.error(f"[SandboxClient] Create failed: status={response.status}, body={text[:500]}")
+                    response.raise_for_status()
+
+                return await asyncio.wait_for(response.json(), timeout=self.create_timeout)


The asyncio.wait_for call wraps only response.json(), which runs after the connection has been established and the response headers have been received. If the sandbox server stalls while connecting or before sending headers, session.post is not covered by create_timeout: it is bounded only by whatever timeout the session carries (aiohttp's default is a 5-minute total), and it blocks indefinitely if the session was created without one. Either way, the rollout worker can stall far longer than intended. Pass the timeout to session.post directly as an aiohttp.ClientTimeout.

Suggested change

-            async with session.post(f"{self.sandbox_url}/api/v1/sandbox/create", json=request_body) as response:
-                if response.status != 200:
-                    text = await response.text()
-                    logger.error(f"[SandboxClient] Create failed: status={response.status}, body={text[:500]}")
-                    response.raise_for_status()
-
-                return await asyncio.wait_for(response.json(), timeout=self.create_timeout)
+            import aiohttp
+            timeout = aiohttp.ClientTimeout(total=self.create_timeout)
+            async with session.post(f"{self.sandbox_url}/api/v1/sandbox/create", json=request_body, timeout=timeout) as response:
+                if response.status != 200:
+                    text = await response.text()
+                    logger.error(f"[SandboxClient] Create failed: status={response.status}, body={text[:500]}")
+                    response.raise_for_status()
+                return await response.json()
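The difference between bounding only the body read and bounding the whole request can be reproduced without a sandbox. The sketch below uses only the standard library. fake_post and FakeResponse are hypothetical stand-ins for session.post and its response, and the delay models a server that is slow to send headers. It illustrates the timing issue under those assumptions and is not the SandboxClient code.

```python
import asyncio


class FakeResponse:
    """Hypothetical response whose body is available immediately."""

    async def json(self) -> dict:
        return {"ok": True}


async def fake_post(delay: float) -> FakeResponse:
    """Stand-in for session.post: headers arrive only after `delay` seconds."""
    await asyncio.sleep(delay)
    return FakeResponse()


async def json_only_timeout(delay: float, timeout: float) -> dict:
    # Mirrors the reviewed code: the wait for headers is not bounded,
    # only the (fast) body read is.
    response = await fake_post(delay)
    return await asyncio.wait_for(response.json(), timeout=timeout)


async def whole_request_timeout(delay: float, timeout: float) -> dict:
    # Bounds connection, headers, and body together.
    async def _request() -> dict:
        response = await fake_post(delay)
        return await response.json()

    return await asyncio.wait_for(_request(), timeout=timeout)


async def demo() -> tuple[dict, bool]:
    # The server takes 0.2 s to answer; the intended limit is 0.05 s.
    unbounded_result = await json_only_timeout(delay=0.2, timeout=0.05)
    try:
        await whole_request_timeout(delay=0.2, timeout=0.05)
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True
    return unbounded_result, timed_out
```

In demo(), the first call returns normally even though it took four times the intended limit, and the second call raises asyncio.TimeoutError. Passing an aiohttp.ClientTimeout to session.post, as suggested, gives the second behaviour.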

gemini-code-assist · 2026-05-28T14:59:27Z

+            session = await self._get_session()
+            async with session.post(
+                f"{self.sandbox_url}/api/v1/sandbox/execute", json={"task_id": task_id, "action": action}
+            ) as response:
+                response.raise_for_status()
+                return await asyncio.wait_for(response.json(), timeout=self.execute_timeout)
+        except asyncio.TimeoutError as e:


As in the create method, asyncio.wait_for wraps only response.json(). If the sandbox stalls during execution, before it sends response headers, session.post is not bounded by execute_timeout. Pass the timeout to session.post instead.

Suggested change

             session = await self._get_session()
-            async with session.post(
-                f"{self.sandbox_url}/api/v1/sandbox/execute", json={"task_id": task_id, "action": action}
-            ) as response:
-                response.raise_for_status()
-                return await asyncio.wait_for(response.json(), timeout=self.execute_timeout)
+            import aiohttp
+            timeout = aiohttp.ClientTimeout(total=self.execute_timeout)
+            async with session.post(
+                f"{self.sandbox_url}/api/v1/sandbox/execute",
+                json={"task_id": task_id, "action": action},
+                timeout=timeout,
+            ) as response:
+                response.raise_for_status()
+                return await response.json()
         except asyncio.TimeoutError as e:

gemini-code-assist · 2026-05-28T14:59:27Z

+                    if gen_attempt == self.MAX_GENERATION_RETRIES - 1:
+                        raise TaskUnrecoverableError(
+                            task_id=agent_data.request_id,
+                            error=f"Validation failed after {self.MAX_GENERATION_RETRIES} attempts: {error_msg}",
+                            step_count=agent_data.step_count,
+                        ) from e
+                    continue


Raising TaskUnrecoverableError when the model's output fails format validation is wasteful. TaskUnrecoverableError triggers a full sandbox VM teardown and recreation (which can take up to 5 minutes), but a format-validation failure is a model generation issue, not system or VM corruption. Treat it as a non-retryable trajectory failure (similar to TruncatedError) so that VM and GPU time is not spent on retries that cannot help.

Suggested change

                     if gen_attempt == self.MAX_GENERATION_RETRIES - 1:
-                        raise TaskUnrecoverableError(
-                            task_id=agent_data.request_id,
-                            error=f"Validation failed after {self.MAX_GENERATION_RETRIES} attempts: {error_msg}",
-                            step_count=agent_data.step_count,
-                        ) from e
+                        agent_data.truncated = True
+                        raise TruncatedError(
+                            agent_data=agent_data,
+                            message=f"Validation failed after {self.MAX_GENERATION_RETRIES} attempts: {error_msg}",
+                        ) from e
                     continue
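The distinction the reviewer draws, between failures that justify recreating the sandbox and failures that only end the trajectory, can be sketched as follows. The exception classes here are simplified stand-ins that share names with the PR's classes but not their constructors. run_task, generate_valid_action, and MAX_TASK_RETRIES are hypothetical and exist only for this illustration.

```python
from typing import Callable

MAX_GENERATION_RETRIES = 3
MAX_TASK_RETRIES = 2  # hypothetical: extra attempts after a sandbox failure


class TaskUnrecoverableError(Exception):
    """Simplified stand-in: the environment is broken, so recreate it and retry."""


class TruncatedError(Exception):
    """Simplified stand-in: the trajectory ends early, with no sandbox retry."""


def generate_valid_action(generate: Callable[[], str], validate: Callable[[str], str]) -> str:
    """Retry generation a few times; give up with a trajectory-level error."""
    last_error: Exception | None = None
    for _ in range(MAX_GENERATION_RETRIES):
        try:
            return validate(generate())
        except ValueError as e:  # format validation failure
            last_error = e
    raise TruncatedError(
        f"Validation failed after {MAX_GENERATION_RETRIES} attempts: {last_error}"
    ) from last_error


def run_task(step: Callable[[], None], create_sandbox: Callable[[], None]) -> tuple[str, int]:
    """Return (status, number of sandboxes created)."""
    created = 0
    for _ in range(MAX_TASK_RETRIES + 1):
        create_sandbox()
        created += 1
        try:
            step()
            return "ok", created
        except TruncatedError:
            # Model-side failure: a fresh VM would not help, so stop here.
            return "truncated", created
        except TaskUnrecoverableError:
            # Environment-side failure: tear down and try again on a new sandbox.
            continue
    return "failed", created
```

With this split, a model that keeps producing unparseable output costs one sandbox, whereas a broken environment is retried up to the configured limit.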

gemini-code-assist · 2026-05-28T14:59:27Z

+            if len(valid_keys) == 1 and valid_keys[0] == "capslock":
+                if self.caps_manager.should_use_system_capslock():
+                    # System mode: use OS-level caps lock
+                    hotkey_interval = self.pyautogui_config.hotkey_interval
+                    return [f"pyautogui.hotkey('capslock', interval={hotkey_interval})"]
+                else:
+                    # Session mode: toggle internal state (no actual key press needed in conversion)
+                    self.caps_manager.toggle()
+                    return []  # No pyautogui command needed for session mode
+            else:


Returning an empty list [] for capslock toggle in session mode will cause __call__ to raise a RuntimeError ("All action conversions failed") if it is the only action in the sequence, because converted remains empty. To prevent this, return a no-op command like ["WAIT(0.0)"] which is safely parsed as a 0-second sleep.

Suggested change

             if len(valid_keys) == 1 and valid_keys[0] == "capslock":
                 if self.caps_manager.should_use_system_capslock():
                     # System mode: use OS-level caps lock
                     hotkey_interval = self.pyautogui_config.hotkey_interval
                     return [f"pyautogui.hotkey('capslock', interval={hotkey_interval})"]
                 else:
                     # Session mode: toggle internal state (no actual key press needed in conversion)
                     self.caps_manager.toggle()
-                    return []  # No pyautogui command needed for session mode
+                    return ["WAIT(0.0)"]
             else:

gemini-code-assist · 2026-05-28T14:59:27Z

+        if action_type == ActionType.CALL_USER.value:
+            # User intervention requested - not an error, just no-op
+            self.logger.info("User intervention requested")
+            return []


Returning [] for CALL_USER will cause __call__ to raise a RuntimeError if it is the only action in the sequence. Return ["WAIT(0.0)"] instead to safely represent a no-op.

Suggested change

         if action_type == ActionType.CALL_USER.value:
             # User intervention requested - not an error, just no-op
             self.logger.info("User intervention requested")
-            return []
+            return ["WAIT(0.0)"]
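Both comments above describe the same failure mode: a legitimate no-op is indistinguishable from a failed conversion once the caller treats an empty result as an error. The sketch below reproduces that interaction with a deliberately simplified converter. convert_one, convert_all, and the use_noop flag are hypothetical and do not mirror PyautoguiActionConvertor beyond the behaviour the reviewer describes.

```python
NOOP = "WAIT(0.0)"  # placeholder command the reviewer proposes for no-op actions


def convert_one(action: str, use_noop: bool) -> list[str]:
    """Convert a single action name into zero or more command strings."""
    if action in ("capslock", "call_user"):
        # Nothing needs to run in the sandbox for these actions in this sketch.
        return [NOOP] if use_noop else []
    return [f"pyautogui.press({action!r})"]


def convert_all(actions: list[str], *, use_noop: bool) -> list[str]:
    """Convert a sequence; an empty overall result is treated as a failure."""
    converted: list[str] = []
    for action in actions:
        converted.extend(convert_one(action, use_noop))
    if not converted:
        raise RuntimeError("All action conversions failed")
    return converted
```

With use_noop=False, convert_all(["capslock"]) raises RuntimeError even though nothing went wrong. With use_noop=True it returns the placeholder command. An alternative design would have the caller track successes separately from emitted commands, but that is a larger change than the one-line fix proposed in the review.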

gemini-code-assist · 2026-05-28T14:59:27Z

+        last_end = end
+
+    # Add remaining tokens after last block
+    assert last_end < len(prompt_ids)


If the last image block ends exactly at the end of prompt_ids, last_end will be equal to len(prompt_ids). In this case, assert last_end < len(prompt_ids) will fail and crash the training process. Change the assertion to assert last_end <= len(prompt_ids).

Suggested change

-    assert last_end < len(prompt_ids)
+    assert last_end <= len(prompt_ids)
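To make the boundary case concrete, here is a small sketch that splits a token list into the text runs around image blocks. text_segments is a hypothetical helper, and the half-open (start, end) block convention is an assumption. The real apply_sliding_window_to_images may represent blocks differently.

```python
def text_segments(
    prompt_ids: list[int], image_blocks: list[tuple[int, int]]
) -> list[list[int]]:
    """Return the text runs before, between, and after image blocks.

    Blocks are assumed to be sorted, non-overlapping, half-open [start, end) ranges.
    """
    segments: list[list[int]] = []
    last_end = 0
    for start, end in image_blocks:
        segments.append(prompt_ids[last_end:start])
        last_end = end
    # Equality is valid: the prompt may end with an image block,
    # in which case the trailing text run is simply empty.
    assert last_end <= len(prompt_ids)
    segments.append(prompt_ids[last_end:])
    return segments
```

For a prompt that ends on an image block, such as text_segments([1, 2, 9, 9], [(2, 4)]), the final slice is empty and nothing needs special handling. The strict `<` form would reject this input.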

Adds a new agent loop for VLM GUI agent RL training that treats a desktop sandbox as the environment: the model emits raw CoT + pyautogui-style actions, the sandbox executes them, and returns the next screenshot. The loop runs until DONE/FAIL/max_steps with retry on recoverable sandbox errors, optional trajectory splitting, and a FlexAttention heterogeneous-context mode that keeps all text tokens while sliding-windowing images.

Layout (one file per agent loop variant, sibling to existing UniAgentLoop; the sandbox tool sits with the other tools):

- uni_agent/gui_agent_loop.py — `@register("gui_agent")` GUIAgentLoop, on top of verl.experimental.agent_loop.AgentLoopBase. Parallels the existing uni_agent/agent_loop.py / UniAgentLoop.
- uni_agent/gui_utils.py — apply_sliding_window_to_images, PyautoguiActionConvertor (OAGI action -> pyautogui command), CapsLockManager, key validation tables.
- uni_agent/tools/os_sandbox_tool.py — OSSandboxTool (verl BaseTool) + DummySandboxTool + SandboxClient (aiohttp) + TaskUnrecoverableError / SystemUnavailableError.

Imports follow uni-agent convention: AgentLoopBase / AgentLoopMetrics / AgentLoopOutput / register / simple_timer / rollout_trace_op / TokenOutput still come from the verl submodule; only the GUI-specific helpers and the sandbox tool live under uni_agent.

Lints: ruff + ruff-format + mypy + compileall pre-commit hooks pass on the new files.

AI-assisted (Claude Code).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

aoshen02 force-pushed the aoshen/gui-agent-loop branch from 3601b8b to 5cd5a02 Compare May 28, 2026 15:27
