feat(android): add UIAutomator hierarchy dump, parsing, and agent tool #251
mlikasam-askui wants to merge 3 commits into main
Add UIElement and UIElementCollection to parse UIAutomator window-dump XML from normalized shell output (bounds, text, resource-id, content-desc, clickable, etc.). Expose get_ui_elements() on Android AgentOs and implement it in the facade and PpAdb path so callers get a flattened hierarchy string. Register AndroidGetUIAutomatorHierarchyTool in the Android tool store for act flows that need structure instead of screenshots. Refresh pdm.lock for the otel dependency group and OpenTelemetry-related package updates.
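The parsing step described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `SketchElement` and `parse_dump` are hypothetical names standing in for `UIElement` and `UIElementCollection`, and the sample XML is a hand-written fragment in the shape a window dump usually has (nested `node` elements with `bounds="[left,top][right,bottom]"`).

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Bounds in a window dump look like "[left,top][right,bottom]".
_BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


@dataclass(frozen=True)
class SketchElement:  # hypothetical stand-in, not the PR's UIElement
    text: str
    resource_id: str
    content_desc: str
    clickable: bool
    bounds: tuple[int, int, int, int]

    @property
    def center(self) -> tuple[int, int]:
        """Midpoint of the bounds, usable as a tap target."""
        left, top, right, bottom = self.bounds
        return ((left + right) // 2, (top + bottom) // 2)


def parse_dump(xml_text: str) -> list[SketchElement]:
    """Flatten every <node> of a window dump into a list, in document order."""
    elements = []
    for node in ET.fromstring(xml_text).iter("node"):
        match = _BOUNDS_RE.fullmatch(node.get("bounds", ""))
        if match is None:
            continue  # skip nodes without usable bounds
        left, top, right, bottom = map(int, match.groups())
        elements.append(
            SketchElement(
                text=node.get("text", ""),
                resource_id=node.get("resource-id", ""),
                content_desc=node.get("content-desc", ""),
                clickable=node.get("clickable") == "true",
                bounds=(left, top, right, bottom),
            )
        )
    return elements


# Hand-written sample, not real device output.
SAMPLE = """<hierarchy rotation="0">
  <node text="" resource-id="" content-desc="" clickable="false" bounds="[0,0][1080,1920]">
    <node text="OK" resource-id="android:id/button1" content-desc="" clickable="true" bounds="[100,200][300,400]" />
  </node>
</hierarchy>"""

for element in parse_dump(SAMPLE):
    print(element.text or "<no text>", element.bounds, element.center)
```

Flattening in document order keeps parents before their children, which is often enough for an agent to pick a tap target without walking a tree.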
    self,
    x: int,
    y: int,
    from_agent: bool = True,
What does from_agent mean?
“from_agent” indicates whether the coordinates were provided by the agent.
    stopped and the UI has settled.
    """
    self._check_if_device_is_selected()
    assert self._device is not None
Shouldn't the assert be integrated into _check_if_device_is_selected?
it could be 😄
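One way to fold the assert into the check, sketched under assumptions: the helper returns the device instead of returning nothing, so callers get a non-None value and type checkers can narrow it without a separate `assert`. `SketchAgentOs`, `FakeDevice`, `NoDeviceSelectedError`, and `_require_device` are hypothetical names, not the PR's.

```python
from typing import Optional


class NoDeviceSelectedError(Exception):  # hypothetical error type
    pass


class FakeDevice:  # hypothetical stand-in for the real device handle
    def shell(self, cmd: str) -> str:
        return f"ran: {cmd}"


class SketchAgentOs:
    def __init__(self) -> None:
        self._device: Optional[FakeDevice] = None

    def _require_device(self) -> FakeDevice:
        """Check and narrow in one place: raise, or return a non-None device."""
        if self._device is None:
            raise NoDeviceSelectedError("No device is selected")
        return self._device

    def shell(self, cmd: str) -> str:
        device = self._require_device()  # no separate assert needed
        return device.shell(cmd)
```

Unlike `assert`, the explicit raise also survives running Python with `-O`, which strips assertions.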
    assert self._device is not None
    dump_cmd = f"uiautomator dump {self._UIAUTOMATOR_DUMP_PATH}"
    dump_response = self.shell(dump_cmd)
    if "dumped" not in dump_response.lower():
What does "dumped" mean?
“dumped” is included in the response of a successful UI dump.
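The success check can be illustrated in isolation. The sketch below assumes a `shell` callable that returns the command output as a string, a dump path of `/sdcard/window_dump.xml` (the value of `_UIAUTOMATOR_DUMP_PATH` is not shown in the diff), and that the file is read back with `cat`. `dump_hierarchy` and `DumpError` are hypothetical names.

```python
from typing import Callable

# Assumed path; the PR's _UIAUTOMATOR_DUMP_PATH value is not shown.
DUMP_PATH = "/sdcard/window_dump.xml"


class DumpError(Exception):  # hypothetical stand-in for AndroidAgentOsError
    pass


def dump_hierarchy(shell: Callable[[str], str], path: str = DUMP_PATH) -> str:
    """Run the dump, verify the success marker, then read the XML back."""
    response = shell(f"uiautomator dump {path}")
    # A successful dump reports where the file was written and contains the
    # word "dumped"; anything else is treated as a failure.
    if "dumped" not in response.lower():
        raise DumpError(f"Failed to dump UI hierarchy: {response}")
    # Reading the file with `cat` is an assumption of this sketch.
    return shell(f"cat {path}")
```

Matching case-insensitively on a single word keeps the check tolerant of small wording differences between Android versions, at the cost of relying on an unstructured message.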
    dump_response = self.shell(dump_cmd)
    if "dumped" not in dump_response.lower():
        msg = f"Failed to dump UI hierarchy: {dump_response}"
        raise AndroidAgentOsError(msg)
Do we have to terminate the agent loop, or can the agent recover from this error?
I don't think we have to terminate the loop: I assume the agent can recover on its own by falling back to other methods, such as taking a screenshot or using the shell.
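If the error should stay recoverable, one option is for the tool layer to turn it into a message the agent can read rather than letting it end the loop. This is only a sketch of that idea: `run_hierarchy_tool` and `HierarchyError` are hypothetical, and whether the tool framework already handles raised errors this way is not stated in this thread.

```python
from typing import Callable


class HierarchyError(Exception):  # hypothetical stand-in for AndroidAgentOsError
    pass


def run_hierarchy_tool(get_ui_elements: Callable[[], str]) -> str:
    """Return the hierarchy, or a readable hint the agent can act on."""
    try:
        return get_ui_elements()
    except HierarchyError as error:
        # Report the failure as tool output so the agent can pick another
        # approach instead of the loop terminating.
        return (
            f"UI hierarchy unavailable ({error}). "
            "Fall back to a screenshot or a shell command."
        )
```

Only the known error type is caught here, so unexpected failures still propagate and stay visible.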
Summary
Dump the current screen with uiautomator dump, parse the XML into a flat list of views (text, ids, content-desc, bounds, tap centers), and expose it as AndroidGetUIAutomatorHierarchyTool for agents when screenshots are weak or you want structured UI data.
Notes
get_ui_elements() on AndroidAgentOs / PpAdbAgentOs and facade.
pdm.lock updates.