feat(android): add UIAutomator hierarchy dump, parsing, and agent tool#251

Open
mlikasam-askui wants to merge 3 commits into main from feat/android-uiautomator-hierarchy-tool


@mlikasam-askui
Contributor

Summary

Dump the current screen with uiautomator dump, parse the XML into a flat list of views (text, ids, content-desc, bounds, tap centers), and expose it as AndroidGetUIAutomatorHierarchyTool for agents when screenshots are not informative enough or structured UI data is preferred.
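UIAutomator reports each view's bounds as a string of the form [left,top][right,bottom] in screen pixels. A minimal sketch of how a tap center could be derived from that string follows; parse_bounds and tap_center are hypothetical helper names (not taken from this PR), and rounding the center down with integer division is an assumption.

```python
import re

# Hypothetical helpers: UIAutomator bounds look like "[left,top][right,bottom]".
_BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


def parse_bounds(bounds: str) -> tuple[int, int, int, int]:
    """Parse a UIAutomator bounds string into (left, top, right, bottom)."""
    match = _BOUNDS_RE.fullmatch(bounds.strip())
    if match is None:
        raise ValueError(f"Unrecognized bounds string: {bounds!r}")
    left, top, right, bottom = (int(value) for value in match.groups())
    return left, top, right, bottom


def tap_center(bounds: str) -> tuple[int, int]:
    """Return the integer midpoint of the rectangle, rounded down."""
    left, top, right, bottom = parse_bounds(bounds)
    return (left + right) // 2, (top + bottom) // 2
```

For example, tap_center("[0,63][1080,210]") returns (540, 136).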

Notes

  • Wired in via get_ui_elements() on the Android AgentOs, PpAdbAgentOs, and the facade.
  • Includes pdm.lock updates.

Add UIElement and UIElementCollection to parse UIAutomator window-dump XML
from normalized shell output (bounds, text, resource-id, content-desc,
clickable, etc.).
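The following is a minimal sketch of flattening a window dump into element records using only the Python standard library. UIElement is named after the class in this PR, but its fields, the center property, and the parse_window_dump function are assumptions for illustration; the real UIElement and UIElementCollection may parse and normalize the output differently.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Bounds look like "[left,top][right,bottom]" in screen pixels.
_BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


@dataclass(frozen=True)
class UIElement:
    """One <node> of a window dump (field names are assumptions)."""

    text: str
    resource_id: str
    content_desc: str
    class_name: str
    clickable: bool
    bounds: tuple[int, int, int, int]

    @property
    def center(self) -> tuple[int, int]:
        """Integer midpoint of the bounds, usable as a tap target."""
        left, top, right, bottom = self.bounds
        return (left + right) // 2, (top + bottom) // 2


def parse_window_dump(dump_output: str) -> list[UIElement]:
    """Flatten every <node> in the dump into a list, in document order."""
    # Shell output may carry extra text; keep only the <hierarchy> element.
    start = dump_output.find("<hierarchy")
    end = dump_output.rfind("</hierarchy>")
    if start == -1 or end == -1:
        raise ValueError("No <hierarchy> element found in dump output")
    root = ET.fromstring(dump_output[start : end + len("</hierarchy>")])

    elements: list[UIElement] = []
    for node in root.iter("node"):
        match = _BOUNDS_RE.fullmatch(node.get("bounds", ""))
        if match is None:
            continue  # skip nodes whose bounds cannot be parsed
        left, top, right, bottom = (int(value) for value in match.groups())
        elements.append(
            UIElement(
                text=node.get("text", ""),
                resource_id=node.get("resource-id", ""),
                content_desc=node.get("content-desc", ""),
                class_name=node.get("class", ""),
                clickable=node.get("clickable") == "true",
                bounds=(left, top, right, bottom),
            )
        )
    return elements
```

root.iter("node") walks the tree depth-first, so the flat list keeps document order and every parent appears before its children.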

Expose get_ui_elements() on Android AgentOs and implement it in the facade
and PpAdb path so callers get a flattened hierarchy string.
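The exact format of the flattened hierarchy string is not shown in this PR. One plausible shape, one line per element with empty attributes omitted, could be produced as below; FlatElement and format_hierarchy are hypothetical names, and the line format is an assumption rather than what get_ui_elements() actually returns.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FlatElement:
    """Hypothetical minimal view record standing in for the PR's UIElement."""

    text: str
    resource_id: str
    content_desc: str
    clickable: bool
    center: tuple[int, int]


def format_hierarchy(elements: list[FlatElement]) -> str:
    """Render one line per element, skipping empty attributes."""
    lines = []
    for index, element in enumerate(elements):
        parts = [f"[{index}]"]
        if element.text:
            parts.append(f'text="{element.text}"')
        if element.resource_id:
            parts.append(f"id={element.resource_id}")
        if element.content_desc:
            parts.append(f'desc="{element.content_desc}"')
        if element.clickable:
            parts.append("clickable")
        parts.append(f"center=({element.center[0]},{element.center[1]})")
        lines.append(" ".join(parts))
    return "\n".join(lines)
```

For example, format_hierarchy([FlatElement("Sign in", "com.example:id/login", "", True, (200, 230))]) returns '[0] text="Sign in" id=com.example:id/login clickable center=(200,230)'. Leaving out empty attributes keeps the string compact in the model's context.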

Register AndroidGetUIAutomatorHierarchyTool in the Android tool store for
act flows that need structure instead of screenshots.

Refresh pdm.lock for the otel dependency group and OpenTelemetry-related
package updates.
self,
x: int,
y: int,
from_agent: bool = True,
Collaborator

What does from_agent mean?

Contributor Author

“from_agent” indicates whether the coordinates were provided by the agent.

stopped and the UI has settled.
"""
self._check_if_device_is_selected()
assert self._device is not None
Collaborator

Shouldn't the assert be integrated into _check_if_device_is_selected?

Contributor Author

It could be 😄
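One way to fold the assert into the check is a helper that returns the device or raises, so type checkers narrow the Optional without a separate assert at every call site. A sketch, with _require_device, Device, and AgentOsSketch as hypothetical stand-ins for the PR's classes, and AndroidAgentOsError redefined locally:

```python
from typing import Optional


class AndroidAgentOsError(Exception):
    """Local stand-in for the PR's error type."""


class Device:
    """Hypothetical placeholder for the connected device handle."""

    def __init__(self, serial: str) -> None:
        self.serial = serial


class AgentOsSketch:
    def __init__(self) -> None:
        self._device: Optional[Device] = None

    def _require_device(self) -> Device:
        """Return the selected device or raise, so callers need no assert."""
        if self._device is None:
            raise AndroidAgentOsError("No device selected; select a device first.")
        return self._device

    def describe(self) -> str:
        device = self._require_device()  # narrowed to Device for type checkers
        return f"device {device.serial}"
```

Raising an exception also keeps the guard active under python -O, which strips assert statements.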

assert self._device is not None
dump_cmd = f"uiautomator dump {self._UIAUTOMATOR_DUMP_PATH}"
dump_response = self.shell(dump_cmd)
if "dumped" not in dump_response.lower():
Collaborator

What does "dumped" mean?

Contributor Author

“dumped” is included in the response of a successful UI dump.
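Typically a successful run prints something like UI hierchary dumped to: /sdcard/window_dump.xml (the misspelling comes from Android itself), so matching on "dumped" avoids depending on how "hierarchy" is spelled. A small sketch of the check plus extraction of the reported path; dump_succeeded and dumped_path are hypothetical helpers, and the "dumped to:" pattern is an assumption about the message format.

```python
import re
from typing import Optional

# Assumed message shape: "... dumped to: <path>".
_DUMPED_RE = re.compile(r"dumped to:\s*(\S+)", re.IGNORECASE)


def dump_succeeded(response: str) -> bool:
    """Mirror the PR's check: a successful dump mentions "dumped"."""
    return "dumped" in response.lower()


def dumped_path(response: str) -> Optional[str]:
    """Return the reported output path, or None if none is present."""
    match = _DUMPED_RE.search(response)
    return match.group(1) if match else None
```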

dump_response = self.shell(dump_cmd)
if "dumped" not in dump_response.lower():
msg = f"Failed to dump UI hierarchy: {dump_response}"
raise AndroidAgentOsError(msg)
Collaborator

Do we have to terminate the agent loop, or can the agent recover from this error?

Contributor Author

I don’t think we need to terminate it, since I assume the agent can recover on its own by switching to other methods, such as taking a screenshot or using the shell.
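If the tool framework does not already turn exceptions into tool results, the tool could catch the error and return it as text so the agent loop continues and the model can choose another approach. A minimal sketch; run_hierarchy_tool is hypothetical, and AndroidAgentOsError is redefined locally as a stand-in for the PR's exception class.

```python
from typing import Callable


class AndroidAgentOsError(Exception):
    """Local stand-in for the PR's error type."""


def run_hierarchy_tool(get_ui_elements: Callable[[], str]) -> str:
    """Return the hierarchy, or a message the agent can act on instead of crashing."""
    try:
        return get_ui_elements()
    except AndroidAgentOsError as error:
        # Surface the failure as tool output so the agent loop keeps running
        # and the model can fall back to a screenshot or a shell command.
        return f"UI hierarchy unavailable ({error}); try a screenshot or the shell instead."
```

Only AndroidAgentOsError is caught here, so unexpected bugs still propagate instead of being hidden from the developer.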


2 participants