This document describes the currently implemented Python API surface for OmniUI Phase 1.
Main entry point:

```python
from omniui import OmniUI
```

Session factory:

```python
client = OmniUI.connect(...)
```

Concrete session type:

- `OmniUIClient`

Core model types:

- `Selector`
- `ActionResult`
- `ActionTrace`
- `ResolvedElement`
- `ActionLogEntry`
- `UISnapshot`
- `UIDiff`
- `RecordedEvent`
- `RecordedScript`
OmniUI.connect(base_url="http://127.0.0.1:48100", app_name="LoginDemo", pid=None, ocr_engine=None, vision_engine=None, step_delay=0.0)
Create a client session against the local OmniUI agent.
Parameters:
- `base_url: str`: Local OmniUI agent base URL.
- `app_name: str`: Logical target application name sent to the Java agent session endpoint.
- `pid: int | None`: Optional process identifier for future attach scenarios.
- `ocr_engine: SimpleOcrEngine | None`: Optional OCR provider implementation.
- `vision_engine: SimpleVisionEngine | None`: Optional vision provider implementation.
- `step_delay: float`: Seconds to sleep after each action. Default `0.0` (no delay). Useful for slowing down script playback in demos. Can be overridden per call via the `delay` argument on individual action methods.
Returns:
OmniUIClient
Behavior:
- Calls `GET /health`
- Calls `POST /sessions`
- Raises `RuntimeError` if the agent does not report healthy status
Playback speed example:

```python
# 0.5 s pause after every action; useful for demos and presentations
client = OmniUI.connect(base_url="http://127.0.0.1:48102", step_delay=0.5)

# Per-call override: this specific click waits 1 second
client.click(id="tbNew", delay=1.0)

# No delay (default behaviour)
client = OmniUI.connect(base_url="http://127.0.0.1:48102")
```

Fetch a JavaFX node snapshot from the agent.
Current node fields include:
- `handle`
- `fxId`
- `nodeType`
- `text`
- `hierarchyPath`
- `visible`
- `enabled`
Normalize selector input without executing any action.
Current selector fields:
- `id`
- `text`
- `type`
- `index`: 0-based integer; picks the Nth node among all nodes matching the other fields (default: `0`)

Normalization rules:

- empty strings are converted to `None`
- values are stripped before use
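The normalization rules above can be illustrated with a small standalone sketch. `normalize_selector` is a hypothetical helper written for this example, not the library's implementation. It assumes that stripping happens before the empty-string check, so a whitespace-only value also becomes `None`.

```python
from typing import Optional


def normalize_selector(
    id: Optional[str] = None,
    text: Optional[str] = None,
    type: Optional[str] = None,
    index: int = 0,
) -> dict:
    """Hypothetical helper: strip string fields and map empty strings to None."""

    def clean(value: Optional[str]) -> Optional[str]:
        if value is None:
            return None
        stripped = value.strip()
        # Assumption: a value that is empty after stripping counts as "not given".
        return stripped or None

    return {"id": clean(id), "text": clean(text), "type": clean(type), "index": index}


print(normalize_selector(id="  loginButton ", text=""))
```

Nothing here contacts the agent; the sketch only shapes the selector dictionary.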
Execute a click action.
Parameters:
- `delay: float | None`: per-call sleep (seconds) after the action; overrides the global `step_delay`.
Resolution order:
- JavaFX direct resolution through the agent
- refresh + retry if the initial failure reason is `selector_not_found`
- OCR fallback for text selectors
- vision fallback if OCR does not resolve and a `template` is provided
Notes:
- In the current repo, OCR and vision fallback resolve the target and trace it, but do not yet dispatch a real OS-level click.
- For JavaFX-resolved nodes, the Java agent uses direct node-level interaction.
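The resolution order can be sketched as a tiered fallback loop. This is a simplified illustration under stated assumptions, not the client's actual code: `resolve_with_fallback` and the stub resolvers are hypothetical, and a resolver returning `None` stands in for a `selector_not_found` failure.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Each resolver returns a target reference, or None when it cannot resolve.
Resolver = Callable[[dict], Optional[str]]


def resolve_with_fallback(
    selector: dict,
    resolvers: Dict[str, Resolver],
    template: Optional[bytes] = None,
) -> Tuple[Optional[str], List[str], Optional[str]]:
    """Return (resolved_tier, attempted_tiers, target_ref)."""
    attempted: List[str] = []

    # Direct JavaFX resolution, then one refresh + retry.
    for tier in ("javafx", "refresh"):
        attempted.append(tier)
        target = resolvers[tier](selector)
        if target is not None:
            # A successful retry is still a JavaFX-tier resolution.
            return "javafx", attempted, target

    # OCR fallback applies to text selectors only.
    if selector.get("text"):
        attempted.append("ocr")
        target = resolvers["ocr"](selector)
        if target is not None:
            return "ocr", attempted, target

    # Vision fallback needs a template image.
    if template is not None:
        attempted.append("vision")
        target = resolvers["vision"](selector)
        if target is not None:
            return "vision", attempted, target

    return None, attempted, None


# Stub resolvers (hypothetical): only OCR can find the "Login" text.
stub = {
    "javafx": lambda s: None,
    "refresh": lambda s: None,
    "ocr": lambda s: "ocr:Login" if s.get("text") == "Login" else None,
    "vision": lambda s: None,
}
tier, attempted, target = resolve_with_fallback({"text": "Login"}, stub)
print(tier, attempted, target)
```

The `attempted` list plays the role of the attempted-tier sequence recorded in the action trace.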
Execute a text input action through the JavaFX path.
Parameters:
- `text: str`
- `delay: float | None`: per-call sleep (seconds) after the action.
- selector fields such as `id`, `text`, `type`
Resolve an element and fetch its text through the JavaFX path.
Read the tooltip text of a node.
- `result.value`: the tooltip text string, or `""` if the node has no tooltip attached
- `result.ok` is `False` (with reason `selector_not_found`) if the node cannot be resolved

```python
tip = client.get_tooltip(id="submitButton")
assert tip.value == "Click to submit the form"

# Node without a tooltip
tip2 = client.get_tooltip(id="statusLabel")
assert tip2.value == ""
```

Read the inline CSS style string of a node (`node.getStyle()`).
- `result.value`: inline style string (e.g. `"-fx-text-fill: red;"`) or `""` if not set
- Only reflects styles set via `setStyle()`; stylesheet-applied styles are not returned

```python
style = client.get_style(id="validationLabel")
assert "-fx-text-fill: red" in style.value

# No inline style
style2 = client.get_style(id="status")
assert style2.value == ""
```

Read the list of CSS class names applied to a node (`node.getStyleClass()`).
- `result.value`: `list[str]` of CSS class names (e.g. `["button", "error"]`)

```python
classes = client.get_style_class(id="loginButton")
assert "button" in classes.value

# Validation error class
classes2 = client.get_style_class(id="idField")
assert "error" in classes2.value
```

Resolve an element, fetch its text, and compare it to `expected` using the given `match` mode.
| `match` value | Behaviour |
|---|---|
| `"exact"` (default) | `actual == expected` |
| `"contains"` | `expected in actual` |
| `"starts_with"` | `actual.startswith(expected)` |
| `"regex"` | `re.search(expected, actual)`; matches anywhere in the string |
Raises ValueError for unknown match values.
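The match modes in the table map directly onto standard Python operations. The function below is a hypothetical standalone helper that mirrors the documented behaviour for illustration; it is not the client's own code.

```python
import re


def text_matches(actual: str, expected: str, match: str = "exact") -> bool:
    """Hypothetical helper mirroring the documented match modes."""
    if match == "exact":
        return actual == expected
    if match == "contains":
        return expected in actual
    if match == "starts_with":
        return actual.startswith(expected)
    if match == "regex":
        # re.search matches anywhere in the string, not only at the start.
        return re.search(expected, actual) is not None
    raise ValueError(f"unknown match mode: {match!r}")


print(text_matches("Login Flow", "Login", match="starts_with"))
```

Because `"regex"` uses search semantics, anchor the pattern with `^` and `$` when a full-string match is intended.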
Return semantics:

- `result.ok` is `True` only when the comparison succeeds
- `result.value`:

```python
{
    "actual": ...,
    "expected": ...,
    "match": "exact" | "contains" | "starts_with" | "regex",
    "matches": True | False,
}
```

Examples

```python
client.verify_text("Login Flow", id="loginSectionTitle")                        # exact (default)
client.verify_text("Login", match="contains", id="loginSectionTitle")           # substring
client.verify_text("Login", match="starts_with", id="loginSectionTitle")        # prefix
client.verify_text(r"^Login \w+$", match="regex", id="loginSectionTitle")       # regex
```

Find the nearest `ScrollPane` ancestor of the resolved node and adjust its `vvalue` so the node is visible in the viewport.
The `vvalue` is computed from the node's position within the scroll content:

```
vvalue = nodeTop / (contentHeight - viewportHeight)  # clamped 0–1
```
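As a worked example of the formula, the sketch below computes the clamped value in plain Python. The function name is hypothetical. The handling of content that already fits in the viewport (returning `0.0` instead of dividing by zero) is an assumption, not documented agent behaviour.

```python
def compute_vvalue(node_top: float, content_height: float, viewport_height: float) -> float:
    """Hypothetical helper: vvalue = nodeTop / (contentHeight - viewportHeight), clamped to 0..1."""
    scrollable = content_height - viewport_height
    if scrollable <= 0:
        # Assumption: nothing to scroll when the content fits the viewport.
        return 0.0
    return min(1.0, max(0.0, node_top / scrollable))


# A node 300 px down in 1000 px of content, viewed through a 400 px viewport.
print(compute_vvalue(300, 1000, 400))
```

With 600 px of scrollable range, a node at 300 px lands at `0.5`, and any node at or beyond 600 px clamps to `1.0`.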
Example

```python
client.scroll_to(id="scrollRow29")  # bring row 29 into view
```

Adjust the `hvalue` / `vvalue` of the resolved `ScrollPane` by a relative offset. Both `delta_x` and `delta_y` are fractions of the normalised 0.0–1.0 scroll range and may be negative.

- Positive `delta_y` → scroll down; negative → scroll up
- If no selector is given, the first `ScrollPane` found in the scene is used
- `delay: float | None`: per-call sleep (seconds) after the action.

Examples

```python
client.scroll_by(0.0, 0.2, id="myScrollPane")   # scroll down 20%
client.scroll_by(0.0, -1.0, id="myScrollPane")  # scroll back to top
client.scroll_by(0.0, 0.5)                      # first ScrollPane in scene
```

Right-click a node to open its context menu. Waits for the context menu overlay to appear.
Fire a double-click (synthesized MouseEvent.MOUSE_CLICKED with clickCount=2) on the target node.
Use for interactions such as expanding TreeView items, opening detail views from ListView/TableView rows, or any custom setOnMouseClicked handler that inspects event.getClickCount() == 2.
- `delay: float | None`: per-call sleep (seconds) after the action.

```python
client.double_click(id="myTreeItem")
client.double_click(text="Record 1", type="ListCell")
```

Start a drag gesture from the node matched by selector. Returns a fluent builder; chain with `.to()` or `.to_coords()`.
Fires MOUSE_PRESSED → 5 × MOUSE_DRAGGED (interpolated) → MOUSE_RELEASED on the JavaFX scene root.
- `delay: float | None`: per-call sleep (seconds) after the action.

```python
# Drag one node to another
client.drag(id="sourceItem").to(id="targetItem")

# Drag to absolute scene coordinates
client.drag(id="handle").to_coords(x=300, y=200)
```

Drag the node matched by selector to scene-relative coordinates (`to_x`, `to_y`).

```python
client.drag_to(id="handle", to_x=300, to_y=200)
```

Fire `KEY_PRESSED` + `KEY_RELEASED` for the given key string.
Format: `"[Modifier+]*Key"`, case-insensitive. Aliases: `Ctrl` = `Control`, `Win` = `Meta`.

- `delay: float | None`: per-call sleep (seconds) after the action.

| Example | Meaning |
|---|---|
| `"Escape"` | Press Escape key |
| `"Enter"` | Press Enter key |
| `"Tab"` | Press Tab key |
| `"Control+C"` | Ctrl+C |
| `"ctrl+z"` | Ctrl+Z (alias, lowercase OK) |
| `"Shift+Tab"` | Shift+Tab |
| `"Control+Shift+Z"` | Ctrl+Shift+Z |
If selector is provided, the event fires on that node. If omitted, it fires on the scene's current focus owner.
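To make the key-string format concrete, here is a minimal parsing sketch. `parse_key_combo` is a hypothetical helper, not the agent's parser. It assumes names are lower-cased for comparison and that a literal `+` key is out of scope.

```python
from typing import List, Tuple

# Aliases taken from the format description above.
ALIASES = {"ctrl": "control", "win": "meta"}


def parse_key_combo(combo: str) -> Tuple[List[str], str]:
    """Hypothetical helper: split "[Modifier+]*Key" into (modifiers, key), lower-cased."""
    parts = [part.strip().lower() for part in combo.split("+")]
    if not all(parts):
        # Assumption: empty segments (e.g. "Control+") are rejected.
        raise ValueError(f"malformed key string: {combo!r}")
    *modifiers, key = parts
    return [ALIASES.get(m, m) for m in modifiers], key


print(parse_key_combo("ctrl+z"))
print(parse_key_combo("Control+Shift+Z"))
```

Under this normalisation `"ctrl+z"` and `"Control+Z"` parse to the same result, which matches the case-insensitive, alias-accepting behaviour in the table.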
```python
client.press_key("Escape")                    # global, no selector
client.press_key("Tab", id="username")        # on specific node
client.press_key("Control+A", id="username")  # select all
client.press_key("ctrl+z")                    # alias accepted
```

Open a top-level menu in a `MenuBar`. Waits for the menu popup to appear.

Parameters:

- `menu: str`: label of the top-level menu item (e.g. `"File"`)
- selector fields to identify the `MenuBar` node
Open a MenuBar menu and navigate to a nested item.
Parameters:
- `path: str`: slash-separated path (e.g. `"Edit/Advanced/Reformat"`)
- selector fields to identify the `MenuBar` node
Click a visible menu item by label. Requires a menu or context menu to already be open.
Parameters:
- `item: str`: exact label text of the menu item
Close the currently open menu or context menu popup.
Set a DatePicker value directly without opening the popup. Preferred over open_datepicker + pick_date.
Parameters:
- `date: str`: ISO-8601 date string, e.g. `"2025-09-15"`
- selector fields to identify the `DatePicker` node
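Since the date is passed as an ISO-8601 string, it can help to build or validate it with the standard library before making the call. The two helpers below are hypothetical conveniences for test code, not part of OmniUI.

```python
from datetime import date


def iso_date(d: date) -> str:
    """Format a date object as an ISO-8601 calendar date (YYYY-MM-DD)."""
    return d.isoformat()


def is_iso_date(text: str) -> bool:
    """Return True if text parses as an ISO-8601 calendar date."""
    try:
        date.fromisoformat(text)
    except ValueError:
        # Raised for non-ISO layouts and for impossible dates such as February 30.
        return False
    return True


print(iso_date(date(2025, 9, 15)))
```

Building the string from a `date` object avoids locale-specific layouts such as `15/09/2025`.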
Click a DatePicker to open its calendar popup. Waits for the popup to appear.
Navigate the open DatePicker calendar by one month. Requires the calendar popup to be open.
Parameters:
- `direction: str`: `"forward"` or `"backward"`
Click a date cell in the open DatePicker calendar.
Parameters:
- `date: str`: ISO-8601 date string
Read the currently visible Alert dialog. Returns button labels and content text.
`result.value` shape:

```python
{
    "title": "...",
    "header": "...",
    "content": "...",
    "buttons": ["OK", "Cancel", ...]
}
```

Close the currently visible Alert dialog.

Parameters:

- `button: str | None`: button label to click (e.g. `"OK"`). If `None`, clicks the default button.
Select an item by value in a ComboBox, ChoiceBox, or ListView.
Parameters:
- `value: str`: item label to select
Read the selected state of a CheckBox, RadioButton, or ToggleButton.
result.value: True or False.
Set the selected state of a CheckBox, RadioButton, or ToggleButton.
Parameters:
value: bool
Select multiple items by value in a ListView (or any control backed by MultipleSelectionModel).
Parameters:
- `values: list[str]`: ordered list of item labels to select simultaneously

The control must have `SelectionMode.MULTIPLE` enabled; otherwise only the last value is kept.

```python
result = client.select_multiple(["alpha-node", "gamma-node"], id="serverList")
assert result.ok

# Verify
items = client.get_selected_items(id="serverList")
assert set(items.value) == {"alpha-node", "gamma-node"}
```

Return all currently selected items in a multi-select control.

- `result.value`: `list[str]` of selected item labels (in selection order)

```python
items = client.get_selected_items(id="serverList")
assert "alpha-node" in items.value
```

Set the value of a Slider.
Parameters:
value: float
Set the value of a Spinner directly.
Parameters:
value: int | float
Increment or decrement a Spinner by a number of steps.
Parameters:
- `steps: int`: positive to increment, negative to decrement
Read the current progress of a ProgressBar or ProgressIndicator.
result.value: float in range [0.0, 1.0].
Read the current value of a generic value-bearing node (e.g. Slider, DatePicker).
List the tab labels of a TabPane.
result.value: list[str].
Select a tab by label in a TabPane.
Parameters:
- `tab: str`: exact tab label
Expand a TitledPane inside an Accordion.
Collapse a TitledPane inside an Accordion.
Read whether a TitledPane is expanded.
result.value: True or False.
Read whether a Hyperlink has been visited.
result.value: True or False.
Select a row in a TreeTableView by matching cell content.
Parameters:
- `value: str`: cell text to match
- `column: str | None`: optional column name to restrict the match
Read the text of a specific cell.
Parameters:
- `row: str`: row identifier (item value)
- `column: str`: column name
Expand a tree item in a TreeTableView by matching its cell value.
Collapse a tree item in a TreeTableView by matching its cell value.
Read whether a tree item is expanded.
result.value: True or False.
Read the text value of a cell. row and col are 0-based integers.
result.value: cell text string.
Click a cell in the TableView.
Double-click a cell to enter edit mode and type value, then commit with Enter.
Sort a column. direction: "asc" | "desc" | None (toggle).
Return a list of item descriptors for all items in a ToolBar.
result.value: list of dicts, each with keys fxId, text, type, disabled.
```python
result = client.get_toolbar_items(id="mainToolBar")
for item in result.value:
    print(item["fxId"], item["text"], item["disabled"])
```

Read the current value of a standalone `ScrollBar` node.
result.value: dict with keys value, min, max (all float).
Set the scroll position. Values outside [min, max] are silently clamped.
Read the current page index and total page count of a Pagination control.
result.value: dict with keys page (int, 0-based) and page_count (int).
Jump to a specific page (0-based). Out-of-range values are silently clamped.
Advance one page. No-op at the last page.
Go back one page. No-op at the first page.
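The clamping and no-op rules for page navigation can be modelled in a few lines. `PaginationModel` is a toy stand-in written for this example. It is not an OmniUI class and does not talk to the agent.

```python
class PaginationModel:
    """Toy model of a Pagination control with 0-based pages."""

    def __init__(self, page_count: int, page: int = 0) -> None:
        if page_count < 1:
            raise ValueError("page_count must be at least 1")
        self.page_count = page_count
        self.page = 0
        self.set_page(page)

    def set_page(self, page: int) -> None:
        # Out-of-range values are silently clamped to [0, page_count - 1].
        self.page = max(0, min(page, self.page_count - 1))

    def next_page(self) -> None:
        self.set_page(self.page + 1)  # no-op at the last page

    def prev_page(self) -> None:
        self.set_page(self.page - 1)  # no-op at the first page

    def as_value(self) -> dict:
        return {"page": self.page, "page_count": self.page_count}


pager = PaginationModel(page_count=5)
pager.set_page(99)   # clamped to the last page
pager.next_page()    # stays on the last page
print(pager.as_value())
```

A model like this can be handy for predicting the `page` value a test should expect after a sequence of navigation calls.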
Return a reusable Locator bound to this client and the given selector.
The locator stores the selector and re-uses it on every subsequent call,
removing the need to repeat id=... (or any other selector keyword) on
every action.
```python
btn = client.locator(id="loginBtn")
btn.wait_for_visible()
btn.click()
btn.verify_text("Login")
```

Notes:

- Locators created with `id=...` cache the most recent successful JavaFX match.
- If that `fx:id` later disappears, OmniUI retries with cached `text`, then cached `type` + `index`, and records the fallback in `result.trace.details["self_heal"]`.
Raises ValueError if called with no selector keywords.
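The self-healing order described in the notes (id first, then cached text, then cached type and index) can be sketched against a plain node list. This is an illustrative approximation only: `find_with_self_heal` and the shape of `cached` are hypothetical, and the node dictionaries reuse the `fxId`, `text` and `nodeType` field names from node discovery.

```python
from typing import List, Optional, Tuple


def find_with_self_heal(
    nodes: List[dict], fx_id: str, cached: dict
) -> Tuple[Optional[dict], Optional[str]]:
    """Return (node, fallback_used); fallback_used is None when the id still matches."""
    for node in nodes:
        if node.get("fxId") == fx_id:
            return node, None

    # Fallback 1: text cached from the last successful match.
    if cached.get("text"):
        for node in nodes:
            if node.get("text") == cached["text"]:
                return node, "text"

    # Fallback 2: cached type plus 0-based index among nodes of that type.
    if cached.get("type"):
        same_type = [n for n in nodes if n.get("nodeType") == cached["type"]]
        index = cached.get("index", 0)
        if 0 <= index < len(same_type):
            return same_type[index], "type+index"

    return None, None


# The button's fx:id changed from "loginBtn" to "signInBtn" between runs.
nodes = [
    {"fxId": "signInBtn", "text": "Login", "nodeType": "Button"},
    {"fxId": "cancelBtn", "text": "Cancel", "nodeType": "Button"},
]
cached = {"text": "Login", "type": "Button", "index": 0}
node, fallback = find_with_self_heal(nodes, "loginBtn", cached)
print(node["fxId"], fallback)
```

The returned `fallback` label corresponds loosely to what the notes say is recorded under `self_heal`; the exact recorded structure is not specified here.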
All methods below are equivalent to calling the corresponding client.*
method with the stored selector merged in.
Interaction

- `loc.click()`
- `loc.double_click()`
- `loc.right_click()`
- `loc.press_key(key: str)`
- `loc.type(text: str)`

Text / content

- `loc.get_text() -> ActionResult`
- `loc.verify_text(expected: str, *, match: str = "exact") -> ActionResult`
- `loc.get_tooltip() -> ActionResult`

Style

- `loc.get_style() -> ActionResult`
- `loc.get_style_class() -> ActionResult`

State queries

- `loc.is_visible() -> bool`
- `loc.is_enabled() -> bool`
- `loc.is_visited() -> bool`

Value / selection

- `loc.get_value() -> ActionResult`
- `loc.get_progress() -> ActionResult`
- `loc.get_selected() -> ActionResult`
- `loc.get_selected_items() -> ActionResult`
- `loc.select(value: str) -> ActionResult`
- `loc.select_multiple(values: list[str]) -> ActionResult`
- `loc.set_selected(value: bool) -> ActionResult`
- `loc.set_slider(value: float) -> ActionResult`
- `loc.set_spinner(value: str) -> ActionResult`
- `loc.step_spinner(steps: int) -> ActionResult`

Tabs

- `loc.get_tabs() -> ActionResult`
- `loc.select_tab(tab: str) -> ActionResult`

Scroll

- `loc.scroll_to() -> ActionResult`
- `loc.scroll_by(delta_x: float, delta_y: float) -> ActionResult`

Accordion / Pane

- `loc.expand_pane() -> ActionResult`
- `loc.collapse_pane() -> ActionResult`
- `loc.get_expanded() -> ActionResult`

Wait conditions (require the locator to have been created with `id=`)

- `loc.wait_for_visible(timeout: float = 5.0)`
- `loc.wait_for_enabled(timeout: float = 5.0)`
- `loc.wait_for_node(timeout: float = 5.0)`
- `loc.wait_for_text(expected: str, timeout: float = 5.0)`
- `loc.wait_for_value(expected: str, timeout: float = 5.0)`
Raises ValueError if the locator was not created with id=.
Fetch the screenshot payload from the agent.
Current implementation note:
- In the demo/reference path, screenshot bytes may be OCR-friendly text fixture content rather than a real PNG bitmap.
Run the configured OCR engine against image bytes.
Current return fields:
- `text`
- `confidence`
- `bounds`
Run the configured vision engine against the latest screenshot.
Current return fields:
- `matched`
- `confidence`
- `bounds`
Return the recorded action log for the current client session.
The list contains completed action results only.
Return True if the matched node is currently visible.
Returns False if no node matches the selector (does not raise).
```python
if client.is_visible(id="submitButton"):
    client.click(id="submitButton")
```

Return `True` if the matched node is currently enabled (not disabled).
Returns `False` if no node matches the selector (does not raise).

```python
assert client.is_enabled(id="loginButton"), "Login button should be enabled"
assert not client.is_enabled(id="__no_such_node__")  # always False
```

Poll-based helpers that block until a UI state condition is met or raise `TimeoutError` if the timeout expires. All methods accept `timeout` (seconds, default `5.0`) and `interval` (poll period in seconds, default `0.2`).
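The poll-until-timeout pattern behind these helpers can be sketched generically. `wait_until` below is a hypothetical standalone function using the same `timeout` and `interval` defaults; the real helpers may differ in detail, for example in how the final check before timing out is handled.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 5.0, interval: float = 0.2) -> None:
    """Poll condition() until it returns True, or raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Simulated UI state that becomes ready on the third poll.
polls = []


def ready() -> bool:
    polls.append(1)
    return len(polls) >= 3


wait_until(ready, timeout=1.0, interval=0.01)
print(len(polls))
```

Using `time.monotonic()` for the deadline keeps the wait immune to wall-clock adjustments during a test run.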
Block until the text of node id equals expected.
```python
client.click(id="loadButton")
client.wait_for_text("statusLabel", "Done", timeout=10.0)
```

Block until node `id` is visible (`is_visible` returns `True`).

```python
client.wait_for_visible("resultPanel")
```

Block until node `id` is enabled (`is_enabled` returns `True`).

```python
client.wait_for_enabled("submitButton")
```

Block until a node with `fxId == id` appears in node discovery.

```python
client.wait_for_node("dynamicWidget", timeout=3.0)
```

Alias for `wait_for_text`. Provided for readability in value-assertion contexts.

```python
client.wait_for_value("totalField", "42")
```

Records user interactions with the JavaFX app and generates a replayable Python script.

Attaches an `EventFilter` to the JavaFX `Scene` to intercept `MOUSE_CLICKED` and `KEY_TYPED` events. Sets `client._recording = True`.

```python
client.start_recording()
```

Removes the `EventFilter`, flushes buffered events from the agent, runs selector inference on each event, and generates a Python test script. Sets `client._recording = False`.

```python
script = client.stop_recording()
print(f"{len(script.events)} events, {len(script.script)} chars")
script.save("recorded_test.py")
```

Returns a `RecordedScript` instance.
Trigger graceful shutdown of the JavaFX application and the agent HTTP server.
Internally schedules System.exit(0) after a 200 ms delay (to allow the HTTP response to flush), then terminates the full JVM. The agent also installs an omniui-exit-monitor daemon thread at startup that automatically calls System.exit(0) when the JavaFX Application Thread exits — so closing the window via the OS [x] button also terminates the process cleanly.
This should be the last call in a session. Subsequent API calls will raise connection errors as the JVM exits.
```python
# Clean up after test suite
client.close_app()
```

The currently supported selector surface is:

```python
{
    "id": "...",
    "text": "...",
    "type": "...",
}
```

Typical usage:

```python
client.click(id="loginButton")
client.click(text="Login", type="Button")
```

Additional current behavior:

- `click(..., template=b"...")` is accepted by the internal fallback pipeline for vision matching
`Selector` fields:

- `id: str | None`
- `text: str | None`
- `type: str | None`

`ResolvedElement` fields:

- `tier: str`
- `target_ref: str`
- `selector_used: Selector`
- `matched_attributes: dict[str, Any]`
- `confidence: float | None`
- `debug_context: dict[str, Any]`

Tier values currently used:

- `javafx`
- `ocr`
- `vision`

`ActionTrace` fields:

- `selector: Selector`
- `attempted_tiers: list[str]`
- `resolved_tier: str | None`
- `confidence: float | None`
- `details: dict[str, Any]`

Typical attempted tier sequences:

- `["javafx"]`
- `["javafx", "refresh"]`
- `["javafx", "refresh", "ocr"]`
- `["javafx", "refresh", "ocr", "vision"]`

`ActionResult` fields:

- `ok: bool`
- `trace: ActionTrace`
- `resolved: ResolvedElement | None`
- `value: Any`

`ActionLogEntry` fields:

- `action: str`
- `timestamp: datetime`
- `result: ActionResult`
A frozen snapshot of the scene graph, captured by client.snapshot().
Fields:
- `nodes: list[dict[str, Any]]`: full node list at capture time
- `timestamp: float`: Unix timestamp

Methods:

- `save(path: str | Path) -> None`: write to JSON file
- `UISnapshot.load(path: str | Path) -> UISnapshot`: restore from JSON file
Result of comparing two UISnapshot instances via client.diff(before, after).
Fields:
- `added: list[dict]`
- `removed: list[dict]`
- `changed: list[dict]`
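To show what an added / removed / changed split looks like, here is a minimal diff over two node lists. It is a sketch under assumptions, not the library's algorithm: nodes are matched by their `handle` field, and each `changed` entry is a `{"before", "after"}` pair. Both choices are made up for this example.

```python
from typing import Dict, List


def diff_nodes(before: List[dict], after: List[dict], key: str = "handle") -> Dict[str, list]:
    """Hypothetical helper: compare two node lists, matching nodes by `key`."""
    before_by_key = {node[key]: node for node in before}
    after_by_key = {node[key]: node for node in after}

    added = [n for k, n in after_by_key.items() if k not in before_by_key]
    removed = [n for k, n in before_by_key.items() if k not in after_by_key]
    # Assumed shape for changed entries: the node as it was and as it is now.
    changed = [
        {"before": before_by_key[k], "after": n}
        for k, n in after_by_key.items()
        if k in before_by_key and before_by_key[k] != n
    ]
    return {"added": added, "removed": removed, "changed": changed}


before = [{"handle": "n1", "text": "Idle"}, {"handle": "n2", "text": "Login"}]
after = [{"handle": "n1", "text": "Done"}, {"handle": "n3", "text": "Welcome"}]
result = diff_nodes(before, after)
print(result["added"], result["removed"], result["changed"])
```

A typical use of the real API would be to take a snapshot before and after an action and assert on the resulting lists.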
A single interaction event captured during a recording session.
Fields:
- `event_type: str`: `"click"` or `"type"`
- `fx_id: str`: `fx:id` of the target node (empty string if unknown)
- `text: str`: typed text (for `"type"` events)
- `node_type: str`: JavaFX class short name (e.g., `"Button"`)
- `node_index: int`: zero-based index among same-type siblings
- `timestamp: float`: Unix timestamp
Output of client.stop_recording().
Fields:
- `events: list[RecordedEvent]`
- `script: str`: generated Python test script string

Methods:

- `save(path: str | Path) -> None`: write script to file
Current default OCR provider:
SimpleOcrEngine
Method:
read(image: bytes) -> list[OcrMatch]
Current default vision provider:
SimpleVisionEngine
Method:
match(image: bytes, template: bytes) -> VisionMatch
These are deterministic placeholder implementations intended to keep Phase 1 executable before production OCR or vision libraries are integrated.
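For readers wondering what a deterministic placeholder engine might look like, the sketch below implements the documented `read(image: bytes)` method shape. Everything in it is hypothetical: the `OcrMatch` dataclass is a local stand-in with the documented `text`, `confidence` and `bounds` fields, the `(x, y, width, height)` bounds layout is assumed, and treating the image bytes as UTF-8 text is a guess prompted by the note that demo screenshots may be text fixtures. Whether `OmniUI.connect(ocr_engine=...)` accepts such a duck-typed object is not confirmed here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OcrMatch:
    """Local stand-in for the library's match type (fields as documented)."""
    text: str
    confidence: float
    bounds: Tuple[int, int, int, int]  # assumed layout: (x, y, width, height)


class FixtureOcrEngine:
    """Deterministic placeholder: one match per non-empty line of text fixture bytes."""

    LINE_HEIGHT = 20  # arbitrary pixel sizes for synthetic bounds
    CHAR_WIDTH = 8

    def read(self, image: bytes) -> List[OcrMatch]:
        matches = []
        lines = image.decode("utf-8", errors="ignore").splitlines()
        for row, line in enumerate(lines):
            text = line.strip()
            if text:
                bounds = (0, row * self.LINE_HEIGHT, len(text) * self.CHAR_WIDTH, self.LINE_HEIGHT)
                matches.append(OcrMatch(text=text, confidence=1.0, bounds=bounds))
        return matches


engine = FixtureOcrEngine()
found = engine.read(b"Login\n\nStatus: OK\n")
print([m.text for m in found])
```

Because the output depends only on the input bytes, an engine like this keeps fallback tests repeatable without a real OCR library.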
```python
from omniui import OmniUI

client = OmniUI.connect(app_name="LoginDemo")
client.click(id="username")
client.type("admin", id="username")
client.click(id="password")
client.type("1234", id="password")
client.click(id="loginButton")
client.verify_text(id="status", expected="Success")
```

```python
client.click(text="Login")
```

Run all included component demos end-to-end:

```bash
python demo/python/run_all.py
```

All demos connect to the same running JavaFX app with the agent enabled.

```python
# MenuBar: File > New
client.navigate_menu("File/New", id="demoMenuBar")

# ContextMenu: right-click a node then pick an item
client.right_click(id="someNode")
client.click_menu_item("Copy")
```

```python
# Direct set (preferred)
client.set_date("2025-09-15", id="demoPicker")
```

```python
client.click(id="infoAlertButton")
result = client.get_dialog()
print(result.value["content"])
client.dismiss_dialog("OK")
```

```python
client.expand_tree_table_item("Engineering", id="demoTreeTable")
client.select_tree_table_row("Alice", id="demoTreeTable")
cell = client.get_tree_table_cell("Alice", "Name", id="demoTreeTable")
```

- No formal generated API reference yet; this document is manually maintained.
- Fallback click currently resolves and records bounds, but does not issue a real OS click.
- `type()` currently depends on JavaFX direct interaction and does not have OCR/vision fallback.
- `find()` normalizes selectors only; it does not resolve them against the application.
Base class for the Page Object Model pattern. Subclass this and add methods that group related UI actions for a single screen or component.
```python
from omniui import OmniUI, OmniPage


class LoginPage(OmniPage):
    """Groups the login screen's actions behind one page object."""

    def login(self, username: str, password: str) -> None:
        # type() takes the text first, then selector keywords
        self.client.type(username, id="username")
        self.client.type(password, id="password")
        self.client.click(id="loginButton")

    def get_status(self) -> str:
        return self.client.get_text(id="statusLabel").value


client = OmniUI.connect(base_url="http://127.0.0.1:48100")
page = LoginPage(client)
page.login("admin", "secret")
assert page.get_status() == "Welcome"
```
Shorthand for self.client.locator(**selector).