OmniUI is a multi-modal UI automation framework with a JavaFX-first execution path.
Phase 1 focuses on local JavaFX automation with this priority order:
- JavaFX scene graph resolution and direct event-level interaction
- OCR text fallback
- Vision template-match fallback
The current repo includes:
- a Python client in omniui
- a local HTTP Java agent in java-agent
- reference JavaFX demo apps in demo/java — 13 apps covering the full JavaFX control set
- demo and benchmark scripts in demo/python and scripts
Implemented in this repo today:
Core infrastructure
- JavaFX node discovery through a live JavaFX runtime reached by the local Java agent
- selector fallback chain: javafx -> ocr -> vision
- action tracing, retry-after-refresh, and action history
- recorder-lite script generation for stable click selectors
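The fallback chain listed above can be pictured as an ordered list of resolvers, where the first tier that returns a match wins. The following is only a sketch of that idea, not OmniUI's actual selector engine: `Resolution`, `resolve_with_fallback`, and the three stand-in resolvers are hypothetical names with made-up data.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Resolution:
    tier: str       # which tier produced the match: "javafx", "ocr", or "vision"
    target: object  # whatever that tier returned (node handle, bounds, ...)


# Hypothetical resolver shape: takes selector fields, returns a target or None.
Resolver = Callable[[dict], Optional[object]]


def resolve_with_fallback(selector: dict, tiers: list[tuple[str, Resolver]]) -> Optional[Resolution]:
    """Try each tier in priority order and stop at the first one that matches."""
    for name, resolver in tiers:
        target = resolver(selector)
        if target is not None:
            return Resolution(tier=name, target=target)
    return None


# Stand-in resolvers with hard-coded data, for illustration only.
def javafx_resolver(selector: dict) -> Optional[object]:
    known_ids = {"username", "password", "status"}
    node_id = selector.get("id")
    return f"node:{node_id}" if node_id in known_ids else None


def ocr_resolver(selector: dict) -> Optional[object]:
    visible_text = {"Login": (120, 200, 80, 30)}  # text -> (x, y, width, height)
    return visible_text.get(selector.get("text"))


def vision_resolver(selector: dict) -> Optional[object]:
    return None  # no template matches in this sketch


TIERS = [("javafx", javafx_resolver), ("ocr", ocr_resolver), ("vision", vision_resolver)]
```

With these stand-ins, `resolve_with_fallback({"id": "username"}, TIERS)` resolves in the `javafx` tier, while `{"text": "Login"}` falls through to `ocr`.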
Actions — basic interaction
- click, right_click, double_click, type, get_text, verify_text, get_tooltip, get_style, get_style_class, scroll_to, scroll_by
- press_key(key, **selector): keyboard shortcuts; format: "Ctrl+C", "Shift+Tab", "Escape", "Control+Shift+Z"
- select (ComboBox / ChoiceBox / ListView), get_selected, set_selected (CheckBox / RadioButton / ToggleButton)
- select_multiple(values, **selector), get_selected_items(**selector): multi-select for ListView / TableView
- is_visible, is_enabled: query node visibility / enabled state
- wait_for_text, wait_for_visible, wait_for_enabled, wait_for_node, wait_for_value: poll-based wait conditions
- close_app(): trigger graceful JavaFX application shutdown
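Two of the entries above lend themselves to a short illustration: the key-combination string format accepted by press_key, and the poll-based nature of the wait_for_* calls. The sketch below is stand-alone and does not reflect OmniUI's internals. `parse_key_combo`, `wait_until`, and `MODIFIER_ALIASES` are hypothetical names, and treating "Ctrl" and "Control" as the same modifier is an assumption based on the example strings.

```python
import time

# Assumed alias table: the examples show both "Ctrl" and "Control" spellings.
MODIFIER_ALIASES = {"ctrl": "Control", "control": "Control", "shift": "Shift"}


def parse_key_combo(combo: str) -> tuple[frozenset[str], str]:
    """Split e.g. "Control+Shift+Z" into (modifiers, key); the last token is the key."""
    tokens = [t.strip() for t in combo.split("+")]
    if any(not t for t in tokens):
        raise ValueError(f"malformed key combo: {combo!r}")
    *mods, key = tokens
    unknown = [m for m in mods if m.lower() not in MODIFIER_ALIASES]
    if unknown:
        raise ValueError(f"unknown modifier(s) in {combo!r}: {unknown}")
    return frozenset(MODIFIER_ALIASES[m.lower()] for m in mods), key


def wait_until(condition, timeout: float = 5.0, interval: float = 0.1) -> bool:
    """Poll `condition` until it returns truthy or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The condition is always checked at least once, so a zero timeout still reports a state that is already satisfied. This parser cannot express a literal "+" key; that limitation belongs to the sketch, not necessarily to OmniUI.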
Actions — menus
- open_menu, navigate_menu, dismiss_menu, click_menu_item (MenuBar + ContextMenu)
Actions — DatePicker
- open_datepicker, navigate_month, pick_date, set_date
Actions — dialogs
- get_dialog, dismiss_dialog (Alert: information / warning / error / confirmation)
Actions — form controls
- set_slider, set_spinner, step_spinner, get_progress, get_value
Actions — tabs
- get_tabs, select_tab
Actions — Accordion
- expand_pane, collapse_pane, get_expanded
Actions — Hyperlink
- get_visited
Actions — TreeTableView
- select_tree_table_row, get_tree_table_cell
- expand_tree_table_item, collapse_tree_table_item, get_tree_table_expanded
Actions — ColorPicker
- set_color, get_color, open_colorpicker, dismiss_colorpicker
Actions — SplitPane
- get_divider_positions, set_divider_position
Selectors
- All actions accept **selector fields: id, text, type, index
- index=N (0-based) picks the Nth node among all nodes matching the other criteria, e.g. click(type="Button", index=1) clicks the second Button
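The index rule above can be illustrated with plain dictionaries standing in for scene-graph nodes. This is a sketch under assumptions, not the agent's matching code: `pick_node` and `SELECTOR_FIELDS` are hypothetical names, and returning the first match when index is omitted is an assumption.

```python
SELECTOR_FIELDS = ("id", "text", "type")


def pick_node(nodes, **selector):
    """Filter nodes by id/text/type, then apply the 0-based index to the matches."""
    index = selector.pop("index", None)
    unknown = set(selector) - set(SELECTOR_FIELDS)
    if unknown:
        raise TypeError(f"unsupported selector field(s): {sorted(unknown)}")
    # A node matches when every given field equals the node's value for that field.
    matches = [n for n in nodes if all(n.get(k) == v for k, v in selector.items())]
    if index is None:
        # Assumption: without an index, the first match is used.
        return matches[0] if matches else None
    return matches[index] if 0 <= index < len(matches) else None


# Made-up node data for illustration.
NODES = [
    {"id": "ok", "type": "Button", "text": "OK"},
    {"id": "name", "type": "TextField", "text": ""},
    {"id": "cancel", "type": "Button", "text": "Cancel"},
]
```

Here `pick_node(NODES, type="Button", index=1)` returns the node with id "cancel": the index counts only nodes that already satisfy the other criteria, so the TextField between the two Buttons is not counted.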
Demo suite (all passing via python demo/python/run_all.py)
- Login, ComboBox, ListView, TableView, TreeView, ContextMenu, MenuBar, DatePicker, Alert
- RadioButton, Slider+Spinner, Progress, Tab, TextArea, PasswordField, Hyperlink
- CheckBox, ChoiceBox, Accordion, TreeTableView, ColorPicker, SplitPane, Node State, Wait Conditions
- Double-Click, Keyboard Shortcuts, Index Selector
Not implemented yet:
- dynamic JVM attach to arbitrary third-party JavaFX processes
- real OCR engine integration such as Tesseract or PaddleOCR
- real template-matching backend such as OpenCV
- OS-level click dispatch for fallback bounds
Repository layout:
omniui/
  core/
  selector_engine/
  ocr_module/
  vision_module/
  recorder_lite/
java-agent/
demo/
  java/
    core-app/ ← ComboBox, ListView, TreeView, TableView, GridPane (port 48100)
    input-app/ ← TextArea, Checkboxes, Sliders, ColorPicker, DatePicker … (port 48101)
    advanced-app/ ← ContextMenu, MenuBar, Dialogs, TabPane, Accordion … (port 48102)
    drag-app/ ← Drag & Drop (label items → drop target) (port 48103)
    progress-app/ ← ProgressBar, ProgressIndicator, async jobs (port 48104)
    image-app/ ← ImageView, image switching (port 48105)
    color-app/ ← ColorPicker, color result label (port 48106)
    todo-app/ ← TableView (editable), dialog, task management (port 48107)
    login-app/ ← Login form (username, password, loginButton, status) (port 48108)
    user-search-app/ ← TableView with pagination, live search/filter (port 48109)
    dynamic-fxml-app/ ← FXML view loading, Form/Dashboard/List views (port 48110)
    explorer-app/ ← TreeView file explorer, TableView file listing (port 48111)
    settings-app/ ← TabPane settings form with full validation (port 48112)
  python/
    core/ ← demo scripts for core-app
    input/ ← demo scripts for input-app
    advanced/ ← demo scripts for advanced-app
    drag/ ← demo scripts for drag-app
    progress/ ← demo scripts for progress-app
    image/ ← demo scripts for image-app
    color/ ← demo scripts for color-app
    todo/ ← demo scripts for todo-app
    settings/ ← demo scripts for settings-app
    dynamicfxml/ ← demo scripts for dynamic-fxml-app
    explorer/ ← demo scripts for explorer-app
    usersearch/ ← demo scripts for user-search-app
scripts/
openspec/
Requirements:
- Python 3.11+
- Java 22+
- Maven 3.9+
- Windows is the currently validated desktop target for the demo app build
Quick start:
- Build all demo apps (Java agent + jlink runtime images):
  scripts\build_demo_runtime.bat
- Start the JavaFX demo app (core-app, port 48100):
  demo\java\core-app\run-dev-with-agent.bat
  This launches the core demo window with the OmniUI Java agent enabled on http://127.0.0.1:48100.
- In another terminal, run the Python demo flow:
  python scripts/demo_login_flow.py
  Expected behavior:
  - username and password fields are driven through JavaFX direct interaction
  - the login button click is demonstrated through OCR fallback using text="Login"
  - the script verifies that the status label becomes Success
- Optional: run the benchmark script while the demo app is still running:
  python scripts/benchmark_phase1.py
  This reports average timings for:
  - JavaFX node query
  - OCR fallback parsing
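Before running a demo script it can help to confirm that the agent's port is accepting connections. The sketch below only performs a TCP connect to 127.0.0.1:48100 (the address given above) and makes no assumption about the agent's HTTP endpoints. `agent_port_open` is a hypothetical helper, not part of the omniui package.

```python
import socket


def agent_port_open(host: str = "127.0.0.1", port: int = 48100, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A True result only shows that some process is listening on the port. It does not prove that the listener is the OmniUI agent or that the JavaFX scene is ready.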
More runnable demos:
Single command demo entry:
python scripts/run_demo.py
Packaged demo runtime helpers:
- scripts/build_demo_runtime.ps1
- scripts/build_demo_runtime.bat
- scripts/build_demo_runtime.sh
After building, those helpers print the exact packaged with-agent and plain launchers to use next.
The .sh helper is currently intended for Git Bash on Windows. The demo app and packaged launcher workflow are still primarily documented and validated on Windows.
Documentation hub:
Full API reference:
Minimal example:
from omniui import OmniUI
client = OmniUI.connect(app_name="LoginDemo")
client.click(id="username")
client.type("admin", id="username")
client.click(id="password")
client.type("1234", id="password")
client.click(text="Login")
client.verify_text("Success", id="status") # exact (default)
client.verify_text("Suc", match="contains", id="status") # contains
client.verify_text(r"^Succ", match="regex", id="status") # regex
Use the full API reference for parameters, result models, fallback semantics, and return fields.
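The three match modes used in the example (exact, contains, regex) can be illustrated with a small stand-alone comparison function. This is a sketch, not OmniUI's implementation: `text_matches` is a hypothetical name, and treating regex mode as an unanchored `re.search` is an assumption.

```python
import re


def text_matches(actual: str, expected: str, match: str = "exact") -> bool:
    """Compare a node's text against an expectation using one of three modes."""
    if match == "exact":
        return actual == expected
    if match == "contains":
        return expected in actual
    if match == "regex":
        # Assumption: the pattern may match anywhere unless it anchors itself.
        return re.search(expected, actual) is not None
    raise ValueError(f"unknown match mode: {match!r}")
```

Under these assumptions, "Suc" fails in exact mode against "Success" but passes in contains mode, and the pattern `^Succ` passes in regex mode because the anchor lines up with the start of the text.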
Demo / presentation mode — slow down playback with step_delay:
# 0.5s pause after every action (global)
client = OmniUI.connect(port=48102, step_delay=0.5)
# Or override per call
client.click(id="tbNew", delay=1.0)
Recorder-lite currently works from the client action history and only emits stable click expressions.
Example output:
click(id="username")
click(text="Login")
Unsupported interactions are intentionally skipped instead of falling back to raw coordinates.
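The recorder-lite behavior described above (emit only clicks with a stable selector, skip everything else) can be sketched as a filter over an action history. The history record shape and the `render_script` name are assumptions made for illustration; the real recorder's data model may differ.

```python
import json


def render_script(history):
    """Turn an action history into click(...) lines, skipping unsupported entries."""
    lines = []
    for entry in history:
        if entry.get("action") != "click":
            continue  # only click actions are emitted
        selector = entry.get("selector", {})
        # Prefer id over text; json.dumps adds the quotes and escapes the value.
        if "id" in selector:
            lines.append(f"click(id={json.dumps(selector['id'])})")
        elif "text" in selector:
            lines.append(f"click(text={json.dumps(selector['text'])})")
        # Entries with neither id nor text (e.g. raw coordinates) are skipped.
    return lines


# Hypothetical history records, for illustration only.
HISTORY = [
    {"action": "click", "selector": {"id": "username"}},
    {"action": "type", "selector": {"id": "username"}, "text": "admin"},
    {"action": "click", "selector": {"text": "Login"}},
    {"action": "click", "selector": {"x": 10, "y": 20}},
]
```

For this input, `render_script(HISTORY)` yields the same two lines as the example output above; the type action and the coordinate-only click are dropped.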
OmniUI has two test tiers:
| Command | What runs | Launches UI? | Speed |
|---|---|---|---|
| pytest or pytest tests/ | Unit tests only (mocks, no real app) | ❌ | Fast (~0.5 s) |
| pytest tests/integration/ | Integration tests (launches real JavaFX apps) | ✅ | Slow |
Run the unit test suite:
python -m pytest tests/
Run integration tests (requires built Java agent + demo apps):
python -m pytest tests/integration/
Build the Java agent and demo apps:
# Install agent module to local Maven repo first
mvn install -f java-agent/pom.xml
# Then build the jlink runtime images for each demo app
mvn package javafx:jlink -f demo/java/core-app/pom.xml
mvn package javafx:jlink -f demo/java/input-app/pom.xml
mvn package javafx:jlink -f demo/java/advanced-app/pom.xml
Note: The jlink runtime image embeds dev.omniui.agent as a module. Always run mvn install -f java-agent/pom.xml before rebuilding the jlink image, or agent changes will not be picked up.
Check markdown i18n consistency:
python scripts/check_markdown_i18n.py
The implementation was developed through the OpenSpec workflow. Change artifacts are under:
openspec/changes/add-omniui-javafx-automation-framework
Key artifacts:
- The agent protocol is documented in docs/protocol/agent-protocol.md.
- The Java agent is now the owner of HTTP startup and JavaFX runtime discovery for the demo support path.
- The fallback engines in this repo are deterministic placeholder implementations intended to keep the Phase 1 architecture executable before integrating production OCR and vision libraries.