OmniUI

Traditional Chinese documentation

OmniUI is a multi-modal UI automation framework with a JavaFX-first execution path.

Phase 1 focuses on local JavaFX automation with this priority order:

  • JavaFX scene graph resolution and direct event-level interaction
  • OCR text fallback
  • Vision template-match fallback
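
The priority order above amounts to trying each resolver in turn and stopping at the first one that finds a target. The following is a minimal sketch of that idea only; the resolver functions, the `template` selector field, and the returned dictionaries are hypothetical stand-ins, not OmniUI internals.

```python
from typing import Callable, Optional

# Hypothetical resolver type: takes a selector dict, returns a match or None.
Resolver = Callable[[dict], Optional[dict]]


def resolve_javafx(selector: dict) -> Optional[dict]:
    # Stand-in: pretend only id-based selectors resolve in the scene graph.
    if "id" in selector:
        return {"tier": "javafx", "target": selector["id"]}
    return None


def resolve_ocr(selector: dict) -> Optional[dict]:
    # Stand-in: pretend visible text can be located through OCR.
    if "text" in selector:
        return {"tier": "ocr", "target": selector["text"]}
    return None


def resolve_vision(selector: dict) -> Optional[dict]:
    # Stand-in: pretend a template image can be matched on screen.
    if "template" in selector:
        return {"tier": "vision", "target": selector["template"]}
    return None


# Order matters: JavaFX first, then OCR, then vision.
CHAIN: list[Resolver] = [resolve_javafx, resolve_ocr, resolve_vision]


def resolve(selector: dict) -> Optional[dict]:
    """Return the first match found by walking the chain in priority order."""
    for resolver in CHAIN:
        match = resolver(selector)
        if match is not None:
            return match
    return None
```

A selector that carries both `id` and `text` would stop at the JavaFX tier in this sketch, because later resolvers are consulted only when earlier ones return nothing.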

The current repo includes:

Status

Implemented in this repo today:

Core infrastructure

  • JavaFX node discovery through a live JavaFX runtime reached by the local Java agent
  • selector fallback chain: javafx -> ocr -> vision
  • action tracing, retry-after-refresh, and action history
  • recorder-lite script generation for stable click selectors
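
One way to picture retry-after-refresh together with action history is the sketch below. It assumes a hypothetical `StaleNodeError` and caller-supplied `action` and `refresh` callables; the real client may structure this differently.

```python
class StaleNodeError(Exception):
    """Hypothetical error: the cached node snapshot no longer matches the UI."""


def run_with_retry(action, refresh, history, max_retries=1):
    """Run action(); on a stale node, refresh the snapshot and try again."""
    attempts = 0
    while True:
        try:
            result = action()
        except StaleNodeError:
            if attempts >= max_retries:
                history.append({"ok": False, "attempts": attempts + 1})
                raise
            attempts += 1
            refresh()  # re-discover nodes before the next attempt
            continue
        history.append({"ok": True, "attempts": attempts + 1})
        return result


# Usage: an action that fails once with a stale node, then succeeds.
state = {"calls": 0, "refreshes": 0}


def flaky_click():
    state["calls"] += 1
    if state["calls"] == 1:
        raise StaleNodeError("node snapshot is out of date")
    return "clicked"


def refresh_nodes():
    state["refreshes"] += 1


history = []
outcome = run_with_retry(flaky_click, refresh_nodes, history)
```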

Actions — basic interaction

  • click, right_click, double_click, type, get_text, verify_text, get_tooltip, get_style, get_style_class, scroll_to, scroll_by
  • press_key(key, **selector) — keyboard shortcuts; format: "Ctrl+C", "Shift+Tab", "Escape", "Control+Shift+Z"
  • select (ComboBox / ChoiceBox / ListView), get_selected, set_selected (CheckBox / RadioButton / ToggleButton)
  • select_multiple(values, **selector), get_selected_items(**selector) — multi-select for ListView / TableView
  • is_visible, is_enabled — query node visibility / enabled state
  • wait_for_text, wait_for_visible, wait_for_enabled, wait_for_node, wait_for_value — poll-based wait conditions
  • close_app() — trigger graceful JavaFX application shutdown
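
The `wait_for_*` actions are described as poll-based. A generic polling loop of that kind can be sketched as follows; `wait_until` and its `timeout` and `interval` parameters are illustrative names, not the OmniUI signatures.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.1):
    """Poll condition() until it is truthy or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage: a condition that becomes true on the third poll.
polls = {"count": 0}


def ready_after_three_polls():
    polls["count"] += 1
    return polls["count"] >= 3


became_ready = wait_until(ready_after_three_polls, timeout=1.0, interval=0.01)
```

The condition is always checked at least once, so a zero timeout still reports a state that is already satisfied.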

Actions — menus

  • open_menu, navigate_menu, dismiss_menu, click_menu_item (MenuBar + ContextMenu)

Actions — DatePicker

  • open_datepicker, navigate_month, pick_date, set_date

Actions — dialogs

  • get_dialog, dismiss_dialog (Alert: information / warning / error / confirmation)

Actions — form controls

  • set_slider, set_spinner, step_spinner, get_progress, get_value

Actions — tabs

  • get_tabs, select_tab

Actions — Accordion

  • expand_pane, collapse_pane, get_expanded

Actions — Hyperlink

  • get_visited

Actions — TreeTableView

  • select_tree_table_row, get_tree_table_cell
  • expand_tree_table_item, collapse_tree_table_item, get_tree_table_expanded

Actions — ColorPicker

  • set_color, get_color, open_colorpicker, dismiss_colorpicker

Actions — SplitPane

  • get_divider_positions, set_divider_position

Selectors

  • All actions accept **selector fields: id, text, type, index
  • index=N (0-based) picks the Nth node among all nodes matching the other criteria — e.g. click(type="Button", index=1) clicks the second Button
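
The `index` semantics can be illustrated with plain dictionaries: filter by the other criteria first, then pick the Nth match. This is a sketch under the assumption that nodes are simple records; `pick_node` is a hypothetical helper, not part of the OmniUI API.

```python
def pick_node(nodes, index=0, **criteria):
    """Return the index-th node (0-based) matching all criteria, or None."""
    matches = [
        node for node in nodes
        if all(node.get(key) == value for key, value in criteria.items())
    ]
    if 0 <= index < len(matches):
        return matches[index]
    return None


# Usage: two Buttons and one Label, in scene order.
nodes = [
    {"id": "ok", "type": "Button", "text": "OK"},
    {"id": "title", "type": "Label", "text": "Title"},
    {"id": "cancel", "type": "Button", "text": "Cancel"},
]

second_button = pick_node(nodes, type="Button", index=1)
```

Here `second_button` is the `cancel` node: the Label sits between the two Buttons in scene order but is filtered out before the index is applied.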

Demo suite (all passing via python demo/python/run_all.py)

  • Login, ComboBox, ListView, TableView, TreeView, ContextMenu, MenuBar, DatePicker, Alert
  • RadioButton, Slider+Spinner, Progress, Tab, TextArea, PasswordField, Hyperlink
  • CheckBox, ChoiceBox, Accordion, TreeTableView, ColorPicker, SplitPane, Node State, Wait Conditions
  • Double-Click, Keyboard Shortcuts, Index Selector

Not implemented yet:

  • dynamic JVM attach to arbitrary third-party JavaFX processes
  • real OCR engine integration such as Tesseract or PaddleOCR
  • real template-matching backend such as OpenCV
  • OS-level click dispatch for fallback bounds

Layout

omniui/
  core/
  selector_engine/
  ocr_module/
  vision_module/
  recorder_lite/
java-agent/
demo/
  java/
    core-app/          ← ComboBox, ListView, TreeView, TableView, GridPane        (port 48100)
    input-app/         ← TextArea, Checkboxes, Sliders, ColorPicker, DatePicker … (port 48101)
    advanced-app/      ← ContextMenu, MenuBar, Dialogs, TabPane, Accordion …      (port 48102)
    drag-app/          ← Drag & Drop (label items → drop target)                  (port 48103)
    progress-app/      ← ProgressBar, ProgressIndicator, async jobs               (port 48104)
    image-app/         ← ImageView, image switching                               (port 48105)
    color-app/         ← ColorPicker, color result label                          (port 48106)
    todo-app/          ← TableView (editable), dialog, task management            (port 48107)
    login-app/         ← Login form (username, password, loginButton, status)     (port 48108)
    user-search-app/   ← TableView with pagination, live search/filter            (port 48109)
    dynamic-fxml-app/  ← FXML view loading, Form/Dashboard/List views             (port 48110)
    explorer-app/      ← TreeView file explorer, TableView file listing           (port 48111)
    settings-app/      ← TabPane settings form with full validation               (port 48112)
  python/
    core/              ← demo scripts for core-app
    input/             ← demo scripts for input-app
    advanced/          ← demo scripts for advanced-app
    drag/              ← demo scripts for drag-app
    progress/          ← demo scripts for progress-app
    image/             ← demo scripts for image-app
    color/             ← demo scripts for color-app
    todo/              ← demo scripts for todo-app
    settings/          ← demo scripts for settings-app
    dynamicfxml/       ← demo scripts for dynamic-fxml-app
    explorer/          ← demo scripts for explorer-app
    usersearch/        ← demo scripts for user-search-app
scripts/
openspec/
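
The port assignments in the tree above can be kept in a small lookup table when scripting against several demo apps. The sketch below assumes every agent listens on the loopback address in the same way as core-app; `DEMO_PORTS` and `agent_url` are illustrative names.

```python
# Ports copied from the demo/java layout above.
DEMO_PORTS = {
    "core-app": 48100,
    "input-app": 48101,
    "advanced-app": 48102,
    "drag-app": 48103,
    "progress-app": 48104,
    "image-app": 48105,
    "color-app": 48106,
    "todo-app": 48107,
    "login-app": 48108,
    "user-search-app": 48109,
    "dynamic-fxml-app": 48110,
    "explorer-app": 48111,
    "settings-app": 48112,
}


def agent_url(app: str) -> str:
    """Build the agent base URL for a demo app (assumes 127.0.0.1)."""
    return f"http://127.0.0.1:{DEMO_PORTS[app]}"
```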

Prerequisites

  • Python 3.11+
  • Java 22+
  • Maven 3.9+
  • Windows is the currently validated desktop target for the demo app build

Quick Start

  1. Build all demo apps (Java agent + jlink runtime images):
scripts\build_demo_runtime.bat
  2. Start the JavaFX demo app (core-app, port 48100):
demo\java\core-app\run-dev-with-agent.bat

This launches the core demo window with the OmniUI Java agent enabled on http://127.0.0.1:48100.

  3. In another terminal, run the Python demo flow:
python scripts/demo_login_flow.py

Expected behavior:

  • username and password fields are driven through JavaFX direct interaction
  • the login button click is demonstrated through OCR fallback using text="Login"
  • the script verifies that the status label becomes Success

  4. Optional: run the benchmark script while the demo app is still running:
python scripts/benchmark_phase1.py

This reports average timings for:

  • JavaFX node query
  • OCR fallback parsing
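
For readers who want to time their own calls in a similar way, an average over repeated runs can be measured as in the sketch below. It is not the contents of benchmark_phase1.py; `average_ms` and the stand-in workload are assumptions made for illustration.

```python
import time
from statistics import mean


def average_ms(fn, runs=20):
    """Call fn() `runs` times and return the mean duration in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


def stand_in_workload():
    # Placeholder for a real call such as a node query against the agent.
    return sum(range(1000))


avg = average_ms(stand_in_workload, runs=5)
```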

More runnable demos:

Single command demo entry:

python scripts/run_demo.py

Packaged demo runtime helpers:

  • scripts/build_demo_runtime.ps1
  • scripts/build_demo_runtime.bat
  • scripts/build_demo_runtime.sh

After building, those helpers print the exact packaged launchers to use next, both the with-agent and the plain variants.

The .sh helper is currently intended for Git Bash on Windows. The demo app and packaged launcher workflow are still primarily documented and validated on Windows.

Python API

Documentation hub:

Full API reference:

Minimal example:

from omniui import OmniUI

client = OmniUI.connect(app_name="LoginDemo")

client.click(id="username")
client.type("admin", id="username")

client.click(id="password")
client.type("1234", id="password")

client.click(text="Login")
client.verify_text("Success", id="status")                                 # exact (default)
client.verify_text("Suc", match="contains", id="status")                  # contains
client.verify_text(r"^Succ", match="regex", id="status")                  # regex

Use the full API reference for parameters, result models, fallback semantics, and return fields.
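
The three `match` modes shown above can be read as the comparison sketched below. `text_matches` is a hypothetical helper, and the use of `re.search` for the regex mode is an assumption; consult the API reference for the actual semantics.

```python
import re


def text_matches(actual: str, expected: str, match: str = "exact") -> bool:
    """Compare actual text against expected using the given match mode."""
    if match == "exact":
        return actual == expected
    if match == "contains":
        return expected in actual
    if match == "regex":
        # Assumption: the pattern may match anywhere unless it is anchored.
        return re.search(expected, actual) is not None
    raise ValueError(f"unknown match mode: {match!r}")
```

With a status label reading `Success`, all three calls from the minimal example would pass under these rules, while `text_matches("Success", "Suc")` would fail because the default mode is exact.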

Demo / presentation mode — slow down playback with step_delay:

# 0.5s pause after every action (global)
client = OmniUI.connect(port=48102, step_delay=0.5)

# Or override per call
client.click(id="tbNew", delay=1.0)
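
The relationship between the global `step_delay` and a per-call `delay` can be sketched as a simple override rule. `PacedRunner` below is a hypothetical illustration of that rule, with an injectable sleep function so the behavior can be checked without waiting.

```python
import time


class PacedRunner:
    """Runs actions and pauses afterwards; a per-call delay overrides the default."""

    def __init__(self, step_delay=0.0, sleep=time.sleep):
        self.step_delay = step_delay
        self._sleep = sleep

    def run(self, action, delay=None):
        result = action()
        pause = self.step_delay if delay is None else delay
        if pause > 0:
            self._sleep(pause)
        return result


# Usage: record the pauses instead of actually sleeping.
pauses = []
runner = PacedRunner(step_delay=0.5, sleep=pauses.append)
runner.run(lambda: "new")             # uses the global 0.5 s pause
runner.run(lambda: "new", delay=1.0)  # per-call override
```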

Recorder-lite

Recorder-lite currently works from the client action history and only emits stable click expressions.

Example output:

click(id="username")
click(text="Login")

Unsupported interactions are intentionally skipped instead of falling back to raw coordinates.
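
The behavior described above, emitting only stable click expressions and skipping everything else, can be sketched as a filter over a history list. The history entry shape (`action` and `selector` keys) and the `render_script` helper are assumptions made for illustration.

```python
def render_script(history):
    """Emit click(...) lines for clicks with a stable id or text selector."""
    lines = []
    for entry in history:
        if entry.get("action") != "click":
            continue  # only clicks are emitted
        selector = entry.get("selector", {})
        if "id" in selector:
            lines.append(f'click(id="{selector["id"]}")')
        elif "text" in selector:
            lines.append(f'click(text="{selector["text"]}")')
        # Anything else, such as raw coordinates, is skipped on purpose.
    return lines


# Usage: a mixed history with a typed value and a coordinate-only click.
history = [
    {"action": "click", "selector": {"id": "username"}},
    {"action": "type", "selector": {"id": "username"}, "text": "admin"},
    {"action": "click", "selector": {"x": 10, "y": 20}},
    {"action": "click", "selector": {"text": "Login"}},
]

script = render_script(history)
```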

Testing

OmniUI has two test tiers:

| Command | What runs | Launches UI? | Speed |
| --- | --- | --- | --- |
| pytest or pytest tests/ | Unit tests only (mocks, no real app) | No | Fast (~0.5 s) |
| pytest tests/integration/ | Integration tests (launches real JavaFX apps) | Yes | Slow |

Run the unit test suite:

python -m pytest tests/

Run integration tests (requires built Java agent + demo apps):

python -m pytest tests/integration/

Build the Java agent and demo apps:

# Install agent module to local Maven repo first
mvn install -f java-agent/pom.xml
# Then build the jlink runtime images for each demo app
mvn package javafx:jlink -f demo/java/core-app/pom.xml
mvn package javafx:jlink -f demo/java/input-app/pom.xml
mvn package javafx:jlink -f demo/java/advanced-app/pom.xml

Note: The jlink runtime image embeds dev.omniui.agent as a module. Always run mvn install -f java-agent/pom.xml before rebuilding the jlink image, or agent changes will not be picked up.
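
To make the ordering in the note hard to get wrong, the Maven commands can be generated so that the agent install always comes first. This is a sketch only; `build_commands` and `run_all` are hypothetical helpers, and `run_all` assumes `mvn` is on the PATH and that it is called from the repository root.

```python
import subprocess


def build_commands(apps):
    """Return the Maven commands in order: agent install, then each jlink build."""
    commands = [["mvn", "install", "-f", "java-agent/pom.xml"]]
    for app in apps:
        commands.append(
            ["mvn", "package", "javafx:jlink", "-f", f"demo/java/{app}/pom.xml"]
        )
    return commands


def run_all(apps):
    """Run each command and stop at the first failure (not invoked here)."""
    for command in build_commands(apps):
        subprocess.run(command, check=True)


commands = build_commands(["core-app", "input-app", "advanced-app"])
```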

Check markdown i18n consistency:

python scripts/check_markdown_i18n.py

OpenSpec

The implementation was developed through the OpenSpec workflow. Change artifacts are under:

openspec/changes/add-omniui-javafx-automation-framework

Key artifacts:

Notes

  • The agent protocol is documented in docs/protocol/agent-protocol.md.
  • The Java agent now owns HTTP startup and JavaFX runtime discovery for the demo support path.
  • The fallback engines in this repo are deterministic placeholder implementations intended to keep the Phase 1 architecture executable before integrating production OCR and vision libraries.
