Proposal: richer element selector grammar for winapp ui #563

@LegendaryBlair

The gap

Today winapp ui resolves a selector string in essentially two modes:

  1. AutomationId exact match (invoke '<id>', set-value '<id>', get-property '<id>', etc.) — works when the target has a stable AutomationProperties.AutomationId.
  2. Substring search across Name + AutomationId (search '<text>' --json) — returns up to ~4 matches with no scoping refinement.

In any real WinUI 3 / WPF / Win32 app, many elements have no AutomationId at all (layout thumbnails, generated ListItems, context-menu items, ComboBoxItem entries, …). For those, the only option today is a multi-step chain: inspect → text or JSON output → client-side regex / Where-Object filter → re-issue a follow-up command with the extracted id.

That dance is verbose, fragile (stale-tree race between inspect and the follow-up; runtimeId isn't stable across UIA tree refreshes), and impossible when the inner element simply has no stable identity of its own. Every author re-invents the same client-side filtering boilerplate.

Direction (not a locked-in design)

Add a small CSS/jQuery-flavored selector vocabulary to winapp ui selector strings. The shape I'd propose, in priority order:

  1. Attribute predicates: [Name="Save"], [AutomationId^="itm-"], [Name*="ettings"], regex variant [AutomationId~=/^itm-.+/], plus :not(...). Backed by UIA PropertyCondition + AndCondition. (P0: highest payoff, additive.)
  2. Bare ControlType filter: winapp ui search 'Button' -w $h, 'ListItem', 'ComboBoxItem', …; * for any. One PropertyCondition, trivial to implement. (P0.)
  3. Hierarchy combinators — descendant ( ) and direct-child (>); #Id as a shortcut for [AutomationId="Id"]. E.g. '#ItemsList > ListItem[AutomationId^="itm-"]'. Backed by TreeWalker. Sibling combinators (+, ~) can land in a v2. (P1 — unlocks selection in regions without stable AutomationIds, which is most of WinUI 3 today.)
  4. Element-anchored scope — a -e <ElementHandle> flag so a found element can become the root for a subsequent command, instead of always re-rooting at the window. Handle could be a stateless RuntimeId-encoded token (same approach Selenium uses for WebElement). (P1.)
  5. Pseudo-classes for state: :enabled, :disabled, :visible (!IsOffscreen), :focused, :checked. Pairs especially well with wait-for, which today only checks presence: winapp ui wait-for 'Button[Name="Save"]:enabled' --timeout 5000. (P2.)
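
To make the semantics of proposals 1, 2, and 5 concrete, here is a minimal Python sketch of how one compound selector (type, #Id, attribute predicates, pseudo-classes) could be compiled into predicates and checked against a single element. Everything in it is an assumption for illustration: elements are plain dicts, and the property keys (IsEnabled, IsOffscreen, HasKeyboardFocus, ToggleState) and function names are hypothetical, not winapp's actual model. It filters in-process; a real implementation would presumably push what it can down into UIA conditions.

```python
import re

# Pieces of one compound selector, e.g. Button[Name="Copy"]:enabled
_HEAD = re.compile(r'(?P<type>\*|[A-Za-z]\w*)?(?:#(?P<id>[\w-]+))?')
_ATTR = re.compile(
    r'\[(?P<attr>\w+)(?P<op>\^=|\*=|~=|=)(?:"(?P<str>[^"]*)"|/(?P<re>[^/]*)/)\]'
)
_NOT = re.compile(r':not\((?P<inner>[^()]*)\)')
_PSEUDO = re.compile(r':(?P<name>[a-z]+)')

# String operators for quoted values (hypothetical mapping of the proposal).
OPS = {
    "=": lambda actual, want: actual == want,
    "^=": lambda actual, want: actual.startswith(want),
    "*=": lambda actual, want: want in actual,
}

# State pseudo-classes over assumed property keys.
PSEUDO = {
    "enabled": lambda e: bool(e.get("IsEnabled")),
    "disabled": lambda e: not e.get("IsEnabled"),
    "visible": lambda e: not e.get("IsOffscreen"),
    "focused": lambda e: bool(e.get("HasKeyboardFocus")),
    "checked": lambda e: e.get("ToggleState") == "On",
}


def compile_compound(selector):
    """Turn one compound selector into a list of predicates (all must pass)."""
    tests = []
    head = _HEAD.match(selector)
    ctype, ident = head.group("type"), head.group("id")
    if ctype and ctype != "*":
        # Type names are compared case-insensitively.
        tests.append(lambda e: str(e.get("ControlType", "")).lower() == ctype.lower())
    if ident:
        tests.append(lambda e: e.get("AutomationId") == ident)
    pos = head.end()
    while pos < len(selector):
        if (m := _ATTR.match(selector, pos)):
            attr, op = m.group("attr"), m.group("op")
            is_regex = m.group("re") is not None
            if (op == "~=") != is_regex:
                raise ValueError(f"operator {op} does not fit value in {m.group(0)!r}")
            if is_regex:
                rx = re.compile(m.group("re"))
                tests.append(lambda e, a=attr, rx=rx: rx.search(str(e.get(a, ""))) is not None)
            else:
                fn, want = OPS[op], m.group("str")
                tests.append(lambda e, a=attr, fn=fn, w=want: fn(str(e.get(a, "")), w))
        elif (m := _NOT.match(selector, pos)):
            inner = compile_compound(m.group("inner"))
            tests.append(lambda e, inner=inner: not all(t(e) for t in inner))
        elif (m := _PSEUDO.match(selector, pos)):
            if m.group("name") not in PSEUDO:
                raise ValueError(f"unknown pseudo-class :{m.group('name')}")
            tests.append(PSEUDO[m.group("name")])
        else:
            raise ValueError(f"cannot parse selector at offset {pos}: {selector[pos:]!r}")
        pos = m.end()
    return tests


def matches(element, selector):
    """True if the element satisfies every part of the compound selector."""
    return all(test(element) for test in compile_compound(selector))
```

With an element such as `{"ControlType": "Button", "Name": "Save", "IsEnabled": True}`, `matches(element, 'Button[Name="Save"]:enabled')` is true, while an unparseable selector raises ValueError instead of silently matching nothing, which seems the safer default for a CLI.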

Before / after

# Today — find an enabled "Copy" button inside MainPanel
$ins = winapp ui inspect 'MainPanel' -w $h --depth 3 --json | ConvertFrom-Json
$btn = $ins.children | Where-Object { $_.controlType -eq 'Button' -and $_.name -eq 'Copy' -and $_.properties.IsEnabled }
if (-not $btn) { throw "no enabled Copy button" }
winapp ui invoke $btn.automationId -w $h
# After
winapp ui invoke '#MainPanel Button[Name="Copy"]:enabled' -w $h

One line instead of four: no JSON poking, no client-side filter, no stale-tree race window, and it works even when the inner element has no AutomationId of its own.
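
The descendant and direct-child combinators can be evaluated left to right over a tree snapshot. The Python sketch below shows only that traversal, under stated assumptions: the tree is a hypothetical nest of dicts, the helper names (`select`, `el`) are invented for illustration, and each compound selector is stubbed with a hand-written lambda rather than parsed from a string.

```python
def iter_descendants(node):
    """Yield every element below node, depth-first, in document order."""
    for child in node.get("children", []):
        yield child
        yield from iter_descendants(child)


def select(root, steps):
    """Evaluate selector steps left to right.

    steps is a list of (combinator, predicate) pairs. The combinator is
    " " (any descendant) or ">" (direct child), relative to the previous
    step's matches; the first step is relative to root.
    """
    scopes = [root]
    for combinator, predicate in steps:
        matched, seen = [], set()
        for scope in scopes:
            if combinator == ">":
                candidates = scope.get("children", [])
            elif combinator == " ":
                candidates = iter_descendants(scope)
            else:
                raise ValueError(f"unsupported combinator: {combinator!r}")
            for element in candidates:
                # Nested scopes can reach the same element twice; keep it once.
                if id(element) not in seen and predicate(element):
                    seen.add(id(element))
                    matched.append(element)
        scopes = matched
    return scopes


def el(control_type, automation_id="", name="", enabled=True, children=()):
    """Build one hypothetical tree node."""
    return {"ControlType": control_type, "AutomationId": automation_id,
            "Name": name, "IsEnabled": enabled, "children": list(children)}


window = el("Window", children=[
    el("Pane", "MainPanel", children=[
        el("ToolBar", children=[
            el("Button", name="Copy", enabled=True),
            el("Button", name="Paste", enabled=False),
        ]),
    ]),
    el("List", "ItemsList", children=[
        el("ListItem", "itm-1"),
        el("Group", children=[el("ListItem", "itm-2")]),
    ]),
])

# Equivalent of '#MainPanel Button[Name="Copy"]:enabled'
copy_buttons = select(window, [
    (" ", lambda e: e["AutomationId"] == "MainPanel"),
    (" ", lambda e: e["ControlType"] == "Button" and e["Name"] == "Copy" and e["IsEnabled"]),
])

# Equivalent of '#ItemsList > ListItem[AutomationId^="itm-"]'
direct_items = select(window, [
    (" ", lambda e: e["AutomationId"] == "ItemsList"),
    (">", lambda e: e["ControlType"] == "ListItem" and e["AutomationId"].startswith("itm-")),
])
```

Here the Copy button is found even though it sits inside an unnamed ToolBar with no AutomationId, and the `>` form returns only `itm-1` because `itm-2` is nested one level deeper inside a Group.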

Hard limits / honest scope

  • XPath axes (ancestor::, following::), :has() / :contains(), selector caching / compilation, and JS-style callback predicates are explicitly out of scope — additive later if demand emerges.
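  • Type names should be the UIA ControlType (matched case-insensitively), not the Win32 ClassName, to stay stable across tech stacks. LocalizedControlType varies with the display language, so it is less suitable as a match key.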

Open questions worth deciding up front

  1. Does the existing CLI already do any selector-string parsing beyond bare AutomationId? Any existing grammar should be honored.
  2. When a selector matches N elements, does search --json keep its current shape, or evolve to a richer per-element record (with hierarchical-path info for debugging)?
  3. Should wait-for accept the new grammar from day 1, or stay scoped to bare-AutomationId presence and adopt selectors later?
  4. Element-handle lifecycle for proposal 4: stateless RuntimeId-encoded token vs server-side cache? Recommend stateless.
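
For open question 4, a stateless handle could be as simple as the following Python sketch, which packs a window handle and a UIA RuntimeId (an array of integers) into a URL-safe token that fits on a command line. The field names, the version tag, and the `encode_handle` / `decode_handle` functions are all hypothetical. The token only carries identity: since a RuntimeId can go stale when the tree refreshes, whatever resolves the token against the live tree would still need to fail cleanly when the element is gone.

```python
import base64
import json


def encode_handle(window_handle, runtime_id):
    """Pack a window handle and a RuntimeId (sequence of ints) into a token."""
    payload = json.dumps(
        {"v": 1, "w": int(window_handle), "r": [int(part) for part in runtime_id]},
        separators=(",", ":"),
    )
    token = base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")
    return token.rstrip("=")  # padding is recomputed on decode


def decode_handle(token):
    """Return (window_handle, runtime_id); raise ValueError if malformed."""
    padded = token + "=" * (-len(token) % 4)
    # Bad base64, bad UTF-8, and bad JSON all surface as ValueError subclasses.
    data = json.loads(base64.urlsafe_b64decode(padded))
    if not isinstance(data, dict) or data.get("v") != 1:
        raise ValueError("unrecognized element handle")
    window_handle, runtime_id = data.get("w"), data.get("r")
    if (not isinstance(window_handle, int)
            or not isinstance(runtime_id, list)
            or not all(isinstance(part, int) for part in runtime_id)):
        raise ValueError("malformed element handle")
    return window_handle, tuple(runtime_id)
```

A round trip such as `decode_handle(encode_handle(0x1A2B, [42, 1234, 5]))` returns the original pair, and nothing has to be kept alive between two CLI invocations, which is the main argument for the stateless option.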

Happy to write up the full grammar, parser test corpus, per-proposal implementation notes, and priority/ROI matrix as a follow-up doc / PR once the direction is acknowledged.
