This is great. Let me see the best way to include this.
A2UI v0.9 surface support #117
One of OpenClaw's core innovations is that it moves beyond the text-only interaction of traditional AI assistants. Through Canvas and the A2UI (Agent-to-UI) protocol, AI agents can render and manipulate visual interfaces directly on your device's screen. This marks the evolution of AI from a "conversationalist" into an "executor" and "presenter."
I. Canvas: AI's Dedicated Visual Workspace
Canvas is a visual panel embedded within the OpenClaw client (iOS, Android, macOS). You can think of it as a dedicated "screen" for AI agents.
Nature and Positioning: It is a lightweight visual workspace that supports full HTML, CSS, and JavaScript; on macOS it is rendered with WKWebView. Its design philosophy is to serve as an "Agent-driven visual workspace," so that the AI can show things as well as say them.
Core Functions: Through Canvas, agents can perform various visual operations:
Display Web Pages: Navigate and open any webpage within Canvas's WebView (navigate).
Execute Scripts: Inject and execute JavaScript code (eval) within the context of a loaded page to achieve dynamic interactions, such as modifying styles or auto-filling forms.
Capture State: Capture the current content displayed on Canvas (snapshot) at any time for user reporting or as a visual reference for the next operation.
Control Visibility: Expand (present) or hide (hide) the Canvas panel itself.
Access and Storage: Canvas content is stored in a per-session directory on the local device (e.g., ~/Library/Application Support/OpenClaw/canvas// on macOS) and accessed through a custom URL scheme (openclaw-canvas://), so that content stays on the device and loads locally.
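The five Canvas actions listed above (navigate, eval, snapshot, present, hide) can be pictured as a small set of typed requests. The TypeScript below is only a sketch under stated assumptions: the type names, the field names (url, script), and the rule that eval needs a loaded page are illustrative and are not taken from OpenClaw's actual schema. It models the panel in memory rather than driving a real WebView.

```typescript
// Hypothetical request shapes for the five Canvas actions named in the post.
// Field names are illustrative assumptions, not OpenClaw's real schema.
type CanvasAction =
  | { action: "present" }
  | { action: "hide" }
  | { action: "navigate"; url: string }
  | { action: "eval"; script: string }
  | { action: "snapshot" };

// Minimal in-memory model of the panel so each action's effect is visible.
interface CanvasState {
  visible: boolean;
  url: string | null;
  evaluated: string[]; // scripts are recorded here, not executed
  snapshots: number;   // how many snapshots were requested
}

function initialCanvasState(): CanvasState {
  return { visible: false, url: null, evaluated: [], snapshots: 0 };
}

// Pure reducer: returns the next state without mutating the input.
function applyCanvasAction(state: CanvasState, a: CanvasAction): CanvasState {
  switch (a.action) {
    case "present":
      return { ...state, visible: true };
    case "hide":
      return { ...state, visible: false };
    case "navigate":
      return { ...state, url: a.url };
    case "eval":
      // Assumption: eval only makes sense once a page has been loaded.
      if (state.url === null) {
        throw new Error("eval requires a loaded page");
      }
      return { ...state, evaluated: [...state.evaluated, a.script] };
    case "snapshot":
      return { ...state, snapshots: state.snapshots + 1 };
  }
}

// Usage: present the panel, load a page, inject a script, take a snapshot.
const demoActions: CanvasAction[] = [
  { action: "present" },
  { action: "navigate", url: "https://example.com" },
  { action: "eval", script: "document.body.style.background = 'black'" },
  { action: "snapshot" },
];
let demoState = initialCanvasState();
for (const a of demoActions) {
  demoState = applyCanvasAction(demoState, a);
}
```

Modeling the actions as a discriminated union lets the compiler reject a navigate request without a url or an eval request without a script.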
II. A2UI: Making AI Your Interface Designer
A2UI is a structured protocol that OpenClaw uses to let AI agents generate and push complete interactive user interfaces for direct rendering on Canvas. It is the most powerful capability of the Canvas tool.
Working Principle: Unlike traditional web development, in A2UI mode the AI writes or assembles UI components in real time based on the conversation context and pushes them over the protocol to the user's Canvas for rendering. The results of the user's interaction with the interface can then be fed back to the AI, forming a closed loop. The AI's output is not HTML but structured JSONL frames (one JSON object per line), with each frame describing a UI component (such as text, a button, or a card).
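To make "one JSON object per line" concrete, here is a minimal TypeScript sketch of encoding and decoding JSONL. The UiFrame shape (id, type, props) is a hypothetical stand-in chosen for illustration; the real A2UI v0.8 frame schema is defined by the protocol and is not reproduced here.

```typescript
// Hypothetical frame shape for illustration only; NOT the real A2UI schema.
interface UiFrame {
  id: string;
  type: "text" | "button" | "card";
  props: Record<string, string>;
}

// Serialize frames as JSONL: one compact JSON object per line.
// JSON.stringify escapes newlines inside strings, so each frame stays on one line.
function encodeJsonl(frames: UiFrame[]): string {
  return frames.map((f) => JSON.stringify(f)).join("\n");
}

// Parse JSONL back into frames. Blank lines are skipped; a line that is not
// a JSON object raises an error carrying its 1-based line number.
// Note: this checks only that each line is an object, not its fields.
function decodeJsonl(text: string): UiFrame[] {
  const frames: UiFrame[] = [];
  const lines = text.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    if (line.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch (e) {
      throw new Error(`line ${i + 1}: invalid JSON`);
    }
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      throw new Error(`line ${i + 1}: expected a JSON object`);
    }
    frames.push(parsed as UiFrame);
  }
  return frames;
}

// Usage: two frames become two lines, and decoding restores them.
const demoFrames: UiFrame[] = [
  { id: "title", type: "text", props: { text: "This week's tasks" } },
  { id: "confirm", type: "button", props: { label: "Confirm" } },
];
const wire = encodeJsonl(demoFrames);
const decoded = decodeJsonl(wire);
```

Because every line is independently parseable, a renderer can start processing frames as they arrive instead of waiting for the whole payload.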
Core Commands: The A2UI host engine primarily handles four core commands, forming the basic primitives for AI to control the front-end:
Push: Pushes the generated UI frames to Canvas for rendering.
Reset: Clears all A2UI-rendered content on Canvas, restoring it to a blank state.
Eval: Allows the AI to send a snippet of JavaScript to be executed in a secure sandbox on Canvas, enabling dynamic logic (such as countdowns and animations).
Snapshot: Captures the current state of Canvas and sends it back to the AI, enabling the AI to "see" the results of the interface it pushed. This facilitates debugging and iteration based on visual feedback, forming a true "visual closed loop."
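The four commands above can be read as a tiny state machine on the host side. The following TypeScript is a rough sketch of that idea, not OpenClaw's host engine: the class name, command shapes, and method names are all hypothetical, eval only records the script instead of running it in a sandbox, and snapshot returns the currently rendered frames as a stand-in for a real screen capture.

```typescript
// Hypothetical command shapes for the four primitives (push, reset, eval, snapshot).
type HostCommand =
  | { cmd: "push"; frames: string[] } // each entry is one JSONL line
  | { cmd: "reset" }
  | { cmd: "eval"; script: string }
  | { cmd: "snapshot" };

// Illustrative host model; NOT the actual OpenClaw implementation.
class A2uiHostSketch {
  private rendered: string[] = [];
  private scripts: string[] = [];

  // Returns the snapshot for "snapshot", and null for every other command.
  handle(c: HostCommand): string[] | null {
    switch (c.cmd) {
      case "push":
        this.rendered.push(...c.frames);
        return null;
      case "reset":
        // Clears A2UI-rendered content, leaving a blank surface.
        this.rendered = [];
        return null;
      case "eval":
        // A real host would execute this in a sandbox; here it is only logged.
        this.scripts.push(c.script);
        return null;
      case "snapshot":
        // Stand-in for a screen capture: a copy of what is currently rendered.
        return [...this.rendered];
    }
  }

  // Scripts received so far, exposed for inspection in this sketch.
  evalLog(): string[] {
    return [...this.scripts];
  }
}

// Usage: push a frame, look at the result, then clear the surface.
const host = new A2uiHostSketch();
host.handle({ cmd: "push", frames: ['{"id":"title","type":"text"}'] });
const seen = host.handle({ cmd: "snapshot" });
host.handle({ cmd: "reset" });
```

The push-then-snapshot pair is what closes the loop: the agent can compare what it sees against what it intended and push a corrected interface if they differ.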
Format and Compatibility: OpenClaw's Canvas currently supports the A2UI v0.8 JSONL format. Version 0.9 and its createSurface interface are not yet supported. In practice, users typically only need to describe what they want in natural language; the agent generates the A2UI frames and calls the a2ui_push action itself.
III. Application Scenarios and Value of Canvas & A2UI
Together, Canvas and A2UI enable a range of application scenarios:
Dynamic Information Display: AI can instantly generate data dashboards, charts, to-do lists, or weather cards and push them to your phone or computer screen, making information presentation more intuitive.
Interactive Tasks: Users can interact through AI-generated interfaces, such as filling out forms or clicking buttons to confirm actions. These interaction events can trigger subsequent AI tasks.
Visual Monitoring of Automated Processes: When combined with tools like browser automation, Canvas can be used to display the progress of automated execution or screenshots of key steps, giving users a clear perception of the automation process.
Lowering the Barrier to Entry: Users do not need to write any code. They can obtain customized visual interfaces simply by using natural language instructions (e.g., "Help me generate a dashboard for this week's task progress and push it to my phone's Canvas"), greatly enhancing the practicality and ease of use of AI assistants.