This repository is a community fork of the original UI-TARS-desktop project by ByteDance. All original code, design, and intellectual property belong to ByteDance and its contributors.
English | 简体中文
TARS* is a Multimodal AI Agent stack, currently shipping two projects: Agent TARS and UI-TARS-desktop:
| 🤖 Agent TARS | 🖥️ UI-TARS-desktop |
|---|---|
| agent-tars-book-hotel.mp4 | computer-use-triple-speed.mp4 |
| Agent TARS - A general multimodal AI Agent stack bringing GUI Agent and Vision into your terminal, computer, browser and product. Ships with CLI and Web UI, integrating with MCP tools. | UI-TARS Desktop - A desktop application providing native GUI Agent based on UI-TARS model. Ships local and remote computer + browser operators. |
- [2025-11-05] 🎉 We're excited to announce the release of Agent TARS CLI v0.3.0! This version brings streaming support for multiple tools (shell commands, multi-file structured display), runtime settings with timing statistics for tool calls and deep thinking, and an Event Stream Viewer for data flow tracking and debugging. It also features exclusive support for AIO agent Sandbox as an isolated, all-in-one tool execution environment.
- [2025-06-25] We released Agent TARS Beta and Agent TARS CLI - Introducing Agent TARS Beta, a multimodal AI agent that aims to explore a way of working closer to human-like task completion through rich multimodal capabilities (such as GUI Agent, Vision) and seamless integration with various real-world tools.
- [2025-06-12] - 🎁 We are thrilled to announce the release of UI-TARS Desktop v0.2.0! This update introduces two powerful new features: Remote Computer Operator and Remote Browser Operator—both completely free. No configuration required: simply click to remotely control any computer or browser, and experience a new level of convenience and intelligence.
- [2025-04-17] - 🎉 We're thrilled to announce the release of the new UI-TARS Desktop application v0.1.0, featuring a redesigned Agent UI. The application enhances the computer-use experience, introduces new browser operation features, and supports the advanced UI-TARS-1.5 model for improved performance and precise control.
- [2025-02-20] - 📦 Introduced UI-TARS SDK, a powerful cross-platform toolkit for building GUI automation agents.
- [2025-01-23] - 🚀 We updated the Cloud Deployment section of the Chinese-language GUI Model Deployment Tutorial (中文版: GUI模型部署教程) with new information about the ModelScope platform, which you can now use for deployment.
Agent TARS is a general multimodal AI Agent stack that brings the power of GUI Agent and Vision into your terminal, computer, browser and product.
It primarily ships with a CLI and Web UI for usage.
It aims to provide a workflow that is closer to human-like task completion through cutting-edge multimodal LLMs and seamless integration with various real-world MCP tools.
Please help me book the earliest flight from San Jose to New York on September 1st and the last return flight on September 6th on Priceline
agent-tars-new-flight.mp4
| Booking Hotel | Generate Chart with extra MCP Servers |
|---|---|
| agent-tars-book-hotel.mp4 | mcp-chart.mp4 |
| Instruction: I am in Los Angeles from September 1st to September 6th, with a budget of $5,000. Please help me book a Ritz-Carlton hotel closest to the airport on booking.com and compile a transportation guide for me | Instruction: Draw me a chart of Hangzhou's weather for one month |
For more use cases, please check out #842.
- 🖱️ One-Click Out-of-the-box CLI - Supports both headful (Web UI) and headless (server) execution.
- 🌐 Hybrid Browser Agent - Control browsers using GUI Agent, DOM, or a hybrid strategy.
- 🔄 Event Stream - Protocol-driven Event Stream drives Context Engineering and Agent UI.
- 🧰 MCP Integration - The kernel is built on MCP and also supports mounting MCP Servers to connect to real-world tools.
```bash
# Launch with `npx`.
npx @agent-tars/cli@latest

# Install globally (requires Node.js >= 22)
npm install @agent-tars/cli@latest -g

# Run with your preferred model provider
agent-tars --provider volcengine --model doubao-1-5-thinking-vision-pro-250428 --apiKey your-api-key
agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key
```

Visit the comprehensive Quick Start guide for detailed setup instructions.
🌟 Explore Agent TARS Universe 🌟
UI-TARS Desktop is a native GUI agent for your local computer, driven by UI-TARS and Seed-1.5-VL/1.6 series models.
📑 Paper | 🤗 Hugging Face Models | 🫨 Discord | 🤖 ModelScope
🖥️ Desktop Application | 👓 Midscene (use in browser)
| Instruction | Local Operator | Remote Operator |
|---|---|---|
| Please help me open the autosave feature of VS Code and delay AutoSave operations for 500 milliseconds in the VS Code setting. | computer-use-triple-speed.mp4 | remote-computer-operators.mp4 |
| Could you help me check the latest open issue of the UI-TARS-Desktop project on GitHub? | browser-use-triple-speed.mp4 | remote-browser-operators.mp4 |
- 🤖 Natural language control powered by Vision-Language Model
- 🖥️ Screenshot and visual recognition support
- 🎯 Precise mouse and keyboard control
- 💻 Cross-platform support (Windows/macOS/Browser)
- 🔄 Real-time feedback and status display
- 🔐 Private and secure - fully local processing
See Quick Start
This fork includes the following enhancements built on top of the original project. Every change is traceable to specific files.
A complete chat page with multi-turn conversation, streaming responses, multi-mode switching, and image/document upload.
New files created
| File | Description |
|---|---|
| `apps/ui-tars/src/renderer/src/pages/chat/index.tsx` | Main chat page: message list, input box, mode selector, image & document upload, streaming display |
| `apps/ui-tars/src/renderer/src/pages/chat/message-actions.tsx` | `MessageActionBar` component below each assistant message |
| `apps/ui-tars/src/renderer/src/pages/chat/mode-config.ts` | 10+ chat mode definitions (Translation, Code, Creative Writing, etc.) with bilingual system prompts |
| `apps/ui-tars/src/renderer/src/hooks/useGlobalShortcuts.ts` | IPC listener for global shortcuts dispatched from main process |
Key implementation details:
- Session persistence: Messages stored to `localStorage` (key: `ui-tars-chat-<sessionId>`) and auto-restored on navigation. Sessions created in IndexedDB via `sessionManager` on first user message so they appear in the sidebar history.
- Streaming: Listens to `system:chatCompletion:chunk` IPC events for real-time token streaming.
- Mode system: Each mode injects a `system` prompt into the API call. Modes include sub-categories with quick-use templates.
- Image upload: `FileReader` → base64, attached to the API request; `imageBase64` is stripped before `localStorage` save to prevent quota overflow (see Bug Fix #3).
- Document analysis: Uses `api['system:analyzeDocument']` with automatic VLM fallback if text extraction fails.
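
The session persistence described above might look roughly like the sketch below. It is illustrative rather than the fork's actual code: the `KeyValueStore` interface, the `ChatMessage` shape (beyond `imageBase64`) and `loadSessionMessages` are assumptions, and the store is injected so the snippet also runs outside a browser.

```typescript
// Minimal subset of the Web Storage API; the browser's `localStorage` satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical message shape; only `imageBase64` is named in this README.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
  imageBase64?: string;
}

const sessionKey = (sessionId: string): string => `ui-tars-chat-${sessionId}`;

// Persist a session without the (potentially multi-megabyte) image payloads.
function saveSessionMessages(
  store: KeyValueStore,
  sessionId: string,
  msgs: ChatMessage[],
): void {
  const slim = msgs.map(({ imageBase64, ...rest }) => rest);
  store.setItem(sessionKey(sessionId), JSON.stringify(slim));
}

// Restore a session; a missing or corrupt entry yields an empty conversation.
function loadSessionMessages(store: KeyValueStore, sessionId: string): ChatMessage[] {
  const raw = store.getItem(sessionKey(sessionId));
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as ChatMessage[]) : [];
  } catch {
    return [];
  }
}
```

In the renderer, `localStorage` can be passed directly as the `store` argument.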
A toolbar rendered below every assistant message with 8 actions:
| Action | Detail |
|---|---|
| Copy | Dropdown: "Copy as Markdown" (raw) or "Copy as Plain Text" (regex strips #, **, backticks, links) |
| Like / Dislike | Toggle persistent reactions stored in localStorage (key: ui-tars-msg-reactions) |
| Regenerate | Drops trailing assistant messages, re-sends from last user message (see Bug Fix #1) |
| Edit as Document | Opens content in a new browser window with a minimal rich-text editor |
| Share | Uses navigator.share API (falls back to clipboard copy) |
| Delete | Removes a specific message from the conversation |
| Report | Flags the message with a "Reported, thank you!" toast |
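
As a rough illustration of the "Copy as Plain Text" action, the regex stripping could be sketched as follows. The function name and the exact patterns are assumptions, not the fork's implementation, and the approach is an approximation rather than a full Markdown parser (for example, a language tag after a code fence is left in place).

```typescript
// Convert Markdown to plain text with a few regex passes.
function markdownToPlainText(markdown: string): string {
  return markdown
    .replace(/^#{1,6}\s+/gm, "")              // heading markers at the start of a line
    .replace(/\*\*(.+?)\*\*/g, "$1")          // bold markers, keeping the inner text
    .replace(/`{1,3}/g, "")                   // inline and fenced code backticks
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1"); // links: keep the label, drop the URL
}
```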
Registered in the main process (apps/ui-tars/src/main/tray.ts) via Electron globalShortcut:
| Shortcut | Action | Implementation |
|---|---|---|
| `Ctrl+Shift+T` | Show / Hide main window | Toggle visibility in main process |
| `Ctrl+Shift+N` | New Chat | Sends `shortcut:newChat` IPC → renderer navigates to `/chat` with `{ newChat: true }` state, preserving existing session |
| `Ctrl+Shift+,` | Open Settings | Sends `shortcut:openSettings` IPC → renderer calls `openSettings()` from Zustand store |
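
A minimal sketch of how these shortcuts could be wired is shown below. It does not reproduce `tray.ts`: `ShortcutRegistrar` is a structural subset of Electron's `globalShortcut` (whose `register` returns `false` when the binding fails), while `MainWindowLike` and `registerGlobalShortcuts` are hypothetical names introduced so the logic can run without Electron.

```typescript
// Structural subset of Electron's `globalShortcut`; pass the real module in the main process.
interface ShortcutRegistrar {
  register(accelerator: string, callback: () => void): boolean;
}

// Hypothetical window facade. With a real BrowserWindow, `send` would wrap
// `win.webContents.send(channel)`.
interface MainWindowLike {
  isVisible(): boolean;
  show(): void;
  hide(): void;
  send(channel: string): void;
}

// Registers the three shortcuts and returns the accelerators that could not be bound,
// for example because another application already owns the combination.
function registerGlobalShortcuts(
  registrar: ShortcutRegistrar,
  win: MainWindowLike,
): string[] {
  const bindings: Array<[string, () => void]> = [
    ["Ctrl+Shift+T", () => (win.isVisible() ? win.hide() : win.show())],
    ["Ctrl+Shift+N", () => win.send("shortcut:newChat")],
    ["Ctrl+Shift+,", () => win.send("shortcut:openSettings")],
  ];
  return bindings
    .filter(([accelerator, handler]) => !registrar.register(accelerator, handler))
    .map(([accelerator]) => accelerator);
}
```

Returning the failed accelerators lets the caller log or surface conflicts instead of failing silently.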
A settings panel (apps/ui-tars/src/renderer/src/components/Settings/category/optimization.tsx) managing 6 feature flags:
| Flag | Risk | Description |
|---|---|---|
| `enableRetry` | 🟢 Low | Smart retry with intelligent backoff for failed critical actions |
| `enablePerformanceMonitor` | 🟢 Low | Track operation timings, generate reports, detect slow operations |
| `enableVisualization` | 🟢 Low | Expose IPC endpoints for performance dashboards |
| `enableOCR` | 🟡 Medium | OCR text recognition via Tesseract.js |
| `enableMultiModel` | 🟡 Medium | Intelligent model selection, failover, performance-based routing |
| `enableWorkflow` | 🟡 Medium | Record, replay, and manage automated task workflows |
- Low-risk flags are enabled by default; medium-risk flags require an `AlertDialog` confirmation before enabling.
- Persisted to `localStorage` (key: `ui-tars-optimization-flags`).
- Backend config class in `apps/ui-tars/src/main/services/optimizationConfig.ts` with `enableAll()`, `disableAll()`, `enableSafe()` helpers.
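
The sketch below shows one way such a config class could be shaped. Only the flag names and the three helper names come from this README; the class name, `isEnabled`, `serialize`, and the reading of `enableSafe()` as "low-risk flags only" are assumptions.

```typescript
type OptimizationFlag =
  | "enableRetry"
  | "enablePerformanceMonitor"
  | "enableVisualization"
  | "enableOCR"
  | "enableMultiModel"
  | "enableWorkflow";

type OptimizationFlags = Record<OptimizationFlag, boolean>;

const LOW_RISK: OptimizationFlag[] = ["enableRetry", "enablePerformanceMonitor", "enableVisualization"];
const MEDIUM_RISK: OptimizationFlag[] = ["enableOCR", "enableMultiModel", "enableWorkflow"];

// Build a full flag set from one value per risk tier.
function buildFlags(lowRisk: boolean, mediumRisk: boolean): OptimizationFlags {
  const flags = {} as OptimizationFlags;
  for (const flag of LOW_RISK) flags[flag] = lowRisk;
  for (const flag of MEDIUM_RISK) flags[flag] = mediumRisk;
  return flags;
}

class OptimizationConfig {
  // Defaults mirror the table: low-risk on, medium-risk off.
  private flags: OptimizationFlags = buildFlags(true, false);

  enableAll(): void { this.flags = buildFlags(true, true); }
  disableAll(): void { this.flags = buildFlags(false, false); }
  // Assumption: "safe" means the low-risk tier only.
  enableSafe(): void { this.flags = buildFlags(true, false); }

  isEnabled(flag: OptimizationFlag): boolean { return this.flags[flag]; }

  // JSON form, suitable for a store such as the `ui-tars-optimization-flags` key.
  serialize(): string { return JSON.stringify(this.flags); }
}
```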
Bug #1 — handleRegenerate race condition & duplicate messages
- File: `apps/ui-tars/src/renderer/src/pages/chat/index.tsx` (lines 384–441)
- Problem: The original implementation called `setMessages()` (async) to remove the last assistant message, then immediately called `doSend()`. Because `doSend` captured `messages` from its closure (stale value), the API received the old conversation history including the deleted reply, and a duplicate user message was appended.
- Fix: Rewrote `handleRegenerate` as a self-contained async function that: ① synchronously computes `keptMessages = messages.slice(0, lastUserIdx + 1)`, ② sets state with the truncated array, ③ calls the API directly with `keptMessages`, so no stale closure is involved.
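
The three steps of the fix can be sketched as below. This is a simplified stand-in for `handleRegenerate`, not the code at the cited lines: `truncateForRegenerate`, `regenerate`, the `ChatMessage` shape and the `callApi` parameter are hypothetical, and streaming is omitted.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Step 1: keep the history up to and including the last user message.
// Returns null when there is no user message to regenerate from.
function truncateForRegenerate(messages: ChatMessage[]): ChatMessage[] | null {
  let lastUserIdx = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "user") {
      lastUserIdx = i;
      break;
    }
  }
  return lastUserIdx === -1 ? null : messages.slice(0, lastUserIdx + 1);
}

// Steps 2 and 3: set state with the truncated array, then call the API with that
// same local array instead of re-reading state through a closure.
async function regenerate(
  messages: ChatMessage[],
  setMessages: (next: ChatMessage[]) => void,
  callApi: (history: ChatMessage[]) => Promise<string>,
): Promise<void> {
  const keptMessages = truncateForRegenerate(messages);
  if (keptMessages === null) return;
  setMessages(keptMessages);
  const reply = await callApi(keptMessages);
  setMessages([...keptMessages, { role: "assistant", content: reply }]);
}
```

Because `keptMessages` is a local value, the request cannot pick up the deleted reply regardless of when the state update is applied.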
Bug #2 — Orphaned localStorage data on session deletion
- File: `apps/ui-tars/src/renderer/src/components/SideBar/app-sidebar.tsx` (lines 151–164)
- Problem: When a session was deleted from the sidebar, `sessionManager.deleteSession()` cleaned IndexedDB, but the corresponding `localStorage` entry (`ui-tars-chat-<sessionId>`) and the `ui-tars-chat-current-session` marker were never removed, leading to unbounded data accumulation.
- Fix: Added `localStorage.removeItem()` calls for both keys inside `onSessionDelete`.
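
A minimal sketch of that cleanup follows. The helper name and the injected `RemovableStore` are assumptions, and so is the guard on the marker: this sketch clears `ui-tars-chat-current-session` only when it points at the deleted session, which presumes the marker stores a session id.

```typescript
// Minimal subset of the Web Storage API; the browser's `localStorage` satisfies it.
interface RemovableStore {
  getItem(key: string): string | null;
  removeItem(key: string): void;
}

const CURRENT_SESSION_KEY = "ui-tars-chat-current-session";

// Remove the per-session entry and, if it refers to this session, the current-session marker.
function clearSessionStorage(store: RemovableStore, sessionId: string): void {
  store.removeItem(`ui-tars-chat-${sessionId}`);
  // Assumption: the marker holds the active session id.
  if (store.getItem(CURRENT_SESSION_KEY) === sessionId) {
    store.removeItem(CURRENT_SESSION_KEY);
  }
}
```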
Bug #3 — localStorage quota exceeded by imageBase64
- File: `apps/ui-tars/src/renderer/src/pages/chat/index.tsx` (lines 66–72)
- Problem: `ChatMessage` objects containing `imageBase64` (hundreds of KB to several MB per image) were serialized directly into `localStorage`, easily exceeding the 5–10 MB browser quota and silently failing.
- Fix: `saveSessionMessages()` now strips the `imageBase64` field via destructuring before `JSON.stringify`: `msgs.map(({ imageBase64, ...rest }) => rest)`.
| Category | Files | Description |
|---|---|---|
| Build scripts | `scripts/build/build-quick.bat`, `scripts/build/build-windows.bat` | Quick build (skip type checks) and full Windows packaging with environment validation |
| Dev scripts | `scripts/dev/start-secure.bat` | Secure startup with security audit, Node.js/pnpm checks, and `.env.local` validation |
| Test scripts | `scripts/test/test-integration.bat` | Integration test runner with TypeScript compilation |
| Release scripts | `scripts/release/release-*.sh` | Beta and release package publishing (moved from root) |
| Security | `docs/build-and-security/SECURITY_SETUP.zh-CN.md`, `docs/build-and-security/README.SECURITY.md` | Security configuration guides |
| Documentation | `docs/README.md`, `docs/development/`, `docs/optimization/`, `docs/testing/`, `docs/build-and-security/`, `docs/community/` | All `.md` files reorganized into categorized subfolders with an index |
📂 For week-by-week implementation records, see the docs/development/ folder.
We would like to express our sincere gratitude to:
| Credit | |
|---|---|
| 🏢 ByteDance & Seed Team | Creating and open-sourcing UI-TARS-desktop under Apache 2.0 |
| 🧠 UI-TARS Model Team | The groundbreaking vision-language model powering GUI automation |
| 👥 Original Contributors | Building the Agent TARS ecosystem — CLI, Web UI, SDK, desktop app |
| 🌍 Open-Source Community | Feedback, bug reports, and feature suggestions |
This fork would not exist without the excellent foundation provided by ByteDance's engineering team. We are deeply grateful for their contributions to the AI agent community.
See CONTRIBUTING.md.
This project is licensed under the Apache License 2.0.
If you find the original paper and code useful in your research, please give a ⭐ to the original project and cite the original authors' work:
BibTeX
```bibtex
@article{qin2025ui,
  title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
  author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
  journal={arXiv preprint arXiv:2501.12326},
  year={2025}
}
```

Built with ❤️ on top of ByteDance's UI-TARS-desktop
