sjkncs/UI-TARS-desktop

⚠️ Fork Notice

This repository is a community fork of the original UI-TARS-desktop project by ByteDance. All original code, design, and intellectual property belong to ByteDance and its contributors.

| | Link |
|---|---|
| 🔗 Original Project | github.com/bytedance/UI-TARS-desktop |
| 🍴 This Fork | github.com/sjkncs/UI-TARS-desktop |

English | 简体中文


📖 Introduction

TARS is a multimodal AI agent stack that currently ships two projects, Agent TARS and UI-TARS-desktop:

| 🤖 Agent TARS | 🖥️ UI-TARS-desktop |
|---|---|
| agent-tars-book-hotel.mp4 | computer-use-triple-speed.mp4 |
| Agent TARS: a general multimodal AI agent stack that brings GUI Agent and Vision into your terminal, computer, browser, and product. Ships with a CLI and Web UI and integrates with MCP tools. | UI-TARS Desktop: a desktop application that provides a native GUI agent based on the UI-TARS model. Ships with local and remote computer and browser operators. |

📰 News

  • [2025-11-05] 🎉 We're excited to announce the release of Agent TARS CLI v0.3.0! This version brings streaming support for multiple tools (shell commands, multi-file structured display), runtime settings with timing statistics for tool calls and deep thinking, and an Event Stream Viewer for data flow tracking and debugging. It also adds exclusive support for AIO agent Sandbox as an isolated all-in-one tool execution environment.
  • [2025-06-25] We released Agent TARS Beta and Agent TARS CLI - Introducing Agent TARS Beta, a multimodal AI agent that aims to explore a work form closer to human-like task completion through rich multimodal capabilities (such as GUI Agent and Vision) and seamless integration with various real-world tools.
  • [2025-06-12] - 🎁 We are thrilled to announce the release of UI-TARS Desktop v0.2.0! This update introduces two powerful new features, Remote Computer Operator and Remote Browser Operator, both completely free. No configuration is required: simply click to remotely control any computer or browser.
  • [2025-04-17] - 🎉 We're thrilled to announce the release of the new UI-TARS Desktop application v0.1.0, featuring a redesigned Agent UI. The application improves the computer-use experience, introduces new browser operation features, and supports the advanced UI-TARS-1.5 model for improved performance and precise control.
  • [2025-02-20] - 📦 Introduced the UI-TARS SDK, a powerful cross-platform toolkit for building GUI automation agents.
  • [2025-01-23] - 🚀 We updated the Cloud Deployment section of the Chinese-language GUI Model Deployment Tutorial (中文版: GUI模型部署教程) with new information about the ModelScope platform. You can now use ModelScope for deployment.

🤖 Agent TARS

Agent TARS is a general multimodal AI agent stack that brings the power of GUI Agent and Vision into your terminal, computer, browser, and product.

It primarily ships with a CLI and Web UI for usage. It aims to provide a workflow that is closer to human-like task completion through cutting-edge multimodal LLMs and seamless integration with various real-world MCP tools.

Showcase

Please help me book the earliest flight from San Jose to New York on September 1st and the last return flight on September 6th on Priceline
agent-tars-new-flight.mp4

| Booking Hotel | Generate Chart with extra MCP Servers |
|---|---|
| agent-tars-book-hotel.mp4 | mcp-chart.mp4 |
| Instruction: I am in Los Angeles from September 1st to September 6th, with a budget of $5,000. Please help me book a Ritz-Carlton hotel closest to the airport on booking.com and compile a transportation guide for me | Instruction: Draw me a chart of Hangzhou's weather for one month |

For more use cases, please check out #842.

Core Features

  • 🖱️ One-Click Out-of-the-box CLI - Supports both headful (Web UI) and headless (server) execution.
  • 🌐 Hybrid Browser Agent - Control browsers using GUI Agent, DOM, or a hybrid strategy.
  • 🔄 Event Stream - Protocol-driven Event Stream drives Context Engineering and Agent UI.
  • 🧰 MCP Integration - The kernel is built on MCP and also supports mounting MCP Servers to connect to real-world tools.

Quick Start

Agent TARS CLI

# Launch with `npx`.
npx @agent-tars/cli@latest

# Install globally (requires Node.js >= 22)
npm install @agent-tars/cli@latest -g

# Run with your preferred model provider
agent-tars --provider volcengine --model doubao-1-5-thinking-vision-pro-250428 --apiKey your-api-key
agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key

Visit the comprehensive Quick Start guide for detailed setup instructions.

Documentation

🌟 Explore Agent TARS Universe 🌟

| Category | Resource Link | Description |
|---|---|---|
| 🏠 Central Hub | Website | Your gateway to the Agent TARS ecosystem |
| 📚 Quick Start | Quick Start | Zero to hero in 5 minutes |
| 🚀 What's New | Blog | Discover cutting-edge features & vision |
| 🛠️ Developer Zone | Docs | Master every command & feature |
| 🎯 Showcase | Examples | View use cases built by the official team and the community |
| 🔧 Reference | API | Complete technical reference |




🖥️ UI-TARS Desktop

UI-TARS Desktop is a native GUI agent for your local computer, driven by UI-TARS and Seed-1.5-VL/1.6 series models.

📑 Paper | 🤗 Hugging Face Models | 🫨 Discord | 🤖 ModelScope
🖥️ Desktop Application | 👓 Midscene (use in browser)

Showcase

| Instruction | Local Operator | Remote Operator |
|---|---|---|
| Please help me open the autosave feature of VS Code and delay AutoSave operations for 500 milliseconds in the VS Code setting. | computer-use-triple-speed.mp4 | remote-computer-operators.mp4 |
| Could you help me check the latest open issue of the UI-TARS-Desktop project on GitHub? | browser-use-triple-speed.mp4 | remote-browser-operators.mp4 |

Features

  • 🤖 Natural language control powered by Vision-Language Model
  • 🖥️ Screenshot and visual recognition support
  • 🎯 Precise mouse and keyboard control
  • 💻 Cross-platform support (Windows/macOS/Browser)
  • 🔄 Real-time feedback and status display
  • 🔐 Private and secure - fully local processing

Quick Start

See Quick Start


🔧 Our Modifications

This fork includes the following enhancements built on top of the original project. Every change is traceable to specific files.

💬 1. Chat System (New Feature)

A complete chat page with multi-turn conversation, streaming responses, multi-mode switching, and image/document upload.

New files created
| File | Description |
|---|---|
| apps/ui-tars/src/renderer/src/pages/chat/index.tsx | Main chat page: message list, input box, mode selector, image & document upload, streaming display |
| apps/ui-tars/src/renderer/src/pages/chat/message-actions.tsx | MessageActionBar component below each assistant message |
| apps/ui-tars/src/renderer/src/pages/chat/mode-config.ts | 10+ chat mode definitions (Translation, Code, Creative Writing, etc.) with bilingual system prompts |
| apps/ui-tars/src/renderer/src/hooks/useGlobalShortcuts.ts | IPC listener for global shortcuts dispatched from main process |

Key implementation details:

  • Session persistence — Messages stored to localStorage (key: ui-tars-chat-<sessionId>) and auto-restored on navigation. Sessions created in IndexedDB via sessionManager on first user message so they appear in the sidebar history.
  • Streaming — Listens to system:chatCompletion:chunk IPC events for real-time token streaming.
  • Mode system — Each mode injects a system prompt into the API call. Modes include sub-categories with quick-use templates.
  • Image upload - FileReader → base64, attached to the API request; imageBase64 is stripped before localStorage save to prevent quota overflow (see Bug Fix #3).
  • Document analysis — Uses api['system:analyzeDocument'] with automatic VLM fallback if text extraction fails.
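
The mode system described above can be sketched as a pure function that prepends the active mode's system prompt to the conversation before the API call. This is only an illustration: the `ChatMode` and `ApiMessage` shapes, the `buildRequestMessages` name, and the sample prompts are assumptions, not the fork's actual definitions in mode-config.ts.

```typescript
// Hypothetical shapes; the fork's real types in mode-config.ts may differ.
type Role = "system" | "user" | "assistant";

interface ApiMessage {
  role: Role;
  content: string;
}

interface ChatMode {
  id: string;
  // Bilingual system prompts, keyed by UI language.
  systemPrompt: { en: string; zh: string };
}

// Prepend the active mode's system prompt to the conversation history.
function buildRequestMessages(
  mode: ChatMode,
  history: ApiMessage[],
  lang: "en" | "zh" = "en",
): ApiMessage[] {
  // Drop any earlier system message so switching modes never stacks prompts.
  const withoutSystem = history.filter((m) => m.role !== "system");
  const system: ApiMessage = { role: "system", content: mode.systemPrompt[lang] };
  return [system, ...withoutSystem];
}

// Example mode with made-up prompts.
const translationMode: ChatMode = {
  id: "translation",
  systemPrompt: {
    en: "You are a professional translator.",
    zh: "你是一名专业翻译。",
  },
};

const request = buildRequestMessages(translationMode, [
  { role: "user", content: "Translate 'hello' into French." },
]);
```

Keeping the prompt out of the stored history and injecting it at request time means a mode switch takes effect on the next message without rewriting saved sessions.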

🎛️ 2. Message Action Bar (New Feature)

A toolbar rendered below every assistant message with 8 actions:

| Action | Detail |
|---|---|
| Copy | Dropdown: "Copy as Markdown" (raw) or "Copy as Plain Text" (regex strips `#`, `**`, backticks, links) |
| Like / Dislike | Toggle persistent reactions stored in localStorage (key: ui-tars-msg-reactions) |
| Regenerate | Drops trailing assistant messages, re-sends from last user message (see Bug Fix #1) |
| Edit as Document | Opens content in a new browser window with a minimal rich-text editor |
| Share | Uses navigator.share API (falls back to clipboard copy) |
| Delete | Removes a specific message from the conversation |
| Report | Flags the message with a "Reported, thank you!" toast |
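
As a rough illustration of the "Copy as Plain Text" action, the sketch below strips the Markdown markers listed in the table with a chain of regular expressions. The function name `markdownToPlainText` and the exact patterns are assumptions; the fork's real regexes are not shown here and may handle more cases.

```typescript
// Hypothetical helper; patterns are illustrative, not the fork's actual regexes.
function markdownToPlainText(markdown: string): string {
  return markdown
    .replace(/`{3}[^\n]*\n?/g, "") // code-fence marker lines (the code itself is kept)
    .replace(/^#{1,6}\s+/gm, "") // heading markers at the start of a line
    .replace(/\*\*(.+?)\*\*/g, "$1") // bold markers
    .replace(/`([^`]*)`/g, "$1") // inline-code backticks
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1") // links: keep only the label
    .trim();
}

const sample = "# Title\n\nSome **bold** text, `code`, and a [link](https://example.com).";
const plainText = markdownToPlainText(sample);
// plainText === "Title\n\nSome bold text, code, and a link."
```

A regex pass like this is cheap and good enough for clipboard text, but it is not a Markdown parser: nested emphasis, tables, and images would need extra rules.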

⌨️ 3. Global Shortcuts (New Feature)

Registered in the main process (apps/ui-tars/src/main/tray.ts) via Electron globalShortcut:

| Shortcut | Action | Implementation |
|---|---|---|
| Ctrl+Shift+T | Show / Hide main window | Toggle visibility in main process |
| Ctrl+Shift+N | New Chat | Sends shortcut:newChat IPC → renderer navigates to /chat with { newChat: true } state, preserving existing session |
| Ctrl+Shift+, | Open Settings | Sends shortcut:openSettings IPC → renderer calls openSettings() from Zustand store |
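
A minimal sketch of how the three bindings might be wired up follows. To keep it runnable without Electron, the registration function is injected; its shape mirrors Electron's `globalShortcut.register(accelerator, callback)`, which returns a boolean. The `registerGlobalShortcuts` name and the `ShortcutActions` interface are hypothetical and are not taken from tray.ts.

```typescript
// Same shape as Electron's globalShortcut.register(accelerator, callback).
type RegisterFn = (accelerator: string, callback: () => void) => boolean;

// Hypothetical hooks into the main process.
interface ShortcutActions {
  toggleWindow: () => void; // show or hide the main window
  send: (channel: string) => void; // forward an IPC channel name to the renderer
}

// Registers all shortcuts and returns the accelerators that could not be registered.
function registerGlobalShortcuts(register: RegisterFn, actions: ShortcutActions): string[] {
  const bindings: Array<[string, () => void]> = [
    ["Ctrl+Shift+T", actions.toggleWindow],
    ["Ctrl+Shift+N", () => actions.send("shortcut:newChat")],
    ["Ctrl+Shift+,", () => actions.send("shortcut:openSettings")],
  ];
  const failed: string[] = [];
  for (const [accelerator, callback] of bindings) {
    // register() returns false when the shortcut could not be registered.
    if (!register(accelerator, callback)) failed.push(accelerator);
  }
  return failed;
}
```

In an Electron main process this would presumably be called once the app is ready, passing `globalShortcut.register.bind(globalShortcut)` as `register` and a `send` that forwards to the main window's `webContents.send`. Returning the failed accelerators lets the caller log or surface conflicts with other applications.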

⚙️ 4. Optimization Settings (New Feature)

A settings panel (apps/ui-tars/src/renderer/src/components/Settings/category/optimization.tsx) managing 6 feature flags:

| Flag | Risk | Description |
|---|---|---|
| enableRetry | 🟢 Low | Smart retry with intelligent backoff for failed critical actions |
| enablePerformanceMonitor | 🟢 Low | Track operation timings, generate reports, detect slow operations |
| enableVisualization | 🟢 Low | Expose IPC endpoints for performance dashboards |
| enableOCR | 🟡 Medium | OCR text recognition via Tesseract.js |
| enableMultiModel | 🟡 Medium | Intelligent model selection, failover, performance-based routing |
| enableWorkflow | 🟡 Medium | Record, replay, and manage automated task workflows |

  • Low-risk flags enabled by default; medium-risk flags require an AlertDialog confirmation before enabling.
  • Persisted to localStorage (key: ui-tars-optimization-flags).
  • Backend config class in apps/ui-tars/src/main/services/optimizationConfig.ts with enableAll(), disableAll(), enableSafe() helpers.
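
The flag handling above could look roughly like the following class. Only the flag names and the `enableAll()`, `disableAll()`, and `enableSafe()` helper names come from this README; the class layout, the `isEnabled()` and `serialize()` methods, and the reading of `enableSafe()` as "low-risk on, medium-risk off" are assumptions for illustration.

```typescript
type FlagName =
  | "enableRetry"
  | "enablePerformanceMonitor"
  | "enableVisualization"
  | "enableOCR"
  | "enableMultiModel"
  | "enableWorkflow";

const LOW_RISK: FlagName[] = ["enableRetry", "enablePerformanceMonitor", "enableVisualization"];
const MEDIUM_RISK: FlagName[] = ["enableOCR", "enableMultiModel", "enableWorkflow"];

// Hypothetical sketch; not the fork's actual optimizationConfig.ts.
class OptimizationConfig {
  private flags: Record<FlagName, boolean> = {
    enableRetry: false,
    enablePerformanceMonitor: false,
    enableVisualization: false,
    enableOCR: false,
    enableMultiModel: false,
    enableWorkflow: false,
  };

  constructor() {
    // Low-risk flags are on by default, matching the settings panel.
    this.enableSafe();
  }

  private setMany(names: FlagName[], value: boolean): void {
    for (const name of names) this.flags[name] = value;
  }

  enableAll(): void {
    this.setMany(LOW_RISK.concat(MEDIUM_RISK), true);
  }

  disableAll(): void {
    this.setMany(LOW_RISK.concat(MEDIUM_RISK), false);
  }

  // Assumed meaning: enable low-risk flags only and turn medium-risk flags off.
  enableSafe(): void {
    this.setMany(LOW_RISK, true);
    this.setMany(MEDIUM_RISK, false);
  }

  isEnabled(name: FlagName): boolean {
    return this.flags[name];
  }

  // JSON string suitable for a store such as localStorage.
  serialize(): string {
    return JSON.stringify(this.flags);
  }
}
```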

🐛 5. Bug Fixes

Bug #1 — handleRegenerate race condition & duplicate messages
  • File: apps/ui-tars/src/renderer/src/pages/chat/index.tsx (lines 384–441)
  • Problem: The original implementation called setMessages() (async) to remove the last assistant message, then immediately called doSend(). Because doSend captured messages from its closure (stale value), the API received the old conversation history including the deleted reply, and a duplicate user message was appended.
  • Fix: Rewrote handleRegenerate as a self-contained async function that: ① synchronously computes keptMessages = messages.slice(0, lastUserIdx + 1), ② sets state with the truncated array, ③ calls the API directly with keptMessages — no closure staleness.
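
The core of this fix can be sketched as below: compute the truncated history once, then hand the same array to both the state setter and the API call. This is a simplified, synchronous illustration; the `ChatMessage` shape, the `keepThroughLastUser` helper, and the `sendToApi` callback are assumptions, and the real function is async and also handles the streamed reply.

```typescript
// Hypothetical message shape for illustration.
interface ChatMessage {
  id: string;
  role: "user" | "assistant";
  content: string;
}

// Keep everything up to and including the last user message.
function keepThroughLastUser(messages: ChatMessage[]): ChatMessage[] {
  let lastUserIdx = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "user") {
      lastUserIdx = i;
      break;
    }
  }
  return messages.slice(0, lastUserIdx + 1);
}

function handleRegenerate(
  messages: ChatMessage[],
  setMessages: (next: ChatMessage[]) => void,
  sendToApi: (history: ChatMessage[]) => void,
): void {
  const keptMessages = keepThroughLastUser(messages);
  if (keptMessages.length === 0) return; // no user message to regenerate from
  setMessages(keptMessages);
  // Pass the locally computed array, never a value re-read from component state.
  sendToApi(keptMessages);
}
```

Because `keptMessages` is a local value, the request no longer depends on when the state update is applied, and no extra user message is appended.
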
Bug #2 — Orphaned localStorage data on session deletion
  • File: apps/ui-tars/src/renderer/src/components/SideBar/app-sidebar.tsx (lines 151–164)
  • Problem: When a session was deleted from the sidebar, sessionManager.deleteSession() cleaned IndexedDB, but the corresponding localStorage entry (ui-tars-chat-<sessionId>) and the ui-tars-chat-current-session marker were never removed, leading to unbounded data accumulation.
  • Fix: Added localStorage.removeItem() calls for both keys inside onSessionDelete.
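
A minimal sketch of the cleanup step, under stated assumptions: the `KeyValueStore` interface is a subset of the Web Storage API (so `window.localStorage` could be passed in), the `removeSessionStorage` name is hypothetical, and the guard that clears the current-session marker only when it points at the deleted session is an addition of this sketch that assumes the marker stores a session ID. The fix as described simply removes both keys.

```typescript
// Subset of the Web Storage API used by this sketch.
interface KeyValueStore {
  getItem(key: string): string | null;
  removeItem(key: string): void;
}

const CURRENT_SESSION_KEY = "ui-tars-chat-current-session";

// Hypothetical helper called from the sidebar's delete handler.
function removeSessionStorage(store: KeyValueStore, sessionId: string): void {
  // Remove the per-session message cache.
  store.removeItem("ui-tars-chat-" + sessionId);
  // Assumption: the marker holds the active session's ID.
  if (store.getItem(CURRENT_SESSION_KEY) === sessionId) {
    store.removeItem(CURRENT_SESSION_KEY);
  }
}
```
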
Bug #3 — localStorage quota exceeded by imageBase64
  • File: apps/ui-tars/src/renderer/src/pages/chat/index.tsx (lines 66–72)
  • Problem: ChatMessage objects containing imageBase64 (hundreds of KB to several MB per image) were serialized directly into localStorage, easily exceeding the 5–10 MB browser quota and silently failing.
  • Fix: saveSessionMessages() now strips the imageBase64 field via destructuring before JSON.stringify: msgs.map(({ imageBase64, ...rest }) => rest).
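
The destructuring fix can be illustrated as follows. The `msgs.map(({ imageBase64, ...rest }) => rest)` expression and the storage key format come from the description above; the `ChatMessage` fields other than `imageBase64`, the injected `WritableStore` interface, and the boolean return value are assumptions made so the sketch runs outside a browser.

```typescript
// Hypothetical message shape; only imageBase64 is named in the README.
interface ChatMessage {
  id: string;
  role: "user" | "assistant";
  content: string;
  imageBase64?: string;
}

// Subset of the Web Storage API, so window.localStorage could be passed in.
interface WritableStore {
  setItem(key: string, value: string): void;
}

// Returns true when the session was saved, false when the store rejected the write.
function saveSessionMessages(
  store: WritableStore,
  sessionId: string,
  msgs: ChatMessage[],
): boolean {
  // Drop the large base64 payload; keep every other field.
  const slim = msgs.map(({ imageBase64, ...rest }) => rest);
  try {
    store.setItem("ui-tars-chat-" + sessionId, JSON.stringify(slim));
    return true;
  } catch (err) {
    // setItem throws when the quota is exceeded; report instead of failing silently.
    return false;
  }
}
```

One consequence of this design is that images are not restored when a session is reloaded from localStorage; only the text of each message survives.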

🏗️ 6. Infrastructure & Documentation

| Category | Files | Description |
|---|---|---|
| Build scripts | scripts/build/build-quick.bat, scripts/build/build-windows.bat | Quick build (skip type checks) and full Windows packaging with environment validation |
| Dev scripts | scripts/dev/start-secure.bat | Secure startup with security audit, Node.js/pnpm checks, and .env.local validation |
| Test scripts | scripts/test/test-integration.bat | Integration test runner with TypeScript compilation |
| Release scripts | scripts/release/release-*.sh | Beta and release package publishing (moved from root) |
| Security | docs/build-and-security/SECURITY_SETUP.zh-CN.md, docs/build-and-security/README.SECURITY.md | Security configuration guides |
| Documentation | docs/README.md, docs/development/, docs/optimization/, docs/testing/, docs/build-and-security/, docs/community/ | All .md files reorganized into categorized subfolders with an index |

📂 For week-by-week implementation records, see the docs/development/ folder.


🙏 Acknowledgments

We would like to express our sincere gratitude to:

| | Credit |
|---|---|
| 🏢 ByteDance & Seed Team | Creating and open-sourcing UI-TARS-desktop under Apache 2.0 |
| 🧠 UI-TARS Model Team | The groundbreaking vision-language model powering GUI automation |
| 👥 Original Contributors | Building the Agent TARS ecosystem: CLI, Web UI, SDK, desktop app |
| 🌍 Open-Source Community | Feedback, bug reports, and feature suggestions |

This fork would not exist without the excellent foundation provided by ByteDance's engineering team. We are deeply grateful for their contributions to the AI agent community.


🤝 Contributing

See CONTRIBUTING.md.

📄 License

This project is licensed under the Apache License 2.0.

📝 Citation

If you find the original paper and code useful in your research, please give a ⭐ to the original project and cite the original authors' work:

BibTeX
@article{qin2025ui,
  title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
  author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
  journal={arXiv preprint arXiv:2501.12326},
  year={2025}
}

Built with ❤️ on top of ByteDance's UI-TARS-desktop
