sheng1111/MobileAgent


MobileAgent - AI-Powered Mobile Automation Framework


中文版 README (Chinese version)

An open-source automation framework for controlling Android devices through AI Agents and MCP (Model Context Protocol). Build intelligent mobile automation workflows with natural language commands.

🌟 Key Features

Core Capabilities

  • πŸ€– AI Agent Compatible - Works with Cursor, Claude Code, Gemini CLI, Codex, Windsurf, Roo Code
  • πŸ”Œ MCP Integration - Supports mobile-mcp, filesystem, fetch, context7 MCP servers
  • πŸ“± Multi-Device Support - Control multiple Android devices simultaneously
  • 🌐 Web UI - Web-based control panel for device and task management (EN/zh-TW)
  • 🎯 Skills System - Unified skills source with auto-deployment to detected AI Agents
  • πŸ”€ Unicode Input - Chinese, Japanese, emoji support via ADBKeyboard

Advanced Automation (New in v2.0)

  • ⚑ MCP Macro Server - High-level tools for faster, more reliable automation
  • 🎯 uiautomator2 Integration - Selector-based operations, no coordinate guessing
  • πŸ”„ Platform Adapters - Unified interface for Threads, Instagram, X, TikTok, YouTube, Facebook
  • πŸ” Element-First Strategy - Use accessibility tree before screenshots for speed & accuracy
  • βœ… Click-Verify Protocol - Every action is verified for reliability
  • πŸ› Debug Artifacts - Auto-save screenshot + element dump on failure

📋 Requirements

  • Python 3.8+
  • Node.js 18+
  • Android SDK Platform Tools (ADB)
  • Android device (USB debugging enabled)

Optional (Recommended)

🚀 Quick Start

1. Run Setup Script

chmod +x set.sh && ./set.sh

This will automatically:

  • Check dependencies (Python 3.8+, Node.js 18+, ADB)
  • Create Python virtual environment and install dependencies
  • Install uiautomator2 (if device connected, also initializes ATX agent)
  • Configure MCP settings for AI CLI tools (Gemini, Claude, Codex)
  • Validate and deploy skills to detected AI Agents
  • Create required directories (outputs/, temp/logs/)
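
The dependency check in the first step can be approximated in Python as below. This is only a sketch of the idea, not the contents of set.sh: the function name `check_dependencies` is hypothetical, and it confirms that `node` and `adb` are on PATH without checking the Node.js version.

```python
import shutil
import sys


def check_dependencies(min_python=(3, 8)):
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    return {
        "python": sys.version_info >= min_python,
        # shutil.which returns the executable path, or None if it is not on PATH
        "node": shutil.which("node") is not None,
        "adb": shutil.which("adb") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```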

2. Connect Device & Start Using

adb devices                    # Verify device connection
source .venv/bin/activate      # Activate virtual environment

That's it! You're ready to use MobileAgent with your AI Agent.
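
If you want to check the connection from a script rather than by eye, the output of `adb devices` is easy to parse. The helper names below are hypothetical and not part of MobileAgent; the sketch assumes the usual output layout of a header line followed by one `serial<TAB>state` line per device.

```python
def parse_adb_devices(output):
    """Parse `adb devices` text into a list of (serial, state) tuples."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header, and daemon start-up notices that begin with "*"
        if not line or line.startswith("*") or line.startswith("List of devices"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices.append((parts[0], parts[1]))
    return devices


def ready_devices(output):
    """Serials whose state is 'device', i.e. connected and authorized."""
    return [serial for serial, state in parse_adb_devices(output) if state == "device"]
```

A device listed as `unauthorized` usually means the USB debugging prompt on the phone has not been accepted yet.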

📁 Project Structure

MobileAgent/
├── AGENTS.md              # AI Agent behavioral guidelines (MUST READ)
├── GEMINI.md              # Gemini CLI quick reference
├── CLAUDE.md              # Claude Code quick reference
├── set.sh                 # Setup script (includes skills deployment)
│
├── src/                   # Python modules
│   ├── adb_helper.py      # ADB command wrapper
│   ├── executor.py        # Deterministic executor (Element-First enforcement)
│   ├── tool_router.py     # Unified MCP/ADB/u2 interface
│   ├── u2_driver.py       # uiautomator2 selector-based operations
│   ├── mcp_macro_server.py # High-level MCP macro tools
│   ├── platform_adapter.py # Multi-platform unified interface
│   ├── state_tracker.py   # Navigation state machine
│   ├── patrol.py          # Social media patrol automation
│   └── logger.py          # Logging module
│
├── .skills/               # Skills source directory
│   ├── app-explore/       # Main skill: app operations + research mindset
│   ├── app-action/        # Quick single-step operations
│   ├── patrol/            # Social media patrol (search & monitor keywords)
│   ├── content-extract/   # Full content extraction + NLP analysis
│   ├── device-check/      # Device connection verification
│   ├── screen-analyze/    # Screen state analysis
│   ├── troubleshoot/      # Diagnostics and fixes
│   └── unicode-setup/     # Unicode input configuration
│
├── web/                   # Web UI
│   ├── app.py             # Flask backend
│   ├── static/            # CSS/JS
│   └── templates/         # HTML templates
│
├── mcp/                   # MCP configuration
├── apk_tools/             # APK utilities (DeviceKit, ADBKeyboard)
├── tests/                 # Unit tests
├── outputs/               # Screenshots, downloads, patrol reports
└── temp/logs/             # Log files

🛠️ MCP Macro Server

The new mobile-macro MCP server provides high-level automation tools that combine multiple steps into single operations, reducing LLM round-trips and improving reliability.

Available Tools

| Tool | Description |
| --- | --- |
| find_and_click | Element search + click + verify in one call |
| type_and_submit | Focus + type + submit in one call |
| smart_wait | Wait for element with native u2 wait |
| scroll_and_find | Auto-scroll until element found |
| navigate_back | Back + verify navigation |
| dismiss_popup | Dismiss common dialogs (OK, Cancel, Close, etc.) |
| launch_and_wait | Launch app + wait for ready indicator |
| get_screen_summary | Screen state overview with visible texts |
| run_patrol | Complete social media browsing automation |
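
To illustrate what "search + click + verify in one call" means, here is a minimal sketch of the macro pattern. It is not the server's implementation: the function signature, the retry count, and the three injected callables (`find`, `click`, `verify`) are assumptions made so the example runs without a device.

```python
import time


def find_and_click(find, click, verify, retries=2, delay=0.0):
    """Find an element, click it, and confirm the expected result appeared.

    find()   -> element, or None when nothing matches
    click(e) -> performs the click on element e
    verify() -> True once the expected post-click state is visible
    """
    for attempt in range(1, retries + 1):
        element = find()
        if element is None:
            time.sleep(delay)
            continue
        click(element)
        if verify():
            return {"success": True, "attempts": attempt}
        time.sleep(delay)
    return {"success": False, "attempts": retries}
```

Because the whole loop runs inside one tool call, the LLM sees a single success or failure result instead of driving each step itself.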

Configuration

Add to your MCP settings:

{
  "mcpServers": {
    "mobile-macro": {
      "command": "python",
      "args": ["-m", "src.mcp_macro_server"],
      "cwd": "<PROJECT_PATH>"
    }
  }
}
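
If your MCP settings file already lists other servers, you may prefer to merge the entry in rather than paste it by hand. The sketch below shows one way to do that with the standard library; `add_mobile_macro` is a hypothetical helper, and the `"other"` server in the demo is a placeholder.

```python
import json


def add_mobile_macro(settings, project_path):
    """Return a copy of `settings` with the mobile-macro server registered."""
    merged = dict(settings)
    servers = dict(merged.get("mcpServers", {}))  # copy so the input is not mutated
    servers["mobile-macro"] = {
        "command": "python",
        "args": ["-m", "src.mcp_macro_server"],
        "cwd": project_path,
    }
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    existing = {"mcpServers": {"other": {"command": "node"}}}  # placeholder entry
    print(json.dumps(add_mobile_macro(existing, "<PROJECT_PATH>"), indent=2))
```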

🎯 uiautomator2 Integration

For the most reliable automation, install uiautomator2:

pip install uiautomator2
python -m uiautomator2 init

Benefits

| Operation | Coordinate-Based | Selector-Based (u2) |
| --- | --- | --- |
| Click button | router.click(x=540, y=1200) | router.click(text="Search") |
| Find element | Screenshot + vision | Direct selector lookup |
| Wait for element | Polling with screenshots | Native wait support |
| Stability | Screen-size dependent | Works across devices |
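
The "screen-size dependent" row is easy to see with a little arithmetic. The hypothetical helper below rescales a recorded tap to another resolution; the screen sizes are example values, not taken from the project.

```python
def scale_tap(x, y, source_size, target_size):
    """Map a tap recorded on one screen size onto another by proportional scaling."""
    src_w, src_h = source_size
    dst_w, dst_h = target_size
    return round(x * dst_w / src_w), round(y * dst_h / src_h)


# A tap recorded at (540, 1200) on a 1080x2400 screen must move on a 720x1600 screen.
print(scale_tap(540, 1200, (1080, 2400), (720, 1600)))  # (360, 800)
```

Even with scaling, a coordinate can miss when the layout itself differs between devices (aspect ratio, system bars, font size), whereas a selector such as text="Search" does not depend on position.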

Usage in Code

from src.tool_router import ToolRouter

router = ToolRouter()  # Auto-detects u2

# Selector-based click (most reliable)
router.click(text="Search")
router.click_by_selector(resourceId="com.app:id/btn", clickable=True)

# Smart waiting
router.wait_for_element_u2(text="Loading", gone=True, timeout=10)

# Scroll to find
found, el = router.scroll_to_element(text="Settings", max_scrolls=5)
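
For readers curious how a bounded scroll-to-find loop behaves, here is a device-free sketch. It is not ToolRouter's code: `scroll_until_found` and its `find`/`scroll` callables are hypothetical stand-ins, and only the `(found, element)` return shape mirrors the call above.

```python
def scroll_until_found(find, scroll, max_scrolls=5):
    """Return (found, element), scrolling at most `max_scrolls` times.

    find()   -> element, or None when nothing on screen matches
    scroll() -> moves the view by one step
    """
    element = find()  # check the current screen before scrolling at all
    scrolls = 0
    while element is None and scrolls < max_scrolls:
        scroll()
        scrolls += 1
        element = find()
    return element is not None, element
```

Capping the number of scrolls keeps the call from looping forever when the target is absent or the list has reached its end.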

🎓 Skills System

MobileAgent uses a unified skills source directory (.skills/). Running set.sh automatically detects installed AI Agents and deploys skills to their respective directories.

Supported AI Agents

| AI Agent | Detection Method | Deploy Path |
| --- | --- | --- |
| Cursor | ~/.cursor/ exists | .cursor/skills/ |
| Claude Code | claude command or ~/.claude/ | .claude/skills/ |
| Gemini CLI | gemini command or ~/.gemini/ | .gemini/skills/ |
| Codex CLI | codex command or ~/.codex/ | .codex/skills/ |
| Windsurf | ~/.codeium/ exists | .windsurf/skills/ |
| Roo Code | ~/.roo/ exists | .roo/skills/ |
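
The detection rules in the table can be expressed in a few lines of Python. This is a sketch of the logic only; set.sh is a shell script and may differ in detail, and `detect_agents` and `AGENT_RULES` are hypothetical names.

```python
import shutil
from pathlib import Path

# (agent name, CLI command or None, marker directory under the home folder)
AGENT_RULES = [
    ("Cursor", None, ".cursor"),
    ("Claude Code", "claude", ".claude"),
    ("Gemini CLI", "gemini", ".gemini"),
    ("Codex CLI", "codex", ".codex"),
    ("Windsurf", None, ".codeium"),
    ("Roo Code", None, ".roo"),
]


def detect_agents(home=None, which=shutil.which):
    """Return agents whose command is on PATH or whose marker directory exists."""
    home = Path(home) if home is not None else Path.home()
    found = []
    for name, command, marker in AGENT_RULES:
        has_command = command is not None and which(command) is not None
        if has_command or (home / marker).is_dir():
            found.append(name)
    return found


if __name__ == "__main__":
    print(detect_agents())
```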

Adding a Skill

  1. Create a new directory under .skills/
  2. Create a SKILL.md file (with frontmatter)
  3. Run ./set.sh to validate and deploy

See .skills/README.md for details.
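
Before running ./set.sh, you can sanity-check a new SKILL.md yourself. The sketch below assumes the frontmatter is a leading block delimited by `---` with simple `key: value` lines, and that `name` and `description` are required; both assumptions are illustrative, so treat .skills/README.md as the authority. It is not a full YAML parser.

```python
REQUIRED_FIELDS = ("name", "description")  # assumed field names, for illustration


def parse_frontmatter(text):
    """Parse a leading '---' delimited block of 'key: value' lines into a dict."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return None  # the closing '---' was never found


def validate_skill(text):
    """Return a list of problems; an empty list means the file looks valid."""
    fields = parse_frontmatter(text)
    if fields is None:
        return ["missing or unterminated frontmatter"]
    return [f"missing field: {name}" for name in REQUIRED_FIELDS if not fields.get(name)]
```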

🏄 Patrol Skill (海巡, "sea patrol")

Like a coast guard hunting for targets, the patrol skill enables AI Agents to:

  • Search for a keyword on social media
  • Monitor and browse related posts
  • Collect opinions and sentiment about the topic
  • Report findings back to user

Example:

User: "Search Threads for clawdbot and see what people think"

AI Agent will:
1. Launch Threads app
2. Search "clawdbot"
3. Browse 5+ posts mentioning it
4. Read comments and reactions
5. Report: "Here's what people are saying about clawdbot..."

📄 Content Extract Skill

Extract full content (not summaries) from articles and posts with structured NLP analysis:

  • Full text extraction: Complete article content without truncation
  • NLP Analysis: Who (people), What (events), When (time), Where (locations), Objects (things/products)
  • Keywords: Key terms and topics with confidence scores
  • JSON Output: Standardized schema for easy API integration
  • Save to file: JSON (primary) and Markdown (secondary) in outputs/ directory

Example JSON output structure:

{
  "extraction_meta": {
    "version": "2.0",
    "extracted_at": "2024-01-29T10:30:00+08:00",
    "platform": "WeChat",
    "extraction_status": "success"
  },
  "articles": [{
    "title": "Article Title",
    "content": { "full_text": "...", "word_count": 342 },
    "nlp_analysis": {
      "who": [{ "value": "Person Name", "confidence": 0.95 }],
      "what": [{ "value": "Event description", "confidence": 0.90 }]
    },
    "keywords": ["AI", "technology"],
    "sentiment": "positive"
  }]
}
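
Because the output is plain JSON, downstream code can filter it with the standard library alone. The sketch below reuses the example structure above (trimmed slightly) and keeps only entities at or above a confidence threshold; `confident_entities` and the 0.92 default are hypothetical.

```python
import json

SAMPLE = """
{
  "extraction_meta": {"version": "2.0", "extraction_status": "success"},
  "articles": [{
    "title": "Article Title",
    "content": {"full_text": "...", "word_count": 342},
    "nlp_analysis": {
      "who": [{"value": "Person Name", "confidence": 0.95}],
      "what": [{"value": "Event description", "confidence": 0.90}]
    },
    "keywords": ["AI", "technology"],
    "sentiment": "positive"
  }]
}
"""


def confident_entities(report, threshold=0.92):
    """Collect (title, category, value) for entities at or above the threshold."""
    if report["extraction_meta"]["extraction_status"] != "success":
        return []
    hits = []
    for article in report.get("articles", []):
        for category, entities in article.get("nlp_analysis", {}).items():
            for entity in entities:
                if entity["confidence"] >= threshold:
                    hits.append((article["title"], category, entity["value"]))
    return hits


report = json.loads(SAMPLE)
print(confident_entities(report))
```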

📱 App Explore Skill

The main skill for app operations, with a research mindset:

| Platform | Features |
| --- | --- |
| LINE, WeChat, Telegram, WhatsApp | Send messages, search contacts |
| Facebook, Instagram, Threads, X | Like, comment, share, follow |
| YouTube, TikTok | Like, comment, subscribe |
| Gmail, LinkedIn, Discord, Snapchat | Platform-specific operations |

Features:

  • Element-First Strategy: Use accessibility tree before screenshots
  • Click-Verify Protocol: Verify every action succeeded
  • Separate UI reference files, loaded on demand to save tokens
  • Multi-language UI keywords (EN/zh/JP/KR)

🖥️ Web UI

Start the web control panel:

source .venv/bin/activate
pip install flask
python web/app.py

Open http://localhost:6443 in your browser.

Features

  • View connected devices
  • Select CLI tool (Gemini/Claude/Codex) and model
  • Real-time task output streaming
  • Task history
  • English/Traditional Chinese interface

Screenshots

| Screen | Description |
| --- | --- |
| Dashboard | View connected devices and task history |
| New Task | Select CLI tool, model, and describe your task |
| Task Running | Real-time output with device screen |
| Task Completed | View results and task summary |

💻 Usage Example

Python API

from src.adb_helper import ADBHelper

adb = ADBHelper()
adb.screenshot(prefix="step1")
adb.tap(540, 1200)
adb.type_text("search query")
adb.press_enter()

Deterministic Executor

# ActionResult is assumed to be exported by src.executor alongside the executor
from src.executor import DeterministicExecutor, ActionResult

executor = DeterministicExecutor()

# Observe → Find → Click → Verify
state = executor.observe()
element = executor.find_element(text="Search")
if element:
    result = executor.click_and_verify(element)
    if result.result == ActionResult.SUCCESS:
        print("Click verified!")
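
The "Find" step of an element-first approach can be pictured as a lookup in the accessibility tree rather than a screenshot analysis. The sketch below parses a uiautomator-style XML dump and returns the center of the first node with matching text. It is an illustration, not the executor's code: `find_center_by_text` is a hypothetical helper and the sample dump is hand-written.

```python
import re
import xml.etree.ElementTree as ET

# Bounds look like "[left,top][right,bottom]" in uiautomator dumps
BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


def find_center_by_text(xml_dump, text):
    """Return the (x, y) center of the first node whose text matches, else None."""
    root = ET.fromstring(xml_dump)
    for node in root.iter("node"):
        if node.get("text") == text:
            match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
            if match:
                left, top, right, bottom = map(int, match.groups())
                return (left + right) // 2, (top + bottom) // 2
    return None


SAMPLE_DUMP = """<hierarchy rotation="0">
  <node text="" class="android.widget.FrameLayout" bounds="[0,0][1080,2400]">
    <node text="Search" class="android.widget.Button" clickable="true" bounds="[440,1150][640,1250]" />
  </node>
</hierarchy>"""

print(find_center_by_text(SAMPLE_DUMP, "Search"))  # (540, 1200)
```

Returning None for a missing element is what lets a caller fall back to a screenshot only when the tree lookup fails.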

Tool Router (Unified Interface)

from src.tool_router import ToolRouter

router = ToolRouter()

# Auto-selects best tool (u2 > MCP > ADB)
router.click(text="Search")           # Find by text, then click
router.type_text("Hello 你好")         # Unicode supported
router.swipe("up", verify=True)       # Scroll with verification
router.wait_for_element(text="Results")
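
The priority order in the comment above (u2 > MCP > ADB) amounts to picking the first available backend from a fixed list. A minimal sketch, assuming backends are identified by short string names; `pick_backend` is hypothetical and not ToolRouter's actual selection code.

```python
PRIORITY = ("u2", "mcp", "adb")  # order taken from the comment: u2 > MCP > ADB


def pick_backend(available):
    """Return the highest-priority backend present in `available`."""
    for name in PRIORITY:
        if name in available:
            return name
    raise RuntimeError("no automation backend available")


print(pick_backend({"adb", "mcp"}))
```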

Patrol Automation

from src.patrol import PatrolStateMachine, PatrolConfig

config = PatrolConfig(max_posts=10, max_scrolls=5)
patrol = PatrolStateMachine(platform="threads", config=config)
report = patrol.run(keyword="AI agents")

print(f"Visited {len(report.posts)} posts")
print(report.summary)

❓ FAQ

Q: Cannot connect to device?

adb kill-server && adb start-server
adb devices

Q: Text input fails?

from src.adb_helper import setup_adbkeyboard
setup_adbkeyboard()

Or install DeviceKit APK for MCP:

adb install apk_tools/mobilenext-devicekit.apk

Q: Where are the logs?

temp/logs/mobile_agent_YYYYMMDD.log
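
To locate a given day's log from a script, the YYYYMMDD pattern maps directly onto `strftime`. `log_path_for` is a hypothetical helper, and the path is relative to the project root.

```python
from datetime import date
from pathlib import Path


def log_path_for(day, log_dir="temp/logs"):
    """Build the log file path for a given date, following the YYYYMMDD pattern."""
    return Path(log_dir) / f"mobile_agent_{day.strftime('%Y%m%d')}.log"


print(log_path_for(date.today()))
```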

Q: How to enable uiautomator2?

pip install uiautomator2
python -m uiautomator2 init

ToolRouter will automatically detect and use it.

📜 License

This project is licensed under the MIT License.

Dependency Licenses

| Tool/Package | License | Description |
| --- | --- | --- |
| MCP (Model Context Protocol) | Open Source (Linux Foundation) | Donated by Anthropic to Agentic AI Foundation |
| mobile-mcp | Apache-2.0 | MCP server for mobile automation |
| context7 | MIT | Documentation query MCP server |
| uiautomator2 | MIT | Android automation library |
| ADB (Android Debug Bridge) | Apache-2.0 | Android SDK Platform Tools |
| ADBKeyboard | GPL-2.0 | Unicode input support |
| Flask | BSD-3-Clause | Web UI framework |

📧 Contact


Built with ❤️ for the AI Agent community
