An open-source automation framework for controlling Android devices through AI Agents and MCP (Model Context Protocol). Build intelligent mobile automation workflows with natural language commands.
- AI Agent Compatible - Works with Cursor, Claude Code, Gemini CLI, Codex, Windsurf, Roo Code
- MCP Integration - Supports mobile-mcp, filesystem, fetch, context7 MCP servers
- Multi-Device Support - Control multiple Android devices simultaneously
- Web UI - Web-based control panel for device and task management (EN/zh-TW)
- Skills System - Unified skills source with auto-deployment to detected AI Agents
- Unicode Input - Chinese, Japanese, emoji support via ADBKeyboard
- MCP Macro Server - High-level tools for faster, more reliable automation
- uiautomator2 Integration - Selector-based operations, no coordinate guessing
- Platform Adapters - Unified interface for Threads, Instagram, X, TikTok, YouTube, Facebook
- Element-First Strategy - Use accessibility tree before screenshots for speed & accuracy
- Click-Verify Protocol - Every action is verified for reliability
- Debug Artifacts - Auto-save screenshot + element dump on failure
- Python 3.8+
- Node.js 18+
- Android SDK Platform Tools (ADB)
- Android device (USB debugging enabled)
- uiautomator2 - For selector-based automation
chmod +x set.sh && ./set.sh

This will automatically:
- Check dependencies (Python 3.8+, Node.js 18+, ADB)
- Create Python virtual environment and install dependencies
- Install uiautomator2 (if device connected, also initializes ATX agent)
- Configure MCP settings for AI CLI tools (Gemini, Claude, Codex)
- Validate and deploy skills to detected AI Agents
- Create required directories (`outputs/`, `temp/logs/`)
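The dependency check in the first step can be sketched as below. This is a minimal illustration, not the logic inside `set.sh`: the helper names `parse_version` and `meets_minimum` are hypothetical, and the sample strings only assume that the tools report versions in the usual `Python 3.x.y` and `v18.x.y` shapes.

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first dotted version number from a tool's version output."""
    match = re.search(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(text: str, minimum: Tuple[int, ...]) -> bool:
    """Return True when the reported version is at least `minimum`."""
    return parse_version(text) >= minimum


# Minimums taken from the prerequisites listed above; the version strings are samples.
print(meets_minimum("Python 3.11.4", (3, 8)))  # check against Python 3.8+
print(meets_minimum("v16.20.2", (18,)))        # check against Node.js 18+
```

Comparing tuples of integers avoids the classic string-comparison trap where `"3.10"` sorts before `"3.8"`.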
adb devices                 # Verify device connection
source .venv/bin/activate   # Activate virtual environment

That's it! You're ready to use MobileAgent with your AI Agent.
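If you want to check the device connection from a script rather than by eye, the output of `adb devices` can be parsed as in the sketch below. The function name `parse_adb_devices` and the sample serial numbers are made up for illustration; the project's own `ADBHelper` may expose this differently.

```python
from typing import Dict


def parse_adb_devices(output: str) -> Dict[str, str]:
    """Map each device serial to its state from `adb devices` output."""
    devices: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and adb daemon notices ("* daemon started ...").
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


# Sample output with invented serials; in practice this comes from running `adb devices`.
sample = "List of devices attached\nemulator-5554\tdevice\nR58M1234567\tunauthorized\n"
ready = [serial for serial, state in parse_adb_devices(sample).items() if state == "device"]
print(ready)
```

A device listed as `unauthorized` is connected but has not accepted the USB debugging prompt yet, so only entries in the `device` state are treated as ready.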
MobileAgent/
├── AGENTS.md                  # AI Agent behavioral guidelines (MUST READ)
├── GEMINI.md                  # Gemini CLI quick reference
├── CLAUDE.md                  # Claude Code quick reference
├── set.sh                     # Setup script (includes skills deployment)
│
├── src/                       # Python modules
│   ├── adb_helper.py          # ADB command wrapper
│   ├── executor.py            # Deterministic executor (Element-First enforcement)
│   ├── tool_router.py         # Unified MCP/ADB/u2 interface
│   ├── u2_driver.py           # uiautomator2 selector-based operations
│   ├── mcp_macro_server.py    # High-level MCP macro tools
│   ├── platform_adapter.py    # Multi-platform unified interface
│   ├── state_tracker.py       # Navigation state machine
│   ├── patrol.py              # Social media patrol automation
│   └── logger.py              # Logging module
│
├── .skills/                   # Skills source directory
│   ├── app-explore/           # Main skill: app operations + research mindset
│   ├── app-action/            # Quick single-step operations
│   ├── patrol/                # Social media patrol (search & monitor keywords)
│   ├── content-extract/       # Full content extraction + NLP analysis
│   ├── device-check/          # Device connection verification
│   ├── screen-analyze/        # Screen state analysis
│   ├── troubleshoot/          # Diagnostics and fixes
│   └── unicode-setup/         # Unicode input configuration
│
├── web/                       # Web UI
│   ├── app.py                 # Flask backend
│   ├── static/                # CSS/JS
│   └── templates/             # HTML templates
│
├── mcp/                       # MCP configuration
├── apk_tools/                 # APK utilities (DeviceKit, ADBKeyboard)
├── tests/                     # Unit tests
├── outputs/                   # Screenshots, downloads, patrol reports
└── temp/logs/                 # Log files

The new mobile-macro MCP server provides high-level automation tools that combine multiple steps into single operations, reducing LLM round-trips and improving reliability.
| Tool | Description |
|---|---|
| `find_and_click` | Element search + click + verify in one call |
| `type_and_submit` | Focus + type + submit in one call |
| `smart_wait` | Wait for element with native u2 wait |
| `scroll_and_find` | Auto-scroll until element found |
| `navigate_back` | Back + verify navigation |
| `dismiss_popup` | Dismiss common dialogs (OK, Cancel, Close, etc.) |
| `launch_and_wait` | Launch app + wait for ready indicator |
| `get_screen_summary` | Screen state overview with visible texts |
| `run_patrol` | Complete social media browsing automation |
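To illustrate why folding several steps into one tool call cuts round-trips, here is a rough sketch of the find, click, verify pattern behind a tool like `find_and_click`. It is not the code in `src/mcp_macro_server.py`: `FakeScreen` is a hypothetical in-memory stand-in for a device, and the return shape is an assumption chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FakeScreen:
    """Hypothetical in-memory device: maps a button's text to the screen it opens."""
    current: str = "home"
    buttons: Dict[str, Dict[str, str]] = field(default_factory=lambda: {
        "home": {"Search": "search", "Settings": "settings"},
        "search": {},
        "settings": {},
    })

    def visible_texts(self) -> List[str]:
        return list(self.buttons[self.current])

    def click(self, text: str) -> None:
        self.current = self.buttons[self.current][text]


def find_and_click(screen: FakeScreen, text: str, expect_screen: Optional[str] = None) -> dict:
    """Find an element, click it, and verify the result in a single call."""
    if text not in screen.visible_texts():
        return {"success": False, "reason": "element not found"}
    before = screen.current
    screen.click(text)
    # Verification: the screen must have changed, and match the expectation if one was given.
    changed = screen.current != before
    matches = expect_screen is None or screen.current == expect_screen
    return {"success": changed and matches, "screen": screen.current}


screen = FakeScreen()
print(find_and_click(screen, "Search", expect_screen="search"))
print(find_and_click(screen, "Missing"))
```

Done step by step, the same flow would cost the model three separate tool calls (look, click, look again); the macro returns one structured result instead.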
Add to your MCP settings:
{
"mcpServers": {
"mobile-macro": {
"command": "python",
"args": ["-m", "src.mcp_macro_server"],
"cwd": "<PROJECT_PATH>"
}
}
}

For the most reliable automation, install uiautomator2:
pip install uiautomator2
python -m uiautomator2 init

| Operation | Coordinate-Based | Selector-Based (u2) |
|---|---|---|
| Click button | `router.click(x=540, y=1200)` | `router.click(text="Search")` |
| Find element | Screenshot + vision | Direct selector lookup |
| Wait for element | Polling with screenshots | Native wait support |
| Stability | Screen-size dependent | Works across devices |
from src.tool_router import ToolRouter
router = ToolRouter() # Auto-detects u2
# Selector-based click (most reliable)
router.click(text="Search")
router.click_by_selector(resourceId="com.app:id/btn", clickable=True)
# Smart waiting
router.wait_for_element_u2(text="Loading", gone=True, timeout=10)
# Scroll to find
found, el = router.scroll_to_element(text="Settings", max_scrolls=5)

MobileAgent uses a unified skills source directory (`.skills/`). Running `set.sh` automatically detects installed AI Agents and deploys skills to their respective directories.
| AI Agent | Detection Method | Deploy Path |
|---|---|---|
| Cursor | `~/.cursor/` exists | `.cursor/skills/` |
| Claude Code | `claude` command or `~/.claude/` | `.claude/skills/` |
| Gemini CLI | `gemini` command or `~/.gemini/` | `.gemini/skills/` |
| Codex CLI | `codex` command or `~/.codex/` | `.codex/skills/` |
| Windsurf | `~/.codeium/` exists | `.windsurf/skills/` |
| Roo Code | `~/.roo/` exists | `.roo/skills/` |
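The detection rules in the table can be expressed roughly as follows. This is a sketch of the idea only; `set.sh` is a shell script and its actual checks may differ. The names `AGENTS` and `detect_agents` are hypothetical, and the deploy paths are copied from the table as written.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple

# (command to look for or None, marker directory under the home dir, deploy path), per the table.
AGENTS: Dict[str, Tuple[Optional[str], str, str]] = {
    "Cursor": (None, ".cursor", ".cursor/skills/"),
    "Claude Code": ("claude", ".claude", ".claude/skills/"),
    "Gemini CLI": ("gemini", ".gemini", ".gemini/skills/"),
    "Codex CLI": ("codex", ".codex", ".codex/skills/"),
    "Windsurf": (None, ".codeium", ".windsurf/skills/"),
    "Roo Code": (None, ".roo", ".roo/skills/"),
}


def detect_agents(home: Path, which: Callable[[str], Optional[str]] = shutil.which) -> Dict[str, str]:
    """Return {agent name: deploy path} for every agent that appears to be installed."""
    found: Dict[str, str] = {}
    for name, (command, marker, deploy_path) in AGENTS.items():
        has_command = command is not None and which(command) is not None
        if has_command or (home / marker).is_dir():
            found[name] = deploy_path
    return found


# Demo against a throwaway home directory so the result does not depend on this machine.
with tempfile.TemporaryDirectory() as tmp:
    fake_home = Path(tmp)
    (fake_home / ".cursor").mkdir()
    detected = detect_agents(fake_home, which=lambda cmd: "/usr/bin/gemini" if cmd == "gemini" else None)
    print(detected)
```

Passing `which` as a parameter keeps the function testable without touching the real `PATH`.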
- Create a new directory under `.skills/`
- Create a `SKILL.md` file (with frontmatter)
- Run `./set.sh` to validate and deploy
See .skills/README.md for details.
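As a rough idea of what validating a `SKILL.md` might involve, the sketch below checks that a file opens with a `---` frontmatter block and reads its `key: value` pairs. The function `read_frontmatter` and the field names `name` and `description` are assumptions made for this example; the real schema is whatever `.skills/README.md` specifies.

```python
from typing import Dict


def read_frontmatter(markdown: str) -> Dict[str, str]:
    """Parse simple `key: value` pairs from a leading `---` ... `---` block."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter block")
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter block is never closed with '---'")


# Illustrative field names only; check .skills/README.md for the real schema.
example = """---
name: my-skill
description: Example skill used only for this sketch
---
# My Skill
"""
print(read_frontmatter(example))
```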
Like a coast guard hunting for targets, the patrol skill enables AI Agents to:
- Search for a keyword on social media
- Monitor and browse related posts
- Collect opinions and sentiment about the topic
- Report findings back to user
Example:
User: "Search Threads for clawdbot and see what people think"
AI Agent will:
1. Launch Threads app
2. Search "clawdbot"
3. Browse 5+ posts mentioning it
4. Read comments and reactions
5. Report: "Here's what people are saying about clawdbot..."
Extract full content (not summaries) from articles and posts with structured NLP analysis:
- Full text extraction: Complete article content without truncation
- NLP Analysis: Who (people), What (events), When (time), Where (locations), Objects (things/products)
- Keywords: Key terms and topics with confidence scores
- JSON Output: Standardized schema for easy API integration
- Save to file: JSON (primary) and Markdown (secondary) in the `outputs/` directory
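Because each NLP entry carries a confidence score, a consumer can filter on it. The sketch below assumes the field names this section documents (`articles`, `nlp_analysis`, `value`, `confidence`); the helper `confident_values`, the 0.8 threshold, and the low-confidence sample entry are invented for illustration.

```python
import json
from typing import List

# Trimmed-down sample that follows the field names used in this section.
sample = json.loads("""
{
  "articles": [{
    "title": "Article Title",
    "nlp_analysis": {
      "who": [{"value": "Person Name", "confidence": 0.95},
              {"value": "Uncertain Name", "confidence": 0.40}],
      "what": [{"value": "Event description", "confidence": 0.90}]
    }
  }]
}
""")


def confident_values(article: dict, category: str, threshold: float = 0.8) -> List[str]:
    """Return the values in one NLP category whose confidence meets the threshold."""
    entries = article.get("nlp_analysis", {}).get(category, [])
    return [entry["value"] for entry in entries if entry["confidence"] >= threshold]


for article in sample["articles"]:
    print(article["title"], confident_values(article, "who"), confident_values(article, "what"))
```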
Example JSON output structure:
{
"extraction_meta": {
"version": "2.0",
"extracted_at": "2024-01-29T10:30:00+08:00",
"platform": "WeChat",
"extraction_status": "success"
},
"articles": [{
"title": "Article Title",
"content": { "full_text": "...", "word_count": 342 },
"nlp_analysis": {
"who": [{ "value": "Person Name", "confidence": 0.95 }],
"what": [{ "value": "Event description", "confidence": 0.90 }]
},
"keywords": ["AI", "technology"],
"sentiment": "positive"
}]
}

Main skill for app operations with research mindset:
| Platform | Features |
|---|---|
| LINE, WeChat, Telegram, WhatsApp | Send messages, search contacts |
| Facebook, Instagram, Threads, X | Like, comment, share, follow |
| YouTube, TikTok | Like, comment, subscribe |
| Gmail, LinkedIn, Discord, Snapchat | Platform-specific operations |
Features:
- Element-First Strategy: Use accessibility tree before screenshots
- Click-Verify Protocol: Verify every action succeeded
- Separate UI reference files, loaded on demand to save tokens
- Multi-language UI keywords (EN/zh/JP/KR)
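The Element-First Strategy can be pictured as: look the element up in the accessibility tree, derive the tap point from its bounds, and only fall back to a screenshot when the lookup fails. The sketch below parses a hand-written sample shaped like a `uiautomator dump` hierarchy; `find_center` is a hypothetical helper, not the project's executor.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Hand-written sample in the shape of a `uiautomator dump` hierarchy.
DUMP = """<hierarchy rotation="0">
  <node class="android.widget.FrameLayout" text="" bounds="[0,0][1080,2400]">
    <node class="android.widget.Button" text="Search" clickable="true" bounds="[40,1100][1040,1300]" />
  </node>
</hierarchy>"""


def find_center(xml_dump: str, text: str) -> Optional[Tuple[int, int]]:
    """Return the tap point of the first node with matching text, or None if absent."""
    for node in ET.fromstring(xml_dump).iter("node"):
        if node.get("text") == text:
            # Bounds look like "[x1,y1][x2,y2]"; the tap point is the rectangle's center.
            x1, y1, x2, y2 = map(int, re.findall(r"\d+", node.get("bounds", "")))
            return ((x1 + x2) // 2, (y1 + y2) // 2)
    return None


point = find_center(DUMP, "Search")
if point is None:
    print("not in the accessibility tree; fall back to a screenshot")
else:
    print("tap at", point)
```

The tree lookup is a plain text match, so it needs no vision step and yields coordinates computed from the element itself rather than guessed from pixels.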
Start the web control panel:
source .venv/bin/activate
pip install flask
python web/app.py

Open http://localhost:6443 in your browser.
- View connected devices
- Select CLI tool (Gemini/Claude/Codex) and model
- Real-time task output streaming
- Task history
- English/Traditional Chinese interface
| Dashboard | New Task |
|---|---|
| View connected devices and task history | Select CLI tool, model, and describe your task |

| Task Running | Task Completed |
|---|---|
| Real-time output with device screen | View results and task summary |

from src.adb_helper import ADBHelper
adb = ADBHelper()
adb.screenshot(prefix="step1")
adb.tap(540, 1200)
adb.type_text("search query")
adb.press_enter()

# ActionResult is assumed to be exported by src.executor alongside DeterministicExecutor
from src.executor import DeterministicExecutor, ActionResult
executor = DeterministicExecutor()
# Observe → Find → Click → Verify
state = executor.observe()
element = executor.find_element(text="Search")
if element:
    result = executor.click_and_verify(element)
    if result.result == ActionResult.SUCCESS:
        print("Click verified!")

from src.tool_router import ToolRouter
router = ToolRouter()
# Auto-selects best tool (u2 > MCP > ADB)
router.click(text="Search") # Find by text, then click
router.type_text("Hello 你好") # Unicode supported
router.swipe("up", verify=True) # Scroll with verification
router.wait_for_element(text="Results")

from src.patrol import PatrolStateMachine, PatrolConfig
config = PatrolConfig(max_posts=10, max_scrolls=5)
patrol = PatrolStateMachine(platform="threads", config=config)
report = patrol.run(keyword="AI agents")
print(f"Visited {len(report.posts)} posts")
print(report.summary)

adb kill-server && adb start-server
adb devices

from src.adb_helper import setup_adbkeyboard
setup_adbkeyboard()

Or install DeviceKit APK for MCP:

adb install apk_tools/mobilenext-devicekit.apk

Logs are written to temp/logs/mobile_agent_YYYYMMDD.log
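To locate the log file for a particular day from a script, the date-stamped name can be built as in this sketch. It assumes only the `temp/logs/mobile_agent_YYYYMMDD.log` pattern; the helper `log_path_for` is hypothetical and not part of the project's logger module.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def log_path_for(day: Optional[date] = None, log_dir: Path = Path("temp/logs")) -> Path:
    """Build the log file path for a given day (today when omitted)."""
    day = day or date.today()
    return log_dir / f"mobile_agent_{day:%Y%m%d}.log"


print(log_path_for(date(2024, 1, 29)).as_posix())
```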
pip install uiautomator2
python -m uiautomator2 init

ToolRouter will automatically detect and use it.
This project is licensed under the MIT License.
| Tool/Package | License | Description |
|---|---|---|
| MCP (Model Context Protocol) | Open Source (Linux Foundation) | Donated by Anthropic to Agentic AI Foundation |
| mobile-mcp | Apache-2.0 | MCP server for mobile automation |
| context7 | MIT | Documentation query MCP server |
| uiautomator2 | MIT | Android automation library |
| ADB (Android Debug Bridge) | Apache-2.0 | Android SDK Platform Tools |
| ADBKeyboard | GPL-2.0 | Unicode input support |
| Flask | BSD-3-Clause | Web UI framework |
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Built with ❤️ for the AI Agent community



