Linux desktop automation CLI with AI-curated task recording and semantic search
Modular architecture: works natively with QuetzaCodetl, OpenCode, and Claude Code as a built-in, Bash-invoked tool.
QuetzaCodetl Integration:
desktop-agent is available as a native skill in QuetzaCodetl. The model can invoke it directly via Bash commands for Linux desktop automation without any MCP connection.
cd /home/mal/AI/desktop-agent
./install.sh
# View all tasks
desktop-agent tasks
# Search for a task
desktop-agent tasks search "check disk space"
# Run a task
desktop-agent replay --run "check-disk-space"
# Record new task
desktop-agent record
# ... do your steps ...
desktop-agent save-task my-task --description "What it does" --purpose "Why useful"
START HERE: Read INSTALLATION.md for installation details
Full Documentation:
- COMPLETE_HANDOFF.md - Complete technical handoff
- SESSION_SUMMARY.md - What we built today
- IMPLEMENTATION_STATUS.md - Current status
- TASK_REPOSITORY_ROADMAP.md - 38 tasks to add
- IMPLEMENTATION_GUIDE.md - Code examples
- ANALYSIS.md - System architecture
- ✅ Task Recording - Record GUI actions as reusable tasks
- ✅ Semantic Search - Find tasks by description using embeddings
- ✅ Parameters - Reusable tasks with ${variable} placeholders
- ✅ Success Tracking - Auto-track what works (✓ ? ✗ indicators)
- ✅ Micro-Tasks - Common patterns extracted as building blocks
- ✅ OCR Support - Extract text from screenshots
- ✅ Multi-Drive - Navigate across multiple hard drives
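The `${variable}` placeholders listed under Parameters use the same syntax as Python's `string.Template`, so the substitution step can be sketched with the standard library. This is an illustration only: the function name `fill_parameters` and the step text are hypothetical, and this README does not show how desktop-agent stores or expands recorded steps.

```python
from string import Template


def fill_parameters(step, params):
    """Replace ${name} placeholders in a step with the supplied values.

    Raises KeyError when a placeholder has no value, so a step never runs
    with a literal "${...}" left in it.
    """
    return Template(step).substitute(params)


# Hypothetical recorded step; the real storage format is an assumption.
step = "launch ${app_name}"
print(fill_parameters(step, {"app_name": "firefox"}))   # launch firefox
print(fill_parameters(step, {"app_name": "terminal"}))  # launch terminal
```

Failing loudly on a missing value is a design choice for the sketch; a tool could instead fall back to defaults saved with the task.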
- 35+ tasks in repository
- 11 micro-tasks extracted
- 100% success rate on executed tasks
- Parameters working (3+ parameterized tasks)
- Success tracking active
- QuetzaCodetl native skill — invoked via Bash, no MCP required
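One way to picture the ✓ ? ✗ success indicators is as a function of a task's run history. The sketch below is an assumption throughout: the name `success_indicator`, the 0.8 threshold, and the reading of `?` as "never run" are illustrative choices, not behavior documented in this README.

```python
def success_indicator(successes, failures, threshold=0.8):
    """Map a task's run history to one of the three indicators.

    "?" means the task has never been run, "✓" means the success rate
    meets the (assumed) threshold, and "✗" means it falls below it.
    """
    runs = successes + failures
    if runs == 0:
        return "?"
    return "✓" if successes / runs >= threshold else "✗"


print(success_indicator(0, 0))  # ?
print(success_indicator(9, 1))  # ✓
print(success_indicator(1, 4))  # ✗
```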
Architecture: Modular Python package
Entry Point: ~/.local/bin/desktop-agent (wrapper script)
Source: /home/mal/AI/desktop-agent/modular/
Database: ~/.cache/desktop-agent/tasks.db (SQLite)
Embeddings: nomic-embed-text (768-dim vectors)
Dependencies: pyatspi, pytesseract, pillow, requests, xdotool, scrot
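Semantic search over embeddings typically ranks stored vectors by their similarity to the query vector. The sketch below uses cosine similarity, which is a common choice but an assumption here, since this README does not state which metric desktop-agent uses. Toy 3-dimensional vectors stand in for the 768-dimensional nomic-embed-text embeddings, and `rank_tasks` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_tasks(query_vec, task_vecs):
    """Return task names sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name)
              for name, vec in task_vecs.items()]
    return [name for _score, name in sorted(scored, reverse=True)]


# Made-up vectors for illustration; real embeddings come from the model.
tasks = {
    "check-disk-space": [0.9, 0.1, 0.0],
    "open-app": [0.0, 0.2, 0.9],
    "find-large-files": [0.7, 0.6, 0.1],
}
query = [1.0, 0.2, 0.0]  # pretend embedding of "how full are my drives"
print(rank_tasks(query, tasks))
# ['check-disk-space', 'find-large-files', 'open-app']
```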
# Check disk space across all drives
desktop-agent replay --run check-disk-space
# Navigate to mounted drives
desktop-agent replay --run view-mounted-drives
# Find large files
desktop-agent replay --run find-large-files
# Open different apps with same task
desktop-agent replay --run --param app_name="firefox" open-app
desktop-agent replay --run --param app_name="terminal" open-app
# Run different terminal commands
desktop-agent replay --run --param command="ls -la" run-command
desktop-agent replay --run --param command="df -h" run-command
desktop-agent record
# Do your steps in the GUI...
desktop-agent save-task my-workflow \
--description "Opens Firefox and checks email" \
--purpose "Morning routine" \
--context "Start of day"
- Task Composition - Chain tasks together
- Conditional Logic - If/else execution
- Auto-Pattern Extraction - Suggest micro-tasks automatically
- More Foundation Tasks - Git, networking, process management
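Task composition is listed above as planned work. In the meantime, existing tasks can in principle be chained from an external script. The sketch below builds the `desktop-agent replay --run --param key=value task` command line used in this README and stops at the first failing step. It assumes that desktop-agent exits with a non-zero status when a replay fails, which this README does not confirm; `replay_command` and `run_chain` are hypothetical helpers, not part of the tool.

```python
import subprocess


def replay_command(task, params=None):
    """Build the argv for `desktop-agent replay --run [--param k=v ...] task`."""
    argv = ["desktop-agent", "replay", "--run"]
    for key, value in (params or {}).items():
        argv += ["--param", f"{key}={value}"]
    argv.append(task)
    return argv


def run_chain(steps, runner=subprocess.run):
    """Run (task, params) pairs in order and stop at the first failure.

    Assumes a non-zero exit code signals a failed replay. Returns the
    names of the tasks that completed successfully.
    """
    completed = []
    for task, params in steps:
        result = runner(replay_command(task, params))
        if result.returncode != 0:
            break
        completed.append(task)
    return completed


# Example call (requires desktop-agent on PATH, so it is left commented out):
# run_chain([
#     ("open-app", {"app_name": "terminal"}),
#     ("run-command", {"command": "df -h"}),
# ])
```

Passing the runner as an argument keeps the chain logic testable without a desktop session.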
/home/mal/AI/desktop-agent/
├── README.md ← You are here
├── COMPLETE_HANDOFF.md ← START HERE (full details)
├── SESSION_SUMMARY.md ← Today's work
├── IMPLEMENTATION_STATUS.md ← Current status
├── TASK_REPOSITORY_ROADMAP.md ← 38 tasks to add
├── IMPLEMENTATION_GUIDE.md ← Code examples
├── ANALYSIS.md ← Architecture
├── extract-micro-tasks.py ← Pattern analysis
├── analyze-reddit-feed.py ← OCR for Reddit
└── browse-and-analyze-reddit.sh ← Reddit workflow
~/.cache/desktop-agent/
├── tasks.db ← All tasks + embeddings
└── recording.json ← Current recording
~/.local/bin/
└── desktop-agent.py ← Main implementation
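Because tasks.db is a plain SQLite file, it can be inspected with Python's standard library. This README does not document the schema, so the sketch below makes no assumption about table or column names and only lists whichever tables exist. The helper name `list_tables` is hypothetical.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in a SQLite database, sorted by name."""
    # Open read-only so that inspecting the file can never modify task data.
    uri = Path(db_path).expanduser().resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Example call against the path given in this README:
# print(list_tables("~/.cache/desktop-agent/tasks.db"))
```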
Phase 1 Complete:
- ✅ Parameter support
- ✅ Success tracking
- ✅ Enhanced search
- ✅ Test case (Reddit)
Phase 2 Complete:
- ✅ 9 foundation tasks
- ✅ 11 micro-tasks extracted
- ✅ Multi-drive navigation
Phase 3 (Next):
- ⏳ Task composition
- ⏳ Conditional logic
- ⏳ Auto-pattern extraction
Version: 2.1 - Parameters + Success Tracking + Micro-Tasks
Last Updated: 2026-04-22
Status: Production Ready — Integrated with QuetzaCodetl native tool suite