Desktop Agent - Task-Caching Desktop Automation

Linux desktop automation CLI with AI-curated task recording and semantic search

Modular architecture: works natively with QuetzaCodetl, OpenCode, and Claude Code as a built-in, Bash-invoked tool.

QuetzaCodetl Integration: desktop-agent is available as a native skill in QuetzaCodetl. The model can invoke it directly via Bash commands for Linux desktop automation without any MCP connection.

🚀 Quick Start

Installation

cd /home/mal/AI/desktop-agent
./install.sh

Usage

# View all tasks
desktop-agent tasks

# Search for a task
desktop-agent tasks search "check disk space"

# Run a task
desktop-agent replay --run "check-disk-space"

# Record a new task
desktop-agent record
# ... do your steps ...
desktop-agent save-task my-task --description "What it does" --purpose "Why useful"

📚 Documentation

START HERE: Read INSTALLATION.md for installation details

Full Documentation:

COMPLETE_HANDOFF.md - Complete technical handoff
SESSION_SUMMARY.md - What we built today
IMPLEMENTATION_STATUS.md - Current status
TASK_REPOSITORY_ROADMAP.md - 38 tasks to add
IMPLEMENTATION_GUIDE.md - Code examples
ANALYSIS.md - System architecture

✨ Features

✅ Task Recording - Record GUI actions as reusable tasks
✅ Semantic Search - Find tasks by description using embeddings
✅ Parameters - Reusable tasks with ${variable} placeholders
✅ Success Tracking - Auto-track what works (✓ ? ✗ indicators)
✅ Micro-Tasks - Common patterns extracted as building blocks
✅ OCR Support - Extract text from screenshots
✅ Multi-Drive - Navigate across multiple hard drives
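
The Parameters feature above relies on ${variable} placeholders. As a rough illustration of how that kind of substitution can work, here is a minimal sketch using Python's string.Template, whose placeholder syntax happens to match. The function name fill_parameters and the recorded step text are hypothetical; desktop-agent's actual implementation may differ.

```python
from string import Template


def fill_parameters(step: str, params: dict) -> str:
    """Replace ${name} placeholders in a step with concrete values.

    Hypothetical helper for illustration only, not part of desktop-agent.
    """
    # substitute() raises KeyError when a placeholder has no value, so a
    # missing parameter is reported before anything is replayed.
    return Template(step).substitute(params)


# Hypothetical recorded step text containing one placeholder.
recorded_step = "launch ${app_name}"
print(fill_parameters(recorded_step, {"app_name": "firefox"}))
```

Failing loudly on a missing value is a design choice in this sketch; Template.safe_substitute() would instead leave unknown placeholders untouched.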

📊 Current Status

35+ tasks in repository
11 micro-tasks extracted
100% success rate on executed tasks
Parameters working (3+ parameterized tasks)
Success tracking active
QuetzaCodetl native skill: invoked via Bash, no MCP required
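
To make the success-tracking indicators (✓ ? ✗) more concrete, the sketch below shows one possible way to derive an indicator from run counts. The function name, the 0.8 threshold, and the reading of "?" as "never run" are all assumptions made for illustration; the README does not specify how desktop-agent computes them.

```python
def indicator(successes: int, failures: int, threshold: float = 0.8) -> str:
    """Map run counts to a status symbol (hypothetical rule, for illustration)."""
    runs = successes + failures
    if runs == 0:
        # Assumed meaning: no runs recorded yet, so the outcome is unknown.
        return "?"
    # Assumed rule: a task counts as working when its success rate
    # reaches the threshold.
    return "✓" if successes / runs >= threshold else "✗"


print(indicator(0, 0), indicator(9, 1), indicator(1, 4))
```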

🔧 Technical Details

Architecture: Modular Python package
Entry Point: ~/.local/bin/desktop-agent (wrapper script)
Source: /home/mal/AI/desktop-agent/modular/
Database: ~/.cache/desktop-agent/tasks.db (SQLite)
Embeddings: nomic-embed-text (768-dim vectors)
Dependencies: pyatspi, pytesseract, pillow, requests, xdotool, scrot
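
As an illustration of how embeddings stored in SQLite can back a semantic search, here is a minimal sketch. The table layout, column names, and helper functions are assumptions and do not reflect the actual schema of tasks.db. Toy 3-dimensional vectors stand in for the 768-dimensional nomic-embed-text vectors, and the ranking uses plain cosine similarity.

```python
import math
import sqlite3
import struct


def pack(vec):
    """Serialize a list of floats into a BLOB of 32-bit floats."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob):
    """Deserialize a BLOB of 32-bit floats back into a list."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(conn, query_vec, limit=3):
    """Return task names ordered by similarity to the query vector."""
    rows = conn.execute("SELECT name, embedding FROM tasks").fetchall()
    scored = [(cosine(query_vec, unpack(blob)), name) for name, blob in rows]
    return [name for _, name in sorted(scored, reverse=True)[:limit]]


# Hypothetical schema: one row per task, embedding stored as a BLOB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (name TEXT PRIMARY KEY, embedding BLOB)")
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?)",
    [
        ("check-disk-space", pack([0.9, 0.1, 0.0])),
        ("open-app", pack([0.0, 0.2, 0.9])),
        ("find-large-files", pack([0.7, 0.6, 0.1])),
    ],
)

# Made-up vector standing in for the embedding of "check disk space".
query = [1.0, 0.2, 0.0]
print(search(conn, query))
```

A linear scan like this is usually adequate for a repository of a few dozen tasks; a much larger one would likely call for a vector index.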

💡 Examples

Basic Usage

# Check disk space across all drives
desktop-agent replay --run check-disk-space

# Navigate to mounted drives
desktop-agent replay --run view-mounted-drives

# Find large files
desktop-agent replay --run find-large-files

Parameterized Tasks

# Open different apps with the same task
desktop-agent replay --run --param app_name="firefox" open-app
desktop-agent replay --run --param app_name="terminal" open-app

# Run different terminal commands
desktop-agent replay --run --param command="ls -la" run-command
desktop-agent replay --run --param command="df -h" run-command
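
For callers that drive the CLI from a script, the parameterized commands above can be assembled programmatically. The sketch below is a hypothetical Python wrapper: build_replay_command and replay are illustrative names, and repeating --param once per parameter is an assumption, since the examples above only show a single parameter per call.

```python
import shlex
import subprocess


def build_replay_command(task, params=None):
    """Build the argv list for `desktop-agent replay --run ... <task>`."""
    cmd = ["desktop-agent", "replay", "--run"]
    for key, value in (params or {}).items():
        # Assumption: one --param flag per key=value pair.
        cmd += ["--param", f"{key}={value}"]
    cmd.append(task)
    return cmd


def replay(task, params=None):
    """Run the task; requires desktop-agent to be installed and on PATH."""
    # Passing a list (no shell) keeps values such as "ls -la" as one argument.
    return subprocess.run(
        build_replay_command(task, params),
        capture_output=True,
        text=True,
        check=False,
    )


# Print the equivalent shell command without executing anything.
print(shlex.join(build_replay_command("run-command", {"command": "ls -la"})))
```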

Recording New Tasks

desktop-agent record
# Do your steps in the GUI...
desktop-agent save-task my-workflow \
  --description "Opens Firefox and checks email" \
  --purpose "Morning routine" \
  --context "Start of day"

🎯 Next Steps

Task Composition - Chain tasks together
Conditional Logic - If/else execution
Auto-Pattern Extraction - Suggest micro-tasks automatically
More Foundation Tasks - Git, networking, process management

📖 Project Structure

/home/mal/AI/desktop-agent/
├── modular/                        ← Source (modular Python package)
├── README.md                       ← You are here
├── COMPLETE_HANDOFF.md             ← Complete technical handoff
├── SESSION_SUMMARY.md              ← Today's work
├── IMPLEMENTATION_STATUS.md        ← Current status
├── TASK_REPOSITORY_ROADMAP.md      ← 38 tasks to add
├── IMPLEMENTATION_GUIDE.md         ← Code examples
├── ANALYSIS.md                     ← Architecture
├── extract-micro-tasks.py          ← Pattern analysis
├── analyze-reddit-feed.py          ← OCR for Reddit
└── browse-and-analyze-reddit.sh    ← Reddit workflow

~/.cache/desktop-agent/
├── tasks.db                        ← All tasks + embeddings
└── recording.json                  ← Current recording

~/.local/bin/
└── desktop-agent                   ← Entry point (wrapper script)

🏆 Achievements

Phase 1 Complete:

✅ Parameter support
✅ Success tracking
✅ Enhanced search
✅ Test case (Reddit)

Phase 2 Complete:

✅ 9 foundation tasks
✅ 11 micro-tasks extracted
✅ Multi-drive navigation

Phase 3 (Next):

⏳ Task composition
⏳ Conditional logic
⏳ Auto-pattern extraction

Version: 2.1 - Parameters + Success Tracking + Micro-Tasks
Last Updated: 2026-04-22
Status: Production Ready, integrated with the QuetzaCodetl native tool suite
