Local CLI for AI agents to observe and control your computer via screen, mouse, and keyboard. Bring your own AI - any model, even without vision.
Runs fully local. No screenshots sent to the cloud.
Learn more at https://desktopctl.com
- Local-first runtime. No cloud dependency
- Bring your own AI: works with any desktop AI agent
- GPU-accelerated text recognition and computer vision
- Selector-first automation (`--text`, `--token`) with coordinate fallback
- Agent-friendly explicit waits and post-action verification
- Stable JSON contracts for agent integrations
DesktopCtl is split into two binaries:
- `DesktopCtl.app` (`desktopctld`): daemon that owns perception, state, execution, and verification
- `desktopctl`: stateless CLI surface for actions and queries over local IPC
Repository layout:
- `src/desktop/core` - shared protocol and types
- `src/desktop/daemon` - daemon runtime
- `src/desktop/cli` - CLI client
- macOS-first
- OCR-first perception pipeline
- Tokenized screen output for agent grounding
- Deterministic CLI primitives for click/type/wait flows
- macOS (current support target)
- Rust toolchain (`cargo`)
- `just` command runner
- Accessibility permission for `DesktopCtl.app`
- Screen Recording permission for `DesktopCtl.app`
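
Before building, it can help to confirm that the command-line prerequisites listed above are on `PATH`. The following is a minimal sketch, not part of DesktopCtl: `missing_tools` is a hypothetical helper name, and it only checks for executables. The Accessibility and Screen Recording permissions still have to be granted in macOS System Settings.

```shell
#!/bin/sh
# Hypothetical preflight helper (an assumption, not shipped with DesktopCtl).
# Prints the name of every requested tool that is not found on PATH,
# one per line. Prints nothing when all tools are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage: list whichever prerequisites are still missing.
#   missing_tools cargo just
```

An empty result means every named tool was found. The same call can include `jq` if you plan to parse JSON output from the CLI in shell scripts.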
    make install

    raw="$(desktopctl app open Notes --json)"
    win_id="$(printf '%s' "$raw" | jq -r '.result.window_id // empty')"
    desktopctl keyboard press cmd+f --active-window "$win_id" --no-observe
    desktopctl keyboard type "Shopping list" --active-window "$win_id" --no-observe
    desktopctl screen tokenize --active-window "$win_id"

- Status: active development, with macOS-first CLI and daemon workflows already usable.
- Reliability for text/token-driven actions and verification loops. Stable machine-readable error codes.
- Upcoming CLI: `doctor`, richer `window`/`app` introspection, and `--explain` failure output.
- Better local computer vision and semantic UI tokenization.
- Multi-platform support.
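
A note on the Notes example earlier in this document: it extracts `window_id` with `jq`'s `// empty` fallback, so `win_id` can end up as an empty string if the response carries no window id. A small guard makes that failure explicit before any keyboard command runs. This is a sketch under assumptions: `require_window_id` is a hypothetical helper, not part of `desktopctl`, and the error text is illustrative.

```shell
#!/bin/sh
# Hypothetical guard (an assumption, not part of desktopctl).
# Echoes the window id back when it is non-empty. Otherwise it writes a
# message to stderr and returns a non-zero status.
require_window_id() {
  if [ -z "$1" ]; then
    echo "error: response contained no window_id" >&2
    return 1
  fi
  printf '%s\n' "$1"
}

# Usage with the commands from the example (needs desktopctl and jq):
#   raw="$(desktopctl app open Notes --json)"
#   id="$(printf '%s' "$raw" | jq -r '.result.window_id // empty')"
#   win_id="$(require_window_id "$id")" || exit 1
```

Stopping on an empty id keeps later `--active-window` calls from running with a blank argument.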