jeandelest/screen-mcp

screen-mcp

A FastMCP server that runs on the client machine and exposes screenshot tools to a host MCP. It supports both direct screenshot capture and session-based chunked transfers so an LLM can consume images reliably.

Official FastMCP documentation: gofastmcp.com/getting-started/welcome

Exposed tools

  • list_monitors: returns detected monitors (index and dimensions)
  • capture_screenshot: captures a screen image with hybrid mode (base64 for non-vision, native MCP image for vision)
  • capture_timeline: captures a timed screen sequence (ordered frames with timestamps)
  • start_timeline_capture: starts a timeline session and returns a timeline_id
  • get_timeline_manifest: returns chunked timeline metadata
  • get_timeline_chunk: retrieves a timeline JSON chunk
  • release_timeline_capture: explicitly releases a timeline session
  • start_screenshot_capture: starts a screenshot session and returns a capture_id
  • get_screenshot_manifest: returns metadata plus ASCII preview for non-vision LLMs
  • get_screenshot_chunk: returns a chunk of base64 image data
  • release_screenshot_capture: releases the screenshot session and frees memory

Quick tool guidance

  • Need available monitor info: list_monitors
  • Need a fast single screenshot with moderate payload: capture_screenshot
  • Need a more robust single screenshot with chunking: start_screenshot_capture -> get_screenshot_manifest -> get_screenshot_chunk (0..N-1) -> release_screenshot_capture
  • Need a short timeline in one call: capture_timeline
  • Need a robust timeline for large payloads: start_timeline_capture -> get_timeline_manifest -> get_timeline_chunk (0..N-1) -> release_timeline_capture

Best practices:

  • Always concatenate chunks in ascending chunk_index order.
  • Always call release_* after reading session data to free memory.
  • For non-vision models, consume preview_text from the manifest before loading full payload.
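The ordering rule above can be sketched as a small helper. This is a minimal illustration, not the project's own client code: it assumes each chunk arrives as a dict with a `chunk_index` key and a `data` key holding a slice of the base64 string, and `data` is a hypothetical field name.

```python
import base64


def reassemble_chunks(chunks):
    """Join base64 chunks in ascending chunk_index order and decode them.

    Each chunk is assumed to be a dict with "chunk_index" and "data" keys;
    "data" is a hypothetical field name, not confirmed by the server.
    """
    ordered = sorted(chunks, key=lambda c: c["chunk_index"])
    indices = [c["chunk_index"] for c in ordered]
    # Refuse to decode if any index is missing or duplicated.
    if indices != list(range(len(ordered))):
        raise ValueError(f"missing or duplicate chunks: {indices}")
    # Concatenate the text first, then decode once: slices of a base64
    # string are not guaranteed to be decodable on their own.
    return base64.b64decode("".join(c["data"] for c in ordered))
```

Decoding after concatenation matters because a chunk boundary can fall in the middle of a 4-character base64 group.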

Prerequisites

  • Linux with an active graphical session (X11/Wayland capture support)
  • DISPLAY environment variable available to the server process (mss requires it on Linux)
  • Python 3.10+

Local installation

uv sync

Or via Taskfile:

task setup

Run the MCP server (stdio)

task server

This task starts the server using mcpm run screen-mcp through uvx. It also registers or updates the local MCP server automatically when needed. Display-related environment variables are propagated during registration: DISPLAY, WAYLAND_DISPLAY, XAUTHORITY, XDG_RUNTIME_DIR.
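How the task gathers these variables is not shown here. As a rough sketch of the idea, the helper below collects the four display-related variables from an environment mapping; `collect_display_env` is a hypothetical name and not part of the project.

```python
import os

DISPLAY_VARS = ("DISPLAY", "WAYLAND_DISPLAY", "XAUTHORITY", "XDG_RUNTIME_DIR")


def collect_display_env(environ=None):
    """Return the display-related variables that are set and non-empty."""
    source = os.environ if environ is None else environ
    return {name: source[name] for name in DISPLAY_VARS if source.get(name)}
```

Calling `collect_display_env()` in the shell that launches the server is a quick way to check whether DISPLAY is visible to that process.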

MCP-compatible smoke-test client

task client

The smoke-test script is located in scripts/smoke_client.py and exercises:

  • list_monitors
  • start_screenshot_capture
  • get_screenshot_manifest
  • get_screenshot_chunk
  • release_screenshot_capture

It writes a verification image to artifacts/smoke_capture.jpg.

You can also run a specific action via --action:

uv run python scripts/smoke_client.py --action list-monitors
uv run python scripts/smoke_client.py --action capture-screenshot --monitor-index 0 --output artifacts/capture.jpg
uv run python scripts/smoke_client.py --action capture-timeline --duration-seconds 6 --output artifacts/timeline.json
uv run python scripts/smoke_client.py --action capture-timeline-session --duration-seconds 6 --chunk-size 120000 --output artifacts/timeline_session.json

Debugging and real-time inspection

task inspector

This launches the MCP Inspector against the mcpm run screen-mcp server.

Using the server in VS Code

  1. Open this project folder in VS Code.
  2. Create a .vscode/mcp.json file.
  3. Add a servers configuration to that file, using one of the examples below.

Recommended local example for a cloned repo (unpublished package):

{
  "servers": {
    "screen-mcp": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "--project", "/absolute/path/to/screen-mcp", "screen-mcp"]
    }
  }
}

Example for running directly from a Git repo without global installation:

{
  "servers": {
    "screen-mcp": {
      "type": "stdio",
      "command": "uvx",
      "args": ["--from", "git+https://github.com/<owner>/screen-mcp.git", "screen-mcp"]
    }
  }
}

Alternative via MCPM:

{
  "servers": {
    "screen-mcp": {
      "type": "stdio",
      "command": "uvx",
      "args": ["mcpm", "run", "screen-mcp"]
    }
  }
}

Example tool calls

  • list_monitors()
  • capture_screenshot(monitor_index=0, image_format="jpeg", max_width=1600, quality=80)
  • capture_screenshot(monitor_index=0, image_format="jpeg", max_width=1600, quality=80, response_mode="image")
  • capture_timeline(duration_seconds=10, monitor_index=0, image_format="jpeg", max_width=900, quality=70)
  • start_timeline_capture(duration_seconds=10, monitor_index=0, image_format="jpeg", max_width=900, quality=70, chunk_size=120000)
  • get_timeline_manifest(timeline_id)
  • get_timeline_chunk(timeline_id, chunk_index)
  • release_timeline_capture(timeline_id)

Timeline behavior in capture_timeline:

  • fixed cadence: TIMELINE_FPS (default 2 images/s, configurable in source)
  • maximum duration: TIMELINE_MAX_DURATION_SECONDS (default 30s, configurable in source)
  • each frame includes: frame_index, t_offset_ms, captured_at, preview_text, image_sha256, image_size_bytes
  • temporal_hint makes chronological order explicit for an LLM
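To get a feel for how many frames a timeline produces, the arithmetic can be sketched as follows. The constants mirror the defaults listed above. Two things are assumptions of this sketch rather than documented behavior: that a duration above the maximum is clamped (the server might reject it instead), and that frames sit at evenly spaced offsets starting at 0 ms.

```python
TIMELINE_FPS = 2
TIMELINE_MAX_DURATION_SECONDS = 30


def planned_offsets_ms(duration_seconds, fps=TIMELINE_FPS,
                       max_duration=TIMELINE_MAX_DURATION_SECONDS):
    """Return evenly spaced frame offsets in ms for a capped duration."""
    capped = min(duration_seconds, max_duration)  # assumed clamping
    frame_count = int(capped * fps)
    return [round(i * 1000 / fps) for i in range(frame_count)]
```

With the defaults, a 10-second request would plan 20 frames, and the 30-second cap gives at most 60.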

Robust flow recommendation:

  1. start_screenshot_capture(...) -> obtain capture_id
  2. get_screenshot_manifest(capture_id) -> metadata + preview_text
  3. get_screenshot_chunk(capture_id, chunk_index) -> reassemble chunks
  4. release_screenshot_capture(capture_id)
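The four steps can be sketched as one function that always releases the session. This is an illustration under stated assumptions: `call_tool(name, arguments)` is a hypothetical callable that invokes an MCP tool and returns its result as a dict, and the result fields `capture_id`, `chunk_count`, and `data` are assumed names that should be checked against the real manifest.

```python
import base64


def fetch_screenshot(call_tool, **capture_args):
    """Run the four-step session flow and return (manifest, image_bytes).

    call_tool(name, arguments) is a hypothetical callable; the result
    fields "capture_id", "chunk_count" and "data" are assumed names.
    """
    capture_id = call_tool("start_screenshot_capture", capture_args)["capture_id"]
    try:
        manifest = call_tool("get_screenshot_manifest", {"capture_id": capture_id})
        parts = [
            call_tool("get_screenshot_chunk",
                      {"capture_id": capture_id, "chunk_index": i})["data"]
            for i in range(manifest["chunk_count"])
        ]
        return manifest, base64.b64decode("".join(parts))
    finally:
        # Release even if a chunk request fails, so the server frees memory.
        call_tool("release_screenshot_capture", {"capture_id": capture_id})
```

Putting the release call in `finally` keeps a failed chunk request from leaving the capture held in server memory.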

Base64 notes

  • For multi-client MCP, base64 is the most interoperable format: simple, JSON-friendly, compatible with vision and non-vision clients.
  • Tradeoff: the payload is about 33% larger than the raw image, and a large image sent as a single block risks truncation.
  • This project uses session-based chunked base64 transfer (capture_id) to make large exchanges reliable.
  • For non-vision LLMs, prefer get_screenshot_manifest (metadata + ASCII preview) before downloading the full image.
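The roughly 33% figure follows from base64 encoding every 3 input bytes as 4 output characters. A short check with the standard library, plus a hypothetical `split_base64` helper showing how an encoded string could be cut into fixed-size chunks (the server's actual chunking code may differ):

```python
import base64

raw = bytes(3000)  # stand-in for 3000 bytes of image data
encoded = base64.b64encode(raw)
overhead = len(encoded) / len(raw) - 1
print(len(encoded), f"{overhead:.1%}")  # prints: 4000 33.3%


def split_base64(text, chunk_size):
    """Cut an encoded string into chunks of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

Chunks cut this way only decode correctly once they are joined back together in order.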

Hybrid mode in capture_screenshot:

  • response_mode="base64" (default): legacy behavior, JSON output with image_base64.
  • response_mode="image": native MCP image output for vision models, with metadata in structured_content.
  • response_mode="auto": reads SCREEN_MCP_CAPTURE_RESPONSE_MODE (base64 or image) and chooses automatically based on the client/host.
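One way the environment-variable part of the "auto" mode could be resolved is sketched below. This is not the server's implementation: the function name is hypothetical, any client/host detection is left out, and falling back to "base64" when the variable is unset or invalid is an assumption.

```python
import os

VALID_MODES = ("base64", "image")


def resolve_response_mode(requested="base64", environ=None):
    """Resolve "auto" via SCREEN_MCP_CAPTURE_RESPONSE_MODE.

    Falling back to "base64" when the variable is unset or invalid is an
    assumption of this sketch, not documented server behavior.
    """
    env = os.environ if environ is None else environ
    if requested in VALID_MODES:
        return requested
    if requested != "auto":
        raise ValueError(f"unknown response_mode: {requested!r}")
    configured = env.get("SCREEN_MCP_CAPTURE_RESPONSE_MODE", "").strip().lower()
    return configured if configured in VALID_MODES else "base64"
```

An explicit "base64" or "image" always takes precedence here; the environment variable is consulted only for "auto".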

Security and privacy

Screen captures may contain sensitive data. Add an explicit client-side policy for production use (consent, masking, window whitelisting, etc.).
