
feat: AI Demo Director (MCP) — agent-driven cinematic demo generation #1

Open
Razee4315 wants to merge 13 commits into main from feat/ai-demo-director-mcp


@Razee4315

Owner

What

Adds the AI Demo Director: an MCP server that lets an AI agent (Claude) drive Vuoom to
generate a demo GIF/MP4. The agent drives a target app, records it with cinematic auto-zoom,
inspects the output, and re-records to improve it. Research and rationale are in
docs/13-AI-Demo-Director-Research.md; setup is in docs/AI_DEMO_DIRECTOR.md.
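The director + verify loop lives in the agent, not in Vuoom. Its control flow can still be sketched as a bounded retry: record a take, look at sampled frames, and re-record with the critique until a take is accepted or the budget runs out. This is an illustrative sketch only. `Take`, `Verdict`, `direct`, and the closure parameters are hypothetical names and are not part of Vuoom or rmcp.

```rust
/// One recorded take as the agent sees it (hypothetical, not a Vuoom type).
#[derive(Debug, Clone, PartialEq)]
pub struct Take {
    pub attempt: u32,
    pub frames_sampled: usize,
}

/// The agent's judgment after inspecting sampled frames (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Verdict {
    Accept,
    /// Reason for rejecting the take, fed into the next attempt.
    Retry(String),
}

/// Record -> inspect -> re-record, at most `max_takes` times.
/// `record` receives the attempt number and the previous critique, if any.
/// Returns the accepted take, or None once the budget is exhausted.
pub fn direct<R, C>(max_takes: u32, mut record: R, mut critique: C) -> Option<Take>
where
    R: FnMut(u32, Option<&str>) -> Take,
    C: FnMut(&Take) -> Verdict,
{
    let mut feedback: Option<String> = None;
    for attempt in 1..=max_takes {
        let take = record(attempt, feedback.as_deref());
        match critique(&take) {
            Verdict::Accept => return Some(take),
            Verdict::Retry(reason) => feedback = Some(reason),
        }
    }
    None
}
```

In the real setup, each `record` call would correspond to a sequence of MCP tool calls (start_recording, click/type, stop_recording), and `critique` would correspond to the agent inspecting get_frames output. Capping the number of takes keeps a dissatisfied agent from looping forever.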

Pieces

  • vuoom-control: shared newline-delimited JSON protocol + blocking client + port discovery. Pure and fully unit-tested.
  • vuoom-input: SendInput injection (move/click/type/key/scroll). Injected clicks flow through the existing hook, so they drive auto-zoom for free. The pure coordinate/key math is unit-tested.
  • src-tauri: opt-in localhost control server (enabled by VUOOM_ENABLE_CONTROL) that maps protocol requests to Session calls + injection; new Session::sample_frames (composite → base64 PNG) and Session::clip_info.
  • vuoom-mcp: standalone MCP sidecar (rmcp over stdio) exposing 19 tools; the agent acts as director and runs the verify loop. Locally verified: initialize + tools/list return all tools.
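The following is a std-only sketch of the client half of vuoom-control. It reads the port from the discovery file, connects over loopback, sends one newline-terminated JSON request, and blocks until one reply line arrives. It assumes the discovery file holds the port as plain decimal text, and it takes already-serialized JSON strings. The real crate uses serde types. `discover_port` and `request_line` are hypothetical names.

```rust
use std::fs;
use std::io::{self, BufRead, BufReader, Write};
use std::net::TcpStream;
use std::path::Path;

/// Reads the control-server port from a discovery file.
/// Assumption: the file holds the port as plain decimal text.
pub fn discover_port(path: &Path) -> io::Result<u16> {
    let text = fs::read_to_string(path)?;
    text.trim()
        .parse::<u16>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Sends one newline-terminated JSON request over loopback and blocks
/// until a single reply line arrives (returned without the trailing newline).
pub fn request_line(port: u16, json: &str) -> io::Result<String> {
    // Framing invariant: a request must fit on exactly one line.
    if json.contains('\n') {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "request contains a newline",
        ));
    }
    let mut stream = TcpStream::connect(("127.0.0.1", port))?;
    stream.write_all(json.as_bytes())?;
    stream.write_all(b"\n")?;
    stream.flush()?;
    let mut reply = String::new();
    BufReader::new(stream).read_line(&mut reply)?;
    Ok(reply.trim_end().to_string())
}
```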

Verification

  • CI: fmt + clippy(-D warnings) + tests + check across the workspace.
  • Locally green: vuoom-control, vuoom-input, vuoom-mcp (incl. stdio smoke test).
  • Needs a real Windows machine (runtime behavior, as with all of Vuoom's capture/GPU paths): actual SendInput driving an app, capture + auto-zoom from injected clicks, and the full record→get_frames→export loop.

Razee4315 added 13 commits June 14, 2026 01:25
Decision-gate research doc on the branch: novelty verdict (the
self-critique loop + native-Windows-free is the moat; auto-zoom is a
commodity), feasibility (~85% of programmatic drive already exists;
injected SendInput clicks flow through the existing input hook and
trigger auto-zoom for free), browser-scoped MVP architecture + phasing,
and an honest time/confidence verdict.
Newline-delimited JSON request/response contract + blocking TCP client
shared by Vuoom's in-app control server and the vuoom-mcp sidecar.
Pure (serde only) so it compiles fast and is fully unit-tested off-CI:
all request/response variants round-trip; framing emits one line.
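Newline-delimited framing works because compact JSON (for example, `serde_json::to_string` output) never contains a raw newline: line breaks inside strings are escaped. The sketch below shows both framing halves for pre-serialized messages. `frame` appends the delimiter and rejects anything multi-line. A hypothetical `LineDecoder` reassembles frames from arbitrarily split TCP reads. Both names are illustrative and do not come from vuoom-control.

```rust
/// Appends the frame delimiter. Returns None for input that would span
/// more than one line (compact JSON never does).
pub fn frame(json: &str) -> Option<String> {
    if json.contains('\n') || json.contains('\r') {
        return None;
    }
    let mut s = String::with_capacity(json.len() + 1);
    s.push_str(json);
    s.push('\n');
    Some(s)
}

/// Reassembles newline-delimited frames from byte chunks that may split
/// a message anywhere (as TCP reads can).
#[derive(Default)]
pub struct LineDecoder {
    buf: Vec<u8>,
}

impl LineDecoder {
    /// Feeds one chunk and returns every frame it completes, without the '\n'.
    /// Incomplete trailing bytes stay buffered until the next chunk.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.buf.extend_from_slice(chunk);
        let mut frames = Vec::new();
        while let Some(pos) = self.buf.iter().position(|&b| b == b'\n') {
            let line: Vec<u8> = self.buf.drain(..=pos).collect();
            let text = String::from_utf8_lossy(&line[..line.len() - 1]).into_owned();
            if !text.is_empty() {
                frames.push(text);
            }
        }
        frames
    }
}
```

Splitting on the byte 0x0A is safe for UTF-8 text, because that byte never occurs inside a multi-byte sequence.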
Adds move/click/type/key-chord/scroll wrappers plus pure, unit-tested
coord-normalization (physical px -> SendInput absolute) and key-name ->
VK mapping. Injected events flow through the existing low-level hook, so
a synthetic click both drives the target app and triggers auto-zoom.
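The pure math described above might look like the following sketch. `to_absolute` maps a physical pixel onto SendInput's normalized absolute range of 0..=65535 (used with `MOUSEEVENTF_ABSOLUTE`, plus `MOUSEEVENTF_VIRTUALDESK` when spanning monitors). The rounding used here is one common choice and may differ from vuoom-input's. The VK values are the documented Win32 virtual-key codes. The accepted key names and the `chord_events` helper are illustrative assumptions.

```rust
/// Maps a physical-pixel coordinate on one axis to SendInput's normalized
/// absolute range [0, 65535]. `origin`/`extent` describe the desktop span on
/// that axis (e.g. virtual-screen left and width). Out-of-range input clamps.
pub fn to_absolute(px: i32, origin: i32, extent: i32) -> i32 {
    if extent <= 1 {
        return 0;
    }
    let rel = (px - origin).clamp(0, extent - 1) as i64;
    let span = (extent - 1) as i64;
    // Rounded integer division of rel * 65535 / span.
    ((rel * 65_535 + span / 2) / span) as i32
}

/// Maps a key name to a Win32 virtual-key code (names are illustrative).
pub fn vk_from_name(name: &str) -> Option<u16> {
    let lower = name.to_ascii_lowercase();
    let vk = match lower.as_str() {
        "ctrl" | "control" => 0x11, // VK_CONTROL
        "shift" => 0x10,            // VK_SHIFT
        "alt" => 0x12,              // VK_MENU
        "win" | "meta" => 0x5B,     // VK_LWIN
        "enter" | "return" => 0x0D, // VK_RETURN
        "tab" => 0x09,              // VK_TAB
        "esc" | "escape" => 0x1B,   // VK_ESCAPE
        "backspace" => 0x08,        // VK_BACK
        "delete" => 0x2E,           // VK_DELETE
        "space" => 0x20,            // VK_SPACE
        "left" => 0x25,             // VK_LEFT
        "up" => 0x26,               // VK_UP
        "right" => 0x27,            // VK_RIGHT
        "down" => 0x28,             // VK_DOWN
        _ => {
            let mut chars = lower.chars();
            match (chars.next(), chars.next()) {
                // Letters and digits use their uppercase ASCII code as the VK.
                (Some(c), None) if c.is_ascii_alphanumeric() => c.to_ascii_uppercase() as u16,
                _ => {
                    // F1..F24 are contiguous starting at VK_F1 = 0x70.
                    let n: u16 = lower.strip_prefix('f')?.parse().ok()?;
                    if (1..=24).contains(&n) {
                        0x70 + n - 1
                    } else {
                        return None;
                    }
                }
            }
        }
    };
    Some(vk)
}

/// Expands "ctrl+shift+s" into (vk, is_down) events: press keys in order,
/// release them in reverse. Returns None if any key name is unknown.
pub fn chord_events(chord: &str) -> Option<Vec<(u16, bool)>> {
    let vks: Vec<u16> = chord
        .split('+')
        .map(|k| vk_from_name(k.trim()))
        .collect::<Option<_>>()?;
    let mut events: Vec<(u16, bool)> = vks.iter().map(|&vk| (vk, true)).collect();
    events.extend(vks.iter().rev().map(|&vk| (vk, false)));
    Some(events)
}
```

Pressing modifiers first and releasing them in reverse order is what makes "ctrl+shift+s" register as a chord rather than as three separate keystrokes.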
…pling

src-tauri now runs an opt-in localhost control server (VUOOM_ENABLE_CONTROL)
that maps vuoom-control requests to Session calls + input injection. Adds
Session::sample_frames (composite -> base64 PNG so the agent can see output)
and Session::clip_info. Writes its port to a discovery file for the sidecar.
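Binding to 127.0.0.1 with port 0 lets the OS pick a free port and keeps the server off external interfaces. The chosen port is then published for the sidecar. The following is a minimal sketch. It assumes the environment variable only needs to be present and that the discovery file holds the port as decimal text. `start_control_listener`, `bind_and_publish`, and the file layout are hypothetical.

```rust
use std::fs;
use std::io;
use std::net::TcpListener;
use std::path::Path;

/// Opt-in gate: no server unless VUOOM_ENABLE_CONTROL is set.
/// (Assumption: presence alone enables it; the real check may differ.)
pub fn start_control_listener(discovery: &Path) -> io::Result<Option<TcpListener>> {
    if std::env::var_os("VUOOM_ENABLE_CONTROL").is_none() {
        return Ok(None);
    }
    bind_and_publish(discovery).map(Some)
}

/// Binds loopback on an OS-chosen port and writes that port to `discovery`.
pub fn bind_and_publish(discovery: &Path) -> io::Result<TcpListener> {
    let listener = TcpListener::bind(("127.0.0.1", 0))?;
    let port = listener.local_addr()?.port();
    // Write to a temp file, then rename over the target, so the sidecar
    // should never read a half-written port.
    let tmp = discovery.with_extension("tmp");
    fs::write(&tmp, format!("{port}\n"))?;
    fs::rename(&tmp, discovery)?;
    Ok(listener)
}
```

`std::fs::rename` replaces an existing destination file, so a stale discovery file from a previous run is simply overwritten.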
Standalone MCP server exposing 19 tools (screen_geometry, set_region,
start/stop_recording, click/type/key_chord/scroll/move/wait, seek,
clip_state, get_frames [returns PNG images], estimate/export gif+mp4)
that bridge to Vuoom's control server via vuoom-control. The agent is
the director + verify loop. Verified locally: initialize + tools/list
return all tools over stdio.
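get_frames returns images rather than text. In MCP, a tool result carries a `content` array, and image items have the shape `{"type":"image","data":<base64>,"mimeType":"image/png"}`. The std-only sketch below builds that JSON by hand from base64 PNG frames, such as those produced by Session::sample_frames. The sidecar presumably uses rmcp's own content types instead. `image_content_json` is a hypothetical name.

```rust
/// Wraps base64-encoded PNG frames as MCP image content items inside a
/// tool-result `content` array, built by hand for illustration.
pub fn image_content_json(frames_b64: &[&str]) -> String {
    let items: Vec<String> = frames_b64
        .iter()
        .map(|b64| format!(r#"{{"type":"image","data":"{b64}","mimeType":"image/png"}}"#))
        .collect();
    format!(r#"{{"content":[{}]}}"#, items.join(","))
}
```

No string escaping is needed because the base64 alphabet contains no quotes or backslashes.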
Architecture, sidecar build + MCP client config, the VUOOM_ENABLE_CONTROL
opt-in, the tool list, an example agent workflow, safety notes, and the
CI-verified vs needs-a-real-machine verification status.
…ting

Speaks the real protocol with canned responses + writes the discovery
file, so the vuoom-mcp sidecar can be driven through a full agent demo
sequence (record/click/type/stop/get_frames/export) without the GPU app.
Verified: a scripted MCP session round-trips all tools incl. image frames.
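A mock of this kind reduces to a loop that reads one request line and writes one canned reply line. The sketch is generic over `BufRead`/`Write`, so it can serve a `TcpStream` in practice or in-memory buffers in tests. The `"op"` field, the substring dispatch, and the reply payloads are hypothetical placeholders, not the real vuoom-control schema.

```rust
use std::io::{self, BufRead, Write};

/// Serves newline-delimited requests from `input`, writing one canned JSON
/// reply line per non-empty request. Returns how many requests were handled.
pub fn serve_canned<R: BufRead, W: Write>(input: R, mut output: W) -> io::Result<usize> {
    let mut handled = 0;
    for line in input.lines() {
        let line = line?;
        if line.trim().is_empty() {
            continue;
        }
        output.write_all(canned_reply(&line).as_bytes())?;
        output.write_all(b"\n")?;
        output.flush()?;
        handled += 1;
    }
    Ok(handled)
}

/// Crude dispatch on a hypothetical "op" field; a real mock would
/// deserialize the shared protocol types instead.
fn canned_reply(request: &str) -> &'static str {
    if request.contains("\"op\":\"start_recording\"") {
        r#"{"ok":true,"recording":true}"#
    } else if request.contains("\"op\":\"stop_recording\"") {
        r#"{"ok":true,"recording":false}"#
    } else if request.contains("\"op\":\"get_frames\"") {
        // Placeholder frame: base64 of just the 8-byte PNG signature.
        r#"{"ok":true,"frames":["iVBORw0KGgo="]}"#
    } else {
        r#"{"ok":true}"#
    }
}
```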
Auto-zoom defaults OFF for the interactive UI (manual Ctrl+Shift+Z), so an
agent driving via injected clicks recorded zero zooms. Thread a
SetAutoZoomOnClick flag through control protocol -> Session (pending_auto_click,
applied to the zoom plan) -> control server; the MCP start_recording tool
enables it by default. Runtime-verified on a real machine: an agent-driven
recording now plans cinematic zooms from injected clicks and exports a GIF.
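The following is a simplified model of this fix. The control request flips a pending flag on the session, and the zoom plan turns recorded clicks into zoom segments only when that flag is on. Apart from `pending_auto_click`, every name here (`Click`, `ZoomSegment`, `Session`, `plan_zooms`, `hold_ms`) is a hypothetical stand-in. The real planner is presumably more involved (easing, merging nearby clicks).

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Click {
    pub t_ms: u64,
    pub x: i32,
    pub y: i32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ZoomSegment {
    pub start_ms: u64,
    pub end_ms: u64,
    pub cx: i32,
    pub cy: i32,
}

/// Stand-in for the real Session: only the parts this fix touches.
#[derive(Default)]
pub struct Session {
    /// Set by SetAutoZoomOnClick; defaults to off, like the interactive UI.
    pending_auto_click: bool,
    clicks: Vec<Click>,
}

impl Session {
    pub fn set_auto_zoom_on_click(&mut self, on: bool) {
        self.pending_auto_click = on;
    }

    pub fn record_click(&mut self, c: Click) {
        self.clicks.push(c);
    }

    /// With the flag off, clicks are still recorded but no zooms are planned
    /// (the "zero zooms" symptom). Each click becomes a fixed-length zoom.
    pub fn plan_zooms(&self, hold_ms: u64) -> Vec<ZoomSegment> {
        if !self.pending_auto_click {
            return Vec::new();
        }
        self.clicks
            .iter()
            .map(|c| ZoomSegment { start_ms: c.t_ms, end_ms: c.t_ms + hold_ms, cx: c.x, cy: c.y })
            .collect()
    }
}
```

Enabling the flag by default in the MCP start_recording tool, rather than changing the UI default, keeps manual Ctrl+Shift+Z behavior unchanged for interactive users.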
1 participant