feat: AI Demo Director (MCP) — agent-driven cinematic demo generation #1
Open
Razee4315 wants to merge 13 commits into
Decision-gate research doc on the branch: novelty verdict (the self-critique loop + native-Windows-free is the moat; auto-zoom is a commodity), feasibility (~85% of programmatic drive already exists; injected SendInput clicks flow through the existing input hook and trigger auto-zoom for free), browser-scoped MVP architecture + phasing, and an honest time/confidence verdict.
Newline-delimited JSON request/response contract + blocking TCP client shared by Vuoom's in-app control server and the vuoom-mcp sidecar. Pure (serde only) so it compiles fast and is fully unit-tested off-CI: all request/response variants round-trip; framing emits one line.
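The framing rule above can be shown in a minimal, std-only sketch that runs without serde. Each message is one JSON object followed by `\n`, and the reader takes exactly one line per message. The payload arrives as an already-serialized string; in the crate, serde would produce it. The function names are illustrative, not the actual vuoom-control API.

```rust
use std::io::{self, BufRead, Write};

/// Write one already-serialized JSON message as a single newline-terminated frame.
/// A raw newline inside the payload would split the frame, so it is rejected
/// (compact JSON from a serializer escapes newlines inside strings as "\n").
fn write_frame<W: Write>(w: &mut W, json: &str) -> io::Result<()> {
    if json.contains('\n') || json.contains('\r') {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "frame payload contains a raw newline",
        ));
    }
    w.write_all(json.as_bytes())?;
    w.write_all(b"\n")?;
    w.flush()
}

/// Read the next frame. Returns Ok(None) on a clean EOF between frames.
fn read_frame<R: BufRead>(r: &mut R) -> io::Result<Option<String>> {
    let mut line = String::new();
    if r.read_line(&mut line)? == 0 {
        return Ok(None);
    }
    // Drop the terminator, tolerating a peer that writes "\r\n".
    Ok(Some(line.trim_end_matches(&['\n', '\r'][..]).to_string()))
}

/// Blocking request/response: send one frame, then wait for exactly one reply frame.
fn round_trip<W: Write, R: BufRead>(w: &mut W, r: &mut R, request: &str) -> io::Result<String> {
    write_frame(w, request)?;
    read_frame(r)?.ok_or_else(|| {
        io::Error::new(io::ErrorKind::UnexpectedEof, "server closed the connection")
    })
}
```

Over TCP, `stream.try_clone()` would serve as the writer and `BufReader::new(stream)` as the reader. Request payloads such as `{"op":"clip_info"}` are hypothetical shapes, not the real request variants.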
Adds move/click/type/key-chord/scroll wrappers plus pure, unit-tested coord-normalization (physical px -> SendInput absolute) and key-name -> VK mapping. Injected events flow through the existing low-level hook, so a synthetic click both drives the target app and triggers auto-zoom.
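The two pure pieces of vuoom-input might look roughly like the sketch below.

- **Coordinate mapping.** It follows the documented SendInput convention for `MOUSEEVENTF_ABSOLUTE`, where 0 maps to the top-left and 65535 to the bottom-right of the target surface. The sketch assumes a per-axis origin and extent (the origin can be negative on the virtual desktop) and rounds to the nearest step.
- **Key names.** The name table is a hypothetical subset. The numeric values are the standard Win32 virtual-key codes.
- **Naming.** The function names are illustrative only.

```rust
/// Map a physical-pixel coordinate on one axis to SendInput's absolute range 0..=65535.
/// `origin`/`extent` describe the target surface on that axis (assumed virtual desktop,
/// so the origin may be negative when a monitor sits left of or above the primary).
fn to_absolute(px: i32, origin: i32, extent: i32) -> i32 {
    assert!(extent > 1, "surface must be wider than one pixel");
    let last = i64::from(extent) - 1;
    // Clamp so off-screen requests pin to the edge instead of wrapping.
    let offset = (i64::from(px) - i64::from(origin)).clamp(0, last);
    // Scale so the first pixel maps to 0 and the last to 65535, rounding to nearest.
    ((offset * 65535 + last / 2) / last) as i32
}

/// Hypothetical key-name table mapped to standard Win32 virtual-key codes.
fn vk_from_name(name: &str) -> Option<u16> {
    let lower = name.to_ascii_lowercase();
    let vk = match lower.as_str() {
        "backspace" => 0x08,
        "tab" => 0x09,
        "enter" | "return" => 0x0D,
        "shift" => 0x10,
        "ctrl" | "control" => 0x11,
        "alt" => 0x12,
        "esc" | "escape" => 0x1B,
        "space" => 0x20,
        "left" => 0x25,
        "up" => 0x26,
        "right" => 0x27,
        "down" => 0x28,
        "delete" | "del" => 0x2E,
        "win" | "meta" => 0x5B,
        _ => {
            let bytes = lower.as_bytes();
            // VK codes for 0-9 and A-Z equal the uppercase ASCII codes.
            if bytes.len() == 1 && bytes[0].is_ascii_alphanumeric() {
                return Some(u16::from(bytes[0].to_ascii_uppercase()));
            }
            // F1..F24 are contiguous starting at VK_F1 = 0x70.
            if let Some(n) = lower.strip_prefix('f').and_then(|d| d.parse::<u16>().ok()) {
                if (1..=24).contains(&n) {
                    return Some(0x70 + n - 1);
                }
            }
            return None;
        }
    };
    Some(vk)
}

/// Parse a chord like "Ctrl+Shift+Z" into VK codes in press order; None if any key is unknown.
fn parse_chord(chord: &str) -> Option<Vec<u16>> {
    chord.split('+').map(|k| vk_from_name(k.trim())).collect()
}
```

Keeping the math pure means it can be unit-tested on any OS. Only the final `SendInput` call needs Windows.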
…pling
src-tauri now runs an opt-in localhost control server (VUOOM_ENABLE_CONTROL) that maps vuoom-control requests to Session calls + input injection. Adds Session::sample_frames (composite -> base64 PNG so the agent can see output) and Session::clip_info. Writes its port to a discovery file for the sidecar.
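The opt-in gate and port discovery could look roughly like this sketch, which uses only `std::net` and `std::fs`. Binding 127.0.0.1 with port 0 lets the OS pick a free port, which is then written to a file the sidecar reads. Several details are assumptions of this sketch, not necessarily what src-tauri does:

- accepting "1" or "true" for VUOOM_ENABLE_CONTROL;
- the write-temp-then-rename step;
- the function names.

The real discovery path is not specified here.

```rust
use std::fs;
use std::io;
use std::net::TcpListener;
use std::path::Path;

/// Assumed interpretation of VUOOM_ENABLE_CONTROL: "1" or "true" (any case) turns it on.
/// Typical call: control_enabled(std::env::var("VUOOM_ENABLE_CONTROL").ok().as_deref())
fn control_enabled(value: Option<&str>) -> bool {
    matches!(
        value.map(|v| v.trim().to_ascii_lowercase()).as_deref(),
        Some("1") | Some("true")
    )
}

/// Bind a loopback-only listener on an OS-chosen port and publish the port for the sidecar.
fn start_control_listener(discovery_file: &Path) -> io::Result<(TcpListener, u16)> {
    // 127.0.0.1, never 0.0.0.0: only local processes should be able to drive input injection.
    let listener = TcpListener::bind(("127.0.0.1", 0))?;
    let port = listener.local_addr()?.port();
    // Write a sibling temp file and rename it into place, to avoid the sidecar
    // reading a half-written file.
    let tmp = discovery_file.with_extension("tmp");
    fs::write(&tmp, port.to_string())?;
    fs::rename(&tmp, discovery_file)?;
    Ok((listener, port))
}

/// Sidecar side: read the published port back.
fn read_discovered_port(discovery_file: &Path) -> io::Result<u16> {
    let text = fs::read_to_string(discovery_file)?;
    text.trim()
        .parse::<u16>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Port 0 plus a discovery file avoids hard-coding a port that might already be taken on the user's machine.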
Standalone MCP server exposing 19 tools (screen_geometry, set_region, start/stop_recording, click/type/key_chord/scroll/move/wait, seek, clip_state, get_frames [returns PNG images], estimate/export gif+mp4) that bridge to Vuoom's control server via vuoom-control. The agent is the director + verify loop. Verified locally: initialize + tools/list return all tools over stdio.
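The stdio smoke test described above comes down to a short JSON-RPC exchange:

1. the client sends an initialize request;
2. it reads the reply;
3. it sends the notifications/initialized notification;
4. it sends tools/list and reads that reply.

The std-only sketch below performs that exchange over newline-delimited stdio. Assumptions:

- The protocol version string 2024-11-05 is one published MCP revision; the rmcp-based sidecar is assumed to accept it.
- The client name and function names are hypothetical.
- The server is assumed to send no other messages before each reply.

```rust
use std::io::{self, BufRead, Write};

// Assumption: an MCP protocol revision the sidecar accepts.
const PROTOCOL_VERSION: &str = "2024-11-05";

fn send<W: Write>(w: &mut W, msg: &str) -> io::Result<()> {
    w.write_all(msg.as_bytes())?;
    w.write_all(b"\n")?;
    w.flush()
}

fn recv<R: BufRead>(r: &mut R) -> io::Result<String> {
    let mut line = String::new();
    if r.read_line(&mut line)? == 0 {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "server closed stdout"));
    }
    Ok(line.trim_end().to_string())
}

/// Run initialize -> initialized -> tools/list; returns the two reply lines.
fn smoke_test<W: Write, R: BufRead>(
    to_server: &mut W,
    from_server: &mut R,
) -> io::Result<(String, String)> {
    let init = format!(
        r#"{{"jsonrpc":"2.0","id":1,"method":"initialize","params":{{"protocolVersion":"{}","capabilities":{{}},"clientInfo":{{"name":"smoke-test","version":"0.0.0"}}}}}}"#,
        PROTOCOL_VERSION
    );
    send(to_server, &init)?;
    // Wait for the initialize result before sending anything else.
    let init_reply = recv(from_server)?;
    // A notification carries no id and gets no response.
    send(to_server, r#"{"jsonrpc":"2.0","method":"notifications/initialized"}"#)?;
    send(to_server, r#"{"jsonrpc":"2.0","id":2,"method":"tools/list"}"#)?;
    let tools_reply = recv(from_server)?;
    Ok((init_reply, tools_reply))
}
```

To run this against the real sidecar, spawn it with `std::process::Command` and `Stdio::piped()` for stdin and stdout. Then pass `child.stdin` as the writer and a `BufReader` over `child.stdout` as the reader. The binary path depends on the local build.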
Architecture, sidecar build + MCP client config, the VUOOM_ENABLE_CONTROL opt-in, the tool list, an example agent workflow, safety notes, and the CI-verified vs needs-a-real-machine verification status.
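In the real setup the agent itself is the director: it issues MCP tool calls and judges the sampled frames. The record, inspect, re-record loop is sketched below with two stand-ins:

- A hypothetical `DemoTools` trait stands in for a few of the tools. Its method names loosely follow the tool list; the signatures are assumptions.
- A closure stands in for the agent's judgment.

The sketch shows control flow only, not how vuoom-mcp is implemented.

```rust
/// Hypothetical Rust view of a few vuoom-mcp tools; the real agent calls them over MCP.
trait DemoTools {
    fn start_recording(&mut self, auto_zoom_on_click: bool);
    fn click(&mut self, x: i32, y: i32);
    fn type_text(&mut self, text: &str);
    fn stop_recording(&mut self);
    /// Sampled output frames as PNG bytes (get_frames).
    fn get_frames(&mut self, count: usize) -> Vec<Vec<u8>>;
    fn export_gif(&mut self, path: &str);
}

/// One scripted action against the target app.
enum Step {
    Click(i32, i32),
    Type(&'static str),
}

/// Record a take, inspect sampled frames, and re-record until `accept` approves one
/// or `max_takes` is exhausted. Returns the accepted take number, if any.
fn direct<T: DemoTools>(
    tools: &mut T,
    script: &[Step],
    max_takes: usize,
    mut accept: impl FnMut(&[Vec<u8>]) -> bool,
) -> Option<usize> {
    for take in 1..=max_takes {
        tools.start_recording(true); // auto-zoom on injected clicks
        for step in script {
            match step {
                Step::Click(x, y) => tools.click(*x, *y),
                Step::Type(text) => tools.type_text(*text),
            }
        }
        tools.stop_recording();
        let frames = tools.get_frames(6);
        if accept(frames.as_slice()) {
            tools.export_gif("demo.gif"); // hypothetical output path
            return Some(take);
        }
    }
    None
}
```

Capping the number of takes keeps a demanding critic from looping forever against a target app that never renders the way it wants.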
…ting
Speaks the real protocol with canned responses + writes the discovery file, so the vuoom-mcp sidecar can be driven through a full agent demo sequence (record/click/type/stop/get_frames/export) without the GPU app. Verified: a scripted MCP session round-trips all tools incl. image frames.
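A mock server of this kind only needs to accept connections, read request lines, and answer each with a fixed reply. The sketch below has a few simplifications:

- It matches requests by substring to stay std-only; a real mock would decode them with the shared protocol types.
- The request and reply shapes are hypothetical.
- The get_frames reply carries only the base64-encoded PNG signature, not a real image.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Fixed reply per request kind. Substring matching keeps the sketch dependency-free.
fn canned_reply(request: &str) -> &'static str {
    if request.contains(r#""op":"get_frames""#) {
        // Base64 of the 8-byte PNG signature only; a real mock would return full PNGs.
        r#"{"ok":true,"frames":["iVBORw0KGgo="]}"#
    } else if request.contains(r#""op":"export_gif""#) {
        r#"{"ok":true,"path":"demo.gif"}"#
    } else if request.contains(r#""op":""#) {
        r#"{"ok":true}"#
    } else {
        r#"{"ok":false,"error":"malformed request"}"#
    }
}

/// Answer every request line on one connection until the client disconnects.
fn serve_one(stream: TcpStream) -> io::Result<()> {
    let mut writer = stream.try_clone()?;
    for line in BufReader::new(stream).lines() {
        let line = line?;
        writer.write_all(canned_reply(&line).as_bytes())?;
        writer.write_all(b"\n")?;
    }
    Ok(())
}

/// Start the mock on a loopback port chosen by the OS and return that port.
fn spawn_mock() -> io::Result<u16> {
    let listener = TcpListener::bind(("127.0.0.1", 0))?;
    let port = listener.local_addr()?.port();
    thread::spawn(move || {
        // Serve clients one at a time; enough for a single scripted agent session.
        for stream in listener.incoming().flatten() {
            let _ = serve_one(stream);
        }
    });
    Ok(port)
}
```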
Auto-zoom defaults OFF for the interactive UI (manual Ctrl+Shift+Z), so an agent driving via injected clicks recorded zero zooms. Thread a SetAutoZoomOnClick flag through control protocol -> Session (pending_auto_click, applied to the zoom plan) -> control server; the MCP start_recording tool enables it by default. Runtime-verified on a real machine: an agent-driven recording now plans cinematic zooms from injected clicks and exports a GIF.
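One way to read the pending-flag pattern: the control server stores the agent's choice in `pending_auto_click`. When the zoom plan is built, that value overrides the interactive default (off), so injected clicks produce zoom segments.

Only `pending_auto_click` and SetAutoZoomOnClick come from the description above. Everything else is a hypothetical simplification:

- the `Session` shape;
- the click-to-segment rule (a fixed hold, extended when clicks overlap);
- the timings.

```rust
/// Hypothetical, heavily simplified Session illustrating the pending-flag pattern.
#[derive(Default)]
struct Session {
    recording: bool,
    /// Interactive default: auto-zoom on click is off (zoom is manual via Ctrl+Shift+Z).
    ui_auto_zoom_on_click: bool,
    /// Set through the control protocol (SetAutoZoomOnClick); overrides the UI default.
    pending_auto_click: Option<bool>,
    /// Click timestamps in ms since recording start, assumed to arrive in time order.
    clicks_ms: Vec<u64>,
}

#[derive(Debug, PartialEq)]
struct ZoomSegment {
    start_ms: u64,
    end_ms: u64,
}

impl Session {
    fn set_auto_zoom_on_click(&mut self, enabled: bool) {
        self.pending_auto_click = Some(enabled);
    }

    fn start_recording(&mut self) {
        self.recording = true;
        self.clicks_ms.clear();
    }

    /// Called for every click the input hook sees, injected or physical.
    fn on_click(&mut self, t_ms: u64) {
        if self.recording {
            self.clicks_ms.push(t_ms);
        }
    }

    fn stop_recording(&mut self) {
        self.recording = false;
    }

    /// Each click opens a zoom held for `hold_ms`; a click inside the current zoom extends it.
    fn zoom_plan(&self, hold_ms: u64) -> Vec<ZoomSegment> {
        let enabled = self.pending_auto_click.unwrap_or(self.ui_auto_zoom_on_click);
        if !enabled {
            return Vec::new();
        }
        let mut plan: Vec<ZoomSegment> = Vec::new();
        for &t in &self.clicks_ms {
            if let Some(last) = plan.last_mut() {
                if t <= last.end_ms {
                    last.end_ms = last.end_ms.max(t + hold_ms);
                    continue;
                }
            }
            plan.push(ZoomSegment { start_ms: t, end_ms: t + hold_ms });
        }
        plan
    }
}
```

In the MCP flow, `start_recording` would call `set_auto_zoom_on_click(true)` before recording begins. The interactive UI never sets the flag, so its manual-zoom default is left untouched.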
What
Adds the AI Demo Director: an MCP server that lets an AI agent (Claude) drive Vuoom to
generate a demo GIF/MP4. The agent drives a target app, records it with cinematic auto-zoom,
then looks at the output and re-records to improve it. Research and rationale are in
docs/13-AI-Demo-Director-Research.md; setup is in
docs/AI_DEMO_DIRECTOR.md.

Pieces
- vuoom-control: shared newline-delimited JSON protocol + blocking client + port discovery. Pure, fully unit-tested.
- vuoom-input: SendInput injection (move/click/type/key/scroll); injected clicks flow through the existing hook, so they drive auto-zoom for free. The pure coord/key math is unit-tested.
- src-tauri: opt-in localhost control server (VUOOM_ENABLE_CONTROL) mapping protocol → Session + injection; new Session::sample_frames (composite → base64 PNG) and clip_info.
- vuoom-mcp: standalone MCP sidecar (rmcp/stdio) exposing 19 tools; the agent is the director + verify loop. Locally verified: initialize + tools/list return all tools.

Verification
…-D warnings) + tests + check across the workspace.
vuoom-control, vuoom-input, vuoom-mcp (incl. stdio smoke test).