Draft Linux Computer Use tool#712
Draft
HaD0Yun wants to merge 1 commit into
Draft
Conversation
Owner
|
Rebase onto this, and tool name should not be dependent on OS |
Yeachan-Heo
pushed a commit
that referenced
this pull request
Jun 16, 2026
Add isComputerLoadablePlatform (true everywhere except win32) and gate both BUILTIN_CAPABILITY_CATALOG and the BUILTIN_TOOLS computer factory on it: macOS stays callable, Linux stays listable (support planned via #712), Windows is fully absent (not registered, not advertised). Also fix a pre-existing initial-tools metadata test by constructing ComputerTool directly for loadMode coverage, mirroring how other createIf-gated tools (Ask/Ssh/Job/Recipe/Irc) are handled.
Yeachan-Heo
added a commit
that referenced
this pull request
Jun 16, 2026
* feat(pi-natives): add computer-use coordinate contract (slice 1)
First slice of the native computer-use tool (macOS-only v1), scoped via
the deep-interview spec + ralplan consensus plan. Lands the pure,
framework-free coordinate contract: NormalizedDisplay maps normalized
screenshot pixels to macOS logical points (Retina/HiDPI-safe) with
out-of-bounds and invalid-scale rejection, plus full unit tests. Adds
docs/computer-use/ capturing locked decisions, the coordinate contract,
and the delivery roadmap.
Native capture/input backend, kill-switch supervisor, napi/TS tool
surface, and the manual macOS end-to-end acceptance are tracked
follow-ups (require macOS hardware, granted TCC, and a human-operated
drill).
* feat(pi-natives): add macOS screen-capture primitive (computer-use)
Adds crates/pi-natives/src/computer/capture.rs (macOS-gated): read-only
primary-display capture via raw CoreGraphics FFI into a PNG plus the
NormalizedDisplay descriptor (scale derived from captured physical pixels
vs logical bounds). A missing Screen Recording grant surfaces
CaptureError::CaptureFailed rather than a silent black frame.
Verified live with Screen Recording granted: a real, non-uniform
primary-display capture decodes as a PNG with matching dimensions
(cargo test -p pi-natives --ignored captures_non_uniform_primary_display).
The GUI capture test is #[ignore] so CI stays deterministic.
* feat(pi-natives): add macOS TCC preflight for computer-use
Adds crates/pi-natives/src/computer/permissions.rs (macOS): non-prompting
preflight for Accessibility (input injection) and Screen Recording
(capture), Settings-pane openers, and require_*_for_input/capture guards
returning COMPUTER_PERMISSION_REQUIRED so callers fail closed.
This is the fail-closed gate prerequisite: input must not fire unless
Accessibility preflight passes. Live probe in this environment reports
accessibility=false (and screen_recording=false for the executing
binary, which still captured via host-process inheritance) — confirming
input injection / live E2E require Accessibility granted to the actual
executing gjc process before they can proceed.
* feat(computer-use): expose screenshot via napi -> packages/natives (TS)
Adds a #[napi] computerScreenshot binding over the landed CoreGraphics
capture, regenerates packages/natives bindings, and adds a TS test that
verifies a decodable PNG with matching dimensions. Verified live (bun
--cwd=packages/natives test computer: 1 pass, 13 expect()).
This wires the read-only screenshot primitive through the napi -> TS
bridge. Input primitives + kill-switch remain gated: live verification
requires Accessibility granted to the actual gjc process (a cargo/test
binary is never TCC-trusted for input injection).
* feat(pi-natives): add gated native input orchestration (computer-use)
Adds crates/pi-natives/src/computer/input.rs: InputController orchestrates
click/double_click/move/drag/scroll/type/keypress over an EventSink trait,
tracking held buttons so release_all cleans up after abort/error. Coords
flow through coords.rs; out-of-bounds is rejected and the drag error path
releases the held button. 9 unit tests drive a RecordingSink to verify
exact event sequences (no real OS events).
The real CGEvent-backed MacEventSink is constructed only via
guarded_controller(), which require_accessibility_for_input() gates — so
input physically cannot fire while Accessibility is ungranted. Not yet
napi/model-exposed (per the plan, input ships only after the kill-switch
is proven). Live OS-event behavior is verified in a granted gjc session;
the orchestration logic is fully unit-tested here.
* docs(computer-use): mark input orchestration logic-done (firing gated)
* feat(pi-natives): verify live input injection (cursor move) + warp fix
Adds current_cursor_position() (CGEventGetLocation) and a guarded live
test that moves the cursor to the display center and reads it back.
First run revealed a bare kCGEventMouseMoved event does not relocate the
cursor; fixed move_cursor to use CGWarpMouseCursorPosition (then post the
moved event for hover). Live test now passes within 2 logical points with
Accessibility granted to the host app (cmux).
This proves the CGEvent input pipeline end-to-end on the real desktop:
coords.rs transform -> warp -> read-back. Input is still gated
(guarded_controller requires Accessibility) and not yet model-exposed;
click/type await the kill-switch per the plan.
* docs(computer-use): mark live input injection verified; kill-switch next
* feat(pi-natives): add kill-switch supervisor safety state machine
Adds crates/pi-natives/src/computer/supervisor.rs: process-global
Supervisor with fail-closed input_allowed (requires hotkey_live + fresh
heartbeat + not suspended), trigger_stop latch, and user-only reset.
5 unit tests cover the gating truth table (not-live, live+fresh, stale
heartbeat, stop-latch-until-reset, lost liveness). Pure atomics so the
safety logic is deterministic; the OS hotkey listener drives it next.
* feat(pi-natives): add global kill-switch hotkey listener (verified live)
Adds crates/pi-natives/src/computer/hotkey.rs: a listen-only CGEventTap on
a dedicated CFRunLoop thread that latches Supervisor::trigger_stop on the
configured global hotkey (Ctrl+Opt+Cmd+Escape), marking the supervisor
stop path live on tap creation (fails closed otherwise). Independent of
the model tool path.
Verified live: synthetic_hotkey_triggers_stop posts a synthetic hotkey
and observes the supervisor latch suspended end-to-end (Accessibility
sufficed for tap creation; no separate Input Monitoring needed). clippy
clean, fmt applied; aligned CFRunLoopGetCurrent signature with
appearance.rs to avoid the clashing-extern warning.
* docs(computer-use): mark kill-switch verified live; gated execute_action next
* fix(computer-use): make computerScreenshot napi binding cross-platform
The `computer_screenshot` napi function lived in the macOS-gated
`computer::capture` module, so napi-rs omitted it from the generated
`index.{js,d.ts}` on non-macOS targets. CI's Linux native build
regenerated the bindings without `computerScreenshot`, breaking the
`@gajae-code/natives` type check on `test/computer.test.ts`
(TS2305: no exported member 'computerScreenshot').
Move the `ComputerScreenshot` struct and `computerScreenshot` binding
into `computer::mod` (compiled on all platforms) and gate only the macOS
CoreGraphics capture call internally, matching the `detectMacOSAppearance`
pattern. Non-macOS callers receive a clear unsupported-platform error, and
the generated TypeScript surface is now identical across platforms.
* feat(pi-natives): add supervisor-gated execute_action (computer-use G001)
Adds crates/pi-natives/src/computer/executor.rs: the single side-effect
authority. execute_input runs a fail-closed gate (supervisor live+fresh+
not-suspended, Accessibility granted, matching display epoch for
coordinate actions) before dispatching to InputController, and runs
release_all on any error or mid-flight suspension. Stable error codes
(COMPUTER_SUSPENDED / _SUPERVISOR_NOT_LIVE / _PERMISSION_REQUIRED /
_DISPLAY_STALE / _COORD_INVALID / _CANCELLED). DisplayContext trait
defines the display-epoch staleness contract; PermissionGate is injectable.
9 unit tests with a real Supervisor + fake perms/display + recording sink
cover every gate-rejection path, matching-epoch success, out-of-bounds
release-all, type/keypress/wait, and stable codes. 37 computer tests pass;
clippy clean. Added InputController::into_sink accessor.
* feat(computer-use): napi ComputerController (G002) + ultragoal red-team gate (G004)
G002 (pi-natives): capture.rs gains display_epoch (geometry hash) +
capture_id + lightweight current_display_epoch() (no Screen Recording);
executor.rs gains MacPermissionGate/MacDisplayContext providers; new
controller.rs exposes a #[napi] ComputerController whose 9 methods are
thin adapters that all route through executor::execute_input (the single
side-effect authority); bypass_guard.rs statically asserts InputController
side-effect methods are only called from input.rs/executor.rs. Bindings
regenerated. cargo test computer:: = 38 pass (incl bypass guard), clippy
clean.
G004 (ultragoal gate): ultragoal-runtime.ts gains a trusted changeSet
data-flow (checkpoint+review), computer-touching detection from trusted
changed paths (declarations additive-only), a mandatory computer
adversarial case-set (7 IDs, no not_applicable) requiring live/structural
native proof (inline/metadata/receipt-only fail with COMPUTER_REDTEAM_*),
computer/native surface tokens, docs-only tiering, and byte-for-byte
non-computer compatibility. New fixture matrix: 7 pass; ultragoal-runtime
102 pass + review 8 pass (non-regression).
* fix(pi-natives): collapse display epoch guard
* feat(computer-use): first-class TS computer tool + metadata-only catalog (G003)
Adds packages/coding-agent/src/tools/computer.ts: an AgentTool with the
exact OpenAI 9-action snake_case zod schema routing through the
@gajae-code/natives ComputerController, AbortSignal/timeout propagation,
and COMPUTER_* error mapping. Off-by-default + fail-closed: callable only
on macOS when computer.enabled||computer.alwaysOn; disabled returns
COMPUTER_DISABLED without constructing the controller or starting native
resources. First-class WITHOUT unsafe default exposure via a separate
BUILTIN_CAPABILITY_CATALOG (metadata-only) distinct from the callable
BUILTIN_TOOLS, so a disabled computer is documented/listable but not in
the session registry and not auto-activatable by search_tool_bm25. Adds
prompt, bounded renderer, settings-schema entries, and docs/tools/computer.md.
9 tool tests pass (exact schema, camelCase rejection, gating,
disabled->COMPUTER_DISABLED, dispatch mapping); biome clean.
* style(computer-use): format ultragoal fixtures
* fix(computer-use): repair TypeScript check errors (slice 1)
- ComputerToolDetails: add optional meta field so it satisfies the
ToolResultBuilder DetailsWithMeta weak-type constraint; toolResult()
now infers ComputerToolDetails instead of falling back to DetailsWithMeta
- render.ts: type summarizeComputerDetails parts as string[] so dynamic
summary strings are not constrained to the action literal union
- computer-red-team-fixtures.test.ts: drop removed gjcObjective input
option from createUltragoalPlan; guard possibly-undefined stderr/stdout
* test(computer-use): add G005 all-nine + kill-switch acceptance drill
Adds the COMPUTER_USE_MACOS_TEXTEDIT_ALL_NINE manual-acceptance drill as an
#[ignore] live test (crates/pi-natives/src/computer/input.rs live_tests):
drives all nine primitives (screenshot/move/click/type/keypress/
double_click/drag/scroll/wait) through the production gated path
(execute_input + Supervisor + Mac providers) against the focused app, then
waits for a human Control+Option+Command+Escape press and asserts the
kill-switch latches and blocks further input until reset. Ignored by
default (needs macOS + grants + a human keypress); run with:
cargo test -p pi-natives computer::input::live_tests::all_nine_acceptance_drill -- --ignored --nocapture
Note: the drill refreshes the supervisor heartbeat per action as a
stand-in for a periodic listener heartbeat (follow-up: tick heartbeat from
the hotkey listener thread).
* test(computer-use): persist durable G005 acceptance artifacts + widen kill-switch window
The all-nine acceptance drill now writes g005-before.png, g005-after-killswitch.png, and g005-manifest.json to .gjc/ultragoal/artifacts/g005 (override COMPUTER_USE_ACCEPTANCE_DIR), so the human-run drill produces the durable live native proof the G004 mandatory computer red-team suite requires on disk. Widen the kill-switch wait from ~10s to ~60s for manual operation and drop an unnecessary mut on the act closure.
* feat(computer-use): do not load the computer tool at all on Windows
Add isComputerLoadablePlatform (true everywhere except win32) and gate both BUILTIN_CAPABILITY_CATALOG and the BUILTIN_TOOLS computer factory on it: macOS stays callable, Linux stays listable (support planned via #712), Windows is fully absent (not registered, not advertised). Also fix a pre-existing initial-tools metadata test by constructing ComputerTool directly for loadMode coverage, mirroring how other createIf-gated tools (Ask/Ssh/Job/Recipe/Irc) are handled.
* test(computer): isolate macOS availability in tool tests
* test(natives): avoid static macOS computer imports on Linux
---------
Co-authored-by: Yeachan-Heo <yeachan-heo@gajae.dev>
6e84456 to
39942fb
Compare
Add a draft linux_computer_use tool that bridges GJC to a local Linux Computer Use HTTP target for Xvfb/noVNC observe/action loops. Keep the tool discoverable and opt-out via settings while preserving browser as the preferred DOM automation path.\n\nConstraint: GJC needs Linux GUI control without adding another workflow skill or shell execution path.\nRejected: Folding this into browser | LCU operates at desktop/API level and returns screenshots/actions rather than DOM handles.\nConfidence: medium\nScope-risk: moderate\nDirective: Keep externally visible desktop actions behind visual verification and explicit confirmation.\nTested: bun test packages/coding-agent/test/tools/linux-computer-use.test.ts; bun check
39942fb to
35c68ab
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
linux_computer_usetool for local Linux Computer Use HTTP targetsNotes
This keeps regular DOM-aware web automation on the existing
browsertool. The new tool is for Linux desktop/Xvfb/noVNC targets where the agent needs screenshot/action loops against an LCU server.Verification
bun --cwd=packages/natives run build(needed locally after a clean clone to provide the native addon)bun test packages/coding-agent/test/tools/linux-computer-use.test.tsbun check