feat(plan-14): pillars 4 + 6 — computer_use IPC + OTel observability #24
Merged
Conversation
Lands two PLAN-14 pillars that were already drafted on disk but never committed.

Pillar 4 — desktop computer_use IPC (desktop/src-tauri/src/computer_use.rs)

Tauri command surface for OS-level computer-use actions, complementing the existing Playwright/CDP browser-use path on the Node side. The unified `computer_use` tool delegates browser actions to Node and OS-level actions (screenshot, mouse, keyboard) to these Tauri commands. Every action passes through the gateway's exec-approval-manager before reaching here, so the user can approve or deny each invocation.

Crates:
- xcap — cross-platform screen capture (X11 / Wayland / macOS / Windows)
- enigo — cross-platform input synthesis (mouse + keyboard)
- base64 — transports screenshots over the JSON IPC bridge

Gating: every command checks BITTERBOT_COMPUTER_USE=1. Default off, so the desktop binary doesn't ship an enabled OS-control surface to users who didn't opt in. Future: gateway-mediated session-level capability grants will replace the env-var gate.

Pillar 6 — OpenTelemetry init (src/observability/otel.ts + .test.ts)

Production-grade OTel SDK initialization, enabled only when OTEL_TRACES_EXPORTER (or OTEL_EXPORTER_OTLP_ENDPOINT) is set in the environment. This follows the standard OTel auto-config convention, so any OTLP-compatible collector (Grafana Tempo, Honeycomb, Datadog, Jaeger, etc.) works out of the box with no new config surface. When disabled, every helper is a no-op and the runtime cost is one env-var read per call to initOtel(). Dynamic imports keep the SDK out of the cold-start path for users who don't opt in, so adding the @opentelemetry/* deps is a separate, reversible step; the build doesn't break while that step is pending. A minimal sketch of the gating follows below. Pillar 6 unblocks Pillar 5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
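For readers skimming the diff, a minimal sketch of the env-gated init shape described above. The `initOtel` name and the two env vars come from this PR; the `NodeSDK` wiring is ordinary `@opentelemetry/sdk-node` usage, not a copy of `otel.ts`:

```ts
// otel-init-sketch.ts — env-gated, dynamically imported OTel init.
let sdkStarted = false;

export async function initOtel(): Promise<void> {
  // One env-var read per call when disabled — the advertised no-op cost.
  const enabled =
    process.env.OTEL_TRACES_EXPORTER ?? process.env.OTEL_EXPORTER_OTLP_ENDPOINT;
  if (!enabled || sdkStarted) return;

  // Dynamic imports keep @opentelemetry/* off the cold-start path
  // (and out of builds where the deps aren't installed yet).
  const { NodeSDK } = await import("@opentelemetry/sdk-node");
  const { OTLPTraceExporter } = await import(
    "@opentelemetry/exporter-trace-otlp-http"
  );

  const sdk = new NodeSDK({ traceExporter: new OTLPTraceExporter() });
  sdk.start();
  sdkStarted = true;
}
```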
Pillar 6 of PLAN-14: ship the first concrete OTel instrumentation
without forcing every operator onto a collector.
Three wire-points:
1. Gateway boot (`startGatewayServer`) calls `initOtel()`. No-op when
OTEL_TRACES_EXPORTER / OTEL_EXPORTER_OTLP_ENDPOINT is unset, so the
default install path keeps zero overhead.
2. Every gateway RPC method gets one `gateway.rpc.<method>` span via
`withSpan` in `handleGatewayRequest` (sketched after this list).
Captures rpc.method as an attribute and propagates exceptions to span
status automatically.
3. Each pi-embedded tool execution gets a paired
`agent.tool.<toolName>` span from start->end. Stored in
`toolSpansById` keyed by toolCallId; awaited only on the cold end
path so the start path stays hot.
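A plausible shape for the `withSpan` helper behind wire-point 2, built on the public `@opentelemetry/api` surface — the real signature in `otel.ts` may differ:

```ts
// with-span-sketch.ts — wrap a single async fn in one span.
import { trace, SpanStatusCode } from "@opentelemetry/api";

export async function withSpan<T>(
  name: string,
  attrs: Record<string, string>,
  fn: () => Promise<T>,
): Promise<T> {
  const tracer = trace.getTracer("bitterbot");
  return tracer.startActiveSpan(name, { attributes: attrs }, async (span) => {
    try {
      return await fn();
    } catch (err) {
      // Exceptions propagate to span status, as described above.
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}

// e.g. in handleGatewayRequest:
//   await withSpan(`gateway.rpc.${method}`, { "rpc.method": method }, () => invoke());
```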
Adds `startSpan` to observability/otel.ts for any future paired-event
instrumentation (memory ops, dream phases) where withSpan's single-fn
shape doesn't fit.
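For the paired-event case, a sketch of how `startSpan` and the `toolSpansById` map described above could fit together — names come from this PR, internals are assumed:

```ts
// paired-span-sketch.ts — open a span at tool start, close it at tool end.
import { trace, type Span } from "@opentelemetry/api";

const toolSpansById = new Map<string, Span>();

export function onToolStart(toolCallId: string, toolName: string): void {
  // Hot path: open the span and return immediately.
  const span = trace.getTracer("bitterbot").startSpan(`agent.tool.${toolName}`);
  toolSpansById.set(toolCallId, span);
}

export function onToolEnd(toolCallId: string): void {
  // Cold path: close and drop the paired span.
  toolSpansById.get(toolCallId)?.end();
  toolSpansById.delete(toolCallId);
}
```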
Deps: @opentelemetry/{api,sdk-node,exporter-trace-otlp-http,resources,
semantic-conventions} added at workspace root. Dynamic-imported in
otel.ts so a pre-install dev tree still builds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lays down the LangGraph-parity primitive that, paired with the Pillar 5 long-horizon runtime, lets a 6+ hour run be branched from any prior state without touching the original timeline.

Core: src/checkpoints/store.ts (schema sketched below)
- Single-table SQLite schema keyed on (thread_id, step_id), with parent_step_id enabling DAG branches.
- gzip-compressed state blobs + sha256 dedup hash; idempotent on repeated saves of the same step.
- ancestors() walks back to the root, oldest-first, for replay.
- fork() copies a chosen lineage into a new thread and adds a fork_root marker so timeline UIs can render branch points.
- WAL mode + busy_timeout=5000 so dashboard reads don't block writers.
- Separate DB from the memory store so checkpoint volume doesn't bloat the embedding index.

CLI: src/cli/checkpoints-cli.ts (registered as `bitterbot checkpoints`)
- threads: list threads with last activity + step count.
- list: enumerate checkpoints in a thread, oldest-first.
- show: print a single checkpoint's full state.
- fork: branch a thread from a chosen step.
- delete: drop every checkpoint in a thread.

Each subcommand accepts a --db override and --json. The default DB lives at ~/.bitterbot/checkpoints.sqlite (BITTERBOT_CHECKPOINT_DB overrides).

Tests: src/checkpoints/store.test.ts — 6 tests covering save, idempotency, ancestor walk, fork (including non-mutation of the source thread), thread listing, and delete-by-thread isolation.

This is the storage primitive only; integration with pi-embedded-runner to write user_message / assistant_message / tool_call / tool_result boundaries is the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
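A sketch of the single-table layout the bullets above imply. The key columns follow the commit text; the remaining column names and the use of `better-sqlite3` are assumptions, not the store's actual code:

```ts
// checkpoint-schema-sketch.ts — single-table checkpoint store layout.
import Database from "better-sqlite3";

export function openCheckpointDb(path: string): Database.Database {
  const db = new Database(path);
  // WAL + busy_timeout so dashboard reads don't block writers.
  db.pragma("journal_mode = WAL");
  db.pragma("busy_timeout = 5000");
  db.exec(`
    CREATE TABLE IF NOT EXISTS checkpoints (
      thread_id      TEXT NOT NULL,
      step_id        TEXT NOT NULL,
      parent_step_id TEXT,            -- enables DAG branches
      state_gz       BLOB NOT NULL,   -- gzip-compressed state blob
      state_sha256   TEXT NOT NULL,   -- dedup hash: repeated saves are idempotent
      created_at     INTEGER NOT NULL,
      PRIMARY KEY (thread_id, step_id)
    );
  `);
  return db;
}
```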
Phase 1 of the Pillar 6 #2 integration: an onAgentEvent listener (shape sketched below) writes each meaningful event to the checkpoint graph, using runId as thread_id and a per-run monotonic seq as step_id. Tool start/result pairs become parent-child checkpoints; assistant text deltas are deliberately skipped to keep the timeline navigable.

This converts the checkpoint store from a primitive into a working capability — `bitterbot checkpoints threads` now lists every run that produced a tool call, and `bitterbot checkpoints fork <thread> <step>` branches a fresh thread from any chosen point.

Wired into startGatewayServer alongside initOtel; gated by BITTERBOT_CHECKPOINTS=1 so the default install path stays zero-overhead.

Phase 2 (deferred) will dump full session snapshots from the runner at compaction/turn boundaries, enabling true replay. Today's state is the event payload only — sufficient for inspection and lineage UI, but not for full state reconstruction.

Tests: 4 new (parented chain, partial-frame skipping, env-gating, idempotency). Combined checkpoint suite: 10 tests, all passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
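A sketch of the Phase 1 listener shape described above, with the event and store types simplified to illustrate the runId → thread_id / seq → step_id mapping; field names are assumptions, not the real payloads:

```ts
// checkpoint-listener-sketch.ts — Phase 1 event-to-checkpoint wiring.
type AgentEvent = {
  runId: string;
  kind: "tool_start" | "tool_result" | "assistant_delta";
  payload: unknown;
};

interface CheckpointStore {
  save(cp: {
    threadId: string;
    stepId: string;
    parentStepId?: string;
    state: unknown;
  }): void;
}

const seqByRun = new Map<string, number>();

export function onAgentEvent(store: CheckpointStore, ev: AgentEvent): void {
  if (process.env.BITTERBOT_CHECKPOINTS !== "1") return; // default path: zero overhead
  if (ev.kind === "assistant_delta") return; // skip text deltas: keep timeline navigable

  const seq = (seqByRun.get(ev.runId) ?? 0) + 1; // per-run monotonic seq as step_id
  seqByRun.set(ev.runId, seq);

  store.save({
    threadId: ev.runId, // runId as thread_id
    stepId: String(seq),
    // chaining each event to its predecessor yields the parent-child
    // tool start/result pairs described above
    parentStepId: seq > 1 ? String(seq - 1) : undefined,
    state: ev.payload, // today: event payload only, not a full session snapshot
  });
}
```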
Headless-friendly OS control: the orchestrator daemon gains six IPC commands (screenshot, screen_size, mouse_move, mouse_click, type, key) that any agent on any platform can drive without a Tauri window in between. Browser automation continues to flow through the existing pw-tools-core path; this is the OS counterpart.

Two-stage gating, by design:
1. Build-time: `cargo build --features=computer-use` opts into linking xcap + enigo. Default builds (the relay fleet, generic Linux boxes) omit the deps entirely, so X11/libxdo system requirements don't leak.
2. Runtime: even on a feature-built binary, BITTERBOT_COMPUTER_USE=1 must be set before the orchestrator will act. A misconfigured node can never silently start clicking.

orchestrator/src/computer.rs holds the actual wrapper (xcap for capture, enigo for input synthesis). The cfg(not(feature)) path returns a clear "feature not built" envelope so Node-side callers surface the cause.

Node side:
- OrchestratorBridge gains computerScreenshot / computerScreenSize / computerMouseMove / computerMouseClick / computerType / computerKey, each returning a normalized ComputerUseResult discriminated by `ok` (see the sketch below).
- A module-scoped accessor (setActiveOrchestratorBridge / getActiveOrchestratorBridge) lets agent tools reach the live bridge without threading it through every factory; the gateway registers it on startup right after `Bridge.start()`.
- A new unified `computer_use` agent tool routes screenshot / mouse / keyboard actions through the bridge. Wired into bitterbot-tools.ts alongside the existing browser tool.

5 unit tests, all passing. Default-feature orchestrator build verified: 1m14s, exit 0, no new deps pulled in. The relay fleet's cloud-init is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
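A sketch of the normalized Node-side surface described above. The `ok` discriminant and the accessor/function names come from this PR; the payload fields and the minimal bridge interface are assumed for illustration:

```ts
// computer-use-bridge-sketch.ts — normalized result + module-scoped accessor.
export type ComputerUseResult =
  | { ok: true; data?: unknown } // e.g. a base64 screenshot from the daemon
  | { ok: false; error: string }; // e.g. the "feature not built" envelope

// Minimal bridge surface assumed for illustration.
interface OrchestratorBridge {
  computerScreenshot(): Promise<ComputerUseResult>;
  computerMouseClick(x: number, y: number): Promise<ComputerUseResult>;
}

let activeBridge: OrchestratorBridge | undefined;

export function setActiveOrchestratorBridge(b: OrchestratorBridge): void {
  activeBridge = b; // gateway registers this right after Bridge.start()
}

export function getActiveOrchestratorBridge(): OrchestratorBridge | undefined {
  return activeBridge; // agent tools reach the live bridge without factory threading
}
```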
Pillar 5 lands the orchestration layer that turns the checkpoint store into a genuine long-horizon capability: a runtime that drives an agent through work → rest → dream cycles, writes a parent-chained checkpoint timeline at every phase boundary, and resumes from the latest tip.

LongHorizonRuntime in src/agents/long-horizon/runtime.ts (cycle sketched below):
- Phases: work (configurable workMs window), rest (cool-down), dream (one pass of the supplied dreamStep). Repeats until any of: workStep returns done, max iterations hit, wall-clock budget exhausted, or AbortSignal fires.
- Checkpoints at each phase boundary using `kind: "custom"` with a `phase` metadata field, so timeline UIs can colour-code the cycle.
- Test seams (`now`, `sleep`) so the cycle can be driven through fake time without real timers.
- LongHorizonRuntime.resume(threadId, store) returns the latest step id from the store — the entry point for resume after restart.
- Wrapped in `long_horizon.run` / `long_horizon.work_step` / `long_horizon.dream_step` OTel spans, so a multi-hour run produces trace coverage at the right granularity for production debugging.

Pillar 6 #1 follow-up — memory hot-path spans:
- `memory.search` and `memory.dream` now run inside withSpan, so a collector sees the same boundaries the agent itself sees. The search span carries query-length and max-results attributes; the dream span marks engine identity. Zero overhead when OTel is disabled.

Tests: 5 new long-horizon tests covering work-rest-dream rotation with parent-chained checkpoints, early-done, abort, budget cap, and resume. Combined PLAN-14 suite: 26 tests, all passing.

This is the final foundational piece for Pillar 5 — wiring an actual 6+ hour agent run on top is now a `workStep: () => agent.step()` composition, not an architectural project.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
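A sketch of the work → rest → dream cycle shape. The option names (`workMs`, `workStep`, `dreamStep`, the `now`/`sleep` seams) follow the commit text; the loop body is a simplified assumption, with checkpoint writes and span wrapping elided:

```ts
// long-horizon-sketch.ts — simplified work/rest/dream driver.
interface LongHorizonOpts {
  workMs: number; // bound for a single work phase (enforcement elided here)
  restMs: number;
  maxIterations: number;
  budgetMs: number; // wall-clock budget for the whole run
  workStep: () => Promise<{ done: boolean }>;
  dreamStep: () => Promise<void>;
  signal?: AbortSignal;
  now?: () => number; // test seam: fake clock
  sleep?: (ms: number) => Promise<void>; // test seam: fake timers
}

export async function runLongHorizon(opts: LongHorizonOpts): Promise<void> {
  const now = opts.now ?? Date.now;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));
  const start = now();

  for (let i = 0; i < opts.maxIterations; i++) {
    // Stop on abort or exhausted wall-clock budget.
    if (opts.signal?.aborted || now() - start >= opts.budgetMs) return;

    const { done } = await opts.workStep(); // work phase (checkpoint boundary)
    if (done) return; // early-done

    await sleep(opts.restMs); // rest phase (checkpoint boundary)
    await opts.dreamStep(); // dream phase (checkpoint boundary)
  }
}
```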
Summary
Lands two PLAN-14 pillars that were already drafted on disk but never committed.
Pillar 4 — desktop computer_use IPC (desktop/src-tauri/src/computer_use.rs)

Tauri command surface for OS-level computer-use actions, complementing the existing Playwright/CDP browser-use path on the Node side. The unified `computer_use` tool delegates browser actions to Node and OS-level (screenshot, mouse, keyboard) actions to these Tauri commands. The gateway exec-approval-manager mediates each invocation, so the user approves or denies before anything fires.

Crates: `xcap` (cross-platform screen capture), `enigo` (cross-platform input synthesis), `base64` (screenshot transport over JSON IPC).

Gated by `BITTERBOT_COMPUTER_USE=1`. Default off — the binary doesn't ship an enabled OS-control surface to users who didn't opt in. Future: gateway-mediated session capability grants replace the env-var gate.

Pillar 6 — OpenTelemetry init (src/observability/otel.{ts,test.ts})

Production-grade OTel SDK initialization. Enabled only when `OTEL_TRACES_EXPORTER` or `OTEL_EXPORTER_OTLP_ENDPOINT` is set in the environment, matching the standard auto-config convention, so any OTLP-compatible collector (Grafana Tempo, Honeycomb, Datadog, Jaeger) works out of the box with no new config surface.

When disabled, every helper is a no-op (one env-var read per call). Dynamic imports keep the SDK out of the cold-start path, so adding the `@opentelemetry/*` deps is a separate, reversible step; the build doesn't break while that step is pending.

Pillar 6 unblocks Pillar 5.
Test plan
- `BITTERBOT_COMPUTER_USE` unset — desktop bundle still ships; Tauri commands return errors when invoked
- `BITTERBOT_COMPUTER_USE=1` + `OTEL_TRACES_EXPORTER=otlp` — verify spans land in a local OTel collector
- `pnpm test src/observability/otel.test.ts` passes

🤖 Generated with Claude Code