feat(plan-14): pillars 4 + 6 — computer_use IPC + OTel observability #24

Merged
VGIL77 merged 7 commits into main from feat/plan14-pillars-4-6 on Apr 30, 2026

Conversation


VGIL77 (Contributor) commented Apr 29, 2026

Summary

Lands two PLAN-14 pillars that were already drafted on disk but never committed.

Pillar 4 — desktop computer_use IPC (desktop/src-tauri/src/computer_use.rs)

Tauri command surface for OS-level computer-use actions, complementing the existing Playwright/CDP browser-use path on the Node side. The unified computer_use tool delegates browser actions to Node and OS-level actions (screenshot, mouse, keyboard) to these Tauri commands. The gateway's exec-approval-manager mediates each invocation, so the user approves or denies it before anything fires.

Crates: xcap (cross-platform screen capture), enigo (cross-platform input synthesis), base64 (screenshot transport over JSON IPC).

Gated by BITTERBOT_COMPUTER_USE=1. Default off — the binary doesn't ship an enabled OS-control surface to users who didn't opt in. Future: gateway-mediated session capability grants will replace the env-var gate.
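For illustration only, the delegation described above could be shaped roughly like this on the Node side; the action union, the `requestApproval` hook, and `invokeTauri` are hypothetical names for this sketch, not the actual API:

```ts
// Hypothetical sketch of the unified computer_use routing; all names here are illustrative.
type ComputerUseAction =
  | { kind: "browser"; op: string; args: Record<string, unknown> }
  | { kind: "os"; op: "screenshot" | "mouse_move" | "mouse_click" | "type" | "key"; args: Record<string, unknown> };

interface ComputerUseDeps {
  requestApproval: (a: ComputerUseAction) => Promise<boolean>;        // gateway exec-approval hook (assumed)
  runBrowserAction: (op: string, args: unknown) => Promise<unknown>;  // existing Playwright/CDP path
  invokeTauri: (cmd: string, args: unknown) => Promise<unknown>;      // desktop IPC bridge (assumed)
}

async function computerUse(action: ComputerUseAction, deps: ComputerUseDeps): Promise<unknown> {
  // The user approves or denies every invocation before anything fires.
  if (!(await deps.requestApproval(action))) {
    throw new Error("computer_use action denied by user");
  }
  // Browser actions stay on the Node path; OS actions go to the Tauri commands,
  // which themselves refuse to act unless BITTERBOT_COMPUTER_USE=1 is set.
  return action.kind === "browser"
    ? deps.runBrowserAction(action.op, action.args)
    : deps.invokeTauri(`computer_${action.op}`, action.args);
}
```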

Pillar 6 — OpenTelemetry init (src/observability/otel.{ts,test.ts})

Production-grade OTel SDK initialization. Enabled only when OTEL_TRACES_EXPORTER or OTEL_EXPORTER_OTLP_ENDPOINT is set in env, matching the standard auto-config convention so any OTLP-compatible collector (Grafana Tempo, Honeycomb, Datadog, Jaeger) works out of the box with no new config surface.

When disabled, every helper is a no-op (one env-var read per call). Dynamic imports keep the SDK out of the cold-start path, so adding the @opentelemetry/* deps is a separate, reversible step; the tree still builds before they are installed.

Pillar 6 unblocks Pillar 5.
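As a rough sketch of that gate-and-dynamic-import pattern (illustrative, not the shipped otel.ts; the exporter picks up OTEL_EXPORTER_OTLP_ENDPOINT from the environment by default):

```ts
// Sketch of an env-gated OTel init with dynamic imports; illustrative, not the shipped otel.ts.
let sdkStarted = false;

function otelEnabled(): boolean {
  return Boolean(process.env.OTEL_TRACES_EXPORTER || process.env.OTEL_EXPORTER_OTLP_ENDPOINT);
}

export async function initOtel(): Promise<void> {
  if (!otelEnabled() || sdkStarted) return; // default install path: one env-var read, then a no-op
  // Dynamic imports keep @opentelemetry/* off the cold-start path and let the
  // tree build even before the deps are installed.
  const { NodeSDK } = await import("@opentelemetry/sdk-node");
  const { OTLPTraceExporter } = await import("@opentelemetry/exporter-trace-otlp-http");
  const sdk = new NodeSDK({ traceExporter: new OTLPTraceExporter() });
  sdk.start();
  sdkStarted = true;
}
```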

Test plan

  • Build with BITTERBOT_COMPUTER_USE unset — desktop bundle still ships, Tauri commands return errors when invoked
  • Set BITTERBOT_COMPUTER_USE=1 + OTEL_TRACES_EXPORTER=otlp and verify spans land in a local OTel collector
  • pnpm test src/observability/otel.test.ts passes

🤖 Generated with Claude Code

VGIL77 and others added 7 commits April 28, 2026 21:59
Lands two PLAN-14 pillars that were already drafted on disk but never
committed.

Pillar 4 — desktop computer_use IPC (desktop/src-tauri/src/computer_use.rs)
Tauri command surface for OS-level computer-use actions, complementing
the existing Playwright/CDP browser-use path on the Node side. The
unified `computer_use` tool delegates browser actions to Node and
OS-level (screenshot, mouse, keyboard) actions to these Tauri commands.
Every action passes through the gateway's exec-approval-manager before
reaching here so the user can approve/deny each invocation.

Crates:
- xcap   — cross-platform screen capture (X11 / Wayland / macOS / Windows)
- enigo  — cross-platform input synthesis (mouse + keyboard)
- base64 — transports screenshots over the JSON IPC bridge

Gating: every command checks BITTERBOT_COMPUTER_USE=1. Default off, so
the desktop binary doesn't ship an enabled OS-control surface to users
who didn't opt in. Future: gateway-mediated session-level capability
grants will replace the env-var gate.

Pillar 6 — OpenTelemetry init (src/observability/otel.ts + .test.ts)
Production-grade OTel SDK initialization, enabled only when
OTEL_TRACES_EXPORTER (or OTEL_EXPORTER_OTLP_ENDPOINT) is set in the
environment. Standard OTel auto-config convention so any OTLP-compatible
collector (Grafana Tempo, Honeycomb, Datadog, Jaeger, etc.) works out of
the box with no new config surface.

When disabled, every helper is a no-op and the runtime cost is one
env-var read per call to initOtel(). Dynamic imports keep the SDK out
of the cold-start path for users who don't opt in, so adding the
@opentelemetry/* deps is a separate, reversible step that doesn't break
the build until completed.

Pillar 6 unblocks Pillar 5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pillar 6 of PLAN-14: ship the first concrete OTel instrumentation
without forcing every operator onto a collector.

Three wire-points:

1. Gateway boot (`startGatewayServer`) calls `initOtel()`. No-op when
   OTEL_TRACES_EXPORTER / OTEL_EXPORTER_OTLP_ENDPOINT is unset, so the
   default install path keeps zero overhead.

2. Every gateway RPC method gets one `gateway.rpc.<method>` span via
   `withSpan` in `handleGatewayRequest`. Captures rpc.method as an
   attribute, propagates exceptions to span status automatically.

3. Each pi-embedded tool execution gets a paired
   `agent.tool.<toolName>` span from start->end. Stored in
   `toolSpansById` keyed by toolCallId; awaited only on the cold end
   path so the start path stays hot.

Adds `startSpan` to observability/otel.ts for any future paired-event
instrumentation (memory ops, dream phases) where withSpan's single-fn
shape doesn't fit.
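For reference, the `withSpan` shape used by these wire-points could look roughly like the sketch below, written against the public @opentelemetry/api surface; the real helper in otel.ts may differ:

```ts
// Sketch of a withSpan helper over @opentelemetry/api; illustrative only.
import { trace, SpanStatusCode, type Attributes } from "@opentelemetry/api";

const tracer = trace.getTracer("bitterbot");

export async function withSpan<T>(
  name: string,
  attributes: Attributes,
  fn: () => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(name, { attributes }, async (span) => {
    try {
      return await fn();
    } catch (err) {
      // Propagate the failure to span status, as the gateway wiring does.
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}

// Example: one span per gateway RPC method.
// await withSpan(`gateway.rpc.${method}`, { "rpc.method": method }, () => handler(params));
```

When the SDK is never initialized, `trace.getTracer` returns a no-op tracer, which is what keeps the disabled path free of overhead.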

Deps: @opentelemetry/{api,sdk-node,exporter-trace-otlp-http,resources,
semantic-conventions} added at workspace root. Dynamic-imported in
otel.ts so a pre-install dev tree still builds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lays down the LangGraph-parity primitive that, paired with the Pillar 5
long-horizon runtime, lets a 6+ hour run be branched from any prior
state without touching the original timeline.

Core: src/checkpoints/store.ts.
- Single-table SQLite schema keyed on (thread_id, step_id) with
  parent_step_id enabling DAG branches.
- gzip-compressed state blobs + sha256 dedup hash; idempotent on
  repeated saves of the same step (save path sketched after this list).
- ancestors() walks back to the root oldest-first for replay.
- fork() copies a chosen lineage into a new thread and adds a
  fork_root marker so timeline UIs can render branch points.
- WAL mode + busy_timeout=5000 so dashboard reads don't block writers.
- Separate DB from the memory store so checkpoint volume doesn't
  bloat the embedding index.
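A condensed sketch of that save path, assuming better-sqlite3 as the driver; column names are illustrative, not the shipped schema:

```ts
// Sketch of the checkpoint save path: gzip blob + sha256 hash, WAL mode, idempotent saves.
// better-sqlite3 is an assumed driver; column names are illustrative.
import Database from "better-sqlite3";
import { gzipSync } from "node:zlib";
import { createHash } from "node:crypto";

const db = new Database(process.env.BITTERBOT_CHECKPOINT_DB ?? "checkpoints.sqlite");
db.pragma("journal_mode = WAL");   // dashboard reads don't block writers
db.pragma("busy_timeout = 5000");
db.exec(`CREATE TABLE IF NOT EXISTS checkpoints (
  thread_id      TEXT NOT NULL,
  step_id        TEXT NOT NULL,
  parent_step_id TEXT,
  state_gz       BLOB NOT NULL,
  state_sha256   TEXT NOT NULL,
  created_at     INTEGER NOT NULL,
  PRIMARY KEY (thread_id, step_id)
)`);

export function saveCheckpoint(
  threadId: string,
  stepId: string,
  parentStepId: string | null,
  state: unknown,
): void {
  const blob = gzipSync(Buffer.from(JSON.stringify(state)));
  const sha = createHash("sha256").update(blob).digest("hex");
  // INSERT OR IGNORE makes repeated saves of the same (thread_id, step_id) idempotent.
  db.prepare(
    `INSERT OR IGNORE INTO checkpoints (thread_id, step_id, parent_step_id, state_gz, state_sha256, created_at)
     VALUES (?, ?, ?, ?, ?, ?)`,
  ).run(threadId, stepId, parentStepId, blob, sha, Date.now());
}
```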

CLI: src/cli/checkpoints-cli.ts (registered as `bitterbot
checkpoints`).
- threads: list threads with last activity + step count.
- list: enumerate checkpoints in a thread oldest-first.
- show: print a single checkpoint's full state.
- fork: branch a thread from a chosen step.
- delete: drop every checkpoint in a thread.
Each subcommand accepts --db override and --json. Default DB lives at
~/.bitterbot/checkpoints.sqlite (BITTERBOT_CHECKPOINT_DB overrides).

Tests: src/checkpoints/store.test.ts — 6 tests covering save,
idempotency, ancestor walk, fork (including non-mutation of the
source thread), thread listing, and delete-by-thread isolation.

This is the storage primitive only; integration with pi-embedded-runner
to write user_message / assistant_message / tool_call / tool_result
boundaries is the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 of Pillar 6 #2 integration: an onAgentEvent listener writes
each meaningful event to the checkpoint graph, using runId as
thread_id and per-run monotonic seq as step_id. Tool start/result
pairs become parent-child checkpoints; assistant text deltas are
deliberately skipped to keep the timeline navigable.
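Sketched below is roughly what that mapping looks like; the event shape and the save signature are assumptions for illustration:

```ts
// Sketch of the event -> checkpoint mapping; the event shape and save API are assumed.
type AgentEvent =
  | { kind: "user_message" | "assistant_message"; runId: string; payload: unknown }
  | { kind: "tool_call" | "tool_result"; runId: string; toolCallId: string; payload: unknown }
  | { kind: "assistant_text_delta"; runId: string; payload: unknown };

type SaveFn = (threadId: string, stepId: string, parent: string | null, state: unknown) => void;

const seqByRun = new Map<string, number>();
const toolCallStep = new Map<string, string>(); // `${runId}:${toolCallId}` -> step_id

function onAgentEvent(ev: AgentEvent, save: SaveFn): void {
  if (process.env.BITTERBOT_CHECKPOINTS !== "1") return; // default path stays zero-overhead
  if (ev.kind === "assistant_text_delta") return;        // deltas skipped to keep the timeline navigable
  const seq = (seqByRun.get(ev.runId) ?? 0) + 1;
  seqByRun.set(ev.runId, seq);
  const stepId = String(seq);
  let parent: string | null = seq > 1 ? String(seq - 1) : null;
  if (ev.kind === "tool_call") {
    toolCallStep.set(`${ev.runId}:${ev.toolCallId}`, stepId);
  } else if (ev.kind === "tool_result") {
    // A tool result is parented to its tool_call, forming a start/result pair.
    parent = toolCallStep.get(`${ev.runId}:${ev.toolCallId}`) ?? parent;
  }
  save(ev.runId, stepId, parent, ev.payload); // runId is the thread_id, per-run seq is the step_id
}
```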

This converts the checkpoint store from a primitive into a working
capability — `bitterbot checkpoints threads` now lists every run that
produced a tool call, and `bitterbot checkpoints fork <thread> <step>`
branches a fresh thread from any chosen point.

Wired into startGatewayServer alongside initOtel; gated by
BITTERBOT_CHECKPOINTS=1 so the default install path stays
zero-overhead.

Phase 2 (deferred) will dump full session snapshots from the runner
at compaction/turn boundaries, enabling true replay (today's state is
the event payload only — sufficient for inspection and lineage UI but
not for full state reconstruction).

Tests: 4 new (parented chain, partial-frame skipping, env-gating,
idempotency). Combined checkpoint suite: 10 tests, all passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Headless-friendly OS control: the orchestrator daemon gains six IPC
commands (screenshot, screen_size, mouse_move, mouse_click, type, key)
that any agent on any platform can drive without a Tauri window in
between. Browser automation continues to flow through the existing
pw-tools-core path; this is the OS counterpart.

Two-stage gating, by design:
1. Build-time: `cargo build --features=computer-use` opts into linking
   xcap + enigo. Default builds (the relay fleet, generic Linux boxes)
   omit the deps entirely so X11/libxdo system requirements don't leak.
2. Runtime: even on a feature-built binary, BITTERBOT_COMPUTER_USE=1
   must be set before the orchestrator will act. A misconfigured node
   can never silently start clicking.

orchestrator/src/computer.rs holds the actual wrapper (xcap for capture,
enigo for input synthesis). The cfg(not(feature)) path returns a clear
"feature not built" envelope so Node-side callers surface the cause.

Node side:
- OrchestratorBridge gains computerScreenshot / computerScreenSize /
  computerMouseMove / computerMouseClick / computerType / computerKey,
  each returning a normalized ComputerUseResult discriminated by `ok` (shape sketched after this list).
- A module-scoped accessor (setActiveOrchestratorBridge /
  getActiveOrchestratorBridge) lets agent tools reach the live bridge
  without threading it through every factory; the gateway registers it
  on startup right after `Bridge.start()`.
- New unified `computer_use` agent tool routes screenshot / mouse / keyboard
  actions through the bridge. Wired into bitterbot-tools.ts alongside
  the existing browser tool. 5 unit tests, all passing.
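For the record, the normalized result shape and the module-scoped accessor can be as small as this sketch; field names beyond `ok` are assumptions:

```ts
// Sketch of the normalized result type and the module-scoped bridge accessor; illustrative.
export type ComputerUseResult<T = unknown> =
  | { ok: true; value: T }
  | { ok: false; error: string }; // e.g. the "feature not built" envelope surfaced from the Rust side

export interface OrchestratorBridgeLike {
  computerScreenshot(): Promise<ComputerUseResult<{ base64Png: string }>>;
  computerMouseClick(x: number, y: number): Promise<ComputerUseResult<void>>;
  // ...screen_size, mouse_move, type, and key follow the same shape
}

let activeBridge: OrchestratorBridgeLike | null = null;

export function setActiveOrchestratorBridge(bridge: OrchestratorBridgeLike | null): void {
  activeBridge = bridge; // registered by the gateway right after Bridge.start()
}

export function getActiveOrchestratorBridge(): OrchestratorBridgeLike | null {
  return activeBridge; // agent tools reach the live bridge without factory threading
}
```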

Default-feature orchestrator build verified: 1m14s, exit 0, no new
deps pulled in. The relay fleet's cloud-init is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pillar 5 lands the orchestration layer that turns the checkpoint store
into a genuine long-horizon capability: a runtime that drives an agent
through work → rest → dream cycles, writes a parent-chained checkpoint
timeline at every phase boundary, and resumes from the latest tip.

LongHorizonRuntime in src/agents/long-horizon/runtime.ts (driving loop sketched after this list):
- Phases: work (configurable workMs window), rest (cool-down), dream
  (one pass of the supplied dreamStep). Repeats until any of: workStep
  returns done, max iterations hit, wall-clock budget exhausted, or
  AbortSignal fires.
- Checkpoint at each phase boundary using `kind: "custom"` with a
  `phase` metadata field, so timeline UIs can colour-code the cycle.
- Test seams (`now`, `sleep`) so the cycle can be driven through fake
  time without real timers.
- LongHorizonRuntime.resume(threadId, store) returns the latest step
  id from the store — the entry point for resume after restart.
- Wrapped in `long_horizon.run` / `long_horizon.work_step` /
  `long_horizon.dream_step` OTel spans so a multi-hour run produces
  trace coverage at the right granularity for production debugging.
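To make the cycle concrete, a hypothetical driving loop over the options described above; option names are illustrative, not the real constructor signature:

```ts
// Hypothetical usage sketch of the work -> rest -> dream cycle; option names are illustrative.
interface LongHorizonOptions {
  workMs: number;                                   // length of each work window
  restMs: number;                                   // cool-down between work and dream
  maxIterations: number;
  budgetMs: number;                                 // wall-clock budget for the whole run
  workStep: () => Promise<"done" | "continue">;
  dreamStep: () => Promise<void>;
  signal?: AbortSignal;
  now?: () => number;                               // test seam: fake clock
  sleep?: (ms: number) => Promise<void>;            // test seam: fake timers
}

async function runCycles(opts: LongHorizonOptions): Promise<void> {
  const now = opts.now ?? Date.now;
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));
  const start = now();
  for (let i = 0; i < opts.maxIterations; i++) {
    if (opts.signal?.aborted || now() - start >= opts.budgetMs) return;
    const deadline = now() + opts.workMs;
    while (now() < deadline) {                      // work phase
      if ((await opts.workStep()) === "done") return;
      if (opts.signal?.aborted) return;
    }
    await sleep(opts.restMs);                       // rest phase
    await opts.dreamStep();                         // dream phase (one pass)
    // a checkpoint with kind: "custom" and a phase metadata field would be written at each boundary
  }
}
```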

Pillar 6 #1 follow-up: memory hot-path spans:
- `memory.search` and `memory.dream` now run inside withSpan so a
  collector sees the same boundaries the agent itself sees. Search
  span carries query length + max-results attributes; dream span
  marks engine identity. Zero overhead when OTel is disabled.

Tests: 5 new long-horizon tests covering work-rest-dream rotation
with parent-chained checkpoints, early-done, abort, budget cap, and
resume. Combined PLAN-14 suite: 26 tests, all passing.

This is the final foundational piece for Pillar 5 — wiring an actual
6+ hour agent run on top is now a `workStep: () => agent.step()`
composition, not an architectural project.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VGIL77 merged commit c91e560 into main on Apr 30, 2026. 2 checks passed.
