feat(kiloclaw): pre-install ClawMetry observability with auto-sync#3078

Draft
vivekchand wants to merge 12 commits into Kilo-Org:main from vivekchand:feat/clawmetry-integration

vivekchand commented May 6, 2026

Summary

Pre-installs ClawMetry (free OpenClaw observability) on every KiloClaw instance and adds a "View Observability" button to the instance detail page. Click it → sync daemon starts on demand, dashboard opens in a new tab with live agent activity.

Works out of the box. End-to-end encrypted (AES-256-GCM, key never leaves the instance).

Changes

  • Dockerfile: curl https://clawmetry.com/install.sh | bash, so every image picks up the latest release.
  • controller/src/bootstrap.ts: new provisionClawMetrySync step. Calls ClawMetry's public /api/register, generates the AES key client-side, writes ~/.clawmetry/config.json (mode 0600) and dashboard-url.txt. Daemon is NOT spawned here.
  • controller/src/routes/clawmetry.ts: GET /_kilo/clawmetry-dashboard-url and POST /_kilo/clawmetry-start-sync (idempotent), behind the standard /_kilo/* bearer-token gate.
  • apps/web/.../api/kiloclaw/clawmetry/[instanceId]/route.ts: Next.js route the button calls. Uses the existing /i/{instanceId}/* worker proxy.
  • apps/web/.../KiloClawDetail.tsx: "View Observability" button.

Notes

  • Deferred sync. Daemon spawns on first button click — no wasted compute / storage / bandwidth for users who never look.
  • E2E encryption. AES-256-GCM key generated on the instance. The dashboard URL embeds it as a #fragment, which browsers never send to servers. ClawMetry's cloud only sees ciphertext; KiloClaw never sees the key.
  • Same public endpoint as standalone pip install clawmetry. No partner channel, no custom auth, no shared secret.
  • Fail-soft. Provisioning errors warn and skip; the gateway boots regardless.

Free / Pro

KiloClaw users land on Free by default (1 node, 24h history, E2E encryption). Upgrade flow lives on clawmetry.com.

Test plan

  • pnpm --filter kiloclaw test (90 files / 1762 tests), typecheck, lint, pnpm format — all clean
  • pnpm --filter kiloclaw test:e2e:clawmetry — Playwright runner that POSTs to /api/register (same endpoint bootstrap uses), drives the resulting dashboard URL through Chromium, and asserts the decryption hand-off (CLOUD_MODE, CLOUD_TOKEN, enc_key in localStorage, fragment scrubbed, no unlock prompt) plus walks Brain / Tokens / Crons tabs for decrypt errors
  • Build a fresh image and verify clawmetry --version runs in the build
  • Spin up a test instance, confirm config.json + dashboard-url.txt exist with mode 0600 and clawmetry sync is NOT running
  • Click "View Observability", confirm the daemon spawns and the dashboard renders decrypted events

vivekchand added 3 commits May 6, 2026 10:44
Every KiloClaw instance now ships with the ClawMetry sync daemon
pre-installed. When the operator sets CLAWMETRY_PARTNER_KEY on the
instance, bootstrap auto-provisions a free ClawMetry account for the
user (using KILOCLAW_USER_EMAIL or GITHUB_EMAIL) and starts the sync
daemon — agent activity flows to app.clawmetry.com so users get a
real-time dashboard of their KiloClaw runtime.

Until the partner key is set, the integration is a silent no-op:
zero behavior change, the package is installed but inert. The cloud-
side prerequisite (ClawMetry's /api/partner/kiloclaw/provision route
+ partner key issuance) is documented in
services/kiloclaw/docs/clawmetry-integration.md.

- Dockerfile: install python3-pip + clawmetry==0.12.161
- bootstrap.ts: new provisionClawMetrySync step (gated, fail-soft)
- bootstrap.test.ts: 7 new tests covering disabled/missing-key/missing-
  email/success/email-fallback/network-error/server-error/api-base-
  override paths
- docs/clawmetry-integration.md: activation, env vars, verification,
  cloud-side prerequisites
…ction

The original design had bootstrap call a custom /api/partner/kiloclaw/
provision endpoint gated by CLAWMETRY_PARTNER_KEY. That was overdesigned
— the partner key was only there to gate an account-takeover oracle the
custom endpoint introduced by returning existing tokens by email lookup.

The existing public /api/register endpoint already does the right thing:
tokens are scoped to machine_id (not email), email is metadata for
recovery only, idempotent on machine_id, no secrets to manage. Same flow
any user installing clawmetry on a fresh OpenClaw box would use.

Changes:
- bootstrap.ts: provisionClawMetrySync calls /api/register with
  {hostname, machine_id (FLY_MACHINE_ID || HOSTNAME), platform, email?}
- Drop CLAWMETRY_PARTNER_KEY env var entirely; integration enabled by
  default (KILOCLAW_CLAWMETRY_DISABLED=true is the only opt-out)
- Email is now optional — account is keyed on machine_id either way
- bootstrap.test.ts: 8 tests updated to cover the simpler shape
  (disabled, no-machine-id, register success, no-email, GITHUB_EMAIL
  fallback, HOSTNAME fallback, network error, no-api_key, API base
  override)
- docs/clawmetry-integration.md: rewrite to reflect "no special endpoint,
  no partner key, no secrets" model

Pairs with cloud-side revert at vivekchand/clawmetry-cloud#629 (merged).
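
For illustration, the request body described above could be assembled roughly as follows. This is a sketch, not the code in bootstrap.ts: the `RegisterPayload` type, the `buildRegisterPayload` helper, the choice of `hostname` value, and the `KILOCLAW_USER_EMAIL`-before-`GITHUB_EMAIL` order are assumptions drawn from the field list and test names in these commit messages.

```typescript
// Hypothetical shape of the /api/register body, taken from the field list
// in the commit message; the real types in bootstrap.ts may differ.
interface RegisterPayload {
  hostname: string;
  machine_id: string;
  platform: string;
  email?: string;
}

type Env = Record<string, string | undefined>;

// Returns null when no machine identifier is available, so the caller can
// warn and skip provisioning instead of failing the boot.
function buildRegisterPayload(env: Env, platform: string): RegisterPayload | null {
  const machineId = env.FLY_MACHINE_ID || env.HOSTNAME;
  if (!machineId) return null;

  const payload: RegisterPayload = {
    hostname: env.HOSTNAME || machineId, // assumption: hostname falls back to the machine id
    machine_id: machineId,
    platform,
  };

  // Email is optional metadata; the account is keyed on machine_id either way.
  const email = env.KILOCLAW_USER_EMAIL || env.GITHUB_EMAIL;
  if (email) payload.email = email;
  return payload;
}
```

Because the token is scoped to `machine_id`, calling the endpoint again with the same payload is expected to be idempotent, which is what lets bootstrap re-run safely.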
Two related changes that fix gaps in the previous commit:

1. E2E encryption key (was missing entirely)
   ClawMetry encrypts events client-side with AES-256-GCM before publishing.
   The previous commit only wrote a token to disk — the daemon would have
   either crashed (no encryption_key in config) or, worse, published
   plaintext. Now bootstrap:
     - Generates a 32-byte enc_key client-side (never sent to any server)
     - Writes the full schema to /root/.clawmetry/config.json
       ({api_key, encryption_key, node_id, platform, connected_at})
     - Writes a self-decrypting dashboard URL to dashboard-url.txt:
       https://app.clawmetry.com/cloud#key=<enc>&node=<node>
       The #fragment is never sent to any server (browsers strip it from
       outgoing requests) — the dashboard JS reads it client-side and
       stashes the key in localStorage to decrypt event blobs.

   E2E story: enc_key flows instance → URL fragment → user's browser.
   NEVER through any server. Even KiloClaw's own controller only sees the
   URL string; the fragment is meaningful only to the browser.

2. Deferred sync (only fire when user actually wants to look)
   No more daemon spawn at bootstrap. Bootstrap just pre-wires:
   install + config + dashboard URL. The daemon stays dormant.

   When the user clicks "View Observability" in KiloClaw's web UI:
     POST /_kilo/clawmetry-start-sync   → spawns nohup clawmetry sync &
     GET  /_kilo/clawmetry-dashboard-url → returns the self-decrypting URL
   Browser opens the URL → dashboard shows "Syncing your data..." until
   first events arrive (catch-up batch from local OpenClaw session files).

   Saves compute / bandwidth / cloud storage for users who never look at
   the dashboard. Daemon, once started, persists for that boot session.
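
The key generation and config write from item 1 can be sketched as below. The helper names, the base64url encoding of the key, and the ISO timestamp for `connected_at` are assumptions; the PR text only states that the key is 32 bytes, that the config holds the five listed fields, and that the file mode is 0600.

```typescript
import { randomBytes } from "node:crypto";
import { chmodSync, mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Field names come from the commit message above.
interface ClawMetryConfig {
  api_key: string;
  encryption_key: string;
  node_id: string;
  platform: string;
  connected_at: string;
}

// 32 random bytes = a 256-bit key for AES-256-GCM.
// Assumption: base64url text encoding; the format ClawMetry expects is not stated here.
function generateEncryptionKey(): string {
  return randomBytes(32).toString("base64url");
}

// Hypothetical helper: writes config.json under `dir` and returns what was written.
function writeConfig(dir: string, apiKey: string, nodeId: string, platform: string): ClawMetryConfig {
  const config: ClawMetryConfig = {
    api_key: apiKey,
    encryption_key: generateEncryptionKey(),
    node_id: nodeId,
    platform,
    connected_at: new Date().toISOString(), // assumption: ISO 8601 timestamp
  };
  mkdirSync(dir, { recursive: true, mode: 0o700 });
  const file = join(dir, "config.json");
  writeFileSync(file, JSON.stringify(config, null, 2), { mode: 0o600 });
  // The mode option only applies when the file is created and is subject to
  // the umask, so set the permissions explicitly as well.
  chmodSync(file, 0o600);
  return config;
}
```

The key is generated locally and only ever written to disk and into the URL fragment, which is what keeps it off every server in the path.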

Tests: 89 files / 1752 pass. Adds 1 test (config + URL writing path) +
adjusts the original "starts daemon" assertions to "does NOT start
daemon" since deferred-sync moves that to UI-button-click time.

The KiloClaw web UI follow-up (the actual button + controller endpoints)
is intentionally out of scope here — flagged in docs/ for Suhail's team
to take or for me to follow up on if they want.
vivekchand added 7 commits May 7, 2026 12:43
Completes the ClawMetry integration so it works user-visibly out of the
box, not just at the bootstrap layer. Three small additions:

1. Controller endpoints (services/kiloclaw/controller/src/routes/clawmetry.ts)
   - GET  /_kilo/clawmetry-dashboard-url → returns the self-decrypting
     URL written at bootstrap (404 if provisioning didn't run / disabled)
   - POST /_kilo/clawmetry-start-sync    → spawns `clawmetry sync`
     detached, idempotent (returns alreadyRunning:true when daemon exists)
   - Same bearer-token gate as the rest of /_kilo/* routes
   - 10 unit tests covering auth, success, missing-file, idempotency,
     spawn-failure paths

2. Web app proxy route (apps/web/src/app/api/kiloclaw/clawmetry/[instanceId])
   - Single POST that does both: triggers start-sync (best-effort, logged
     but non-fatal) then fetches dashboard URL, returns it to browser
   - Uses the existing `/i/{instanceId}/*` worker proxy → no new platform
     routes, no new DO methods, no new internal client surface
   - JWT-authenticated via standard getUserFromAuth + generateApiToken

3. UI button (KiloClawDetail.tsx)
   - "View Observability" in the existing action button row
   - Toast feedback: "Starting ClawMetry sync…" → "Opening ClawMetry…"
     or error
   - window.open(url, '_blank', 'noopener,noreferrer') — fragment never
     reaches our servers; browser stashes enc_key in localStorage and
     decrypts events client-side

E2E flow:
  KiloClaw bootstrap (existing)
    → installs ClawMetry, writes config.json + dashboard-url.txt
    → sync daemon dormant
  User clicks "View Observability"
    → POST /api/kiloclaw/clawmetry/{instanceId}
      → POST {worker}/i/{id}/_kilo/clawmetry-start-sync (controller spawns daemon)
      → GET  {worker}/i/{id}/_kilo/clawmetry-dashboard-url
    → window.open(url) → dashboard JS reads #fragment → decrypts events
    → "Syncing your data…" until first events arrive
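
The two-step proxy call in this flow (best-effort start-sync, then a required dashboard-URL fetch) might look like the sketch below. `ControllerClient`, `openObservability`, and the 404 response body are hypothetical; the real route goes through KiloClaw's authenticated worker proxy, which is not reproduced here.

```typescript
// Hypothetical minimal client for the /i/{instanceId}/* worker proxy.
interface ControllerClient {
  post(path: string): Promise<{ ok: boolean }>;
  get(path: string): Promise<{ ok: boolean; url?: string }>;
}

interface ProxyResult {
  status: number;
  body: { url: string } | { error: string };
}

async function openObservability(
  client: ControllerClient,
  log: (msg: string) => void = console.warn,
): Promise<ProxyResult> {
  // Step 1: best-effort. A failure is logged and does not abort the request.
  try {
    const started = await client.post("/_kilo/clawmetry-start-sync");
    if (!started.ok) log("clawmetry start-sync returned a non-OK response");
  } catch (err) {
    log(`clawmetry start-sync failed: ${String(err)}`);
  }

  // Step 2: required. Without a dashboard URL there is nothing to open.
  const res = await client.get("/_kilo/clawmetry-dashboard-url");
  if (!res.ok || !res.url) {
    return { status: 404, body: { error: "dashboard URL unavailable" } };
  }
  return { status: 200, body: { url: res.url } };
}
```

Treating start-sync as non-fatal means the user still reaches the dashboard, which shows its syncing state, even when the trigger call fails.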

Tests: 90 files / 1762 pass (kiloclaw service). Web typecheck + lint
clean. Format clean.

Closes the gap between bootstrap-side provisioning and user-visible
observability — no follow-up PR needed.
Free is 1 node + 24h Brain feed + 7d token tracking — NOT 90-day retention.
90-day retention is the Pro tier ($5/node/month). Caught while reviewing
the PR description; fixing the same misstatement in the in-tree docs.

See https://clawmetry.com/pricing for canonical numbers.
`clawmetry sync` isn't a CLI subcommand — the canonical sync-daemon entry
point is the sync.py module (see clawmetry/cli.py:_start_subprocess).
install.sh creates a venv at /root/.clawmetry, so the venv python is at
/root/.clawmetry/bin/python3.

Caught while running an end-to-end test against the actual CLI:
  $ clawmetry sync --help
  error: argument command: invalid choice: 'sync'
       (choose from 'start', 'stop', 'restart', 'status', 'connect',
        'uninstall', 'help')

Verified the fix end-to-end with the venv python:
  $ HOME=/tmp/test python3 -m clawmetry.sync
  [sync] node=vivek+kc-e2e-... → ingest.clawmetry.com (🔒 E2E encrypted)
  [sync] Initial heartbeat sent
  [sync] Recent sessions: 3 events synced
  → cloud /api/cloud/nodes shows the node
  → cloud /api/cloud/sessions shows the synced session
…redirect

The dashboard URL we wrote at bootstrap was https://app.clawmetry.com/cloud
#key=<enc>&node=<id>. Caught in browser-based E2E that this lands users at
/cloud without ?token= set, so window.CLOUD_TOKEN ends up empty and the
URL-fragment bridge can't namespace the localStorage enc_key (it builds
cm-enc-key-<node>-  with empty token-prefix and silently drops the key).

Switching to the /d/<dashboard_id>#key=...&node=... form. The /d/<id>
route does a server-side 302 to /cloud?token=cm_xxx, the browser preserves
the #fragment across the redirect, the cloud page sets window.CLOUD_TOKEN
from the query, and the bridge script can stash the enc_key correctly.

Verified end-to-end:
  $ curl -sI https://app.clawmetry.com/d/<dashboard_id>
  HTTP/2 302
  location: /cloud?token=cm_5c9456...

Falls back to /cloud#... if /api/register doesn't return dashboard_id
(older cloud versions). The user lands unauthenticated and has to OTP-in,
which is suboptimal but doesn't break.
The /d/<dashboard_id> redirect resolves to /cloud (the fleet overview),
which doesn't contain the cm-setup-bridge script that reads the URL
fragment. So the enc_key was being preserved but never extracted.

Fix: construct the URL directly as
  /cloud/node/<node_id>?token=<api_key>#key=<enc>&node=<node_id>

This is the per-node detail page — what KiloClaw users actually want when
they click "View Observability" (their runtime's Brain feed, sessions,
tools), and it's the only page that has the bridge script.
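
As a sketch of the URL shape above, the standard `URL` API keeps the token in the query string and the key in the fragment. The helper names are hypothetical, and the percent-encoding applied to the fragment values is an assumption about what the dashboard's bridge script accepts.

```typescript
// Builds /cloud/node/<node_id>?token=<api_key>#key=<enc>&node=<node_id>.
function buildDashboardUrl(base: string, nodeId: string, apiKey: string, encKey: string): string {
  const url = new URL(`/cloud/node/${encodeURIComponent(nodeId)}`, base);
  url.searchParams.set("token", apiKey);
  // The fragment carries the encryption key; it is read by page JavaScript only.
  url.hash = new URLSearchParams({ key: encKey, node: nodeId }).toString();
  return url.toString();
}

// What an HTTP client puts on the request line: path and query, never the fragment.
function requestTarget(dashboardUrl: string): string {
  const url = new URL(dashboardUrl);
  return url.pathname + url.search;
}
```

`requestTarget` makes the split visible: the `token` travels to the server, while everything after `#` stays in the browser.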

Verified end-to-end against prod. 3/3 browser test runs:
  ✓ Fragment processed (stripped from URL bar)
  ✓ window.CLOUD_TOKEN set from ?token=
  ✓ localStorage[cm-enc-key-<node>-<token-prefix>] populated with enc_key
  ✓ Dashboard renders decrypted

Pairs with the cloud-side fixes that landed today:
  - vivekchand/clawmetry-cloud#630  cm-setup-bridge fallback to localStorage
  - vivekchand/clawmetry-cloud#631  preserve fragment when stashing token
  - vivekchand/clawmetry-cloud#633  dedup _validate_cm_token (caching bug)
  - vivekchand/clawmetry-cloud#634  bump DB pool 10 → 25
Reproduces the bootstrap → register → daemon → dashboard flow against the
live ClawMetry cloud and asserts the browser-side decryption hand-off
works end-to-end. Catches regressions that unit tests can't see — the
URL-fragment enc_key handoff is a race between inline scripts in the
dashboard, only visible from a real browser.

Run:
  pnpm --filter kiloclaw test:e2e:clawmetry
  HEADLESS=0 pnpm --filter kiloclaw test:e2e:clawmetry  # show browser

Bails cleanly (exit 0) on /api/register 429 — the rate limit is 10/hr
per IP, so the test should be run sparingly, not on every CI push.
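
The bail-on-429 behaviour can be pictured as a small status classifier. This is an illustration only; the function names are hypothetical and the actual runner script may structure the check differently.

```typescript
type RegisterOutcome = "ok" | "rate_limited" | "failed";

function classifyRegisterStatus(status: number): RegisterOutcome {
  if (status === 429) return "rate_limited";
  return status >= 200 && status < 300 ? "ok" : "failed";
}

// Returns the exit code to stop with, or undefined to continue the test.
// Rate limiting counts as "skipped", not "failed", so the run exits 0.
function exitCodeAfterRegister(status: number): number | undefined {
  const outcome = classifyRegisterStatus(status);
  if (outcome === "rate_limited") return 0;
  if (outcome === "failed") return 1;
  return undefined;
}
```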
Walks all free-tier tabs as rendered by the cloud dashboard
(Flow / Brain / Overview / Approvals / Alerts / Notifications /
Context / Tokens / Crons / Memory) and captures a screenshot per
tab when SCREENSHOT_DIR is set. Drops the wrong tab labels
(Sessions/Channels/Health/Logs — not present in the free-tier
nav).
vivekchand added 2 commits May 7, 2026 20:48
Previously the sync daemon was spawned on the user's first "View
Observability" click — which produced two real bugs the E2E test caught:
(a) a spawn-on-click race that left users staring at a "Run: clawmetry
connect" red banner during cold-start, and (b) the dashboard sometimes
rendered "Node not found" because the nodes table hadn't been populated
by a heartbeat yet.

New shape:

- Bootstrap registers with source='kiloclaw' (the cloud-side companion
  PR uses this to mark the user account as deferred-sync) and spawns
  the daemon at instance boot. The daemon heartbeats every 60s — that's
  enough for the dashboard to render correctly the moment the user
  clicks View Observability — but uploads ZERO content (sessions,
  events, logs, memory) because the cloud's /ingest/heartbeat returns
  sync_allowed=false, reason='intent_pending' until the user has
  signalled intent.

- POST /_kilo/clawmetry-start-sync no longer spawns anything (the
  daemon is already running). Instead it reads api_key from
  /root/.clawmetry/config.json and POSTs to the cloud's new
  /api/cloud/intent-start endpoint, which flips users.sync_intent_at.
  The daemon's next heartbeat (~60s) sees sync_allowed=true and
  uploads resume.
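
The gate described in these two bullets can be modelled in a few lines. The real daemon is the Python sync module, so this TypeScript version is only an illustration: `FakeCloud` and `shouldUploadContent` are hypothetical, while `sync_allowed`, `reason`, and `already_started` are the field names used in this PR.

```typescript
interface HeartbeatResponse {
  sync_allowed?: boolean;
  reason?: string;
}

// Daemon-side decision. A response without sync_allowed (an older cloud
// version) is treated as "allowed", so uploads proceed as before.
function shouldUploadContent(resp: HeartbeatResponse): boolean {
  return resp.sync_allowed !== false;
}

// In-memory stand-in for the cloud-side gate, for illustration only.
class FakeCloud {
  private syncIntentAt: Date | null = null;

  heartbeat(): HeartbeatResponse {
    return this.syncIntentAt
      ? { sync_allowed: true }
      : { sync_allowed: false, reason: "intent_pending" };
  }

  // Flips the intent flag once; later calls are no-ops.
  intentStart(): { already_started: boolean } {
    if (this.syncIntentAt) return { already_started: true };
    this.syncIntentAt = new Date();
    return { already_started: false };
  }
}
```

In this model the daemon keeps heartbeating the whole time; only the content upload waits for the intent flag.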

Why this matters:

- No spawn-on-click race — daemon is always there, only the data flow
  is gated. Eliminates the "Run: clawmetry connect" misleading banner
  for KiloClaw users.
- Privacy + cost: KiloClaw boxes whose users never open the dashboard
  upload no content, only metadata heartbeats. It is also a real
  engagement signal: we know which users opened their dashboard.
- "We get to know this user really wants to use ClawMetry" — the
  intent flag IS that signal.

This PR depends on:
- vivekchand/clawmetry-cloud#635 (cloud-side gate + intent-start endpoint)
- vivekchand/clawmetry#906 (OSS daemon recognises reason='intent_pending')

Both must merge + deploy before this is meaningful for KiloClaw users.
This PR is safe to ship before them — the daemon will heartbeat, the
cloud will accept it (with no special response), and uploads will
proceed as before. Once cloud is deployed, deferred mode kicks in.
…gression

clawmetry-integration.mjs (KiloClaw) now exercises the full deferred-sync
flow against the deployed cloud:
  - register with source='kiloclaw'
  - first heartbeat returns sync_allowed:false, reason:'intent_pending'
  - POST /api/cloud/intent-start
  - heartbeat after intent has gate open
  - second intent-start is a no-op (already_started:true)
plus the existing 12-tab browser walk and decryption assertions.

clawmetry-normal-user.mjs (new) is a sister test that proves the
deferred-sync gate doesn't accidentally catch normal pip-install users:
register without source → heartbeat must NOT come back deferred → tabs
render. This is a regression test against future cloud changes that
might broaden the deferred whitelist or break the heartbeat shape.

Both tests bail cleanly on the cloud's 10/hour per-IP register rate
limit and exit 0 — they're manual verification, not on every push.

Run:
  pnpm --filter kiloclaw test:e2e:clawmetry           # KiloClaw flow
  pnpm --filter kiloclaw test:e2e:clawmetry-normal    # Normal-user flow