Implement ADR-0010: Daemon process separation for signing and storage #236

@ojongerius

Summary

Implement the daemon process separation described in ADR-0010. The current in-process architecture allows agents to forge, suppress, or tamper with their own receipts, and concretely breaks under concurrent emitters (see comments below). This work moves signing and storage into a separate agent-receipts daemon running as its own OS user, and reduces every emitter (OpenClaw plugin, MCP proxy, SDK) to a thin fire-and-forget IPC client.

Background

See ADR-0010 for the full rationale. The short version: an agent auditing itself is not a meaningful audit. The daemon separation restores the tamper-evidence property and collapses N independent crypto/storage stacks into one shared chain.

The two comments below document concrete bugs the daemon split fixes that no smaller intervention can:

  • Per-session listener collision (listen tcp 127.0.0.1:8082: bind: address already in use).
  • Concurrent mcp-proxy instances racing on chain tail allocation, breaking chain integrity (UNIQUE index rejects the second insert at seq=N+1).

ADR-0015 (KeySource, BYOK, anchoring) and ADR-0016 (audit encryption at rest) both build on this ADR as substrate.

Status

mcp-proxy / OpenClaw / SDK emitters are still unchanged — that's the next phase.

Suggested next piece

Phase 1 follow-ups are now drained on the daemon side. Three paths from here, ordered by impact:

  1. Critical-path (recommended): resolve Open Questions 2, 3, and 4. None require code — they're design calls about chain migration policy, cutover sequencing, and uniform session_id allocation. Resolving the three is the strict gate on Section 3 (thin-emitter refactor); without them, every emitter PR risks being re-litigated mid-review. Best done as a discussion that produces a short ADR-0010 follow-up amendment or a new ADR. Right size for a single review session.
  2. Spec change: top-level peer field on receipts. Moves peer attestation out of action.parameters_disclosure into the canonical spec field per the ADR-0010 Schema split section. Doc-only but touches spec/ and so requires explicit human approval per AGENTS.md before any agent work. Smaller and more deterministic than the OQ resolution above; good if you want a quick concrete win before the bigger design pass.
  3. Begin Section 5 (Packaging) design. Homebrew formula + launchd plist + systemd unit + operator docs. Independent of the emitter refactor, but substantive enough to need a design pass first (operator-facing ownership policy for agentreceipts / agentreceipts-read is part of this). Could be split into its own issue.

MVP scope (first cut)

In scope for MVP:

  • macOS launchd + Linux systemd
  • Homebrew formula for daemon distribution
  • Single chain, single signing key, file-backed
  • Thin emitters across mcp-proxy + OpenClaw + Go/TS/Py SDKs

Out of scope for MVP — split into follow-up issues:

  • Windows Service installer + named pipes (separate issue)
  • .deb / .rpm packaging (separate issue)
  • agent-receipts tail -f read socket (already noted in ADR-0010)

Work breakdown

1. IPC transport layer

  • Unix domain socket at /run/agentreceipts/events.sock (Linux) and /var/run/agentreceipts/events.sock (macOS) — daemon: phase 1 of ADR-0010 — standalone signing daemon foundation #322 ships SOCK_STREAM with 4-byte big-endian length-prefix framing instead of SOCK_SEQPACKET (macOS AF_UNIX doesn't support SEQPACKET); peer-cred works identically on stream sockets so the trust model is unchanged. ADR amended in docs(adr-0010): amend IPC framing and default socket paths to match shipped defaults #327.
  • Unprivileged-install fallback path: see Open Question 1 — needs a single rule for both Linux ($XDG_RUNTIME_DIR/...) and macOS (no XDG_RUNTIME_DIR) — resolved: Linux $XDG_RUNTIME_DIR/agentreceipts or /run/agentreceipts; macOS $TMPDIR/agentreceipts. Captured in ADR-0010 via docs(adr-0010): amend IPC framing and default socket paths to match shipped defaults #327.
  • Socket path configurable via env var (AGENTRECEIPTS_SOCKET)
  • Non-blocking send on emitter side; EAGAIN increments a local drop counter instead of blocking — deferred by design to the thin-emitter refactor; the daemon side has nothing to do here.
  • Drop counter flushed on next successful event; document the narrow loss window (emitter crash after drop, before flush) — same: emitter-side, ships with the thin-emitter refactor.

2. Daemon process (agent-receipts-daemon)

  • Sole owner of Ed25519 signing keys and SQLite database
  • Internal KeySource interface — file-backed adapter for MVP. Shape must satisfy ADR-0015 (Sign, PublicKey, Rotate, Init, Teardown) so PKCS#11 / cloud-KMS adapters land as adapters later, not as a redesign — daemon: phase 1 of ADR-0010 — standalone signing daemon foundation #322 ships Sign / PublicKey / VerificationMethod / Rotate / Init / Teardown; file-backed PEM adapter refuses keys looser than 0600.
  • Peer credential capture at connection-accept time:
  • RFC 8785 canonicalization (moved exclusively from emitters to daemon)
  • Hash-chaining and Ed25519 signing (seq, prev_hash, ts_recv, peer, id added by daemon)
  • In-memory ownership of (sequence, prev_hash) resumed on startup via GetChainTail(chainID) -> (seq, hash, found, err) — single ORDER BY sequence DESC LIMIT 1 query. Emitters must never allocate sequence numbers (per comment below) — added to ReceiptStore interface and SQLite *Store in sdk/go/store.
  • SQLite persistence with events_dropped synthetic receipt when a gap is recorded — deferred by design: this mechanism belongs with the emitter side (EAGAIN handling, dropped-counter flush) and ships with the thin-emitter refactor. See daemon/README.md "Phase 1 scope and deviations".
  • DB permissions: 0640 owner agentreceipts, group agentreceipts-read; public key 0644. File-mode portion done in daemon: ship agent-receipts verify and 0644 public-key publishing #325: tightenDBFiles caps DB/WAL/SHM at 0640 after store.Open and a restrictive umask catches the new-file case at the source; daemon publishes the SPKI public key at <KeyPath>.pub (overridable via --public-key / AGENTRECEIPTS_PUBLIC_KEY) with mode 0644 on every startup, refusing to overwrite a mismatched file or a non-regular path. Owner/group ownership (agentreceipts user / agentreceipts-read group) is a packaging concern and lands with launchd / systemd / Homebrew.

3. Thin emitter refactor

  • Remove signing, storage, and canonicalization from @agnt-rcpt/openclaw (→ v2)
  • Remove signing, storage, and canonicalization from mcp-proxy (→ v2)
  • Remove signing, storage, and canonicalization from Go / TS / Py SDKs (→ v2)
  • Emitter schema: v, ts_emit, session_id, channel, tool, input, output, error, decision
  • session_id allocation rule per Open Question 4 — must be uniform across all three SDK emitters
  • Silent drop when daemon is not running (connect fails); EAGAIN drop counter flush on next successful send

4. Read interface

  • agent-receipts verify CLI reads DB and public key directly via filesystem — must work when daemon is down — daemon: ship agent-receipts verify and 0644 public-key publishing #325: new cmd/agent-receipts binary with verify subcommand. Uses sdk/go/store.OpenReadOnly (?mode=ro DSN, no schema/migration writes) so it coexists with the active daemon writer; reads the daemon-published public-key file. Stable exit codes: 0 valid, 1 broken, 2 usage error. Validates the public-key PEM/SPKI/Ed25519 shape upfront so a malformed key is ExitUsageError, not ExitChainBad.
  • Independent verifiability is not gated on daemon availability — daemon: ship agent-receipts verify and 0644 public-key publishing #325: integration tests TestVerifyCLIWhileDaemonRunning (daemon up + writing) and TestVerifyCLIWithDaemonStopped (daemon shut down between emit and verify) pin both halves.

5. Packaging (MVP)

  • Homebrew formula
  • launchd plist for macOS
  • systemd unit file for Linux (raw unit, not yet .deb/.rpm)
  • Operator documentation: install, start, supervise, upgrade

6. Version and migration

  • Major version bump for @agnt-rcpt/openclaw, mcp-proxy, and all three SDKs (daemon is now a runtime requirement)
  • Migration guide: v1 → v2 (in-process → daemon-backed)
  • Deprecation notice: v1 in-process behaviour is removed, not kept alongside v2
  • Existing-chain migration policy per Open Question 2

Acceptance criteria

  • An agent process has no access to signing keys or the SQLite database
  • Peer attestation is captured by the daemon from the OS, not self-reported by the emitter
  • All channels (OpenClaw, MCP proxy, SDK) write to a single chain with monotonic seq
  • agent-receipts verify works with the daemon stopped — covered by TestVerifyCLIWithDaemonStopped (daemon: ship agent-receipts verify and 0644 public-key publishing #325).
  • Dropped events are never invisible: gaps appear as events_dropped receipts in the chain
  • The daemon runs as its own OS user via standard service-manager integration on macOS and Linux

Regression tests for the bugs that motivated this

  • Two mcp-proxy instances started concurrently both emit successfully into one chain — no UNIQUE index conflict, no retry loop, no chain integrity break (regression for the concurrent-tail-allocation bug in comment 2) — covered by TestConcurrentEmittersSingleChain in daemon/integration_test.go (4 emitters × 50 frames). The shape of the test is daemon-level rather than two mcp-proxy processes; the listener-collision regression below covers the two-process angle once thin emitters land.
  • Two mcp-proxy instances started concurrently do not collide on a single listener port (regression for comment 1, listener case) — blocked on the thin-emitter refactor.
  • A sandboxed emitter (read-only filesystem access to the canonical DB path) emits successfully via the daemon socket (regression for comment 1, RO-DB case) — blocked on the thin-emitter refactor.
  • Peer-cred capture verified per platform with a fixture process: assert pid/uid/exe_path recorded on the synthesised peer field for both linux and darwin discriminators — TestPeerCredCaptured covers same-process capture; TestPeerCredFromSubprocess (daemon(socket): populate peer.exe_path on macOS via SYS_PROC_INFO syscall #328) re-execs the test binary as a separate process and asserts peer.pid != os.Getpid(), closing the gap where same-process tests can't distinguish client from server. Both run on ubuntu-latest and macos-latest from daemon(socket): populate peer.exe_path on macOS via SYS_PROC_INFO syscall #328.
  • (extra) Daemon resumes from highest-seq receipt after restart — TestResumesChainAfterRestart, exercises the full GetChainTail wire-through.
  • (extra, daemon: ship agent-receipts verify and 0644 public-key publishing #325) agent-receipts verify succeeds with the daemon up and writing — TestVerifyCLIWhileDaemonRunning.
  • (extra, daemon: ship agent-receipts verify and 0644 public-key publishing #325) Published public-key file is mode 0644 on every daemon startup — TestPublishedPublicKeyHasMode0644.
  • (extra, daemon: ship agent-receipts verify and 0644 public-key publishing #325) Fresh-write path refuses a pre-planted symlink at the public-key path; the attacker's target is unchanged — TestPublishPublicKey_FreshWriteRefusesPreCreatedSymlink.
  • (extra, daemon: ship agent-receipts verify and 0644 public-key publishing #325) agent-receipts verify reports a malformed public key as a usage error rather than implicating the chain — TestRun_MalformedPublicKeyIsUsageError.

Open questions (resolve before kickoff)

  1. Unprivileged-install socket path on macOS. Resolved (daemon: phase 1 of ADR-0010 — standalone signing daemon foundation #322): Linux $XDG_RUNTIME_DIR/agentreceipts or /run/agentreceipts; macOS $TMPDIR/agentreceipts. Configurable via AGENTRECEIPTS_SOCKET.
  2. Existing chain migration policy. v1 users have per-emitter SQLite databases. Three options: (a) in-place migration into the daemon's DB on first run; (b) abandon old chains, daemon starts a fresh chain; (c) one-shot agent-receipts import-chain script. Solo-dev usage means (b) is cheap; pick deliberately.
  3. SDK cutover sequencing. All three SDKs + OpenClaw + mcp-proxy in one PR/release, or phased per channel? Phased keeps PRs reviewable but means mixed v1/v2 chain state for the duration.
  4. session_id allocation rule. Emitter generates at startup? Per agent-run? How does it survive emitter reconnect to the daemon? Affects all three SDK emitter designs and must be uniform.
  5. Is ADR-0016 (audit encryption at rest) in scope? ADR-0016 names the daemon as the natural home for BEACON_ENCRYPTION_KEY. Default position: defer to a follow-up issue, but record that decision explicitly so the daemon's storage layer is built in a way that does not need re-architecting later.
  6. Advance ADR-0010 to Accepted before kickoff. Resolved (daemon: phase 1 of ADR-0010 — standalone signing daemon foundation #322): ADR-0010 status flipped Proposed → Accepted.
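
The resolution of Open Question 1 could be expressed as a lookup like the one below. The directories and the AGENTRECEIPTS_SOCKET override come from this issue. The order in which candidates are tried, the use of `events.sock` as the file name inside the per-user directories, and the function name `resolveSocketPath` are assumptions; ADR-0010 as amended in #327 is the authority on the real defaults.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
)

// socketName is assumed to be the same in every candidate directory.
const socketName = "events.sock"

// resolveSocketPath picks a socket path for the given OS. The environment
// is injected as a function so the rule can be exercised without touching
// the real process environment.
func resolveSocketPath(goos string, getenv func(string) string) string {
	// An explicit override always wins.
	if p := getenv("AGENTRECEIPTS_SOCKET"); p != "" {
		return p
	}
	switch goos {
	case "darwin":
		// macOS has no XDG_RUNTIME_DIR, so the per-user location is TMPDIR.
		if tmp := getenv("TMPDIR"); tmp != "" {
			return filepath.Join(tmp, "agentreceipts", socketName)
		}
		return filepath.Join("/var/run/agentreceipts", socketName)
	default:
		if xdg := getenv("XDG_RUNTIME_DIR"); xdg != "" {
			return filepath.Join(xdg, "agentreceipts", socketName)
		}
		return filepath.Join("/run/agentreceipts", socketName)
	}
}

func main() {
	fmt.Println(resolveSocketPath(runtime.GOOS, os.Getenv))
}
```

An emitter and the daemon would both need to apply the same rule, otherwise an unprivileged install silently drops every event because the connect fails.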

Phase 1 follow-ups (from #322 / #325 / #328 deviations)

Related

  • ADR-0001: Ed25519 signing — key now lives only in the daemon
  • ADR-0002: RFC 8785 canonicalization — moves exclusively to the daemon
  • ADR-0004: SQLite storage — daemon is sole writer; readers use filesystem permissions
  • ADR-0010: this issue's substrate
  • ADR-0015: key rotation, BYOK, anchoring — daemon must hold its key behind a KeySource interface so 0015 lands as adapters
  • ADR-0016: audit encryption at rest — see Open Question 5

Labels

adr, architecture, enhancement, security