Feat/hosted mode#58

Merged
meidad merged 25 commits into main from feat/hosted-mode
Jun 6, 2026

Conversation

Collaborator

meidad commented Jun 6, 2026

Summary

Changes

Test plan

  • pnpm check passes (format, typecheck, lint)
  • pnpm test passes
  • Manual testing done

Related issues

meidad and others added 25 commits May 26, 2026 22:11
…backed state

- src/storage/redis.ts: shared ioredis client with org-prefixed keys for
  multi-customer Redis isolation
- src/storage/leases.ts: distributed mutex via SET NX EX with auto-renewal;
  replaces file-based ~/.nomos/*.lock semantics
- src/storage/stream-queue.ts: Redis-Streams FIFO with per-session consumer
  groups and XAUTOCLAIM-based reclaim of pending messages from dead consumers
- auto_dream_state + magic_doc_state tables replace ~/.nomos/auto-dream/ and
  ~/.nomos/magic-docs-state.json filesystem state
- memory/auto-dream.ts: state via Kysely, mutex via withLease("auto-dream")
- memory/magic-docs.ts: state via Kysely (per-file row); file IO removed
- db/encryption.ts: ENCRYPTION_KEY required in hosted mode (no on-disk fallback);
  power-user mode keeps the ~/.nomos/encryption.key path
- ioredis@5.11.0 added as runtime dep
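
The lease semantics described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/storage/leases.ts: the `LeaseStore` interface, the in-memory store, the `<org>:lease:<name>` key layout, and this `withLease` signature are all assumptions, and auto-renewal is left out.

```typescript
// Hypothetical store interface. With Redis, acquire would map to
// `SET key value EX ttl NX` and release to a compare-and-delete.
interface LeaseStore {
  setIfAbsent(key: string, value: string, ttlSeconds: number): Promise<boolean>;
  deleteIfValue(key: string, value: string): Promise<boolean>;
}

// In-memory stand-in so the sketch runs without Redis (expiry checked lazily).
class MemoryLeaseStore implements LeaseStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  async setIfAbsent(key: string, value: string, ttlSeconds: number): Promise<boolean> {
    const existing = this.entries.get(key);
    if (existing && existing.expiresAt > Date.now()) return false;
    this.entries.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
    return true;
  }

  async deleteIfValue(key: string, value: string): Promise<boolean> {
    const existing = this.entries.get(key);
    if (!existing || existing.value !== value) return false;
    this.entries.delete(key);
    return true;
  }
}

// Org-prefixed key, e.g. "acme:lease:auto-dream" (layout is an assumption).
function leaseKey(orgId: string, name: string): string {
  return `${orgId}:lease:${name}`;
}

// Runs fn only if the lease was acquired; returns null if someone else holds it.
async function withLease<T>(
  store: LeaseStore,
  orgId: string,
  name: string,
  ttlSeconds: number,
  fn: () => Promise<T>,
): Promise<T | null> {
  const key = leaseKey(orgId, name);
  const token = `${Date.now()}-${Math.random().toString(36).slice(2)}`;
  if (!(await store.setIfAbsent(key, token, ttlSeconds))) return null;
  try {
    return await fn();
  } finally {
    // Release only if the token still matches, so an expired lease that was
    // re-acquired by another holder is not deleted by mistake.
    await store.deleteIfValue(key, token);
  }
}
```

The random token is what makes release safe: a holder whose TTL lapsed cannot delete a lease that has since been taken by another process.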

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- src/db/migrator.ts: shared schema/migration logic with strict schema name
  validation (regex + reserved list) to prevent injection. Exposes
  createSchema/dropSchema/applySchema/provisionSchema for both the CLI and
  the BA admin server to import without duplicating SQL.
- src/db/migrate.ts: refactored to delegate to migrator; new entry points
  createCustomerSchema, dropCustomerSchema, provisionWithConnection. Accepts
  optional schemaName override on runMigrations.
- src/db/client.ts: applies NOMOS_DB_SCHEMA via the postgres-js connection
  search_path option so every connection in the pool is scoped from the
  start (no per-query SET LOCAL needed).
- src/cli/db.ts: nomos db migrate --schema, db create-schema <name>,
  db drop-schema <name>.
- src/db/migrator.test.ts: validation tests for name regex + reserved list.
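
As an illustration of the "regex + reserved list" validation, a sketch might look like the following. The exact pattern and reserved names in src/db/migrator.ts are not shown in this PR text, so the regex, the reserved set, and the function names here are assumptions.

```typescript
// Assumed rule: lowercase identifier, at most 63 characters
// (PostgreSQL truncates identifiers longer than 63 bytes).
const SCHEMA_NAME_RE = /^[a-z][a-z0-9_]{0,62}$/;

// Assumed reserved list; the real one may differ.
const RESERVED_SCHEMAS = new Set(["public", "information_schema"]);

function isValidSchemaName(name: string): boolean {
  if (!SCHEMA_NAME_RE.test(name)) return false;
  // The pg_ prefix is reserved for PostgreSQL system schemas.
  if (name.startsWith("pg_")) return false;
  return !RESERVED_SCHEMAS.has(name);
}

// Schema names cannot be bound as query parameters, so the name is
// validated before it is interpolated into DDL.
function createSchemaSql(name: string): string {
  if (!isValidSchemaName(name)) throw new Error(`invalid schema name: ${name}`);
  return `CREATE SCHEMA IF NOT EXISTS "${name}"`;
}
```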

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- src/config/mode.ts: NomosMode union + FEATURES gate predicates. Hosted
  blocks BYO MCP, BYO plugins, BYO channel tokens, BYO skills, bash tool,
  autonomous mode, iMessage, setup wizard, custom Anthropic base URL, raw
  model tier knobs. Core features (auto-dream, magic docs, memory, smart
  routing, team mode, skills) stay on in both modes.
- Wired FEATURES into call sites:
  - src/cli/mcp-config.ts: returns null in hosted (no BYO MCP).
  - src/plugins/installer.ts: install/remove/ensureDefaults all refuse in
    hosted; bundled plugins are baked into the image.
  - src/skills/loader.ts: only bundled skills load in hosted.
  - src/daemon/gateway.ts: iMessage, BYO Slack/Discord/Telegram env-var
    paths disabled in hosted (tokens must come from OAuth proxy → DB).
  - src/daemon/agent-runtime.ts + team-runtime.ts: disallowedTools list
    blocks Bash/BashOutput/KillBash when hosted.
  - src/cli/wizard.ts: shouldRunWizard() returns false in hosted (mobile
    onboarding replaces the first-run wizard).
- src/sdk/session.ts: surface disallowedTools through the SDK wrapper.
- settings/src/app/api/mode/route.ts: new endpoint exposing mode + features
  so the React UI can hide power-user knobs in hosted deployments.
- Tests: src/config/mode.test.ts (7 new tests).
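
A rough sketch of how a mode union and feature predicates could fit together. The `NOMOS_MODE` variable name, the `"power-user"` literal, and the predicate names are assumptions made for illustration; only the blocked tool names come from the commit text.

```typescript
type NomosMode = "hosted" | "power-user";

// Assumed env var name; anything other than "hosted" is treated as power-user.
function resolveMode(env: Record<string, string | undefined>): NomosMode {
  return env.NOMOS_MODE === "hosted" ? "hosted" : "power-user";
}

// Hypothetical predicate names. Each gate answers "is this allowed in this mode?".
const FEATURES = {
  byoMcp: (mode: NomosMode): boolean => mode !== "hosted",
  byoPlugins: (mode: NomosMode): boolean => mode !== "hosted",
  bashTool: (mode: NomosMode): boolean => mode !== "hosted",
  setupWizard: (mode: NomosMode): boolean => mode !== "hosted",
  // Core features stay on in both modes.
  autoDream: (_mode: NomosMode): boolean => true,
  memory: (_mode: NomosMode): boolean => true,
};

// Tools handed to the SDK as disallowed when the bash gate is closed.
function disallowedTools(mode: NomosMode): string[] {
  return FEATURES.bashTool(mode) ? [] : ["Bash", "BashOutput", "KillBash"];
}
```

Call sites then ask a predicate instead of comparing mode strings, which keeps the hosted policy in one file.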

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- proto/nomos.proto: new OAuthDeposit service with DepositRequest carrying
  provider, user_id, access_token, refresh_token, expires_at, scopes, and a
  metadata map for provider-specific extras (workspace_id, account_email, …).
- src/daemon/oauth-deposit.ts: handler that requires an mTLS-authenticated
  caller, refuses cross-org deposits via x-nomos-org-id metadata check, and
  upserts an integrations row scoped by provider:user_id with tokens
  encrypted at rest via the existing encrypt() pipeline.
- src/daemon/grpc-server.ts: wires the OAuthDeposit service alongside
  NomosAgent. New buildServerCredentials() helper switches between
  insecure (dev), TLS (cert+key), and mTLS (cert+key+CA) based on
  GRPC_TLS_CERT_PATH / GRPC_TLS_KEY_PATH / MTLS_CA_CERT_PATH env vars.
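
The three-way switch can be sketched as a pure decision function. This is an assumption-laden illustration rather than the real buildServerCredentials(): the `GrpcSecurity` type and `selectGrpcSecurity` name are hypothetical, and rejecting a partial configuration is a choice made here, not something the commit states.

```typescript
type GrpcSecurity =
  | { kind: "insecure" }
  | { kind: "tls"; certPath: string; keyPath: string }
  | { kind: "mtls"; certPath: string; keyPath: string; caPath: string };

function selectGrpcSecurity(env: Record<string, string | undefined>): GrpcSecurity {
  const certPath = env.GRPC_TLS_CERT_PATH;
  const keyPath = env.GRPC_TLS_KEY_PATH;
  const caPath = env.MTLS_CA_CERT_PATH;

  if (!certPath && !keyPath) {
    // Assumption: a CA without a server cert/key is treated as a config error.
    if (caPath) throw new Error("MTLS_CA_CERT_PATH set without server cert/key");
    return { kind: "insecure" };
  }
  if (!certPath || !keyPath) {
    throw new Error("GRPC_TLS_CERT_PATH and GRPC_TLS_KEY_PATH must be set together");
  }
  // With a CA bundle present, client certificates are verified (mTLS).
  return caPath
    ? { kind: "mtls", certPath, keyPath, caPath }
    : { kind: "tls", certPath, keyPath };
}
```

With @grpc/grpc-js, the result would typically be turned into `ServerCredentials.createInsecure()` or `ServerCredentials.createSsl(...)`, passing the CA and enabling client-certificate checking in the mTLS case.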

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- src/auth/tenant-context.ts: TenantContext {orgId, userId}; LOCAL_TENANT
  fallback for power-user installs; systemTenant() helper for instance-wide
  jobs (cron, ingestion) that bypass user_id filtering.
- src/auth/jwt-validator.ts: EdDSA/Ed25519 JWT verifier with JWKS caching
  (1h TTL), automatic refresh on unknown kid (handles key rotation), and
  enforced org_id claim check against NOMOS_ORG_ID. JwtValidationError
  surfaced for typed catch.
- src/auth/org-members.ts: per-instance membership cache (60s TTL),
  auto-creates the org_members table on first call, soft-fails open when
  the table doesn't yet exist on legacy schemas.
- src/auth/grpc-interceptor.ts: withAuth() higher-order wrapper resolves
  TenantContext from Authorization: Bearer metadata. mTLS-only endpoints
  (OAuthDeposit) bypass JWT entirely; power-user mode skips the JWT path
  and uses LOCAL_TENANT. Calls isOrgMember() as a second-layer check
  before invoking the handler.
- schema.sql: adds org_members table + idempotent ALTER TABLE blocks that
  add user_id columns (default 'local' for backfill) to sessions,
  transcript_messages, memory_chunks, user_model, draft_messages,
  commitments, contacts, contact_identities, cron_jobs, slack_user_tokens,
  google_accounts (as owner_user_id to avoid collision with provider's
  user_id field).
- Tests: 11 new (jwt-validator + tenant-context).
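
A simplified sketch of how a tenant might be resolved from already-verified claims. Signature verification and JWKS handling are omitted; the `resolveTenant` helper, the use of `sub` as the user id, and the `"local"` org id are assumptions (only the `'local'` user_id default appears in the commit text).

```typescript
interface TenantContext {
  orgId: string;
  userId: string;
}

// Fallback for power-user installs; the orgId value is an assumption.
const LOCAL_TENANT: TenantContext = { orgId: "local", userId: "local" };

class JwtValidationError extends Error {}

// Claims as they would look after the signature has been verified.
interface VerifiedClaims {
  sub?: string;
  org_id?: string;
}

function resolveTenant(
  hosted: boolean,
  expectedOrgId: string,
  claims: VerifiedClaims | null,
): TenantContext {
  // Power-user mode skips the JWT path entirely.
  if (!hosted) return LOCAL_TENANT;
  if (!claims || !claims.sub) throw new JwtValidationError("missing subject");
  // The token must be issued for this instance's org (NOMOS_ORG_ID).
  if (claims.org_id !== expectedOrgId) throw new JwtValidationError("org_id mismatch");
  return { orgId: expectedOrgId, userId: claims.sub };
}
```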

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- proto/nomos.proto: MobileApi service with 21 RPCs covering all 5 mobile
  tabs plus device registration: Chat (streaming + history + draft approval),
  Inbox (CATE list + envelope + action), Skills (list + toggle), Earnings
  (period stats), Settings (profile + trust tiers + permissions +
  integrations + OAuth start), and Devices (Expo push register/unregister).
- src/auth/grpc-interceptor.ts: split into withAuthUnary + withAuthStream
  matching grpc-js handler shapes; resolveContext returns Result-style
  {ctx} | {error} so handlers can return values via Promise.
- src/daemon/mobile-api.ts: handler implementations. All scoped by
  TenantContext.userId; Inbox + envelope queries use raw sql for the
  cate_inbound table that Phase 5b creates (soft-fails to empty when the
  table doesn't yet exist).
- src/daemon/push-notifications.ts: Expo push HTTP integration with
  automatic pruning of DeviceNotRegistered tokens.
- src/daemon/grpc-server.ts: registers MobileApiService alongside
  NomosAgent + OAuthDeposit.
- schema.sql + types.ts: mobile_devices table + user_id on SessionsTable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- schema.sql: cate_inbound table with trust_tier + status enums, user_id
  scoping, and JSONB envelope payload. Composite index on
  (user_id, status, created_at DESC) for fast pagination of the mobile
  Inbox tab; secondary index on from_did for sender history queries.
- types.ts: CateInboundTable Kysely interface added to Database.
- src/cate/inbound-queue.ts: enqueueInbound() — classifies sender trust
  tier (bonded if a bond is present, friend if the DID is linked via
  contact_identities, unknown otherwise), inserts into cate_inbound,
  fires Expo push fan-out via notifyUser(). Blocked senders are dropped.
- src/cate/integration.ts: CATEServer.onMessage now calls enqueueInbound
  before passing the envelope to the caller's hook, so every inbound
  envelope is durably persisted before user code runs.
- src/daemon/draft-manager.ts: createDraft() now fires a push notification
  to all of the user's mobile devices in addition to the existing Slack
  default-channel notification.
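
The trust-tier rule for inbound senders can be written as a small pure function. The `SenderFacts` shape and the function name are hypothetical; the ordering (blocked, then bonded, then friend, then unknown) follows the description above.

```typescript
type TrustTier = "bonded" | "friend" | "unknown";

// Hypothetical inputs; in the daemon these would come from DB lookups.
interface SenderFacts {
  blocked: boolean;
  hasBond: boolean;
  linkedViaContactIdentity: boolean;
}

// Returns null for blocked senders, meaning the envelope is dropped.
function classifySender(facts: SenderFacts): TrustTier | null {
  if (facts.blocked) return null;
  if (facts.hasBond) return "bonded";
  if (facts.linkedViaContactIdentity) return "friend";
  return "unknown";
}
```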

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Each customer instance now gets its own dedicated Postgres database
(nomos_<id>) using the default public schema, rather than a schema inside a
shared database. Stronger catalog-level isolation; trivial per-customer
pg_dump/restore and relocation.

- src/db/migrator.ts: reworked from schema ops to database ops —
  isValidDatabaseName (reserves nomos_server/admin/system/meta),
  withDatabaseName(url, db) URL swap, createDatabase/dropDatabase (simple-query
  mode, can't run in a txn; DROP ... WITH (FORCE)), applySchema(sql) to the
  connected db's public, provisionDatabase(adminUrl, db) = create-from-admin +
  connect-and-apply.
- src/db/migrate.ts: runMigrations() targets connected db's public (no schema
  arg); createCustomerDatabase/dropCustomerDatabase/provisionWithConnection.
- src/db/client.ts: dropped NOMOS_DB_SCHEMA + search_path; plain pooled
  connection to DATABASE_URL (each instance points at its own database).
- src/cli/db.ts: create-database/drop-database (aliases create-db/drop-db);
  migrate no longer takes --schema.
- migrator.test.ts: database-name validation + withDatabaseName tests.
- HOSTED_PLAN.md: Phase 1, provisioning, config matrix, decision (b), schema
  reference, diagram, GDPR/backup, PgBouncer note all updated for
  database-per-customer.
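
A sketch of the name check and URL swap, assuming the reserved list means the four `nomos_`-prefixed names and that customer databases match `nomos_<id>`. The regex and the expansion of the reserved list are assumptions; only the function names come from the commit text.

```typescript
// Assumed expansion of "nomos_server/admin/system/meta".
const RESERVED_DATABASES = new Set(["nomos_server", "nomos_admin", "nomos_system", "nomos_meta"]);

// Assumed pattern: "nomos_" plus a lowercase id, within PostgreSQL's 63-byte limit.
const DATABASE_NAME_RE = /^nomos_[a-z0-9_]{1,57}$/;

function isValidDatabaseName(name: string): boolean {
  return DATABASE_NAME_RE.test(name) && !RESERVED_DATABASES.has(name);
}

// Swaps only the database segment of a connection URL, keeping
// credentials, host, port, and query parameters intact.
function withDatabaseName(url: string, db: string): string {
  if (!isValidDatabaseName(db)) throw new Error(`invalid database name: ${db}`);
  const parsed = new URL(url);
  parsed.pathname = `/${db}`;
  return parsed.toString();
}
```

For example, an admin URL pointing at `/postgres` would come back pointing at `/nomos_acme`, which is the shape provisionDatabase needs for its connect-and-apply step.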

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- scripts/hosted-e2e.sh: one-command local end-to-end for the hosted flow.
  Stitches every link without K8s/ArgoCD: ensures the central server is up
  (starts it if needed), signs in a test user + promotes to admin, creates a
  BA org and sets it active, provisions the customer database (CREATE DATABASE
  nomos_<slug> + migrate via the daemon CLI), boots the daemon in hosted mode
  against that DB, seeds org_members, mints a BA JWT, and asserts
  MobileApi/ListSkills is authorized (+ rejects a bogus token) and
  OAuthDeposit lands an integrations row. Reuses already-running services,
  tears down what it started, and persists org + encryption key in a
  gitignored .e2e/ dir for stable reruns. Flags: --keep, --clean,
  --server-url, --pg-base. Sends BA's required Origin header on auth POSTs.
- schema.sql: guard the Phase 4b google_accounts ALTER on table existence —
  that table is created at runtime (not in schema.sql), so a fresh
  per-customer database doesn't have it and the unconditional ALTER threw
  'relation "google_accounts" does not exist' during migrate.
- .gitignore: ignore .e2e/ state dir.

Verified: scripts/hosted-e2e.sh passes end-to-end against local Postgres.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This repo is public; the hosted E2E script orchestrates local databases,
credentials, and the central server, so it belongs in the private
nomos-tools repo (sibling dir). Removes scripts/hosted-e2e.sh and reverts
the .e2e/ .gitignore entry (state now lives under nomos-tools/.e2e).

The fresh-DB migration fix (google_accounts ALTER guard) from 377b8b1 stays
— that's a real daemon fix, not test tooling.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace claude-opus-4-7 / 'Opus 4.7' with the 4.8 equivalents in the model
registries (src + settings), cost-tracker pricing map, the setup model picker,
and the doc strings. 4.8 is the current latest Opus; no remaining 4-7 refs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The credential check warned 'No ANTHROPIC_API_KEY set — LLM calls will fail'
even with NOMOS_USE_SUBSCRIPTION=true, where the agent authenticates via the
Claude Max/Pro OAuth token (keychain / ~/.claude), not an API key. It now
treats subscription mode like Vertex (info, skip the check) and the residual
no-credential warning points at all three options.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Hosted/consumer mode connects Gmail + Calendar + Drive through Google's
remote MCP servers (Developer Preview), owned entirely by the daemon and
decoupled from nomos-server's Better Auth (login/identity only). Power
users keep the gws CLI path unchanged.

- src/auth/google-integration.ts: the daemon holds the central Google
  OAuth client creds (GOOGLE_CLIENT_ID/SECRET — .env in dev, K8s secret
  in prod) and runs the whole token lifecycle: build consent URL (offline
  + consent so we get a refresh token), exchange code, store/refresh
  per-account tokens, hand out a valid access token. Multi-account: one
  `google:<userId>:<email>` integrations row per connected account, first
  becomes default.
- src/sdk/google-remote-mcp.ts: buildGoogleMcpServers(userId) turns each
  connected account into gmail/calendar/drive McpHttpServerConfig entries
  with that account's bearer token (default account → clean names,
  extras → email-slug suffix).
- agent-runtime: thread the requesting userId into runAgent and, in hosted
  mode, merge the user's Google MCP servers into the per-turn mcpServers +
  allowedTools. Refreshed tokens are reused until ~expiry.

Next: the connect-flow wiring (StartConnectIntegration returns the daemon
auth URL; a callback relays the code to the daemon to exchange + store).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the Developer-Preview remote MCP servers (which we can't access,
and which are draft-only) with our OWN in-process MCP server that calls
Google's GA REST APIs directly — gmail/v1, calendar/v3, drive/v3 — using
the per-account DB tokens. No preview dependency, no CLI, and SEND works.

- src/sdk/google-rest-mcp.ts: createGoogleRestMcpServer(userId) exposes 16
  tools (Gmail search/get/thread/draft/send-draft/SEND/labels, Calendar
  list/get/create/update/delete, Drive search/metadata/read, accounts).
  A single gapiFetch() resolves a fresh token per call (getValidAccessToken),
  so tokens never go stale mid-turn; multi-account via an optional `account`
  param (default = default account). buildGoogleRestMcpServer(userId) is a
  drop-in for the per-turn injection in agent-runtime.
- google-integration.ts: scopes now add gmail.send (the remote-MCP path was
  compose-only → couldn't send) and collapse the 4 granular calendar scopes
  to `calendar`. Delete the dead GOOGLE_MCP_ENDPOINTS preview constants.
- agent-runtime: call buildGoogleRestMcpServer instead of the remote one.
- Delete src/sdk/google-remote-mcp.ts (superseded).

Tests cover the request builder (auth header, query, JSON body, 204,
401/403 reconnect, missing-token), the RFC822 encoder, and the builders.

Power-user keeps the gws CLI for now; unifying it onto this path (one
Google integration everywhere) is the next slice once its connect flow is
wired. Drive read covers Docs/Sheets export + text download; upload TBD.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
An email agent should draft and let the user approve/send by default —
not send on its own. The eval also caught that send was previously
ungated (the send tools were auto-approved like every MCP tool), so this
closes a real gap.

- google-integration.ts: per-account `send_enabled` (default false),
  preserved across re-connects; setSendEnabled() to toggle it and
  isSendEnabled() to check. GoogleAccount gains `sendEnabled`.
- google-rest-mcp.ts: the Gmail SEND tools (gmail_send_message,
  gmail_send_draft) are only REGISTERED when the user has enabled sending
  on an account, so the agent simply doesn't have them otherwise. Belt +
  suspenders: each send handler also re-checks isSendEnabled per call (so
  a send-disabled account can't be targeted even in a multi-account
  setup). Drafting (gmail_create_draft) is always available.

The Settings toggle that flips send_enabled is wired with the connect
flow; default-off is safe until then.
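
The two-layer gate can be illustrated as follows. This is a sketch with stubbed handlers, not the real MCP tool registration: the `buildGmailTools` helper and the handler shapes are hypothetical, while the tool names and the default-off `sendEnabled` flag come from the commit text.

```typescript
interface GoogleAccount {
  email: string;
  sendEnabled: boolean;
}

type ToolHandler = (args: { account?: string }) => string;

function buildGmailTools(accounts: GoogleAccount[]): Map<string, ToolHandler> {
  const tools = new Map<string, ToolHandler>();

  // Drafting is always available.
  tools.set("gmail_create_draft", () => "draft created");

  // Layer 1: the send tool is registered only if some account opted in.
  if (accounts.some((a) => a.sendEnabled)) {
    tools.set("gmail_send_message", (args) => {
      // The first account acts as the default when none is named.
      const target = args.account
        ? accounts.find((a) => a.email === args.account)
        : accounts[0];
      // Layer 2: re-check the specific account on every call.
      if (!target || !target.sendEnabled) {
        throw new Error("sending is not enabled for this account");
      }
      return `sent from ${target.email}`;
    });
  }
  return tools;
}
```

The per-call check matters in multi-account setups: one opted-in account makes the tool visible, but a send-disabled account still cannot be targeted.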

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…backup

Settle the hosted Google integration: use Google's OFFICIAL remote MCP
servers (gmailmcp/calendarmcp/drivemcp) for read/draft/calendar/drive, and
our own opt-in Gmail SEND tool via the API (the official Gmail MCP is
draft-only). Power-user keeps the gws CLI (unchanged). The direct-REST
server stays as a flag-selectable backup, not deleted.

- src/sdk/google-mcp.ts (new): buildGoogleMcpServers(userId) dispatches on
  NOMOS_GOOGLE_BACKEND — "official" (default): per-account official remote
  MCP (token in header) + our in-process send server when sending is
  enabled; "rest" (backup): the full direct-REST in-process server.
- google-rest-mcp.ts: extract the two gated Gmail send tools into a shared
  gmailSendTools(); add createGoogleSendMcpServer() (send-only, for official
  mode). The full server is now the "rest" backend.
- scopes: back to the official MCP's granular set (gmail.readonly/compose,
  the granular calendar scopes + calendar.events for write, drive
  readonly/file) PLUS gmail.send for our send tool.
- agent-runtime: call the orchestrator instead of the REST server directly.

Tokens still come from our own lifecycle (getValidAccessToken, refreshed
per call), so the official→our-send split shares one token source, and the
later swap to Google's GA MCP is a URL change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ct/Send/Disconnect)

Wire the daemon-owned Google connect flow into MobileApi, reusing the
google-integration token lifecycle:

- StartConnectIntegration: for google (and gmail/calendar/drive aliases),
  return the daemon-built consent URL (buildAuthUrl) with a signed CSRF
  state. Other providers still defer to the central auth server.
- ConnectGoogleAccount(code, state): verify the signed state belongs to
  the JWT user, exchangeCode → storeGoogleAccount. The web/mobile callback
  relays only the code; client_secret never leaves the daemon.
- SetGoogleSend(account_email, enabled): flips the per-account send opt-in.
- DisconnectIntegration: now disconnects google by account_email
  (removeGoogleAccount); legacy integration_id path kept.
- ListIntegrations: google rows surface as {id=email, label "Google",
  provider, account_email, send_enabled}.
- google-integration: googleRedirectUri() + stateless HMAC-signed
  signOAuthState/verifyOAuthState (survive multi-pod; bound to userId+exp).
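
A stateless signed state bound to userId and expiry could look roughly like this. The payload layout, the `payload.signature` format, and the parameter lists are assumptions; only the function names and the HMAC approach come from the commit text.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Encodes {userId, exp} and appends an HMAC-SHA256 signature.
function signOAuthState(userId: string, secret: string, ttlMs: number, now: number = Date.now()): string {
  const payload = Buffer.from(JSON.stringify({ userId, exp: now + ttlMs })).toString("base64url");
  const signature = createHmac("sha256", secret).update(payload).digest("base64url");
  return `${payload}.${signature}`;
}

// Valid only if the signature matches, the state belongs to this user,
// and it has not expired. No server-side session is needed, so any pod
// that knows the secret can verify it.
function verifyOAuthState(state: string, userId: string, secret: string, now: number = Date.now()): boolean {
  const parts = state.split(".");
  if (parts.length !== 2) return false;
  const payload = parts[0];
  const signature = parts[1];
  if (!payload || !signature) return false;

  const expected = createHmac("sha256", secret).update(payload).digest();
  const given = Buffer.from(signature, "base64url");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return false;

  try {
    const data = JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
    return data.userId === userId && typeof data.exp === "number" && data.exp > now;
  } catch {
    return false;
  }
}
```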

proto: MIntegration gains send_enabled + provider; MDisconnectRequest gains
account_email; new MConnectGoogleRequest / MSetGoogleSendRequest + the two
RPCs. Loaded dynamically by proto-loader (no codegen).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Step-by-step for the central Google OAuth client the hosted daemon uses:
GCP project, enable GA + Developer-Preview MCP APIs, consent-screen scopes
(matching GOOGLE_SCOPES), Web OAuth client + redirect URI, daemon env vars
(GOOGLE_CLIENT_ID/SECRET, GOOGLE_OAUTH_REDIRECT_URI, NOMOS_GOOGLE_BACKEND),
the connect/test flow, and a troubleshooting table.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The hosted-mode Google app setup is operational/commercial, not part of
the open-source project. Moved to the private nomos-docs collection. The
public power-user doc (docs/integrations/google-workspace.md, the gws CLI
flow) stays.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, official hosted)

The backend default was unconditionally "official". For the open-source
power-user build, Google access is the gws CLI — so the default should be
"cli", and only hosted should use the official remote MCP.

buildGoogleMcpServers now defaults mode-aware: "cli" when not hosted (returns
no MCP servers; the agent uses the gws CLI via Bash), "official" when hosted.
"rest" stays the explicit backup. Empty/whitespace NOMOS_GOOGLE_BACKEND falls
through to the mode-aware default instead of the unknown→official path. Adds
tests for the cli/official/explicit-cli cases.
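
The resolution order described above can be sketched as a pure function. The `resolveGoogleBackend` name and the case-insensitive matching are assumptions; the defaults and the unknown-to-official fallback follow the commit text.

```typescript
type GoogleBackend = "cli" | "official" | "rest";

function resolveGoogleBackend(raw: string | undefined, hosted: boolean): GoogleBackend {
  // Lowercasing is an assumption made for this sketch.
  const value = (raw ?? "").trim().toLowerCase();

  // Unset, empty, or whitespace-only: use the mode-aware default.
  if (value === "") return hosted ? "official" : "cli";

  if (value === "cli" || value === "official" || value === "rest") return value;

  // Unknown values keep the pre-existing unknown -> official behavior.
  return "official";
}
```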

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
upsertIntegration (and the CATE keystore) wrote config/metadata via
JSON.stringify(...). The postgres-js driver serializes values to jsonb itself,
so a pre-stringified string got re-encoded into a json *string* scalar
(jsonb_typeof = 'string'). That broke every config->>'key' read: a connected
Google account's account_email came back empty, so listGoogleAccounts yielded
email:"" and buildGoogleOfficialMcpServers skipped it as "no valid token".

Root cause was the column type itself: ColumnType<…, string | undefined, string>
told callers to write strings. Changed config/metadata to write objects; pass
objects in upsertIntegration (with a partial update set so a secrets-only token
refresh never clobbers config) and in nomos-keystore. The type change surfaced
the keystore as a second offender at compile time.

Adds an idempotent schema.sql repair (guarded DO block) that unwraps any
existing string-encoded config/metadata back to objects — runs on the next
migrate, no manual SQL. Verified the encoding + partial-update behavior against
a real pg jsonb column. 327 tests pass.
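
The double-encoding can be reproduced without a database. In this sketch `driverSerialize` is a stand-in that simulates a driver JSON-encoding whatever value it is given; it is not the postgres-js API, and `unwrapConfig` is a hypothetical in-memory analogue of the schema.sql repair.

```typescript
// Simulates a driver that JSON-encodes the value it receives for a jsonb column.
function driverSerialize(value: unknown): string {
  return JSON.stringify(value);
}

const config = { account_email: "user@example.com" };

// Bug: the caller stringifies first, then the driver encodes again.
const storedBuggy = driverSerialize(JSON.stringify(config));
// Fix: the caller passes the object and lets the driver encode once.
const storedFixed = driverSerialize(config);

// What a reader gets back after the stored JSON is decoded once.
const readBuggy: unknown = JSON.parse(storedBuggy); // a string scalar
const readFixed: unknown = JSON.parse(storedFixed); // an object

// Repair: unwrap a string-encoded object back into an object.
function unwrapConfig(value: unknown): unknown {
  if (typeof value !== "string") return value;
  try {
    const parsed: unknown = JSON.parse(value);
    return typeof parsed === "object" && parsed !== null ? parsed : value;
  } catch {
    return value;
  }
}
```

Because `readBuggy` is a string, a key lookup on it finds nothing, which mirrors `config->>'key'` returning NULL for a jsonb string scalar.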

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
In hosted mode bash is disabled, so the gws-* skills (which shell out to the
gws CLI) can't run — they only mislead the agent into citing a mechanism it
can't use. loadSkills now drops bundled skills that require a CLI binary in
hosted: top-level requires.bins, plus the gws-* family (whose requirement is
nested under metadata.openclaw, which the lightweight frontmatter parser
doesn't surface).

Adds NOMOS_SKILLS_DIR: an operator-provided (deployment env, not customer-
injectable) directory of extra skills, loaded in any mode and overriding
bundled on name collision. Hosted uses it to ship MCP-based Google skills that
replace the gws CLI skills.
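
The selection rule might be sketched like this. The `Skill` shape and `selectSkills` helper are hypothetical; the filtering conditions (top-level required binaries, the gws-* name family) and the operator-overrides-bundled rule follow the text above.

```typescript
interface Skill {
  name: string;
  source: "bundled" | "operator";
  requiresBins?: string[];
}

function selectSkills(bundled: Skill[], operator: Skill[], hosted: boolean): Skill[] {
  const byName = new Map<string, Skill>();

  for (const skill of bundled) {
    // gws-* skills are matched by name because their binary requirement
    // is nested where the lightweight parser does not see it.
    const needsCli = (skill.requiresBins?.length ?? 0) > 0 || skill.name.startsWith("gws-");
    if (hosted && needsCli) continue;
    byName.set(skill.name, skill);
  }

  // Operator-provided skills load in any mode and win on name collision.
  for (const skill of operator) byName.set(skill.name, skill);

  return Array.from(byName.values());
}
```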

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The agent saw the mcp__google-gmail__* tools but its "Available Integrations"
prompt section (built once at init from power-user channels + the gws CLI
accounts) never listed the hosted OAuth Google accounts. So when asked "do you
have access to my email?" it trusted the stale summary and wrongly said Gmail
wasn't configured.

Adds buildGoogleIntegrationPrompt(userId): a per-turn, per-user section listing
the connected Google accounts and asserting active Gmail/Calendar/Drive access
("do NOT tell the user it needs configuring"), with each account's send opt-in
state. runAgent appends it in hosted mode alongside the Google MCP servers it
already builds. Also gates the gws CLI integrations blurb behind !isHosted()
(it references bash + the gws binary, neither of which exists in hosted).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… failure

When ENCRYPTION_KEY is rotated (or the stored secret is malformed), getKey threw
out of Decipheriv.final, so initCATEIntegration failed on every boot with
"Unsupported state or unable to authenticate data" and CATE never started.

getKey now catches the decrypt failure and returns null, which the init path
already treats as "no key" and regenerates. This self-heals a key rotation: a
fresh keypair is written with the current key and CATE boots. The agent's CATE
DID rotates, which is unavoidable since the old key is unrecoverable.
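
The self-healing path can be sketched as below, assuming AES-256-GCM (consistent with the quoted error message, but not confirmed by this PR text). The `seal`, `open`, and `loadOrRegenerate` helpers are hypothetical stand-ins for the keystore's encrypt/getKey/init logic.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

interface Sealed {
  iv: Buffer;
  tag: Buffer;
  data: Buffer;
}

function seal(plain: Buffer, key: Buffer): Sealed {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plain), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), data };
}

// Returns null instead of throwing when the key is wrong or the blob is malformed.
function open(sealed: Sealed, key: Buffer): Buffer | null {
  try {
    const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
    decipher.setAuthTag(sealed.tag);
    return Buffer.concat([decipher.update(sealed.data), decipher.final()]);
  } catch {
    return null;
  }
}

// Treats an undecryptable secret like a missing one and generates a new one.
function loadOrRegenerate(stored: Sealed | null, key: Buffer): { secret: Buffer; regenerated: boolean } {
  const existing = stored ? open(stored, key) : null;
  if (existing) return { secret: existing, regenerated: false };
  return { secret: randomBytes(32), regenerated: true };
}
```

The cost of this design is the one the commit names: after a rotation the old secret is gone, so whatever identity was derived from it changes.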

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
meidad marked this pull request as ready for review June 6, 2026 03:21
meidad merged commit 20329ef into main Jun 6, 2026
6 checks passed